EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria...
Transcript of EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria...
![Page 1: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/1.jpg)
Embedded Linux QAHow to Not Lie With Statistics
Embedded Linux ConferenceNorth America 2018
Prof. Dr. Wolfgang Mauerer
Technical University of Applied Sciences Regensburg
Siemens AG, Corporate Research
W. Mauerer 20. Jun. 2018 1 / 44
![Page 2: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/2.jpg)
About
I Siemens Corporate Technology: Corporate Competence Centre Embedded Linux
I Technical University of Applied Science RegensburgI Theoretical Computer Science
I Head of Digitalisation Laboratory
Target Audience Assumptions
I System Builders & Architects, Software Architects
I Not necessarily RT-Linux and Safety-Critical Linux experts
I No statistical knowledge assumed
W. Mauerer 20. Jun. 2018 2 / 44
Introduction & Overview
![Page 3: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/3.jpg)
How to not lie with statistics
W. Mauerer 20. Jun. 2018 3 / 44
Disclaimer
![Page 4: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/4.jpg)
How to not lie with statistics
W. Mauerer 20. Jun. 2018 3 / 44
Disclaimer
![Page 5: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/5.jpg)
How to not lie with statistics How to avoid incidentallyover-interpreting measured data, presenting data in not so usefulways, and/or drawing unsupported conclusions and making inflatedclaims from/based on statistical analyses
W. Mauerer 20. Jun. 2018 3 / 44
Disclaimer
![Page 6: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/6.jpg)
1. Aspects of Embedded QA2. Dealing with Data
2.1 Measure Properly2.2 Visualise, Explore and Understand2.3 Model and Predict2.4 Mine QA Data?
3. Summary
W. Mauerer 20. Jun. 2018 4 / 44
1. Aspects of Embedded QA
Outline
![Page 7: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/7.jpg)
Functional properties
I Speed & Throughput
I Response Latencies
I Test coverage
I …
Non-Functional properties
I Stability
I Availability
I Scaleability
I Correctness (≈ Bugs)
I …
W. Mauerer 20. Jun. 2018 5 / 44
1. Aspects of Embedded QA
Embedded QA: What?
![Page 8: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/8.jpg)
Testing
I CI, Load Testing, …
I Statistics ☞ Process efficiency,gradual change detection
Review
I Manual Inspection of Patches/PullRequests
I Statistics ☞ Efficiency, Coverage
W. Mauerer 20. Jun. 2018 6 / 44
1. Aspects of Embedded QA
Embedded QA: How? I
![Page 9: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/9.jpg)
Certification
I SIL/ASIL (Safety), CC (Security), …
I Statistics ☞ Satisfy certificationcriteria
Formal Verification & Proofs
I Quality “Gold Standard”
I Feasible for smallcomponents/portions only
I Statistics ☞ Approximation, WCETestimation, …
W. Mauerer 20. Jun. 2018 7 / 44
1. Aspects of Embedded QA
Embedded QA: How? II
![Page 10: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/10.jpg)
1. Aspects of Embedded QA2. Dealing with Data
2.1 Measure Properly2.2 Visualise, Explore and Understand2.3 Model and Predict2.4 Mine QA Data?
3. Summary
W. Mauerer 20. Jun. 2018 8 / 44
2. Dealing with Data
Outline
![Page 11: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/11.jpg)
Measuring Properly
I Reproducibility – can others reproduce/interpret results?
I Sufficient Duration – when is certainty achieved?
I Traceability – do we understand what's going on?
First rule of data analysis
I 80% of effort: cleaning up data
I 20% of effort: analysis
W. Mauerer 20. Jun. 2018 9 / 44
2.1 Measure Properly
Measuring Properly
![Page 12: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/12.jpg)
Reproducibility: Tidy Data
1. Each variable forms a column
2. Each observation forms a row
3. Each type of observational unit forms a table (separate file)
W. Mauerer 20. Jun. 2018 10 / 44
2.1 Measure Properly
Reproducible Data Collection I
![Page 13: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/13.jpg)
I Alternative: Grammar of Graphics in Python: http://ggplot.yhathq.com/
I Interactive and reproducible analysis: Jupyter jupyter.org
W. Mauerer 20. Jun. 2018 11 / 44
2.1 Measure Properly
GNU R & Grammar of Graphics
![Page 14: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/14.jpg)
Messy example: Latency measurements 7
CPU ID System Low Load High Load I/O Load Network Load
ARM 1 Tegra 27 14 22 29ARM 2 rpi 31 450 50 60x86 3 qemu 32 50 110 62…ARM 1212346 Tegra 29 41 22 92
W. Mauerer 20. Jun. 2018 12 / 44
2.1 Measure Properly
Reproducible Data Collection II
![Page 15: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/15.jpg)
Tidy example: Latency measurements 3
CPU ID System Type ValueARM 1 Tegra Low Load 27ARM 2 Tegra High Load 14ARM 3 Tegra I/O Load 22ARM 4 Tegra Network Load 29ARM 5 rpi Low Load 31…ARM 67556356 Tegra Network Load 92
Tidyverse
I Clean up/transform data
I “Opinionated collection of R packages designed for data science”:www.tidyverse.org
W. Mauerer 20. Jun. 2018 12 / 44
2.1 Measure Properly
Reproducible Data Collection II
![Page 16: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/16.jpg)
Second rule of data analysis
Never underestimate the value of tidy data. Seriously.
W. Mauerer 20. Jun. 2018 13 / 44
2.1 Measure Properly
Reproducible Data Collection III
![Page 17: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/17.jpg)
Three ways of understanding data
1. Descriptive analysis
2. Exploratory analysis
3. Confirmatory analysis
Order matters
I Simple analyses before formal test
I Proper visual understanding often more important than explicit calculations
W. Mauerer 20. Jun. 2018 14 / 44
2.2 Visualise, Explore and Understand
Visualising and Understanding Data
![Page 18: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/18.jpg)
Categorical
I Binary (dead/alive)
I Nominal (colours)
I Ordinal (army ranks, kernelmaintainer pecking order)
Quantitative
I Discrete (integers)
I Continuous (reals)
Univariate
One variable per “subject”
Multivariate
Multiple variables per “subject”
W. Mauerer 20. Jun. 2018 15 / 44
2.2 Visualise, Explore and Understand
Standard Visualisations I
![Page 19: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/19.jpg)
Univariate: Scatter plots
I Point-to-point coverage
I Structure, Clusters
Univariate: Density
I Data summary
I Density, Histograms
Multivariate: Basic
I Colours
I Facets, Trellis
Multivariate: Advanced
I Principal component analysis,self-organising maps, clustering,partitioning
I Visualisation ⇔ analysis method
W. Mauerer 20. Jun. 2018 16 / 44
2.2 Visualise, Explore and Understand
Standard Visualisations II
![Page 20: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/20.jpg)
CPU ID Latency
0 0 640 1 620 2 591 0 581 1 562 0 762 1 63…
W. Mauerer 20. Jun. 2018 17 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation I
![Page 21: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/21.jpg)
0e+00
1e+05
2e+05
0 20 40 60Latency [us]
Coun
t
ggplot(dat, aes(x=latency)) + geom_histogram()
W. Mauerer 20. Jun. 2018 18 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation II
![Page 22: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/22.jpg)
0
50000
100000
150000
0 20 40 60Latency [us]
Coun
t
ggplot(dat, aes(x=latency)) + geom_histogram(binwidth=1)
W. Mauerer 20. Jun. 2018 18 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation II
![Page 23: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/23.jpg)
0
50000
100000
150000
0 20 40 60Latency [us]
Coun
tCPU
0
1
2
3
ggplot(dat, aes(x=latency, fill=CPU)) + geom_histogram(binwidth=1)
W. Mauerer 20. Jun. 2018 18 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation II
![Page 24: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/24.jpg)
0
20000
40000
60000
0 20 40 60Latency [us]
Coun
tCPU
0
1
2
3
ggplot(dat, aes(x=latency, fill=CPU)) +
geom_histogram(binwidth=1, position="dodge")
W. Mauerer 20. Jun. 2018 18 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation II
![Page 25: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/25.jpg)
0 1
2 3
0200004000060000
0200004000060000
0 20 40 60 0 20 40 60Latency [us]
Coun
t
CPU0
1
2
3
ggplot(dat, aes(x=latency, fill=CPU)) +
geom_histogram(binwidth=1) + facet_wrap(˜CPU)
W. Mauerer 20. Jun. 2018 18 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation II
![Page 26: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/26.jpg)
20000
40000
60000
80000
0 20 40 60Latency [us]
Coun
t[sqrt] CPU
0
1
2
3
ggplot(dat, aes(x=latency, fill=CPU)) +
geom_histogram(binwidth=1, position="dodge") + scale_y_sqrt()
W. Mauerer 20. Jun. 2018 18 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation II
![Page 27: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/27.jpg)
5e+04
1e+05
0 50 100 150 200Latency [us]
Coun
t[sqrt]
0.00
0.25
0.50
0.75
1.00
0 50 100 150 200Latency [us]
ggplot(dat, aes(x=latency)) + stat_ecdf()
W. Mauerer 20. Jun. 2018 19 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation III: Non-Parametric Methods
![Page 28: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/28.jpg)
(Empirical) Cumulative Distribution Function
I Sums up fraction of values that fall into [0, x] at position x
I Parameter free!
I Interpretation requires trained eye
W. Mauerer 20. Jun. 2018 20 / 44
2.2 Visualise, Explore and Understand
Latency Visualisation IV: Non-Parametric Methods
![Page 29: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/29.jpg)
Gaussian Distribution: Mean and SD suffice
http://en.wikipedia.org/wiki/Probability_density_function
W. Mauerer 20. Jun. 2018 21 / 44
2.2 Visualise, Explore and Understand
Beware of Summaries I
![Page 30: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/30.jpg)
Experiment
I Gaussian distribution with same mean/SD as measured distribution
I Different stories! (see next slide)
W. Mauerer 20. Jun. 2018 22 / 44
2.2 Visualise, Explore and Understand
Beware of Summaries II
![Page 31: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/31.jpg)
Gaussian Measured
0
40000
80000
120000
0 50 100 150 0 50 100 150Latency
Coun
t
ggplot(dat, aes(x=latency)) + geom_histogram(binwidth=1) +
facet_grid(˜type)
W. Mauerer 20. Jun. 2018 23 / 44
2.2 Visualise, Explore and Understand
Beware of Summaries III
![Page 32: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/32.jpg)
Gaussian Measured
5e+04
1e+05
0 50 100 150 0 50 100 150Latency
Coun
t
ggplot(dat, aes(x=latency)) + geom_histogram(binwidth=1) +
facet_grid(˜type) + scale_y_sqrt()
W. Mauerer 20. Jun. 2018 23 / 44
2.2 Visualise, Explore and Understand
Beware of Summaries III
![Page 33: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/33.jpg)
0.00
0.25
0.50
0.75
1.00
0 50 100 150Latency
Fra
ctio
nType Gaussian Measured
ggplot(dat, aes(x=latency, colour=type)) + stat_ecdf()
W. Mauerer 20. Jun. 2018 23 / 44
2.2 Visualise, Explore and Understand
Beware of Summaries III
![Page 34: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/34.jpg)
Why?
I Track behavioural changes aftersystem changes
I Load vs. idle behaviour
I Evaluate alternative choices
How?
I Comparing summaries 7
I Visual inspection/Explorativeanalysis 3
I Formal methods/tests 37
W. Mauerer 20. Jun. 2018 24 / 44
2.2 Visualise, Explore and Understand
Comparing Datasets/Distributions I
![Page 35: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/35.jpg)
System Configuration vs. Latencies
Base New 1 New 2
5e+04
1e+05
0 50 100 150 0 50 100 150 0 50 100 150Latency
Coun
t
W. Mauerer 20. Jun. 2018 25 / 44
2.2 Visualise, Explore and Understand
Comparing Datasets/Distributions II
![Page 36: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/36.jpg)
Scenario
1. Base: Typical latency measurement
2. New 1: Averages consistently smaller (5 µs)
3. New 2: Larger variance per sample, but identical average
W. Mauerer 20. Jun. 2018 25 / 44
2.2 Visualise, Explore and Understand
Comparing Datasets/Distributions II
![Page 37: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/37.jpg)
Point of view
Which one is better?
Formal Tests
I t-test: Check with identical mean values (Gaussian distribution/large samplesize)
I Wilcoxon signed-rank test: Test for identical distributions
Visual Tests
I Quantile-Quantile plot 37
I Facetted(!) histograms 37
I Empirical cumulative distribution functions (ecdf) 3
W. Mauerer 20. Jun. 2018 26 / 44
2.2 Visualise, Explore and Understand
Comparing Datasets/Distributions III
![Page 38: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/38.jpg)
Conclusion
I Formal tests often describe the obvious
I Preconditions not satisfied ☞ formal tests (may be) pointless
W. Mauerer 20. Jun. 2018 27 / 44
2.2 Visualise, Explore and Understand
Comparing Datasets/Distributions VI
![Page 39: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/39.jpg)
Making Predictions: Two Easy Steps
1. Find a (mathematical) model that describes the data
2. Extend the model outside the current (measured) range
Third rule of data analysis
If things sound too good to be true, they probably are not true.
W. Mauerer 20. Jun. 2018 28 / 44
2.3 Model and Predict
Making Predictions I
![Page 40: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/40.jpg)
1000
2000
40 80 120 160Kilo-Yggdrasils
Unicorns
Model
yi = β0 + β1 · xi + εi, i = 1 . . . N
i # of Observationβ0 Interceptβ1 Slopeεi Random deviation
Task
Estimate coefficients β̂0, β̂1 and ε̂i
W. Mauerer 20. Jun. 2018 29 / 44
2.3 Model and Predict
Making Predictions II
![Page 41: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/41.jpg)
1000
2000
40 80 120 160Kilo-Yggdrasils
Unicorns
Model
yi = β0 + β1 · xi + εi, i = 1 . . . N
i # of Observationβ0 Interceptβ1 Slopeεi Random deviation
Task
Estimate coefficients β̂0, β̂1 and ε̂i
W. Mauerer 20. Jun. 2018 29 / 44
2.3 Model and Predict
Making Predictions II
![Page 42: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/42.jpg)
1000
2000
40 80 120 160Kilo-Yggdrasils
Unicorns
Model
yi = β0 + β1 · xi + εi, i = 1 . . . N
i # of Observationβ0 Interceptβ1 Slopeεi Random deviation
Solution
Minimise f(β0, β1) =N∑i=1
(yi − β0 − β1xi)2
β̂0, β̂1 = argmin f(β0, β1)
W. Mauerer 20. Jun. 2018 29 / 44
2.3 Model and Predict
Making Predictions II
![Page 43: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/43.jpg)
Models and Reality
I No model is correct
I Some models may be useful
W. Mauerer 20. Jun. 2018 30 / 44
2.3 Model and Predict
Making Predictions III
![Page 44: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/44.jpg)
Data with Functional Relation
1000
2000
40 80 120 160Kilo-Yggdrasils
Unicorns
W. Mauerer 20. Jun. 2018 31 / 44
2.3 Model and Predict
Making Predictions IV
![Page 45: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/45.jpg)
Data with Functional Relation
1000
2000
40 80 120 160Kilo-Yggdrasils
Unicorns
W. Mauerer 20. Jun. 2018 31 / 44
2.3 Model and Predict
Making Predictions IV
![Page 46: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/46.jpg)
Assumptions (linearregression)
1. Errors are normallydistributed: E(εi) = 0.
2. Errors are uncorrelated:cov(εi, εj) = 0 for i 6= j.
3. Variance of errors is constant:var(εi) = σ2. (homoscedasticvs. heteroscedastic)
4. Design matrix (X) has fullrank.
-3 -2 -1 0 1 2 3
-20
24
Normal Q-Q Plot
Theoretical Quantiles
SampleQua
ntiles
W. Mauerer 20. Jun. 2018 32 / 44
2.3 Model and Predict
Model Validity
![Page 47: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/47.jpg)
Assumptions (linearregression)
1. Errors are normallydistributed: E(εi) = 0.
2. Errors are uncorrelated:cov(εi, εj) = 0 for i 6= j.
3. Variance of errors is constant:var(εi) = σ2. (homoscedasticvs. heteroscedastic)
4. Design matrix (X) has fullrank.
-2
0
2
4
40 80 120 160Kilo-Yggdrasils
Stud
entised
Residu
als
W. Mauerer 20. Jun. 2018 32 / 44
2.3 Model and Predict
Model Validity
![Page 48: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/48.jpg)
Assumptions (linearregression)
1. Errors are normallydistributed: E(εi) = 0.
2. Errors are uncorrelated:cov(εi, εj) = 0 for i 6= j.
3. Variance of errors is constant:var(εi) = σ2. (homoscedasticvs. heteroscedastic)
4. Design matrix (X) has fullrank.
-2
0
2
4
40 80 120 160Kilo-Yggdrasils
Stud
entised
Residu
als
W. Mauerer 20. Jun. 2018 32 / 44
2.3 Model and Predict
Model Validity
![Page 49: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/49.jpg)
Assumptions (linearregression)
1. Errors are normallydistributed: E(εi) = 0.
2. Errors are uncorrelated:cov(εi, εj) = 0 for i 6= j.
3. Variance of errors is constant:var(εi) = σ2. (homoscedasticvs. heteroscedastic)
4. Design matrix (X) has fullrank.
W. Mauerer 20. Jun. 2018 32 / 44
2.3 Model and Predict
Model Validity
![Page 50: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/50.jpg)
0
1000
2000
40 80 120 160Kilo-Yggdrasils
Unicorns
W. Mauerer 20. Jun. 2018 33 / 44
2.3 Model and Predict
Confidence and Prediction Intervals
![Page 51: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/51.jpg)
Use and Abuse of Formal Tests
I Rule of thumb: filter against speculation
I Sample size
I Existence of functional relationship
I Prerequisites of modelling approach
I Statistical significance vs. practical relevance
I Confidence intervals necessary!
W. Mauerer 20. Jun. 2018 34 / 44
2.3 Model and Predict
Predictions vs. Truth
![Page 52: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/52.jpg)
Current Trends
I Quantify qualityI Software
I Process
I Community
I Repository mining
Possible Uses
I Predict future behaviour
I Detect weaknesses
I Make quality objective
W. Mauerer 20. Jun. 2018 35 / 44
2.4 Mine QA Data?
Quality Data Mining
![Page 53: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/53.jpg)
0
100
200
300
0 20 40 60Time [a.u.]
Bug-Fixcommits
ggplot(bug.dat, aes(x=time, y=commits)) + geom_point()
W. Mauerer 20. Jun. 2018 36 / 44
2.4 Mine QA Data?
Example: Fixing Bugs I
![Page 54: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/54.jpg)
0
100
200
300
0 20 40 60Time [a.u.]
Bug-Fixcommits
ggplot(bug.dat, aes(x=time, y=commits)) +
geom_bar(stat='identity')
W. Mauerer 20. Jun. 2018 36 / 44
2.4 Mine QA Data?
Example: Fixing Bugs I
![Page 55: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/55.jpg)
0
100
200
0 20 40 60Time [a.u.]
Bug-Fixcommits
W. Mauerer 20. Jun. 2018 37 / 44
2.4 Mine QA Data?
Example: Fixing Bugs II (Linear Model)
![Page 56: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/56.jpg)
-200
0
200
0 30 60 90 120Time [a.u.]
Bug-Fixcommits
W. Mauerer 20. Jun. 2018 37 / 44
2.4 Mine QA Data?
Example: Fixing Bugs II (Linear Model)
![Page 57: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/57.jpg)
0
100
200
300
0 20 40 60Time [a.u.]
Bug-Fixcommits
W. Mauerer 20. Jun. 2018 38 / 44
2.4 Mine QA Data?
Example: Fixing Bugs III (Generalised Linear Model)
![Page 58: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/58.jpg)
0
100
200
300
0 30 60 90 120Time [a.u.]
Bug-Fixcommits
W. Mauerer 20. Jun. 2018 38 / 44
2.4 Mine QA Data?
Example: Fixing Bugs III (Generalised Linear Model)
![Page 59: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/59.jpg)
0
100
200
300
0 30 60 90 120Time [a.u.]
Bug-Fixcommits
W. Mauerer 20. Jun. 2018 38 / 44
2.4 Mine QA Data?
Example: Fixing Bugs III (Generalised Linear Model)
![Page 60: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/60.jpg)
0
100
200
300
0 20 40 60Time [a.u.]
Bug-Fixcommits
W. Mauerer 20. Jun. 2018 39 / 44
2.4 Mine QA Data?
Example: Fixing Bugs IV (Segmented Linear Model)
![Page 61: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/61.jpg)
0
250
500
750
1000
0 20 40 60 80Time [a.u.]
Bug-Fixcommits
W. Mauerer 20. Jun. 2018 39 / 44
2.4 Mine QA Data?
Example: Fixing Bugs IV (Segmented Linear Model)
![Page 62: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/62.jpg)
LM
-200
0
200
0 30 60 90 120Time [a.u.]
Bug-Fixcommits
Generalised LM
0
100
200
300
0 30 60 90 120Time [a.u.]
Bug-Fixcommits
Segmented LM
0
100
200
300
0 20 40 60Time [a.u.]
Bug-Fixcommits
Same data, three models
Three (totally) different stories…
W. Mauerer 20. Jun. 2018 40 / 44
2.4 Mine QA Data?
Example: Fixing Bugs V
![Page 63: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/63.jpg)
Issues
I Base data accuracy?
I Model checkingI Errors/Residuals: Normally distributed?
I Variance OK?
I Functional relationship?
I Does the analysis add value beyondI common sense?
I expert interpretation?
Model diagnostics essential
I Find appropriate technique
I Fit and plot model
I Check diagnostic data
W. Mauerer 20. Jun. 2018 41 / 44
2.4 Mine QA Data?
Issues
![Page 64: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/64.jpg)
1. Aspects of Embedded QA2. Dealing with Data
2.1 Measure Properly2.2 Visualise, Explore and Understand2.3 Model and Predict2.4 Mine QA Data?
3. Summary
W. Mauerer 20. Jun. 2018 42 / 44
3. Summary
Outline
![Page 65: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/65.jpg)
Living with Data
I “Listen” to the data, understandstructures
I Identify important variables &patterns
I Outliers and anomalies, appropriategraph (axis) transformations
I Test underlying assumptions
I Parsimonious models, hypotheses
I Identify mistakes (→ cleanup)
Don't
I expect miracles
I forget to use expert validation/help!
☞Do more statistical analysis! Do less statistical analysis!
W. Mauerer 20. Jun. 2018 43 / 44
3. Summary
Summary
![Page 66: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/66.jpg)
Living with Data
I “Listen” to the data, understandstructures
I Identify important variables &patterns
I Outliers and anomalies, appropriategraph (axis) transformations
I Test underlying assumptions
I Parsimonious models, hypotheses
I Identify mistakes (→ cleanup)
Don't
I expect miracles
I forget to use expert validation/help!
☞Do more statistical analysis! Do less statistical analysis!
W. Mauerer 20. Jun. 2018 43 / 44
3. Summary
Summary
![Page 67: EmbeddedLinuxQA HowtoNotLieWithStatistics...I Statistics Satisfycertification criteria FormalVerification&Proofs I Quality“GoldStandard” I Feasibleforsmall components/portionsonly](https://reader034.fdocuments.us/reader034/viewer/2022042316/5f04d75f7e708231d40ff9f2/html5/thumbnails/67.jpg)
Thanks for your interest!
W. Mauerer 20. Jun. 2018 44 / 44
3. Summary