CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of...

98
CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 1 / 49 CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 2015-06-15 CS147

Transcript of CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of...

Page 1: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

CS 147:Computer Systems Performance AnalysisSummarizing Variability and Determining Distributions

1 / 49

CS 147:Computer Systems Performance AnalysisSummarizing Variability and Determining Distributions

2015

-06-

15

CS147

Page 2: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Overview

Introduction

Indices of DispersionRangeVariance, Standard Deviation, C.V.QuantilesMiscellaneous MeasuresChoosing a Measure

Identifying DistributionsHistogramsKernel Density EstimationQuantile-Quantile Plots

Statistics of SamplesMeaning of a SampleGuessing the True Value

2 / 49

Overview

Introduction

Indices of DispersionRangeVariance, Standard Deviation, C.V.QuantilesMiscellaneous MeasuresChoosing a Measure

Identifying DistributionsHistogramsKernel Density EstimationQuantile-Quantile Plots

Statistics of SamplesMeaning of a SampleGuessing the True Value

2015

-06-

15

CS147

Overview

Page 3: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Introduction

Summarizing Variability

I A single number rarely tells entire story of a data setI Usually, you need to know how much the rest of the data set

varies from that index of central tendency

3 / 49

Summarizing Variability

I A single number rarely tells entire story of a data setI Usually, you need to know how much the rest of the data set

varies from that index of central tendency

2015

-06-

15

CS147Introduction

Summarizing Variability

Page 4: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Introduction

Why Is Variability Important?

I Consider two Web servers:I Server A services all requests in 1 secondI Server B services 90% of all requests in .5 seconds

I But 10% in 55 secondsI Both have mean service times of 1 secondI But which would you prefer to use?

4 / 49

Why Is Variability Important?

I Consider two Web servers:I Server A services all requests in 1 secondI Server B services 90% of all requests in .5 seconds

I But 10% in 55 secondsI Both have mean service times of 1 secondI But which would you prefer to use?

2015

-06-

15

CS147Introduction

Why Is Variability Important?

Page 5: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Introduction

Indices of Dispersion

I Measures of how much a data set variesI RangeI Variance and standard deviationI PercentilesI Semi-interquartile rangeI Mean absolute deviation

5 / 49

Indices of Dispersion

I Measures of how much a data set variesI RangeI Variance and standard deviationI PercentilesI Semi-interquartile rangeI Mean absolute deviation

2015

-06-

15

CS147Introduction

Indices of Dispersion

Page 6: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Range

Range

I Minimum & maximum values in data setI Can be tracked as data values arriveI Variability characterized by difference between minimum and

maximumI Often not useful, due to outliersI Minimum tends to go to zeroI Maximum tends to increase over timeI Not useful for unbounded variables

6 / 49

Range

I Minimum & maximum values in data setI Can be tracked as data values arriveI Variability characterized by difference between minimum and

maximumI Often not useful, due to outliersI Minimum tends to go to zeroI Maximum tends to increase over timeI Not useful for unbounded variables20

15-0

6-15

CS147Indices of Dispersion

RangeRange

Page 7: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Range

Example of Range

I For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10I Maximum is 2056I Minimum is -17I Range is 2073I While arithmetic mean is 268

7 / 49

Example of Range

I For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10I Maximum is 2056I Minimum is -17I Range is 2073I While arithmetic mean is 268

2015

-06-

15

CS147Indices of Dispersion

RangeExample of Range

Page 8: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Variance, Standard Deviation, C.V.

Variance (and Its Cousins)

I Sample variance is

s2 =1

n − 1

n∑i=1

(xi − x)2

I Expressed in units of the measured quantity, squaredI Which isn’t always easy to understand

I Standard deviation and coefficient of variation are derivedfrom variance

8 / 49

Variance (and Its Cousins)

I Sample variance is

s2 =1

n − 1

n∑i=1

(xi − x)2

I Expressed in units of the measured quantity, squaredI Which isn’t always easy to understand

I Standard deviation and coefficient of variation are derivedfrom variance20

15-0

6-15

CS147Indices of Dispersion

Variance, Standard Deviation, C.V.Variance (and Its Cousins)

Page 9: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Variance, Standard Deviation, C.V.

Variance Example

I For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10I Variance is 413746.6I You can see the problem with variance:

I Given a mean of 268, what does that variance indicate?

9 / 49

Variance Example

I For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10I Variance is 413746.6I You can see the problem with variance:

I Given a mean of 268, what does that variance indicate?

2015

-06-

15

CS147Indices of Dispersion

Variance, Standard Deviation, C.V.Variance Example

Page 10: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Variance, Standard Deviation, C.V.

Standard Deviation

I Square root of the varianceI In same units as units of metricI So easier to compare to metric

10 / 49

Standard Deviation

I Square root of the varianceI In same units as units of metricI So easier to compare to metric

2015

-06-

15

CS147Indices of Dispersion

Variance, Standard Deviation, C.V.Standard Deviation

Page 11: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Variance, Standard Deviation, C.V.

Standard Deviation Example

I For sample set we’ve been using, standard deviation is 643I Given mean of 268, standard deviation clearly shows lots of

variability from mean

11 / 49

Standard Deviation Example

I For sample set we’ve been using, standard deviation is 643I Given mean of 268, standard deviation clearly shows lots of

variability from mean

2015

-06-

15

CS147Indices of Dispersion

Variance, Standard Deviation, C.V.Standard Deviation Example

Page 12: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Variance, Standard Deviation, C.V.

Coefficient of Variation

I Ratio of standard deviation to meanI Normalizes units of these quantities into ratio or percentageI Often abbreviated C.O.V. or C.V.

12 / 49

Coefficient of Variation

I Ratio of standard deviation to meanI Normalizes units of these quantities into ratio or percentageI Often abbreviated C.O.V. or C.V.

2015

-06-

15

CS147Indices of Dispersion

Variance, Standard Deviation, C.V.Coefficient of Variation

Page 13: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Variance, Standard Deviation, C.V.

Coefficient of Variation Example

I For sample set we’ve been using, standard deviation is 643I Mean is 268I So C.O.V. is 643/268 ≈ 2.4

13 / 49

Coefficient of Variation Example

I For sample set we’ve been using, standard deviation is 643I Mean is 268I So C.O.V. is 643/268 ≈ 2.4

2015

-06-

15

CS147Indices of Dispersion

Variance, Standard Deviation, C.V.Coefficient of Variation Example

Page 14: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Quantiles

Percentiles

I Specification of how observations fall into bucketsI E.g., 5-percentile is observation that is at the lower 5% of the

setI While 95-percentile is observation at the 95% boundary

I Useful even for unbounded variables

14 / 49

Percentiles

I Specification of how observations fall into bucketsI E.g., 5-percentile is observation that is at the lower 5% of the

setI While 95-percentile is observation at the 95% boundary

I Useful even for unbounded variables

2015

-06-

15

CS147Indices of Dispersion

QuantilesPercentiles

Page 15: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Quantiles

Relatives of Percentiles

I Quantiles - fraction between 0 and 1I Instead of percentageI Also called fractiles

I Deciles—percentiles at 10% boundariesI First is 10-percentile, second is 20-percentile, etc.

I Quartiles—divide data set into four partsI 25% of sample below first quartile, etc.I Second quartile is also median

15 / 49

Relatives of Percentiles

I Quantiles - fraction between 0 and 1I Instead of percentageI Also called fractiles

I Deciles—percentiles at 10% boundariesI First is 10-percentile, second is 20-percentile, etc.

I Quartiles—divide data set into four partsI 25% of sample below first quartile, etc.I Second quartile is also median

2015

-06-

15

CS147Indices of Dispersion

QuantilesRelatives of Percentiles

Page 16: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Quantiles

Calculating Quantiles

To estimate α-quantile:I First sort the setI Then take [(n − 1)α+ 1]th element

I 1-indexedI Round to nearest integer indexI Exception: for small sets, may be better to choose

“intermediate” value as is done for median

16 / 49

Calculating Quantiles

To estimate α-quantile:I First sort the setI Then take [(n − 1)α+ 1]th element

I 1-indexedI Round to nearest integer indexI Exception: for small sets, may be better to choose

“intermediate” value as is done for median

2015

-06-

15

CS147Indices of Dispersion

QuantilesCalculating Quantiles

Page 17: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Quantiles

Quartile Example

I For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10(10 observations)

I Sort it: -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I First quartile, Q1, is -4.8I Third quartile, Q3, is 92

17 / 49

Quartile Example

I For data set 2, 5.4, -17, 2056, 445, -4.8, 84.3, 92, 27, -10(10 observations)

I Sort it: -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I First quartile, Q1, is -4.8I Third quartile, Q3, is 92

2015

-06-

15

CS147Indices of Dispersion

QuantilesQuartile Example

Page 18: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Quantiles

Interquartile Range

I Yet another measure of dispersionI The difference between Q3 and Q1I Semi-interquartile range is half that:

SIQR =Q3 −Q1

2

I Often interesting measure of what’s going on in middle ofrange

I Basically indicates distance of quartiles from median

18 / 49

Interquartile Range

I Yet another measure of dispersionI The difference between Q3 and Q1I Semi-interquartile range is half that:

SIQR =Q3 −Q1

2

I Often interesting measure of what’s going on in middle ofrange

I Basically indicates distance of quartiles from median2015

-06-

15

CS147Indices of Dispersion

QuantilesInterquartile Range

Page 19: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Quantiles

Semi-Interquartile Range Example

For data set -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I Q3 is 92I Q1 is -4.8

SIQR =Q3 −Q1

2=

92− (−4.8)2

= 48

I Compare to standard deviation of 643I Suggests that much of variability is caused by outliers

19 / 49

Semi-Interquartile Range Example

For data set -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I Q3 is 92I Q1 is -4.8

SIQR =Q3 −Q1

2=

92− (−4.8)2

= 48

I Compare to standard deviation of 643I Suggests that much of variability is caused by outliers

2015

-06-

15

CS147Indices of Dispersion

QuantilesSemi-Interquartile Range Example

Page 20: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Miscellaneous Measures

Mean Absolute Deviation

I Yet another measure of variability

I Mean absolute deviation =1n

n∑i=1

|xi − x |

I Good for hand calculation (doesn’t require multiplication orsquare roots)

20 / 49

Mean Absolute Deviation

I Yet another measure of variability

I Mean absolute deviation =1n

n∑i=1

|xi − x |

I Good for hand calculation (doesn’t require multiplication orsquare roots)

2015

-06-

15

CS147Indices of Dispersion

Miscellaneous MeasuresMean Absolute Deviation

Page 21: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Miscellaneous Measures

Mean Absolute Deviation Example

For data set -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I Mean absolute deviation is

110

10∑i=1

|xi − 268| = 393

21 / 49

Mean Absolute Deviation Example

For data set -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I Mean absolute deviation is

110

10∑i=1

|xi − 268| = 393

2015

-06-

15

CS147Indices of Dispersion

Miscellaneous MeasuresMean Absolute Deviation Example

Page 22: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Choosing a Measure

Sensitivity To Outliers

I From most to least,I RangeI VarianceI Mean absolute deviationI Semi-interquartile range

22 / 49

Sensitivity To Outliers

I From most to least,I RangeI VarianceI Mean absolute deviationI Semi-interquartile range

2015

-06-

15

CS147Indices of Dispersion

Choosing a MeasureSensitivity To Outliers

Page 23: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Indices of Dispersion Choosing a Measure

So, Which Index of Dispersion Should I Use?

Bounded?Unimodal

Symmetrical?Percentiles or SIQR

Range C.O.V.

Yes Yes

No No

But always remember what you’re looking for

23 / 49

So, Which Index of Dispersion Should I Use?

Bounded?Unimodal

Symmetrical?Percentiles or SIQR

Range C.O.V.

Yes Yes

No No

But always remember what you’re looking for2015

-06-

15

CS147Indices of Dispersion

Choosing a MeasureSo, Which Index of Dispersion Should I Use?

Page 24: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions

Finding a Distribution for Datasets

I If a data set has a common distribution, that’s the best way tosummarize it

I Saying a data set is uniformly distributed is more informativethan just giving mean and standard deviation

I So how do you determine if your data set fits a distribution?

24 / 49

Finding a Distribution for Datasets

I If a data set has a common distribution, that’s the best way tosummarize it

I Saying a data set is uniformly distributed is more informativethan just giving mean and standard deviation

I So how do you determine if your data set fits a distribution?

2015

-06-

15

CS147Identifying Distributions

Finding a Distribution for Datasets

Page 25: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions

Methods of Determining a Distribution

I Plot a histogramI Kernel density estimationI Quantile-quantile plotI Statistical methods (not covered in this class)

25 / 49

Methods of Determining a Distribution

I Plot a histogramI Kernel density estimationI Quantile-quantile plotI Statistical methods (not covered in this class)

2015

-06-

15

CS147Identifying Distributions

Methods of Determining a Distribution

Page 26: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Histograms

Plotting a Histogram

Suitable if you have relatively large number of data points

Procedure:1. Determine range of observations2. Divide range into buckets3. Count number of observations in each bucket4. Divide by total number of observations and plot as column

chart

26 / 49

Plotting a Histogram

Suitable if you have relatively large number of data points

Procedure:1. Determine range of observations2. Divide range into buckets3. Count number of observations in each bucket4. Divide by total number of observations and plot as column

chart

2015

-06-

15

CS147Identifying Distributions

HistogramsPlotting a Histogram

Page 27: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Histograms

Problems With Histogram Approach

I Determining cell sizeI If too small, too few observations per cellI If too large, no useful details in plot

I If fewer than five observations in a cell, cell size is too small

27 / 49

Problems With Histogram Approach

I Determining cell sizeI If too small, too few observations per cellI If too large, no useful details in plot

I If fewer than five observations in a cell, cell size is too small

2015

-06-

15

CS147Identifying Distributions

HistogramsProblems With Histogram Approach

Page 28: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

Kernel Density Estimation

I Basic idea: any observation represents probability of highnear near that observation

I Example:I Seeing 7 means pdf is high all around 7I Seeing 6.5 also means pdf is high near 7

I “Average out” observations to get smooth histogram

28 / 49

Kernel Density Estimation

I Basic idea: any observation represents probability of highnear near that observation

I Example:I Seeing 7 means pdf is high all around 7I Seeing 6.5 also means pdf is high near 7

I “Average out” observations to get smooth histogram

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKernel Density Estimation

Page 29: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Equations

I Want to estimate continuous p(x):

p̂(x) =1

nh

n∑i=1

K(

x − xi

h

)I Where K (x) is kernel functionI Must integrate to unity:

∫∞−∞ K (x)dx = 1

I Purpose is to select nearby samplesI h is bandwidth parameterI Controls how many nearby samples selectedI Large bandwidth⇒ more smoothing, less detail

29 / 49

KDE Equations

I Want to estimate continuous p(x):

p̂(x) =1

nh

n∑i=1

K(

x − xi

h

)I Where K (x) is kernel functionI Must integrate to unity:

∫∞−∞ K (x)dx = 1

I Purpose is to select nearby samplesI h is bandwidth parameterI Controls how many nearby samples selectedI Large bandwidth⇒ more smoothing, less detail20

15-0

6-15

CS147Identifying Distributions

Kernel Density EstimationKDE Equations

Page 30: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 31: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 32: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 33: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 34: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 35: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 36: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 37: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 38: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 39: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 40: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 41: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 42: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 43: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 44: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 45: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 46: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 47: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 48: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 49: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 50: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

30 / 49

KDE Intuition (Rectangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Rectangular)

Page 51: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 52: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 53: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 54: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 55: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 56: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 57: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 58: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 59: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 60: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 61: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 62: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 63: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 64: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 65: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 66: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 67: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 68: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 69: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 70: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 71: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

31 / 49

KDE Intuition (Triangular)

0 5 10 15 20

0

5

10

15

20

25

0 5 10 15 20

0

5

10

15

20

25

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Intuition (Triangular)

Page 72: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Example

I Sample data set: -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I One observation per sampleI KDE with Gaussian window (RHS dropped):

-50 0 50 100 150

0.00

0.01

0.02

0.03

p(x)

32 / 49

KDE Example

I Sample data set: -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I One observation per sampleI KDE with Gaussian window (RHS dropped):

-50 0 50 100 150

0.00

0.01

0.02

0.03

p(x)

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Example

Page 73: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Kernel Density Estimation

KDE Example #2

I Same data setI Narrower Gaussian windowI (Again, RHS dropped):

-50 0 50 100 150

0.00

0.01

0.02

0.03

p(x)

33 / 49

KDE Example #2

I Same data setI Narrower Gaussian windowI (Again, RHS dropped):

-50 0 50 100 150

0.00

0.01

0.02

0.03

p(x)

2015

-06-

15

CS147Identifying Distributions

Kernel Density EstimationKDE Example #2

Page 74: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Quantile-Quantile Plots

I More suitable than KDE for small data setsI Basically, guess a distributionI Plot where quantiles of data should fall in that distribution

I Against where they actually fallI If plot is close to linear, data closely matches guessed

distribution

34 / 49

Quantile-Quantile Plots

I More suitable than KDE for small data setsI Basically, guess a distributionI Plot where quantiles of data should fall in that distribution

I Against where they actually fallI If plot is close to linear, data closely matches guessed

distribution

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsQuantile-Quantile Plots

Page 75: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Obtaining Theoretical Quantiles

I Need to determine where quantiles should fall for a particulardistribution

I Requires inverting CDF for that distributionI Then determining quantiles for observed pointsI Then plugging quantiles into inverted CDF

35 / 49

Obtaining Theoretical Quantiles

I Need to determine where quantiles should fall for a particulardistribution

I Requires inverting CDF for that distributionI Then determining quantiles for observed pointsI Then plugging quantiles into inverted CDF

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsObtaining Theoretical Quantiles

Page 76: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Inverting a Distribution

I Many common distributions have already been inverted (howconvenient...)

I For others that are hard to invert, tables and approximationsoften available (nearly as convenient)

36 / 49

Inverting a Distribution

I Many common distributions have already been inverted (howconvenient...)

I For others that are hard to invert, tables and approximationsoften available (nearly as convenient)

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsInverting a Distribution

Page 77: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Is Our Sample Data Set Normally Distributed?

I Our data set was -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I Does this match normal distribution?I Normal distribution doesn’t invert nicely

I But there is an approximation:

xi = 4.91(

q 0.14i − (1− qi)

0.14)

I Or invert numerically

37 / 49

Is Our Sample Data Set Normally Distributed?

I Our data set was -17, -10, -4.8, 2, 5.4, 27, 84.3, 92, 445, 2056I Does this match normal distribution?I Normal distribution doesn’t invert nicely

I But there is an approximation:

xi = 4.91(

q 0.14i − (1− qi)

0.14)

I Or invert numerically

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsIs Our Sample Data Set NormallyDistributed?

Page 78: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Data For Example Normal Quantile-Quantile Plot

i qi yi xi1 0.05 -17.0 -1.646842 0.15 -10.0 -1.034813 0.25 -4.8 -0.672344 0.35 2.0 -0.383755 0.45 5.4 -0.125106 0.55 27.0 0.125107 0.65 84.3 0.383758 0.75 92.0 0.672349 0.85 445.0 1.03481

10 0.95 2056.0 1.64684

38 / 49

Data For Example Normal Quantile-Quantile Plot

i qi yi xi1 0.05 -17.0 -1.646842 0.15 -10.0 -1.034813 0.25 -4.8 -0.672344 0.35 2.0 -0.383755 0.45 5.4 -0.125106 0.55 27.0 0.125107 0.65 84.3 0.383758 0.75 92.0 0.672349 0.85 445.0 1.03481

10 0.95 2056.0 1.646842015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsData For Example Normal Quantile-QuantilePlot

Page 79: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Example Normal Quantile-Quantile Plot

-1 0 1

-500

0

500

1000

1500

2000

2500

39 / 49

Example Normal Quantile-Quantile Plot

-1 0 1

-500

0

500

1000

1500

2000

2500

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsExample Normal Quantile-Quantile Plot

Page 80: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Analysis

I Definitely not normalI Because it isn’t linearI Tail at high end is too long for normal

I But perhaps the lower part of graph is normal?

40 / 49

Analysis

I Definitely not normalI Because it isn’t linearI Tail at high end is too long for normal

I But perhaps the lower part of graph is normal?

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsAnalysis

Page 81: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Quantile-Quantile Plot of Partial Data

-1 0

0

50

100

41 / 49

Quantile-Quantile Plot of Partial Data

-1 0

0

50

100

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsQuantile-Quantile Plot of Partial Data

Page 82: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Analysis of Partial Data Plot

I Again, at highest points it doesn’t fit normal distributionI But at lower points it fits somewhat wellI So, again, this distribution looks like normal with longer tail to

right

I (Really need more data points)I You can keep this up for a good, long time

42 / 49

Analysis of Partial Data Plot

I Again, at highest points it doesn’t fit normal distributionI But at lower points it fits somewhat wellI So, again, this distribution looks like normal with longer tail to

right

I (Really need more data points)I You can keep this up for a good, long time

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsAnalysis of Partial Data Plot

Page 83: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Analysis of Partial Data Plot

I Again, at highest points it doesn’t fit normal distributionI But at lower points it fits somewhat wellI So, again, this distribution looks like normal with longer tail to

rightI (Really need more data points)

I You can keep this up for a good, long time

42 / 49

Analysis of Partial Data Plot

I Again, at highest points it doesn’t fit normal distributionI But at lower points it fits somewhat wellI So, again, this distribution looks like normal with longer tail to

rightI (Really need more data points)

I You can keep this up for a good, long time

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsAnalysis of Partial Data Plot

Page 84: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Analysis of Partial Data Plot

I Again, at highest points it doesn’t fit normal distributionI But at lower points it fits somewhat wellI So, again, this distribution looks like normal with longer tail to

rightI (Really need more data points)I You can keep this up for a good, long time

42 / 49

Analysis of Partial Data Plot

I Again, at highest points it doesn’t fit normal distributionI But at lower points it fits somewhat wellI So, again, this distribution looks like normal with longer tail to

rightI (Really need more data points)I You can keep this up for a good, long time

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsAnalysis of Partial Data Plot

Page 85: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Identifying Distributions Quantile-Quantile Plots

Interpreting Quantile-Quantile Plots

Mnemonic: Q-Q plot shaped like “S” has Short tails;opposite has long ones.

43 / 49

Interpreting Quantile-Quantile Plots

Mnemonic: Q-Q plot shaped like “S” has Short tails;opposite has long ones.

2015

-06-

15

CS147Identifying Distributions

Quantile-Quantile PlotsInterpreting Quantile-Quantile Plots

Page 86: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Meaning of a Sample

What is a Sample?

I How tall is a human?I Could measure every person in the worldI Or could measure everyone in this room

I Population has parametersI Real and meaningful

I Sample has statisticsI Drawn from populationI Inherently erroneous

44 / 49

What is a Sample?

I How tall is a human?I Could measure every person in the worldI Or could measure everyone in this room

I Population has parametersI Real and meaningful

I Sample has statisticsI Drawn from populationI Inherently erroneous

2015

-06-

15

CS147Statistics of Samples

Meaning of a SampleWhat is a Sample?

Page 87: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Meaning of a Sample

Sample Statistics

I How tall is a human?I People in B126 have a mean heightI People in Edwards have a different mean

I Sample mean is itself a random variableI Has own distribution

45 / 49

Sample Statistics

I How tall is a human?I People in B126 have a mean heightI People in Edwards have a different mean

I Sample mean is itself a random variableI Has own distribution

2015

-06-

15

CS147Statistics of Samples

Meaning of a SampleSample Statistics

Page 88: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Meaning of a Sample

Estimating Population from Samples

I How tall is a human?I Measure everybody in this roomI Calculate sample mean xI Assume population mean µ equals x

I What is the error in our estimate?

46 / 49

Estimating Population from Samples

I How tall is a human?I Measure everybody in this roomI Calculate sample mean xI Assume population mean µ equals x

I What is the error in our estimate?

2015

-06-

15

CS147Statistics of Samples

Meaning of a SampleEstimating Population from Samples

Page 89: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Meaning of a Sample

Estimating Error

I Sample mean is a random variable⇒ Mean has some distribution∴ Multiple sample means have “mean of means”

I Knowing distribution of means, we can estimate error

47 / 49

Estimating Error

I Sample mean is a random variable⇒ Mean has some distribution∴ Multiple sample means have “mean of means”

I Knowing distribution of means, we can estimate error

2015

-06-

15

CS147Statistics of Samples

Meaning of a SampleEstimating Error

Page 90: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Estimating the Value of a Random Variable

I How tall is Fred?

I Suppose average human height is 170 cm∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

48 / 49

Estimating the Value of a Random Variable

I How tall is Fred?

I Suppose average human height is 170 cm∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueEstimating the Value of a Random Variable

Page 91: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

48 / 49

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueEstimating the Value of a Random Variable

Page 92: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tall

I Yeah, rightI Safer to assume a range

48 / 49

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tall

I Yeah, rightI Safer to assume a range

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueEstimating the Value of a Random Variable

Page 93: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

48 / 49

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueEstimating the Value of a Random Variable

Page 94: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

48 / 49

Estimating the Value of a Random Variable

I How tall is Fred?I Suppose average human height is 170 cm

∴ Fred is 170 cm tallI Yeah, right

I Safer to assume a range

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueEstimating the Value of a Random Variable

Page 95: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Confidence Intervals

I How tall is Fred?

I Suppose 90% of humans are between 155 and 190 cm∴ Fred is between 155 and 190 cm

I We are 90% confident that Fred is between 155 and 190 cm

49 / 49

Confidence Intervals

I How tall is Fred?

I Suppose 90% of humans are between 155 and 190 cm∴ Fred is between 155 and 190 cm

I We are 90% confident that Fred is between 155 and 190 cm

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueConfidence Intervals

Page 96: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Confidence Intervals

I How tall is Fred?I Suppose 90% of humans are between 155 and 190 cm

∴ Fred is between 155 and 190 cmI We are 90% confident that Fred is between 155 and 190 cm

49 / 49

Confidence Intervals

I How tall is Fred?I Suppose 90% of humans are between 155 and 190 cm

∴ Fred is between 155 and 190 cmI We are 90% confident that Fred is between 155 and 190 cm

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueConfidence Intervals

Page 97: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Confidence Intervals

I How tall is Fred?I Suppose 90% of humans are between 155 and 190 cm∴ Fred is between 155 and 190 cm

I We are 90% confident that Fred is between 155 and 190 cm

49 / 49

Confidence Intervals

I How tall is Fred?I Suppose 90% of humans are between 155 and 190 cm∴ Fred is between 155 and 190 cm

I We are 90% confident that Fred is between 155 and 190 cm

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueConfidence Intervals

Page 98: CS 147: Computer Systems Performance Analysisgeoff/classes/hmc.cs147.201201/...INormalizes units of these quantities into ratio or percentage IOften abbreviated C.O.V. or C.V. 2015-06-15

Statistics of Samples Guessing the True Value

Confidence Intervals

I How tall is Fred?I Suppose 90% of humans are between 155 and 190 cm∴ Fred is between 155 and 190 cm

I We are 90% confident that Fred is between 155 and 190 cm

49 / 49

Confidence Intervals

I How tall is Fred?I Suppose 90% of humans are between 155 and 190 cm∴ Fred is between 155 and 190 cm

I We are 90% confident that Fred is between 155 and 190 cm

2015

-06-

15

CS147Statistics of Samples

Guessing the True ValueConfidence Intervals