Model Fitting
description
Transcript of Model Fitting
![Page 1: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/1.jpg)
Model Fitting
Jean-Yves Le Boudec
1
![Page 2: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/2.jpg)
Contents
1. What is model fitting ?2. Linear Regression
3. Linear regression with norm minimization4. Choosing a distribution
5. Heavy Tail
2
![Page 3: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/3.jpg)
Virus Infection DataWe would like to capture the growth of infected hosts
(explanatory model)
An exponential model seems appropriate
How can we fit the model, in particular, what is the value of ?
3
![Page 4: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/4.jpg)
Least Square Fit of Virus Infection Data
4
Least square fit
= 0.5173
Mean doubling time 1.34 hours
Prediction at +6 hours: 100 000 hosts
![Page 5: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/5.jpg)
Least Square Fit of Virus Infection Data In Log Scale
5
Least square fit
= 0.39
Mean doubling time 1.77 hours
Prediction at +6 hours: 39 000 hosts
![Page 6: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/6.jpg)
Compare the Two
6
LS fit in natural scale
LS fit in log scale
![Page 7: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/7.jpg)
Which Fitting Method should I use ?Which optimization criterion should I use ?
The answer is in a statistical model.Model not only the interesting part, but also the noise
For example
7
= 0.5173
![Page 8: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/8.jpg)
How can I tell which is correct ?
8
= 0.39
![Page 9: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/9.jpg)
Look at Residuals= validate model
9
![Page 10: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/10.jpg)
10
![Page 11: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/11.jpg)
Least Square Fit = Gaussian iid NoiseAssume model (homoscedasticity)
The theorem says: minimize least squares = compute MLE for this model
This is how we computed the estimates for the virus example
11
![Page 12: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/12.jpg)
Least Square and Projection
Skrivañ war an daol petra zo: data point, predicted response and estimated parameter for virus example
12
Data point
Predicted response
Estimated parameter
ManifoldWhere the data point would lie if there would be no noise
![Page 13: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/13.jpg)
Confidence Intervals
13
![Page 14: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/14.jpg)
14
![Page 15: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/15.jpg)
Robustness to « Outliers »
15
![Page 16: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/16.jpg)
A Simple Example
Least Square
Model: noise
What is m ?
Confidence interval ?
L1 Norm Minimization
Model : noise
What is m ?
Confidence interval ?
16
![Page 17: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/17.jpg)
Mean Versus Median
17
![Page 18: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/18.jpg)
2. Linear RegressionAlso called « ANOVA » (Analysis of Variance »)
= least square + linear dependence on parameter
A special case where computations are easy
18
![Page 19: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/19.jpg)
Example 4.3
What is the parameter ?Is it a linear model ?How many degrees of freedom ?What do we assume on i?
What is the matrix X ?
19
![Page 20: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/20.jpg)
20
![Page 21: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/21.jpg)
Does this model have full rank ?
21
![Page 22: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/22.jpg)
Some Terminology
xi are called explanatory variableAssumed fixed and known
yi are called response variablesThey are « the data »Assumed to be one sample output of the model 22
![Page 23: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/23.jpg)
Least Square and Projection
23
Data point
Predicted response
Estimated parameter
ManifoldWhere the data point would lie if there would be no noise
![Page 24: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/24.jpg)
Solution of the Linear Regression Model
24
![Page 25: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/25.jpg)
Least Square and ProjectionThe theorem gives H and K
25
residuals
Predicted response
Estimated parameter
ManifoldWhere the data point would lie if there would be no noise
data
![Page 26: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/26.jpg)
The Theorem Gives with Confidence Interval
26
![Page 27: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/27.jpg)
SSRConfidence Intervals use the quantity s
s2 is called « Sum of Squared Residuals »
27
residuals
Predicted response
data
![Page 28: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/28.jpg)
Validate the Assumptions with Residuals
28
![Page 29: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/29.jpg)
ResidualsResiduals are given by the theorem
29
residuals
Predicted response
data
![Page 30: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/30.jpg)
Standardized ResidualsThe residuals ei are an estimate of the noise terms i
They are not (exactly) normal iid
The variance of ei is ????
A: 1- Hi,i
Standardized residuals are not exactly normal iid either but their variance is 1
30
![Page 31: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/31.jpg)
Which of these two models could be a linear regression model ?
A: both
Linear regression does not mean that yi is a linear function of xi
Achtung: There is a hidden assumptionNoise is iid gaussian -> homoscedasticity
31
![Page 32: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/32.jpg)
32
![Page 33: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/33.jpg)
3. Linear Regression with L1 norm minimization
= L1 norm minimization + linear dependency on parameterMore robustLess traditional
33
![Page 34: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/34.jpg)
This is convex programming
34
![Page 35: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/35.jpg)
35
![Page 36: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/36.jpg)
Confidence IntervalsNo closed form
Compare to median !
Boostrap:How ?
36
![Page 37: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/37.jpg)
37
![Page 38: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/38.jpg)
4. Choosing a DistributionKnow a catalog of distributions, guess a fit
ShapeKurtosis, SkewnessPower lawsHazard Rate
Fit Verify the fit visually or with a test (see later)
38
![Page 39: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/39.jpg)
Distribution ShapeDistributions have a shape
By definition: the shape is what remains the same when we ShiftRescale
Example: normal distribution: what is the shape parameter ?
Example: exponential distribution: what is the shape parameter ?
39
![Page 40: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/40.jpg)
Standard DistributionsIn a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard.
Standard normal: N(0,1)
Standard exponential: Exp(1)
Standard Uniform: U(0,1)
40
![Page 41: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/41.jpg)
Log-Normal Distribution
41
![Page 42: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/42.jpg)
42
![Page 43: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/43.jpg)
Skewness and Curtosis
43
![Page 44: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/44.jpg)
Power Laws and Pareto Distribution
44
![Page 45: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/45.jpg)
Complementary Distribution FunctionsLog-log Scales
45
ParetoLognormal Normal
![Page 46: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/46.jpg)
Zipf’s Law
46
![Page 47: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/47.jpg)
47
![Page 48: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/48.jpg)
Hazard RateInterpretation: probability that a flow dies in next dt seconds given still alive
Used to classify distribsAging
Memoriless
Fat tail
Ex: normal ? Exponential ? Pareto ? Log Normal ? 48
![Page 49: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/49.jpg)
The Weibull DistributionStandard Weibull CDF:
Aging for c > 1Memoriless for c = 1Fat tailed for c <1
49
![Page 50: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/50.jpg)
Fitting A DistributionAssume iidUse maximum likelihoodEx: assume gaussian; what are parameters ?
Frequent issuesCensoringCombinations
50
![Page 51: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/51.jpg)
Censored DataWe want to fit a log normal distrib, but we have only data samples with values less than some max
Lognormal is fat tailed so we cannot ignore the tail
Idea: use the model
and estimate F0 and a (truncation threshold)
51
![Page 52: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/52.jpg)
52
![Page 53: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/53.jpg)
CombinationsWe want to fit a log normal distrib to the body and pareto to the tail
Model:
MLE satisfies
53
![Page 54: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/54.jpg)
54
![Page 55: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/55.jpg)
5. Heavy TailsRecall what fat tail isHeavier than fat:
55
![Page 56: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/56.jpg)
Heavy Tail means Central Limit does not hold
Central limit theorem:
a sum of n independent random variables with finite second moment tends to have a normal distribution, when n is large
explains why we can often use normal assumption
But it does not always hold. It does not hold if random variables have infinite second moment.
56
![Page 57: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/57.jpg)
Central Limit Theorem for Heavy Tails
57
One Sample of 10000 pointsPareto p = 1
normal qqplot histogram complementary d.f.log-log
![Page 58: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/58.jpg)
58
1 sample, 10000 points average of 1000 samples
p=1
p=1.5
p=2
p=2.5
p=3
![Page 59: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/59.jpg)
Convergence for heavy tailed distributions
59
![Page 60: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/60.jpg)
Importance of Second Moment
60
![Page 61: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/61.jpg)
RWP with Heavy TailStationary ?
61
![Page 62: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/62.jpg)
Evidence of Heavy Tail
62
![Page 63: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/63.jpg)
Testing Heavy TailAssume you have very large data set
Else no statement can be made
One can look at empirical cdf in log scale
63
![Page 64: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/64.jpg)
Taqqu’s methodA better method (numerically safer is as follows).
Aggregate data multiple times
64
![Page 65: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/65.jpg)
We should have
and
If ≈ log ( m2 / m1) then measure p = / pest = average of all p’s
65
![Page 66: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/66.jpg)
66
Example
log ( 2) / plog ( 2)
![Page 67: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/67.jpg)
Evidence of Heavy Tail
67
p = 1.08 ± 0.1
![Page 68: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/68.jpg)
A Load Generator: SurgeDesigned to create load for a web serverUsed in next labSophisticated load modelIt is an example of a benchmark, there are many others – see lecture
68
![Page 69: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/69.jpg)
User Equivalent ModelIdea: find a stochastice model that represents user wellUser modelled as sequence of downloads, followed by “think time”
Tool can implement several “user equivalents”
Used to generate real work over TCP connections
69
![Page 70: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/70.jpg)
Characterization of UE
70
Weibull dsitributions
![Page 71: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/71.jpg)
Successive file requests are not independent
Q: What would be the distribution if they were independent ?A: geometric
71
![Page 72: Model Fitting](https://reader036.fdocuments.us/reader036/viewer/2022062521/56816719550346895ddb8e1c/html5/thumbnails/72.jpg)
Fitting the distributions
Done by Surge authors with aest tool + ad-hoc (least quare fit of histogram)What other method could one use ?A: maximum likelihood with numerical optimization – issue is non iid-ness
72