Model Fitting

Model Fitting

Jean-Yves Le Boudec

1

Contents

1. What is model fitting ?2. Linear Regression

3. Linear regression with norm minimization4. Choosing a distribution

5. Heavy Tail

2

Virus Infection DataWe would like to capture the growth of infected hosts

(explanatory model)

An exponential model seems appropriate

How can we fit the model, in particular, what is the value of ?

3

Least Square Fit of Virus Infection Data

4

Least square fit

= 0.5173

Mean doubling time 1.34 hours

Prediction at +6 hours: 100 000 hosts

Least Square Fit of Virus Infection Data In Log Scale

5

Least square fit

= 0.39

Mean doubling time 1.77 hours

Prediction at +6 hours: 39 000 hosts

Compare the Two

6

LS fit in natural scale

LS fit in log scale

Which Fitting Method should I use ?Which optimization criterion should I use ?

The answer is in a statistical model.Model not only the interesting part, but also the noise

For example

7

= 0.5173

How can I tell which is correct ?

8

= 0.39

Look at Residuals= validate model

9

Least Square Fit = Gaussian iid NoiseAssume model (homoscedasticity)

The theorem says: minimize least squares = compute MLE for this model

This is how we computed the estimates for the virus example

11

Least Square and Projection

Skrivañ war an daol petra zo: data point, predicted response and estimated parameter for virus example

12

Data point

Predicted response

Estimated parameter

ManifoldWhere the data point would lie if there would be no noise

Confidence Intervals

13

Robustness to « Outliers »

15

A Simple Example

Least Square

Model: noise

What is m ?

Confidence interval ?

L1 Norm Minimization

Model : noise

What is m ?

Confidence interval ?

16

Mean Versus Median

17

2. Linear RegressionAlso called « ANOVA » (Analysis of Variance »)

= least square + linear dependence on parameter

A special case where computations are easy

18

Example 4.3

What is the parameter ?Is it a linear model ?How many degrees of freedom ?What do we assume on i?

What is the matrix X ?

19

Does this model have full rank ?

21

Some Terminology

xi are called explanatory variableAssumed fixed and known

yi are called response variablesThey are « the data »Assumed to be one sample output of the model 22

Least Square and Projection

23

Data point

Predicted response

Estimated parameter


Solution of the Linear Regression Model

24

Least Square and ProjectionThe theorem gives H and K

25

residuals

Predicted response

Estimated parameter


data

The Theorem Gives with Confidence Interval

26

SSRConfidence Intervals use the quantity s

s2 is called « Sum of Squared Residuals »

27

residuals

Predicted response

data

Validate the Assumptions with Residuals

28

ResidualsResiduals are given by the theorem

29

residuals

Predicted response

data

Standardized ResidualsThe residuals ei are an estimate of the noise terms i

They are not (exactly) normal iid

The variance of ei is ????

A: 1- Hi,i

Standardized residuals are not exactly normal iid either but their variance is 1

30

Which of these two models could be a linear regression model ?

A: both

Linear regression does not mean that yi is a linear function of xi

Achtung: There is a hidden assumptionNoise is iid gaussian -> homoscedasticity

31

3. Linear Regression with L1 norm minimization

= L1 norm minimization + linear dependency on parameterMore robustLess traditional

33

This is convex programming

34

Confidence IntervalsNo closed form

Compare to median !

Boostrap:How ?

36

4. Choosing a DistributionKnow a catalog of distributions, guess a fit

ShapeKurtosis, SkewnessPower lawsHazard Rate

Fit Verify the fit visually or with a test (see later)

38

Distribution ShapeDistributions have a shape

By definition: the shape is what remains the same when we ShiftRescale

Example: normal distribution: what is the shape parameter ?

Example: exponential distribution: what is the shape parameter ?

39

Standard DistributionsIn a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard.

Standard normal: N(0,1)

Standard exponential: Exp(1)

Standard Uniform: U(0,1)

40

Log-Normal Distribution

41

Skewness and Curtosis

43

Power Laws and Pareto Distribution

44

Complementary Distribution FunctionsLog-log Scales

45

ParetoLognormal Normal

Zipf’s Law

46

Hazard RateInterpretation: probability that a flow dies in next dt seconds given still alive

Used to classify distribsAging

Memoriless

Fat tail

Ex: normal ? Exponential ? Pareto ? Log Normal ? 48

The Weibull DistributionStandard Weibull CDF:

Aging for c > 1Memoriless for c = 1Fat tailed for c <1

49

Fitting A DistributionAssume iidUse maximum likelihoodEx: assume gaussian; what are parameters ?

Frequent issuesCensoringCombinations

50

Censored DataWe want to fit a log normal distrib, but we have only data samples with values less than some max

Lognormal is fat tailed so we cannot ignore the tail

Idea: use the model

and estimate F0 and a (truncation threshold)

51

CombinationsWe want to fit a log normal distrib to the body and pareto to the tail

Model:

MLE satisfies

53

5. Heavy TailsRecall what fat tail isHeavier than fat:

55

Heavy Tail means Central Limit does not hold

Central limit theorem:

a sum of n independent random variables with finite second moment tends to have a normal distribution, when n is large

explains why we can often use normal assumption

But it does not always hold. It does not hold if random variables have infinite second moment.

56

Central Limit Theorem for Heavy Tails

57

One Sample of 10000 pointsPareto p = 1

normal qqplot histogram complementary d.f.log-log

58

1 sample, 10000 points average of 1000 samples

p=1

p=1.5

p=2

p=2.5

p=3

Convergence for heavy tailed distributions

59

Importance of Second Moment

60

RWP with Heavy TailStationary ?

61

Evidence of Heavy Tail

62

Testing Heavy TailAssume you have very large data set

Else no statement can be made

One can look at empirical cdf in log scale

63

Taqqu’s methodA better method (numerically safer is as follows).

Aggregate data multiple times

64

We should have

and

If ≈ log ( m2 / m1) then measure p = / pest = average of all p’s

65

66

Example

log ( 2) / plog ( 2)

Evidence of Heavy Tail

67

p = 1.08 ± 0.1

A Load Generator: SurgeDesigned to create load for a web serverUsed in next labSophisticated load modelIt is an example of a benchmark, there are many others – see lecture

68

User Equivalent ModelIdea: find a stochastice model that represents user wellUser modelled as sequence of downloads, followed by “think time”

Tool can implement several “user equivalents”

Used to generate real work over TCP connections

69

Characterization of UE

70

Weibull dsitributions

Successive file requests are not independent

Q: What would be the distribution if they were independent ?A: geometric

71

Fitting the distributions

Done by Surge authors with aest tool + ad-hoc (least quare fit of histogram)What other method could one use ?A: maximum likelihood with numerical optimization – issue is non iid-ness

72

Model Fitting

Documents

Transcript of Model Fitting