Applied Statistics III

Applied Statistics Vincent JEANNIN – ESGF 4IFM

Q1 2012


vinzjeannin@hotmail.com · ESGF 4IFM · Q1 2012


Summary of the session (est. 4.5h)
• Reminders of last session
• Multiple regression
• Introduction to econometrics
• Estimations
• Games: beat the statistics


Reminders of last session


3 methods
• Historical
• Parametric
• Monte Carlo


Options: what to look at to calculate the VaR?

4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time

4 answers:
• Delta/Gamma approximation, knowing the distribution of the underlying
• Rho approximation, knowing the distribution of the interest rate
• Vega approximation, knowing the distribution of implied volatility
• Theta (time decay)

Yes, but… do the underlying price, rate and volatility vary independently?

Might be a bit more complicated than expected…
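The delta/gamma answer can be sketched in R. This is a minimal illustration, not the course's method. It assumes a normally distributed daily return for the underlying and uses made-up values for the option's delta and gamma (`delta`, `gamma_opt`). The P&L is approximated as Δ·dS + ½·Γ·dS², and the 95% VaR is read from the simulated 5% quantile.

```r
# Delta-gamma approximation of an option's P&L, turned into a 95% VaR by
# simulating the underlying. All inputs are hypothetical.
set.seed(13)
S         <- 100     # underlying price
delta     <- 0.6     # option delta (hypothetical)
gamma_opt <- 0.03    # option gamma (hypothetical)
sigma     <- 0.02    # daily SD of the underlying's return, assumed normal

dS <- S * rnorm(1e5, mean = 0, sd = sigma)               # simulated price changes
pnl_delta <- delta * dS                                  # delta only
pnl_dg    <- delta * dS + 0.5 * gamma_opt * dS^2         # delta + gamma

# VaR = loss at the 5% quantile of the P&L
var_delta <- -quantile(pnl_delta, 0.05, names = FALSE)
var_dg    <- -quantile(pnl_dg, 0.05, names = FALSE)
c(delta_only = var_delta, delta_gamma = var_dg)
```

Only the underlying-price factor is covered here. Rho, vega and theta terms would be added in the same way, and that is exactly where the question of joint, non-independent moves comes back.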


Portfolio scale: what to look at to calculate the VaR?

Big question: is the VaR additive?

NO! Keywords for the future: covariance, correlation, diversification


$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$

Parametric VaR on 2 assets?

$P(X \le \mu - 1.645\,\sigma) = 0.05$

$P(X \le \mu - 2.326\,\sigma) = 0.01$

Asset 1: mean 0, SD 2.34%, weight 50%

Asset 2: mean 0, SD 1.50%, weight 50%

Correlation: 0.59

What is the VaR (95%)?

2.83%
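A minimal R sketch of the two-asset calculation, using the figures from the exercise. Variable names are illustrative. It builds the covariance matrix, computes the portfolio SD, applies the 1.645 quantile, and compares the result with the sum of the stand-alone VaRs. That comparison shows that the VaR is not additive.

```r
# Parametric (variance-covariance) VaR for a two-asset portfolio
w   <- c(0.5, 0.5)          # weights
sds <- c(0.0234, 0.0150)    # standard deviations of the two assets
rho <- 0.59                 # correlation
mu  <- 0                    # portfolio mean

# Covariance matrix, then portfolio SD = sqrt(w' Sigma w)
Sigma   <- diag(sds) %*% matrix(c(1, rho, rho, 1), nrow = 2) %*% diag(sds)
sigma_p <- sqrt(drop(t(w) %*% Sigma %*% w))

z95   <- qnorm(0.95)                 # 1.645
var95 <- z95 * sigma_p - mu          # loss not exceeded with 95% probability
var_sum <- sum(z95 * w * sds)        # sum of the stand-alone VaRs

round(100 * c(portfolio = var95, sum_of_assets = var_sum), 2)   # 2.83 and 3.16
```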


Linear regression model

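Minimize the sum of the squared vertical distances between the observations and the linear approximation

$y = f(x) = ax + b$

Residual $\varepsilon$

OLS: Ordinary Least Squares

Minimising the squared residuals

$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$

$a = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} \qquad b = \bar{y} - a\bar{x}$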
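The closed-form OLS formulas can be checked against R's `lm()`. This sketch uses hypothetical simulated data. `cov()` and `var()` both divide by n − 1, so their ratio equals Cov(x, y)/σx² exactly.

```r
# Closed-form OLS for y = a x + b, checked against lm()
set.seed(1)
x <- runif(50, 0, 10)                # hypothetical explanatory variable
y <- 3 + 2 * x + rnorm(50)           # hypothetical data: b = 3, a = 2, plus noise

a <- cov(x, y) / var(x)              # slope: Cov(x, y) / sigma_x^2
b <- mean(y) - a * mean(x)           # intercept: y-bar minus a * x-bar

fit <- lm(y ~ x)
c(a = a, b = b)
coef(fit)                            # (Intercept) = b, x = a
```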


$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$, value between -1 and 1

$R^2 = \frac{\text{Regression dispersion}}{\text{Total dispersion}}$
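Here r and R² are computed in R on hypothetical data. R² is taken as explained dispersion over total dispersion. For a simple regression with an intercept, R² equals r², and the sketch checks this against `summary(lm())$r.squared`.

```r
# Correlation coefficient and R-squared of a simple regression
set.seed(2)
x <- rnorm(100)                      # hypothetical data
y <- 1 + 0.8 * x + rnorm(100)

r   <- cov(x, y) / (sd(x) * sd(y))   # same as cor(x, y)
fit <- lm(y ~ x)

ss_reg <- sum((fitted(fit) - mean(y))^2)   # dispersion explained by the regression
ss_tot <- sum((y - mean(y))^2)             # total dispersion
R2     <- ss_reg / ss_tot

c(r = r, r_squared = r^2, R2 = R2, lm_R2 = summary(fit)$r.squared)
```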


Differentiation can happen before the OLS

What do you suggest?


Let’s create a new variable

$Y_{Diff} = \ln(Y)$

Magic!
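A sketch of the log transformation in R, assuming hypothetical exponential growth data with multiplicative noise. After the transformation, OLS recovers the growth rate as the slope, and the straight line fits far better than on the raw data.

```r
# Exponential-looking data become linear after taking logs
set.seed(3)
x <- 1:30
y <- 2 * exp(0.3 * x) * exp(rnorm(30, sd = 0.05))   # hypothetical growth data

fit_raw <- lm(y ~ x)          # straight line on the raw data
y_diff  <- log(y)             # the new variable Y_Diff = ln(Y)
fit_log <- lm(y_diff ~ x)     # straight line on the transformed data

coef(fit_log)                 # slope close to the growth rate 0.3
c(raw = summary(fit_raw)$r.squared, log = summary(fit_log)$r.squared)
```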


New idea… No intercept

Only one parameter to estimate: the slope $a$

Minimising the squared residuals

$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a x_i)^2$

When is E minimal?

When its partial derivative with respect to $a$ is 0


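$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a x_i)^2$

Quick high school reminder if necessary…

$(y_i - a x_i)^2 = y_i^2 - 2 a x_i y_i + a^2 x_i^2$

$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2 a x_i^2\right) = 0$

$\sum_{i=1}^{n} \left(x_i y_i - a x_i^2\right) = 0$

$a \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$

$a = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$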

Any better?
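The no-intercept estimator can be checked in R on hypothetical data. In an R formula, `0 +` removes the intercept.

```r
# Regression through the origin: a = sum(x * y) / sum(x^2)
set.seed(4)
x <- runif(40, 1, 10)                 # hypothetical data
y <- 1.5 * x + rnorm(40)

a_hand <- sum(x * y) / sum(x^2)       # formula derived above
fit0   <- lm(y ~ 0 + x)               # model without intercept
fit1   <- lm(y ~ x)                   # usual model with intercept, for comparison

c(by_hand = a_hand, lm = coef(fit0)[["x"]])
```

Be careful when asking "any better?". For a model without intercept, `summary()` in R computes R-squared against zero rather than against the mean. That value is usually higher and is not directly comparable with the R-squared of `fit1`. Comparing residual sums of squares, or running `anova(fit0, fit1)` on the two nested models, is more meaningful.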

Multiple regressions


$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon$

More than one explanatory variable

Choosing factors can be difficult

Much tougher without software
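This shows what the software does behind the scenes, as a sketch on hypothetical data. The coefficients solve the normal equations (X'X) b = X'y, where X carries a column of 1s for b0. The result is checked against `lm()`.

```r
# Multiple regression by hand: solve (X'X) b = X'y, checked against lm()
set.seed(5)
n  <- 200
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)     # hypothetical factors
y  <- 1 + 0.5 * X1 - 2 * X2 + 0.3 * X3 + rnorm(n)

X <- cbind(1, X1, X2, X3)                          # column of 1s for b0
b <- solve(crossprod(X), crossprod(X, y))          # crossprod(X) = X'X

fit <- lm(y ~ X1 + X2 + X3)
cbind(by_hand = drop(b), lm = coef(fit))
```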


Explanatory variables may not be independent of each other

Financial methods such as APT (Arbitrage Pricing Theory) try to use pure, independent factors

Used a lot in economics

The R-squared is very often poor
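One common way to measure how dependent the explanatory variables are is the variance inflation factor, VIF_j = 1 / (1 − R_j²). Here R_j² comes from regressing factor j on the other factors. This diagnostic is a standard addition and does not come from the slides. The sketch uses hypothetical factors with a chosen correlation of about 0.9, and `vif()` is a hypothetical helper defined in the block.

```r
# Variance inflation factor: how much correlation between factors inflates
# the variance of a coefficient estimate
set.seed(6)
n  <- 200
X1 <- rnorm(n)
X2 <- 0.9 * X1 + sqrt(1 - 0.9^2) * rnorm(n)   # hypothetical factor, correlated ~0.9 with X1
X3 <- rnorm(n)                                 # hypothetical independent factor

# Hypothetical helper: regress one factor on the others, return 1 / (1 - R^2)
vif <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

c(X1 = vif(X1, data.frame(X2, X3)),   # well above 1: X1 is largely explained by X2
  X3 = vif(X3, data.frame(X1, X2)))   # close to 1: X3 is independent
```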


Ratio investment/GDP, World Bank, developing countries

$R = 19.5 - 5.8\,\text{Corruption} + 6.3\,\text{CorruptionPrediction} + 2\,\text{School} - 1.1\,\text{GDP} - 2\,\text{Distortion}$

Let’s discuss…

• Corruption: current corruption
• CorruptionPrediction: future corruption
• School: level of education
• GDP: GDP
• Distortion: how badly policies are run
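To support the discussion, the fitted equation can be written as a small R function. The function name and all input values below are hypothetical, because the slide gives neither the variables' units nor their scales. The sketch shows the ceteris paribus effect of one extra unit of current corruption.

```r
# The fitted investment/GDP equation as a function (hypothetical inputs)
invest_ratio <- function(corruption, corruption_pred, school, gdp, distortion) {
  19.5 - 5.8 * corruption + 6.3 * corruption_pred + 2 * school -
    1.1 * gdp - 2 * distortion
}

base   <- invest_ratio(corruption = 3, corruption_pred = 3, school = 4, gdp = 2, distortion = 1)
more_c <- invest_ratio(corruption = 4, corruption_pred = 3, school = 4, gdp = 2, distortion = 1)
base            # 24.8
more_c - base   # -5.8: one extra unit of current corruption, all else equal
```

If both corruption variables rise by one unit together, the two coefficients almost cancel (−5.8 + 6.3 = +0.5).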


Opposite effect of corruption variables

Is there any logic to this?

The current level of corruption decreases investment

The future level of corruption increases investment

Investors learn how to live with corruption…


R-Squared is 0.24, very poor… How to find the right model?

• General to specific: this starts off with a comprehensive model, including all the likely explanatory variables, then simplifies it.

• Specific to general: this begins with a simple model that is easy to understand, then explanatory variables are added to improve the model’s explanatory power.


Golden rules

Be logical

Have the best R-Squared

Don’t overcomplicate

Introduction to econometrics


3 steps

Identify

Fit

Forecast

What is a model? $\text{Obs} = \text{Model} + \varepsilon$, with $\varepsilon$ being a white noise


3 components

Trend

Seasonality

Residual
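R's `stats::decompose()` splits a series into trend, seasonal and residual parts with moving averages. This sketch applies it to the built-in `AirPassengers` series. It takes logs first so that the seasonality becomes roughly additive; that choice is an assumption of the sketch, not something the slides specify.

```r
# Trend + seasonality + residual with classical decomposition
y   <- log(AirPassengers)               # built-in monthly series
dec <- decompose(y, type = "additive")

# The three components add back up to the series wherever the
# moving-average trend is defined (it is NA at both ends)
ok <- !is.na(dec$random)
max(abs(y[ok] - (dec$trend + dec$seasonal + dec$random)[ok]))
plot(dec)
```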


Stationary series are easier to forecast… Transform it!

A series is stationary if its mean and variance are stable over time

Which one is more likely to be stationary?


Properties of stationary series

The following have the same distribution:

$(Y_1, Y_2, Y_3, \dots, Y_n)$

$(Y_2, Y_3, Y_4, \dots, Y_{n+1})$

Distribution not time dependent (strict stationarity): rare occurrence

Stationarity (in the weak sense) accepted if

$E(Y_t) = \mu$, constant over time

$\mathrm{Cov}(Y_t, Y_{t-n})$ depends only on $n$
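A classic non-stationary example is the random walk Y_t = Y_{t−1} + ε_t. Its mean is constant, but Var(Y_t) = t·σ² grows with t, so the conditions above fail. Its first difference is the white noise ε_t itself. The sketch uses simulated paths; the path count and length are arbitrary.

```r
# Random walk: variance grows with t; first difference is white noise
set.seed(8)
n_paths <- 5000
n_steps <- 100
e     <- matrix(rnorm(n_paths * n_steps), nrow = n_paths)   # white noise, sigma = 1
walks <- t(apply(e, 1, cumsum))                              # one random walk per row

c(var_t10 = var(walks[, 10]), var_t100 = var(walks[, 100]))  # about 10 and 100

# Differencing recovers the noise (up to rounding)
d <- t(apply(walks, 1, diff))
max(abs(d - e[, -1]))
```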


About the residuals…

White noise!

Normality test

Get a first idea with

Skewness

Kurtosis

Proper tests: KS (Kolmogorov-Smirnov) for normality; Durbin-Watson, Portmanteau,… for autocorrelation
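Base R has no skewness or kurtosis function, so this sketch defines two small helpers using the moment formulas. The kurtosis here is not the excess version: it equals 3 for a normal distribution. It then runs `Box.test()`, a portmanteau (Ljung-Box) test for autocorrelation, on the residuals of a hypothetical regression.

```r
# Quick residual checks: sample skewness, kurtosis and a portmanteau test
skewness <- function(e) { m <- mean(e); mean((e - m)^3) / mean((e - m)^2)^1.5 }
kurtosis <- function(e) { m <- mean(e); mean((e - m)^4) / mean((e - m)^2)^2 }  # 3 for a normal

set.seed(9)
t_idx <- 1:300
y     <- 2 + 0.05 * t_idx + rnorm(300)   # hypothetical data
fit   <- lm(y ~ t_idx)
eps   <- resid(fit)

c(skewness = skewness(eps), kurtosis = kurtosis(eps))
Box.test(eps, lag = 10, type = "Ljung-Box")
```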


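eps <- resid(TReg)   # TReg: the regression fitted earlier (an lm object)

# KS test against a normal with the residuals' own SD; plain "pnorm" would mean N(0, 1).
# With the SD estimated from the data, the p-value is only approximate (Lilliefors).
ks.test(eps, "pnorm", mean = 0, sd = sd(eps))

layout(matrix(1:4, 2, 2))   # 2 x 2 grid for the four diagnostic plots

plot(TReg)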


lag.plot(DATA$Val, 9, do.lines=FALSE)

Differencing looks promising
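Here is a runnable version of the lag plot idea. `DATA$Val` from the slide is replaced by a hypothetical random walk `val`. The lag-1 correlation puts a number on what the plots show: strong dependence in the level and almost none after differencing.

```r
# Lag plots of a hypothetical random walk and of its first difference
set.seed(10)
val <- cumsum(rnorm(500))                  # stands in for DATA$Val

lag.plot(val, 9, do.lines = FALSE)         # points hug the diagonal: strong dependence
lag.plot(diff(val), 9, do.lines = FALSE)   # shapeless clouds after differencing

# Lag-1 correlation before and after differencing
lag1 <- function(z) cor(z[-1], z[-length(z)])
c(level = lag1(val), differenced = lag1(diff(val)))
```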


Check ACF/PACF for autocorrelation
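As a sketch of what to look for, this simulates an AR(1) with φ = 0.7 (an arbitrary choice). The ACF decays geometrically, with theoretical values 0.7, 0.49, 0.343, and so on. The PACF cuts off after lag 1. Note that `acf()` includes lag 0 while `pacf()` starts at lag 1.

```r
# ACF and PACF of a simulated AR(1) with phi = 0.7
set.seed(11)
x <- arima.sim(model = list(ar = 0.7), n = 1000)

a <- acf(x, lag.max = 10, plot = FALSE)
p <- pacf(x, lag.max = 10, plot = FALSE)

round(drop(a$acf)[2:4], 2)   # ACF at lags 1-3 (index 1 is lag 0)
round(drop(p$acf)[1:3], 2)   # PACF at lags 1-3: large at lag 1, then near 0
```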


Autoregressive model: AR(n)

$X_t = c + \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \dots + \varphi_n X_{t-n} + \varepsilon_t$

$\varphi_1, \dots, \varphi_n$: parameters of the model

$\varepsilon_t$: white noise

Estimations


Small sample: Binomial distribution

Large sample: Normal distribution

$f(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$

$\mathcal{N}\left(np, \sqrt{np(1-p)}\right)$ (mean, standard deviation)

n is the size of the sample and x the number of individuals with the particular characteristic

$E(X) = np$

$V(X) = np(1-p)$
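This compares the exact binomial probability with its normal approximation in R, for a hypothetical n = 100 and p = 0.3. The 0.5 continuity correction is a common refinement added here; it does not appear on the slide. The sketch also recovers E(X) = np and V(X) = np(1 − p) from the probability mass function.

```r
# Exact binomial probability vs its normal approximation
n <- 100; p <- 0.3
m <- n * p                        # E(X) = np = 30
s <- sqrt(n * p * (1 - p))        # SD = sqrt(np(1 - p))

exact  <- pbinom(25, n, p)                   # P(X <= 25)
approx <- pnorm(25.5, mean = m, sd = s)      # normal, with continuity correction
c(exact = exact, normal = approx)

# Mean and variance recovered from the probability mass function
x <- 0:n
c(mean = sum(x * dbinom(x, n, p)), var = sum((x - m)^2 * dbinom(x, n, p)))
```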


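Estimate a proportion: $Y = \frac{X}{n}$

Binomial distribution

$E(Y) = p \qquad V(Y) = \frac{p(1-p)}{n}$

Normal approximation

$Y \sim \mathcal{N}\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$ (mean, standard deviation)

Standardisation possible

$Y^* = \frac{Y - p}{\sqrt{p(1-p)/n}} \sim \mathcal{N}(0, 1)$

The normal approximation is only reliable if

$np \ge 5$ and $n(1-p) \ge 5$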


Let’s look for p with a 95% confidence interval: $P(p_1 < p < p_2) = 0.95$

Easy to solve!

$P(\mu - 1.96\,\sigma \le X \le \mu + 1.96\,\sigma) = 0.95$


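52 heads out of 100 tosses…

$Y \sim \mathcal{N}(?, ?)$

$Y \sim \mathcal{N}(0.52, 0.04996)$ (mean, standard deviation), since $\sqrt{0.52 \times 0.48 / 100} = 0.04996$

95% confidence interval: $0.52 \pm 1.96 \times 0.04996$

$p_1 = 0.42$

$p_2 = 0.62$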
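The coin-toss interval in R, following the normal-approximation recipe above. Variable names are illustrative.

```r
# 95% confidence interval for a proportion: 52 heads out of 100 tosses
heads <- 52; n <- 100
p_hat <- heads / n
se    <- sqrt(p_hat * (1 - p_hat) / n)     # 0.04996
z     <- qnorm(0.975)                       # 1.96

ci <- p_hat + c(-1, 1) * z * se
round(ci, 2)                                # p1 = 0.42, p2 = 0.62
```

For comparison, `binom.test(52, 100)$conf.int` returns the exact (Clopper-Pearson) interval, which differs slightly from the normal approximation.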


Mean estimation

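Problem: the SD of the actual population is unknown

With the SD estimated from the sample, the standardised mean follows a Student’s t distribution (for a normal population)

Similar to the normal distribution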


Student’s properties

• It is symmetric about its mean
• It has a mean of zero
• Its variance, $\nu/(\nu - 2)$ for $\nu > 2$ degrees of freedom, is greater than 1, and so is its standard deviation
• There are actually many t distributions, one for each degree of freedom
• As the sample size increases, the t distribution approaches the normal distribution
• It is bell shaped
• The t-scores can be negative or positive, but the probabilities are always positive

Normal-ish distribution for a finite sample, used to build a confidence interval
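A quick R check of two of these properties. The 97.5% quantile of Student's t falls towards the normal 1.96 as the degrees of freedom grow, and the variance ν/(ν − 2) stays above 1. The list of degrees of freedom is arbitrary.

```r
# Two-sided 95% critical values of Student's t shrink towards the normal 1.96
dfs <- c(1, 2, 5, 10, 30, 100, 1000)
q   <- qt(0.975, df = dfs)
round(rbind(df = dfs, t = q), 3)
qnorm(0.975)

# Variance nu / (nu - 2) for nu > 2, always above 1
nu <- c(3, 5, 10, 30)
nu / (nu - 2)
```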


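Student’s statistic

$S = \sqrt{\frac{n}{n-1}}\,\sigma$

$P\left(\bar{x} - \frac{S}{\sqrt{n}}\, t_{\alpha/2} < \mu < \bar{x} + \frac{S}{\sqrt{n}}\, t_{\alpha/2}\right) = 0.95$

Degrees of freedom: n − 1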


IPO premiums: IPO1 12%, IPO2 15%, IPO3 13%, IPO4 18%, IPO5 20%, IPO6 5%

SD: $\sigma = 4.81\%$

DF: $DF = 5$

S: $S = 5.27\%$

t: $t = 2.571$

Upper bound: $\mu_1 = 19.36\%$

Sample mean: $\bar{x} = 13.83\%$

Lower bound: $\mu_2 = 8.30\%$
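The IPO example reproduced in R, with premiums in percent. The corrected S equals R's `sd()`, and `t.test()` returns the same 95% interval.

```r
# 95% confidence interval for the mean IPO premium (in %)
prem <- c(12, 15, 13, 18, 20, 5)
n    <- length(prem)

sigma  <- sqrt(mean((prem - mean(prem))^2))   # population-style SD: 4.81
S      <- sqrt(n / (n - 1)) * sigma           # corrected SD, same as sd(prem): 5.27
t_crit <- qt(0.975, df = n - 1)               # 2.571 with 5 degrees of freedom

ci <- mean(prem) + c(-1, 1) * t_crit * S / sqrt(n)
round(ci, 2)                                  # mu2 = 8.30, mu1 = 19.36
t.test(prem)$conf.int                         # same interval from base R
```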


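Is a frequency difference significant?

$Y_1 \sim \mathcal{N}\left(p_1, \sqrt{\frac{p_1(1-p_1)}{n_1}}\right) \qquad Y_2 \sim \mathcal{N}\left(p_2, \sqrt{\frac{p_2(1-p_2)}{n_2}}\right)$ (mean, standard deviation)

$Z = Y_1 - Y_2$

$E(Z) = E(Y_1) - E(Y_2)$

$V(Z) = V(Y_1) + V(Y_2)$ (assumption of independence)

$Z \sim \mathcal{N}\left(p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)$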


Observations: 100 friendly takeovers, 80 successes; 60 hostile takeovers, 50 successes

Is the difference significant? 95% confidence

Friendly: 80%

Hostile: 83%

Global frequency

$\hat{p} = \frac{n_1 F_1 + n_2 F_2}{n_1 + n_2} = \frac{80 + 50}{100 + 60} = 81.25\%$


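$t^* = \frac{F_1 - F_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

$t^* = -0.52298$

When the frequencies are equal, $P(-1.96 < t^* < 1.96) = 0.95$; here $t^*$ lies inside this interval

The frequencies can be considered equal at the 95% confidence level

Their difference is not significant

The observed difference can be explained by sampling fluctuation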
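The takeover test reproduced in R. Without the continuity correction, the chi-squared statistic of `prop.test()` equals t*², so the two approaches agree.

```r
# Pooled two-proportion z-test: friendly vs hostile takeovers
x <- c(80, 50)        # successes
n <- c(100, 60)       # observations
f <- x / n            # 0.80 and 0.8333

p_pool <- sum(x) / sum(n)                                   # 0.8125
t_star <- (f[1] - f[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
t_star                                                      # -0.52298
abs(t_star) < qnorm(0.975)                                  # TRUE: not significant at 95%

# Base R equivalent: the chi-squared statistic is t_star^2
prop.test(x, n, correct = FALSE)
```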


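Is an SD difference significant?

Fisher-Snedecor distribution

$\frac{S_x^2 / \sigma_p^2}{S_y^2 / \sigma_q^2} = \frac{S_x^2}{S_y^2} \cdot \frac{\sigma_q^2}{\sigma_p^2} \sim F(n_p - 1, n_q - 1)$

where $\sigma_p^2$ and $\sigma_q^2$ are the total (population) variances, and $S_x^2$, $S_y^2$ the sample variances, from samples of sizes $n_p$ and $n_q$ drawn from normal populations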


You want to test $\sigma_p^2 = \sigma_q^2$; under this hypothesis

$\frac{S_x^2}{S_y^2} \sim F(n_p - 1, n_q - 1)$


$\frac{S_x^2}{S_y^2} \sim F(5, 4)$


95% confidence level, F-table

If $\frac{S_x^2}{S_y^2} < 6.26$, the SDs can be considered equal (at the 95% level)
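The F-test in R, with hypothetical samples of sizes 6 and 5 so that the degrees of freedom are (5, 4). `qf()` gives the 6.26 critical value. `var.test()` computes the same variance ratio; `alternative = "greater"` matches the one-sided rule used on the slide.

```r
# F-test for equal variances with n_p = 6 and n_q = 5 observations
qf(0.95, df1 = 5, df2 = 4)     # one-sided 95% critical value, about 6.26

x <- c(10.2, 11.5, 9.8, 12.1, 10.9, 11.7)   # hypothetical sample from population p
y <- c(10.1, 11.3, 11.6, 10.2, 11.0)        # hypothetical sample from population q

ratio <- var(x) / var(y)
ratio < qf(0.95, 5, 4)         # TRUE for these data: no significant SD difference

var.test(x, y, alternative = "greater")
```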

Games: Beat the Statistics


Is the martingale safe?

Bet on 2:1, double when you lose…

Risk of ruin?


Bet on 2:1

Is this really 2:1? $\frac{18}{37} = 0.4865$

Obvious how the casino makes money!

The casino’s probability of winning is always bigger than the player’s!
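The same point as an expected value, assuming European roulette: 37 pockets, 18 of each colour, plus the green 0. A $1 bet on one colour wins $1 with probability 18/37 and loses $1 otherwise.

```r
# Expected gain of a $1 bet on red (or black) at European roulette
p_win  <- 18 / 37
p_lose <- 19 / 37
ev <- p_win * 1 + p_lose * (-1)
ev            # -1/37, about -0.027: the player loses 2.7 cents per dollar on average
```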


You’ll be right with a martingale… Eventually! But when?

The 2011 recorded record streak is 26 reds in a row, in Las Vegas, Nevada

You were betting on black, hoping for the reversal, and you began with $2

At the 27th round you need

$2^{27} = \$134{,}217{,}728$

And don’t forget you have already lost

$2^1 + 2^2 + \dots + 2^{26} = \$134{,}217{,}726$

Casinos limit stakes

Your pocket may not be deep enough anyway!

And if you win at the 27th spin, you made…

$2. Quite risky…
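The martingale arithmetic from the example, checked in R. The last line adds the probability of a given run of 26 reds in a row at European roulette (18 red pockets out of 37).

```r
# Martingale on black starting at $2 and doubling after each loss
stakes <- 2^(1:27)              # stakes for rounds 1 to 27
stakes[27]                      # stake needed at round 27: 134,217,728
lost   <- sum(stakes[1:26])     # already lost in rounds 1-26: 134,217,726
stakes[27] - lost               # net profit if round 27 wins: 2

# Probability of 26 reds in a row
(18 / 37)^26
```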


“No one can possibly win at roulette unless he steals money from the table while the croupier isn’t looking.” (Albert Einstein)


Binomial approach

$P(x) = C_n^x\, p^x (1-p)^{n-x}$
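The binomial formula applied to a losing streak, as a sketch. The probability of 0 wins in 8 spins, each won with p = 18/37, is the probability of 8 losses in a row. That is the streak that wipes out a $255 martingale bankroll.

```r
# Probability of 0 wins in 8 spins, i.e. 8 losses in a row
p <- 18 / 37
dbinom(0, size = 8, prob = p)      # C(8, 0) p^0 (1 - p)^8
(1 - p)^8                          # same number, about 0.0048

# Chance of at least one win in 8 spins
1 - pbinom(0, size = 8, prob = p)
```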


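Bankroll $255, flat bet of $1

Bankroll $255, $1 start, martingale (double when you lose)

Flat bet: ruin after 255 net losses at the earliest

Martingale: ruin after only 8 losses in a row (1 + 2 + 4 + … + 128 = 255 dollars)

Comparison over 1,000,000 simulations, 100 rounds maximum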
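A scaled-down sketch of this comparison in R. It runs 10,000 simulated players instead of 1,000,000 to keep the run short. It assumes European roulette with a bet on one colour, and counts a martingale player as ruined as soon as the bankroll cannot cover the next stake; both choices are assumptions, not the author's original setup. The flat bettor cannot be ruined within 100 spins, while the martingale is ruined by any run of 8 losses.

```r
# Monte Carlo: flat $1 bet vs martingale, $255 bankroll, at most 100 spins
set.seed(42)
n_sims   <- 10000     # hypothetical, far fewer than 1,000,000
n_rounds <- 100
p_win    <- 18 / 37
bankroll <- 255

# One row per simulated player, one column per spin (TRUE = colour came up)
wins <- matrix(runif(n_sims * n_rounds) < p_win, nrow = n_sims)

# Flat bet: +1 or -1 each spin; ruined if the running bankroll ever hits 0
steps      <- ifelse(wins, 1, -1)
flat_paths <- bankroll + t(apply(steps, 1, cumsum))
flat_ruin  <- mean(apply(flat_paths, 1, min) <= 0)

# Martingale: start at $1, double after a loss, back to $1 after a win
bank   <- rep(bankroll, n_sims)
stake  <- rep(1, n_sims)
ruined <- rep(FALSE, n_sims)
for (r in seq_len(n_rounds)) {
  a <- !ruined                        # only players still in the game bet
  w <- wins[a, r]
  bank[a]  <- bank[a] + ifelse(w, stake[a], -stake[a])
  stake[a] <- ifelse(w, 1, 2 * stake[a])
  ruined   <- ruined | bank < stake   # cannot cover the next stake
}
mart_ruin <- mean(ruined)

c(flat = flat_ruin, martingale = mart_ruin)
```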


Conclusion

Multiple Regression

Econometrics

Estimations

Statistics & Games
