Applied Statistics III
Applied Statistics Vincent JEANNIN – ESGF 4IFM
Q1 2012
vinzjeannin@hotmail.com
Summary of the session (est. 4.5h)
• Reminders of last session
• Multiple regression
• Introduction to econometrics
• Estimations
• Games: beat the statistics
Reminders of last session
3 methods to calculate the VaR
• Historical
• Parametric
• Monte Carlo
Options: what to look at to calculate the VaR?
4 risk factors:
• Underlying price
• Interest rate
• Volatility
• Time
4 answers:
• Delta/Gamma approximation, knowing the distribution of the underlying
• Rho approximation, knowing the distribution of the underlying rate
• Vega approximation, knowing the distribution of implied volatility
• Theta (time decay)
Yes, but… do the underlying price, rate and volatility vary independently?
It might be a bit more complicated than expected…
Portfolio scale: what to look at to calculate the VaR?
Big question, is the VaR additive?
NO! Keywords for the future: covariance, correlation, diversification
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$
Parametric VaR on 2 assets?
$P(X \le -1.645\,\sigma + \mu) = 0.05$
$P(X \le -2.326\,\sigma + \mu) = 0.01$
Asset 1: Mean 0, SD 2.34%, Weight 50%
Asset 2: Mean 0, SD 1.50%, Weight 50%
Correlation: 0.59
What is the VaR (95%)?
2.83%
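The 2.83% figure can be checked with a short R sketch. It uses only the inputs given on the slide; the variable names are illustrative, and the sum of the stand-alone VaRs is added here only to show that VaR is not additive when the correlation is below 1.

```r
# Inputs from the slide (means are 0)
sd1 <- 0.0234; sd2 <- 0.0150   # standard deviations of the two assets
w1  <- 0.5;    w2  <- 0.5      # portfolio weights
rho <- 0.59                    # correlation between the assets

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
cov12 <- rho * sd1 * sd2
var_p <- w1^2 * sd1^2 + w2^2 * sd2^2 + 2 * w1 * w2 * cov12
sd_p  <- sqrt(var_p)

# 95% parametric VaR with zero mean: 1.645 * portfolio SD
var95 <- -qnorm(0.05) * sd_p
round(100 * var95, 2)      # 2.83 (percent)

# Sum of the weighted stand-alone VaRs, for comparison
var95_sum <- -qnorm(0.05) * (w1 * sd1 + w2 * sd2)
round(100 * var95_sum, 2)  # 3.16 (percent), larger than the portfolio VaR
```

The gap between the two numbers is the diversification effect carried by the covariance term.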
Linear regression model
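Minimise the sum of the squared vertical distances between the observations and the linear approximation
$y = f(x) = ax + b$
Residual $\varepsilon$
OLS: Ordinary Least Squares
Minimising the sum of squared residuals
$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$
$a = \dfrac{\mathrm{Cov}_{xy}}{\sigma_x^2}$
$b = \bar{y} - a\,\bar{x}$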
$r = \dfrac{\mathrm{Cov}_{xy}}{\sigma_x \sigma_y}$ Value between -1 and 1
$R^2 = \dfrac{\text{Regression dispersion}}{\text{Total dispersion}}$
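The closed-form slope, intercept and correlation can be compared with what `lm` returns. This is a minimal sketch on simulated data; the data and the "true" values 2 and 1 are assumptions made for illustration and are not from the course.

```r
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 * x + 1 + rnorm(50)        # hypothetical data: slope 2, intercept 1

a <- cov(x, y) / var(x)           # slope: Cov(x, y) / Var(x)
b <- mean(y) - a * mean(x)        # intercept: mean(y) - a * mean(x)
r <- cov(x, y) / (sd(x) * sd(y))  # correlation coefficient

fit <- lm(y ~ x)
coef(fit)                         # (Intercept) matches b, x matches a

# With a single explanatory variable, R-squared is r squared
c(r_squared = r^2, lm_r_squared = summary(fit)$r.squared)
```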
Differencing (a transformation of the data) can happen before the OLS
What do you suggest?
Let’s create a new variable
$Y_{Diff} = \ln(Y)$
Magic!
New idea… No intercept
Only one parameter to estimate:
• Slope β, written $a$ in the formulas
Minimising the sum of squared residuals
$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a x_i)^2$
When is E minimal?
When the partial derivative with respect to $a$ is 0
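$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a x_i)^2$
Quick high school reminder if necessary…
$(y_i - a x_i)^2 = y_i^2 - 2a x_i y_i + a^2 x_i^2$
$\dfrac{\partial E}{\partial a} = \sum_{i=1}^{n} \left(-2 x_i y_i + 2a x_i^2\right) = 0$
$\sum_{i=1}^{n} \left(x_i y_i - a x_i^2\right) = 0$
$a \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
$a = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$
$a = \dfrac{\overline{xy}}{\overline{x^2}}$
Any better?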
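The no-intercept result can be verified numerically. The sketch below uses simulated data (an assumption, not course data) and compares the closed form with `lm` fitted without an intercept.

```r
set.seed(2)
x <- runif(40, 1, 5)
y <- 3 * x + rnorm(40, sd = 0.5)   # hypothetical data through the origin

# Closed form from the derivation: a = sum(x_i y_i) / sum(x_i^2)
a_hat <- sum(x * y) / sum(x^2)

# Same ratio written with means (numerator and denominator divided by n)
a_bar <- mean(x * y) / mean(x^2)

# lm without intercept: "0 +" removes the constant term
fit0 <- lm(y ~ 0 + x)

c(closed_form = a_hat, with_means = a_bar, lm = unname(coef(fit0)))
```

All three values should coincide up to floating-point precision.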
Multiple regressions
$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon$
More than one explanatory variable
Choosing factors can be difficult
Much tougher without software
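To give an idea of what the software does, here is a small sketch with two explanatory variables. The simulated data and coefficients are assumptions for illustration; the normal equations are shown only as the standard matrix form of OLS, next to the usual `lm` call.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)   # hypothetical "true" model

# With software: one line
fit <- lm(y ~ x1 + x2)

# Without lm: solve the normal equations (X'X) b = X'y
X <- cbind(1, x1, x2)                    # column of 1s for the intercept b0
b <- solve(t(X) %*% X, t(X) %*% y)

coef(fit)
as.vector(b)                             # same estimates as coef(fit)
summary(fit)$r.squared
```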
Explanatory variables should not depend on each other
Financial methods such as APT (Arbitrage Pricing Theory) try to use pure and independent factors
Used a lot in economics
R-squared is very often poor
Ratio Investment / GDP, World Bank, developing countries
$R = 19.5 - 5.8\,\mathit{Corruption} + 6.3\,\mathit{CorruptionPrediction} + 2\,\mathit{School} - 1.1\,\mathit{GDP} - 2\,\mathit{Distortion}$
Let’s discuss…
• Corruption: current corruption
• CorruptionPrediction: future corruption
• School: level of education
• GDP: GDP
• Distortion: how badly policies are run
Opposite effect of corruption variables
Is there any logic to this?
The current level of corruption decreases investment
The future level of corruption increases investment
Investors learn how to live with corruption…
R-Squared is 0.24, very poor…
How to find the right model?
• General to specific: start with a comprehensive model that includes all the likely explanatory variables, then simplify it.
• Specific to general: begin with a simple model that is easy to understand, then add explanatory variables to improve its explanatory power.
Golden rules
Be logical
Get the best R-Squared
Do not overcomplicate
Introduction to econometrics
3 steps
Identify
Fit
Forecast
What is a model?
$Obs = Model + \varepsilon$, with $\varepsilon$ being white noise
3 components
Trend
Seasonality
Residual
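The three components can be extracted in R with `decompose`. The sketch below uses `AirPassengers`, a dataset shipped with R, purely as a stand-in series (it is not the course data), and the default additive decomposition.

```r
# AirPassengers: monthly series bundled with R, used here as an example only
parts <- decompose(AirPassengers)   # additive: series = trend + seasonal + random

head(parts$seasonal, 12)            # one seasonal figure per month
range(parts$trend, na.rm = TRUE)    # the trend has NAs at both ends (moving average)

# The three components add back up to the observed series
rebuilt <- parts$trend + parts$seasonal + parts$random
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(AirPassengers[ok]))
```

`plot(parts)` draws the observed series and the three components on one page.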
Stationary series are easier to forecast… Transform it!
A series is stationary if the mean and the variance are stable
Which one is more likely to be stationary?
Properties of stationary series
$(Y_1, Y_2, Y_3, \dots, Y_n)$ and $(Y_2, Y_3, Y_4, \dots, Y_{n+1})$ have the same distribution
The distribution is not time dependent
Rare occurrence
Stationarity is accepted if
$E(Y_t) = \mu$ is constant over time
$\mathrm{Cov}(Y_t, Y_{t-n})$ depends only on $n$
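A random walk is a common example of a series that fails these conditions, and its first difference is an example of a transformation that restores them. The sketch below is an illustration on simulated data; the helper `halves` is hypothetical and only compares the mean of the first and second half of a series.

```r
set.seed(3)
eps  <- rnorm(500)     # white noise: stationary
walk <- cumsum(eps)    # random walk: its variance grows with time

# First difference of the random walk gives back the white noise
d_walk <- diff(walk)

# Hypothetical helper: mean of each half of the sample
halves <- function(z) {
  h <- length(z) %/% 2
  c(first = mean(z[1:h]), second = mean(z[(h + 1):length(z)]))
}

halves(walk)     # the two half-sample means typically differ a lot
halves(d_walk)   # both should be close to 0
```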
About the residuals…
White noise!
Normality test
Get a first idea with
• Skewness
• Kurtosis
Proper tests: KS, Durbin-Watson, Portmanteau,…
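A quick look at skewness and kurtosis, followed by a Portmanteau test, could be sketched as below. Base R has no built-in skewness or kurtosis function, so `sample_skewness` and `sample_kurtosis` are hypothetical helpers written for this example, and the residuals are simulated stand-ins. `Box.test` is the base R Portmanteau test; Durbin-Watson needs an add-on package and is left out.

```r
# Hypothetical helpers based on central moments
sample_skewness <- function(z) {
  d <- z - mean(z)
  mean(d^3) / mean(d^2)^1.5
}
sample_kurtosis <- function(z) {
  d <- z - mean(z)
  mean(d^4) / mean(d^2)^2   # population value is 3 for a normal distribution
}

set.seed(4)
eps <- rnorm(300)           # stand-in for regression residuals

sample_skewness(eps)        # near 0 when the residuals are symmetric
sample_kurtosis(eps)        # near 3 when the tails look normal

# Portmanteau (Ljung-Box) test: H0 = no autocorrelation up to lag 10
bt <- Box.test(eps, lag = 10, type = "Ljung-Box")
bt$p.value                  # a small p-value would reject white noise
```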
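# TReg is assumed to be the fitted regression object from the earlier trend regression
eps <- resid(TReg)
# KS test against a normal with the residuals' own mean and SD (not N(0,1))
ks.test(eps, "pnorm", mean = mean(eps), sd = sd(eps))
# 2x2 grid for the four standard diagnostic plots
layout(matrix(1:4, 2, 2))
plot(TReg)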
lag.plot(DATA$Val, 9, do.lines=FALSE)
Differencing seems to be worth trying
Check ACF/PACF for autocorrelation
Auto Regressive model AR(n)
$X_t = c + \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \dots + \varphi_n X_{t-n} + \varepsilon_t$
$\varphi_1, \dots, \varphi_n$: parameters of the model
$\varepsilon_t$: white noise
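An AR(1) series can be simulated and then fitted back to see the identify/fit steps in action. This is a sketch under assumed values (coefficient 0.6, 2,000 observations), using the base R functions `arima.sim`, `acf`, `pacf` and `arima`.

```r
set.seed(5)
phi <- 0.6                                        # hypothetical AR(1) coefficient
x <- arima.sim(model = list(ar = phi), n = 2000)  # simulated AR(1) series

# Identify: the ACF decays gradually, the PACF cuts off after lag 1
acf(x, lag.max = 5, plot = FALSE)
pacf(x, lag.max = 5, plot = FALSE)

# Fit: AR(1) with a constant
fit <- arima(x, order = c(1, 0, 0))
coef(fit)                                         # "ar1" should be close to phi
```

A forecast would then follow with `predict(fit, n.ahead = 5)`.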
Estimations
Small sample: Binomial Distribution
$f(x) = \dfrac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
Large sample: Normal Distribution
$N\left(np, \sqrt{np(1-p)}\right)$
n is the size of the sample, x the number of individuals with the particular characteristic
$E(X) = np$
$V(X) = np(1-p)$
Estimate a proportion $Y = \dfrac{X}{n}$
Binomial Distribution
$E(Y) = p$ and $V(Y) = \dfrac{p(1-p)}{n}$
Normal approximation
$Y \sim N\left(p, \sqrt{\dfrac{p(1-p)}{n}}\right)$
Standardisation possible
$Y^* = \dfrac{Y - p}{\sqrt{p(1-p)/n}}$
$Y^* \sim N(0, 1)$
Normal approximation works only if
$np \ge 5$ and $n(1-p) \ge 5$
Let’s look for p with a 95% confidence interval
$P(p_1 < p < p_2) = 0.95$
Easy to solve!
$P(\mu - 1.96\,\sigma \le X \le \mu + 1.96\,\sigma) = 0.95$
52 heads out of 100 tosses…
$Y \sim N(?, ?)$
$Y \sim N(0.52, 0.04996)$
95% confidence interval
$p_1 = 0.42$
$p_2 = 0.62$
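The coin-toss interval can be reproduced in a few lines of R. This sketch uses only the numbers on the slide; the variable names are illustrative.

```r
x <- 52; n <- 100              # 52 heads out of 100 tosses
y <- x / n                     # observed proportion: 0.52

# The normal approximation requires np >= 5 and n(1 - p) >= 5
c(n * y, n * (1 - y)) >= 5

se <- sqrt(y * (1 - y) / n)    # standard deviation of Y
round(se, 5)                   # 0.04996

z  <- qnorm(0.975)             # 1.96 for a two-sided 95% interval
ci <- y + c(-1, 1) * z * se
round(ci, 2)                   # 0.42 0.62
```

Since 0.5 lies inside the interval, 52 heads out of 100 is compatible with a fair coin.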
Mean estimation
Problem
The SD of the actual population is unknown
The standardised sample mean follows a Student’s t distribution
Similar to the normal distribution
Student’s properties
• It is symmetric about its mean
• It has a mean of zero
• It has a standard deviation and variance greater than 1
• There are actually many t distributions, one for each degree of freedom
• As the sample size increases, the t distribution approaches the normal distribution
• It is bell shaped
• The t-scores can be negative or positive, but the probabilities are always positive
Normal-ish distribution in a discrete environment with a confidence interval
Student’s Statistic
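$S = \sqrt{\dfrac{n}{n-1}}\,\sigma$
$P\left(\bar{x} - \dfrac{S}{\sqrt{n}}\, t_{\alpha/2} < \mu < \bar{x} + \dfrac{S}{\sqrt{n}}\, t_{\alpha/2}\right) = 0.95$
Degrees of freedom: $n - 1$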
IPO premiums: IPO1 12%, IPO2 15%, IPO3 13%, IPO4 18%, IPO5 20%, IPO6 5%
Mean: $\bar{x} = 13.83\%$
SD: $\sigma = 4.81\%$
DF: $DF = 5$
S: $S = 5.27\%$
t: $t = 2.571$
Upper bound: $\mu_1 = 19.36\%$
Lower bound: $\mu_2 = 8.30\%$
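The IPO figures can be reproduced step by step in R. The sketch uses the six premiums from the slide; variable names are illustrative, and `t.test` is called at the end only as a cross-check of the manual interval.

```r
premiums <- c(12, 15, 13, 18, 20, 5)    # IPO premiums in percent
n     <- length(premiums)
xbar  <- mean(premiums)                 # 13.83
S     <- sd(premiums)                   # uses n - 1: 5.27
sigma <- S * sqrt((n - 1) / n)          # uses n: 4.81

t_crit <- qt(0.975, df = n - 1)         # 2.571 with 5 degrees of freedom
ci <- xbar + c(-1, 1) * t_crit * S / sqrt(n)
round(ci, 2)                            # 8.30 19.36

# Cross-check with the built-in one-sample t interval
t.test(premiums)$conf.int
```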
Is a frequency difference significant?
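$Y_1 \sim N\left(p_1, \sqrt{\dfrac{p_1(1-p_1)}{n_1}}\right)$ and $Y_2 \sim N\left(p_2, \sqrt{\dfrac{p_2(1-p_2)}{n_2}}\right)$
$Z = Y_1 - Y_2$
$E(Z) = E(Y_1) - E(Y_2)$
$V(Z) = V(Y_1) + V(Y_2)$ (assumption of independence)
$Z \sim N\left(p_1 - p_2, \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}\right)$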
Observations
• 100 friendly takeovers, 80 successes
• 60 hostile takeovers, 50 successes
Is the difference significant? 95% confidence
Friendly: 80%
Hostile: 83%
Global frequency
$\hat{p} = \dfrac{n_1 F_1 + n_2 F_2}{n_1 + n_2} = \dfrac{80 + 50}{100 + 60} = 81.25\%$
$t^* = \dfrac{F_1 - F_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$t^* = -0.52298$
If $-1.96 < t^* < 1.96$, the frequencies are considered the same at the 95% confidence level
Here the frequencies are considered equal
Their difference is not significant
The actual difference is due to sampling fluctuation
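The takeover comparison can be sketched in R as follows. It uses the counts from the slides (80 out of 100 and 50 out of 60); the variable names are illustrative, and `prop.test` without continuity correction is added as a cross-check, since its chi-squared statistic is the square of the statistic computed by hand.

```r
n1 <- 100; s1 <- 80    # friendly takeovers, successes
n2 <- 60;  s2 <- 50    # hostile takeovers, successes

f1 <- s1 / n1                      # 0.80
f2 <- s2 / n2                      # 0.8333
p_pool <- (s1 + s2) / (n1 + n2)    # global frequency: 0.8125

t_star <- (f1 - f2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
round(t_star, 5)                   # -0.52298

abs(t_star) < qnorm(0.975)         # TRUE: the difference is not significant

# Cross-check: chi-squared statistic equals t_star^2
pt <- prop.test(c(s1, s2), c(n1, n2), correct = FALSE)
unname(pt$statistic)
```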
Is a SD difference significant?
Fisher-Snedecor distribution
$S_x^2$, $S_y^2$: sample variances
$\sigma_p^2$, $\sigma_q^2$: total (population) variances
$\dfrac{S_x^2}{S_y^2} \cdot \dfrac{\sigma_q^2}{\sigma_p^2} \sim F(n_p - 1, n_q - 1)$
You want to test $\sigma_p^2 = \sigma_q^2$
$\dfrac{S_x^2}{S_y^2} \sim F(n_p - 1, n_q - 1)$
$\dfrac{S_x^2}{S_y^2} \sim F(5, 4)$
95% confidence interval, F-table
If the SDs are equal (at the 95% confidence level): $\dfrac{S_x^2}{S_y^2} < 6.26$
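The 6.26 threshold can be read from R instead of an F-table. In the sketch below, `same_sd` is a hypothetical helper, and the sample variances passed to it are made-up numbers, since the slides give no data for this test.

```r
df1 <- 5; df2 <- 4                # degrees of freedom, as in F(5, 4)
f_crit <- qf(0.95, df1, df2)      # 95% quantile of the F distribution
round(f_crit, 2)                  # 6.26

# Hypothetical helper: TRUE when the variance ratio is below the threshold
same_sd <- function(s2_x, s2_y, df1, df2, level = 0.95) {
  (s2_x / s2_y) < qf(level, df1, df2)
}

same_sd(4, 2, df1, df2)           # ratio 2: TRUE, equality is not rejected
same_sd(14, 2, df1, df2)          # ratio 7: FALSE, equality is rejected
```

With raw samples at hand, base R's `var.test(x, y)` runs a variance-ratio F test directly.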
Games: Beat the Statistics
Is the martingale safe?
Bet on 2:1, double when you lose…
Risk of ruin?
Bet on 2:1
Is this really 2:1? $\dfrac{18}{37} = 0.4865$
It is obvious how the casino makes money!
The casino's probability of winning is always higher than the player's!
You’ll be right with a martingale… Eventually! But when?
The 2011 record for a recorded series is 26 reds in a row, in Las Vegas, Nevada
You were on black, hoping for the reversal, and you began with $2
At the 27th round you need
$2^{27} = \$134,217,728$
And don’t forget you have already lost
$2^1 + 2^2 + \dots + 2^{26} = \$134,217,726$
Casinos limit stakes
Your pocket may not be deep enough anyway!
And if you win at the 27th roll, you made…
$2
Quite risky…
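The martingale arithmetic can be checked in R. The sketch follows the slide's scenario ($2 first bet, 26 losses in a row, win on the 27th round); the variable names are illustrative.

```r
first_bet <- 2
losses    <- 26

# Stakes of the 26 losing rounds: $2, $4, ..., $2^26
stakes <- first_bet * 2^(0:(losses - 1))
lost   <- sum(stakes)               # 134,217,726

# Stake needed on the 27th round: $2^27
next_stake <- first_bet * 2^losses  # 134,217,728

# Net result if the 27th round finally wins
profit <- next_stake - lost         # 2

format(c(lost = lost, next_stake = next_stake, profit = profit),
       big.mark = ",", scientific = FALSE)
```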
“No one can possibly win at roulette unless he steals money from the table while the
croupier isn’t looking.” — Albert Einstein
Binomial approach
$P(x) = C_n^x\, p^x (1-p)^{n-x}$
Bankroll of $255, flat $1 bet
Bankroll of $255, $1 starting bet, martingale: double when you lose
Fastest possible ruin: 255 rounds for the flat bet
Fastest possible ruin: 8 rounds for the martingale
Comparison over 1,000,000 runs, 100 rounds maximum
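A comparison of this kind could be simulated as below. This is only a sketch: `play` is a hypothetical helper, the stopping rule (stop as soon as the next stake cannot be covered) is an assumption, and 10,000 runs are used instead of 1,000,000 to keep it quick. The win probability 18/37 is the one from the roulette slide.

```r
# wins: logical vector, TRUE when the player wins that round
play <- function(wins, bankroll = 255, martingale = TRUE) {
  stake <- 1
  for (w in wins) {
    if (stake > bankroll) break       # next stake cannot be covered: stop
    if (w) {
      bankroll <- bankroll + stake
      stake <- 1                      # back to the starting bet after a win
    } else {
      bankroll <- bankroll - stake
      if (martingale) stake <- 2 * stake
    }
  }
  c(final = bankroll, ruined = as.numeric(stake > bankroll))
}

# Binomial approach: probability of 0 wins in 8 rounds (8 losses in a row)
p_win <- 18 / 37
dbinom(0, size = 8, prob = p_win)

set.seed(6)
n_sim <- 10000; rounds <- 100
res_m <- replicate(n_sim, play(runif(rounds) < p_win, martingale = TRUE))
res_f <- replicate(n_sim, play(runif(rounds) < p_win, martingale = FALSE))

rowMeans(res_m)   # average final bankroll and share of ruined runs, martingale
rowMeans(res_f)   # same for the flat bet (ruin is impossible within 100 rounds)
```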
Conclusion
Multiple Regression
Econometrics
Estimations
Statistics & Games