JLarge Slides MT123 2011


Transcript of JLarge Slides MT123 2011

Page 1: JLarge Slides MT123 2011

Financial Econometrics Lecture Slides:

MFE, Michaelmas Term 2011

Weeks 1-3

Random Variables, Estimators

and Asymptotic Approximation

Jeremy Large

St Hugh’s College and Oxford-Man Institute of Quantitative Finance, University of Oxford

[email protected]

September 27, 2011

1

Page 2: JLarge Slides MT123 2011

Contents

1 Basic probability 10
1.1 Reading: see lecture notes 10
1.2 Sample spaces, events and axioms 11
1.3 Independence 16
1.4 Conditional Probability 18

2 Random variables 24
2.1 Basics 24
2.2 Example random variables 27
2.3 Random walk 31
2.4 Distribution functions 32
2.5 Quantile functions 37
2.6 Some common random variables 39
2.7 Multivariate random variables 47
2.8 Moments 56
2.9 Covariance matrices 63
2.10 Back to distributions 71
2.11 Conditional distributions 75

3 Estimators 85
3.1 Introduction 85
3.2 Bias and mean square error of estimators 87

4 Simulating random variables 89
4.1 Pseudo random numbers 89
4.2 Inverting distribution functions 90

5 Asymptotic approximation 92
5.1 Motivation 92
5.2 Definitions 94
5.3 Some payback 102
5.4 Some more theory 104

2

Page 3: JLarge Slides MT123 2011

Overview of the course

First of two examples

Lehman Brothers : share price (2001 - 2005)


Figure 1: A time-series of Lehman Brothers end-of-day share prices (dollars).

3

Page 4: JLarge Slides MT123 2011

Second of two examples


Figure 2: A cross-plot of gold prices against copper prices, Q1 2006 (with trend line).

4

Page 5: JLarge Slides MT123 2011

Time-series (Lehman): one quantity changes over time; forecasting and explaining.

Regression (gold-copper): some quantities interact; explaining.

Questions

• How can you make money from each?

• Is there randomness in the two examples? Or is everything ‘deterministic’?

5

Page 6: JLarge Slides MT123 2011

Time-series (Lehman): one quantity changes over time; forecasting and explaining.

Regression (gold-copper): some quantities interact; explaining.

Comments

• In the second part of this term Prof Neil Shephard will talk about time-series, regression relationships, and mixtures of the two

• In Hilary Term, Prof Anders Rahbek will go more deeply into time-series: forecasting, volatility

• I provide theoretical underpinnings for both:

– notation

– framework

– proofs

– → a fairly abstract and theoretical start

6

Page 7: JLarge Slides MT123 2011

Lecture plan, Thursdays, Weeks 1-4:

• 1pm: Lecture starts

• 1:40pm: 5-minute break, stretch legs

• 2:25pm: 20-minute break for coffee

• 3:25pm: 5-minute break, stretch legs

• 4:15pm: end

• 4:30pm to 5pm: Office hours in this lecture room

Classes take place in Weeks 3-9 this term. Thursday morning.

Kasper Lund-Jensen is the class teacher.

7

Page 8: JLarge Slides MT123 2011

Weekly assignments:

Weekly assignments are distributed at each Thursday lecture:

→ Intended to take about three hours (I would recommend you never spend longer than four hours on them)

→ Hand them in at SBS reception by 4pm the Monday 11 days later.

→ Kasper returns your answers, and provides solutions in the classes the following Thursday.

→ grade of either 1 or 0.

→ 1 point will be awarded if the assignment is mostly complete and correct. No points will be awarded if the assignment is substantially incomplete.

→ Over this term and next, the best 10 out of 16 assignments will count towards the final grade.

8

Page 9: JLarge Slides MT123 2011

What will be in the exams and quizzes next term?

All course contents are examinable, unless they have been flagged otherwise (note the starring system in the lecture notes for this part of the course)

Best guide to exam question style : weekly assignments

Best guide to content : highly unlikely to stray beyond material appearing

in the slides covered in lectures, or the assignments.

9

Page 10: JLarge Slides MT123 2011

1 Basic probability

Financial econometrics, and much of finance theory, takes the view that asset prices are random.

So, probability theory is the basis of all modern econometrics and much of

economics and finance.

We will also need some linear algebra.

1.1 Reading: see lecture notes

10

Page 11: JLarge Slides MT123 2011

1.2 Sample spaces, events and axioms

Example: Vodafone trades to the nearest 0.25p, so 0.25p is the price tick size.

Vodafone prices over one day:


Figure 3: Sample path of the best bid (best available marginal price to a seller) for Vodafone on the LSE’s electronic limit order book SETS for the first working day in 2004.

Write Yi as the price of a very simple asset at time i (after i changes, say).

11

Page 12: JLarge Slides MT123 2011

A very simple model: price starts at zero and it can move 1 “tick” up or down each time period, or stay the same!

time i    possible prices Yi
i = 0     0
i = 1     −1, 0, 1
i = 2     −2, −1, 0, 1, 2
i = 3     −3, −2, −1, 0, 1, 2, 3
i = 4     −4, −3, −2, −1, 0, 1, 2, 3, 4

Thus, for example, Y4 can take on 9 different values.

This ‘toy model’ allows us to try out most deep ideas in probability theory

12

Page 13: JLarge Slides MT123 2011

Sample space. The set Ω is called the sample space if it contains all possible (primitive) outcomes that we are considering, e.g. if we think about Y4 then its sample space is

Ω = {−4, −3, −2, −1, 0, 1, 2, 3, 4}.

Event. An event is a subset of Ω (which could be Ω itself), e.g. let

A = {1},

i.e. Y4 = 1. Further let B be the event that Y4 is strictly positive, so

B = {1, 2, 3, 4}.

Example 1.1 (value at risk) An important concept is downside risk — how much you can lose, how quickly and how often. In this case the event of a large loss might be defined as

{−4, −3},

a rapid fall of 3 ticks or more. In practice value at risk tends to be computed over a day or more, rather than over tiny time periods.

13

Page 14: JLarge Slides MT123 2011

Probability axioms based on the triple (Ω,F , Pr)

F is the ‘power set’ of Ω, which just means it contains all the subsets of Ω:

A ∈ F ↔ A is a subset of Ω.

(technical note: F sometimes contains many – but not all – subsets of Ω)

And Pr is a real-valued function on F (not on Ω) that satisfies

1. Pr(A) ≥ 0, for all A ∈ F (for all A in the set F)

2. Pr(Ω) = 1.

3. If {Ai ∈ F : i = 1, 2, ..., ∞} (which is an infinitely large set of elements of F) are disjoint, then

Pr(⋃_{j=1}^∞ Aj) = ∑_{k=1}^∞ Pr(Ak).

In the Vodafone example:

Pr(Y4 > 0) = ∑_{i=1}^4 Pr(Y4 = i).

14

Page 15: JLarge Slides MT123 2011

Comments:

• Only events have probabilities.

• Events, E, are subsets of Ω, not elements. So E ⊆ Ω or, equivalently, E ∈ F.

• Probabilities are always ≥ zero.

• A realization is when a single ω ∈ Ω is picked (‘happens’).

• However, strictly speaking this realization has no probability (giving it a probability makes no sense).

• ⋃ signifies ‘or’; ⋂ signifies ‘and’

15

Page 16: JLarge Slides MT123 2011

1.3 Independence

Consider two events A, B which are in F .

When does occurrence of one event not affect the probability of another event

also happening?

When the two events are independent.

Write that the events A, B are independent (in F) iff

Pr(A ∩ B) = Pr(A) × Pr(B).

Write

A ⊥⊥ B.

16

Page 17: JLarge Slides MT123 2011

Example 1.2 Let S and T be any subsets of {−1, 0, 1}.

(e.g. suppose that S = {−1, 1} and T = {1})

Define A and B by: A is [(Y4 − Y3) ∈ S] and B is [(Y3 − Y2) ∈ T].

Many models assume that for any S and T ,

A ⊥⊥ B.

Informally, we mean that (Y4 − Y3) is independent of (Y3 − Y2), so we write this quickly as:

(Y4 − Y3) ⊥⊥ (Y3 − Y2).

We will formalize this later, in terms of ‘random variables’.

17

Page 18: JLarge Slides MT123 2011

1.4 Conditional Probability

Definition

Two events, A and B. We might be interested in Pr(A) or Pr(B) or Pr(A ∩B).

Want to know Pr(A|B), assuming Pr(B) > 0.

I constrain my world so that B happens and I ask if A then happens.

This can only be if both A and B happen, so we define

Pr(A|B) = Pr(A ∩ B) / Pr(B).

Think of this as a function of A, with B fixed in the background.

→ that way, it obeys the three standard probability axioms.

This is a vital concept in econometrics.

18

Page 19: JLarge Slides MT123 2011

Joint conditional probabilities

Pr(A ∩ B|C).

If

Pr(A ∩ B|C) = Pr(A|C) × Pr(B|C),

we say that, conditionally on C, A and B are independent. This is often written as

(A ⊥⊥ B) | C.

19

Page 20: JLarge Slides MT123 2011

Conditional probabilities, and time

Suppose we are at time 3; then we know the value of Y3 = 2, say. Then Y4 must be in

{1, 2, 3},

so let’s think of {1, 2, 3} as a new sample space. It is not too hard to define new events and new probabilities

Pr(Y4 > 1|Y3 = 2) = 1 − Pr(Y4 = 1|Y3 = 2).

Here, as ever, conditional probabilities are simply standard probabilities,

• but on another sample space.

Let’s never forget the stuff to the right of |.

20

Page 21: JLarge Slides MT123 2011

Example 1.3 We may be interested in the forecast distributions, across all x:

Pr(Y4 = x|Y3 = y),
Pr(Y4 = x|Y2 = y),
Pr(Y4 = x|Y1 = y),
Pr(Y4 = x|Y1 = a, Y2 = b, Y3 = c),

the last of which is the distribution of Y4 given we know that the prices at times 1, 2, 3 were a, b and c.

The last conditional probability is a one-step ahead forecast distribution given the

path of the process.

21

Page 22: JLarge Slides MT123 2011

A flexible notation for the example on the page before

We may be interested in the forecast distributions:

Pr(Y4|Y3),
Pr(Y4|Y2),
Pr(Y4|Y1),
Pr(Y4|Y1, Y2, Y3),

the distribution of Y4 given we know the price at time 3, 2 or 1.

The last conditional probability is a one-step ahead forecast distribution given the path of the process.

22

Page 23: JLarge Slides MT123 2011

Example 1.4 In many models in financial econometrics:

Pr(Yi|Yi−1, Yi−2, Yi−3, . . . ) = Pr(Yi| Yi−1 ).

That is, given the value of Yi−1, the value of Y two or more periods before is irrelevant to the value of Yi.

This is the Markov Assumption.

A consequence of the Markov Assumption:

(Yi ⊥⊥ Yi−2) |Yi−1.

23

Page 24: JLarge Slides MT123 2011

2 Random variables

2.1 Basics

A random variable is a function from Ω to R.

Typically, it is called X(ω).

Most of econometrics is about random variables.

We drop reference to ω, so we will write X as the random variable.

Properties of X are events, for example: ‘X > 0’ is the event

{ω : X(ω) > 0}, (1)

which is a subset of Ω, like every other event, and can have a probability.

24

Page 25: JLarge Slides MT123 2011

Independence. Two random variables, Y1 and Y2, are independent if

for any events A1 about Y1, and A2 about Y2:

A1 ⊥⊥ A2.

If they are independent, then we write

Y1 ⊥⊥ Y2.

Exercise: prove that if Y1 and Y2 are independent, then for any y1 and y2:

Pr[Y1 ≤ y1 and Y2 ≤ y2] = Pr[Y1 ≤ y1] Pr[Y2 ≤ y2].

25

Page 26: JLarge Slides MT123 2011

i.i.d. A sequence of random variables Y1, Y2, ..., YN, ... is said to be i.i.d. (independently and identically distributed) if

• any pair Yi and Yj are independent, and have the same distribution.

26

Page 27: JLarge Slides MT123 2011

2.2 Example random variables

Bernoulli random variable:

Could be heads (ω = H) or tails (ω = T ).

Let X(H) = 1 and X(T ) = 0.

We say X is a Bernoulli random variable with two ‘points of support’, {0, 1}.

Write Pr(X = 1) = p and Pr(X = 0) = 1 − p.

Now let’s make a new random variable:

U1 = 2X − 1 ∈ {−1, 1}.

27

Page 28: JLarge Slides MT123 2011

A sequence of Bernoulli random variables:

Write Xi as above but for time i, where i can be 1,2,3,...

Assume that Xi are independent and identically distributed (i.i.d.).

Binomial tree process

Yi = Yi−1 + Ui, i = 1, 2, 3, ..., Y0 = 0, (2)

Ui = 2Xi − 1. (3)

What is a Random Process? Nothing other than a sequence of random variables, e.g. Y0, Y1, Y2, ...

→ for example, we record a price at a sequence of times of our choosing.
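As a small illustration (not part of the original slides), the binomial tree above can be simulated in a few lines of Python; the choices p = 0.5, n = 100 and the random seed are purely illustrative.

    # Minimal sketch: simulate the binomial tree Y_i = Y_{i-1} + U_i,
    # where U_i = 2*X_i - 1 and X_i are i.i.d. Bernoulli(p).
    import numpy as np

    rng = np.random.default_rng(0)           # seed chosen arbitrarily
    p, n = 0.5, 100                          # illustrative values
    X = rng.binomial(1, p, size=n)           # Bernoulli draws X_1, ..., X_n
    U = 2 * X - 1                            # recentred steps in {-1, +1}
    Y = np.concatenate(([0], np.cumsum(U)))  # Y_0 = 0, then the partial sums
    print(Y[:5])                             # first few points of one sample path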

28

Page 29: JLarge Slides MT123 2011

[Figure: simulated sample paths of Yi (the 1st, 4th and 7th sample paths shown over 100 steps), together with a histogram of Y100 against the Binomial density.]

29

Page 30: JLarge Slides MT123 2011

Definition of a Binomial random variable:

Suppose we carry out n independent Bernoulli trials with Pr (Xi = 1) = p

→ then this is a Binomial RV, called Zn:

Zn = ∑_{i=1}^n Xi.

And we might want to define the random process, Z:

Z = {Zn : n = 1, 2, 3, ...}. (4)

30

Page 31: JLarge Slides MT123 2011

2.3 Random walk

The binomial tree (2) can be written as

Yi = 2 ∑_{j=1}^i Xj − i,   i = 0, 1, 2, ..., Y0 = 0.

Special case of the random walk process

Yi = Yi−1 + εi,

where εi are i.i.d.

εi are called the ‘shocks’, or ‘residuals’, or ‘innovations’. Note that if we think of Yi as log-prices then

εi = Yi − Yi−1

are returns.

Hence the log-price process can be transformed into an i.i.d. sample by ‘taking first differences’.

31

Page 32: JLarge Slides MT123 2011

2.4 Distribution functions

Distribution function of a random variable X is

FX(x) = Pr(X ≤ x).

Density function for continuous X,

fX(x) = ∂FX(x)/∂x.

Clearly

FX(x) = ∫_{−∞}^x fX(y) dy.

Note that for continuous variables (in inverted commas):

Pr(X = x) = 0,

for every x.

For X with countable support we often write fX(x) for Pr(X = x).

32

Page 33: JLarge Slides MT123 2011

Conditional distribution functions

Distribution function of a random variable X conditional on some positive-probability event A is

FX|A(x) = Pr(X ≤ x|A).

Conditional density function for continuous X,

fX|A(x) = ∂FX|A(x)/∂x.

Clearly

FX|A(x) = ∫_{−∞}^x fX|A(y) dy.

33

Page 34: JLarge Slides MT123 2011

Mean

This is defined (when it exists) as

E(X) = ∫_{−∞}^∞ x fX(x) dx.

It is often used as a measure of the average value of a random variable (alternatives include mode and median).

Discrete r.v.: replace integration with summation.

Example 2.1 Suppose X is a Bernoulli trial with Pr(X = 1) = p and Pr(X = 0) = 1 − p. Then

E(X) = 1 × Pr(X = 1) + 0 × Pr(X = 0) = p. (5)

34

Page 35: JLarge Slides MT123 2011

Variance

Variance is defined as

Var(X) = E[{X − E(X)}²] = ∫ {x − E(X)}² fX(x) dx = E(X²) − {E(X)}².

The standard deviation is defined as √Var(X).

A further very important formula:

Var(a + bX) = b² Var(X).

(exercise: prove this)

35

Page 36: JLarge Slides MT123 2011

Conditional Mean

The conditional expectation of a random variable X given a +ve probability event A is

E(X|A) = ∫ x fX|A(x) dx.

Conditional Variance

By analogy:

Var(X|A) = E(X²|A) − {E(X|A)}².

36

Page 37: JLarge Slides MT123 2011

2.5 Quantile functions

Inverting the distribution function, i.e. we ask: for a given u ∈ [0, 1], find x such that

u = FX(x).

We call

x = FX⁻¹(u)

the quantile function of X.

The 0.1 quantile tells us the value of X such that only 10% of the population fall below that value. The most well known quantile is

x = FX⁻¹(0.5),

which is called the median.

37

Page 38: JLarge Slides MT123 2011

Example 2.2 Quantiles are central in simple value at risk (VaR) calculations, which measure the degree of risk taken by banks. In simple VaR calculations one looks at the marginal distribution of the returns over a day, written Yi − Yi−1, and calculates

F⁻¹_{Yi−Yi−1}(0.05),

the 5% quantile of the return distribution.
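A hedged illustration (not from the slides): the 5% quantile in a simple VaR calculation can be estimated directly from a sample of daily returns. The returns below are simulated placeholders, not real data.

    # Sketch: empirical 5% quantile of daily returns, i.e. F^{-1}(0.05).
    import numpy as np

    rng = np.random.default_rng(1)
    returns = rng.normal(loc=0.0, scale=0.01, size=1000)  # placeholder returns
    var_5pct = np.quantile(returns, 0.05)                 # empirical 5% quantile
    print(f"5% quantile of returns: {var_5pct:.4f}")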

38

Page 39: JLarge Slides MT123 2011

2.6 Some common random variables

Normal

The normal distribution is important. Does not look immediately attractive:

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)},   x, µ ∈ R, σ² ∈ R+.

Density peaks at µ and is symmetric around µ.

39

Page 40: JLarge Slides MT123 2011

Model for returns on daily Sterling/$ 1985 to 2000.

[Figure: estimated density of daily Sterling/$ returns — flexible estimator vs fitted normal.]

40

Page 41: JLarge Slides MT123 2011

fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)},   x, µ ∈ R, σ² ∈ R+.

Centred at µ; σ² determines its scale (spread).

The notation for a normal r.v. is X ∼ N(µ, σ²).

µ is the mean; σ² is the variance

• We will prove this later

• Notice that together, the mean and variance define the normal distribution

If an i.i.d. sequence has normal random variables, we write it is N.I.D.

• And we will also see NID(µ, σ²).

Another word for normal is ‘Gaussian’.

41

Page 42: JLarge Slides MT123 2011

If X ∼ N(µ, σ²) and γ and λ are non-random then

γ + λX ∼ N(γ + λµ, λ²σ²).

One can write

X =(law) µ + σu,

where u ∼ N(0, 1). Equality in law means the left and right hand side quantities have the same law or distribution. Finally, if X and Y are independent normal variables with means µx and µy and variances σ²x and σ²y, then

X + Y ∼ N(µx + µy, σ²x + σ²y).

That is: the means and variances add up, and normality is maintained.

This is a very convenient result for asset pricing, as we will see later.

42

Page 43: JLarge Slides MT123 2011

Example: Suppose that Ui are i.i.d. N(µ, σ2) then the ‘drifting’ random walk

Yi = Yi−1 + Ui, Y0 = 0,

has the feature that

Yi ∼ N(iµ, iσ²),

or

Yi+s|Yi ∼ N(Yi + sµ, sσ²).
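A quick simulation check of the claim Yi ∼ N(iµ, iσ²) (an illustration only; µ, σ, i and the number of paths are arbitrary choices):

    # Sketch: compare the simulated mean/variance of Y_i with i*mu and i*sigma^2.
    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, i, n_paths = 0.1, 0.5, 20, 100_000
    U = rng.normal(mu, sigma, size=(n_paths, i))  # U_1, ..., U_i for each path
    Y_i = U.sum(axis=1)                           # Y_i = sum of the first i shocks
    print(Y_i.mean(), i * mu)                     # both close to 2.0
    print(Y_i.var(), i * sigma**2)                # both close to 5.0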

43

Page 44: JLarge Slides MT123 2011

Consider a change to the Binomial tree that we saw earlier:

Replace the scaled and recentred Bernoulli variable with a normal random variable, Ui ∼ N(µ, σ²).

Select µ = 0 and σ² = 4 × 0.5 × 0.5 so that it matches the mean and variance of the previous Binomial tree.

[Figure: simulated sample paths of Yi with Gaussian shocks (the 1st, 4th and 7th sample paths shown over 100 steps), together with a histogram of Y100 against the Gaussian density.]

44

Page 45: JLarge Slides MT123 2011

Uniform

Sometimes variables are constrained to live on small intervals. The leading example of this is the standard uniform

fX(x) = 1, x ∈ [0, 1].

Used in economic theory as a stylised way of introducing uncertainty into a model

and in simulation.

Chi-squared

Suppose Xi ∼ i.i.d. N(0, 1) (often written NID(0, 1)), then

Y = ∑_{i=1}^ν Xi² ∼ χ²_ν

is a Chi-Squared random variable with “degrees of freedom” ν.

45

Page 46: JLarge Slides MT123 2011

Student t

Student t random variable is generated by a ratio of random variables.

Crude notation for it is:

t_ν = N(0, 1) / √(χ²_ν / ν),

where N(0, 1) ⊥⊥ χ²_ν. This is symmetrically distributed about 0.

46

Page 47: JLarge Slides MT123 2011

2.7 Multivariate random variables

Consider a multivariate q × 1 vector where each element is a random variable

X = (X1, ..., Xq)′ .

This vector is itself a random variable (a ‘q-dimensional multivariate random variable’).

Note that we didn’t say that the elements of X had to be independent random variables.

Important example: the elements of this vector could represent the returns on a collection of q assets, such as the FTSE100 equities, daily.

→ Because of this example, multivariate random variables play a central role in portfolio allocation and risk assessments, as well as all aspects of econometrics.

47

Page 48: JLarge Slides MT123 2011

Sustained example: returns from a portfolio

Consider the bivariate case where q = 2. We might think of

X = (X1, X2)′ = (Y, Z)′,

where X1 = Y is the return over the next day on IBM and X2 = Z is the return over the next day on the S&P composite index.

Consider the case of measuring the outperformance of the index by IBM. This is

Y − Z.

We can write this as

(1, −1)(Y, Z)′ = b′X,

where

b = (1, −1)′, so b′ = (1, −1).

Thus the outperformance can be measured using linear algebra.

This outperformance can be thought of as a simple portfolio, buying IBM and

selling the index.

48

Page 49: JLarge Slides MT123 2011

Consider, slightly more abstractly, a portfolio made up of c shares in Y and d in Z. Then the portfolio returns

cY + dZ.

This can be written in terms of vectors as

(c, d)(Y, Z)′ = f′X,   f = (c, d)′.

More generally, we might write p portfolios, each with different portfolio weights, as

  [ B11 Y + B12 Z ]
  [ B21 Y + B22 Z ]
  [ B31 Y + B32 Z ]
  [      ...      ]
  [ Bp1 Y + Bp2 Z ]  = BX,

where

B =
  [ B11  B12 ]
  [ B21  B22 ]
  [ B31  B32 ]
  [   ...    ]
  [ Bp1  Bp2 ].

This is a very powerful way of writing out portfolios compactly.

49

Page 50: JLarge Slides MT123 2011

So far, the p portfolios each contained only two assets.

But you can extend this easily from 2 to q underlying assets

X = (X1, X2, X3, ..., Xq)′,

B =
  [ B11  B12  B13  · · ·  B1q ]
  [ B21  B22  B23  · · ·  B2q ]
  [ B31  B32  B33  · · ·  B3q ]
  [   ...                     ]
  [ Bp1  Bp2  Bp3  · · ·  Bpq ]

Now the p portfolios, depending upon q assets, have returns

BX = ( ∑_{j=1}^q B1j Xj,  ∑_{j=1}^q B2j Xj,  ∑_{j=1}^q B3j Xj,  ...,  ∑_{j=1}^q Bpj Xj )′.

Again this is quite a simple representation of quite a complicated situation.
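As an illustration (not from the slides), the BX notation translates directly into a matrix product in Python; the weights and asset returns below are made up purely for the example.

    # Sketch: p portfolios over q assets, returns computed as B @ X.
    import numpy as np

    X = np.array([0.01, -0.02, 0.005, 0.015])    # returns on q = 4 assets
    B = np.array([[0.25, 0.25, 0.25, 0.25],      # equal-weight portfolio
                  [1.0, -1.0, 0.0, 0.0]])        # long asset 1, short asset 2
    portfolio_returns = B @ X                    # p-vector of portfolio returns
    print(portfolio_returns)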

50

Page 51: JLarge Slides MT123 2011

Back on track

In particular if X is a 2 × 1 vector

X = (X1, X2)′ and x = (x1, x2)′,

then

FX(x) = Pr(X1 ≤ x1, X2 ≤ x2),

which in the continuous case becomes

FX(x) = ∫_{−∞}^{x2} ∫_{−∞}^{x1} fX(y1, y2) dy1 dy2. (6)

51

Page 52: JLarge Slides MT123 2011

Likewise

fX(x1, x2) = ∂²FX(x1, x2) / ∂x1∂x2.

When X1 ⊥⊥ X2 then this simplifies to

fX(x1, x2) = ∂²{FX1(x1)FX2(x2)} / ∂x1∂x2 = {∂FX1(x1)/∂x1}{∂FX2(x2)/∂x2} = fX1(x1) fX2(x2).

52

Page 53: JLarge Slides MT123 2011

[Figure: (a) standard normal density, (b) NIG(1,0,0,1) density, (c) standard log-density, (d) NIG(1,0,0,1) log-density.]

53

Page 54: JLarge Slides MT123 2011

An important point is that from Eq. (6),

∫_{−∞}^∞ fX(y, x2) dy = ∂FX(∞, x2)/∂x2 = ∂Pr(X1 ≤ ∞, X2 ≤ x2)/∂x2 = ∂FX2(x2)/∂x2 = fX2(x2).

Hence if we integrate out a variable from a density function we produce the ‘marginal density’ of the other random variable.

54

Page 55: JLarge Slides MT123 2011

Let’s suppose that X2 is a discrete r.v.

Then the conditional distribution function of X1 takes on the form

FX1|X2=x2(x1) = Pr(X1 ≤ x1|X2 = x2),

while, if X1 is continuous, we define

fX1|X2=x2(x1) = ∂Pr(X1 ≤ x1|X2 = x2) / ∂x1,

which has the properties of a density.

Now, if both X2 and X1 are continuous r.v.s, we define

fX1|X2=x2(x1) = fX(x1, x2) / fX2(x2).

Intuitive, but the theory behind this is beyond the scope of this course.

55

Page 56: JLarge Slides MT123 2011

2.8 Moments

General case

An expectation of a function of a random variable.

Define, for a continuous X, if it exists

E{g(X)} = ∫ g(x) fX(x) dx. (7)

The expectation obeys some important rules. For example if a, b are constants then

E{a + bg(X)} = a + bE{g(X)}.

This follows from the definition of expectations as solutions to integrals (7).

56

Page 57: JLarge Slides MT123 2011

Special ‘base’ cases of moments

The most basic moment is known as the first moment:

E(X) = ∫ x fX(x) dx. (8)

We’ve also seen the second moment:

E(X²) = ∫ x² fX(x) dx. (9)

Even though you’ll see these much more than others, try to see them as special cases.

57

Page 58: JLarge Slides MT123 2011

Example 2.3 If X ∼ N(µ, σ²), then

E(X) = ∫_{−∞}^∞ x (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)} dx
     = µ + ∫_{−∞}^∞ (x − µ) (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)} dx
     = µ,

using the fact that a density integrates to one.

Exercise: fill in the working here (use properties of antisymmetric functions)

58

Page 59: JLarge Slides MT123 2011

Multivariate mean

Recall we write

X = (X1, X2, X3, ..., Xq)′.

Now each Xj has a mean, E(Xj), so it would be nice to collect these together. The following notation does this. We define

E(X) = (E(X1), E(X2), E(X3), ..., E(Xq))′.

This is the mean of the vector.

59

Page 60: JLarge Slides MT123 2011

We wrote the return on p portfolios as

BX,

where B is a p × q weight matrix. Then

E(BX) = BE(X).

60

Page 61: JLarge Slides MT123 2011

Why? Recall a mean of a vector is the mean of all the elements of the vector

E(BX) = ( E(∑_{j=1}^q B1j Xj),  E(∑_{j=1}^q B2j Xj),  E(∑_{j=1}^q B3j Xj),  ...,  E(∑_{j=1}^q Bpj Xj) )′.

But, for i = 1, 2, ..., p,

E(∑_{j=1}^q Bij Xj) = ∑_{j=1}^q E(Bij Xj) = ∑_{j=1}^q Bij E(Xj).

61

Page 62: JLarge Slides MT123 2011

Hence

E(BX) = ( ∑_{j=1}^q B1j E(Xj),  ∑_{j=1}^q B2j E(Xj),  ∑_{j=1}^q B3j E(Xj),  ...,  ∑_{j=1}^q Bpj E(Xj) )′ = BE(X),

as stated. This is an important result for econometrics.

62

Page 63: JLarge Slides MT123 2011

2.9 Covariance matrices

Univariate covariance

The covariance of X and Y is defined (when it exists) as

Cov(X, Y) = E[{X − E(X)}{Y − E(Y)}]
          = ∫∫ {x − E(X)}{y − E(Y)} fX,Y(x, y) dx dy
          = E(XY) − E(X)E(Y).

63

Page 64: JLarge Slides MT123 2011

Cov(a + bX, c + dY ) = bdCov(X, Y ).

Hence covariances are location invariant.

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

64

Page 65: JLarge Slides MT123 2011

Independence implies uncorrelatedness

Recall, if moments exist,

Cov(X, Y ) = E(XY ) − E(X)E(Y ).

So if X ⊥⊥ Y then

Cov(X, Y ) = E(X)E(Y ) − E(X)E(Y ) = 0.

If the covariance between X and Y is zero we say they are uncorrelated

X ⊥ Y.

So (X ⊥⊥ Y) =⇒ (X ⊥ Y).

The reverse is not true (in the Gaussian case it is!).

65

Page 66: JLarge Slides MT123 2011

Example 2.4 Suppose X ∼ N(0, 1), Y = X².

Then

Cov(X, Y) = E(XY) − E(X)E(Y) = E(X³) = 0,

so X and Y are uncorrelated even though they are clearly not independent.

66

Page 67: JLarge Slides MT123 2011

Correlation

The correlation of X and Y is defined (when it exists) as

Cor(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).

Now

Cor(X, Y ) ∈ [−1, 1],

which follows from the Cauchy-Schwarz inequality.

67

Page 68: JLarge Slides MT123 2011

Think of

X = (X1, X2, X3, ..., Xp)′.

Then we define the covariance matrix of X as

Cov(X) =
  [ Var(X1)        Cov(X1, X2)   · · ·   Cov(X1, Xp) ]
  [ Cov(X2, X1)    Var(X2)       · · ·   Cov(X2, Xp) ]
  [    ...                                           ]
  [ Cov(Xp, X1)    Cov(Xp, X2)   · · ·   Var(Xp)     ]

This is a symmetric p × p matrix.

* Covariance matrices are always ‘positive semi-definite’ (which means that the e-values are all ≥ 0 [and real]).

68

Page 69: JLarge Slides MT123 2011

The covariance matrix can be calculated as

Cov(X) = E[{X − E(X)}{X − E(X)}′].

Example 2.5 In the IBM and S&P example we have approximately that

E(X) = (0.0206, −0.00721)′,   Cov(X) =
  [ 5.07  1.79 ]
  [ 1.79  1.62 ].

A very important result is that if a is a q × 1 vector and B is a q × p matrix of constants, then

• E (a + BX) = a + BE(X)

• Cov(a + BX) = BCov(X)B′.
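A quick numerical check of these two rules (a sketch only, using the approximate IBM / S&P numbers from Example 2.5 and the ‘outperformance’ weights b′ = (1, −1)):

    # Sketch: E(a + BX) = a + B E(X) and Cov(a + BX) = B Cov(X) B'.
    import numpy as np

    mean_X = np.array([0.0206, -0.00721])
    cov_X = np.array([[5.07, 1.79],
                      [1.79, 1.62]])
    a = np.array([0.0])
    B = np.array([[1.0, -1.0]])      # the outperformance portfolio Y - Z
    print(a + B @ mean_X)            # mean of the outperformance
    print(B @ cov_X @ B.T)           # variance: 5.07 + 1.62 - 2*1.79 = 3.11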

69

Page 70: JLarge Slides MT123 2011

Correlation matrices

Corresponding to the covariance matrix is the correlation matrix, which is (when it exists)

Cor(X) =
  [ 1               Cor(X1, X2)   · · ·   Cor(X1, Xp) ]
  [ Cor(X2, X1)     1             · · ·   Cor(X2, Xp) ]
  [    ...                                            ]
  [ Cor(Xp, X1)     Cor(Xp, X2)   · · ·   1           ]
= Cor(X)′.

This matrix is invariant to location and scale changes, but obviously not general linear transformations.

70

Page 71: JLarge Slides MT123 2011

2.10 Back to distributions

Multivariate normal

The p-dimensional X ∼ N(µ, Σ). E(X) = µ, Cov(X) = Σ. Assume |Σ| > 0. Σ is always symmetric of course. Then

fX(x) = |2πΣ|^{−1/2} exp{−(1/2)(x − µ)′ Σ⁻¹ (x − µ)},   x ∈ Rᵖ.

Here Σ⁻¹ is a matrix inverse, which exists due to the |Σ| > 0 assumption. Further

|2πΣ|^{−1/2} = (2π)^{−p/2} |Σ|^{−1/2}.

Exercises:

1. explain why the density has a single peak at µ.

2. how does this simplify if Σ = σ2I?

3. if I tell you Σ and µ, do you know everything about the normal distribution?

71

Page 72: JLarge Slides MT123 2011

Example 2.6 Suppose

Σ = σ²Ip,

which means that the elements of X are independent and homoskedastic. Then

fX(x) = (2πσ²)^{−p/2} exp{−(1/(2σ²))(x − µ)′(x − µ)}
      = (2πσ²)^{−p/2} exp{−(1/(2σ²)) ∑_{i=1}^p (xi − µi)²}.

72

Page 73: JLarge Slides MT123 2011

Let X be a p-dimensional multivariate normal.

Then, if a is q × 1 and B is q × p and both are constants, then

Y = (a + BX) ∼ N(a + Bµ, BΣB′), (10)

a q-dimensional normal.

That is: all linear transformations of normals are normal.

73

Page 74: JLarge Slides MT123 2011

In particular if p = 2, then for X = (X1, X2)′

Σ = Cov(X) =
  [ Var(X1)        Cov(X1, X2) ]
  [ Cov(X1, X2)    Var(X2)     ]
=
  [ σ1²      ρσ1σ2 ]
  [ ρσ1σ2    σ2²   ],

where ρ = Cor(X1, X2).

This is an important model.

In an ‘abuse of notation’, we can write (!) X1 as X and X2 as Y.

In which case we get the formulation

(X, Y)′ ∼ N( (µx, µy)′,  [ σx²      ρσxσy ]
                         [ ρσxσy    σy²   ] ).

Keep this model in mind for the next few slides ...

74

Page 75: JLarge Slides MT123 2011

2.11 Conditional distributions

Basic recap: Consider two (possibly multivariate) discrete random variables X, Y; then

FX|Y=y(x) = Pr(X ≤ x|Y = y) = Pr(X ≤ x, Y = y) / Pr(Y = y).

Likewise in the continuous case, the conditional density is defined by:

fX|Y=y(x) = ∂FX|Y=y(x)/∂x = fX,Y(x, y) / fY(y).

So,

fX,Y(x, y) = fX|Y=y(x) fY(y)

(known as the marginal-conditional decomposition).

Useful to consider this in the context of Normals...

75

Page 76: JLarge Slides MT123 2011

Example of two standard normals

X and Y are Standard Normals. So, both have mean of 0, variance of 1. The correlation between them is ρ. In this case

Y |(X = x) ∼ N(ρx, 1 − ρ²).

Put another way,

fY|X=x(y) = (1/√(2π(1 − ρ²))) exp{−(1/2)(y − ρx)²/(1 − ρ²)}. (11)

It is natural to write:

E(Y |X = x) = ρx;
Var(Y |X = x) = 1 − ρ²,

and we often will. Called ‘Conditional Moments’ ...
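A simulation sketch of these conditional moments (illustrative only; ρ, x and the sample size are arbitrary, and the conditioning event {X = x} is approximated by a thin band around x):

    # Sketch: check E(Y | X = x) ≈ ρx and Var(Y | X = x) ≈ 1 - ρ².
    import numpy as np

    rng = np.random.default_rng(3)
    rho, x, n = 0.6, 1.0, 1_000_000
    X = rng.standard_normal(n)
    Y = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # Cor(X, Y) = rho
    band = np.abs(X - x) < 0.02        # observations with X close to x
    print(Y[band].mean(), rho * x)     # both close to 0.6
    print(Y[band].var(), 1 - rho**2)   # both close to 0.64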

76

Page 77: JLarge Slides MT123 2011

Conditional first moment

Recall

fX|Y=y(x) = fX,Y(x, y) / fY(y).

The definition of the first conditional moment is

EX|Y=y(X) = ∫ x fX|Y=y(x) dx.

We also write this (as on the previous slide) by

E(X|Y = y) = ∫ x fX|Y=y(x) dx.

77

Page 78: JLarge Slides MT123 2011

A concise notation of great use

Recall X and Y as Standard Normals, correlation between them is ρ.

We saw that

E(Y |X = x) = ρx.

It will be really helpful to condense this further:

E(Y |X) = ρX.

Write the random variable itself in the place of the particular value that we know it takes under the conditioning, i.e. capitalize X.

Likewise, we can write “Y ’s variance conditional on X” as:

Var(Y |X) = 1 − ρ². (12)

78

Page 79: JLarge Slides MT123 2011

General conditional moments

More generally, the definition of a conditional moment is

EX|Y=y(g(X)) = ∫ g(x) fX|Y=y(x) dx,

which is a function of y, say h(y).

This gives the random variable h(Y ).

... and we could consider its expectation: EY (h(Y )), or:

EY ( EX |Y (g(X)) ), (13)

i.e. (more concisely using the notation of the last slide):

E( E(g(X) | Y ) ). (14)

79

Page 80: JLarge Slides MT123 2011

Law of Iterated Expectations

Now recall

fX,Y(x, y) = fX|Y=y(x) fY(y).

Doing some algebra, we have that

EX(g(X)) = EY( EX|Y(g(X)) ).

This is the Law of Iterated Expectations, and is very important.

• It allows you to break complex expectations up into manageable chunks.

You can also write the law as:

E( E(g(X) | Y) ) = E(g(X)).

A related result:

VarX(X) = EY( VarX|Y(X|Y) ) + VarY( EX|Y(X|Y) ).
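A small numerical illustration of the Law of Iterated Expectations (not from the slides; the mixture model and g(x) = x² are arbitrary choices used only to make E(E(g(X)|Y)) and E(g(X)) comparable):

    # Sketch: E( E(g(X) | Y) ) should match E(g(X)).
    import numpy as np

    rng = np.random.default_rng(4)
    n = 1_000_000
    Y = rng.binomial(1, 0.3, size=n)               # Y in {0, 1}, Pr(Y = 1) = 0.3
    X = np.where(Y == 1, rng.normal(2.0, 1.0, n),  # X | Y = 1 ~ N(2, 1)
                         rng.normal(0.0, 2.0, n))  # X | Y = 0 ~ N(0, 4)
    g = X**2
    inner0, inner1 = g[Y == 0].mean(), g[Y == 1].mean()  # E(g(X) | Y = y)
    outer = 0.7 * inner0 + 0.3 * inner1                  # average over Y
    print(outer, g.mean())                               # approximately equal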

80

Page 81: JLarge Slides MT123 2011

General bivariate normal. In this case

(X, Y)′ ∼ N( (µX, µY)′,  [ σX²      ρσXσY ]
                         [ ρσXσY    σY²   ] ),

then

Y |(X = x) ∼ N( µY + (ρσY/σX)(x − µX),  σY²(1 − ρ²) ).

81

Page 82: JLarge Slides MT123 2011

Again,

Y |(X = x) ∼ N( µY + (ρσY/σX)(x − µX),  σY²(1 − ρ²) ).

To be brief we often write:

Y |X ∼ N( µY + (ρσY/σX)(X − µX),  σY²(1 − ρ²) ).

• Conditional variance does not depend upon x or X.

• Change in the conditional mean is

(ρσY/σX)(x − µX),

so it is linear in x. The effect is compared to the mean, i.e. x − µX. Dividing by σX removes the scale of x; multiplying by σY puts the variable onto the y scale.

82

Page 83: JLarge Slides MT123 2011

Example 2.7 Y is the return on an asset, X is the return on the market portfolio. Then

β_{Y|X} = ρσY / σX

is often called the beta of Y and is a measure of how Y moves with the market.

Notice that we can also write:

β_{Y|X} = Cov(X, Y) / Var(X).

83

Page 84: JLarge Slides MT123 2011

Martingale

In modelling dynamics, martingales play a large role.

Consider a sequence of asset prices recorded through time

Y1, Y2, Y3, ...

where the subscript reflects time. A natural object to study is

E(Yi|Y1, ..., Yi−1),

the conditional expectation (which we assume exists) of “the future given the past”. Then if

E(Yi|Y1, ..., Yi−1) = Yi−1,

then the sequence is said to be a martingale with respect to its own past history.

Exercise: Use the Law of Iterated Expectations to prove that if Yi is any

Martingale, with fixed Y1, then

E(Y3) = Y1.

84

Page 85: JLarge Slides MT123 2011

3 Estimators

3.1 Introduction

A statistic S(X) is a function of a (vector) random variable X.

When we learn about a feature of the probability model we say we are estimating

the model.

If S(X) is intended to describe a feature of the probability model, then we call it an estimator.

If x is the observed value of X, then we call S(x) the resulting estimate.

85

Page 86: JLarge Slides MT123 2011

Example 3.1 Let

S(X) = (1/n) ∑_{i=1}^n Xi.

If Xi ∼ NID(µ, σ²) then, using the fact that S(X) is a linear combination of normals, we have that

S(X) ∼ N(µ, σ²/n).

If n is very large the estimator is very close to µ, the average value of the normal distribution.
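A simulation sketch of Example 3.1 (illustrative values of µ, σ and n only): drawing many samples and computing S(X) for each shows that its mean and variance match µ and σ²/n.

    # Sketch: sampling distribution of the sample mean under NID(mu, sigma^2).
    import numpy as np

    rng = np.random.default_rng(5)
    mu, sigma, n, n_reps = 1.0, 2.0, 50, 100_000
    samples = rng.normal(mu, sigma, size=(n_reps, n))
    S = samples.mean(axis=1)            # one estimate per replication
    print(S.mean(), mu)                 # close to 1.0
    print(S.var(), sigma**2 / n)        # close to 0.08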

86

Page 87: JLarge Slides MT123 2011

3.2 Bias and mean square error of estimators

Estimate some quantity θ.

Wish for S(X) to be close to θ on average.

Bias: E{S(X)} − θ.

Example 3.2 If Xi ∼ NID(µ, σ²) then

S(X) = X̄ = (1/n) ∑_{i=1}^n Xi,

the sample mean (sample average), has a zero bias as an estimator of µ.

When the bias is zero, the estimator is said to be unbiased.

87

Page 88: JLarge Slides MT123 2011

Very large dispersion?

Imprecision of estimator can be measured with the Mean Square Error criterion:

MSE := E[{S(X) − θ}²] = Var{S(X)} + [E{S(X)} − θ]².

RMSE = Root MSE = Square-root of the MSE.

→ Which is better: an unbiased estimator, or a biased estimator which is more

precise?
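As a hedged illustration of this trade-off (not part of the slides): compare the unbiased sample variance (dividing by n − 1) with the biased version (dividing by n). For normal data the biased estimator typically has the smaller MSE.

    # Sketch: bias and MSE of two variance estimators, by simulation.
    import numpy as np

    rng = np.random.default_rng(6)
    sigma2, n, n_reps = 4.0, 10, 200_000
    X = rng.normal(0.0, np.sqrt(sigma2), size=(n_reps, n))
    s2_unbiased = X.var(axis=1, ddof=1)    # divide by n - 1
    s2_biased = X.var(axis=1, ddof=0)      # divide by n
    for name, est in [("unbiased", s2_unbiased), ("biased", s2_biased)]:
        bias = est.mean() - sigma2
        mse = ((est - sigma2) ** 2).mean()
        print(name, round(bias, 3), round(mse, 3))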

88

Page 89: JLarge Slides MT123 2011

4 Simulating random variables

Simulation is a key technique in advanced modern econometrics.

Produce random variables from known distribution functions.

4.1 Pseudo random numbers

All of the simulation methods are built out of draws based on a sequence of independent and identically distributed (standard) uniform random numbers Ui ∈ [0, 1].

Let’s regard the problem of producing such uniform numbers as solved - matlab

does this for us.

An example is given below (!)

Ui

.734

.452

.234

.123

.987

89

Page 90: JLarge Slides MT123 2011

4.2 Inverting distribution functions

Key point: given a source of unlimited simulated i.i.d. uniforms we can produce

i.i.d. draws from any continuous distribution FX(x).

Proof: As Ui is uniform,

Pr(Ui ≤ FX(x)) = FX(x).

Thus

Pr(Ui ≤ FX(x)) = Pr(FX⁻¹(Ui) ≤ x) = Pr(Xi ≤ x).

So if we take

Xi = FX⁻¹(Ui), (15)

then we produce random numbers from any continuous distribution:
→ plug the stream of simulated uniforms into the quantile function (15).

Discrete random variables are very similar but need some attention at the jump points in the CDF, F.
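A minimal sketch of equation (15) in Python (illustrative only; it assumes scipy is available and uses the standard normal quantile function norm.ppf as the example target distribution):

    # Sketch: plug i.i.d. uniforms into a quantile function to get draws
    # from the corresponding continuous distribution.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(7)
    U = rng.uniform(size=100_000)      # i.i.d. standard uniforms
    X = norm.ppf(U)                    # X_i = F^{-1}(U_i): standard normal draws
    print(X.mean(), X.std())           # close to 0 and 1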

90

Page 91: JLarge Slides MT123 2011

Example 4.1 The exponential distribution. Recall FX(x) = 1 − exp(−x/µ), and so the quantile function is

FX⁻¹(p) = −µ log(1 − p).

Hence

−µ log(1 − Ui)

are i.i.d. exponential draws, e.g. µ = 1:

Ui Xi

.734 1.324

.452 0.601

.234 0.266

.123 0.131

.987 4.343
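A two-line check of the table above (a sketch; µ = 1 as in the example):

    # Sketch: exponential draws by inverting the CDF at the listed uniforms.
    import numpy as np

    mu = 1.0
    U = np.array([0.734, 0.452, 0.234, 0.123, 0.987])
    X = -mu * np.log(1 - U)
    print(np.round(X, 3))    # [1.324 0.601 0.266 0.131 4.343]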

91

Page 92: JLarge Slides MT123 2011

5 Asymptotic approximation

5.1 Motivation

Classical convergence

Xn = 3 + 1/n → 3

as n → ∞.

A little more fuzzy when we think of

Xn = 3 + Y/n →? 3,

where Y is a random variable.

There are different measures of convergence. Some need moments, others don’t: “convergence in probability” and “convergence in distribution”.

Formally we will think of a sequence of random variables X1, X2, . . . , Xn which, as n gets large, will be such that Xn will behave like some other random variable or constant X.

92

Page 93: JLarge Slides MT123 2011

Example 5.1 We are interested in

Xn = (1/n) ∑_{j=1}^n Yj.

Then it forms a sequence

X1 = Y1,   X2 = (1/2)(Y1 + Y2),   X3 = (1/3)(Y1 + Y2 + Y3).

What does (1/n) ∑_{j=1}^n Yj behave like for large n? What does Xn converge to for large n?

93

Page 94: JLarge Slides MT123 2011

5.2 Definitions

Sequence of random variables Xn. Ask if

Xn − X

is small as n goes to infinity.

You can measure smallness in many ways and so there are lots of different notions of convergence.

We discuss three, the second of which will be the most important for us.

94

Page 95: JLarge Slides MT123 2011

Definition. (Convergence in mean square) Let X and X1, X2, . . . be random variables. If

lim_{n→∞} E[(Xn − X)²] = 0,

then the sequence X1, X2, . . . is said to converge in mean square to the random variable X. A shorthand notation is

Xn →m.s. X. (16)

Necessary and sufficient conditions for Xn →m.s. X are that

lim_{n→∞} E(Xn − X) = 0  [asymptotic unbiasedness],   lim_{n→∞} Var(Xn − X) = 0.

95

Page 96: JLarge Slides MT123 2011

Suppose Y1, ..., Yn are i.i.d. with mean µ and variance σ². Then define

Xn = (1/n) ∑_{i=1}^n Yi,

which has

E(Xn) = (1/n) ∑_{i=1}^n E(Yi) = µ,

and

Var(Xn) = (1/n²) Var(∑_{i=1}^n Yi) = (1/n²) ∑_{i=1}^n Var(Yi) = (1/n) σ².

Hence Xn is unbiased and the variance goes to zero. Hence

Xn →m.s. µ.

96

Page 97: JLarge Slides MT123 2011

Definition. (Convergence in probability) If for all ε, η > 0 there exists an n0 s.t.

Pr(|Xn − X| < η) > 1 − ε,   ∀ n > n0,

then the sequence X1, X2, . . . is said to converge in probability to the random variable X. A shorthand notation is

Xn →p X. (17)

97

Page 98: JLarge Slides MT123 2011

Definition. (Convergence almost surely) Let X and X1, X2, . . . be random variables. If, for all ε, η > 0, there exists an n0 s.t.

Pr(|Xn − X| < η, ∀ n > n0) > 1 − ε,

then we say that Xn almost surely converges to X, which we write as Xn →a.s. X.

Thus almost sure convergence is about ensuring that the joint behaviour of all events n > n0 is well behaved.

But convergence in probability just looks at the probabilities for each n.

98

Page 99: JLarge Slides MT123 2011

Xn →a.s. X ⇒ Xn →p X.

Further note that Xn →a.s. X neither implies nor is implied by Xn →m.s. X.

99

Page 100: JLarge Slides MT123 2011

Theorem. Weak Law of Large Numbers (WLLN). Let Xi ∼ iid, with E(Xi) and Var(Xi) existing. Then

(1/n) ∑_{i=1}^n Xi →p E(Xi),

as n → ∞.

Proof. See lecture notes (uses Chebyshev’s inequality, or the generic result that θ̂ →m.s. θ ⇒ θ̂ →p θ).
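An illustrative simulation of the WLLN (not from the slides; the exponential distribution with mean 1 is an arbitrary choice):

    # Sketch: sample means settle down around E(X_i) as n grows.
    import numpy as np

    rng = np.random.default_rng(8)
    for n in (10, 1_000, 100_000):
        X = rng.exponential(scale=1.0, size=n)
        print(n, X.mean())    # approaches E(X_i) = 1 as n grows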

100

Page 101: JLarge Slides MT123 2011

Theorem. (Kolmogorov’s) Strong Law of Large Numbers (SLLN). Let Xi ∼ iid, with E(Xi) existing. Then

(1/n) ∑_{i=1}^n Xi →a.s. E(Xi),

as n → ∞.

Proof. Difficult. See, for example, Gallant (1997, p. 132).

101

Page 102: JLarge Slides MT123 2011

5.3 Some payback

The most important rules are

• If An →p a, then g(An) →p g(a), where g(.) is a continuous function at a.

Example 5.2 Suppose Xi ∼ iid, E(Xi), Var(Xi) exist, and E(Xi) is non-zero. Then

(1/n) ∑_{i=1}^n Xi →p E(Xi),

which implies

1 / ( (1/n) ∑_{i=1}^n Xi )  →p  1 / E(Xi).

102

Page 103: JLarge Slides MT123 2011

• If g and h are both continuous functions and

An →p a,   Bn →p b,

as n → ∞, then

g(An)h(Bn) →p g(a)h(b).

Suppose Yi ∼ iid, with E(Yi), Var(Yi) existing. Then

( (1/n) ∑_{i=1}^n Xi ) ( (1/n) ∑_{i=1}^n Yi )  →p  E(Xi)E(Yi).

103

Page 104: JLarge Slides MT123 2011

5.4 Some more theory

Refined measure of convergence

Convergence almost surely or in probability is quite a rough measure, for it says that

Xn − X

implodes to zero with large values of n.

Does not indicate speed of convergence nor give any distributional shape to Xn − X.

To improve our understanding we need to have a concept called convergence in distribution.

104

Page 105: JLarge Slides MT123 2011

Definition. (Convergence in Distribution) The sequence X1, X2, . . . of random variables is said to converge in distribution to the random variable X if

FXn(x) → FX(x) (18)

at every point x where FX is continuous. A shorthand notation is

Xn →d X. (19)

105

Page 106: JLarge Slides MT123 2011

Generic tools — Central Limit Theorems

Most famous of these is the Lindeberg-Levy ‘CLT’.

Theorem (Lindeberg-Levy) Let X1, X2, . . . be independent, identically distributed random variables, so that E(Xi) = µ, Var(Xi) = σ².

Set

X̄n = (X1 + · · · + Xn)/n.

Then

√n(X̄n − µ) →d N(0, σ²).

106

Page 107: JLarge Slides MT123 2011

Example

Suppose Xi are i.i.d. χ²₁ (that is, informally, N(0, 1)²). Xi has mean of 1 and variance of 2. The Lindeberg-Levy CLT shows that

√n(X̄ − 1) →d N(0, 2).


Figure 4: Left panel: estimated density, using 10,000 simulations, of √n(X̄ − 1) from a sample of iid χ²₁ variables. Right panel looks at √n(log(X̄) − log(1)). From top to base, graphs have n = 3, 10 and 50.
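A small sketch in the spirit of Figure 4 (illustrative n and replication count only): simulate √n(X̄ − 1) for i.i.d. χ²₁ samples and compare its mean and variance with the limiting N(0, 2).

    # Sketch: CLT for chi-squared(1) draws.
    import numpy as np

    rng = np.random.default_rng(9)
    n, n_reps = 50, 100_000
    X = rng.chisquare(df=1, size=(n_reps, n))
    Z = np.sqrt(n) * (X.mean(axis=1) - 1.0)
    print(Z.mean(), Z.var())    # close to 0 and 2, as the CLT predicts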

107

Page 108: JLarge Slides MT123 2011

Very important results in this context due to Slutsky’s Theorem:

• Suppose Xn →d X and Yn →p µ. Then XnYn →d Xµ and Xn/Yn →d X/µ if µ ≠ 0.

• More generally, suppose Xn →d X and Yn →p µ. Let ϕ be a continuous mapping. Then ϕ(Xn, Yn) →d ϕ(X, µ).

108

Page 109: JLarge Slides MT123 2011

Suppose X1, ..., Xn are univariate i.i.d. with mean µ and variance σ².

Lindeberg-Levy:

√n(X̄n − µ)/σ →d N(0, 1).

And we also can show that:

σ̂² = (1/n) ∑_{i=1}^n (Xi − X̄n)² = (1/n) ∑_{i=1}^n Xi² − X̄n²  →a.s.  σ².

Then by Slutsky’s Theorem

√n(X̄n − µ)/σ̂ →d N(0, 1).

109

Page 110: JLarge Slides MT123 2011

Multivariate CLTs:

These will be very important for us.

(Multivariate Lindeberg-Levy)

Let X1, X2, . . . be i.i.d. r.v.s, so that EXi = µ, Var (Xi) = Σ.

Set

X̄n = (X1 + · · · + Xn)/n.

Then

√n(X̄n − µ) →d N(0, Σ).

110