
Tutorial: Financial Econometrics/Statistics

2005 SAMSI program on Financial Mathematics, Statistics, and Econometrics

Goal

At the index level

Part I: Modeling

... in which we see what basic properties of stock prices/indices we want to capture

Contents

Returns and their (static) properties

Pricing models

Time series properties of returns

Why returns?

Prices are generally found to be non-stationary

Makes life difficult (or simpler...)

Traditional statistics prefers stationary data

Returns are found to be stationary

Which returns?

Two types of returns can be defined:

Discrete compounding: $R_t = \dfrac{P_t}{P_{t-1}} - 1$

Continuous compounding: $r_t = \log \dfrac{P_t}{P_{t-1}}$

Discrete compounding

If you make 10% on half of your money and 5% on the other half, you have in total 7.5%

Discrete compounding is additive over portfolio formation

Continuous compounding

If you made 3% during the first half year and 2% during the second part of the year, you made (exactly) 5% in total

Continuous compounding is additive over time
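For concreteness, a minimal sketch in Python (with a made-up price path) of the two definitions and their additivity properties:

```python
import numpy as np

prices = np.array([100.0, 103.0, 105.06])  # hypothetical price path

# Discrete compounding: R_t = P_t / P_{t-1} - 1
simple_returns = prices[1:] / prices[:-1] - 1

# Continuous compounding: r_t = log(P_t / P_{t-1})
log_returns = np.diff(np.log(prices))

# Continuous compounding is additive over time:
# the log returns sum to the full-period log return
assert np.isclose(log_returns.sum(), np.log(prices[-1] / prices[0]))

# Discrete compounding is additive over portfolio formation:
# 10% on half the money and 5% on the other half gives 7.5%
print(0.5 * 0.10 + 0.5 * 0.05)  # 0.075
```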

Empirical properties of returns

              Mean   St.dev.   Ann.vol.   Skewness   Kurtosis      Min      Max
IBM          -0.0%    2.46%     39.03%     -23.51    1124.61     -138%    12.4%
IBM (corr)    0.0%    1.64%     26.02%      -0.28      15.56    -26.1%    12.4%
S&P           0.0%    0.95%     15.01%      -1.4       39.86    -22.9%     8.7%

Data period: July 1962 - December 2004; daily frequency
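The statistics in the table can be reproduced from a return series along these lines (a sketch; the IBM/S&P data themselves are not included here, so simulated Gaussian returns stand in):

```python
import numpy as np
from scipy import stats

def summary_stats(returns, periods_per_year=252):
    """Mean, st.dev., annualized volatility, skewness, kurtosis, min, max."""
    sd = returns.std(ddof=1)
    return {
        "mean": returns.mean(),
        "st.dev.": sd,
        "ann.vol.": sd * np.sqrt(periods_per_year),  # daily -> annual
        "skewness": stats.skew(returns),
        "kurtosis": stats.kurtosis(returns, fisher=False),  # 3 for a normal
        "min": returns.min(),
        "max": returns.max(),
    }

r = np.random.default_rng(0).normal(0.0, 0.01, 10_000)  # placeholder returns
print(summary_stats(r))  # real index data would show skewness and fat tails
```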

Stylized facts

Expected returns difficult to assess

What’s the ‘equity premium’?

Index volatility < individual stock volatility

Negative skewness

Crash risk

Large kurtosis

Fat tails (thus EVT analysis?)

Pricing models

Finance considers the final value $X$ of an asset to be ‘known’ as a random variable

In such a setting, finding the price of an asset is equivalent to finding its expected return:

$P = \dfrac{E[X]}{1 + E[R]}, \qquad \text{i.e.,} \qquad E[R] = \dfrac{E[X]}{P} - 1$

Pricing models 2

As a result, pricing models model expected returns ...

... in terms of known quantities or a few ‘almost known’ quantities

Capital Asset Pricing Model

One of the best known pricing models

The theorem/model states:

$E[R_{i,t}] - r_{f,t} = \beta_{i,t}\,\bigl( E[R_{m,t}] - r_{f,t} \bigr)$

with

$\beta_{i,t} = \dfrac{\operatorname{Cov}(R_{i,t}, R_{m,t})}{\operatorname{Var}(R_{m,t})}$

Black-Scholes

Black-Scholes is also a pricing model

(Exact) contemporaneous relation between asset prices/returns:

Call price = BS(stock price, moneyness, volatility)

Time series properties of returns

Traditionally a model-fitting exercise without much finance

mostly univariate time series and, thus, less scope for the ‘traditional’ cross-sectional pricing models

lately, more finance theory is integrated

Focuses on the dynamics/dependence in returns

Random walk hypothesis

Standard paradigm in the 1960s and 1970s

Prices follow a random walk

Returns are i.i.d.

Normality often imposed as well

Compare Black-Scholes assumptions

Box-Jenkins analysis

Linear time series analysis

Box-Jenkins analysis generally identifies a white noise

This has long been taken as support for the random walk hypothesis

Recent developments

Some autocorrelation effects in ‘momentum’

Some (linear) predictability

Largely academic discussion

Higher moments and risk

Risk predictability

There is strong evidence for autocorrelation in squared returns

also holds for other powers

‘volatility clustering’

While the direction of change is difficult to predict, the (absolute) size of the change is

risk is predictable

The ARCH model

First model to capture this effect

No mean effects, for simplicity (cf. ARCH-in-mean models)

$R_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 R_{t-1}^2, \qquad \varepsilon_t \sim N(0, 1)$
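A minimal simulation sketch of this ARCH(1) recursion (the parameter values are illustrative, not estimates):

```python
import numpy as np

def simulate_arch1(n, alpha0=0.1, alpha1=0.5, seed=0):
    """Simulate R_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * R_{t-1}^2."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n)
    sigma2 = alpha0 / (1.0 - alpha1)  # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = alpha0 + alpha1 * r[t] ** 2  # variance for the next period
    return r

r = simulate_arch1(10_000)
# Returns are uncorrelated, squared returns are not (volatility clustering)
print(np.corrcoef(r[1:], r[:-1])[0, 1], np.corrcoef(r[1:] ** 2, r[:-1] ** 2)[0, 1])
```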

ARCH properties

Uncorrelated returns

martingale difference returns

Correlated squared returns

with limited set of possible patterns

Symmetric distribution if innovations are symmetric

Fat tailed distribution, even if innovations are not

The GARCH model

Generalized ARCH

Beware of time indices ...

$R_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 R_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \qquad \varepsilon_t \sim N(0, 1)$

GARCH model

Parsimonious way to describe various correlation patterns

for squared returns

Higher-order extension trivial

Math-stat analysis not that trivial

See inference section later
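A sketch of the GARCH(1,1) recursion and the slowly decaying autocorrelation of squared returns it produces (illustrative parameter values):

```python
import numpy as np

def simulate_garch11(n, alpha0=0.05, alpha1=0.10, beta1=0.85, seed=0):
    """Simulate sigma_t^2 = alpha0 + alpha1 * R_{t-1}^2 + beta1 * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n)
    sigma2 = alpha0 / (1.0 - alpha1 - beta1)  # unconditional variance
    for t in range(n):
        r[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = alpha0 + alpha1 * r[t] ** 2 + beta1 * sigma2
    return r

def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return (x[lag:] * x[:-lag]).sum() / (x ** 2).sum()

r = simulate_garch11(20_000)
print([round(acf(r ** 2, lag), 3) for lag in (1, 5, 10, 20)])  # slow decay
```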

Stochastic volatility models

Use latent volatility process

$R_t = \exp(h_t / 2)\,\varepsilon_t, \qquad h_t = \gamma + \phi\, h_{t-1} + \eta_t, \qquad (\varepsilon_t, \eta_t)' \sim N(0, \Sigma)$

Stochastic volatility models

SV models also lead to volatility clustering
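A minimal sketch of the latent-volatility recursion above, with independent innovations (no leverage) and illustrative parameters:

```python
import numpy as np

def simulate_sv(n, gamma=-0.7, phi=0.95, sigma_eta=0.2, seed=0):
    """Simulate R_t = exp(h_t / 2) * eps_t with h_t = gamma + phi * h_{t-1} + eta_t."""
    rng = np.random.default_rng(seed)
    h = gamma / (1.0 - phi)  # start at the stationary mean of log-variance
    r = np.zeros(n)
    for t in range(n):
        h = gamma + phi * h + sigma_eta * rng.standard_normal()  # latent h_t
        r[t] = np.exp(h / 2.0) * rng.standard_normal()
    return r

r = simulate_sv(10_000)
print(np.corrcoef(r[1:] ** 2, r[:-1] ** 2)[0, 1])  # positive: clustering
```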

Leverage

Negative innovation correlation means that volatility increases and price decreases go together

Negative return/volatility correlation

(One) structural story: default risk

Continuous time modeling

Mathematical finance uses continuous time, mainly for ‘simplicity’

Compare asymptotic statistics as approximation theory

Empirical finance (at least originally) focused on discrete time models

Consistency

The volatility clustering and other empirical evidence are consistent with appropriate continuous time models

A simple continuous time stochastic volatility model

$d \ln S_t = \mu_t\, dt + \sigma_t\, dW_{1,t}, \qquad d\sigma_t^2 = a_t\, dt + b_t\, dW_{2,t}$

Approximation theory

There is a large literature that deals with the approximation of continuous time stochastic volatility models with discrete time models

Important applications

Inference

Simulation

Pricing
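As an example of the simulation application, a sketch of an Euler discretization of a continuous time SV model; the mean-reverting variance dynamics and all parameter values are assumptions for illustration:

```python
import numpy as np

def euler_sv_path(n_steps, dt=1.0 / 252, mu=0.05, kappa=2.0, theta=0.04,
                  xi=0.3, s0=100.0, v0=0.04, seed=0):
    """Euler scheme for d ln S = mu dt + sqrt(v) dW1, dv = kappa(theta - v) dt + xi dW2."""
    rng = np.random.default_rng(seed)
    log_s, v = np.log(s0), v0
    path = np.empty(n_steps)
    for t in range(n_steps):
        dw1, dw2 = rng.standard_normal(2) * np.sqrt(dt)  # independent increments
        log_s += mu * dt + np.sqrt(max(v, 0.0)) * dw1
        v += kappa * (theta - v) * dt + xi * dw2  # variance, truncated at 0 above
        path[t] = np.exp(log_s)
    return path

print(euler_sv_path(252)[-1])  # one simulated year-end price
```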

Other asset classes

So far we have only discussed stocks (and stock indices)

Stock derivatives can be studied using derivative pricing models

Financial econometrics also deals with many other asset classes:

Term structure (including credit risk)

Commodities

Mutual funds

Energy markets

...

Term structure modeling

Model a complete curve at a single point in time

There exist models

in discrete/continuous time

descriptive/pricing

for standard interest rates/derivatives

...

Part II: Inference

Contents

Parametric inference for ARCH-type models

Rank based inference

Analogy principle

The classical approach to estimation is based on the analogy principle

if you want to estimate an expectation, take an average

if you want to estimate a probability, take a frequency

...

Moment estimation (GMM)

Consider an ARCH-type model $R_t = \sigma_t(\theta)\,\varepsilon_t$

We suppose that $\sigma_t(\theta)$ can be calculated on the basis of past observations if $\theta$ is known

Moment condition:

$E\!\left[ \dfrac{R_t^2}{\sigma_t^2(\theta)} - 1 \right] = 0$

Moment estimation - 2

The estimator $\hat\theta_n$ is now taken to solve

$\dfrac{1}{n} \sum_{t=1}^{n} \left( \dfrac{R_t^2}{\sigma_t^2(\hat\theta_n)} - 1 \right) = 0$

In case of “underidentification”: use instruments

In case of “overidentification”: minimize a distance-to-zero
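A sketch of the moment estimator for an assumed ARCH(1) specification, using a constant and the lagged squared return as instruments with an identity weighting (all choices illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, r):
    """Distance-to-zero of the sample moments E[R_t^2 / sigma_t^2(theta) - 1] = 0."""
    alpha0, alpha1 = theta
    sigma2 = alpha0 + alpha1 * r[:-1] ** 2              # sigma_t^2 for t = 2..n
    u = r[1:] ** 2 / sigma2 - 1.0                       # moment function
    g = np.array([u.mean(), (u * r[:-1] ** 2).mean()])  # instrumented moments
    return g @ g

r = np.random.default_rng(1).standard_normal(5_000)  # placeholder return series
fit = minimize(gmm_objective, x0=[0.5, 0.3], args=(r,),
               bounds=[(1e-6, None), (0.0, 0.999)], method="L-BFGS-B")
print(fit.x)  # estimates of (alpha0, alpha1)
```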

Likelihood estimation

In case the density of the innovations is known, say it is $f$, one can write down the density/likelihood of the observed returns:

$L_n(\theta) = \prod_{t=1}^{n} \dfrac{1}{\sigma_t(\theta)}\, f\!\left( \dfrac{R_t}{\sigma_t(\theta)} \right)$

Estimator: maximize this

Doing the math ...

Maximizing the log-likelihood boils down to solving

$\sum_{t=1}^{n} \left( 1 + \varepsilon_t(\theta)\, \dfrac{f'(\varepsilon_t(\theta))}{f(\varepsilon_t(\theta))} \right) \dfrac{1}{2}\, \dfrac{\partial \log \sigma_t^2(\theta)}{\partial \theta} = 0$

with $\varepsilon_t(\theta) = \dfrac{R_t}{\sigma_t(\theta)}$
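A sketch of (quasi-)maximum likelihood for the ARCH(1) case with a Gaussian innovation density, conditioning on the first observation (assumed specification and starting values):

```python
import numpy as np
from scipy.optimize import minimize

def neg_gaussian_loglik(theta, r):
    """Negative Gaussian log-likelihood of ARCH(1) returns (constants dropped)."""
    alpha0, alpha1 = theta
    sigma2 = alpha0 + alpha1 * r[:-1] ** 2  # conditional variances, t = 2..n
    return 0.5 * np.sum(np.log(sigma2) + r[1:] ** 2 / sigma2)

r = np.random.default_rng(2).standard_normal(5_000)  # placeholder return series
fit = minimize(neg_gaussian_loglik, x0=[0.5, 0.3], args=(r,),
               bounds=[(1e-6, None), (0.0, 0.999)], method="L-BFGS-B")
print(fit.x)  # Gaussian (Q)MLE of (alpha0, alpha1)
```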

Efficiency consideration

Which of the above estimators is “better”?

Analysis using Hájek-Le Cam theory of asymptotic statistics

Approximate complicated statistical experiment with very simple ones

Something which works well in the approximating experiment will also do well in the original one

Quasi MLE

In order for maximum likelihood to work, one needs the density of the innovations

If this is not known, one can guess a density (e.g., the normal)

This is known as

ML under non-standard conditions (Huber)

Quasi maximum likelihood

Pseudo maximum likelihood

Will it work?

For ARCH-type models, postulating the Gaussian density can be shown to lead to consistent estimates

There is a large theory on when this works or not

We say “for ARCH-type models the Gaussian distribution has the QMLE property”

The QMLE pitfall

One often sees people referring to Gaussian MLE

Then, they remark that we know financial innovations are fat-tailed ...

... and they switch to t-distributions

The t-distribution does not possess the QMLE property (but, see later)

How to deal with SV-models?

The SV models look the same: $R_t = \sigma_t\,\varepsilon_t$

But now, $\sigma_t$ is a latent process and hence not observed

Likelihood estimation still works “in principle”, but the unobserved variances have to be integrated out

Inference for continuous time models

Continuous time inference can, in theory, be based on

continuous record observations

discretely sampled observations

Essentially all known approaches are based on approximating discrete time models

Rank based inference

... in which we discuss the main ideas of rank based inference

The statistical model

Consider a model where ‘somewhere’ there exist i.i.d. random errors $(\varepsilon_t)_{t=1}^{n}$

The observations are $(Y_t)_{t=1}^{n}$

The parameter of interest is some $\theta \in \mathbb{R}^p$

We denote the density of the errors by $f$

Formal model

We have an outcome space $\mathcal{Y}^n$, with $n$ the number of observations and $k$ the dimension of $Y_t$

Take standard Borel sigma-fields

Model for sample size $n$: $\mathcal{E}^{(n)} = \bigl( \mathcal{Y}^n, \mathcal{B}^n, \{ P^{(n)}_{\theta, f} : \theta, f \} \bigr)$

Asymptotics refer to $n \to \infty$

Example: Linear regression

Linear regression model $Y_i = \theta^T X_i + \varepsilon_i$

(with observations $(Y_i, X_i)_{i=1}^{n}$)

Innovation density $f$ and cdf $F$

Example ARCH(1)

Consider the standard ARCH(1) model $Y_t = \sqrt{\theta_0 + \theta_1 Y_{t-1}^2}\; \varepsilon_t$

Innovation density $f$ and cdf $F$

Maintained hypothesis

For given $\theta$ and sample size $n$, the innovations $(\varepsilon_t)_{t=1}^{n}$ can be calculated from the observations $(Y_t)_{t=1}^{n}$

For cross-sectional models one may even often write $\varepsilon_i = \varepsilon_i(Y_i; \theta)$

Latent variable (e.g., SV) models ...

Innovation ranks

The ranks $R_1, \ldots, R_n$ are the ranks of the innovations $\varepsilon_1, \ldots, \varepsilon_n$

We also write $R_1(\theta), \ldots, R_n(\theta)$ for the ranks of the innovations $\varepsilon_1(\theta), \ldots, \varepsilon_n(\theta)$ based on a value $\theta$ for the parameter of interest

Ranks of the observations themselves are generally not very useful

Basic properties

The distribution $\mathcal{L}_{\theta, f}(R_1, \ldots, R_n)$ does not depend on $\theta$ nor on $f$: it is uniform over the permutations of $\{1, \ldots, n\}$

This is (fortunately) not true for $\mathcal{L}_{\theta_0, f}(R_1(\theta), \ldots, R_n(\theta))$, at least ‘essentially’

Invariance

Suppose we generate the innovations as the transformation $\varepsilon_i = F^{-1}(U_i)$, with $(U_i)_{i=1}^{n}$ i.i.d. standard uniform

Now, the ranks $(R_i)_{i=1}^{n}$ are even invariant with respect to $F$

Reconstruction

For large sample size $n$ we have $U_i \approx \dfrac{R_i}{n+1}$ and, thus, $\varepsilon_i \approx F^{-1}\!\left( \dfrac{R_i}{n+1} \right)$
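A sketch of this reconstruction, with the standard normal as (assumed) reference distribution $F$:

```python
import numpy as np
from scipy import stats

def reconstruct_innovations(eps, ref_dist=stats.norm):
    """Approximate innovations from their ranks: F^{-1}(R_i / (n + 1))."""
    n = len(eps)
    ranks = stats.rankdata(eps)            # R_1, ..., R_n
    return ref_dist.ppf(ranks / (n + 1))   # reconstructed innovations

eps = np.random.default_rng(3).standard_normal(1_000)
eps_hat = reconstruct_innovations(eps)
print(np.corrcoef(eps, eps_hat)[0, 1])  # close to 1 for large n
```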

Rank based statistics

The idea is to take whatever procedure uses the innovations and apply it to the innovations reconstructed from the ranks

This makes the procedure robust to distributional changes

Efficiency loss due to the ‘$\approx$’?

Rank based autocorrelations

Time-series properties can be studied using rank based autocorrelations:

$r_f^{(n)}(l) \propto \sum_{t=l+1}^{n} F^{-1}\!\left( \dfrac{R_t}{n+1} \right) F^{-1}\!\left( \dfrac{R_{t-l}}{n+1} \right)$

These can be interpreted as ‘standard’ autocorrelations, but rank based

For a given reference density $f$ they are distribution free
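A sketch of a rank based autocorrelation along the lines of the formula above (reference density taken to be standard normal; the normalization is chosen for illustration):

```python
import numpy as np
from scipy import stats

def rank_autocorrelation(x, lag, ref_dist=stats.norm):
    """Lag-l autocorrelation computed from rank-reconstructed innovations."""
    n = len(x)
    z = ref_dist.ppf(stats.rankdata(x) / (n + 1))  # F^{-1}(R_t / (n + 1))
    return (z[lag:] * z[:-lag]).sum() / (z ** 2).sum()

x = np.random.default_rng(4).standard_normal(5_000)
print(rank_autocorrelation(x, 1))  # near zero for i.i.d. data
```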

Robustness

An important property of rank based statistics is the distributional invariance

As a result, a rank based estimator $\hat\theta_{RB}$ is consistent for any reference density

All densities satisfy the QMLE property when using rank based inference

Limiting distribution

The limiting distribution of $\hat\theta_{RB}$ depends on both the chosen reference density $f$ and the actual underlying density $g$

The optimal choice for the reference density $f$ is the actual density $g$

How ‘efficient’ is this estimator? Semiparametrically efficient

Remark

All procedures are distribution free with respect to the innovation density $f$

They are, clearly, not distribution free with respect to the parameter of interest $\theta$

Signs and ranks

Why ranks?

So far, we have been considering ‘completely’ unrestricted sets of innovation densities

For this class of densities ranks are ‘maximal invariant’

This is crucial for proving semiparametric efficiency

Alternatives

Alternative specifications may impose

zero-median innovations

symmetric innovations

zero-mean innovations

This is generally a bad idea ...

Zero-median innovations

The maximal invariant now becomes the ranks and the signs $s_t = \operatorname{sign}(\varepsilon_t)$ of the innovations

The ideas remain the same, but with a more precise reconstruction

Split the sample of innovations into a positive and a negative part and treat those separately

But ranks are still ...

Yes, the ranks are still invariant

... and the previous results go through

But the efficiency bound has now changed and rank based procedures are no longer semiparametrically efficient

... but sign-and-rank based procedures are

Symmetric innovations

In the symmetric case, the signed-ranks become maximal invariant

signs of the innovations

ranks of the absolute values

The reconstruction now becomes still more precise (and efficient)

Semiparametric efficiency

General result

Using the maximal invariant to reconstitute the central sequence leads to semiparametrically efficient inference

in the model for which this maximal invariant is derived

In general, use the projected central sequence $E_{\theta, f}\bigl[ \Delta^{(n)}_{\theta, f} \bigm| \text{maximal invariant} \bigr]$

Proof

The proof is non-trivial, but some intuition can be given using tangent spaces