  • Econometric Modelling with Time Series

    Specification, Estimation and Testing

    V. L. Martin, A. S. Hurn and D. Harris

  • Preface

    This book provides a general framework for specifying, estimating and testing time series econometric models. Special emphasis is given to estimation by maximum likelihood, but other methods are also discussed, including quasi-maximum likelihood estimation, generalized method of moments, nonparametrics and estimation by simulation. An important advantage of adopting the principle of maximum likelihood as the unifying framework for the book is that many of the estimators and test statistics proposed in econometrics can be derived within a likelihood framework, thereby providing a coherent vehicle for understanding their properties and interrelationships.

    In contrast to many existing econometric textbooks, which deal mainly with the theoretical properties of estimators and test statistics through a theorem-proof presentation, this book is very concerned with implementation issues in order to provide a fast-track between the theory and applied work. Consequently, many of the econometric methods discussed in the book are illustrated by means of a suite of programs written in GAUSS and MATLAB.¹ The computer code emphasizes the computational side of econometrics and follows the notation in the book as closely as possible, thereby reinforcing the principles presented in the text. More generally, the computer code also helps to bridge the gap between theory and practice by enabling the reproduction of both theoretical and empirical results published in recent journal articles. The reader, as a result, may build on the code and tailor it to more involved applications.

    ¹ GAUSS is a registered trademark of Aptech Systems, Inc. (http://www.aptech.com/) and MATLAB is a registered trademark of The MathWorks, Inc. (http://www.mathworks.com/).

    Organization of the Book

    Part ONE of the book is an exposition of the basic maximum likelihood framework. To implement this approach, three conditions are required: the probability distribution of the stochastic process must be known and specified correctly; the parametric specifications of the moments of the distribution must be known and specified correctly; and the likelihood must be tractable. The properties of maximum likelihood estimators are presented, and three fundamental testing procedures, namely the Likelihood Ratio test, the Wald test and the Lagrange Multiplier test, are discussed in detail. There is also a comprehensive treatment of iterative algorithms to compute maximum likelihood estimators when no analytical expressions are available.

    Part TWO is the usual regression framework taught in standard econometric courses but presented within the maximum likelihood framework. Both nonlinear regression models and non-spherical models exhibiting either autocorrelation or heteroskedasticity, or both, are presented. A further advantage of the maximum likelihood strategy is that it provides a mechanism for deriving new estimators and new test statistics, which are designed specifically for non-standard problems.

    Part THREE provides a coherent treatment of a number of alternative estimation procedures which are applicable when the conditions to implement maximum likelihood estimation are not satisfied. For the case where the probability distribution is incorrectly specified, quasi-maximum likelihood is appropriate. If the joint probability distribution of the data is treated as unknown, then a generalized method of moments estimator is adopted. This estimator has the advantage of circumventing the need to specify the distribution and hence avoids any potential misspecification from an incorrect choice of the distribution. An even less restrictive approach is to specify neither the distribution nor the parametric form of the moments of the distribution and to use nonparametric procedures to model either the distribution of variables or the relationships between variables. Simulation estimation methods are used for models where the likelihood is intractable, arising, for example, from the presence of latent variables. Indirect inference, efficient method of moments and simulated method of moments estimators are presented and compared.

    Part FOUR examines stationary time series models with a special emphasis on using maximum likelihood methods to estimate and test these models. Both single equation models, including the autoregressive moving average class of models, and multiple equation models, including vector autoregressions and structural vector autoregressions, are dealt with in detail. Also discussed are linear factor models where the factors are treated as latent. The presence of the latent factor means that the full likelihood is generally not tractable. However, if the models are specified in terms of the normal distribution with moments based on linear parametric representations, a Kalman filter is used to rewrite the likelihood in terms of the observable variables, thereby making estimation and testing by maximum likelihood feasible.

    Part FIVE focusses on nonstationary time series models and, in particular, tests for unit roots and cointegration. Some important asymptotic results for nonstationary time series are presented, followed by a comprehensive discussion of testing for unit roots. Cointegration is tackled from the perspective that the well-known Johansen estimator may be usefully interpreted as a maximum likelihood estimator based on the assumption of a normal distribution applied to a system of equations that is subject to a set of cross-equation restrictions arising from the assumption of common long-run relationships. Further, the trace and maximum eigenvalue tests of cointegration are shown to be likelihood ratio tests.

    Part SIX is concerned with nonlinear time series models. Models that are nonlinear in mean include the threshold class of model, bilinear models and also artificial neural network modelling, which, contrary to many existing treatments, is again addressed from the econometric perspective of estimation and testing based on maximum likelihood methods. Nonlinearities in variance are dealt with in terms of the GARCH class of models. The final chapter focusses on models that deal with discrete or truncated time series data.

    Even in a project of this size and scope, sacrifices have had to be made to keep the length of the book manageable. Accordingly, there are a number of important topics that have had to be omitted.

    (i) Although Bayesian methods are increasingly being used in many areas of statistics and econometrics, no material on Bayesian econometrics is included. This is an important field in its own right and the interested reader is referred to recent books by Koop (2003), Geweke (2005), Koop, Poirier and Tobias (2007) and Greenberg (2008), inter alia. Where appropriate, references to Bayesian methods are provided in the body of the text.

    (ii) With great reluctance, a chapter on bootstrapping was not included because of space issues. A good place to start reading is the introductory text by Efron and Tibshirani (1993) and the useful surveys by Horowitz (1997) and Li and Maddala (1996a, 1996b).

    (iii) In Part SIX, in the chapter dealing with modelling the variance of time series, there are important recent developments in stochastic volatility and realized volatility that would be worthy of inclusion. For stochastic volatility, there is an excellent volume of readings edited by Shephard (2005), while the seminal articles in the area of realized volatility are Andersen et al. (2001, 2003).

    The fact that these areas have not been covered should not be regarded as a value judgement about their relative importance. Instead, the subject matter chosen for inclusion reflects a balance between the interests of the authors and purely operational decisions aimed at preserving the flow and continuity of the book.

  • Computer Code

    Computer code is available from a companion website to reproduce relevant examples in the text, to reproduce figures in the text that are not part of an example, to reproduce the applications presented in the final section of each chapter, and to complete the exercises. Where applicable, the time series data used in these examples, applications and exercises are also available in a number of different formats.

    Presenting numerical results in the examples immediately gives rise to two important issues concerning numerical precision.

    (1) In all of the examples listed in the front of the book where computer code has been used, the numbers appearing in the text are rounded versions of those generated by the code. Accordingly, the rounded numbers should be interpreted as such and not be used independently of the computer code to try to reproduce the numbers reported in the text.

    (2) In many of the examples, simulation has been used to demonstrate a concept. Since GAUSS and MATLAB have different random number generators, the results generated by the different sets of code will not be identical to one another. For consistency, we have always used the GAUSS output for reporting purposes.

    Although GAUSS and MATLAB are very similar high-level programming languages, there are some important differences that require explanation. Probably the most important difference is one of programming style. GAUSS programs are script files that allow calls to both inbuilt GAUSS and user-defined procedures. MATLAB, on the other hand, does not support the use of user-defined functions in script files. Furthermore, MATLAB programming style favours writing user-defined functions in separate files and then calling them as if they were in-built functions. This style of programming does not suit the learning-by-doing environment that the book tries to create. Consequently, the MATLAB programs are written mainly as function files, with a main function and all the user-defined functions required to implement the procedure in the same file. The only exception to this rule is a few MATLAB utility files, which greatly facilitate the conversion and interpretation of code from GAUSS to MATLAB and which are provided as separate stand-alone MATLAB function files. Finally, all the figures in the text were created using MATLAB together with a utility file laprint.m written by Arno Linnemann of the University of Kassel.²

    ² A user guide is available at http://www.uni-kassel.de/fb16/rat/matlab/laprint/laprintdoc.ps.

  • Acknowledgements

    Creating a manuscript of this scope and magnitude is a daunting task and there are many people to whom we are indebted. In particular, we would like to thank Kenneth Lindsay, Adrian Pagan and Andy Tremayne for their careful reading of various chapters of the manuscript and for many helpful comments and suggestions. Gael Martin helped with compiling a suitable list of references to Bayesian econometric methods. Ayesha Scott compiled the index, a painstaking task for a manuscript of this size. Many others have commented on earlier drafts of chapters and we are grateful to the following individuals: our colleagues, Gunnar Bardsen, Ralf Becker, Adam Clements, Vlad Pavlov and Joseph Jeisman; and our graduate students, Tim Christensen, Christopher Coleman-Fenn, Andrew McClelland, Jessie Wang and Vivianne Vilar.

    We also wish to express our deep appreciation to the team at Cambridge University Press, particularly Peter C. B. Phillips for his encouragement and support throughout the long gestation period of the book, as well as for reading and commenting on earlier drafts. Scott Parris, with his energy and enthusiasm for the project, was a great help in sustaining the authors during the long slog of completing the manuscript. Our thanks are also due to our CUP readers, who provided detailed and constructive feedback at various stages in the compilation of the final document. Michael Erkelenz of Fine Line Writers edited the entire manuscript, helped to smooth out the prose and provided particular assistance with the correct use of adjectival constructions in the passive voice.

    It is fair to say that writing this book was an immense task that involved the consumption of copious quantities of chillies, champagne and port over a protracted period of time. The biggest debt of gratitude we owe, therefore, is to our respective families. To Gael, Sarah and David; Cath, Iain, Robert and Tim; and Fiona and Caitlin: thank you for your patience, your good humour in putting up with and cleaning up after many a pizza night, your stoicism in enduring yet another vacant stare during an important conversation and, ultimately, for making it all worthwhile.

    Vance Martin, Stan Hurn & David Harris
    November 2011

  • Contents

    List of illustrations page 1

    Computer Code used in the Examples 4

    PART ONE MAXIMUM LIKELIHOOD 1

    1 The Maximum Likelihood Principle 3

    1.1 Introduction 3

    1.2 Motivating Examples 3

    1.3 Joint Probability Distributions 9

    1.4 Maximum Likelihood Framework 12

    1.4.1 The Log-Likelihood Function 12

    1.4.2 Gradient 18

    1.4.3 Hessian 20

    1.5 Applications 23

    1.5.1 Stationary Distribution of the Vasicek Model 23

    1.5.2 Transitional Distribution of the Vasicek Model 25

    1.6 Exercises 28

    2 Properties of Maximum Likelihood Estimators 35

    2.1 Introduction 35

    2.2 Preliminaries 35

    2.2.1 Stochastic Time Series Models and Their Properties 36

    2.2.2 Weak Law of Large Numbers 41

    2.2.3 Rates of Convergence 45

    2.2.4 Central Limit Theorems 47

    2.3 Regularity Conditions 55

    2.4 Properties of the Likelihood Function 57


    2.4.1 The Population Likelihood Function 57

    2.4.2 Moments of the Gradient 58

    2.4.3 The Information Matrix 61

    2.5 Asymptotic Properties 63

    2.5.1 Consistency 63

    2.5.2 Normality 67

    2.5.3 Efficiency 68

    2.6 Finite-Sample Properties 72

    2.6.1 Unbiasedness 73

    2.6.2 Sufficiency 74

    2.6.3 Invariance 75

    2.6.4 Non-Uniqueness 76

    2.7 Applications 76

    2.7.1 Portfolio Diversification 78

    2.7.2 Bimodal Likelihood 80

    2.8 Exercises 82

    3 Numerical Estimation Methods 91

    3.1 Introduction 91

    3.2 Newton Methods 92

    3.2.1 Newton-Raphson 93

    3.2.2 Method of Scoring 94

    3.2.3 BHHH Algorithm 95

    3.2.4 Comparative Examples 98

    3.3 Quasi-Newton Methods 101

    3.4 Line Searching 102

    3.5 Optimisation Based on Function Evaluation 104

    3.6 Computing Standard Errors 106

    3.7 Hints for Practical Optimization 109

    3.7.1 Concentrating the Likelihood 109

    3.7.2 Parameter Constraints 110

    3.7.3 Choice of Algorithm 111

    3.7.4 Numerical Derivatives 112

    3.7.5 Starting Values 113

    3.7.6 Convergence Criteria 113

    3.8 Applications 114

    3.8.1 Stationary Distribution of the CIR Model 114

    3.8.2 Transitional Distribution of the CIR Model 116

    3.9 Exercises 118


    4 Hypothesis Testing 124

    4.1 Introduction 124

    4.2 Overview 124

    4.3 Types of Hypotheses 126

    4.3.1 Simple and Composite Hypotheses 126

    4.3.2 Linear Hypotheses 127

    4.3.3 Nonlinear Hypotheses 128

    4.4 Likelihood Ratio Test 129

    4.5 Wald Test 133

    4.5.1 Linear Hypotheses 134

    4.5.2 Nonlinear Hypotheses 136

    4.6 Lagrange Multiplier Test 137

    4.7 Distribution Theory 139

    4.7.1 Asymptotic Distribution of the Wald Statistic 139

    4.7.2 Asymptotic Relationships Among the Tests 142

    4.7.3 Finite Sample Relationships 143

    4.8 Size and Power Properties 145

    4.8.1 Size of a Test 145

    4.8.2 Power of a Test 146

    4.9 Applications 148

    4.9.1 Exponential Regression Model 148

    4.9.2 Gamma Regression Model 151

    4.10 Exercises 153

    PART TWO REGRESSION MODELS 159

    5 Linear Regression Models 161

    5.1 Introduction 161

    5.2 Specification 162

    5.2.1 Model Classification 162

    5.2.2 Structural and Reduced Forms 163

    5.3 Estimation 166

    5.3.1 Single Equation: Ordinary Least Squares 166

    5.3.2 Multiple Equations: FIML 170

    5.3.3 Identification 175

    5.3.4 Instrumental Variables 177

    5.3.5 Seemingly Unrelated Regression 181

    5.4 Testing 182

    5.5 Applications 187


    5.5.1 Linear Taylor Rule 187

    5.5.2 The Klein Model of the U.S. Economy 189

    5.6 Exercises 191

    6 Nonlinear Regression Models 199

    6.1 Introduction 199

    6.2 Specification 199

    6.3 Maximum Likelihood Estimation 201

    6.4 Gauss-Newton 208

    6.4.1 Relationship to Nonlinear Least Squares 212

    6.4.2 Relationship to Ordinary Least Squares 213

    6.4.3 Asymptotic Distributions 213

    6.5 Testing 214

    6.5.1 LR, Wald and LM Tests 214

    6.5.2 Nonnested Tests 218

    6.6 Applications 221

    6.6.1 Robust Estimation of the CAPM 221

    6.6.2 Stochastic Frontier Models 224

    6.7 Exercises 228

    7 Autocorrelated Regression Models 234

    7.1 Introduction 234

    7.2 Specification 234

    7.3 Maximum Likelihood Estimation 236

    7.3.1 Exact Maximum Likelihood 237

    7.3.2 Conditional Maximum Likelihood 238

    7.4 Alternative Estimators 240

    7.4.1 Gauss-Newton 241

    7.4.2 Zig-zag Algorithms 244

    7.4.3 Cochrane-Orcutt 247

    7.5 Distribution Theory 248

    7.5.1 Maximum Likelihood Estimator 249

    7.5.2 Least Squares Estimator 253

    7.6 Lagged Dependent Variables 258

    7.7 Testing 260

    7.7.1 Alternative LM Test I 262

    7.7.2 Alternative LM Test II 263

    7.7.3 Alternative LM Test III 264

    7.8 Systems of Equations 265

    7.8.1 Estimation 266

    7.8.2 Testing 268


    7.9 Applications 268

    7.9.1 Illiquidity and Hedge Funds 268

    7.9.2 Beach-Mackinnon Simulation Study 269

    7.10 Exercises 271

    8 Heteroskedastic Regression Models 280

    8.1 Introduction 280

    8.2 Specification 280

    8.3 Estimation 283

    8.3.1 Maximum Likelihood 283

    8.3.2 Relationship with Weighted Least Squares 286

    8.4 Distribution Theory 289

    8.5 Testing 289

    8.6 Heteroskedasticity in Systems of Equations 295

    8.6.1 Specification 295

    8.6.2 Estimation 297

    8.6.3 Testing 299

    8.6.4 Heteroskedastic and Autocorrelated Disturbances 300

    8.7 Applications 302

    8.7.1 The Great Moderation 302

    8.7.2 Finite Sample Properties of the Wald Test 304

    8.8 Exercises 306

    PART THREE OTHER ESTIMATION METHODS 313

    9 Quasi-Maximum Likelihood Estimation 315

    9.1 Introduction 315

    9.2 Misspecification 316

    9.3 The Quasi-Maximum Likelihood Estimator 320

    9.4 Asymptotic Distribution 323

    9.4.1 Misspecification and the Information Equality 325

    9.4.2 Independent and Identically Distributed Data 328

    9.4.3 Dependent Data: Martingale Difference Score 329

    9.4.4 Dependent Data and Score 330

    9.4.5 Variance Estimation 331

    9.5 Quasi-Maximum Likelihood and Linear Regression 333

    9.5.1 Nonnormality 336

    9.5.2 Heteroskedasticity 337

    9.5.3 Autocorrelation 338

    9.5.4 Variance Estimation 342


    9.6 Testing 346

    9.7 Applications 348

    9.7.1 Autoregressive Models for Count Data 348

    9.7.2 Estimating the Parameters of the CKLS Model 351

    9.8 Exercises 354

    10 Generalized Method of Moments 361

    10.1 Introduction 361

    10.2 Motivating Examples 362

    10.2.1 Population Moments 362

    10.2.2 Empirical Moments 363

    10.2.3 GMM Models from Conditional Expectations 368

    10.2.4 GMM and Maximum Likelihood 371

    10.3 Estimation 372

    10.3.1 The GMM Objective Function 372

    10.3.2 Asymptotic Properties 373

    10.3.3 Estimation Strategies 378

    10.4 Over-Identification Testing 382

    10.5 Applications 387

    10.5.1 Monte Carlo Evidence 387

    10.5.2 Level Effect in Interest Rates 393

    10.6 Exercises 396

    11 Nonparametric Estimation 404

    11.1 Introduction 404

    11.2 The Kernel Density Estimator 405

    11.3 Properties of the Kernel Density Estimator 409

    11.3.1 Finite Sample Properties 410

    11.3.2 Optimal Bandwidth Selection 410

    11.3.3 Asymptotic Properties 414

    11.3.4 Dependent Data 416

    11.4 Semi-Parametric Density Estimation 417

    11.5 The Nadaraya-Watson Kernel Regression Estimator 419

    11.6 Properties of Kernel Regression Estimators 423

    11.7 Bandwidth Selection for Kernel Regression 427

    11.8 Multivariate Kernel Regression 430

    11.9 Semi-parametric Regression of the Partial Linear Model 432

    11.10 Applications 433

    11.10.1 Derivatives of a Nonlinear Production Function 434

    11.10.2 Drift and Diffusion Functions of SDEs 436

    11.11 Exercises 439


    12 Estimation by Simulation 447

    12.1 Introduction 447

    12.2 Motivating Example 448

    12.3 Indirect Inference 450

    12.3.1 Estimation 451

    12.3.2 Relationship with Indirect Least Squares 455

    12.4 Efficient Method of Moments (EMM) 456

    12.4.1 Estimation 456

    12.4.2 Relationship with Instrumental Variables 458

    12.5 Simulated Generalized Method of Moments (SMM) 459

    12.6 Estimating Continuous-Time Models 461

    12.6.1 Brownian Motion 464

    12.6.2 Geometric Brownian Motion 467

    12.6.3 Stochastic Volatility 470

    12.7 Applications 472

    12.7.1 Simulation Properties 473

    12.7.2 Empirical Properties 475

    12.8 Exercises 477

    PART FOUR STATIONARY TIME SERIES 483

    13 Linear Time Series Models 485

    13.1 Introduction 485

    13.2 Time Series Properties of Data 486

    13.3 Specification 488

    13.3.1 Univariate Model Classification 489

    13.3.2 Multivariate Model Classification 491

    13.3.3 Likelihood 493

    13.4 Stationarity 493

    13.4.1 Univariate Examples 494

    13.4.2 Multivariate Examples 495

    13.4.3 The Stationarity Condition 496

    13.4.4 Wold's Representation Theorem 497

    13.4.5 Transforming a VAR to a VMA 498

    13.5 Invertibility 501

    13.5.1 The Invertibility Condition 501

    13.5.2 Transforming a VMA to a VAR 502

    13.6 Estimation 502

    13.7 Optimal Choice of Lag Order 506


    13.8 Distribution Theory 508

    13.9 Testing 511

    13.10 Analyzing Vector Autoregressions 513

    13.10.1 Granger Causality Testing 515

    13.10.2 Impulse Response Functions 517

    13.10.3 Variance Decompositions 523

    13.11 Applications 525

    13.11.1 Barro's Rational Expectations Model 525

    13.11.2 The Campbell-Shiller Present Value Model 526

    13.12 Exercises 528

    14 Structural Vector Autoregressions 537

    14.1 Introduction 537

    14.2 Specification 538

    14.2.1 Short-Run Restrictions 542

    14.2.2 Long-Run Restrictions 544

    14.2.3 Short-Run and Long-Run Restrictions 548

    14.2.4 Sign Restrictions 550

    14.3 Estimation 553

    14.4 Identification 558

    14.5 Testing 559

    14.6 Applications 561

    14.6.1 Peersman's Model of Oil Price Shocks 561

    14.6.2 A Portfolio SVAR Model of Australia 563

    14.7 Exercises 566

    15 Latent Factor Models 571

    15.1 Introduction 571

    15.2 Motivating Examples 572

    15.2.1 Empirical 572

    15.2.2 Theoretical 574

    15.3 The Recursions of the Kalman Filter 575

    15.3.1 Univariate 576

    15.3.2 Multivariate 581

    15.4 Extensions 585

    15.4.1 Intercepts 585

    15.4.2 Dynamics 585

    15.4.3 Nonstationary Factors 587

    15.4.4 Exogenous and Predetermined Variables 589

    15.5 Factor Extraction 589

    15.6 Estimation 591


    15.6.1 Identification 591

    15.6.2 Maximum Likelihood 591

    15.6.3 Principal Components Estimator 593

    15.7 Relationship to VARMA Models 596

    15.8 Applications 597

    15.8.1 The Hodrick-Prescott Filter 597

    15.8.2 A Factor Model of Spreads with Money Shocks 601

    15.9 Exercises 603

    PART FIVE NON-STATIONARY TIME SERIES 613

    16 Nonstationary Distribution Theory 615

    16.1 Introduction 615

    16.2 Specification 616

    16.2.1 Models of Trends 616

    16.2.2 Integration 618

    16.3 Estimation 620

    16.3.1 Stationary Case 621

    16.3.2 Nonstationary Case: Stochastic Trends 624

    16.3.3 Nonstationary Case: Deterministic Trends 626

    16.4 Asymptotics for Integrated Processes 629

    16.4.1 Brownian Motion 630

    16.4.2 Functional Central Limit Theorem 631

    16.4.3 Continuous Mapping Theorem 635

    16.4.4 Stochastic Integrals 637

    16.5 Multivariate Analysis 638

    16.6 Applications 640

    16.6.1 Least Squares Estimator of the AR(1) Model 641

    16.6.2 Trend Misspecification 643

    16.7 Exercises 644

    17 Unit Root Testing 651

    17.1 Introduction 651

    17.2 Specification 651

    17.3 Detrending 653

    17.3.1 Ordinary Least Squares: Dickey and Fuller 655

    17.3.2 First Differences: Schmidt and Phillips 656

    17.3.3 Generalized Least Squares: Elliott, Rothenberg and Stock 657

    17.4 Testing 658


    17.4.1 Dickey-Fuller Tests 659

    17.4.2 M Tests 660

    17.5 Distribution Theory 662

    17.5.1 Ordinary Least Squares Detrending 664

    17.5.2 Generalized Least Squares Detrending 665

    17.5.3 Simulating Critical Values 667

    17.6 Power 668

    17.6.1 Near Integration and the Ornstein-Uhlenbeck Processes 669

    17.6.2 Asymptotic Local Power 671

    17.6.3 Point Optimal Tests 671

    17.6.4 Asymptotic Power Envelope 673

    17.7 Autocorrelation 675

    17.7.1 Dickey-Fuller Test with Autocorrelation 675

    17.7.2 M Tests with Autocorrelation 676

    17.8 Structural Breaks 678

    17.8.1 Known Break Point 681

    17.8.2 Unknown Break Point 684

    17.9 Applications 685

    17.9.1 Power and the Initial Value 685

    17.9.2 Nelson-Plosser Data Revisited 687

    17.10 Exercises 687

    18 Cointegration 695

    18.1 Introduction 695

    18.2 Long-Run Economic Models 696

    18.3 Specification: VECM 698

    18.3.1 Bivariate Models 698

    18.3.2 Multivariate Models 700

    18.3.3 Cointegration 701

    18.3.4 Deterministic Components 703

    18.4 Estimation 705

    18.4.1 Full-Rank Case 706

    18.4.2 Reduced-Rank Case: Iterative Estimator 707

    18.4.3 Reduced Rank Case: Johansen Estimator 709

    18.4.4 Zero-Rank Case 715

    18.5 Identification 716

    18.5.1 Triangular Restrictions 716

    18.5.2 Structural Restrictions 717

    18.6 Distribution Theory 718


    18.6.1 Asymptotic Distribution of the Eigenvalues 718

    18.6.2 Asymptotic Distribution of the Parameters 720

    18.7 Testing 724

    18.7.1 Cointegrating Rank 724

    18.7.2 Cointegrating Vector 727

    18.7.3 Exogeneity 730

    18.8 Dynamics 731

    18.8.1 Impulse responses 731

    18.8.2 Cointegrating Vector Interpretation 732

    18.9 Applications 732

    18.9.1 Rank Selection Based on Information Criteria 733

    18.9.2 Effects of Heteroskedasticity on the Trace Test 735

    18.10 Exercises 737

    PART SIX NONLINEAR TIME SERIES 747

    19 Nonlinearities in Mean 749

    19.1 Introduction 749

    19.2 Motivating Examples 749

    19.3 Threshold Models 755

    19.3.1 Specification 755

    19.3.2 Estimation 756

    19.3.3 Testing 758

    19.4 Artificial Neural Networks 761

    19.4.1 Specification 761

    19.4.2 Estimation 764

    19.4.3 Testing 766

    19.5 Bilinear Time Series Models 767

    19.5.1 Specification 767

    19.5.2 Estimation 768

    19.5.3 Testing 769

    19.6 Markov Switching Model 770

    19.7 Nonparametric Autoregression 774

    19.8 Nonlinear Impulse Responses 775

    19.9 Applications 779

    19.9.1 A Multiple Equilibrium Model of Unemployment 779

    19.9.2 Bivariate Threshold Models of G7 Countries 781

    19.10 Exercises 784


    20 Nonlinearities in Variance 795

    20.1 Introduction 795

    20.2 Statistical Properties of Asset Returns 795

    20.3 The ARCH Model 799

    20.3.1 Specification 799

    20.3.2 Estimation 801

    20.3.3 Testing 804

    20.4 Univariate Extensions 807

    20.4.1 GARCH 807

    20.4.2 Integrated GARCH 812

    20.4.3 Additional Variables 813

    20.4.4 Asymmetries 814

    20.4.5 Garch-in-Mean 815

    20.4.6 Diagnostics 817

    20.5 Conditional Nonnormality 818

    20.5.1 Parametric 819

    20.5.2 Semi-Parametric 821

    20.5.3 Nonparametric 821

    20.6 Multivariate GARCH 825

    20.6.1 VECH 826

    20.6.2 BEKK 827

    20.6.3 DCC 830

    20.6.4 DECO 836

    20.7 Applications 837

    20.7.1 DCC and DECO Models of U.S. Zero Coupon Yields 837

    20.7.2 A Time-Varying Volatility SVAR Model 838

    20.8 Exercises 841

    21 Discrete Time Series Models 850

    21.1 Introduction 850

    21.2 Motivating Examples 850

    21.3 Qualitative Data 853

    21.3.1 Specification 853

    21.3.2 Estimation 857

    21.3.3 Testing 861

    21.3.4 Binary Autoregressive Models 863

    21.4 Ordered Data 865

    21.5 Count Data 867

    21.5.1 The Poisson Regression Model 869


    21.5.2 Integer Autoregressive Models 871

    21.6 Duration Data 874

    21.7 Applications 876

    21.7.1 An ACH Model of U.S. Airline Trades 876

    21.7.2 EMM Estimator of Integer Models 879

    21.8 Exercises 881

    Appendix A Change of Variable in Probability Density Functions 887

    Appendix B The Lag Operator 888

    B.1 Basics 888

    B.2 Polynomial Convolution 889

    B.3 Polynomial Inversion 890

    B.4 Polynomial Decomposition 891

    Appendix C FIML Estimation of a Structural Model 892

    C.1 Log-likelihood Function 892

    C.2 First-order Conditions 892

    C.3 Solution 893

    Appendix D Additional Nonparametric Results 897

    D.1 Mean 897

    D.2 Variance 899

    D.3 Mean Square Error 901

    D.4 Roughness 902

    D.4.1 Roughness Results for the Gaussian Distribution 902

    D.4.2 Roughness Results for the Gaussian Kernel 903

    References 905

    Author index 915

    Subject index 918

  • Illustrations

    1.1 Probability distributions of y for various models 5
    1.2 Probability distributions of y for various models 7
    1.3 Log-likelihood function for Poisson distribution 15
    1.4 Log-likelihood function for exponential distribution 15
    1.5 Log-likelihood function for the normal distribution 17
    1.6 Eurodollar interest rates 24
    1.7 Stationary density of Eurodollar interest rates 25
    1.8 Transitional density of Eurodollar interest rates 27
    2.1 Demonstration of the weak law of large numbers 42
    2.2 Demonstration of the Lindeberg-Levy central limit theorem 49
    2.3 Convergence of log-likelihood function 65
    2.4 Consistency of sample mean for normal distribution 65
    2.5 Consistency of median for Cauchy distribution 66
    2.6 Illustrating asymptotic normality 69
    2.7 Bivariate normal distribution 77
    2.8 Scatter plot of returns on Apple and Ford stocks 78
    2.9 Gradient of the bivariate normal model 81
    3.1 Stationary density of Eurodollar interest rates: CIR model 115
    3.2 Estimated variance function of CIR model 117
    4.1 Illustrating the LR and Wald tests 125
    4.2 Illustrating the LM test 126
    4.3 Simulated and asymptotic distributions of the Wald test 142
    5.1 Simulating a bivariate regression model 166
    5.2 Sampling distribution of a weak instrument 180
    5.3 U.S. data on the Taylor Rule 188
    6.1 Simulated exponential models 201
    6.2 Scatter plot of Martin Marietta returns data 222
    6.3 Stochastic frontier disturbance distribution 225
    7.1 Simulated models with autocorrelated disturbances 236


    7.2 Distribution of maximum likelihood estimator in an autocorrelated regression model 252

    8.1 Simulated data from heteroskedastic models 282
    8.2 The Great Moderation 303
    8.3 Sampling distribution of Wald test 305
    8.4 Power of Wald test 305
    9.1 Comparison of true and misspecified log-likelihood functions 317
    9.2 U.S. Dollar/British Pound exchange rates 345
    9.3 Estimated variance function of CKLS model 353
    11.1 Bias and variance of the kernel estimate of density 411
    11.2 Kernel estimate of distribution of stock index returns 413
    11.3 Bivariate normal density 414
    11.4 Semiparametric density estimator 419
    11.5 Parametric conditional mean estimates 420
    11.6 Nadaraya-Watson nonparametric kernel regression 424
    11.7 Effect of bandwidth on kernel regression 425
    11.8 Cross validation bandwidth selection 429
    11.9 Two-dimensional product kernel 431
    11.10 Semiparametric regression 433
    11.11 Nonparametric production function 435
    11.12 Nonparametric estimates of drift and diffusion functions 438
    12.1 Simulated AR(1) model 450
    12.2 Illustrating Brownian motion 462
    13.1 U.S. macroeconomic data 487
    13.2 Plots of simulated stationary time series 490
    13.3 Choice of optimal lag order 508
    14.1 Bivariate SVAR model 541
    14.2 Bivariate SVAR with short-run restrictions 545
    14.3 Bivariate SVAR with long-run restrictions 547
    14.4 Bivariate SVAR with short- and long-run restrictions 549
    14.5 Bivariate SVAR with sign restrictions 552
    14.6 Impulse responses of Peersman's model 564
    15.1 Daily U.S. zero coupon rates 573
    15.2 Alternative priors for latent factors in the Kalman filter 588
    15.3 Factor loadings of a term structure model 595
    15.4 Hodrick-Prescott filter of real U.S. GDP 601
    16.1 Nelson-Plosser data 618
    16.2 Simulated distribution of AR(1) parameter 624
    16.3 Continuous-time processes 633
    16.4 Functional Central Limit Theorem 635
    16.5 Distribution of a stochastic integral 638
    16.6 Mixed normal distribution 640
    17.1 Real U.S. GDP 652


    17.2 Detrending 658
    17.3 Near unit root process 669
    17.4 Asymptotic power curve of ADF tests 672
    17.5 Asymptotic power envelope of ADF tests 674
    17.6 Structural breaks in U.S. GDP 679
    17.7 Union of rejections approach 686
    18.1 Permanent income hypothesis 696
    18.2 Long run money demand 697
    18.3 Term structure of U.S. yields 698
    18.4 Error correction phase diagram 699
    19.1 Properties of an AR(2) model 750
    19.2 Limit cycle 751
    19.3 Strange attractor 752
    19.4 Nonlinear error correction model 753
    19.5 U.S. unemployment 754
    19.6 Threshold functions 757
    19.7 Decomposition of an ANN 762
    19.8 Simulated bilinear time series models 768
    19.9 Markov switching model of U.S. output 773
    19.10 Nonparametric estimate of a TAR(1) model 775
    19.11 Simulated TAR models for G7 countries 783
    20.1 Statistical properties of FTSE returns 796
    20.2 Distribution of FTSE returns 799
    20.3 News impact curve 801
    20.4 ACF of GARCH(1,1) models 810
    20.5 Conditional variance of FTSE returns 812
    20.6 Risk-return preferences 816
    20.7 BEKK model of U.S. zero coupon bonds 829
    20.8 DECO model of interest rates 838
    20.9 SVAR model of U.K. Libor spread 840
    21.1 U.S. Federal funds target rate from 1984 to 2009 852
    21.2 Money demand equation with a floor interest rate 853
    21.3 Duration descriptive statistics for AMR 877

  • Computer Code used in the Examples

    (Code is written in GAUSS, in which case the extension is .g, and in MATLAB, in which case the extension is .m.)

    1.1 basic sample.* 4
    1.2 basic sample.* 6
    1.3 basic sample.* 6
    1.4 basic sample.* 6
    1.5 basic sample.* 7
    1.6 basic sample.* 8
    1.7 basic sample.* 8
    1.8 basic sample.* 9
    1.10 basic poisson.* 13
    1.11 basic exp.* 14
    1.12 basic normal like.* 16
    1.14 basic poisson.* 18
    1.15 basic exp.* 19
    1.16 basic normal like.* 19
    1.18 basic exp.* 22
    1.19 basic normal.* 22
    2.5 prop wlln1.* 41
    2.6 prop wlln2.* 42
    2.8 prop moment.* 45
    2.10 prop lindlevy.* 48
    2.21 prop consistency.* 64
    2.22 prop normal.* 64
    2.23 prop cauchy.* 65
    2.25 prop asymnorm.* 68
    2.28 prop edgeworth.* 72
    2.29 prop bias.* 73
    3.2 max exp.* 93
    3.3 max exp.* 95
    3.4 max exp.* 97
    3.6 max weibull.* 99


    3.7 max exp.* 102
    3.8 max exp.* 103
    4.3 test weibull.* 133
    4.5 test weibull.* 135
    4.7 test weibull.* 139
    4.10 test asymptotic.* 141
    4.11 test size.* 145
    4.12 test power.* 147
    4.13 test power.* 147
    5.5 linear simulation.* 165
    5.6 linear estimate.* 169
    5.7 linear fiml.* 171
    5.8 linear fiml.* 173
    5.10 linear weak.* 179
    5.14 linear lr.*, linear wd.*, linear lm.* 182
    5.15 linear fiml lr.*, linear fiml wd.*, linear fiml lm.* 185
    6.3 nls simulate.* 200
    6.5 nls exponential.* 206
    6.7 nls consumption estimate.* 210
    6.8 nls contest.* 215
    6.11 nls money.* 219
    7.1 auto simulate.* 235
    7.5 auto invest.* 240
    7.8 auto distribution.* 251
    7.11 auto test.* 260
    7.12 auto system.* 267
    8.1 hetero simulate.* 281
    8.3 hetero estimate.* 284
    8.7 hetero test.* 293
    8.9 hetero system.* 298
    8.10 hetero system.* 299
    8.11 hetero general.* 301
    10.2 gmm table.* 366
    10.3 gmm table.* 367
    10.11 gmm ccapm.* 382
    11.1 npd kernel.* 407
    11.2 npd property.* 410
    11.3 npd ftse.* 412
    11.4 npd bivariate.* 414
    11.5 npd seminonlin.* 418
    11.6 npr parametric.* 419
    11.7 npr nadwatson.* 422
    11.8 npr property.* 424


    11.10 npr bivariate.* 430
    11.11 npr semi.* 432
    12.1 sim mom.* 450
    12.3 sim accuracy.* 453
    12.4 sim ma1indirect.* 454
    12.5 sim ma1emm.* 457
    12.6 sim ma1overid.* 460
    12.7 sim brownind.*, sim brownemm.* 466
    13.1 stsm simulate.* 489
    13.8 stsm root.* 496
    13.9 stsm root.* 497
    13.17 stsm varma.* 504
    13.21 stsm anderson.* 511
    13.24 stsm recursive.* 513
    13.25 stsm recursive.* 516
    13.26 stsm recursive.* 522
    13.27 stsm recursive.* 523
    14.2 svar bivariate.* 540
    14.5 svar bivariate.* 544
    14.9 svar bivariate.* 547
    14.10 svar bivariate.* 548
    14.12 svar bivariate.* 552
    14.13 svar shortrun.* 554
    14.14 svar longrun.* 556
    14.15 svar recursive.* 557
    14.17 svar test.* 560
    14.18 svar test.* 561
    15.1 kalman termfig.* 572
    15.5 kalman uni.* 580
    15.6 kalman multi.* 583
    15.8 kalman smooth.* 590
    15.9 kalman uni.* 592
    15.10 kalman term.* 592
    15.11 kalman fvar.* 594
    15.12 kalman panic.* 594
    16.1 nts nelplos.* 616
    16.2 nts nelplos.* 616
    16.3 nts nelplos.* 617
    16.4 nts moment.* 622
    16.5 nts moment.* 624
    16.6 nts moment.* 628
    16.7 nts yts.* 632
    16.8 nts fclt.* 635


    16.10 nts stochint.* 637
    16.11 nts mixednormal.* 639
    17.1 unit qusgdp.* 657
    17.2 unit qusgdp.* 661
    17.3 unit asypower1.* 671
    17.4 unit asypowerenv.* 674
    17.5 unit maicsim.* 677
    17.6 unit qusgdp.* 679
    17.8 unit qusgdp.* 683
    17.9 unit qusgdp.* 685
    18.1 coint lrgraphs.* 696
    18.2 coint lrgraphs.* 696
    18.3 coint lrgraphs.* 697
    18.4 coint lrgraphs.* 702
    18.6 coint bivterm.* 707
    18.7 coint bivterm.* 708
    18.8 coint bivterm.* 712
    18.9 coint permincome.* 714
    18.10 coint bivterm.* 715
    18.11 coint triterm.* 716
    18.13 coint simevals.* 719
    18.16 coint bivterm.* 728
    19.1 nlm features.* 750
    19.2 nlm features.* 750
    19.3 nlm features.* 751
    19.4 nlm features.* 752
    19.6 nlm tarsim.* 760
    19.7 nlm annfig.* 762
    19.8 nlm bilinear.* 767
    19.9 nlm hamilton.* 772
    19.10 nlm tar.* 774
    19.11 nlm girf.* 778
    20.1 garch nic.* 800
    20.2 garch estimate.* 804
    20.3 garch test.* 806
    20.4 garch simulate.* 809
    20.5 garch estimate.* 810
    20.6 garch seasonality.* 813
    20.7 garch mean.* 816
    20.9 mgarch bekk.* 828
    21.2 discrete mpol.* 852
    21.3 discrete floor.* 852
    21.4 discrete simulation.* 857


    21.7 discrete probit.* 859
    21.8 discrete probit.* 862
    21.9 discrete ordered.* 866
    21.11 discrete thinning.* 871
    21.12 discrete poissonauto.* 873

    Code Disclaimer Information

    Note that the computer code is provided for illustrative purposes only and, although care has been taken to ensure that it works properly, it has not been thoroughly tested under all conditions and on all platforms. The authors and Cambridge University Press cannot guarantee or imply reliability, serviceability, or function of this computer code. All code is therefore provided "as is" without any warranties of any kind.

  • PART ONE

    MAXIMUM LIKELIHOOD

  • 1 The Maximum Likelihood Principle

    1.1 Introduction

    Maximum likelihood estimation is a general method for estimating the parameters of econometric models from observed data. The principle of maximum likelihood plays a central role in the exposition of this book, since a number of estimators used in econometrics can be derived within this framework. Examples include ordinary least squares, generalized least squares and full-information maximum likelihood. In deriving the maximum likelihood estimator, a key concept is the joint probability density function (pdf) of the observed random variables, $y_t$. Maximum likelihood estimation requires that the following conditions are satisfied.

    (1) The form of the joint pdf of $y_t$ is known.
    (2) The specification of the moments of the joint pdf is known.
    (3) The joint pdf can be evaluated for all values of the parameters, $\theta$.

    Parts ONE and TWO of this book deal with models in which all these conditions are satisfied. Part THREE investigates models in which these conditions are not satisfied and considers four important cases. First, if the distribution of $y_t$ is misspecified, resulting in both conditions 1 and 2 being violated, estimation is by quasi-maximum likelihood (Chapter 9). Second, if condition 1 is not satisfied, a generalized method of moments estimator (Chapter 10) is required. Third, if condition 2 is not satisfied, estimation relies on nonparametric methods (Chapter 11). Fourth, if condition 3 is violated, simulation-based estimation methods are used (Chapter 12).

    1.2 Motivating Examples

    To highlight the role of probability distributions in maximum likelihood estimation, this section emphasizes the link between observed sample data and the probability distribution from which they are drawn. This relationship is illustrated with a number of simulation examples where samples of size $T = 5$ are drawn from a range of alternative models. The realizations of these draws for each model are listed in Table 1.1.

    Table 1.1 Realisations of $y_t$ from alternative models: $t = 1, 2, \dots, 5$.

    Model                    t=1      t=2      t=3      t=4      t=5
    Time Invariant          -2.720    2.470    0.495    0.597   -0.960
    Count                    2.000    4.000    3.000    4.000    0.000
    Linear Regression        2.850    3.105    5.693    8.101   10.387
    Exponential Regression   0.874    8.284    0.507    3.722    5.865
    Autoregressive           0.000   -1.031   -0.283   -1.323   -2.195
    Bilinear                 0.000   -2.721    0.531    1.350   -2.451
    ARCH                     0.000    3.558    6.989    7.925    8.118
    Poisson                  3.000   10.000   17.000   20.000   23.000

    Example 1.1 Time Invariant Model
    Consider the model
    $$y_t = \sigma z_t,$$
    where $z_t$ is a disturbance term and $\sigma$ is a parameter. Let $z_t$ be drawn from a standardized normal distribution, $N(0, 1)$, defined by
    $$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z^2}{2}\right].$$
    The distribution of $y_t$ is obtained from the distribution of $z_t$ using the change of variable technique (see Appendix A for details)
    $$f(y;\theta) = f(z)\left|\frac{dz}{dy}\right|,$$
    where $\theta = \{\sigma^2\}$. Applying this rule, and recognising that $z = y/\sigma$, yields
    $$f(y;\theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(y/\sigma)^2}{2}\right]\frac{1}{\sigma} = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{y^2}{2\sigma^2}\right],$$
    or $y_t \sim N(0, \sigma^2)$. In this model, the distribution of $y_t$ is time invariant because neither the mean nor the variance depends on time. This property is highlighted in panel (a) of Figure 1.1 where the parameter is $\sigma = 2$. For comparative purposes the distributions of both $y_t$ and $z_t$ are given. As $y_t = 2z_t$, the distribution of $y_t$ is flatter than the distribution of $z_t$.
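    The change of variable result is easily verified by simulation. The following MATLAB sketch, which is illustrative only and not part of the suite of programs accompanying the book, generates draws of $y_t = \sigma z_t$ with $\sigma = 2$ and overlays the analytic density on a histogram of the draws:

    % Time invariant model y = sigma*z with z ~ N(0,1), so y ~ N(0,sigma^2)
    sigma = 2;
    T     = 100000;                  % large sample so the histogram approximates f(y)
    z     = randn(T,1);              % standard normal draws
    y     = sigma*z;                 % change of variable
    ygrid = linspace(-8,8,201)';
    fy    = exp(-ygrid.^2/(2*sigma^2))/sqrt(2*pi*sigma^2);   % analytic density of y
    histogram(y,'Normalization','pdf'); hold on;
    plot(ygrid,fy); hold off;        % the histogram and the analytic density coincide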

    [Figure 1.1 Probability distributions of y generated from the time invariant, count, linear regression and exponential regression models. Except for the time invariant and count models, the solid line represents the density at t = 1, the dashed line represents the density at t = 3 and the dotted line represents the density at t = 5.]

    As the distribution of $y_t$ in Example 1.1 does not depend on lagged values $y_{t-i}$, $y_t$ is independently distributed. In addition, since the distribution of $y_t$ is the same at each $t$, $y_t$ is identically distributed. These two properties are abbreviated as iid. Conversely, the distribution is dependent if $y_t$ depends on its own lagged values and non-identical if it changes over time.


    Example 1.2 Count Model
    Consider a time series of counts modelled as a series of draws from a Poisson distribution
    $$f(y;\theta) = \frac{\theta^y \exp[-\theta]}{y!}, \qquad y = 0, 1, 2, \dots,$$
    where $\theta > 0$ is an unknown parameter. A sample of $T = 5$ realizations of $y_t$, given in Table 1.1, is drawn from the Poisson probability distribution in panel (b) of Figure 1.1 for $\theta = 2$. By assumption, this distribution is the same at each point in time. In contrast to the data in the previous example, where the random variable is continuous, the data here are discrete as they are positive integers that measure counts.
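    Draws from the Poisson distribution can be generated without any toolbox functions by inverting the cumulative distribution function. A minimal MATLAB sketch (illustrative only; the draws will not match Table 1.1 because the random number seed is unknown):

    % Draw T = 5 Poisson(theta) counts by inversion of the cdf
    theta = 2; T = 5;
    y = zeros(T,1);
    for t = 1:T
        u = rand;                          % uniform draw to be inverted
        k = 0; p = exp(-theta); F = p;     % P(Y=0) and the running cdf
        while u > F
            k = k + 1;
            p = p*theta/k;                 % Poisson recursion P(Y=k) = P(Y=k-1)*theta/k
            F = F + p;
        end
        y(t) = k;
    end
    disp(y')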

    Example 1.3 Linear Regression Model
    Consider the regression model
    $$y_t = \beta x_t + \sigma z_t, \qquad z_t \sim iid\ N(0, 1),$$
    where $x_t$ is an explanatory variable that is independent of $z_t$ and $\theta = \{\beta, \sigma^2\}$. The distribution of $y$ conditional on $x_t$ is
    $$f(y \,|\, x_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y - \beta x_t)^2}{2\sigma^2}\right],$$
    which is a normal distribution with conditional mean $\beta x_t$ and variance $\sigma^2$, or $y_t \sim N(\beta x_t, \sigma^2)$. This distribution is illustrated in panel (c) of Figure 1.1 with $\beta = 3$, $\sigma = 2$ and explanatory variable $x_t = \{0, 1, 2, 3, 4\}$. The effect of $x_t$ is to shift the distribution of $y_t$ over time into the positive region, resulting in the draws of $y_t$ given in Table 1.1 becoming increasingly positive. As the variance at each point in time is constant, the spread of the distributions of $y_t$ is the same for all $t$.

    Example 1.4 Exponential Regression Model
    Consider the exponential regression model
    $$f(y \,|\, x_t; \theta) = \frac{1}{\mu_t}\exp\left[-\frac{y}{\mu_t}\right],$$
    where $\mu_t = \beta_0 + \beta_1 x_t$ is the time-varying conditional mean, $x_t$ is an explanatory variable and $\theta = \{\beta_0, \beta_1\}$. This distribution is highlighted in panel (d) of Figure 1.1 with $\beta_0 = 1$, $\beta_1 = 1$ and $x_t = \{0, 1, 2, 3, 4\}$. As $\beta_1 > 0$, the effect of $x_t$ is to cause the distribution of $y_t$ to become more positively skewed over time.
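    The two regression models are simulated in the same way, with the distribution of each draw shifted by $x_t$. A minimal MATLAB sketch, using the parameter values of panels (c) and (d) (illustrative only, not the book's code):

    % Linear regression: y_t ~ N(beta*x_t, sigma^2)
    beta = 3; sigma = 2;
    x     = (0:4)';                    % explanatory variable x_t = {0,1,2,3,4}
    y_lin = beta*x + sigma*randn(5,1);
    % Exponential regression: y_t drawn from an exponential with mean mu_t
    beta0 = 1; beta1 = 1;
    mu    = beta0 + beta1*x;           % time-varying conditional mean
    y_exp = -mu.*log(rand(5,1));       % inverse-cdf draws from the exponential
    disp([y_lin y_exp])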

    [Figure 1.2 Probability distributions of y generated from the autoregressive, bilinear, autoregressive with heteroskedasticity and ARCH models. The solid line represents the density at t = 1, the dashed line represents the density at t = 3 and the dotted line represents the density at t = 5.]

    Example 1.5 Autoregressive Model
    An example of a first-order autoregressive model, denoted AR(1), is
    $$y_t = \rho y_{t-1} + u_t, \qquad u_t \sim iid\ N(0, \sigma^2),$$
    with $|\rho| < 1$ and $\theta = \{\rho, \sigma^2\}$. The distribution of $y$, conditional on $y_{t-1}$, is
    $$f(y \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y - \rho y_{t-1})^2}{2\sigma^2}\right],$$
    which is a normal distribution with conditional mean $\rho y_{t-1}$ and variance $\sigma^2$, or $y_t \sim N(\rho y_{t-1}, \sigma^2)$. If $0 < \rho < 1$, then a large positive (negative) value of $y_{t-1}$ shifts the distribution into the positive (negative) region for $y_t$, raising the probability that the next draw from this distribution is also positive (negative). This property of the autoregressive model is highlighted in panel (a) of Figure 1.2 with $\rho = 0.8$, $\sigma = 2$ and initial value $y_1 = 0$.
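    The recursive structure of the model is transparent in code: each draw is taken from a normal distribution whose mean is re-centred on the previous realization. A minimal MATLAB sketch (illustrative only):

    % AR(1) recursion: y_t is drawn from N(rho*y_{t-1}, sigma^2)
    rho = 0.8; sigma = 2; T = 5;
    y = zeros(T,1);                        % initial value y_1 = 0
    for t = 2:T
        y(t) = rho*y(t-1) + sigma*randn;   % conditional mean shifts with y_{t-1}
    end
    disp(y')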

    Example 1.6 Bilinear Time Series Model
    The autoregressive model discussed above specifies a linear relationship between $y_t$ and $y_{t-1}$. The following bilinear model is an example of a nonlinear time series model
    $$y_t = \rho y_{t-1} + \gamma y_{t-1}u_{t-1} + u_t, \qquad u_t \sim iid\ N(0, \sigma^2),$$
    where $\gamma y_{t-1}u_{t-1}$ represents the bilinear term and $\theta = \{\rho, \gamma, \sigma^2\}$. The distribution of $y_t$ conditional on $y_{t-1}$ is
    $$f(y \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y - \mu_t)^2}{2\sigma^2}\right],$$
    which is a normal distribution with conditional mean $\mu_t = \rho y_{t-1} + \gamma y_{t-1}u_{t-1}$ and variance $\sigma^2$. To highlight the nonlinear property of the model, substitute out $u_{t-1}$ in the equation for the mean
    $$\mu_t = \rho y_{t-1} + \gamma y_{t-1}(y_{t-1} - \rho y_{t-2} - \gamma y_{t-2}u_{t-2}) = \rho y_{t-1} + \gamma y_{t-1}^2 - \rho\gamma y_{t-1}y_{t-2} - \gamma^2 y_{t-1}y_{t-2}u_{t-2},$$
    which shows that the mean is a nonlinear function of $y_{t-1}$. Setting $\gamma = 0$ yields the linear AR(1) model of Example 1.5. The distribution of the bilinear model is illustrated in panel (b) of Figure 1.2 with $\rho = 0.8$, $\gamma = 0.4$, $\sigma = 2$ and initial value $y_1 = 0$.
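    A corresponding sketch of the bilinear recursion, where the lagged disturbance must be stored because it enters the conditional mean (illustrative only; setting the initial disturbance $u_1 = 0$ is an assumption made so that the recursion can start):

    % Bilinear model: y_t = rho*y_{t-1} + gamma*y_{t-1}*u_{t-1} + u_t
    rho = 0.8; gamma = 0.4; sigma = 2; T = 5;
    y = zeros(T,1); u = zeros(T,1);        % initial values y_1 = 0 and u_1 = 0 (assumed)
    for t = 2:T
        u(t) = sigma*randn;
        y(t) = rho*y(t-1) + gamma*y(t-1)*u(t-1) + u(t);   % bilinear term uses the lagged shock
    end
    disp(y')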

    Example 1.7 Autoregressive Model with Heteroskedasticity
    An example of an AR(1) model with heteroskedasticity is
    $$y_t = \rho y_{t-1} + \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 w_t, \qquad z_t \sim iid\ N(0, 1),$$
    where $\theta = \{\rho, \alpha_0, \alpha_1\}$ and $w_t$ is an explanatory variable. The distribution of $y_t$ conditional on $y_{t-1}$ and $w_t$ is
    $$f(y \,|\, y_{t-1}, w_t; \theta) = \frac{1}{\sqrt{2\pi\sigma_t^2}}\exp\left[-\frac{(y - \rho y_{t-1})^2}{2\sigma_t^2}\right],$$
    which is a normal distribution with conditional mean $\rho y_{t-1}$ and conditional variance $\alpha_0 + \alpha_1 w_t$. For this model, the distribution shifts because of the dependence on $y_{t-1}$, and the spread of the distribution changes because of $w_t$. These features are highlighted in panel (c) of Figure 1.2 with $\rho = 0.8$, $\alpha_0 = 0.8$, $\alpha_1 = 0.8$, $w_t$ defined as a uniform random number on the unit interval and the initial value $y_1 = 0$.

    Example 1.8 Autoregressive Conditional Heteroskedasticity
    The autoregressive conditional heteroskedasticity (ARCH) class of models is a special case of the heteroskedastic regression model where $w_t$ in Example 1.7 is expressed in terms of lagged values of the disturbance term squared. An example of a regression model as in Example 1.3 with ARCH is
    $$y_t = \beta x_t + u_t, \qquad u_t = \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2, \qquad z_t \sim iid\ N(0, 1),$$
    where $x_t$ is an explanatory variable and $\theta = \{\beta, \alpha_0, \alpha_1\}$. The distribution of $y$ conditional on $y_{t-1}$, $x_t$ and $x_{t-1}$ is
    $$f(y \,|\, y_{t-1}, x_t, x_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\left(\alpha_0 + \alpha_1(y_{t-1} - \beta x_{t-1})^2\right)}}\exp\left[-\frac{(y - \beta x_t)^2}{2\left(\alpha_0 + \alpha_1(y_{t-1} - \beta x_{t-1})^2\right)}\right].$$
    For this model, a large shock, represented by a large value of $u_t$, results in an increased variance in the next period if $\alpha_1 > 0$. The distribution from which $y_t$ is drawn in the next period will therefore have a larger variance. The distribution of this model is shown in panel (d) of Figure 1.2 with $\beta = 3$, $\alpha_0 = 0.8$, $\alpha_1 = 0.8$ and $x_t = \{0, 1, 2, 3, 4\}$.
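    In code, the ARCH recursion differs from the previous sketches only in that the variance, rather than the mean, is rebuilt from the lagged disturbance at each step. A minimal MATLAB sketch (illustrative only):

    % Regression with ARCH(1) disturbances, as in Example 1.8
    beta = 3; a0 = 0.8; a1 = 0.8;
    x = (0:4)'; T = length(x);
    y = zeros(T,1); u = zeros(T,1);    % y_1 = 0 and x_1 = 0 imply u_1 = 0
    for t = 2:T
        sig2 = a0 + a1*u(t-1)^2;       % conditional variance sigma_t^2
        u(t) = sqrt(sig2)*randn;       % u_t = sigma_t*z_t
        y(t) = beta*x(t) + u(t);
    end
    disp(y')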

    1.3 Joint Probability Distributions

    The motivating examples of the previous section focus on the distribution of $y_t$ at time $t$, which is generally a function of its own lags and the current and lagged values of the explanatory variable $x_t$. The derivation of the maximum likelihood estimator of the model parameters requires using all of the information $t = 1, 2, \dots, T$ by defining the joint probability density function (pdf). In the case where both $y_t$ and $x_t$ are stochastic, the joint pdf for a sample of $T$ observations is
    $$f(y_1, y_2, \dots, y_T, x_1, x_2, \dots, x_T; \psi), \qquad (1.1)$$
    where $\psi$ is a vector of parameters. An important feature of the previous examples is that $y_t$ depends on the explanatory variable $x_t$. To capture this conditioning, the joint distribution in (1.1) is expressed as
    $$f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \psi)\, f(x_1, \dots, x_T; \psi), \qquad (1.2)$$
    where the first term on the right hand side of (1.2) represents the distribution of $\{y_1, y_2, \dots, y_T\}$ conditional on $\{x_1, x_2, \dots, x_T\}$ and the second term is the marginal distribution of $\{x_1, x_2, \dots, x_T\}$. Assume that the parameter vector can be decomposed into $\psi = \{\theta, \theta_x\}$ such that expression (1.2) becomes
    $$f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta)\, f(x_1, \dots, x_T; \theta_x). \qquad (1.3)$$
    In these circumstances, maximum likelihood estimation of the parameters $\theta$ is based on the conditional distribution, without loss of information from the exclusion of the marginal distribution $f(x_1, \dots, x_T; \theta_x)$.

    The conditional distribution on the right hand side of expression (1.3) simplifies further in the presence of additional restrictions.

    Independent and identically distributed (iid)
    In the simplest case, $\{y_1, y_2, \dots, y_T\}$ is independent of $\{x_1, x_2, \dots, x_T\}$ and $y_t$ is iid with density function $f(y; \theta)$. The conditional pdf in equation (1.3) is then
    $$f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta). \qquad (1.4)$$
    Examples of this case are the time invariant model (Example 1.1) and the count model (Example 1.2).

    If both $y_t$ and $x_t$ are iid and $y_t$ is dependent on $x_t$, then the decomposition in equation (1.3) implies that inference can be based on
    $$f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = \prod_{t=1}^{T} f(y_t \,|\, x_t; \theta). \qquad (1.5)$$

    Examples include the regression models in Examples 1.3 and 1.4 if sampling is iid.

    Dependent
    Now assume that $\{y_1, y_2, \dots, y_T\}$ depends on its own lags but is independent of the explanatory variable $\{x_1, x_2, \dots, x_T\}$. The joint pdf is expressed as a sequence of conditional distributions where conditioning is based on lags of $y_t$. By using standard rules of probability, the distributions for the first three observations are, respectively,
    $$f(y_1; \theta) = f(y_1; \theta)$$
    $$f(y_1, y_2; \theta) = f(y_2 \,|\, y_1; \theta) f(y_1; \theta)$$
    $$f(y_1, y_2, y_3; \theta) = f(y_3 \,|\, y_2, y_1; \theta) f(y_2 \,|\, y_1; \theta) f(y_1; \theta),$$
    where $y_1$ is the initial value with marginal probability density $f(y_1; \theta)$.

    Extending this sequence to a sample of $T$ observations yields the joint pdf
    $$f(y_1, y_2, \dots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}, y_{t-2}, \dots, y_1; \theta). \qquad (1.6)$$
    Examples of this general case are the AR model (Example 1.5), the bilinear model (Example 1.6) and the ARCH model (Example 1.8). Extending the model to allow for dependence on explanatory variables, $x_t$, gives
    $$f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = f(y_1 \,|\, x_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}, y_{t-2}, \dots, y_1, x_t, x_{t-1}, \dots, x_1; \theta). \qquad (1.7)$$
    An example is the autoregressive model with heteroskedasticity (Example 1.7).

    Example 1.9 Autoregressive Model
    The joint pdf for the AR(1) model in Example 1.5 is
    $$f(y_1, y_2, \dots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}; \theta),$$
    where the conditional distribution is
    $$f(y_t \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y_t - \rho y_{t-1})^2}{2\sigma^2}\right],$$
    and the marginal distribution is
    $$f(y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2/(1 - \rho^2)}}\exp\left[-\frac{y_1^2}{2\sigma^2/(1 - \rho^2)}\right].$$

    Non-stochastic explanatory variables
    In the case of non-stochastic explanatory variables, because $x_t$ is deterministic its probability mass is degenerate. Explanatory variables of this form are also referred to as fixed in repeated samples. The joint probability in expression (1.3) simplifies to
    $$f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta).$$
    Now $\psi = \theta$ and there is no potential loss of information from using the conditional distribution to estimate $\theta$.

    1.4 Maximum Likelihood Framework

    As emphasized previously, a time series of data represents the observed realization of draws from a joint pdf. The maximum likelihood principle makes use of this result by providing a general framework for estimating the unknown parameters, $\theta$, from the observed time series data, $\{y_1, y_2, \dots, y_T\}$.

    1.4.1 The Log-Likelihood Function

    The standard interpretation of the joint pdf in (1.7) is that $f$ is a function of $y_t$ for given parameters, $\theta$. In defining the maximum likelihood estimator this interpretation is reversed, so that $f$ is taken as a function of $\theta$ for given $y_t$. The motivation behind this change in the interpretation of the arguments of the pdf is to regard $\{y_1, y_2, \dots, y_T\}$ as a realized data set which is no longer random. The maximum likelihood estimator is then obtained by finding the value of $\theta$ which is most likely to have generated the observed data. Here the phrase most likely is loosely interpreted in a probability sense.

    It is important to remember that the likelihood function is simply a re-

    definition of the joint pdf in equation (1.7). For many problems it is simpler

    to work with the logarithm of this joint density function. The log-likelihood

  • 1.4 Maximum Likelihood Framework 13

    function is defined as

    lnLT () =1

    Tln f(y1 |x1; )

    +1

    T

    Tt=2

    ln f(yt|yt1, yt2, , y1, xt, xt1, x1; ) ,(1.8)

where the change of status of the arguments in the joint pdf is highlighted by making $\theta$ the sole argument of this function, and the $T$ subscript indicates that the log-likelihood function is an average over the sample of the logarithm of the density evaluated at $y_t$. It is worth emphasizing that the term log-likelihood function, used here without any qualification, is also known as the average log-likelihood function. This convention is also used by, among others, Newey and McFadden (1994) and White (1994). This definition of the log-likelihood function is consistent with the theoretical development of the properties of maximum likelihood estimators discussed in Chapter 2, particularly Sections 2.3 and 2.5.1.

For the special case where $y_t$ is iid, the log-likelihood function is based on the joint pdf in (1.4) and is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) .

In all cases, the log-likelihood function, $\ln L_T(\theta)$, is a scalar that represents a summary measure of the data for given $\theta$.

The maximum likelihood estimator of $\theta$ is defined as the value of $\theta$, denoted $\hat{\theta}$, that maximizes the log-likelihood function. In a large number of cases this may be achieved using standard calculus. Chapter 3 discusses numerical approaches to the problem of finding maximum likelihood estimates when analytical solutions do not exist or are difficult to derive.

    Example 1.10 Poisson Distribution

Let $\{y_1, y_2, \dots, y_T\}$ be iid observations from a Poisson distribution

    f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} ,

where $\theta > 0$. The log-likelihood function for the sample is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) = \frac{1}{T} \sum_{t=1}^{T} y_t \ln\theta - \theta - \frac{\ln(y_1! \, y_2! \cdots y_T!)}{T} .

Consider the following $T = 3$ observations, $y_t = \{8, 3, 4\}$. The log-likelihood function is

    \ln L_T(\theta) = \frac{15}{3} \ln\theta - \theta - \frac{\ln(8! \, 3! \, 4!)}{3} = 5 \ln\theta - \theta - 5.191 .

A plot of the log-likelihood function is given in panel (a) of Figure 1.3 for values of $\theta$ ranging from 0 to 10. Even though the Poisson distribution is a discrete distribution in terms of the random variable $y$, the log-likelihood function is continuous in the unknown parameter $\theta$. Inspection shows that a maximum occurs at $\theta = 5$ with a log-likelihood value of

    \ln L_T(5) = 5 \ln 5 - 5 - 5.191 = -2.144 .

The contribution to the log-likelihood function of the first observation, $y_1 = 8$, evaluated at $\theta = 5$, is

    \ln f(y_1; 5) = y_1 \ln 5 - 5 - \ln(y_1!) = 8 \ln 5 - 5 - \ln(8!) = -2.729 .

For the other two observations, the contributions are $\ln f(y_2; 5) = -1.963$ and $\ln f(y_3; 5) = -1.740$. The probabilities $f(y_t; \theta)$ are between 0 and 1 by definition, and therefore all of the contributions are negative because they are computed as the logarithm of $f(y_t; \theta)$. The average of these $T = 3$ contributions is $\ln L_T(5) = -2.144$, which corresponds to the value already given above. A plot of $\ln f(y_t; 5)$ in panel (b) of Figure 1.3 shows that observations closer to $\theta = 5$ make a relatively greater contribution to the log-likelihood function than observations further away, in the sense that they are smaller negative numbers.
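These calculations are easily verified numerically. The following MATLAB sketch (illustrative only, and not one of the program files listed in the exercises) evaluates the Poisson log-likelihood over a grid of parameter values and reproduces the maximum at $\theta = 5$:

    % Poisson log-likelihood for y = {8,3,4} (Example 1.10)
    y   = [8; 3; 4];
    lnL = @(theta) mean(y*log(theta) - theta - gammaln(y+1));  % gammaln(y+1) = ln(y!)

    theta = 0.1:0.01:10;              % grid of parameter values
    lnLv  = arrayfun(lnL, theta);     % log-likelihood over the grid
    [maxlnL, idx] = max(lnLv);
    disp([theta(idx) maxlnL]);        % approximately 5.000 and -2.144
    plot(theta, lnLv);                % compare with panel (a) of Figure 1.3

The grid search is used here purely for illustration; the analytical solution derived in Example 1.14 below is simply the sample mean.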

    Example 1.11 Exponential Distribution

Let $\{y_1, y_2, \dots, y_T\}$ be iid drawings from the exponential distribution

    f(y; \theta) = \theta \exp[-\theta y] ,

where $\theta > 0$. The log-likelihood function for the sample is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) = \frac{1}{T} \sum_{t=1}^{T} (\ln\theta - \theta y_t) = \ln\theta - \theta \frac{1}{T} \sum_{t=1}^{T} y_t .

Consider the following $T = 6$ observations, $y_t = \{2.1, 2.2, 3.1, 1.6, 2.5, 0.5\}$. The log-likelihood function is

    \ln L_T(\theta) = \ln\theta - \theta \frac{1}{T} \sum_{t=1}^{T} y_t = \ln\theta - 2\theta .

Plots of the log-likelihood function, $\ln L_T(\theta)$, and the likelihood function, $L_T(\theta)$, are given in Figure 1.4, which show that a maximum occurs at $\hat{\theta} = 0.5$.

Figure 1.3 Plot of $\ln L_T(\theta)$ (panel (a)) and $\ln f(y_t; \theta = 5)$ (panel (b)) for the Poisson distribution example with a sample size of T = 3.

Figure 1.4 Plot of the log-likelihood function $\ln L_T(\theta)$ (panel (a)) and the likelihood function $L_T(\theta)$ (panel (b)) for the exponential distribution example.

Table 1.2 provides details of the calculations. Let the log-likelihood function at each observation, evaluated at the maximum likelihood estimate, be denoted $\ln l_t(\hat{\theta}) = \ln f(y_t; \hat{\theta})$. The second column shows $\ln l_t(\theta)$ evaluated at $\hat{\theta} = 0.5$,

    \ln l_t(0.5) = \ln(0.5) - 0.5 y_t ,

resulting in a maximum value of the log-likelihood function of

    \ln L_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} \ln l_t(0.5) = \frac{-10.159}{6} = -1.693 .


Table 1.2 Maximum likelihood calculations for the exponential distribution example. The maximum likelihood estimate is $\hat{\theta} = 0.5$.

    y_t    ln l_t(0.5)    g_t(0.5)    h_t(0.5)
    -------------------------------------------
    2.1      -1.743        -0.100      -4.000
    2.2      -1.793        -0.200      -4.000
    3.1      -2.243        -1.100      -4.000
    1.6      -1.493         0.400      -4.000
    2.5      -1.943        -0.500      -4.000
    0.5      -0.943         1.500      -4.000
    -------------------------------------------
    lnL_T(0.5) = -1.693   G_T(0.5) = 0.000   H_T(0.5) = -4.000
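The columns of Table 1.2 follow directly from $\ln l_t(\theta)$ together with the per-observation gradient $g_t(\theta)$ and Hessian $h_t(\theta)$ derived in Examples 1.15 and 1.18 below. A minimal MATLAB sketch that reproduces the table (variable names are illustrative):

    % Exponential example: per-observation quantities at thetahat = 0.5
    y        = [2.1; 2.2; 3.1; 1.6; 2.5; 0.5];
    thetahat = 1/mean(y);                   % MLE = 1/ybar = 0.5
    lnl = log(thetahat) - thetahat*y;       % column 2: ln l_t(0.5)
    g   = 1/thetahat - y;                   % column 3: g_t(0.5)
    h   = -1/thetahat^2*ones(size(y));      % column 4: h_t(0.5)
    disp([y lnl g h]);                      % body of Table 1.2
    disp([mean(lnl) mean(g) mean(h)]);      % -1.693  0.000  -4.000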

    Example 1.12 Normal Distribution

Let $\{y_1, y_2, \dots, y_T\}$ be iid observations drawn from a normal distribution

    f(y; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right] ,

with unknown parameters $\theta = \{\mu, \sigma^2\}$. The log-likelihood function is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta)
                    = \frac{1}{T} \sum_{t=1}^{T} \left( -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{(y_t - \mu)^2}{2\sigma^2} \right)
                    = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu)^2 .

Consider the following $T = 6$ observations, $y_t = \{5, -1, 3, 0, 2, 3\}$. The log-likelihood function is

    \ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{12\sigma^2} \sum_{t=1}^{6} (y_t - \mu)^2 .

A plot of this function in Figure 1.5 shows that a maximum occurs at $\hat{\mu} = 2$ and $\hat{\sigma}^2 = 4$.
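Because $\theta$ now contains two elements, the log-likelihood is a surface rather than a curve. A simple grid evaluation in MATLAB (a sketch with illustrative names, not the book's basic_normal.m) confirms the location of the maximum:

    % Normal example: log-likelihood over a (mu, sigma^2) grid
    y   = [5; -1; 3; 0; 2; 3];
    T   = length(y);
    lnL = @(mu, s2) -0.5*log(2*pi) - 0.5*log(s2) - sum((y-mu).^2)/(2*s2*T);

    mu = 1:0.05:3;  s2 = 2:0.05:6;
    L  = zeros(length(mu), length(s2));
    for i = 1:length(mu)
        for j = 1:length(s2)
            L(i,j) = lnL(mu(i), s2(j));
        end
    end
    [lmax, k] = max(L(:));
    [i, j]    = ind2sub(size(L), k);
    disp([mu(i) s2(j) lmax]);               % approximately 2.00  4.00  -2.11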

    Example 1.13 Autoregressive Model


Figure 1.5 Plot of $\ln L_T(\mu, \sigma^2)$ for the normal distribution example.

From Example 1.9, the log-likelihood function for the AR(1) model is

    \ln L_T(\theta) = \frac{1}{T}\left( \frac{1}{2}\ln(1-\rho^2) - \frac{1-\rho^2}{2\sigma^2}\, y_1^2 \right) - \frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 T} \sum_{t=2}^{T} (y_t - \rho y_{t-1})^2 .

The first term is commonly excluded from $\ln L_T(\theta)$ as its contribution disappears asymptotically since

    \lim_{T \to \infty} \frac{1}{T}\left( \frac{1}{2}\ln(1-\rho^2) - \frac{1-\rho^2}{2\sigma^2}\, y_1^2 \right) = 0 .
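The speed at which the first term vanishes can be checked by simulation. The following sketch uses simulated data under assumed parameter values (not one of the book's data sets) and compares the exact log-likelihood with the conditional version that drops the first term:

    % AR(1): exact versus conditional log-likelihood at the true parameters
    rho = 0.8;  sig2 = 1;  T = 500;
    rng(42);                                % seed chosen arbitrarily
    y    = zeros(T,1);
    y(1) = sqrt(sig2/(1-rho^2))*randn;      % y_1 from the stationary distribution
    for t = 2:T
        y(t) = rho*y(t-1) + sqrt(sig2)*randn;
    end
    e        = y(2:T) - rho*y(1:T-1);
    lnL_cond = -0.5*log(2*pi) - 0.5*log(sig2) - sum(e.^2)/(2*sig2*T);
    first    = (0.5*log(1-rho^2) - (1-rho^2)*y(1)^2/(2*sig2))/T;
    disp([lnL_cond + first, lnL_cond, first]);   % 'first' is O(1/T)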

As the aim of maximum likelihood estimation is to find the value of $\theta$ that maximizes the log-likelihood function, a natural way to proceed is to use the rules of calculus. This involves computing the first and second derivatives of the log-likelihood function with respect to the parameter vector $\theta$.


    1.4.2 Gradient

Differentiating $\ln L_T(\theta)$ with respect to the $(K \times 1)$ parameter vector $\theta$ yields the $(K \times 1)$ gradient vector, also known as the score, given by

    G_T(\theta) = \frac{\partial \ln L_T(\theta)}{\partial \theta}
                = \begin{bmatrix} \partial \ln L_T(\theta)/\partial\theta_1 \\ \partial \ln L_T(\theta)/\partial\theta_2 \\ \vdots \\ \partial \ln L_T(\theta)/\partial\theta_K \end{bmatrix}
                = \frac{1}{T} \sum_{t=1}^{T} g_t(\theta) ,   (1.9)

where the subscript $T$ emphasizes that the gradient is the sample average of the individual gradients

    g_t(\theta) = \frac{\partial \ln l_t(\theta)}{\partial \theta} .

The maximum likelihood estimator of $\theta$, denoted $\hat{\theta}$, is obtained by setting the gradient to zero and solving the resultant $K$ first-order conditions. The maximum likelihood estimator, $\hat{\theta}$, therefore satisfies the condition

    G_T(\hat{\theta}) = \left. \frac{\partial \ln L_T(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0 .   (1.10)

    Example 1.14 Poisson Distribution

From Example 1.10, the first derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    G_T(\theta) = \frac{1}{\theta} \frac{1}{T} \sum_{t=1}^{T} y_t - 1 .

The maximum likelihood estimator is the solution of the first-order condition

    \frac{1}{\hat{\theta}} \frac{1}{T} \sum_{t=1}^{T} y_t - 1 = 0 ,

which yields the sample mean as the maximum likelihood estimator

    \hat{\theta} = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar{y} .

Using the data for $y_t$ in Example 1.10, the maximum likelihood estimate is $\hat{\theta} = 15/3 = 5$. Evaluating the gradient at $\hat{\theta} = 5$ verifies that it is zero at the maximum likelihood estimate

    G_T(\hat{\theta}) = \frac{1}{\hat{\theta}} \frac{1}{T} \sum_{t=1}^{T} y_t - 1 = \frac{1}{5} \cdot \frac{15}{3} - 1 = 0 .
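When gradients are derived analytically, a quick numerical check guards against algebra errors. For the Poisson example, a central difference approximation of the derivative should agree with $G_T(\theta)$; the step size h below is an arbitrary illustrative choice:

    % Check the analytic Poisson gradient against a central difference
    y   = [8; 3; 4];
    lnL = @(theta) mean(y*log(theta) - theta - gammaln(y+1));
    G   = @(theta) mean(y)/theta - 1;        % analytic gradient

    theta = 5;  h = 1e-6;
    G_num = (lnL(theta+h) - lnL(theta-h))/(2*h);
    disp([G(theta) G_num]);                  % both approximately zero at thetahat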

    Example 1.15 Exponential Distribution

From Example 1.11, the first derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    G_T(\theta) = \frac{1}{\theta} - \frac{1}{T} \sum_{t=1}^{T} y_t .

Setting $G_T(\hat{\theta}) = 0$ and solving the resultant first-order condition yields

    \hat{\theta} = \frac{T}{\sum_{t=1}^{T} y_t} = \frac{1}{\bar{y}} ,

which is the reciprocal of the sample mean. Using the same observed data for $y_t$ as in Example 1.11, the maximum likelihood estimate is $\hat{\theta} = 6/12 = 0.5$. The third column of Table 1.2 gives the gradients at each observation evaluated at $\hat{\theta} = 0.5$,

    g_t(0.5) = \frac{1}{0.5} - y_t .

The gradient is

    G_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} g_t(0.5) = 0 ,

which follows from the properties of the maximum likelihood estimator.

    Example 1.16 Normal Distribution

From Example 1.12, the first derivatives of the log-likelihood function are

    \frac{\partial \ln L_T(\theta)}{\partial \mu} = \frac{1}{\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu) ,   \frac{\partial \ln L_T(\theta)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)^2 ,

yielding the gradient vector

    G_T(\theta) = \begin{bmatrix} \dfrac{1}{\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu) \\[1ex] -\dfrac{1}{2\sigma^2} + \dfrac{1}{2\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)^2 \end{bmatrix} .


Evaluating the gradient at $\hat{\theta}$ and setting $G_T(\hat{\theta}) = 0$ gives

    G_T(\hat{\theta}) = \begin{bmatrix} \dfrac{1}{\hat{\sigma}^2 T} \sum_{t=1}^{T} (y_t - \hat{\mu}) \\[1ex] -\dfrac{1}{2\hat{\sigma}^2} + \dfrac{1}{2\hat{\sigma}^4 T} \sum_{t=1}^{T} (y_t - \hat{\mu})^2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} .

Solving for $\hat{\theta} = \{\hat{\mu}, \hat{\sigma}^2\}$, the maximum likelihood estimators are

    \hat{\mu} = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar{y} ,   \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^2 .

Using the data from Example 1.12, the maximum likelihood estimates are

    \hat{\mu} = \frac{5 - 1 + 3 + 0 + 2 + 3}{6} = 2

    \hat{\sigma}^2 = \frac{(5-2)^2 + (-1-2)^2 + (3-2)^2 + (0-2)^2 + (2-2)^2 + (3-2)^2}{6} = 4 ,

which agree with the values given in Example 1.12.
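Since the first-order conditions solve in closed form, the normal MLEs require only two lines of code. A minimal sketch:

    % Closed-form normal MLEs for the data of Example 1.12
    y     = [5; -1; 3; 0; 2; 3];
    muhat = mean(y);                  % 2
    s2hat = mean((y - muhat).^2);     % 4: divides by T, unlike var(y) which uses T-1
    disp([muhat s2hat]);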

    1.4.3 Hessian

To establish that $\hat{\theta}$ maximizes the log-likelihood function, it is necessary to determine that the Hessian

    H_T(\theta) = \frac{\partial^2 \ln L_T(\theta)}{\partial\theta \, \partial\theta'} ,   (1.11)

associated with the log-likelihood function is negative definite. As $\theta$ is a $(K \times 1)$ vector, the Hessian is the $(K \times K)$ symmetric matrix

    H_T(\theta) = \begin{bmatrix}
    \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1 \partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1 \partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1 \partial\theta_K} \\[1ex]
    \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_2 \partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_2 \partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_2 \partial\theta_K} \\
    \vdots & \vdots & \ddots & \vdots \\
    \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K \partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K \partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K \partial\theta_K}
    \end{bmatrix} = \frac{1}{T} \sum_{t=1}^{T} h_t(\theta) ,


where the subscript $T$ emphasizes that the Hessian is the sample average of the individual elements

    h_t(\theta) = \frac{\partial^2 \ln l_t(\theta)}{\partial\theta \, \partial\theta'} .

The second-order condition for a maximum requires that the Hessian matrix evaluated at $\hat{\theta}$,

    H_T(\hat{\theta}) = \left. \frac{\partial^2 \ln L_T(\theta)}{\partial\theta \, \partial\theta'} \right|_{\theta = \hat{\theta}} ,   (1.12)

is negative definite. The conditions for negative definiteness are that the leading principal minors alternate in sign,

    |H_{11}| < 0 ,   \begin{vmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{vmatrix} > 0 ,   \begin{vmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{vmatrix} < 0 , \dots

where $H_{ij}$ is the $ij$-th element of $H_T(\hat{\theta})$. In the case of $K = 1$, the condition is

    H_{11} < 0 .   (1.13)

For the case of $K = 2$, the conditions are

    H_{11} < 0 ,   H_{11} H_{22} - H_{12} H_{21} > 0 .   (1.14)
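For a numerical Hessian of any dimension, negative definiteness is most conveniently checked through the eigenvalues, which must all be strictly negative; for K = 2 this is equivalent to the determinant conditions in (1.14). A short sketch using the Hessian obtained for the normal example in Example 1.19 below:

    % Check negative definiteness of a Hessian matrix
    H = [-0.250  0.000;
          0.000 -0.031];              % H_T from the normal example (Example 1.19)
    disp(eig(H));                     % all eigenvalues negative => maximum
    disp([H(1,1) < 0, det(H) > 0]);   % equivalent check from equation (1.14)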

    Example 1.17 Poisson Distribution

From Examples 1.10 and 1.14, the second derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    H_T(\theta) = -\frac{1}{\theta^2} \frac{1}{T} \sum_{t=1}^{T} y_t .

Evaluating the Hessian at the maximum likelihood estimator, $\hat{\theta} = \bar{y}$, yields

    H_T(\hat{\theta}) = -\frac{1}{\hat{\theta}^2} \frac{1}{T} \sum_{t=1}^{T} y_t = -\frac{1}{\bar{y}^2} \frac{1}{T} \sum_{t=1}^{T} y_t = -\frac{1}{\bar{y}} < 0 .

As $\bar{y}$ is the mean of a sample of non-negative counts, it is positive whenever at least one observation is nonzero, so the Hessian is negative and a maximum is achieved. Using the data for $y_t$ in Example 1.10 verifies that the Hessian at $\hat{\theta} = 5$ is negative

    H_T(\hat{\theta}) = -\frac{1}{\hat{\theta}^2} \frac{1}{T} \sum_{t=1}^{T} y_t = -\frac{1}{5^2} \cdot \frac{15}{3} = -0.200 .


    Example 1.18 Exponential Distribution

From Examples 1.11 and 1.15, the second derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    H_T(\theta) = -\frac{1}{\theta^2} .

Evaluating the Hessian at the maximum likelihood estimator yields

    H_T(\hat{\theta}) = -\frac{1}{\hat{\theta}^2} < 0 .

As this term is negative for any $\hat{\theta}$, the condition in equation (1.13) is satisfied and a maximum is achieved. The last column of Table 1.2 shows that the Hessian at each observation, evaluated at the maximum likelihood estimate, is constant. The value of the Hessian is

    H_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} h_t(0.5) = \frac{-24.000}{6} = -4 ,

which is negative, confirming that a maximum has been reached.

    Example 1.19 Normal Distribution

From Examples 1.12 and 1.16, the second derivatives of $\ln L_T(\theta)$ with respect to $\theta$ are

    \frac{\partial^2 \ln L_T(\theta)}{\partial\mu^2} = -\frac{1}{\sigma^2}
    \frac{\partial^2 \ln L_T(\theta)}{\partial\mu \, \partial\sigma^2} = -\frac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)
    \frac{\partial^2 \ln L_T(\theta)}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{1}{\sigma^6 T} \sum_{t=1}^{T} (y_t - \mu)^2 ,

so that the Hessian is

    H_T(\theta) = \begin{bmatrix} -\dfrac{1}{\sigma^2} & -\dfrac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu) \\[1ex] -\dfrac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu) & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6 T} \sum_{t=1}^{T} (y_t - \mu)^2 \end{bmatrix} .

Given that $G_T(\hat{\theta}) = 0$, from Example 1.16 it follows that $\sum_{t=1}^{T} (y_t - \hat{\mu}) = 0$


and, from the second element of $G_T(\hat{\theta})$, that $\sum_{t=1}^{T} (y_t - \hat{\mu})^2 = T\hat{\sigma}^2$. Therefore

    H_T(\hat{\theta}) = \begin{bmatrix} -\dfrac{1}{\hat{\sigma}^2} & 0 \\[1ex] 0 & -\dfrac{1}{2\hat{\sigma}^4} \end{bmatrix} .

From equation (1.14),

    H_{11} = -\frac{1}{\hat{\sigma}^2} < 0 ,   H_{11} H_{22} - H_{12} H_{21} = \left( -\frac{1}{\hat{\sigma}^2} \right)\left( -\frac{1}{2\hat{\sigma}^4} \right) - 0^2 > 0 ,

establishing that the second-order condition for a maximum is satisfied. Using the maximum likelihood estimates from Example 1.16, the Hessian is

    H_T(\hat{\mu}, \hat{\sigma}^2) = \begin{bmatrix} -\dfrac{1}{4} & 0 \\[1ex] 0 & -\dfrac{1}{2 \cdot 4^2} \end{bmatrix} = \begin{bmatrix} -0.250 & 0.000 \\ 0.000 & -0.031 \end{bmatrix} .

    1.5 Applications

To highlight the features of maximum likelihood estimation discussed thus far, two applications are presented that focus on estimating the discrete time version of the Vasicek (1977) model of interest rates, $r_t$. The first application is based on the marginal (stationary) distribution, while the second focuses on the conditional (transitional) distribution, which gives the distribution of $r_t$ conditional on $r_{t-1}$. The interest rate data used are from Aït-Sahalia (1996). The data, plotted in Figure 1.6, consist of daily 7-day Eurodollar rates (expressed as percentages) for the period 1 June 1973 to 25 February 1995, a total of T = 5505 observations.

The Vasicek model expresses the change in the interest rate, $r_t$, as a function of a constant and the lagged interest rate

    r_t - r_{t-1} = \alpha + \beta r_{t-1} + u_t ,   u_t \sim iid\, N(0, \sigma^2) ,   (1.15)

where $\theta = \{\alpha, \beta, \sigma^2\}$ are unknown parameters, with the restriction $\beta < 0$.

    1.5.1 Stationary Distribution of the Vasicek Model

Figure 1.6 Daily 7-day Eurodollar interest rates from 1 June 1973 to 25 February 1995, expressed as a percentage.

As a preliminary step to estimating the parameters of the Vasicek model in equation (1.15), consider the alternative model where the level of the interest rate is independent of previous interest rates

    r_t = \mu_s + v_t ,   v_t \sim iid\, N(0, \sigma_s^2) .

The stationary distribution of $r_t$ for this model is

    f(r; \mu_s, \sigma_s^2) = \frac{1}{\sqrt{2\pi\sigma_s^2}} \exp\left[ -\frac{(r - \mu_s)^2}{2\sigma_s^2} \right] .   (1.16)

The relationship between the parameters of the stationary distribution and the parameters of the model in equation (1.15) is

    \mu_s = -\frac{\alpha}{\beta} ,   \sigma_s^2 = -\frac{\sigma^2}{\beta(2+\beta)} ,   (1.17)

which are obtained as the unconditional mean and variance of (1.15).

The log-likelihood function based on the stationary distribution in equation (1.16) for a sample of T observations is

    \ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma_s^2 - \frac{1}{2\sigma_s^2 T} \sum_{t=1}^{T} (r_t - \mu_s)^2 ,

where $\theta = \{\mu_s, \sigma_s^2\}$. Maximizing $\ln L_T(\theta)$ with respect to $\theta$ gives

    \hat{\mu}_s = \frac{1}{T} \sum_{t=1}^{T} r_t ,   \hat{\sigma}_s^2 = \frac{1}{T} \sum_{t=1}^{T} (r_t - \hat{\mu}_s)^2 .   (1.18)

Using the Eurodollar interest rates, the maximum likelihood estimates are

    \hat{\mu}_s = 8.362 ,   \hat{\sigma}_s^2 = 12.893 .   (1.19)
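These estimators are immediate to compute. A minimal sketch, assuming r is a (T x 1) vector already holding the Eurodollar rates (this is not the book's basic_stationary program):

    % Stationary (marginal) MLEs of the Vasicek model, equation (1.18)
    % r is assumed to contain the Eurodollar rates in percent
    mus_hat = mean(r);                 % 8.362 for the Eurodollar data
    s2s_hat = mean((r - mus_hat).^2);  % 12.893 for the Eurodollar data
    disp([mus_hat s2s_hat]);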


Figure 1.7 Estimated stationary distribution of the Vasicek model based on evaluating (1.16) at the maximum likelihood estimates (1.19), using daily Eurodollar rates from 1 June 1973 to 25 February 1995.

The stationary distribution is estimated by evaluating equation (1.16) at the maximum likelihood estimates in (1.19) and is given by

    f(r; \hat{\mu}_s, \hat{\sigma}_s^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}_s^2}} \exp\left[ -\frac{(r - \hat{\mu}_s)^2}{2\hat{\sigma}_s^2} \right] = \frac{1}{\sqrt{2\pi \times 12.893}} \exp\left[ -\frac{(r - 8.362)^2}{2 \times 12.893} \right] ,   (1.20)

which is presented in Figure 1.7.

Inspection of the estimated distribution shows a potential problem with the Vasicek stationary distribution, namely that the support of the distribution is not restricted to be positive. The probability of negative values for the interest rate is

    \Pr(r < 0) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi \times 12.893}} \exp\left[ -\frac{(r - 8.362)^2}{2 \times 12.893} \right] dr = 0.01 .
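This probability is a single normal cdf evaluation. In base MATLAB it can be computed with erfc (with the Statistics Toolbox, normcdf(0, 8.362, sqrt(12.893)) gives the same answer):

    % Pr(r < 0) under the estimated stationary distribution
    mus = 8.362;  s2s = 12.893;
    p   = 0.5*erfc(mus/sqrt(2*s2s));   % standard normal cdf evaluated at -mus/sigma_s
    disp(p);                           % approximately 0.01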

To avoid this problem, alternative models of interest rates are specified in which the stationary distribution is defined only over the positive region. A well-known example is the CIR interest rate model (Cox, Ingersoll and Ross, 1985), which is discussed in Chapters 2, 3 and 12.

    1.5.2 Transitional Distribution of the Vasicek Model

In contrast to the stationary model specification of the previous section, the full dynamics of the Vasicek model in equation (1.15) are now used by specifying the transitional distribution

    f(r | r_{t-1}; \alpha, \rho, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(r - \alpha - \rho r_{t-1})^2}{2\sigma^2} \right] ,   (1.21)

where $\theta = \{\alpha, \rho, \sigma^2\}$ and the substitution $\rho = 1 + \beta$ is made for convenience. This distribution is now of the same form as the conditional distribution of the AR(1) model in Examples 1.5, 1.9 and 1.13.

The log-likelihood function based on the transitional distribution in equation (1.21) is

    \ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1})^2 ,

where the sample size is reduced by one observation as a result of the lagged term $r_{t-1}$. This form of the log-likelihood function does not contain the marginal distribution $f(r_1; \theta)$, a point that is made in Example 1.13. The first derivatives of the log-likelihood function are

    \frac{\partial \ln L_T(\theta)}{\partial \alpha} = \frac{1}{\sigma^2 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1})
    \frac{\partial \ln L_T(\theta)}{\partial \rho} = \frac{1}{\sigma^2 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1}) r_{t-1}
    \frac{\partial \ln L_T(\theta)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1})^2 .

Setting these derivatives to zero yields the maximum likelihood estimators

    \hat{\alpha} = \bar{r}_t - \hat{\rho}\, \bar{r}_{t-1}
    \hat{\rho} = \frac{\sum_{t=2}^{T} (r_t - \bar{r}_t)(r_{t-1} - \bar{r}_{t-1})}{\sum_{t=2}^{T} (r_{t-1} - \bar{r}_{t-1})^2}
    \hat{\sigma}^2 = \frac{1}{T-1} \sum_{t=2}^{T} (r_t - \hat{\alpha} - \hat{\rho} r_{t-1})^2 ,

where

    \bar{r}_t = \frac{1}{T-1} \sum_{t=2}^{T} r_t ,   \bar{r}_{t-1} = \frac{1}{T-1} \sum_{t=2}^{T} r_{t-1} .


The maximum likelihood estimates for the Eurodollar interest rates are

    \hat{\alpha} = 0.053 ,   \hat{\rho} = 0.994 ,   \hat{\sigma}^2 = 0.165 .   (1.22)
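The estimators derived above translate directly into code. A minimal sketch, again assuming r holds the Eurodollar rate series (names are illustrative); the final two lines anticipate the invariance calculation discussed below:

    % Transitional MLEs of the Vasicek model
    r0 = r(1:end-1);  r1 = r(2:end);    % r_{t-1} and r_t
    rho_hat   = sum((r1-mean(r1)).*(r0-mean(r0)))/sum((r0-mean(r0)).^2);
    alpha_hat = mean(r1) - rho_hat*mean(r0);
    e         = r1 - alpha_hat - rho_hat*r0;
    sig2_hat  = mean(e.^2);             % averages over the T-1 usable observations
    beta_hat  = rho_hat - 1;
    % Implied stationary moments via equation (1.17)
    mus_hat = -alpha_hat/beta_hat;
    s2s_hat = -sig2_hat/(beta_hat*(2+beta_hat));
    disp([alpha_hat rho_hat sig2_hat beta_hat mus_hat s2s_hat]);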

An estimate of $\beta$ is obtained by using the relationship $\rho = 1 + \beta$. Rearranging for $\beta$ and evaluating at $\hat{\rho}$ gives $\hat{\beta} = \hat{\rho} - 1 = -0.006$. The estimated transitional distribution is obtained by evaluating (1.21)

at the maximum likelihood estimates in (1.22),

    f(r | r_{t-1}; \hat{\alpha}, \hat{\rho}, \hat{\sigma}^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}} \exp\left[ -\frac{(r - \hat{\alpha} - \hat{\rho} r_{t-1})^2}{2\hat{\sigma}^2} \right] .   (1.23)

Plots of this distribution are given in Figure 1.8 for three values of the conditioning variable $r_{t-1}$, corresponding to the minimum (2.9%), median (8.1%) and maximum (24.3%) interest rates in the sample.

Figure 1.8 Estimated transitional distribution of the Vasicek model, based on evaluating (1.23) at the maximum likelihood estimates in (1.22) using Eurodollar rates from 1 June 1973 to 25 February 1995. The dashed line is the transitional density for the minimum (2.9%), the solid line for the median (8.1%) and the dotted line for the maximum (24.3%) Eurodollar rate.

The location of the three transitional distributions changes over time, while the spread of each distribution remains constant at $\hat{\sigma}^2 = 0.165$. A comparison of the estimated variances of the stationary and transitional distributions, in equations (1.19) and (1.22) respectively, shows that $\hat{\sigma}^2 < \hat{\sigma}_s^2$. This result reflects the property that, by conditioning on information, in this case $r_{t-1}$, the transitional distribution is better at tracking the time series behaviour of the interest rate, $r_t$, than the stationary distribution, where there is no conditioning on lagged dependent variables.


Having obtained the estimated transitional distribution using the maximum likelihood estimates in (1.22), it is also possible to use these estimates to re-estimate the stationary interest rate distribution in (1.20) by using the expressions in (1.17). The alternative estimates of the mean and variance of the stationary distribution are

    \hat{\mu}_s = -\frac{\hat{\alpha}}{\hat{\beta}} = \frac{0.053}{0.006} = 8.308 ,
    \hat{\sigma}_s^2 = -\frac{\hat{\sigma}^2}{\hat{\beta}(2+\hat{\beta})} = \frac{0.165}{0.006 (2 - 0.006)} = 12.967 ,

where the reported values are computed using the unrounded parameter estimates.

As these estimates are based on the transitional distribution, which incorporates the full dynamic specification of the Vasicek model, they represent the maximum likelihood estimates of the parameters of the stationary distribution. This relationship between the maximum likelihood estimators of the transitional and stationary distributions rests on the invariance property of maximum likelihood estimators, which is discussed in Chapter 2. While the parameter estimates of the stationary distribution obtained from the estimates of the transitional distribution are numerically close to the estimates obtained in the previous section, the latter estimates come from a misspecified model, as the stationary model excludes the dynamic structure in equation (1.15). Issues relating to misspecified models are discussed in Chapter 9.

    1.6 Exercises

    (1) Sampling Data

    Gauss file(s) basic_sample.g

    Matlab file(s) basic_sample.m

This exercise reproduces the simulation results in Figures 1.1 and 1.2. For each model, simulate T = 5 draws of $y_t$ and plot the corresponding distribution at each point in time. Where applicable, the explanatory variable in these exercises is $x_t = \{0, 1, 2, 3, 4\}$ and $w_t$ are draws from a uniform distribution on the unit circle.

    (a) Time invariant model

    y_t = 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (b) Count model

    f(y; 2) = \frac{2^y \exp[-2]}{y!} ,   y = 0, 1, 2, \dots


    (c) Linear regression model

    y_t = 3 x_t + 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (d) Exponential regression model

    f(y; \theta) = \frac{1}{\mu_t} \exp\left[ -\frac{y}{\mu_t} \right] ,   \mu_t = 1 + 2 x_t .

    (e) Autoregressive model

    y_t = 0.8 y_{t-1} + 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (f) Bilinear time series model

    y_t = 0.8 y_{t-1} + 0.4 y_{t-1} u_{t-1} + 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (g) Autoregressive model with heteroskedasticity

    y_t = 0.8 y_{t-1} + \sigma_t z_t ,   z_t \sim iid\, N(0, 1)
    \sigma_t^2 = 0.8 + 0.8 w_t .

    (h) The ARCH regression model

    y_t = 3 x_t + u_t
    u_t = \sigma_t z_t
    \sigma_t^2 = 4 + 0.9 u_{t-1}^2
    z_t \sim iid\, N(0, 1) .

    (2) Poisson Distribution

    Gauss file(s) basic_poisson.g

    Matlab file(s) basic_poisson.m

A sample of T = 4 observations, $y_t = \{6, 2, 3, 1\}$, is drawn from the Poisson distribution

    f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} .

(a) Write the log-likelihood function, $\ln L_T(\theta)$.

(b) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.

(c) Compute the maximum likelihood estimate, $\hat{\theta}$.

(d) Compute the log-likelihood function at $\hat{\theta}$ for each observation.

(e) Compute the value of the log-likelihood function at $\hat{\theta}$.


(f) Compute

    g_t(\hat{\theta}) = \left. \frac{d \ln l_t(\theta)}{d\theta} \right|_{\theta = \hat{\theta}}   and   h_t(\hat{\theta}) = \left. \frac{d^2 \ln l_t(\theta)}{d\theta^2} \right|_{\theta = \hat{\theta}} ,

for each observation.

(g) Compute

    G_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} g_t(\hat{\theta})   and   H_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} h_t(\hat{\theta}) .

    (3) Exponential Distribution

    Gauss file(s) basic_exp.g

    Matlab file(s) basic_exp.m

A sample of T = 4 observations, $y_t = \{5.5, 2.0, 3.5, 5.0\}$, is drawn from the exponential distribution

    f(y; \theta) = \theta \exp[-\theta y] .

(a) Write the log-likelihood function, $\ln L_T(\theta)$.

(b) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.

(c) Compute the maximum likelihood estimate, $\hat{\theta}$.

(d) Compute the log-likelihood function at $\hat{\theta}$ for each observation.

(e) Compute the value of the log-likelihood function at $\hat{\theta}$.

(f) Compute

    g_t(\hat{\theta}) = \left. \frac{d \ln l_t(\theta)}{d\theta} \right|_{\theta = \hat{\theta}}   and   h_t(\hat{\theta}) = \left. \frac{d^2 \ln l_t(\theta)}{d\theta^2} \right|_{\theta = \hat{\theta}} ,

for each observation.

(g) Compute

    G_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} g_t(\hat{\theta})   and   H_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} h_t(\hat{\theta}) .

    (4) Alternative Form of Exponential Distribution

Consider a random sample of size T, $\{y_1, y_2, \dots, y_T\}$, of iid random variables from the exponential distribution with parameter $\mu$

    f(y; \mu) = \frac{1}{\mu} \exp\left[ -\frac{y}{\mu} \right] .

(a) Derive the log-likelihood function, $\ln L_T(\mu)$.

(b) Derive the first derivative of the log-likelihood function, $G_T(\mu)$.

(c) Derive the second derivative of the log-likelihood function, $H_T(\mu)$.

(d) Derive the maximum likelihood estimator of $\mu$. Compare the result with that obtained in Exercise 3.

    (5) Normal Distribution

    Gauss file(s) basic_normal.g, basic_normal_like.g

    Matlab file(s) basic_normal.m, basic_normal_like.m

A sample of T = 5 observations consisting of the values $\{1, 2, 5, 1, 2\}$ is drawn from the normal distribution

    f(y; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right] ,

where $\theta = \{\mu, \sigma^2\}$.

(a) Assume that $\sigma^2 = 1$.

(i) Derive the log-likelihood function, $\ln L_T(\theta)$.

(ii) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.

(iii) Compute the maximum likelihood estimate, $\hat{\theta}$.

(iv) Compute $\ln l_t(\hat{\theta})$, $g_t(\hat{\theta})$ and $h_t(\hat{\theta})$.

(v) Compute $\ln L_T(\hat{\theta})$, $G_T(\hat{\theta})$ and $H_T(\hat{\theta})$.

(b) Repeat part (a) for the case where both the mean and the variance are unknown, $\theta = \{\mu, \sigma^2\}$.

    (6) A Model of the Number of Strikes

    Gauss file(s) basic_count.g, strike.dat

    Matlab file(s) basic_count.m, strike.mat

The data are the number of strikes per annum, $y_t$, in the U.S. from 1968 to 1976, taken from Kennan (1985). The number of strikes is specified as a Poisson-distributed random variable with unknown parameter $\theta$

    f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} .

(a) Write the log-likelihood function for a sample of T observations.

(b) Derive and interpret the maximum likelihood estimator of $\theta$.

(c) Estimate $\theta$ and interpret the result.

(d) Use the estimate from part (c) to plot the distribution of the number of strikes and interpret this plot.


(e) Compute a histogram of $y_t$ and comment on its consistency with the distribution of strike numbers estimated in part (d).

    (7) A Model of the Duration of Strikes

    Gauss file(s) basic_strike.g, strike.dat

    Matlab file(s) basic_strike.m, strike.mat

The data are 62 observations, taken from the same source as Exercise 6, on the duration of strikes in the U.S. per annum, expressed in days, $y_t$. Durations are assumed to be drawn from an exponential distribution with unknown parameter $\mu$

    f(y; \mu) = \frac{1}{\mu} \exp\left[ -\frac{y}{\mu} \right] .

(a) Write the log-likelihood function for a sample of T observations.

(b) Derive and interpret the maximum likelihood estimator of $\mu$.

(c) Use the data on strike durations to estimate $\mu$. Interpret the result.

(d) Use the estimate from part (c) to plot the distribution of strike durations and interpret this plot.

(e) Compute a histogram of $y_t$ and comment on its consistency with the distribution of duration times estimated in part (d).

    (8) Asset Prices

    Gauss file(s) basic_assetprices.g, assetprices.xls

    Matlab file(s) basic_assetprices.m, assetprices.mat

The data consist of the Australian, Singapore and NASDAQ stock market indexes for the period 3 January 1989 to 31 December 2009, a total of T = 5478 observations. Consider the following model of asset prices, $p_t$, commonly adopted in the financial econometrics literature

    \ln p_t - \ln p_{t-1} = \mu + u_t ,   u_t \sim iid\, N(0, \sigma^2) ,

where $\theta = \{\mu, \sigma^2\}$ are unknown parameters.

(a) Use the transformation-of-variable technique to show that the conditional distribution of $p_t$ is the log-normal distribution

    f(p | p_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}\, p} \exp\left[ -\frac{(\ln p - \ln p_{t-1} - \mu)^2}{2\sigma^2} \right] .

(b) For a sample of size T, construct the log-likelihood function and derive the maximum likelihood estimator of $\theta$ based on the conditional distribution of $p_t$.


(c) Use the results in part (b) to compute $\hat{\theta}$ for the three stock indexes.

(d) Estimate the asset price distribution for each index using the maximum likelihood parameter estimates obtained in part (c).

(e) Letting $r_t = \ln p_t - \ln p_{t-1}$ represent the return on an asset, derive the maximum likelihood estimator of $\theta$ based on the distribution of $r_t$. Compute $\hat{\theta}$ for the three stock market indexes and compare the estimates with those obtained in part (c).

    (9) Stationary Distribution of the Vasicek Model

    Gauss file(s) basic_stationary.g, eurodata.dat

    Matlab file(s) basic_stationary.m, eurodata.mat

The data are daily 7-day Eurodollar rates, expressed as percentages, from 1 June 1973 to 25 February 1995, a total of T = 5505 observations. The Vasicek discret