MARTIN, V. L.; A. S. HURN AND D. HARRIS. Econometric Modelling with Time Series: Specification, Estimation and Testing.
Econometric Modelling with Time Series
Specification, Estimation and Testing
V. L. Martin, A. S. Hurn and D. Harris
Preface
This book provides a general framework for specifying, estimating and testing time series econometric models. Special emphasis is given to estimation by maximum likelihood, but other methods are also discussed, including quasi-maximum likelihood estimation, generalized method of moments, nonparametrics and estimation by simulation. An important advantage of adopting the principle of maximum likelihood as the unifying framework for the book is that many of the estimators and test statistics proposed in econometrics can be derived within a likelihood framework, thereby providing a coherent vehicle for understanding their properties and interrelationships.

In contrast to many existing econometric textbooks, which deal mainly with the theoretical properties of estimators and test statistics through a theorem-proof presentation, this book is very concerned with implementation issues in order to provide a fast-track between the theory and applied work. Consequently, many of the econometric methods discussed in the book are illustrated by means of a suite of programs written in GAUSS and MATLAB. The computer code emphasizes the computational side of econometrics and follows the notation in the book as closely as possible, thereby reinforcing the principles presented in the text. More generally, the computer code also helps to bridge the gap between theory and practice by enabling the reproduction of both theoretical and empirical results published in recent journal articles. The reader, as a result, may build on the code and tailor it to more involved applications.
Organization of the Book
Part ONE of the book is an exposition of the basic maximum likelihood framework. To implement this approach, three conditions are required: the probability distribution of the stochastic process must be known and specified correctly, the parametric specifications of the moments of the distribution must be known and specified correctly, and the likelihood must be tractable. The properties of maximum likelihood estimators are presented, and three fundamental testing procedures, namely the Likelihood Ratio test, the Wald test and the Lagrange Multiplier test, are discussed in detail. There is also a comprehensive treatment of iterative algorithms to compute maximum likelihood estimators when no analytical expressions are available.
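The flavour of these iterative algorithms can be conveyed with a minimal sketch. The Python fragment below (an illustration only; the book's companion programs are written in GAUSS and MATLAB) applies Newton-Raphson updates to the log-likelihood of an exponential distribution with mean theta, a case where the analytical maximum likelihood estimator is simply the sample mean:

```python
import numpy as np

def newton_raphson_exp(y, theta=1.0, tol=1e-8, max_iter=100):
    """Newton-Raphson iterations for the MLE of f(y; theta) =
    (1/theta) * exp(-y/theta), whose log-likelihood is
    lnL(theta) = -n*log(theta) - sum(y)/theta."""
    n, s = len(y), float(np.sum(y))
    for _ in range(max_iter):
        grad = -n / theta + s / theta**2        # d lnL / d theta
        hess = n / theta**2 - 2 * s / theta**3  # d^2 lnL / d theta^2
        step = grad / hess
        theta -= step                           # Newton-Raphson update
        if abs(step) < tol:
            break
    return theta

y = np.array([2.1, 0.7, 3.5, 1.2, 2.5])
theta_hat = newton_raphson_exp(y)  # converges to the sample mean
```

Because this log-likelihood has a closed-form maximiser, the iterations can be checked against the analytical answer; the same update scheme applies unchanged when no closed form exists.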
Part TWO is the usual regression framework taught in standard econometric courses but presented within the maximum likelihood framework. Both nonlinear regression models and non-spherical models exhibiting either autocorrelation or heteroskedasticity, or both, are presented. A further advantage of the maximum likelihood strategy is that it provides a mechanism for deriving new estimators and new test statistics, which are designed specifically for non-standard problems.
(GAUSS is a registered trademark of Aptech Systems, Inc., http://www.aptech.com/, and MATLAB is a registered trademark of The MathWorks, Inc., http://www.mathworks.com/.)
Part THREE provides a coherent treatment of a number of alternative estimation procedures which are applicable when the conditions to implement maximum likelihood estimation are not satisfied. For the case where the probability distribution is incorrectly specified, quasi-maximum likelihood is appropriate. If the joint probability distribution of the data is treated as unknown, then a generalized method of moments estimator is adopted. This estimator has the advantage of circumventing the need to specify the distribution and hence avoids any potential misspecification from an incorrect choice of the distribution. An even less restrictive approach is not to specify either the distribution or the parametric form of the moments of the distribution and to use nonparametric procedures to model either the distribution of variables or the relationships between variables. Simulation estimation methods are used for models where the likelihood is intractable, arising, for example, from the presence of latent variables. Indirect inference, efficient method of moments and simulated method of moments are presented and compared.
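To give a flavour of the generalized method of moments estimator described above, the following Python sketch (an illustration under assumed moment conditions, not the book's GAUSS/MATLAB code) minimises the GMM objective Q(theta) = m(theta)' W m(theta) with identity weight matrix W, for the just-identified problem of estimating a mean and variance without specifying a distribution:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, y):
    """GMM objective Q(theta) = m(theta)' W m(theta) with W = I,
    built from the moment conditions E[y] - mu = 0 and
    E[y^2] - mu^2 - sigma2 = 0."""
    mu, sigma2 = theta
    m = np.array([y.mean() - mu, (y**2).mean() - mu**2 - sigma2])
    return m @ m

rng = np.random.default_rng(12345)
y = rng.normal(1.0, 2.0, size=5000)  # simulated data, true (mu, sigma2) = (1, 4)

res = minimize(gmm_objective, x0=[0.0, 1.0], args=(y,),
               method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 2000})
mu_hat, sigma2_hat = res.x
```

In this just-identified case the GMM estimates reproduce the sample moments exactly; the over-identified case, with more moment conditions than parameters, is where the choice of weight matrix and the over-identification test discussed in Chapter 10 come into play.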
Part FOUR examines stationary time series models with a special emphasis on using maximum likelihood methods to estimate and test these models. Both single equation models, including the autoregressive moving average class of models, and multiple equation models, including vector autoregressions and structural vector autoregressions, are dealt with in detail. Also discussed are linear factor models where the factors are treated as latent. The presence of the latent factor means that the full likelihood is generally not tractable. However, if the models are specified in terms of the normal distribution with moments based on linear parametric representations, a Kalman filter is used to rewrite the likelihood in terms of the observable variables, thereby making estimation and testing by maximum likelihood feasible.
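The Kalman filter recursions alluded to here can be sketched for a single latent factor f_t with AR(1) dynamics, y_t = lam*f_t + eps_t and f_t = phi*f_{t-1} + eta_t with |phi| < 1. The Python fragment below (illustrative notation, not the book's) accumulates the log-likelihood from the prediction-error decomposition:

```python
import numpy as np

def kalman_loglik(y, lam, phi, sigma_eta, sigma_eps):
    """Univariate Kalman filter for y_t = lam*f_t + eps_t,
    f_t = phi*f_{t-1} + eta_t (|phi| < 1); returns the log-likelihood
    built from the prediction errors and their variances."""
    a, p = 0.0, sigma_eta**2 / (1 - phi**2)  # unconditional state moments
    loglik = 0.0
    for yt in y:
        # prediction step
        a_pred = phi * a
        p_pred = phi**2 * p + sigma_eta**2
        # prediction error and its variance
        v = yt - lam * a_pred
        f = lam**2 * p_pred + sigma_eps**2
        loglik += -0.5 * (np.log(2 * np.pi) + np.log(f) + v**2 / f)
        # updating step
        k = p_pred * lam / f                 # Kalman gain
        a = a_pred + k * v
        p = p_pred - k * lam * p_pred
    return loglik

ll = kalman_loglik(np.array([1.0, -0.5]), lam=0.0,
                   phi=0.5, sigma_eta=1.0, sigma_eps=1.0)
```

A quick sanity check: setting lam = 0 switches the factor off, so the log-likelihood collapses to that of i.i.d. normal observations with variance sigma_eps^2.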
Part FIVE focusses on nonstationary time series models and in particular tests for unit roots and cointegration. Some important asymptotic results for nonstationary time series are presented, followed by a comprehensive discussion of testing for unit roots. Cointegration is tackled from the perspective that the well-known Johansen estimator may be usefully interpreted as a maximum likelihood estimator based on the assumption of a normal distribution applied to a system of equations that is subject to a set of cross-equation restrictions arising from the assumption of common long-run relationships. Further, the trace and maximum eigenvalue tests of cointegration are shown to be likelihood ratio tests.
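As a pointer to what the trace test computes: in standard expositions the statistic for the null of cointegrating rank at most r is -T * sum(log(1 - lambda_i)) taken over the n - r smallest eigenvalues of the reduced-rank regression problem. A Python sketch with hypothetical eigenvalues (the derivation of the eigenvalues themselves is the subject of Chapter 18):

```python
import numpy as np

def trace_statistic(eigenvalues, T, r):
    """Johansen trace statistic for H0: cointegrating rank <= r,
    computed from the n - r smallest eigenvalues of the
    reduced-rank problem, each in [0, 1)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return -T * np.sum(np.log(1.0 - lam[r:]))

# Hypothetical eigenvalues from a bivariate system with T = 100
stat_r0 = trace_statistic([0.30, 0.10], T=100, r=0)  # test rank <= 0
stat_r1 = trace_statistic([0.30, 0.10], T=100, r=1)  # test rank <= 1
```

Each statistic is compared against nonstandard critical values, proceeding sequentially from r = 0 upwards until the null is not rejected.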
Part SIX is concerned with nonlinear time series models. Models that are nonlinear in mean include the threshold class of model, bilinear models and also artificial neural network modelling, which, contrary to many existing treatments, is again addressed from the econometric perspective of estimation and testing based on maximum likelihood methods. Nonlinearities in variance are dealt with in terms of the GARCH class of models. The final chapter focusses on models that deal with discrete or truncated time series data.
Even in a project of this size and scope, sacrifices have had to be made to keep the length of the book manageable. Accordingly, there are a number of important topics that have had to be omitted.

(i) Although Bayesian methods are increasingly being used in many areas of statistics and econometrics, no material on Bayesian econometrics is included. This is an important field in its own right and the interested reader is referred to recent books by Koop (2003), Geweke (2005), Koop, Poirier and Tobias (2007) and Greenberg (2008), inter alia. Where appropriate, references to Bayesian methods are provided in the body of the text.

(ii) With great reluctance, a chapter on bootstrapping was not included because of space issues. A good place to start reading is the introductory text by Efron and Tibshirani (1993) and the useful surveys by Horowitz (1997) and Li and Maddala (1996a, 1996b).

(iii) In Part SIX, in the chapter dealing with modelling the variance of time series, there are important recent developments in stochastic volatility and realized volatility that would be worthy of inclusion. For stochastic volatility, there is an excellent volume of readings edited by Shephard (2005), while the seminal articles in the area of realized volatility are Andersen et al. (2001, 2003).

The fact that these areas have not been covered should not be regarded as a value judgement about their relative importance. Instead, the subject matter chosen for inclusion reflects a balance between the interests of the authors and purely operational decisions aimed at preserving the flow and continuity of the book.
Computer Code
Computer code is available from a companion website to reproduce relevant examples in the text, to reproduce figures in the text that are not part of an example, to reproduce the applications presented in the final section of each chapter, and to complete the exercises. Where applicable, the time series data used in these examples, applications and exercises are also available in a number of different formats.
Presenting numerical results in the examples immediately gives rise to two important issues concerning numerical precision.

(1) In all of the examples listed in the front of the book where computer code has been used, the numbers appearing in the text are rounded versions of those generated by the code. Accordingly, the rounded numbers should be interpreted as such and not be used independently of the computer code to try to reproduce the numbers reported in the text.

(2) In many of the examples, simulation has been used to demonstrate a concept. Since GAUSS and MATLAB have different random number generators, the results generated by the different sets of code will not be identical to one another. For consistency, we have always used the GAUSS output for reporting purposes.
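The point about random number generators can be demonstrated in any language. In the Python sketch below (illustrative; the book itself compares GAUSS and MATLAB), two different generator algorithms seeded identically produce different draws, while re-seeding the same algorithm reproduces results exactly:

```python
import numpy as np

# Two different generator algorithms (analogous to GAUSS and MATLAB
# shipping different defaults) seeded with the same value:
g1 = np.random.default_rng(42)                        # PCG64
g2 = np.random.Generator(np.random.MT19937(42))       # Mersenne Twister

x1 = g1.standard_normal(5)
x2 = g2.standard_normal(5)   # same seed, different algorithm => different draws

# Re-creating the same generator with the same seed reproduces the
# simulation output exactly, which is what makes reported results replicable.
x1_again = np.random.default_rng(42).standard_normal(5)
```

This is why simulation results in the text are tied to one generator (GAUSS) for reporting, and why re-running the MATLAB versions yields close but not identical numbers.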
Although GAUSS and MATLAB are very similar high-level programming languages, there are some important differences that require explanation. Probably the most important difference is one of programming style. GAUSS programs are script files that allow calls to both inbuilt GAUSS and user-defined procedures. MATLAB, on the other hand, does not support the use of user-defined functions in script files. Furthermore, MATLAB programming style favours writing user-defined functions in separate files and then calling them as if they were in-built functions. This style of programming does not suit the learning-by-doing environment that the book tries to create. Consequently, the MATLAB programs are written mainly as function files, with a main function and all the user-defined functions required to implement the procedure in the same file. The only exception to this rule is a few MATLAB utility files, which greatly facilitate the conversion and interpretation of code from GAUSS to MATLAB and which are provided as separate stand-alone MATLAB function files. Finally, all the figures in the text were created using MATLAB together with a utility file laprint.m written by Arno Linnemann of the University of Kassel.
(A user guide for laprint.m is available at http://www.uni-kassel.de/fb16/rat/matlab/laprint/laprintdoc.ps.)
Acknowledgements
Creating a manuscript of this scope and magnitude is a daunting task and
there are many people to whom we are indebted. In particular, we would
like to thank Kenneth Lindsay, Adrian Pagan and Andy Tremayne for their
careful reading of various chapters of the manuscript and for many helpful
comments and suggestions. Gael Martin helped with compiling a suitable
list of references to Bayesian econometric methods. Ayesha Scott compiled
the index, a painstaking task for a manuscript of this size. Many others
have commented on earlier drafts of chapters and we are grateful to the
following individuals: our colleagues, Gunnar Bardsen, Ralf Becker, Adam
Clements, Vlad Pavlov and Joseph Jeisman; and our graduate students, Tim
Christensen, Christopher Coleman-Fenn, Andrew McClelland, Jessie Wang
and Vivianne Vilar.
We also wish to express our deep appreciation to the team at Cambridge
University Press, particularly Peter C. B. Phillips for his encouragement
and support throughout the long gestation period of the book as well as
for reading and commenting on earlier drafts. Scott Parris, with his energy
and enthusiasm for the project, was a great help in sustaining the authors
during the long slog of completing the manuscript. Our thanks are also due
to our CUP readers who provided detailed and constructive feedback at
various stages in the compilation of the final document. Michael Erkelenz of
Fine Line Writers edited the entire manuscript, helped to smooth out the
prose and provided particular assistance with the correct use of adjectival
constructions in the passive voice.
It is fair to say that writing this book was an immense task that involved
the consumption of copious quantities of chillies, champagne and port over a
protracted period of time. The biggest debt of gratitude we owe, therefore, is
to our respective families. To Gael, Sarah and David; Cath, Iain, Robert and
Tim; and Fiona and Caitlin: thank you for your patience, your good humour
in putting up with and cleaning up after many a pizza night, your stoicism
in enduring yet another vacant stare during an important conversation and,
ultimately, for making it all worthwhile.
Vance Martin, Stan Hurn & David Harris
November 2011
Contents
List of illustrations page 1
Computer Code used in the Examples 4
PART ONE MAXIMUM LIKELIHOOD 1
1 The Maximum Likelihood Principle 3
1.1 Introduction 3
1.2 Motivating Examples 3
1.3 Joint Probability Distributions 9
1.4 Maximum Likelihood Framework 12
1.4.1 The Log-Likelihood Function 12
1.4.2 Gradient 18
1.4.3 Hessian 20
1.5 Applications 23
1.5.1 Stationary Distribution of the Vasicek Model 23
1.5.2 Transitional Distribution of the Vasicek Model 25
1.6 Exercises 28
2 Properties of Maximum Likelihood Estimators 35
2.1 Introduction 35
2.2 Preliminaries 35
2.2.1 Stochastic Time Series Models and Their Properties 36
2.2.2 Weak Law of Large Numbers 41
2.2.3 Rates of Convergence 45
2.2.4 Central Limit Theorems 47
2.3 Regularity Conditions 55
2.4 Properties of the Likelihood Function 57
2.4.1 The Population Likelihood Function 57
2.4.2 Moments of the Gradient 58
2.4.3 The Information Matrix 61
2.5 Asymptotic Properties 63
2.5.1 Consistency 63
2.5.2 Normality 67
2.5.3 Efficiency 68
2.6 Finite-Sample Properties 72
2.6.1 Unbiasedness 73
2.6.2 Sufficiency 74
2.6.3 Invariance 75
2.6.4 Non-Uniqueness 76
2.7 Applications 76
2.7.1 Portfolio Diversification 78
2.7.2 Bimodal Likelihood 80
2.8 Exercises 82
3 Numerical Estimation Methods 91
3.1 Introduction 91
3.2 Newton Methods 92
3.2.1 Newton-Raphson 93
3.2.2 Method of Scoring 94
3.2.3 BHHH Algorithm 95
3.2.4 Comparative Examples 98
3.3 Quasi-Newton Methods 101
3.4 Line Searching 102
3.5 Optimisation Based on Function Evaluation 104
3.6 Computing Standard Errors 106
3.7 Hints for Practical Optimization 109
3.7.1 Concentrating the Likelihood 109
3.7.2 Parameter Constraints 110
3.7.3 Choice of Algorithm 111
3.7.4 Numerical Derivatives 112
3.7.5 Starting Values 113
3.7.6 Convergence Criteria 113
3.8 Applications 114
3.8.1 Stationary Distribution of the CIR Model 114
3.8.2 Transitional Distribution of the CIR Model 116
3.9 Exercises 118
4 Hypothesis Testing 124
4.1 Introduction 124
4.2 Overview 124
4.3 Types of Hypotheses 126
4.3.1 Simple and Composite Hypotheses 126
4.3.2 Linear Hypotheses 127
4.3.3 Nonlinear Hypotheses 128
4.4 Likelihood Ratio Test 129
4.5 Wald Test 133
4.5.1 Linear Hypotheses 134
4.5.2 Nonlinear Hypotheses 136
4.6 Lagrange Multiplier Test 137
4.7 Distribution Theory 139
4.7.1 Asymptotic Distribution of the Wald Statistic 139
4.7.2 Asymptotic Relationships Among the Tests 142
4.7.3 Finite Sample Relationships 143
4.8 Size and Power Properties 145
4.8.1 Size of a Test 145
4.8.2 Power of a Test 146
4.9 Applications 148
4.9.1 Exponential Regression Model 148
4.9.2 Gamma Regression Model 151
4.10 Exercises 153
PART TWO REGRESSION MODELS 159
5 Linear Regression Models 161
5.1 Introduction 161
5.2 Specification 162
5.2.1 Model Classification 162
5.2.2 Structural and Reduced Forms 163
5.3 Estimation 166
5.3.1 Single Equation: Ordinary Least Squares 166
5.3.2 Multiple Equations: FIML 170
5.3.3 Identification 175
5.3.4 Instrumental Variables 177
5.3.5 Seemingly Unrelated Regression 181
5.4 Testing 182
5.5 Applications 187
5.5.1 Linear Taylor Rule 187
5.5.2 The Klein Model of the U.S. Economy 189
5.6 Exercises 191
6 Nonlinear Regression Models 199
6.1 Introduction 199
6.2 Specification 199
6.3 Maximum Likelihood Estimation 201
6.4 Gauss-Newton 208
6.4.1 Relationship to Nonlinear Least Squares 212
6.4.2 Relationship to Ordinary Least Squares 213
6.4.3 Asymptotic Distributions 213
6.5 Testing 214
6.5.1 LR, Wald and LM Tests 214
6.5.2 Nonnested Tests 218
6.6 Applications 221
6.6.1 Robust Estimation of the CAPM 221
6.6.2 Stochastic Frontier Models 224
6.7 Exercises 228
7 Autocorrelated Regression Models 234
7.1 Introduction 234
7.2 Specification 234
7.3 Maximum Likelihood Estimation 236
7.3.1 Exact Maximum Likelihood 237
7.3.2 Conditional Maximum Likelihood 238
7.4 Alternative Estimators 240
7.4.1 Gauss-Newton 241
7.4.2 Zig-zag Algorithms 244
7.4.3 Cochrane-Orcutt 247
7.5 Distribution Theory 248
7.5.1 Maximum Likelihood Estimator 249
7.5.2 Least Squares Estimator 253
7.6 Lagged Dependent Variables 258
7.7 Testing 260
7.7.1 Alternative LM Test I 262
7.7.2 Alternative LM Test II 263
7.7.3 Alternative LM Test III 264
7.8 Systems of Equations 265
7.8.1 Estimation 266
7.8.2 Testing 268
7.9 Applications 268
7.9.1 Illiquidity and Hedge Funds 268
7.9.2 Beach-Mackinnon Simulation Study 269
7.10 Exercises 271
8 Heteroskedastic Regression Models 280
8.1 Introduction 280
8.2 Specification 280
8.3 Estimation 283
8.3.1 Maximum Likelihood 283
8.3.2 Relationship with Weighted Least Squares 286
8.4 Distribution Theory 289
8.5 Testing 289
8.6 Heteroskedasticity in Systems of Equations 295
8.6.1 Specification 295
8.6.2 Estimation 297
8.6.3 Testing 299
8.6.4 Heteroskedastic and Autocorrelated Disturbances 300
8.7 Applications 302
8.7.1 The Great Moderation 302
8.7.2 Finite Sample Properties of the Wald Test 304
8.8 Exercises 306
PART THREE OTHER ESTIMATION METHODS 313
9 Quasi-Maximum Likelihood Estimation 315
9.1 Introduction 315
9.2 Misspecification 316
9.3 The Quasi-Maximum Likelihood Estimator 320
9.4 Asymptotic Distribution 323
9.4.1 Misspecification and the Information Equality 325
9.4.2 Independent and Identically Distributed Data 328
9.4.3 Dependent Data: Martingale Difference Score 329
9.4.4 Dependent Data and Score 330
9.4.5 Variance Estimation 331
9.5 Quasi-Maximum Likelihood and Linear Regression 333
9.5.1 Nonnormality 336
9.5.2 Heteroskedasticity 337
9.5.3 Autocorrelation 338
9.5.4 Variance Estimation 342
9.6 Testing 346
9.7 Applications 348
9.7.1 Autoregressive Models for Count Data 348
9.7.2 Estimating the Parameters of the CKLS Model 351
9.8 Exercises 354
10 Generalized Method of Moments 361
10.1 Introduction 361
10.2 Motivating Examples 362
10.2.1 Population Moments 362
10.2.2 Empirical Moments 363
10.2.3 GMM Models from Conditional Expectations 368
10.2.4 GMM and Maximum Likelihood 371
10.3 Estimation 372
10.3.1 The GMM Objective Function 372
10.3.2 Asymptotic Properties 373
10.3.3 Estimation Strategies 378
10.4 Over-Identification Testing 382
10.5 Applications 387
10.5.1 Monte Carlo Evidence 387
10.5.2 Level Effect in Interest Rates 393
10.6 Exercises 396
11 Nonparametric Estimation 404
11.1 Introduction 404
11.2 The Kernel Density Estimator 405
11.3 Properties of the Kernel Density Estimator 409
11.3.1 Finite Sample Properties 410
11.3.2 Optimal Bandwidth Selection 410
11.3.3 Asymptotic Properties 414
11.3.4 Dependent Data 416
11.4 Semi-Parametric Density Estimation 417
11.5 The Nadaraya-Watson Kernel Regression Estimator 419
11.6 Properties of Kernel Regression Estimators 423
11.7 Bandwidth Selection for Kernel Regression 427
11.8 Multivariate Kernel Regression 430
11.9 Semi-parametric Regression of the Partial Linear Model 432
11.10 Applications 433
11.10.1 Derivatives of a Nonlinear Production Function 434
11.10.2 Drift and Diffusion Functions of SDEs 436
11.11 Exercises 439
12 Estimation by Simulation 447
12.1 Introduction 447
12.2 Motivating Example 448
12.3 Indirect Inference 450
12.3.1 Estimation 451
12.3.2 Relationship with Indirect Least Squares 455
12.4 Efficient Method of Moments (EMM) 456
12.4.1 Estimation 456
12.4.2 Relationship with Instrumental Variables 458
12.5 Simulated Generalized Method of Moments (SMM) 459
12.6 Estimating Continuous-Time Models 461
12.6.1 Brownian Motion 464
12.6.2 Geometric Brownian Motion 467
12.6.3 Stochastic Volatility 470
12.7 Applications 472
12.7.1 Simulation Properties 473
12.7.2 Empirical Properties 475
12.8 Exercises 477
PART FOUR STATIONARY TIME SERIES 483
13 Linear Time Series Models 485
13.1 Introduction 485
13.2 Time Series Properties of Data 486
13.3 Specification 488
13.3.1 Univariate Model Classification 489
13.3.2 Multivariate Model Classification 491
13.3.3 Likelihood 493
13.4 Stationarity 493
13.4.1 Univariate Examples 494
13.4.2 Multivariate Examples 495
13.4.3 The Stationarity Condition 496
13.4.4 Wold's Representation Theorem 497
13.4.5 Transforming a VAR to a VMA 498
13.5 Invertibility 501
13.5.1 The Invertibility Condition 501
13.5.2 Transforming a VMA to a VAR 502
13.6 Estimation 502
13.7 Optimal Choice of Lag Order 506
13.8 Distribution Theory 508
13.9 Testing 511
13.10 Analyzing Vector Autoregressions 513
13.10.1 Granger Causality Testing 515
13.10.2 Impulse Response Functions 517
13.10.3 Variance Decompositions 523
13.11 Applications 525
13.11.1 Barro's Rational Expectations Model 525
13.11.2 The Campbell-Shiller Present Value Model 526
13.12 Exercises 528
14 Structural Vector Autoregressions 537
14.1 Introduction 537
14.2 Specification 538
14.2.1 Short-Run Restrictions 542
14.2.2 Long-Run Restrictions 544
14.2.3 Short-Run and Long-Run Restrictions 548
14.2.4 Sign Restrictions 550
14.3 Estimation 553
14.4 Identification 558
14.5 Testing 559
14.6 Applications 561
14.6.1 Peersman's Model of Oil Price Shocks 561
14.6.2 A Portfolio SVAR Model of Australia 563
14.7 Exercises 566
15 Latent Factor Models 571
15.1 Introduction 571
15.2 Motivating Examples 572
15.2.1 Empirical 572
15.2.2 Theoretical 574
15.3 The Recursions of the Kalman Filter 575
15.3.1 Univariate 576
15.3.2 Multivariate 581
15.4 Extensions 585
15.4.1 Intercepts 585
15.4.2 Dynamics 585
15.4.3 Nonstationary Factors 587
15.4.4 Exogenous and Predetermined Variables 589
15.5 Factor Extraction 589
15.6 Estimation 591
15.6.1 Identification 591
15.6.2 Maximum Likelihood 591
15.6.3 Principal Components Estimator 593
15.7 Relationship to VARMA Models 596
15.8 Applications 597
15.8.1 The Hodrick-Prescott Filter 597
15.8.2 A Factor Model of Spreads with Money Shocks 601
15.9 Exercises 603
PART FIVE NON-STATIONARY TIME SERIES 613
16 Nonstationary Distribution Theory 615
16.1 Introduction 615
16.2 Specification 616
16.2.1 Models of Trends 616
16.2.2 Integration 618
16.3 Estimation 620
16.3.1 Stationary Case 621
16.3.2 Nonstationary Case: Stochastic Trends 624
16.3.3 Nonstationary Case: Deterministic Trends 626
16.4 Asymptotics for Integrated Processes 629
16.4.1 Brownian Motion 630
16.4.2 Functional Central Limit Theorem 631
16.4.3 Continuous Mapping Theorem 635
16.4.4 Stochastic Integrals 637
16.5 Multivariate Analysis 638
16.6 Applications 640
16.6.1 Least Squares Estimator of the AR(1) Model 641
16.6.2 Trend Misspecification 643
16.7 Exercises 644
17 Unit Root Testing 651
17.1 Introduction 651
17.2 Specification 651
17.3 Detrending 653
17.3.1 Ordinary Least Squares: Dickey and Fuller 655
17.3.2 First Differences: Schmidt and Phillips 656
17.3.3 Generalized Least Squares: Elliott, Rothenberg and Stock 657
17.4 Testing 658
17.4.1 Dickey-Fuller Tests 659
17.4.2 M Tests 660
17.5 Distribution Theory 662
17.5.1 Ordinary Least Squares Detrending 664
17.5.2 Generalized Least Squares Detrending 665
17.5.3 Simulating Critical Values 667
17.6 Power 668
17.6.1 Near Integration and the Ornstein-Uhlenbeck Process 669
17.6.2 Asymptotic Local Power 671
17.6.3 Point Optimal Tests 671
17.6.4 Asymptotic Power Envelope 673
17.7 Autocorrelation 675
17.7.1 Dickey-Fuller Test with Autocorrelation 675
17.7.2 M Tests with Autocorrelation 676
17.8 Structural Breaks 678
17.8.1 Known Break Point 681
17.8.2 Unknown Break Point 684
17.9 Applications 685
17.9.1 Power and the Initial Value 685
17.9.2 Nelson-Plosser Data Revisited 687
17.10 Exercises 687
18 Cointegration 695
18.1 Introduction 695
18.2 Long-Run Economic Models 696
18.3 Specification: VECM 698
18.3.1 Bivariate Models 698
18.3.2 Multivariate Models 700
18.3.3 Cointegration 701
18.3.4 Deterministic Components 703
18.4 Estimation 705
18.4.1 Full-Rank Case 706
18.4.2 Reduced-Rank Case: Iterative Estimator 707
18.4.3 Reduced Rank Case: Johansen Estimator 709
18.4.4 Zero-Rank Case 715
18.5 Identification 716
18.5.1 Triangular Restrictions 716
18.5.2 Structural Restrictions 717
18.6 Distribution Theory 718
18.6.1 Asymptotic Distribution of the Eigenvalues 718
18.6.2 Asymptotic Distribution of the Parameters 720
18.7 Testing 724
18.7.1 Cointegrating Rank 724
18.7.2 Cointegrating Vector 727
18.7.3 Exogeneity 730
18.8 Dynamics 731
18.8.1 Impulse Responses 731
18.8.2 Cointegrating Vector Interpretation 732
18.9 Applications 732
18.9.1 Rank Selection Based on Information Criteria 733
18.9.2 Effects of Heteroskedasticity on the Trace Test 735
18.10 Exercises 737
PART SIX NONLINEAR TIME SERIES 747
19 Nonlinearities in Mean 749
19.1 Introduction 749
19.2 Motivating Examples 749
19.3 Threshold Models 755
19.3.1 Specification 755
19.3.2 Estimation 756
19.3.3 Testing 758
19.4 Artificial Neural Networks 761
19.4.1 Specification 761
19.4.2 Estimation 764
19.4.3 Testing 766
19.5 Bilinear Time Series Models 767
19.5.1 Specification 767
19.5.2 Estimation 768
19.5.3 Testing 769
19.6 Markov Switching Model 770
19.7 Nonparametric Autoregression 774
19.8 Nonlinear Impulse Responses 775
19.9 Applications 779
19.9.1 A Multiple Equilibrium Model of Unemployment 779
19.9.2 Bivariate Threshold Models of G7 Countries 781
19.10 Exercises 784
20 Nonlinearities in Variance 795
20.1 Introduction 795
20.2 Statistical Properties of Asset Returns 795
20.3 The ARCH Model 799
20.3.1 Specification 799
20.3.2 Estimation 801
20.3.3 Testing 804
20.4 Univariate Extensions 807
20.4.1 GARCH 807
20.4.2 Integrated GARCH 812
20.4.3 Additional Variables 813
20.4.4 Asymmetries 814
20.4.5 GARCH-in-Mean 815
20.4.6 Diagnostics 817
20.5 Conditional Nonnormality 818
20.5.1 Parametric 819
20.5.2 Semi-Parametric 821
20.5.3 Nonparametric 821
20.6 Multivariate GARCH 825
20.6.1 VECH 826
20.6.2 BEKK 827
20.6.3 DCC 830
20.6.4 DECO 836
20.7 Applications 837
20.7.1 DCC and DECO Models of U.S. Zero Coupon Yields 837
20.7.2 A Time-Varying Volatility SVAR Model 838
20.8 Exercises 841
21 Discrete Time Series Models 850
21.1 Introduction 850
21.2 Motivating Examples 850
21.3 Qualitative Data 853
21.3.1 Specification 853
21.3.2 Estimation 857
21.3.3 Testing 861
21.3.4 Binary Autoregressive Models 863
21.4 Ordered Data 865
21.5 Count Data 867
21.5.1 The Poisson Regression Model 869
21.5.2 Integer Autoregressive Models 871
21.6 Duration Data 874
21.7 Applications 876
21.7.1 An ACH Model of U.S. Airline Trades 876
21.7.2 EMM Estimator of Integer Models 879
21.8 Exercises 881
Appendix A Change of Variable in Probability Density Functions 887
Appendix B The Lag Operator 888
B.1 Basics 888
B.2 Polynomial Convolution 889
B.3 Polynomial Inversion 890
B.4 Polynomial Decomposition 891
Appendix C FIML Estimation of a Structural Model 892
C.1 Log-likelihood Function 892
C.2 First-order Conditions 892
C.3 Solution 893
Appendix D Additional Nonparametric Results 897
D.1 Mean 897
D.2 Variance 899
D.3 Mean Square Error 901
D.4 Roughness 902
D.4.1 Roughness Results for the Gaussian Distribution 902
D.4.2 Roughness Results for the Gaussian Kernel 903
References 905
Author index 915
Subject index 918
Illustrations
1.1 Probability distributions of y for various models 5
1.2 Probability distributions of y for various models 7
1.3 Log-likelihood function for Poisson distribution 15
1.4 Log-likelihood function for exponential distribution 15
1.5 Log-likelihood function for the normal distribution 17
1.6 Eurodollar interest rates 24
1.7 Stationary density of Eurodollar interest rates 25
1.8 Transitional density of Eurodollar interest rates 27
2.1 Demonstration of the weak law of large numbers 42
2.2 Demonstration of the Lindeberg-Levy central limit theorem 49
2.3 Convergence of log-likelihood function 65
2.4 Consistency of sample mean for normal distribution 65
2.5 Consistency of median for Cauchy distribution 66
2.6 Illustrating asymptotic normality 69
2.7 Bivariate normal distribution 77
2.8 Scatter plot of returns on Apple and Ford stocks 78
2.9 Gradient of the bivariate normal model 81
3.1 Stationary density of Eurodollar interest rates: CIR model 115
3.2 Estimated variance function of CIR model 117
4.1 Illustrating the LR and Wald tests 125
4.2 Illustrating the LM test 126
4.3 Simulated and asymptotic distributions of the Wald test 142
5.1 Simulating a bivariate regression model 166
5.2 Sampling distribution of a weak instrument 180
5.3 U.S. data on the Taylor Rule 188
6.1 Simulated exponential models 201
6.2 Scatter plot of Martin Marietta returns data 222
6.3 Stochastic frontier disturbance distribution 225
7.1 Simulated models with autocorrelated disturbances 236
7.2 Distribution of maximum likelihood estimator in an autocorrelated regression model 252
8.1 Simulated data from heteroskedastic models 282
8.2 The Great Moderation 303
8.3 Sampling distribution of Wald test 305
8.4 Power of Wald test 305
9.1 Comparison of true and misspecified log-likelihood functions 317
9.2 U.S. Dollar/British Pound exchange rates 345
9.3 Estimated variance function of CKLS model 353
11.1 Bias and variance of the kernel estimate of density 411
11.2 Kernel estimate of distribution of stock index returns 413
11.3 Bivariate normal density 414
11.4 Semiparametric density estimator 419
11.5 Parametric conditional mean estimates 420
11.6 Nadaraya-Watson nonparametric kernel regression 424
11.7 Effect of bandwidth on kernel regression 425
11.8 Cross validation bandwidth selection 429
11.9 Two-dimensional product kernel 431
11.10 Semiparametric regression 433
11.11 Nonparametric production function 435
11.12 Nonparametric estimates of drift and diffusion functions 438
12.1 Simulated AR(1) model 450
12.2 Illustrating Brownian motion 462
13.1 U.S. macroeconomic data 487
13.2 Plots of simulated stationary time series 490
13.3 Choice of optimal lag order 508
14.1 Bivariate SVAR model 541
14.2 Bivariate SVAR with short-run restrictions 545
14.3 Bivariate SVAR with long-run restrictions 547
14.4 Bivariate SVAR with short- and long-run restrictions 549
14.5 Bivariate SVAR with sign restrictions 552
14.6 Impulse responses of Peersman's model 564
15.1 Daily U.S. zero coupon rates 573
15.2 Alternative priors for latent factors in the Kalman filter 588
15.3 Factor loadings of a term structure model 595
15.4 Hodrick-Prescott filter of real U.S. GDP 601
16.1 Nelson-Plosser data 618
16.2 Simulated distribution of AR(1) parameter 624
16.3 Continuous-time processes 633
16.4 Functional Central Limit Theorem 635
16.5 Distribution of a stochastic integral 638
16.6 Mixed normal distribution 640
17.1 Real U.S. GDP 652
17.2 Detrending 658
17.3 Near unit root process 669
17.4 Asymptotic power curve of ADF tests 672
17.5 Asymptotic power envelope of ADF tests 674
17.6 Structural breaks in U.S. GDP 679
17.7 Union of rejections approach 686
18.1 Permanent income hypothesis 696
18.2 Long run money demand 697
18.3 Term structure of U.S. yields 698
18.4 Error correction phase diagram 699
19.1 Properties of an AR(2) model 750
19.2 Limit cycle 751
19.3 Strange attractor 752
19.4 Nonlinear error correction model 753
19.5 U.S. unemployment 754
19.6 Threshold functions 757
19.7 Decomposition of an ANN 762
19.8 Simulated bilinear time series models 768
19.9 Markov switching model of U.S. output 773
19.10 Nonparametric estimate of a TAR(1) model 775
19.11 Simulated TAR models for G7 countries 783
20.1 Statistical properties of FTSE returns 796
20.2 Distribution of FTSE returns 799
20.3 News impact curve 801
20.4 ACF of GARCH(1,1) models 810
20.5 Conditional variance of FTSE returns 812
20.6 Risk-return preferences 816
20.7 BEKK model of U.S. zero coupon bonds 829
20.8 DECO model of interest rates 838
20.9 SVAR model of U.K. Libor spread 840
21.1 U.S. Federal funds target rate from 1984 to 2009 852
21.2 Money demand equation with a floor interest rate 853
21.3 Duration descriptive statistics for AMR 877
-
Computer Code used in the Examples
(Code is written in GAUSS, in which case the extension is .g, and in MATLAB, in which case the extension is .m.)
Code Disclaimer Information
Note that the computer code is provided for illustrative purposes only and
although care has been taken to ensure that it works properly, it has not been
thoroughly tested under all conditions and on all platforms. The authors and
Cambridge University Press cannot guarantee or imply reliability, serviceability, or function of this computer code. All code is therefore provided "as is" without any warranties of any kind.
PART ONE
MAXIMUM LIKELIHOOD
1 The Maximum Likelihood Principle
1.1 Introduction
Maximum likelihood estimation is a general method for estimating the pa-
rameters of econometric models from observed data. The principle of max-
imum likelihood plays a central role in the exposition of this book, since a
number of estimators used in econometrics can be derived within this frame-
work. Examples include ordinary least squares, generalized least squares and
full-information maximum likelihood. In deriving the maximum likelihood
estimator, a key concept is the joint probability density function (pdf) of
the observed random variables, yt. Maximum likelihood estimation requires
that the following conditions are satisfied.
(1) The form of the joint pdf of yt is known.
(2) The specification of the moments of the joint pdf is known.
(3) The joint pdf can be evaluated for all values of the parameters, θ.
Parts ONE and TWO of this book deal with models in which all these
conditions are satisfied. Part THREE investigates models in which these
conditions are not satisfied and considers four important cases. First, if the
distribution of yt is misspecified, resulting in both conditions 1 and 2 being
violated, estimation is by quasi-maximum likelihood (Chapter 9). Second,
if condition 1 is not satisfied, a generalized method of moments estimator
(Chapter 10) is required. Third, if condition 2 is not satisfied, estimation
relies on nonparametric methods (Chapter 11). Fourth, if condition 3 is
violated, simulation-based estimation methods are used (Chapter 12).
1.2 Motivating Examples
To highlight the role of probability distributions in maximum likelihood esti-
mation, this section emphasizes the link between observed sample data and
the probability distribution from which they are drawn. This relationship
is illustrated with a number of simulation examples where samples of size
T = 5 are drawn from a range of alternative models. The realizations of
these draws for each model are listed in Table 1.1.
Table 1.1
Realisations of y_t from alternative models: t = 1, 2, …, 5.

Model                    t=1      t=2      t=3      t=4      t=5
Time Invariant          -2.720    2.470    0.495    0.597   -0.960
Count                    2.000    4.000    3.000    4.000    0.000
Linear Regression        2.850    3.105    5.693    8.101   10.387
Exponential Regression   0.874    8.284    0.507    3.722    5.865
Autoregressive           0.000   -1.031   -0.283   -1.323   -2.195
Bilinear                 0.000   -2.721    0.531    1.350   -2.451
ARCH                     0.000    3.558    6.989    7.925    8.118
Poisson                  3.000   10.000   17.000   20.000   23.000
Example 1.1 Time Invariant Model
Consider the model

y_t = \sigma z_t ,

where z_t is a disturbance term and σ is a parameter. Let z_t be drawn from a standardized normal distribution, N(0, 1), with density

f(z) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z^2}{2}\right].

The distribution of y_t is obtained from the distribution of z_t using the change of variable technique (see Appendix A for details)

f(y; \theta) = f(z) \left|\frac{\partial z}{\partial y}\right| ,

where θ = {σ²}. Applying this rule, and recognising that z = y/σ, yields

f(y; \theta) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(y/\sigma)^2}{2}\right] \frac{1}{\sigma} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{y^2}{2\sigma^2}\right],

or y_t ∼ N(0, σ²). In this model, the distribution of y_t is time invariant because neither the mean nor the variance depends on time. This property is highlighted in panel (a) of Figure 1.1, where the parameter is σ = 2. For comparative purposes the distributions of both y_t and z_t are given. As y_t = 2z_t, the distribution of y_t is flatter than the distribution of z_t.
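The change of variable calculation can be checked numerically. The following Python sketch is illustrative only (the book's companion code is written in GAUSS and MATLAB); it verifies that f(y; σ²) agrees with f(z)|∂z/∂y| and draws a sample of T = 5 observations, which differ from the realizations in Table 1.1 because a different random number generator is used.

```python
import math
import random

def f_y(y, sigma2):
    """Density of y implied by the change of variable: N(0, sigma^2)."""
    return math.exp(-y ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def f_z(z):
    """Standardized normal density N(0, 1)."""
    return math.exp(-z ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

sigma = 2.0
# Change of variable: f(y; sigma^2) = f(z) |dz/dy| with z = y/sigma, |dz/dy| = 1/sigma.
for yv in (-3.0, 0.0, 1.5):
    assert abs(f_y(yv, sigma ** 2) - f_z(yv / sigma) / sigma) < 1e-12

# A sample of T = 5 draws of y_t = sigma * z_t, as in the model above.
random.seed(0)
y = [sigma * random.gauss(0.0, 1.0) for _ in range(5)]
print(y)
```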
[Figure 1.1 Probability distributions of y generated from the time invariant, count, linear regression and exponential regression models. Except for the time invariant and count models, the solid line represents the density at t = 1, the dashed line the density at t = 3 and the dotted line the density at t = 5.]
As the distribution of y_t in Example 1.1 does not depend on lagged values y_{t-i}, y_t is independently distributed. In addition, since the distribution of y_t is the same at each t, y_t is identically distributed. These two properties are abbreviated as iid. Conversely, the distribution is dependent if y_t depends on its own lagged values and non-identical if it changes over time.
Example 1.2 Count Model
Consider a time series of counts modelled as a series of draws from a Poisson distribution

f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} , \qquad y = 0, 1, 2, \dots,

where θ > 0 is an unknown parameter. A sample of T = 5 realizations of y_t, given in Table 1.1, is drawn from the Poisson probability distribution in panel (b) of Figure 1.1 for θ = 2. By assumption, this distribution is the same at each point in time. In contrast to the data in the previous example, where the random variable is continuous, the data here are discrete as they are non-negative integers that measure counts.
Example 1.3 Linear Regression Model
Consider the regression model

y_t = \beta x_t + \sigma z_t , \qquad z_t \sim iid\ N(0, 1) ,

where x_t is an explanatory variable that is independent of z_t, and θ = {β, σ²}. The distribution of y conditional on x_t is

f(y \,|\, x_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - \beta x_t)^2}{2\sigma^2}\right],

which is a normal distribution with conditional mean βx_t and variance σ², or y_t ∼ N(βx_t, σ²). This distribution is illustrated in panel (c) of Figure 1.1 with β = 3, σ = 2 and explanatory variable x_t = {0, 1, 2, 3, 4}. The effect of x_t is to shift the distribution of y_t over time into the positive region, resulting in the draws of y_t given in Table 1.1 becoming increasingly positive. As the variance at each point in time is constant, the spread of the distributions of y_t is the same for all t.
Example 1.4 Exponential Regression Model
Consider the exponential regression model

f(y \,|\, x_t; \theta) = \frac{1}{\mu_t} \exp\left[-\frac{y}{\mu_t}\right],

where μ_t = β_0 + β_1 x_t is the time-varying conditional mean, x_t is an explanatory variable and θ = {β_0, β_1}. This distribution is highlighted in panel (d) of Figure 1.1 with β_0 = 1, β_1 = 1 and x_t = {0, 1, 2, 3, 4}. As β_1 > 0, the effect of x_t is to cause the distribution of y_t to become more positively skewed over time.
[Figure 1.2 Probability distributions of y generated from the autoregressive, bilinear, autoregressive with heteroskedasticity and ARCH models. The solid line represents the density at t = 1, the dashed line the density at t = 3 and the dotted line the density at t = 5.]
Example 1.5 Autoregressive Model
An example of a first-order autoregressive model, denoted AR(1), is

y_t = \rho y_{t-1} + u_t , \qquad u_t \sim iid\ N(0, \sigma^2) ,

with |ρ| < 1 and θ = {ρ, σ²}. The distribution of y, conditional on y_{t-1}, is

f(y \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - \rho y_{t-1})^2}{2\sigma^2}\right],

which is a normal distribution with conditional mean ρy_{t-1} and variance σ², or y_t ∼ N(ρy_{t-1}, σ²). If 0 < ρ < 1, then a large positive (negative) value of y_{t-1} shifts the distribution into the positive (negative) region for y_t, raising the probability that the next draw from this distribution is also positive (negative). This property of the autoregressive model is highlighted in panel (a) of Figure 1.2 with ρ = 0.8, σ = 2 and initial value y_1 = 0.
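The AR(1) recursion and its conditional density can be sketched in a few lines. This Python fragment is an illustrative translation only, not the book's GAUSS/MATLAB code, using the parameter values of panel (a) of Figure 1.2.

```python
import math
import random

def simulate_ar1(T, rho, sigma, y1=0.0, seed=42):
    """Simulate y_t = rho * y_{t-1} + u_t with u_t ~ iid N(0, sigma^2)."""
    rng = random.Random(seed)
    y = [y1]
    for _ in range(T - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, sigma))
    return y

def cond_density(y, ylag, rho, sigma2):
    """f(y | y_{t-1}; theta): normal with mean rho * y_{t-1} and variance sigma^2."""
    return math.exp(-(y - rho * ylag) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Parameters used for panel (a) of Figure 1.2: rho = 0.8, sigma = 2, y_1 = 0.
path = simulate_ar1(T=5, rho=0.8, sigma=2.0)
print(path)

# The conditional density is centred at rho * y_{t-1}, so it is largest there.
assert cond_density(0.8 * 1.0, 1.0, 0.8, 4.0) > cond_density(3.0, 1.0, 0.8, 4.0)
```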
Example 1.6 Bilinear Time Series Model
The autoregressive model discussed above specifies a linear relationship between y_t and y_{t-1}. The following bilinear model is an example of a nonlinear time series model

y_t = \rho y_{t-1} + \gamma y_{t-1} u_{t-1} + u_t , \qquad u_t \sim iid\ N(0, \sigma^2) ,

where γ y_{t-1} u_{t-1} represents the bilinear term and θ = {ρ, γ, σ²}. The distribution of y_t conditional on y_{t-1} is

f(y \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - \mu_t)^2}{2\sigma^2}\right],

which is a normal distribution with conditional mean μ_t = ρ y_{t-1} + γ y_{t-1} u_{t-1} and variance σ². To highlight the nonlinear property of the model, substitute out u_{t-1} in the equation for the mean

\mu_t = \rho y_{t-1} + \gamma y_{t-1}(y_{t-1} - \rho y_{t-2} - \gamma y_{t-2} u_{t-2})
      = \rho y_{t-1} + \gamma y_{t-1}^2 - \rho\gamma y_{t-1} y_{t-2} - \gamma^2 y_{t-1} y_{t-2} u_{t-2} ,

which shows that the mean is a nonlinear function of y_{t-1}. Setting γ = 0 yields the linear AR(1) model of Example 1.5. The distribution of the bilinear model is illustrated in panel (b) of Figure 1.2 with ρ = 0.8, γ = 0.4, σ = 2 and initial value y_1 = 0.
Example 1.7 Autoregressive Model with Heteroskedasticity
An example of an AR(1) model with heteroskedasticity is

y_t = \rho y_{t-1} + \sigma_t z_t
\sigma_t^2 = \gamma_0 + \gamma_1 w_t
z_t \sim iid\ N(0, 1) ,

where θ = {ρ, γ_0, γ_1} and w_t is an explanatory variable. The distribution of y_t conditional on y_{t-1} and w_t is

f(y \,|\, y_{t-1}, w_t; \theta) = \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left[-\frac{(y - \rho y_{t-1})^2}{2\sigma_t^2}\right],

which is a normal distribution with conditional mean ρy_{t-1} and conditional variance γ_0 + γ_1 w_t. For this model, the distribution shifts because of the dependence on y_{t-1}, and the spread of the distribution changes because of w_t. These features are highlighted in panel (c) of Figure 1.2 with ρ = 0.8, γ_0 = 0.8, γ_1 = 0.8, where w_t is defined as a uniform random number on the unit interval and the initial value is y_1 = 0.
Example 1.8 Autoregressive Conditional Heteroskedasticity
The autoregressive conditional heteroskedasticity (ARCH) class of models is a special case of the heteroskedastic regression model in which w_t in Example 1.7 is expressed in terms of the lagged squared disturbance term. An example of a regression model as in Example 1.3 with ARCH is

y_t = \beta x_t + u_t
u_t = \sigma_t z_t
\sigma_t^2 = \gamma_0 + \gamma_1 u_{t-1}^2
z_t \sim iid\ N(0, 1) ,

where x_t is an explanatory variable and θ = {β, γ_0, γ_1}. The distribution of y conditional on y_{t-1}, x_t and x_{t-1} is

f(y \,|\, y_{t-1}, x_t, x_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\left(\gamma_0 + \gamma_1 (y_{t-1} - \beta x_{t-1})^2\right)}} \exp\left[-\frac{(y - \beta x_t)^2}{2\left(\gamma_0 + \gamma_1 (y_{t-1} - \beta x_{t-1})^2\right)}\right].

For this model, a large shock, represented by a large value of u_t, results in an increased variance in the next period if γ_1 > 0. The distribution from which y_t is drawn in the next period will therefore have a larger variance. The distribution of this model is shown in panel (d) of Figure 1.2 with β = 3, γ_0 = 0.8, γ_1 = 0.8 and x_t = {0, 1, 2, 3, 4}.
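The ARCH recursion above can be sketched directly. The following Python fragment is a hedged, standalone illustration (the companion programs are in GAUSS and MATLAB) using the parameter values of panel (d) of Figure 1.2.

```python
import random

def simulate_arch(x, beta, gamma0, gamma1, seed=0):
    """Simulate the regression model with ARCH(1) errors:
    y_t = beta * x_t + u_t,  u_t = sigma_t * z_t,
    sigma_t^2 = gamma0 + gamma1 * u_{t-1}^2,  z_t ~ iid N(0, 1)."""
    rng = random.Random(seed)
    y, u_prev = [], 0.0
    for xt in x:
        sigma2_t = gamma0 + gamma1 * u_prev ** 2   # conditional variance
        u_t = sigma2_t ** 0.5 * rng.gauss(0.0, 1.0)
        y.append(beta * xt + u_t)
        u_prev = u_t
    return y

# Parameters used for panel (d) of Figure 1.2.
y = simulate_arch(x=[0, 1, 2, 3, 4], beta=3.0, gamma0=0.8, gamma1=0.8)
print(y)
```

A large realized |u_t| inflates sigma2_t in the next step, which is exactly the variance feedback described in the text.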
1.3 Joint Probability Distributions
The motivating examples of the previous section focus on the distribution
of yt at time t which is generally a function of its own lags and the current
and lagged values of explanatory variables x_t. The derivation of the maximum likelihood estimator of the model parameters requires using all of the information t = 1, 2, …, T by defining the joint probability density function (pdf). In the case where both y_t and x_t are stochastic, the joint pdf for a sample of T observations is

f(y_1, y_2, \dots, y_T, x_1, x_2, \dots, x_T; \psi) , \qquad (1.1)

where ψ is a vector of parameters. An important feature of the previous examples is that y_t depends on the explanatory variable x_t. To capture this conditioning, the joint distribution in (1.1) is expressed as

f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \psi)\, f(x_1, \dots, x_T; \psi) , \qquad (1.2)

where the first term on the right-hand side of (1.2) represents the distribution of {y_1, …, y_T} conditional on {x_1, …, x_T} and the second term is the marginal distribution of {x_1, …, x_T}. Assuming that the parameter vector ψ can be decomposed into {θ, θ_x}, expression (1.2) becomes

f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta)\, f(x_1, \dots, x_T; \theta_x) . \qquad (1.3)

In these circumstances, maximum likelihood estimation of the parameters θ is based on the conditional distribution without loss of information from the exclusion of the marginal distribution f(x_1, …, x_T; θ_x).
The conditional distribution on the right-hand side of expression (1.3) simplifies further in the presence of additional restrictions.
Independent and identically distributed (iid)
In the simplest case, {y_1, …, y_T} is independent of {x_1, …, x_T} and y_t is iid with density function f(y; θ). The conditional pdf in equation (1.3) is then

f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta) . \qquad (1.4)

Examples of this case are the time invariant model (Example 1.1) and the count model (Example 1.2).
If both y_t and x_t are iid and y_t is dependent on x_t, then the decomposition in equation (1.3) implies that inference can be based on

f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = \prod_{t=1}^{T} f(y_t \,|\, x_t; \theta) . \qquad (1.5)
Examples include the regression models in Examples 1.3 and 1.4 if sampling is iid.

Dependent
Now assume that {y_1, …, y_T} depends on its own lags but is independent of the explanatory variables {x_1, …, x_T}. The joint pdf is expressed as a sequence of conditional distributions where conditioning is based on lags of y_t. Using standard rules of probability, the distributions for the first three observations are, respectively,

f(y_1; \theta) = f(y_1; \theta)
f(y_1, y_2; \theta) = f(y_2 \,|\, y_1; \theta)\, f(y_1; \theta)
f(y_1, y_2, y_3; \theta) = f(y_3 \,|\, y_2, y_1; \theta)\, f(y_2 \,|\, y_1; \theta)\, f(y_1; \theta) ,

where y_1 is the initial value with marginal probability density f(y_1; θ). Extending this sequence to a sample of T observations yields the joint pdf

f(y_1, y_2, \dots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}, y_{t-2}, \dots, y_1; \theta) . \qquad (1.6)

Examples of this general case are the AR model (Example 1.5), the bilinear model (Example 1.6) and the ARCH model (Example 1.8). Extending the model to allow for dependence on explanatory variables, x_t, gives

f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = f(y_1 \,|\, x_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}, y_{t-2}, \dots, y_1, x_t, x_{t-1}, \dots, x_1; \theta) . \qquad (1.7)

An example is the autoregressive model with heteroskedasticity (Example 1.7).
Example 1.9 Autoregressive Model
The joint pdf for the AR(1) model in Example 1.5 is

f(y_1, y_2, \dots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}; \theta) ,

where the conditional distribution is

f(y_t \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_t - \rho y_{t-1})^2}{2\sigma^2}\right],

and the marginal distribution of the initial observation is

f(y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2/(1-\rho^2)}} \exp\left[-\frac{y_1^2}{2\sigma^2/(1-\rho^2)}\right].
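The sequential factorization of the joint pdf can be verified numerically. A short Python sketch (illustrative only; the values chosen for y, ρ and σ² are arbitrary and not taken from the book):

```python
import math

def log_norm(x, mean, var):
    """Logarithm of the N(mean, var) density evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def ar1_joint_logpdf(y, rho, sigma2):
    """ln f(y_1,...,y_T; theta) = ln f(y_1; theta) + sum over t >= 2 of ln f(y_t | y_{t-1}; theta)."""
    logf = log_norm(y[0], 0.0, sigma2 / (1.0 - rho ** 2))  # stationary marginal of y_1
    for t in range(1, len(y)):
        logf += log_norm(y[t], rho * y[t - 1], sigma2)     # conditional contributions
    return logf

y, rho, sigma2 = [0.5, 1.2, -0.3], 0.8, 4.0                # illustrative values
total = ar1_joint_logpdf(y, rho, sigma2)

# Sequential factorization: one extra observation adds exactly one conditional term.
extra = log_norm(y[2], rho * y[1], sigma2)
assert abs(total - (ar1_joint_logpdf(y[:2], rho, sigma2) + extra)) < 1e-12
print(total)
```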
Non-stochastic explanatory variables
In the case of non-stochastic explanatory variables, because x_t is deterministic its probability mass is degenerate. Explanatory variables of this form are also referred to as fixed in repeated samples. The joint probability in expression (1.3) simplifies to

f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) .

Now ψ = θ and there is no potential loss of information from using the conditional distribution to estimate θ.
1.4 Maximum Likelihood Framework
As emphasized previously, a time series of data represents the observed
realization of draws from a joint pdf. The maximum likelihood principle
makes use of this result by providing a general framework for estimating the
unknown parameters, θ, from the observed time series data, {y_1, y_2, …, y_T}.
1.4.1 The Log-Likelihood Function
The standard interpretation of the joint pdf in (1.7) is that f is a function of y_t for given parameters, θ. In defining the maximum likelihood estimator this interpretation is reversed, so that f is taken as a function of θ for given y_t. The motivation behind this change in the interpretation of the arguments of the pdf is to regard {y_1, y_2, …, y_T} as a realized data set which is no longer random. The maximum likelihood estimator is then obtained by finding the value of θ which is most likely to have generated the observed data. Here the phrase 'most likely' is loosely interpreted in a probability sense.
It is important to remember that the likelihood function is simply a redefinition of the joint pdf in equation (1.7). For many problems it is simpler to work with the logarithm of this joint density function. The log-likelihood
function is defined as
\ln L_T(\theta) = \frac{1}{T} \ln f(y_1 \,|\, x_1; \theta) + \frac{1}{T} \sum_{t=2}^{T} \ln f(y_t \,|\, y_{t-1}, y_{t-2}, \dots, y_1, x_t, x_{t-1}, \dots, x_1; \theta) , \qquad (1.8)
where the change of status of the arguments in the joint pdf is highlighted
by making θ the sole argument of this function, and the T subscript indicates
that the log-likelihood is an average over the sample of the logarithm of the
density evaluated at yt. It is worth emphasizing that the term log-likelihood
function, used here without any qualification, is also known as the average
log-likelihood function. This convention is also used by, among others, Newey
and McFadden (1994) and White (1994). This definition of the log-likelihood
function is consistent with the theoretical development of the properties of
maximum likelihood estimators discussed in Chapter 2, particularly Sections
2.3 and 2.5.1.
For the special case where y_t is iid, the log-likelihood function is based on the joint pdf in (1.4) and is

\ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) .
In all cases, the log-likelihood function, ln L_T(θ), is a scalar that represents a summary measure of the data for given θ.
The maximum likelihood estimator of θ is defined as that value of θ, denoted θ̂, that maximizes the log-likelihood function. In a large number of
cases, this may be achieved using standard calculus. Chapter 3 discusses nu-
merical approaches to the problem of finding maximum likelihood estimates
when no analytical solutions exist, or are difficult to derive.
Example 1.10 Poisson Distribution
Let {y_1, y_2, …, y_T} be iid observations from a Poisson distribution

f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} ,

where θ > 0. The log-likelihood function for the sample is

\ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) = \frac{1}{T} \sum_{t=1}^{T} y_t \ln\theta - \theta - \frac{\ln(y_1! \, y_2! \cdots y_T!)}{T} .

Consider the following T = 3 observations, y_t = {8, 3, 4}. The log-likelihood function is

\ln L_T(\theta) = \frac{15}{3} \ln\theta - \theta - \frac{\ln(8!\,3!\,4!)}{3} = 5\ln\theta - \theta - 5.191 .

A plot of the log-likelihood function is given in panel (a) of Figure 1.3 for values of θ ranging from 0 to 10. Even though the Poisson distribution is a discrete distribution in terms of the random variable y, the log-likelihood function is continuous in the unknown parameter θ. Inspection shows that a maximum occurs at θ = 5 with a log-likelihood value of

\ln L_T(5) = 5\ln 5 - 5 - 5.191 = -2.144 .

The contribution to the log-likelihood function of the first observation, y_1 = 8, evaluated at θ = 5, is

\ln f(y_1; 5) = y_1 \ln 5 - 5 - \ln(y_1!) = 8\ln 5 - 5 - \ln(8!) = -2.729 .

For the other two observations, the contributions are ln f(y_2; 5) = −1.963 and ln f(y_3; 5) = −1.740. The probabilities f(y_t; θ) are between 0 and 1 by definition, and therefore all of the contributions are negative because they are computed as the logarithm of f(y_t; θ). The average of these T = 3 contributions is ln L_T(5) = −2.144, which corresponds to the value already given above. A plot of ln f(y_t; 5) in panel (b) of Figure 1.3 shows that observations closer to θ = 5 have a relatively greater contribution to the log-likelihood function than observations further away, in the sense that they are smaller negative numbers.
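These calculations can be reproduced numerically. A minimal Python sketch (a standalone translation for illustration; the book's companion suite is in GAUSS and MATLAB) evaluates the average Poisson log-likelihood for the data y_t = {8, 3, 4}:

```python
import math

y = [8, 3, 4]   # the T = 3 observations of Example 1.10

def lnL(theta):
    """Average Poisson log-likelihood; math.lgamma(v + 1) computes ln(v!)."""
    T = len(y)
    return (sum(y) / T) * math.log(theta) - theta - sum(math.lgamma(v + 1) for v in y) / T

# The maximum is at the sample mean, theta_hat = 15/3 = 5 ...
assert abs(lnL(5.0) - (-2.144)) < 5e-4
# ... and the function is lower at neighbouring parameter values.
assert lnL(5.0) > lnL(4.9) and lnL(5.0) > lnL(5.1)

# Contributions ln f(y_t; 5) of the individual observations.
contrib = [v * math.log(5.0) - 5.0 - math.lgamma(v + 1) for v in y]
print([round(c, 3) for c in contrib])   # approximately -2.729, -1.963, -1.740
```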
Example 1.11 Exponential Distribution
Let {y_1, y_2, …, y_T} be iid drawings from an exponential distribution

f(y; \theta) = \theta \exp[-\theta y] ,

where θ > 0. The log-likelihood function for the sample is

\ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) = \frac{1}{T} \sum_{t=1}^{T} (\ln\theta - \theta y_t) = \ln\theta - \theta \frac{1}{T} \sum_{t=1}^{T} y_t .

Consider the following T = 6 observations, y_t = {2.1, 2.2, 3.1, 1.6, 2.5, 0.5}. The log-likelihood function is

\ln L_T(\theta) = \ln\theta - \theta \frac{1}{T} \sum_{t=1}^{T} y_t = \ln\theta - 2\theta .

Plots of the log-likelihood function, ln L_T(θ), and the likelihood function, L_T(θ), are given in Figure 1.4, which show that a maximum occurs at
[Figure 1.3 Plots of ln L_T(θ) (panel (a)) and of the contributions ln f(y_t; θ = 5) (panel (b)) for the Poisson distribution example with a sample size of T = 3.]
[Figure 1.4 Plots of the log-likelihood function ln L_T(θ) (panel (a)) and the likelihood function L_T(θ) (panel (b)) for the exponential distribution example.]
θ̂ = 0.5. Table 1.2 provides details of the calculations. Let the log-likelihood function at each observation evaluated at the maximum likelihood estimate be denoted ln l_t(θ̂) = ln f(y_t; θ̂). The second column shows ln l_t(θ) evaluated at θ̂ = 0.5,

\ln l_t(0.5) = \ln(0.5) - 0.5\, y_t ,

resulting in a maximum value of the log-likelihood function of

\ln L_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} \ln l_t(0.5) = \frac{-10.159}{6} = -1.693 .
Table 1.2
Maximum likelihood calculations for the exponential distribution example. The maximum likelihood estimate is θ̂ = 0.5.

 y_t    ln l_t(0.5)   g_t(0.5)   h_t(0.5)
 2.1      -1.743       -0.100     -4.000
 2.2      -1.793       -0.200     -4.000
 3.1      -2.243       -1.100     -4.000
 1.6      -1.493        0.400     -4.000
 2.5      -1.943       -0.500     -4.000
 0.5      -0.943        1.500     -4.000

 ln L_T(0.5) = -1.693    G_T(0.5) = 0.000    H_T(0.5) = -4.000
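Every entry in Table 1.2 follows from the three per-observation formulas. A Python sketch reproducing the table's column sums (illustrative only; not the book's companion code):

```python
import math

y = [2.1, 2.2, 3.1, 1.6, 2.5, 0.5]           # data of Example 1.11
theta_hat = len(y) / sum(y)                  # MLE: reciprocal of the sample mean

lnl = [math.log(theta_hat) - theta_hat * v for v in y]   # ln l_t(0.5)
g = [1.0 / theta_hat - v for v in y]                     # g_t(0.5)
h = [-1.0 / theta_hat ** 2 for _ in y]                   # h_t(0.5), constant across t

assert abs(theta_hat - 0.5) < 1e-9
assert abs(sum(lnl) / len(y) - (-1.693)) < 5e-4   # lnL_T(0.5)
assert abs(sum(g) / len(y)) < 1e-9                # G_T(0.5) = 0 at the MLE
assert abs(sum(h) / len(y) - (-4.0)) < 1e-9       # H_T(0.5) = -4
```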
Example 1.12 Normal Distribution
Let {y_1, y_2, …, y_T} be iid observations drawn from a normal distribution

f(y; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right],

with unknown parameters θ = {μ, σ²}. The log-likelihood function is

\ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta)
= \frac{1}{T} \sum_{t=1}^{T} \left(-\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{(y_t - \mu)^2}{2\sigma^2}\right)
= -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu)^2 .

Consider the following T = 6 observations, y_t = {5, −1, 3, 0, 2, 3}. The log-likelihood function is

\ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{12\sigma^2} \sum_{t=1}^{6} (y_t - \mu)^2 .

A plot of this function in Figure 1.5 shows that a maximum occurs at μ = 2 and σ² = 4.
[Figure 1.5 Plot of ln L_T(μ, σ²) for the normal distribution example.]

Example 1.13 Autoregressive Model
From Example 1.9, the log-likelihood function for the AR(1) model is

\ln L_T(\theta) = \frac{1}{T}\left(\frac{1}{2}\ln(1-\rho^2) - \frac{1-\rho^2}{2\sigma^2}\, y_1^2\right) - \frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 T} \sum_{t=2}^{T} (y_t - \rho y_{t-1})^2 .

The first term is commonly excluded from ln L_T(θ), as its contribution disappears asymptotically since

\lim_{T\to\infty} \frac{1}{T}\left(\frac{1}{2}\ln(1-\rho^2) - \frac{1-\rho^2}{2\sigma^2}\, y_1^2\right) = 0 .
As the aim of maximum likelihood estimation is to find the value of that
maximizes the log-likelihood function, a natural way to do this is to use the
rules of calculus. This involves computing the first derivatives and second
derivatives of the log-likelihood function with respect to the parameter vector θ.
1.4.2 Gradient
Differentiating ln L_T(θ) with respect to a (K×1) parameter vector θ yields a (K×1) gradient vector, also known as the score, given by

G_T(\theta) = \frac{\partial \ln L_T(\theta)}{\partial\theta} = \begin{bmatrix} \partial \ln L_T(\theta)/\partial\theta_1 \\ \partial \ln L_T(\theta)/\partial\theta_2 \\ \vdots \\ \partial \ln L_T(\theta)/\partial\theta_K \end{bmatrix} = \frac{1}{T} \sum_{t=1}^{T} g_t(\theta) , \qquad (1.9)

where the subscript T emphasizes that the gradient is the sample average of the individual gradients

g_t(\theta) = \frac{\partial \ln l_t(\theta)}{\partial\theta} .
The maximum likelihood estimator of θ, denoted θ̂, is obtained by setting the gradients equal to zero and solving the resultant K first-order conditions. The maximum likelihood estimator, θ̂, therefore satisfies the condition

G_T(\hat\theta) = \left.\frac{\partial \ln L_T(\theta)}{\partial\theta}\right|_{\theta=\hat\theta} = 0 . \qquad (1.10)
Example 1.14 Poisson Distribution
From Example 1.10, the first derivative of ln L_T(θ) with respect to θ is

G_T(\theta) = \frac{1}{\theta}\frac{1}{T} \sum_{t=1}^{T} y_t - 1 .

The maximum likelihood estimator is the solution of the first-order condition

\frac{1}{\hat\theta}\frac{1}{T} \sum_{t=1}^{T} y_t - 1 = 0 ,

which yields the sample mean as the maximum likelihood estimator

\hat\theta = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar y .

Using the data for y_t in Example 1.10, the maximum likelihood estimate is θ̂ = 15/3 = 5. Evaluating the gradient at θ̂ = 5 verifies that it is zero at the maximum likelihood estimate

G_T(\hat\theta) = \frac{1}{\hat\theta}\frac{1}{T} \sum_{t=1}^{T} y_t - 1 = \frac{1}{5}\cdot\frac{15}{3} - 1 = 0 .
Example 1.15 Exponential Distribution
From Example 1.11, the first derivative of ln L_T(θ) with respect to θ is

G_T(\theta) = \frac{1}{\theta} - \frac{1}{T} \sum_{t=1}^{T} y_t .

Setting G_T(θ̂) = 0 and solving the resultant first-order condition yields

\hat\theta = \frac{T}{\sum_{t=1}^{T} y_t} = \frac{1}{\bar y} ,

which is the reciprocal of the sample mean. Using the same observed data for y_t as in Example 1.11, the maximum likelihood estimate is θ̂ = 6/12 = 0.5. The third column of Table 1.2 gives the gradients at each observation evaluated at θ̂ = 0.5,

g_t(0.5) = \frac{1}{0.5} - y_t .

The gradient is

G_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} g_t(0.5) = 0 ,

which follows from the properties of the maximum likelihood estimator.
Example 1.16 Normal Distribution
From Example 1.12, the first derivatives of the log-likelihood function are

\frac{\partial \ln L_T(\theta)}{\partial\mu} = \frac{1}{\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu) , \qquad
\frac{\partial \ln L_T(\theta)}{\partial\sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)^2 ,

yielding the gradient vector

G_T(\theta) = \begin{bmatrix} \dfrac{1}{\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu) \\[1.5ex] -\dfrac{1}{2\sigma^2} + \dfrac{1}{2\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)^2 \end{bmatrix}.

Evaluating the gradient at θ̂ and setting G_T(θ̂) = 0 gives

G_T(\hat\theta) = \begin{bmatrix} \dfrac{1}{\hat\sigma^2 T} \sum_{t=1}^{T} (y_t - \hat\mu) \\[1.5ex] -\dfrac{1}{2\hat\sigma^2} + \dfrac{1}{2\hat\sigma^4 T} \sum_{t=1}^{T} (y_t - \hat\mu)^2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.

Solving for θ̂ = {μ̂, σ̂²}, the maximum likelihood estimators are

\hat\mu = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar y , \qquad \hat\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} (y_t - \bar y)^2 .

Using the data from Example 1.12, the maximum likelihood estimates are

\hat\mu = \frac{5 - 1 + 3 + 0 + 2 + 3}{6} = 2
\hat\sigma^2 = \frac{(5-2)^2 + (-1-2)^2 + (3-2)^2 + (0-2)^2 + (2-2)^2 + (3-2)^2}{6} = 4 ,

which agree with the values given in Example 1.12.
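The closed-form estimators and the first-order conditions can be checked in a few lines of Python (an illustrative sketch, not the book's companion code):

```python
y = [5, -1, 3, 0, 2, 3]                       # data of Example 1.12
T = len(y)

mu_hat = sum(y) / T                           # sample mean
sig2_hat = sum((v - mu_hat) ** 2 for v in y) / T   # MLE uses divisor T, not T - 1

assert mu_hat == 2.0 and sig2_hat == 4.0

# Both elements of the gradient vector vanish at the MLE.
g_mu = sum(v - mu_hat for v in y) / (sig2_hat * T)
g_sig2 = -1.0 / (2.0 * sig2_hat) + sum((v - mu_hat) ** 2 for v in y) / (2.0 * sig2_hat ** 2 * T)
assert abs(g_mu) < 1e-12 and abs(g_sig2) < 1e-12
```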
1.4.3 Hessian
To establish that θ̂ maximizes the log-likelihood function, it is necessary to determine that the Hessian

H_T(\theta) = \frac{\partial^2 \ln L_T(\theta)}{\partial\theta\,\partial\theta'} , \qquad (1.11)

associated with the log-likelihood function is negative definite. As θ is a (K×1) vector, the Hessian is the (K×K) symmetric matrix

H_T(\theta) = \begin{bmatrix}
\dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1\partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1\partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1\partial\theta_K} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K\partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K\partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K\partial\theta_K}
\end{bmatrix} = \frac{1}{T} \sum_{t=1}^{T} h_t(\theta) ,
where the subscript T emphasizes that the Hessian is the sample average of the individual elements

h_t(\theta) = \frac{\partial^2 \ln l_t(\theta)}{\partial\theta\,\partial\theta'} .

The second-order condition for a maximum requires that the Hessian matrix evaluated at θ̂,

H_T(\hat\theta) = \left.\frac{\partial^2 \ln L_T(\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta=\hat\theta} , \qquad (1.12)

is negative definite. The conditions for negative definiteness are that the leading principal minors alternate in sign,

|H_{11}| < 0 , \qquad \begin{vmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{vmatrix} > 0 , \qquad \begin{vmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{vmatrix} < 0 , \quad \dots

where H_{ij} is the ij-th element of H_T(θ̂). In the case of K = 1, the condition is

H_{11} < 0 . \qquad (1.13)

For the case of K = 2, the conditions are

H_{11} < 0 , \qquad H_{11}H_{22} - H_{12}H_{21} > 0 . \qquad (1.14)
Example 1.17 Poisson Distribution
From Examples 1.10 and 1.14, the second derivative of ln L_T(θ) with respect to θ is

H_T(\theta) = -\frac{1}{\theta^2}\frac{1}{T} \sum_{t=1}^{T} y_t .

Evaluating the Hessian at the maximum likelihood estimator, θ̂ = ȳ, yields

H_T(\hat\theta) = -\frac{1}{\hat\theta^2}\frac{1}{T} \sum_{t=1}^{T} y_t = -\frac{1}{\bar y^2 T} \sum_{t=1}^{T} y_t = -\frac{1}{\bar y} < 0 .

As ȳ is always positive because it is the mean of a sample of positive integers, the Hessian is negative and a maximum is achieved. Using the data for y_t in Example 1.10 verifies that the Hessian at θ̂ = 5 is negative

H_T(\hat\theta) = -\frac{1}{\hat\theta^2 T} \sum_{t=1}^{T} y_t = -\frac{15}{5^2 \times 3} = -0.200 .
Example 1.18 Exponential Distribution
From Examples 1.11 and 1.15, the second derivative of ln L_T(θ) with respect to θ is

H_T(\theta) = -\frac{1}{\theta^2} .

Evaluating the Hessian at the maximum likelihood estimator yields

H_T(\hat\theta) = -\frac{1}{\hat\theta^2} < 0 .

As this term is negative for any θ̂, the condition in equation (1.13) is satisfied and a maximum is achieved. The last column of Table 1.2 shows that the Hessian at each observation evaluated at the maximum likelihood estimate is constant. The value of the Hessian is

H_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} h_t(0.5) = \frac{-24.000}{6} = -4 ,

which is negative, confirming that a maximum has been reached.
Example 1.19 Normal Distribution
From Examples 1.12 and 1.16, the second derivatives of ln L_T(θ) with respect to θ = {μ, σ²} are

\frac{\partial^2 \ln L_T(\theta)}{\partial\mu^2} = -\frac{1}{\sigma^2}
\frac{\partial^2 \ln L_T(\theta)}{\partial\mu\,\partial\sigma^2} = -\frac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)
\frac{\partial^2 \ln L_T(\theta)}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{1}{\sigma^6 T} \sum_{t=1}^{T} (y_t - \mu)^2 ,

so that the Hessian is

H_T(\theta) = \begin{bmatrix} -\dfrac{1}{\sigma^2} & -\dfrac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu) \\[1.5ex] -\dfrac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu) & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6 T} \sum_{t=1}^{T} (y_t - \mu)^2 \end{bmatrix}.

Given that G_T(θ̂) = 0, from Example 1.16 it follows that \sum_{t=1}^{T} (y_t - \hat\mu) = 0 and, since \hat\sigma^2 = \frac{1}{T}\sum_{t=1}^{T}(y_t - \hat\mu)^2, the second diagonal element reduces to -1/(2\hat\sigma^4), so that

H_T(\hat\theta) = \begin{bmatrix} -\dfrac{1}{\hat\sigma^2} & 0 \\[1ex] 0 & -\dfrac{1}{2\hat\sigma^4} \end{bmatrix}.

From equation (1.14),

H_{11} = -\frac{1}{\hat\sigma^2} < 0 , \qquad H_{11}H_{22} - H_{12}H_{21} = \left(-\frac{1}{\hat\sigma^2}\right)\left(-\frac{1}{2\hat\sigma^4}\right) - 0^2 > 0 ,

establishing that the second-order condition for a maximum is satisfied. Using the maximum likelihood estimates from Example 1.16, the Hessian is

H_T(\hat\mu, \hat\sigma^2) = \begin{bmatrix} -\dfrac{1}{4} & 0 \\[1ex] 0 & -\dfrac{1}{2 \times 4^2} \end{bmatrix} = \begin{bmatrix} -0.250 & 0.000 \\ 0.000 & -0.031 \end{bmatrix}.
1.5 Applications
To highlight the features of maximum likelihood estimation discussed thus far, two applications are presented that focus on estimating the discrete time version of the Vasicek (1977) model of interest rates, r_t. The first application is based on the marginal (stationary) distribution, while the second focuses on the conditional (transitional) distribution, which gives the distribution of r_t conditional on r_{t-1}. The interest rate data used are from Aït-Sahalia (1996). The data, plotted in Figure 1.6, consist of daily 7-day Eurodollar rates (expressed as percentages) for the period 1 June 1973 to 25 February 1995, a total of T = 5505 observations.
The Vasicek model expresses the change in the interest rate, $r_t$, as a function of a constant and the lagged interest rate

$$r_t - r_{t-1} = \alpha + \beta r_{t-1} + u_t\,, \qquad u_t \sim iid\,N(0, \sigma^2)\,, \tag{1.15}$$

where $\theta = \{\alpha, \beta, \sigma^2\}$ are unknown parameters, with the restriction $\beta < 0$.
1.5.1 Stationary Distribution of the Vasicek Model
As a preliminary step to estimating the parameters of the Vasicek model in
equation (1.15), consider the alternative model where the level of the interest
Figure 1.6 Daily 7-day Eurodollar interest rates from 1 June 1973 to 25 February 1995, expressed as a percentage.
rate is independent of previous interest rates
$$r_t = \mu_s + v_t\,, \qquad v_t \sim iid\,N(0, \sigma_s^2)\,.$$
The stationary distribution of $r_t$ for this model is

$$f(r; \mu_s, \sigma_s^2) = \frac{1}{\sqrt{2\pi\sigma_s^2}}\exp\Big[-\frac{(r - \mu_s)^2}{2\sigma_s^2}\Big]\,. \tag{1.16}$$

The relationship between the parameters of the stationary distribution and the parameters of the model in equation (1.15) is

$$\mu_s = -\frac{\alpha}{\beta}\,, \qquad \sigma_s^2 = -\frac{\sigma^2}{\beta(2+\beta)}\,, \tag{1.17}$$

which are obtained as the unconditional mean and variance of (1.15).
The log-likelihood function based on the stationary distribution in equation (1.16) for a sample of T observations is

$$\ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma_s^2 - \frac{1}{2\sigma_s^2 T}\sum_{t=1}^{T}(r_t - \mu_s)^2\,,$$

where $\theta = \{\mu_s, \sigma_s^2\}$. Maximizing $\ln L_T(\theta)$ with respect to $\theta$ gives

$$\hat{\mu}_s = \frac{1}{T}\sum_{t=1}^{T} r_t\,, \qquad \hat{\sigma}_s^2 = \frac{1}{T}\sum_{t=1}^{T}(r_t - \hat{\mu}_s)^2\,. \tag{1.18}$$
Using the Eurodollar interest rates, the maximum likelihood estimates are
$$\hat{\mu}_s = 8.362\,, \qquad \hat{\sigma}_s^2 = 12.893\,. \tag{1.19}$$
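The estimators in (1.18) are simply the sample mean and the divide-by-$T$ sample variance. Since the Eurodollar series itself is not included in this excerpt, a short illustrative series is used in the Python sketch below:

```python
# ML estimators of the stationary (iid normal) model, equation (1.18):
# the sample mean and the biased (divide-by-T) sample variance.
def stationary_mle(r):
    T = len(r)
    mu_s = sum(r) / T
    s2_s = sum((v - mu_s) ** 2 for v in r) / T
    return mu_s, s2_s

mu_s, s2_s = stationary_mle([4.0, 8.0, 12.0])   # illustrative rates, not the data
print(mu_s, s2_s)                               # 8.0 10.666...
```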
Figure 1.7 Estimated stationary distribution of the Vasicek model based on evaluating (1.16) at the maximum likelihood estimates (1.19), using daily Eurodollar rates from 1 June 1973 to 25 February 1995.
The stationary distribution is estimated by evaluating equation (1.16) at the maximum likelihood estimates in (1.19) and is given by

$$\hat{f}(r; \hat{\mu}_s, \hat{\sigma}_s^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}_s^2}}\exp\Big[-\frac{(r - \hat{\mu}_s)^2}{2\hat{\sigma}_s^2}\Big] = \frac{1}{\sqrt{2\pi \times 12.893}}\exp\Big[-\frac{(r - 8.362)^2}{2 \times 12.893}\Big]\,, \tag{1.20}$$
which is presented in Figure 1.7.
Inspection of the estimated distribution shows a potential problem with the Vasicek stationary distribution, namely that the support of the distribution is not restricted to being positive. The probability of negative values for the interest rate is

$$\Pr(r < 0) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi \times 12.893}}\exp\Big[-\frac{(r - 8.362)^2}{2 \times 12.893}\Big]\,dr = 0.01\,.$$
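This tail probability is $\Phi((0 - \hat{\mu}_s)/\hat{\sigma}_s)$, which can be evaluated from the error function; a sketch using the estimates in (1.19):

```python
# Pr(r < 0) under the fitted stationary normal, via the standard normal CDF
# written in terms of math.erf.
from math import erf, sqrt

def normal_cdf(x, mu, s2):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * s2)))

p = normal_cdf(0.0, 8.362, 12.893)   # estimates from (1.19)
print(round(p, 3))                   # 0.01
```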
To avoid this problem, alternative models of interest rates are specified in which the stationary distribution is defined only over the positive region. A well-known example is the CIR interest rate model (Cox, Ingersoll and Ross, 1985), which is discussed in Chapters 2, 3 and 12.
1.5.2 Transitional Distribution of the Vasicek Model
In contrast to the stationary model specification of the previous section,
the full dynamics of the Vasicek model in equation (1.15) are now used by
specifying the transitional distribution

$$f(r \mid r_{t-1}; \alpha, \rho, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big[-\frac{(r - \alpha - \rho r_{t-1})^2}{2\sigma^2}\Big]\,, \tag{1.21}$$

where $\theta = \{\alpha, \rho, \sigma^2\}$ and the substitution $\rho = 1 + \beta$ is made for convenience. This distribution is now of the same form as the conditional distribution of the AR(1) model in Examples 1.5, 1.9 and 1.13.
The log-likelihood function based on the transitional distribution in equation (1.21) is

$$\ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2(T-1)}\sum_{t=2}^{T}(r_t - \alpha - \rho r_{t-1})^2\,,$$

where the sample size is reduced by one observation as a result of the lagged term $r_{t-1}$. This form of the log-likelihood function does not contain the marginal distribution $f(r_1; \theta)$, a point that is made in Example 1.13. The
first derivatives of the log-likelihood function are

$$\frac{\partial \ln L_T(\theta)}{\partial \alpha} = \frac{1}{\sigma^2(T-1)}\sum_{t=2}^{T}(r_t - \alpha - \rho r_{t-1})$$

$$\frac{\partial \ln L_T(\theta)}{\partial \rho} = \frac{1}{\sigma^2(T-1)}\sum_{t=2}^{T}(r_t - \alpha - \rho r_{t-1})\,r_{t-1}$$

$$\frac{\partial \ln L_T(\theta)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4(T-1)}\sum_{t=2}^{T}(r_t - \alpha - \rho r_{t-1})^2\,.$$
Setting these derivatives to zero yields the maximum likelihood estimators

$$\hat{\alpha} = \bar{r}_t - \hat{\rho}\,\bar{r}_{t-1}$$

$$\hat{\rho} = \frac{\sum_{t=2}^{T}(r_t - \bar{r}_t)(r_{t-1} - \bar{r}_{t-1})}{\sum_{t=2}^{T}(r_{t-1} - \bar{r}_{t-1})^2}$$

$$\hat{\sigma}^2 = \frac{1}{T-1}\sum_{t=2}^{T}(r_t - \hat{\alpha} - \hat{\rho}\,r_{t-1})^2\,,$$

where

$$\bar{r}_t = \frac{1}{T-1}\sum_{t=2}^{T} r_t\,, \qquad \bar{r}_{t-1} = \frac{1}{T-1}\sum_{t=2}^{T} r_{t-1}\,.$$
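These closed-form estimators coincide with least squares applied to the regression of $r_t$ on a constant and $r_{t-1}$. A Python sketch follows; the short series passed in is artificial, not the Eurodollar data:

```python
# Closed-form ML estimators of the transitional (AR(1)) model: rho_hat is the
# slope of r_t on r_{t-1}, alpha_hat the intercept, and sigma2_hat the average
# squared residual over the T-1 usable observations.
def vasicek_transitional_mle(r):
    y = r[1:]                        # r_t for t = 2,...,T
    x = r[:-1]                       # r_{t-1}
    n = len(y)                       # T - 1
    ybar = sum(y) / n
    xbar = sum(x) / n
    rho = (sum((yi - ybar) * (xi - xbar) for yi, xi in zip(y, x))
           / sum((xi - xbar) ** 2 for xi in x))
    alpha = ybar - rho * xbar
    s2 = sum((yi - alpha - rho * xi) ** 2 for yi, xi in zip(y, x)) / n
    return alpha, rho, s2

alpha, rho, s2 = vasicek_transitional_mle([8.0, 8.2, 7.9, 8.1, 8.0])  # artificial series
print(alpha, rho, s2)
```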
The maximum likelihood estimates for the Eurodollar interest rates are
$$\hat{\alpha} = 0.053\,, \qquad \hat{\rho} = 0.994\,, \qquad \hat{\sigma}^2 = 0.165\,. \tag{1.22}$$

An estimate of $\beta$ is obtained by using the relationship $\rho = 1 + \beta$. Rearranging for $\beta$ and evaluating at $\hat{\rho}$ gives $\hat{\beta} = \hat{\rho} - 1 = -0.006$. The estimated transitional distribution is obtained by evaluating (1.21)
at the maximum likelihood estimates in (1.22)
$$\hat{f}(r \mid r_{t-1}; \hat{\alpha}, \hat{\rho}, \hat{\sigma}^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}}\exp\Big[-\frac{(r - \hat{\alpha} - \hat{\rho}\,r_{t-1})^2}{2\hat{\sigma}^2}\Big]\,. \tag{1.23}$$
Plots of this distribution are given in Figure 1.8 for three values of the conditioning variable $r_{t-1}$, corresponding to the minimum (2.9%), median (8.1%) and maximum (24.3%) interest rates in the sample.
Figure 1.8 Estimated transitional distribution of the Vasicek model, based on evaluating (1.23) at the maximum likelihood estimates in (1.22), using Eurodollar rates from 1 June 1973 to 25 February 1995. The dashed line is the transitional density for the minimum (2.9%), the solid line for the median (8.1%) and the dotted line for the maximum (24.3%) Eurodollar rate.
The location of the three transitional distributions changes over time, while the spread of each distribution remains constant at $\hat{\sigma}^2 = 0.165$. A comparison of the estimates of the variances of the stationary and transitional distributions, in equations (1.19) and (1.22) respectively, shows that $\hat{\sigma}^2 < \hat{\sigma}_s^2$. This result reflects the property that, by conditioning on information, in this case $r_{t-1}$, the transitional distribution is better at tracking the time series behaviour of the interest rate, $r_t$, than the stationary distribution, where there is no conditioning on lagged dependent variables.
Having obtained the estimated transitional distribution using the maximum likelihood estimates in (1.22), it is also possible to use these estimates to re-estimate the stationary interest rate distribution in (1.20) by using the expressions in (1.17). The alternative estimates of the mean and variance of the stationary distribution are

$$\hat{\mu}_s = -\frac{\hat{\alpha}}{\hat{\beta}} = -\frac{0.053}{-0.006} = 8.308\,,$$

$$\hat{\sigma}_s^2 = -\frac{\hat{\sigma}^2}{\hat{\beta}(2 + \hat{\beta})} = -\frac{0.165}{-0.006\,(2 - 0.006)} = 12.967\,.$$
As these estimates are based on the transitional distribution, which incorporates the full dynamic specification of the Vasicek model, they represent the maximum likelihood estimates of the parameters of the stationary distribution. This relationship between the maximum likelihood estimators of the transitional and stationary distributions rests on the invariance property of maximum likelihood estimators, which is discussed in Chapter 2. While the parameter estimates of the stationary distribution obtained via the transitional estimates are numerically close to those obtained in the previous section, the latter come from a misspecified model, since the stationary specification excludes the dynamic structure in equation (1.15). Issues relating to misspecified models are discussed in Chapter 9.
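The invariance mapping in (1.17) is easy to sketch in Python. Note that plugging in the rounded estimates from (1.22) gives roughly 8.83 and 13.79, rather than the 8.308 and 12.967 reported in the text, which are computed from the unrounded estimates:

```python
# Stationary mean and variance implied by the transitional estimates, using the
# mapping in (1.17): mu_s = -alpha/beta, sigma_s^2 = -sigma^2/(beta*(2+beta)).
def stationary_from_transitional(alpha, beta, sigma2):
    mu_s = -alpha / beta
    s2_s = -sigma2 / (beta * (2.0 + beta))
    return mu_s, s2_s

# Rounded estimates from (1.22); the text's 8.308 and 12.967 use unrounded values.
mu_s, s2_s = stationary_from_transitional(0.053, -0.006, 0.165)
print(round(mu_s, 3), round(s2_s, 3))   # 8.833 13.791
```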
1.6 Exercises
(1) Sampling Data
Gauss file(s) basic_sample.g
Matlab file(s) basic_sample.m
This exercise reproduces the simulation results in Figures 1.1 and 1.2.
For each model, simulate T = 5 draws of $y_t$ and plot the corresponding distribution at each point in time. Where applicable, the explanatory variable in these exercises is $x_t = \{0, 1, 2, 3, 4\}$ and $w_t$ are draws from a uniform distribution on the unit circle.
(a) Time invariant model
$$y_t = 2z_t\,, \qquad z_t \sim iid\,N(0, 1)\,.$$
(b) Count model
$$f(y; 2) = \frac{2^y \exp[-2]}{y!}\,, \qquad y = 0, 1, 2, \ldots$$
(c) Linear regression model
$$y_t = 3x_t + 2z_t\,, \qquad z_t \sim iid\,N(0, 1)\,.$$
(d) Exponential regression model
$$f(y; \mu_t) = \frac{1}{\mu_t}\exp\Big[-\frac{y}{\mu_t}\Big]\,, \qquad \mu_t = 1 + 2x_t\,.$$
(e) Autoregressive model
$$y_t = 0.8y_{t-1} + 2z_t\,, \qquad z_t \sim iid\,N(0, 1)\,.$$
(f) Bilinear time series model
$$y_t = 0.8y_{t-1} + 0.4y_{t-1}u_{t-1} + 2z_t\,, \qquad z_t \sim iid\,N(0, 1)\,.$$
(g) Autoregressive model with heteroskedasticity
$$y_t = 0.8y_{t-1} + \sigma_t z_t\,, \qquad z_t \sim iid\,N(0, 1)$$
$$\sigma_t^2 = 0.8 + 0.8w_t\,.$$
(h) The ARCH regression model
$$y_t = 3x_t + u_t\,, \qquad u_t = \sigma_t z_t\,, \qquad \sigma_t^2 = 4 + 0.9u_{t-1}^2\,, \qquad z_t \sim iid\,N(0, 1)\,.$$
(2) Poisson Distribution
Gauss file(s) basic_poisson.g
Matlab file(s) basic_poisson.m
A sample of T = 4 observations, $y_t = \{6, 2, 3, 1\}$, is drawn from the Poisson distribution

$$f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!}\,.$$
(a) Write the log-likelihood function, $\ln L_T(\theta)$.
(b) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.
(c) Compute the maximum likelihood estimate, $\hat{\theta}$.
(d) Compute the log-likelihood function at $\hat{\theta}$ for each observation.
(e) Compute the value of the log-likelihood function at $\hat{\theta}$.
(f) Compute
$$g_t(\hat{\theta}) = \frac{d \ln l_t(\theta)}{d\theta}\bigg|_{\theta=\hat{\theta}} \quad \text{and} \quad h_t(\hat{\theta}) = \frac{d^2 \ln l_t(\theta)}{d\theta^2}\bigg|_{\theta=\hat{\theta}}\,,$$
for each observation.
(g) Compute
$$G_T(\hat{\theta}) = \frac{1}{T}\sum_{t=1}^{T} g_t(\hat{\theta}) \quad \text{and} \quad H_T(\hat{\theta}) = \frac{1}{T}\sum_{t=1}^{T} h_t(\hat{\theta})\,.$$
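For the Poisson case the estimator is the sample mean, with $g_t(\theta) = y_t/\theta - 1$ and $h_t(\theta) = -y_t/\theta^2$. The numerical parts (c)-(g) can be sketched in Python (the analytical parts (a)-(b) are left as derivations):

```python
# Poisson MLE and per-observation gradient/Hessian for y = {6, 2, 3, 1}.
from math import log, factorial

y = [6, 2, 3, 1]
theta = sum(y) / len(y)                                         # MLE: sample mean = 3.0
lnl = [v * log(theta) - theta - log(factorial(v)) for v in y]   # ln l_t(theta_hat)
g = [v / theta - 1.0 for v in y]                                # g_t(theta_hat)
h = [-v / theta**2 for v in y]                                  # h_t(theta_hat)
G = sum(g) / len(y)                                             # zero at the MLE
H = sum(h) / len(y)                                             # equals -1/theta_hat
print(theta, G, H)
```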
(3) Exponential Distribution
Gauss file(s) basic_exp.g
Matlab file(s) basic_exp.m
A sample of T = 4 observations, $y_t = \{5.5, 2.0, 3.5, 5.0\}$, is drawn from the exponential distribution

$$f(y; \theta) = \theta \exp[-\theta y]\,.$$

(a) Write the log-likelihood function, $\ln L_T(\theta)$.
(b) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.
(c) Compute the maximum likelihood estimate, $\hat{\theta}$.
(d) Compute the log-likelihood function at $\hat{\theta}$ for each observation.
(e) Compute the value of the log-likelihood function at $\hat{\theta}$.
(f) Compute
$$g_t(\hat{\theta}) = \frac{d \ln l_t(\theta)}{d\theta}\bigg|_{\theta=\hat{\theta}} \quad \text{and} \quad h_t(\hat{\theta}) = \frac{d^2 \ln l_t(\theta)}{d\theta^2}\bigg|_{\theta=\hat{\theta}}\,,$$
for each observation.
(g) Compute
$$G_T(\hat{\theta}) = \frac{1}{T}\sum_{t=1}^{T} g_t(\hat{\theta}) \quad \text{and} \quad H_T(\hat{\theta}) = \frac{1}{T}\sum_{t=1}^{T} h_t(\hat{\theta})\,.$$
(4) Alternative Form of Exponential Distribution
Consider a random sample of size T, $\{y_1, y_2, \ldots, y_T\}$, of iid random variables from the exponential distribution with parameter $\mu$

$$f(y; \mu) = \frac{1}{\mu}\exp\Big[-\frac{y}{\mu}\Big]\,.$$
(a) Derive the log-likelihood function, $\ln L_T(\mu)$.
(b) Derive the first derivative of the log-likelihood function, $G_T(\mu)$.
(c) Derive the second derivative of the log-likelihood function, $H_T(\mu)$.
(d) Derive the maximum likelihood estimator of $\mu$. Compare the result with that obtained in Exercise 3.
(5) Normal Distribution
Gauss file(s) basic_normal.g, basic_normal_like.g
Matlab file(s) basic_normal.m, basic_normal_like.m
A sample of T = 5 observations consisting of the values $\{1, 2, 5, 1, 2\}$ is drawn from the normal distribution

$$f(y; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big[-\frac{(y - \mu)^2}{2\sigma^2}\Big]\,,$$

where $\theta = \{\mu, \sigma^2\}$.
(a) Assume that $\sigma^2 = 1$.
(i) Derive the log-likelihood function, $\ln L_T(\theta)$.
(ii) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.
(iii) Compute the maximum likelihood estimate, $\hat{\theta}$.
(iv) Compute $\ln l_t(\hat{\theta})$, $g_t(\hat{\theta})$ and $h_t(\hat{\theta})$.
(v) Compute $\ln L_T(\hat{\theta})$, $G_T(\hat{\theta})$ and $H_T(\hat{\theta})$.
(b) Repeat part (a) for the case where both the mean and the variance are unknown, $\theta = \{\mu, \sigma^2\}$.
(6) A Model of the Number of Strikes
Gauss file(s) basic_count.g, strike.dat
Matlab file(s) basic_count.m, strike.mat
The data are the number of strikes per annum, $y_t$, in the U.S. from 1968 to 1976, taken from Kennan (1985). The number of strikes is specified as a Poisson-distributed random variable with unknown parameter $\theta$

$$f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!}\,.$$
(a) Write the log-likelihood function for a sample of T observations.
(b) Derive and interpret the maximum likelihood estimator of $\theta$.
(c) Estimate $\theta$ and interpret the result.
(d) Use the estimate from part (c) to plot the distribution of the number of strikes and interpret this plot.
(e) Compute a histogram of $y_t$ and comment on its consistency with the distribution of strike numbers estimated in part (d).
(7) A Model of the Duration of Strikes
Gauss file(s) basic_strike.g, strike.dat
Matlab file(s) basic_strike.m, strike.mat
The data are 62 observations, taken from the same source as Exercise
6, of the duration of strikes in the U.S. per annum expressed in days,
$y_t$. Durations are assumed to be drawn from an exponential distribution with unknown parameter $\mu$

$$f(y; \mu) = \frac{1}{\mu}\exp\Big[-\frac{y}{\mu}\Big]\,.$$
(a) Write the log-likelihood function for a sample of T observations.
(b) Derive and interpret the maximum likelihood estimator of $\mu$.
(c) Use the data on strike durations to estimate $\mu$. Interpret the result.
(d) Use the estimates from part (c) to plot the distribution of strike
durations and interpret this plot.
(e) Compute a histogram of $y_t$ and comment on its consistency with the distribution of duration times estimated in part (d).
(8) Asset Prices
Gauss file(s) basic_assetprices.g, assetprices.xls
Matlab file(s) basic_assetprices.m, assetprices.mat
The data consist of the Australian, Singapore and NASDAQ stock market indexes for the period 3 January 1989 to 31 December 2009, a total of T = 5478 observations. Consider the following model of asset prices, $p_t$, that is commonly adopted in the financial econometrics literature

$$\ln p_t - \ln p_{t-1} = \alpha + u_t\,, \qquad u_t \sim iid\,N(0, \sigma^2)\,,$$

where $\theta = \{\alpha, \sigma^2\}$ are unknown parameters.
(a) Use the transformation of variable technique to show that the conditional distribution of $p$ is the log-normal distribution

$$f(p \mid p_{t-1}; \theta) = \frac{1}{p\sqrt{2\pi\sigma^2}}\exp\Big[-\frac{(\ln p - \ln p_{t-1} - \alpha)^2}{2\sigma^2}\Big]\,.$$
(b) For a sample of size T, construct the log-likelihood function and derive the maximum likelihood estimator of $\theta$ based on the conditional distribution of $p$.
(c) Use the results in part (b) to compute $\hat{\theta}$ for the three stock indexes.
(d) Estimate the asset price distribution for each index using the maximum likelihood parameter estimates obtained in part (c).
(e) Letting $r_t = \ln p_t - \ln p_{t-1}$ represent the return on an asset, derive the maximum likelihood estimator of $\theta$ based on the distribution of $r_t$. Compute $\hat{\theta}$ for the three stock market indexes and compare the estimates to those obtained in part (c).
(9) Stationary Distribution of the Vasicek Model
Gauss file(s) basic_stationary.g, eurodata.dat
Matlab file(s) basic_stationary.m, eurodata.mat
The data are daily 7-day Eurodollar rates, expressed as percentages,
from 1 June 1973 to 25 February 1995, a total of T = 5505 observations. The Vasicek discret