  • Econometric Modelling with Time Series

    Specification, Estimation and Testing

    V. L. Martin, A. S. Hurn and D. Harris

  • Preface

    This book provides a general framework for specifying, estimating and testing time series econometric models. Special emphasis is given to estimation by maximum likelihood, but other methods are also discussed, including quasi-maximum likelihood estimation, generalized method of moments, nonparametrics and estimation by simulation. An important advantage of adopting the principle of maximum likelihood as the unifying framework for the book is that many of the estimators and test statistics proposed in econometrics can be derived within a likelihood framework, thereby providing a coherent vehicle for understanding their properties and interrelationships.

    In contrast to many existing econometric textbooks, which deal mainly with the theoretical properties of estimators and test statistics through a theorem-proof presentation, this book is very concerned with implementation issues in order to provide a fast-track between the theory and applied work. Consequently, many of the econometric methods discussed in the book are illustrated by means of a suite of programs written in GAUSS and MATLAB.¹ The computer code emphasizes the computational side of econometrics and follows the notation in the book as closely as possible, thereby reinforcing the principles presented in the text. More generally, the computer code also helps to bridge the gap between theory and practice by enabling the reproduction of both theoretical and empirical results published in recent journal articles. The reader, as a result, may build on the code and tailor it to more involved applications.

    ¹ GAUSS is a registered trademark of Aptech Systems, Inc. (http://www.aptech.com/) and MATLAB is a registered trademark of The MathWorks, Inc. (http://www.mathworks.com/).

    Organization of the Book

    Part ONE of the book is an exposition of the basic maximum likelihood framework. To implement this approach, three conditions are required: the probability distribution of the stochastic process must be known and specified correctly; the parametric specifications of the moments of the distribution must be known and specified correctly; and the likelihood must be tractable. The properties of maximum likelihood estimators are presented, and three fundamental testing procedures, namely the Likelihood Ratio test, the Wald test and the Lagrange Multiplier test, are discussed in detail. There is also a comprehensive treatment of iterative algorithms to compute maximum likelihood estimators when no analytical expressions are available.

    Part TWO is the usual regression framework taught in standard econometric courses but presented within the maximum likelihood framework. Both nonlinear regression models and non-spherical models exhibiting either autocorrelation or heteroskedasticity, or both, are presented. A further advantage of the maximum likelihood strategy is that it provides a mechanism for deriving new estimators and new test statistics, which are designed specifically for non-standard problems.

    Part THREE provides a coherent treatment of a number of alternative estimation procedures which are applicable when the conditions to implement maximum likelihood estimation are not satisfied. For the case where the probability distribution is incorrectly specified, quasi-maximum likelihood is appropriate. If the joint probability distribution of the data is treated as unknown, then a generalized method of moments estimator is adopted. This estimator has the advantage of circumventing the need to specify the distribution and hence avoids any potential misspecification from an incorrect choice of the distribution. An even less restrictive approach is to specify neither the distribution nor the parametric form of the moments of the distribution and to use nonparametric procedures to model either the distribution of variables or the relationships between variables. Simulation estimation methods are used for models where the likelihood is intractable, arising, for example, from the presence of latent variables. Indirect inference, efficient method of moments and simulated method of moments estimators are presented and compared.

    Part FOUR examines stationary time series models with a special emphasis on using maximum likelihood methods to estimate and test these models. Both single equation models, including the autoregressive moving average class of models, and multiple equation models, including vector autoregressions and structural vector autoregressions, are dealt with in detail. Also discussed are linear factor models where the factors are treated as latent. The presence of the latent factor means that the full likelihood is generally not tractable. However, if the models are specified in terms of the normal distribution with moments based on linear parametric representations, a Kalman filter is used to rewrite the likelihood in terms of the observable variables, thereby making estimation and testing by maximum likelihood feasible.

    Part FIVE focusses on nonstationary time series models and, in particular, tests for unit roots and cointegration. Some important asymptotic results for nonstationary time series are presented, followed by a comprehensive discussion of testing for unit roots. Cointegration is tackled from the perspective that the well-known Johansen estimator may be usefully interpreted as a maximum likelihood estimator based on the assumption of a normal distribution applied to a system of equations that is subject to a set of cross-equation restrictions arising from the assumption of common long-run relationships. Further, the trace and maximum eigenvalue tests of cointegration are shown to be likelihood ratio tests.

    Part SIX is concerned with nonlinear time series models. Models that are nonlinear in mean include the threshold class of model, bilinear models and also artificial neural network modelling, which, contrary to many existing treatments, is again addressed from the econometric perspective of estimation and testing based on maximum likelihood methods. Nonlinearities in variance are dealt with in terms of the GARCH class of models. The final chapter focusses on models that deal with discrete or truncated time series data.

    Even in a project of this size and scope, sacrifices have had to be made to keep the length of the book manageable. Accordingly, there are a number of important topics that have had to be omitted.

    (i) Although Bayesian methods are increasingly being used in many areas of statistics and econometrics, no material on Bayesian econometrics is included. This is an important field in its own right and the interested reader is referred to recent books by Koop (2003), Geweke (2005), Koop, Poirier and Tobias (2007) and Greenberg (2008), inter alia. Where appropriate, references to Bayesian methods are provided in the body of the text.

    (ii) With great reluctance, a chapter on bootstrapping was not included because of space issues. A good place to start reading is the introductory text by Efron and Tibshirani (1993) and the useful surveys by Horowitz (1997) and Li and Maddala (1996a, 1996b).

    (iii) In Part SIX, in the chapter dealing with modelling the variance of time series, there are important recent developments in stochastic volatility and realized volatility that would be worthy of inclusion. For stochastic volatility, there is an excellent volume of readings edited by Shephard (2005), while the seminal articles in the area of realized volatility are Andersen et al. (2001, 2003).

    The fact that these areas have not been covered should not be regarded as a value judgement about their relative importance. Instead, the subject matter chosen for inclusion reflects a balance between the interests of the authors and purely operational decisions aimed at preserving the flow and continuity of the book.

  • Computer Code

    Computer code is available from a companion website to reproduce relevant examples in the text, to reproduce figures in the text that are not part of an example, to reproduce the applications presented in the final section of each chapter, and to complete the exercises. Where applicable, the time series data used in these examples, applications and exercises are also available in a number of different formats.

    Presenting numerical results in the examples immediately gives rise to two important issues concerning numerical precision.

    (1) In all of the examples listed in the front of the book where computer code has been used, the numbers appearing in the text are rounded versions of those generated by the code. Accordingly, the rounded numbers should be interpreted as such and not be used independently of the computer code to try to reproduce the numbers reported in the text.

    (2) In many of the examples, simulation has been used to demonstrate a concept. Since GAUSS and MATLAB have different random number generators, the results generated by the different sets of code will not be identical to one another. For consistency, we have always used the GAUSS output for reporting purposes.

    Although GAUSS and MATLAB are very similar high-level programming languages, there are some important differences that require explanation. Probably the most important difference is one of programming style. GAUSS programs are script files that allow calls to both inbuilt GAUSS and user-defined procedures. MATLAB, on the other hand, does not support the use of user-defined functions in script files. Furthermore, MATLAB programming style favours writing user-defined functions in separate files and then calling them as if they were in-built functions. This style of programming does not suit the learning-by-doing environment that the book tries to create. Consequently, the MATLAB programs are written mainly as function files, with a main function and all the user-defined functions required to implement the procedure in the same file. The only exception to this rule is a few MATLAB utility files, which greatly facilitate the conversion and interpretation of code from GAUSS to MATLAB and which are provided as separate stand-alone MATLAB function files. Finally, all the figures in the text were created using MATLAB together with a utility file laprint.m written by Arno Linnemann of the University of Kassel.²

    ² A user guide is available at http://www.uni-kassel.de/fb16/rat/matlab/laprint/laprintdoc.ps.

  • Acknowledgements

    Creating a manuscript of this scope and magnitude is a daunting task and there are many people to whom we are indebted. In particular, we would like to thank Kenneth Lindsay, Adrian Pagan and Andy Tremayne for their careful reading of various chapters of the manuscript and for many helpful comments and suggestions. Gael Martin helped with compiling a suitable list of references to Bayesian econometric methods. Ayesha Scott compiled the index, a painstaking task for a manuscript of this size. Many others have commented on earlier drafts of chapters and we are grateful to the following individuals: our colleagues, Gunnar Bardsen, Ralf Becker, Adam Clements, Vlad Pavlov and Joseph Jeisman; and our graduate students, Tim Christensen, Christopher Coleman-Fenn, Andrew McClelland, Jessie Wang and Vivianne Vilar.

    We also wish to express our deep appreciation to the team at Cambridge University Press, particularly Peter C. B. Phillips for his encouragement and support throughout the long gestation period of the book, as well as for reading and commenting on earlier drafts. Scott Parris, with his energy and enthusiasm for the project, was a great help in sustaining the authors during the long slog of completing the manuscript. Our thanks are also due to our CUP readers, who provided detailed and constructive feedback at various stages in the compilation of the final document. Michael Erkelenz of Fine Line Writers edited the entire manuscript, helped to smooth out the prose and provided particular assistance with the correct use of adjectival constructions in the passive voice.

    It is fair to say that writing this book was an immense task that involved the consumption of copious quantities of chillies, champagne and port over a protracted period of time. The biggest debt of gratitude we owe, therefore, is to our respective families. To Gael, Sarah and David; Cath, Iain, Robert and Tim; and Fiona and Caitlin: thank you for your patience, your good humour in putting up with and cleaning up after many a pizza night, your stoicism in enduring yet another vacant stare during an important conversation and, ultimately, for making it all worthwhile.

    Vance Martin, Stan Hurn & David Harris
    November 2011

  • Contents

    List of illustrations page 1

    Computer Code used in the Examples 4

    PART ONE MAXIMUM LIKELIHOOD 1

    1 The Maximum Likelihood Principle 3

    1.1 Introduction 3

    1.2 Motivating Examples 3

    1.3 Joint Probability Distributions 9

    1.4 Maximum Likelihood Framework 12

    1.4.1 The Log-Likelihood Function 12

    1.4.2 Gradient 18

    1.4.3 Hessian 20

    1.5 Applications 23

    1.5.1 Stationary Distribution of the Vasicek Model 23

    1.5.2 Transitional Distribution of the Vasicek Model 25

    1.6 Exercises 28

    2 Properties of Maximum Likelihood Estimators 35

    2.1 Introduction 35

    2.2 Preliminaries 35

    2.2.1 Stochastic Time Series Models and Their Properties 36

    2.2.2 Weak Law of Large Numbers 41

    2.2.3 Rates of Convergence 45

    2.2.4 Central Limit Theorems 47

    2.3 Regularity Conditions 55

    2.4 Properties of the Likelihood Function 57


    2.4.1 The Population Likelihood Function 57

    2.4.2 Moments of the Gradient 58

    2.4.3 The Information Matrix 61

    2.5 Asymptotic Properties 63

    2.5.1 Consistency 63

    2.5.2 Normality 67

    2.5.3 Efficiency 68

    2.6 Finite-Sample Properties 72

    2.6.1 Unbiasedness 73

    2.6.2 Sufficiency 74

    2.6.3 Invariance 75

    2.6.4 Non-Uniqueness 76

    2.7 Applications 76

    2.7.1 Portfolio Diversification 78

    2.7.2 Bimodal Likelihood 80

    2.8 Exercises 82

    3 Numerical Estimation Methods 91

    3.1 Introduction 91

    3.2 Newton Methods 92

    3.2.1 Newton-Raphson 93

    3.2.2 Method of Scoring 94

    3.2.3 BHHH Algorithm 95

    3.2.4 Comparative Examples 98

    3.3 Quasi-Newton Methods 101

    3.4 Line Searching 102

    3.5 Optimisation Based on Function Evaluation 104

    3.6 Computing Standard Errors 106

    3.7 Hints for Practical Optimization 109

    3.7.1 Concentrating the Likelihood 109

    3.7.2 Parameter Constraints 110

    3.7.3 Choice of Algorithm 111

    3.7.4 Numerical Derivatives 112

    3.7.5 Starting Values 113

    3.7.6 Convergence Criteria 113

    3.8 Applications 114

    3.8.1 Stationary Distribution of the CIR Model 114

    3.8.2 Transitional Distribution of the CIR Model 116

    3.9 Exercises 118


    4 Hypothesis Testing 124

    4.1 Introduction 124

    4.2 Overview 124

    4.3 Types of Hypotheses 126

    4.3.1 Simple and Composite Hypotheses 126

    4.3.2 Linear Hypotheses 127

    4.3.3 Nonlinear Hypotheses 128

    4.4 Likelihood Ratio Test 129

    4.5 Wald Test 133

    4.5.1 Linear Hypotheses 134

    4.5.2 Nonlinear Hypotheses 136

    4.6 Lagrange Multiplier Test 137

    4.7 Distribution Theory 139

    4.7.1 Asymptotic Distribution of the Wald Statistic 139

    4.7.2 Asymptotic Relationships Among the Tests 142

    4.7.3 Finite Sample Relationships 143

    4.8 Size and Power Properties 145

    4.8.1 Size of a Test 145

    4.8.2 Power of a Test 146

    4.9 Applications 148

    4.9.1 Exponential Regression Model 148

    4.9.2 Gamma Regression Model 151

    4.10 Exercises 153

    PART TWO REGRESSION MODELS 159

    5 Linear Regression Models 161

    5.1 Introduction 161

    5.2 Specification 162

    5.2.1 Model Classification 162

    5.2.2 Structural and Reduced Forms 163

    5.3 Estimation 166

    5.3.1 Single Equation: Ordinary Least Squares 166

    5.3.2 Multiple Equations: FIML 170

    5.3.3 Identification 175

    5.3.4 Instrumental Variables 177

    5.3.5 Seemingly Unrelated Regression 181

    5.4 Testing 182

    5.5 Applications 187


    5.5.1 Linear Taylor Rule 187

    5.5.2 The Klein Model of the U.S. Economy 189

    5.6 Exercises 191

    6 Nonlinear Regression Models 199

    6.1 Introduction 199

    6.2 Specification 199

    6.3 Maximum Likelihood Estimation 201

    6.4 Gauss-Newton 208

    6.4.1 Relationship to Nonlinear Least Squares 212

    6.4.2 Relationship to Ordinary Least Squares 213

    6.4.3 Asymptotic Distributions 213

    6.5 Testing 214

    6.5.1 LR, Wald and LM Tests 214

    6.5.2 Nonnested Tests 218

    6.6 Applications 221

    6.6.1 Robust Estimation of the CAPM 221

    6.6.2 Stochastic Frontier Models 224

    6.7 Exercises 228

    7 Autocorrelated Regression Models 234

    7.1 Introduction 234

    7.2 Specification 234

    7.3 Maximum Likelihood Estimation 236

    7.3.1 Exact Maximum Likelihood 237

    7.3.2 Conditional Maximum Likelihood 238

    7.4 Alternative Estimators 240

    7.4.1 Gauss-Newton 241

    7.4.2 Zig-zag Algorithms 244

    7.4.3 Cochrane-Orcutt 247

    7.5 Distribution Theory 248

    7.5.1 Maximum Likelihood Estimator 249

    7.5.2 Least Squares Estimator 253

    7.6 Lagged Dependent Variables 258

    7.7 Testing 260

    7.7.1 Alternative LM Test I 262

    7.7.2 Alternative LM Test II 263

    7.7.3 Alternative LM Test III 264

    7.8 Systems of Equations 265

    7.8.1 Estimation 266

    7.8.2 Testing 268


    7.9 Applications 268

    7.9.1 Illiquidity and Hedge Funds 268

    7.9.2 Beach-Mackinnon Simulation Study 269

    7.10 Exercises 271

    8 Heteroskedastic Regression Models 280

    8.1 Introduction 280

    8.2 Specification 280

    8.3 Estimation 283

    8.3.1 Maximum Likelihood 283

    8.3.2 Relationship with Weighted Least Squares 286

    8.4 Distribution Theory 289

    8.5 Testing 289

    8.6 Heteroskedasticity in Systems of Equations 295

    8.6.1 Specification 295

    8.6.2 Estimation 297

    8.6.3 Testing 299

    8.6.4 Heteroskedastic and Autocorrelated Disturbances 300

    8.7 Applications 302

    8.7.1 The Great Moderation 302

    8.7.2 Finite Sample Properties of the Wald Test 304

    8.8 Exercises 306

    PART THREE OTHER ESTIMATION METHODS 313

    9 Quasi-Maximum Likelihood Estimation 315

    9.1 Introduction 315

    9.2 Misspecification 316

    9.3 The Quasi-Maximum Likelihood Estimator 320

    9.4 Asymptotic Distribution 323

    9.4.1 Misspecification and the Information Equality 325

    9.4.2 Independent and Identically Distributed Data 328

    9.4.3 Dependent Data: Martingale Difference Score 329

    9.4.4 Dependent Data and Score 330

    9.4.5 Variance Estimation 331

    9.5 Quasi-Maximum Likelihood and Linear Regression 333

    9.5.1 Nonnormality 336

    9.5.2 Heteroskedasticity 337

    9.5.3 Autocorrelation 338

    9.5.4 Variance Estimation 342


    9.6 Testing 346

    9.7 Applications 348

    9.7.1 Autoregressive Models for Count Data 348

    9.7.2 Estimating the Parameters of the CKLS Model 351

    9.8 Exercises 354

    10 Generalized Method of Moments 361

    10.1 Introduction 361

    10.2 Motivating Examples 362

    10.2.1 Population Moments 362

    10.2.2 Empirical Moments 363

    10.2.3 GMM Models from Conditional Expectations 368

    10.2.4 GMM and Maximum Likelihood 371

    10.3 Estimation 372

    10.3.1 The GMM Objective Function 372

    10.3.2 Asymptotic Properties 373

    10.3.3 Estimation Strategies 378

    10.4 Over-Identification Testing 382

    10.5 Applications 387

    10.5.1 Monte Carlo Evidence 387

    10.5.2 Level Effect in Interest Rates 393

    10.6 Exercises 396

    11 Nonparametric Estimation 404

    11.1 Introduction 404

    11.2 The Kernel Density Estimator 405

    11.3 Properties of the Kernel Density Estimator 409

    11.3.1 Finite Sample Properties 410

    11.3.2 Optimal Bandwidth Selection 410

    11.3.3 Asymptotic Properties 414

    11.3.4 Dependent Data 416

    11.4 Semi-Parametric Density Estimation 417

    11.5 The Nadaraya-Watson Kernel Regression Estimator 419

    11.6 Properties of Kernel Regression Estimators 423

    11.7 Bandwidth Selection for Kernel Regression 427

    11.8 Multivariate Kernel Regression 430

    11.9 Semi-parametric Regression of the Partial Linear Model 432

    11.10 Applications 433

    11.10.1 Derivatives of a Nonlinear Production Function 434

    11.10.2 Drift and Diffusion Functions of SDEs 436

    11.11 Exercises 439


    12 Estimation by Simulation 447

    12.1 Introduction 447

    12.2 Motivating Example 448

    12.3 Indirect Inference 450

    12.3.1 Estimation 451

    12.3.2 Relationship with Indirect Least Squares 455

    12.4 Efficient Method of Moments (EMM) 456

    12.4.1 Estimation 456

    12.4.2 Relationship with Instrumental Variables 458

    12.5 Simulated Generalized Method of Moments (SMM) 459

    12.6 Estimating Continuous-Time Models 461

    12.6.1 Brownian Motion 464

    12.6.2 Geometric Brownian Motion 467

    12.6.3 Stochastic Volatility 470

    12.7 Applications 472

    12.7.1 Simulation Properties 473

    12.7.2 Empirical Properties 475

    12.8 Exercises 477

    PART FOUR STATIONARY TIME SERIES 483

    13 Linear Time Series Models 485

    13.1 Introduction 485

    13.2 Time Series Properties of Data 486

    13.3 Specification 488

    13.3.1 Univariate Model Classification 489

    13.3.2 Multivariate Model Classification 491

    13.3.3 Likelihood 493

    13.4 Stationarity 493

    13.4.1 Univariate Examples 494

    13.4.2 Multivariate Examples 495

    13.4.3 The Stationarity Condition 496

    13.4.4 Wold's Representation Theorem 497

    13.4.5 Transforming a VAR to a VMA 498

    13.5 Invertibility 501

    13.5.1 The Invertibility Condition 501

    13.5.2 Transforming a VMA to a VAR 502

    13.6 Estimation 502

    13.7 Optimal Choice of Lag Order 506


    13.8 Distribution Theory 508

    13.9 Testing 511

    13.10 Analyzing Vector Autoregressions 513

    13.10.1 Granger Causality Testing 515

    13.10.2 Impulse Response Functions 517

    13.10.3 Variance Decompositions 523

    13.11 Applications 525

    13.11.1 Barro's Rational Expectations Model 525

    13.11.2 The Campbell-Shiller Present Value Model 526

    13.12 Exercises 528

    14 Structural Vector Autoregressions 537

    14.1 Introduction 537

    14.2 Specification 538

    14.2.1 Short-Run Restrictions 542

    14.2.2 Long-Run Restrictions 544

    14.2.3 Short-Run and Long-Run Restrictions 548

    14.2.4 Sign Restrictions 550

    14.3 Estimation 553

    14.4 Identification 558

    14.5 Testing 559

    14.6 Applications 561

    14.6.1 Peersman's Model of Oil Price Shocks 561

    14.6.2 A Portfolio SVAR Model of Australia 563

    14.7 Exercises 566

    15 Latent Factor Models 571

    15.1 Introduction 571

    15.2 Motivating Examples 572

    15.2.1 Empirical 572

    15.2.2 Theoretical 574

    15.3 The Recursions of the Kalman Filter 575

    15.3.1 Univariate 576

    15.3.2 Multivariate 581

    15.4 Extensions 585

    15.4.1 Intercepts 585

    15.4.2 Dynamics 585

    15.4.3 Nonstationary Factors 587

    15.4.4 Exogenous and Predetermined Variables 589

    15.5 Factor Extraction 589

    15.6 Estimation 591


    15.6.1 Identification 591

    15.6.2 Maximum Likelihood 591

    15.6.3 Principal Components Estimator 593

    15.7 Relationship to VARMA Models 596

    15.8 Applications 597

    15.8.1 The Hodrick-Prescott Filter 597

    15.8.2 A Factor Model of Spreads with Money Shocks 601

    15.9 Exercises 603

    PART FIVE NON-STATIONARY TIME SERIES 613

    16 Nonstationary Distribution Theory 615

    16.1 Introduction 615

    16.2 Specification 616

    16.2.1 Models of Trends 616

    16.2.2 Integration 618

    16.3 Estimation 620

    16.3.1 Stationary Case 621

    16.3.2 Nonstationary Case: Stochastic Trends 624

    16.3.3 Nonstationary Case: Deterministic Trends 626

    16.4 Asymptotics for Integrated Processes 629

    16.4.1 Brownian Motion 630

    16.4.2 Functional Central Limit Theorem 631

    16.4.3 Continuous Mapping Theorem 635

    16.4.4 Stochastic Integrals 637

    16.5 Multivariate Analysis 638

    16.6 Applications 640

    16.6.1 Least Squares Estimator of the AR(1) Model 641

    16.6.2 Trend Misspecification 643

    16.7 Exercises 644

    17 Unit Root Testing 651

    17.1 Introduction 651

    17.2 Specification 651

    17.3 Detrending 653

    17.3.1 Ordinary Least Squares: Dickey and Fuller 655

    17.3.2 First Differences: Schmidt and Phillips 656

    17.3.3 Generalized Least Squares: Elliott, Rothenberg and Stock 657

    17.4 Testing 658


    17.4.1 Dickey-Fuller Tests 659

    17.4.2 M Tests 660

    17.5 Distribution Theory 662

    17.5.1 Ordinary Least Squares Detrending 664

    17.5.2 Generalized Least Squares Detrending 665

    17.5.3 Simulating Critical Values 667

    17.6 Power 668

    17.6.1 Near Integration and the Ornstein-Uhlenbeck Processes 669

    17.6.2 Asymptotic Local Power 671

    17.6.3 Point Optimal Tests 671

    17.6.4 Asymptotic Power Envelope 673

    17.7 Autocorrelation 675

    17.7.1 Dickey-Fuller Test with Autocorrelation 675

    17.7.2 M Tests with Autocorrelation 676

    17.8 Structural Breaks 678

    17.8.1 Known Break Point 681

    17.8.2 Unknown Break Point 684

    17.9 Applications 685

    17.9.1 Power and the Initial Value 685

    17.9.2 Nelson-Plosser Data Revisited 687

    17.10 Exercises 687

    18 Cointegration 695

    18.1 Introduction 695

    18.2 Long-Run Economic Models 696

    18.3 Specification: VECM 698

    18.3.1 Bivariate Models 698

    18.3.2 Multivariate Models 700

    18.3.3 Cointegration 701

    18.3.4 Deterministic Components 703

    18.4 Estimation 705

    18.4.1 Full-Rank Case 706

    18.4.2 Reduced-Rank Case: Iterative Estimator 707

    18.4.3 Reduced Rank Case: Johansen Estimator 709

    18.4.4 Zero-Rank Case 715

    18.5 Identification 716

    18.5.1 Triangular Restrictions 716

    18.5.2 Structural Restrictions 717

    18.6 Distribution Theory 718


    18.6.1 Asymptotic Distribution of the Eigenvalues 718

    18.6.2 Asymptotic Distribution of the Parameters 720

    18.7 Testing 724

    18.7.1 Cointegrating Rank 724

    18.7.2 Cointegrating Vector 727

    18.7.3 Exogeneity 730

    18.8 Dynamics 731

    18.8.1 Impulse responses 731

    18.8.2 Cointegrating Vector Interpretation 732

    18.9 Applications 732

    18.9.1 Rank Selection Based on Information Criteria 733

    18.9.2 Effects of Heteroskedasticity on the Trace Test 735

    18.10 Exercises 737

    PART SIX NONLINEAR TIME SERIES 747

    19 Nonlinearities in Mean 749

    19.1 Introduction 749

    19.2 Motivating Examples 749

    19.3 Threshold Models 755

    19.3.1 Specification 755

    19.3.2 Estimation 756

    19.3.3 Testing 758

    19.4 Artificial Neural Networks 761

    19.4.1 Specification 761

    19.4.2 Estimation 764

    19.4.3 Testing 766

    19.5 Bilinear Time Series Models 767

    19.5.1 Specification 767

    19.5.2 Estimation 768

    19.5.3 Testing 769

    19.6 Markov Switching Model 770

    19.7 Nonparametric Autoregression 774

    19.8 Nonlinear Impulse Responses 775

    19.9 Applications 779

    19.9.1 A Multiple Equilibrium Model of Unemployment 779

    19.9.2 Bivariate Threshold Models of G7 Countries 781

    19.10 Exercises 784


    20 Nonlinearities in Variance 795

    20.1 Introduction 795

    20.2 Statistical Properties of Asset Returns 795

    20.3 The ARCH Model 799

    20.3.1 Specification 799

    20.3.2 Estimation 801

    20.3.3 Testing 804

    20.4 Univariate Extensions 807

    20.4.1 GARCH 807

    20.4.2 Integrated GARCH 812

    20.4.3 Additional Variables 813

    20.4.4 Asymmetries 814

    20.4.5 Garch-in-Mean 815

    20.4.6 Diagnostics 817

    20.5 Conditional Nonnormality 818

    20.5.1 Parametric 819

    20.5.2 Semi-Parametric 821

    20.5.3 Nonparametric 821

    20.6 Multivariate GARCH 825

    20.6.1 VECH 826

    20.6.2 BEKK 827

    20.6.3 DCC 830

    20.6.4 DECO 836

    20.7 Applications 837

    20.7.1 DCC and DECO Models of U.S. Zero Coupon Yields 837

    20.7.2 A Time-Varying Volatility SVAR Model 838

    20.8 Exercises 841

    21 Discrete Time Series Models 850

    21.1 Introduction 850

    21.2 Motivating Examples 850

    21.3 Qualitative Data 853

    21.3.1 Specification 853

    21.3.2 Estimation 857

    21.3.3 Testing 861

    21.3.4 Binary Autoregressive Models 863

    21.4 Ordered Data 865

    21.5 Count Data 867

    21.5.1 The Poisson Regression Model 869


    21.5.2 Integer Autoregressive Models 871

    21.6 Duration Data 874

    21.7 Applications 876

    21.7.1 An ACH Model of U.S. Airline Trades 876

    21.7.2 EMM Estimator of Integer Models 879

    21.8 Exercises 881

    Appendix A Change of Variable in Probability Density Functions 887

    Appendix B The Lag Operator 888

    B.1 Basics 888

    B.2 Polynomial Convolution 889

    B.3 Polynomial Inversion 890

    B.4 Polynomial Decomposition 891

    Appendix C FIML Estimation of a Structural Model 892

    C.1 Log-likelihood Function 892

    C.2 First-order Conditions 892

    C.3 Solution 893

    Appendix D Additional Nonparametric Results 897

    D.1 Mean 897

    D.2 Variance 899

    D.3 Mean Square Error 901

    D.4 Roughness 902

    D.4.1 Roughness Results for the Gaussian Distribution 902

    D.4.2 Roughness Results for the Gaussian Kernel 903

    References 905

    Author index 915

    Subject index 918

  • Illustrations

    1.1 Probability distributions of y for various models 5
    1.2 Probability distributions of y for various models 7
    1.3 Log-likelihood function for Poisson distribution 15
    1.4 Log-likelihood function for exponential distribution 15
    1.5 Log-likelihood function for the normal distribution 17
    1.6 Eurodollar interest rates 24
    1.7 Stationary density of Eurodollar interest rates 25
    1.8 Transitional density of Eurodollar interest rates 27
    2.1 Demonstration of the weak law of large numbers 42
    2.2 Demonstration of the Lindeberg-Levy central limit theorem 49
    2.3 Convergence of log-likelihood function 65
    2.4 Consistency of sample mean for normal distribution 65
    2.5 Consistency of median for Cauchy distribution 66
    2.6 Illustrating asymptotic normality 69
    2.7 Bivariate normal distribution 77
    2.8 Scatter plot of returns on Apple and Ford stocks 78
    2.9 Gradient of the bivariate normal model 81
    3.1 Stationary density of Eurodollar interest rates: CIR model 115
    3.2 Estimated variance function of CIR model 117
    4.1 Illustrating the LR and Wald tests 125
    4.2 Illustrating the LM test 126
    4.3 Simulated and asymptotic distributions of the Wald test 142
    5.1 Simulating a bivariate regression model 166
    5.2 Sampling distribution of a weak instrument 180
    5.3 U.S. data on the Taylor Rule 188
    6.1 Simulated exponential models 201
    6.2 Scatter plot of Martin Marietta returns data 222
    6.3 Stochastic frontier disturbance distribution 225
    7.1 Simulated models with autocorrelated disturbances 236


    7.2 Distribution of maximum likelihood estimator in an autocorrelated regression model 252

    8.1 Simulated data from heteroskedastic models 282
    8.2 The Great Moderation 303
    8.3 Sampling distribution of Wald test 305
    8.4 Power of Wald test 305
    9.1 Comparison of true and misspecified log-likelihood functions 317
    9.2 U.S. Dollar/British Pound exchange rates 345
    9.3 Estimated variance function of CKLS model 353
    11.1 Bias and variance of the kernel estimate of density 411
    11.2 Kernel estimate of distribution of stock index returns 413
    11.3 Bivariate normal density 414
    11.4 Semiparametric density estimator 419
    11.5 Parametric conditional mean estimates 420
    11.6 Nadaraya-Watson nonparametric kernel regression 424
    11.7 Effect of bandwidth on kernel regression 425
    11.8 Cross validation bandwidth selection 429
    11.9 Two-dimensional product kernel 431
    11.10 Semiparametric regression 433
    11.11 Nonparametric production function 435
    11.12 Nonparametric estimates of drift and diffusion functions 438
    12.1 Simulated AR(1) model 450
    12.2 Illustrating Brownian motion 462
    13.1 U.S. macroeconomic data 487
    13.2 Plots of simulated stationary time series 490
    13.3 Choice of optimal lag order 508
    14.1 Bivariate SVAR model 541
    14.2 Bivariate SVAR with short-run restrictions 545
    14.3 Bivariate SVAR with long-run restrictions 547
    14.4 Bivariate SVAR with short- and long-run restrictions 549
    14.5 Bivariate SVAR with sign restrictions 552
    14.6 Impulse responses of Peersman's model 564
    15.1 Daily U.S. zero coupon rates 573
    15.2 Alternative priors for latent factors in the Kalman filter 588
    15.3 Factor loadings of a term structure model 595
    15.4 Hodrick-Prescott filter of real U.S. GDP 601
    16.1 Nelson-Plosser data 618
    16.2 Simulated distribution of AR(1) parameter 624
    16.3 Continuous-time processes 633
    16.4 Functional Central Limit Theorem 635
    16.5 Distribution of a stochastic integral 638
    16.6 Mixed normal distribution 640
    17.1 Real U.S. GDP 652


    17.2 Detrending 658
    17.3 Near unit root process 669
    17.4 Asymptotic power curve of ADF tests 672
    17.5 Asymptotic power envelope of ADF tests 674
    17.6 Structural breaks in U.S. GDP 679
    17.7 Union of rejections approach 686
    18.1 Permanent income hypothesis 696
    18.2 Long run money demand 697
    18.3 Term structure of U.S. yields 698
    18.4 Error correction phase diagram 699
    19.1 Properties of an AR(2) model 750
    19.2 Limit cycle 751
    19.3 Strange attractor 752
    19.4 Nonlinear error correction model 753
    19.5 U.S. unemployment 754
    19.6 Threshold functions 757
    19.7 Decomposition of an ANN 762
    19.8 Simulated bilinear time series models 768
    19.9 Markov switching model of U.S. output 773
    19.10 Nonparametric estimate of a TAR(1) model 775
    19.11 Simulated TAR models for G7 countries 783
    20.1 Statistical properties of FTSE returns 796
    20.2 Distribution of FTSE returns 799
    20.3 News impact curve 801
    20.4 ACF of GARCH(1,1) models 810
    20.5 Conditional variance of FTSE returns 812
    20.6 Risk-return preferences 816
    20.7 BEKK model of U.S. zero coupon bonds 829
    20.8 DECO model of interest rates 838
    20.9 SVAR model of U.K. Libor spread 840
    21.1 U.S. Federal funds target rate from 1984 to 2009 852
    21.2 Money demand equation with a floor interest rate 853
    21.3 Duration descriptive statistics for AMR 877

  • Computer Code used in the Examples

    (Code is written in GAUSS, in which case the extension is .g, and in MATLAB, in which case the extension is .m.)

    1.1 basic sample.* 4
    1.2 basic sample.* 6
    1.3 basic sample.* 6
    1.4 basic sample.* 6
    1.5 basic sample.* 7
    1.6 basic sample.* 8
    1.7 basic sample.* 8
    1.8 basic sample.* 9
    1.10 basic poisson.* 13
    1.11 basic exp.* 14
    1.12 basic normal like.* 16
    1.14 basic poisson.* 18
    1.15 basic exp.* 19
    1.16 basic normal like.* 19
    1.18 basic exp.* 22
    1.19 basic normal.* 22
    2.5 prop wlln1.* 41
    2.6 prop wlln2.* 42
    2.8 prop moment.* 45
    2.10 prop lindlevy.* 48
    2.21 prop consistency.* 64
    2.22 prop normal.* 64
    2.23 prop cauchy.* 65
    2.25 prop asymnorm.* 68
    2.28 prop edgeworth.* 72
    2.29 prop bias.* 73
    3.2 max exp.* 93
    3.3 max exp.* 95
    3.4 max exp.* 97
    3.6 max weibull.* 99


    3.7 max exp.* 102
    3.8 max exp.* 103
    4.3 test weibull.* 133
    4.5 test weibull.* 135
    4.7 test weibull.* 139
    4.10 test asymptotic.* 141
    4.11 test size.* 145
    4.12 test power.* 147
    4.13 test power.* 147
    5.5 linear simulation.* 165
    5.6 linear estimate.* 169
    5.7 linear fiml.* 171
    5.8 linear fiml.* 173
    5.10 linear weak.* 179
    5.14 linear lr.*, linear wd.*, linear lm.* 182
    5.15 linear fiml lr.*, linear fiml wd.*, linear fiml lm.* 185
    6.3 nls simulate.* 200
    6.5 nls exponential.* 206
    6.7 nls consumption estimate.* 210
    6.8 nls contest.* 215
    6.11 nls money.* 219
    7.1 auto simulate.* 235
    7.5 auto invest.* 240
    7.8 auto distribution.* 251
    7.11 auto test.* 260
    7.12 auto system.* 267
    8.1 hetero simulate.* 281
    8.3 hetero estimate.* 284
    8.7 hetero test.* 293
    8.9 hetero system.* 298
    8.10 hetero system.* 299
    8.11 hetero general.* 301
    10.2 gmm table.* 366
    10.3 gmm table.* 367
    10.11 gmm ccapm.* 382
    11.1 npd kernel.* 407
    11.2 npd property.* 410
    11.3 npd ftse.* 412
    11.4 npd bivariate.* 414
    11.5 npd seminonlin.* 418
    11.6 npr parametric.* 419
    11.7 npr nadwatson.* 422
    11.8 npr property.* 424


    11.10 npr bivariate.* 430
    11.11 npr semi.* 432
    12.1 sim mom.* 450
    12.3 sim accuracy.* 453
    12.4 sim ma1indirect.* 454
    12.5 sim ma1emm.* 457
    12.6 sim ma1overid.* 460
    12.7 sim brownind.*, sim brownemm.* 466
    13.1 stsm simulate.* 489
    13.8 stsm root.* 496
    13.9 stsm root.* 497
    13.17 stsm varma.* 504
    13.21 stsm anderson.* 511
    13.24 stsm recursive.* 513
    13.25 stsm recursive.* 516
    13.26 stsm recursive.* 522
    13.27 stsm recursive.* 523
    14.2 svar bivariate.* 540
    14.5 svar bivariate.* 544
    14.9 svar bivariate.* 547
    14.10 svar bivariate.* 548
    14.12 svar bivariate.* 552
    14.13 svar shortrun.* 554
    14.14 svar longrun.* 556
    14.15 svar recursive.* 557
    14.17 svar test.* 560
    14.18 svar test.* 561
    15.1 kalman termfig.* 572
    15.5 kalman uni.* 580
    15.6 kalman multi.* 583
    15.8 kalman smooth.* 590
    15.9 kalman uni.* 592
    15.10 kalman term.* 592
    15.11 kalman fvar.* 594
    15.12 kalman panic.* 594
    16.1 nts nelplos.* 616
    16.2 nts nelplos.* 616
    16.3 nts nelplos.* 617
    16.4 nts moment.* 622
    16.5 nts moment.* 624
    16.6 nts moment.* 628
    16.7 nts yts.* 632
    16.8 nts fclt.* 635


    16.10 nts stochint.* 637
    16.11 nts mixednormal.* 639
    17.1 unit qusgdp.* 657
    17.2 unit qusgdp.* 661
    17.3 unit asypower1.* 671
    17.4 unit asypowerenv.* 674
    17.5 unit maicsim.* 677
    17.6 unit qusgdp.* 679
    17.8 unit qusgdp.* 683
    17.9 unit qusgdp.* 685
    18.1 coint lrgraphs.* 696
    18.2 coint lrgraphs.* 696
    18.3 coint lrgraphs.* 697
    18.4 coint lrgraphs.* 702
    18.6 coint bivterm.* 707
    18.7 coint bivterm.* 708
    18.8 coint bivterm.* 712
    18.9 coint permincome.* 714
    18.10 coint bivterm.* 715
    18.11 coint triterm.* 716
    18.13 coint simevals.* 719
    18.16 coint bivterm.* 728
    19.1 nlm features.* 750
    19.2 nlm features.* 750
    19.3 nlm features.* 751
    19.4 nlm features.* 752
    19.6 nlm tarsim.* 760
    19.7 nlm annfig.* 762
    19.8 nlm bilinear.* 767
    19.9 nlm hamilton.* 772
    19.10 nlm tar.* 774
    19.11 nlm girf.* 778
    20.1 garch nic.* 800
    20.2 garch estimate.* 804
    20.3 garch test.* 806
    20.4 garch simulate.* 809
    20.5 garch estimate.* 810
    20.6 garch seasonality.* 813
    20.7 garch mean.* 816
    20.9 mgarch bekk.* 828
    21.2 discrete mpol.* 852
    21.3 discrete floor.* 852
    21.4 discrete simulation.* 857


    21.7 discrete probit.* 859
    21.8 discrete probit.* 862
    21.9 discrete ordered.* 866
    21.11 discrete thinning.* 871
    21.12 discrete poissonauto.* 873

    Code Disclaimer Information

    Note that the computer code is provided for illustrative purposes only and, although care has been taken to ensure that it works properly, it has not been thoroughly tested under all conditions and on all platforms. The authors and Cambridge University Press cannot guarantee or imply reliability, serviceability, or function of this computer code. All code is therefore provided "as is" without any warranties of any kind.

  • PART ONE

    MAXIMUM LIKELIHOOD

  • 1 The Maximum Likelihood Principle

    1.1 Introduction

    Maximum likelihood estimation is a general method for estimating the parameters of econometric models from observed data. The principle of maximum likelihood plays a central role in the exposition of this book, since a number of estimators used in econometrics can be derived within this framework. Examples include ordinary least squares, generalized least squares and full-information maximum likelihood. In deriving the maximum likelihood estimator, a key concept is the joint probability density function (pdf) of the observed random variables, $y_t$. Maximum likelihood estimation requires that the following conditions are satisfied.

    (1) The form of the joint pdf of $y_t$ is known.
    (2) The specification of the moments of the joint pdf is known.
    (3) The joint pdf can be evaluated for all values of the parameters, $\theta$.

    Parts ONE and TWO of this book deal with models in which all these conditions are satisfied. Part THREE investigates models in which these conditions are not satisfied and considers four important cases. First, if the distribution of $y_t$ is misspecified, resulting in both conditions 1 and 2 being violated, estimation is by quasi-maximum likelihood (Chapter 9). Second, if condition 1 is not satisfied, a generalized method of moments estimator (Chapter 10) is required. Third, if condition 2 is not satisfied, estimation relies on nonparametric methods (Chapter 11). Fourth, if condition 3 is violated, simulation-based estimation methods are used (Chapter 12).

    1.2 Motivating Examples

    To highlight the role of probability distributions in maximum likelihood estimation, this section emphasizes the link between observed sample data and the probability distribution from which they are drawn. This relationship is illustrated with a number of simulation examples where samples of size $T = 5$ are drawn from a range of alternative models. The realizations of these draws for each model are listed in Table 1.1.

    Table 1.1 Realisations of $y_t$ from alternative models: $t = 1, 2, \dots, 5$.

    Model                    t=1      t=2      t=3      t=4      t=5
    Time Invariant          -2.720    2.470    0.495    0.597   -0.960
    Count                    2.000    4.000    3.000    4.000    0.000
    Linear Regression        2.850    3.105    5.693    8.101   10.387
    Exponential Regression   0.874    8.284    0.507    3.722    5.865
    Autoregressive           0.000   -1.031   -0.283   -1.323   -2.195
    Bilinear                 0.000   -2.721    0.531    1.350   -2.451
    ARCH                     0.000    3.558    6.989    7.925    8.118
    Poisson                  3.000   10.000   17.000   20.000   23.000

    Example 1.1 Time Invariant Model
    Consider the model
    $$y_t = \sigma z_t,$$
    where $z_t$ is a disturbance term and $\sigma$ is a parameter. Let $z_t$ be drawn from a standardized normal distribution, $N(0, 1)$, defined by
    $$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z^2}{2}\right].$$
    The distribution of $y_t$ is obtained from the distribution of $z_t$ using the change of variable technique (see Appendix A for details)
    $$f(y;\theta) = f(z)\left|\frac{dz}{dy}\right|,$$
    where $\theta = \{\sigma^2\}$. Applying this rule, and recognising that $z = y/\sigma$, yields
    $$f(y;\theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(y/\sigma)^2}{2}\right]\frac{1}{\sigma} = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{y^2}{2\sigma^2}\right],$$
    or $y_t \sim N(0, \sigma^2)$. In this model, the distribution of $y_t$ is time invariant because neither the mean nor the variance depends on time. This property is highlighted in panel (a) of Figure 1.1 where the parameter is $\sigma = 2$. For comparative purposes the distributions of both $y_t$ and $z_t$ are given. As $y_t = 2z_t$, the distribution of $y_t$ is flatter than the distribution of $z_t$.
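    The change of variable result is easily verified by simulation. The following MATLAB sketch, which is illustrative only and not part of the suite of programs accompanying the book, generates draws of $y_t = \sigma z_t$ with $\sigma = 2$ and overlays the analytic density on a histogram of the draws:

    % Time invariant model y = sigma*z with z ~ N(0,1), so y ~ N(0,sigma^2)
    sigma = 2;
    T     = 100000;                  % large sample so the histogram approximates f(y)
    z     = randn(T,1);              % standard normal draws
    y     = sigma*z;                 % change of variable
    ygrid = linspace(-8,8,201)';
    fy    = exp(-ygrid.^2/(2*sigma^2))/sqrt(2*pi*sigma^2);   % analytic density of y
    histogram(y,'Normalization','pdf'); hold on;
    plot(ygrid,fy); hold off;        % the histogram and the analytic density coincide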

    [Figure 1.1 Probability distributions of y generated from the time invariant, count, linear regression and exponential regression models. Except for the time invariant and count models, the solid line represents the density at t = 1, the dashed line represents the density at t = 3 and the dotted line represents the density at t = 5.]

    As the distribution of $y_t$ in Example 1.1 does not depend on lagged values $y_{t-i}$, $y_t$ is independently distributed. In addition, since the distribution of $y_t$ is the same at each $t$, $y_t$ is identically distributed. These two properties are abbreviated as iid. Conversely, the distribution is dependent if $y_t$ depends on its own lagged values and non-identical if it changes over time.


    Example 1.2 Count Model
    Consider a time series of counts modelled as a series of draws from a Poisson distribution
    $$f(y;\theta) = \frac{\theta^y \exp[-\theta]}{y!}, \qquad y = 0, 1, 2, \dots,$$
    where $\theta > 0$ is an unknown parameter. A sample of $T = 5$ realizations of $y_t$, given in Table 1.1, is drawn from the Poisson probability distribution in panel (b) of Figure 1.1 for $\theta = 2$. By assumption, this distribution is the same at each point in time. In contrast to the data in the previous example, where the random variable is continuous, the data here are discrete as they are positive integers that measure counts.
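    Draws from the Poisson distribution can be generated without any toolbox functions by inverting the cumulative distribution function. A minimal MATLAB sketch (illustrative only; the draws will not match Table 1.1 because the random number seed is unknown):

    % Draw T = 5 Poisson(theta) counts by inversion of the cdf
    theta = 2; T = 5;
    y = zeros(T,1);
    for t = 1:T
        u = rand;                          % uniform draw to be inverted
        k = 0; p = exp(-theta); F = p;     % P(Y=0) and the running cdf
        while u > F
            k = k + 1;
            p = p*theta/k;                 % Poisson recursion P(Y=k) = P(Y=k-1)*theta/k
            F = F + p;
        end
        y(t) = k;
    end
    disp(y')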

    Example 1.3 Linear Regression Model
    Consider the regression model
    $$y_t = \beta x_t + \sigma z_t, \qquad z_t \sim iid\ N(0, 1),$$
    where $x_t$ is an explanatory variable that is independent of $z_t$ and $\theta = \{\beta, \sigma^2\}$. The distribution of $y$ conditional on $x_t$ is
    $$f(y \,|\, x_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y - \beta x_t)^2}{2\sigma^2}\right],$$
    which is a normal distribution with conditional mean $\beta x_t$ and variance $\sigma^2$, or $y_t \sim N(\beta x_t, \sigma^2)$. This distribution is illustrated in panel (c) of Figure 1.1 with $\beta = 3$, $\sigma = 2$ and explanatory variable $x_t = \{0, 1, 2, 3, 4\}$. The effect of $x_t$ is to shift the distribution of $y_t$ over time into the positive region, resulting in the draws of $y_t$ given in Table 1.1 becoming increasingly positive. As the variance at each point in time is constant, the spread of the distributions of $y_t$ is the same for all $t$.

    Example 1.4 Exponential Regression Model
    Consider the exponential regression model
    $$f(y \,|\, x_t; \theta) = \frac{1}{\mu_t}\exp\left[-\frac{y}{\mu_t}\right],$$
    where $\mu_t = \beta_0 + \beta_1 x_t$ is the time-varying conditional mean, $x_t$ is an explanatory variable and $\theta = \{\beta_0, \beta_1\}$. This distribution is highlighted in panel (d) of Figure 1.1 with $\beta_0 = 1$, $\beta_1 = 1$ and $x_t = \{0, 1, 2, 3, 4\}$. As $\beta_1 > 0$, the effect of $x_t$ is to cause the distribution of $y_t$ to become more positively skewed over time.
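    The two regression models are simulated in the same way, with the distribution of each draw shifted by $x_t$. A minimal MATLAB sketch, using the parameter values of panels (c) and (d) (illustrative only, not the book's code):

    % Linear regression: y_t ~ N(beta*x_t, sigma^2)
    beta = 3; sigma = 2;
    x     = (0:4)';                    % explanatory variable x_t = {0,1,2,3,4}
    y_lin = beta*x + sigma*randn(5,1);
    % Exponential regression: y_t drawn from an exponential with mean mu_t
    beta0 = 1; beta1 = 1;
    mu    = beta0 + beta1*x;           % time-varying conditional mean
    y_exp = -mu.*log(rand(5,1));       % inverse-cdf draws from the exponential
    disp([y_lin y_exp])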

    [Figure 1.2 Probability distributions of y generated from the autoregressive, bilinear, autoregressive with heteroskedasticity and ARCH models. The solid line represents the density at t = 1, the dashed line represents the density at t = 3 and the dotted line represents the density at t = 5.]

    Example 1.5 Autoregressive Model
    An example of a first-order autoregressive model, denoted AR(1), is
    $$y_t = \rho y_{t-1} + u_t, \qquad u_t \sim iid\ N(0, \sigma^2),$$
    with $|\rho| < 1$ and $\theta = \{\rho, \sigma^2\}$. The distribution of $y$, conditional on $y_{t-1}$, is
    $$f(y \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y - \rho y_{t-1})^2}{2\sigma^2}\right],$$
    which is a normal distribution with conditional mean $\rho y_{t-1}$ and variance $\sigma^2$, or $y_t \sim N(\rho y_{t-1}, \sigma^2)$. If $0 < \rho < 1$, then a large positive (negative) value of $y_{t-1}$ shifts the distribution into the positive (negative) region for $y_t$, raising the probability that the next draw from this distribution is also positive (negative). This property of the autoregressive model is highlighted in panel (a) of Figure 1.2 with $\rho = 0.8$, $\sigma = 2$ and initial value $y_1 = 0$.
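    The recursive structure of the model is transparent in code: each draw is taken from a normal distribution whose mean is re-centred on the previous realization. A minimal MATLAB sketch (illustrative only):

    % AR(1) recursion: y_t is drawn from N(rho*y_{t-1}, sigma^2)
    rho = 0.8; sigma = 2; T = 5;
    y = zeros(T,1);                        % initial value y_1 = 0
    for t = 2:T
        y(t) = rho*y(t-1) + sigma*randn;   % conditional mean shifts with y_{t-1}
    end
    disp(y')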

    Example 1.6 Bilinear Time Series Model
    The autoregressive model discussed above specifies a linear relationship between $y_t$ and $y_{t-1}$. The following bilinear model is an example of a nonlinear time series model
    $$y_t = \rho y_{t-1} + \gamma y_{t-1}u_{t-1} + u_t, \qquad u_t \sim iid\ N(0, \sigma^2),$$
    where $\gamma y_{t-1}u_{t-1}$ represents the bilinear term and $\theta = \{\rho, \gamma, \sigma^2\}$. The distribution of $y_t$ conditional on $y_{t-1}$ is
    $$f(y \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y - \mu_t)^2}{2\sigma^2}\right],$$
    which is a normal distribution with conditional mean $\mu_t = \rho y_{t-1} + \gamma y_{t-1}u_{t-1}$ and variance $\sigma^2$. To highlight the nonlinear property of the model, substitute out $u_{t-1}$ in the equation for the mean
    $$\mu_t = \rho y_{t-1} + \gamma y_{t-1}(y_{t-1} - \rho y_{t-2} - \gamma y_{t-2}u_{t-2}) = \rho y_{t-1} + \gamma y_{t-1}^2 - \rho\gamma y_{t-1}y_{t-2} - \gamma^2 y_{t-1}y_{t-2}u_{t-2},$$
    which shows that the mean is a nonlinear function of $y_{t-1}$. Setting $\gamma = 0$ yields the linear AR(1) model of Example 1.5. The distribution of the bilinear model is illustrated in panel (b) of Figure 1.2 with $\rho = 0.8$, $\gamma = 0.4$, $\sigma = 2$ and initial value $y_1 = 0$.
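    A corresponding sketch of the bilinear recursion, where the lagged disturbance must be stored because it enters the conditional mean (illustrative only; setting the initial disturbance $u_1 = 0$ is an assumption made so that the recursion can start):

    % Bilinear model: y_t = rho*y_{t-1} + gamma*y_{t-1}*u_{t-1} + u_t
    rho = 0.8; gamma = 0.4; sigma = 2; T = 5;
    y = zeros(T,1); u = zeros(T,1);        % initial values y_1 = 0 and u_1 = 0 (assumed)
    for t = 2:T
        u(t) = sigma*randn;
        y(t) = rho*y(t-1) + gamma*y(t-1)*u(t-1) + u(t);   % bilinear term uses the lagged shock
    end
    disp(y')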

    Example 1.7 Autoregressive Model with Heteroskedasticity
    An example of an AR(1) model with heteroskedasticity is
    $$y_t = \rho y_{t-1} + \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 w_t, \qquad z_t \sim iid\ N(0, 1),$$
    where $\theta = \{\rho, \alpha_0, \alpha_1\}$ and $w_t$ is an explanatory variable. The distribution of $y_t$ conditional on $y_{t-1}$ and $w_t$ is
    $$f(y \,|\, y_{t-1}, w_t; \theta) = \frac{1}{\sqrt{2\pi\sigma_t^2}}\exp\left[-\frac{(y - \rho y_{t-1})^2}{2\sigma_t^2}\right],$$
    which is a normal distribution with conditional mean $\rho y_{t-1}$ and conditional variance $\alpha_0 + \alpha_1 w_t$. For this model, the distribution shifts because of the dependence on $y_{t-1}$, and the spread of the distribution changes because of $w_t$. These features are highlighted in panel (c) of Figure 1.2 with $\rho = 0.8$, $\alpha_0 = 0.8$, $\alpha_1 = 0.8$, $w_t$ defined as a uniform random number on the unit interval and the initial value $y_1 = 0$.

    Example 1.8 Autoregressive Conditional Heteroskedasticity
    The autoregressive conditional heteroskedasticity (ARCH) class of models is a special case of the heteroskedastic regression model where $w_t$ in Example 1.7 is expressed in terms of lagged values of the disturbance term squared. An example of a regression model as in Example 1.3 with ARCH is
    $$y_t = \beta x_t + u_t, \qquad u_t = \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2, \qquad z_t \sim iid\ N(0, 1),$$
    where $x_t$ is an explanatory variable and $\theta = \{\beta, \alpha_0, \alpha_1\}$. The distribution of $y$ conditional on $y_{t-1}$, $x_t$ and $x_{t-1}$ is
    $$f(y \,|\, y_{t-1}, x_t, x_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\left(\alpha_0 + \alpha_1(y_{t-1} - \beta x_{t-1})^2\right)}}\exp\left[-\frac{(y - \beta x_t)^2}{2\left(\alpha_0 + \alpha_1(y_{t-1} - \beta x_{t-1})^2\right)}\right].$$
    For this model, a large shock, represented by a large value of $u_t$, results in an increased variance in the next period if $\alpha_1 > 0$. The distribution from which $y_t$ is drawn in the next period will therefore have a larger variance. The distribution of this model is shown in panel (d) of Figure 1.2 with $\beta = 3$, $\alpha_0 = 0.8$, $\alpha_1 = 0.8$ and $x_t = \{0, 1, 2, 3, 4\}$.
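    In code, the ARCH recursion differs from the previous sketches only in that the variance, rather than the mean, is rebuilt from the lagged disturbance at each step. A minimal MATLAB sketch (illustrative only):

    % Regression with ARCH(1) disturbances, as in Example 1.8
    beta = 3; a0 = 0.8; a1 = 0.8;
    x = (0:4)'; T = length(x);
    y = zeros(T,1); u = zeros(T,1);    % y_1 = 0 and x_1 = 0 imply u_1 = 0
    for t = 2:T
        sig2 = a0 + a1*u(t-1)^2;       % conditional variance sigma_t^2
        u(t) = sqrt(sig2)*randn;       % u_t = sigma_t*z_t
        y(t) = beta*x(t) + u(t);
    end
    disp(y')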

    1.3 Joint Probability Distributions

    The motivating examples of the previous section focus on the distribution of $y_t$ at time $t$, which is generally a function of its own lags and the current and lagged values of the explanatory variable $x_t$. The derivation of the maximum likelihood estimator of the model parameters requires using all of the information $t = 1, 2, \dots, T$ by defining the joint probability density function (pdf). In the case where both $y_t$ and $x_t$ are stochastic, the joint pdf for a sample of $T$ observations is
    $$f(y_1, y_2, \dots, y_T, x_1, x_2, \dots, x_T; \psi), \qquad (1.1)$$
    where $\psi$ is a vector of parameters. An important feature of the previous examples is that $y_t$ depends on the explanatory variable $x_t$. To capture this conditioning, the joint distribution in (1.1) is expressed as
    $$f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \psi)\, f(x_1, \dots, x_T; \psi), \qquad (1.2)$$
    where the first term on the right hand side of (1.2) represents the distribution of $\{y_1, y_2, \dots, y_T\}$ conditional on $\{x_1, x_2, \dots, x_T\}$ and the second term is the marginal distribution of $\{x_1, x_2, \dots, x_T\}$. Assume that the parameter vector can be decomposed into $\psi = \{\theta, \theta_x\}$ such that expression (1.2) becomes
    $$f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta)\, f(x_1, \dots, x_T; \theta_x). \qquad (1.3)$$
    In these circumstances, maximum likelihood estimation of the parameters $\theta$ is based on the conditional distribution, without loss of information from the exclusion of the marginal distribution $f(x_1, \dots, x_T; \theta_x)$.

    The conditional distribution on the right hand side of expression (1.3) simplifies further in the presence of additional restrictions.

    Independent and identically distributed (iid)
    In the simplest case, $\{y_1, y_2, \dots, y_T\}$ is independent of $\{x_1, x_2, \dots, x_T\}$ and $y_t$ is iid with density function $f(y; \theta)$. The conditional pdf in equation (1.3) is then
    $$f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta). \qquad (1.4)$$
    Examples of this case are the time invariant model (Example 1.1) and the count model (Example 1.2).

    If both $y_t$ and $x_t$ are iid and $y_t$ is dependent on $x_t$, then the decomposition in equation (1.3) implies that inference can be based on
    $$f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = \prod_{t=1}^{T} f(y_t \,|\, x_t; \theta). \qquad (1.5)$$

    Examples include the regression models in Examples 1.3 and 1.4 if sampling is iid.

    Dependent
    Now assume that $\{y_1, y_2, \dots, y_T\}$ depends on its own lags but is independent of the explanatory variable $\{x_1, x_2, \dots, x_T\}$. The joint pdf is expressed as a sequence of conditional distributions where conditioning is based on lags of $y_t$. By using standard rules of probability, the distributions for the first three observations are, respectively,
    $$f(y_1; \theta) = f(y_1; \theta)$$
    $$f(y_1, y_2; \theta) = f(y_2 \,|\, y_1; \theta) f(y_1; \theta)$$
    $$f(y_1, y_2, y_3; \theta) = f(y_3 \,|\, y_2, y_1; \theta) f(y_2 \,|\, y_1; \theta) f(y_1; \theta),$$
    where $y_1$ is the initial value with marginal probability density $f(y_1; \theta)$.

    Extending this sequence to a sample of $T$ observations yields the joint pdf
    $$f(y_1, y_2, \dots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}, y_{t-2}, \dots, y_1; \theta). \qquad (1.6)$$
    Examples of this general case are the AR model (Example 1.5), the bilinear model (Example 1.6) and the ARCH model (Example 1.8). Extending the model to allow for dependence on explanatory variables, $x_t$, gives
    $$f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta) = f(y_1 \,|\, x_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}, y_{t-2}, \dots, y_1, x_t, x_{t-1}, \dots, x_1; \theta). \qquad (1.7)$$
    An example is the autoregressive model with heteroskedasticity (Example 1.7).

    Example 1.9 Autoregressive Model
    The joint pdf for the AR(1) model in Example 1.5 is
    $$f(y_1, y_2, \dots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \,|\, y_{t-1}; \theta),$$
    where the conditional distribution is
    $$f(y_t \,|\, y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y_t - \rho y_{t-1})^2}{2\sigma^2}\right],$$
    and the marginal distribution is
    $$f(y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2/(1 - \rho^2)}}\exp\left[-\frac{y_1^2}{2\sigma^2/(1 - \rho^2)}\right].$$

    Non-stochastic explanatory variables
    In the case of non-stochastic explanatory variables, because $x_t$ is deterministic its probability mass is degenerate. Explanatory variables of this form are also referred to as fixed in repeated samples. The joint probability in expression (1.3) simplifies to
    $$f(y_1, \dots, y_T, x_1, \dots, x_T; \psi) = f(y_1, \dots, y_T \,|\, x_1, \dots, x_T; \theta).$$
    Now $\psi = \theta$ and there is no potential loss of information from using the conditional distribution to estimate $\theta$.

    1.4 Maximum Likelihood Framework

    As emphasized previously, a time series of data represents the observed realization of draws from a joint pdf. The maximum likelihood principle makes use of this result by providing a general framework for estimating the unknown parameters, $\theta$, from the observed time series data, $\{y_1, y_2, \dots, y_T\}$.

    1.4.1 The Log-Likelihood Function

    The standard interpretation of the joint pdf in (1.7) is that $f$ is a function of $y_t$ for given parameters, $\theta$. In defining the maximum likelihood estimator this interpretation is reversed, so that $f$ is taken as a function of $\theta$ for given $y_t$. The motivation behind this change in the interpretation of the arguments of the pdf is to regard $\{y_1, y_2, \dots, y_T\}$ as a realized data set which is no longer random. The maximum likelihood estimator is then obtained by finding the value of $\theta$ which is most likely to have generated the observed data. Here the phrase most likely is loosely interpreted in a probability sense.

    It is important to remember that the likelihood function is simply a re-

    definition of the joint pdf in equation (1.7). For many problems it is simpler

    to work with the logarithm of this joint density function. The log-likelihood

  • 1.4 Maximum Likelihood Framework 13

    function is defined as

    lnLT () =1

    Tln f(y1 |x1; )

    +1

    T

    Tt=2

    ln f(yt|yt1, yt2, , y1, xt, xt1, x1; ) ,(1.8)

where the change of status of the arguments in the joint pdf is highlighted by making $\theta$ the sole argument of this function, and the $T$ subscript indicates that the log-likelihood function is an average over the sample of the logarithm of the density evaluated at $y_t$. It is worth emphasizing that the term log-likelihood function, used here without any qualification, is also known as the average log-likelihood function. This convention is also used by, among others, Newey and McFadden (1994) and White (1994). This definition of the log-likelihood function is consistent with the theoretical development of the properties of maximum likelihood estimators discussed in Chapter 2, particularly Sections 2.3 and 2.5.1.

For the special case where $y_t$ is iid, the log-likelihood function is based on the joint pdf in (1.4) and is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) .

In all cases, the log-likelihood function, $\ln L_T(\theta)$, is a scalar that represents a summary measure of the data for given $\theta$.

The maximum likelihood estimator of $\theta$ is defined as the value of $\theta$, denoted $\hat{\theta}$, that maximizes the log-likelihood function. In a large number of cases this may be achieved using standard calculus. Chapter 3 discusses numerical approaches to the problem of finding maximum likelihood estimates when analytical solutions do not exist or are difficult to derive.

    Example 1.10 Poisson Distribution

Let $\{y_1, y_2, \dots, y_T\}$ be iid observations from a Poisson distribution

    f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} ,

where $\theta > 0$. The log-likelihood function for the sample is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) = \frac{1}{T} \sum_{t=1}^{T} y_t \ln\theta - \theta - \frac{\ln(y_1! \, y_2! \cdots y_T!)}{T} .

Consider the following $T = 3$ observations, $y_t = \{8, 3, 4\}$. The log-likelihood function is

    \ln L_T(\theta) = \frac{15}{3} \ln\theta - \theta - \frac{\ln(8! \, 3! \, 4!)}{3} = 5 \ln\theta - \theta - 5.191 .

A plot of the log-likelihood function is given in panel (a) of Figure 1.3 for values of $\theta$ ranging from 0 to 10. Even though the Poisson distribution is a discrete distribution in terms of the random variable $y$, the log-likelihood function is continuous in the unknown parameter $\theta$. Inspection shows that a maximum occurs at $\theta = 5$ with a log-likelihood value of

    \ln L_T(5) = 5 \ln 5 - 5 - 5.191 = -2.144 .

The contribution to the log-likelihood function of the first observation, $y_1 = 8$, evaluated at $\theta = 5$, is

    \ln f(y_1; 5) = y_1 \ln 5 - 5 - \ln(y_1!) = 8 \ln 5 - 5 - \ln(8!) = -2.729 .

For the other two observations, the contributions are $\ln f(y_2; 5) = -1.963$ and $\ln f(y_3; 5) = -1.740$. The probabilities $f(y_t; \theta)$ are between 0 and 1 by definition, and therefore all of the contributions are negative because they are computed as the logarithm of $f(y_t; \theta)$. The average of these $T = 3$ contributions is $\ln L_T(5) = -2.144$, which corresponds to the value already given above. A plot of $\ln f(y_t; 5)$ in panel (b) of Figure 1.3 shows that observations closer to $\theta = 5$ make a relatively greater contribution to the log-likelihood function than observations further away, in the sense that they are smaller negative numbers.
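These calculations are easily verified numerically. The following MATLAB sketch (illustrative only, and not one of the program files listed in the exercises) evaluates the Poisson log-likelihood over a grid of parameter values and reproduces the maximum at $\theta = 5$:

    % Poisson log-likelihood for y = {8,3,4} (Example 1.10)
    y   = [8; 3; 4];
    lnL = @(theta) mean(y*log(theta) - theta - gammaln(y+1));  % gammaln(y+1) = ln(y!)

    theta = 0.1:0.01:10;              % grid of parameter values
    lnLv  = arrayfun(lnL, theta);     % log-likelihood over the grid
    [maxlnL, idx] = max(lnLv);
    disp([theta(idx) maxlnL]);        % approximately 5.000 and -2.144
    plot(theta, lnLv);                % compare with panel (a) of Figure 1.3

The grid search is used here purely for illustration; the analytical solution derived in Example 1.14 below is simply the sample mean.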

    Example 1.11 Exponential Distribution

Let $\{y_1, y_2, \dots, y_T\}$ be iid drawings from the exponential distribution

    f(y; \theta) = \theta \exp[-\theta y] ,

where $\theta > 0$. The log-likelihood function for the sample is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta) = \frac{1}{T} \sum_{t=1}^{T} (\ln\theta - \theta y_t) = \ln\theta - \theta \frac{1}{T} \sum_{t=1}^{T} y_t .

Consider the following $T = 6$ observations, $y_t = \{2.1, 2.2, 3.1, 1.6, 2.5, 0.5\}$. The log-likelihood function is

    \ln L_T(\theta) = \ln\theta - \theta \frac{1}{T} \sum_{t=1}^{T} y_t = \ln\theta - 2\theta .

Plots of the log-likelihood function, $\ln L_T(\theta)$, and the likelihood function, $L_T(\theta)$, are given in Figure 1.4, which show that a maximum occurs at $\hat{\theta} = 0.5$.

Figure 1.3 Plot of $\ln L_T(\theta)$ (panel (a)) and $\ln f(y_t; \theta = 5)$ (panel (b)) for the Poisson distribution example with a sample size of T = 3.

Figure 1.4 Plot of the log-likelihood function $\ln L_T(\theta)$ (panel (a)) and the likelihood function $L_T(\theta)$ (panel (b)) for the exponential distribution example.

Table 1.2 provides details of the calculations. Let the log-likelihood function at each observation, evaluated at the maximum likelihood estimate, be denoted $\ln l_t(\hat{\theta}) = \ln f(y_t; \hat{\theta})$. The second column shows $\ln l_t(\theta)$ evaluated at $\hat{\theta} = 0.5$,

    \ln l_t(0.5) = \ln(0.5) - 0.5 y_t ,

resulting in a maximum value of the log-likelihood function of

    \ln L_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} \ln l_t(0.5) = \frac{-10.159}{6} = -1.693 .


Table 1.2 Maximum likelihood calculations for the exponential distribution example. The maximum likelihood estimate is $\hat{\theta} = 0.5$.

    y_t    ln l_t(0.5)    g_t(0.5)    h_t(0.5)
    -------------------------------------------
    2.1      -1.743        -0.100      -4.000
    2.2      -1.793        -0.200      -4.000
    3.1      -2.243        -1.100      -4.000
    1.6      -1.493         0.400      -4.000
    2.5      -1.943        -0.500      -4.000
    0.5      -0.943         1.500      -4.000
    -------------------------------------------
    lnL_T(0.5) = -1.693   G_T(0.5) = 0.000   H_T(0.5) = -4.000
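The columns of Table 1.2 follow directly from $\ln l_t(\theta)$ together with the per-observation gradient $g_t(\theta)$ and Hessian $h_t(\theta)$ derived in Examples 1.15 and 1.18 below. A minimal MATLAB sketch that reproduces the table (variable names are illustrative):

    % Exponential example: per-observation quantities at thetahat = 0.5
    y        = [2.1; 2.2; 3.1; 1.6; 2.5; 0.5];
    thetahat = 1/mean(y);                   % MLE = 1/ybar = 0.5
    lnl = log(thetahat) - thetahat*y;       % column 2: ln l_t(0.5)
    g   = 1/thetahat - y;                   % column 3: g_t(0.5)
    h   = -1/thetahat^2*ones(size(y));      % column 4: h_t(0.5)
    disp([y lnl g h]);                      % body of Table 1.2
    disp([mean(lnl) mean(g) mean(h)]);      % -1.693  0.000  -4.000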

    Example 1.12 Normal Distribution

Let $\{y_1, y_2, \dots, y_T\}$ be iid observations drawn from a normal distribution

    f(y; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right] ,

with unknown parameters $\theta = \{\mu, \sigma^2\}$. The log-likelihood function is

    \ln L_T(\theta) = \frac{1}{T} \sum_{t=1}^{T} \ln f(y_t; \theta)
                    = \frac{1}{T} \sum_{t=1}^{T} \left( -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{(y_t - \mu)^2}{2\sigma^2} \right)
                    = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu)^2 .

Consider the following $T = 6$ observations, $y_t = \{5, -1, 3, 0, 2, 3\}$. The log-likelihood function is

    \ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{12\sigma^2} \sum_{t=1}^{6} (y_t - \mu)^2 .

A plot of this function in Figure 1.5 shows that a maximum occurs at $\hat{\mu} = 2$ and $\hat{\sigma}^2 = 4$.
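Because $\theta$ now contains two elements, the log-likelihood is a surface rather than a curve. A simple grid evaluation in MATLAB (a sketch with illustrative names, not the book's basic_normal.m) confirms the location of the maximum:

    % Normal example: log-likelihood over a (mu, sigma^2) grid
    y   = [5; -1; 3; 0; 2; 3];
    T   = length(y);
    lnL = @(mu, s2) -0.5*log(2*pi) - 0.5*log(s2) - sum((y-mu).^2)/(2*s2*T);

    mu = 1:0.05:3;  s2 = 2:0.05:6;
    L  = zeros(length(mu), length(s2));
    for i = 1:length(mu)
        for j = 1:length(s2)
            L(i,j) = lnL(mu(i), s2(j));
        end
    end
    [lmax, k] = max(L(:));
    [i, j]    = ind2sub(size(L), k);
    disp([mu(i) s2(j) lmax]);               % approximately 2.00  4.00  -2.11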

    Example 1.13 Autoregressive Model


Figure 1.5 Plot of $\ln L_T(\mu, \sigma^2)$ for the normal distribution example.

From Example 1.9, the log-likelihood function for the AR(1) model is

    \ln L_T(\theta) = \frac{1}{T}\left( \frac{1}{2}\ln(1-\rho^2) - \frac{1-\rho^2}{2\sigma^2}\, y_1^2 \right) - \frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 T} \sum_{t=2}^{T} (y_t - \rho y_{t-1})^2 .

The first term is commonly excluded from $\ln L_T(\theta)$ as its contribution disappears asymptotically since

    \lim_{T \to \infty} \frac{1}{T}\left( \frac{1}{2}\ln(1-\rho^2) - \frac{1-\rho^2}{2\sigma^2}\, y_1^2 \right) = 0 .
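The speed at which the first term vanishes can be checked by simulation. The following sketch uses simulated data under assumed parameter values (not one of the book's data sets) and compares the exact log-likelihood with the conditional version that drops the first term:

    % AR(1): exact versus conditional log-likelihood at the true parameters
    rho = 0.8;  sig2 = 1;  T = 500;
    rng(42);                                % seed chosen arbitrarily
    y    = zeros(T,1);
    y(1) = sqrt(sig2/(1-rho^2))*randn;      % y_1 from the stationary distribution
    for t = 2:T
        y(t) = rho*y(t-1) + sqrt(sig2)*randn;
    end
    e        = y(2:T) - rho*y(1:T-1);
    lnL_cond = -0.5*log(2*pi) - 0.5*log(sig2) - sum(e.^2)/(2*sig2*T);
    first    = (0.5*log(1-rho^2) - (1-rho^2)*y(1)^2/(2*sig2))/T;
    disp([lnL_cond + first, lnL_cond, first]);   % 'first' is O(1/T)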

As the aim of maximum likelihood estimation is to find the value of $\theta$ that maximizes the log-likelihood function, a natural way to proceed is to use the rules of calculus. This involves computing the first and second derivatives of the log-likelihood function with respect to the parameter vector $\theta$.


    1.4.2 Gradient

Differentiating $\ln L_T(\theta)$ with respect to the $(K \times 1)$ parameter vector $\theta$ yields the $(K \times 1)$ gradient vector, also known as the score, given by

    G_T(\theta) = \frac{\partial \ln L_T(\theta)}{\partial \theta}
                = \begin{bmatrix} \partial \ln L_T(\theta)/\partial\theta_1 \\ \partial \ln L_T(\theta)/\partial\theta_2 \\ \vdots \\ \partial \ln L_T(\theta)/\partial\theta_K \end{bmatrix}
                = \frac{1}{T} \sum_{t=1}^{T} g_t(\theta) ,   (1.9)

where the subscript $T$ emphasizes that the gradient is the sample average of the individual gradients

    g_t(\theta) = \frac{\partial \ln l_t(\theta)}{\partial \theta} .

The maximum likelihood estimator of $\theta$, denoted $\hat{\theta}$, is obtained by setting the gradient to zero and solving the resultant $K$ first-order conditions. The maximum likelihood estimator, $\hat{\theta}$, therefore satisfies the condition

    G_T(\hat{\theta}) = \left. \frac{\partial \ln L_T(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0 .   (1.10)

    Example 1.14 Poisson Distribution

From Example 1.10, the first derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    G_T(\theta) = \frac{1}{\theta} \frac{1}{T} \sum_{t=1}^{T} y_t - 1 .

The maximum likelihood estimator is the solution of the first-order condition

    \frac{1}{\hat{\theta}} \frac{1}{T} \sum_{t=1}^{T} y_t - 1 = 0 ,

which yields the sample mean as the maximum likelihood estimator

    \hat{\theta} = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar{y} .

Using the data for $y_t$ in Example 1.10, the maximum likelihood estimate is $\hat{\theta} = 15/3 = 5$. Evaluating the gradient at $\hat{\theta} = 5$ verifies that it is zero at the maximum likelihood estimate

    G_T(\hat{\theta}) = \frac{1}{\hat{\theta}} \frac{1}{T} \sum_{t=1}^{T} y_t - 1 = \frac{1}{5} \cdot \frac{15}{3} - 1 = 0 .
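When gradients are derived analytically, a quick numerical check guards against algebra errors. For the Poisson example, a central difference approximation of the derivative should agree with $G_T(\theta)$; the step size h below is an arbitrary illustrative choice:

    % Check the analytic Poisson gradient against a central difference
    y   = [8; 3; 4];
    lnL = @(theta) mean(y*log(theta) - theta - gammaln(y+1));
    G   = @(theta) mean(y)/theta - 1;        % analytic gradient

    theta = 5;  h = 1e-6;
    G_num = (lnL(theta+h) - lnL(theta-h))/(2*h);
    disp([G(theta) G_num]);                  % both approximately zero at thetahat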

    Example 1.15 Exponential Distribution

From Example 1.11, the first derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    G_T(\theta) = \frac{1}{\theta} - \frac{1}{T} \sum_{t=1}^{T} y_t .

Setting $G_T(\hat{\theta}) = 0$ and solving the resultant first-order condition yields

    \hat{\theta} = \frac{T}{\sum_{t=1}^{T} y_t} = \frac{1}{\bar{y}} ,

which is the reciprocal of the sample mean. Using the same observed data for $y_t$ as in Example 1.11, the maximum likelihood estimate is $\hat{\theta} = 6/12 = 0.5$. The third column of Table 1.2 gives the gradients at each observation evaluated at $\hat{\theta} = 0.5$,

    g_t(0.5) = \frac{1}{0.5} - y_t .

The gradient is

    G_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} g_t(0.5) = 0 ,

which follows from the properties of the maximum likelihood estimator.

    Example 1.16 Normal Distribution

From Example 1.12, the first derivatives of the log-likelihood function are

    \frac{\partial \ln L_T(\theta)}{\partial \mu} = \frac{1}{\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu) ,   \frac{\partial \ln L_T(\theta)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)^2 ,

yielding the gradient vector

    G_T(\theta) = \begin{bmatrix} \dfrac{1}{\sigma^2 T} \sum_{t=1}^{T} (y_t - \mu) \\[1ex] -\dfrac{1}{2\sigma^2} + \dfrac{1}{2\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)^2 \end{bmatrix} .


Evaluating the gradient at $\hat{\theta}$ and setting $G_T(\hat{\theta}) = 0$ gives

    G_T(\hat{\theta}) = \begin{bmatrix} \dfrac{1}{\hat{\sigma}^2 T} \sum_{t=1}^{T} (y_t - \hat{\mu}) \\[1ex] -\dfrac{1}{2\hat{\sigma}^2} + \dfrac{1}{2\hat{\sigma}^4 T} \sum_{t=1}^{T} (y_t - \hat{\mu})^2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} .

Solving for $\hat{\theta} = \{\hat{\mu}, \hat{\sigma}^2\}$, the maximum likelihood estimators are

    \hat{\mu} = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar{y} ,   \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^2 .

Using the data from Example 1.12, the maximum likelihood estimates are

    \hat{\mu} = \frac{5 - 1 + 3 + 0 + 2 + 3}{6} = 2

    \hat{\sigma}^2 = \frac{(5-2)^2 + (-1-2)^2 + (3-2)^2 + (0-2)^2 + (2-2)^2 + (3-2)^2}{6} = 4 ,

which agree with the values given in Example 1.12.
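Since the first-order conditions solve in closed form, the normal MLEs require only two lines of code. A minimal sketch:

    % Closed-form normal MLEs for the data of Example 1.12
    y     = [5; -1; 3; 0; 2; 3];
    muhat = mean(y);                  % 2
    s2hat = mean((y - muhat).^2);     % 4: divides by T, unlike var(y) which uses T-1
    disp([muhat s2hat]);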

    1.4.3 Hessian

To establish that $\hat{\theta}$ maximizes the log-likelihood function, it is necessary to determine that the Hessian

    H_T(\theta) = \frac{\partial^2 \ln L_T(\theta)}{\partial\theta \, \partial\theta'} ,   (1.11)

associated with the log-likelihood function is negative definite. As $\theta$ is a $(K \times 1)$ vector, the Hessian is the $(K \times K)$ symmetric matrix

    H_T(\theta) = \begin{bmatrix}
    \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1 \partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1 \partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_1 \partial\theta_K} \\[1ex]
    \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_2 \partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_2 \partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_2 \partial\theta_K} \\
    \vdots & \vdots & \ddots & \vdots \\
    \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K \partial\theta_1} & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K \partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L_T(\theta)}{\partial\theta_K \partial\theta_K}
    \end{bmatrix} = \frac{1}{T} \sum_{t=1}^{T} h_t(\theta) ,


where the subscript $T$ emphasizes that the Hessian is the sample average of the individual elements

    h_t(\theta) = \frac{\partial^2 \ln l_t(\theta)}{\partial\theta \, \partial\theta'} .

The second-order condition for a maximum requires that the Hessian matrix evaluated at $\hat{\theta}$,

    H_T(\hat{\theta}) = \left. \frac{\partial^2 \ln L_T(\theta)}{\partial\theta \, \partial\theta'} \right|_{\theta = \hat{\theta}} ,   (1.12)

is negative definite. The conditions for negative definiteness are that the leading principal minors alternate in sign,

    |H_{11}| < 0 ,   \begin{vmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{vmatrix} > 0 ,   \begin{vmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{vmatrix} < 0 , \dots

where $H_{ij}$ is the $ij$-th element of $H_T(\hat{\theta})$. In the case of $K = 1$, the condition is

    H_{11} < 0 .   (1.13)

For the case of $K = 2$, the conditions are

    H_{11} < 0 ,   H_{11} H_{22} - H_{12} H_{21} > 0 .   (1.14)
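For a numerical Hessian of any dimension, negative definiteness is most conveniently checked through the eigenvalues, which must all be strictly negative; for K = 2 this is equivalent to the determinant conditions in (1.14). A short sketch using the Hessian obtained for the normal example in Example 1.19 below:

    % Check negative definiteness of a Hessian matrix
    H = [-0.250  0.000;
          0.000 -0.031];              % H_T from the normal example (Example 1.19)
    disp(eig(H));                     % all eigenvalues negative => maximum
    disp([H(1,1) < 0, det(H) > 0]);   % equivalent check from equation (1.14)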

    Example 1.17 Poisson Distribution

From Examples 1.10 and 1.14, the second derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    H_T(\theta) = -\frac{1}{\theta^2} \frac{1}{T} \sum_{t=1}^{T} y_t .

Evaluating the Hessian at the maximum likelihood estimator, $\hat{\theta} = \bar{y}$, yields

    H_T(\hat{\theta}) = -\frac{1}{\hat{\theta}^2} \frac{1}{T} \sum_{t=1}^{T} y_t = -\frac{1}{\bar{y}^2} \frac{1}{T} \sum_{t=1}^{T} y_t = -\frac{1}{\bar{y}} < 0 .

As $\bar{y}$ is the mean of a sample of non-negative counts, it is positive whenever at least one observation is nonzero, so the Hessian is negative and a maximum is achieved. Using the data for $y_t$ in Example 1.10 verifies that the Hessian at $\hat{\theta} = 5$ is negative

    H_T(\hat{\theta}) = -\frac{1}{\hat{\theta}^2} \frac{1}{T} \sum_{t=1}^{T} y_t = -\frac{1}{5^2} \cdot \frac{15}{3} = -0.200 .


    Example 1.18 Exponential Distribution

From Examples 1.11 and 1.15, the second derivative of $\ln L_T(\theta)$ with respect to $\theta$ is

    H_T(\theta) = -\frac{1}{\theta^2} .

Evaluating the Hessian at the maximum likelihood estimator yields

    H_T(\hat{\theta}) = -\frac{1}{\hat{\theta}^2} < 0 .

As this term is negative for any $\hat{\theta}$, the condition in equation (1.13) is satisfied and a maximum is achieved. The last column of Table 1.2 shows that the Hessian at each observation, evaluated at the maximum likelihood estimate, is constant. The value of the Hessian is

    H_T(0.5) = \frac{1}{6} \sum_{t=1}^{6} h_t(0.5) = \frac{-24.000}{6} = -4 ,

which is negative, confirming that a maximum has been reached.

    Example 1.19 Normal Distribution

From Examples 1.12 and 1.16, the second derivatives of $\ln L_T(\theta)$ with respect to $\theta$ are

    \frac{\partial^2 \ln L_T(\theta)}{\partial\mu^2} = -\frac{1}{\sigma^2}
    \frac{\partial^2 \ln L_T(\theta)}{\partial\mu \, \partial\sigma^2} = -\frac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu)
    \frac{\partial^2 \ln L_T(\theta)}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{1}{\sigma^6 T} \sum_{t=1}^{T} (y_t - \mu)^2 ,

so that the Hessian is

    H_T(\theta) = \begin{bmatrix} -\dfrac{1}{\sigma^2} & -\dfrac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu) \\[1ex] -\dfrac{1}{\sigma^4 T} \sum_{t=1}^{T} (y_t - \mu) & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6 T} \sum_{t=1}^{T} (y_t - \mu)^2 \end{bmatrix} .

Given that $G_T(\hat{\theta}) = 0$, from Example 1.16 it follows that $\sum_{t=1}^{T} (y_t - \hat{\mu}) = 0$


and, from the second element of $G_T(\hat{\theta})$, that $\sum_{t=1}^{T} (y_t - \hat{\mu})^2 = T\hat{\sigma}^2$. Therefore

    H_T(\hat{\theta}) = \begin{bmatrix} -\dfrac{1}{\hat{\sigma}^2} & 0 \\[1ex] 0 & -\dfrac{1}{2\hat{\sigma}^4} \end{bmatrix} .

From equation (1.14),

    H_{11} = -\frac{1}{\hat{\sigma}^2} < 0 ,   H_{11} H_{22} - H_{12} H_{21} = \left( -\frac{1}{\hat{\sigma}^2} \right)\left( -\frac{1}{2\hat{\sigma}^4} \right) - 0^2 > 0 ,

establishing that the second-order condition for a maximum is satisfied. Using the maximum likelihood estimates from Example 1.16, the Hessian is

    H_T(\hat{\mu}, \hat{\sigma}^2) = \begin{bmatrix} -\dfrac{1}{4} & 0 \\[1ex] 0 & -\dfrac{1}{2 \cdot 4^2} \end{bmatrix} = \begin{bmatrix} -0.250 & 0.000 \\ 0.000 & -0.031 \end{bmatrix} .

    1.5 Applications

To highlight the features of maximum likelihood estimation discussed thus far, two applications are presented that focus on estimating the discrete time version of the Vasicek (1977) model of interest rates, $r_t$. The first application is based on the marginal (stationary) distribution, while the second focuses on the conditional (transitional) distribution, which gives the distribution of $r_t$ conditional on $r_{t-1}$. The interest rate data used are from Aït-Sahalia (1996). The data, plotted in Figure 1.6, consist of daily 7-day Eurodollar rates (expressed as percentages) for the period 1 June 1973 to 25 February 1995, a total of T = 5505 observations.

The Vasicek model expresses the change in the interest rate, $r_t$, as a function of a constant and the lagged interest rate

    r_t - r_{t-1} = \alpha + \beta r_{t-1} + u_t ,   u_t \sim iid\, N(0, \sigma^2) ,   (1.15)

where $\theta = \{\alpha, \beta, \sigma^2\}$ are unknown parameters, with the restriction $\beta < 0$.

    1.5.1 Stationary Distribution of the Vasicek Model

Figure 1.6 Daily 7-day Eurodollar interest rates from 1 June 1973 to 25 February 1995, expressed as a percentage.

As a preliminary step to estimating the parameters of the Vasicek model in equation (1.15), consider the alternative model where the level of the interest rate is independent of previous interest rates

    r_t = \mu_s + v_t ,   v_t \sim iid\, N(0, \sigma_s^2) .

The stationary distribution of $r_t$ for this model is

    f(r; \mu_s, \sigma_s^2) = \frac{1}{\sqrt{2\pi\sigma_s^2}} \exp\left[ -\frac{(r - \mu_s)^2}{2\sigma_s^2} \right] .   (1.16)

The relationship between the parameters of the stationary distribution and the parameters of the model in equation (1.15) is

    \mu_s = -\frac{\alpha}{\beta} ,   \sigma_s^2 = -\frac{\sigma^2}{\beta(2+\beta)} ,   (1.17)

which are obtained as the unconditional mean and variance of (1.15).

The log-likelihood function based on the stationary distribution in equation (1.16) for a sample of T observations is

    \ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma_s^2 - \frac{1}{2\sigma_s^2 T} \sum_{t=1}^{T} (r_t - \mu_s)^2 ,

where $\theta = \{\mu_s, \sigma_s^2\}$. Maximizing $\ln L_T(\theta)$ with respect to $\theta$ gives

    \hat{\mu}_s = \frac{1}{T} \sum_{t=1}^{T} r_t ,   \hat{\sigma}_s^2 = \frac{1}{T} \sum_{t=1}^{T} (r_t - \hat{\mu}_s)^2 .   (1.18)

Using the Eurodollar interest rates, the maximum likelihood estimates are

    \hat{\mu}_s = 8.362 ,   \hat{\sigma}_s^2 = 12.893 .   (1.19)
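These estimators are immediate to compute. A minimal sketch, assuming r is a (T x 1) vector already holding the Eurodollar rates (this is not the book's basic_stationary program):

    % Stationary (marginal) MLEs of the Vasicek model, equation (1.18)
    % r is assumed to contain the Eurodollar rates in percent
    mus_hat = mean(r);                 % 8.362 for the Eurodollar data
    s2s_hat = mean((r - mus_hat).^2);  % 12.893 for the Eurodollar data
    disp([mus_hat s2s_hat]);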


Figure 1.7 Estimated stationary distribution of the Vasicek model based on evaluating (1.16) at the maximum likelihood estimates (1.19), using daily Eurodollar rates from 1 June 1973 to 25 February 1995.

The stationary distribution is estimated by evaluating equation (1.16) at the maximum likelihood estimates in (1.19) and is given by

    f(r; \hat{\mu}_s, \hat{\sigma}_s^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}_s^2}} \exp\left[ -\frac{(r - \hat{\mu}_s)^2}{2\hat{\sigma}_s^2} \right] = \frac{1}{\sqrt{2\pi \times 12.893}} \exp\left[ -\frac{(r - 8.362)^2}{2 \times 12.893} \right] ,   (1.20)

which is presented in Figure 1.7.

Inspection of the estimated distribution shows a potential problem with the Vasicek stationary distribution, namely that the support of the distribution is not restricted to be positive. The probability of negative values for the interest rate is

    \Pr(r < 0) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi \times 12.893}} \exp\left[ -\frac{(r - 8.362)^2}{2 \times 12.893} \right] dr = 0.01 .
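This probability is a single normal cdf evaluation. In base MATLAB it can be computed with erfc (with the Statistics Toolbox, normcdf(0, 8.362, sqrt(12.893)) gives the same answer):

    % Pr(r < 0) under the estimated stationary distribution
    mus = 8.362;  s2s = 12.893;
    p   = 0.5*erfc(mus/sqrt(2*s2s));   % standard normal cdf evaluated at -mus/sigma_s
    disp(p);                           % approximately 0.01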

To avoid this problem, alternative models of interest rates are specified in which the stationary distribution is defined only over the positive region. A well-known example is the CIR interest rate model (Cox, Ingersoll and Ross, 1985), which is discussed in Chapters 2, 3 and 12.

    1.5.2 Transitional Distribution of the Vasicek Model

In contrast to the stationary model specification of the previous section, the full dynamics of the Vasicek model in equation (1.15) are now used by specifying the transitional distribution

    f(r | r_{t-1}; \alpha, \rho, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(r - \alpha - \rho r_{t-1})^2}{2\sigma^2} \right] ,   (1.21)

where $\theta = \{\alpha, \rho, \sigma^2\}$ and the substitution $\rho = 1 + \beta$ is made for convenience. This distribution is now of the same form as the conditional distribution of the AR(1) model in Examples 1.5, 1.9 and 1.13.

The log-likelihood function based on the transitional distribution in equation (1.21) is

    \ln L_T(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1})^2 ,

where the sample size is reduced by one observation as a result of the lagged term $r_{t-1}$. This form of the log-likelihood function does not contain the marginal distribution $f(r_1; \theta)$, a point that is made in Example 1.13. The first derivatives of the log-likelihood function are

    \frac{\partial \ln L_T(\theta)}{\partial \alpha} = \frac{1}{\sigma^2 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1})
    \frac{\partial \ln L_T(\theta)}{\partial \rho} = \frac{1}{\sigma^2 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1}) r_{t-1}
    \frac{\partial \ln L_T(\theta)}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4 (T-1)} \sum_{t=2}^{T} (r_t - \alpha - \rho r_{t-1})^2 .

Setting these derivatives to zero yields the maximum likelihood estimators

    \hat{\alpha} = \bar{r}_t - \hat{\rho}\, \bar{r}_{t-1}
    \hat{\rho} = \frac{\sum_{t=2}^{T} (r_t - \bar{r}_t)(r_{t-1} - \bar{r}_{t-1})}{\sum_{t=2}^{T} (r_{t-1} - \bar{r}_{t-1})^2}
    \hat{\sigma}^2 = \frac{1}{T-1} \sum_{t=2}^{T} (r_t - \hat{\alpha} - \hat{\rho} r_{t-1})^2 ,

where

    \bar{r}_t = \frac{1}{T-1} \sum_{t=2}^{T} r_t ,   \bar{r}_{t-1} = \frac{1}{T-1} \sum_{t=2}^{T} r_{t-1} .


The maximum likelihood estimates for the Eurodollar interest rates are

    \hat{\alpha} = 0.053 ,   \hat{\rho} = 0.994 ,   \hat{\sigma}^2 = 0.165 .   (1.22)
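The estimators derived above translate directly into code. A minimal sketch, again assuming r holds the Eurodollar rate series (names are illustrative); the final two lines anticipate the invariance calculation discussed below:

    % Transitional MLEs of the Vasicek model
    r0 = r(1:end-1);  r1 = r(2:end);    % r_{t-1} and r_t
    rho_hat   = sum((r1-mean(r1)).*(r0-mean(r0)))/sum((r0-mean(r0)).^2);
    alpha_hat = mean(r1) - rho_hat*mean(r0);
    e         = r1 - alpha_hat - rho_hat*r0;
    sig2_hat  = mean(e.^2);             % averages over the T-1 usable observations
    beta_hat  = rho_hat - 1;
    % Implied stationary moments via equation (1.17)
    mus_hat = -alpha_hat/beta_hat;
    s2s_hat = -sig2_hat/(beta_hat*(2+beta_hat));
    disp([alpha_hat rho_hat sig2_hat beta_hat mus_hat s2s_hat]);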

An estimate of $\beta$ is obtained by using the relationship $\rho = 1 + \beta$. Rearranging for $\beta$ and evaluating at $\hat{\rho}$ gives $\hat{\beta} = \hat{\rho} - 1 = -0.006$. The estimated transitional distribution is obtained by evaluating (1.21)

at the maximum likelihood estimates in (1.22),

    f(r | r_{t-1}; \hat{\alpha}, \hat{\rho}, \hat{\sigma}^2) = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}} \exp\left[ -\frac{(r - \hat{\alpha} - \hat{\rho} r_{t-1})^2}{2\hat{\sigma}^2} \right] .   (1.23)

Plots of this distribution are given in Figure 1.8 for three values of the conditioning variable $r_{t-1}$, corresponding to the minimum (2.9%), median (8.1%) and maximum (24.3%) interest rates in the sample.

Figure 1.8 Estimated transitional distribution of the Vasicek model, based on evaluating (1.23) at the maximum likelihood estimates in (1.22) using Eurodollar rates from 1 June 1973 to 25 February 1995. The dashed line is the transitional density for the minimum (2.9%), the solid line for the median (8.1%) and the dotted line for the maximum (24.3%) Eurodollar rate.

The location of the three transitional distributions changes over time, while the spread of each distribution remains constant at $\hat{\sigma}^2 = 0.165$. A comparison of the estimated variances of the stationary and transitional distributions, in equations (1.19) and (1.22) respectively, shows that $\hat{\sigma}^2 < \hat{\sigma}_s^2$. This result reflects the property that, by conditioning on information, in this case $r_{t-1}$, the transitional distribution is better at tracking the time series behaviour of the interest rate, $r_t$, than the stationary distribution, where there is no conditioning on lagged dependent variables.


Having obtained the estimated transitional distribution using the maximum likelihood estimates in (1.22), it is also possible to use these estimates to re-estimate the stationary interest rate distribution in (1.20) by using the expressions in (1.17). The alternative estimates of the mean and variance of the stationary distribution are

    \hat{\mu}_s = -\frac{\hat{\alpha}}{\hat{\beta}} = \frac{0.053}{0.006} = 8.308 ,
    \hat{\sigma}_s^2 = -\frac{\hat{\sigma}^2}{\hat{\beta}(2+\hat{\beta})} = \frac{0.165}{0.006 (2 - 0.006)} = 12.967 ,

where the reported values are computed using the unrounded parameter estimates.

As these estimates are based on the transitional distribution, which incorporates the full dynamic specification of the Vasicek model, they represent the maximum likelihood estimates of the parameters of the stationary distribution. This relationship between the maximum likelihood estimators of the transitional and stationary distributions rests on the invariance property of maximum likelihood estimators, which is discussed in Chapter 2. While the parameter estimates of the stationary distribution obtained from the estimates of the transitional distribution are numerically close to the estimates obtained in the previous section, the latter estimates come from a misspecified model, as the stationary model excludes the dynamic structure in equation (1.15). Issues relating to misspecified models are discussed in Chapter 9.

    1.6 Exercises

    (1) Sampling Data

    Gauss file(s) basic_sample.g

    Matlab file(s) basic_sample.m

This exercise reproduces the simulation results in Figures 1.1 and 1.2. For each model, simulate T = 5 draws of $y_t$ and plot the corresponding distribution at each point in time. Where applicable, the explanatory variable in these exercises is $x_t = \{0, 1, 2, 3, 4\}$ and $w_t$ are draws from a uniform distribution on the unit circle.

    (a) Time invariant model

    y_t = 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (b) Count model

    f(y; 2) = \frac{2^y \exp[-2]}{y!} ,   y = 0, 1, 2, \dots


    (c) Linear regression model

    y_t = 3 x_t + 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (d) Exponential regression model

    f(y; \theta) = \frac{1}{\mu_t} \exp\left[ -\frac{y}{\mu_t} \right] ,   \mu_t = 1 + 2 x_t .

    (e) Autoregressive model

    y_t = 0.8 y_{t-1} + 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (f) Bilinear time series model

    y_t = 0.8 y_{t-1} + 0.4 y_{t-1} u_{t-1} + 2 z_t ,   z_t \sim iid\, N(0, 1) .

    (g) Autoregressive model with heteroskedasticity

    y_t = 0.8 y_{t-1} + \sigma_t z_t ,   z_t \sim iid\, N(0, 1)
    \sigma_t^2 = 0.8 + 0.8 w_t .

    (h) The ARCH regression model

    y_t = 3 x_t + u_t
    u_t = \sigma_t z_t
    \sigma_t^2 = 4 + 0.9 u_{t-1}^2
    z_t \sim iid\, N(0, 1) .

    (2) Poisson Distribution

    Gauss file(s) basic_poisson.g

    Matlab file(s) basic_poisson.m

A sample of T = 4 observations, $y_t = \{6, 2, 3, 1\}$, is drawn from the Poisson distribution

    f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} .

(a) Write the log-likelihood function, $\ln L_T(\theta)$.

(b) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.

(c) Compute the maximum likelihood estimate, $\hat{\theta}$.

(d) Compute the log-likelihood function at $\hat{\theta}$ for each observation.

(e) Compute the value of the log-likelihood function at $\hat{\theta}$.


(f) Compute

    g_t(\hat{\theta}) = \left. \frac{d \ln l_t(\theta)}{d\theta} \right|_{\theta = \hat{\theta}}   and   h_t(\hat{\theta}) = \left. \frac{d^2 \ln l_t(\theta)}{d\theta^2} \right|_{\theta = \hat{\theta}} ,

for each observation.

(g) Compute

    G_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} g_t(\hat{\theta})   and   H_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} h_t(\hat{\theta}) .

    (3) Exponential Distribution

    Gauss file(s) basic_exp.g

    Matlab file(s) basic_exp.m

A sample of T = 4 observations, $y_t = \{5.5, 2.0, 3.5, 5.0\}$, is drawn from the exponential distribution

    f(y; \theta) = \theta \exp[-\theta y] .

(a) Write the log-likelihood function, $\ln L_T(\theta)$.

(b) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.

(c) Compute the maximum likelihood estimate, $\hat{\theta}$.

(d) Compute the log-likelihood function at $\hat{\theta}$ for each observation.

(e) Compute the value of the log-likelihood function at $\hat{\theta}$.

(f) Compute

    g_t(\hat{\theta}) = \left. \frac{d \ln l_t(\theta)}{d\theta} \right|_{\theta = \hat{\theta}}   and   h_t(\hat{\theta}) = \left. \frac{d^2 \ln l_t(\theta)}{d\theta^2} \right|_{\theta = \hat{\theta}} ,

for each observation.

(g) Compute

    G_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} g_t(\hat{\theta})   and   H_T(\hat{\theta}) = \frac{1}{T} \sum_{t=1}^{T} h_t(\hat{\theta}) .

    (4) Alternative Form of Exponential Distribution

Consider a random sample of size T, $\{y_1, y_2, \dots, y_T\}$, of iid random variables from the exponential distribution with parameter $\mu$

    f(y; \mu) = \frac{1}{\mu} \exp\left[ -\frac{y}{\mu} \right] .

(a) Derive the log-likelihood function, $\ln L_T(\mu)$.

(b) Derive the first derivative of the log-likelihood function, $G_T(\mu)$.

(c) Derive the second derivative of the log-likelihood function, $H_T(\mu)$.

(d) Derive the maximum likelihood estimator of $\mu$. Compare the result with that obtained in Exercise 3.

    (5) Normal Distribution

    Gauss file(s) basic_normal.g, basic_normal_like.g

    Matlab file(s) basic_normal.m, basic_normal_like.m

A sample of T = 5 observations consisting of the values $\{1, 2, 5, 1, 2\}$ is drawn from the normal distribution

    f(y; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y - \mu)^2}{2\sigma^2} \right] ,

where $\theta = \{\mu, \sigma^2\}$.

(a) Assume that $\sigma^2 = 1$.

(i) Derive the log-likelihood function, $\ln L_T(\theta)$.

(ii) Derive and interpret the maximum likelihood estimator, $\hat{\theta}$.

(iii) Compute the maximum likelihood estimate, $\hat{\theta}$.

(iv) Compute $\ln l_t(\hat{\theta})$, $g_t(\hat{\theta})$ and $h_t(\hat{\theta})$.

(v) Compute $\ln L_T(\hat{\theta})$, $G_T(\hat{\theta})$ and $H_T(\hat{\theta})$.

(b) Repeat part (a) for the case where both the mean and the variance are unknown, $\theta = \{\mu, \sigma^2\}$.

    (6) A Model of the Number of Strikes

    Gauss file(s) basic_count.g, strike.dat

    Matlab file(s) basic_count.m, strike.mat

The data are the number of strikes per annum, $y_t$, in the U.S. from 1968 to 1976, taken from Kennan (1985). The number of strikes is specified as a Poisson-distributed random variable with unknown parameter $\theta$

    f(y; \theta) = \frac{\theta^y \exp[-\theta]}{y!} .

(a) Write the log-likelihood function for a sample of T observations.

(b) Derive and interpret the maximum likelihood estimator of $\theta$.

(c) Estimate $\theta$ and interpret the result.

(d) Use the estimate from part (c) to plot the distribution of the number of strikes and interpret this plot.


(e) Compute a histogram of $y_t$ and comment on its consistency with the distribution of strike numbers estimated in part (d).

    (7) A Model of the Duration of Strikes

    Gauss file(s) basic_strike.g, strike.dat

    Matlab file(s) basic_strike.m, strike.mat

The data are 62 observations, taken from the same source as Exercise 6, on the duration of strikes in the U.S. per annum, expressed in days, $y_t$. Durations are assumed to be drawn from an exponential distribution with unknown parameter $\mu$

    f(y; \mu) = \frac{1}{\mu} \exp\left[ -\frac{y}{\mu} \right] .

(a) Write the log-likelihood function for a sample of T observations.

(b) Derive and interpret the maximum likelihood estimator of $\mu$.

(c) Use the data on strike durations to estimate $\mu$. Interpret the result.

(d) Use the estimate from part (c) to plot the distribution of strike durations and interpret this plot.

(e) Compute a histogram of $y_t$ and comment on its consistency with the distribution of duration times estimated in part (d).

    (8) Asset Prices

    Gauss file(s) basic_assetprices.g, assetprices.xls

    Matlab file(s) basic_assetprices.m, assetprices.mat

The data consist of the Australian, Singapore and NASDAQ stock market indexes for the period 3 January 1989 to 31 December 2009, a total of T = 5478 observations. Consider the following model of asset prices, $p_t$, commonly adopted in the financial econometrics literature

    \ln p_t - \ln p_{t-1} = \mu + u_t ,   u_t \sim iid\, N(0, \sigma^2) ,

where $\theta = \{\mu, \sigma^2\}$ are unknown parameters.

(a) Use the transformation-of-variable technique to show that the conditional distribution of $p_t$ is the log-normal distribution

    f(p | p_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}\, p} \exp\left[ -\frac{(\ln p - \ln p_{t-1} - \mu)^2}{2\sigma^2} \right] .

(b) For a sample of size T, construct the log-likelihood function and derive the maximum likelihood estimator of $\theta$ based on the conditional distribution of $p_t$.


(c) Use the results in part (b) to compute $\hat{\theta}$ for the three stock indexes.

(d) Estimate the asset price distribution for each index using the maximum likelihood parameter estimates obtained in part (c).

(e) Letting $r_t = \ln p_t - \ln p_{t-1}$ represent the return on an asset, derive the maximum likelihood estimator of $\theta$ based on the distribution of $r_t$. Compute $\hat{\theta}$ for the three stock market indexes and compare the estimates with those obtained in part (c).

    (9) Stationary Distribution of the Vasicek Model

    Gauss file(s) basic_stationary.g, eurodata.dat

    Matlab file(s) basic_stationary.m, eurodata.mat

The data are daily 7-day Eurodollar rates, expressed as percentages, from 1 June 1973 to 25 February 1995, a total of T = 5505 observations. The Vasicek discret