EE625 : ECONOMETRICS - t Uecon.tu.ac.th/class/archan/Anan/teaching/EE625/slides.pdf · Econometrics...

EE625 : ECONOMETRICS

Introduction

- Economics is concerned with relations among economic variables. Econometrics is concerned with the analysis of data describing economic relationships.

- We may also ask, out of curiosity, whether a change in one variable (x) causes a change in another variable (y).

- Examples of economic questions:
  - What is the effect of school spending on student performance?
  - Does having another year of education cause an increase in salary?
  - What is the effect of registering debt collectors on low-income debtors?
  - Does reducing class size cause an improvement in student performance?
  - What is the effect of prohibiting political campaigns on voting outcomes?
  - What is the effect of the minimum wage on unemployment?
  - and so on...

- These questions have something in common.

Causality and the Notion of Ceteris Paribus in Econometric Analysis

- What is a causal effect of x on y?

- For example, x could be "institutions" and y could be "economic development", or x could be "schooling" and y could be "wage".

- Suppose x is correlated with y. Can we interpret this relationship as causation?

- Consider the following story: in 1988, someone conducted a series of interviews with freshmen and found that those who had taken SAT preparation courses scored on average 63 points lower than those who hadn't.

- The person then concluded that SAT preparation courses were not helpful. Is this conclusion valid?

- How can we isolate the effect of x on y, and quantitatively establish that x matters for y?

- If we can run a controlled experiment, a simple correlation analysis may be enough to uncover causality. But this is rarely the case in economics.

- We generally must accept the conditions under which people act and the responses occur. Typically we cannot choose the level of a treatment and then record the outcome, but we can often observe people's behavior as recorded in nonexperimental or observational data.
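The SAT story can be made concrete with a small simulation. This is only a sketch under invented assumptions: the course is given a true effect of +30 points, and students with a low baseline score are assumed to be more likely to enroll. None of these numbers come from the 1988 interviews; they are chosen to show how self-selection alone can make a helpful course look harmful in a raw comparison.

```python
import random
from statistics import mean

def simulate_sat(n=20000, true_effect=30.0, seed=0):
    """Naive taker-minus-non-taker gap when weaker students self-select into the course."""
    rng = random.Random(seed)
    takers, non_takers = [], []
    for _ in range(n):
        baseline = rng.gauss(1000, 100)  # hypothetical score without any course
        # Assumption: students with a low baseline are more likely to enroll.
        p_enroll = 0.8 if baseline < 1000 else 0.2
        if rng.random() < p_enroll:
            takers.append(baseline + true_effect + rng.gauss(0, 20))
        else:
            non_takers.append(baseline + rng.gauss(0, 20))
    return mean(takers) - mean(non_takers)

naive_gap = simulate_sat()
print(f"naive difference in means: {naive_gap:.1f} points (assumed true effect: +30)")
```

In this setup the raw difference in means is negative even though every student who takes the course gains from it, so the comparison says more about who enrolls than about what the course does.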

Experimentation

- For example, how can we isolate the effect of institutions on economic performance, and quantitatively establish that institutions matter for economic development?

- Suppose we could conduct the following experiment: pick 2 identical economies, hold all other factors fixed, change the institutions of only one country, and then watch what happens to the economic development of these 2 countries.

- Then we could convincingly attribute the difference in development paths to institutional change.

- Unfortunately, we cannot do that. Instead, we can use econometric methods to effectively hold other factors fixed.

- Experiments are conducted less often in the social sciences than in the natural sciences.

- Thus experimental data, which are often collected in laboratory environments in the natural sciences, are more difficult to obtain in the social sciences.

- Although some social experiments can be devised, it is often expensive or morally repugnant to conduct the kinds of controlled experiments that would be needed to address economic issues.

- What we usually have are nonexperimental or observational data. Econometrics has evolved as a separate discipline from statistics because it focuses on the problems inherent in collecting and analyzing nonexperimental economic data.

- Causal effect refers to the answer to the following counterfactual thought experiment: if only x changes exogenously, all else being equal, what would be the effect on y?

- Thus the notion of ceteris paribus, which means "other (relevant) factors being equal", plays an important role in causal analysis.

- Answering such causal questions is quite challenging, because it is hard to hold all other relevant factors fixed.

- The key question in most empirical studies is: have enough other factors been held fixed to make a case for causality? If some relevant variables are omitted, it is difficult to isolate changes in endogenous variables that are not driven by the omitted factors.
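A short worked equation may help show why an omitted relevant factor blocks a causal reading. The linear model and the symbols below (the betas, deltas, the omitted factor z, and the errors u and v) are notation assumed here for illustration; they do not appear in the slides.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \beta_2 z + u, & \mathrm{E}[u \mid x, z] &= 0,\\
z &= \delta_0 + \delta_1 x + v,           & \mathrm{E}[v \mid x] &= 0,\\
\mathrm{E}[y \mid x] &= (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1)\, x .
\end{align*}
```

Under these assumptions, relating y to x alone recovers \beta_1 + \beta_2 \delta_1 rather than the ceteris paribus effect \beta_1. The two coincide only if z does not matter for y (\beta_2 = 0) or z is unrelated to x (\delta_1 = 0).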

- Econometrics is useful because, if carefully applied, it can simulate a ceteris paribus experiment.

- For example, we might be interested in the effect of another week of job training on wages, with all other components being equal (in particular, education and experience).

- If we succeed in holding all other relevant factors fixed and then find a link between job training and wages, we can conclude that job training has a causal effect on worker productivity.

- Example: Measuring the Return to Education

- If a person is chosen from the population and given another year of education, by how much will his or her wage increase?

- This is a ceteris paribus question: all other factors are held fixed while another year of education is given to the person.

- Suppose we could conduct an experiment to measure the return to education. How would we set up this experiment?

- One way is to randomly pick people and assign each person an amount of education; some are given no education at all, some are given a high school education, some are given two years of college, and so on.

- Subsequently, we measure wages for each group of people.

- However, the experiment described above is infeasible. We cannot give someone only a high school education if he or she already has a college degree, and so on.
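Although the experiment is infeasible in practice, it can be run on simulated people to see why random assignment would work. The sketch below assumes a hypothetical wage equation with a true return of 1.0 per year of education and an "ability" term that also raises wages; all names and numbers are illustrative.

```python
import random
from statistics import mean

def randomized_experiment(n=20000, return_per_year=1.0, seed=1):
    """Mean wage by education level when education is assigned at random."""
    rng = random.Random(seed)
    levels = [0, 12, 14, 16]  # years of education handed out by the experimenter
    wages_by_level = {lvl: [] for lvl in levels}
    for _ in range(n):
        ability = rng.gauss(0, 1)
        educ = rng.choice(levels)  # random assignment: independent of ability
        wage = 5.0 + return_per_year * educ + 2.0 * ability + rng.gauss(0, 1)
        wages_by_level[educ].append(wage)
    return {lvl: mean(w) for lvl, w in wages_by_level.items()}

means = randomized_experiment()
per_year = (means[16] - means[12]) / 4  # wage gain per extra year, college vs. high school
print(f"estimated return per year of education: {per_year:.2f}")
```

Because the assigned education level is unrelated to ability by construction, a simple comparison of group means comes close to the assumed return.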

- Even though experimental data cannot be obtained for measuring the return to education, we can collect nonexperimental data on education levels and wages for a large group by sampling randomly from the population of working people.

- Examples of such data: the Current Population Survey (CPS) and the Labor Force Survey (LFS).

- A common feature of many observational data sets is self-selection. Usually people choose their own levels of education. Therefore education levels are probably not determined independently of all other factors affecting wage.

- If we settle on a list of controls, and if all factors in the list can be observed, then estimating the causal effect of x on y is quite straightforward. But some factors in the list may not be observable.

- For example, to estimate the causal effect of education on wage, we might decide that the relevant factors to control for are years of workforce experience and innate ability.

- Since pursuing more education generally requires postponing entry into the workforce, those with more education usually have less experience.

- Thus, in a nonexperimental data set on wages and education, education is likely to be negatively associated with experience.

- People with more innate ability often choose higher levels of education, so education and ability should be positively related. Since higher ability also leads to higher wages, education is again correlated with a factor that affects wage.

- As experience is not difficult to measure, this variable is likely to be available in a nonexperimental data set.

- So accounting for observed factors, such as experience, when estimating the ceteris paribus effect of another variable, such as education, is relatively straightforward.

- Ability, on the other hand, is difficult to measure.

- Accounting for inherently unobservable factors, such as ability, is much more problematic. Many of the advances in econometric methods have tried to deal with unobserved factors in econometric models.
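The contrast between an observed control (experience) and an unobserved one (ability) can be sketched on simulated data. Everything below is an assumption made for illustration: the wage equation, the true return of 1.0, and the way ability and experience relate to education. Experience is held fixed by "partialling out" (regressing both wage and education on experience and then relating the residuals), which gives the same education coefficient as a regression of wage on education and experience.

```python
import random
from statistics import mean

def slope(y, x):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """Residuals from a simple OLS regression of y on x (with intercept)."""
    b = slope(y, x)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for yi, xi in zip(y, x)]

rng = random.Random(42)
n = 20000
TRUE_RETURN = 1.0  # assumed causal effect of one more year of education on wage

ability = [rng.gauss(0, 1) for _ in range(n)]            # unobserved in real data
educ = [12 + 2 * a + rng.gauss(0, 1) for a in ability]   # abler people choose more schooling
exper = [20 - (e - 12) + rng.gauss(0, 2) for e in educ]  # more schooling, less experience
wage = [1 + TRUE_RETURN * e + 0.3 * x + 2 * a + rng.gauss(0, 1)
        for e, x, a in zip(educ, exper, ability)]

b_naive = slope(wage, educ)  # no controls at all
b_exper = slope(residuals(wage, exper), residuals(educ, exper))  # experience held fixed

print(f"no controls: {b_naive:.2f}, controlling for experience: {b_exper:.2f}, "
      f"assumed truth: {TRUE_RETURN:.2f}")
```

In this design, controlling for experience changes the estimate but does not bring it to the assumed return, because ability is still left out. That remaining gap is the part that cannot be fixed simply by adding an observed variable.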

Types of Data

- Some econometric methods can be applied across different kinds of data sets.

- But some data sets have special features that must be accounted for or should be exploited.

- The most important data structures encountered in applied work are:
  1) Cross-Section Data
  2) Time Series Data
  3) Pooled Cross Sections
  4) Panel Data

Cross-Section Data

- A cross-sectional data set consists of sample units collected in a particular time period. The "sample units" could be persons, households, firms, cities, states, countries, etc.

- We can often assume that cross-sectional data have been obtained by random sampling from the underlying population.

- For example, if 1,289 people are randomly drawn from the working population and we acquire information on their wages, education, experience, and other characteristics, then we have a random sample from the population of working people.

- Random sampling simplifies the analysis of cross-sectional data.

- Example of cross-section data: a portion of the CPS

- The variable obs is the observation number assigned to each person in the sample. Unlike the other variables, it is not a characteristic of the individual.

- It does not matter which person is labeled as observation 1, which person is called observation 2, and so on.

- The fact that the ordering of the data does not matter for econometric analysis is a key feature of cross-sectional data sets obtained from random sampling.
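The irrelevance of row order can be checked directly. The sketch below builds a hypothetical cross section of 1,289 simulated workers (the wage and education values are invented, not CPS data), estimates a simple regression slope, shuffles the rows, and estimates it again.

```python
import random
from statistics import mean

def slope(y, x):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
educ = [rng.randint(8, 18) for _ in range(1289)]       # hypothetical years of education
wage = [2.0 + 0.9 * e + rng.gauss(0, 3) for e in educ]  # hypothetical hourly wage

rows = list(zip(wage, educ))  # each row is one person; the row index plays the role of obs
b_original = slope([w for w, _ in rows], [e for _, e in rows])

rng.shuffle(rows)  # relabel who is observation 1, 2, ...
b_shuffled = slope([w for w, _ in rows], [e for _, e in rows])

print(f"slope before shuffling: {b_original:.6f}, after: {b_shuffled:.6f}")
```

The two slopes agree up to floating-point rounding, which is what it means for the ordering to carry no information in a randomly sampled cross section.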

Time Series Data

- A time series data set consists of observations on a variable or several variables over time.

- The observations refer to a single unit such as a person, household, firm, village, province, country, and so on.

- Examples include rainfall, stock prices, M1, CPI, GDP, exchange rates, exports, imports, etc.

- Because past events can influence future events and lags in behavior are prevalent in the social sciences, time is an important dimension in a time series data set.

- Unlike the arrangement of cross-sectional data, the chronological ordering of observations in a time series conveys potentially important information.

- Because observations in a time series can rarely be assumed to be independent across time, time series data are usually more difficult to analyze than cross-section data.

- Most time series are related to their recent histories.

- For example, knowing something about GDP from last quarter tells us quite a bit about the likely range of GDP during this quarter, because GDP tends to remain fairly stable from one quarter to the next. Similarly, the probability that it rains today is not independent of whether it rained yesterday.

- Although most econometric procedures can be used with both cross-sectional and time series data, more needs to be done in specifying econometric models for time series data before standard econometric methods can be justified.
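Dependence on recent history can be illustrated with a simulated series. The sketch below assumes a first-order autoregressive process with persistence 0.9 as a stand-in for a GDP-like index; the process and its parameters are illustrative choices, not estimates from real data. It compares the lag-1 autocorrelation of the series in chronological order with that of the same values in shuffled order.

```python
import random
from statistics import mean

def lag1_autocorr(series):
    """Sample correlation between each observation and the one before it."""
    m = mean(series)
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, len(series)))
    den = sum((v - m) ** 2 for v in series)
    return num / den

rng = random.Random(3)
rho = 0.9          # assumed persistence from one period to the next
index = [100.0]    # hypothetical GDP-like index, starting at its long-run level
for _ in range(1999):
    index.append(100.0 + rho * (index[-1] - 100.0) + rng.gauss(0, 1))

shuffled = index[:]
rng.shuffle(shuffled)  # same values, chronological order destroyed

r_series = lag1_autocorr(index)
r_shuffled = lag1_autocorr(shuffled)
print(f"lag-1 autocorrelation, in order: {r_series:.2f}, shuffled: {r_shuffled:.2f}")
```

In chronological order, last period's value is highly informative about this period's; after shuffling, that information is gone. This is the sense in which time series observations cannot be treated like independent draws.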

Pooled Cross Sections

- Some data sets have both cross-sectional and time series features.

- For example, we can pool the cross-section data set SES1986 with another cross-section data set, SES2006. The pooled cross sections give a larger sample on the same variables.

- Pooling cross sections is an effective way to analyze the effects of a new government policy.

- We can also use it to see how a key relationship has changed over time.
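A minimal sketch of pooling two cross sections follows. The two samples are simulated and only borrow the years 1986 and 2006 as labels; the variables, sample sizes, and the assumed returns to education (0.5 and 0.8) are hypothetical and are not taken from the SES data.

```python
import random
from statistics import mean

def slope(y, x):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(11)

def draw_cross_section(year, n, return_to_educ):
    """Draw a fresh random sample of people for one survey year."""
    rows = []
    for _ in range(n):
        educ = rng.randint(6, 18)
        wage = 3.0 + return_to_educ * educ + rng.gauss(0, 2)
        rows.append({"year": year, "educ": educ, "wage": wage})
    return rows

# Pooling stacks the two samples; different people are surveyed in each year.
pooled = draw_cross_section(1986, 3000, 0.5) + draw_cross_section(2006, 3000, 0.8)

slope_by_year = {}
for year in (1986, 2006):
    sub = [r for r in pooled if r["year"] == year]
    slope_by_year[year] = slope([r["wage"] for r in sub], [r["educ"] for r in sub])

change = slope_by_year[2006] - slope_by_year[1986]
print(f"wage-education slope: 1986 = {slope_by_year[1986]:.2f}, "
      f"2006 = {slope_by_year[2006]:.2f}, change = {change:.2f}")
```

Keeping the year as a variable in the pooled data is what makes the comparison across periods possible.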

Panel Data

- A panel data set consists of a time series for each cross-sectional member in the data set.

- For example, suppose we have wage, education, and employment history for a set of individuals followed over a ten-year period. Or we might collect investment and financial data on the same set of firms over a five-year period.

- Panel data can also be collected on geographical units. For example, we can collect data for the same set of villages on migration flows, remittances, employment, loan default, and so on, for the years 2000, 2001, and 2010.

- The key feature of panel data that distinguishes them from a pooled cross section is that the same cross-sectional units (individuals, firms, or villages in the preceding examples) are followed over a given time period.

- Having multiple observations on the same units allows us to control for certain unobserved characteristics of individuals, firms, and so on.

- The use of more than one observation can facilitate causal inference in situations where inferring causality would be very difficult if only a single cross section were available.

- Thus, although we can treat a panel data set as a pooled cross section, the panel structure can be used to analyze questions that cannot be answered by simply viewing the data as a pooled cross section.

- Another advantage of panel data is that they often allow us to study the importance of lags in behavior or in the results of decision making.

- This information can be significant because many economic policies can be expected to have an impact only after some time has passed.
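One way the panel structure helps with unobserved characteristics is first differencing, sketched below on simulated two-period data. The model is an assumption made for illustration: each unit has a time-constant unobserved effect a_i that is correlated with the regressor x, and the true coefficient on x is set to 2.0. First differencing is one of several panel techniques and is named here as an example, not as the slides' prescribed method.

```python
import random
from statistics import mean

def slope(y, x):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(5)
BETA = 2.0  # assumed true effect of x on y
x1, x2, y1, y2 = [], [], [], []
for _ in range(5000):
    a_i = rng.gauss(0, 1)        # unobserved, constant over time for this unit
    xa = a_i + rng.gauss(0, 1)   # period-1 regressor, correlated with a_i
    xb = a_i + rng.gauss(0, 1)   # period-2 regressor, correlated with a_i
    x1.append(xa)
    x2.append(xb)
    y1.append(BETA * xa + 3 * a_i + rng.gauss(0, 1))
    y2.append(BETA * xb + 3 * a_i + rng.gauss(0, 1))

# Treating the panel as a pooled cross section ignores a_i and is biased here.
b_pooled = slope(y1 + y2, x1 + x2)

# Differencing within each unit removes a_i, because it does not change over time.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
b_fd = slope(dy, dx)

print(f"pooled slope: {b_pooled:.2f}, first-difference slope: {b_fd:.2f}, truth: {BETA:.2f}")
```

The differenced regression can only be formed because the same units are observed twice, which is exactly what a pooled cross section lacks.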

Empirical Relations

- A portion of the CPS: http://www.census.gov/programs-surveys/cps.html

- Is there any wage difference across gender, race, or union membership?

- In this case, y is wage, and the x variables are gender, race, and union membership. y is the dependent or explained variable, and the x variables are the independent or explanatory variables.

- Typically, both y and x are assumed to be random variables, which means that the observations are supposed to be generated by a random experiment, in advance of which their values are unknown.

- This aims to capture the idea that the experiment might in principle be performed repeatedly, in each case throwing up a new sample whose particular values are not predictable in advance.

- First, look at differences in the sample means of each group.

- On average, men receive $3.52522 per hour more than women.

- Whites receive $2.804217 per hour more than nonwhites.

- Union members receive $2.20683 per hour more than nonmembers.

- Can we conclude that being male causes earnings to be higher by $3.52522 per hour, and so on?

- There is systematic variation in the characteristics of each group that confounds the difference in wages.

- 19.14% of men in the sample are members of unions, but only 13% of women are.

- 86.27% of men are white, while 83.15% of women are white.

- The average difference in wage between men and women reflects union membership and racial differences as well as gender.

- We also expect that wages vary with education and experience.
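A back-of-the-envelope calculation with the figures quoted in the slides gives a feel for how group composition enters the raw gender gap. It rests on a rough assumption that is not in the slides: the raw union wage gap is treated as if it were the wage gain from membership, although that gap is itself confounded by other characteristics.

```python
# Figures quoted in the slides (dollars per hour and sample shares).
gender_gap = 3.52522       # men minus women
union_gap = 2.20683        # union members minus nonmembers
share_union_men = 0.1914
share_union_women = 0.13

# Rough assumption: the raw union gap stands in for the gain from membership.
composition_part = (share_union_men - share_union_women) * union_gap
share_of_gap = composition_part / gender_gap

print(f"part of the gender gap linked to union composition: ${composition_part:.3f} per hour")
print(f"as a share of the raw gender gap: {share_of_gap:.1%}")
```

Under that assumption, the difference in union membership rates accounts for only a small part of the raw gap. Race, education, and experience would need the same treatment, which is why a method that handles several characteristics at once is needed.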

Figure: Scatter Plot of Hourly Wage against Education Level (vertical axis: hourly wage in dollars, 0 to 60; horizontal axis: education in years, 0 to 20)

Figure: Scatter plot of hourly wage against potential work experience (vertical axis: hourly wage in dollars, 0 to 60; horizontal axis: potential work experience in years, age−schooling−6, 0 to 60)

Figure: Raw Data and Conditional Means (hourly wage ($) and mean wage ($), 0 to 60, against education in years, 0 to 20)

Figure: Raw Data and Conditional Means (hourly wage ($) and mean wage ($), 0 to 60, against potential work experience in years, age−schooling−6, 0 to 60)
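The conditional means in these figures are averages of wage taken separately at each value of the conditioning variable. A minimal sketch of that computation follows, using a handful of made-up (education, wage) pairs in place of the CPS sample.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (education in years, hourly wage in dollars) pairs.
sample = [(12, 10.0), (12, 14.0), (16, 20.0), (16, 24.0), (16, 19.0), (10, 8.0)]

# Group wages by education level, then average within each group.
wages_by_educ = defaultdict(list)
for educ, wage in sample:
    wages_by_educ[educ].append(wage)

conditional_means = {educ: mean(w) for educ, w in sorted(wages_by_educ.items())}
for educ, avg in conditional_means.items():
    print(f"education = {educ} years: mean wage = {avg:.2f}")
```

Plotting these group averages against education traces out the mean-wage line that the figures overlay on the raw data.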

- On average, wage rises with education, while wage rises and then falls with experience.

- Wages vary systematically with education and experience.

- Variables in the data set covary.

- Wage is positively correlated with education, experience, and union membership, but negatively correlated with being female and nonwhite, and so on.

- With so many attributes varying simultaneously across individuals, how can we separate the variation in wage associated with one attribute, say gender, from that associated with another attribute, say education?

- Ordinary least squares (OLS) is one method that we can use to sort out such simultaneous variation in several variables.
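As a sketch of how OLS sorts out simultaneous variation, the example below simulates wages that depend on both gender and education, with women assumed to have one year less education on average. All numbers are invented for illustration and are unrelated to the CPS figures above. The coefficients are obtained by solving the normal equations with a small hand-written solver, to keep the example free of external libraries.

```python
import random
from statistics import mean

def ols(y, columns):
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]

rng = random.Random(2024)
n = 5000
female = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
educ = [13.0 - 1.0 * f + rng.gauss(0, 2) for f in female]  # assumed education gap
wage = [5.0 + 1.0 * e - 2.0 * f + rng.gauss(0, 2) for e, f in zip(educ, female)]

# Raw gap in mean wages mixes the gender term with the education difference.
raw_gap = (mean([w for w, f in zip(wage, female) if f == 1.0])
           - mean([w for w, f in zip(wage, female) if f == 0.0]))

intercept, b_female, b_educ = ols(wage, [female, educ])
print(f"raw female-male gap: {raw_gap:.2f}")
print(f"OLS coefficients: female = {b_female:.2f}, education = {b_educ:.2f}")
```

In this simulation the raw gap is close to -3, while the OLS coefficient on the female indicator is close to the assumed -2: the regression attributes the remaining part of the gap to the difference in education. With real data the same separation works only for the attributes that are actually included in the regression.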