
17.874 Lecture Notes, Part 3: Regression Model


3. Regression Model

Regression is central to the social sciences because social scientists usually want to know the effect of one variable or set of variables of interest on an outcome variable, holding all else equal. Here are some classic, important problems:

What is the effect of knowledge of a subject on preferences about that subject? Do uninformed people have any preferences? Are those preferences short-sighted? Do people develop "enlightened" interests, seeing how they benefit from collective benefit?

What is the effect of increasing enforcement and punishment on deviant behavior? For example, how much does an increase in the number of police on the streets reduce the crime rate? Does the death penalty deter crime?

How are markets structured? How does the supply of a good change with the price people are willing to pay? How does people's willingness to pay for a good change as a good becomes scarce?

To whom are elected representatives responsible? Do parties and candidates converge to the preferred policy of the median voter, as predicted by many analytical models of electoral competition? Or do parties and interest groups strongly influence legislators' behavior?

In each of these problems we are interested in measuring how changes in an input or independent variable produce changes in an output or dependent variable.

For example, my colleagues interested in the study of energy wish to know if increasing public understanding of the issue would change attitudes about global warming. In particular, would people become more likely to support remediation? I have done a couple of surveys for this initiative. Part of the design of these surveys involves capturing the attitude toward global warming, the dependent variable. One simple variable is the willingness to pay.


If it solved global warming, would you be willing to pay $5 more a month on your electricity bill? [Of those who answered yes:] $10 more a month? $25 more a month? $50 more a month? $100 more a month?

We also wanted to distinguish the attitudes of those who know a lot about global warming and carbon emissions from those who do not. We asked a battery of questions. People could have gotten up to 10 factual questions right. The average was 5. Finally, we controlled for many other factors, such as income, religiosity, attitude toward business regulation, size of electricity bill, and so forth.

The results of the analysis are provided on the handout. The dependent variable is coded 0 for those unwilling to pay $5, 1 for those willing to pay no more than $5, 2 for those willing to pay no more than $10, 3 for those willing to pay no more than $25, 4 for those willing to pay no more than $50, and 5 for those who said they would be willing to pay $100 (roughly double the typical electricity bill).

The coefficient on information (gotit) estimates the effect of knowing more about carbon dioxide on willingness to pay for a solution to global warming. Moving from no information to complete information (from 0 to 10) increases willingness to pay by .65 units along the willingness-to-pay scale, which ranges from 0 to 5 and has a mean of 1.7 and standard deviation of 1.3.

More generally, we seek to measure the effect on Y of increasing X1 from x1⁰ to x1¹. The expected change in the random variable Y is:

E[y | x1¹, x2, ..., xk] - E[y | x1⁰, x2, ..., xk],

where the values of all independent variables other than x1 are held constant.

Unfortunately, the world is a mess. We usually deal with observational data rather than carefully constructed experiments. It is exceedingly difficult to hold all else constant in order to isolate the partial effect of a variable of interest on the outcome variable. Instead of trying to isolate and control the independent variable of interest in the design of a study, we attempt to measure its independent effect in the analysis of the data collected.


Multivariate regression is the primary data analytic tool with which we attempt to hold other things constant.

3.1. Basic Model

In general, we can treat y and X as random variables. At times we will switch into a simpler problem: X fixed. What is meant by this distinction? Fixed values of X would arise if the researcher chose the specific values of X and assigned them to the units. Random or stochastic X arises when the researcher does not control the assignment; this is true in the lab as well as the field.

Fixed X will be easier to deal with analytically. If X is fixed then it is guaranteed to have no correlation with ε, because the vectors of the matrix X consist of fixed values of the variables, not random draws from the set of possible values of the variables. More generally, X is random, and we must analyze the conditional distributions of ε|X carefully.

The basic regression model consists of several sets of assumptions.

1. Linearity:
   y = Xβ + ε

2. Zero mean of the errors:
   E[ε|X] = 0

3. No correlation between X and ε:
   E[X'ε|X] = 0

4. Spherical errors (constant error variance and no autocorrelation):
   E[εε'|X] = σ²I

5. Normally distributed errors:
   ε|X ~ N(0, σ²I)


Indeed, these properties are implied by the assumption that y and X are jointly normally distributed (that assumption is sufficient but not necessary). See the properties of the conditional distributions in Section 2.6.

It is easy to see that this model implements the definition of an effect above. Indeed, if all of the assumptions hold, we might even say the effect captured by the regression model is causal.

Two failures of causality might emerge.

First, there may be omitted variables. Any variable Xj that is not measured and included in a model is captured in the error term ε. An included variable might appear to cause (or not cause) y, but we have in fact missed the true relationship because we did not hold Xj constant. Of course, there are a very large number of potential omitted variables, and the struggle in any field of inquiry is to speculate what those might be and come up with explicit measures to capture those effects or designs to remove them.

Second, there may be simultaneous causation. X1 might cause y, but y might also cause X1. This is a common problem in economics, where the prices at which goods are bought and the quantities purchased are determined through bargaining or through markets. Prices and quantities are simultaneously determined. This shows up, in complicated ways, in the error term. Specifically, the error term becomes recursive: ε|X1 necessarily depends on u|y, the error term from the regression of X on y.

Social scientists have found ways to solve these problems, involving "quasi-experiments" and sometimes real experiments. Within areas of research there is a lot of back and forth about what specific designs and tools really solve the problems of omitted variables and simultaneity. The last third of the course will be devoted to these ideas.

Now we focus on the tools for holding other factors constant directly.


    3.2. Estimation

The regression model has K + 1 parameters: the regression coefficients and the error variance. But there are n equations. The parameters are overdetermined, assuming n > K + 1. We must somehow reduce the data at hand to devise estimates of the unknown parameters in terms of data we know.

Overdetermination might seem like a nuisance, but if n were smaller than K + 1 we could not hope to estimate the parameters. Here lies a more general lesson about social inquiry: be wary of generalizations from a single or even a small set of observations.

3.2.1. The Usual Suspects

There are, as mentioned before, three general ideas about how to use data to estimate parameters.

1. The Method of Moments. (a) Express the theoretical moments of the random variables as functions of the unknown parameters. The theoretical moments are the means, variances, covariances, and higher powers of the random variables. (b) Use the empirical moments as estimators of the theoretical moments; these will be unbiased estimates of the theoretical moments. (c) Solve the system of equations for the values of the parameters in terms of the empirical moments.

2. Minimum Variance/Minimum Mean Squared Error (Minimum χ²). (a) Express the objective function of the sum of squared errors as a function of the unknown parameters. (b) Find values of the parameters that minimize that function.

3. Maximum Likelihood. (a) Assume that the random variables follow a particular density function (usually normal). (b) Find values of the parameters that maximize the density function.


We have shown that for the regression model, all three approaches lead to the same estimate of the vector of coefficients:

b = (X'X)⁻¹X'y

An unbiased estimate of the variance of the error term is

s² = e'e / (n - K)

The maximum likelihood estimate is

σ̂² = e'e / n

We may also estimate the variance decomposition. We wish to account for the total sum of squared deviations in y, y'y, which has mean square y'y/(n - 1) (the estimated variance of y). The residual or error sum of squares is e'e, which has mean square s² = e'e/(n - K). The model or explained sum of squares is the difference between these:

y'y - e'e = y'y - (y - Xb)'(y - Xb) = b'X'Xb,

which has mean square b'X'Xb/(K - 1). R² = b'X'Xb / y'y.
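These formulas translate directly into code. The following is a minimal numpy sketch, not the notes' own code; the data are simulated and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations, K regressors (first column is the constant).
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.8, size=n)

# b = (X'X)^{-1} X'y, computed with a linear solve rather than an explicit inverse.
b = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ b               # residuals
s2 = (e @ e) / (n - K)      # unbiased estimate of the error variance
sigma2_ml = (e @ e) / n     # maximum likelihood estimate

# Decomposition as in the text: y'y = b'X'Xb + e'e, and R^2 = b'X'Xb / y'y.
R2 = (b @ X.T @ X @ b) / (y @ y)
```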

3.2.2. Interpretation of the Ordinary Least Squares Regression

An important interpretation of the regression estimates is that they estimate the partial derivative, βj. The coefficient from a simple bivariate regression of y on Xj measures the TOTAL effect of Xj on y. This is akin to the total derivative. Let us break this down into component parts and see where we end up.


The total derivative, you may recall, is the partial derivative of y with respect to Xj plus the sum of the partial derivatives of y with respect to all other variables, Xk, times the partial derivatives of Xk with respect to Xj. If y depends on two X variables, then:

dy/dX1 = ∂y/∂X1 + (∂y/∂X2)(dX2/dX1)
dy/dX2 = ∂y/∂X2 + (∂y/∂X1)(dX1/dX2)

Using the idea that the bivariate regression coefficients measure the total effect of one variable on another, we can write the bivariate regression estimates in terms of the partial regression coefficient (from a multivariate regression) plus other partial regression coefficients and bivariate regression coefficients. For simplicity, consider a case with two X variables. Let a1 and a2 be the slope coefficients from the bivariate regressions of y on X1 and of y on X2. Let b1 and b2 be the partial regression coefficients from the multivariate regression of y on X1 and X2. And let c1 be the slope coefficient from the regression of X2 on X1, and c2 the slope from the regression of X1 on X2.

We can express our estimates of the regression parameters in terms of the other coefficients. Let r be the correlation between X1 and X2.

Solving the normal equations (written in terms of the deviation cross products x1'x1, x1'x2, x1'y, and so on) gives:

b1 = (a1 - a2c1) / (1 - r²)
b2 = (a2 - a1c2) / (1 - r²)

We can solve these two equations to express a1 and a2 in terms of b1 and b2:

a1 = b1 + b2c1
a2 = b2 + b1c2

That is, the estimated total effect (the bivariate regression) equals the estimated direct effect of Xj on y (the partial coefficient) plus the indirect effect of Xj through Xk.

Another way to express these results is that multivariate regression can be simplified to bivariate regression once we partial out the effects of the other variables on y and on Xj.


Suppose we wish to measure and display the effect of X1 on y. Regress X1 on X2, ..., Xk. Regress y on X2, ..., Xk. Take the residuals from each of these regressions, e(x1|x2,...,xk) and e(y|x2,...,xk). The first vector of residuals equals the part of X1 remaining after subtracting out the other variables. Importantly, this vector of residuals is independent of X2, ..., Xk. The second vector equals the part of y remaining after subtracting the effects of X2, ..., Xk; that is, the direct effect of X1 plus the error term.

Regress e(y|x2,...,xk) on e(x1|x2,...,xk). This bivariate regression estimates the partial regression coefficient from the multivariate regression, β1.

Partial regression is very useful for displaying a particular relationship. Plotting the first vector of residuals against the second vector displays the partial effect of X1 on y, holding all other variables constant. In STATA, partial regression plots are implemented through the command avplot. After performing a regression, type the command avplot x1, where x1 is the variable of interest.
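A small simulated check of this partialling-out logic (the Frisch-Waugh-Lovell result); the data-generating process here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)            # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    # Least squares coefficients b = (X'X)^{-1} X'y.
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2]), y)  # multivariate regression

# Partial x2 out of both y and x1, then regress residual on residual.
Z = np.column_stack([ones, x2])
e_y = y - Z @ ols(Z, y)                           # e(y | x2)
e_x1 = x1 - Z @ ols(Z, x1)                        # e(x1 | x2)
b_partial = (e_x1 @ e_y) / (e_x1 @ e_x1)

# The bivariate slope on the residuals equals the partial coefficient on x1.
assert np.isclose(b_full[1], b_partial)
```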

A final way to interpret regression is using prediction. The predicted values of a regression are the values of the regression plane, ŷ. These are estimates of E[y|X]. We may vary the values of any Xj to generate predicted values. Typically, we consider the effect on y of a one-standard-deviation change in Xj, say from 1/2 standard deviation below the mean of Xj to 1/2 standard deviation above the mean.

Denote ŷ1 as the predicted value at xj¹ = x̄j + .5sj and ŷ0 as the predicted value at xj⁰ = x̄j - .5sj. The matrix of all variables except xj is X(j), and b(j) is the vector of coefficients except bj. Choose any vector x(j). A standard deviation change in the predicted values is:

ŷ1 - ŷ0 = (b0 + bjxj¹ + x(j)'b(j)) - (b0 + bjxj⁰ + x(j)'b(j)) = bj(xj¹ - xj⁰) = bjsj

For non-linear functions such variations are harder to analyze, because the effect of any one variable depends on the values of the other variables. We typically set the other variables equal to their means.

3.2.3. An Alternative Estimation Concept: Instrumental Variables


There are many other ways we may estimate the regression parameters. For example, we could minimize the mean absolute deviation of the errors.

One important alternative to ordinary least squares estimation is instrumental variables estimation. Assume that there is a variable or matrix of variables, Z, such that Z does not directly affect y, i.e., it is uncorrelated with ε, but it does correlate strongly with X.

The instrumental variables estimator is:

bIV = (Z'X)⁻¹Z'y

This is a linear estimator. It is a weighted average of the y's, where the weights are of the form (zi - z̄)/Σ(zi - z̄)(xi - x̄), instead of (xi - x̄)/Σ(xi - x̄)².

In the bivariate case, this estimator is the ratio of the slope coefficient from the regression of y on z to the slope coefficient from the regression of x on z.

This estimator is quite important in quasi-experiments because, if we can find valid instruments, the estimator will be unbiased: it will be independent of omitted factors in the least squares regression. It will, however, come at some cost. It is a noisier estimator than least squares.
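A hedged simulation sketch of the contrast between OLS and IV, using the formula above and an invented data-generating process in which x is correlated with the error:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

z = rng.normal(size=n)                  # instrument: moves x, unrelated to eps
eps = rng.normal(size=n)                # the structural error term
x = z + eps + rng.normal(size=n)        # x is contaminated by eps
y = 2.0 * x + eps                       # true coefficient on x is 2

ones = np.ones(n)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)  # biased upward (about 2.33 here)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # bIV = (Z'X)^{-1} Z'y, near 2

# Bivariate check: the IV slope is slope(y on z) divided by slope(x on z).
b_ratio = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
```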


3.3. Properties of Estimates

Parameter estimates themselves are random variables. They are functions of random variables that we use to make guesses about unknown constants (parameters). Therefore, from one study to the next, parameter estimates will vary. It is our hope that a given study has no bias, so that if the study were repeated under identical conditions the results would vary around the true parameter value. It is also hoped that estimation uses all available data as efficiently as possible.

We wish to know the sample properties of estimates in order to understand when we might face problems that could lead us to draw the wrong conclusions from our estimates, such as spurious correlations.

We also wish to know the sample properties of estimates in order to perform inferences. At times those inferences are about testing particular theories, but often our inferences concern whether we've built the best possible model using the data at hand. In this regard, we are concerned about potential bias, and will try to err on the side of getting unbiased estimates.

Erring on the side of safety, though, can cost us efficiency. Tests for the appropriateness of one model versus another, then, depend on the tradeoff between bias and efficiency.

3.3.1. Sampling Distributions of Estimates

We will characterize the random vector b by its mean, its variance, and its frequency function. The mean is β, which means that b is unbiased. The variance is σ²(X'X)⁻¹, the matrix describing the variances and covariances of the estimated coefficients. And the density function f(b) is approximated by the normal distribution as n becomes large.

The results about the mean and the variance stem from the regression assumptions. First, consider the mean of the parameter estimate. Under the assumption that E[X'ε] = 0,


the regression estimates are unbiased. That is,

E[b] = E[b|X] = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'(Xβ + ε)] = (X'X)⁻¹X'Xβ + E[(X'X)⁻¹X'ε] = β

This means that before we do a study we expect the data to yield a b of β. Of course, the density of any one point is zero. So another way to think about this is that if we do repeated sampling under identical circumstances, the coefficients will vary from sample to sample. The mean of those coefficients, though, will be β.

Second, consider the variance.

V[b] = E[(b - β)(b - β)'] = E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)'] = E[(X'X)⁻¹X'εε'X(X'X)⁻¹] = σ²(X'X)⁻¹X'X(X'X)⁻¹ = σ²(X'X)⁻¹

Here we have assumed the sphericality of the errors and the exogeneity of the independent variables.

The sampling distribution can be shown to follow the joint normal. This follows from the multivariate version of the central limit theorem, which I will not present because of the complexity of the mathematics. The result emerges, however, from the fact that the regression coefficients are weighted averages of the values of the random variable y, where the weights are of the form (xi - x̄)/Σ(xi - x̄)². More formally, let bn be the vector of regression coefficients estimated from a sample of size n. Then

bn →d N(β, σ²(X'X)⁻¹)

Of course, as n → ∞, (X'X)⁻¹ → 0. We can see this in the following way. The elements of (X'X)⁻¹ are, up to a signed term, xj'xk/|X'X|. The elements of the determinant are multiples of the cross products, and as observations are added the determinant grows faster than any single cross product. Hence, each element approaches 0.

Consider the 2×2 case:

(X'X)⁻¹ = 1/(x1'x1·x2'x2 - (x1'x2)²) ×
    [  x2'x2   -x1'x2 ]
    [ -x2'x1    x1'x1 ]


This may be rewritten as

(X'X)⁻¹ = 1/(1 - (x1'x2)²/(x1'x1·x2'x2)) ×
    [  1/(x1'x1)                -x1'x2/(x1'x1·x2'x2) ]
    [ -x1'x2/(x1'x1·x2'x2)       1/(x2'x2)           ]

Taking the limit of this matrix as n → ∞ amounts to taking the limit of each element of the matrix. Considering each term, we see that the sums of squares in the denominators grow with the addition of each observation, while the numerators remain constant. Hence, each element approaches 0.

Combined with unbiasedness, this last result proves consistency. Consistency means that the limit as n grows of the probability that an estimator deviates from the parameter of interest approaches 0. That is, for any δ > 0,

lim(n→∞) Pr(|bn - β| > δ) = 0

A sufficient condition for this is that the estimator be unbiased and that the variance of the estimator shrink to 0. This condition is called convergence in mean squared error, and is an immediate application of Chebychev's Inequality.

Of course, this means that the limiting distribution of f(b) shrinks to a single point, which is not so good. So the normality of the distribution of b is sometimes written as follows:

√n(bn - β) ~ N(0, σ²Q⁻¹),

where Q = (1/n)(X'X), the asymptotic variance-covariance matrix of the X's. We may consider within this the distribution of a single element of b:

bj ~ N(βj, σ²ajj),

where ajj is the jth diagonal element of (X'X)⁻¹. Hence, we can construct a 95 percent confidence interval for any parameter βj using the normal distribution and the above formula for the variance of bj. The standard error is √(s²ajj). So, a 95 percent confidence interval for a single parameter is:

bj ± 1.96 √(s²ajj).
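In code, the standard errors and confidence intervals come straight from the estimated covariance matrix s²(X'X)⁻¹. A minimal helper sketch, assuming X already contains a constant column:

```python
import numpy as np

def ols_inference(X, y):
    """OLS coefficients, standard errors, and 95 percent confidence intervals."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (n - K)              # unbiased estimate of sigma^2
    V = s2 * np.linalg.inv(X.T @ X)     # estimated V[b] = s^2 (X'X)^{-1}
    se = np.sqrt(np.diag(V))            # standard errors sqrt(s^2 a_jj)
    ci = np.column_stack([b - 1.96 * se, b + 1.96 * se])
    return b, se, ci
```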


The instrumental variables estimator provides an interesting contrast to the least squares estimator. Let us consider the sampling properties of the instrumental variables estimator.

E[bIV] = E[(Z'X)⁻¹Z'y] = E[(Z'X)⁻¹Z'(Xβ + ε)] = E[(Z'X)⁻¹(Z'X)β + (Z'X)⁻¹Z'ε] = E[β + (Z'X)⁻¹Z'ε] = β

The instrumental variables estimator is an unbiased estimator of β. The variance of the instrumental variables estimator is:

V[bIV] = E[(bIV - β)(bIV - β)'] = E[((Z'X)⁻¹Z'ε)((Z'X)⁻¹Z'ε)'] = E[(Z'X)⁻¹Z'εε'Z(X'Z)⁻¹] = (Z'X)⁻¹Z'(σ²I)Z(X'Z)⁻¹ = σ²(Z'X)⁻¹Z'Z(X'Z)⁻¹

As with the least squares estimator, the instrumental variables estimator will follow a normal distribution, because the IV estimator is a (weighted) sum of the random variables, y.

3.3.2. Bias versus Efficiency

3.3.2.1. General Efficiency of Least Squares

An important property of the ordinary least squares estimates is that they have the lowest variance of all linear, unbiased estimators. That is, they are the most efficient unbiased estimators. This result is the Gauss-Markov Theorem. Another version arises as a property of the maximum likelihood estimator, where the lower bound for the variances of all possible consistent estimators is σ²(X'X)⁻¹.

This implies that the instrumental variables estimator is less efficient than the ordinary least squares estimator. To see this, consider the bivariate regression case:

V[b] = σ² / Σ(xi - x̄)²


V[bIV] = σ² Σ(zi - z̄)² / [Σ(zi - z̄)(xi - x̄)]²

Comparing these two formulas:

V[bIV]/V[b] = Σ(zi - z̄)² Σ(xi - x̄)² / [Σ(zi - z̄)(xi - x̄)]²

This ratio is the inverse of the square of the correlation between X and Z. Since the correlation never exceeds 1, we know the numerator must be larger than the denominator. (The square of the correlation is known to be less than 1 because of the Cauchy-Schwartz inequality.)

3.3.2.2. Sources of Bias and Inefficiency in Least Squares

There are four primary sources of bias and inconsistency in least squares estimates: measurement error in the independent variables, omitted variables, non-linearities, and simultaneity. We'll discuss two of these cases here: choice of regressors (omitted variables) and measurement error.

Measurement Error. Assume that X* is the true variable of interest but we can only measure X = X* + u, where u is a random error term. For example, suppose we regress the actual share of the vote for the incumbent president in an election on the job approval rating of the incumbent president, measured in a 500-person preelection poll the week before the election. Gallup has measured this since 1948. Each election is an observation. The polling data will have measurement error based on the random sampling inherent in surveys. Specifically, the variance of the measurement error is p(1 - p)/n, where p is the percent approving of the president.

Another common source of measurement error arises from typographical errors in datasets. Keypunching errors are very common, even in data sets distributed publicly through reputable sources such as the Inter-university Consortium for Political and Social Research. For example, Gary King, Jim Snyder, and others who have worked with election data estimate that about 10 percent of the party identification codes of candidates are incorrect in some of the older ICPSR data sets on elections.

Finally, some data sources are not very reliable, or estimates must be made. This is common in social and economic data in developing economies and in data on wars.

If we regress y on X, the coefficient is a function of both the true variable X* and the error term. That is,

b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
  = Σ(xi* - x̄* + ui - ū)(yi - ȳ) / Σ(xi* - x̄* + ui - ū)²
  = [Σ(xi* - x̄*)(yi - ȳ) + Σ(ui - ū)(yi - ȳ)] / [Σ(xi* - x̄*)² + Σ(ui - ū)² + 2Σ(xi* - x̄*)(ui - ū)]

This looks like quite a mess.

A few assumptions are made to get some traction. First, it is usually assumed that u and ε are uncorrelated and that u and X* are uncorrelated. If they are correlated, things are even worse. Second, the ui's are assumed to be uncorrelated with one another and to have constant variance.

Taking expected values of b will be very difficult, because b is a function of the ratio of random variables. Here is a situation where probability limits (plims) make life easier. Because plims are limits, they obey the basic rules of limits: the limit of a sum is the sum of the limits, and the limit of a ratio is the ratio of the limits. Divide the top and bottom of b by n. Now consider the probability limit of each element of b:

plim (1/n) Σ(xi* - x̄*)² = σ²x*

plim (1/n) Σ(xi* - x̄*)(yi - ȳ) = plim (1/n) Σ β(xi* - x̄*)² = βσ²x*

plim (1/n) Σ(xi* - x̄*)(ui - ū) = 0

plim (1/n) Σ(ui - ū)² = σ²u

We can pull these limits together as follows:

plim b = βσ²x* / (σ²x* + σ²u) < β

Thus, in a bivariate regression, measurement error in the X variable biases the estimate toward 0. In a multivariate setting, the bias cannot generally be signed. Attenuation is typical, but it is possible also to reverse signs or inflate the coefficients. In non-linear models, such as those using the square of X, the bias terms become quite substantial and even more troublesome.
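The attenuation result is easy to verify by simulation. In this invented example σ²x* = 4 and σ²u = 1, so the slope should shrink by the factor 4/5:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x_star = rng.normal(scale=2.0, size=n)   # true regressor: variance 4
u = rng.normal(scale=1.0, size=n)        # measurement error: variance 1
x_obs = x_star + u                       # what we actually observe
y = 3.0 * x_star + rng.normal(size=n)    # true beta = 3

b = np.cov(x_obs, y)[0, 1] / np.var(x_obs)
# plim b = beta * 4 / (4 + 1) = 2.4, so b lands near 2.4 rather than 3.
```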

The best approach for eliminating measurement error is cleaning the data set. However, it is possible to fix some measurement error using instrumental variables. One approach is to use the quantiles or the ranks of the observed X to predict the observed X, and then use the predicted values in the regression predicting y. The idea behind this is that the ranks of X are correlated with the underlying true values of X, i.e., X*, but not with u or ε.

These examples of measurement error assume that the error is purely random. It may not be. Sometimes measurement error is systematic. For example, people underreport socially undesirable attitudes or behavior. This is an interesting subject that is extensively studied in public opinion research, but often understudied in other fields. A good survey researcher, for example, will tap into archives of questions and even do question-wording experiments to test the validity and reliability of instruments.

Choice of Regressors: Omitted and Included Variables.

The struggle in most statistical modeling is to specify the most appropriate regression model using the data at hand. The difficulty is deciding which variables to include and which to exclude. In rare cases we can be guided by a specific theory. Most often, though, we have gathered data or will gather data to test specific ideas and arguments. From the data at hand, what is the best model?

There are three important rules to keep in mind.

1. Omitted Variables That Directly Affect Y and Are Correlated with X Produce Bias.

The most common threat to the validity of estimates from a multivariate statistical analysis is omitted variables. Omitted variables affect both the consistency and efficiency of our estimates. First and foremost, they create bias. Even if they do not bias our results, we often want to control for other factors to improve efficiency.

To see the bias due to omitted variables, assume that X is a matrix of included variables and Z is a matrix of variables not included in the analysis. The full model is

y = XβX + ZβZ + ε

Suppose that Z is omitted. Obviously we can't estimate the coefficient βZ. Will the other coefficients be biased? Is there a loss of efficiency?

Let bX be the parameter vector estimated when only the variables in the matrix X are included. Let βX be the subset of coefficients from the true model on the included variables, X. The model estimated is

y = XβX + u, where u = ZβZ + ε.

E[bX] = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'XβX + (X'X)⁻¹X'u] = βX + E[(X'X)⁻¹X'ZβZ] + E[(X'X)⁻¹X'ε] = βX + δZXβZ,

where δZX is a matrix of coefficients from the regression of the columns of Z on the variables in X.

This is an extremely useful formula. There are two important lessons to take away.

First, omitted variables will bias regression estimates of the included variables (and lead to inconsistency) if (1) those variables directly affect Y and (2) those variables are correlated with the included variables (X). It is not enough, then, to object to an analysis that there are variables that have not been included. That is always true. Rather, science advances by conjecturing (and then gathering the data on) variables that affect y directly and are correlated with X. I think the latter is usually hard to find.

Second, we can generally sign the bias of omitted variables. When we think about the potential problem of an omitted variable, we usually have in mind the direct effect that it likely has on the dependent variable, and we might also know or can make a reasonable guess about its correlation with the included variables of interest. The bias in an included variable will be the direct effect of the omitted variable on y times the effect of the included variable on the excluded variable. If both of those effects are positive or both are negative, then the estimated effect of X on Y will be biased up: it will be too large. If one of these effects is negative and the other positive, then the estimated effect of X on Y will be biased downward.
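Both the bias formula and the sign logic can be checked with a short simulation (all parameter values invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

z = rng.normal(size=n)                      # omitted variable
x = 0.5 * z + rng.normal(size=n)            # included variable, correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # beta_x = 1, beta_z = 2

# Short regression of y on x alone.
b_short = np.cov(x, y)[0, 1] / np.var(x)

# Bias formula: plim b_short = beta_x + delta_zx * beta_z, where delta_zx is
# the slope from regressing z on x (0.5 / 1.25 = 0.4 here).
delta_zx = np.cov(x, z)[0, 1] / np.var(x)
b_expected = 1.0 + delta_zx * 2.0  # about 1.8: biased up, both effects positive
```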

2. Efficiency Can Be Gained by Including Variables That Predict Y but Are Uncorrelated with X.

A straightforward analysis reveals that the estimated V[bX] = s²(X'X)⁻¹. So far so good. But the variance of the error term is inflated. Specifically, s² = u'u/(n - K). Because u = ZβZ + ε, E[u'u/(n - K)] = βZ'Z'ZβZ/(n - K) + σ² > σ². In fact, the estimated residual variance is too large by the explained or model sum of squares for the omitted variables.

This has an interesting implication for experiments. Randomization in experiments guarantees unbiasedness. But we still want to control for other factors to reduce noise. In fact, combining regression models in the data analysis is a powerful way to gain efficiency (and reduce the necessary sample sizes) in randomized experiments.

Even in observational studies we may want to keep in a model a variable whose inclusion does not affect the estimated value of the parameter of a variable of interest, if the included variable strongly affects the dependent variable. Keeping such a variable captures some of the otherwise unexplained error variance, thereby reducing the estimated variance of the residuals. As a result, confidence intervals will narrow.

3. Including Variables That Are Unrelated to Y and X Loses Efficiency (Precision), and There Is No Bias from Their Exclusion.

That there is no bias can be seen readily from the argument we have just made.


The loss of efficiency occurs because we use up degrees of freedom. Hence, all variance estimates will be too large.

Simply put, parsimonious models are better.

COMMENTS:

a. Thinking through the potential consequences of omitted variables in this manner is very useful. It helps you identify what other variables will matter in your analysis and why, and it helps you identify additional information to see if this could be a problem. It turns out that the ideological fit with the district has some correlation with the vote, but it is not that strong. The reason is that there is relatively little variation in ideological fit that is not explained by simple knowledge of party. So this additional information allows us to conjecture (reasonably safely) that, although ideological fit could be a problem, it likely does not explain the substantial bias in the coefficients on spending.

b. The general challenge in statistical modeling and inference is deciding how to balance possible biases against possible inefficiencies in choosing a particular specification. Naturally, we usually wish to err on the side of inefficiency. But these are choices we make on the margin. As we will see, statistical tests measure whether the possible improvement in bias from one model outweighs the loss of efficiency compared to another model. This should not distract from the main objective of your research, which is to find phenomena and relationships of large magnitude and of substantive importance. Concern about omitted variable bias should be a seed of doubt that drives you to make the estimates that you make as good as possible.

3.3.3. Examples

Model building in statistics is really a progressive activity. We usually begin with interesting or important observations. Sometimes those originate in a theory of social behavior,


and sometimes they come from observation of the world. Statistical analyses allow us to refine those observations. And sometimes they lead to a more refined view of how social behavior works. Obvious problems of measurement error or omitted variables exist when the implications of an analysis are absurd. Equally valid, though, are arguments that suggest a problem with a plausible result. Here we'll consider four examples.

1. Incumbency Advantages

The observation of the incumbency advantage stems from a simple difference of means. From 1978 to 2002, the average Democratic vote share of a typical U.S. House Democratic incumbent is 68 percent; the average Democratic vote share of a typical U.S. House Republican incumbent is 34 percent.

The incumbency advantage model is specified as follows. The vote for the Democratic candidate in district i in election t equals the normal party vote, Ni, plus a national party tide, αt, plus the effect of incumbency. Incumbency is coded Iit = +1 for Democratic incumbents, Iit = -1 for Republican incumbents, and Iit = 0 for open seats.

Vit = αt + Ni + βIit + εit

Controlling for the normal vote and year tides reduces the estimated incumbency effect to about 7 to 9 percentage points.

2. Strategic Retirement and the Incumbency Advantage

An objection to models of the incumbency advantage is that incumbents choose to step down only when they are threatened, whether by changing times, personal problems, or an unusually good challenger. The reasons that someone retires, then, might depend on factors that the researcher cannot measure but that predict the vote: omitted variables. This would cause I to be correlated with the regression error u. The factors that are thought to affect the retirement decisions and the vote are negatively correlated with V and negatively correlated with I (making it less likely to run). Hence, the incumbency advantage may be inflated.


3. Police/Prisons and Crime

The theory of crime and punishment begins with simple assumptions about rational behavior. People will commit crimes if the likelihood of being caught and the severity of the punishment are lower than the benefit of the crime. A very common observation in the sociology of crime is that areas that have larger numbers of police or more people in prison have higher crime rates.

4. Campaign Spending and Votes

Researchers measuring the factors that explain House election outcomes include various measures of electoral competition in explaining the vote. Campaign expenditures, and the advertising they buy, are thought to be one of the main forces affecting election outcomes.

A commonly used model treats the incumbent party's share of the votes as a function of the normal party division in the congressional district, candidate characteristics (such as experience or scandals), and the campaign expenditures of the incumbent and the challenger. Most of the results from such regressions make sense: the coefficients on challenger spending and on incumbent party strength have the expected signs. But the coefficient on incumbent spending has the wrong (negative) sign. The naive interpretation is that the more incumbents spend, the worse they do.

One possible explanation is that incumbents who are unpopular and out of step with their districts have to spend more in order to remain in place. Could this explain the incorrect sign? In this account of the bias in the spending coefficients, there is a positive correlation between the omitted variable, "incumbent's fit with the district," and the included variable, "incumbent spending." Also, the incumbent's fit with the district likely has a negative direct effect on the vote. The more out-of-step an incumbent is, the worse he will do on election day. Hence, lacking a measure of "fit with the district" might cause a downward bias.

3.3.4. General Strategies for Correcting for Omitted Variables and Measurement Error Biases


1. More Data, More Variables. Identify relevant omitted variables and then try to collect them. This is why many regression analyses will include a large number of variables that do not seem relevant to the immediate question. They are included to hold other things constant, and also to improve efficiency.

2. Multiple Measurement. Two sorts of use of multiple measurement are common.

First, to reduce measurement error researchers often average repeated measures of a variable or construct an index to capture a "latent" variable. Although not properly a topic for this course, factor analysis and multi-dimensional scaling techniques are very handy for this sort of data reduction.

Second, to eliminate bias we may observe the "same observation" many times. For example, we could observe the crime rate in a set of cities over a long period of time. If the omitted factor is due to factors that are constant within units, panel models can remove it, treating omitted variables as nuisance factors: the idea of "control in the design."

3. Instrumental Variables. Instrumental variables estimates allow researchers to purge the independent variable of interest of its correlation with the omitted variables, which causes bias. What is difficult is finding suitable variables with which to construct instruments. We will deal with this matter at length later in the course.


3.4. Prediction

3.4.1. Prediction and Interpretation

We have given one interpretation to the regression model as an estimate of the partial derivatives of a function, i.e., the effects of a set of independent variables holding constant the values of the other independent variables. In constructing this definition we began with the definition of an effect as the difference in the conditional mean of Y across two distinct values of X. And a causal effect assumes that we hold all else constant.

Another important way to interpret conditional means and regressions is as predicted values. Indeed, sometimes the goal of an analysis is not to estimate effects but to generate predictions. For example, one might be asked to formulate a prediction about the coming presidential election. A common sort of election forecasting model regresses the incumbent president's vote share on the rate of growth in the economy, plus a measure of presidential popularity, plus a measure of party identification in the public. Based on elections since 1948, that regression has the following coefficients:

Vote = xxx + xxx·Growth + xxx·Popularity + xxx·Party

We then consider plausible values for Growth, Popularity, and Partisanship to generate predictions about the Vote.

For any set of values of X, say x0, the most likely value or expected value of Y is y0 = E[Y|x = x0]. This value is calculated in a straightforward manner from the estimated regression. Let x0 be a row vector of values of the independent variables for which a prediction is made, i.e., x0 = (x1⁰, x2⁰, ..., xk⁰). The predicted value is calculated as

ŷ0 = x0b = b0 + Σ(j=1..K) bjxj⁰

It is standard practice to set variables equal to their mean values if no specific value is of interest in a prediction. One should be somewhat careful in the choice of predicted values, so that the value does not lie too far out of the set of values on which the regression was originally estimated.


Consider the presidential election example. Assume a growth rate of 2 percent, a popularity rating of 50 percent, and a Republican Party identification of 50 percent (an equal split between the parties); then Bush is predicted to receive xxx percent of the two-party presidential vote in 2004.

The predicted value or forecast is itself subject to error. To measure the forecast error we construct the deviation of the observed value from the "true" value, which we wish to predict. The true value is itself a random variable: y0 = x0β + ε0. The prediction or forecast error is the deviation of the predicted value from the true value:

e0 = y0 - ŷ0 = x0(β - b) + ε0

The variance of the prediction error is

V[e0] = σ² + V[x0(β - b)] = σ² + x0 E[(b - β)(b - β)'] x0' = σ² + σ² x0(X'X)⁻¹x0'

We can use this last result to construct the 95 percent confidence interval for the predicted value:

ŷ0 ± 1.96 √V[e0]

As a practical matter this might become somewhat cumbersome to do. A quick way to generate prediction confidence intervals is with an "augmented" regression. Suppose we have estimated a regression using n observations, and we wish to construct several different predicted values based on different sets of values for X, say X0. A handy trick is to add the matrix of values X0 to the bottom of the X matrix. That is, add n0 observations to your dataset for which the values of the independent variables are the appropriate values of X0. Let the dependent variable equal 0 for all of these observations. Finally, add n0 dummy columns to your dataset, each equal to -1 for its corresponding new observation and 0 otherwise. Now regress y on X and the new set of dummy variables.

The resulting estimates will reproduce the original regression and will have coefficient estimates for each of the independent variables. The coefficients on the dummy variables equal the predicted values, and the standard errors of these estimates are the correct prediction standard errors.
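A numpy sketch of this augmented-regression trick for a bivariate regression with invented data; the coefficients on the dummy columns recover the predictions, and their standard errors are the prediction standard errors:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n0 = 100, 2

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x0 = np.array([0.5, 1.5])                 # points at which to predict

# Stack the n0 prediction rows under the data with y = 0, and add one dummy
# column per prediction row, equal to -1 on its own row and 0 elsewhere.
Xaug = np.column_stack([
    np.ones(n + n0),
    np.concatenate([x, x0]),
    np.vstack([np.zeros((n, n0)), -np.eye(n0)]),
])
yaug = np.concatenate([y, np.zeros(n0)])

b = np.linalg.solve(Xaug.T @ Xaug, Xaug.T @ yaug)
e = yaug - Xaug @ b
s2 = (e @ e) / (n - 2)                    # residual df: n obs, 2 coefficients
V = s2 * np.linalg.inv(Xaug.T @ Xaug)

predictions = b[2:]                       # equal b0 + b1 * x0
prediction_se = np.sqrt(np.diag(V)[2:])   # prediction standard errors
```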

As an example, consider the analysis of the relationship between voting weights and posts in parliamentary governments. Let's let regression calculate the predicted values and standard errors for 6 distinct cases: Voting Weight = .25 and Formateur, Voting Weight = .25 and Not Formateur, Voting Weight = .35 and Formateur, Voting Weight = .35 and Not Formateur, Voting Weight = .45 and Formateur, and Voting Weight = .45 and Not Formateur. First, I ran the regression of Share of Posts on Share of Voting Weight plus an indicator variable for the party that formed the government (Formateur). I then ran the regression with 6 additional observations, constructed as described above. Below are the estimated coefficients and standard errors (constants not reported).

Using Augmented Regression To Calculate Predicted Values

Variable         Coeff. (SE)      Coeff. (SE)
Voting Weight    .9812 (.0403)    .9812 (.0403)
Formateur        .2300 (.0094)    .2300 (.0094)
D1 (.25, 1)      --               .5572 (.0952)
D2 (.25, 0)      --               .3272 (.0952)
D3 (.35, 1)      --               .6554 (.0953)
D4 (.35, 0)      --               .4253 (.0955)
D5 (.45, 1)      --               .7535 (.0955)
D6 (.45, 0)      --               .5235 (.0960)

3.4.2. Model Checking

Predicted values allow us to detect deviations from most of the assumptions of the regression model. The one assumption we cannot validate is the assumption that X and ε are uncorrelated. The residual vector, e, is defined to be orthogonal to the independent variables, X: e'X = 0. This implies that e'ŷ = e'Xb = 0. This is a restriction, so we cannot test how well the data approximate or agree with this assumption.

The other assumptions (linearity, homoskedasticity, no autocorrelation, and normality) are readily verified, and fixed. A useful diagnostic tool is a residual plot. Graph the residuals from a regression against the predicted values. This plot will immediately show many problems, if they exist. We expect an elliptical cloud centered around e = 0.

If the underlying model is non-linear, the residual plot will reflect the deviations of the data from a straight line fit through the data. Data generated from a concave quadratic function will have negative residuals for low values of ŷ, then positive residuals for intermediate values of ŷ, then negative residuals for large values of ŷ.

If there is heteroskedasticity, the residual plot will show non-constant deviations around e = 0. A common case arises when the residual plot looks like a funnel. This situation means that the effects are multiplicative. That is, the model is y = αX^β·ε (where ε takes only positive values), not y = α + Xβ + ε. This is readily fixed by taking logarithms, so that the model becomes: log(y) = log(α) + β log(X) + log(ε).

A further plot for measuring heteroskedasticity is of e² against ŷ, against a particular X variable, or against some other factor, such as the "size" of the unit. In this plot e² serves as the estimate of the variance. This is an enlightening plot when the variance is a function of a particular X or when we are dealing with aggregated data, where the aggregates consist of averages of variables across places of different populations.

To detect autocorrelation we use a slightly different plot. Suppose the units are indexed by time, say years, t. We may examine the extent of autocorrelation by taking the correlations between observations that are s units apart:

rs = Σt et·e(t-s) / Σt et²

The correlation between an observation and the previous observation is r1. This is called first-order autocorrelation. Another form of autocorrelation, especially in monthly economic data, is seasonal variation, which is captured with s = 12.
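A small helper implementing the rs formula for a residual series (our own sketch, not from the notes):

```python
import numpy as np

def residual_autocorrelations(e, max_lag):
    """r_s = sum_t e_t * e_{t-s} / sum_t e_t^2, for s = 0, ..., max_lag."""
    e = np.asarray(e, dtype=float)
    denom = e @ e
    return np.array([(e[s:] @ e[:len(e) - s]) / denom
                     for s in range(max_lag + 1)])
```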

It is instructive to plot the estimated autocorrelation against s, where s runs from 0 to a relatively large number, say 1/10th of T.

What should this plot look like? At s = 0, the autocorrelation is just σ²ε/σ²ε = 1. The values of rs for s > 0 depend on the nature of the autocorrelation structure.


Let's take a closer look. The most basic and commonly analyzed autocorrelation structure involves an autoregression of order 1 (an AR-1). Let ut be an error term that is independent of εt. First-order autocorrelation in ε is of the form:

εt = ρεt-1 + ut

The variance of εt follows immediately from the definition of the variance:

σ²ε = E[(ρεt-1 + ut)(ρεt-1 + ut)] = ρ²σ²ε + σ²u

Solving for σ²ε:

σ²ε = σ²u / (1 - ρ²)

To derive ρs, the correlation between εt and εt-s, we must derive the covariance first. For s = 1, the covariance between two observations is

E[εt εt-1] = E[(ρεt-1 + ut)εt-1] = ρσ²ε

Using repeated substitutions for εt, we find that for an AR-1:

E[εt εt-s] = ρ^s σ²ε

Now we can state what we expect to observe in the rs when the residuals contain first-order autocorrelation:

ρs = Cov(εt, εt-s) / √(V(εt)V(εt-s)) = ρ^s σ²ε / σ²ε = ρ^s

There are two patterns that may appear in the autocorrelation plot, depending on the sign of ρ. If ρ > 0, the plot should decline exponentially toward 0. For example, suppose ρ = .5; then we expect r0 = 1, r1 = .5, r2 = .25, r3 = .125, r4 = .0625, r5 = .03125. If ρ < 0, the plot will seesaw, converging on 0. For example, suppose ρ = -.5; then we expect r0 = 1, r1 = -.5, r2 = .25, r3 = -.125, r4 = .0625, r5 = -.03125.

Higher-order autocorrelation structures, such as εt = ρ1εt-1 + ρ2εt-2 + ut, lead to more complicated patterns. We may test for the appropriateness of a particular structure


by comparing the estimated autocorrelations, rs, with the values implied by a structure. For example, we may test for first-order autocorrelation by comparing the observed rs's with those implied by the AR-1 model when ρ̂ = r1.

A simple rule of thumb applies for all autocorrelation structures: if rs < .2 for all s, then there is no significant degree of autocorrelation.


    3.5. Inference

3.5.1. General Framework

Hypotheses.

What is an hypothesis? An hypothesis is a statement about the data derived from an argument, model, or theory. It usually takes the form of a claim about the behavior of parameters of the distribution function or about the effect of one variable on another.

For example, a simple argument about voting holds that, in the absence of other factors such as incumbency, voters use party to determine their candidate of choice. Therefore, when no incumbent is on the ticket, an additional one percent Democratic in an electoral district should translate into a one percent higher Democratic vote for a particular office. In a regression model, controlling for incumbency, the slope on the normal vote ought to equal one. Of course there are a number of reasons why this hypothesis might fail to hold. The argument itself might be incorrect; there may be other factors beside incumbency that must be included in the model; the normal vote is hard to measure and we must use proxies, which introduce measurement error.

In classical statistics, an hypothesis test is a probability statement. We reject an hypothesis if the probability of observing the data, given that the hypothesis is true, is sufficiently small, say below .05. Let T be the vector of estimated parameters and θ be the true parameters of the distribution. Let θ0 be the values posited by the hypothesis. Finally, let Σ be the variance of T. If the hypothesis is true, then θ = θ0, and the deviation of T from θ0 should look like a random draw from the underlying distribution; a large deviation would be unlikely to have occurred by chance.

What we have just described is the size of a test. We also care about the power of a test. If θ0 is not true, what is the probability of observing a sufficiently small deviation that we do not reject the hypothesis? This depends on the sample size and the variance of X.


Tests of a Single Parameter.

We may extend the framework for statistical tests about means and differences of means to the case of tests about a single regression coefficient. Recall that the classical hypothesis test for a single mean was:

Pr(|x̄ - μ0| > t(α/2, n-1) · s/√n) < α

Because x̄ is the sum of random variables, its distribution is approximately normal. However, because we must estimate the standard deviation of X, s, the t-distribution is used as the reference distribution for the hypothesis test. The test criterion can be rewritten as follows: we reject the hypothesis if |x̄ - μ0|/(s/√n) > t(α/2, n-1).

There is an important duality between the test criterion above and the confidence interval.

The test criterion for size .05 may be rewritten as:

Pr(x̄ - t(.025, n-1)·s/√n < μ0 < x̄ + t(.025, n-1)·s/√n) > .95

So, we can test the hypothesis by ascertaining whether the hypothesized value falls inside the 95 percent confidence interval.

Now consider the regression coefficient, βj. Suppose our hypothesis is H: βj = β0. A common value is β0 = 0, i.e., no effect of X on Y. Like the sample average, a single regression parameter follows the normal distribution, because the regression parameter is a sum of random variables. The mean of this distribution is βj and the variance is σ²/Σ(xi - x̄)² in the case of a bivariate regression or, more generally, σ²ajj, where ajj is the jth diagonal element of the matrix (X'X)⁻¹.

The test criterion for the hypothesis states the following. If the null hypothesis is true, then a large standardized deviation of bj from β0 will be unlikely to have occurred by chance:

Pr(|bj - β0| > t(α/2, n-K) · s√ajj) < α

As with the sample mean, this test criterion can be expressed as follows. We reject the hypothesized value if |bj - β0|/(s√ajj) > t(α/2, n-K).


The Wald Criterion.

The general framework for testing in the multivariate context is a Wald test. Construct a vector of estimated parameters and a vector of hypothesized parameters. Generically, we will write these as T and θ0, because they may be functions of regression coefficients, not just the coefficients themselves. Define the vector of deviations of the observed data from the hypothesized data as d = T - θ0. This is a vector of length J.

What is the distribution of d? Under the null hypothesis, E[d] = 0 and V[d] = V[T] = Σ.

The Wald statistic is:

W = d'Σ⁻¹d

Assuming d is normally distributed, the Wald statistic follows the χ² distribution with J degrees of freedom.

Usually, we do not know Σ, and must estimate it. Substituting the estimated value of Σ and making an appropriate adjustment for the degrees of freedom, we arrive at a new statistic that follows the F distribution. The degrees of freedom are J and n - K: the number of restrictions in d and the number of free pieces of information available after estimation of the model.

In the regression framework, we may implement a test of a set of linear restrictions on the parameters as follows. Suppose the hypotheses take the form that linear combinations of various parameters must equal specific constants. For example,

β1 = 0, β2 = 3, and β4 + β3 = 1.

We can specify the hypothesis as Rβ = q,


where R is a matrix of numbers that correspond to the parameters in the set of linear equations found in the hypothesis, β is the vector of coefficients in the full model (length K), and q is an appropriate set of numbers. In the example, suppose there are 5 coefficients in the full model, β = (β1, β2, β3, β4, β5)'. Then

R = [ 1  0  0  0  0 ]
    [ 0  1  0  0  0 ]
    [ 0  0  1  1  0 ]

and q = (0, 3, 1)'.

The deviation of the estimated value from the true value, assuming the hypothesis is true, is

d = Rb - Rβ = Rb - q

If the hypothesis is correct, E[d] = E[Rb - q] = 0, and

V[d] = E[dd'] = E[(Rb - Rβ)(Rb - Rβ)'] = E[R(b - β)(b - β)'R']

Because R is a matrix of constants:

V[d] = R E[(b - β)(b - β)'] R' = σ²R(X'X)⁻¹R'

Because we usually have to estimate the variance σ², the Wald statistic becomes

F = [(Rb - q)'[R(X'X)⁻¹R']⁻¹(Rb - q)/J] / [e'e/(n - K)]

Another way to write the F-statistic for the Wald test is as the percentage loss of fit. Suppose we estimate two models. In one model, the Restricted Model, we impose the hypothesis; in the other model, the Unrestricted Model, we impose no restrictions. Denote the residuals from these models as eR and eU. The F-test for the Wald criterion can be written as

F = [(eR'eR - eU'eU)/J] / [eU'eU/(n - K)]

We can write the residuals from the restricted model in terms of the residuals from the unrestricted model: eR = y - XbR =


y - XbU - X(bR - bU) = eU - X(bR - bU). The sum of squared residuals from the restricted model is

eR'eR = (eU - X(bR - bU))'(eU - X(bR - bU)) = eU'eU + (bR - bU)'X'X(bR - bU) = eU'eU + (RbU - q)'[R(X'X)⁻¹R']⁻¹(RbU - q)

(the cross terms vanish because X'eU = 0). So the difference in the sum of squares between the restricted and the unrestricted models equals the numerator of the Wald test.
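The loss-of-fit form of the test is a one-liner in code; a minimal sketch, assuming the two sets of residuals have already been computed:

```python
import numpy as np

def wald_f_stat(e_r, e_u, J, n, K):
    """F = ((e_R'e_R - e_U'e_U) / J) / (e_U'e_U / (n - K)).

    e_r: residuals from the restricted model (hypothesis imposed)
    e_u: residuals from the unrestricted model
    J:   number of restrictions; K: coefficients in the unrestricted model
    """
    e_r, e_u = np.asarray(e_r), np.asarray(e_u)
    return ((e_r @ e_r - e_u @ e_u) / J) / ((e_u @ e_u) / (n - K))
```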

There are two other important testing criteria: likelihood ratio statistics and Lagrange multiplier tests. These three are asymptotically the same. The Wald criterion has better small-sample properties, and it is easy to implement for the regression model. We will discuss likelihood ratio statistics as part of maximum likelihood estimation.


3.5.2. Applications

Three important applications of the Wald test are (1) tests of specific hypotheses, (2) verification of the choice of a specification, and (3) tests of bias reduction in causal estimates, specifically OLS versus IV. All three amount to comparing the bias due to the more parsimonious hypothesized model against the efficiency gained with that model.

i. Hypothesis Tests: Seats and Votes in England, 1922-2002.

The Cube Law provides an excellent example of the sort of calculation made to test a concrete hypothesis. The Cube Law states that the proportion of seats won varies as the cube of the proportion of votes won:

S/(1 - S) = [V/(1 - V)]³

We may implement this in a regression model as follows. This is a multiplicative model, which we will linearize using logarithms. There are really two free parameters: the constant term α and the exponent β.

log(S/(1 - S)) = α + β log(V/(1 - V))

For English parliamentary elections from 1922 to 2002, the graph of the log of the odds

of the Conservatives winning a seat (labelled LCS) versus the log of the odds of the Conservatives winning a vote (labelled LCV) is shown in the graph. The relationship looks fairly linear.

Results of the least squares regression of LCS on LCV are shown in the table. The slope is 2.75 and the intercept is -.07. Tests of each coefficient separately show that we would safely accept the hypotheses that β0 = 0 and β1 = 3. Specifically, the t-test of whether the slope equals 3 is (2.75 - 3)/.20 = -1.25, which has a p-value of .25. The t-test of whether the constant equals 0 is (-.07 - 0)/.05 = -1.4, which has a p-value of .18.


Regression Example: Cube Law in England, 1922-2002
Y = Log of the Odds Ratio of Conservative Seats
X = Log of the Odds Ratio of Conservative Votes

Variable    Coeff. (SE)    t-test
LCV         2.75 (.20)     -.25/.2 = -1.25 (p = .24)
Constant    -0.07 (.05)    -.07/.05 = -1.4 (p = .18)
N           21
R²          .91
MSE         .221
F-test for H0: β0 = 0, β1 = 3:  2.54 (p = .10)

However, this is the wrong approach to testing hypotheses about multiple coefficients. The hypothesis has implications for both coefficients at the same time. An appropriate test measures how much the vector b' = (2.75, -.07) deviates from the hypothesized vector β0' = (3, 0).

After estimating the regression, we can extract the variance-covariance matrix for b using the command matrix list e(V). If the null hypothesis is true, then E[b] = β0. The Wald criterion is d'V[d]⁻¹d. Because the matrix R is the identity matrix in this example, i.e., each of the hypotheses pertains to a different coefficient, V[d] equals V[b]. So the F-statistic for the Wald criterion is:

F = (1/2) d'V[d]⁻¹d, with d = (-.246, -.070)' and

V[d]⁻¹ = [ 27.6405   38.2184 ]
         [ 38.2184  427.3381 ]

so

F = .5((-.246)²(27.6405) + 2(-.070)(-.246)(38.2184) + (-.070)²(427.3381)) = 2.54

The probability of observing a deviation this large for a random variable distributed F with 2 and 19 degrees of freedom is .10 (i.e., p-value = .10). This suggests that the data do deviate somewhat from the Cube Law.
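The quadratic-form arithmetic can be replicated in a few lines (numbers copied from the example above):

```python
import numpy as np

d = np.array([-0.246, -0.070])           # deviations b - beta_0, as in the text
Vinv = np.array([[27.6405, 38.2184],     # inverse of V[d], from e(V)
                 [38.2184, 427.3381]])
F = (d @ Vinv @ d) / 2                   # J = 2 restrictions
# F evaluates to about 2.54, matching the text.
```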

We can implement such tests using the test command in STATA. Following a regression, you can use test varname = q to test a single restriction f(b1, b2, ...) = q1, such as test LCV = 3. To test multiple restrictions, you must accumulate successive tests using the option accum. In our example, first type test LCV = 3, then type test _cons = 0, accum. This returns F = 2.54, p-value = .10.

ii. Choice of Specification: Energy Survey.

By a specification, we mean a specific set of independent variables included in a regression model. Because there is a tradeoff between efficiency (lost when irrelevant variables are included) and bias (incurred when relevant variables are excluded), researchers usually want to find the smallest possible model. In doing so, we often jettison variables that appear insignificant according to the rule of thumb that they have low values of the t-statistic. After running numerous regressions we may arrive at what we think is a good-fitting model.

The example of the Cube Law should reveal an important lesson. The t-statistic on a single variable is only a good guide about that variable. If you've made decisions about many variables, you might have made a mistake in choosing the appropriate set.

The F-test for the Wald criterion allows us to test whether an entire set of variables in fact ought to be omitted from a model. The decision to omit a set of variables amounts to a specific hypothesis: that each of the variables can be assumed to have a coefficient equal to 0.

Consider the data from the Global Warming Survey discussed earlier in the course (see handout). There are 5 variables in the full regression model that have relatively low t-statistics. To improve the estimates of the other coefficients, we ought to drop these variables. Also, we might want to test whether some of these factors should be dropped as a substantive matter.

The F-test is implemented by calculating the average loss of fit in the sum of squared residuals. The information in the ANOVA table can be used to compute:

F = [(1276.4912 - 1270.5196)/5] / 1.260436 = .9475

One may also use the commands test and test, accum.


Specification tests such as this one are very common.

iii. OLS versus Instrumental Variables: Using Term Limits to Correct for Strategic Retirement.

We may also use the Wald criterion to test across estimators. One hypothesis of particular importance is that there are no omitted variables in the regression model. We cannot generally detect this problem using regression diagnostics, such as graphs, but we may be able to construct a test.

Let $d = b_{IV} - b_{OLS}$. The Wald criterion is $d'V[d]^{-1}d$, where
\[
V[b_{IV} - b_{OLS}] = V[b_{IV}] + V[b_{OLS}] - Cov[b_{IV}, b_{OLS}] - Cov[b_{OLS}, b_{IV}].
\]
Hausman shows that $Cov[b_{OLS}, b_{IV}] = V[b_{OLS}]$ (the covariance of an efficient estimator with any consistent estimator equals the variance of the efficient estimator), so
\[
V[b_{IV} - b_{OLS}] = V[b_{IV}] - V[b_{OLS}].
\]

Let $\hat{X}$ be the set of predicted values from regressing X on Z. The F-test for OLS versus IV is
\[
H = \frac{d'\left[(\hat{X}'\hat{X})^{-1} - (X'X)^{-1}\right]^{-1} d}{J\, s^2}
\]
This is distributed F with J and n-K degrees of freedom, where J is the number of variables we have tried to fix with IV.
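A minimal sketch of the Hausman contrast in Python, assuming the two coefficient vectors and covariance matrices have already been estimated (the names are hypothetical placeholders). Under homoskedasticity, with both covariance matrices built on the same $s^2$, this is equivalent to the formula above.

    # A sketch of the Hausman test: Wald criterion on d = b_IV - b_OLS.
    import numpy as np
    from scipy import stats

    def hausman(b_iv, V_iv, b_ols, V_ols, n_minus_K):
        d = b_iv - b_ols
        Vd = V_iv - V_ols                  # Hausman: Cov(b_OLS, b_IV) = V(b_OLS)
        J = len(d)                         # number of coefficients contrasted
        H = d @ np.linalg.inv(Vd) @ d / J  # F-form of the Wald criterion
        p = 1 - stats.f.cdf(H, J, n_minus_K)
        return H, p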


    4. Non-Linear Models

4.1. Non-Linearity

We have discussed one important violation of the regression assumption: omitted variables. And we have touched on problems of inefficiency introduced by heteroskedasticity and autocorrelation. This and the following subsections deal with violations of the regression assumptions (other than the omitted variables problem). The current section examines corrections for non-linearity; the next section concerns discrete dependent variables. Following that we will deal briefly with weighting, heteroskedasticity, and autocorrelation. Time permitting, we will do a bit on sensitivity.

We encounter two sorts of non-linearity in regression analysis. In some problems non-linearity occurs among the X variables, but it can be handled using a linear form of non-linear functions of the X variables. In other problems non-linearity is inherent in the model: we cannot "linearize" the relationship between Y and X. The first sort of problem is sometimes called "intrinsically linear" and the second sort "intrinsically non-linear."

Consider, first, situations where we can convert a non-linear model into a linear form. In the Seats-Votes example, the basic model involved multiplicative and exponential parameters. We converted this into a linear form by taking logarithms. There are a wide range of non-linearities that can arise; indeed, there are an infinite number of transformations of variables that one might use. Typically we do not know the appropriate function and begin with a linear relationship between Y and X as the approximation of the correct relationship. We then stipulate possible non-linear relationships and implement transformations.

Common examples of non-linearities include:
Multiplicative models, where the independent variables enter the equation multiplicatively rather than linearly (such models are linear in logarithms);
Polynomial regressions, where the independent variables are polynomials of a set of variables (common in production function analysis in economics); and
Interactions, where some independent variables magnify the effects of other independent variables (common in psychological research).
In each of these cases, we can use transformations of the independent variables to construct a linear specification with which we can estimate all of the parameters of interest. A sketch of the first case appears below.

Qualitative independent variables and interaction effects are the simplest sort of non-linearity. They are simple to implement, but sometimes hard to interpret. Let's consider a simple case. Ansolabehere and Iyengar (1996) conducted a series of laboratory experiments involving approximately 3,000 subjects over 3 years. The experiments manipulated the content of political advertisements, nested the manipulated advertisements in the commercial breaks of videotapes of local newscasts, randomly assigned subjects to watch specific tapes, and then measured the political opinions and information of the subjects. These experiments are reported in the book Going Negative.

On page 190, they report the following table.

Effects of Party and Advertising Exposure on Vote Preferences: General Election Experiments

Independent               (1)           (2)
Variable                  Coeff. (SE)   Coeff. (SE)

Constant                  .100 (.029)   .102 (.029)
Advertising Effects
Sponsor                   .077 (.012)   .023 (.034)
Sponsor*Same Party        --            .119 (.054)
Sponsor*Opposite Party    --            .028 (.055)
Control Variables
Party ID                  .182 (.022)   .152 (.031)
Past Vote                 .339 (.022)   .341 (.022)
Past Turnout              .115 (.030)   .113 (.029)
Gender                    .115 (.029)   .114 (.030)


The dependent variable is +1 if the person stated an intention to vote Democratic after viewing the tape, -1 if the person stated an intention to vote Republican, and 0 otherwise. The variable Sponsor is the party of the candidate whose ad was shown; it equals +1 if the ad was a Democratic ad, -1 if the ad was a Republican ad, and 0 if no political ad was shown (control). Party ID is similarly coded using a trichotomy. Same Party was coded as +1 if a person was a Democrat and saw a Democratic ad or a Republican and saw a Republican ad. Opposite Party was coded as +1 if a person was a Democrat and saw a Republican ad or a Republican and saw a Democratic ad.

In the first column, the "persuasion" effect is estimated as the coefficient on the variable Sponsor. The estimate is .077. The interpretation is that exposure to an ad from a candidate increases support for that candidate by 7.7 percentage points.

The second column estimates interactions of the Sponsor variable with the Party ID variable. What is the interpretation of the set of three variables Sponsor, Sponsor*Same Party, and Sponsor*Opposite Party? The variables Same Party and Opposite Party encompass all party identifiers. When these variables equal 0, the viewer is a non-partisan. So, the coefficient on Sponsor in the second column measures the effect of seeing an ad among independent viewers: it increases support for the sponsor by only 2.3 percentage points. When Same Party equals 1, the coefficient on Sponsor*Same Party, .119, comes into play. This is not the effect of the ad among people of the same party; it is the difference between the Independents and those of the same party. To calculate the effect of the ad on people of the same party we must add .119 to .023, yielding an effect of .142, or a 14.2 percentage point gain.

Interactions such as these allow us to estimate different slopes for different groups, changes in trends, and other discrete changes in functional forms.
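The arithmetic of combining the column (2) coefficients can be laid out explicitly; the following sketch simply adds the base Sponsor coefficient to each interaction term from the table.

    # The column (2) coefficients from the table above, combined by hand:
    b_sponsor, b_same, b_opposite = 0.023, 0.119, 0.028

    effects = {
        "Independents":   b_sponsor,               # interactions are 0
        "Same party":     b_sponsor + b_same,      # .142, a 14.2-point gain
        "Opposite party": b_sponsor + b_opposite,  # .051
    }
    print(effects)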

Another class of non-linear specifications takes the form of polynomials. Many theories of behavior begin with a conjecture of an inherently non-linear function. For instance, a firm's production function is thought to exhibit decreasing marginal returns on investments, capital or labor. Also, risk aversion implies a concave utility function.


Barefoot empiricism sometimes leads us to non-linearity, too. Examination of data, either a scatter plot or a residual plot, may reveal a non-linear relationship between Y and X. While we do not know what the right functional form is, we can capture the relationship in the data by including additional variables that are powers of X, such as $X^{1/2}$, $X^2$, and $X^3$, as well as X. In other words, to capture the non-linear relationship, y = g(x), we approximate the unknown function, g(x), using a polynomial of the values of x.

One note of caution. Including powers of independent variables often leads to collinearity among the right-hand-side variables. One trick for breaking such collinearity is to deviate the independent variables from their means before transforming them, as the sketch below illustrates.
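A quick simulated illustration of the trick (values invented): the correlation between x and its square is nearly 1 when x is bounded away from 0, but nearly 0 once x is centered.

    # Simulated illustration: centering x before squaring breaks collinearity.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.2, 0.8, 300)       # e.g., a vote share
    xc = x - x.mean()

    print(np.corrcoef(x, x**2)[0, 1])    # close to 1
    print(np.corrcoef(xc, xc**2)[0, 1])  # close to 0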

Example: Candidate Convergence in Congressional Elections.

An important debate in the study of congressional elections concerns how well candidates represent their districts. Two conjectures concern the extent to which candidates reflect their districts. First, are candidates responsive to districts? Are candidates in more liberal districts more liberal? Second, does competitiveness lead to closer representation of district preferences? In highly competitive districts, are the candidates more "converged"? Huntington posits greater divergence of candidates in competitive areas: a sort of clash of ideas.

Ansolabehere, Snyder, and Stewart (2001) analyze data from a survey of congressional candidates on their positions on a range of issues during the 1996 election. There are 292 districts in which both of the competing candidates filled out the survey. The authors constructed a measure of candidate ideology from these surveys, and then examine the midpoint between the two candidates (the cutpoint at which a voter would be indifferent between the candidates) and the ideological gap between the candidates (the degree of convergence). To measure electoral competitiveness the authors used the average share of the vote won by the Democratic candidate in the prior two presidential elections (1988 and 1992).

The Figures show the relationship between Democratic Presidential Vote share and, respectively, candidate midpoints and the ideological gap between competing candidates. There is clear evidence of a non-linear relationship explaining the gap between the candidates.

The table presents a series of regressions in which the Midpoint and the Gap are predicted using quadratic functions of the Democratic Presidential Vote plus an indicator of Open Seats, which tend to be highly competitive. We consider, separately, a specification using the value of the independent variable and its square and a specification using the value of the independent variable deviated from its mean and its square. The last column uses the absolute value of the deviation of the presidential vote from .5 as an alternative to the quadratic.

Effects of District Partisanship on Candidate Positioning
N = 292

                                  Dependent Variable
                            Midpoint                     Gap
Independent                 (1)           (2)            (3)            (4)            (5)
Variable                    Coeff. (SE)   Coeff. (SE)    Coeff. (SE)    Coeff. (SE)    Coeff. (SE)
Democratic Presidential     -.158 (.369)  --             -2.657 (.555)  --             --
  Vote
Democratic Presidential     -.219 (.336)  --             +2.565 (.506)  --             --
  Vote Squared
Dem. Pres. Vote,            --            -.377 (.061)   --             -.092 (.092)   --
  Mean Deviated
Dem. Pres. Vote,            --            -.219 (.336)   --             +2.565 (.506)  --
  Mean Dev. Squared
Absolute Value of Dem.      --            --             --             --             .764 (.128)
  Pres. Vote, Mean Dev.
Open Seat                   .007 (.027)   .007 (.027)    -.024 (.040)   -.024 (.040)   -.017 (.039)
Constant                    .649 (.098)   .515 (.008)    1.127 (.147)   .440 (.012)    .406 (.015)
R2                          .156          .156           .089           .089           .110
sqrt(MSE)                   .109          .109           .164           .164           .162
F (p-value)                 18.0 (.00)    18.0 (.00)     9.59 (.00)     9.59 (.00)     12.46 (.00)


[Figure 1: Midpoint Between Candidate Ideological Positions (0 to 1), plotted against Average Dem. Presidential Vote Share (.2 to .8).]


[Figure 2: Ideological Gap Between Candidates (0 to 1), plotted against Average Dem. Presidential Vote Share (.2 to .8).]


Consider, first, the regressions explaining the midpoint (columns 1 and 2). We expect that the more liberal the district, the farther to the left the candidates will tend. The ideology measure is oriented such that higher values are more conservative positions. Figure 1 shows a strong relationship consistent with the argument.

Column (1) presents the estimates when we naively include Presidential Vote and Presidential Vote Squared. This is a good example of what collinearity looks like. The F-test shows that the regression is "significant", i.e., not all of the coefficients are 0. But neither the coefficient on Presidential Vote nor the coefficient on Presidential Vote Squared is significant. Tell-tale collinearity. A trick for breaking the collinearity in this case is deviating X from its mean. Doing so, we find a significant effect on the linear coefficient, but the quadratic coefficient doesn't change. There is really only one free coefficient here, and it looks to be linear.

The coefficients in a polynomial regression measure the partial derivatives of the unknown function evaluated at the mean of X. Taylor's approximation leads us immediately to this interpretation of the coefficients for polynomial models. Recall from Calculus that Taylor's Theorem allows us to express any function as a sum of derivatives of that function evaluated at a specific point. We may choose the accuracy of the approximation by selecting the degree of the expansion: a first-order approximation uses the first derivatives; a second-order approximation uses the second derivatives; and so on. A second-order approximation of an unknown function, f(x), expanded around the point $x_0$, may be expressed as:

\[
y_i \approx \alpha + \beta' x_i + \frac{1}{2}\, x_i' H_0\, x_i,
\]
where
\[
g_0 = \left[\frac{\partial f(x)}{\partial x}\right]_{x = x_0}, \qquad
H_0 = \left[\frac{\partial^2 f(x)}{\partial x\, \partial x'}\right]_{x = x_0},
\]
\[
\alpha = f(x_0) - g_0' x_0 + \frac{1}{2}\, x_0' H_0\, x_0, \qquad
\beta = g_0 - H_0\, x_0.
\]


The coefficients on the squares and cross-product terms, then, capture the approximate second derivatives. The coefficients on the linear values of x equal the gradient, adjusted for the curvature around the point at which the data are evaluated ($\beta = g_0 - H_0 x_0$). If we deviate all of the variables from their means first, then the coefficient on x in the polynomial regression can be interpreted straightforwardly as the gradient of the (unknown) function at the mean.

Example: Candidate Convergence in Congressional Elections, continued.

Consider the estimates in column (3). We may analyze these coefficients to ask several basic questions. What is the marginal rate of change? $\partial y/\partial DP = \beta_1 + 2\beta_2 DP$. Since DP ranges from .2 to .8, the rate of change in the Gap for a change in electoral competition ranges from -1.62 when DP = .2 to +1.42 when DP = .8. At what point does the rate of change equal 0 (what is the minimum)? Setting the partial equal to 0 reveals that $DP_0 = -\beta_1/2\beta_2 = .517$, so the candidates are most converged near .5. What is the predicted value of the gap at this point? $1.127 - 2.657(.517) + 2.565(.517)^2 = .438$. At the extremes, the predicted values are .64 (when DP = .8) and .70 (when DP = .2).

If we must choose among several transformations, such as logarithms, inverses, and polynomials, we typically cannot test which is most appropriate using the Wald test. Davidson and MacKinnon (1981) propose using the predicted values from one of the non-linear analyses to construct a test. Suppose we have two alternative models: $y = X\beta$ and $y = Z\gamma$. Consider a compound model:

\[
y = (1 - \lambda)X\beta + \lambda Z\gamma + \epsilon.
\]

The J-test is implemented in two steps. First, estimate $y = Z\gamma$ to get the predicted values $Z\hat{\gamma}$. Then, include $Z\hat{\gamma}$ in the regression of y on X. The coefficient on $Z\hat{\gamma}$ yields an estimate of $\lambda$. Asymptotically, the t-ratio $\hat{\lambda}/SE(\hat{\lambda})$ tests which model is appropriate: if $\hat{\lambda}$ is indistinguishable from 0, the X model is favored.

Example: Candidate Convergence in Congressional Elections, continued.

An alternative to the quadratic specification in column (4) is the absolute value specification in column (5). Notice that the R2 is slightly better with the absolute value. However, owing to collinearity, it is hard to test which model is more appropriate. I first regress Gap on DP, $DP^2$, and Open and generate predicted values, call these GapHat. I then regress Gap on GapHat, |DP - .5|, and Open. The coefficient on |DP - .5| is 1.12 with a standard error of .40, so the t-ratio is above the critical value for statistical significance. The coefficient on GapHat equals -.55 with a standard error of .58. The estimated $\lambda$ is not significantly different from 0; hence, we favor the absolute value specification.
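A minimal sketch of this two-step J-test in Python, assuming the statsmodels package is available; gap, absdev, dp, and open_seat are hypothetical arrays standing in for the variables above, not the authors' data or code.

    # A sketch of the J-test: add the rival model's fitted values to the
    # maintained model and examine the t-ratio on lambda-hat.
    import numpy as np
    import statsmodels.api as sm

    def jtest(y, X_maintained, X_rival):
        """t-ratio on lambda-hat: rival fitted values added to maintained model."""
        yhat_rival = sm.OLS(y, sm.add_constant(X_rival)).fit().fittedvalues
        aug = sm.OLS(y, sm.add_constant(
            np.column_stack([X_maintained, yhat_rival]))).fit()
        return aug.tvalues[-1]

    # e.g., X_maintained = np.column_stack([absdev, open_seat]) with
    # absdev = np.abs(dp - .5), and X_rival = np.column_stack([dp, dp**2, open_seat]).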