
17.874 Lecture Notes, Part 3: Regression Model


3. Regression Model

Regression is central to the social sciences because social scientists usually want to know the effect of one variable or set of variables of interest on an outcome variable, holding all else equal. Here are some classic, important problems:

What is the effect of knowledge of a subject on preferences about that subject? Do uninformed people have any preferences? Are those preferences short-sighted? Do people develop "enlightened" interests, seeing how they benefit from collective benefit?

What is the effect of increasing enforcement and punishment on deviant behavior? For example, how much does an increase in the number of police on the streets reduce the crime rate? Does the death penalty deter crime?

How are markets structured? How does the supply of a good change with the price people are willing to pay? How does people's willingness to pay for a good change as a good becomes scarce?

To whom are elected representatives responsible? Do parties and candidates converge to the preferred policy of the median voter, as predicted by many analytical models of electoral competition? Or do parties and interest groups strongly influence legislators' behavior?

In each of these problems we are interested in measuring how changes in an input or independent variable produce changes in an output or dependent variable.

For example, my colleagues interested in the study of energy wish to know if increasing public understanding of the issue would change attitudes about global warming. In particular, would people become more likely to support remediation? I have done a couple of surveys for this initiative. Part of the design of these surveys involves capturing the attitude toward global warming, the dependent variable. One simple variable is the willingness to pay.


If it solved global warming, would you be willing to pay $5 more a month on your electricity bill? [Of those who answered yes:] $10 more a month? $25 more a month? $50 more a month? $100 more a month?

We also wanted to distinguish the attitudes of those who know a lot about global warming and carbon emissions from those who do not. We asked a battery of questions. People could have gotten up to 10 factual questions right. The average was 5. Finally, we controlled for many other factors, such as income, religiosity, attitude toward business regulation, size of electricity bill, and so forth.

The results of the analysis are provided on the handout. The dependent variable is coded 0 for those unwilling to pay $5, 1 for those willing to pay no more than $5, 2 for those willing to pay no more than $10, 3 for those willing to pay no more than $25, 4 for those willing to pay no more than $50, and 5 for those who said they would be willing to pay $100 (roughly double the typical electricity bill).

The coefficient on information (gotit) estimates the effect of knowing more about carbon dioxide on willingness to pay for a solution to global warming. Moving from no information to complete information (from 0 to 10) increases willingness to pay by .65 units along the willingness-to-pay scale, which ranges from 0 to 5 and has a mean of 1.7 and standard deviation of 1.3.

More generally, we seek to measure the effect on Y of increasing X1 from x1⁰ to x1¹. The expected change in the random variable Y is:

E[y | x1¹, x2, ..., xk] - E[y | x1⁰, x2, ..., xk],

where the values of all independent variables other than x1 are held constant.

Unfortunately, the world is a mess. We usually deal with observational data rather than carefully constructed experiments. It is exceedingly difficult to hold all else constant in order to isolate the partial effect of a variable of interest on the outcome variable. Instead of trying to isolate and control the independent variable of interest in the design of a study, we attempt to measure its independent effect in the analysis of the data collected.


Multivariate regression is the primary data analytic tool with which we attempt to hold other things constant.

3.1. Basic Model

In general, we can treat y and X as random variables. At times we will switch into a simpler problem: X fixed. What is meant by this distinction? Fixed values of X would arise if the researcher chose the specific values of X and assigned them to the units. Random or stochastic X arises when the researcher does not control the assignment; this is true in the lab as well as the field.

Fixed X will be easier to deal with analytically. If X is fixed then it is guaranteed to have no correlation with ε, because the vectors of the matrix X consist of fixed values of the variables, not random draws from the set of possible values of the variables. More generally, X is random, and we must analyze the conditional distributions of ε|X carefully.

The basic regression model consists of several sets of assumptions.

1. Linearity:
   y = Xβ + ε

2. Zero mean of the errors:
   E[ε|X] = 0

3. No correlation between X and ε:
   E[X'ε|X] = 0

4. Spherical errors (constant error variance and no autocorrelation):
   E[εε'|X] = σ²I

5. Normally distributed errors:
   ε|X ~ N(0, σ²I)


Indeed, these properties are implied by the assumption that y and X are jointly normally distributed (that assumption is sufficient but not necessary). See the properties of the conditional distributions in Section 2.6.

It is easy to see that this model implements the definition of an effect above. Indeed, if all of the assumptions hold, we might even say the effect captured by the regression model is causal.

Two failures of causality might emerge.

First, there may be omitted variables. Any variable Xj that is not measured and included in a model is captured in the error term ε. An included variable might appear to cause (or not cause) y, but we have in fact missed the true relationship because we did not hold Xj constant. Of course, there are a very large number of potential omitted variables, and the struggle in any field of inquiry is to speculate what those might be and come up with explicit measures to capture those effects or designs to remove them.

Second, there may be simultaneous causation. X1 might cause y, but y might also cause X1. This is a common problem in economics, where the prices at which goods are bought and the quantities purchased are determined through bargaining or through markets. Prices and quantities are simultaneously determined. This shows up, in complicated ways, in the error term. Specifically, the error term becomes recursive: ε|X1 necessarily depends on u|y, the error term from the regression of X on y.

Social scientists have found ways to solve these problems, involving "quasi-experiments" and sometimes real experiments. Within areas of research there is a lot of back and forth about what specific designs and tools really solve the problems of omitted variables and simultaneity. The last third of the course will be devoted to these ideas.

Now we focus on the tools for holding other factors constant directly.


    3.2. Estimation

The regression model has K + 1 parameters: the regression coefficients and the error variance. But there are n equations. The parameters are overdetermined, assuming n > K + 1. We must somehow reduce the data at hand to devise estimates of the unknown parameters in terms of data we know.

Overdetermination might seem like a nuisance, but if n were smaller than K + 1 we could not hope to estimate the parameters. Here lies a more general lesson about social inquiry: be wary of generalizations from a single or even a small set of observations.

3.2.1. The Usual Suspects

There are, as mentioned before, three general ideas about how to use data to estimate parameters.

1. The Method of Moments. (a) Express the theoretical moments of the random variables as functions of the unknown parameters. The theoretical moments are the means, variances, covariances, and higher powers of the random variables. (b) Use the empirical moments as estimators of the theoretical moments; these will be unbiased estimates of the theoretical moments. (c) Solve the system of equations for the values of the parameters in terms of the empirical moments.

2. Minimum Variance/Minimum Mean Squared Error (Minimum χ²). (a) Express the objective function of the sum of squared errors as a function of the unknown parameters. (b) Find values of the parameters that minimize that function.

3. Maximum Likelihood. (a) Assume that the random variables follow a particular density function (usually normal). (b) Find values of the parameters that maximize the density function.


We have shown that for the regression model, all three approaches lead to the same estimate of the vector of coefficients:

b = (X'X)⁻¹X'y

An unbiased estimate of the variance of the error term is

s² = e'e / (n - K)

The maximum likelihood estimate is

σ̂² = e'e / n

We may also estimate the variance decomposition. We wish to account for the total sum of squared deviations in y, y'y, which has mean square y'y/(n - 1) (the estimated variance of y). The residual or error sum of squares is e'e, which has mean square s² = e'e/(n - K). The model or explained sum of squares is the difference between these:

y'y - e'e = y'y - (y - Xb)'(y - Xb) = b'X'Xb,

which has mean square b'X'Xb/(K - 1). R² = b'X'Xb / y'y.
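These formulas translate directly into code. The following is a minimal numpy sketch, not the notes' own code; the data are simulated and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations, K regressors (first column is the constant).
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.8, size=n)

# b = (X'X)^{-1} X'y, computed with a linear solve rather than an explicit inverse.
b = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ b               # residuals
s2 = (e @ e) / (n - K)      # unbiased estimate of the error variance
sigma2_ml = (e @ e) / n     # maximum likelihood estimate

# Decomposition as in the text: y'y = b'X'Xb + e'e, and R^2 = b'X'Xb / y'y.
R2 = (b @ X.T @ X @ b) / (y @ y)
```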

3.2.2. Interpretation of the Ordinary Least Squares Regression

An important interpretation of the regression estimates is that they estimate the partial derivative, βj. The coefficient from a simple bivariate regression of y on Xj measures the TOTAL effect of Xj on y. This is akin to the total derivative. Let us break this down into component parts and see where we end up.


The total derivative, you may recall, is the partial derivative of y with respect to Xj plus the sum of the partial derivatives of y with respect to all other variables, Xk, times the partial derivatives of Xk with respect to Xj. If y depends on two X variables, then:

dy/dX1 = ∂y/∂X1 + (∂y/∂X2)(dX2/dX1)
dy/dX2 = ∂y/∂X2 + (∂y/∂X1)(dX1/dX2)

Using the idea that the bivariate regression coefficients measure the total effect of one variable on another, we can write the bivariate regression estimates in terms of the partial regression coefficient (from a multivariate regression) plus other partial regression coefficients and bivariate regression coefficients. For simplicity, consider a case with two X variables. Let a1 and a2 be the slope coefficients from the bivariate regressions of y on X1 and of y on X2. Let b1 and b2 be the partial regression coefficients from the multivariate regression of y on X1 and X2. And let c1 be the slope coefficient from the regression of X2 on X1, and c2 the slope from the regression of X1 on X2.

We can express our estimates of the regression parameters in terms of the other coefficients. Let r be the correlation between X1 and X2.

Solving the normal equations (written in terms of the deviation cross products x1'x1, x1'x2, x1'y, and so on) gives:

b1 = (a1 - a2c1) / (1 - r²)
b2 = (a2 - a1c2) / (1 - r²)

We can solve these two equations to express a1 and a2 in terms of b1 and b2:

a1 = b1 + b2c1
a2 = b2 + b1c2

That is, the estimated total effect (the bivariate regression) equals the estimated direct effect of Xj on y (the partial coefficient) plus the indirect effect of Xj through Xk.

Another way to express these results is that multivariate regression can be simplified to bivariate regression once we partial out the effects of the other variables on y and on Xj.


Suppose we wish to measure and display the effect of X1 on y. Regress X1 on X2, ..., Xk. Regress y on X2, ..., Xk. Take the residuals from each of these regressions, e(x1|x2,...,xk) and e(y|x2,...,xk). The first vector of residuals equals the part of X1 remaining after subtracting out the other variables. Importantly, this vector of residuals is independent of X2, ..., Xk. The second vector equals the part of y remaining after subtracting the effects of X2, ..., Xk; that is, the direct effect of X1 plus the error term.

Regress e(y|x2,...,xk) on e(x1|x2,...,xk). This bivariate regression estimates the partial regression coefficient from the multivariate regression, β1.

Partial regression is very useful for displaying a particular relationship. Plotting the first vector of residuals against the second vector displays the partial effect of X1 on y, holding all other variables constant. In STATA, partial regression plots are implemented through the command avplot. After performing a regression, type the command avplot x1, where x1 is the variable of interest.
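A small simulated check of this partialling-out logic (the Frisch-Waugh-Lovell result); the data-generating process here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)            # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    # Least squares coefficients b = (X'X)^{-1} X'y.
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2]), y)  # multivariate regression

# Partial x2 out of both y and x1, then regress residual on residual.
Z = np.column_stack([ones, x2])
e_y = y - Z @ ols(Z, y)                           # e(y | x2)
e_x1 = x1 - Z @ ols(Z, x1)                        # e(x1 | x2)
b_partial = (e_x1 @ e_y) / (e_x1 @ e_x1)

# The bivariate slope on the residuals equals the partial coefficient on x1.
assert np.isclose(b_full[1], b_partial)
```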

A final way to interpret regression is using prediction. The predicted values of a regression are the values of the regression plane, ŷ. These are estimates of E[y|X]. We may vary the values of any Xj to generate predicted values. Typically, we consider the effect on y of a one-standard-deviation change in Xj, say from 1/2 standard deviation below the mean of Xj to 1/2 standard deviation above the mean.

Denote ŷ1 as the predicted value at xj¹ = x̄j + .5sj and ŷ0 as the predicted value at xj⁰ = x̄j - .5sj. The matrix of all variables except xj is X(j), and b(j) is the vector of coefficients except bj. Choose any vector x(j). A standard deviation change in the predicted values is:

ŷ1 - ŷ0 = (b0 + bjxj¹ + x(j)'b(j)) - (b0 + bjxj⁰ + x(j)'b(j)) = bj(xj¹ - xj⁰) = bjsj

For non-linear functions such variations are harder to analyze, because the effect of any one variable depends on the values of the other variables. We typically set the other variables equal to their means.

3.2.3. An Alternative Estimation Concept: Instrumental Variables


There are many other ways we may estimate the regression parameters. For example, we could minimize the mean absolute deviation of the errors.

One important alternative to ordinary least squares estimation is instrumental variables estimation. Assume that there is a variable or matrix of variables, Z, such that Z does not directly affect y, i.e., it is uncorrelated with ε, but it does correlate strongly with X.

The instrumental variables estimator is:

bIV = (Z'X)⁻¹Z'y

This is a linear estimator. It is a weighted average of the y's, where the weights are of the form (zi - z̄)/Σ(zi - z̄)(xi - x̄), instead of (xi - x̄)/Σ(xi - x̄)².

In the bivariate case, this estimator is the ratio of the slope coefficient from the regression of y on z to the slope coefficient from the regression of x on z.

This estimator is quite important in quasi-experiments because, if we can find valid instruments, the estimator will be unbiased: it will be independent of omitted factors in the least squares regression. It will, however, come at some cost. It is a noisier estimator than least squares.
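A hedged simulation sketch of the contrast between OLS and IV, using the formula above and an invented data-generating process in which x is correlated with the error:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

z = rng.normal(size=n)                  # instrument: moves x, unrelated to eps
eps = rng.normal(size=n)                # the structural error term
x = z + eps + rng.normal(size=n)        # x is contaminated by eps
y = 2.0 * x + eps                       # true coefficient on x is 2

ones = np.ones(n)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)  # biased upward (about 2.33 here)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # bIV = (Z'X)^{-1} Z'y, near 2

# Bivariate check: the IV slope is slope(y on z) divided by slope(x on z).
b_ratio = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
```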


3.3. Properties of Estimates

Parameter estimates themselves are random variables. They are functions of random variables that we use to make guesses about unknown constants (parameters). Therefore, from one study to the next, parameter estimates will vary. It is our hope that a given study has no bias, so that if the study were repeated under identical conditions the results would vary around the true parameter value. It is also hoped that estimation uses all available data as efficiently as possible.

We wish to know the sample properties of estimates in order to understand when we might face problems that could lead us to draw the wrong conclusions from our estimates, such as spurious correlations.

We also wish to know the sample properties of estimates in order to perform inferences. At times those inferences are about testing particular theories, but often our inferences concern whether we've built the best possible model using the data at hand. In this regard, we are concerned about potential bias, and will try to err on the side of getting unbiased estimates.

Erring on the side of safety, though, can cost us efficiency. Tests for the appropriateness of one model versus another, then, depend on the tradeoff between bias and efficiency.

3.3.1. Sampling Distributions of Estimates

We will characterize the random vector b by its mean, its variance, and its frequency function. The mean is β, which means that b is unbiased. The variance is σ²(X'X)⁻¹, the matrix describing the variances and covariances of the estimated coefficients. And the density function f(b) is approximated by the normal distribution as n becomes large.

The results about the mean and the variance stem from the regression assumptions. First, consider the mean of the parameter estimate. Under the assumption that E[X'ε] = 0,


the regression estimates are unbiased. That is,

E[b] = E[b|X] = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'(Xβ + ε)] = (X'X)⁻¹X'Xβ + E[(X'X)⁻¹X'ε] = β

This means that before we do a study we expect the data to yield a b of β. Of course, the density of any one point is zero. So another way to think about this is that if we do repeated sampling under identical circumstances, the coefficients will vary from sample to sample. The mean of those coefficients, though, will be β.

Second, consider the variance.

V[b] = E[(b - β)(b - β)'] = E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)'] = E[(X'X)⁻¹X'εε'X(X'X)⁻¹] = σ²(X'X)⁻¹X'X(X'X)⁻¹ = σ²(X'X)⁻¹

Here we have assumed the sphericality of the errors and the exogeneity of the independent variables.

The sampling distribution can be shown to follow the joint normal. This follows from the multivariate version of the central limit theorem, which I will not present because of the complexity of the mathematics. The result emerges, however, from the fact that the regression coefficients are weighted averages of the values of the random variable y, where the weights are of the form (xi - x̄)/Σ(xi - x̄)². More formally, let bn be the vector of regression coefficients estimated from a sample of size n. Then

bn →d N(β, σ²(X'X)⁻¹)

Of course, as n → ∞, (X'X)⁻¹ → 0. We can see this in the following way. The elements of (X'X)⁻¹ are, up to a signed term, xj'xk/|X'X|. The elements of the determinant are multiples of the cross products, and as observations are added the determinant grows faster than any single cross product. Hence, each element approaches 0.

Consider the 2×2 case:

(X'X)⁻¹ = 1/(x1'x1·x2'x2 - (x1'x2)²) ×
    [  x2'x2   -x1'x2 ]
    [ -x2'x1    x1'x1 ]


This may be rewritten as

(X'X)⁻¹ = 1/(1 - (x1'x2)²/(x1'x1·x2'x2)) ×
    [  1/(x1'x1)                -x1'x2/(x1'x1·x2'x2) ]
    [ -x1'x2/(x1'x1·x2'x2)       1/(x2'x2)           ]

Taking the limit of this matrix as n → ∞ amounts to taking the limit of each element of the matrix. Considering each term, we see that the sums of squares in the denominators grow with the addition of each observation, while the numerators remain constant. Hence, each element approaches 0.

Combined with unbiasedness, this last result proves consistency. Consistency means that the limit as n grows of the probability that an estimator deviates from the parameter of interest approaches 0. That is, for any δ > 0,

lim(n→∞) Pr(|bn - β| > δ) = 0

A sufficient condition for this is that the estimator be unbiased and that the variance of the estimator shrink to 0. This condition is called convergence in mean squared error, and is an immediate application of Chebychev's Inequality.

Of course, this means that the limiting distribution of f(b) shrinks to a single point, which is not so good. So the normality of the distribution of b is sometimes written as follows:

√n(bn - β) ~ N(0, σ²Q⁻¹),

where Q = (1/n)(X'X), the asymptotic variance-covariance matrix of the X's. We may consider within this the distribution of a single element of b:

bj ~ N(βj, σ²ajj),

where ajj is the jth diagonal element of (X'X)⁻¹. Hence, we can construct a 95 percent confidence interval for any parameter βj using the normal distribution and the above formula for the variance of bj. The standard error is √(s²ajj). So, a 95 percent confidence interval for a single parameter is:

bj ± 1.96 √(s²ajj).
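In code, the standard errors and confidence intervals come straight from the estimated covariance matrix s²(X'X)⁻¹. A minimal helper sketch, assuming X already contains a constant column:

```python
import numpy as np

def ols_inference(X, y):
    """OLS coefficients, standard errors, and 95 percent confidence intervals."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (n - K)              # unbiased estimate of sigma^2
    V = s2 * np.linalg.inv(X.T @ X)     # estimated V[b] = s^2 (X'X)^{-1}
    se = np.sqrt(np.diag(V))            # standard errors sqrt(s^2 a_jj)
    ci = np.column_stack([b - 1.96 * se, b + 1.96 * se])
    return b, se, ci
```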


The instrumental variables estimator provides an interesting contrast to the least squares estimator. Let us consider the sampling properties of the instrumental variables estimator.

E[bIV] = E[(Z'X)⁻¹Z'y] = E[(Z'X)⁻¹Z'(Xβ + ε)] = E[(Z'X)⁻¹(Z'X)β + (Z'X)⁻¹Z'ε] = E[β + (Z'X)⁻¹Z'ε] = β

The instrumental variables estimator is an unbiased estimator of β. The variance of the instrumental variables estimator is:

V[bIV] = E[(bIV - β)(bIV - β)'] = E[((Z'X)⁻¹Z'ε)((Z'X)⁻¹Z'ε)'] = E[(Z'X)⁻¹Z'εε'Z(X'Z)⁻¹] = (Z'X)⁻¹Z'(σ²I)Z(X'Z)⁻¹ = σ²(Z'X)⁻¹Z'Z(X'Z)⁻¹

As with the least squares estimator, the instrumental variables estimator will follow a normal distribution, because the IV estimator is a (weighted) sum of the random variables, y.

3.3.2. Bias versus Efficiency

3.3.2.1. General Efficiency of Least Squares

An important property of the ordinary least squares estimates is that they have the lowest variance of all linear, unbiased estimators. That is, they are the most efficient unbiased estimators. This result is the Gauss-Markov Theorem. Another version arises as a property of the maximum likelihood estimator, where the lower bound for the variances of all possible consistent estimators is σ²(X'X)⁻¹.

This implies that the instrumental variables estimator is less efficient than the ordinary least squares estimator. To see this, consider the bivariate regression case:

V[b] = σ² / Σ(xi - x̄)²


V[bIV] = σ² Σ(zi - z̄)² / [Σ(zi - z̄)(xi - x̄)]²

Comparing these two formulas:

V[bIV]/V[b] = Σ(zi - z̄)² Σ(xi - x̄)² / [Σ(zi - z̄)(xi - x̄)]²

This ratio is the inverse of the square of the correlation between X and Z. Since the correlation never exceeds 1, we know the numerator must be larger than the denominator. (The square of the correlation is known to be less than 1 because of the Cauchy-Schwartz inequality.)

3.3.2.2. Sources of Bias and Inefficiency in Least Squares

There are four primary sources of bias and inconsistency in least squares estimates: measurement error in the independent variables, omitted variables, non-linearities, and simultaneity. We'll discuss two of these cases here: choice of regressors (omitted variables) and measurement error.

Measurement Error. Assume that X* is the true variable of interest but we can only measure X = X* + u, where u is a random error term. For example, suppose we regress the actual share of the vote for the incumbent president in an election on the job approval rating of the incumbent president, measured in a 500-person preelection poll the week before the election. Gallup has measured this since 1948. Each election is an observation. The polling data will have measurement error based on the random sampling inherent in surveys. Specifically, the variance of the measurement error is p(1 - p)/n, where p is the percent approving of the president.

Another common source of measurement error arises from typographical errors in datasets. Keypunching errors are very common, even in data sets distributed publicly through reputable sources such as the Inter-university Consortium for Political and Social Research. For example, Gary King, Jim Snyder, and others who have worked with election data estimate that about 10 percent of the party identification codes of candidates are incorrect in some of the older ICPSR data sets on elections.

Finally, some data sources are not very reliable, or estimates must be made. This is common in social and economic data in developing economies and in data on wars.

If we regress y on X, the coefficient is a function of both the true variable X* and the error term. That is,

b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
  = Σ(xi* - x̄* + ui - ū)(yi - ȳ) / Σ(xi* - x̄* + ui - ū)²
  = [Σ(xi* - x̄*)(yi - ȳ) + Σ(ui - ū)(yi - ȳ)] / [Σ(xi* - x̄*)² + Σ(ui - ū)² + 2Σ(xi* - x̄*)(ui - ū)]

This looks like quite a mess.

A few assumptions are made to get some traction. First, it is usually assumed that u and ε are uncorrelated and that u and X* are uncorrelated. If they are correlated, things are even worse. Second, the ui's are assumed to be uncorrelated with one another and to have constant variance.

Taking expected values of b will be very difficult, because b is a function of the ratio of random variables. Here is a situation where probability limits (plims) make life easier. Because plims are limits, they obey the basic rules of limits: the limit of a sum is the sum of the limits, and the limit of a ratio is the ratio of the limits. Divide the top and bottom of b by n. Now consider the probability limit of each element of b:

plim (1/n) Σ(xi* - x̄*)² = σ²x*

plim (1/n) Σ(xi* - x̄*)(yi - ȳ) = plim (1/n) Σ β(xi* - x̄*)² = βσ²x*

plim (1/n) Σ(xi* - x̄*)(ui - ū) = 0

plim (1/n) Σ(ui - ū)² = σ²u

We can pull these limits together as follows:

plim b = βσ²x* / (σ²x* + σ²u) < β

Thus, in a bivariate regression, measurement error in the X variable biases the estimate toward 0. In a multivariate setting, the bias cannot generally be signed. Attenuation is typical, but it is possible also to reverse signs or inflate the coefficients. In non-linear models, such as those using the square of X, the bias terms become quite substantial and even more troublesome.
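The attenuation result is easy to verify by simulation. In this invented example σ²x* = 4 and σ²u = 1, so the slope should shrink by the factor 4/5:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x_star = rng.normal(scale=2.0, size=n)   # true regressor: variance 4
u = rng.normal(scale=1.0, size=n)        # measurement error: variance 1
x_obs = x_star + u                       # what we actually observe
y = 3.0 * x_star + rng.normal(size=n)    # true beta = 3

b = np.cov(x_obs, y)[0, 1] / np.var(x_obs)
# plim b = beta * 4 / (4 + 1) = 2.4, so b lands near 2.4 rather than 3.
```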

The best approach for eliminating measurement error is cleaning the data set. However, it is possible to fix some measurement error using instrumental variables. One approach is to use the quantiles or the ranks of the observed X to predict the observed X, and then use the predicted values in the regression predicting y. The idea behind this is that the ranks of X are correlated with the underlying true values of X, i.e., X*, but not with u or ε.

These examples of measurement error assume that the error is purely random. It may not be. Sometimes measurement error is systematic. For example, people underreport socially undesirable attitudes or behavior. This is an interesting subject that is extensively studied in public opinion research, but often understudied in other fields. A good survey researcher, for example, will tap into archives of questions and even do question-wording experiments to test the validity and reliability of instruments.

Choice of Regressors: Omitted and Included Variables.

The struggle in most statistical modeling is to specify the most appropriate regression model using the data at hand. The difficulty is deciding which variables to include and which to exclude. In rare cases we can be guided by a specific theory. Most often, though, we have gathered data or will gather data to test specific ideas and arguments. From the data at hand, what is the best model?

There are three important rules to keep in mind.

1. Omitted Variables That Directly Affect Y and Are Correlated with X Produce Bias.

The most common threat to the validity of estimates from a multivariate statistical analysis is omitted variables. Omitted variables affect both the consistency and efficiency of our estimates. First and foremost, they create bias. Even if they do not bias our results, we often want to control for other factors to improve efficiency.

To see the bias due to omitted variables, assume that X is a matrix of included variables and Z is a matrix of variables not included in the analysis. The full model is

y = XβX + ZβZ + ε

Suppose that Z is omitted. Obviously we can't estimate the coefficient βZ. Will the other coefficients be biased? Is there a loss of efficiency?

Let bX be the parameter vector estimated when only the variables in the matrix X are included. Let βX be the subset of coefficients from the true model on the included variables, X. The model estimated is

y = XβX + u, where u = ZβZ + ε.

E[bX] = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'XβX + (X'X)⁻¹X'u] = βX + E[(X'X)⁻¹X'ZβZ] + E[(X'X)⁻¹X'ε] = βX + δZXβZ,

where δZX is a matrix of coefficients from the regression of the columns of Z on the variables in X.

This is an extremely useful formula. There are two important lessons to take away.

First, omitted variables will bias regression estimates of the included variables (and lead to inconsistency) if (1) those variables directly affect Y and (2) those variables are correlated with the included variables (X). It is not enough, then, to object to an analysis that there are variables that have not been included. That is always true. Rather, science advances by conjecturing (and then gathering the data on) variables that affect y directly and are correlated with X. I think the latter is usually hard to find.

Second, we can generally sign the bias of omitted variables. When we think about the potential problem of an omitted variable, we usually have in mind the direct effect that it likely has on the dependent variable, and we might also know or can make a reasonable guess about its correlation with the included variables of interest. The bias in an included variable will be the direct effect of the omitted variable on y times the effect of the included variable on the excluded variable. If both of those effects are positive or both are negative, then the estimated effect of X on Y will be biased up: it will be too large. If one of these effects is negative and the other positive, then the estimated effect of X on Y will be biased downward.
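Both the bias formula and the sign logic can be checked with a short simulation (all parameter values invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

z = rng.normal(size=n)                      # omitted variable
x = 0.5 * z + rng.normal(size=n)            # included variable, correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # beta_x = 1, beta_z = 2

# Short regression of y on x alone.
b_short = np.cov(x, y)[0, 1] / np.var(x)

# Bias formula: plim b_short = beta_x + delta_zx * beta_z, where delta_zx is
# the slope from regressing z on x (0.5 / 1.25 = 0.4 here).
delta_zx = np.cov(x, z)[0, 1] / np.var(x)
b_expected = 1.0 + delta_zx * 2.0  # about 1.8: biased up, both effects positive
```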

2. Efficiency Can Be Gained by Including Variables That Predict Y but Are Uncorrelated with X.

A straightforward analysis reveals that the estimated V[bX] = s²(X'X)⁻¹. So far so good. But the variance of the error term is inflated. Specifically, s² = u'u/(n - K). Because u = ZβZ + ε, E[u'u/(n - K)] = βZ'Z'ZβZ/(n - K) + σ² > σ². In fact, the estimated residual variance is too large by the explained or model sum of squares for the omitted variables.

This has an interesting implication for experiments. Randomization in experiments guarantees unbiasedness. But we still want to control for other factors to reduce noise. In fact, combining regression models in the data analysis is a powerful way to gain efficiency (and reduce the necessary sample sizes) in randomized experiments.

Even in observational studies we may want to keep in a model a variable whose inclusion does not affect the estimated value of the parameter of a variable of interest, if the included variable strongly affects the dependent variable. Keeping such a variable captures some of the otherwise unexplained error variance, thereby reducing the estimated variance of the residuals. As a result, confidence intervals will narrow.

3. Including Variables That Are Unrelated to Y and X Loses Efficiency (Precision), and There Is No Bias from Their Exclusion.

That there is no bias can be seen readily from the argument we have just made.


The loss of efficiency occurs because we use up degrees of freedom. Hence, all variance estimates will be too large.

Simply put, parsimonious models are better.

COMMENTS:

a. Thinking through the potential consequences of omitted variables in this manner is very useful. It helps you identify what other variables will matter in your analysis and why, and it helps you identify additional information to see if this could be a problem. It turns out that the ideological fit with the district has some correlation with the vote, but it is not that strong. The reason is that there is relatively little variation in ideological fit that is not explained by simple knowledge of party. So this additional information allows us to conjecture (reasonably safely) that, although ideological fit could be a problem, it likely does not explain the substantial bias in the coefficients on spending.

b. The general challenge in statistical modeling and inference is deciding how to balance possible biases against possible inefficiencies in choosing a particular specification. Naturally, we usually wish to err on the side of inefficiency. But these are choices we make on the margin. As we will see, statistical tests measure whether the possible improvement in bias from one model outweighs the loss of efficiency compared to another model. This should not distract from the main objective of your research, which is to find phenomena and relationships of large magnitude and of substantive importance. Concern about omitted variable bias should be a seed of doubt that drives you to make the estimates that you make as good as possible.

3.3.3. Examples

Model building in statistics is really a progressive activity. We usually begin with interesting or important observations. Sometimes those originate in a theory of social behavior,


and sometimes they come from observation of the world. Statistical analyses allow us to refine those observations. And sometimes they lead to a more refined view of how social behavior works. Obvious problems of measurement error or omitted variables exist when the implications of an analysis are absurd. Equally valid, though, are arguments that suggest a problem with a plausible result. Here we'll consider four examples.

1. Incumbency Advantages

The observation of the incumbency advantage stems from a simple difference of means. From 1978 to 2002, the average Democratic vote share of a typical U.S. House Democratic incumbent is 68 percent; the average Democratic vote share of a typical U.S. House Republican incumbent is 34 percent.

The incumbency advantage model is specified as follows. The vote for the Democratic candidate in district i in election t equals the normal party vote, Ni, plus a national party tide, αt, plus the effect of incumbency. Incumbency is coded Iit = +1 for Democratic incumbents, Iit = -1 for Republican incumbents, and Iit = 0 for open seats.

Vit = αt + Ni + βIit + εit

Controlling for the normal vote and year tides reduces the estimated incumbency effect to about 7 to 9 percentage points.

2. Strategic Retirement and the Incumbency Advantage

An objection to models of the incumbency advantage is that incumbents choose to step down only when they are threatened, whether by changing times, personal problems, or an unusually good challenger. The reasons that someone retires, then, might depend on factors that the researcher cannot measure but that predict the vote: omitted variables. This would cause I to be correlated with the regression error u. The factors that are thought to affect the retirement decisions and the vote are negatively correlated with V and negatively correlated with I (making it less likely to run). Hence, the incumbency advantage may be inflated.


3. Police/Prisons and Crime

The theory of crime and punishment begins with simple assumptions about rational behavior. People will commit crimes if the likelihood of being caught and the severity of the punishment are lower than the benefit of the crime. A very common observation in the sociology of crime is that areas that have larger numbers of police or more people in prison have higher crime rates.

4. Campaign Spending and Votes

Researchers measuring the factors that explain House election outcomes include various measures of electoral competition in explaining the vote. Campaign expenditures, and the advertising they buy, are thought to be one of the main forces affecting election outcomes.

A commonly used model treats the incumbent party's share of the votes as a function of the normal party division in the congressional district, candidate characteristics (such as experience or scandals), and the campaign expenditures of the incumbent and the challenger. Most of the results from such regressions make sense: the coefficients on challenger spending and on incumbent party strength have the expected signs. But the coefficient on incumbent spending has the wrong (negative) sign. The naive interpretation is that the more incumbents spend, the worse they do.

One possible explanation is that incumbents who are unpopular and out of step with their districts have to spend more in order to remain in place. Could this explain the incorrect sign? In this account of the bias in the spending coefficients, there is a positive correlation between the omitted variable, "incumbent's fit with the district," and the included variable, "incumbent spending." Also, the incumbent's fit with the district likely has a negative direct effect on the vote. The more out-of-step an incumbent is, the worse he will do on election day. Hence, lacking a measure of "fit with the district" might cause a downward bias.

3.3.4. General Strategies for Correcting for Omitted Variables and Measurement Error Biases


1. More Data, More Variables. Identify relevant omitted variables and then try to collect them. This is why many regression analyses will include a large number of variables that do not seem relevant to the immediate question. They are included to hold other things constant, and also to improve efficiency.

2. Multiple Measurement. Two sorts of use of multiple measurement are common.

First, to reduce measurement error researchers often average repeated measures of a variable or construct an index to capture a "latent" variable. Although not properly a topic for this course, factor analysis and multi-dimensional scaling techniques are very handy for this sort of data reduction.

Second, to eliminate bias we may observe the "same observation" many times. For example, we could observe the crime rate in a set of cities over a long period of time. If the omitted factor is due to factors that are constant within units, panel models can remove it, treating omitted variables as nuisance factors: the idea of "control in the design."

3. Instrumental Variables. Instrumental variables estimates allow researchers to purge the independent variable of interest of its correlation with the omitted variables, which causes bias. What is difficult is finding suitable variables with which to construct instruments. We will deal with this matter at length later in the course.


3.4. Prediction

3.4.1. Prediction and Interpretation

We have given one interpretation to the regression model as an estimate of the partial derivatives of a function, i.e., the effects of a set of independent variables holding constant the values of the other independent variables. In constructing this definition we began with the definition of an effect as the difference in the conditional mean of Y across two distinct values of X. And a causal effect assumes that we hold all else constant.

Another important way to interpret conditional means and regressions is as predicted values. Indeed, sometimes the goal of an analysis is not to estimate effects but to generate predictions. For example, one might be asked to formulate a prediction about the coming presidential election. A common sort of election forecasting model regresses the incumbent president's vote share on the rate of growth in the economy, plus a measure of presidential popularity, plus a measure of party identification in the public. Based on elections since 1948, that regression has the following coefficients:

Vote = xxx + xxx·Growth + xxx·Popularity + xxx·Party

We then consider plausible values for Growth, Popularity, and Partisanship to generate predictions about the Vote.

For any set of values of X, say x0, the most likely value or expected value of Y is y0 = E[Y|x = x0]. This value is calculated in a straightforward manner from the estimated regression. Let x0 be a row vector of values of the independent variables for which a prediction is made, i.e., x0 = (x1⁰, x2⁰, ..., xk⁰). The predicted value is calculated as

ŷ0 = x0b = b0 + Σ(j=1..K) bjxj⁰

It is standard practice to set variables equal to their mean values if no specific value is of interest in a prediction. One should be somewhat careful in the choice of predicted values, so that the value does not lie too far out of the set of values on which the regression was originally estimated.


Consider the presidential election example. Assume a growth rate of 2 percent, a popularity rating of 50 percent, and a Republican Party identification of 50 percent (an equal split between the parties); then Bush is predicted to receive xxx percent of the two-party presidential vote in 2004.

The predicted value or forecast is itself subject to error. To measure the forecast error we construct the deviation of the observed value from the "true" value, which we wish to predict. The true value is itself a random variable: y0 = x0β + ε0. The prediction or forecast error is the deviation of the predicted value from the true value:

e0 = y0 - ŷ0 = x0(β - b) + ε0

The variance of the prediction error is

V[e0] = σ² + V[x0(β - b)] = σ² + x0 E[(b - β)(b - β)'] x0' = σ² + σ² x0(X'X)⁻¹x0'

We can use this last result to construct the 95 percent confidence interval for the predicted value:

ŷ0 ± 1.96 √V[e0]

As a practical matter this might become somewhat cumbersome to do. A quick way to generate prediction confidence intervals is with an "augmented" regression. Suppose we have estimated a regression using n observations, and we wish to construct several different predicted values based on different sets of values for X, say X0. A handy trick is to add the matrix of values X0 to the bottom of the X matrix. That is, add n0 observations to your dataset for which the values of the independent variables are the appropriate values of X0. Let the dependent variable equal 0 for all of these observations. Finally, add n0 dummy columns to your dataset, each equal to -1 for its corresponding new observation and 0 otherwise. Now regress y on X and the new set of dummy variables.

The resulting estimates will reproduce the original regression and will have coefficient estimates for each of the independent variables. The coefficients on the dummy variables equal the predicted values, and the standard errors of these estimates are the correct prediction standard errors.
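A numpy sketch of this augmented-regression trick for a bivariate regression with invented data; the coefficients on the dummy columns recover the predictions, and their standard errors are the prediction standard errors:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n0 = 100, 2

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x0 = np.array([0.5, 1.5])                 # points at which to predict

# Stack the n0 prediction rows under the data with y = 0, and add one dummy
# column per prediction row, equal to -1 on its own row and 0 elsewhere.
Xaug = np.column_stack([
    np.ones(n + n0),
    np.concatenate([x, x0]),
    np.vstack([np.zeros((n, n0)), -np.eye(n0)]),
])
yaug = np.concatenate([y, np.zeros(n0)])

b = np.linalg.solve(Xaug.T @ Xaug, Xaug.T @ yaug)
e = yaug - Xaug @ b
s2 = (e @ e) / (n - 2)                    # residual df: n obs, 2 coefficients
V = s2 * np.linalg.inv(Xaug.T @ Xaug)

predictions = b[2:]                       # equal b0 + b1 * x0
prediction_se = np.sqrt(np.diag(V)[2:])   # prediction standard errors
```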

As an example, consider the analysis of the relationship between voting weights and posts in parliamentary governments. Let's let regression calculate the predicted values and standard errors for 6 distinct cases: Voting Weight = .25 and Formateur, Voting Weight = .25 and Not Formateur, Voting Weight = .35 and Formateur, Voting Weight = .35 and Not Formateur, Voting Weight = .45 and Formateur, and Voting Weight = .45 and Not Formateur. First, I ran the regression of Share of Posts on Share of Voting Weight plus an indicator variable for the party that formed the government (Formateur). I then ran the regression with 6 additional observations, constructed as described above. Below are the estimated coefficients and standard errors (constants not reported).

Using Augmented Regression To Calculate Predicted Values

Variable         Coeff. (SE)      Coeff. (SE)
Voting Weight    .9812 (.0403)    .9812 (.0403)
Formateur        .2300 (.0094)    .2300 (.0094)
D1 (.25, 1)      --               .5572 (.0952)
D2 (.25, 0)      --               .3272 (.0952)
D3 (.35, 1)      --               .6554 (.0953)
D4 (.35, 0)      --               .4253 (.0955)
D5 (.45, 1)      --               .7535 (.0955)
D6 (.45, 0)      --               .5235 (.0960)

3.4.2. Model Checking

Predicted values allow us to detect deviations from most of the assumptions of the regression model. The one assumption we cannot validate is the assumption that X and ε are uncorrelated. The residual vector, e, is defined to be orthogonal to the independent variables, X: e'X = 0. This implies that e'ŷ = e'Xb = 0. This is a restriction, so we cannot test how well the data approximate or agree with this assumption.

The other assumptions (linearity, homoskedasticity, no autocorrelation, and normality) are readily verified, and fixed. A useful diagnostic tool is a residual plot. Graph the residuals from a regression against the predicted values. This plot will immediately show many problems, if they exist. We expect an elliptical cloud centered around e = 0.

If the underlying model is non-linear, the residual plot will reflect the deviations of the data from a straight line fit through the data. Data generated from a concave quadratic function will have negative residuals for low values of ŷ, then positive residuals for intermediate values of ŷ, then negative residuals for large values of ŷ.

If there is heteroskedasticity, the residual plot will show non-constant deviations around e = 0. A common case arises when the residual plot looks like a funnel. This situation means that the effects are multiplicative. That is, the model is y = αX^β·ε (where ε takes only positive values), not y = α + Xβ + ε. This is readily fixed by taking logarithms, so that the model becomes: log(y) = log(α) + β log(X) + log(ε).

A further plot for measuring heteroskedasticity is of e² against ŷ, against a particular X variable, or against some other factor, such as the "size" of the unit. In this plot e² serves as the estimate of the variance. This is an enlightening plot when the variance is a function of a particular X or when we are dealing with aggregated data, where the aggregates consist of averages of variables across places of different populations.

To detect autocorrelation we use a slightly different plot. Suppose the units are indexed by time, say years, t. We may examine the extent of autocorrelation by taking the correlations between observations that are s units apart:

rs = Σt et·e(t-s) / Σt et²

The correlation between an observation and the previous observation is r1. This is called first-order autocorrelation. Another form of autocorrelation, especially in monthly economic data, is seasonal variation, which is captured with s = 12.
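A small helper implementing the rs formula for a residual series (our own sketch, not from the notes):

```python
import numpy as np

def residual_autocorrelations(e, max_lag):
    """r_s = sum_t e_t * e_{t-s} / sum_t e_t^2, for s = 0, ..., max_lag."""
    e = np.asarray(e, dtype=float)
    denom = e @ e
    return np.array([(e[s:] @ e[:len(e) - s]) / denom
                     for s in range(max_lag + 1)])
```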

It is instructive to plot the estimated autocorrelation against s, where s runs from 0 to a relatively large number, say 1/10th of T.

What should this plot look like? At s = 0, the autocorrelation is just σ²ε/σ²ε = 1. The values of rs for s > 0 depend on the nature of the autocorrelation structure.


Let's take a closer look. The most basic and commonly analyzed autocorrelation structure involves an autoregression of order 1 (an AR-1). Let ut be an error term that is independent of εt. First-order autocorrelation in ε is of the form:

εt = ρεt-1 + ut

The variance of εt follows immediately from the definition of the variance:

σ²ε = E[(ρεt-1 + ut)(ρεt-1 + ut)] = ρ²σ²ε + σ²u

Solving for σ²ε:

σ²ε = σ²u / (1 - ρ²)

To derive ρs, the correlation between εt and εt-s, we must derive the covariance first. For s = 1, the covariance between two observations is

E[εt εt-1] = E[(ρεt-1 + ut)εt-1] = ρσ²ε

Using repeated substitutions for εt, we find that for an AR-1:

E[εt εt-s] = ρ^s σ²ε

Now we can state what we expect to observe in the rs when the residuals contain first-order autocorrelation:

ρs = Cov(εt, εt-s) / √(V(εt)V(εt-s)) = ρ^s σ²ε / σ²ε = ρ^s

There are two patterns that may appear in the autocorrelation plot, depending on the sign of ρ. If ρ > 0, the plot should decline exponentially toward 0. For example, suppose ρ = .5; then we expect r0 = 1, r1 = .5, r2 = .25, r3 = .125, r4 = .0625, r5 = .03125. If ρ < 0, the plot will seesaw, converging on 0. For example, suppose ρ = -.5; then we expect r0 = 1, r1 = -.5, r2 = .25, r3 = -.125, r4 = .0625, r5 = -.03125.

Higher-order autocorrelation structures, such as εt = ρ1εt-1 + ρ2εt-2 + ut, lead to more complicated patterns. We may test for the appropriateness of a particular structure


by comparing the estimated autocorrelations, rs, with the values implied by a structure. For example, we may test for first-order autocorrelation by comparing the observed rs's with those implied by the AR-1 model when ρ̂ = r1.

A simple rule of thumb applies for all autocorrelation structures: if rs < .2 for all s, then there is no significant degree of autocorrelation.


    3.5. Inference

3.5.1. General Framework

Hypotheses.

What is an hypothesis? An hypothesis is a statement about the data derived from an argument, model, or theory. It usually takes the form of a claim about the behavior of parameters of the distribution function or about the effect of one variable on another.

For example, a simple argument about voting holds that, in the absence of other factors such as incumbency, voters use party to determine their candidate of choice. Therefore, when no incumbent is on the ticket, an additional one percent Democratic in an electoral district should translate into a one percent higher Democratic vote for a particular office. In a regression model, controlling for incumbency, the slope on the normal vote ought to equal one. Of course there are a number of reasons why this hypothesis might fail to hold. The argument itself might be incorrect; there may be other factors beside incumbency that must be included in the model; the normal vote is hard to measure and we must use proxies, which introduce measurement error.

In classical statistics, an hypothesis test is a probability statement. We reject an hypothesis if the probability of observing the data, given that the hypothesis is true, is sufficiently small, say below .05. Let T be the vector of estimated parameters and θ be the true parameters of the distribution. Let θ0 be the values posited by the hypothesis. Finally, let Σ be the variance of T. If the hypothesis is true, then θ = θ0, and the deviation of T from θ0 should look like a random draw from the underlying distribution; a large deviation would be unlikely to have occurred by chance.

What we have just described is the size of a test. We also care about the power of a test. If θ0 is not true, what is the probability of observing a sufficiently small deviation that we do not reject the hypothesis? This depends on the sample size and the variance of X.


Tests of a Single Parameter.

We may extend the framework for statistical tests about means and differences of means to the case of tests about a single regression coefficient. Recall that the classical hypothesis test for a single mean was:

Pr(|x̄ - μ0| > t(α/2, n-1) · s/√n) < α

Because x̄ is the sum of random variables, its distribution is approximately normal. However, because we must estimate the standard deviation of X, s, the t-distribution is used as the reference distribution for the hypothesis test. The test criterion can be rewritten as follows: we reject the hypothesis if |x̄ - μ0|/(s/√n) > t(α/2, n-1).

There is an important duality between the test criterion above and the confidence interval.

The test criterion for size .05 may be rewritten as:

Pr(x̄ - t(.025, n-1)·s/√n < μ0 < x̄ + t(.025, n-1)·s/√n) > .95

So, we can test the hypothesis by ascertaining whether the hypothesized value falls inside the 95 percent confidence interval.

Now consider the regression coefficient, βj. Suppose our hypothesis is H: βj = β0. A common value is β0 = 0, i.e., no effect of X on Y. Like the sample average, a single regression parameter follows the normal distribution, because the regression parameter is a sum of random variables. The mean of this distribution is βj and the variance is σ²/Σ(xi - x̄)² in the case of a bivariate regression or, more generally, σ²ajj, where ajj is the jth diagonal element of the matrix (X'X)⁻¹.

The test criterion for the hypothesis states the following. If the null hypothesis is true, then a large standardized deviation of bj from β0 will be unlikely to have occurred by chance:

Pr(|bj - β0| > t(α/2, n-K) · s√ajj) < α

As with the sample mean, this test criterion can be expressed as follows. We reject the hypothesized value if |bj - β0|/(s√ajj) > t(α/2, n-K).


The Wald Criterion.

The general framework for testing in the multivariate context is a Wald test. Construct a vector of estimated parameters and a vector of hypothesized parameters. Generically, we will write these as T and θ0, because they may be functions of regression coefficients, not just the coefficients themselves. Define the vector of deviations of the observed data from the hypothesized data as d = T - θ0. This is a vector of length J.

What is the distribution of d? Under the null hypothesis, E[d] = 0 and V[d] = V[T] = Σ.

The Wald statistic is:

W = d'Σ⁻¹d

Assuming d is normally distributed, the Wald statistic follows the χ² distribution with J degrees of freedom.

Usually, we do not know Σ, and must estimate it. Substituting the estimated value of Σ and making an appropriate adjustment for the degrees of freedom, we arrive at a new statistic that follows the F distribution. The degrees of freedom are J and n - K: the number of restrictions in d and the number of free pieces of information available after estimation of the model.

In the regression framework, we may implement a test of a set of linear restrictions on the parameters as follows. Suppose the hypotheses take the form that linear combinations of various parameters must equal specific constants. For example,

β1 = 0, β2 = 3, and β4 + β3 = 1.

We can specify the hypothesis as Rβ = q,


where R is a matrix of numbers that correspond to the parameters in the set of linear equations found in the hypothesis, β is the vector of coefficients in the full model (length K), and q is an appropriate set of numbers. In the example, suppose there are 5 coefficients in the full model, β = (β1, β2, β3, β4, β5)'. Then

R = [ 1  0  0  0  0 ]
    [ 0  1  0  0  0 ]
    [ 0  0  1  1  0 ]

and q = (0, 3, 1)'.

The deviation of the estimated value from the true value, assuming the hypothesis is true, is

d = Rb - Rβ = Rb - q

If the hypothesis is correct, E[d] = E[Rb - q] = 0, and

V[d] = E[dd'] = E[(Rb - Rβ)(Rb - Rβ)'] = E[R(b - β)(b - β)'R']

Because R is a matrix of constants:

V[d] = R E[(b - β)(b - β)'] R' = σ²R(X'X)⁻¹R'

Because we usually have to estimate the variance σ², the Wald statistic becomes

F = [(Rb - q)'[R(X'X)⁻¹R']⁻¹(Rb - q)/J] / [e'e/(n - K)]

Another way to write the F-statistic for the Wald test is as the percentage loss of fit. Suppose we estimate two models. In one model, the Restricted Model, we impose the hypothesis; in the other model, the Unrestricted Model, we impose no restrictions. Denote the residuals from these models as eR and eU. The F-test for the Wald criterion can be written as

F = [(eR'eR - eU'eU)/J] / [eU'eU/(n - K)]

We can write the residuals from the restricted model in terms of the residuals from the unrestricted model: eR = y - XbR =


y - XbU - X(bR - bU) = eU - X(bR - bU). The sum of squared residuals from the restricted model is

eR'eR = (eU - X(bR - bU))'(eU - X(bR - bU)) = eU'eU + (bR - bU)'X'X(bR - bU) = eU'eU + (RbU - q)'[R(X'X)⁻¹R']⁻¹(RbU - q)

(the cross terms vanish because X'eU = 0). So the difference in the sum of squares between the restricted and the unrestricted models equals the numerator of the Wald test.
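The loss-of-fit form of the test is a one-liner in code; a minimal sketch, assuming the two sets of residuals have already been computed:

```python
import numpy as np

def wald_f_stat(e_r, e_u, J, n, K):
    """F = ((e_R'e_R - e_U'e_U) / J) / (e_U'e_U / (n - K)).

    e_r: residuals from the restricted model (hypothesis imposed)
    e_u: residuals from the unrestricted model
    J:   number of restrictions; K: coefficients in the unrestricted model
    """
    e_r, e_u = np.asarray(e_r), np.asarray(e_u)
    return ((e_r @ e_r - e_u @ e_u) / J) / ((e_u @ e_u) / (n - K))
```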

There are two other important testing criteria: likelihood ratio statistics and Lagrange multiplier tests. These three are asymptotically the same. The Wald criterion has better small-sample properties, and it is easy to implement for the regression model. We will discuss likelihood ratio statistics as part of maximum likelihood estimation.


3.5.2. Applications

Three important applications of the Wald test are (1) tests of specific hypotheses, (2) verification of the choice of a specification, and (3) tests of bias reduction in causal estimates, specifically OLS versus IV. All three amount to comparing the bias due to the more parsimonious hypothesized model against the efficiency gained with that model.

i. Hypothesis Tests: Seats and Votes in England, 1922-2002.

The Cube Law provides an excellent example of the sort of calculation made to test a concrete hypothesis. The Cube Law states that the proportion of seats won varies as the cube of the proportion of votes won:

S/(1 - S) = [V/(1 - V)]³

We may implement this in a regression model as follows. This is a multiplicative model, which we will linearize using logarithms. There are really two free parameters: the constant term α and the exponent β.

log(S/(1 - S)) = α + β log(V/(1 - V))

For English parliamentary elections from 1922 to 2002, the graph of the log of the odds

of the Conservatives winning a seat (labelled LCS) versus the log of the odds of the Conservatives winning a vote (labelled LCV) is shown in the graph. The relationship looks fairly linear.

Results of the least squares regression of LCS on LCV are shown in the table. The slope is 2.75 and the intercept is -.07. Tests of each coefficient separately show that we would safely accept the hypotheses that β0 = 0 and β1 = 3. Specifically, the t-test of whether the slope equals 3 is (2.75 - 3)/.20 = -1.25, which has a p-value of .25. The t-test of whether the constant equals 0 is (-.07 - 0)/.05 = -1.4, which has a p-value of .18.


Regression Example: Cube Law in England, 1922-2002
Y = Log of the Odds Ratio of Conservative Seats
X = Log of the Odds Ratio of Conservative Votes

Variable    Coeff. (SE)    t-test
LCV         2.75 (.20)     -.25/.2 = -1.25 (p = .24)
Constant    -0.07 (.05)    -.07/.05 = -1.4 (p = .18)
N           21
R²          .91
MSE         .221
F-test for H0: β0 = 0, β1 = 3:  2.54 (p = .10)

However, this is the wrong approach to testing hypotheses about multiple coefficients. The hypothesis has implications for both coefficients at the same time. An appropriate test measures how much the vector b' = (2.75, -.07) deviates from the hypothesized vector β0' = (3, 0).

After estimating the regression, we can extract the variance-covariance matrix for b using the command matrix list e(V). If the null hypothesis is true, then E[b] = β0. The Wald criterion is d'V[d]⁻¹d. Because the matrix R is the identity matrix in this example, i.e., each of the hypotheses pertains to a different coefficient, V[d] equals V[b]. So the F-statistic for the Wald criterion is:

F = (1/2) d'V[d]⁻¹d, with d = (-.246, -.070)' and

V[d]⁻¹ = [ 27.6405   38.2184 ]
         [ 38.2184  427.3381 ]

so

F = .5((-.246)²(27.6405) + 2(-.070)(-.246)(38.2184) + (-.070)²(427.3381)) = 2.54

The probability of observing a deviation this large for a random variable distributed F with 2 and 19 degrees of freedom is .10 (i.e., p-value = .10). This suggests that the data do deviate somewhat from the Cube Law.
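The quadratic-form arithmetic can be replicated in a few lines (numbers copied from the example above):

```python
import numpy as np

d = np.array([-0.246, -0.070])           # deviations b - beta_0, as in the text
Vinv = np.array([[27.6405, 38.2184],     # inverse of V[d], from e(V)
                 [38.2184, 427.3381]])
F = (d @ Vinv @ d) / 2                   # J = 2 restrictions
# F evaluates to about 2.54, matching the text.
```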

We can implement such tests using the test command in STATA. Following a regression, you can use test varname = q to test a single restriction f(b1, b2, ...) = q1, such as test LCV = 3. To test multiple restrictions, you must accumulate successive tests using the option accum. In our example, first type test LCV = 3, then type test _cons = 0, accum. This returns F = 2.54, p-value = .10.

ii. Choice of Specification: Energy Survey.

By a specification, we mean a specific set of independent variables included in a regression model. Because there is a tradeoff between efficiency (lost when irrelevant variables are included) and bias (incurred when relevant variables are excluded), researchers usually want to find the smallest possible model. In doing so, we often jettison variables that appear insignificant according to the rule of thumb that they have low values of the t-statistic. After running numerous regressions we may arrive at what we think is a good-fitting model.

The example of the Cube Law should reveal an important lesson. The t-statistic on a single variable is only a good guide about that variable. If you've made decisions about many variables, you might have made a mistake in choosing the appropriate set.

The F-test for the Wald criterion allows us to test whether an entire set of variables in fact ought to be omitted from a model. The decision to omit a set of variables amounts to a specific hypothesis: that each of the variables can be assumed to have a coefficient equal to 0.

Consider the data from the Global Warming Survey discussed earlier in the course (see handout). There are 5 variables in the full regression model that have relatively low t-statistics. To improve the estimates of the other coefficients, we ought to drop these variables. Also, we might want to test whether some of these factors should be dropped as a substantive matter.

The F-test is implemented by calculating the average loss of fit in the sum of squared residuals. The information in the ANOVA table can be used to compute:

F = [(1276.4912 - 1270.5196)/5] / 1.260436 = .9475

One may also use the commands test and test, accum.


Specification tests such as this one are very common.

iii. OLS versus Instrumental Variables: Using Term Limits to Correct for Strategic Retirement.

We may also use the Wald criterion to test across estimators. One hypothesis of particular importance is that there are no omitted variables in the regression model. We cannot generally detect this problem using regression diagnostics, such as graphs, but we may be able to construct a test.

Let $d = b_{IV} - b_{OLS}$. The Wald criterion is $d'V[d]^{-1}d$, where
\[
V[b_{IV} - b_{OLS}] = V[b_{IV}] + V[b_{OLS}] - Cov[b_{IV}, b_{OLS}] - Cov[b_{OLS}, b_{IV}].
\]
Hausman shows that $Cov[b_{OLS}, b_{IV}] = V[b_{OLS}]$ (the covariance of an efficient estimator with any consistent estimator equals the variance of the efficient estimator), so
\[
V[b_{IV} - b_{OLS}] = V[b_{IV}] - V[b_{OLS}].
\]

Let $\hat{X}$ be the set of predicted values from regressing X on Z. The F-test for OLS versus IV is
\[
H = \frac{d'\left[(\hat{X}'\hat{X})^{-1} - (X'X)^{-1}\right]^{-1} d}{J\, s^2}
\]
This is distributed F with J and n-K degrees of freedom, where J is the number of variables we have tried to fix with IV.
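A minimal sketch of the Hausman contrast in Python, assuming the two coefficient vectors and covariance matrices have already been estimated (the names are hypothetical placeholders). Under homoskedasticity, with both covariance matrices built on the same $s^2$, this is equivalent to the formula above.

    # A sketch of the Hausman test: Wald criterion on d = b_IV - b_OLS.
    import numpy as np
    from scipy import stats

    def hausman(b_iv, V_iv, b_ols, V_ols, n_minus_K):
        d = b_iv - b_ols
        Vd = V_iv - V_ols                  # Hausman: Cov(b_OLS, b_IV) = V(b_OLS)
        J = len(d)                         # number of coefficients contrasted
        H = d @ np.linalg.inv(Vd) @ d / J  # F-form of the Wald criterion
        p = 1 - stats.f.cdf(H, J, n_minus_K)
        return H, p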


    4. Non-Linear Models

4.1. Non-Linearity

We have discussed one important violation of the regression assumption: omitted variables. And we have touched on problems of inefficiency introduced by heteroskedasticity and autocorrelation. This and the following subsections deal with violations of the regression assumptions (other than the omitted variables problem). The current section examines corrections for non-linearity; the next section concerns discrete dependent variables. Following that we will deal briefly with weighting, heteroskedasticity, and autocorrelation. Time permitting, we will do a bit on sensitivity.

We encounter two sorts of non-linearity in regression analysis. In some problems non-linearity occurs among the X variables, but it can be handled using a linear form of non-linear functions of the X variables. In other problems non-linearity is inherent in the model: we cannot "linearize" the relationship between Y and X. The first sort of problem is sometimes called "intrinsically linear" and the second sort "intrinsically non-linear."

Consider, first, situations where we can convert a non-linear model into a linear form. In the Seats-Votes example, the basic model involved multiplicative and exponential parameters. We converted this into a linear form by taking logarithms. There are a wide range of non-linearities that can arise; indeed, there are an infinite number of transformations of variables that one might use. Typically we do not know the appropriate function and begin with a linear relationship between Y and X as the approximation of the correct relationship. We then stipulate possible non-linear relationships and implement transformations.

Common examples of non-linearities include:
Multiplicative models, where the independent variables enter the equation multiplicatively rather than linearly (such models are linear in logarithms);
Polynomial regressions, where the independent variables are polynomials of a set of variables (common in production function analysis in economics); and
Interactions, where some independent variables magnify the effects of other independent variables (common in psychological research).
In each of these cases, we can use transformations of the independent variables to construct a linear specification with which we can estimate all of the parameters of interest. A sketch of the first case appears below.

Qualitative independent variables and interaction effects are the simplest sort of non-linearity. They are simple to implement, but sometimes hard to interpret. Let's consider a simple case. Ansolabehere and Iyengar (1996) conducted a series of laboratory experiments involving approximately 3,000 subjects over 3 years. The experiments manipulated the content of political advertisements, nested the manipulated advertisements in the commercial breaks of videotapes of local newscasts, randomly assigned subjects to watch specific tapes, and then measured the political opinions and information of the subjects. These experiments are reported in the book Going Negative.

On page 190, they report the following table.

Effects of Party and Advertising Exposure on Vote Preferences: General Election Experiments

Independent               (1)           (2)
Variable                  Coeff. (SE)   Coeff. (SE)

Constant                  .100 (.029)   .102 (.029)
Advertising Effects
Sponsor                   .077 (.012)   .023 (.034)
Sponsor*Same Party        --            .119 (.054)
Sponsor*Opposite Party    --            .028 (.055)
Control Variables
Party ID                  .182 (.022)   .152 (.031)
Past Vote                 .339 (.022)   .341 (.022)
Past Turnout              .115 (.030)   .113 (.029)
Gender                    .115 (.029)   .114 (.030)


The dependent variable is +1 if the person stated an intention to vote Democratic after viewing the tape, -1 if the person stated an intention to vote Republican, and 0 otherwise. The variable Sponsor is the party of the candidate whose ad was shown; it equals +1 if the ad was a Democratic ad, -1 if the ad was a Republican ad, and 0 if no political ad was shown (control). Party ID is similarly coded using a trichotomy. Same Party was coded as +1 if a person was a Democrat and saw a Democratic ad or a Republican and saw a Republican ad. Opposite Party was coded as +1 if a person was a Democrat and saw a Republican ad or a Republican and saw a Democratic ad.

In the first column, the "persuasion" effect is estimated as the coefficient on the variable Sponsor. The estimate is .077. The interpretation is that exposure to an ad from a candidate increases support for that candidate by 7.7 percentage points.

The second column estimates interactions of the Sponsor variable with the Party ID variable. What is the interpretation of the set of three variables Sponsor, Sponsor*Same Party, and Sponsor*Opposite Party? The variables Same Party and Opposite Party encompass all party identifiers. When these variables equal 0, the viewer is a non-partisan. So, the coefficient on Sponsor in the second column measures the effect of seeing an ad among independent viewers: it increases support for the sponsor by only 2.3 percentage points. When Same Party equals 1, the coefficient on Sponsor*Same Party, .119, comes into play. This is not the effect of the ad among people of the same party; it is the difference between the Independents and those of the same party. To calculate the effect of the ad on people of the same party we must add .119 to .023, yielding an effect of .142, or a 14.2 percentage point gain.

Interactions such as these allow us to estimate different slopes for different groups, changes in trends, and other discrete changes in functional forms.
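The arithmetic of combining the column (2) coefficients can be laid out explicitly; the following sketch simply adds the base Sponsor coefficient to each interaction term from the table.

    # The column (2) coefficients from the table above, combined by hand:
    b_sponsor, b_same, b_opposite = 0.023, 0.119, 0.028

    effects = {
        "Independents":   b_sponsor,               # interactions are 0
        "Same party":     b_sponsor + b_same,      # .142, a 14.2-point gain
        "Opposite party": b_sponsor + b_opposite,  # .051
    }
    print(effects)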

Another class of non-linear specifications takes the form of polynomials. Many theories of behavior begin with a conjecture of an inherently non-linear function. For instance, a firm's production function is thought to exhibit decreasing marginal returns on investments, capital or labor. Also, risk aversion implies a concave utility function.


Barefoot empiricism sometimes leads us to non-linearity, too. Examination of data, either a scatter plot or a residual plot, may reveal a non-linear relationship between Y and X. While we do not know what the right functional form is, we can capture the relationship in the data by including additional variables that are powers of X, such as $X^{1/2}$, $X^2$, and $X^3$, as well as X. In other words, to capture the non-linear relationship, y = g(x), we approximate the unknown function, g(x), using a polynomial of the values of x.

One note of caution. Including powers of independent variables often leads to collinearity among the right-hand-side variables. One trick for breaking such collinearity is to deviate the independent variables from their means before transforming them, as the sketch below illustrates.
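A quick simulated illustration of the trick (values invented): the correlation between x and its square is nearly 1 when x is bounded away from 0, but nearly 0 once x is centered.

    # Simulated illustration: centering x before squaring breaks collinearity.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.2, 0.8, 300)       # e.g., a vote share
    xc = x - x.mean()

    print(np.corrcoef(x, x**2)[0, 1])    # close to 1
    print(np.corrcoef(xc, xc**2)[0, 1])  # close to 0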

Example: Candidate Convergence in Congressional Elections.

An important debate in the study of congressional elections concerns how well candidates represent their districts. Two conjectures concern the extent to which candidates reflect their districts. First, are candidates responsive to districts? Are candidates in more liberal districts more liberal? Second, does competitiveness lead to closer representation of district preferences? In highly competitive districts, are the candidates more "converged"? Huntington posits greater divergence of candidates in competitive areas: a sort of clash of ideas.

Ansolabehere, Snyder, and Stewart (2001) analyze data from a survey of congressional candidates on their positions on a range of issues during the 1996 election. There are 292 districts in which both of the competing candidates filled out the survey. The authors constructed a measure of candidate ideology from these surveys, and then examine the midpoint between the two candidates (the cutpoint at which a voter would be indifferent between the candidates) and the ideological gap between the candidates (the degree of convergence). To measure electoral competitiveness the authors used the average share of the vote won by the Democratic candidate in the prior two presidential elections (1988 and 1992).

The Figures show the relationship between Democratic Presidential Vote share and, respectively, candidate midpoints and the ideological gap between competing candidates. There is clear evidence of a non-linear relationship explaining the gap between the candidates.

The table presents a series of regressions in which the Midpoint and the Gap are predicted using quadratic functions of the Democratic Presidential Vote plus an indicator of Open Seats, which tend to be highly competitive. We consider, separately, a specification using the value of the independent variable and its square and a specification using the value of the independent variable deviated from its mean and its square. The last column uses the absolute value of the deviation of the presidential vote from .5 as an alternative to the quadratic.

Effects of District Partisanship on Candidate Positioning
N = 292

                                  Dependent Variable
                            Midpoint                     Gap
Independent                 (1)           (2)            (3)            (4)            (5)
Variable                    Coeff. (SE)   Coeff. (SE)    Coeff. (SE)    Coeff. (SE)    Coeff. (SE)
Democratic Presidential     -.158 (.369)  --             -2.657 (.555)  --             --
  Vote
Democratic Presidential     -.219 (.336)  --             +2.565 (.506)  --             --
  Vote Squared
Dem. Pres. Vote,            --            -.377 (.061)   --             -.092 (.092)   --
  Mean Deviated
Dem. Pres. Vote,            --            -.219 (.336)   --             +2.565 (.506)  --
  Mean Dev. Squared
Absolute Value of Dem.      --            --             --             --             .764 (.128)
  Pres. Vote, Mean Dev.
Open Seat                   .007 (.027)   .007 (.027)    -.024 (.040)   -.024 (.040)   -.017 (.039)
Constant                    .649 (.098)   .515 (.008)    1.127 (.147)   .440 (.012)    .406 (.015)
R2                          .156          .156           .089           .089           .110
sqrt(MSE)                   .109          .109           .164           .164           .162
F (p-value)                 18.0 (.00)    18.0 (.00)     9.59 (.00)     9.59 (.00)     12.46 (.00)


[Figure 1: Midpoint Between Candidate Ideological Positions (0 to 1), plotted against Average Dem. Presidential Vote Share (.2 to .8).]


[Figure 2: Ideological Gap Between Candidates (0 to 1), plotted against Average Dem. Presidential Vote Share (.2 to .8).]


Consider, first, the regressions explaining the midpoint (columns 1 and 2). We expect that the more liberal the district, the farther to the left the candidates will tend. The ideology measure is oriented such that higher values are more conservative positions. Figure 1 shows a strong relationship consistent with the argument.

Column (1) presents the estimates when we naively include Presidential Vote and Presidential Vote Squared. This is a good example of what collinearity looks like. The F-test shows that the regression is "significant", i.e., not all of the coefficients are 0. But neither the coefficient on Presidential Vote nor the coefficient on Presidential Vote Squared is significant. Tell-tale collinearity. A trick for breaking the collinearity in this case is deviating X from its mean. Doing so, we find a significant effect on the linear coefficient, but the quadratic coefficient doesn't change. There is really only one free coefficient here, and it looks to be linear.

The coefficients in a polynomial regression measure the partial derivatives of the unknown function evaluated at the mean of X. Taylor's approximation leads us immediately to this interpretation of the coefficients for polynomial models. Recall from Calculus that Taylor's Theorem allows us to express any function as a sum of derivatives of that function evaluated at a specific point. We may choose the accuracy of the approximation by selecting the degree of the expansion: a first-order approximation uses the first derivatives; a second-order approximation uses the second derivatives; and so on. A second-order approximation of an unknown function, f(x), expanded around the point $x_0$, may be expressed as:

\[
y_i \approx \alpha + \beta' x_i + \frac{1}{2}\, x_i' H_0\, x_i,
\]
where
\[
g_0 = \left[\frac{\partial f(x)}{\partial x}\right]_{x = x_0}, \qquad
H_0 = \left[\frac{\partial^2 f(x)}{\partial x\, \partial x'}\right]_{x = x_0},
\]
\[
\alpha = f(x_0) - g_0' x_0 + \frac{1}{2}\, x_0' H_0\, x_0, \qquad
\beta = g_0 - H_0\, x_0.
\]


The coefficients on the squares and cross-product terms, then, capture the approximate second derivatives. The coefficients on the linear values of x equal the gradient, adjusted for the curvature around the point at which the data are evaluated ($\beta = g_0 - H_0 x_0$). If we deviate all of the variables from their means first, then the coefficient on x in the polynomial regression can be interpreted straightforwardly as the gradient of the (unknown) function at the mean.

Example: Candidate Convergence in Congressional Elections, continued.

Consider the estimates in column (3). We may analyze these coefficients to ask several basic questions. What is the marginal rate of change? $\partial y/\partial DP = \beta_1 + 2\beta_2 DP$. Since DP ranges from .2 to .8, the rate of change in the Gap for a change in electoral competition ranges from -1.62 when DP = .2 to +1.42 when DP = .8. At what point does the rate of change equal 0 (what is the minimum)? Setting the partial equal to 0 reveals that $DP_0 = -\beta_1/2\beta_2 = .517$, so the candidates are most converged near .5. What is the predicted value of the gap at this point? $1.127 - 2.657(.517) + 2.565(.517)^2 = .438$. At the extremes, the predicted values are .64 (when DP = .8) and .70 (when DP = .2).

If we must choose among several transformations, such as logarithms, inverses, and polynomials, we typically cannot test which is most appropriate using the Wald test. Davidson and MacKinnon (1981) propose using the predicted values from one of the non-linear analyses to construct a test. Suppose we have two alternative models: $y = X\beta$ and $y = Z\gamma$. Consider a compound model:

\[
y = (1 - \lambda)X\beta + \lambda Z\gamma + \epsilon.
\]

The J-test is implemented in two steps. First, estimate $y = Z\gamma$ to get the predicted values $Z\hat{\gamma}$. Then, include $Z\hat{\gamma}$ in the regression of y on X. The coefficient on $Z\hat{\gamma}$ yields an estimate of $\lambda$. Asymptotically, the t-ratio $\hat{\lambda}/SE(\hat{\lambda})$ tests which model is appropriate: if $\hat{\lambda}$ is indistinguishable from 0, the X model is favored.

Example: Candidate Convergence in Congressional Elections, continued.

An alternative to the quadratic specification in column (4) is the absolute value specification in column (5). Notice that the R2 is slightly better with the absolute value. However, owing to collinearity, it is hard to test which model is more appropriate. I first regress Gap on DP, $DP^2$, and Open and generate predicted values, call these GapHat. I then regress Gap on GapHat, |DP - .5|, and Open. The coefficient on |DP - .5| is 1.12 with a standard error of .40, so the t-ratio is above the critical value for statistical significance. The coefficient on GapHat equals -.55 with a standard error of .58. The estimated $\lambda$ is not significantly different from 0; hence, we favor the absolute value specification.
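A minimal sketch of this two-step J-test in Python, assuming the statsmodels package is available; gap, absdev, dp, and open_seat are hypothetical arrays standing in for the variables above, not the authors' data or code.

    # A sketch of the J-test: add the rival model's fitted values to the
    # maintained model and examine the t-ratio on lambda-hat.
    import numpy as np
    import statsmodels.api as sm

    def jtest(y, X_maintained, X_rival):
        """t-ratio on lambda-hat: rival fitted values added to maintained model."""
        yhat_rival = sm.OLS(y, sm.add_constant(X_rival)).fit().fittedvalues
        aug = sm.OLS(y, sm.add_constant(
            np.column_stack([X_maintained, yhat_rival]))).fit()
        return aug.tvalues[-1]

    # e.g., X_maintained = np.column_stack([absdev, open_seat]) with
    # absdev = np.abs(dp - .5), and X_rival = np.column_stack([dp, dp**2, open_seat]).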