PLSC 503_spring 2013_lecture 1


Causality / Statistical inference

PLSC 503: Quantitative Methods, Week 1

Thad Dunning
Department of Political Science

Yale University

Lecture Notes, Week 1: Causal Inference and Potential Outcomes

Lecture Notes, Week 1 1/ 41

Introduction to 503

Social scientists use many methods, for many different purposes.

Quantitative analysis can play several roles:

- Description, e.g., conceptualization and measurement
- Causal inference

The latter is perhaps the trickiest.

This course introduces causal and statistical models for quantitative analysis, and places emphasis on the importance of strong research design.

- It is important to master technique; it is even more important to understand core assumptions.


Organization of the course

Causal and statistical inference under the potential outcomes model

Regression as a descriptive tool (bivariate and multivariate, in scalar and matrix form)

Regression models: causal and statistical inference

(Spring break)

Various topics in the design and analysis of experimental and observational data:

- E.g., difference-in-differences designs, matching, natural experiments


An application / Defining causality / Potential outcomes / Causal inference

One Design for Causal Inference: Does Voter Pressure Shape Turnout?

Why people fail to vote—and why they vote at all—are both puzzles for many social scientists.

- Maybe people have intrinsic incentives that lead them to vote (a sense of duty?). Or maybe they respond to peer pressure.
- However, testing hypotheses about what causes people to vote is challenging.

Gerber and Green have conducted many experimental studies to assess what factors influence turnout, e.g., phone calls, door-to-door contacts, and social pressure.

- Before the August 2006 primary election in Michigan, 180,000 households were assigned either to a control group, or to receive one of four mailings regarding the election.
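The assignment step can be sketched in code. This is a minimal, hypothetical version with equal group sizes for simplicity; the published study used an unequal allocation, with a much larger control group.

```python
import random

# Conditions in the mailing experiment: control plus four mailings.
conditions = ["Control", "Civic Duty", "Hawthorne", "Self", "Neighbors"]

def assign(households, seed=2006):
    """Randomly partition households into equal-sized condition groups.

    Equal blocks are an illustrative simplification; the actual study
    assigned far more households to control than to each treatment.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = households[:]
    rng.shuffle(shuffled)
    k = len(shuffled) // len(conditions)
    return {c: shuffled[i * k:(i + 1) * k] for i, c in enumerate(conditions)}

groups = assign(list(range(180_000)))
print({c: len(g) for c, g in groups.items()})
# Each of the five groups gets 36,000 households.
```

Because assignment is random, the groups differ only by chance in their background characteristics, which is what licenses the simple comparisons that follow.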


Civic Duty mailing

[Scan of the "Civic Duty" mailing, from Appendix A of Gerber, Green, and Larimer (2008), American Political Science Review; sent by Practical Political Consulting, East Lansing, MI. The recoverable text reads:]

Dear Registered Voter:

DO YOUR CIVIC DUTY AND VOTE!

Why do so many people fail to vote? We've been talking about this problem for years, but it only seems to get worse.

The whole point of democracy is that citizens are active participants in government; that we have a voice in government. Your voice starts with your vote. On August 8, remember your rights and responsibilities as a citizen. Remember to vote.

DO YOUR CIVIC DUTY – VOTE!

"Hawthorne effect" mailing

[Scan of the "Hawthorne" mailing from Gerber, Green, and Larimer (2008). The recoverable text reads:]

Dear Registered Voter:

YOU ARE BEING STUDIED!

Why do so many people fail to vote? We've been talking about this problem for years, but it only seems to get worse.

This year, we're trying to figure out why people do or do not vote. We'll be studying voter turnout in the August 8 primary election.

Our analysis will be based on public records, so you will not be contacted again or disturbed in any way. Anything we learn about your voting or not voting will remain confidential and will not be disclosed to anyone else.

DO YOUR CIVIC DUTY – VOTE!

Self (own-record) mailing

[Scan of the "Self" mailing from Gerber, Green, and Larimer (2008). The recoverable text reads:]

Dear Registered Voter:

WHO VOTES IS PUBLIC INFORMATION!

Why do so many people fail to vote? We've been talking about the problem for years, but it only seems to get worse.

This year, we're taking a different approach. We are reminding people that who votes is a matter of public record.

The chart shows your name from the list of registered voters, showing past votes, as well as an empty box which we will fill in to show whether you vote in the August 8 primary election. We intend to mail you an updated chart when we have that information.

We will leave the box blank if you do not vote.

DO YOUR CIVIC DUTY – VOTE!

[A chart follows, listing each registered voter in the household with a "Voted" entry or a blank for each recent election.]

Neighbors/Social Pressure mailing

[Scan of the "Neighbors" mailing from Gerber, Green, and Larimer (2008). The recoverable text reads:]

Dear Registered Voter:

WHAT IF YOUR NEIGHBORS KNEW WHETHER YOU VOTED?

Why do so many people fail to vote? We've been talking about the problem for years, but it only seems to get worse. This year, we're taking a new approach. We're sending this mailing to you and your neighbors to publicize who does and does not vote.

The chart shows the names of some of your neighbors, showing which have voted in the past. After the August 8 election, we intend to mail an updated chart. You and your neighbors will all know who voted and who did not.

DO YOUR CIVIC DUTY – VOTE!

[A chart follows, listing neighbors on the recipient's street by name, with a "Voted" entry or a blank for each recent election.]

Estimated Treatment Effects

[Scan of Gerber, Green, and Larimer (2008), "Social Pressure and Voter Turnout," including Table 2, "Effects of Four Mail Treatments on Voter Turnout in the August 2006 Primary Election," which reports the percentage voting and the number of individuals in the Control, Civic Duty, Hawthorne, Self, and Neighbors groups. In the accompanying text: the Civic Duty mailing provides a baseline because it does little besides emphasize civic duty; the Hawthorne mailing adds a mild form of social pressure (observation by researchers, with no disclosure); the Self mailing adds disclosure of the household's own voting records; and the Neighbors mailing applies maximal social pressure by also listing the voting records of those living nearby. In the published results, the control group voted at a rate of 29.7%; the Civic Duty group at 31.5% (a 1.8 percentage-point effect); the Hawthorne group at 32.2% (2.5 points); the Self group at 34.5% (4.9 points); and the Neighbors group at 37.8%, "a remarkable 8.1 percentage-point treatment effect." The excerpt notes that this effect is bigger than any mail effect gauged by a randomized experiment, exceeds the effect of live phone calls, and rivals face-to-face canvassing, and that in terms of sheer cost efficiency the social-pressure mailings far outstrip door-to-door canvassing and phone banks.]
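The reported effects are simply each treatment group's turnout rate minus the control rate. A quick arithmetic check, using the turnout percentages published in the paper's table (note that differencing the rounded rates gives 4.8 for Self, where the article's unrounded data yield 4.9):

```python
# Turnout rates (percent) as published in Gerber, Green, and Larimer (2008).
rates = {"Control": 29.7, "Civic Duty": 31.5, "Hawthorne": 32.2,
         "Self": 34.5, "Neighbors": 37.8}

# Estimated effect of each mailing: its turnout rate minus the control rate.
control = rates["Control"]
effects = {group: round(rate - control, 1)
           for group, rate in rates.items() if group != "Control"}
print(effects)
# {'Civic Duty': 1.8, 'Hawthorne': 2.5, 'Self': 4.8, 'Neighbors': 8.1}
```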

Some points about the analysis

The analysis can be extremely simple: a difference of means may be just the right tool.

This is design-based inference:

- Confounding is controlled through ex-ante research design choices—not through ex-post statistical adjustment.
- This simple analysis rests on a model that is often, though not always, quite credible.

This contrasts with many conventional model-based approaches, which instead rely on regression modeling to approximate an experimental ideal.

- "The power of multiple regression analysis is that it allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed" (Wooldridge 2009: 77).
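To make the "difference of means" point concrete, here is a minimal sketch on simulated data; the turnout probabilities are illustrative stand-ins, not the study's figures.

```python
import random

def difference_in_means(treated, control):
    """Estimated average treatment effect: mean(treated) - mean(control)."""
    return sum(treated) / len(treated) - sum(control) / len(control)

# Illustrative data: 1 = voted, 0 = did not vote. The 0.30 and 0.38
# turnout probabilities are hypothetical, chosen only for the example.
random.seed(503)
control = [1 if random.random() < 0.30 else 0 for _ in range(10_000)]
neighbors = [1 if random.random() < 0.38 else 0 for _ in range(10_000)]

ate_hat = difference_in_means(neighbors, control)
print(f"Estimated effect: {ate_hat:.3f}")  # about 0.08 in expectation
```

Under randomization, this estimator is unbiased for the average treatment effect without any regression adjustment, which is exactly the sense in which the design, not the model, does the work.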


Strengths and limitations of strong research design

Strong designs can improve causal inferences in diverse substantive contexts.

- The statistics can be simple, transparent, and credible.

Yet, they also have important limitations:

- External validity issues; also, interventions may or may not be substantively relevant.
- In practice, the analysis may be more or less design-based.

How best to maximize the promise—and minimize the pitfalls—is our focus.

Whatever kind of research you do, considering these issues can help you think more clearly about research design and causal inference.

Page 32: PLSC 503_spring 2013_lecture 1

CausalityStatistical inference

An applicationDefining causalityPotential outcomesCausal inference

Strengths and limitations of strong research design

Strong designs can improve causal inferences in diversesubstantive contexts.

I The statistics can be simple, transparent, and credible.

Yet, they also have important limitations

I External validity issues; also, interventions may or may not besubstantively relevant.

I In practice, the analysis may be more or less design-based.

How best to maximize the promise—and minimize thepitfalls—is our focus.

Whatever kind of research you do, considering these issuescan help you think more clearly about research design andcausal inference

Lecture Notes, Week 1 11/ 41


Page 39: PLSC 503_spring 2013_lecture 1

Defining and Measuring Causality

Three types of questions arise in philosophical discussions of causality (Brady):

1. How do people understand causality when they use the concept? (Psychological/linguistic)

2. What is causality? (Metaphysical/ontological)

3. How do we discover when causality is operative? (Epistemological/inferential)

Lecture Notes, Week 1 12/ 41


Page 42: PLSC 503_spring 2013_lecture 1

1. Counterfactuals

A counterfactual statement contains a false premise, and an assertion of what would have occurred had the premise been true:

I If an economic stimulus plan had not been adopted, then X . . . (where X might be “the economy would not have recovered” or “the economy would be in worse shape today”).

For any cause C, the causal effect of C is the difference between what would happen in two states of the world: e.g., one in which C is present and one in which C is (counterfactually) absent.

Counterfactuals play a critical role in causal inference—though they aren’t always sufficient:

I “If the storm had not occurred, the mercury in the barometer would not have fallen” is a valid counterfactual statement, yet changes in air pressure, not storms, are the cause of falling mercury in barometers.

Lecture Notes, Week 1 13/ 41


Page 47: PLSC 503_spring 2013_lecture 1

2. Manipulation

Causation as forced movement (Lakoff)

I E.g., children learn about causation by dropping a fork

When combined with counterfactuals, manipulation provides a strong criterion:

I Does playing basketball make children grow tall? No. Intuitively, that’s because the following statement doesn’t make sense:

I If we had intervened to make the children play basketball, they would have grown tall.

(Here is also a place where mechanistic understandings of causality come into play.)

Lecture Notes, Week 1 14/ 41


Page 53: PLSC 503_spring 2013_lecture 1

Potential Outcomes

In statistics, an idea that combines counterfactuals and manipulation (Neyman; Rubin; Holland).

Imagine, e.g., an experiment with two treatment conditions, say, a treatment and a control group.

I The potential outcome under treatment Yi(1) is the outcome some unit i would experience if assigned to treatment.

I The potential outcome under control Yi(0) is the outcome i would experience if assigned to control.

The unit causal effect is the difference between these two outcomes:

Yi(1) − Yi(0)

This parameter is not directly observable, because we see Yi(1) or Yi(0) but not both. (The “fundamental problem of causal inference”—Holland.)

Lecture Notes, Week 1 15/ 41
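To make the notation concrete, here is a minimal Python sketch of unit causal effects and the fundamental problem; the potential-outcome values are invented purely for illustration.

```python
# Hypothetical potential outcomes for three units (illustrative values only).
Y1 = [5, 3, 8]  # Yi(1): outcome each unit would have under treatment
Y0 = [2, 3, 4]  # Yi(0): outcome each unit would have under control

# The unit causal effect Yi(1) - Yi(0) for each unit.
unit_effects = [y1 - y0 for y1, y0 in zip(Y1, Y0)]
print(unit_effects)  # [3, 0, 4]

# The fundamental problem: we observe only one potential outcome per unit.
# If unit i is assigned to treatment (Ti = 1) we see Yi(1); otherwise Yi(0).
T = [1, 0, 1]
observed = [y1 if t == 1 else y0 for t, y1, y0 in zip(T, Y1, Y0)]
print(observed)  # [5, 3, 8]
```

In any real study only the `observed` list exists; the `unit_effects` list is never available, which is exactly Holland's point.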


Page 59: PLSC 503_spring 2013_lecture 1

Average Causal Effects

Attention often focuses instead on average causal effects, e.g., for some units i = 1, . . . , N,

(1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)].

This parameter is the difference between two counterfactuals: the average outcome if all units were assigned to treatment, minus the average if all units were assigned to control.

The Neyman model is a causal model: it stipulates how units respond when they are assigned to treatment or control.

Such response schedules play a critical role in causal inference.

Lecture Notes, Week 1 16/ 41
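The equivalence between the average of the unit effects and the difference of the two counterfactual averages can be checked directly; the numbers below are hypothetical.

```python
Y1 = [5, 3, 8, 6]  # hypothetical Yi(1) values
Y0 = [2, 3, 4, 3]  # hypothetical Yi(0) values
N = len(Y1)

# Average causal effect: (1/N) * sum over i of [Yi(1) - Yi(0)].
ace = sum(y1 - y0 for y1, y0 in zip(Y1, Y0)) / N

# Equivalently: the average outcome if all units were treated,
# minus the average outcome if all units were controls.
ace_alt = sum(Y1) / N - sum(Y0) / N
print(ace, ace == ace_alt)  # 2.5 True
```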


Page 63: PLSC 503_spring 2013_lecture 1

A missing data problem

Notice that the fundamental problem of inference applies to average causal effects:

(1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)].

If we assign all N units to treatment, we don’t see Yi(0) for any i. Similarly if all N units go to control.

Social science often relies on empirical comparisons—e.g., between some set of units for whom we observe Yi(1) and another set for whom we observe Yi(0).

One group serves as the counterfactual for the other—which may help us overcome this fundamental problem.

Lecture Notes, Week 1 17/ 41
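Viewed as a missing data problem, each assignment vector masks one column of the potential-outcomes table. A small sketch (hypothetical numbers), with `None` marking the missing entries:

```python
Y1 = [5, 3, 8, 6]  # hypothetical Yi(1) values
Y0 = [2, 3, 4, 3]  # hypothetical Yi(0) values
T  = [1, 1, 0, 0]  # assignment: first two units treated, last two control

# Treated units reveal Yi(1) but their Yi(0) is missing; vice versa for controls.
observed_Y1 = [y1 if t == 1 else None for t, y1 in zip(T, Y1)]
observed_Y0 = [y0 if t == 0 else None for t, y0 in zip(T, Y0)]
print(observed_Y1)  # [5, 3, None, None]
print(observed_Y0)  # [None, None, 4, 3]
```

The control group's observed Yi(0)s stand in for the treated group's missing ones, and conversely.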


Page 69: PLSC 503_spring 2013_lecture 1

Random assignment and the missing data problem

Randomization is one way to solve the missing data problem.

The logic of random sampling is key.

The question is how we can use sample data—statistics—to estimate parameters—like the average causal effect.

To understand the sampling process, we need a box model.

Lecture Notes, Week 1 18/ 41


Page 73: PLSC 503_spring 2013_lecture 1

A box model for experiments and natural experiments

[Figure: the study group drawn as a box of tickets, one per unit, each ticket carrying both potential outcomes Yi(1) and Yi(0); the treatment group reveals the Yi(1) values, and the control group reveals the Yi(0) values.]

Lecture Notes, Week 1 19/ 41

Page 74: PLSC 503_spring 2013_lecture 1

The average causal effect

[Figure: the same box model of the study group, treatment group, and control group, with the estimand annotated.]

The estimand: (1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)]

Lecture Notes, Week 1 20/ 41

Page 75: PLSC 503_spring 2013_lecture 1

Estimating the average causal effect

[Figure: the same box model of the study group, treatment group, and control group, annotated with the estimand and the estimator.]

The estimand: (1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)]

An unbiased estimator:

(1/m) ∑_{i=1}^{m} [Yi | Ti = 1] − (1/(N − m)) ∑_{i=m+1}^{N} [Yi | Ti = 0]

where Ti is an indicator for treatment assignment. Under this model, a random subset of size m < N units is assigned to treatment. The units assigned to the treatment group are indexed from 1 to m.

Lecture Notes, Week 1 21/ 41
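A sketch of the difference-of-means estimator under this indexing convention, with made-up potential outcomes; note that a single random assignment need not hit the estimand exactly.

```python
Y1 = [5, 3, 8, 6, 4, 7]  # hypothetical potential outcomes under treatment
Y0 = [2, 3, 4, 3, 1, 5]  # hypothetical potential outcomes under control
N, m = len(Y1), 3        # units 1..m assigned to treatment, m+1..N to control

# Treated units reveal Yi(1); control units reveal Yi(0).
treated_mean = sum(Y1[:m]) / m         # (1/m) * sum of observed Yi with Ti = 1
control_mean = sum(Y0[m:]) / (N - m)   # (1/(N-m)) * sum of observed Yi with Ti = 0
estimate = treated_mean - control_mean

true_ace = sum(y1 - y0 for y1, y0 in zip(Y1, Y0)) / N
print(round(estimate, 2), true_ace)  # 2.33 2.5 -- one draw need not equal the estimand
```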

Page 76: PLSC 503_spring 2013_lecture 1

The expected value, and definition of unbiasedness

With a simple random sample, the expected value of the sample mean equals the population mean.

I So the expected value of the mean of the Yi(1)s in the assigned-to-treatment group equals the average in the study group (the population).

I Same with the control group: the sample mean of the Yi(0)s in the assigned-to-control group, on average, will equal the true population mean.

In any given experiment, the control group mean may be too high or too low (as may the treatment group mean).

I But across infinitely many (hypothetical) replications of the sampling process, the average of the sample averages will equal the true average in the study group.

Similarly, the expected value of the difference of sample averages equals the average difference in the population.

I The difference-of-means estimator is unbiased for the average causal effect.

Lecture Notes, Week 1 22/ 41
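For a tiny study group, unbiasedness can be verified by brute force (hypothetical numbers): average the difference-of-means estimate over every equally likely assignment of m units to treatment, and the result equals the average causal effect exactly.

```python
import itertools

Y1 = [5, 3, 8, 6]  # hypothetical Yi(1) values
Y0 = [2, 3, 4, 3]  # hypothetical Yi(0) values
N, m = len(Y1), 2

true_ace = sum(y1 - y0 for y1, y0 in zip(Y1, Y0)) / N

# Enumerate every way of assigning m of the N units to treatment;
# under simple random assignment each is equally likely.
estimates = []
for treated in itertools.combinations(range(N), m):
    t_mean = sum(Y1[i] for i in treated) / m
    c_mean = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)
    estimates.append(t_mean - c_mean)

# The average of the estimates over all assignments is the expected value
# of the estimator, and it recovers the true average causal effect.
expected = sum(estimates) / len(estimates)
print(expected, expected == true_ace)  # 2.5 True
```

This is the "infinitely many hypothetical replications" idea made finite: with a small box, the replications can simply be enumerated.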

Page 77: PLSC 503_spring 2013_lecture 1

CausalityStatistical inference

An applicationDefining causalityPotential outcomesCausal inference

The expected value, and definition of unbiasedness

With a simple random sample, the expected value for thesample mean equals the population mean.

I So the expected value of the mean of the Yi(1)s in theassigned-to-treatment group equals the average in the studygroup (the population).

I Same with the control group: the sample mean of the Yi(0)s inthe assigned-to-control group, on average, will equal the truepopulation mean.

In any given experiment, the control group mean may be toohigh or too low (as may the treatment group mean).

I But across infinitely many (hypothetical) replications of thesampling process, the average of the sample averages willequal the true average in the study group.

Similarly, the expected value of the difference of sampleaverages equals the average difference in the population.

I The difference-of-means estimator is unbiased for theaverage causal effect.

Lecture Notes, Week 1 22/ 41


Assumptions of the Neyman model

The Neyman urn model involves several assumptions:

I As-if Random: Units are sampled at random from the study group and assigned to treatment or control.

I Non-Interference/SUTVA: Each unit's outcome depends only on its own treatment assignment, not on the assignment of other units. (Analogue to regression models: unit i's response depends on i's covariate values and error term.)

I Exclusion restriction: Treatment assignment affects outcomes only through treatment receipt.

N.B.: As we will see, these are also assumptions of standard regression models.

Important question: how can the validity of these assumptions be probed?

Lecture Notes, Week 1 23/ 41


A box model for rolling a die

Let’s talk a bit more about random variables and their expectations.

A six-sided fair die has an equal probability of landing 1, 2, 3, 4, 5, or 6 each time it is rolled.

A single roll of the die can thus be modeled as a draw at random from a box of tickets:

[Box of tickets: 1 2 3 4 5 6, with ? marking one ticket drawn at random]

Lecture Notes, Week 1 24/ 41
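As a quick sketch (hypothetical code, not part of the lecture), the box model can be simulated directly: a roll is one draw at random from the box, and over many draws each ticket comes up about 1/6 of the time:

```python
import random
random.seed(1)

box = [1, 2, 3, 4, 5, 6]    # one ticket per face of the die
roll = random.choice(box)   # a single roll = one draw at random from the box

# Each ticket has chance 1/6: check with many draws.
draws = [random.choice(box) for _ in range(60000)]
freqs = {t: draws.count(t) / len(draws) for t in box}
# every relative frequency in freqs is near 1/6
```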


Random variables and observed values

A ticket drawn at random from the box is a random variable.

I Definition: a random variable is a chance procedure for generating a number.

The value of a particular draw is an observed value (or realization) of this random variable.

[Box: 1 2 3 4 5 6; observed draw: 4 = X]

Lecture Notes, Week 1 25/ 41


Another random variable

Suppose we discard all rolls of the die that land 5 or 6.

Now we’ve got a new random variable:

[Box: 1 2 3 4; ? = Y]

Lecture Notes, Week 1 26/ 41


Operations of random variables

An arithmetic operation on draws from boxes makes a new random variable.

For example, define a new random variable Z in terms of X and Y:

[Box X: 1 2 3 4 5 6; draw ?] + [Box Y: 1 2 3 4; draw ?] → Y + X = Z

Lecture Notes, Week 1 27/ 41


Manipulating random variables

Other operations are fine, too.

[Figure: boxes X (1 2 3 4 5 6) and Y (1 2 3 4) combined by another arithmetic operation to define a new random variable W]

Lecture Notes, Week 1 28/ 41


Expectations

The expected value of a random variable is a number.

I If X is a random draw from a box of numbered tickets, then E(X) is the average of the tickets in the box.

The observed value of the random variable will be somewhere around this number: sometimes too high, sometimes too low.

Consider a previous example:

1. E(X) = ?
2. E(Y) = ?
3. E(2X + 3Y) = ?
4. Does E(2X + 3Y) = 2E(X) + 3E(Y)?

[Box X: 1 2 3 4 5 6; draw ?] + [Box Y: 1 2 3 4; draw ?] → Y + X = Z

Lecture Notes, Week 1 29/ 41
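These questions can be checked numerically. A minimal sketch, assuming the boxes above (X drawn from {1,...,6}, Y from {1,...,4}): E(X) = 3.5, E(Y) = 2.5, and by linearity E(2X + 3Y) = 2·3.5 + 3·2.5 = 14.5, which a simulation confirms:

```python
import random
random.seed(2)

x_box = [1, 2, 3, 4, 5, 6]   # box for X
y_box = [1, 2, 3, 4]         # box for Y (5s and 6s discarded)

E_X = sum(x_box) / len(x_box)   # 3.5
E_Y = sum(y_box) / len(y_box)   # 2.5

# Simulate E(2X + 3Y) and compare with 2*E(X) + 3*E(Y) = 14.5.
n = 100000
sims = [2 * random.choice(x_box) + 3 * random.choice(y_box) for _ in range(n)]
sim_mean = sum(sims) / n   # lands near 14.5
```

Note that linearity of expectation holds whether or not X and Y are independent.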


Independence and dependence

Suppose X and Y are random variables. Pretend you know the value of X. Do the chances of Y depend on that value? If so, X and Y are dependent. If not, they are independent.

Are X and Y dependent or independent?

[Box of six tickets, each listing an X value over a Y value: (2,4) (2,2) (2,4) (4,9) (2,4) (9,4); one ticket is drawn at random]

Lecture Notes, Week 1 30/ 41


Drawing with and without replacement

Suppose we make two draws from the box below.

I If the draws are made with replacement, are they independent?

I If the draws are made without replacement, are they independent?

[Box: 1 2 3 4 5 6; ? ? = two draws]

Lecture Notes, Week 1 31/ 41
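A simulation sketch of the distinction (hypothetical code): condition on the first draw being 6 and look at the second draw. Without replacement the second draw can never be 6, so the draws are dependent; with replacement the second draw's chances are unchanged at 1/6:

```python
import random
random.seed(3)

box = [1, 2, 3, 4, 5, 6]
n = 100000

# Without replacement: given the first draw is 6, the second is never 6.
wo = [random.sample(box, 2) for _ in range(n)]   # two distinct tickets
seconds = [b for a, b in wo if a == 6]
p_wo = seconds.count(6) / len(seconds)           # exactly 0

# With replacement: the first draw tells us nothing about the second.
wr = [(random.choice(box), random.choice(box)) for _ in range(n)]
seconds = [b for a, b in wr if a == 6]
p_wr = seconds.count(6) / len(seconds)           # near 1/6
```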


Conditional versus unconditional probabilities

Suppose we make two draws without replacement from the box below.

I What is the chance that the second draw is 4?

I What is the chance that the second draw is 4, given that the first draw is 3?

[Box: 1 2 3 4]

Lecture Notes, Week 1 32/ 41


Conditional versus unconditional probabilities

In answering such questions, it is helpful to map out all possible outcomes of the two draws.

                  second draw
                  1     2     3     4
  first draw  1   n.a.  12    13    14
              2   21    n.a.  23    24
              3   31    32    n.a.  34
              4   41    42    43    n.a.

In the table, “12” means that 1 is the observed value of the first draw and 2 is the observed value of the second draw.

Lecture Notes, Week 1 33/ 41
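The table can also be enumerated in code. A small sketch: the 12 ordered pairs are equally likely, so the unconditional chance that the second draw is 4 is 3/12 = 1/4, while given that the first draw is 3 it rises to 1/3:

```python
from itertools import permutations

# All ordered outcomes of two draws without replacement from [1, 2, 3, 4]:
pairs = list(permutations([1, 2, 3, 4], 2))   # 12 equally likely pairs

# Unconditional: chance the second draw is 4
p_second_4 = sum(1 for a, b in pairs if b == 4) / len(pairs)   # 3/12 = 1/4

# Conditional: chance the second draw is 4, given the first draw is 3
given_3 = [(a, b) for a, b in pairs if a == 3]
p_second_4_given_3 = sum(1 for a, b in given_3 if b == 4) / len(given_3)  # 1/3
```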


More on chance processes

A data variable is a list of numbers.

A random variable is a chance procedure for generating a number.

Sometimes, a data variable can be viewed as a list of observed values of random variables.

I Tomorrow morning, you go out and ask the age of the first person you meet. Is this a random variable?

Random variables involve a model for the process which generated the data.

In this section, we started by writing down a box model for rolling a six-sided die. This sort of modeling is basic to statistical inference.

I Sometimes, the models are apt descriptions of the chance procedure. Sometimes, they are not . . .

Lecture Notes, Week 1 34/ 41


Statistics and parameters

For our model of rolling a six-sided die, the expected value of a draw from the box is the box’s mean:

(1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5    (1)

The expected value is a parameter.

In this case, we know what it is. In other cases, we might use observed values to estimate the parameter.

Drawing inferences from data to the box is the focus of statistics. (Reasoning forward from the box to the data is the study of probability.)

Lecture Notes, Week 1 35/ 41


More on random variables

If we roll a die n times and add up the total number of spots, we have another random variable:

∑_{i=1}^{n} U_i ,    (2)

where U_i is the number of spots on the ith roll.

This is like drawing n tickets from the box with replacement.

The U_i are independent, identically distributed random variables.

Distributing expectations, we have

E(∑_{i=1}^{n} U_i) = ∑_{i=1}^{n} E(U_i) = n · 3.5

Lecture Notes, Week 1 36/ 41
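The identity E(∑ U_i) = n · 3.5 can be checked by simulation. A sketch with n = 10 rolls per experiment (the numbers here are chosen for illustration):

```python
import random
random.seed(4)

box = [1, 2, 3, 4, 5, 6]
n = 10   # rolls per experiment

# Average the total number of spots over many repetitions of the experiment.
reps = 50000
totals = [sum(random.choice(box) for _ in range(n)) for _ in range(reps)]
avg_total = sum(totals) / reps   # lands near n * 3.5 = 35
```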


Sampling from the box

Again, throw a die n times and count the number of spots on each roll. The sample mean is

Ū = (1/n) ∑_{i=1}^{n} U_i    (3)

The sample mean is a sum of random variables and is itself a random variable: it will turn out a little different in each sample of n rolls of the die.

When n is large, however,

Ū ≈ E(U_i) = 3.5    (4)

where ≈ means “about equal to.”

Lecture Notes, Week 1 37/ 41
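A simulation sketch of equation (4): with a large n, the sample mean of the rolls lands very close to E(U_i) = 3.5, while a small n can easily stray further:

```python
import random
random.seed(6)

box = [1, 2, 3, 4, 5, 6]

def sample_mean(n):
    # mean of n rolls of a fair die, i.e., n draws with replacement
    return sum(random.choice(box) for _ in range(n)) / n

small = sample_mean(10)       # can land well away from 3.5
large = sample_mean(100000)   # reliably within a few hundredths of 3.5
```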


Page 130: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

Estimating parameters

Thus, we can use repeated observations to estimate parameters like the expectations of random variables.

In this example, we use observed values of independent, identically distributed random variables: n draws from a box of tickets.

This is a good model for the rolling of dice.

In other contexts, things may be more complicated.

Lecture Notes, Week 1 38/41


Page 134: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

Hypothetical illustration of potential outcomes for local budgets when village council heads are women or men

                Yi(0)                   Yi(1)                   τi
                Budget share if         Budget share if         Unit causal
    Village i   village head is male    village head is female  effect
    Village 1        10                      15                      5
    Village 2        15                      15                      0
    Village 3        20                      30                     10
    Village 4        20                      15                     -5
    Village 5        10                      20                     10
    Village 6        15                      15                      0
    Village 7        15                      30                     15
    Average          15                      20                      5

From Gerber and Green (2012), drawing on Chattopadhyay and Duflo (2004)

Lecture Notes, Week 1 39/41
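The table's arithmetic can be reproduced in a few lines of Python, using the potential outcomes exactly as listed (variable names are mine):

```python
# Potential outcomes from the hypothetical villages table
# (Gerber and Green 2012, drawing on Chattopadhyay and Duflo 2004).
Y0 = [10, 15, 20, 20, 10, 15, 15]  # budget share if village head is male
Y1 = [15, 15, 30, 15, 20, 15, 30]  # budget share if village head is female

# Unit causal effects tau_i = Y_i(1) - Y_i(0), and their average.
tau = [y1 - y0 for y0, y1 in zip(Y0, Y1)]
print(tau)                  # [5, 0, 10, -5, 10, 0, 15]
print(sum(tau) / len(tau))  # average causal effect: 5.0
```

Note that computing `tau` requires knowing both potential outcomes for every village, which is exactly what we never observe in practice.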

Page 135: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

An experiment where m = 2 villages are assigned to treatment and N − m = 5 go to control

[Figure: the seven villages' potential outcomes drawn as tickets in a box. For the m = 2 treated villages we observe Yi(1); for the N − m = 5 control villages we observe Yi(0); each village's unobserved potential outcome is marked "?".]

Lecture Notes, Week 1 40/41
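One random assignment of the kind the figure depicts can be simulated directly: assign m = 2 villages to treatment and print what is observed versus missing. A sketch under the table's potential outcomes (variable names are mine):

```python
import random

random.seed(1)

# Potential outcomes from the villages table.
Y0 = [10, 15, 20, 20, 10, 15, 15]  # budget share if head is male
Y1 = [15, 15, 30, 15, 20, 15, 30]  # budget share if head is female
N, m = 7, 2

# Randomly assign m villages to treatment; the other N - m go to control.
treated = set(random.sample(range(N), m))

# For each village only one potential outcome is observed; the other is
# the missing counterfactual, shown as "?" on the slide.
observed = []
for i in range(N):
    if i in treated:
        observed.append(Y1[i])
        print(f"Village {i + 1}: Y(1) = {Y1[i]} observed, Y(0) = ?")
    else:
        observed.append(Y0[i])
        print(f"Village {i + 1}: Y(0) = {Y0[i]} observed, Y(1) = ?")
```

This makes the fundamental problem of causal inference concrete: every run of the experiment reveals only one ticket per village.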

Page 136: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

An unbiased estimator for the average causal effect

More generally, denote the units assigned to treatment by i = 1, ..., m and those assigned to control by i = m + 1, ..., N.

I Define \bar{Y}_T = \frac{1}{m} \sum_{i=1}^{m} Y_i to be the sample mean of the assigned-to-treatment group.

I Define \bar{Y}_C = \frac{1}{N-m} \sum_{i=m+1}^{N} Y_i to be the sample mean of the assigned-to-control group.

Then,

    E(\bar{Y}_T - \bar{Y}_C) = E\left(\frac{1}{m} \sum_{i=1}^{m} Y_i\right) - E\left(\frac{1}{N-m} \sum_{i=m+1}^{N} Y_i\right)     (5)

On problem set: show that the difference-of-means estimator is unbiased for the average causal effect.

Lecture Notes, Week 1 41/41
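With only 7 villages and m = 2, unbiasedness can be verified by brute force: enumerate all C(7, 2) = 21 possible assignments, compute the difference of means under each, and average. This is a check of the problem-set claim for this particular table, not a general proof (variable names are mine):

```python
from itertools import combinations

# Potential outcomes from the villages table.
Y0 = [10, 15, 20, 20, 10, 15, 15]
Y1 = [15, 15, 30, 15, 20, 15, 30]
N, m = 7, 2

true_ate = sum(y1 - y0 for y0, y1 in zip(Y0, Y1)) / N  # 5.0

# Enumerate every possible assignment of m villages to treatment and
# compute the difference-of-means estimate Y-bar_T - Y-bar_C under each.
estimates = []
for treated in combinations(range(N), m):
    ybar_T = sum(Y1[i] for i in treated) / m
    ybar_C = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)
    estimates.append(ybar_T - ybar_C)

# Averaging the estimator over all equally likely assignments recovers
# the true average causal effect: E(Y-bar_T - Y-bar_C) = 5.
print(sum(estimates) / len(estimates))  # ≈ 5.0
```

Individual estimates vary quite a bit across assignments; it is only their expectation over the randomization that equals the average causal effect.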
