PLSC 503_spring 2013_lecture 1


Causality / Statistical inference

PLSC 503: Quantitative Methods, Week 1

Thad Dunning
Department of Political Science

Yale University

Lecture Notes, Week 1: Causal Inference and Potential Outcomes

Lecture Notes, Week 1 1/ 41

Introduction to 503

Social scientists use many methods, for many different purposes.

Quantitative analysis can play several roles:

- Description, e.g., conceptualization and measurement
- Causal inference

The latter is perhaps the trickiest.

This course introduces causal and statistical models for quantitative analysis, and places emphasis on the importance of strong research design.

- It is important to master technique; it is even more important to understand core assumptions.


Organization of the course

Causal and statistical inference under the potential outcomes model

Regression as a descriptive tool (bivariate and multivariate, in scalar and matrix form)

Regression models: causal and statistical inference

(Spring break)

Various topics in the design and analysis of experimental and observational data:

- E.g., difference-in-differences designs, matching, natural experiments


An application / Defining causality / Potential outcomes / Causal inference

One Design for Causal Inference: Does Voter Pressure Shape Turnout?

Why people fail to vote—and why they vote at all—are both puzzles for many social scientists.

- Maybe people have intrinsic incentives that lead them to vote (a sense of duty?). Or maybe they respond to peer pressure.
- However, testing hypotheses about what causes people to vote is challenging.

Gerber and Green have conducted many experimental studies to assess what factors influence turnout, e.g., phone calls, door-to-door contacts, and social pressure.

- Before the August 2006 primary election in Michigan, 180,000 households were assigned either to a control group, or to receive one of four mailings regarding the election.
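The assignment step can be sketched in code. This is a minimal, hypothetical version with equal group sizes for simplicity; the published study used an unequal allocation, with a much larger control group.

```python
import random

# Conditions in the mailing experiment: control plus four mailings.
conditions = ["Control", "Civic Duty", "Hawthorne", "Self", "Neighbors"]

def assign(households, seed=2006):
    """Randomly partition households into equal-sized condition groups.

    Equal blocks are an illustrative simplification; the actual study
    assigned far more households to control than to each treatment.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = households[:]
    rng.shuffle(shuffled)
    k = len(shuffled) // len(conditions)
    return {c: shuffled[i * k:(i + 1) * k] for i, c in enumerate(conditions)}

groups = assign(list(range(180_000)))
print({c: len(g) for c, g in groups.items()})
# Each of the five groups gets 36,000 households.
```

Because assignment is random, the groups differ only by chance in their background characteristics, which is what licenses the simple comparisons that follow.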


Civic Duty mailing

[Scan of the "Civic Duty" mailing, from Appendix A of Gerber, Green, and Larimer (2008), American Political Science Review; sent by Practical Political Consulting, East Lansing, MI. The recoverable text reads:]

Dear Registered Voter:

DO YOUR CIVIC DUTY AND VOTE!

Why do so many people fail to vote? We've been talking about this problem for years, but it only seems to get worse.

The whole point of democracy is that citizens are active participants in government; that we have a voice in government. Your voice starts with your vote. On August 8, remember your rights and responsibilities as a citizen. Remember to vote.

DO YOUR CIVIC DUTY – VOTE!

"Hawthorne effect" mailing

[Scan of the "Hawthorne" mailing from Gerber, Green, and Larimer (2008). The recoverable text reads:]

Dear Registered Voter:

YOU ARE BEING STUDIED!

Why do so many people fail to vote? We've been talking about this problem for years, but it only seems to get worse.

This year, we're trying to figure out why people do or do not vote. We'll be studying voter turnout in the August 8 primary election.

Our analysis will be based on public records, so you will not be contacted again or disturbed in any way. Anything we learn about your voting or not voting will remain confidential and will not be disclosed to anyone else.

DO YOUR CIVIC DUTY – VOTE!

Self (own-record) mailing

[Scan of the "Self" mailing from Gerber, Green, and Larimer (2008). The recoverable text reads:]

Dear Registered Voter:

WHO VOTES IS PUBLIC INFORMATION!

Why do so many people fail to vote? We've been talking about the problem for years, but it only seems to get worse.

This year, we're taking a different approach. We are reminding people that who votes is a matter of public record.

The chart shows your name from the list of registered voters, showing past votes, as well as an empty box which we will fill in to show whether you vote in the August 8 primary election. We intend to mail you an updated chart when we have that information.

We will leave the box blank if you do not vote.

DO YOUR CIVIC DUTY – VOTE!

[A chart follows, listing each registered voter in the household with a "Voted" entry or a blank for each recent election.]

Neighbors/Social Pressure mailing

[Scan of the "Neighbors" mailing from Gerber, Green, and Larimer (2008). The recoverable text reads:]

Dear Registered Voter:

WHAT IF YOUR NEIGHBORS KNEW WHETHER YOU VOTED?

Why do so many people fail to vote? We've been talking about the problem for years, but it only seems to get worse. This year, we're taking a new approach. We're sending this mailing to you and your neighbors to publicize who does and does not vote.

The chart shows the names of some of your neighbors, showing which have voted in the past. After the August 8 election, we intend to mail an updated chart. You and your neighbors will all know who voted and who did not.

DO YOUR CIVIC DUTY – VOTE!

[A chart follows, listing neighbors on the recipient's street by name, with a "Voted" entry or a blank for each recent election.]

Estimated Treatment Effects

[Scan of Gerber, Green, and Larimer (2008), "Social Pressure and Voter Turnout," including Table 2, "Effects of Four Mail Treatments on Voter Turnout in the August 2006 Primary Election," which reports the percentage voting and the number of individuals in the Control, Civic Duty, Hawthorne, Self, and Neighbors groups. In the accompanying text: the Civic Duty mailing provides a baseline because it does little besides emphasize civic duty; the Hawthorne mailing adds a mild form of social pressure (observation by researchers, with no disclosure); the Self mailing adds disclosure of the household's own voting records; and the Neighbors mailing applies maximal social pressure by also listing the voting records of those living nearby. In the published results, the control group voted at a rate of 29.7%; the Civic Duty group at 31.5% (a 1.8 percentage-point effect); the Hawthorne group at 32.2% (2.5 points); the Self group at 34.5% (4.9 points); and the Neighbors group at 37.8%, "a remarkable 8.1 percentage-point treatment effect." The excerpt notes that this effect is bigger than any mail effect gauged by a randomized experiment, exceeds the effect of live phone calls, and rivals face-to-face canvassing, and that in terms of sheer cost efficiency the social-pressure mailings far outstrip door-to-door canvassing and phone banks.]
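The reported effects are simply each treatment group's turnout rate minus the control rate. A quick arithmetic check, using the turnout percentages published in the paper's table (note that differencing the rounded rates gives 4.8 for Self, where the article's unrounded data yield 4.9):

```python
# Turnout rates (percent) as published in Gerber, Green, and Larimer (2008).
rates = {"Control": 29.7, "Civic Duty": 31.5, "Hawthorne": 32.2,
         "Self": 34.5, "Neighbors": 37.8}

# Estimated effect of each mailing: its turnout rate minus the control rate.
control = rates["Control"]
effects = {group: round(rate - control, 1)
           for group, rate in rates.items() if group != "Control"}
print(effects)
# {'Civic Duty': 1.8, 'Hawthorne': 2.5, 'Self': 4.8, 'Neighbors': 8.1}
```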

Some points about the analysis

The analysis can be extremely simple: a difference of means may be just the right tool.

This is design-based inference:

- Confounding is controlled through ex-ante research design choices—not through ex-post statistical adjustment.
- This simple analysis rests on a model that is often, though not always, quite credible.

This contrasts with many conventional model-based approaches, which instead rely on regression modeling to approximate an experimental ideal.

- "The power of multiple regression analysis is that it allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed" (Wooldridge 2009: 77).
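To make the "difference of means" point concrete, here is a minimal sketch on simulated data; the turnout probabilities are illustrative stand-ins, not the study's figures.

```python
import random

def difference_in_means(treated, control):
    """Estimated average treatment effect: mean(treated) - mean(control)."""
    return sum(treated) / len(treated) - sum(control) / len(control)

# Illustrative data: 1 = voted, 0 = did not vote. The 0.30 and 0.38
# turnout probabilities are hypothetical, chosen only for the example.
random.seed(503)
control = [1 if random.random() < 0.30 else 0 for _ in range(10_000)]
neighbors = [1 if random.random() < 0.38 else 0 for _ in range(10_000)]

ate_hat = difference_in_means(neighbors, control)
print(f"Estimated effect: {ate_hat:.3f}")  # about 0.08 in expectation
```

Under randomization, this estimator is unbiased for the average treatment effect without any regression adjustment, which is exactly the sense in which the design, not the model, does the work.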


Strengths and limitations of strong research design

Strong designs can improve causal inferences in diverse substantive contexts.

- The statistics can be simple, transparent, and credible.

Yet, they also have important limitations:

- External validity issues; also, interventions may or may not be substantively relevant.
- In practice, the analysis may be more or less design-based.

How best to maximize the promise—and minimize the pitfalls—is our focus.

Whatever kind of research you do, considering these issues can help you think more clearly about research design and causal inference.

Page 32: PLSC 503_spring 2013_lecture 1

CausalityStatistical inference

An applicationDefining causalityPotential outcomesCausal inference

Strengths and limitations of strong research design

Strong designs can improve causal inferences in diversesubstantive contexts.

I The statistics can be simple, transparent, and credible.

Yet, they also have important limitations

I External validity issues; also, interventions may or may not besubstantively relevant.

I In practice, the analysis may be more or less design-based.

How best to maximize the promise—and minimize thepitfalls—is our focus.

Whatever kind of research you do, considering these issuescan help you think more clearly about research design andcausal inference

Lecture Notes, Week 1 11/ 41


Page 39: PLSC 503_spring 2013_lecture 1

Defining and Measuring Causality

Three types of questions arise in philosophical discussions of causality (Brady):

1. How do people understand causality when they use the concept? (Psychological/linguistic)

2. What is causality? (Metaphysical/ontological)

3. How do we discover when causality is operative? (Epistemological/inferential)

Lecture Notes, Week 1 12/ 41


Page 42: PLSC 503_spring 2013_lecture 1

1. Counterfactuals

A counterfactual statement contains a false premise, and an assertion of what would have occurred had the premise been true:

I If an economic stimulus plan had not been adopted, then X . . . (where X might be “the economy would not have recovered” or “the economy would be in worse shape today”).

For any cause C, the causal effect of C is the difference between what would happen in two states of the world: e.g., one in which C is present and one in which C is (counterfactually) absent.

Counterfactuals play a critical role in causal inference—though they aren’t always sufficient:

I “If the storm had not occurred, the mercury in the barometer would not have fallen” is a valid counterfactual statement, yet changes in air pressure, not storms, are the cause of falling mercury in barometers.

Lecture Notes, Week 1 13/ 41


Page 47: PLSC 503_spring 2013_lecture 1

2. Manipulation

Causation as forced movement (Lakoff)

I E.g., children learn about causation by dropping a fork

When combined with counterfactuals, manipulation provides a strong criterion:

I Does playing basketball make children grow tall? No. Intuitively, that’s because the following statement doesn’t make sense:

I If we had intervened to make the children play basketball, they would have grown tall.

(Here is also a place where mechanistic understandings of causality come into play.)

Lecture Notes, Week 1 14/ 41


Page 53: PLSC 503_spring 2013_lecture 1

Potential Outcomes

In statistics, an idea that combines counterfactuals and manipulation (Neyman; Rubin; Holland).

Imagine, e.g., an experiment with two treatment conditions, say, a treatment and a control group.

I The potential outcome under treatment Yi(1) is the outcome some unit i would experience if assigned to treatment.

I The potential outcome under control Yi(0) is the outcome i would experience if assigned to control.

The unit causal effect is the difference between these two outcomes:

Yi(1) − Yi(0)

This parameter is not directly observable, because we see Yi(1) or Yi(0) but not both. (The “fundamental problem of causal inference”—Holland.)

Lecture Notes, Week 1 15/ 41
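To make the notation concrete, here is a minimal Python sketch of unit causal effects and the fundamental problem; the potential-outcome values are invented purely for illustration.

```python
# Hypothetical potential outcomes for three units (illustrative values only).
Y1 = [5, 3, 8]  # Yi(1): outcome each unit would have under treatment
Y0 = [2, 3, 4]  # Yi(0): outcome each unit would have under control

# The unit causal effect Yi(1) - Yi(0) for each unit.
unit_effects = [y1 - y0 for y1, y0 in zip(Y1, Y0)]
print(unit_effects)  # [3, 0, 4]

# The fundamental problem: we observe only one potential outcome per unit.
# If unit i is assigned to treatment (Ti = 1) we see Yi(1); otherwise Yi(0).
T = [1, 0, 1]
observed = [y1 if t == 1 else y0 for t, y1, y0 in zip(T, Y1, Y0)]
print(observed)  # [5, 3, 8]
```

In any real study only the `observed` list exists; the `unit_effects` list is never available, which is exactly Holland's point.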


Page 59: PLSC 503_spring 2013_lecture 1

Average Causal Effects

Attention often focuses instead on average causal effects, e.g., for some units i = 1, . . . , N,

(1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)].

This parameter is the difference between two counterfactuals: the average outcome if all units were assigned to treatment, minus the average if all units were assigned to control.

The Neyman model is a causal model: it stipulates how units respond when they are assigned to treatment or control.

Such response schedules play a critical role in causal inference.

Lecture Notes, Week 1 16/ 41
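The equivalence between the average of the unit effects and the difference of the two counterfactual averages can be checked directly; the numbers below are hypothetical.

```python
Y1 = [5, 3, 8, 6]  # hypothetical Yi(1) values
Y0 = [2, 3, 4, 3]  # hypothetical Yi(0) values
N = len(Y1)

# Average causal effect: (1/N) * sum over i of [Yi(1) - Yi(0)].
ace = sum(y1 - y0 for y1, y0 in zip(Y1, Y0)) / N

# Equivalently: the average outcome if all units were treated,
# minus the average outcome if all units were controls.
ace_alt = sum(Y1) / N - sum(Y0) / N
print(ace, ace == ace_alt)  # 2.5 True
```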


Page 63: PLSC 503_spring 2013_lecture 1

A missing data problem

Notice that the fundamental problem of inference applies to average causal effects:

(1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)].

If we assign all N units to treatment, we don’t see Yi(0) for any i. Similarly if all N units go to control.

Social science often relies on empirical comparisons—e.g., between some set of units for whom we observe Yi(1) and another set for whom we observe Yi(0).

One group serves as the counterfactual for the other—which may help us overcome this fundamental problem.

Lecture Notes, Week 1 17/ 41
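Viewed as a missing data problem, each assignment vector masks one column of the potential-outcomes table. A small sketch (hypothetical numbers), with `None` marking the missing entries:

```python
Y1 = [5, 3, 8, 6]  # hypothetical Yi(1) values
Y0 = [2, 3, 4, 3]  # hypothetical Yi(0) values
T  = [1, 1, 0, 0]  # assignment: first two units treated, last two control

# Treated units reveal Yi(1) but their Yi(0) is missing; vice versa for controls.
observed_Y1 = [y1 if t == 1 else None for t, y1 in zip(T, Y1)]
observed_Y0 = [y0 if t == 0 else None for t, y0 in zip(T, Y0)]
print(observed_Y1)  # [5, 3, None, None]
print(observed_Y0)  # [None, None, 4, 3]
```

The control group's observed Yi(0)s stand in for the treated group's missing ones, and conversely.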


Page 69: PLSC 503_spring 2013_lecture 1

Random assignment and the missing data problem

Randomization is one way to solve the missing data problem.

The logic of random sampling is key.

The question is how we can use sample data—statistics—to estimate parameters—like the average causal effect.

To understand the sampling process, we need a box model.

Lecture Notes, Week 1 18/ 41


Page 73: PLSC 503_spring 2013_lecture 1

A box model for experiments and natural experiments

[Figure: the study group drawn as a box of tickets, one per unit, each ticket carrying both potential outcomes Yi(1) and Yi(0); the treatment group reveals the Yi(1) values, and the control group reveals the Yi(0) values.]

Lecture Notes, Week 1 19/ 41

Page 74: PLSC 503_spring 2013_lecture 1

The average causal effect

[Figure: the same box model of the study group, treatment group, and control group, with the estimand annotated.]

The estimand: (1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)]

Lecture Notes, Week 1 20/ 41

Page 75: PLSC 503_spring 2013_lecture 1

Estimating the average causal effect

[Figure: the same box model of the study group, treatment group, and control group, annotated with the estimand and the estimator.]

The estimand: (1/N) ∑_{i=1}^{N} [Yi(1) − Yi(0)]

An unbiased estimator:

(1/m) ∑_{i=1}^{m} [Yi | Ti = 1] − (1/(N − m)) ∑_{i=m+1}^{N} [Yi | Ti = 0]

where Ti is an indicator for treatment assignment. Under this model, a random subset of size m < N units is assigned to treatment. The units assigned to the treatment group are indexed from 1 to m.

Lecture Notes, Week 1 21/ 41
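A sketch of the difference-of-means estimator under this indexing convention, with made-up potential outcomes; note that a single random assignment need not hit the estimand exactly.

```python
Y1 = [5, 3, 8, 6, 4, 7]  # hypothetical potential outcomes under treatment
Y0 = [2, 3, 4, 3, 1, 5]  # hypothetical potential outcomes under control
N, m = len(Y1), 3        # units 1..m assigned to treatment, m+1..N to control

# Treated units reveal Yi(1); control units reveal Yi(0).
treated_mean = sum(Y1[:m]) / m         # (1/m) * sum of observed Yi with Ti = 1
control_mean = sum(Y0[m:]) / (N - m)   # (1/(N-m)) * sum of observed Yi with Ti = 0
estimate = treated_mean - control_mean

true_ace = sum(y1 - y0 for y1, y0 in zip(Y1, Y0)) / N
print(round(estimate, 2), true_ace)  # 2.33 2.5 -- one draw need not equal the estimand
```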

Page 76: PLSC 503_spring 2013_lecture 1

The expected value, and definition of unbiasedness

With a simple random sample, the expected value of the sample mean equals the population mean.

I So the expected value of the mean of the Yi(1)s in the assigned-to-treatment group equals the average in the study group (the population).

I Same with the control group: the sample mean of the Yi(0)s in the assigned-to-control group, on average, will equal the true population mean.

In any given experiment, the control group mean may be too high or too low (as may the treatment group mean).

I But across infinitely many (hypothetical) replications of the sampling process, the average of the sample averages will equal the true average in the study group.

Similarly, the expected value of the difference of sample averages equals the average difference in the population.

I The difference-of-means estimator is unbiased for the average causal effect.

Lecture Notes, Week 1 22/ 41
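For a tiny study group, unbiasedness can be verified by brute force (hypothetical numbers): average the difference-of-means estimate over every equally likely assignment of m units to treatment, and the result equals the average causal effect exactly.

```python
import itertools

Y1 = [5, 3, 8, 6]  # hypothetical Yi(1) values
Y0 = [2, 3, 4, 3]  # hypothetical Yi(0) values
N, m = len(Y1), 2

true_ace = sum(y1 - y0 for y1, y0 in zip(Y1, Y0)) / N

# Enumerate every way of assigning m of the N units to treatment;
# under simple random assignment each is equally likely.
estimates = []
for treated in itertools.combinations(range(N), m):
    t_mean = sum(Y1[i] for i in treated) / m
    c_mean = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)
    estimates.append(t_mean - c_mean)

# The average of the estimates over all assignments is the expected value
# of the estimator, and it recovers the true average causal effect.
expected = sum(estimates) / len(estimates)
print(expected, expected == true_ace)  # 2.5 True
```

This is the "infinitely many hypothetical replications" idea made finite: with a small box, the replications can simply be enumerated.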

Page 77: PLSC 503_spring 2013_lecture 1

CausalityStatistical inference

An applicationDefining causalityPotential outcomesCausal inference

The expected value, and definition of unbiasedness

With a simple random sample, the expected value for thesample mean equals the population mean.

I So the expected value of the mean of the Yi(1)s in theassigned-to-treatment group equals the average in the studygroup (the population).

I Same with the control group: the sample mean of the Yi(0)s inthe assigned-to-control group, on average, will equal the truepopulation mean.

In any given experiment, the control group mean may be toohigh or too low (as may the treatment group mean).

I But across infinitely many (hypothetical) replications of thesampling process, the average of the sample averages willequal the true average in the study group.

Similarly, the expected value of the difference of sampleaverages equals the average difference in the population.

I The difference-of-means estimator is unbiased for theaverage causal effect.

Lecture Notes, Week 1 22/ 41


Assumptions of the Neyman model

The Neyman urn model involves several assumptions:

I As-if Random: Units are sampled at random from the study group and assigned to treatment or control.

I Non-Interference/SUTVA: Each unit's outcome depends only on its own treatment assignment, not on the assignment of other units. (Analogue to regression models: unit i's response depends on i's covariate values and error term.)

I Exclusion restriction: Treatment assignment affects outcomes only through treatment receipt.

N.B.: As we will see, these are also assumptions of standard regression models.

Important question: how can the validity of these assumptions be probed?

Lecture Notes, Week 1 23/ 41


A box model for rolling a die

Let’s talk a bit more about random variables and their expectations.

A six-sided fair die has an equal probability of landing 1, 2, 3, 4, 5, or 6 each time it is rolled.

A single roll of the die can thus be modeled as a draw at random from a box of tickets:

[Box of tickets: 1 2 3 4 5 6, with ? marking one ticket drawn at random]

Lecture Notes, Week 1 24/ 41
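As a quick sketch (hypothetical code, not part of the lecture), the box model can be simulated directly: a roll is one draw at random from the box, and over many draws each ticket comes up about 1/6 of the time:

```python
import random
random.seed(1)

box = [1, 2, 3, 4, 5, 6]    # one ticket per face of the die
roll = random.choice(box)   # a single roll = one draw at random from the box

# Each ticket has chance 1/6: check with many draws.
draws = [random.choice(box) for _ in range(60000)]
freqs = {t: draws.count(t) / len(draws) for t in box}
# every relative frequency in freqs is near 1/6
```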


Random variables and observed values

A ticket drawn at random from the box is a random variable.

I Definition: a random variable is a chance procedure for generating a number.

The value of a particular draw is an observed value (or realization) of this random variable.

[Box: 1 2 3 4 5 6; observed draw: 4 = X]

Lecture Notes, Week 1 25/ 41


Another random variable

Suppose we discard all rolls of the die that land 5 or 6.

Now we’ve got a new random variable:

[Box: 1 2 3 4; ? = Y]

Lecture Notes, Week 1 26/ 41


Operations of random variables

An arithmetic operation on draws from boxes makes a new random variable.

For example, define a new random variable Z in terms of X and Y:

[Box X: 1 2 3 4 5 6; draw ?] + [Box Y: 1 2 3 4; draw ?] → Y + X = Z

Lecture Notes, Week 1 27/ 41


Manipulating random variables

Other operations are fine, too.

[Figure: boxes X (1 2 3 4 5 6) and Y (1 2 3 4) combined by another arithmetic operation to define a new random variable W]

Lecture Notes, Week 1 28/ 41


Expectations

The expected value of a random variable is a number.

I If X is a random draw from a box of numbered tickets, then E(X) is the average of the tickets in the box.

The observed value of the random variable will be somewhere around this number: sometimes too high, sometimes too low.

Consider a previous example:

1. E(X) = ?
2. E(Y) = ?
3. E(2X + 3Y) = ?
4. Does E(2X + 3Y) = 2E(X) + 3E(Y)?

[Box X: 1 2 3 4 5 6; draw ?] + [Box Y: 1 2 3 4; draw ?] → Y + X = Z

Lecture Notes, Week 1 29/ 41
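These questions can be checked numerically. A minimal sketch, assuming the boxes above (X drawn from {1,...,6}, Y from {1,...,4}): E(X) = 3.5, E(Y) = 2.5, and by linearity E(2X + 3Y) = 2·3.5 + 3·2.5 = 14.5, which a simulation confirms:

```python
import random
random.seed(2)

x_box = [1, 2, 3, 4, 5, 6]   # box for X
y_box = [1, 2, 3, 4]         # box for Y (5s and 6s discarded)

E_X = sum(x_box) / len(x_box)   # 3.5
E_Y = sum(y_box) / len(y_box)   # 2.5

# Simulate E(2X + 3Y) and compare with 2*E(X) + 3*E(Y) = 14.5.
n = 100000
sims = [2 * random.choice(x_box) + 3 * random.choice(y_box) for _ in range(n)]
sim_mean = sum(sims) / n   # lands near 14.5
```

Note that linearity of expectation holds whether or not X and Y are independent.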


Independence and dependence

Suppose X and Y are random variables. Pretend you know the value of X. Do the chances of Y depend on that value? If so, X and Y are dependent. If not, they are independent.

Are X and Y dependent or independent?

[Box of six tickets, each listing an X value over a Y value: (2,4) (2,2) (2,4) (4,9) (2,4) (9,4); one ticket is drawn at random]

Lecture Notes, Week 1 30/ 41


Drawing with and without replacement

Suppose we make two draws from the box below.

I If the draws are made with replacement, are they independent?

I If the draws are made without replacement, are they independent?

[Box: 1 2 3 4 5 6; ? ? = two draws]

Lecture Notes, Week 1 31/ 41
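A simulation sketch of the distinction (hypothetical code): condition on the first draw being 6 and look at the second draw. Without replacement the second draw can never be 6, so the draws are dependent; with replacement the second draw's chances are unchanged at 1/6:

```python
import random
random.seed(3)

box = [1, 2, 3, 4, 5, 6]
n = 100000

# Without replacement: given the first draw is 6, the second is never 6.
wo = [random.sample(box, 2) for _ in range(n)]   # two distinct tickets
seconds = [b for a, b in wo if a == 6]
p_wo = seconds.count(6) / len(seconds)           # exactly 0

# With replacement: the first draw tells us nothing about the second.
wr = [(random.choice(box), random.choice(box)) for _ in range(n)]
seconds = [b for a, b in wr if a == 6]
p_wr = seconds.count(6) / len(seconds)           # near 1/6
```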


Conditional versus unconditional probabilities

Suppose we make two draws without replacement from the box below.

I What is the chance that the second draw is 4?

I What is the chance that the second draw is 4, given that the first draw is 3?

[Box: 1 2 3 4]

Lecture Notes, Week 1 32/ 41


Conditional versus unconditional probabilities

In answering such questions, it is helpful to map out all possible outcomes of the two draws.

                  second draw
                  1     2     3     4
  first draw  1   n.a.  12    13    14
              2   21    n.a.  23    24
              3   31    32    n.a.  34
              4   41    42    43    n.a.

In the table, “12” means that 1 is the observed value of the first draw and 2 is the observed value of the second draw.

Lecture Notes, Week 1 33/ 41
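The table can also be enumerated in code. A small sketch: the 12 ordered pairs are equally likely, so the unconditional chance that the second draw is 4 is 3/12 = 1/4, while given that the first draw is 3 it rises to 1/3:

```python
from itertools import permutations

# All ordered outcomes of two draws without replacement from [1, 2, 3, 4]:
pairs = list(permutations([1, 2, 3, 4], 2))   # 12 equally likely pairs

# Unconditional: chance the second draw is 4
p_second_4 = sum(1 for a, b in pairs if b == 4) / len(pairs)   # 3/12 = 1/4

# Conditional: chance the second draw is 4, given the first draw is 3
given_3 = [(a, b) for a, b in pairs if a == 3]
p_second_4_given_3 = sum(1 for a, b in given_3 if b == 4) / len(given_3)  # 1/3
```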


More on chance processes

A data variable is a list of numbers.

A random variable is a chance procedure for generating a number.

Sometimes, a data variable can be viewed as a list of observed values of random variables.

I Tomorrow morning, you go out and ask the age of the first person you meet. Is this a random variable?

Random variables involve a model for the process which generated the data.

In this section, we started by writing down a box model for rolling a six-sided die. This sort of modeling is basic to statistical inference.

I Sometimes, the models are apt descriptions of the chance procedure. Sometimes, they are not . . .

Lecture Notes, Week 1 34/ 41


Statistics and parameters

For our model of rolling a six-sided die, the expected value of a draw from the box is the box’s mean:

(1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5    (1)

The expected value is a parameter.

In this case, we know what it is. In other cases, we might use observed values to estimate the parameter.

Drawing inferences from data to the box is the focus of statistics. (Reasoning forward from the box to the data is the study of probability.)

Lecture Notes, Week 1 35/ 41


More on random variables

If we roll a die n times and add up the total number of spots, we have another random variable:

∑_{i=1}^{n} U_i ,    (2)

where U_i is the number of spots on the ith roll.

This is like drawing n tickets from the box with replacement.

The U_i are independent, identically distributed random variables.

Distributing expectations, we have

E(∑_{i=1}^{n} U_i) = ∑_{i=1}^{n} E(U_i) = n · 3.5

Lecture Notes, Week 1 36/ 41
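The identity E(∑ U_i) = n · 3.5 can be checked by simulation. A sketch with n = 10 rolls per experiment (the numbers here are chosen for illustration):

```python
import random
random.seed(4)

box = [1, 2, 3, 4, 5, 6]
n = 10   # rolls per experiment

# Average the total number of spots over many repetitions of the experiment.
reps = 50000
totals = [sum(random.choice(box) for _ in range(n)) for _ in range(reps)]
avg_total = sum(totals) / reps   # lands near n * 3.5 = 35
```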


Sampling from the box

Again, throw a die n times and count the number of spots on each roll. The sample mean is

Ū = (1/n) ∑_{i=1}^{n} U_i    (3)

The sample mean is a sum of random variables and is itself a random variable: it will turn out a little different in each sample of n rolls of the die.

When n is large, however,

Ū ≈ E(U_i) = 3.5    (4)

where ≈ means “about equal to.”

Lecture Notes, Week 1 37/ 41
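A simulation sketch of equation (4): with a large n, the sample mean of the rolls lands very close to E(U_i) = 3.5, while a small n can easily stray further:

```python
import random
random.seed(6)

box = [1, 2, 3, 4, 5, 6]

def sample_mean(n):
    # mean of n rolls of a fair die, i.e., n draws with replacement
    return sum(random.choice(box) for _ in range(n)) / n

small = sample_mean(10)       # can land well away from 3.5
large = sample_mean(100000)   # reliably within a few hundredths of 3.5
```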


Page 130: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

Estimating parameters

Thus, we can use repeated observations to estimate parameters like the expectations of random variables.

In this example, we use observed values of independent, identically distributed random variables: n draws from a box of tickets.

This is a good model for the rolling of dice.

In other contexts, things may be more complicated.

Lecture Notes, Week 1 38/41


Page 134: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

Hypothetical illustration of potential outcomes for local budgets when village council heads are women or men

                Yi(0)                   Yi(1)                   τi
                Budget share if         Budget share if         Unit causal
    Village i   village head is male    village head is female  effect
    Village 1        10                      15                      5
    Village 2        15                      15                      0
    Village 3        20                      30                     10
    Village 4        20                      15                     -5
    Village 5        10                      20                     10
    Village 6        15                      15                      0
    Village 7        15                      30                     15
    Average          15                      20                      5

From Gerber and Green (2012), drawing on Chattopadhyay and Duflo (2004)

Lecture Notes, Week 1 39/41
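The table's arithmetic can be reproduced in a few lines of Python, using the potential outcomes exactly as listed (variable names are mine):

```python
# Potential outcomes from the hypothetical villages table
# (Gerber and Green 2012, drawing on Chattopadhyay and Duflo 2004).
Y0 = [10, 15, 20, 20, 10, 15, 15]  # budget share if village head is male
Y1 = [15, 15, 30, 15, 20, 15, 30]  # budget share if village head is female

# Unit causal effects tau_i = Y_i(1) - Y_i(0), and their average.
tau = [y1 - y0 for y0, y1 in zip(Y0, Y1)]
print(tau)                  # [5, 0, 10, -5, 10, 0, 15]
print(sum(tau) / len(tau))  # average causal effect: 5.0
```

Note that computing `tau` requires knowing both potential outcomes for every village, which is exactly what we never observe in practice.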

Page 135: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

An experiment where m = 2 villages are assigned to treatment and N − m = 5 go to control

[Figure: the seven villages' potential outcomes drawn as tickets in a box. For the m = 2 treated villages we observe Yi(1); for the N − m = 5 control villages we observe Yi(0); each village's unobserved potential outcome is marked "?".]

Lecture Notes, Week 1 40/41
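One random assignment of the kind the figure depicts can be simulated directly: assign m = 2 villages to treatment and print what is observed versus missing. A sketch under the table's potential outcomes (variable names are mine):

```python
import random

random.seed(1)

# Potential outcomes from the villages table.
Y0 = [10, 15, 20, 20, 10, 15, 15]  # budget share if head is male
Y1 = [15, 15, 30, 15, 20, 15, 30]  # budget share if head is female
N, m = 7, 2

# Randomly assign m villages to treatment; the other N - m go to control.
treated = set(random.sample(range(N), m))

# For each village only one potential outcome is observed; the other is
# the missing counterfactual, shown as "?" on the slide.
observed = []
for i in range(N):
    if i in treated:
        observed.append(Y1[i])
        print(f"Village {i + 1}: Y(1) = {Y1[i]} observed, Y(0) = ?")
    else:
        observed.append(Y0[i])
        print(f"Village {i + 1}: Y(0) = {Y0[i]} observed, Y(1) = ?")
```

This makes the fundamental problem of causal inference concrete: every run of the experiment reveals only one ticket per village.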

Page 136: PLSC 503_spring 2013_lecture 1

Causality / Statistical inference

Box models / Statistical Inference / Inference under the potential outcomes model

An unbiased estimator for the average causal effect

More generally, denote the units assigned to treatment by i = 1, ..., m and those assigned to control by i = m + 1, ..., N.

I Define \bar{Y}_T = \frac{1}{m} \sum_{i=1}^{m} Y_i to be the sample mean of the assigned-to-treatment group.

I Define \bar{Y}_C = \frac{1}{N-m} \sum_{i=m+1}^{N} Y_i to be the sample mean of the assigned-to-control group.

Then,

    E(\bar{Y}_T - \bar{Y}_C) = E\left(\frac{1}{m} \sum_{i=1}^{m} Y_i\right) - E\left(\frac{1}{N-m} \sum_{i=m+1}^{N} Y_i\right)     (5)

On problem set: show that the difference-of-means estimator is unbiased for the average causal effect.

Lecture Notes, Week 1 41/41
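With only 7 villages and m = 2, unbiasedness can be verified by brute force: enumerate all C(7, 2) = 21 possible assignments, compute the difference of means under each, and average. This is a check of the problem-set claim for this particular table, not a general proof (variable names are mine):

```python
from itertools import combinations

# Potential outcomes from the villages table.
Y0 = [10, 15, 20, 20, 10, 15, 15]
Y1 = [15, 15, 30, 15, 20, 15, 30]
N, m = 7, 2

true_ate = sum(y1 - y0 for y0, y1 in zip(Y0, Y1)) / N  # 5.0

# Enumerate every possible assignment of m villages to treatment and
# compute the difference-of-means estimate Y-bar_T - Y-bar_C under each.
estimates = []
for treated in combinations(range(N), m):
    ybar_T = sum(Y1[i] for i in treated) / m
    ybar_C = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)
    estimates.append(ybar_T - ybar_C)

# Averaging the estimator over all equally likely assignments recovers
# the true average causal effect: E(Y-bar_T - Y-bar_C) = 5.
print(sum(estimates) / len(estimates))  # ≈ 5.0
```

Individual estimates vary quite a bit across assignments; it is only their expectation over the randomization that equals the average causal effect.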
