Writing with Data: Incorporating Statistics Into Causal Research

Statlab Workshop, Spring 2011
Brian Fried and Kevin Callender


Transcript of Writing with Data: Incorporating Statistics Into Causal Research

Page 1: Writing with Data: Incorporating Statistics Into Causal Research

Writing with Data: Incorporating Statistics Into Causal Research

Statlab Workshop, Spring 2011

Brian Fried and Kevin Callender

Page 2: Writing with Data: Incorporating Statistics Into Causal Research

Outline of Workshop

Part I: Causation and Statistics
- What is Causation? Correlation?
- Why Statistics?
- Threats to Inference

Part II: Gathering and Using Data
- Gathering Data
- Managing Data

Part III: Writing with Statistics
- A General Outline, with an example

Page 3: Writing with Data: Incorporating Statistics Into Causal Research

Causation vs. Correlation

Causation…

…correlation

Page 4: Writing with Data: Incorporating Statistics Into Causal Research

Why Statistics

Probabilistic Relationships (see previous graph)

Multivariate Relationships: We can analyze the relationships between multiple variables at the same time (e.g. education, age, gender, income … -> voting).

What is a regression?
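As a rough illustration of what a bivariate regression computes, the sketch below fits a least-squares line by hand. The data (years of education and a turnout score) are made up for illustration and do not come from the workshop.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = co-variation of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years of education and a turnout score.
education = [8, 10, 12, 14, 16]
turnout = [0.30, 0.38, 0.52, 0.55, 0.70]

intercept, slope = fit_line(education, turnout)
print(f"turnout = {intercept:.3f} + {slope:.3f} * education")
```

The slope is read as the expected change in the outcome for a one-unit change in the predictor; it describes an association and is not by itself evidence of causation.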

Page 5: Writing with Data: Incorporating Statistics Into Causal Research

Threats to Inference

- Endogeneity (vs. exogeneity of errors)
- Autocorrelation (time series)
- Homo/Heteroskedasticity
- Internal vs. external validity

Probably the most important step in research design; advanced techniques can often compensate.
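One way to see why endogeneity matters is a small simulation. In the sketch below, which uses simulated data and hypothetical variable names, an unobserved factor z drives both x and y while x has no effect on y at all. The naive slope of y on x is still far from zero; partialling z out of both variables brings it back toward zero.

```python
import random
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """The part of y that is not linearly explained by x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(2011)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]       # unobserved confounder
x = [zi + rng.gauss(0, 1) for zi in z]        # "explanatory" variable, driven by z
y = [2 * zi + rng.gauss(0, 1) for zi in z]    # outcome, driven by z only

naive = slope(x, y)  # ignores z entirely
# Controlling for z: relate the part of y not explained by z
# to the part of x not explained by z.
adjusted = slope(residuals(z, x), residuals(z, y))
print(f"naive slope: {naive:.2f}, slope controlling for z: {adjusted:.2f}")
```

In real research the confounder is often unmeasured, which is why this kind of threat is easier to address in the research design than after the data are collected.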

Page 6: Writing with Data: Incorporating Statistics Into Causal Research

Part II: Data

Think about analyses early! (Ideal vs. Possible)
What’s Possible? What’s Convincing?
- Experimental Ideal
- Practical Data Limitations
- Collecting Your Own Data
- Using Other Data

Some data sources:
- Statlab Webpage (http://statlab.stat.yale.edu)
- Advisors/Professional Contacts
- Yale StatCat (http://ssrs.yale.edu/statcat/)
- ICPSR (http://www.icpsr.umich.edu)
- Reference Librarian (Julie Linden)

Page 7: Writing with Data: Incorporating Statistics Into Causal Research

(Quant.) Data Types and Uses

- Dependent Variable (response, outcome, criterion)
- Independent Variables (explanatory or predictor variables)
- Control / Confounding Variables
- Categorical and Continuous Variables

Remember: Types of variables we choose determine the statistics we use

Qualitative knowledge always helps!

Page 8: Writing with Data: Incorporating Statistics Into Causal Research

Once You’ve Found or Collected Your Data

- Download the data and documentation
- StatTransfer (Statlab)
- Determine data file type
- Probably a text file (.txt, .dat, .raw)
- Converting text & delimited files
- Choose a statistical software program
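Converting a delimited text file usually means reading it with the right delimiter, turning text fields into numbers, and deciding how blanks are treated. The sketch below does this with Python's standard `csv` module; the file contents and column names are hypothetical, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical tab-delimited extract. A real file would be opened with
# open("some_file.txt", newline="") instead of io.StringIO.
raw_text = (
    "municipality\trecipients\testimated_poor\n"
    "A\t950\t1000\n"
    "B\t400\t500\n"
    "C\t\t800\n"  # the recipient count is missing for this case
)

def to_number(value):
    """Convert a text field to float, treating blanks as missing (None)."""
    return float(value) if value.strip() else None

rows = []
for record in csv.DictReader(io.StringIO(raw_text), delimiter="\t"):
    rows.append({
        "municipality": record["municipality"],
        "recipients": to_number(record["recipients"]),
        "estimated_poor": to_number(record["estimated_poor"]),
    })

# Write the cleaned data back out as comma-separated text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Checking the documentation for how missing values are coded (blank, -9, 999, and so on) before converting avoids treating a missing-value code as a real number.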

Page 9: Writing with Data: Incorporating Statistics Into Causal Research

Managing your dataManaging your data

- Back up all Master Data Files
- Codebook
- Merging Data
  - Adding variables, cases, computing new variables
- Keep a roadmap
  - Keep a log of all analyses with what you have done
  - Save syntax files
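Merging, computing new variables, and keeping a log can all live in one script, so every step can be rerun. The following is a minimal sketch with two hypothetical data sets keyed by a made-up municipality code; the variable names are assumptions for illustration only.

```python
# Two hypothetical data sets keyed by a municipality code.
benefits = {"M001": {"recipients": 950}, "M002": {"recipients": 400}}
poverty = {
    "M001": {"estimated_poor": 1000},
    "M002": {"estimated_poor": 500},
    "M003": {"estimated_poor": 800},  # no matching record in benefits
}

log = []  # running record of every step taken

def merge_on_key(left, right):
    """Keep only cases present in both data sets (an inner merge)."""
    merged = {}
    for key in left:
        if key in right:
            merged[key] = {**left[key], **right[key]}
    return merged

data = merge_on_key(benefits, poverty)
log.append(f"merged benefits and poverty: {len(data)} of {len(poverty)} cases matched")

# Compute a new variable from existing ones.
for row in data.values():
    row["coverage"] = row["recipients"] / row["estimated_poor"]
log.append("computed coverage = recipients / estimated_poor")

for entry in log:
    print(entry)
```

Recording how many cases matched at each merge is a cheap way to notice when cases are silently dropped.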

Page 10: Writing with Data: Incorporating Statistics Into Causal Research

Syntax Files

What are they?
- Text files used to enter commands in bulk

Why?
- You will make mistakes, need to make changes

How do I know what to write?
- Program’s manual provides the underlying command

Page 11: Writing with Data: Incorporating Statistics Into Causal Research

Part III: Writing

- Introduction
- Theory (Lit Review)
- Data Description
- Analysis/Results
- Conclusion

Page 12: Writing with Data: Incorporating Statistics Into Causal Research

Introduction

Question
- What is the question you want to answer?
- Why should we care?

Hypothesis
- Succinctly state your claim

Context & Summary

Page 13: Writing with Data: Incorporating Statistics Into Causal Research

Motivation

- Are politics becoming more programmatic in Brazil?
- Is Bolsa Familia, a conditional cash transfer (CCT) program that benefits a quarter of Brazil’s population, programmatic?

An Illustrative Example: Bolsa Familia

Page 14: Writing with Data: Incorporating Statistics Into Causal Research

Programa Bolsa Família – key facts

- Conditional cash transfer (CCT) program, launched in October 2003. This was not the first CCT program in Brazil; some existing programs (like Bolsa Escola) were incorporated into Bolsa Familia.
- Benefits families with per capita income below US$78.
- 12 million poor families (almost 50 million people) currently receive support in all 5,564 Brazilian municipalities.
- Size of stipend: between US$13 and US$114, depending on the family’s size and poverty level.
- Average amount: US$54 per family
- 2009 Budget: US$ 10.5 billion (0.4% of Brazil’s GDP)


Page 15: Writing with Data: Incorporating Statistics Into Causal Research

Theory/Lit. Review

- What does existing theory say? What do you believe? Position yourself within theoretical debates.
- Identify Testable Hypotheses
- Choose Method Best Suited to Testing Your Hypothesis
- Do you need statistics after all? Quantitative v Qualitative research

Page 16: Writing with Data: Incorporating Statistics Into Causal Research

Research Question

Do political criteria explain the variation in Bolsa Familia’s coverage across municipalities?

There are theoretical (Cox and McCubbins 1986, Dixit and Londregan 1996, Lindbeck and Weibull 1987) and empirical (Ames 1987, Levitt and Snyder 1995, Schady 2000, Dahlberg and Johansson 2002, Stokes 2004, Kitschelt 2010) reasons to believe that political spending is often targeted, especially given Brazil’s history with clientelism and pork.


Page 17: Writing with Data: Incorporating Statistics Into Causal Research

How do politicians target?

- “Core”
- “Swing”
- Mobilization


Page 18: Writing with Data: Incorporating Statistics Into Causal Research

Descriptive Statistics

Variables
- Dependent Variable(s)
- Independent Variable(s)
- Important Control Variable(s)

Graphs

Summary Statistics on Key Variables
- Number, Mean, Minimum, Maximum, Standard Deviation

Cross-Tabs
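The summary statistics listed above (number, mean, minimum, maximum, standard deviation) plus a count of missing values can be produced with a few lines of code. This is a minimal sketch using Python's standard library; the coverage values are hypothetical and `None` is assumed to mark a missing observation.

```python
from statistics import mean, stdev

def describe(values):
    """N, mean, SD, min, max, and missing count; None marks a missing value."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "mean": mean(observed),
        "sd": stdev(observed),  # sample standard deviation (n - 1 denominator)
        "min": min(observed),
        "max": max(observed),
        "missing": len(values) - len(observed),
    }

# Hypothetical coverage ratios for six municipalities, one of them missing.
coverage = [0.90, 1.10, 0.95, None, 1.05, 1.00]
summary = describe(coverage)
for name, value in summary.items():
    if isinstance(value, float):
        print(f"{name:>8}: {value:.3f}")
    else:
        print(f"{name:>8}: {value}")
```

Reporting the missing count alongside the other statistics tells the reader how many cases each later analysis can actually use.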

Page 19: Writing with Data: Incorporating Statistics Into Causal Research

Descriptive Statistics

                                      Mean    Stand. Dev.  Min     Max     Missing
Dependent Variable
  Coverage in 2009                    0.976   0.229        0.018   6.276   12
Explanatory Variables
  PT Vote Share for Deputado Federal  0.060   0.048        0.000   0.326   345
  PT Vote Share for President         0.470   0.107        0.110   0.826   18


Page 20: Writing with Data: Incorporating Statistics Into Causal Research

Key Variables

Coverage in 2009: This continuous variable is the ratio of recipients over the number estimated to be poor in each municipality in November of 2009.

PT Vote Share for Deputado Federal: This continuous variable captures a core targeting strategy and measures average PT vote share for federal deputy across the 2002 and 2006 elections.

PT Vote Share for President: This continuous variable captures a core targeting strategy and measures average PT vote share for president across the 2002 and 2006 elections.

Page 21: Writing with Data: Incorporating Statistics Into Causal Research

Descriptive Statistics

                                                   Mean    Stand. Dev.  Min  Max    Missing
Explanatory Variables
  PT Mayor in 2008                                 0.098   0.297        0    1      0
  Base Mayor in 2008                               0.609   0.488        0    1      0
  Change in Support for PT Presidential Candidate  0.055   0.080        0    0.603  18
  Close Presidential Election in 2006              0.190   0.392        0    1      0


Page 22: Writing with Data: Incorporating Statistics Into Causal Research

So, how do I analyze my data?

Correlational design
- Correlation allows you to quantify relationships between variables (r, r-squared)
- Correlation, partial correlation
- Regression allows you to predict scores on one variable from subjects’ scores on one or more other variables

Group differences
- t-test & ANOVA
- Chi-square for categorical and frequency data

Significance v. effect size

Simulations
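To make the distinction between significance and effect size concrete, the sketch below computes a Pearson correlation, a Welch t statistic for a difference in group means, and Cohen's d as an effect size. All data are hypothetical. The standard library has no t distribution, so the p-value is left out; a statistics package would supply it.

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def welch_t(group_a, group_b):
    """t statistic for a difference in means, not assuming equal variances."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def cohens_d(group_a, group_b):
    """Effect size: the mean difference in pooled-standard-deviation units."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical data: party vote share and program coverage in five municipalities.
vote_share = [0.2, 0.3, 0.4, 0.5, 0.6]
coverage = [1.05, 1.00, 0.98, 0.93, 0.90]
r = pearson_r(vote_share, coverage)
print(f"r = {r:.3f}, r-squared = {r * r:.3f}")

# Hypothetical coverage in two groups of municipalities.
group_a = [1.0, 1.1, 0.9, 1.0]
group_b = [0.9, 0.8, 1.0, 0.9]
print(f"t = {welch_t(group_a, group_b):.3f}, d = {cohens_d(group_a, group_b):.3f}")
```

The t statistic grows with sample size even when the underlying difference stays the same, while Cohen's d does not, which is why the two are worth reporting separately.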

Page 23: Writing with Data: Incorporating Statistics Into Causal Research

Methods of Analysis (Empirical Strategy)

We discussed this in Part I, but one generally devotes a section to explaining how one will identify a causal relationship prior to the results section.

Coverage = β_0 + β_1 (political criteria) + β_X X + e
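A specification of this form, with one variable of interest plus controls, can be estimated by ordinary least squares. The sketch below is one way to do that using only the standard library; it is not the authors' code, the data are hypothetical, and the outcome is built from known coefficients so the fit can be checked by eye. A statistical package would normally do this step and also report standard errors.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients; a constant term is added as the first column."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (political criterion, control variable) for six municipalities.
predictors = [(0.05, 0.2), (0.10, 0.5), (0.02, 0.9), (0.08, 0.1), (0.15, 0.6), (0.01, 0.4)]
# Outcome generated from known coefficients (1.0, -0.5, 0.1) with no noise.
coverage = [1.0 - 0.5 * p + 0.1 * c for p, c in predictors]

b0, b1, bx = ols(predictors, coverage)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, bX = {bx:.3f}")
```

Here `b1` plays the role of the coefficient on the political criterion and `bx` the coefficient on a single control; with several controls the same function returns one coefficient per column.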

Page 24: Writing with Data: Incorporating Statistics Into Causal Research

Results: Explaining Coverage in 2009

Explanatory Variable                               Regression Coefficient
Core Indicators
  PT Vote Share for Deputado Federal               -.473***
  PT Vote Share for President                      -.0972***
  PT Mayor                                         -.0241**
  Base Mayor                                       -.0208***
Swing Indicators
  Change in Support for PT Presidential Candidate  -.175***
  Close Presidential Election                      .00651


Page 25: Writing with Data: Incorporating Statistics Into Causal Research

Effect of Standard Deviation Shift of Explanatory Variables on Coverage in 2009

Shift Explained by Political Criteria              Effect of Shift in Support
PT Vote Share for Deputado Federal                 -0.023
PT Vote Share for President                        -0.010
PT Mayor*                                          -0.024
Base Mayor*                                        -0.021
Change in Support for PT Presidential Candidate    -0.014
Close Presidential Election in 2006*               0.007
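The figures in this table can be reproduced from the regression coefficients and the standard deviations reported in the descriptive statistics: for a continuous variable, the effect of a one-standard-deviation shift is the coefficient times the standard deviation. The sketch below assumes that the starred rows are 0/1 indicators whose "shift" is a switch from 0 to 1; the slides do not say this, but it matches the reported numbers.

```python
# (coefficient, standard deviation) pairs taken from the results and
# descriptive statistics tables. None marks a variable assumed to be a
# 0/1 indicator (the starred rows), shifted from 0 to 1 rather than by one SD.
variables = {
    "PT Vote Share for Deputado Federal": (-0.473, 0.048),
    "PT Vote Share for President": (-0.0972, 0.107),
    "PT Mayor": (-0.0241, None),
    "Base Mayor": (-0.0208, None),
    "Change in Support for PT Presidential Candidate": (-0.175, 0.080),
    "Close Presidential Election in 2006": (0.00651, None),
}

def shift_effect(coefficient, sd):
    """Predicted change in coverage for a one-SD shift (or a 0-to-1 switch)."""
    return coefficient * (sd if sd is not None else 1.0)

effects = {name: shift_effect(b, sd) for name, (b, sd) in variables.items()}
for name, effect in effects.items():
    print(f"{name:<50} {effect:+.3f}")
```

For example, -0.473 × 0.048 ≈ -0.023, which is the first entry in the table. Expressing effects this way lets readers compare variables measured on different scales.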

Page 26: Writing with Data: Incorporating Statistics Into Causal Research

Robustness

Identify Threats to Inference!

(Do I have any?)

Page 27: Writing with Data: Incorporating Statistics Into Causal Research

Robustness Check: Relationship between Coverage in 2004 and Prior Elections

Shift Explained by Political Criteria                           Effect of Shift in Support
PT Vote Share for Deputado Federal in 2002                      0.018
PT Vote Share for President in 2002                             0.034
PT Mayor in 2000*                                               0.002
Base Mayor in 2000*                                             0.005
Change in Support for PT Presidential Candidate (1998 to 2002)  -0.003
Close Presidential Election in 2002*                            -0.016

Page 28: Writing with Data: Incorporating Statistics Into Causal Research

Putting Output into a Paper

Cut and Paste

Graphs
- Cut and Paste into Word Processing document
- Save as .jpeg or .tif file

Tables
- Cut and Paste
- Format in Word Processing document
- Import into Excel, format, and then place in Word
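An alternative to cutting and pasting a table is to have the analysis script write it out as a CSV file, which Excel opens directly. The sketch below formats the coefficients from the results slide to three decimals and writes them with Python's `csv` module; an in-memory buffer stands in for a real file, and the output layout is only one possible choice.

```python
import csv
import io

# Coefficients and significance stars as reported on the results slide.
results = [
    ("PT Vote Share for Deputado Federal", -0.473, "***"),
    ("PT Vote Share for President", -0.0972, "***"),
    ("PT Mayor", -0.0241, "**"),
    ("Base Mayor", -0.0208, "***"),
    ("Change in Support for PT Presidential Candidate", -0.175, "***"),
    ("Close Presidential Election", 0.00651, ""),
]

def format_coefficient(value, stars):
    """Round to three decimals and append the significance stars."""
    return f"{value:.3f}{stars}"

# A real script would use open("results.csv", "w", newline="") here.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Explanatory Variable", "Regression Coefficient"])
for name, value, stars in results:
    writer.writerow([name, format_coefficient(value, stars)])

table_csv = buffer.getvalue()
print(table_csv)
```

Generating the table from the script means that when the analysis changes, the table is regenerated rather than retyped.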

Page 29: Writing with Data: Incorporating Statistics Into Causal Research

More Advanced Analysis

Multivariate techniques are only a start; they do help to account for confounding factors, allow for testing change over time and more complex hypotheses…

(See: Tabachnick & Fidell, Using Multivariate Statistics)

1) Be honest about your abilities.
2) Ask for help
3) Best off including techniques that you fully understand, but may be worth learning something new!

Page 30: Writing with Data: Incorporating Statistics Into Causal Research

Take Away Messages

1) Begin by thinking about what question interests you.
2) Look for data and consider appropriate methods; identify what hypotheses are actually testable.
3) Design and run analysis; keep a codebook/syntax files!
4) Back up data
5) Ask for help, especially when choosing a method, and seek feedback on research design.
6) Research and writing are an iterative process