Sampling and case selection

Course: Honors Thesis Capstone, Government Department

Date: Fall 2010

Professor: Michael Nelson

This time... empirical methods

• sampling

• small-N causal inference

Sampling

• probability sampling

• non-probability sampling

• sampling “challenges”

Groups in Sampling

The Theoretical Population

The Study Population

The Sampling Frame

The Sample

probability sampling from Henry
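A minimal sketch of two common probability designs, a simple random sample and a proportionate stratified sample, drawn from a sampling frame. This is an illustration only, not Henry's typology: the frame, the `region` stratum, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, seed=0):
    """Every unit in the sampling frame has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum (proportionate)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        units = strata[key]
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical sampling frame: 100 units, 70 urban and 30 rural.
frame = [{"id": i, "region": "urban" if i < 70 else "rural"} for i in range(100)]

srs = simple_random_sample(frame, 10)
strat = stratified_sample(frame, lambda u: u["region"], 0.10)
```

The simple random sample may over- or under-represent rural units by chance. The stratified draw fixes the urban/rural split at the frame's proportions.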

general sampling strategies from Patton

sampling & case selection challenges

• Population Size

• Sampling Bias

• probability of selection correlated with IV; will get the same relationship, but there is systematic non-representativeness

• Selection Bias

• subset of sampling bias; probability of selection correlated with DV

• underestimates the relationship (regression line b instead of a)

• Non-response Bias

• possibility that you are unable to collect data; data set is unrepresentative

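[Figure: two panels plotting y against x over the population (“pop”), each marking the range of cases the sample gets and the range it misses. In one panel the regression lines a and b are labeled together; in the other, a and b are drawn as separate lines.]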
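The contrast between selection on the IV and selection on the DV can be checked with a small simulation. This is a sketch under assumed values: the true slope of 2.0, the noise level, and the cutoffs at zero are illustrative choices, not taken from the slides.

```python
import random

def ols_slope(pairs):
    """Slope of the least-squares line of y on x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

rng = random.Random(42)
TRUE_SLOPE = 2.0  # assumed for the simulation

# Simulated population: y depends linearly on x, plus noise.
population = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = TRUE_SLOPE * x + rng.gauss(0, 2)
    population.append((x, y))

# Selection correlated with the IV: the sample only "gets" high-x cases.
selected_on_x = [(x, y) for x, y in population if x > 0]
# Selection correlated with the DV: the sample only "gets" high-y cases.
selected_on_y = [(x, y) for x, y in population if y > 0]

slope_pop = ols_slope(population)
slope_on_x = ols_slope(selected_on_x)
slope_on_y = ols_slope(selected_on_y)
```

Under these assumptions, `slope_on_x` stays close to the population slope even though the sample is unrepresentative. `slope_on_y` is noticeably flatter, which corresponds to line b instead of line a.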

Causal inference for small-N research

• properties of small-N research

• case study purposes & types

• strategies

Case selection

• For quantitative research, selection should be random

• For qualitative research, selection often must be done intentionally (King, Keohane and Verba, 1994).

properties of small-n research

• intensive

• field research in natural settings

• many kinds of data: observation, interview, archives

• typically: case-centered, not variable-centered

Case selection strategies

Case studies and research design

from Gerring and McDermott (2007)

Gerring on case studies

Research Goals             Case Study       Cross-Case Study
1. Hypothesis              Generating       Testing
2. Validity                Internal         External
3. Causal Insight          Mechanisms       Effects
4. Scope of Proposition    Deep             Broad

Empirical Factors          Case Study       Cross-Case Study
5. Populations of Cases    Heterogeneous    Homogenous
6. Causal Strength         Strong           Weak
7. Useful Variation        Rare             Common
8. Data Availability       Concentrated     Dispersed

Additional Factors         Case Study       Cross-Case Study
1. Causal Complexity       ?                ?
2. State of the Field      ?                ?

Case study purposes & types: case selection as sampling

1. Descriptive Case Study: atheoretical; goal is to understand the case itself

2. Plausibility Probe: does the empirical phenomenon exist? Focus on availability of data; concern with plausibility of finding relationships between variables of interest

3. Hypothesis-Generating Case Study: seeks to find a generalization about cause and effect

4. Hypothesis-Testing Case Studies

4.1. Critical Case

4.2. Rival Hypotheses

4.3. ....

Generating Hypotheses

Extreme cases

• Represent unusual values of the dependent or independent variables

• Used for hypothesis generation

• Not intended to be representative

Deviant cases

• Cases that deviate from the typical population

• A “high residual” case (outlier)

• Useful for generating hypotheses, especially new explanations for the outcome (dependent variable) of interest
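One way to make “extreme” and “high residual” concrete is to fit a line to the available cases, then flag the case farthest from the mean of x (extreme) and the case with the largest absolute residual (deviant). This is a sketch only: the six cases and their values are hypothetical, and a single-variable least-squares fit is assumed.

```python
def fit_line(cases):
    """Least-squares intercept and slope of y on x for (name, x, y) cases."""
    n = len(cases)
    mean_x = sum(x for _, x, _ in cases) / n
    mean_y = sum(y for _, _, y in cases) / n
    sxx = sum((x - mean_x) ** 2 for _, x, _ in cases)
    sxy = sum((x - mean_x) * (y - mean_y) for _, x, y in cases)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(cases):
    """Map each case name to its residual (observed y minus fitted y)."""
    intercept, slope = fit_line(cases)
    return {name: y - (intercept + slope * x) for name, x, y in cases}

# Hypothetical cases: (name, independent variable, dependent variable).
cases = [
    ("A", 1.0, 2.1), ("B", 2.0, 3.9), ("C", 3.0, 6.2),
    ("D", 4.0, 8.0), ("E", 5.0, 3.0), ("F", 9.0, 18.1),
]

res = residuals(cases)
mean_x = sum(x for _, x, _ in cases) / len(cases)

# Deviant case: the largest absolute residual (the outlier from the fitted line).
deviant = max(res, key=lambda name: abs(res[name]))
# Extreme case: the value of x farthest from the mean of x.
extreme = max(cases, key=lambda c: abs(c[1] - mean_x))[0]
```

Here case E is the deviant case and case F the extreme one. The two need not coincide: F has an unusual value of x but sits close to the fitted line.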

Hypothesis-Testing Strategies: case selection

1. goal: establish the relationship between two or more variables

2. selection advice:

2.1. choose cases that minimize variability in the other variables that might impact the relationship you are investigating

2.2. representative sample

hypothesis-testing case studies

critical case

rival hypotheses

Selecting the typical case

• Look for cases that are “typical” of other cases

• Idea is that these cases are “low residual” cases

• Useful for hypothesis testing.

Select diverse cases

• Select cases that represent the full range of variation

• Useful for hypothesis generation and hypothesis testing

• Represent variation in the population but not necessarily the distribution of that population

Influential case

• Cases with influential configurations of the independent variables are chosen

• Useful for verifying the status of a highly influential case

• Not necessarily representative

Crucial case

• Cases that are likely to represent an outcome of interest

• Choice usually requires qualitative assessment of crucialness

• Useful for hypothesis testing

• Should be highly representative

Selecting cases on the Independent Variable

• You select cases based on the values of an independent variable(s)

• Requires that you know a little bit about all of the potential cases

• Requires you act as if you don’t know the values of the dependent variable

Mill’s Methods

agreement

difference

Most Similar cases

• Cases are selected based on their similarity on variables other than the independent variable the hypothesis is testing and the outcome of interest

• Useful for hypothesis testing and generation

• Not necessarily representative of the broader population

• Most Similar Systems analysis involves a non-equivalent group design:

N   O   X   O
N   O       O

Thad’s example: income inequality and civil war

[Diagram: Income Inequality, Poverty, Colonial Past, and External Threat as candidate explanations for Civil War]

Case           Income Inequality   Poverty   Colonial Past   External Threat   Civil War?
Costa Rica     Moderate            Yes       Yup             Nope              No
El Salvador    High                Yes       Yup             Nope              Yes
Cuba           High                Yes       Yup             Nope              Yes

adapted from Thad Kousser, UCSD
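The logic of the most-similar comparison in the table above can be sketched as a search for pairs of cases that match on every control variable but differ on the independent variable. The dictionary keys and the function name `most_similar_pairs` are hypothetical; the values are copied from the table.

```python
from itertools import combinations

# Cases and values as given in the table above.
CASES = {
    "Costa Rica": {"income_inequality": "Moderate", "poverty": "Yes",
                   "colonial_past": "Yup", "external_threat": "Nope",
                   "civil_war": "No"},
    "El Salvador": {"income_inequality": "High", "poverty": "Yes",
                    "colonial_past": "Yup", "external_threat": "Nope",
                    "civil_war": "Yes"},
    "Cuba": {"income_inequality": "High", "poverty": "Yes",
             "colonial_past": "Yup", "external_threat": "Nope",
             "civil_war": "Yes"},
}

def most_similar_pairs(cases, iv, dv, controls):
    """Pairs that match on all controls but differ on the independent variable.

    Each result is (case_a, case_b, outcome_differs).
    """
    pairs = []
    for (name_a, a), (name_b, b) in combinations(cases.items(), 2):
        same_controls = all(a[c] == b[c] for c in controls)
        if same_controls and a[iv] != b[iv]:
            pairs.append((name_a, name_b, a[dv] != b[dv]))
    return pairs

pairs = most_similar_pairs(
    CASES,
    iv="income_inequality",
    dv="civil_war",
    controls=["poverty", "colonial_past", "external_threat"],
)
```

Costa Rica pairs with both El Salvador and Cuba: the controls are held constant, income inequality differs, and so does the outcome. El Salvador and Cuba do not form a pair because they share the same value on the independent variable.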

Case selection challenges

Case study challenges

• Motive behind the selection of case studies is not obvious (Is it convenience? Or is it because they are good stories?). Without understanding this, the project is at best useless and at worst terribly misleading.

• Generalizability – Can the lessons learned from this case be applied to a larger class?

• Falsifiability – Results are presented in such a way that it would be difficult for an impartial researcher to replicate the project and arrive at the same result.

• No or Negative Degrees of Freedom: The researcher has more explanatory variables (moving pieces) than observations.

• Selection on the Dependent Variable: Choosing cases because of their performance on the outcome of interest.
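The degrees-of-freedom problem is simple arithmetic. As a sketch, assuming the usual linear-model convention of observations minus explanatory variables minus one for the intercept (the function name is hypothetical):

```python
def degrees_of_freedom(n_cases, n_explanatory):
    """Residual degrees of freedom for a linear model with an intercept.

    Assumes the convention df = n - k - 1.
    """
    return n_cases - n_explanatory - 1

# Three cases and four candidate explanations, as in the civil war example.
df = degrees_of_freedom(n_cases=3, n_explanatory=4)
```

A value at or below zero means the cases cannot, on their own, separate the competing explanations.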

Strategies: remember threats to internal & external validity!

• History, maturation, instrumentation (data limitations)

• Selection bias

• KKV give the example of a business school student who wants a high-paying job and selects for his study sample only those graduates earning high salaries. He then relates salary to number of accounting courses. By excluding graduates with low salaries, he paradoxically underestimates the effect of additional accounting courses on income.

Geddes on selection bias

Geddes, continued

Strategies: combining with large-N

1. Goal: Increase number of observations

1.1. Comparative case with large-N analysis of embedded units

2. Goal: Study causal mechanisms

2.1. Large-N study establishes relationships between variables (causal effect)

2.2. Small-N study establishes causal mechanism, looking at intervening steps (causal mechanism)

2.3. Note: causal explanation requires an understanding of both the causal effect and the causal mechanism

3. Goal: Study of spuriousness

3.1. Large-N study establishes relationships between variables (causal effect)

3.2. Small-N study engages claims of spuriousness

4. Goal: Study of deviant cases

4.1. Large-N study establishes deviant cases

4.2. Small-N study examines deviant cases

5. Goal: Establish generality of findings

5.1. Small-N study suggests X causes Y, but lacks external validity

5.2. Large-N study looks to establish the generality of findings

Strategies: increasing leverage for causal inference in case studies

1. Congruence Method: Test a hypothesis by understanding a case; looks for fit between theory and case; involves multiple independent variables

2. Pattern Matching: Type of congruence testing, usually focused on a single independent variable; compares alternative theories with respect to multiple outcomes

3. Process Tracing: Focus is on establishing the causal mechanism, by examining fit of theory to intervening causal steps; how does “X” produce a series of conditions that come together in some way (or don’t) to produce “Y”?

4. Counterfactual Analysis: Gain leverage through rigorous, disciplined thought experiments

Strategies: structured, focused comparison

1. “the comparison is focused because it deals selectively with only certain aspects of a historical case... and structured because it employs general questions to guide the data collection and analysis in that historical case” - Alexander George

2. Steps (Kaarbo and Beasley)

2.1. Identify the research question

2.2. Identify variables (usually from existing theory)

2.3. Select cases: comparable cases with variation in the values of the dependent variable, selected from across population subgroups (aids external validity)

2.4. Define and specify your measurement strategy for concepts, including a “codebook” for the questions you employ in data collection

2.5. “Code-write cases”

2.6. Comparison (search for patterns) and implications for theory
