MEASUREMENT - Abdul Latif Jameel Poverty Action Lab

MEASUREMENT Martin Abel J-PAL Africa Executive Education 18-22 January 2016

Outline
1. What should you measure?
2. Indicators: What makes a good measure?
3. What data can you use?
4. Challenges to measurement
5. When should you measure?

Opening the Black Box – Theory of Change

• Needs assessment: Poor learning of students
• Intervention (inputs): Feed students chocolate before class
• Outputs: Chocolates given to students
• Intermediary outcomes: Participants eat chocolates, and then either
  • participants are more motivated → more focused in class, or
  • participants are distracted → less focused in class
• Outcomes: Better test scores
• Assumptions: Students like chocolate
• Black box: the steps linking the intervention to the outcomes

→ If we don't see an effect, where was the causal chain interrupted?
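Measuring every step of the chain is what makes that question answerable. A minimal sketch, assuming hypothetical treatment vs. control estimates (an effect and its standard error) for each step of the chocolate example; the step names follow the theory of change, while the numbers, the `Link` record and the 1.96 cut-off are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    step: str      # step of the theory of change
    effect: float  # estimated treatment minus control difference (hypothetical)
    se: float      # standard error of that difference (hypothetical)


def first_broken_link(chain: List[Link], z: float = 1.96) -> Optional[str]:
    """Return the first step, in causal order, whose estimated effect is not
    distinguishable from zero at roughly the 5% level; None if every step moves."""
    for link in chain:
        if abs(link.effect) < z * link.se:
            return link.step
    return None


# Hypothetical estimates for the chocolate example, ordered along the causal chain.
chain = [
    Link("Chocolates given to students", effect=1.00, se=0.01),
    Link("Participants eat chocolates", effect=0.90, se=0.02),
    Link("More focused in class", effect=0.01, se=0.05),
    Link("Better test scores", effect=0.02, se=0.05),
]

print(first_broken_link(chain))
```

The sketch only checks whether each step moves, not whether it moves in the predicted direction (participants could be distracted rather than focused), and an insignificant estimate is not proof that a link is absent: a noisy indicator can hide a real effect.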

1. What should you measure?




Theory of Change (ToC)

• Definition: A theory of change is a description of how an intervention is supposed to deliver the desired results.
• A ToC:
  • sets out the causal logic of how and why a particular project, program, or policy will reach its intended outcomes;
  • depicts the sequence of events leading to these outcomes;
  • explores the assumptions needed for these changes to take place.

What should you measure?

• In 1993, a constitutional amendment in India called for one third of village council leader (pradhan) positions, selected at random, to be reserved for women.

Chattopadhyay, Raghabendra, and Esther Duflo. 2004. "Women as Policy Makers: Evidence from a Randomized Policy Experiment in India." Econometrica 72(5): 1409-43.

From Quotas to Better Health

• Needs assessment: Low investment in health at village level
• Intervention: Women quotas
• Outputs: More women in the village council
• Intermediary outcomes: Women are empowered; women have different preferences
• Outcomes: Different public goods → different health outcomes?
• Assumptions: The village council has actual decision power; women can express their own preferences
• Black box: the steps linking the intervention to the outcomes

→ In practice, we need to find indicators to measure each step of the causal chain.

Source: Roduner and Schlappi (2008), Logical Framework Approach and Outcome Mapping: A Constructive Attempt of Synthesis.

Objectives hierarchy | Indicators | Sources of verification | Assumptions / threats
Impact (goal / overall objective): Public good investment represents women's preferences | Government spending | Administrative data: budgets, balance sheets | Pradhan preferences matter: imperfect/some democracy
Outcome (project objective): Women voice political views | Number of times a woman spoke | Transcript from village meeting | Women develop independent views
Outputs: More female Pradhans | Whether or not a Panchayat had a female Pradhan | Administrative records | The law is implemented; there is no backlash
Inputs (activities): Reservations for women | Law is passed | The constitution | The government realizes the need for women's representation



2. Indicators: What makes a good measure?







Getting Accuracy and Precision

You want accuracy and precision.
[Figure: measurements compared along two axes, "More accurate" and "More precise"]

The terms "biased" and "unbiased" are used to describe accuracy:
• Biased: on average, we get the wrong answer
• Unbiased: on average, we get the right answer

The terms "noisy" and "precise" are used to describe precision:
• Noisy: random error causes the answer to bounce around
• Precise: measures of the same thing cluster together
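Bias and noise can be told apart in a short simulation in which each measurement is the true value plus a systematic bias plus random error. A minimal sketch; the true value of 50, the bias of 8 and the error standard deviations are arbitrary assumptions, not data.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical true value of the outcome being measured


def measure(n, bias, noise_sd, seed=0):
    """Simulate n measurements: true value + systematic bias + random error."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


# (bias, standard deviation of the random error) for each combination
scenarios = {
    "unbiased, precise": (0.0, 1.0),
    "unbiased, noisy": (0.0, 10.0),
    "biased, precise": (8.0, 1.0),
    "biased, noisy": (8.0, 10.0),
}

results = {}
for name, (bias, noise_sd) in scenarios.items():
    xs = measure(2000, bias, noise_sd)
    avg_error = statistics.mean(xs) - TRUE_VALUE  # accuracy: is the average right?
    spread = statistics.stdev(xs)                 # precision: do measures cluster?
    results[name] = (avg_error, spread)
    print(f"{name:18} average error {avg_error:+6.2f}   spread {spread:5.2f}")
```

More observations shrink the random error in the average but leave the bias untouched, which is why bias has to be tackled in how questions are designed and asked rather than by collecting more of the same data.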

Accuracy

• In theory: How well does the indicator map to the outcome? (e.g. intelligence → IQ tests)
• In practice: Are you getting unbiased answers?
  • Social desirability bias (response bias)
  • Anchoring bias (Strack and Mussweiler, 1997)
  • Framing effect, e.g. "Given that violence against women is a problem, should we impose nighttime curfews?"

Precision and Random Error

• In theory: The measure is consistent and precise, but not necessarily valid
• In practice, sources of random error include:
  • Length, fatigue
  • Ambiguous wording (definitions, relationships, recall period, units of question), e.g. the definition of terms such as "household" and "income"
  • Recall period / units of question
  • Type of answer:
    • Open vs. closed
    • Choice of options for closed questions: Likert scales (e.g. strongly disagree, disagree, neither agree nor disagree, ...), rankings


3. What data can you use?






What data do you use?

Where can you get data?

• Administrative data
  • State government census data
  • Cell phone usage (e.g. anonymized mPesa transfers)
• Other secondary data
  • World Bank/UN/IFPRI
  • Innovative sources, e.g. night-light data
• Primary data
  • Surveys
  • Games, vignettes, structured observations, etc.

Administrative Data

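Issue | Investment measure | West Bengal: W | West Bengal: M | West Bengal: reserved | Rajasthan: W | Rajasthan: M | Rajasthan: reserved
Drinking water | # facilities | 31% | 17% | 9.09 | 54% | 49% | 2.62
Road improvement | Road condition (0-1) | 31% | 25% | 0.18 | 13% | 23% | -0.08
Irrigation | # facilities | 4% | 20% | -0.38 | 2% | 4% | -0.02
Education | Informal education center | 6% | 12% | -0.06 | 5% | 13% |

W / M: percentage of women / men raising the issue. Reserved: difference in investment between villages reserved for women and unreserved villages.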

Primary Data: Instruments to collect data

• Questionnaires
• Focus groups
• Biomarkers
• Mechanical tracking devices (e.g. cameras)
• Random spot checks
• Spatial demography (e.g. GPS, satellite imagery)
• Games
• Vignettes (the subject is presented with a hypothetical situation)
• Implicit association tests (e.g. gender-occupation sorting)

Primary Data: Survey Creation

• SMURF criteria:
  • Specific – asks one question at a time
  • Measurable – quantifies accurate and unbiased information
  • Understandable – easy to comprehend and clearly defined
  • Relevant – measures a key or intermediate outcome
  • Framed – has boundaries, e.g. a time frame

Instruments (I)
• Questionnaires
  • Capture information from an individual
  • Can be:
    • Cross-sectional: a snapshot in time
    • Panel: change over time, common in RCTs
• Focus group discussions
  • Interview with more than one individual at a time
  • Advantages?
• Random spot checks
  • Some information can't easily be collected through a baseline or endline questionnaire, e.g. corruption, teacher attendance, discrimination, service delivery
  • What types of spot checks might be used?

Instruments (II)
• Biomarkers
  • Biological measurements collected from individuals
  • Often the primary outcome of the programme, e.g. hemoglobin, worms, STIs
  • May also be used to improve the accuracy of measurement, e.g. cotinine, hemoglobin, coliform bacteria
  • Other examples?
• Spatial demography
  • Outcome measurement using satellite images, e.g. agricultural productivity, cap and trade, night-light data
• Mechanical tracking devices
  • Improve the accuracy and frequency of reporting, e.g. cameras, fingerprinting

Instruments (III): Vignettes
• Also try to mitigate the problem of inaccurate self-reports
• Present the subject with a hypothetical situation; the response reflects implicit biases
• Example (Beaman et al. 2009):
  • Respondents in India rated a "speech" delivered by a council member for its effectiveness
  • The gender of the voice was varied randomly across respondents
  • The difference in mean scores reflects gender bias within communities
• Example:
  • Survey question: "A doctor and his son have an accident and the doctor is killed. The boy is rushed to surgery, and the surgeon says, 'This is my son.' What is going on?"
    • A) The doctor's wife was unfaithful
    • B) The surgeon is the boy's mother
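Because the voice gender is randomly assigned, the bias measure in a design like Beaman et al.'s reduces to a difference in mean ratings between the two versions. A minimal sketch on simulated data; the 1 to 10 rating scale, the sample size and the assumed 0.5-point penalty for the female voice are hypothetical, not figures from the study.

```python
import random
import statistics


def simulate_ratings(n, female_penalty, seed=1):
    """Hypothetical data: every respondent rates the same speech (1-10) after
    hearing it in a randomly assigned male or female voice."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        female_voice = rng.random() < 0.5  # voice gender randomized per respondent
        rating = rng.gauss(6.0, 1.5) - (female_penalty if female_voice else 0.0)
        data.append((female_voice, min(10.0, max(1.0, rating))))  # clip to the scale
    return data


def gender_gap(data):
    """Mean rating for the male voice minus mean rating for the female voice,
    with its standard error (difference in means between two random groups)."""
    male = [r for is_female, r in data if not is_female]
    female = [r for is_female, r in data if is_female]
    gap = statistics.mean(male) - statistics.mean(female)
    se = (statistics.variance(male) / len(male)
          + statistics.variance(female) / len(female)) ** 0.5
    return gap, se


gap, se = gender_gap(simulate_ratings(4000, female_penalty=0.5))
print(f"male minus female voice: {gap:.2f} (SE {se:.2f})")
```

Since the speech itself is identical, any systematic gap can only come from the voice, which is what lets the vignette reveal a bias respondents would not report directly.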

Instruments (IV): Participatory techniques
• Games
  • Dictator game: altruism and allocation of resources
  • Trust game
  • Ultimatum game
• Structured community activities
  • Observe how community members respond to concrete situations requiring collective action
  • Examples:
    • Managing a community decision between alternatives
    • Managing an asset given for free
    • Raising funds in response to a grant-raising opportunity
  • Measures: participation, attendance at meetings, distribution of contributions and gains, access to public goods
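Game choices become indicators once they are expressed as shares of what a player could have given. A minimal sketch; the endowment of 100, the tripling of the amount sent in the trust game (a common design, but not specified here) and the hypothetical `GameRecord` layout are all assumptions.

```python
from dataclasses import dataclass

ENDOWMENT = 100.0  # assumed endowment given to each player in each game
MULTIPLIER = 3     # assumed: the amount sent in the trust game is tripled


@dataclass
class GameRecord:
    dictator_given: float  # amount given away in the dictator game
    trust_sent: float      # amount sent as first mover in the trust game
    trust_received: float  # amount received as second mover (already multiplied)
    trust_returned: float  # amount sent back as second mover


def game_measures(r: GameRecord) -> dict:
    """Express each choice as a share, so that players can be compared."""
    return {
        "altruism": r.dictator_given / ENDOWMENT,
        "trust": r.trust_sent / ENDOWMENT,
        # Undefined if the partner sent nothing: there was nothing to return.
        "trustworthiness": (r.trust_returned / r.trust_received
                            if r.trust_received > 0 else None),
    }


player = GameRecord(dictator_given=20, trust_sent=50,
                    trust_received=MULTIPLIER * 40, trust_returned=60)
print(game_measures(player))
```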

Data Collection Considerations
• Ethics
• Compliance
• Respondent fatigue
• Quality control
• Surveyor training
• Surveyor (gender) composition
• Human subjects
• Data security
• Electronic vs. paper
• Costs



4. Challenges to measurement


Things that are hard to measure

(1) Things people do not know very well
(2) Things people do not want to talk about
(3) Abstract concepts
(4) Things that are not (always) directly observable

1. Things people do not know very well

How much tea did you drink last month?
A. < 2 liters
B. 2 - 5 liters
C. 6 - 10 liters
D. > 11 liters

• What: anything people must estimate, particularly across time. Prone to recall error and poor estimation.
  • Examples: distance to health center, profit, consumption, income, plot size
• Strategies:
  • Consistency checks / triangulation
    • How much did you spend in the last week on x? How much did you spend in the last 4 weeks on x?
    • Multiple measurements of the same indicator: How many minutes does it take to walk to the health center? How many kilometers away is the health center?
  • FIELD TEST!
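The paired questions can be checked against each other automatically during data collection. A minimal sketch, assuming a hypothetical `Response` record; the tolerance bands (a 0.5 to 2.0 ratio between 4-week and weekly spending, walking speeds of 2 to 7 km/h) are illustrative and would need to be set through field testing.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tolerance bands; real thresholds should come from field testing.
SPEND_RATIO_RANGE = (0.5, 2.0)  # 4-week total relative to 4 x last week's spending
WALK_SPEED_KMH = (2.0, 7.0)     # plausible walking speeds


@dataclass
class Response:
    respondent_id: str
    spend_last_week: float     # "How much did you spend in the last week on x?"
    spend_last_4_weeks: float  # "How much did you spend in the last 4 weeks on x?"
    walk_minutes: float        # "How many minutes does it take to walk to the health center?"
    distance_km: float         # "How many kilometers away is the health center?"


def consistency_flags(r: Response) -> List[str]:
    """Return reasons to revisit a response; an empty list means no flag."""
    flags = []
    expected = 4 * r.spend_last_week
    if expected > 0:
        ratio = r.spend_last_4_weeks / expected
        if not SPEND_RATIO_RANGE[0] <= ratio <= SPEND_RATIO_RANGE[1]:
            flags.append(f"4-week spending is {ratio:.1f} times 4 x weekly spending")
    if r.walk_minutes > 0:
        speed = r.distance_km / (r.walk_minutes / 60)
        if not WALK_SPEED_KMH[0] <= speed <= WALK_SPEED_KMH[1]:
            flags.append(f"implied walking speed of {speed:.1f} km/h")
    return flags


responses = [
    Response("A", spend_last_week=100, spend_last_4_weeks=420, walk_minutes=30, distance_km=2),
    Response("B", spend_last_week=100, spend_last_4_weeks=1200, walk_minutes=10, distance_km=5),
]
for r in responses:
    print(r.respondent_id, consistency_flags(r))
```

A flag is a prompt to re-ask or re-check, not proof of error: spending can be lumpy, so one unusual week does not make either answer wrong.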

How many cups of tea did you consume yesterday?
A. 0
B. 1 - 3
C. 4 - 6
D. > 6

→ Asking about liters per month vs. cups yesterday: what is the trade-off?





What is Hard to Measure?

2. Things people don't want to talk about

How frequently do you yell at your spouse?
A. Daily
B. Several times per week
C. Once per week
D. Once per month
E. Never

• What: anything socially "risky" or painful
  • Examples: sexual activity, alcohol and drug use, domestic violence, conduct during wartime, mental health
  • SA example: legal status of domestic workers
• Strategies:
  • Don't start with the hard stuff!
  • Consider asking questions in the third person ("Do people take bribes…?")
  • Always ensure the comfort and privacy of the respondent
  • Think of innovative techniques (see next slide)

Innovations in Data Collection I

→ What are the downsides of these approaches?







3. Abstract concepts

"I feel more empowered now than last year"
A. Strongly disagree
B. Disagree
C. Neither agree nor disagree
D. Agree
E. Strongly agree

• What: potentially the most challenging and interesting type of difficult-to-measure indicator
  • Examples: empowerment, bargaining power, social cohesion, risk aversion
• Strategies:
  • Define what you mean by your abstract concept
  • Design a question that people can relate to from their everyday life

→ Alternative suggestions of how to ask about empowerment?

SA example, time preference (a failed question): "Do you prefer 100 Rand today or 150 Rand one month from now?"
→ What to ask instead?
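The arithmetic shows why a single question says little: under the simplifying assumptions of linear utility and full trust that the later payment will arrive, choosing 100 Rand today over 150 Rand in a month only tells us that the respondent's one-month discount factor is below 100/150, about 0.67. A minimal sketch that computes the bounds implied by any set of such binary choices; the second set of offers in the usage lines is hypothetical.

```python
from fractions import Fraction
from typing import Iterable, Optional, Tuple

Choice = Tuple[int, int, bool]  # (amount today, amount in one month, chose today?)


def discount_factor_bounds(choices: Iterable[Choice]) -> Tuple[Fraction, Optional[Fraction]]:
    """Bounds on a one-month discount factor d, assuming linear utility.

    Choosing `today` over `later` implies today > d * later, so d < today / later.
    Choosing `later` implies d >= today / later (indifference is ignored).
    Returns (lower, upper); upper is None if no choice bounds d from above.
    """
    lower: Fraction = Fraction(0)
    upper: Optional[Fraction] = None
    for today, later, chose_today in choices:
        ratio = Fraction(today, later)
        if chose_today:
            upper = ratio if upper is None else min(upper, ratio)
        else:
            lower = max(lower, ratio)
    if upper is not None and lower >= upper:
        raise ValueError("choices are inconsistent with a single discount factor")
    return lower, upper


print(discount_factor_bounds([(100, 150, True)]))                     # the slide's question
print(discount_factor_bounds([(100, 110, True), (100, 125, False)]))  # hypothetical offers
```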

4. Things that aren't directly observable

• What: outcomes you may want to measure but can't ask about or observe directly
  • Examples: corruption, fraud
• Example: embezzlement of road-building funds in Indonesia
  → Suggestions of how to measure?




5. When should you measure?

When to collect data?

• Baseline survey: before the random assignment
  • Optional. Why?
  • Advantages: balance check, stratification, statistical power
  • Disadvantages: cost, time
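A baseline makes both stratification and a balance check possible. A minimal sketch, assuming a hypothetical baseline survey of 200 students in two regions: assignment is randomised within each region, and treatment and control are then compared on the baseline test score. The field names, the data and the seeds are illustrative.

```python
import random
import statistics
from collections import defaultdict


def stratified_assignment(units, stratum_key, seed=2016):
    """Within each stratum, randomly assign half the units to treatment
    (the extra unit of an odd-sized stratum goes to control)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[unit[stratum_key]].append(unit["id"])
    treated = {}
    for ids in strata.values():
        rng.shuffle(ids)
        half = len(ids) // 2
        for i, unit_id in enumerate(ids):
            treated[unit_id] = i < half
    return treated


def balance_check(units, treated, covariate):
    """Difference in baseline means (treatment - control) and its t-statistic."""
    t = [u[covariate] for u in units if treated[u["id"]]]
    c = [u[covariate] for u in units if not treated[u["id"]]]
    diff = statistics.mean(t) - statistics.mean(c)
    se = (statistics.variance(t) / len(t) + statistics.variance(c) / len(c)) ** 0.5
    return diff, diff / se


# Hypothetical baseline survey: 200 students in two regions.
data_rng = random.Random(7)
units = [
    {"id": i,
     "region": "north" if i % 2 == 0 else "south",
     "baseline_score": data_rng.gauss(55 if i % 2 == 0 else 45, 10)}
    for i in range(200)
]
treated = stratified_assignment(units, "region")
diff, t_stat = balance_check(units, treated, "baseline_score")
print(f"treated: {sum(treated.values())}, difference {diff:+.2f}, t = {t_stat:.2f}")
```

With many baseline variables, about one in twenty will differ at the 5% level by chance alone, so an isolated imbalance is not by itself evidence that the randomisation failed.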

[Figure: Evaluation timeline. Evaluation sample → baseline survey → random assignment into treatment and control groups → balance check → endline survey → measure impact]


• Measure the inputs and outputs specified in the ToC
  • E.g. a tutoring program for poor students: How many tutoring sessions are held? How long is each session? How many students attend? Is the appropriate teaching material available?
• Collect data on who receives treatment to monitor compliance with the random assignment
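Compliance monitoring boils down to comparing assignment with actual receipt of the treatment. A minimal sketch, assuming hypothetical monitoring records from the tutoring example, one `(assigned, received)` pair per student.

```python
from typing import Iterable, Tuple


def take_up_rates(records: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """records: (assigned to treatment?, actually received treatment?) per unit.
    Returns the share treated in the treatment group and in the control group."""
    n = {True: 0, False: 0}
    received = {True: 0, False: 0}
    for assigned, got_it in records:
        n[assigned] += 1
        received[assigned] += int(got_it)
    return received[True] / n[True], received[False] / n[False]


# Hypothetical monitoring data from a tutoring programme.
records = ([(True, True)] * 80 + [(True, False)] * 20
           + [(False, True)] * 5 + [(False, False)] * 95)
treat_rate, control_rate = take_up_rates(records)
print(f"treatment group take-up {treat_rate:.0%}, control group {control_rate:.0%}")
```

Take-up below 100% in the treatment group, or above 0% in the control group, dilutes the difference between the assigned groups, which is why who was actually treated has to be recorded while the programme runs.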



• Endline survey: after treatment (timing depends on the intervention)
• Multiple follow-up surveys are possible (medium- vs. long-run effects)
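With random assignment, the impact at each survey round can be estimated as the difference in mean outcomes between treatment and control. A minimal sketch, assuming hypothetical test scores from an endline and a later follow-up survey, no attrition, and a normal-approximation 95% interval; the tiny samples only keep the arithmetic checkable, and with samples this small a t-distribution would be more appropriate.

```python
import statistics
from collections import defaultdict


def impact_by_round(records, z=1.96):
    """records: (survey round, assigned to treatment?, outcome) tuples.
    For each round: difference in means (treatment - control), its standard
    error, and an approximate 95% confidence interval."""
    groups = defaultdict(lambda: {True: [], False: []})
    for survey_round, treated, outcome in records:
        groups[survey_round][treated].append(outcome)
    estimates = {}
    for survey_round, g in sorted(groups.items()):
        t, c = g[True], g[False]
        diff = statistics.mean(t) - statistics.mean(c)
        se = (statistics.variance(t) / len(t) + statistics.variance(c) / len(c)) ** 0.5
        estimates[survey_round] = (diff, se, (diff - z * se, diff + z * se))
    return estimates


# Hypothetical test scores from an endline and a later follow-up survey.
records = (
    [("endline", True, s) for s in (3, 5, 7)] + [("endline", False, s) for s in (1, 2, 3)]
    + [("follow-up", True, s) for s in (4, 4, 6, 6)]
    + [("follow-up", False, s) for s in (3, 4, 4, 5)]
)
for survey_round, (diff, se, (low, high)) in impact_by_round(records).items():
    print(f"{survey_round}: impact {diff:.2f} (SE {se:.2f}, 95% CI {low:.2f} to {high:.2f})")
```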

[Figure: Evaluation timeline (evaluation sample, baseline survey, random assignment into treatment and control groups, balance check, endline survey, measure impact), with the treatment vs. control comparison labelled "Internal Validity"]

Case Study: Skill Certification of Unemployed Youth
• Idea:
  • There are information asymmetries in the labour market, i.e. it is difficult for firms to identify the best applicants.
  • Conduct an English and Math aptitude test and issue a skill certificate that employers can use in the application process.
• Identify the theory of change.
• Design:
  1. Send out applications on behalf of job seekers and randomise whether to attach the certificate.
  2. Randomise which job seekers receive the certificate and measure employment outcomes.

→ Why study both 1. and 2.?

Summary: The Importance of Measurement for RCTs

• Randomised evaluations allow you to establish a causal relationship between your intervention and your outcome, because the treatment and control groups are statistically equivalent.

• To identify important causal relationships, you need to measure the right things.

• The theory of change helps you identify what you need to measure.

• Correct measurement methodology is essential for identifying a causal relationship: if your measurement is incorrect, you jeopardize identifying the correct causal relationship.
