
The Role of Incentives in Measuring Cognitive and Non-cognitive Skills: Experimental Evidence from Primary Schools in Shanghai

Yuanyuan Chen, Shuaizhang Feng, James J. Heckman, and Tim Kautz

HCEO Measuring and Assessing Skills
March 3, 2017

This draft, March 2, 2017

Chen, Feng, Heckman, and Kautz Shanghai Incentives 1 / 67


Outline

1 Introduction

2 Experimental Design and Data

3 Analytical Methods

4 Main Results

5 Subgroup Analyses

6 Conclusion

7 Appendix


There is a growing interest in non-cognitive skills but measurement remains an issue

1 A growing body of evidence shows the importance of non-cognitive skills in predicting life outcomes (Almlund et al., 2011)

2 Interventions have been shown to improve life outcomes through non-cognitive skills (Kautz et al., 2014)

3 Policy-makers are interested in expanding programs to develop non-cognitive skills but require reliable measures

4 One commonly used taxonomy is the Big Five (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism), often collected through self-reports



Impact of an “intervention” on math performance and self-reported Big Five

[Figure: impact (standard deviations) on Math score (2016) and on self-reported Openness, Conscientiousness, Extraversion, Agreeableness, and Emotional Stability. Series: Treatment 1 and Treatment 2. Markers indicate p<0.05 and p<0.10 (vs. Control); bars show +/− standard error.]



More about the “intervention”

Both treatments cost less than $1 per student

Took less than five minutes to deliver to a classroom

The projected rate of return is huge



What is this intervention?

Right before a math test and Big Five survey, fourth-graders received different instructions

Treatment 1 (“Honor incentive”): Receive a certificate of honor if the math test score is in the top 10% of the school in terms of overall performance or improvement

Treatment 2 (“Financial incentive”): Receive 50 Yuan (≈$7.5) if the math test score is in the top 10% of the school in terms of overall performance or improvement

The self-reported Big Five survey was administered directly after the math test



Impact of an “intervention” on math performance and self-reported Big Five

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self). Series: Honor incentive and Financial incentive. Markers indicate p<0.05 and p<0.10 (vs. Control); bars show +/− standard error.]



All measures are based on a behavior

Figure 1: Determinants of Task Performance

[Diagram labels: Task Performance; Effort; Character Skills; Cognitive Skills; Incentives; Non-Cognitive Skills; Test Scores; Self-Reports; Other Behaviors.]

Source: Kautz et al. (2014).



Past studies establish the importance of incentives in cognitive testing

A series of studies show that IQ scores can be improved by giving children candy or other incentives (Almlund et al., 2011)

Borghans et al. (2008) find that incentives affect the time spent on IQ tests and people with higher levels of Emotional Stability and Conscientiousness are less affected by incentives

Segal (2012) shows that coding speed scores can be influenced by incentives and that people with higher levels of Conscientiousness are more intrinsically motivated

We found no experimental studies that focus on the effect of the situation on measures of non-cognitive skills



Main research questions

1 To what extent do cognitive skill measures depend on incentives or other aspects of the situation in a school setting?

2 Do different types of students respond differently?

3 Do different incentives (monetary vs. non-monetary) work differently?

4 Could self-reports of non-cognitive skills be inadvertently affected by incentives?



Key findings

The incentives had little effect on overall test scores but did improve scores for better students

The honor treatment had a large and statistically significant effect on self-reported Big Five measures

Students in the “honor” treatment also rated their peers more favorably on the Big Five (particularly females)



Why did the honor incentive, but not the financial incentive, affect reporting of the Big Five?

Somehow shifted the students’ frame of mind

Elicited more social-desirability bias by causing them to think of public recognition

Other ideas?



These results suggest caution in interpreting evaluations based on self-reports

Self-reported non-cognitive skill measures are used in policy evaluation and school accountability

A meta-analysis of interventions with mostly short-term (less than 6 months) follow-ups found effect sizes of 0.22-0.27 across five domains and 0.57 for another (Durlak et al., 2011)

The honor incentive had impacts of approximately 0.10-0.20 standard deviations

Interventions could plausibly have a psychological effect similar to that of the honor incentive






Designed to maximize power and minimize contamination

1 Within schools, ranked students by fall math test scores and randomized triplets of students with the same scores into the control group, the honor treatment, or the financial treatment

2 On test day, separated students into classrooms based on treatment status

3 Students completed the math test and self-report of Big Five

4 Students returned to original classroom and assessed the Big Five of a peer

5 Two weeks later, teachers assessed their own students’ Big Five
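
The randomization in step 1 can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function name `assign_triplets`, the tuple layout of the input, the random tie-breaking, the grouping of adjacent ranks into triplets, and the handling of leftover students are all assumptions.

```python
import random
from collections import defaultdict

ARMS = ("control", "honor", "financial")

def assign_triplets(students, seed=0):
    """Assign arms within school-by-score triplets (hypothetical helper).

    students: list of (student_id, school, fall_math_score) tuples.
    Returns a dict mapping student_id to one of ARMS.
    """
    rng = random.Random(seed)
    by_school = defaultdict(list)
    for student_id, school, score in students:
        by_school[school].append((score, student_id))

    assignment = {}
    for school in sorted(by_school):
        roster = by_school[school]
        rng.shuffle(roster)  # random order among students with equal scores
        # Python's sort is stable, so tied scores keep the shuffled order.
        roster.sort(key=lambda pair: pair[0], reverse=True)
        for start in range(0, len(roster), 3):
            block = roster[start:start + 3]  # adjacent ranks form a triplet
            arms = list(ARMS)
            rng.shuffle(arms)
            # Assumption: a final block with fewer than 3 students simply
            # receives a random subset of the arms.
            for (_, student_id), arm in zip(block, arms):
                assignment[student_id] = arm
    return assignment
```

Blocking on the prior score in this way balances the arms on baseline ability by construction, which is one reason such a design can improve power.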



Survey description

Approximately 1,900 fourth-grade students from 19 different schools in Shanghai

1st wave of survey administered in Fall 2015

2nd wave of survey administered in Spring 2016

3rd wave of survey will be administered in Spring 2017



The data include a rich set of demographics, cognitive skills, and non-cognitive skills

Demographic variables: gender, parental education, family income, rural hukou, Shanghai hukou, private school, age

Cognitive skills: IQ, pre-intervention math test, math grades, Chinese grades, English grades

Non-cognitive skills: Big Five (self, peer, teacher reports), group-leader status, 1-3 rating of daily performance (teacher report), 1-3 rating of punctuality (teacher report), 1-3 rating of discipline (teacher report)



The experiment achieved baseline equivalence between the treatment and control groups

Assess baseline equivalence using 30 different pre-program variables

Of the 90 pairwise tests between groups, only 4 are statistically significant at the % level



The p-values follow a distribution consistent with baseline equivalence

[Figure: p-values from the baseline tests (vertical axis, 0 to 1) plotted against quantiles of the uniform distribution (horizontal axis, 0 to 1). Series: p-values and Uniform Distribution.]
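
With 30 pre-program variables and three pairwise group comparisons per variable, there are 30 × 3 = 90 tests. If the groups are balanced, the p-values should look like draws from a uniform distribution on [0, 1], which is what a plot against uniform quantiles checks. A minimal sketch of that comparison, assuming hypothetical helper names `uniform_qq` and `max_gap` and the plotting position (rank + 0.5) / n, is:

```python
def uniform_qq(p_values):
    """Pair each sorted p-value with its uniform plotting position.

    The plotting position (rank + 0.5) / n is an assumption; other
    conventions differ slightly.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    return [((rank + 0.5) / n, p) for rank, p in enumerate(ordered)]

def max_gap(p_values):
    """Largest vertical distance between the p-values and the 45-degree line."""
    return max(abs(quantile - p) for quantile, p in uniform_qq(p_values))
```

A small `max_gap` means the points sit near the 45-degree line. Any p-values passed to these functions in practice would come from the study's own balance tests; none are reproduced here.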






Estimation model:

\[
Y_{is} = \alpha + \beta^{honor} T^{honor}_{is} + \beta^{financial} T^{financial}_{is} + \gamma X_{is} + \varepsilon_{is}
\]

Y_{is}: outcome for student i in school s
T^{honor}_{is}: indicator for honor treatment
T^{financial}_{is}: indicator for financial treatment
X_{is}: covariates (including school fixed effects)
\varepsilon_{is}: error term, allowing for heteroskedasticity
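
To make the estimation model concrete, the sketch below fits the two treatment coefficients by ordinary least squares with one dummy per school. It is an illustration under stated assumptions rather than the authors' code: the names `solve_linear_system` and `estimate_impacts` and the record layout are hypothetical, other covariates are omitted, and heteroskedasticity-robust standard errors are not computed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def estimate_impacts(records):
    """Return (honor, financial) OLS coefficients with school fixed effects.

    records: list of dicts with keys 'y', 'arm', and 'school' (assumed layout).
    """
    schools = sorted({r["school"] for r in records})
    rows = []
    for r in records:
        row = [1.0 if r["arm"] == "honor" else 0.0,
               1.0 if r["arm"] == "financial" else 0.0]
        # A full set of school dummies plays the role of the intercept.
        row += [1.0 if r["school"] == s else 0.0 for s in schools]
        rows.append(row)
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * r["y"] for row, r in zip(rows, records))
           for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return beta[0], beta[1]
```

Each coefficient is the difference from the control group after removing school-level means, so the sketch requires control students in every school.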



Main specification

To increase precision, control for background and ability measures (and school fixed effects)

Demographic variables: gender, parental education, family income, rural hukou, Shanghai hukou, private school, age

Cognitive skills: IQ, pre-intervention math test, math grades, Chinese grades, English grades

Non-cognitive skills: Big Five (teacher reports), group-leader status, 1-3 rating of daily performance (teacher report), 1-3 rating of punctuality (teacher report), 1-3 rating of discipline (teacher report)

Results are similar with no controls or various combinations of controls



Outcome measures

To reduce measurement error, we apply a factor model to each grouping of items in the Big Five traits separately and predict factor scores (similar results if using means of items)

All outcomes are standardized to have mean zero and a standard deviation of one
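
The standardization step can be sketched as below. The example uses the simpler item-mean alternative mentioned above, not the factor model; the function names, the use of the population standard deviation, and the omission of any reverse-scoring of items are assumptions.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean zero and standard deviation one."""
    center = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    return [(v - center) / spread for v in values]

def mean_score_outcome(item_responses):
    """Average each student's item responses, then standardize across students.

    item_responses: one list of Likert responses (1 to 5) per student.
    """
    raw = [mean(items) for items in item_responses]
    return standardize(raw)
```

After this step, an estimated impact of 0.10 reads as one tenth of a standard deviation of the outcome.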






Impact on math performance and self-reported Big Five (Full Sample)

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self). Series: Honor incentive and Monetary incentive. Markers indicate p<0.05 and p<0.10 (vs. Control); bars show +/− standard error.]



No one item within the Big Five drove the results



Distribution of impact on individual Big Five items

Percentage of positive estimates (honor): 88%

Percentage of positive estimates (financial): 56%

[Figure: density of estimated impacts on individual Big Five items (Likert scale, 1−5; horizontal axis from −.3 to .3). Series: Honor and Financial.]



Distribution of p-values associated with impacts on individual Big Five items

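[Figure: p-values for impacts on individual Big Five items (vertical axis, 0 to 1) plotted against quantiles of the uniform distribution (horizontal axis, 0 to 1). Series: p-values (Honor), p-values (Financial), and Uniform Distribution.]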



There are some gender differences in the impacts on test scores



Impact on math performance and self-reported Big Five (Males)

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self), male students.]





Impact on math performance and self-reported Big Five (Females)

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self), female students.]





Consider the impact of the treatments on how students rated their peers in two ways



Impact on how treatment groups are assessed by peers (Full Sample)

[Figure: impact (standard deviations) on peer-reported Openness, Conscientiousness, Extraversion, Agreeableness, and Emotional Stability.]




Impact on how treatment groups assess peers (Full Sample)

[Figure: impact (standard deviations) on ratings given to others for Openness, Conscientiousness, Extraversion, Agreeableness, and Emotional Stability.]




Impact on how treatment groups assess peers (Males)

[Figure: impact (standard deviations) on ratings given to others for Openness and Extraversion, male students.]






Impact on how treatment groups assess peers (Females)

[Figure: impact (standard deviations) on ratings given to others for Openness and Extraversion, female students.]









Some subgroups responded differently to incentives

Better performing and better behaved students performed better on the math test in response to incentives

The patterns were less consistent when examining self-reported Big Five outcomes



Impact on math test scores by subgroup based on non-cognitive measures

[Figure: impact (standard deviations) on math test scores by subgroup: Group leader (Yes/No), Punctuality (High/Low), Performance (High/Low), and Discipline (High/Low).]




Impact on math test scores by subgroup based on cognitive measures

[Figure: impact (standard deviations) on math test scores by subgroup: IQ (High/Low), Math test (High/Low), and Math grade (High/Low).]




Impact on math test scores by subgroup based on Big Five (part 1)

[Figure: impact (standard deviations) on math test scores by subgroup: Openness (High/Low), Conscientiousness (High/Low), and Extraversion (High/Low).]




Impact on math test scores by subgroup based on Big Five (part 2)

[Figure: impact (standard deviations) on math test scores by subgroup: Agreeableness (High/Low) and Emotional Stability (High/Low).]







Conclusion

Self-reported measures are susceptible to unintended biases

Standardizing for aspects of the situation will be important for policy evaluation and school accountability






Correlations between cognitive and non-cognitive measures

                     (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)   (12)   (13)   (14)
(1)  IQ
(2)  Math Test 15   0.37
(3)  Math Test 16   0.35   0.71
(4)  Chinese GPA    0.13   0.29   0.28
(5)  Math GPA       0.35   0.61   0.61   0.33
(6)  English GPA    0.29   0.45   0.44   0.41   0.61
(7)  O (Teach)      0.20   0.33   0.36   0.27   0.27   0.31
(8)  C (Teach)      0.17   0.33   0.34   0.27   0.31   0.36   0.76
(9)  E (Teach)      0.12   0.21   0.23   0.21   0.22   0.20   0.76   0.58
(10) A (Teach)      0.09   0.16   0.15   0.18   0.12   0.18   0.56   0.64   0.49
(11) ES (Teach)     0.02   0.11   0.08   0.13   0.09   0.15   0.45   0.58   0.43   0.75
(12) Leader         0.23   0.29   0.30   0.21   0.33   0.33   0.31   0.40   0.21   0.18   0.15
(13) Performance    0.27   0.42   0.43   0.32   0.43   0.43   0.56   0.64   0.40   0.29   0.22   0.50
(14) Punctuality   −0.03   0.08   0.08   0.09   0.02   0.13   0.16   0.19   0.05   0.13   0.03   0.07   0.22
(15) Discipline     0.02   0.13   0.14   0.14   0.16   0.21   0.27   0.47   0.12   0.34   0.28   0.22   0.40   0.21



Correlations between Big Five from teacher- and self-reports

                       (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
(1)  O (Teach)
(2)  C (Teach)        0.76
(3)  E (Teach)        0.76   0.57
(4)  A (Teach)        0.56   0.64   0.48
(5)  ES (Teach)       0.45   0.58   0.43   0.75
(6)  Big O (Self)     0.23   0.14   0.23   0.04   0.01
(7)  Big C (Self)     0.21   0.27   0.17   0.13   0.11   0.58
(8)  Big E (Self)     0.21   0.11   0.24   0.06   0.02   0.64   0.51
(9)  Big A (Self)     0.13   0.14   0.10   0.11   0.09   0.51   0.55   0.52
(10) Big ES (Self)    0.07   0.09   0.06   0.10   0.13   0.28   0.42   0.28   0.40



Low-performing students showed the most improvement

[Figure: improvement in score (vertical axis, −100 to 100) plotted against 2015 math score (horizontal axis, 0 to 100).]



Distribution of differences in math test scores for students above the median on the 2015 test

[Figure: density of the difference in math scores (horizontal axis, −40 to 40).]



Distribution of differences in math test scores for students below the median on the 2015 test

[Figure: density of the difference in math scores (horizontal axis, −40 to 40).]



Impact on individual items of Openness to Experience

[Figure: impact (1 to 5 scale) on each Openness item: Original; Curious about different things; A deep thinker; Active imagination; Creative; Likes art and sports; Prefers routine work; Like deep and careful thinking, full of ideas; Interested in art; Sophisticated in art, music, or literature.]




Impact on individual items of Conscientiousness

[Figure: impact (1 to 5 scale) on each Conscientiousness item: Does a thorough job; Somewhat careless; Reliable worker; Tends to be disorganized; Lazy; Perseveres until the task is finished; Does things efficiently; Make plans and follow through with them; Easily distracted.]




Impact on individual items of Extraversion

[Figure: impact (1 to 5 scale) on each Extraversion item: Talkative; Reserved; Full of energy; Enthusiasm; Tends to be quiet; Assertive personality; Sometimes be shy and inhibited; Outgoing and sociable.]




Impact on individual items of Agreeableness

[Figure: impact (1 to 5 scale) on each Agreeableness item: Find fault with others; Helpful to others; Quarrels with others; Has a forgiving nature; Generally trusting; Cold and aloof; Considerate and kind to others; Sometimes rude to others; Cooperate with others.]




Impact on individual items of Neuroticism

[Figure: impact (1 to 5 scale) on each Neuroticism item: Depressed; Relaxed; Can be tense; Worried; Emotionally stable; Can be moody; Keeps calm in tense situations; Nervous easily.]




Impact on math test scores by subgroup based on demographics

[Figure: impact (standard deviations) on math test scores by subgroup: Income (High/Low) and Age (High/Low).]




Impact on math performance and self-reported Big Five (Full Sample), mean scores

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self).]





Impact on math performance and self-reported Big Five (Full Sample), no controls

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self).]





Impact on math performance and self-reported Big Five (Full Sample), basic demographics

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self).]





Impact on math performance and self-reported Big Five (Full Sample), basic demographics and ability

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self).]





Impact on math performance and self-reported Big Five (Full Sample), basic demographics, ability, and school fixed effects

[Figure: impact (standard deviations) on Math score (2016), Openness (self), Extraversion (self), and Agreeableness (self).]





Female students had higher levels of cognitive ability

[Figure: means for IQ score, 2015 math test score, Chinese grade, Math grade, and English grade. Series: Male and Female; bars show +/− standard error.]



Female students had higher levels of Big Five personality

[Figure: means of teacher-reported Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism.]




Female students had higher levels of other non-cognitive measures

[Figure: means (0 to 1) for Group leader, High performance (teacher), High punctuality (teacher), and High discipline (teacher).]




Impact on how treatment groups are assessed by peers (Males)

[Figure: impact (standard deviations) on peer-reported Openness, Extraversion, and Agreeableness, male students.]





Impact on how treatment groups are assessed by peers (Females)

[Figure: impact (standard deviations) on peer-reported Openness, Extraversion, and Agreeableness, female students.]





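Standard error inflation factor for treatment T (Cameron and Miller, 2015):

\[
\tau_T \approx 1 + \rho_T \rho_{\varepsilon} \left( \bar{N}_s - 1 \right)
\]

\rho_T: within-cluster correlation of T_{is}

\rho_{\varepsilon}: within-cluster correlation of \varepsilon_{is}

\bar{N}_s: average cluster size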
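
The inflation factor is simple to evaluate numerically. In the sketch below the function names and argument names are hypothetical. Cameron and Miller (2015) state the factor for the variance of the estimate, so under that reading the standard error scales by its square root, which the second helper computes.

```python
import math

def inflation_factor(rho_t, rho_e, avg_cluster_size):
    """Approximate factor 1 + rho_t * rho_e * (avg_cluster_size - 1)."""
    return 1.0 + rho_t * rho_e * (avg_cluster_size - 1.0)

def standard_error_multiplier(rho_t, rho_e, avg_cluster_size):
    """Square root of the factor, assuming it applies to the variance."""
    return math.sqrt(inflation_factor(rho_t, rho_e, avg_cluster_size))
```

When treatment is randomized within schools, the within-cluster correlation of the treatment indicator should be close to zero, so the factor stays close to one even if errors are correlated within schools.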



Almlund, M., A. Duckworth, J. J. Heckman, and T. Kautz (2011). Personality psychology and economics. In E. A. Hanushek, S. Machin, and L. Wößmann (Eds.), Handbook of the Economics of Education, Volume 4, pp. 1–181. Amsterdam: Elsevier.

Borghans, L., H. Meijers, and B. ter Weel (2008, January). The role of noncognitive skills in explaining cognitive test scores. Economic Inquiry 46(1), 2–12.

Cameron, A. C. and D. L. Miller (2015). A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50(2), 317–372.

Durlak, J. A., R. P. Weissberg, A. B. Dymnicki, R. D. Taylor, and K. B. Schellinger (2011). The impact of enhancing students’ social and emotional learning: A meta-analysis of school-based universal interventions. Child Development 82(1), 405–432.

Kautz, T., J. Heckman, R. Diris, B. ter Weel, and L. Borghans (2014). Fostering and measuring skills: Improving cognitive and non-cognitive skills to promote lifetime success. OECD.

Segal, C. (2012, August). Working when no one is watching: Motivation, test scores, and economic success. Management Science 58(8), 1438–1457.

