Measuring Cause and Effect in Jewish Education - Professor Adam Gamoran

Impact or Bias? Measuring Cause and Effect in Jewish Education

Adam Gamoran, University of Wisconsin-Madison

Ripped from the Headlines!

“Is Birthright Israel an Intermarriage Panacea?” – The Forward

“Birthright Alumni Found Less Likely to Marry Non-Jews” – Ha’Aretz

“Birthright Study Offers Mixed Bag of Results on Jewish Connections” – The Jewish World

Searching for the “Cure”

Israel trips, day schools, summer camps have often been touted as “cures” for intergenerational loss of Jewishness
Most programs have not been studied at all
Available research often uses designs that are not suited to answer questions about cause and effect

Education Research Falls Short

Enthusiasm for untested policies is common in general education as well
US federal policy advocates policies that lack evidence
  Pay for performance
  Charter schools
These may or may not be good policies, but they have not been carefully tested

Education Research Falls Short

Failing schools should be “restructured”
Education research has provided almost no evidence about how to undertake such radical reforms successfully
  All we have is practical wisdom
  No studies about restructuring permit judgments of cause and effect
Meanwhile, educators have to make decisions every day

The New Education Science

The U.S. Institute of Education Sciences (IES) is trying to change the focus of education research
  Limit causal claims to research designs that support causal inference
  Encourage experimental and quasi-experimental research with attention to causal inference

The New Education Science

This change is not really about the method, but about the QUESTION:
  What education programs, practices, and policies are effective at raising scores and reducing gaps?

If the question is “what works,” it is clear we need research methods that permit judgments about cause and effect

The New Education Science

Why do we lack evidence about cause and effect in education?
  We have not responded to the Fundamental Problem of Causal Inference

Fundamental Problem of Causal Inference

We cannot observe a unit in both the treated and untreated conditions simultaneously

If a unit undergoes a treatment, and changes in some way, we may want to attribute the change to the treatment

But how do we know the unit would not have changed in the absence of the treatment?
  Depends on assumptions

Fundamental Problem of Causal Inference

If two different units are identical, we might infer causation

But this also depends on assumptions
  In particular, inferring causation from comparison of two different units assumes no selectivity bias
  In education, this assumption rarely holds
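The two slides above can be restated in potential-outcomes notation. This is a sketch only: the symbols are assumptions introduced here and do not appear in the slides. Write Y_i(1) and Y_i(0) for unit i's outcome with and without the treatment, and D_i = 1 if the unit was actually treated. Only one of the two outcomes is ever observed for a given unit, so a comparison of treated and untreated units decomposes as:

```latex
\begin{align*}
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference}}
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{effect of treatment on the treated}} \\
  &\quad + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selectivity bias}}
\end{align*}
```

The last term is zero only if treated and untreated units would have had the same average outcome without the treatment, which is the "no selectivity bias" assumption.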

Examples of Selectivity in Education

Class size effects
  Better teachers may use clout to get smaller classes
  Or better teachers may be more often requested and end up with larger classes!
Teacher professional development impact
  Because participation is often voluntary, it is difficult to distinguish effects of participation from effects of who participates and who does not

Examples of Selectivity in Education

Jewish day school effects

Students who attend day schools may come from more Jewishly active families than other students

Or the students themselves may be more committed to Jewish involvement than other students

These conditions make it difficult to distinguish the effects of day schools from the effects of the students and families who choose day schools

Source: Grover Whitehurst, “The Institute of Education Sciences: New wine, new bottles,” 2003. http://www.ed.gov/rschstat/research/pubs/ies.html

Addressing the Selectivity Problem

A randomized experiment is the optimal way to rule out selectivity bias
  Participants are assigned to treatment and control groups at random
  Self-selection does not occur
  Bias is eliminated
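A small simulation can make this contrast concrete. The sketch below is illustrative only: the "motivation" variable, the effect size of 2.0, and all other numbers are invented assumptions, not data from any study cited in this talk. It compares a naive treated-versus-untreated difference when people self-select with the same difference when a coin flip assigns treatment.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Return (naive_estimate, experimental_estimate) for a made-up population."""
    rng = random.Random(seed)
    self_selected = {True: [], False: []}
    randomized = {True: [], False: []}
    for _ in range(n):
        motivation = rng.gauss(0, 1)              # unobserved characteristic (hypothetical)
        baseline = 2.0 * motivation + rng.gauss(0, 1)

        # Self-selection: more motivated people are more likely to participate.
        chose = motivation + rng.gauss(0, 1) > 0
        self_selected[chose].append(baseline + (true_effect if chose else 0.0))

        # Random assignment: a coin flip, unrelated to motivation.
        assigned = rng.random() < 0.5
        randomized[assigned].append(baseline + (true_effect if assigned else 0.0))

    naive = mean(self_selected[True]) - mean(self_selected[False])
    experimental = mean(randomized[True]) - mean(randomized[False])
    return naive, experimental


if __name__ == "__main__":
    naive, experimental = simulate()
    print(f"self-selected comparison: {naive:.2f}")
    print(f"randomized comparison:    {experimental:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect because participants were more motivated to begin with, while the randomized comparison lands close to the true value built into the simulation.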

Addressing the Selectivity Problem

U.S. education law calls for randomized trials

“…using rigorous methodological designs and techniques, including control groups and random assignment, to the extent feasible, to produce reliable evidence of effectiveness.”

Not without controversy
When it comes to measuring impact, there is no substitute for random assignment

Addressing the Selectivity Problem

Why is experimental research so rare in education?
  Education viewed more as an art than a science
  Ethical problem of “denying treatment” to those who deserve it
  Practical problems of mounting an experiment
  Trade-offs between internal and external validity

Lessons from Experiments

What have we learned from recent experiments?
  Barriers to cooperation can be overcome
  Need for patience to let results emerge
  Importance of implementation
  Difficulties of scaling up
  Limitations to generalizability

Illustrations from recent work

Overcoming Barriers to Experiments

Will school districts agree to random assignment?
  Districts are increasingly recognizing the need for unbiased assessments of new programs
  If we KNOW the benefits of the program, it is unethical to conduct an experiment
  If we do NOT KNOW the benefits of the program, it is unethical NOT to evaluate it

Overcoming Barriers to Experiments

Milwaukee, Los Angeles, San Antonio, and Phoenix are all examples of districts that are engaged in pioneering randomized trials
Is random assignment a “denial of services”?
  Not when a program is being phased in
  Not if the program is only available because of the research funds

Overcoming Barriers to Experiments

Design options
  Treatment versus control
  Lagged treatment
  Lottery-based individual assignment
  Grade-by-grade randomization

Overcoming Barriers to Experiments

Example: Randomized evaluation of Success for All (SFA)
  Geoffrey Borman, Robert Slavin, et al.
  Difficulties recruiting schools
  Solution
    Randomized assignment of grade K-2 or 3-5 to SFA
    Grades not assigned to SFA served as controls for other schools

Analyzing Data from Experiments

The SFA evaluation is an example of a “cluster-randomized” design
  Schools, not individuals, are randomly assigned
  Requires a multilevel model for estimation
  Treatment effects are captured at the cluster level, not the student level
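To see why the unit of assignment matters for the analysis, here is a minimal sketch with simulated data. It does not fit a multilevel model; as a simpler stand-in it compares school means, which also treats the school as the unit of analysis. The number of schools, the size of the school-level variation, and the 0.3 effect are all invented for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def diff_and_se(treated, control):
    """Difference in means with a two-sample standard error."""
    se = sqrt(stdev(treated) ** 2 / len(treated) + stdev(control) ** 2 / len(control))
    return mean(treated) - mean(control), se


def cluster_trial(n_schools=40, students_per_school=50, effect=0.3, seed=1):
    rng = random.Random(seed)
    schools = list(range(n_schools))
    rng.shuffle(schools)
    treated_schools = set(schools[: n_schools // 2])  # schools, not students, are assigned

    school_means = {True: [], False: []}
    student_scores = {True: [], False: []}
    for school in range(n_schools):
        school_shock = rng.gauss(0, 0.5)  # shared by every student in the school
        is_treated = school in treated_schools
        scores = [
            school_shock + rng.gauss(0, 1) + (effect if is_treated else 0.0)
            for _ in range(students_per_school)
        ]
        school_means[is_treated].append(mean(scores))
        student_scores[is_treated].extend(scores)

    cluster_level = diff_and_se(school_means[True], school_means[False])
    student_level = diff_and_se(student_scores[True], student_scores[False])
    return cluster_level, student_level


if __name__ == "__main__":
    (est_c, se_c), (est_s, se_s) = cluster_trial()
    print(f"cluster-level: estimate {est_c:.3f}, SE {se_c:.3f}")
    print(f"student-level: estimate {est_s:.3f}, SE {se_s:.3f} (ignores clustering)")
```

With equal school sizes the two point estimates coincide, but the student-level standard error is too small because it treats students in the same school as independent observations. A multilevel model handles this by estimating the school-level variation explicitly.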

Success for All Findings (Effect sizes)

[Figure: effect sizes (vertical axis from -0.2 to 0.4) in Year 1, Year 2, and Year 3 for Letter Identification, Word Identification, Word Attack, and Passage Comprehension]

Source: Adapted from Borman et al. (2007), Table 5.

Lessons from Experiments

The SFA evaluation also illustrates the importance of patience
  A one-year study would have missed the results
SFA has high fidelity of implementation
  It is tightly scripted
  Instruction can be monitored to see whether teachers are following the script

Lessons from Experiments

The three keys to a successful randomized trial
  Implementation
  Implementation
  Implementation

If the program is not implemented with fidelity, it will not achieve its desired impact

Lessons from Experiments

Example: Instructional Technology
  Many small-scale studies have shown benefits of technology-based instruction
  Federally-sponsored, large-scale study conducted by Mathematica showed zero impact
  Implementation was limited
    E.g., in middle school math: 15 minutes per week

Lessons from Experiments

Challenges of patience, implementation, and scaling up are also salient in my work
Study of professional development for elementary science teaching in Los Angeles
  Science “immersion” – an extended, inquiry-oriented science curriculum for grades 4-5
  Summer professional development with ongoing mentoring to help teachers implement

Science PD Evaluation

Scaling up: Would this help science learning throughout the district?
  Goal: Include all schools, do not “cherry-pick”
  We were able to include about half the schools in our pool for random selection (191 schools)
  80 schools were randomly selected for the study
  40 were randomly assigned to “immersion,” 40 were comparison schools
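The two-stage design on this slide (random selection into the study, then random assignment to condition) can be sketched in a few lines. The school labels, the seed, and the function name are placeholders invented for illustration; this is not the procedure or software the study team used.

```python
import random


def select_and_assign(pool, n_study=80, seed=2024):
    """Randomly select study schools from the pool, then split them at random."""
    rng = random.Random(seed)            # fixed seed so the draw can be audited
    study = rng.sample(pool, n_study)    # stage 1: random selection into the study
    rng.shuffle(study)                   # stage 2: random assignment to condition
    half = n_study // 2
    return study[:half], study[half:]


# Hypothetical labels standing in for the 191 schools in the pool.
pool = [f"school_{i:03d}" for i in range(1, 192)]
immersion, comparison = select_and_assign(pool)

if __name__ == "__main__":
    print(len(pool), "in pool;", len(immersion), "immersion;", len(comparison), "comparison")
```

Random selection from the pool supports generalizing to the pool, and random assignment within the study supports the causal comparison. Those are separate steps with separate purposes.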

Science PD Evaluation

Implementation: Would teachers attend the summer professional development?
  30/40 schools sent grade 4 teachers, and 22/40 schools sent grade 5 teachers
  36/40 schools sent at least one teacher
Follow-up participation was weak
Immersion is not tightly scripted
  Instruction varied greatly across classes

Science PD Evaluation

Findings: Implementation “dip”
  No difference pre-treatment
  Year 1: Lower scores in immersion schools
  Year 2: Equal scores in immersion and comparison schools

Science PD Evaluation: Grade 4

Science PD Evaluation

Interpretation: Better implementation or abandonment of immersion?
  Observations: Use of immersion went from 80% in Year 1 to 30% in Year 2
  Surveys: 26% of teachers in immersion schools reported using immersion “a lot” in Year 2
  Probably some of both is taking place
  We are awaiting results for Year 3

Lessons from Experiments

Include educators and schools as partners
  We cannot impose interventions on educators
  Implementation will succeed only with buy-in from school staff
  Study recruitment often requires school district partnership

Lessons from Experiments

Experiments are strong on internal validity, but weak on external validity
We can have confidence in our judgment about cause and effect, but not in whether the effect would generalize to other places

Generalizability in Education Experiments

Limits of generalizability
Class size research
  Tennessee STAR experiment showed that smaller classes boost achievement in grades K-1
  Effects are sustained but do not increase
  Similar efforts in California and Florida have failed
    FL: Lack of funding
    CA: Lack of trained teachers and space

Generalizability in Education Experiments

National survey analysis also failed to find class size effects
  Unobserved selectivity or lack of generalizability?
Early Childhood Longitudinal Study
  Survey of the kindergarten class of 1998-99
  Comparisons of teachers with two classes confirm the finding of no effect

Is the Birthright Study an Experiment?

Recent study by Leonard Saxe and colleagues indicates positive effects of Birthright Israel
  23% greater likelihood of sense of connection to Israel
  50% more likely to feel “very confident” of ability to explain current situation in Israel
  22% more likely to belong to a congregation
  57% more likely to have a Jewish spouse (married non-Orthodox respondents)

Is the Birthright Study an Experiment?

Natural experiment: Comparison of applicants who attended to applicants who did not
  Generalizable only to applicants
  Unbiased comparison?

Main reason for not attending: timing of dates offered for trip was inconvenient

“The selection process was more or less random” (p.10) – more details would help!!

No difference on observables other than age

Is the Birthright Study an Experiment?

“Observable” – a condition that has been measured
  Age, gender, denomination, etc.
Contrasted with “unobservable” – a condition that has not been measured
  Motivation, commitment
Observables can be addressed with statistical methods; unobservables are harder to control

Is the Birthright Study an Experiment?

No difference on observables other than age
  Jewish schooling, gender, ritual practices, etc.
What about unobservables?
  Motivation, commitment, interest in being Jewish, exploring Israel
  Involvement in non-Jewish activities
  If inconvenient timing of trips was the main reason for non-participation, who got preferred dates, and why?
  Differences in such unobserved characteristics may bias the results

Is the Birthright Study an Experiment?

Additional concerns
  Differential response rates
    61.8% participants vs 42.3% non-participants
    Addressed with weighting, but that assumes respondents and non-respondents are similar
  Reasons for non-response
    No contact (26% participants, 30.5% non-participants)
    Lack of cooperation (6.4% participants, 19.6% non-participants)
  Differential response could bias the results in either direction
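One way to see how much room non-response leaves is a worst-case bounds calculation. The sketch below uses the two response rates reported on this slide, but the outcome rates among respondents (0.60 and 0.45) are hypothetical placeholders, not figures from the Birthright study. It is also not the weighting approach the study used. It shows only what is possible if non-respondents differ from respondents without limit.

```python
def worst_case_bounds(response_rate, rate_among_respondents):
    """Bounds on a full-group rate if every non-respondent were a 0, or every one a 1."""
    observed_part = response_rate * rate_among_respondents
    return observed_part, observed_part + (1.0 - response_rate)


# Response rates from the slide; outcome rates are HYPOTHETICAL, for illustration only.
RESPONSE_RATES = {"participants": 0.618, "non_participants": 0.423}
HYPOTHETICAL_OUTCOME = {"participants": 0.60, "non_participants": 0.45}

bounds = {
    group: worst_case_bounds(RESPONSE_RATES[group], HYPOTHETICAL_OUTCOME[group])
    for group in RESPONSE_RATES
}


def sign_is_identified(b):
    """True only if the two intervals do not overlap."""
    (p_lo, p_hi), (n_lo, n_hi) = b["participants"], b["non_participants"]
    return p_lo > n_hi or p_hi < n_lo


if __name__ == "__main__":
    for group, (lo, hi) in bounds.items():
        print(f"{group}: between {lo:.3f} and {hi:.3f}")
    print("direction of difference pinned down:", sign_is_identified(bounds))
```

With response rates this far apart, the worst-case intervals overlap, so without further assumptions the data alone cannot fix even the direction of the difference. Weighting narrows the range only by assuming that respondents and non-respondents are similar.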

Is the Birthright Study an Experiment?

Additional concerns
  Censoring on marriage
    This study captures people who marry young
    In-marriage rates may differ for those who marry older
    For this reason, the intermarriage finding should be treated with particular caution

Is the Birthright Study an Experiment?

The Birthright study is closer to an experiment than most research in Jewish education
  Deserves special attention
  Yet recognize limited generalizability and potential problems due to non-random selection and differential response rates

Could the Birthright Study Have Been Conducted as a True Experiment?

Yes, if it had been set up that way
For many years, Birthright has been oversubscribed
  Instead of first-come, first-served, establish a deadline and then select participants by lottery
  That would have ruled out bias due to unobserved characteristics
  Also get better contact information to reduce non-response rate among non-participants

Advancing the New Education Science

With all the advances in curricula and teaching methods, we should be asking some “what works” questions

Though not always feasible, randomized trials are the optimal method for answering these questions

There are many pitfalls in executing randomized trials, but the potential benefits make the effort worthwhile

Further Reading on Randomized Trials in Education

Bloom, H. S. (2006). Learning more from social experiments: Evolving analytic approaches. New York: Russell Sage Foundation.

Bloom, H. S., Bos, J. M., & Lee, S. W. (1999). Using cluster random assignment to measure program impacts: Statistical implications for the evaluation of education programs. Evaluation Review, 23, 445–469.

Boruch, R., May, H., Turner, H., Lavenberg, J., Petrosino, A., & de Moya, D. (2004). Estimating the effects of interventions that are deployed in many places: Place-randomized trials. American Behavioral Scientist, 47, 608–633.

Borman, G. D. (2002). Experiments for educational evaluation and improvement. Peabody Journal of Education, 77, 7–27.

Borman, G. D., Gamoran, A., & Bowdon, J. (2008). A randomized trial of teacher development in elementary science: First-year effects. Journal of Research on Educational Effectiveness, 1, 237–264.

Borman, G. D., Slavin, R. E., Cheung, A. C. K., Chamberlin, A. M., Madden, N. A., & Chambers, B. (2007). Final reading outcomes of the national randomized field trial of Success for All. American Educational Research Journal, 44, 701–731.

Cook, T. D. (2003). Why have educational evaluators chosen not to do randomized experiments? Annals of the American Academy of Political and Social Science, 589, 114–149.

Dynarski, M., et al. (2007). Effectiveness of reading and mathematics software products: Findings from the first student cohort. Washington, DC: U.S. Department of Education.

Milesi, C., & Gamoran, A. (2006). Effects of class size and instruction on kindergarten achievement. Educational Evaluation and Policy Analysis, 28, 287–313.

Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2, 173–185.

Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational designs. Washington, DC: American Educational Research Association.

Whitehurst, G. (2003). The Institute of Education Sciences: New wine, new bottles. http://www.ed.gov/rschstat/research/pubs/ies.html
