Measuring Cause and Effect in Jewish Education - Professor Adam Gamoran
Impact or Bias? Measuring Cause and Effect in Jewish Education
Adam Gamoran
University of Wisconsin-Madison
Ripped from the Headlines!
  “Is Birthright Israel an Intermarriage Panacea?” – The Forward
  “Birthright Alumni Found Less Likely to Marry Non-Jews” – Ha’Aretz
  “Birthright Study Offers Mixed Bag of Results on Jewish Connections” – The Jewish World
Searching for the “Cure”
  Israel trips, day schools, summer camps have often been touted as “cures” for intergenerational loss of Jewishness
  Most programs have not been studied at all
  Available research often uses designs that are not suited to answer questions about cause and effect
Education Research Falls Short
  Enthusiasm for untested policies is common in general education as well
  US federal policy advocates policies that lack evidence
    Pay for performance
    Charter schools
  These may or may not be good policies, but they have not been carefully tested
Education Research Falls Short
  Failing schools should be “restructured”
  Education research has provided almost no evidence about how to undertake such radical reforms successfully
    All we have is practical wisdom
    No studies about restructuring permit judgments of cause and effect
  Meanwhile, educators have to make decisions every day
The New Education Science
  The U.S. Institute of Education Sciences (IES) is trying to change the focus of education research
    Limit causal claims to research designs that support causal inference
    Encourage experimental and quasi-experimental research with attention to causal inference
The New Education Science
  This change is not really about the method, but about the QUESTION:
    What education programs, practices, and policies are effective at raising scores and reducing gaps?
  If the question is “what works,” it is clear we need research methods that permit judgments about cause and effect
The New Education Science
  Why do we lack evidence about cause and effect in education?
  We have not responded to the Fundamental Problem of Causal Inference
Fundamental Problem of Causal Inference
  We cannot observe a unit in both the treated and untreated conditions simultaneously
  If a unit undergoes a treatment, and changes in some way, we may want to attribute the change to the treatment
  But how do we know the unit would not have changed in the absence of the treatment?
    Depends on assumptions
Fundamental Problem of Causal Inference
  If two different units are identical, we might infer causation
  But this also depends on assumptions
    In particular, inferring causation from comparison of two different units assumes no selectivity bias
    In education, this assumption rarely holds
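The selectivity problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the trait name `commitment`, the effect size of 2.0, and all coefficients are hypothetical and chosen for illustration, not taken from any study. Each unit has two potential outcomes, only one of which is observed, and units with more of the unmeasured trait are more likely to opt in.

```python
import random
import statistics


def simulate_self_selection(n=20000, true_effect=2.0, seed=0):
    """Compare self-selected participants with non-participants.

    All quantities are hypothetical. An unmeasured trait raises both the
    chance of opting in and the outcome itself.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        commitment = rng.gauss(0, 1)  # unmeasured trait (hypothetical)
        # More committed units are more likely to opt in.
        opts_in = commitment + rng.gauss(0, 1) > 0
        # Potential outcome without treatment depends on the trait.
        y0 = 5.0 * commitment + rng.gauss(0, 1)
        # Potential outcome with treatment adds the true effect.
        y1 = y0 + true_effect
        # Only one potential outcome is ever observed for each unit.
        if opts_in:
            treated.append(y1)
        else:
            untreated.append(y0)
    return statistics.mean(treated) - statistics.mean(untreated)


naive_difference = simulate_self_selection()
print(f"true effect: 2.00, naive difference: {naive_difference:.2f}")
```

Under these assumptions the naive comparison comes out well above the true effect, because it mixes the effect of the treatment with the effect of who chose to be treated.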
Examples of Selectivity in Education
  Class size effects
    Better teachers may use clout to get smaller classes
    Or better teachers may be more often requested and end up with larger classes!
  Teacher professional development impact
    Because participation is often voluntary, it is difficult to distinguish effects of participation from effects of who participates and who does not
Examples of Selectivity in Education
  Jewish day school effects
    Students who attend day schools may come from more Jewishly active families than other students
    Or the students themselves may be more committed to Jewish involvement than other students
    These conditions make it difficult to distinguish the effects of day schools from the effects of the students and families who choose day schools
Source: Grover Whitehurst, “The Institute of Education Sciences: New wine, new bottles,” 2003. http://www.ed.gov/rschstat/research/pubs/ies.html
Addressing the Selectivity Problem
  A randomized experiment is the optimal way to rule out selectivity bias
    Participants are assigned to treatment and control groups at random
    Self-selection does not occur
    Bias is eliminated
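A minimal sketch of why random assignment helps, again with hypothetical numbers: the unmeasured trait `commitment` and the true effect of 2.0 are assumptions made for illustration. Because a coin flip decides who is treated, the trait is balanced across groups on average and the simple difference in means lands close to the true effect.

```python
import random
import statistics


def simulate_random_assignment(n=20000, true_effect=2.0, seed=0):
    """Assign treatment by coin flip and compare group means.

    Returns the estimated effect and the gap in the unmeasured trait
    between the two groups. All quantities are hypothetical.
    """
    rng = random.Random(seed)
    outcomes = {True: [], False: []}
    commitment_by_group = {True: [], False: []}
    for _ in range(n):
        commitment = rng.gauss(0, 1)  # unmeasured trait (hypothetical)
        y0 = 5.0 * commitment + rng.gauss(0, 1)
        # Assignment ignores the trait entirely.
        assigned = rng.random() < 0.5
        outcomes[assigned].append(y0 + true_effect if assigned else y0)
        commitment_by_group[assigned].append(commitment)
    effect_estimate = (statistics.mean(outcomes[True])
                       - statistics.mean(outcomes[False]))
    commitment_gap = (statistics.mean(commitment_by_group[True])
                      - statistics.mean(commitment_by_group[False]))
    return effect_estimate, commitment_gap


effect_estimate, commitment_gap = simulate_random_assignment()
print(f"estimated effect: {effect_estimate:.2f}")
print(f"gap in unmeasured trait between groups: {commitment_gap:.3f}")
```

The estimate still carries sampling error, so it will not equal 2.0 exactly in any single run.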
Addressing the Selectivity Problem
  U.S. education law calls for randomized trials
    “…using rigorous methodological designs and techniques, including control groups and random assignment, to the extent feasible, to produce reliable evidence of effectiveness.”
  Not without controversy
  When it comes to measuring impact, there is no substitute for random assignment
Addressing the Selectivity Problem
  Why is experimental research so rare in education?
    Education viewed more as an art than a science
    Ethical problem of “denying treatment” to those who deserve it
    Practical problems of mounting an experiment
    Trade-offs between internal and external validity
Lessons from Experiments
  What have we learned from recent experiments?
    Barriers to cooperation can be overcome
    Need for patience to let results emerge
    Importance of implementation
    Difficulties of scaling up
    Limitations to generalizability
  Illustrations from recent work
Overcoming Barriers to Experiments
  Will school districts agree to random assignment?
    Districts are increasingly recognizing the need for unbiased assessments of new programs
  If we KNOW the benefits of the program, it is unethical to conduct an experiment
  If we do NOT KNOW the benefits of the program, it is unethical NOT to evaluate it
Overcoming Barriers to Experiments
  Milwaukee, Los Angeles, San Antonio, and Phoenix are all examples of districts that are engaged in pioneering randomized trials
  Is random assignment a “denial of services”?
    Not when a program is being phased in
    Not if the program is only available because of the research funds
Overcoming Barriers to Experiments
  Design options
    Treatment versus control
    Lagged treatment
    Lottery-based individual assignment
    Grade-by-grade randomization
Overcoming Barriers to Experiments
  Example: Randomized evaluation of Success for All (SFA)
    Geoffrey Borman, Robert Slavin, et al.
    Difficulties recruiting schools
    Solution
      Randomized assignment of grades K-2 or 3-5 to SFA
      Grades not assigned to SFA served as controls for other schools
Analyzing Data from Experiments
  The SFA evaluation is an example of a “cluster-randomized” design
    Schools, not individuals, are randomly assigned
    Requires a multilevel model for estimation
    Treatment effects are captured at the cluster level, not the student level
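To see why the cluster level matters, here is a rough sketch using simulated data. It does not fit a multilevel model; as a simpler stand-in it compares school means, which also treats the school as the unit of analysis. The number of schools, students per school, variances, and the true effect of 0.3 are all hypothetical.

```python
import random
import statistics


def se_of_difference(a, b):
    """Standard error of a difference in means between two samples."""
    return (statistics.variance(a) / len(a)
            + statistics.variance(b) / len(b)) ** 0.5


def cluster_trial(n_schools=40, n_students=50, true_effect=0.3, seed=1):
    """Simulate a trial in which whole schools are randomly assigned."""
    rng = random.Random(seed)
    schools = list(range(n_schools))
    rng.shuffle(schools)
    treated_schools = set(schools[: n_schools // 2])

    school_means = {True: [], False: []}
    student_scores = {True: [], False: []}
    for school in range(n_schools):
        is_treated = school in treated_schools
        # Students in the same school share a common school-level shift.
        school_shift = rng.gauss(0, 0.5)
        scores = [school_shift + true_effect * is_treated + rng.gauss(0, 1)
                  for _ in range(n_students)]
        school_means[is_treated].append(statistics.mean(scores))
        student_scores[is_treated].extend(scores)

    estimate = (statistics.mean(school_means[True])
                - statistics.mean(school_means[False]))
    # Cluster-level analysis: the sample size is the number of schools.
    se_cluster = se_of_difference(school_means[True], school_means[False])
    # Naive analysis: pretends every student was assigned independently.
    se_naive = se_of_difference(student_scores[True], student_scores[False])
    return estimate, se_cluster, se_naive


estimate, se_cluster, se_naive = cluster_trial()
print(f"estimate: {estimate:.2f}")
print(f"standard error, school level: {se_cluster:.3f}")
print(f"standard error, naive student level: {se_naive:.3f}")
```

In this setup the student-level standard error is smaller than the school-level one, which overstates how much independent information the trial contains. A multilevel model handles the same issue while also using student-level data.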
Success for All Findings (Effect sizes)
[Figure: effect sizes (vertical axis from -0.2 to 0.4) in Year 1, Year 2, and Year 3 for Letter Identification, Word Identification, Word Attack, and Passage Comprehension]
Source: Adapted from Borman et al. (2007), Table 5.
Lessons from Experiments
  The SFA evaluation also illustrates the importance of patience
    A one-year study would have missed the results
  SFA has high fidelity of implementation
    It is tightly scripted
    Instruction can be monitored to see whether teachers are following the script
Lessons from Experiments
  The three keys to a successful randomized trial
    Implementation
    Implementation
    Implementation
  If the program is not implemented with fidelity, it will not achieve its desired impact
Lessons from Experiments
  Example: Instructional Technology
    Many small-scale studies have shown benefits of technology-based instruction
    Federally-sponsored, large-scale study conducted by Mathematica showed zero impact
    Implementation was limited
      E.g. in middle school math: 15 minutes per week
Lessons from Experiments
  Challenges of patience, implementation, and scaling up are also salient in my work
  Study of professional development for elementary science teaching in Los Angeles
    Science “immersion” – an extended, inquiry-oriented science curriculum for grades 4-5
    Summer professional development with ongoing mentoring to help teachers implement
Science PD Evaluation
  Scaling up: Would this help science learning throughout the district?
    Goal: Include all schools, do not “cherry-pick”
    Were able to include about half the schools in our pool for random selection (191 schools)
    80 schools randomly selected for the study
    40 randomly assigned to “immersion,” 40 were comparison schools
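The two-stage design on this slide (random selection from the pool, then random assignment) might be sketched as follows. The counts 191, 80, and 40 come from the slide; the school identifiers, the function name `select_and_assign`, and the seed are placeholders, and this is not the procedure the study team actually ran.

```python
import random


def select_and_assign(pool, n_study=80, n_treated=40, seed=2024):
    """Randomly select study schools, then randomly split them in two."""
    rng = random.Random(seed)
    # Stage 1: random selection from the pool, so schools are not cherry-picked.
    # rng.sample returns the chosen schools in random order.
    study_schools = rng.sample(pool, n_study)
    # Stage 2: because the order is random, slicing is a random assignment.
    immersion = study_schools[:n_treated]
    comparison = study_schools[n_treated:]
    return immersion, comparison


# Placeholder identifiers for the 191 schools in the pool.
pool = [f"school_{i:03d}" for i in range(1, 192)]
immersion, comparison = select_and_assign(pool)
print(len(pool), len(immersion), len(comparison))
```

Fixing the seed makes the draw reproducible, which lets others audit that the assignment was not adjusted after the fact.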
Science PD Evaluation
  Implementation: Would teachers attend the summer professional development?
    30/40 schools sent grade 4 teachers, and 22/40 schools sent grade 5 teachers
    36/40 schools sent at least one teacher
  Follow-up participation was weak
  Immersion is not tightly scripted
    Instruction varied greatly across classes
Science PD Evaluation
  Findings: Implementation “dip”
    No difference pre-treatment
    Year 1: Lower scores in immersion schools
    Year 2: Equal scores in immersion and comparison schools
Science PD Evaluation: Grade 4
Science PD Evaluation
  Interpretation: Better implementation or abandonment of immersion?
    Observations: Use of immersion went from 80% in Year 1 to 30% in Year 2
    Surveys: 26% of teachers in immersion schools reported using immersion “a lot” in Year 2
    Probably some of both is taking place
  We are awaiting results for Year 3
Lessons from Experiments
  Include educators and schools as partners
    We cannot impose interventions on educators
    Implementation will succeed only with buy-in from school staff
    Study recruitment often requires school district partnership
Lessons from Experiments
  Experiments are strong on internal validity, but weak on external validity
  We can have confidence in our judgment about cause and effect, but not in whether the effect would generalize to other places
Generalizability in Education Experiments
  Limits of generalizability
  Class size research
    Tennessee STAR experiment showed that smaller classes boost achievement in grades K-1
    Effects are sustained but do not increase
    Similar efforts in California and Florida have failed
      FL: Lack of funding
      CA: Lack of trained teachers and space
Generalizability in Education Experiments
  National survey analysis also failed to find class size effects
    Unobserved selectivity or lack of generalizability?
  Early Childhood Longitudinal Study
    Survey of the Kindergarten class of 1998-99
    Comparison of teachers with two classes confirms the finding of no effect
Is the Birthright Study an Experiment?
  Recent study by Leonard Saxe and colleagues indicates positive effects of Birthright Israel
    23% greater likelihood of sense of connection to Israel
    50% more likely to feel “very confident” of ability to explain current situation in Israel
    22% more likely to belong to a congregation
    57% more likely to have a Jewish spouse (married non-Orthodox respondents)
Is the Birthright Study an Experiment?
  Natural experiment: Comparison of applicants who attended to applicants who did not
    Generalizable only to applicants
    Unbiased comparison?
      Main reason for not attending: timing of dates offered for trip was inconvenient
      “The selection process was more or less random” (p.10) – more details would help!!
      No difference on observables other than age
Is the Birthright Study an Experiment?
  “Observable” – a condition that has been measured
    Age, gender, denomination, etc.
  Contrasted with “unobservable” – a condition that has not been measured
    Motivation, commitment
  Observables can be addressed with statistical methods; unobservables are harder to control
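The distinction can be made concrete with simulated data. In this sketch, stratification stands in for "statistical methods" in general, and every number is hypothetical: `schooling` plays the role of a measured condition, `motivation` an unmeasured one, and the true effect is set to 1.0. Both conditions raise participation and the outcome.

```python
import random
import statistics


def mean_gap(rows):
    """Difference in mean outcome, participants minus non-participants."""
    treated = [y for _, t, y in rows if t]
    untreated = [y for _, t, y in rows if not t]
    return statistics.mean(treated) - statistics.mean(untreated)


def compare_estimates(n=40000, true_effect=1.0, seed=0):
    """Return the naive estimate and the estimate stratified on schooling."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        schooling = rng.random() < 0.5   # observable: it was measured
        motivation = rng.gauss(0, 1)     # unobservable: it was not measured
        participates = schooling + motivation + rng.gauss(0, 1) > 0.5
        outcome = (3.0 * schooling + 3.0 * motivation
                   + true_effect * participates + rng.gauss(0, 1))
        rows.append((schooling, participates, outcome))

    naive = mean_gap(rows)
    # Adjust for the observable: compare within each schooling stratum,
    # then average the gaps weighted by stratum size.
    adjusted = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        adjusted += mean_gap(stratum) * len(stratum) / n
    return naive, adjusted


naive, adjusted = compare_estimates()
print(f"true effect: 1.00, naive: {naive:.2f}, adjusted for schooling: {adjusted:.2f}")
```

Under these assumptions, adjusting for the measured condition moves the estimate toward the truth, but the part of the bias that runs through the unmeasured condition remains.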
Is the Birthright Study an Experiment?
  No difference on observables other than age
    Jewish schooling, gender, ritual practices, etc.
  What about unobservables?
    Motivation, commitment, interest in being Jewish, exploring Israel
    Involvement in non-Jewish activities
    If inconvenient timing of trips was the main reason for non-participation, who got preferred dates, and why?
    Differences in such unobserved characteristics may bias the results
Is the Birthright Study an Experiment?
  Additional concerns
    Differential response rates
      61.8% participants vs 42.3% non-participants
      Addressed with weighting, but that assumes respondents and non-respondents are similar
    Reasons for non-response
      No contact (26% participants, 30.5% non-participants)
      Lack of cooperation (6.4% participants, 19.6% non-participants)
    Differential response could bias the results in either direction
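A back-of-the-envelope sketch of how differential response could push the comparison in either direction. The response rates (61.8% and 42.3%) are from the slide; the outcome rates of 0.70 and 0.50 among respondents and the three scenarios for non-respondents are invented purely for illustration.

```python
def full_group_rate(response_rate, rate_respondents, rate_nonrespondents):
    """Outcome rate for a whole group, mixing respondents and non-respondents."""
    return (response_rate * rate_respondents
            + (1 - response_rate) * rate_nonrespondents)


PARTICIPANT_RESPONSE = 0.618      # from the slide
NONPARTICIPANT_RESPONSE = 0.423   # from the slide

# Hypothetical outcome rates observed among respondents.
OBSERVED_PARTICIPANTS = 0.70
OBSERVED_NONPARTICIPANTS = 0.50

# Hypothetical outcome rates among NON-respondents: (participants, non-participants).
scenarios = {
    "non-respondents resemble respondents": (0.70, 0.50),
    "missing non-participants are more engaged": (0.70, 0.60),
    "missing non-participants are less engaged": (0.70, 0.40),
}

gaps = {}
for label, (participants_missing, nonparticipants_missing) in scenarios.items():
    participants = full_group_rate(
        PARTICIPANT_RESPONSE, OBSERVED_PARTICIPANTS, participants_missing)
    nonparticipants = full_group_rate(
        NONPARTICIPANT_RESPONSE, OBSERVED_NONPARTICIPANTS, nonparticipants_missing)
    gaps[label] = participants - nonparticipants
    print(f"{label}: gap = {gaps[label]:.3f}")
```

Only the first scenario reproduces the gap seen among respondents. Since non-respondents are by definition unmeasured, the data alone cannot say which scenario holds.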
Is the Birthright Study an Experiment?
  Additional concerns
    Censoring on marriage
      This study captures people who marry young
      In-marriage rates may differ for those who marry older
      For this reason, the intermarriage finding should be treated with particular caution
Is the Birthright Study an Experiment?
  The Birthright study is closer to an experiment than most research in Jewish education
    Deserves special attention
    Yet recognize limited generalizability and potential problems due to non-random selection and differential response rates
Could the Birthright Study Have Been Conducted as a True Experiment?
  Yes, if it had been set up that way
  For many years, Birthright has been oversubscribed
    Instead of first-come, first-served, establish a deadline and then select participants by lottery
    That would have ruled out bias due to unobserved characteristics
    Also get better contact information to reduce non-response rate among non-participants
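A lottery of this kind is simple to run once the application deadline has passed. The sketch below is a hypothetical illustration, not a description of any actual Birthright procedure: the function name `run_lottery`, the applicant identifiers, the number of places, and the seed are all placeholders.

```python
import random


def run_lottery(applicants, n_places, seed):
    """Shuffle everyone who applied by the deadline and fill places in order.

    Returns (selected, not_selected). Those not selected form the
    comparison group, so their contact details should be kept as well.
    """
    rng = random.Random(seed)
    order = list(applicants)   # copy so the original list is left untouched
    rng.shuffle(order)         # arrival order plays no role in selection
    return order[:n_places], order[n_places:]


# Hypothetical applicant pool, listed in order of arrival.
applicants = [f"applicant_{i}" for i in range(1, 11)]
selected, not_selected = run_lottery(applicants, n_places=4, seed=42)
print("selected:", selected)
print("comparison group:", not_selected)
```

Because chance alone separates the two groups, they should differ on measured and unmeasured characteristics only by sampling error.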
Advancing the New Education Science
  With all the advances in curricula and teaching methods, we should be asking some “what works” questions
  Though not always feasible, randomized trials are the optimal method for answering these questions
  There are many pitfalls in executing randomized trials, but the potential benefits make the effort worthwhile
Further Reading on Randomized Trials in Education
Bloom, H. S. (2006). Learning more from social experiments: Evolving analytic approaches. New York: Russell Sage Foundation.
Bloom, H. S., Bos, J. M., & Lee, S. W. (1999). Using cluster random assignment to measure program impacts: Statistical implications for the evaluation of education programs. Evaluation Review, 23, 445–469.
Boruch, R., May, H., Turner, H., Lavenberg, J., Petrosino, A., & de Moya, D. (2004). Estimating the effects of interventions that are deployed in many places: Place-randomized trials. American Behavioral Scientist, 47, 608–633.
Borman, G. D. (2002). Experiments for educational evaluation and improvement. Peabody Journal of Education, 77, 7-27.
Borman, G. D., Gamoran, A., & Bowdon, J. (2008). A randomized trial of teacher development in elementary science: First-year effects. Journal of Research on Educational Effectiveness, 1, 237-264.
Borman, G. D., Slavin, R. E., Cheung, A. C. K., Chamberlin, A. M., Madden, N. A., & Chambers, B. (2007). Final reading outcomes of the national randomized field trial of Success for All. American Educational Research Journal, 44, 701-731.
Further Reading on Randomized Trials in Education
Cook, T. D. (2003). Why have educational evaluators chosen not to do experiments? Annals of the American Academy of Political and Social Science, 589, 114-149.
Dynarski, M., et al. (2007). Effectiveness of reading and mathematics software products: Findings from the first student cohort. Washington, DC: U.S. Department of Education.
Milesi, C., & Gamoran, A. (2006). Effects of class size and instruction on kindergarten achievement. Educational Evaluation and Policy Analysis, 28, 287-313.
Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2, 173–185.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational designs. Washington, DC: American Educational Research Association.
Whitehurst, G. (2003). The Institute of Education Sciences: New wine, new bottles. http://www.ed.gov/rschstat/research/pubs/ies.html