NBER WORKING PAPER SERIES
IS IT LIVE OR IS IT INTERNET? EXPERIMENTAL ESTIMATES OF THE EFFECTS OF ONLINE INSTRUCTION ON STUDENT LEARNING
David N. Figlio
Mark Rush
Lu Yin
Working Paper 16089
http://www.nber.org/papers/w16089
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
June 2010
We appreciate the financial support of the National Science Foundation and the U.S. Department of Education (via the National Center for the Analysis of Longitudinal Data in Education Research). We are grateful to the introductory microeconomics professor and the large university that permitted us to randomly assign students to live and online versions of the same class. All errors are our own, and our results and conclusions do not necessarily reflect the views of the National Science Foundation, U.S. Department of Education, or the undisclosed university in question. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.
NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.
© 2010 by David N. Figlio, Mark Rush, and Lu Yin. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.
Is it Live or is it Internet? Experimental Estimates of the Effects of Online Instruction on Student Learning
David N. Figlio, Mark Rush, and Lu Yin
NBER Working Paper No. 16089
June 2010
JEL No. I20, I23
ABSTRACT
This paper presents the first experimental evidence on the effects of live versus internet media of instruction. Students in a large introductory microeconomics course at a major research university were randomly assigned to live lectures versus watching these same lectures in an internet setting, where all other factors (e.g., instruction, supplemental materials) were the same. Counter to the conclusions drawn by a recent U.S. Department of Education meta-analysis of non-experimental analyses of internet instruction in higher education, we find modest evidence that live-only instruction dominates internet instruction. These results are particularly strong for Hispanic students, male students, and lower-achieving students. We also provide suggestions for future experimentation in other settings.
David N. Figlio
Institute for Policy Research
Northwestern University
2040 Sheridan Road
Evanston, IL 60208
and [email protected]
Mark Rush
University of Florida
Department of Economics
P.O. Box 117140
Gainesville, FL [email protected]
Lu Yin
University of Florida
Department of Economics
P.O. Box 117140
Gainesville, FL [email protected]
Throughout the United States, public four-year colleges and universities are
facing fiscal constraints not seen in four decades. State and local appropriations for
higher education, measured as a share of personal income, have fallen virtually
monotonically since the late 1970s, and today are at a level not seen since the mid-1960s
(Mortenson, 2005). Kane and Orszag (2003) document the precipitous decline in per
student spending and stature of public four-year colleges and universities during the
1980s and 1990s, and according to the Organization of State Higher Education Executive
Officers, higher education institutions in all but five states experienced real declines in
per student revenues from state and local sources during the first half of the current
decade, at a time of flush state and local coffers. In 2006, public four-year colleges and
universities relied on tuition for over 37 percent of their total revenues for the first time in
modern history, and the recent financial crisis has surely further increased the fiscal
constraints faced by public and private universities alike.
The dramatically increased fiscal constraints facing public colleges and
universities, coupled with rapid improvements in technology, have paved the way for
higher education institutions to introduce technology-based platforms for mass
instruction. The use of internet classes has exploded over the past decade, especially in
the past few years. Over 2.6 million students took at least one online course in fall 2005,
up from 1.6 million three years earlier (Allen and Seaman, 2006). Though the majority of
these students are in community colleges and junior colleges, more than 80 percent of
doctoral/research institutions in the United States offer online classes. Each of the ten
largest four-year colleges and universities in the United States offers online classes, some
with over 400 sections and others with more than 10,000 students per term enrolled in at
least one online class. Today, virtually every institution with more than 15,000 students
offers online classes.
If internet-based classes are at least reasonable substitutes for live-lecture classes,
then the use of internet-based classes could be a very cost-effective method of combating
increased fiscal constraint. And in theory, internet-based classes may even dominate
live-lecture classes, as they offer students more flexibility in the timing of attendance as
well as the opportunity to review lectures to clear up confusing points. On the other
hand, internet-based lectures provide weaker incentives for students to regularly attend
and keep up with classes, and as has been documented at one major four-year institution,
last-minute cramming in internet-based courses is rampant (Donovan, Figlio and Rush,
2006). But since increasing live-lecture class sizes is associated with deleterious
consequences for students (Bettinger and Long, 2007), offering classes through an
electronic medium may be an appealing alternative mechanism for cost savings in higher
education.
A major report released by the U.S. Department of Education on June 26, 2009
provides additional support for the expansion of online education. This study, a meta-
analysis of the available research on live versus online delivery of education (primarily
higher education), suggests that online delivery of material leads to improvements in
student outcomes relative to live delivery, with hybrid live-plus-internet delivery having
the largest benefits of all. While the Department of Education's press release on the
report concentrated on the potential benefits of integrating electronic content into regular
classrooms, the ensuing news coverage (and the report itself) also emphasized the relative
benefits of online-only education.
That said, the studies that provided the basis for this meta-analysis may not be
sufficient to draw conclusions about the relative benefits of live versus online education.
While the authors of the meta-analysis identified more than one thousand studies of
online learning, they found only 51 studies that were at least "quasi-experimental," which
they defined as including control variables in a cross-sectional setting. Of these 51
studies, only 28 studies directly compared an online learning condition with a face-to-
face control condition. Sixteen of these studies used a simple randomization method to
assign students into either treatment or control groups, with an average study size of 84
participants. But only two of these studies had the same instructor teaching both the
treatment and control group, and none of these studies further controlled for measured
student background characteristics, a potential problem especially in small sample
situations. In the remaining two papers (Zhang, 2005; Zhang et al., 2006), written by the
same author, the researcher compared a 45-minute single-session live lecture in a
classroom setting to a 45-minute single-session e-learning experience in a research
laboratory. This is very different from a full-term course. In summary, none of the
studies cited in the widely-publicized meta-analysis released by the U.S. Department of
Education included randomly-assigned students taking a full-term course, with live
versus online delivery mechanisms, in settings that could be directly compared (i.e.,
similar instructional materials delivered by the same instructor). The evidence base on
the relative benefits of live versus online education is therefore tenuous at best.1
1 The U.S. Department of Education meta-analysis did not include a small number of papers that have been published in economics journals that we believe meet their criteria for inclusion in the federal study. Navarro and Shoemaker (2000) present evidence that students taking an introductory macroeconomics course online had significantly better test scores than students taking the same course in a live lecture format, while Brown and Liedholm (2002) report just the opposite result for students taking an introductory microeconomics class. However, Navarro and Shoemaker compared a lecture format, with no class web page, to an online format which included a web page, with a bulletin board for posting questions, weekly online chat discussions with the instructor, and quizzes, which the students were required to take weekly, as well as giving the students a CD with the audio part of the lectures along with PowerPoint slides and review questions. Brown and Liedholm’s online versus live comparison contrasted a live class with (apparently) no web page to an online class with streaming videos of one semester’s lectures and a variety of additional material, such as numeric problems and repeatable quizzes. These are hardly “apples to apples” comparisons and so conclusions drawn from them about the performance of on-line versus live lectures are not robust.
From a public and university policy standpoint, the current state of this research is
dismaying: More students are being exposed to internet classes yet there is no satisfactory
research demonstrating whether such changes help, hinder, or have no effect on student
learning. This paper aims to fill this important gap by reporting on an experiment in
which students were randomly assigned to either an online or a live section of a course
taught by one instructor and for which the ancillaries for the class, such as the web page,
problem sets and TA support, as well as the exams, were identical between the sections.
The only difference between these sections is the method of delivery of the lectures:
Some students viewed the lectures live, as would be the case in traditional classes, while
other students viewed the lectures on the internet. Thus we are able to determine how
online delivery of lectures compares with live delivery.
The results of this experiment, therefore, have significant implications for public
and university policy: If our results suggest that internet delivery of classes is inferior to
live delivery, then for classes in which lectures are an important component, making the
online delivery of these classes comparable from the student-learning perspective will
generally require elaborate web pages and other means of instruction, and even these may
be insufficient. However, if our results suggest that internet delivery of these classes is
comparable to or more favorable than live delivery, then colleges and universities might
be emboldened to move even more classes to the internet – though the comparison should
be between large-lecture introductory courses and their internet counterparts.2 However,
it is important to note that the results of any one study, even one with high degrees of
internal validity, should be treated with caution. Section IV of this paper highlights the
limitations to generalizability of this paper, and makes recommendations for future
experiments on the topic.
II. THE CLASS AND THE EXPERIMENT
We utilize data from an experiment conducted in a large Principles of
Microeconomics class taught at a large selective doctorate-granting university. This class
is taught to between 1,600 and 2,600 students a semester by a single instructor. Typically,
the students can register for a “live” section in which they can watch the lecture in a room
with approximately 190 seats or they can register for an “online” section in which they
watch the lecture online. The lecture is videotaped as it is presented and then made
available via the class web page to all students. Once the lecture is taped, it is retained on
the Internet for the entire semester. Given the room-size constraint, most students register
for an online section. In a typical semester, approximately 50 or 60 students actually
come to any given live lecture. Because the room has vacant seats, normally no effort is
made to keep the students who registered for an Internet section from attending the live
section. In fact, because the live section is limited to 190, most of the students attending
the live lecture have registered for an online section simply because the live section was
filled when it came time for them to register for the class. The majority of the students
2 Our data are from a large introductory class and so our results probably should not be applied to courses where material is delivered in smaller sections. However, it is generally the large classes that college administrations are most eager to move online.
who register for the live section ultimately choose to watch the majority of the lectures
online.
Students who register for the live section and students who register for the online
section have access to the exact same class web page. The class web page has a link to
watch the lectures as well as a substantial variety of class supplements: a set of online
quizzes, past exams, and so forth. As such, both live and online students have access to a
rich web-based learning environment to supplement the class lectures. The exams are
given in the evening. Both sets of students take the exact same exams given at the exact
same time. All students, regardless of the section for which they registered, have the
same access to the instructor during office hours and have the same access to graduate
student TA help. There are no discussion sessions. So in a typical semester the only
difference between the students is the section in which they have registered, which,
because anyone can attend the live lecture or watch the lecture online, is a meaningless
distinction. Grading in the class is based only on exams. There are three exams: two
midterms and a final exam. The exams are all multiple choice and are all machine graded.
The instructor creates the exams, which are primarily based on the lectures.
Because of the obvious selection problems, one cannot simply look at the
difference in the performance of students who attend the live lecture versus students who
watched the lectures online. So during the Spring 2007 semester, with the support of the
instructor and the university, we conducted an experiment with this class. Before the
class started, the instructor emailed all the students who had enrolled and offered them
the chance to participate in an experiment. Of the nearly 1,600 students in the class, 327
students volunteered to be part of the experiment.3 If they volunteered, the instructor
promised to boost their grade by half of a letter grade at the end of the term. In exchange,
they allowed us the opportunity to randomly assign them to watching the lecture live or
watching the lecture online. Students who were assigned to watch the lecture live had
their class websites altered to remove access to the lecture online; otherwise no further
change was made to their website. Students who were assigned to the online section were
not allowed in the classroom to watch the live lecture. Indeed, for that semester only, the
only students allowed in the classroom during the live lecture were students we had
assigned to the live lecture or students who had registered for the live lecture and who
opted to not participate in the experiment. We stationed graduate students at the door to
enforce these regulations and to compile a record of which students watched each live
lecture.
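The assignment step described above can be sketched in a few lines. This is only an illustration, not the procedure the authors actually used: the student identifiers, the seed, and the function name `assign_sections` are hypothetical, and the group sizes (112 live, 215 online, out of 327 volunteers) are those reported in footnote 3.

```python
import random

def assign_sections(volunteer_ids, n_live, seed=0):
    """Randomly split volunteers into a live group of size n_live and an online group."""
    ids = list(volunteer_ids)
    if not 0 <= n_live <= len(ids):
        raise ValueError("n_live must be between 0 and the number of volunteers")
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    rng.shuffle(ids)
    return {"live": sorted(ids[:n_live]), "online": sorted(ids[n_live:])}

# Hypothetical student IDs 1..327; group sizes taken from footnote 3.
groups = assign_sections(range(1, 328), n_live=112, seed=2007)
print(len(groups["live"]), len(groups["online"]))
```

Capping the live group at a fixed size, rather than flipping a coin for each student, mirrors the seating constraint that the paper cites as the reason for the unequal groups.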
The specific nature of participant recruitment in this experiment leads to potential
statistical power and external validity issues. Institutional Review Board-imposed
restrictions at the university in question made recruitment of a larger fraction of the
student population into the experiment more difficult. The instructor was limited in the
degree to which he could contact the students to recruit them into the experiment, and we
were limited as to the incentives that could be offered.4 The ideal situation from an
external validity standpoint would have been to randomly assign all students to either a
3 Among the 327 volunteers, 112 students were assigned to the live group and 215 assigned to the online group. In order to start the experiment from the first day of the class, the students were contacted before the add/drop deadline, which occurs a week after classes start. After their registration was completed, 15 of the 112 students assigned to the live group requested reassignment to an online section due to schedule conflicts. We made this reassignment but then dropped them from the analysis, leaving a total of 97 students in the live-only section.
4 Specifically, the only incentive we were allowed to offer was a five point boost in students' test scores.
live or online section of the class, but this was not possible given the culture of the
university, where mixed live-online classes are typically characterized by complete
student autonomy. Of the two potential concerns -- statistical power and external validity
-- associated with the recruitment of the experimental sample, external validity is more
important. While we would have liked to have recruited a larger study sample, our
sample size is nonetheless large enough to detect modest estimated effects of the
treatment. Specifically, we can detect effects on the order of two points on a 100-point
scale -- or 40 percent of the size of the incentive to participate in the study. External
validity issues, on the other hand, are a much bigger potential concern, as our study
sample may not be representative of a broader population of potential students. We
discuss the limitations to external validity in section IV below.
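To illustrate how a detectable effect size depends on the group sizes, the following sketch uses the textbook two-sample formula. It is not meant to reproduce the two-point figure quoted above, because the standard errors behind that figure are not given in this section. The score standard deviation of 10, the 5 percent significance level, and the 80 percent power are placeholder assumptions; only the group sizes (97 and 215) and the two-point and five-point figures come from the text.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(sd, n_a, n_b, alpha=0.05, power=0.80):
    """Smallest true mean difference detectable by a two-sided test at the given power."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    standard_error = sd * sqrt(1 / n_a + 1 / n_b)
    return multiplier * standard_error

# Group sizes from the text; the standard deviation of 10 points is a placeholder.
mde = minimum_detectable_effect(sd=10.0, n_a=97, n_b=215)

# The comparison made in the text: a two-point effect relative to the five-point incentive.
share_of_incentive = 2 / 5
```

The formula makes the trade-off explicit: the detectable effect shrinks with the square root of the group sizes, so recruiting a larger fraction of the class would have tightened it only gradually.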
It is important to note that the course is already a hybrid between lectures and a
rich set of internet-based applications. Therefore, one cannot view this experiment as
comparing two purely lecture-based media of delivery. Rather, the
appropriate comparison is between two cases in which there exists considerable internet-
based material, but in one case the lecture portion is delivered live and paced throughout
the term and in the other case the lecture portion is delivered electronically and
downloaded on demand by the student. In many ways, therefore, this is precisely the
tradeoff that universities are increasingly facing as they decide the appropriate medium
for lecture delivery in their large classes.
III. THE DATA AND THE RESULTS
We have four groups of students:
1) Students who volunteered for the experiment and were randomly assigned to watching the lectures online. These students were required to watch the lectures online. 215 students fell within this group.
2) Students who volunteered for the experiment and were randomly assigned to watching the lectures live. These students were required to watch the lectures live. 97 students fell within this group.5
3) Students who did not volunteer for the experiment and were initially registered in an online section. These students were required to watch the lectures online. 1,203 students fell within this group.
4) Students who did not volunteer for the experiment and were initially registered in the live section. These students were allowed to choose whether to watch a lecture live or online, or a hybrid thereof. 77 students fell within this group.
Of these four groups, most interest focuses on comparing Groups 1 and 2. These
students had exactly the same course with one crucial difference: they were randomly
assigned to different delivery mechanisms for the lectures. Hence comparing their
performance potentially offers us an apples-to-apples comparison of an online class to a
traditional live lecture without worrying about the possibility of selection issues or how
to correct for the selection.
First, however, we examine whether the students who volunteered for the
experiment were different in observable ways from the non-volunteers. Table 1 compares
the students who volunteered for the experiment with those who did not, for two groups
of students: those who initiated the class and those who completed the class. The data
pertaining to the students’ maternal educational attainment were obtained directly from
the students; the remaining data were obtained from the university’s records. As can be
5 Fewer students were assigned to the live lecture than the Internet lectures because of the seating capacity constraint imposed by the lecture room.
seen in the table, experiment volunteers differ from non-volunteers along a number of
dimensions, but the differences are not unidirectional. For example, experiment
volunteers are more likely to have higher grades at the university than are non-volunteers
but volunteers tend to have lower SAT scores than do non-volunteers. Volunteers’
mothers were less likely to have graduated from college but were more likely to have
earned a graduate degree, though the latter difference is statistically insignificant. In
addition to these thoroughly mixed differences, the differences tend to be modest in
magnitude. The SAT score difference, for example, was only 18 points out of averages
exceeding 1,200. So, at least for observable measures, we see no compelling evidence
that the volunteers are markedly different from their non-volunteer classmates.
More directly relevant to the experiment is whether the attributes of volunteers
assigned to the live section are comparable to those of volunteers assigned to the online
section. These differences are reported in Table 2. As can be seen in the table, the
random assignment of volunteers successfully led to balancing of the volunteer
population into the live-only section and the online-only section. In the initial
assignment, those watching the class online had slightly higher prior university GPAs,
but that difference disappeared by the end of the class. The live section also had fewer
mothers who attended college but then apparently dropped out. However the live section
also had fewer mothers attending college at all, so any difference in family educational
attainment must be slim.
Because we see no significant, consistent differences between our volunteer
groups, we can proceed to examine their relative performances on the exams. Table 3
presents the mean test scores for the two groups of students on each of the three
examinations in the course, as well as the average of the three scores. We prefer to use
the average score because it has the smallest problem with measurement error, and
indeed, the standard errors are lowest with regard to the average score. Exams are scored
on the standard 0 to 100 point scale, and the mean of the average score on the exams is
just below 80 points. As can be seen from the table, the preponderance of the evidence
indicates that students perform better in the live setting than in the online setting, though
the raw differences are uneven and statistically insignificant. Students in the live section
tended to do better on the first exam, the final exam, and overall while students in the
online section performed modestly better on the second exam (though the magnitude of
the difference is small).6 It is evident from the simple means comparisons that students in
the online only setting did not perform better than did the students in the live only setting,
a finding at odds with the conclusion of the U.S. Department of Education report on
internet-based education. Moreover, these (basically zero) results are relatively
precisely-estimated; the two-point difference in average exam scores that would be
statistically detectable with the observed standard errors is small in comparison to the
five-point incentive that was offered students to participate in the experiment. Given that
the university's Institutional Review Board deemed the five-point incentive to be of
modest magnitude, by definition the realized differences in test scores are even more
modest in magnitude. Therefore, we are confident that the statistical power issues
associated with not recruiting a larger fraction of the class are not responsible for the null
findings reported in Table 3.
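The raw comparison described above amounts to a difference in group means with a standard error. A minimal sketch follows, assuming an unequal-variance (Welch-type) standard error, which may differ from the authors' exact procedure. The scores are made-up values for illustration and are not data from the experiment.

```python
from math import sqrt
from statistics import mean, variance

def mean_difference(live_scores, online_scores):
    """Return the live-minus-online difference in means and its unequal-variance standard error."""
    diff = mean(live_scores) - mean(online_scores)
    se = sqrt(variance(live_scores) / len(live_scores)
              + variance(online_scores) / len(online_scores))
    return diff, se

# Hypothetical exam averages on the 0 to 100 scale, for illustration only.
live = [80, 85, 90]
online = [70, 80, 90]
diff, se = mean_difference(live, online)
t_stat = diff / se  # compared against a critical value to judge significance
```

A difference that is small relative to its standard error, as in this toy example, is what the text means by a result that is not statistically distinguishable from zero.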
6 The difference in the withdrawal rate was small: Approximately 6 percent of the live section withdrew over the semester while slightly more than 4.5 percent of the online section withdrew.
While the overall effect of live instruction relative to internet delivery is very
modest and positive (though not statistically distinguishable from zero), these mean
effects may mask substantial differences in relative benefits of one medium of instruction
over another. For instance, students from different language backgrounds, experience or
motivation levels might have different experiences in live versus internet only settings.
While we cannot directly measure these specific types of factors, we can stratify the
estimated effects of live versus internet instruction along a few observable lines: by
student race/ethnicity, sex and prior achievement levels.7 For this last stratification, we
define “high achievers” as students whose prior college GPA was greater than or equal to
the median GPA and define “low achievers” as students whose prior college GPA was
less than the median GPA. We report these results in Table 4. The treatment effects
reported in Table 4 reflect average score differences for students enrolled in the live
section versus those enrolled in the online section. We observe that for all racial/ethnic
groups, for both male and female students, and for both high and low achievers, the
average test score is higher for the set of students in live instruction versus those in online
instruction. Importantly, in a number of cases this difference is statistically significant,
and some of the estimated differences are large in magnitude. Most notably, the average
test score grade for Hispanic students is dramatically higher in the case of live
instruction. In addition, the estimated live instruction advantage is statistically
significantly different from zero for male students and for low-achievers. While it is
premature to definitively ascribe a mechanism through which this may be operating, we can
7 These are the most logical ways to stratify our data given the limited number of background characteristics at our disposal. We were concerned that the subgroup results may be mere statistical artifacts, so we attempted a variety of stratifications of the data. In nearly every stratification we attempted, we found that at least one subgroup had statistically significantly positive estimated effects of live-only instruction.
propose a few. For instance, perhaps low-achieving and male students are tempted to
defer instruction and cram for exams in online-only classroom experiences, or perhaps
language-minority students have increased difficulty with listening to lectures in an
internet setting. While we did not explicitly test these mechanisms, the results for the
various subgroups indicate that future experimentation that pays particularly close
attention to potentially sensitive student subgroups may be highly informative.
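The prior-achievement stratification defined above (high achievers at or above the median prior GPA, low achievers below it) can be sketched as follows. The record layout with `gpa`, `section`, and `score` keys, the function name, and all of the values are hypothetical; the sketch shows only the median split and the within-stratum comparison, not the estimates reported in Table 4.

```python
from statistics import mean, median

def live_advantage_by_achievement(students):
    """Live-minus-online mean score within high and low prior-GPA strata."""
    cutoff = median(s["gpa"] for s in students)
    strata = {
        "high": [s for s in students if s["gpa"] >= cutoff],  # at or above the median
        "low": [s for s in students if s["gpa"] < cutoff],    # below the median
    }
    gaps = {}
    for label, members in strata.items():
        live = [s["score"] for s in members if s["section"] == "live"]
        online = [s["score"] for s in members if s["section"] == "online"]
        gaps[label] = mean(live) - mean(online)
    return gaps

# Hypothetical records, for illustration only.
students = [
    {"gpa": 3.8, "section": "live", "score": 90},
    {"gpa": 3.6, "section": "online", "score": 88},
    {"gpa": 3.4, "section": "live", "score": 86},
    {"gpa": 2.8, "section": "online", "score": 74},
    {"gpa": 2.6, "section": "live", "score": 80},
    {"gpa": 2.2, "section": "online", "score": 72},
]
gaps = live_advantage_by_achievement(students)
```

Because each stratum holds only part of the sample, its estimate is noisier than the pooled one, which is one reason the authors treat the subgroup results with caution.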
One possible threat to validity of this experiment involves the potential for
contamination. While it was impossible for students not selected to be in the live section
to attend the live lectures, it was certainly possible for experimental students to
surreptitiously view online lectures even though they could not do so using their own
accounts. Indeed, it is likely that at least some of the live-only students did this; only 32
percent of "live-only" students attended at least 90 percent of the live lectures, and 36
percent attended fewer than 20 percent of the live lectures! It is not clear whether this
non-compliance would bias our estimates upward or downward. On the one hand, if the
true effect of live instruction is positive, especially for some subgroups, the fact that we
could not fully prevent "live-only" students from watching some or all classes on the
internet using a friend's account may mean that our results understate the true effects of
live class attendance.
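The understatement argument can be made concrete with a simple dilution calculation. The sketch rests on a strong assumption that the paper does not test: students assigned to the live section who instead watch online are taken to score exactly as online students do. The function name and the numbers are hypothetical.

```python
def observed_effect(true_effect, share_actually_live):
    """Assigned-group difference when only some 'live-only' students receive live lectures.

    Assumes that assigned-live students who watch online instead score like
    online students, so they contribute no live advantage.
    """
    if not 0.0 <= share_actually_live <= 1.0:
        raise ValueError("share_actually_live must lie in [0, 1]")
    return share_actually_live * true_effect

# Hypothetical numbers: a 4-point true advantage with half of the group complying.
diluted = observed_effect(true_effect=4.0, share_actually_live=0.5)
```

Under this assumption the measured difference is the true effect scaled by the share of students who actually received live instruction, so any non-compliance pulls the estimate toward zero.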
On the other hand, the potential contamination could upward-bias our results if
our live-only treatment is really better thought of as a hybrid live-plus-internet treatment.
There is, however, reason to believe that the live-only treatment is different from the
traditional live-plus-internet hybrid that the 77 non-participant students registered to the
live section experienced; the typical live-only participant in the experiment attended 70
percent more lectures than the typical live lecture non-participant, and was more than
three times as likely as the typical live lecture non-participant to attend at least 90 percent
of the lectures. Obviously many of the live lecture non-participant students opted to view
the lectures on-line. Therefore, it is clearly the case that being officially restricted to only
view lectures live strongly influenced the likelihood that the live-only participants would
indeed receive their material delivery in the live format.8 Hence, though we cannot know
for certain, we suspect that contamination of our experiment due to participating students
watching Internet lectures is not a major force driving our findings.
It may also be the case that live-only participants benefit from having other
classmates in the live section who are better or more motivated students, and who could
therefore have positive peer effects. (This could happen if students who enroll in the live
section are systematically better than those who enroll in the online section.) Since
section registration had historically had no bearing on whether a student could attend the
live lecture, we believe that it is unlikely that the non-participants in the live section
would be much different from the non-participants in the online section, and indeed, this
appears to be the case. In fact, if anything the non-participants in the live section have
lower observables than those in the online section. For instance, the mean SAT score for
non-participants in the live section is 1197 while the mean SAT score for non-
participants in the online section is 1245. While 24 percent of non-participants in the
8 It is also the case that many of the students who are observed rarely coming to class might actually not ever view the lectures at all. The university has several competing lecture note-taking services that are extremely popular with students. This could explain why Donovan et al. (2006) demonstrate that a significant number of students in large internet-and-live classes at a major selective state university rarely view lectures in any form. In addition, we find in our present data that students who attend fewer live lectures do substantially worse on the examinations, suggesting that many of those who attend fewer live lectures are not substituting surreptitiously downloaded internet lectures for the live lectures they eschew.
live-only section had mothers with only a high school degree, the corresponding value for
online-only non-participants is 18 percent. In summary, there exists no evidence that the
live-only participants' scores are being positively influenced by an improved peer group
of non-participating students who insisted on being part of the live section.
IV. LIMITATIONS TO EXTERNAL VALIDITY AND RECOMMENDATIONS FOR FUTURE EXPERIMENTS
This paper presents the first experimental evidence of the relative efficacy of live
versus internet-only instruction in a higher education setting. While our analysis has a
high degree of internal validity, there are a number of key reasons why we believe that
our results should be taken as suggestive rather than conclusive, and why we recommend
that further experimentation in a variety of other settings take place before one can draw
definitive conclusions about the effects of different modes of lecture delivery. In this
section, we document some of the limitations to external validity, and we offer
recommendations for future experiments regarding the relative efficacy of different
lecture delivery mechanisms.
One reason that the external validity of our analysis is limited is that the
volunteers whom we recruited may not reflect the overall population of students enrolled
in the class. While our experimental live-only and internet-only groups are balanced
along a large number of dimensions, participation in the experiment was voluntary.
Moreover, the incentive used to induce participation was extra credit on the final course
grade. There is no reason to believe that responsiveness to this incentive is exogenously
determined, and in fact, one can easily tell stories about which types of students might be
willing to participate in the experiment. Specifically, one might reasonably expect that
students who are motivated to achieve high grades but are relatively concerned about
their ability to earn high grades might be the students most responsive to a participation-
for-points incentive, and this could explain why our volunteer participant group has
slightly lower SAT scores but slightly higher pre-course university grades (and
also higher high school grades, though the high school grade difference is not
statistically significant). If people who are especially motivated to attain high grades
respond differently to live versus internet instruction, then our experiment has less to say
about the typical student enrolled in a very large introductory course.
This concern yields important lessons for future experiments on this topic. While the
Institutional Review Board and the culture at the university in question did not
permit us to randomize all students, regardless of whether they wished to participate in a
randomized experiment, into live versus internet-only lecture categories, it will be
important for future experiments to attempt to study the entire set of students who select
into a given class, rather than a subsample of students. In the event that this is not
possible, future experiments could improve upon the external validity of the present study
by seeking a higher participation rate. The university's Institutional Review Board
limited us to a more passive role than would be desirable in the
recruitment of students into the study; we could not, for instance, offer financial or
in-kind incentives to increase participation, and we were limited in the number of times that
we could contact students to encourage them to participate. Future experiments in
settings that have fewer such encumbrances might have higher degrees of external
validity.
There are other external validity issues associated with this experiment that could
not be solved even had we been able to randomly assign 100 percent of the students in
the introductory microeconomics class to live-only or internet-only lecture groups. One
involves the specific university setting: The university in question is one where very large
lecture classes are the norm for virtually all freshman and sophomore-level courses,
across all fields, and moreover, most of the core courses for students majoring in business
are offered on this electronic platform. The results of an experiment in this type of
university setting may not generalize to other university settings where students have less
experience with large auditorium lectures and electronically-delivered lectures. Ideally,
future experiments of this nature will take place in a wider variety of institutional
settings, so that we can begin to understand the degree to which the findings generalize
across settings. Broader testing is also needed because the university in question is one of
the most selective state universities in the United States; the results may not generalize to
open-enrollment institutions or those where students are drawn from lower in the ability and
achievement distribution. Of course, given that our findings suggest that lower-ability
students are a sub-group potentially most harmed by internet delivery, the results might
be particularly relevant for less-selective schools where more lower-ability students are
educated. These results indicate that less selective schools should not rush to internet
delivery of lectures, and instead should experiment with the relative benefits of live
versus internet courses.
In addition, introductory economics courses are generally delivered in traditional
lecture settings even at small institutions with modest class sizes. In some ways, one
might expect that this would be the type of subject matter where live instruction may be
the least beneficial, as members of the class tend to be relatively passive consumers of
material in the lecture setting. Live classes may be relatively
more beneficial in other types of courses that give a greater role to interactive activities in
the classroom. In such a case, our results might understate the effects of
live versus internet class delivery in other contexts. On the other hand, introductory
economics has a number of topics that build upon one another and relies more heavily on
technical prerequisites than many other subjects do; it could be that the disciplined pacing
that comes with live-only lectures might be relatively beneficial in this type of context,
implying that the effects in other fields where pacing is less crucial may be smaller. It is
clear, therefore, that it is important to conduct similar experiments in a wider range of
subject matter classes, and classes with different levels of student direct involvement and
interactivity, in order to develop general conclusions about the relative efficacy of live
versus internet-based instruction.
In summary, while our study represents the first causal evidence of the effects of
live versus internet-based instruction in a university course delivery setting, it can only be
seen as a beginning step toward understanding the generalized effects of different
methods of instructional delivery. In order to know more generally whether live lectures
dominate internet-based lectures, under what circumstances, and for whom, numerous
additional experiments will need to be conducted, in a variety of institutional settings, in
different class sizes, and in different subject areas.
Finally, while not a threat to external validity and generalizability, our subgroup-
specific findings indicate that some student populations may be particularly sensitive to
the mechanism through which lecture material is delivered. Language-minority students
might have more difficulty following recorded lectures, and some students may be
relatively less disciplined in keeping up with the pace of the course when procrastination
is more possible. (This might be an explanation for the relatively large estimated effects
observed for male and lower-achievement students, though we cannot say for certain that
this is the reason.) Therefore, future experimentation that could directly test for some of
these potential mechanisms could be highly valuable. For example, if one is interested in
seeing whether delayed lecture viewing is a potential mechanism generating lower
outcomes for internet-only students, one might design an experiment in which students
were required to download (or maybe view) lectures within a certain number of days
following lecture recording. In general, it would be highly valuable to look more deeply
at the potential causal mechanisms through which different lecture delivery mechanisms
might affect student learning. Additional survey and qualitative work on questions such
as ways in which students engage with the course material, interact with the instructors
and their peers, pay attention to lectures and study for examinations would be highly
valuable, and could help universities and professors refine their courses and instructional
delivery to maximize student learning.
V. CONCLUSION
Given the clear scale economies associated with online instruction, educational
institutions are actively incorporating online instruction into their portfolios. The recent
U.S. Department of Education report suggesting that online instruction may be more
effective as well as being more efficient seems likely to only accelerate this process. But
the papers surveyed in that meta-analysis are not experimental in general, and those that
are experimental do not make apples-to-apples comparisons. The results of our
experimental, apples-to-apples comparison indicate that a rush to online education may
come at more of a cost than educators may suspect, given the Department of Education's
report.
We do not claim that our results are definitive. Our experiment was only
conducted one time, in a large course with significant internet resources available already
for students taking live-instruction classes. We were not able to randomly assign all
students to live versus internet delivery settings, and were forced to rely on voluntary
participation in the experiment, so while internal validity is high, the results may not
generalize to the student population as a whole. Furthermore, the institution is a major,
very selective university. That said, our strongest findings in favor of live instruction are
for the relatively low-achieving students, male students, and Hispanic students. These are
precisely the students who are more likely to populate the less selective universities and
community colleges. These students may well be disadvantaged by the movement to
online education and, to the extent that it is the less selective institutions and community
colleges that are most fully embracing online education, these institutions may inadvertently
be harming a significant portion of their student body.
At the least, our findings indicate that much more experimentation is necessary
before one can credibly declare that online education is on par with traditional live classroom
instruction, let alone superior to it. While online instruction may be more
economical to deliver than live instruction, our results indicate that -- consistent with a
fundamental lesson of principles of microeconomics -- the lunch may be less free than
many might believe.
Table 1: Baseline Summary Statistics: Volunteers versus Non-Volunteers

Students beginning the semester
Variable                          Volunteer           Non-volunteer       Difference
Number of observations            312                 1286
University GPA                    3.262 (0.036)       3.156 (0.022)       0.106** (0.048)
SAT score                         1224.589 (8.774)    1242.884 (4.77)     -18.295* (10.623)
ACT score                         25.776 (0.353)      26.482 (0.233)      -0.706 (0.487)
High school GPA                   3.743 (0.053)       3.649 (0.03)        0.094 (0.067)
Female                            0.546 (0.028)       0.496 (0.014)       0.05 (0.032)
Black                             0.115 (0.018)       0.087 (0.008)       0.028 (0.018)
White                             0.61 (0.028)        0.638 (0.013)       -0.028 (0.03)
Asian                             0.109 (0.018)       0.098 (0.008)       0.011 (0.019)
Hispanic                          0.115 (0.018)       0.135 (0.01)        -0.02 (0.021)
Mother attended high school only  0.193 (0.023)       0.181 (0.011)       0.011 (0.025)
Mother attended some college      0.176 (0.022)       0.179 (0.011)       -0.004 (0.025)
Mother graduated from college     0.301 (0.027)       0.401 (0.015)       -0.100*** (0.032)
Mother earned graduate degree     0.223 (0.024)       0.195 (0.012)       0.028 (0.026)
Mother's education unknown        0.054 (0.057)       0.042 (0.029)       0.012 (0.011)

Students ending the semester
Variable                          Volunteer           Non-volunteer       Difference
Number of observations            296                 1186
University GPA                    3.282 (0.036)       3.198 (0.022)       0.084* (0.048)
SAT score                         1228.864 (8.826)    1251.541 (4.823)    -20.108* (10.31)
ACT score                         25.921 (0.364)      26.648 (0.241)      -0.727 (0.5)
High school GPA                   3.762 (0.054)       3.688 (0.031)       0.074 (0.067)
Female                            0.541 (0.029)       0.487 (0.015)       0.053 (0.032)
Black                             0.108 (0.018)       0.074 (0.008)       0.034* (0.018)
White                             0.622 (0.028)       0.648 (0.014)       -0.026 (0.031)
Asian                             0.108 (0.018)       0.102 (0.009)       0.006 (0.02)
Hispanic                          0.111 (0.018)       0.136 (0.01)        -0.024 (0.022)
Mother attended high school only  0.193 (0.023)       0.181 (0.011)       0.011 (0.025)
Mother attended some college      0.176 (0.022)       0.179 (0.011)       -0.004 (0.025)
Mother graduated from college     0.301 (0.027)       0.401 (0.015)       -0.100*** (0.032)
Mother earned graduate degree     0.223 (0.024)       0.195 (0.012)       0.028 (0.026)
Mother's education unknown        0.054 (0.057)       0.042 (0.029)       0.012 (0.011)

Notes: Standard errors are presented in parentheses. Differences marked ***, ** and * are statistically significant at the 1, 5 and 10 percent levels, respectively. Sixteen volunteers and 100 non-volunteers dropped the course during the semester. Questions regarding maternal education were asked during the final examination, so we only have these variables for students who completed the course; therefore, the first and second sets of columns are identical for these variables.
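As a rough check on how the significance markers relate to the reported figures, the ratio of each difference to its standard error can be compared with two-sided critical values. The sketch below is illustrative only: the function name and the use of normal critical values (1.645, 1.960, 2.576) are assumptions, since the paper does not state which reference distribution was used.

```python
# Two-sided critical values under a normal approximation. This is an
# assumption for illustration; the paper does not state the distribution used.
CRITICAL_VALUES = [(2.576, "***"), (1.960, "**"), (1.645, "*")]


def significance_stars(difference, standard_error):
    """Return the star marker implied by |difference / standard_error|."""
    ratio = abs(difference / standard_error)
    for cutoff, stars in CRITICAL_VALUES:
        if ratio >= cutoff:
            return stars
    return ""


# Differences and standard errors copied from Table 1
# (students beginning the semester).
table1_rows = {
    "University GPA": (0.106, 0.048),
    "SAT score": (-18.295, 10.623),
    "ACT score": (-0.706, 0.487),
    "Mother graduated from college": (-0.100, 0.032),
}

for name, (diff, se) in table1_rows.items():
    print(f"{name}: ratio = {diff / se:.2f} {significance_stars(diff, se)}")
```

For these four rows the implied markers agree with those printed in the table, which suggests that the stars follow the usual ratio-based convention.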
Table 2: Baseline Summary Statistics: Volunteers Assigned to Live Section versus Volunteers Assigned to Online Section

Students beginning the semester
Variable                          Live                Online              Difference
Number of observations            97                  215
University GPA                    3.138 (0.073)       3.321 (0.04)        -0.183** (0.077)
SAT score                         1214.133 (15.944)   1228.581 (10.519)   -14.448 (18.751)
ACT score                         25.75 (0.602)       25.784 (0.426)      -0.034 (0.833)
High School GPA                   3.794 (0.08)        3.718 (0.068)       0.076 (0.115)
Female                            0.495 (0.051)       0.567 (0.034)       -0.072 (0.061)
Black                             0.124 (0.034)       0.112 (0.022)       0.012 (0.039)
Asian                             0.093 (0.03)        0.116 (0.022)       -0.023 (0.038)
White                             0.639 (0.049)       0.6 (0.033)         0.039 (0.060)
Hispanic                          0.082 (0.028)       0.126 (0.023)       -0.044 (0.039)
Mother attended high school only  0.196 (0.041)       0.177 (0.026)       0.019 (0.047)
Mother attended some college      0.093 (0.03)        0.2 (0.027)         -0.107** (0.045)
Mother graduated from college     0.309 (0.047)       0.274 (0.031)       0.035 (0.055)
Mother earned graduate degree     0.227 (0.043)       0.205 (0.028)       0.022 (0.050)
Mother's education unknown        0.072 (0.026)       0.042 (0.014)       0.03 (0.027)

Students ending the semester
Variable                          Live                Online              Difference
Number of observations            91                  205
University GPA                    3.193 (0.071)       3.321 (0.042)       -0.128 (0.079)
SAT score                         1216.447 (15.902)   1228.581 (10.519)   -12.134 (18.697)
ACT score                         26 (0.629)          25.898 (0.435)      0.102 (0.882)
High School GPA                   3.847 (0.073)       3.724 (0.071)       0.123 (0.118)
Female                            0.473 (0.053)       0.571 (0.035)       -0.098 (0.063)
Black                             0.121 (0.034)       0.102 (0.021)       0.018 (0.039)
Asian                             0.099 (0.031)       0.112 (0.022)       -0.013 (0.039)
White                             0.637 (0.051)       0.615 (0.034)       0.023 (0.061)
Hispanic                          0.088 (0.03)        0.122 (0.023)       -0.034 (0.04)
Mother attended high school only  0.209 (0.027)       0.185 (0.027)       0.023 (0.050)
Mother attended some college      0.099 (0.029)       0.21 (0.029)        -0.111** (0.048)
Mother graduated from college     0.330 (0.032)       0.288 (0.032)       0.042 (0.058)
Mother earned graduate degree     0.242 (0.029)       0.215 (0.029)       0.027 (0.053)
Mother's education unknown        0.077 (0.014)       0.044 (0.014)       0.033 (0.029)

Notes: Standard errors are presented in parentheses. Differences marked ***, ** and * are statistically significant at the 1, 5 and 10 percent levels, respectively.
Table 3: Comparison of Average Test Scores for Live Versus Online Instruction

Section                 Exam one              Exam two              Final exam            Average score
Number of observations  312                   301                   296                   296
Live                    84.536 (1.168) [97]   76.692 (1.193) [93]   75.939 (0.837) [91]   79.940 (0.850) [91]
Online                  83.301 (0.957) [215]  76.904 (0.876) [208]  74.302 (0.799) [205]  78.502 (0.675) [205]
Difference              1.235 (1.626)         -0.212 (1.534)        1.637 (1.426)         1.440 (1.209)

Notes: Standard errors are in parentheses; the number of students in each cell is in brackets. The dependent variable is the exam score measured on a 0-to-100 point scale. Differences marked ***, ** and * are statistically significant at the 1, 5 and 10 percent levels, respectively.
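The standard error reported for the Exam one difference appears consistent with a pooled-variance two-sample comparison, although the paper does not state how it was computed. The following sketch, under that assumption, recovers the difference and its standard error from the reported group means, standard errors, and bracketed group sizes; the function and variable names are illustrative.

```python
import math


def pooled_se_of_difference(se_a, n_a, se_b, n_b):
    """Standard error of a difference in means, assuming equal variances."""
    # Recover each group's sample variance from its standard error of the mean.
    var_a = se_a ** 2 * n_a
    var_b = se_b ** 2 * n_b
    # Pool the two variances, weighting by degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return math.sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Exam one figures copied from Table 3.
live_mean, live_se, live_n = 84.536, 1.168, 97
online_mean, online_se, online_n = 83.301, 0.957, 215

difference = live_mean - online_mean
se_diff = pooled_se_of_difference(live_se, live_n, online_se, online_n)

print(f"difference = {difference:.3f}")
print(f"pooled standard error = {se_diff:.3f}")
print(f"ratio = {difference / se_diff:.2f}")
```

Under this assumption the computed standard error is approximately 1.627, close to the reported 1.626, and the ratio of the difference to its standard error falls well below conventional critical values, in line with the absence of a significance marker on that entry.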
Table 4: Heterogeneous Effects of Live Instruction Versus Online Instruction

Subgroup            Results by racial/ethnic group  Results by student sex          Results by achievement level
White students      1.117 (1.436)
Black students      2.828 (3.239)
Hispanic students   11.276*** (3.587)
Asian students      4.319 (3.590)
Male students                                       3.480** (1.680)
Female students                                     1.780 (1.576)
Low-achievers                                                                       4.054*** (1.536)
High-achievers                                                                      1.169 (1.635)
R-squared           0.386                           0.370                           0.402

Notes: Dependent variable is the average test score measured on a 0-to-100 point scale. Number of observations: 296. Standard errors are in parentheses. Differences marked ***, ** and * are statistically significant at the 1, 5 and 10 percent levels, respectively.
REFERENCES
Allen, I. Elaine and Jeff Seaman. Making the Grade: Online Education in the United States, 2006. Massachusetts, Sloan Consortium, 2006.
Bettinger, Eric and B. T. Long. “Institutional Responses to Reduce Inequalities in College Outcomes: Remedial and Developmental Courses in Higher Education.” In S. Dickert-Conlin & R. Rubenstein (Eds.), Economic Inequality and Higher Education: Access, Persistence, and Success. New York: Russell Sage Foundation, 2007.
Brown, Byron W. and Carl E. Liedholm. “Can Web Courses Replace the Classroom in Principles of Microeconomics?” American Economic Review, May 2002 (Papers and Proceedings), 92(2), pp. 444-448.
Donovan, Colleen, David Figlio, and Mark Rush. “Cramming: The Effects of School Accountability on College-Bound Students.” Working paper, National Bureau of Economic Research, October 2006.
Kane, Thomas and Peter Orszag. “Funding Restrictions at Public Universities: Effects and Policy Implications.” Working paper, Brookings Institution, September, 2003.
Mortenson, T. “State Tax Fund Appropriations for Higher Education: FY 1961 to FY 2005.” Postsecondary Education Opportunity, January 2005.
Navarro, Peter and Judy Shoemaker. “Policy Issues in the Teaching of Economics in Cyberspace: Research Design, Course Design, and Research Results.” Contemporary Economic Policy, July 2000, 18(3), pp. 359-366.
U.S. Department of Education, Office of Planning, Evaluation, and Policy Development, Evaluation of Evidence-Based Practices in Online Learning: A Meta-Analysis and Review of Online Learning Studies, Washington, D.C., 2009.
Zhang, D. “Interactive Multimedia-Based E-Learning: A Study of Effectiveness.” American Journal of Distance Education, 2005, 19(3), pp. 149–62.
Zhang, D., L. Zhou, R. O. Briggs, and J. F. Nunamaker, Jr. “Instructional Video in E-Learning: Assessing the Impact of Interactive Video on Learning Effectiveness.” Information and Management, 2006, 43(1), pp. 15–27.