Task Templates Based on Misconception Researchpadi.sri.com/downloads/TR6_Misconceptions.pdf6.0...

Robert Mislevy University of Maryland

PADI Technical Report 6 | July 2005

Report Series Published by SRI International

P A D I

PADI | Principled Assessment Designs for Inquiry

Task Templates Based on Misconception Research

Jennifer G. Cromley, University of Maryland

Robert J. Mislevy, University of Maryland

SRI InternationalCenter for Technology in Learning333 Ravenswood AvenueMenlo Park, CA 94025-3493650.859.2000http://padi.sri.com

PADI Technical Report Series EditorsAlexis Mitman Colker, Ph.D. Project ConsultantGeneva D. Haertel, Ph.D. Co-Principal InvestigatorRobert Mislevy, Ph.D. Co-Principal InvestigatorKlaus Krause. Technical Writer/EditorLynne Peck Theis. Documentation Designer

Copyright © 2005 SRI International and University of Maryland. All Rights Reserved.

Acknowledgment

This material is based on work supported by the National Science Foundation under grant REC-0129331 (PADI Implementation Grant). Disclaimer Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

P R I N C I P L E D A S S E S S M E N T D E S I G N S F O R I N Q U I R Y

T E C H N I C A L R E P O R T 6

Task Templates Based on Misconception Research

Prepared by:

Jennifer G. Cromley, University of Maryland

Robert J. Mislevy, University of Maryland

ii

C O N T E N T S

1.0 Introduction 1 2.0 Novice-Expert Research 2 3.0 The Challenge for Assessment 4 4.0 PADI Design Patterns and Task Templates 5 5.0 Illustrations from Newtonian Physics 7

5.1 The Force Concept Inventory 8 5.2 Student Model 8 5.3 Evidence Model 9

5.3.1 Evaluation Procedures and Evidence Rules 9 5.3.2 Measurement Model 9

5.4 Task Models 10 5.4.1 G Problems 10 5.4.2 H Problems 12 5.4.3 Force Problems—Horizontal and Gravity 13

6.0 Illustrations from Gas Laws 15 6.1 Student Model 16 6.2 Evidence Model 16

6.2.1 Evaluation Procedures and Evidence Rules. 16 6.2.2 Measurement Model 16

6.3 Task Model 17 7.0 A General Design Pattern 18 8.0 Final Comments 19 References 20 Appendix A PADI Template for Force Problems—Gravity Only 25 Appendix B PADI Template for Force Problems—Horizontal Only 32 Appendix C PADI Template for Force Problems—Horizontal and Gravity 35 Appendix D PADI Template for Gas Law Problems 38 Appendix E PADI Design Pattern for Model Using 41

iii

F I G U R E S

Figure 1: The hierarchical structure of PADI templates (from Riconscente et al., 2005) 6 Figure 2: An FCI Problem That Tests Students’ Knowledge of Gravity but Involves No Horizontal Motion 11 Figure 3: An FCI Problem That Tests Students’ Knowledge of Horizontal Motion, but Does Not Test

Knowledge of Gravity 12 Figure 4: An FCI Problem That Tests Students’ Knowledge of Both Horizontal Motion and Gravity 14 Figure 5: A TAP Problem That Tests Students’ Knowledge of the Molecular Theory of Gases 15 Figure A-1: PADI Template “Force Problems—Gravity Only” 25 Figure A-2: PADI Task Model Variable “Direction of Motion” 27 Figure A-3: PADI Task Model Variable “Duration” 27 Figure A-4: PADI Activity “Make Explanations and Predictions from a Physical Situation” 28 Figure A-5: PADI Student Model “Direction of Motion” 28 Figure A-6: PADI Evaluation Procedure “Map Multiple-Choice Answers onto Conceptual Model” 29 Figure A-7: PADI Evaluation Phase “Model Using Multiple Choice” 29 Figure A-8: PADI Measurement Model “Andersen/Rasch Measurement Model Fragment” 30 Figure B-1: PADI Template “Force Problems—Horizontal Only” 32 Figure C-1: PADI Template “Force Problems—Horizontal and Gravity” 35 Figure D-1: PADI Template “Gas Law Problems” 38 Figure E-1: PADI Design Pattern “Model Using” 41

iv

A B S T R A C T

Researchers spend much time and effort developing assessments, including assessments of students’

conceptual knowledge. In an effort to make such assessments easier to design, the Principled Assessment

Designs for Inquiry (PADI) project has developed a framework for designing tasks and accompanying

measurement models. One application of the PADI framework involves “reverse engineering” existing

science assessments. This paper reports one such effort, motivated by assessments that elicit students’

qualitative explanations of situations that have been designed to provoke misconceptions and partial

understandings. We describe four task-specific templates we created—three based on Hestenes, Wells, and

Swackhamer’s (1992) Force Concept Inventory and one based on Novick and Nussbaums’s (1981) Test about

Particles in a Gas (TAP). We then describe an overarching framework for these templates, another PADI

object called a design pattern, based on Stewart’s concept of “Model Using.” For each template, we describe

a multivariate Student Model, a Measurement Model, and a Task Model. We conclude by suggesting how

these templates and the design pattern could help researchers (and perhaps teachers) who wish to design

new assessments in science domains where students are known to hold misconceptions.

Introduction 1

1.0 Introduction

Creating tasks to assess underlying concepts and inquiry processes in science is not easy.

The National Science Foundation has funded the Principled Assessment Designs for

Inquiry (PADI) project, under the Interagency Educational Research Initiative (IERI), to

create a conceptual framework and supporting software to help people design inquiry

assessments. Among the data structures PADI has developed to this end are design

patterns, which lay out assessment arguments at a conceptual level; task templates, which

are schemas for the operational elements of an assessment and support the creation of

families of related tasks; and task specs (short for “task specification”), which describe the

elements of individual tasks in transportable formats—specifically, the Question and Test

Interoperability (QTI) standards for electronic learning and assessment materials

developed by the IMS Global Learning Consortium (n.d.) and extensions thereof.

One type of activity within the PADI project involves taking existing assessments used for

science inquiry research and writing templates and task specification for those assessments.

We refer to this process as “reverse engineering,” in that the templates and task

specification that are developed from existing tasks could be used to reproduce the same

assessments or to produce new or analogous questions in the same or a similar domain.

This report presents the results of applying reverse engineering to two conceptual

assessments in science, the Force Concept Inventory (FCI; Hestenes et al., 1992) and the

Test about Particles in a Gas (TAP; Novick & Nussbaum, 1981). Both assessments are based

on research about student misconceptions, an area of cognitive psychology research on

expert and novice performance.

Section 2.0 provides a brief review of the theoretical basis for these assessments in the

novice-expert and misconceptions paradigms in cognitive psychology. Section 3.0

discusses the challenges that misconceptions research poses for assessment and the

benefits of making available to researchers the type of design patterns and templates that

PADI is producing. Section 4.0 summarizes the structure of PADI task templates. Section 5.0

discusses the development of a design pattern and template for the Force Concept

Inventory, a conceptual assessment of knowledge about Newtonian physics. The Student

Model, Evidence Model, and Task Model for the template are discussed. Section 6.0

describes the process of adapting the template for the Test about Particles in a Gas. Section

7.0 discusses a higher-level, less technical assessment design tool called a design pattern

(Mislevy et al., 2003), which describes a general class of tasks like the FCI and TAP tasks,

aimed at revealing student misconceptions. Section 8.0 closes with a summary of the

potential benefits to users of the design pattern and of the four templates that were

developed.

2 Novice-Expert Research

2.0 Novice-Expert Research

One of the dominant strains of cognitive psychology research from the mid-1970s through

the 1980s was the study of expert performance across numerous domains (for reviews, see

Charness & Schultetus, 1999; Ericsson & Charness, 1997). The basic premise of this line of

research was that if the characteristics of expert performance could be isolated and

identified, perhaps novices could be trained in the same specific knowledge, skills, and

abilities to move them closer to expert performance. Moving away from previous notions

of expertise as general and inborn, cognitive psychologists conducted research that led

them to see expertise as domain specific and acquired through extensive teaching and

practice (Ericsson & Smith, 1991; Ericsson, 1996). Whereas early expert-novice research

tended to consider only expert performance (e.g., in chess, de Groot, 1946/1978), later

research often contrasted experts and novices (e.g., in physics, Larkin, McDermott, Simon,

& Simon, 1980). Early research also focused on problem-solving in domains with clearly

delimited solutions and limited solution paths, called “well-defined domains” in the

literature. Physics, chess, and medicine, in particular, were studied often, but other well-

defined domains included occupations such as avionics technicians, waiters, and taxi

drivers.

In physics, several researchers contrasted physics professors with typical undergraduate

students. For example, Chi, Feltovich, and Glaser (1980) asked both physics professors and

undergraduate students to sort various physics problems. Whereas the professors sorted

the problems according to the physical laws that would be used to solve the problem (e.g.,

Newton’s third law), novices tended to sort them according to physical features of the

problem (e.g., pulley problems). In chess, Chase and Simon (1973) found that expert

players were better able than novices to reconstruct the positions of chess pieces from

memory, but only when the pieces were arrayed in actual game positions, not when they

were placed randomly on the board. In medicine, Patel and Groen (1991) found

developmental trends in medical expertise; whereas first-year medical students were not

even aware that a written case contained irrelevant information, second-year students

were distracted by that irrelevant information, while physicians recognized it as irrelevant.

In the course of developing a computer-based Intelligent Tutoring System (ITS) for military

aircraft electronics technicians, Gitomer and colleagues found that expert technicians used

a specific problem-solving strategy they termed “space splitting” (Steinberg & Gitomer,

1996). If there was an electrical fault in a line, expert avionics technicians would pick a

halfway point between the power source and the inoperable part (e.g., a wing flap or

aileron), and would test the circuit on both sides of the line. By continuing this process,

they could quickly identify the malfunctioning electrical component. Novice technicians,

by contrast, would use an inefficient process of checking each component in turn, one at a

time. Ericsson and Polson (1988) found that the expert waiter whom they studied had

developed detailed heuristics (e.g., males with certain builds were likely to order certain

types of steaks) and mnemonics for remembering diners’ orders. Expert taxi drivers studied

by Chase (1982) used mental imagery to determine the quickest route from the current

location to their destination.

Novice-Expert Research 3

These studies in well-defined domains established that expertise takes many years of

guided practice to develop, that experts have a larger base of declarative knowledge than

do novices, that their knowledge is better integrated and organized according to key

principles in the domain, that experts know more domain-specific strategies, that their use

of these strategies is automated, and that experts have more conditional knowledge about

strategies (that is, they know in what situations to enact a particular strategy).

A later body of expert-novice research applied these findings to domains that do not have

a single solution or finite set of solution strategies (called “ill-defined domains”), such as

reading (e.g., reading law cases, Lundeberg, 1987; reading poetry, Peskin, 1998), history

(e.g., Wineburg, 1991), teaching (e.g., Sabers, Cushing, & Berliner, 1991), and writing (e.g.,

Bruleux, 1991), with similar results.

Typically, researchers in novice-expert studies collect some sort of verbal report from

participants, such as a think-aloud protocol (participants verbalize everything they are

doing while performing a task), retrospective protocol (participants report after the fact

what they think they were doing), or conduct an interview. A few methodologists have

suggested that expertise researchers should simultaneously collect other process data,

such as recording computer keystrokes, what participants are looking at (using eye-

tracking devices), or reaction times (Ericsson & Smith, 1991; Magliano & Graesser, 1991).

Researchers also have used other methods, such as the sorting task described above for

Chi et al.’s (1980) physics expertise study.

One common finding across many expert-novice studies, especially in the sciences, is that

novices often have specific mistaken ideas about particular domains. In one highly

publicized example, 88% of graduating Harvard seniors surveyed believed that the seasons

were caused by Earth’s elliptical orbit around the sun, rather than by the tilt of the Earth on

its axis (Gardner, 1999). Termed “misconceptions,” these are ideas derived from daily

experience that students bring to their learning experiences and that contradict scientific

understandings and are often resistant to change.

Some other examples of misconceptions include: in biology, exercise makes breathing

faster but shallower (e.g., Michael, 1998); in subtraction, subtract the smaller number from

the larger, regardless of which number is being subtracted from (e.g., 725 – 569 = 244,

since 7 – 5 = 2, 6 – 4 = 4, and 9 – 5 = 4, e.g., Brown & van Lehn, 1980); in history, textbooks

are always correct and we can know “what happened” (e.g., Rouet, Britt, Mason, & Perfetti,

1996); in evolution, animal behavior changes offspring biology (that is, a Lamarckian belief;

e.g., Bishop & Anderson, 1990); and in Earth science, the Earth is a globe with a flat top

(e.g., Vosniadou & Brewer, 1992). Research about misconceptions thus has been a wide-

ranging and productive field for cognitive science research, and one that has been

frequently used in science inquiry research. This misconceptions research, however, has

rarely been used to design assessments in a principled manner.

4 The Challenge for Assessment

3.0 The Challenge for Assessment

Assessing students’ learning in a way that reveals misconceptions poses considerable

challenges for test designers. Students have many conceptions about any given domain,

and some of these conceptions (i.e., misconceptions) can interfere with subject matter

learning. However, research shows that students with misconceptions can nonetheless

frequently correctly answer questions on traditional assessments. Physics students, for

example, may be able to use algorithms to calculate answers to problems about which

they have no conceptual understanding (e.g., Clement, 1983; Hunt & Minstrell, 1994).

Assessments that are designed in light of the misconceptions literature, therefore, usually

eschew calculation problems, on the grounds that these can be solved without having any

conceptual understanding of the problem. In addition, because the purpose of assessing

students’ conceptions may be more diagnostic than normative, researchers (and teachers)

may want to know which specific misconceptions students have (see Frederiksen & White,

1988). This type of use for assessment therefore requires different types of Measurement

Models (and perhaps also Student Models) than does an end-of-semester physics

examination used for assigning grades.

Researchers who are studying science inquiry also may be particularly interested in

measuring conceptions, since inquiry activities (more so than didactic or cookbook lab

approaches) may reveal students’ misconceptions (e.g., Dalton, Morocco, Tivnan, & Mead,

1997; White & Frederiksen, 1998). However, developing psychometrically sound

measurement instruments of science learning that reveal students’ misconceptions is a

difficult and time-consuming task, and one at which test developers are not always

successful (see Cornely-Moss, 1995). Researchers who study science inquiry may be

spending much time, energy, and resources “reinventing the wheel” as they develop

measures for individual projects. In addition, researchers rarely have access to the kind of

sophisticated statistics and technology that make both Student Models and statistical

models more accurate and efficient. The goal of the PADI project is to address these needs

by creating structures, including design patterns and templates, to help assessors organize

their thinking about science learning into the shape of assessment arguments and tasks,

and to illustrate their use in a variety of contexts. In this technical report, we present four

templates and one design pattern that encompass misconception-based assessment in two

science domains: Newtonian physics and gas laws.

PADI Design Patterns and Task Templates 5

4.0 PADI Design Patterns and Task Templates

PADI design patterns and task templates build on the evidence-centered assessment design

(ECD) models of Mislevy, Steinberg, and Almond (2003). A good starting point is a quote

from Messick (1994):

A construct-centered approach [to assessment design] would begin by asking what

complex of knowledge, skills, or other attributes should be assessed, presumably

because they are tied to explicit or implicit objectives of instruction or are otherwise

valued by society. Next, what behaviors or performances should reveal those

constructs, and what tasks or situations should elicit those behaviors? Thus, the

nature of the construct guides the selection or construction of relevant tasks as well

as the rational development of construct-based scoring criteria and rubrics. (p. 16)

A PADI design pattern lays out, at a conceptual level, coherent sets of possibilities for the

elements of the assessment argument outlined in the Messick quotation, organized

around some aspect of scientific inquiry or conceptual knowledge. For example, there are

the targeted knowledge, skills or abilities (targeted KSAs); things that students might say,

do, or make that provide evidence of these KSAs (potential observations); and

characteristic features of task situations in which these observations might be made. The

design pattern structure is described and illustrated in detail in PADI Technical Report 1,

Design Patterns for Assessing Science Inquiry (Mislevy et al., 2003). This brief description,

however, should suffice to understand the example in Section 7.0.

More attention is focused in this presentation on task templates. The reader is referred to

PADI Technical Report 3, An Introduction to PADI Task Templates (Riconscente, Mislevy,

Hamel, & PADI Research Group, 2005). Task templates are organized around three basic

models in the Mislevy et al. (2003) evidenced-centered assessment framework—namely,

the Student Model, the Evidence Model, and the Task Model:

The Student Model contains variables that correspond to knowledge, skills, and

abilities of an examinee about which inferences will be made—decisions about

selection, placement, certification, instruction, task selection, and so on.

The Evidence Model is a set of instructions for interpreting the response (Work

Product) to a specific task. The Evidence Model contains two parts. The first is a

series of Evaluation Procedures that describe how to identify and evaluate essential

features of the Work Product. The second is a Measurement Model that tells how the

belief about the Student Model Variables for a given student should be updated in

light of the observed features of that student’s responses.

The Task Model is a generic description of a family of tasks. A Task Model contains

(1) a list of variables that are used to describe key features of the tasks, such as their

content, difficulty, and conditions under which they are presented; (2) a collection of

Presentation Material specifications that describe the structure and format of

material that will be presented to the participant as directions, stimulus, prompt, or

instruction; and (3) a collection of Work Product specifications that describe the

structure and format of traces of what the student will say, do, or make that will be

6 PADI Design Patterns and Task Templates

evaluated (e.g., item choices, sequence of solution steps, written explanations,

videotapes of performances).

Figure 1 shows the constituent objects in a PADI template, which together encompass the

Student, Evidence, and Task Models described above. The interested reader is referred to

Riconscente et al. (2005) for detailed explanations of the structure of templates. This

diagram, however, will help in understanding the examples of templates presented in the

appendices, by indicating the hierarchical structure of the pieces that make up a template.

Figure 1: The Hierarchical Structure of PADI Templates

Design Patterns

Activity

Measurement Models

Student Model

SM Variables

Observable Variables

Evaluation Phases

Materials & Presentation

Work Products

Evaluation Procedures

TM Variables

Task Model

TEMPLATE

Illustrations from Newtonian Physics 7

5.0 Illustrations from Newtonian Physics

One domain in which much work on misconceptions has been done is Newtonian physics.

Several studies have shown that students can solve typical classroom quantitative

problems in Newtonian physics, while failing to understand basic Newtonian principles

(see, for example, DiSessa, 1993; White & Frederiksen, 1998; Clement, 1983). Newtonian

motion problems can be divided into problems that consider an object moving in a

horizontal plane (e.g., a hockey puck moving across ice), problems that concern only

gravity acting on an object (either a still object or a moving object), and problems

concerning horizontal motion and the vertical force of gravity acting simultaneously. A

typical undergraduate physics assessment question asks whether students can correctly

compute the time, distance, acceleration, or force needed to move an object:

Note: From UC Berkeley Instructional Technology Program, 1994.

The following are common misconceptions in Newtonian physics:

“Impetus,” the pre-Newtonian notion that as long as a body is in motion, a force

must be acting on it (McCloskey, 1983). Students may be able to correctly calculate

the amount of force required to set a body in motion—e.g., to move a 10-kg box

from rest—but they might believe that force is required to keep the box moving. In

actuality, once the box has accelerated to its final velocity (assuming a frictionless

world), no force is required to keep it in motion.

Heavier objects exert more force (diSessa, 1993). (It does, of course, require more

force to start a heavier object in motion.)

Objects not at rest must have no forces acting on them (Clement, 1983). In actuality,

when an object such as book sits on a table, gravity (a force) pulls down on the

book, and the table pushes back up with exactly the same amount of force against

gravity.

Heavier objects fall faster than lighter ones (the belief tested in Galileo’s perhaps

apocryphal experiment at the leaning tower of Pisa).

The “Wile E. Coyote” misconception—the notion that “Any body suspended in space

will remain in space until made aware of its situation” (Hope, 1994).

Falling objects that are also moving horizontally fall straight down or in right-angle

or circular arcs. A student might be able to calculate the horizontal distance that an

A projectile is fired horizontally from a flare gun located 45.0 m above the ground. The projectile’s speed as it leaves the gun is 250 m/s.

(a) How long does the projectile remain in the air? (b) What horizontal distance does the projectile travel before striking the ground? (c) What is its speed as it strikes the ground? (d) If the projectile were simply dropped from a height of 45.0 m, instead of fired

horizontally from that height, how much time would it take to reach the ground? How does this compare with your answer to part (a)?

8 Illustrations from Newtonian Physics

object travels but misunderstand that objects that move both horizontally at a

constant speed and vertically under the force of gravity always move in a parabola.

5.1 The Force Concept Inventory

In 1992, David Hestenes and colleagues published a multiple-choice measure of students’

conceptual knowledge of Newtonian physics called the Force Concept Inventory, or FCI

(Hestenes et al., 1992). The FCI has been widely used as an undergraduate physics pre- and

posttest to measure whether students truly understand motion or whether they can

simply do calculations but lack a fundamental understanding. The FCI consists of 30

multiple-choice Newtonian physics problems that do not require any calculation but are

designed to tap students’ understanding of various aspects of Newtonian mechanics and

circular motion. The measure was constructed by reviewing prior research on correct

Newtonian conceptions and specific student misunderstandings. In the original

publication, 23 specific Newtonian force concepts were listed (Hestenes et al., 1992, Table

I) and used to construct correct options for the FCI. Additionally, 30 specific

misconceptions were listed (Table II) and used to construct incorrect options for the same

items. The Force and Motion Conceptual Evaluation (FMCE; Thornton & Sokoloff, 1998) is a

similar assessment.

The following sections describe elements necessary for task templates that describe FCI-

type tasks—that is, reverse-engineered templates that might be thought of as more

general structures from which the actual tasks could have been derived. More substantive

descriptions follow; PADI design system screen shots of actual PADI templates appear in

the appendices.

5.2 Student Model

Hestenes and colleagues (1992) originally conceived of a univariate Student Model—

students are Newtonian to either a greater or lesser extent (i.e., they answer more or fewer

FCI questions correctly). Bao and Redish (2001) suggested an alternative interpretation: A

student can be in multiple belief states (e.g., Newtonian reasoning; Galilean, or “impetus”

reasoning; and nonscientific, including Aristotelian reasoning) simultaneously, applying

one in any given situation with a probability that depends partly on the student’s

propensity to reason through each of the perspectives and partly on the features of the

situation. The Student Model Variables characterize the students’ propensities to reason

through each of the perspectives. Task parameters, discussed below, characterize a task’s

tendency to evoke thinking through the same perspectives.

This approach is similar to other multivariate models used in developmental and cognitive

psychology. For example, Robert Siegler (1976) has analyzed children’s developing ability

to solve problems in which different weights are put on either side of a balance beam.

Children at the same stage of development may give different answers to the same

balance beam problem, and children at different stages may give similar answers.

However, overall, children in the same stage show a similar distribution of answers that are

characteristic of their developmental stage. Thus, with the FCI we could imagine a student

who answers like a Newtonian 70% of the time, like a Galilean 20% of the time, and in a


nonscientific manner 10% of the time, recognizing that which kind of answer a student

gives also depends on the features of the task at hand.

In the PADI design system, this Student Model has been named the “FCI-ish Student

Model.” Students have propensities to respond to tasks in a certain class (e.g., FCI and

FMCE items) in terms of three specific conception models: Newtonian, Galilean, and

nonscientific. Formally, each student i is characterized by a vector of three real numbers,

(θi1, θi2, θi3), where higher numbers indicate a greater propensity toward a given response

class—in this case, Newtonian, Galilean, and nonscientific. Statistically, the model can be

made identified by implicitly fixing the sum of each examinee’s three parameters at zero,

or fixing the first parameter to zero. The latter approach is taken in the example illustrated

here, so there are only two Student Model Variables to be estimated, those for the Galilean

and nonscientific propensities.

5.3 Evidence Model

5.3.1 Evaluation Procedures and Evidence Rules

The FCI was originally a multiple-choice measure; evidence about the Student Model

consisted of which multiple-choice option was chosen. With the multivariate Student

Model approach we have taken in PADI, these evidence rules are expanded. Specifically,

each multiple-choice option has been mapped to known Newtonian conceptions

(Hestenes et al., 1992, Table I) and non-Newtonian misconceptions (which are further

subdivided into the Galilean and nonscientific misconceptions contained in Table II).

Although we have chosen to analyze the FCI in its original multiple-choice format, it would

be easy to adapt it for open-ended responses, which would be coded. These codes would

then be mapped to specific conceptions. We will see later that just this approach was taken

by the designers of an analogous measure, the Test about Particles in a Gas (TAP; Novick &

Nussbaum, 1981).

5.3.2 Measurement Model

The Measurement Model for the FCI was originally a simple additive model. A multivariate

statistical model for the FCI was recently investigated by Huang (2003) in dissertation

research. Huang used an Andersen/Rasch (A/R) multivariate model (Andersen, 1973), in

which a student is seen as being in several belief states, each with a certain propensity, and

each FCI question is seen as evoking belief states, each question with a certain tendency to

evoke responses of the different types.

The A/R model takes the following form for a situation in which there are m response types

and student and task parameters that correspond to them. Let Xij (i = 1, …, n: j = 1, …, k) be

independent observable random variables (i is the index for examinees, j is the index for

items), where Xij can be any integer between 1 and m. The probability that the response Xij

that student i produces for task j is of type p is given as

∑=

++==m

pjpipjpipij pXP

1)exp(/)exp()( βθβθ ,


where: p is an integer between 1 and m, indicating response class;

ipθ is the pth element in person i’s vector-valued parameter; and

jpβ is the pth element in item j’s vector-valued parameter.

Note that there are m probabilities for each examinee on a given item, representing the

probability of that person’s making any particular choice on that item.

The Andersen/Rasch model can be written as a special case of the Multidimensional

Random Coefficients Multinomial Logit Model (MRCMLM) used by the PADI project

(Adams, Wilson, & Wang, 1997). For more information, see Huang (2003).

5.4 Task Models

In the PADI design system, we have created four task templates related to misconception

research. All have the same Student and Evidence Model structures but differ as to the Task

Models. Three Task Models are for particular types of FCI problems (Hestenes et al., 1992),

and one is for the TAP (Novick & Nussbaum, 1983). In these Task Models, we do not create

new items or content that goes beyond the work done by the authors of the measures.

Rather, we write a more general set of task template structures (similar to item

specifications) that can enable researchers—and perhaps teachers—to create analogous

measures in their own domains of interest.

The three templates for FCI-type problems correspond to (a) problems involving only

gravity and no horizontal motion, which we refer to as G Problems; (b) problems involving

only horizontal motion, which do not tap students’ knowledge of gravity, which we refer to

as H Problems; and (c) problems that tap students’ knowledge of both horizontal motion

and gravity, which we call H & G Problems.

5.4.1 G Problems

A typical G Problem from the FCI is shown in Figure 2. The template titled Force

Problems—Gravity Only refers to the state of motion of the object (up, down, or no

motion), the duration of motion, friction, the identity of the vertical force (e.g., gravity, the

force of an elevator cable pulling up), whether or not there is an illustration used in the

problem, the mass and identity of the object being moved, the speed at which the object

is being moved, and the time period of interest (e.g., how long it takes an object to fall).

(Appendix A presents templates and related task specification for Force Problems—Gravity

Only.) For example, the problem shown in Figure 2 involves the continuous upward

motion of an elevator at an unspecified speed, no friction, the upward force of the elevator

cable, the downward force of gravity, and an illustration.


Figure 2: An FCI Problem That Tests Students’ Knowledge of Gravity But DoesNot

Involve Horizontal Motion

Note. From “Force Concept Inventory,” by D. Hestenes, M. Wells, & G. Swackhamer, 1992, The Physics Teacher, 30, pp.

141-158. Reprinted with permission.

Some of these Task Model Variables are important in describing incidental features of the

task situation, which may be varied to provide a range of tasks that are different on the

surface but similar as to the thinking they tend to evoke. The identity of the object is such a

variable. Other Task Model Variables are important because they characterize features that

are linked to common misconceptions. State of motion—whether an object is at rest,

moving up, or moving down—is an example of this type. Many students believe that when

an object is at rest, there are no forces acting on it.

G Problems in the FCI ask about the forces acting on an elevator being pulled up a shaft, a

steel ball that has been tossed straight up, a stone dropped from the roof of a building, a

tennis ball at an instant after it has been hit, a metal ball at an instant inside a tube, and an

office chair at rest on a floor. This particular template helps a researcher to create more FCI-

like G Problems. The researcher creates a situation of the kind described in the Task Model

and describable by the Task Model Variables. For multiple-choice tasks like those on the

FCI, options are constructed by giving one or more predictions or explanations for the

situation, each consistent with Newtonian reasoning, with Galilean impetus reasoning, or

with nonscientific (e.g., Aristotelian) reasoning. As illustrated in Section 7.0, one can reason

by analogy to another science domain in which students have misconceptions to create

one or more new templates from which to develop an entirely new measure.


5.4.2 H Problems

A typical H Problem from the FCI is shown in Figure 3. The template titled Force

Problems—Horizontal refers to the identity, direction, amount, and duration of force

applied; the mass and identity of the object being moved; whether an illustration is

included; the speed and direction of the object being moved; and the time period of

interest (before, during, or after the force is applied). (See Appendix B for the Force

Problems—Horizontal template.) Note that the G Problem and H Problem templates have

seven Task Model Variables in common: the direction of motion, duration, friction, the

identity of the force applied, whether an illustration is included, the mass of the object

being moved, and the time period of interest. For example, the problem shown in Figure 3

involves an instantaneous “kick” of unspecified force delivered at right angles to the

direction of motion of a hockey puck; the puck’s speed and mass are not specified

(although we can assume it is lighter than, say, a rocket); we are told to imagine a

frictionless situation; and the problem is illustrated.

Figure 3: An FCI Problem That Tests Students’ Knowledge of Horizontal Motion But

Does Not Involve Gravity



Task model variables that are particularly important in eliciting misconceptions are the

direction and duration of the force. Table 1 shows examples of how certain combinations

of task model variables for a Newton’s third law problem (“for every action there is an

equal and opposite reaction”) can tend to elicit specific conceptions or misconceptions

(see Hammer & Elby, 2003).


Table 1: Conceptions That Tend to Be Provoked by Combinations of Task Model

Variables

Object That Does the Hitting

Light Heavy

Light Correct Newtonian conception Misconception of more force

from the heavier object

Object

That Gets

Hit Heavy Misconception of more force

from the heavier object

Correct Newtonian

conception

Object That Does the Hitting

Slow Fast

Slow Correct Newtonian conception Misconception of more force

from the faster object

Object

That Gets

Hit Fast Misconception of more force

from the faster object

Correct Newtonian

conception

H problems in the FCI ask about a hockey puck that receives a kick, a rocket in space whose

engine turns on, a car that is pushing a truck, a truck that collides with a car, a woman

pushing a box, and two students on office chairs who push away from each other. This

template likewise would help a researcher create more FCI-like H Problems or to create a

new measure.

5.4.3 Force Problems—Horizontal and Gravity

The third Task Model we created was for more complex problems that require students to

apply their knowledge of both horizontal motion and the effect of gravity. A typical H & G

Problem from the FCI is shown in Figure 4. The template titled Force Problems—Horizontal

and Gravity (see Appendix C) refers to the identity, direction, and amount of force applied;

the mass and identity of the object being moved; whether an illustration is included; the

direction of motion; and the shape of the path of motion of the object being moved. Note

that several task model variables from the G Problem and H Problem templates are missing

from this more complex template: the duration of force applied, the speed and direction of

the object; the mass, size, and identity of the object; and the time period of interest

(before, during, or after force is applied). For example, the problem shown in Figure 4

involves the instantaneous force of a cannon (the amount of force is not specified)

shooting a (presumably heavy) cannonball; the shot is made in a forward direction parallel

to the ground; the question asks examinees to determine the shape of the path of motion;

and an illustration is included.


Figure 4: An FCI Problem That Tests Students’ Knowledge of Both Horizontal Motion

and Gravity



H & G Problems in the FCI ask about a cannonball shot out of a cannon, balls rolling off a

cliff, a bowling ball dropped from a flying airplane, and a tennis ball hit into the wind.

Similar to the other two templates, this template would allow a researcher to create more

FCI-like H & G Problems or to create a new measure.

Illustrations from Gas Laws 15

6.0 Illustrations from Gas Laws

A second scientific domain in which students have been shown to hold misconceptions is

the gas laws—systematic relationships among the pressure, volume, and temperature of

gases in closed containers. Some misconceptions that have been observed include the

notion that in a partial vacuum gases “rise to the top” or “sink to the bottom” of their

container (see Benson, Wittrock, & Baur, 1993; Lin, Cheng, & Lawrenz, 2000; Mas, Perez, &

Harris, 1987; Meheut, 1997). On the basis of several interview studies, Novick and

Nussbaum (1978, 1981) identified five core beliefs that make up a scientific conception of

gas behavior: that gases are made up of particles; that there is empty space between the

particles; that the particles are uniformly distributed in a closed container; that the

particles are in constant motion, and that when a gas becomes a liquid, there is a change in

the density of the collection of particles.

Novick and Nussbaum (1981) constructed the Test about Particles in a Gas (TAP), a

noncomputational measure of students’ conceptual knowledge about gas behavior. It is

an eight-question measure combining multiple-choice questions with drawing tasks and

other constructed-response questions that are scored by coders into conceptual

categories. A sample problem from the TAP is shown in Figure 5.

Figure 5: A TAP Problem That Tests Students’ Knowledge of the Molecular Theory of

Gases

Note. From “Pupils’ understanding of the particulate nature of matter: A cross-age study,” by S. Novick & J. Nussbaum,

1981, Science Education, 65(2), 187-196.

16 Illustrations from Gas Laws

Like the FCI, the TAP is designed to reveal students’ conceptions about how gases behave,

not to test their ability to compute typical gas law problems. For contrast, a typical high

school assessment question asks whether students can correctly compute the mass,

volume, or pressure of a gas under certain conditions, or to identify a gas given

information about its temperature, volume, and pressure:

Note: Selected gas law problems from Diamond Bar High School, Walnut Valley Unified School District, CA (Park, 1996)

As with the FCI, we “reverse engineered” a design template for constructing a TAP-like

assessment that uses the research base about student misconceptions to develop a

principled assessment (see Appendix D).

6.1 Student Model

As with the FCI, we propose that students can be seen as being in multiple belief states,

each with a certain propensity. For example, students might simultaneously believe that

particles in a gas are uniformly distributed in a closed container and also believe that

particles are non-uniformly distributed, each with a certain propensity. As with the FCI, the

original Student Model was a simple univariate model (students are characterized as either

scientific or nonscientific).

6.2 Evidence Model

Because the TAP includes both multiple-choice and constructed-response questions, its

Evidence Model is more complicated than those of the three FCI templates.

6.2.1 Evaluation Procedures and Evidence Rules.

The evidence rules for multiple-choice items on the TAP are analogous to those for the FCI

Evidence Models—evidence about the student corresponds to which multiple-choice

option was selected. For constructed-response items, however, the responses must be

coded by a coder into a priori categories that correspond to model classifications that are

mapped onto Novick and Nussbaum’s (1981) five principles—that is, coded in terms of

correct understandings or misunderstandings that are keyed to one or more of the five

targeted principles. For example, when students make a drawing of gas in a container,

coders have to rate the drawing as showing a particulate vs. continuous conception of

gases (see also Benson et al., 1993).

6.2.2 Measurement Model

We propose that the Andersen/Rasch Measurement Model (Andersen, 1995) could likewise

be applied to the TAP. Each student would be modeled as being in several belief states,

105. 1.09 g of H2 is contained in a 2.00 L container at 20.0 °C. What is the pressure in this container in mm Hg?

118. Determine the number of grams of carbon dioxide in a 450.6 mL tank at 1.80 atm and minus 50.5 °C. Determine the number of grams of oxygen that the same container will contain under the same temperature and pressure.

132. If 9.006 grams of a gas are enclosed in a 50.00 liter vessel at 273.15 K and 2.000 atmospheres of pressure, what is the molar mass of the gas? What gas is this?

Illustrations from Gas Laws 17

each with a certain propensity; each problem would also be seen as provoking certain

belief states, each with a certain propensity. As with the FCI, the original statistical model

for the TAP was a simple additive model.

6.3 Task Model

The Task Model for TAP questions has two variables in common with the FCI Task Model:

whether an illustration is used and the nature of the substances involved in the problem.

Many of the task model variables are similar to those that would be used in a conventional

calculation problem (e.g., pressure, volume, and temperature in a closed system); however,

TAP problems never give numerical quantities (similar to FCI problems). As in the FCI,

students make predictions about or explanations of gas behavior. Interestingly, Novick and

Nussbaum (1981) found that problem number 8 (shown in Figure 5) was often answered

incorrectly, even by students who demonstrated a correct conception of uniform

distribution of gases in a rigid container. An additional task model variable is therefore the

rigidity of the container (e.g., a rigid glass container vs. a flexible balloon). TAP problems

also vary the response format (e.g., multiple choice vs. drawing). As with the FCI and

mechanics principles, each TAP problem links to only one of the five scientific gas

principles.

18 A General Design Pattern

7.0 A General Design Pattern

We see the three FCI templates (H, G, and H & G) and the TAP as instantiations of a general

type of scientific reasoning that Stewart refers to as “Model Using” (Stewart & Hafner, 1994;

Stewart, Hafner, Johnson, & Finkel, 1992; see Appendix E). In Model Using, students reason

through a given model about a scientific problem. These problems do not present

anomalous data that challenge students’ existing models; rather, they provide practice on

applying targeted scientific models. In Model Using, students may consolidate their

understanding of a model and how to apply it to problems.

Model Using is itself an instantiation of a general class of scientific reasoning involving

models, called “model-based reasoning.” In addition to Model Using, Stewart and

colleagues have identified Model Elaboration and Model Revising (Stewart & Hafner, 1994).

Model Elaboration and Model Revising differ from Model Using primarily in that these

types of reasoning involve anomalous data that do not fit students’ existing models. Model

Using, on the other hand, involves practice in applying a model to a situation that does not

contradict the targeted model.

We see Model Using as a flexible design pattern that can encompass student reasoning

with both correct (i.e., scientific) and partially correct (e.g., DiSessa’s [1993] “p-prims”)

models. Model Using requires knowledge of a model, but especially conditional

knowledge about when the model can be applied to a situation (see also Larkin & Simon,

1987). In addition, students need domain-specific or general knowledge to use models.

Students can use knowledge of the model and their domain-specific knowledge together

to make predictions about and explanations of phenomena that the model applies to. For

example, given a model of gravity as a force that pulls objects downward and general

knowledge about elevators (e.g., that they are heavy, that they are attached to cables that

are moved by pulleys), students can reason through their gravity model to make

predictions about or explanations of the motion of an elevator (e.g., they could predict that

if the elevator cable were cut, the elevator would fall because of gravity).

The FCI and TAP templates, in addition to using Model Using, also depend on a preexisting

body of research that has identified student misconceptions for a particular domain. These

are required in order to construct distractors that will appeal to students who do not hold

scientific conceptions. For example, the TAP distractors were chosen on the basis of Novick

and Nussbaum’s (1978) interview study; the FCI distractors were built from a large body of

prior studies by researchers such as Clement (1983), Frederiksen and White (1988), Hunt

and Minstrell (1994), McCloskey (1983), and others. Templates could specifically point to

misconceptions that could be used to construct distractors.

Final Comments 19

8.0 Final Comments

The ultimate goal of the design pattern and task templates is that researchers, teachers, and

test developers can have worked-out samples of assessments that are theory based,

psychometrically sound, and less burdensome than a “reinvent the wheel” approach, and

that make optimal use of technology. For example, the Model Using design pattern and the

four associated templates developed here are based on research on students’

misconceptions in science—one application of the novice-expert paradigm in cognitive

psychology. These filled-in PADI task design structures are able to take a significant part of

the assessment development burden off of task designers, whether they choose to use the

templates as is or adapt them.

Tasks developed from these templates, such as the illustrations above or new ones built

around the same design structures, are grounded in a branch of cognitive research in

science learning. They could be developed either for informal use in the classroom or for

larger-scale and more formal use with psychometric models. In assessment systems that

follow this latter route, the templates’ theory-based Student Models and aligned Evidence

Models are psychometrically sound. They take advantage of sophisticated statistical

models (e.g., Rasch models, Bayes nets) that need not be developed afresh and the details

of which can be made invisible to the task designer and end user. Finally, because PADI

templates can be expressed in transportable form (specifically, they can be represented in

XML format), they are simple to share and access via the Web and to use in systems that

take advantage of automated student record keeping, adaptivity, and other technological

features.

20 References

References

Adams, R. J., Wilson, M. & Wang, W. C. (1997). The Multidimensional Random Coefficients

Multinomial Logit Model. Applied Psychological Measurement, 21, 1-23.

Andersen, E. B. (1973). Conditional inference and models for measuring. Copenhagen: Danish

Institute for Mental Health.

Andersen E.B. (1995). Residual analysis in the polytomous Rasch model. Psychometrika, 60,

375-393.

Bao, L., & Redish, E. F. (2001). Model analysis: Assessing the dynamics of student learning.

Unpublished manuscript. Retrieved November 25, 2003 from http://www.physics.ohio-

state.edu/~lbao/archive/papers/MAPre_01-0805.pdf

Benson, D. L., Wittrock, M. C., & Baur, M. E. (1993). Students’ preconceptions of the nature of

gases. Journal of Research in Science Teaching, 30(6), 587-597.

Bishop, B. A., & Anderson, C. W. (1990). Student conceptions of natural selection and its role

in evolution. Journal of Research in Science Teaching, 27(5), 415-427.

Brown, J. S., & Van Lehn, K. (1980). Repair theory: A generative theory of bugs in procedural

skills. Cognitive Science, 4, 379-426.

Bruleux, A. (1991). The analysis of writers’ think-aloud protocols: Developing a principled

coding scheme for ill-structured tasks. In G. Denhiere & J. P. Rossi (Eds.), Text and text

processing. Amsterdam: Elsevier.

Charness, N., & Schultetus, R. S. (1999). Knowledge and expertise. In F. T. Durso, R. S.

Nickerson, R. W. Schvaneveldt, S. T. Dumais, D. S. Lindsay, & M. T. H. Chi (Eds.), Handbook of

applied cognition (pp. 57-81). Chichester: John Wiley & Sons.

Chase, W. G. (1982). Spatial representations of taxi drivers. In D. R. Rogers & J. A. Sloboda

(Eds.), Acquisition of symbolic skills (pp. 111-136). New York: Plenum Press.

Chase, W. G., & Simon, H. A. (1973). The mind’s eye in chess. In W. G. Chase (Ed.), Visual

information processing (pp. 215-281). New York: Academic Press.

Chi, M. T., Feltovich, P. J., & Glaser, R. (1980). Categorization and representation of physics

problems by experts. Cognitive Science, 5, 121-152.

Clement, J. (1983). A conceptual model discussed by Galileo and used intuitively by physics

students. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 325-340). Hillsdale, NJ:

Erlbaum.

Cornely-Moss, K. A. (1995). Kinetic theory of gases. Journal of Chemical Education, 72(8),

715-716.

Dalton, B., Morocco, C. C., Tivnan, T., & Mead, P. L. R. (1997). Supported inquiry science:

Teaching for conceptual change in urban and suburban science classrooms. Journal of

Learning Disabilities, 30(6), 670-684.

References 21

De Groot, A. (1946/1978). Thought and choice in chess. The Hague: Mouton.

DiSessa, A. A. (1993). Toward an epistemology of physics. Cognition and Instruction, 10(2 &

3), 105-225.

Ericsson K. A. (Ed.). (1996). The road to excellence: The acquisition of expert performance in the

arts and sciences, sports, and games. Hillsdale, NJ: Erlbaum.

Ericsson, K. A., & Charness, N. (1997). Cognitive and developmental factors in expert

performance. In P. J. Feltovich, K. M. Ford, & R. R. Hoffman (Eds.), Expertise in context: Human

and machine (pp. 3-41). Menlo Park, CA: AAAI Press.

Ericsson, K. A., & Polson, P. G. (1988). An experimental analysis of the mechanisms of a

memory skill. Journal of Experimental Psychology: Learning, Memory and Cognition, 14, 305-

316.

Ericsson, K. A., & Smith, J. (1991). Prospects and limits of the empirical study of expertise: An

introduction. In K. A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise:

Prospects and limits (pp. 1-38). New York: Cambridge University Press.

Frederiksen, J. R., & White, B. Y. (1988). Implicit testing within an intelligent tutoring system.

Machine-Mediated Learning, 2, 351-372.

Gardner, H. (1999). The disciplined mind. New York: Simon & Schuster.

Hammer, D., & Elby, A. (2003). Tapping epistemological resources for learning physics. The

Journal of the Learning Sciences, 12, 53-90.

Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force Concept Inventory. The Physics

Teacher, 30(3), 141-158.

Hope, P. (1994). The laws of cartoon physics. Retrieved June 29, 2005, from

http://funnies.paco.to/cartoon.html

Huang, C.-W. (2003). Psychometric analyses based on evidence-centered design and

cognitive science of learning to explore students’ problem-solving in physics. Unpublished

doctoral dissertation, University of Maryland, College Park.

Hunt, E., & Minstrell, J. (1994). A cognitive approach to the teaching of physics. In K. McGilly,

(Ed.), Classroom lessons: Integrating cognitive theory and classroom practice (pp. 51-73).

Cambridge, MA: MIT Press.

Larkin, J., McDermott, J., Simon, D. P., & Simon, H. A. (1980, June 20). Expert and novice

performance in solving physics problems. Science, 208, 1335.

Larkin, J., & Simon, H. (1987). Why a diagram is (sometimes) worth ten thousand words.

Cognitive Science, 11, 65-99.

Lin, H., Cheng, H., & Lawrenz, F. (2000). The assessment of students and teachers’

understanding of gas laws. Journal of Chemical Education, 77(2), 715-716.

22 References

Lundeberg, M. A. (1987). Metacognitive aspects of reading comprehension: Studying

understanding in legal case analysis. Reading Research Quarterly, 22(4), 407-432.

Magliano, J. P., & Graesser, A. C. (1991). A three-pronged method for studying inference

generation in literary text. Poetics, 20, 193-232.

Mas, C. J. F., Perez, J. H., & Harris, H. H. (1987). Parallels between adolescents’ conception of

gases and the history of chemistry. Journal of Chemical Education, 64(7), 616-618.

McCloskey, M. (1983). Naive theories of motion. In D. Gentner & A. L. Stevens (Eds.), Mental

models (pp. 299-324). Hillsdale, NJ: Erlbaum.

Meheut, M. (1997). Designing a learning sequence about a prequantitative kinetic model of

gases: The parts played by questions and by a computer-simulation. International Journal

of Science Education, 19, 647-660.

Messick, S. (1994). The interplay of evidence and consequences in the validation of

performance assessments. Educational Researcher, 23(2), 13-23.

Michael, J. A. (1998). Students' misconceptions about perceived physiological responses.

American Journal of Physiology , 274, (Advances in Physiology Education, 19), S90-S98.

Mislevy, R. J., Hamel, L., Fried, R., Gaffney, T., Haertel, G., Hafter, A., et al. (2003). Design

patterns for assessing science inquiry (PADI Technical Report 1). Menlo Park, CA: SRI

International.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational

assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3-67.

Novick, S., & Nussbaum, J. (1978). Junior high school pupils’ understanding of the

particulate nature of matter: An interview study. Science Education, 62(3), 273-281.

Novick, S., & Nussbaum, J. (1981). Pupils’ understanding of the particulate nature of matter:

A cross-age study. Science Education, 65(2), 187-196.

Park, J. L. (1996). Gas law problems: Ideal gas law. Retrieved June 29, 2005 from

http://dbhs.wvusd.k12.ca.us/webdocs/GasLaw/WS-Ideal.html.

Patel, V. L., & Groen, G. J. (1991). The general and specific nature of medical expertise: A

critical look. In K. A. Ericsson & J. Smith (Eds.), Toward a General Theory of Expertise: Prospects

and Limits (pp. 93-125). New York: Cambridge University Press.

Peskin, J. (1998). Constructing meaning when reading poetry: An expert-novice study.

Cognition and Instruction, 16(3), 235-263.

Riconscente, M. M., Mislevy, R. J., Hamel, L., & PADI Research Group. (2005). An introduction

to PADI task templates (PADI Technical Report 3). Menlo Park, CA: SRI International.

Rouet, J. F., Britt, M. A., Mason, R. A., & Perfetti, C. A. (1996). Using multiple sources of

evidence to reason about history. Journal of Educational Psychology, 88, 478-493.

References 23

Sabers, D. S., Cushing, K. S., & Berliner, D. C. (1991). Differences among teachers in a task

characterized by simultaneity, multidimensionality, and immediacy. American Educational

Research Journal, 28(1), 63-88.

Steinberg, L. S., & Gitomer, D. G. (1996). Intelligent tutoring and assessment built on an

understanding of a technical problem-solving task. Instructional Science, 24, 223-258.

Stewart, J., Hafner, R., Johnson, S., & Finkel, E. (1992). Science as model building: Computers

and high-school genetics. Educational Psychologist, 27(3), 317-336.

Thornton, R. K., & Sokolofff,, D. R. (1998) Assessing student learning of Newton’s laws, The

Force and Motion Conceptual Evaluation and the evaluation of active learning laboratory

and lecture curricula. American Journal of Physics, 66, 338-352.

UC Berkeley Instructional Technology Program. (1994). The Interactive Physics Problem Set.

Retrieved October 27, 2003 from http://ist-

socrates.berkeley.edu:7521/projects/IPPS/Ch4/Prob6/Q.html

Vosniadou, S., & Brewer, W. F. (1992). Mental models of the earth: A study of conceptual

change in childhood. Cognitive Psychology, 24, 535-585.

White, B. Y., & Frederiksen, J. R. (1998). Inquiry, modeling, and metacognition: Making

science accessible to all students. Cognition and Instruction, 16(1), 3-18.

Wineburg, S. (1991). Historical problem solving: A study of the cognitive processes used in

the evaluation of documentary and pictorial evidence. Journal of Educational Psychology,

83(1) 73-87.

A P P E N D I X A

PADI Template for Force Problems— Gravity Only

Appendix A 25

Appendix A

The PADI template for “Force Problems—Gravity Only” is reproduced below. The first

illustration shows the top layer of the template. It provides an overview of all of the objects

in the hierachically organized schema (see Figure 1 of the text). The seven following

illustrations show lower layers in the template that are linked to the top layer. They specify,

in turn, information about the Student Model, the Measurement Model, and the Task

Model. The Student Model information is briefly described in the “Student Model

Summary” section in the top layer, and detailed in the subsequent linked section that gives

fuller details about the Student Model. The summary explains the multivariate Student

Model proposed above, while the linked Student Model section defines in detail the model

identification and the covariance matrix.

Measurement Model information is briefly described in the top layer’s “Evaluation

Procedures Summary” (evidence rules) and “Measurement Model Summary” (statistical

model) sections. In more detailed sections of the template, the Evaluation Procedures

explain the mapping of the multiple-choice items to Newtonian, Galilean, and

nonscientific conception categories. The Measurement Model explains the

Andersen/Rasch model.

Task Model information is briefly described in the top layer’s “Activities,” “Activity

Sequencing,” “Template-level Task Model Variables,” and “Task Model Variable Settings.”

The primary activity for this template is “Make explanations and predictions from a physical

situation,” which is linked to the template. The Task Model Variables are linked to this

particular template, with template-specific comments (e.g., the “Direction of Motion” Task

Model Variable in this template is always vertical).

In addition, the template includes links to the Model Using design pattern, relevant

research, Web resources (e.g., the Web site for the FCI), and other print resources, such as

the Bao and Redish (2001) paper.

Figure A-1: PADI Template “Force Problems—Gravity Only”

(continued)

26 Appendix A

Figure A-1: PADI Template “Force Problems—Gravity Only” (continued)

Appendix A 27

Figure A-2: PADI Task Model Variable “Direction of Motion”

Figure A-3: PADI Task Model Variable “Duration”

28 Appendix A

Figure A-4: PADI Activity “Make Explanations and Predictions from a Physical Situation”

Figure A-5: PADI Student Model “Direction of Motion”

Appendix A 29

Figure A-6: PADI Evaluation Procedure “Map Multiple-Choice Answers onto Conceptual Model”

Figure A-7: PADI Evaluation Phase “Model Using Multiple Choice”

30 Appendix A

Figure A-8: PADI Measurement Model “Andersen/Rasch Measurement Model Fragment”

A P P E N D I X B

PADI Template for Force Problems— Horizontal Only

32 Appendix B

Appendix B

The PADI template for “Force Problems—Horizontal Only” has links similar to those of the

template for “Force Problems—Gravity Only,” but has its own Template-level Task Model

Variables. Only the top layer of this template is shown.

Figure B-1: PADI Template “Force Problems—Horizontal Only”

(continued)

Appendix B 33

Figure B-1: PADI Template “Force Problems—Horizontal Only” (continued)

A P P E N D I X C

PADI Template for Force Problems— Horizontal and Gravity

Appendix C 35

Appendix C

The PADI template for “Force Problems—Horizontal and Gravity” has links similar to those

of the FCI templates above, but its own Template-level Task Model Variables. Only the top

layer of this template is shown.

Figure C-1: PADI Template “Force Problems—Horizontal and Gravity”

(continued)

36 Appendix C

Figure C-1: PADI Template “Force Problems—Horizontal and Gravity” (continued)

A P P E N D I X D

PADI Template for Gas Law Problems

38 Appendix D

Appendix D

The PADI template for “Gas Law Problems” reproduced below has its own specific

information about the Student Model, the Measurement Model (e.g., mapping student-

constructed responses to predetermined misconception categories), and Task Model (e.g.,

for constructing both multiple-choice and constructed-response items), as well as

references to the TAP measure and other print resources. The same Activity used to

structure the FCI force tasks (displayed in Appendix A) is reused in this template. Only the

top layer of the template is shown.

Figure D-1: PADI Template “Gas Law Problems”

(continued)

Appendix D 39

Figure D-1: PADI Template “Gas Law Problems” (continued)

A P P E N D I X E

PADI Design Pattern for Model Using

Appendix E 41

Appendix E

The PADI “Model Using” design pattern is reproduced below. It includes a Summary, KSAs

(knowledge, skills, and abilities), Potential Observations, Potential Work Products, and

Characteristic and Variable Features of the design pattern. The KSAs include both Focal

KSAs (e.g., the importance of conditional knowledge) and Additional KSAs (e.g., general

knowledge). Potential Observations and Work Products are very high-level statements that

are operationalized in the templates as Student Model and Task Model Variables (e.g., Work

Products in the Activity). Characteristic and Variable Features are likewise abstract

statements that are operationalized in the templates as Task Model Variables.

Figure E-1: PADI Design Pattern “Model Using”

(continued)

42 Appendix E

Figure E-1: PADI “Model Using” (continued)

SponsorThe National Science Foundation, Grant REC-0129331

Prime GranteeSRI International. Center for Technology in Learning

SubgranteesUniversity of Maryland

University of California, Berkeley. Berkeley Evaluation & Assessment Research (BEAR) Center and The Full Option Science System (FOSS)

University of Michigan. BioKIDs

P A D I

M I C H I G A NM I C H I G A N

Task Templates Based on Misconception Researchpadi.sri.com/downloads/TR6_Misconceptions.pdf6.0...

Documents

Transcript of Task Templates Based on Misconception Researchpadi.sri.com/downloads/TR6_Misconceptions.pdf6.0...