ECOLT 2006 Slide 1, October 13, 2006
Prospectus for the PADI design framework in language testing
ECOLT 2006, October 13, 2006, Washington, D.C.
PADI is supported by the National Science Foundation under grant REC-0129331. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Robert J. Mislevy, Professor of Measurement & Statistics
University of Maryland
Geneva D. Haertel, Assessment Research Area Director
SRI International
ECOLT 2006 Slide 2, October 13, 2006
Some Challenges in Language Testing
Sorting out evidence about interacting aspects of knowledge & proficiency in complex performances
Understanding the impact of "complexity factors" and "difficulty factors" on inference
Scaling up efficiently to high-volume tests: task creation, scoring, delivery
Creating valid & cost-effective low-volume tests
ECOLT 2006 Slide 3, October 13, 2006
Evidence-Centered Design
Evidence-centered assessment design (ECD) provides language, concepts, knowledge representations, data structures, and supporting tools to help design and deliver educational assessments,
all organized around the evidentiary argument an assessment is meant to embody.
ECOLT 2006 Slide 4, October 13, 2006
The Assessment Argument
What kinds of claims do we want to make about students?
What behaviors or performances can provide us with evidence for those claims?
What tasks or situations should elicit those behaviors?
Generalizing from Messick (1994)
ECOLT 2006 Slide 5, October 13, 2006
Evidence-Centered Design
With Linda Steinberg & Russell Almond at ETS
» The Portal project / TOEFL
» NetPASS with Cisco (computer network design & troubleshooting)
Principled Assessment Design for Inquiry (PADI)
» Supported by NSF (co-PI: Geneva Haertel, SRI)
» Focus on science inquiry, e.g., investigations
» Models, tools, examples
ECOLT 2006 Slide 6, October 13, 2006
Some allied work
Cognitive design for generating tasks (Embretson)
Model-based assessment (Baker)
Analyses of task characteristics, test and TLU (Bachman & Palmer)
Test specifications (Davidson & Lynch)
Constructing measures (Wilson)
Understanding by Design (Wiggins)
Integrated Test Design, Development, and Delivery (Luecht)
From Mislevy & Riconscente, in press
Layers in the assessment enterprise:
Domain Analysis: What is important about this domain? What work and situations are central in this domain? What knowledge representations (KRs) are central to this domain?
Domain Modeling: How do we represent key aspects of the domain in terms of an assessment argument?
Conceptual Assessment Framework: Design structures: student, evidence, and task models.
Assessment Implementation: How do we choose and present tasks, and gather and analyze responses?
Assessment Delivery: How do students and tasks actually interact? How do we report examinee performance?
Key ideas: explicit relationships, explicit structures, generativity, re-usability, recombinability, interoperability.
(Layers diagram, repeated.) Domain Analysis draws on expertise research, task analysis, curriculum, target use, critical incident analysis, ethnographic studies, etc. In language assessment, note the importance of psycholinguistics, sociolinguistics, and target language use.
(Layers diagram, repeated.) Tangible stuff: e.g., what gets made and how it operates in the testing situation.
(Layers diagram, repeated.) How do you get from here to here?
(Layers diagram, repeated.) We will focus today on two "hidden" layers:
(Layers diagram, repeated.) First, Domain Modeling, which concerns the Assessment Argument.
(Layers diagram, repeated.) And second, the Conceptual Assessment Framework, which concerns generative & re-combinable design schemas.
(Layers diagram, repeated.) More on the Assessment Argument:
ECOLT 2006 Slide 15, October 13, 2006
PADI Design Patterns
Organized around elements of the assessment argument
Narrative structures for assessing pervasive kinds of knowledge / skill / capabilities
Based on research & experience, e.g.
» PADI: design under constraint, inquiry cycles, representations
» Compliance with Grice's maxims; cause/effect reasoning; giving spoken directions
Suggest design choices that apply to different contexts, levels, purposes, formats
» Capture experience in structured form
» Organized in terms of the assessment argument
ECOLT 2006 Slide 16, October 13, 2006
A Design Pattern Motivated by Grice's Relation Maxims
Attribute: Value(s)
Name: Grice's Relation Maxim: Responding to a Request
Summary: In this design pattern, an examinee demonstrates following Grice's Relation Maxim in a given language by producing or selecting a response in a situation that presents a request for information (e.g., a conversation).
Central claims: In contexts/situations with xxx characteristics, can formulate and respond to representations of implicature from referents (semantic implication, pragmatic implication).
Additional knowledge that may be at issue: Substantive knowledge in the domain; familiarity with cultural models; knowledge of the language.
ECOLT 2006 Slide 17, October 13, 2006
Characteristic features
The stimulus situation needs to present a request for relevant information to the examinee, either explicitly or implicitly.
Variable task features
» Production or choice as response?
» If production, oral or written production required?
» If oral, a single response to a preconfigured situation, or part of an evolving conversation?
» If an evolving conversation, open or structured interview?
» Formality of prepackaged products (multiple choice, videotaped conversations, written questions or conversations, one-to-one or multi-party conversations prepared by interviewers)
» Formality of information and task (concrete or abstract, immediate or remote, information requiring retrieval or transformation, familiar or unfamiliar setting and topic, written or spoken)
» If a prepackaged speech stimulus: length, content, difficulty of language, explicitness of request, degree of cultural dependence
» Content of situation (familiar or unfamiliar, degree of difficulty)
» Time pressure (e.g., time for planning and response)
» Opportunity to control the conversation
Grice’s Relation Maxims
ECOLT 2006 Slide 18, October 13, 2006
Potential performances and work products
Constructed oral response
Constructed written or typed-in response
Answer to a multiple-choice question where alternatives vary
Potential features of performance to evaluate
Whether a student can formulate representations of implicature, as they are required in the given situation.
Whether a student can make a conversational contribution that moves the exchange in the accepted direction.
Whether a student provides the relevant information as required.
Whether the choice among alternatives offered for a production in a given situation satisfies the Relation Maxim.
Potential rubrics (later slide)
Examples (in paper)
Grice’s Relation Maxims
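The attribute-value layout of this design pattern lends itself to a structured record. A minimal Python sketch follows; the attribute names and values are taken from the slides above, but the dict representation itself is illustrative, not PADI's actual data model:

```python
# A design pattern as a structured record. Attribute names follow the slides;
# this dict representation is illustrative, not PADI's actual object model.
design_pattern = {
    "name": "Grice's Relation Maxim: Responding to a Request",
    "summary": ("Examinee demonstrates following Grice's Relation Maxim "
                "by producing or selecting a response to a request for information."),
    "additional_knowledge": [
        "Substantive knowledge in domain",
        "Familiarity with cultural models",
        "Knowledge of language",
    ],
    "characteristic_features": [
        "Stimulus situation presents a request for relevant information, "
        "explicitly or implicitly",
    ],
    "variable_task_features": [
        "Production or choice as response",
        "Oral or written production",
        "Content of situation",
        "Time pressure",
    ],
    "potential_work_products": [
        "Constructed oral response",
        "Constructed written or typed-in response",
        "Answer to a multiple-choice question",
    ],
}
```

A task author instantiates a concrete task from the pattern by fixing a value for each variable task feature and choosing among the potential work products.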
ECOLT 2006 Slide 19, October 13, 2006
Some Relationships between Design Patterns and Other TD Tools
Conceptual models for proficiency & task characteristic frameworks
» Grist for design choices about KSAs & task features
» DPs present an integrated design space
Test specifications
» DPs for generating the argument and design choices
» Test specs for documenting and specifying choices
(Layers diagram, repeated.) More on the Conceptual Assessment Framework:
ECOLT 2006 Slide 21, October 13, 2006
Evidence-centered assessment design
The three basic models: the Student Model; the Evidence Model(s), comprising a statistical model and evidence rules; and the Task Model(s).
Technical specs that embody the elements suggested in the design pattern.
ECOLT 2006 Slide 22, October 13, 2006
Evidence-centered assessment design
The three basic models (Student, Evidence, Task), shown as a conceptual representation.
ECOLT 2006 Slide 23, October 13, 2006
Screen shot of the user interface: the user-interface representation.
ECOLT 2006 Slide 24, October 13, 2006
High-level UML Representation of the PADI Object Model
UML representation (sharable data structures, "behind the screen")
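Behind such a UML diagram, the object model amounts to linked, sharable structures. Here is a hypothetical Python sketch of the three basic models as data classes; the class and field names are illustrative, not the actual PADI classes:

```python
# Hypothetical sketch of CAF objects as sharable data structures, in the
# spirit of the PADI object model (names illustrative, not the real UML).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StudentModelVariable:
    """An aspect of proficiency the assessment is meant to measure."""
    name: str
    levels: List[str]

@dataclass
class EvidenceModel:
    """Links work products to student-model variables."""
    evidence_rules: str      # how work products are scored into observables
    observables: List[str]   # observable-variable names
    smv_names: List[str]     # which SMVs the statistical model updates

@dataclass
class TaskModel:
    """Schema for the situations that elicit evidence."""
    features: Dict[str, str]  # task-feature slots and chosen values
    work_products: List[str]  # what the examinee produces

@dataclass
class ConceptualAssessmentFramework:
    student_model: List[StudentModelVariable]
    evidence_models: List[EvidenceModel]
    task_models: List[TaskModel]

caf = ConceptualAssessmentFramework(
    student_model=[StudentModelVariable("Design", ["low", "medium", "high"])],
    evidence_models=[EvidenceModel("rubric 0-4", ["relevance"], ["Design"])],
    task_models=[TaskModel({"setting": "University"}, ["network diagram"])],
)
```

The point of the shared structure is re-usability: a new project swaps in its own variables, rules, and task features without changing the schema.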
ECOLT 2006 Slide 25, October 13, 2006
What complex of knowledge, skills, or other attributes should be assessed?
Evidence-centered assessment design
ECOLT 2006 Slide 26, October 13, 2006
The NetPass Student Model
Can use the same student model with different tasks.
A multidimensional measurement model with selected aspects of proficiency.
ECOLT 2006 Slide 27, October 13, 2006
What behaviors or performances should reveal those constructs?
Evidence-centered assessment design
ECOLT 2006 Slide 28, October 13, 2006
What behaviors or performances should reveal those constructs?
Evidence-centered assessment design
From a unique student work product to evaluations of observable variables, i.e., task-level "scoring".
ECOLT 2006 Slide 29, October 13, 2006
Skeletal Rubric for Satisfaction of Quality Maxims
4: Responses and explanations are relevant as required for the current purposes of the exchange, and are neither more elaborated than appropriate nor insufficient for the context. They fulfill the demands of the task with at most minor lapses in completeness. They are appropriate for the task and exhibit coherent discourse.
3: Responses and explanations address the task appropriately and are relevant as required for the current purposes of the exchange, but they may either be more elaborated than required or fall short of being fully developed.
2: The responses and explanations are connected to the task, but are either markedly excessive in the information supplied or not very relevant to the current purpose of the exchange. Some relevant information may be missing or inaccurately cast.
1: The responses and explanations are either grossly irrelevant or very limited in content or coherence. In either case they may be only minimally connected to the task.
0: The speaker makes no attempt to respond, or the response is unrelated to the topic. A written response at this level merely copies sentences from the topic, rejects the topic, or is otherwise not connected to the topic. A spoken response is not connected to the direct or implied request for information.
ECOLT 2006 Slide 30, October 13, 2006
Notes re Observable Variables
Re-usable (tailorable) to different tasks & projects.
There can be multiple aspects of performance being rated.
May be a 1-1 relationship with student-model variables, but need not be; that is, multiple aspects of proficiency can be involved in the probability of a high / satisfactory / certain style of response.
ECOLT 2006 Slide 31, October 13, 2006
What behaviors or performances should reveal those constructs?
Evidence-centered assessment design
Values of observable variables are used to update probability distributions for student-model variables via the psychometric model, i.e., test-level scoring.
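A minimal numerical sketch of this test-level updating, for a single discrete student-model variable updated by each scored observable via Bayes' rule (posterior ~ prior x likelihood). The conditional probabilities are made-up values, not taken from the PADI or NetPass models:

```python
# One SMV ("proficiency", three levels) updated by scored observables.
prior = {"low": 1/3, "medium": 1/3, "high": 1/3}

# P(observable correct | proficiency level) for two tasks -- assumed numbers.
likelihood = [
    {"low": 0.2, "medium": 0.5, "high": 0.8},   # task 1
    {"low": 0.1, "medium": 0.4, "high": 0.9},   # task 2
]

def update(belief, p_correct, observed_correct):
    """One Bayes update of the SMV distribution from a scored observable."""
    post = {}
    for level, p in belief.items():
        like = p_correct[level] if observed_correct else 1 - p_correct[level]
        post[level] = p * like
    z = sum(post.values())                       # normalize
    return {level: v / z for level, v in post.items()}

belief = prior
for p_correct, obs in zip(likelihood, [True, True]):  # examinee got both right
    belief = update(belief, p_correct, obs)

print(belief)  # posterior mass shifts toward "high"
```

The same machinery scales to Bayes nets over several student-model variables, which is how the evidence-model fragments on the next slide are used.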
ECOLT 2006 Slide 32, October 13, 2006
A NetPass Evidence-Model Fragment for Design
Re-usable conditional-probability fragments and variable names for different tasks with the same evidentiary structure.
Measurement models indicate which SMVs, in which combinations, affect which observables. Task features influence which ones and how much, in structured measurement models.
ECOLT 2006 Slide 33, October 13, 2006
What tasks or situations should elicit those behaviors?
Evidence-centered assessment design
ECOLT 2006 Slide 34, October 13, 2006
Representations to the student, and sources of variation
ECOLT 2006 Slide 35, October 13, 2006
Task Specification Template: Determining Key Features (Wizards)
Setting: Corporation / Conference Center / University
Building Length: Less than 100m / More than 100m
Ethernet Standard: 10BaseT / 100BaseT
Subgroup Name: Teacher / Student / Customer
Bandwidth for a Subgroup Drop: 10Mbps / 100Mbps
Growth Requirements: Given / NA
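A wizard over this template is essentially a set of enumerated feature slots plus validation. The option lists below are copied from the slide; the dict-and-function code itself is an illustrative sketch, not PADI's implementation:

```python
# NetPass-style task-template "wizard": enumerated feature slots.
TASK_FEATURES = {
    "setting": ["Corporation", "Conference Center", "University"],
    "building_length": ["Less than 100m", "More than 100m"],
    "ethernet_standard": ["10BaseT", "100BaseT"],
    "subgroup_name": ["Teacher", "Student", "Customer"],
    "bandwidth_per_drop": ["10Mbps", "100Mbps"],
    "growth_requirements": ["Given", "NA"],
}

def make_task_spec(**choices):
    """Validate one choice per feature slot against the template."""
    spec = {}
    for feature, options in TASK_FEATURES.items():
        value = choices.get(feature)
        if value not in options:
            raise ValueError(f"{feature!r} must be one of {options}, got {value!r}")
        spec[feature] = value
    return spec

task = make_task_spec(
    setting="University",
    building_length="Less than 100m",
    ethernet_standard="100BaseT",
    subgroup_name="Student",
    bandwidth_per_drop="100Mbps",
    growth_requirements="Given",
)
```

Enumerating the slots makes the design space explicit: every combination of choices is a candidate task in the same family, which is what lets the structured measurement models on the next slide relate task features to difficulty.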
ECOLT 2006 Slide 36, October 13, 2006
Structured Measurement Models
Examples of models:
» Multivariate Random Coefficients Multinomial Logit Model (MRCMLM; Adams, Wilson, & Wang, 1997)
» Bayes nets (Mislevy, 1996)
» General Diagnostic Model (von Davier & Yamamoto)
By relating task characteristics to difficulty with respect to different aspects of proficiency, create tasks with known properties.
Can create families of tasks around the same evidentiary frameworks; e.g., for "read & write" tasks, can vary characteristics of texts, directives, audience, purpose.
ECOLT 2006 Slide 37, October 13, 2006
Structured Measurement Models
Articulated connection between task characteristics and models of proficiency
Moves beyond "modeling difficulty"
» Traditional test theory is a bottleneck in a multivariate environment
Dealing with "complexity factors" and "difficulty factors" (Robinson)
» Model complexity factors as covariates for the difficulty parameters with respect to those aspects of proficiency they impact
» Model difficulty factors either as SMVs, if a target of inference, or as noise, if a nuisance
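The "complexity factors as covariates for difficulty" idea can be written in LLTM style. This is an illustrative unidimensional form, not the exact MRCMLM parameterization:

```latex
% Task i's difficulty is decomposed into complexity-factor effects:
P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)},
\qquad
b_i = \sum_{k} q_{ik}\,\eta_k
```

Here \(\theta_p\) is person \(p\)'s proficiency, \(q_{ik}\) codes whether complexity factor \(k\) is present in task \(i\), and \(\eta_k\) is that factor's contribution to difficulty. The multivariate case attaches such decompositions to difficulty with respect to each aspect of proficiency.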
ECOLT 2006 Slide 38, October 13, 2006
Advantages: A framework that…
Guides task and test construction (Wizards)
Provides high efficiency and scalability
By relating task characteristics to difficulty, allows creating tasks with targeted properties
Promotes re-use of conceptual structures (DPs, arguments) in different projects
Promotes re-use of machinery in different projects
ECOLT 2006 Slide 39, October 13, 2006
Evidence of effectiveness
Cisco
» Certification & training assessment
» Simulation-based assessment tasks
IMS/QTI
» Conceptual model for standards for data structures in computer-based testing
ETS
» TOEFL
» NBPTS
ECOLT 2006 Slide 40, October 13, 2006
Conclusion
Isn’t this just a bunch of new words for describing what we already do?
ECOLT 2006 Slide 41, October 13, 2006
An answer (Part 1)
No.
ECOLT 2006 Slide 42, October 13, 2006
An answer (Part 2)
An explicit, general framework makes similarities and implicit principles explicit:
» To better understand current assessments…
» To design for new kinds of assessment…
– Tasks that tap multiple aspects of proficiency
– Technology-based tasks (e.g., simulations)
– Complex observations, student models, evaluation
» To foster re-use, sharing, & modularity
– Concepts & arguments
– Pieces of machinery & processes (QTI)
ECOLT 2006 Slide 43, October 13, 2006
For more information…
www.education.umd.edu/EDMS/mislevy/
Has links to PADI, Cisco, articles, etc.
(e.g., CRESST report on Task-Based Language Assessment.)