MODELS FOR ASSESSING LEARNING AND EVALUATING LEARNING OUTCOMES
Adam Dubrowski PhD
Acknowledgement: Dr. Kathryn Parker, Learning Institute
Introduction
Is Eli OK?
[Diagram: Eli, considered in terms of Environment, Intake, Interaction, Reaction, Function]
Content
• Assessment
• Evaluation: Models and Frameworks
• Working Example: Evaluation of a Hand Hygiene Program
• Assessment Instruments
• Standards and Rigor: Assessment Instruments
• Working Example: OSATS
• Standards and Rigor: Evaluations
• Summary and Take-home Message
Introduction
Assessments: summative and formative
Evaluations: outcome and process
[Diagram: evaluations and assessments]
Alkin, M.C. & Christie, C.A. (2004). The evaluation theory tree. In Alkin, M.C. (ed.), Evaluation Roots: Tracing Theorists’ Views and Influences. London: Sage Publications.
Characteristic | USE | METHODS (Research) | VALUE
Purpose of evaluation | To generate information that will be useful and inform decision making. | To generalize findings from one program to another similar program. | To render judgment on the merit or worth of the program.
Role of evaluator | Meet the needs of the stakeholders; negotiator | Researcher | Judge
Role of program staff/stakeholders | Close collaborator; drives the evaluation | Inform or approve the research question(s) | Provider (source) of information
Questions | How will this evaluation data be used? By whom? For what purpose? | Can these findings be generalized to other programs under similar conditions? How do these findings contribute to what we know about how people learn and change? | Does the program meet the predetermined criteria? Was the program developed and implemented in the best interest of the public?
Voices | Daniel Stufflebeam (1983); Michael Patton (2007) | Ralph Tyler (1942); Peter Rossi (2006) | Robert Stake (1975); Guba & Lincoln (1989)
Reflections
Where would you sit on the Evaluation Tree? What about others in your “system”?
Introduction
Characteristic | Adam: Research | Kath: USE (leaning towards Research)
Purpose of evaluation | To generalize findings from one program to another similar program. | To generate information that will be useful for decision making; to gain a better understanding of how programs work.
Role of evaluator | Researcher | Collaborator
Role of program staff/stakeholders | Inform or approve the research question(s) | Close collaborator; drives the evaluation
Questions | Can these findings be generalized to other programs under similar conditions? How do these findings contribute to what we know about how people learn and change? | How will this evaluation data be used? By whom? For what purpose? How do these findings contribute to what we know about how people learn and change?
Voices | Ralph Tyler (1942); Peter Rossi (2006) | Patton (2007); Chen (2002); Donaldson (2001)
Models and Frameworks
• CIPP Evaluation Model (Stufflebeam, 1983)
• Kirkpatrick’s Model of Evaluating Training Programs (Kirkpatrick, 1998)
• Moore’s Evaluation Framework (Moore et al., 2009)
• Miller’s Framework for Clinical Assessments (Miller, 1990)
CIPP
• CIPP: Context, Input, Process, Product
Stufflebeam, 1983; Steinert, 2002
CIPP
Context
• What is the relation of the course to other courses?
• Is the time adequate?
• Should courses be integrated or separate?
CIPP
Inputs
• What are the entering ability, learning skills and motivation of students?
• What is the students’ existing knowledge?
• Are the objectives suitable?
• Does the content match student abilities?
• What is the theory/practice balance?
• What resources/equipment are available?
• How strong are the teaching skills of teachers?
• How many students/teachers are there?
CIPP
Process
• What is the workload of students?
• Are there any problems related to teaching/learning?
• Is there effective 2-way communication?
• Is knowledge only transferred to students, or do they use and apply it?
• Is the teaching and learning process continuously evaluated?
• Is teaching and learning affected by practical/institutional problems?
CIPP
Product
• Is there one final exam at the end or several during the course?
• Is there any informal assessment?
• What is the quality of assessment?
• What are the students’ knowledge levels after the course?
• How was the overall experience for the teachers and for the students?
CIPP
Methods used to evaluate the curriculum
• Discussions
• Informal conversation or observation
• Individual student interviews
• Evaluation forms
• Observation in class by colleagues
• Performance test
• Questionnaire
• Self-assessment
• Written test
CIPP
CIPP focuses on the process and informs the program/curriculum for future improvements.
Limited emphasis on the outcome.
Kirkpatrick’s Model of Evaluating Training Programs
[Pyramid: Reaction → Learning → Behavior → Results]
Kirkpatrick’s Model of Evaluating Training Programs
• Reaction: The degree to which the expectations of the participants about the setting and delivery of the learning activity were met
• Questionnaires completed by attendees after the activity
Kirkpatrick’s Model of Evaluating Training Programs
• Learning: The degree to which participants recall and demonstrate in an educational setting what the learning activity intended them to be able to do
• Tests and observation in educational setting
Kirkpatrick’s Model of Evaluating Training Programs
• Behavior: The degree to which participants do what the educational activity intended them to be able to do in their practices
• Observation of performance in patient care setting; patient charts; administrative databases
Kirkpatrick’s Model of Evaluating Training Programs
• Results: The degree to which the health status of patients as well as the community of patients changes due to changes in the practice
• Health status measures recorded in patient charts or administrative databases, epidemiological data and reports
Kirkpatrick’s Model of Evaluating Training Programs
• The importance of Kirkpatrick’s Model is in its ability to identify a range of dimensions that need to be evaluated in order to inform us about the educational quality of a specific program (a brief planning sketch follows below).
• It focuses on the outcomes.
• However, it provides limited information about the individual learner.
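To make the four levels concrete, here is a minimal planning sketch in Python (the dictionary layout and the name kirkpatrick_plan are ours; the instruments simply restate the data sources named on the slides above):

```python
# Illustrative evaluation plan keyed by Kirkpatrick level.
# The instruments/data sources paraphrase those listed on the slides above.
kirkpatrick_plan = {
    "Reaction": ["post-activity questionnaire"],
    "Learning": ["knowledge tests", "observation in the educational setting"],
    "Behavior": ["observation in the patient care setting", "patient charts", "administrative databases"],
    "Results": ["health status measures", "epidemiological data and reports"],
}

for level, instruments in kirkpatrick_plan.items():
    print(f"{level}: {', '.join(instruments)}")
```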
Moore’s Framework
[Diagram: Moore’s framework aligns Miller’s levels (Knows, Knows How, Shows How, Does) with Kirkpatrick’s levels (Reaction, Learning, Behavior, Results)]
Miller’s Framework for Clinical Assessments
[Pyramid: Knows (Declarative Knowledge) → Knows How (Procedural Knowledge) → Shows How (Competence) → Does (Performance)]
Miller’s Framework for Clinical Assessments
• Knows: Declarative knowledge. The degree to which participants state what the learning activity intended them to know
• Pre- and posttests of knowledge.
Miller’s Framework for Clinical Assessments
• Knows how: Procedural knowledge. The degree to which participants state how to do what the learning activity intended them to know how to do
• Pre- and posttests of knowledge
Miller’s Framework for Clinical Assessments
• Shows how: The degree to which participants show in an educational setting how to do what the learning activity intended them to be able to do
• Observation in educational setting
Miller’s Framework for Clinical Assessments
• Does: Performance. The degree to which participants do what the learning activity intended them to be able to do in their practices.
• Observation of performance in patient care setting; patient charts; administrative databases
Miller’s Framework for Clinical Assessments
• The importance of Miller’s Framework is in its ability to identify learning objectives and link them with appropriate testing contexts (where) and instruments (how).
Assumption
• Moore’s and Kirkpatrick’s models assume a relationship between the different levels.
• If learners are satisfied, they will learn more, will be able to demonstrate the new skills, transfer them to the clinical setting, and consequently the health of patients and communities will improve!
The models’ assumptions are not met!
Building Skills, Changing Practice: Simulator Training for Hand Hygiene Protocols
Canadian Institutes of Health Research – Partnerships for Health System Improvement
A. McGeer, M.A. Beduz, A. Dubrowski
Purpose
• Current models of knowledge delivery about proper hand hygiene rely on didactic sessions
• However, transfer to practice is low
Purpose
[Diagram: didactic courses, simulation, and clinical practice mapped against Miller’s levels (Knows, Knows How, Shows How, Does)]
Purpose
• The primary purpose was to investigate the effectiveness of simulation-based training of hand hygiene on transfer of knowledge to clinical practice
Project
[Flow: Recruitment → Clinical monitoring (audits) → Simulation → Clinical monitoring (audits), assessing Reaction, Δ in Learning, and Δ in Behavior]
Results: Reactions
• Simulation Design Scale
• Educational Practices Questionnaire
– 159 respondents; average response of ‘Agree’ or ‘Strongly Agree’ (4 or 5 on a 5-point scale) to each of 40 items
– Rated 39 of these items as ‘Important’ (4 on a 5-point scale)
(A brief summarization sketch follows below.)
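As a minimal sketch of how such questionnaire responses can be summarized (Python; the response matrix here is synthetic random data, not the study data, and the threshold mirrors the 4-or-5 criterion above):

```python
import random

# Synthetic 5-point Likert responses: 159 respondents x 40 items,
# standing in for the Simulation Design Scale / Educational Practices Questionnaire data.
random.seed(0)
n_respondents, n_items = 159, 40
responses = [[random.randint(3, 5) for _ in range(n_items)] for _ in range(n_respondents)]

# Mean rating per item, and how many items average 'Agree' (4) or better.
item_means = [sum(resp[i] for resp in responses) / n_respondents for i in range(n_items)]
items_agree_or_better = sum(mean >= 4 for mean in item_means)
print(f"{items_agree_or_better} of {n_items} items averaged 4 or higher on the 5-point scale")
```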
Results: Learning
[Bar chart: pre-test (n=558) vs. post-test (n=550) percentage scores for the four moments of hand hygiene (Bef-Pat/Env, Bef-Asp, Aft-Bfl, Aft-Pat/Env, i.e., before patient/environment contact, before aseptic procedure, after body fluid exposure, after patient/environment contact)]
Results: Behaviour

Measure | Pre-test | Post-test
Hand Hygiene Compliance (average of all opportunities; all 4 moments) | 62% (n=558 opportunities) | 64% (n=550 opportunities)
Average Adequacy (Excellent/Satisfactory/Unsatisfactory) | Satisfactory (n=252 observations) | Satisfactory (n=282 observations)
Average Duration | 13 seconds (n=254 observations) | 13 seconds (n=252 observations)
# Extra Washes | 140 | 90
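To check whether a compliance change like the one in the table above is more than noise, here is a minimal sketch of a two-proportion z-test (the choice of test is our assumption, not stated on the slides; the counts are reconstructed from the percentages and opportunity numbers in the table):

```python
from math import sqrt
from statistics import NormalDist

# Approximate compliant-opportunity counts reconstructed from the table above.
pre_compliant, pre_n = round(0.62 * 558), 558    # ~62% of 558 opportunities
post_compliant, post_n = round(0.64 * 550), 550  # ~64% of 550 opportunities

p_pre, p_post = pre_compliant / pre_n, post_compliant / post_n
p_pooled = (pre_compliant + post_compliant) / (pre_n + post_n)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / pre_n + 1 / post_n))

z = (p_post - p_pre) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With these sample sizes a two-point change is well within sampling error, which is consistent with the "no transfer" conclusion that follows.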
Conclusions
• No transfer of knowledge
• No clear relationship between the assessments of reactions, learning and behaviour.
Summary
• Outcome-based models of program evaluation may be of limited use (i.e., they sit mostly on the research branch).
• We need new, more complex models that incorporate both processes and outcomes (i.e., span more branches).
• Assessment instruments, which feed these models, are critical for successful evaluations.
• We need to invest effort in the standardization and rigorous development of these instruments.
[Diagram: evaluations and assessments]
Assessment Instruments
Standards and Rigor
Scientific Advisory Committee of the Medical Outcomes Trust
The SAC defined a set of eight key attributes of instruments that apply to measuring three properties:
1. distinguish between two or more groups,
2. assess change over time,
3. predict future status.
Scientific Advisory Committee of the Medical Outcomes Trust
The SAC defined a set of eight key attributes of instruments:
– The Conceptual and Measurement Model: the concept to be measured needs to be defined properly and should match its intended use.
– Reliability: the degree to which the instrument is free of random error, that is, free from errors in measurement caused by chance factors that influence measurement (see the sketch below).
– Validity: the degree to which the instrument measures what it purports to measure.
– Responsiveness or sensitivity to change: the instrument’s ability to detect change over time.
– Interpretability: the degree to which one can assign easily understood meaning to an instrument’s score.
– Burden: the time, effort and other demands placed on those to whom the instrument is administered (respondent burden) or on those who administer the instrument (administrative burden).
– Alternative Forms of Administration: self-report, interviewer-administered, computer-assisted, etc.; it is often important to know whether these modes of administration are comparable.
– Cultural and Language Adaptations or translations.
Do our assessment instruments fit these criteria?
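As one concrete illustration of the reliability attribute, here is a minimal sketch of chance-corrected inter-rater agreement using Cohen's kappa (a standard statistic, not one named on the slides; the two raters and their checklist ratings below are entirely made up):

```python
from collections import Counter

# Two hypothetical raters scoring the same eight checklist items.
rater_a = ["done", "done", "not done", "done", "not done", "done", "done", "done"]
rater_b = ["done", "not done", "not done", "done", "not done", "done", "done", "not done"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement expected from each rater's marginal frequencies.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum(counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Values near 1 indicate agreement well beyond chance; values near 0 indicate agreement no better than chance.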
Working Example: OSATS
• Objective Structured Assessment of Technical Skills
• Bell-ringer exam (12 minutes per station)
• Minimum of 6-8 stations
• Direct or video assessments of technical performance (a scoring sketch follows below)
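OSATS stations are typically scored with a task-specific checklist and a global rating scale (per the Martin et al. reference cited below). The sketch below, with hypothetical station names and numbers, shows one way such station-level scores might be recorded and aggregated:

```python
from dataclasses import dataclass

@dataclass
class StationScore:
    station: str
    checklist_done: int    # task-specific checklist items completed
    checklist_total: int   # checklist items on the station
    global_rating: float   # mean of the 5-point global rating items

def summarize(scores: list[StationScore]) -> dict:
    """Aggregate per-station scores into overall checklist % and mean global rating."""
    pct = [s.checklist_done / s.checklist_total for s in scores]
    return {
        "stations": len(scores),
        "mean_checklist_pct": round(100 * sum(pct) / len(pct), 1),
        "mean_global_rating": round(sum(s.global_rating for s in scores) / len(scores), 2),
    }

# Hypothetical candidate scored on two bench stations.
print(summarize([
    StationScore("excision of skin lesion", 9, 12, 3.4),
    StationScore("hand-sewn bowel anastomosis", 14, 18, 3.9),
]))
```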
Working Example: OSATS
The Conceptual and Measurement Model
Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative "bench station" examination. Am J Surg. 1997 Mar;173(3):226-30.
Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, Brown M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997 Feb;84(2):273-8.
Working Example: OSATS
Reliability and Validity
Faulkner H, Regehr G, Martin J, Reznick R. Validation of an objective structured assessment of technical skill for surgical residents. Acad Med. 1996 Dec;71(12):1363-5.
Kishore TA, Pedro RN, Monga M, Sweet RM. Assessment of validity of an OSATS for cystoscopic and ureteroscopic cognitive and psychomotor skills. J Endourol. 2008 Dec;22(12):2707-11.
Goff B, Mandel L, Lentz G, Vanblaricom A, Oelschlager AM, Lee D, Galakatos A, Davies M, Nielsen P. Assessment of resident surgical skills: is testing feasible? Am J Obstet Gynecol. 2005 Apr;192(4):1331-8; discussion 1338-40.
Martin JA, Reznick RK, Rothman A, Tamblyn RM, Regehr G. Who should rate candidates in an objective structured clinical examination? Acad Med. 1996 Feb;71(2):170-5.
Working Example: OSATS
Alternative Forms of Administration
Maker VK, Bonne S. Novel hybrid objective structured assessment of technical skills/objective structured clinical examinations in comprehensive perioperative breast care: a three-year analysis of outcomes. J Surg Educ. 2009 Nov-Dec;66(6):344-51.
Lin SY, Laeeq K, Ishii M, Kim J, Lane AP, Reh D, Bhatti NI. Development and pilot-testing of a feasible, reliable, and valid operative competency assessment tool for endoscopic sinus surgery. Am J Rhinol Allergy. 2009 May-Jun;23(3):354-9.
Goff BA, VanBlaricom A, Mandel L, Chinn M, Nielsen P. Comparison of objective, structured assessment of technical skills with a virtual reality hysteroscopy trainer and standard latex hysteroscopy model. J Reprod Med. 2007 May;52(5):407-12.
Datta V, Bann S, Mandalia M, Darzi A. The surgical efficiency score: a feasible, reliable, and valid method of skills assessment. Am J Surg. 2006 Sep;192(3):372-8.
Working Example: OSATS
Adaptations
Setna Z, Jha V, Boursicot KA, Roberts TE. Evaluating the utility of workplace-based assessment tools for specialty training. Best Pract Res Clin Obstet Gynaecol. 2010 Jun 30.
Dorman K, Satterthwaite L, Howard A, Woodrow S, Derbew M, Reznick R, Dubrowski A. Addressing the severe shortage of health care providers in Ethiopia: bench model teaching of technical skills. Med Educ. 2009 Jul;43(7):621-7.
Working Example: OSATS
Applications to program evaluation
Chipman JG, Schmitz CC. Using objective structured assessment of technical skills to evaluate a basic skills simulation curriculum for first-year surgical residents. J Am Coll Surg. 2009 Sep;209(3):364-370.
Siddighi S, Kleeman SD, Baggish MS, Rooney CM, Pauls RN, Karram MM. Effects of an educational workshop on performance of fourth-degree perineal laceration repair. Obstet Gynecol. 2007 Feb;109(2 Pt 1):289-94.
VanBlaricom AL, Goff BA, Chinn M, Icasiano MM, Nielsen P, Mandel L. A new curriculum for hysteroscopy training as demonstrated by an objective structured assessment of technical skills (OSATS). Am J Obstet Gynecol. 2005 Nov;193(5):1856-65.
Anastakis DJ, Wanzel KR, Brown MH, McIlroy JH, Hamstra SJ, Ali J, Hutchison CR, Murnaghan J, Reznick RK, Regehr G. Evaluating the effectiveness of a 2-year curriculum in a surgical skills center. Am J Surg. 2003 Apr;185(4):378-85.
[Diagram: evaluations and assessments]
Standards and Rigor
The Joint Committee on Standards for Educational Evaluation (1990)
Utility Standards
• The utility standards are intended to ensure that an evaluation will serve the information needs of intended users.
Feasibility Standards
• The feasibility standards are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal.
Propriety Standards
• The propriety standards are intended to ensure that an evaluation will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results.
Accuracy Standards
• The accuracy standards are intended to ensure that an evaluation will reveal and convey technically adequate information about the features that determine the worth or merit of the program being evaluated.
Conclusions
• Evaluations: outcome and process
• Evaluations and assessments
• Three philosophies/paradigms: Use, Methods, Value
• Where would you sit on the Evaluation Tree? What about others in your “system”?
Take-home
• Need more complex evaluation models that look at both processes and outcomes
• Choices will depend on the intended use (i.e. the branch on the Evaluation Tree)
• Both the evaluation models and the assessment instruments need standardization and rigor.
[Closing diagram: the “beast,” Fluffy: Environment, Intake, Interaction, Reaction, Function]