MODELS FOR ASSESSING LEARNING AND EVALUATING LEARNING OUTCOMES
Adam Dubrowski PhD
Acknowledgement: Dr. Kathryn Parker, Learning Institute
Introduction
Is Eli OK?
[Diagram: Eli, considered in terms of Environment, Intake, Interaction, Reaction, Function]
Content
• Assessment
• Evaluation: Models and Frameworks
• Working Example: Evaluation of a Hand Hygiene Program
• Assessment Instruments
• Standards and Rigor: Assessment Instruments
• Working Example: OSATS
• Standards and Rigor: Evaluations
• Summary and Take-home Message
Introduction
Assessments: summative and formative
Evaluations: outcome and process
[Diagram: evaluations and assessments]
Alkin, M.C. & Christie, C.A. (2004). The evaluation theory tree. In Alkin, M.C. (ed.), Evaluation Roots: Tracing Theorists’ Views and Influences. London: Sage Publications.
Characteristic | USE | METHODS (Research) | VALUE
Purpose of evaluation | To generate information that will be useful and inform decision making. | To generalize findings from one program to another similar program. | To render judgment on the merit or worth of the program.
Role of evaluator | Meet the needs of the stakeholders; negotiator | Researcher | Judge
Role of program staff/stakeholders | Close collaborator; drives the evaluation | Inform or approve the research question(s) | Provider (source) of information
Questions | How will this evaluation data be used? By whom? For what purpose? | Can these findings be generalized to other programs under similar conditions? How do these findings contribute to what we know about how people learn and change? | Does the program meet the predetermined criteria? Was the program developed and implemented in the best interest of the public?
Voices | Daniel Stufflebeam (1983); Michael Patton (2007) | Ralph Tyler (1942); Peter Rossi (2006) | Robert Stake (1975); Guba & Lincoln (1989)
Reflections
Where would you sit on the Evaluation Tree? What about others in your “system”?
Introduction
Characteristic | Adam: Research | Kath: USE (leaning towards Research)
Purpose of evaluation | To generalize findings from one program to another similar program. | To generate information that will be useful for decision making; to gain a better understanding of how programs work.
Role of evaluator | Researcher | Collaborator
Role of program staff/stakeholders | Inform or approve the research question(s) | Close collaborator; drives the evaluation
Questions | Can these findings be generalized to other programs under similar conditions? How do these findings contribute to what we know about how people learn and change? | How will this evaluation data be used? By whom? For what purpose? How do these findings contribute to what we know about how people learn and change?
Voices | Ralph Tyler (1942); Peter Rossi (2006) | Patton (2007); Chen (2002); Donaldson (2001)
Models and Frameworks
• CIPP Evaluation Model (Stufflebeam, 1983)
• Kirkpatrick’s Model of Evaluating Training Programs (Kirkpatrick, 1998)
• Moore’s Evaluation Framework (Moore et al., 2009)
• Miller’s Framework for Clinical Assessments (Miller, 1990)
CIPP
• CIPP: Context, Input, Process, Product
Stufflebeam, 1983; Steinert, 2002
CIPP
Context
• What is the relation of the course to other courses?
• Is the time adequate?
• Should courses be integrated or separate?
CIPP
Inputs
• What are the entering ability, learning skills and motivation of students?
• What is the students’ existing knowledge?
• Are the objectives suitable?
• Does the content match student abilities?
• What is the theory/practice balance?
• What resources/equipment are available?
• How strong are the teaching skills of teachers?
• How many students/teachers are there?
CIPP
Process
• What is the workload of students?
• Are there any problems related to teaching/learning?
• Is there effective 2-way communication?
• Is knowledge only transferred to students, or do they use and apply it?
• Is the teaching and learning process continuously evaluated?
• Is teaching and learning affected by practical/institutional problems?
CIPP
Product
• Is there one final exam at the end or several during the course?
• Is there any informal assessment?
• What is the quality of assessment?
• What are the students’ knowledge levels after the course?
• How was the overall experience for the teachers and for the students?
CIPP
Methods used to evaluate the curriculum
• Discussions
• Informal conversation or observation
• Individual student interviews
• Evaluation forms
• Observation in class by colleagues
• Performance test
• Questionnaire
• Self-assessment
• Written test
CIPP
CIPP focuses on the process and informs the program/curriculum for future improvements.
Limited emphasis on the outcome.
Kirkpatrick’s Model of Evaluating Training Programs
[Pyramid: Reaction → Learning → Behavior → Results]
Kirkpatrick’s Model of Evaluating Training Programs
• Reaction: The degree to which the expectations of the participants about the setting and delivery of the learning activity were met
• Questionnaires completed by attendees after the activity
Kirkpatrick’s Model of Evaluating Training Programs
• Learning: The degree to which participants recall and demonstrate in an educational setting what the learning activity intended them to be able to do
• Tests and observation in educational setting
Kirkpatrick’s Model of Evaluating Training Programs
• Behavior: The degree to which participants do what the educational activity intended them to be able to do in their practices
• Observation of performance in patient care setting; patient charts; administrative databases
Kirkpatrick’s Model of Evaluating Training Programs
• Results: The degree to which the health status of patients as well as the community of patients changes due to changes in the practice
• Health status measures recorded in patient charts or administrative databases, epidemiological data and reports
Kirkpatrick’s Model of Evaluating Training Programs
• The importance of Kirkpatrick’s Model is in its ability to identify a range of dimensions that need to be evaluated in order to inform us about the educational quality of a specific program (a brief planning sketch follows below).
• It focuses on the outcomes.
• However, it provides limited information about the individual learner.
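To make the four levels concrete, here is a minimal planning sketch in Python (the dictionary layout and the name kirkpatrick_plan are ours; the instruments simply restate the data sources named on the slides above):

```python
# Illustrative evaluation plan keyed by Kirkpatrick level.
# The instruments/data sources paraphrase those listed on the slides above.
kirkpatrick_plan = {
    "Reaction": ["post-activity questionnaire"],
    "Learning": ["knowledge tests", "observation in the educational setting"],
    "Behavior": ["observation in the patient care setting", "patient charts", "administrative databases"],
    "Results": ["health status measures", "epidemiological data and reports"],
}

for level, instruments in kirkpatrick_plan.items():
    print(f"{level}: {', '.join(instruments)}")
```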
Moore’s Framework
[Diagram: Moore’s framework aligns Miller’s levels (Knows, Knows How, Shows How, Does) with Kirkpatrick’s levels (Reaction, Learning, Behavior, Results)]
Miller’s Framework for Clinical Assessments
[Pyramid: Knows (Declarative Knowledge) → Knows How (Procedural Knowledge) → Shows How (Competence) → Does (Performance)]
Miller’s Framework for Clinical Assessments
• Knows: Declarative knowledge. The degree to which participants state what the learning activity intended them to know
• Pre- and posttests of knowledge.
Miller’s Framework for Clinical Assessments
• Knows how: Procedural knowledge. The degree to which participants state how to do what the learning activity intended them to know how to do
• Pre- and posttests of knowledge
Miller’s Framework for Clinical Assessments
• Shows how: The degree to which participants show in an educational setting how to do what the learning activity intended them to be able to do
• Observation in educational setting
Miller’s Framework for Clinical Assessments
• Does: Performance. The degree to which participants do what the learning activity intended them to be able to do in their practices.
• Observation of performance in patient care setting; patient charts; administrative databases
Miller’s Framework for Clinical Assessments
• The importance of Miller’s Framework is in its ability to identify learning objectives and link them with appropriate testing contexts (where) and instruments (how).
Assumption
• Moore’s and Kirkpatrick’s models assume a relationship between the different levels.
• If learners are satisfied, they will learn more, will be able to demonstrate the new skills, transfer them to the clinical setting, and consequently the health of patients and communities will improve!
The models’ assumptions are not met!
Building Skills, Changing Practice: Simulator Training for Hand Hygiene Protocols
Canadian Institutes of Health Research – Partnerships for Health System Improvement
A. McGeer, M.A. Beduz, A. Dubrowski
Purpose
• Current models of knowledge delivery about proper hand hygiene rely on didactic sessions
• However, transfer to practice is low
Purpose
[Diagram: didactic courses, simulation, and clinical practice mapped against Miller’s levels (Knows, Knows How, Shows How, Does)]
Purpose
• The primary purpose was to investigate the effectiveness of simulation-based training of hand hygiene on transfer of knowledge to clinical practice
Project
[Flow: Recruitment → Clinical monitoring (audits) → Simulation → Clinical monitoring (audits), assessing Reaction, Δ in Learning, and Δ in Behavior]
Results: Reactions
• Simulation Design Scale
• Educational Practices Questionnaire
– 159 respondents; average response of ‘Agree’ or ‘Strongly Agree’ (4 or 5 on a 5-point scale) to each of 40 items
– Rated 39 of these items as ‘Important’ (4 on a 5-point scale)
(A brief summarization sketch follows below.)
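As a minimal sketch of how such questionnaire responses can be summarized (Python; the response matrix here is synthetic random data, not the study data, and the threshold mirrors the 4-or-5 criterion above):

```python
import random

# Synthetic 5-point Likert responses: 159 respondents x 40 items,
# standing in for the Simulation Design Scale / Educational Practices Questionnaire data.
random.seed(0)
n_respondents, n_items = 159, 40
responses = [[random.randint(3, 5) for _ in range(n_items)] for _ in range(n_respondents)]

# Mean rating per item, and how many items average 'Agree' (4) or better.
item_means = [sum(resp[i] for resp in responses) / n_respondents for i in range(n_items)]
items_agree_or_better = sum(mean >= 4 for mean in item_means)
print(f"{items_agree_or_better} of {n_items} items averaged 4 or higher on the 5-point scale")
```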
Results: Learning
[Bar chart: pre-test (n=558) vs. post-test (n=550) percentage scores for the four moments of hand hygiene (Bef-Pat/Env, Bef-Asp, Aft-Bfl, Aft-Pat/Env, i.e., before patient/environment contact, before aseptic procedure, after body fluid exposure, after patient/environment contact)]
Results: Behaviour

Measure | Pre-test | Post-test
Hand Hygiene Compliance (average of all opportunities; all 4 moments) | 62% (n=558 opportunities) | 64% (n=550 opportunities)
Average Adequacy (Excellent/Satisfactory/Unsatisfactory) | Satisfactory (n=252 observations) | Satisfactory (n=282 observations)
Average Duration | 13 seconds (n=254 observations) | 13 seconds (n=252 observations)
# Extra Washes | 140 | 90
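To check whether a compliance change like the one in the table above is more than noise, here is a minimal sketch of a two-proportion z-test (the choice of test is our assumption, not stated on the slides; the counts are reconstructed from the percentages and opportunity numbers in the table):

```python
from math import sqrt
from statistics import NormalDist

# Approximate compliant-opportunity counts reconstructed from the table above.
pre_compliant, pre_n = round(0.62 * 558), 558    # ~62% of 558 opportunities
post_compliant, post_n = round(0.64 * 550), 550  # ~64% of 550 opportunities

p_pre, p_post = pre_compliant / pre_n, post_compliant / post_n
p_pooled = (pre_compliant + post_compliant) / (pre_n + post_n)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / pre_n + 1 / post_n))

z = (p_post - p_pre) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With these sample sizes a two-point change is well within sampling error, which is consistent with the "no transfer" conclusion that follows.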
Conclusions
• No transfer of knowledge
• No clear relationship between the assessments of reactions, learning and behaviour.
Summary
• Outcome-based models of program evaluation may be of limited use (i.e., they sit mostly on the research branch).
• We need new, more complex models that incorporate both processes and outcomes (i.e., span more branches).
• Assessment instruments, which feed these models, are critical for successful evaluations.
• We need to invest effort in the standardization and rigorous development of these instruments.
[Diagram: evaluations and assessments]
Assessment Instruments
Standards and Rigor
Scientific Advisory Committee of the Medical Outcomes Trust
The SAC defined a set of eight key attributes of instruments that apply to measuring three properties:
1. distinguish between two or more groups,
2. assess change over time,
3. predict future status.
Scientific Advisory Committee of the Medical Outcomes Trust
The SAC defined a set of eight key attributes of instruments:
– The Conceptual and Measurement Model: the concept to be measured needs to be defined properly and should match its intended use.
– Reliability: the degree to which the instrument is free of random error, that is, free from errors in measurement caused by chance factors that influence measurement (see the sketch below).
– Validity: the degree to which the instrument measures what it purports to measure.
– Responsiveness or sensitivity to change: the instrument’s ability to detect change over time.
– Interpretability: the degree to which one can assign easily understood meaning to an instrument’s score.
– Burden: the time, effort and other demands placed on those to whom the instrument is administered (respondent burden) or on those who administer the instrument (administrative burden).
– Alternative Forms of Administration: self-report, interviewer-administered, computer-assisted, etc.; it is often important to know whether these modes of administration are comparable.
– Cultural and Language Adaptations or translations.
Do our assessment instruments fit these criteria?
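As one concrete illustration of the reliability attribute, here is a minimal sketch of chance-corrected inter-rater agreement using Cohen's kappa (a standard statistic, not one named on the slides; the two raters and their checklist ratings below are entirely made up):

```python
from collections import Counter

# Two hypothetical raters scoring the same eight checklist items.
rater_a = ["done", "done", "not done", "done", "not done", "done", "done", "done"]
rater_b = ["done", "not done", "not done", "done", "not done", "done", "done", "not done"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement expected from each rater's marginal frequencies.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum(counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Values near 1 indicate agreement well beyond chance; values near 0 indicate agreement no better than chance.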
Working Example: OSATS
• Objective Structured Assessment of Technical Skills
• Bell-ringer exam (12 minutes per station)
• Minimum of 6-8 stations
• Direct or video assessments of technical performance (a scoring sketch follows below)
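OSATS stations are typically scored with a task-specific checklist and a global rating scale (per the Martin et al. reference cited below). The sketch below, with hypothetical station names and numbers, shows one way such station-level scores might be recorded and aggregated:

```python
from dataclasses import dataclass

@dataclass
class StationScore:
    station: str
    checklist_done: int    # task-specific checklist items completed
    checklist_total: int   # checklist items on the station
    global_rating: float   # mean of the 5-point global rating items

def summarize(scores: list[StationScore]) -> dict:
    """Aggregate per-station scores into overall checklist % and mean global rating."""
    pct = [s.checklist_done / s.checklist_total for s in scores]
    return {
        "stations": len(scores),
        "mean_checklist_pct": round(100 * sum(pct) / len(pct), 1),
        "mean_global_rating": round(sum(s.global_rating for s in scores) / len(scores), 2),
    }

# Hypothetical candidate scored on two bench stations.
print(summarize([
    StationScore("excision of skin lesion", 9, 12, 3.4),
    StationScore("hand-sewn bowel anastomosis", 14, 18, 3.9),
]))
```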
Working Example: OSATS
The Conceptual and Measurement Model
Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative "bench station" examination. Am J Surg. 1997 Mar;173(3):226-30.
Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, Brown M. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997 Feb;84(2):273-8.
Working Example: OSATS
Reliability and Validity
Faulkner H, Regehr G, Martin J, Reznick R. Validation of an objective structured assessment of technical skill for surgical residents. Acad Med. 1996 Dec;71(12):1363-5.
Kishore TA, Pedro RN, Monga M, Sweet RM. Assessment of validity of an OSATS for cystoscopic and ureteroscopic cognitive and psychomotor skills. J Endourol. 2008 Dec;22(12):2707-11.
Goff B, Mandel L, Lentz G, Vanblaricom A, Oelschlager AM, Lee D, Galakatos A, Davies M, Nielsen P. Assessment of resident surgical skills: is testing feasible? Am J Obstet Gynecol. 2005 Apr;192(4):1331-8; discussion 1338-40.
Martin JA, Reznick RK, Rothman A, Tamblyn RM, Regehr G. Who should rate candidates in an objective structured clinical examination? Acad Med. 1996 Feb;71(2):170-5.
Working Example: OSATS
Alternative Forms of Administration
Maker VK, Bonne S. Novel hybrid objective structured assessment of technical skills/objective structured clinical examinations in comprehensive perioperative breast care: a three-year analysis of outcomes. J Surg Educ. 2009 Nov-Dec;66(6):344-51.
Lin SY, Laeeq K, Ishii M, Kim J, Lane AP, Reh D, Bhatti NI. Development and pilot-testing of a feasible, reliable, and valid operative competency assessment tool for endoscopic sinus surgery. Am J Rhinol Allergy. 2009 May-Jun;23(3):354-9.
Goff BA, VanBlaricom A, Mandel L, Chinn M, Nielsen P. Comparison of objective, structured assessment of technical skills with a virtual reality hysteroscopy trainer and standard latex hysteroscopy model. J Reprod Med. 2007 May;52(5):407-12.
Datta V, Bann S, Mandalia M, Darzi A. The surgical efficiency score: a feasible, reliable, and valid method of skills assessment. Am J Surg. 2006 Sep;192(3):372-8.
Working Example: OSATS
Adaptations
Setna Z, Jha V, Boursicot KA, Roberts TE. Evaluating the utility of workplace-based assessment tools for specialty training. Best Pract Res Clin Obstet Gynaecol. 2010 Jun 30.
Dorman K, Satterthwaite L, Howard A, Woodrow S, Derbew M, Reznick R, Dubrowski A. Addressing the severe shortage of health care providers in Ethiopia: bench model teaching of technical skills. Med Educ. 2009 Jul;43(7):621-7.
Working Example: OSATS
Applications to program evaluation
Chipman JG, Schmitz CC. Using objective structured assessment of technical skills to evaluate a basic skills simulation curriculum for first-year surgical residents. J Am Coll Surg. 2009 Sep;209(3):364-370.
Siddighi S, Kleeman SD, Baggish MS, Rooney CM, Pauls RN, Karram MM. Effects of an educational workshop on performance of fourth-degree perineal laceration repair. Obstet Gynecol. 2007 Feb;109(2 Pt 1):289-94.
VanBlaricom AL, Goff BA, Chinn M, Icasiano MM, Nielsen P, Mandel L. A new curriculum for hysteroscopy training as demonstrated by an objective structured assessment of technical skills (OSATS). Am J Obstet Gynecol. 2005 Nov;193(5):1856-65.
Anastakis DJ, Wanzel KR, Brown MH, McIlroy JH, Hamstra SJ, Ali J, Hutchison CR, Murnaghan J, Reznick RK, Regehr G. Evaluating the effectiveness of a 2-year curriculum in a surgical skills center. Am J Surg. 2003 Apr;185(4):378-85.
[Diagram: evaluations and assessments]
Standards and Rigor
The Joint Committee on Standards for Educational Evaluation (1990)
Utility Standards
• The utility standards are intended to ensure that an evaluation will serve the information needs of intended users.
Feasibility Standards
• The feasibility standards are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal.
Propriety Standards
• The propriety standards are intended to ensure that an evaluation will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results.
Accuracy Standards
• The accuracy standards are intended to ensure that an evaluation will reveal and convey technically adequate information about the features that determine the worth or merit of the program being evaluated.
Conclusions
• Evaluations: outcome and process
• Evaluations and assessments
• Three philosophies/paradigms: Use, Methods, Value
• Where would you sit on the Evaluation Tree? What about others in your “system”?
Take-home
• Need more complex evaluation models that look at both processes and outcomes
• Choices will depend on the intended use (i.e. the branch on the Evaluation Tree)
• Both the evaluation models and the assessment instruments need standardization and rigor.
[Closing diagram: the “beast,” Fluffy: Environment, Intake, Interaction, Reaction, Function]