1
Examination Accreditation Program
Put to the Test: Exam Development Made Easy
April 26, 2008, Pasadena, California, USA
2
Put to the Test: Introducing the Presenters
Critical Issues: Lawrence J. Fabrey, Ph.D.
Content Development: Tadas Dabsys
Writing Exam Questions: James F. Fryer, Ed.D., CPCU
Performance Analysis: Nikki Eatchel, M.A.
3
Critical Issues in Examination Development
• Validity
• Reliability
• Fairness
• Legal Defensibility
• Standards
4
Critical Issues: Validity
Chapter 1 in the Standards
– Standards for Educational and Psychological Testing (AERA/APA/NCME, 1999)
– 24 separate standards
Also in Chapter 14 – Testing in Employment and Credentialing
– Especially 14.8 through 14.10: explicit definition, link between job and test content, rational basis
5
Critical Issues: Validity
We cannot claim an examination itself is “valid”
“Links in the chain of evidence used to support the validity of the examination results”
Links include:
– Job Analysis, Test Specifications, Item Writing, Standard Setting, and Examination Construction, Administration, and Scoring
6
Critical Issues: Validity
Traditional Sources of Validity Evidence
• Content
• Criterion-Related (Predictive or Concurrent)
• Construct
Now: validity is a unitary concept; the degree of accumulated evidence is key
7
Critical Issues: Validity
Current (1999 Standards)
Validity evidence based on:
• Test Content
• Response Processes
• Internal Structure
• Relation to Other Variables
• Consequences of Testing
8
Critical Issues: Validity
For Licensing, Validity Evidence Based on Test Content is Critical
• Job analysis provides the foundation
• Identifies job-related activities/tasks (and sometimes KSAs) that will be used to define test content
9
Critical Issues: Reliability
Chapter 2 in the Standards
– 20 distinct standards (some not relevant for RE licensing exams)
Reliability refers to the consistency of measurement (reproducibility)
10
Critical Issues: Reliability
All scores (all measurements) have a component of error
Scales, equivalence
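That error component is commonly summarized by the standard error of measurement (SEM). A minimal Python sketch; the score SD and reliability values below are illustrative, not from the presentation:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability): the error band around an observed score."""
    return sd * math.sqrt(1.0 - reliability)

# Illustrative: an exam with a score SD of 8 points and reliability of 0.91
sem = standard_error_of_measurement(8.0, 0.91)
print(round(sem, 2))  # 2.4
```

A higher reliability shrinks the SEM, which is why reliability matters for score interpretation near the passing point.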
11
Critical Issues: Reliability
In general, four factors increase reliability:
1. Measurement of homogeneous content
2. Heterogeneous candidate groups
3. Longer tests
4. High quality items
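The effect of test length (factor 3 above) can be projected with the Spearman-Brown prophecy formula. A short sketch; the example numbers are illustrative:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when test length is multiplied by `length_factor`."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)

# Doubling a test whose current reliability is 0.80:
print(round(spearman_brown(0.80, 2.0), 3))  # 0.889
```

The same formula shows the cost of shortening: halving that test drops its projected reliability to about 0.67.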
12
Critical Issues: Reliability
Reliability estimates depend on the measurement model:
– Classical Test Theory (CTT)
– Item Response Theory (IRT)
13
Critical Issues: Reliability
Classical Test Theory
– Unit of measurement: based on number correct
– Attempt to estimate “true score”
– Estimates of reliability focus on consistency of scores
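For dichotomously scored (right/wrong) items, a common CTT reliability estimate is KR-20. A self-contained sketch with made-up response data:

```python
def kr20(responses):
    """Kuder-Richardson 20: CTT reliability estimate for 0/1-scored items.
    `responses` is one row per candidate, one 0/1 entry per item."""
    k = len(responses[0])          # number of items
    n = len(responses)             # number of candidates
    p = [sum(row[i] for row in responses) / n for i in range(k)]  # item p-values
    item_var = sum(pi * (1 - pi) for pi in p)                     # sum of item variances
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    total_var = sum((t - mean) ** 2 for t in totals) / n          # total-score variance
    return (k / (k - 1)) * (1 - item_var / total_var)

# Four candidates, three items (illustrative data only):
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(responses), 2))  # 0.75
```

Real licensure exams would use far more items and candidates; the tiny matrix here is only to keep the arithmetic checkable by hand.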
14
Critical Issues: Reliability
Item Response Theory
– Based on probabilities of correct responses
– Attempt to estimate probability based on underlying trait (or ability) parameter
– Estimates of reliability focus on information obtained
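The 1-parameter (Rasch) model makes this concrete: the probability of a correct response depends only on the gap between the candidate's ability and the item's difficulty. A minimal sketch:

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """1PL (Rasch) model: probability of a correct response for a candidate
    with ability `theta` on an item with difficulty `b` (both on the logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A candidate whose ability equals the item's difficulty has a 50% chance:
print(rasch_prob(0.0, 0.0))  # 0.5
```

The 3-parameter model mentioned in the slides adds a discrimination and a guessing parameter to this same logistic form.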
15
Critical Issues: Fairness
Chapter 7 in the Standards
Four principal aspects of fairness:
1. Lack of bias
2. Equitable treatment in testing process
3. Equality in outcomes
4. Opportunity to learn
16
Critical Issues: Fairness
Lack of bias
• Technical definition: bias is present if scores carry different meanings for identifiable subgroups
• Differential item functioning (DIF)
• Judgmental and statistical processes
• Content-related sources
• Response-related sources
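One widely used statistical screen for DIF is the Mantel-Haenszel procedure, which compares group performance on an item within matched total-score strata. A sketch with illustrative counts (reference vs. focal group):

```python
def mantel_haenszel_alpha(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.
    Each stratum: (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    A value near 1.0 suggests the item functions the same for both groups;
    values far from 1.0 flag the item for review."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Two score strata in which the groups perform alike (illustrative counts):
strata = [(40, 10, 20, 5), (30, 20, 15, 10)]
print(mantel_haenszel_alpha(strata))  # 1.0
```

A statistical flag alone does not establish bias; as the slide notes, judgmental review of flagged items is part of the process.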
17
Critical Issues: Fairness
Equitable treatment in testing process
• Equal opportunity to demonstrate knowledge
• Standardized testing conditions
• Security (before, during, after)
• Equal opportunity to prepare
• Candidate materials
18
Critical Issues: Fairness
Equality in outcomes
• Are equal passing rates for identifiable subgroups required?
• No; what is required is that all outcomes have the same meaning
• Equality of the passing point
19
Critical Issues: Fairness
Opportunity to learn
• Primarily applicable to educational achievement testing
20
Critical Issues: Fairness
Highlights of Chapters 8, 9, and 10
8: Rights and Responsibilities of Test Takers – focus on candidates’ rights (e.g., to information)
9: Testing Individuals of Diverse Linguistic Backgrounds – issues of translation, adaptation, modification and potential impact on validity
10: Testing Individuals with Disabilities – ADA requires “reasonable accommodations”
21
Critical Issues: Legal Defensibility
• All Standards could be considered
• Standards are not legal requirements
• New Standards under development
22
Critical Issues: Legal Defensibility
Other considerations
• ARELLO Guidelines for Accreditation
• EEOC Guidelines (1978)
23
Put to the Test:
Content Development: Tadas Dabsys
24
Exam Content Development
Job Analysis Survey
Sampling Methodology
Analysis of the Survey Results
Content Specifications
25
Job Analysis Survey
Job Analysis Procedure
– Content Development– Format Development
26
Sampling Methodology
Sampling: Representative Sample
Target Groups
Sources for Addresses/Contact Information
27
Analysis of the Survey Results
Demographic Information
Professional Activities
Knowledge
– Rating and Respective Response Scales
– Summary Statistics
– Identification of Qualifying Professional Activities/Knowledge
28
Content Specifications
Development of Content Specifications
Establishing Content Base for Inclusion in Exams
Linking Knowledge Areas to Professional Activities
Development of Operational Definitions
Test Outline and Content Weighting
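Content weighting ultimately has to be turned into whole item counts per content area. One common way to do that is largest-remainder rounding; the outline areas and weights below are illustrative only, not taken from the presentation:

```python
def items_per_area(weights, test_length):
    """Allocate `test_length` items across content areas from percentage weights,
    using largest-remainder rounding so the counts sum exactly to test_length."""
    raw = {area: w * test_length / 100 for area, w in weights.items()}
    counts = {area: int(v) for area, v in raw.items()}   # floor of each share
    leftover = test_length - sum(counts.values())
    # Hand the leftover items to the areas with the largest fractional parts
    for area in sorted(raw, key=lambda a: raw[a] - counts[a], reverse=True)[:leftover]:
        counts[area] += 1
    return counts

# Hypothetical outline: weights in percent, 90-item exam
outline = {"Agency": 25, "Contracts": 30, "Property Ownership": 20, "Finance": 25}
print(items_per_area(outline, 90))
```

Whatever rounding rule is used, the point is that the published outline weights, not convenience, drive the form's composition.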
29
Put to the Test:
Writing Exam Questions: James F. Fryer, Ed.D., CPCU
30
Developing the Exam Question (Item Writing)
How? National Subject Matter Experts
– Qualifications
– Experience
– Background in Job Analysis
31
Developing the Exam Question (Item Writing)
Linking Back to the Job Analysis:
Activities and Tasks
Subject/Content Area of the Task
The Knowledge Statement (KSAs)
32
Developing the Exam Question (Item Writing)
Linking to the Knowledge Statement:
Content Topic Area: Leasehold Estates
Content Sub-Topic Area: Basic Elements of a Lease
Job Task: Negotiate a Lease
KSA: Knowledge of the financial ramifications of the common lease provisions referred to as Net Lease, Gross Lease, Triple-Net Lease, Percentage Lease, Base Rent, Effective Rent
33
Developing the Exam Question (Item Writing)
Linking to the Knowledge Statement:
Knowledge Statement – The entry level practitioner must be able to compare and contrast the interrelationships of the commonly negotiated lease provisions referred to as Net Lease, Gross Lease, Triple-Net Lease, Percentage Lease, Base Rent, Effective Rent and equate the comparison in terms of financial impact on cost to the landlord or tenant.
34
Developing the Exam Question
Balancing Cognitive Levels:
Recall: Able to state the definition of a “latent defect.”
Application: Applies knowledge of the definition of “latent defect” by recognizing and classifying certain described property conditions as meeting the definition of a latent defect.
Analysis: Applies knowledge of the definition by recognizing certain described property conditions as meeting the definition and interpreting the information to determine what action should be taken.
35
Developing the Exam Question (Item Writing)
Basic Item Writing Principles:
One Correct Answer
Relevant and Appropriate
Realistic
Important to Related Task
Straightforward
Clear and Concise
No Clues that Give Away the Answer
Entry-Level Reading Level
36
Developing the Exam Question (Item Writing)
Item Writing Principles for Multiple-Choice:
Define the Task
Express a Complete Thought
Reduce the Reading Load
Use of Negative Words
Response Options need to:
– Fit the Stem Logic
– Be Equal in Length
– Have a Unique Meaning
– Be Plausible
– Avoid “All of the Above” and “None of the Above”
37
Put to the Test:
Performance Analysis: Nikki Shepherd Eatchel, M.A.
38
Exam Question (Item) Performance Analysis
There are various types of analyses used to evaluate exam and item performance:
Item Response Theory (1-parameter, 3-parameter)
Classical Test Theory
Classical Test Theory is the most common analysis used to evaluate state-based licensure exams.
39
Exam Question (Item) Performance Analysis
Classical Test Theory – Item Statistics
p-value: Percentage of candidates who answer an item correctly
rpbis: Correlation between performance on an item and performance on the overall exam
40
Exam Question (Item) Performance Analysis
Classical Test Theory – Item Statistics
option analysis: Percentage of candidates who choose each option
option rpbis: Correlation between choosing an option and performance on the overall exam
41
Exam Question (Item) Performance Analysis
Classical Test Theory – Issues to Consider
p-value: Too high, too low
rpbis: Too low
Options: Distractors with no responses; distractors with more responses than the key
Option rpbis: Distractors higher than the key
Omits: Above expectations
42
Exam Question (Item) Performance Analysis
Classical Test Theory – Sample Statistics
43
Exam Question (Item) Performance Analysis
Classical Test Theory – Second sample
44
Exam Question (Item) Performance Analysis
Forensic Data Analysis – Item Level
With increasing security issues in the testing industry (e.g., brain dump sites, black market content), forensic data analysis is critical for evaluating a testing program.
45
Exam Question (Item) Performance Analysis
Forensic Data Analysis – Item Level
What does Forensic Data tell you?
46
Exam Question (Item) Performance Analysis
Forensic Data Analysis Suspicious Candidate Activity
Candidates who did better on harder questions than they did on easier questions
Candidates who achieved a high score (in the top 20%) on the exam while only viewing questions for a very short period of time (over half of the items 10 seconds or less each)
47
Exam Question (Item) Performance Analysis
Forensic Data Analysis Suspicious Candidate Activity
Test takers who viewed some items for extended periods of time while cycling quickly (less than 8 seconds each) through the remaining items
Test takers with scores around chance-level while spending short amounts of time on each item (pre-viewing).
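Heuristics like these are easy to operationalize once item-level timing data is available. A sketch of the "high score, very short item times" flag described above; the cutoff values and candidate records are illustrative, not the presenters':

```python
def flag_suspicious(candidates, top_score_cutoff, short_secs=10):
    """Flag candidates with a top-quintile score who viewed more than half
    of the items for `short_secs` seconds or less.
    Each candidate: (candidate_id, score, list_of_item_times_in_seconds)."""
    flagged = []
    for cid, score, times in candidates:
        quick = sum(1 for t in times if t <= short_secs)
        if score >= top_score_cutoff and quick > len(times) / 2:
            flagged.append(cid)
    return flagged

# Hypothetical exam records: id, score, seconds spent viewing each item
data = [
    ("C001", 92, [4, 6, 5, 9, 7, 3]),       # high score, almost no reading time
    ("C002", 88, [35, 42, 28, 31, 40, 38]),  # high score, normal pacing
    ("C003", 55, [30, 25, 45, 20, 33, 27]),  # average score, normal pacing
]
print(flag_suspicious(data, top_score_cutoff=85))  # ['C001']
```

A flag is only a starting point for investigation; legitimate fast responders exist, so flagged records would be reviewed alongside other evidence before any action.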
48
Exam Question (Item) Performance Analysis
Forensic Data Analysis – Item Level
While there are a number of item- and exam-level statistics available for forensic data analysis, many of them revolve around classical test theory and item-level performance. These types of analyses can be valuable both in test development forecasting and in evaluating security issues.
49
The ARELLO Accreditation Program
Overview of the function and purpose
50
The ARELLO Accreditation Program
Independent Evaluation
Confidence and Protection
Defensible Standards
Enhanced Quality
Facilitates License Recognition
51
QUESTIONS? COMMENTS?
Fabulous Prize Giveaway