Best practices in using standardized tests for talent identification
Introduction
Background and disclaimers
Focus on ability testing
Especially group-administered
Screening for talent
Cognitive Abilities Test (CogAT)
Guidelines for Adapting tests
Test use
Computer-based and internet-delivered testing
Quality control in scoring, test analysis, and reporting
Common Fallacies
Ability is innate and, if well measured, constant
There are culture-fair or culture-free tests
Poor children perform better on measures of school achievement than on ability tests
g is sufficient when measuring ability
Ability is sufficient for the attainment of excellence
National norms are needed for school-based programs
Highly correlated tests will identify the same children
Lohman, D. F. (2006). Beliefs about differences between ability and accomplishment: From folk theories to cognitive science. Roeper Review, 29, 32-40.
How effective are good ability tests in identifying the best students?
1,200 children tested with:
Raven Progressive Matrices
Naglieri Nonverbal Ability Test (NNAT)
Cognitive Abilities Test (CogAT, Form 6)
Nationally normed achievement test
Ability Test          Corr. with Reading Ach.   Percent of students above 90th PR identified
CogAT Verbal          .78                       54%
CogAT Quantitative    .65                       43%
CogAT Nonverbal       .58                       38%
Raven                 .48                       30%
NNAT                  .40                       27%
Identifying High Achievers
Defining terms
Screening (versus placement)
Identification (versus selection)
Gifted (versus talented)
Talent development services (versus gifted program)
Potential (versus accomplishment)
Opportunity to learn
Normative Perspective
Basics of Screening
Know what you are screening for: educational programs & outcomes
Cast a broad net: use multiple sources of information
Emphasize “or” rather than “and” to combine scores
Measure critical aptitudes as reliably as possible:
Reasoning abilities: more than one symbol system & test format
Domain knowledge & skill: more than tool skills
Interest, perseverance, creativity: trained raters, behaviorally anchored checklists
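To make the “or” versus “and” point concrete, here is a minimal Python sketch. The students, the three measures, and the 90th-PR cut are all hypothetical; the sketch only shows how the combination rule changes who is referred on for further assessment.

```python
# Hypothetical percentile ranks (PRs) on three screening measures.
students = {
    "Ana":  {"verbal": 96, "quant": 72, "nonverbal": 81},
    "Ben":  {"verbal": 70, "quant": 94, "nonverbal": 92},
    "Chen": {"verbal": 91, "quant": 93, "nonverbal": 95},
    "Dana": {"verbal": 60, "quant": 65, "nonverbal": 97},
    "Eli":  {"verbal": 55, "quant": 58, "nonverbal": 62},
}

CUT = 90  # hypothetical screening cut (percentile rank)


def passes_or(scores: dict[str, int], cut: int = CUT) -> bool:
    """Refer the student if ANY measure reaches the cut."""
    return any(pr >= cut for pr in scores.values())


def passes_and(scores: dict[str, int], cut: int = CUT) -> bool:
    """Refer the student only if EVERY measure reaches the cut."""
    return all(pr >= cut for pr in scores.values())


or_group = sorted(name for name, s in students.items() if passes_or(s))
and_group = sorted(name for name, s in students.items() if passes_and(s))

print("OR rule: ", or_group)   # ['Ana', 'Ben', 'Chen', 'Dana']
print("AND rule:", and_group)  # ['Chen']
```

Under the “and” rule, students who are strong in only one domain (Ana, Dana) drop out, which works against casting a broad net at the screening stage.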
To Screen or Not to Screen
All children → Multiple Ability Test Battery (ability profiles) to guide both identification & placement
All children → Short Screening Test (single score) → “Identified” children → Multiple Ability Test Battery (ability profiles) to help guide placement
Threats to construct validity
Construct Over-representation
Example: WISC-IV
Construct Under-representation
Example: Only an IQ score
WISC-IV Spanish
Target population: Bilingual Spanish-speaking children in the U.S. with no more than 5 years of U.S. schooling
Translated & adapted the verbal tests
Calibrated the Spanish Verbal Scale with the normative (English) Verbal Scale
Two additional PRs that index opportunity to learn by % education in U.S.
+ Parental educational level
Language versus cultural loading: WISC-IV Spanish
Language load dramatically reduced
What about cultural loading?
Reasoning Indexes
Verbal Comprehension (Vocab, Similarities, Comprehension)
Perceptual Reasoning (Block Design, Picture Concepts, Matrix Reasoning)
Which index most affected by % education received in U.S. schools?
Figure: WISC-IV Spanish Scores by Percent Education in the U.S. (Full Scale IQ, Verbal Comprehension, and Perceptual Reasoning; scale score M = 100, SD = 15; groups: Minimal (<30%), Most, All (100%))
CogAT Profiles
Batteries: Verbal, Quantitative, Nonverbal
The pattern and level of scores are captured in a profile code
“A” All three scores are at approximately the sAme level
“B” One score is aBove or Below the other scores
“E” An Extreme difference (> 24 IQ-like points)
Figure: Profile Frequency by Median Stanine (percent of students with Flat vs. Weakness profiles at median stanines 1 to 9)
Lohman, D. F., Gambrell, J., & Lakin, J. (2008). The commonality of extreme discrepancies in the ability profiles of academically gifted students. Psychology Science Quarterly, 50, 269-282.
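The A/B/E letters on the CogAT Profiles slide can be sketched as a small rule on three battery scores. Only the 24-point “extreme” threshold comes from the slide; the 10-point cut for “B” and the use of the middle (median) score as the reference point are hypothetical simplifications, not CogAT's published scoring rules, and a full profile also reports score level (for example, the median stanine).

```python
from statistics import median

EXTREME_GAP = 24  # from the slide: "E" = difference of more than 24 IQ-like points
B_GAP = 10        # HYPOTHETICAL cut for "B"; not CogAT's published rule


def profile_letter(verbal: float, quantitative: float, nonverbal: float) -> str:
    """Return "A", "B", or "E" for three battery scores on an IQ-like scale."""
    scores = (verbal, quantitative, nonverbal)
    # E: some pair of scores differs by more than 24 points.
    if max(scores) - min(scores) > EXTREME_GAP:
        return "E"
    # B: at least one score sits well above or below the middle score.
    mid = median(scores)
    if any(abs(s - mid) > B_GAP for s in scores):
        return "B"
    # A: all three scores at approximately the same level.
    return "A"


print(profile_letter(118, 116, 120))  # A
print(profile_letter(130, 112, 114))  # B
print(profile_letter(135, 105, 110))  # E
```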
Threats to construct validity
Construct Over-representation
Example: WISC-IV
Construct Under-representation
Example: Global IQ score only
Construct-irrelevant difficulty
Examples: Unfamiliar language, test format, or content
Construct-irrelevant sources of difficulty
Obvious
Oral language for bilingual students
Reading for children who cannot read
Not so obvious
Unfamiliar test formats
Unfamiliar content
Poverty, gender, and regional differences
Only one kind of content or format
Tradeoffs
Reducing construct-irrelevant sources of difficulty
Providing informative directions in student’s language(s)
E.g., both L1 and L2 (familiar teacher)
Repeat practice items, provide additional examples
More than perfunctory practice items/tests
The necessity of good practice tests
Better level the playing field
Can help teach more broadly useful thinking skills
Practice Activities
Scores are most valid when students clearly understand what they are supposed to do
Unequal preparation – by accident or design
Levels the playing field
Activities can help teach important thinking skills
Teacher guide and student practice booklet
By battery (V, Q, N)
Levels 5/6, 7, 8, 9, 10, 11, 12
Opportunity to learn?
Similar to norm group: Verbal, Quantitative, & Nonverbal; academic knowledge & skills
Unlike norm group: Nonverbal & Quantitative (standard norms) + Verbal & achievement (special norms)
Figural Reasoning tests
Reduce effects of oral language (but not culture)
Construct under-representation
Quantitative reasoning an important ally
Directly linked to instruction
Comprehensive “Nonverbal” tests
Verbal concepts/reasoning presented pictorially (best for young children)
Quantitative reasoning presented pictorially and/or with small integers
Unlike norm group
Option 1: “Nonverbal” tests
Example of a more comprehensive “Nonverbal” test: CogAT Form 7
All required subtests use only pictures
Optional English/Spanish Sentence Completion subtest
Bilingual, English/Spanish Primary Level Tests
Directions for all tests in Spanish and/or English
Prompts for the optional Sentence Completion test (levels 5 – 8) in Spanish and/or English
Not just a translated test
Items for all tests in the Verbal Battery were created separately within ELL and non-ELL groups
Tests were assembled from items that worked well in all groups
Sentence Completion (example):
Which one swims in the ocean? ¿Cuál animal nada en el océano?
A _________ swims in the ocean. (A) cat (B) shark (C) bird
Comprehensive Verbal, Quantitative , & Figural Test
• A better, more comprehensive measure of academic talent for all children than the nonverbal battery alone
• Smaller differences between ELL and non-ELL children than on the nonverbal battery!
• More equitable gifted identification
Percent Scoring in the Top 5% on Each CogAT7 Primary Test Belonging to Various Subgroups
Test Format          Bilingual   Poor   Asian   Hisp.   Black
Sample Percent           6        23      4      21      15
Percent in “Gifted” group:
Picture Verbal           5        14      7      20       8
Picture Quant            4         9     11      11       5
Figural NV               2        11      8      12       7
For more information on CogAT7 see:
Cognitively Speaking, Vol. 7 (August 2011). Introducing CogAT Form 7.
Lohman, D. F., & Gambrell, J. (2012). Use of nonverbal measures in gifted identification. Journal of Psychoeducational Assessment.
Both papers and other materials at:
http://faculty.education.uiowa.edu/dlohman
Complete Test vs. Screening Form
Complete Test:
V: Picture/Verbal Analogies, Picture/Verbal Classification, Sentence Completion
Q: Number Analogies, Number Series, Number Puzzles
N: Figure Matrices, Figure Classification, Paper Folding
Scores: V, Q, N, SAS
Screening Form: Picture/Verbal Analogies, Number Analogies, Figure Matrices
Similar format, varied content
Warning?
Primary Levels (K-2): Picture Analogies, Number Analogies, Figure Matrices
Upper Levels (3+): Verbal Analogies, Number Analogies, Figure Matrices
Option to omit or not score for ELL students
Effectiveness for Screening
When followed by the complete CogAT, the Iowa tests, or a good individually administered ability test:
2-3 times more effective than commonly used screening tests
Success in identifying gifted students (top 3%) on the placement test
Percent administered      Typical Screening Test   CogAT Screening Test
follow-up test            (out of 10)              (out of 10)
30%                       8.0                      9.6
20%                       7.3                      9.3
10%                       5.4                      7.9
3%                        2.6                      4.5
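Why do a better screener and a more liberal follow-up cut both raise the hit rate? A small model helps. This sketch assumes, hypothetically, that screening and placement scores are bivariate normal with correlation r. The slide does not report the correlations or data behind its table, so the r values below are illustrative only and are not expected to reproduce those numbers.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, SD 1


def joint_upper_tail(r: float, a: float, b: float, steps: int = 4000, span: float = 8.0) -> float:
    """P(S > a and P > b) for standard bivariate normal scores with correlation r.

    Midpoint-rule integration over the placement score p in [b, b + span],
    using the conditional distribution S | P = p ~ Normal(r * p, 1 - r**2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    cond_sd = sqrt(1.0 - r * r)
    h = span / steps
    total = 0.0
    for i in range(steps):
        p = b + (i + 0.5) * h
        total += STD.pdf(p) * (1.0 - STD.cdf((a - r * p) / cond_sd))
    return total * h


def hits_out_of_10(r: float, follow_up: float, gifted: float = 0.03) -> float:
    """Expected number, per 10 students in the top `gifted` fraction on the
    placement test, who also fall in the top `follow_up` fraction on the
    screener and so are given the follow-up test."""
    a = STD.inv_cdf(1.0 - follow_up)  # screening cut (z-score)
    b = STD.inv_cdf(1.0 - gifted)     # placement-test cut (z-score)
    return 10.0 * joint_upper_tail(r, a, b) / gifted


for r in (0.6, 0.8):  # hypothetical screener-placement correlations
    row = [round(hits_out_of_10(r, f), 1) for f in (0.30, 0.20, 0.10, 0.03)]
    print(f"r = {r}: {row}")
```

With r = 0 the screener is useless and the expected count falls to 10 times the follow-up fraction; raising r or widening the follow-up group moves it toward 10.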
Figure: Scatterplot of Test1 vs. Test2, r = .60, with cut points at 1.25 SD (89th PR) on both tests
Figure: Scatterplot of Test1 vs. Test2, r = .80, with cut points at 1.5 SD (93rd PR) on both tests
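These scatterplots illustrate the fallacy that highly correlated tests identify the same children. A quick Monte Carlo check estimates what share of the students above the cut on Test1 also clear the same cut on Test2. It assumes both tests are standard normal with the stated correlation, which is a modeling assumption, not data from the slides.

```python
import random
from math import sqrt


def simulate_overlap(r: float, cut_sd: float, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the share of students above `cut_sd` on Test1
    who also score above `cut_sd` on Test2, with both tests standard normal
    and correlated r (an assumed model, not data from the slides)."""
    rng = random.Random(seed)
    noise_sd = sqrt(1.0 - r * r)
    above_test1 = above_both = 0
    for _ in range(n):
        t1 = rng.gauss(0.0, 1.0)
        t2 = r * t1 + noise_sd * rng.gauss(0.0, 1.0)  # gives corr(t1, t2) = r
        if t1 > cut_sd:
            above_test1 += 1
            if t2 > cut_sd:
                above_both += 1
    return above_both / above_test1


# The two panels: r = .60 with a 1.25 SD cut, r = .80 with a 1.5 SD cut.
for r, cut in ((0.60, 1.25), (0.80, 1.50)):
    share = simulate_overlap(r, cut)
    print(f"r = {r:.2f}, cut = {cut} SD: {share:.0%} of Test1's top group also clears the cut on Test2")
```

Under these assumptions, a large share of each test's top group misses the cut on the other test, even at r = .80.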
Lohman-Renzulli Matrix
Columns: Teacher rating on learning ability, motivation, or creativity
Rows: CogAT Verbal OR Quantitative-Nonverbal
                     Low teacher ratings   High teacher ratings
(>95th PR)           II                    I
(80th-95th PR)       IV                    III
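The Lohman-Renzulli matrix can be sketched as a lookup. Two assumptions here are not from the slide: the teacher rating is reduced to a simple high/low flag (how a school decides what counts as “high” is left open), and a score at exactly the 95th PR is placed in the 80th-95th row.

```python
from typing import Optional


def lr_cell(verbal_pr: float, quant_nonverbal_pr: float, high_teacher_rating: bool) -> Optional[str]:
    """Return the Lohman-Renzulli matrix cell ("I" to "IV"), or None below the 80th PR."""
    # "CogAT Verbal OR Quantitative-Nonverbal": the row is set by the higher score.
    best_pr = max(verbal_pr, quant_nonverbal_pr)
    if best_pr > 95:   # top row: above the 95th PR
        return "I" if high_teacher_rating else "II"
    if best_pr >= 80:  # second row: 80th-95th PR (95 itself lands here, by assumption)
        return "III" if high_teacher_rating else "IV"
    return None        # outside the matrix


print(lr_cell(97, 70, True))   # I
print(lr_cell(97, 70, False))  # II
print(lr_cell(60, 85, True))   # III
print(lr_cell(82, 50, False))  # IV
print(lr_cell(75, 79, True))   # None
```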
A broader estimate of talent using commonly administered ability and achievement test scores, especially language based
Often simple ranks will do
Cumulate more precise norms (PRs) across years
Lohman (in press) “Nontraditional uses of traditional assessments.” In C. M. Callahan & H. Hertberg-Davis (Eds.) Fundamentals of gifted education (draft on my website)
Unlike norm group
Option 2: Common assessments but special norms
Using appropriate (multiple) norm or comparison groups
If schools do not need national norms to identify promising runners or swimmers, then why are they mandatory for identifying promising scientists?
Importance of multiple perspectives – the nation, the district, the school, one or more OTL groups
Multiple Perspectives
The need for special programming depends most importantly on the discrepancy between a child’s development and that of his or her classmates
Except for regional talent searches, summer programs that draw from different schools, etc., make better use of local norms!
For grade 3 bilingual students, compare scores to:
Other bilingual students in grade 3 in the school
Other students in grade 3 in the district
Other grade 3 students in the nation
How do you get local norms?
Ask for them!
Compute them yourself:
Spreadsheet (Excel)
Data manipulation tools from test publishers
Approximate with ranks
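Computing local norms yourself can be as simple as a percentile rank within each comparison group. This sketch uses the mid-rank convention (percent of the group scoring below, plus half of those tied), which is one common definition among several. The score lists are hypothetical, and national PRs would still come from the publisher's norm tables.

```python
from bisect import bisect_left, bisect_right


def local_pr(score: float, group: list[float]) -> float:
    """Percentile rank of `score` within `group`, using the mid-rank
    convention: percent scoring below plus half the percent tied."""
    if not group:
        raise ValueError("comparison group is empty")
    ordered = sorted(group)
    below = bisect_left(ordered, score)          # count strictly below
    ties = bisect_right(ordered, score) - below  # count exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical grade 3 scores for two local comparison groups
# (the student's own score of 108 is included in both).
school_bilingual_g3 = [88, 92, 95, 97, 101, 104, 108]
district_g3 = [85, 90, 94, 98, 100, 103, 105, 108, 109, 112, 115, 118, 121]

student_score = 108
print(f"PR among bilingual grade 3 in school: {local_pr(student_score, school_bilingual_g3):.1f}")  # 92.9
print(f"PR among grade 3 in district:         {local_pr(student_score, district_g3):.1f}")  # 57.7
```

The same student looks very different depending on the comparison group, which is the point of using multiple perspectives.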
Beyond local norms
Simple rank orders (within domain)
Rank orders within OTL groups
When the goal is talent identification – not identifying the gifted
Opportunity to learn
The senator’s question, revisited
Among the poor children in your school, might there be any who might have talent in writing or mathematics or …? If so, might it be helpful to recognize and encourage that talent?
Index by eligibility for free/reduced-price lunch
Rank order scores within OTL groups
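Rank ordering within OTL groups is essentially a grouped sort, the same thing a spreadsheet sort would do. The records, the “FRL” (free/reduced-price lunch) and “paid” labels, and the scale scores below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (student ID, OTL group, reading scale score).
# "FRL" = eligible for free/reduced-price lunch; "paid" = not eligible.
records = [
    ("S1", "FRL", 182), ("S2", "FRL", 201), ("S3", "FRL", 195),
    ("S4", "paid", 210), ("S5", "paid", 222), ("S6", "paid", 205),
]


def rank_within_groups(rows: list[tuple[str, str, int]]) -> dict[str, list[str]]:
    """Return student IDs ordered from highest to lowest score within each OTL group."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for student, otl_group, score in rows:
        groups[otl_group].append((score, student))
    # Sort by score (descending); ties fall back to student ID (also descending).
    return {g: [s for _, s in sorted(members, reverse=True)] for g, members in groups.items()}


overall_top2 = [s for s, _, _ in sorted(records, key=lambda r: r[2], reverse=True)[:2]]
within = rank_within_groups(records)

print("District-wide top 2:", overall_top2)  # ['S5', 'S4']
print("Top in each group:", {g: ids[0] for g, ids in within.items()})  # {'FRL': 'S2', 'paid': 'S5'}
```

Here the overall top two both come from the paid-lunch group, while ranking within groups surfaces S2 as the strongest reader among students eligible for free/reduced-price lunch.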
Lohman (in press). “Nontraditional uses of traditional assessments.” In C. M. Callahan & H. Hertberg-Davis (Eds.), Fundamentals of gifted education.
Multiple Programming Options for ELL Children
Current level of achievement is the primary guide
Programming goal: to encourage interests and improve achievement at a rate faster than would otherwise occur
For on- and below-grade-level achievement, options include tutors, after-school or weekend classes/clubs, etc. The motivational component is critical.
For achievement well in advance of peers, consider single-subject acceleration
Main Points
Do not just translate test items (ITC guidelines)
Be aware of common misconceptions
Critical importance of opportunity to learn
When screening, use liberal cut scores
Value of comprehensive “nonverbal” tests
Provide informative practice
Use multiple measures of multiple constructs
Combine measures appropriately
Profiles are more important at the extremes
Value of multiple perspectives
Importance of multiple programming options