Selected topics in psychometrics
L8: Differential Item Functioning
Patrícia Martinková
NMST570, December 3, 2019
Department of Statistical Modelling
Institute of Computer Science, Czech Academy of Sciences
Institute for Research and Development of Education
Faculty of Education, Charles University, Prague
Outline
1. Review
2. Introduction to DIF
3. DIF detection in binary items
4. Conclusion
Review Introduction to DIF DIF detection in binary items Conclusion
IRT models Estimation of IRT models
Review - IRT models
• IRT models for binary data
• Rasch model, 1PL, 2PL, 3PL, 4PL IRT model
• IRT models for ordinal data
• Cumulative logits
• Graded Response Model (GRM)
• Adjacent-category logits
• Partial Credit Model (PCM)
• Generalized Partial Credit Model (GPCM)
• Rating Scale Model (RSM)
• IRT models for nominal data
• Baseline-category logits
• Nominal Response Model (NRM)
Patrıcia Martinkova NMST570, L8: Differential Item Functioning 1/41
Review - IRT models
• Item Characteristic Curve (ICC), Item Response Function (IRF)
• Item Information Function (IIF), Test Information Function (TIF)
• Likelihood function
• Parameter estimation
• Joint maximum likelihood (JML)
• Conditional maximum likelihood (CML)
• Marginal maximum likelihood (MML)
• Model fit
• Item fit
• Person fit
DIF definition DIF examples DIF conceptual issues DIF and fairness
Differential Item Functioning
Differential Item Functioning (DIF)
Two subjects with the same underlying ability but from different groups have
different probabilities of answering an item correctly
• Two groups referred to as reference and focal (usually minority)
• Two types of DIF - uniform and non-uniform
Examples of DIF items
Example (SAT) (Cramp & McDougall, 2018):
Runner is to marathon as
a. envoy is to embassy
b. martyr is to massacre
c. oarsman is to regatta
d. referee is to tournament
e. horse is to stable
Who might have been disadvantaged? (Specify reference and focal group)
Cramp, A., & McDougall, J. (2018). Doing theory on education: Using popular culture to
explore key debates. Routledge.
Examples of DIF items
Tipping example (Martiniello & Wolf, 2012)
Of the following, which is the closest approximation of a 15 percent tip on a
restaurant check of $24.99?
a. $2.50
b. $3.00
c. $3.75
d. $4.50
Example of spelling test (orally administered):
Spell the word "girder"
Martiniello, M., & Wolf, M. (2012). Exploring ELLs' understanding of word problems in
mathematics assessments: The role of text complexity and student background knowledge.
In S. Celedon-Pattichis & N. Ramirez (Eds.), Beyond good teaching: Strategies that are
imperative for English language learners in the mathematics classroom. Reston, VA: NCTM.
Examples of DIF items - Medical admission tests
Education "Growth of long bones"
A) occurs in growth cartilage
B) is hormone-controlled
C) usually ends at about 10-13 years of age, in boys earlier than in girls
D) usually ends around 16-19 years of age, in girls earlier than in boys
(Martinkova et al. (2019): more often answered correctly by males)
"Deficiency of vitamin D in childhood could cause"
A) rickets
B) scurvy
C) dwarfism
D) mental retardation
(Drabinova and Martinkova (2017): more often answered correctly by females)
Drabinova, A., & Martinkova, P. (2017). Detection of differential item functioning with
nonlinear regression: A non-IRT approach accounting for guessing. Journal of Educational
Measurement, 54(4), 498–517.
Examples of DIF items - health-related outcome measures
Pain ”How often did pain prevent you from walking more than 1 mile?”
(Amtmann et al. (2010): reported more often by older patients)
”How often did pain prevent you from standing for more than 1 hour?”
(Amtmann et al. (2010): reported more often by older patients)
Depression ”I felt like crying”
(Pilkonis et al. (2011): endorsed more often by females)
Anger ”I was angry when people were unfair”
(endorsed more often by older patients)
“I was angry when I did something stupid”
(Pilkonis et al. (2011): endorsed more often by older patients)
Amtmann, D., Cook, K. F., Jensen, M. P., Chen, W.-H., Choi, S., Revicki, D., ...,
Callahan, L., et al. (2010). Development of a PROMIS item bank to measure pain
interference. Pain, 150(1), 173–182.
Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., Cella, D., & Group,
P. C. (2011). Item banks for measuring emotional distress from the patient-reported
outcomes measurement information system (PROMIS®): Depression, anxiety, and anger.
Assessment, 18(3), 263–283.
DIF vs. difference in total scores
Comparing total scores only can lead to incorrect conclusions about item/test
fairness (Martinkova et al., 2017)
• Case study 1: Homeostasis Concept Inventory
• Significant difference (Fig A), but no DIF item
• Case study 2: Simulated dataset based on GMAT
• Identical distributions of total score (Fig B), DIF items present
Martinkova, P., Drabinova, A., Liaw, Y. L., Sanders, E. A., McFarland, J. L., & Price, R.
M. (2017). Checking equity: Why differential item functioning analysis should be a routine
part of developing conceptual assessments. CBE—Life Sciences Education, 16(2), rm2.
DIF vs. difference in item scores
Comparing item scores can lead to incorrect conclusions about item fairness.
In case of different distributions of total scores:
• A difference in item scores (in the same direction as the difference in the
total scores) may be expected
• No difference in item scores may actually be a sign of an unfair item.
Thus, both item score and latent ability (total score) need to be taken into
account to assess item fairness.
DIF as multidimensionality problem
DIF can be viewed as a multidimensionality problem:
• The item measures another dimension besides the primary latent variable
Exercise:
What are the primary and the secondary latent variables tested in the previously
described examples?
• Regatta example
• Spelling example (girder)
• Deficiency of vitamin D in childhood
• Tipping example
DIF and item fairness
DIF items are potentially unfair. However, DIF items are not necessarily a
threat to fairness and validity. Content experts must decide on item fairness
based on the classification of the secondary latent trait causing DIF:
• Unrelated to the content being tested
• DIF item is considered unfair, item should be reworded/removed
• Related to the content being tested
• DIF item is not considered unfair, item can inform teaching
Exercise:
Classify the secondary latent trait in the following items:
• Regatta example
• Spelling example (girder)
• Deficiency of vitamin D in childhood
• Tipping example
Delta-Plot Mantel-Haenszel Logistic regression Generalized logistic regression IRT-based methods
DIF detection methods in binary items
• Based on total score
• Delta plot
• Mantel-Haenszel test
• Logistic regression
• Generalized logistic regression
• Based on latent ability and Item Response Theory models
• Lord’s (Wald) test
• Raju’s area test
• Likelihood ratio test (LRT)
Delta plot - overview
• Angoff & Ford (1973)
• Compares proportions of correct answers
• Displays a non-linear transformation of the proportions (using quantiles)
• Item detection threshold:
• fixed to 1.5
• based on a normal approximation (Magis & Facon, 2014)
Delta plot - motivation
The assumption of parameter invariance implies an approximately linear relationship
between the transformed proportions of correct responses in the two groups
Delta plot provides comparison of transformed proportions of correct
responses per item and by group of respondents (Angoff & Ford, 1973)
Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude.
Journal of Educational Measurement, 10(2), 95-105.
Delta plot - introducing delta scores
In more detail:
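• For each item j = 1, ..., J and each group (reference R and focal F), the proportion
of correct answers is calculated:
$$\pi_{jR} = \frac{1}{I_R}\sum_{i=1}^{I_R} Y_{ij} \quad\text{and}\quad \pi_{jF} = \frac{1}{I_F}\sum_{i=1}^{I_F} Y_{ij}$$
where $I_R$ and $I_F$ are the numbers of respondents in the reference and focal group
• Transformation into standard normal deviates:
$$z_{jR} = q_Z(1 - \pi_{jR}) \quad\text{and}\quad z_{jF} = q_Z(1 - \pi_{jF}),$$
where $q_Z$ is the quantile function of the standard normal distribution
• Transformation into delta scores:
$$\Delta_{jR} = 4 \cdot z_{jR} + 13 \quad\text{and}\quad \Delta_{jF} = 4 \cdot z_{jF} + 13$$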
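The two transformations can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name delta_score and the proportions below are hypothetical and not taken from the lecture data.

```python
from statistics import NormalDist


def delta_score(prop_correct: float) -> float:
    """Map a proportion of correct answers to the delta scale: 4 * z + 13."""
    if not 0.0 < prop_correct < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    # z is the standard normal quantile of the proportion of INCORRECT answers
    z = NormalDist().inv_cdf(1.0 - prop_correct)
    return 4.0 * z + 13.0


# Hypothetical proportions of correct answers for three items in each group
pi_ref = [0.50, 0.80, 0.30]
pi_foc = [0.50, 0.70, 0.20]

delta_ref = [delta_score(p) for p in pi_ref]
delta_foc = [delta_score(p) for p in pi_foc]

for j, (d_r, d_f) in enumerate(zip(delta_ref, delta_foc), start=1):
    print(f"item {j}: delta_R = {d_r:.2f}, delta_F = {d_f:.2f}")
```

An item answered correctly by half of the group gets a delta score of 13; easier items get lower and harder items get higher delta scores.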
Delta plot - DIF detection
• Pairs of delta scores (∆jR ,∆jF ) can be displayed on a scatter plot
(so called Delta plot or Diagonal plot)
• Delta scores of the reference group on the X axis and of the focal group on
the Y axis
• Delta scores form an ellipsoid with major axis
$$\Delta_{jF} = a + b\,\Delta_{jR},$$
where
$$b = \frac{s_F^2 - s_R^2 + \sqrt{(s_F^2 - s_R^2)^2 + 4 s_{RF}^2}}{2 s_{RF}}, \qquad a = m_F - b \cdot m_R$$
with $s_F^2$, $s_R^2$ the sample variances of the delta scores, $s_{RF}$ their sample covariance, and $m_R$
and $m_F$ their sample means
Delta plot - DIF detection
• DIF detection is based on the distance of the delta scores from the major axis:
$$D_j = \frac{b\,\Delta_{jR} + a - \Delta_{jF}}{\sqrt{b^2 + 1}}$$
• Detection threshold:
• Fixed: an item is flagged as DIF if $|D_j| > 1.5$ (Angoff & Ford, 1973)
• Based on a normal approximation (Magis & Facon, 2014)
Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude.
Journal of Educational Measurement, 10(2), 95-105.
Magis, D., & Facon, B. (2014). deltaPlotR: An R package for differential item functioning
analysis with Angoff's Delta plot. Journal of Statistical Software, 59(1), 1-19.
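As a rough sketch, the major axis and the perpendicular distances can be computed directly from the delta scores. The Python code below uses only the standard library; the function names and the delta scores are hypothetical, and only the fixed threshold of 1.5 is applied. It is not the deltaPlotR implementation.

```python
from math import sqrt
from statistics import mean, variance


def sample_covariance(x, y):
    """Sample covariance with denominator n - 1."""
    m_x, m_y = mean(x), mean(y)
    return sum((u - m_x) * (v - m_y) for u, v in zip(x, y)) / (len(x) - 1)


def delta_plot_distances(delta_ref, delta_foc):
    """Return intercept a, slope b, and signed distances D_j from the major axis."""
    s2_r, s2_f = variance(delta_ref), variance(delta_foc)
    s_rf = sample_covariance(delta_ref, delta_foc)
    b = (s2_f - s2_r + sqrt((s2_f - s2_r) ** 2 + 4 * s_rf ** 2)) / (2 * s_rf)
    a = mean(delta_foc) - b * mean(delta_ref)
    dist = [(b * d_r + a - d_f) / sqrt(b * b + 1)
            for d_r, d_f in zip(delta_ref, delta_foc)]
    return a, b, dist


# Hypothetical delta scores for five items; the third item is much harder
# for the focal group than the other items suggest
delta_ref = [10, 12, 14, 16, 18]
delta_foc = [10, 12, 18, 16, 18]

a, b, dist = delta_plot_distances(delta_ref, delta_foc)
flagged = [j for j, d in enumerate(dist) if abs(d) > 1.5]  # fixed threshold
print(f"a = {a:.3f}, b = {b:.3f}, flagged item indices: {flagged}")
```

With these numbers only the third item (index 2) lies further than 1.5 from the major axis.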
Delta plot in ShinyItemAnalysis
• ShinyItemAnalysis offers functionality of deltaPlotR package
• Provides delta plot in ggplot2
Mantel-Haenszel test
MH is an extension of the χ2-test of independence on contingency tables
• Contingency tables summarize item responses by group membership for
given item
• MH test combines all levels of total scores
In more detail:
For each level of total score k = 0, ..., K, a contingency table is created:
                   Y = 1    Y = 0
Reference group    A_k      B_k
Focal group        C_k      D_k
Odds ratio for total score k:
$$\alpha_k = \frac{A_k / B_k}{C_k / D_k} = \frac{A_k D_k}{B_k C_k}$$
Mantel-Haenszel test - introducing αMH
If item score and group membership are independent for total score k:
$$\alpha_k = \frac{A_k D_k}{B_k C_k} \approx 1$$
MH combines all levels of total score:
$$\alpha_{MH} = \frac{\sum_{k=0}^{K} A_k D_k / N_k}{\sum_{k=0}^{K} B_k C_k / N_k}$$
= Weighted average of the odds ratios across all levels of the total score
$$\alpha_{MH} \begin{cases} \approx 1, & \text{no DIF} \\ > 1, & \text{DIF favoring reference group} \\ < 1, & \text{DIF favoring focal group} \end{cases}$$
Mantel-Haenszel test - introducing ∆MH
$\alpha_{MH}$ is often standardized through a log transformation, centering the value
around 0:
$$\Delta_{MH} = -2.35 \cdot \log(\alpha_{MH})$$
However, the interpretation is then reversed!
$$\Delta_{MH} \begin{cases} \approx 0, & \text{no DIF} \\ > 0, & \text{DIF favoring focal group} \\ < 0, & \text{DIF favoring reference group} \end{cases}$$
$\Delta_{MH}$ can be used to determine DIF effect size (Holland & Thayer, 1985):
$$|\Delta_{MH}| \begin{cases} < 1, & \text{Category A = negligible effect} \\ \in [1, 1.5), & \text{Category B = moderate effect} \\ \geq 1.5, & \text{Category C = large effect} \end{cases}$$
Holland, P. W., & Thayer, D. T. (1985). An alternate definition of the ETS delta scale of
item difficulty. ETS Research Report Series, 1985(2), i-10.
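A minimal Python sketch of the chain from the stratified tables to the effect-size category follows. The counts are hypothetical, the function names are illustrative, and the classification uses the magnitude of the delta value alone, as on the slide.

```python
from math import log


def mh_odds_ratio(tables):
    """Common odds ratio alpha_MH from (A_k, B_k, C_k, D_k) tuples,
    one tuple per total-score level k."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def mh_delta(alpha_mh):
    """Delta-scale transformation: -2.35 * log(alpha_MH), natural logarithm."""
    return -2.35 * log(alpha_mh)


def ets_category(delta_mh):
    """Effect-size category based on |Delta_MH| only."""
    size = abs(delta_mh)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


# Hypothetical counts (A_k, B_k, C_k, D_k) for three total-score levels
tables = [(20, 10, 15, 15), (30, 10, 25, 15), (40, 5, 35, 10)]

alpha = mh_odds_ratio(tables)
delta = mh_delta(alpha)
print(f"alpha_MH = {alpha:.3f}, Delta_MH = {delta:.3f}, "
      f"category {ets_category(delta)}")
```

Here the odds ratio is above 1 and the delta value is negative, so both scales point to DIF favoring the reference group.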
Test statistic
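Contingency table for a given item and total score k:
                   Y = 1               Y = 0
Reference group    A_k                 B_k                 N_Rk = A_k + B_k
Focal group        C_k                 D_k                 N_Fk = C_k + D_k
                   N_1k = A_k + C_k    N_0k = B_k + D_k    N_k
Test statistic for testing whether $\alpha_{MH} = 1$:
$$MH = \frac{\left[\left|\sum_{k=0}^{K}\left(A_k - \frac{N_{Rk} N_{1k}}{N_k}\right)\right| - 0.5\right]^2}{\sum_{k=0}^{K} \frac{N_{Rk} N_{Fk} N_{1k} N_{0k}}{N_k^2 (N_k - 1)}} \approx \chi^2_1$$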
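The statistic can be computed from the same kind of stratified tables. The sketch below is a plain-Python illustration with hypothetical counts; the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), and levels with fewer than two respondents are skipped, which is an assumption of this sketch rather than something stated on the slide.

```python
from math import erfc, sqrt


def mh_chi_square(tables):
    """Mantel-Haenszel statistic with continuity correction and its
    chi-square (1 df) p-value; tables holds (A_k, B_k, C_k, D_k) tuples."""
    sum_diff = 0.0
    sum_var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:  # level carries no information (sketch assumption)
            continue
        n_r, n_f = a + b, c + d          # row margins
        n_1, n_0 = a + c, b + d          # column margins
        sum_diff += a - n_r * n_1 / n    # observed minus expected A_k
        sum_var += n_r * n_f * n_1 * n_0 / (n * n * (n - 1))
    stat = (abs(sum_diff) - 0.5) ** 2 / sum_var
    p_value = erfc(sqrt(stat / 2.0))     # survival function of chi-square, 1 df
    return stat, p_value


# Hypothetical counts (A_k, B_k, C_k, D_k) for three total-score levels
tables = [(20, 10, 15, 15), (30, 10, 25, 15), (40, 5, 35, 10)]
stat, p_value = mh_chi_square(tables)
print(f"MH = {stat:.3f}, p = {p_value:.4f}")
```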
Mantel-Haenszel - pros and cons
Pros:
+ Simple method
+ Easily implemented
+ Detects DIF in small samples
Cons:
− Does not detect non-uniform DIF
ShinyItemAnalysis offers step-by-step calculation of MH statistics.
Logistic regression for DIF detection
• Logistic regression models the probability of a correct answer on item j by respondent i based on
their total score and group membership (Swaminathan & Rogers, 1990)
• Includes the effects of total score, group membership, and their interaction
• Nonzero effect of group membership indicates uniform DIF
• Nonzero effect of interaction indicates non-uniform DIF
• DIF detection based on Wald's test or likelihood ratio test of the submodel
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using
logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
Logistic regression for DIF detection
In more detail:
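$$P(Y_{ij} = 1 | X_i, G_i) = \begin{cases} \dfrac{e^{b_{0j} + b_{1j} X_i}}{1 + e^{b_{0j} + b_{1j} X_i}} & \text{no DIF} \\[2ex] \dfrac{e^{b_{0j} + b_{1j} X_i + b_{2j} G_i}}{1 + e^{b_{0j} + b_{1j} X_i + b_{2j} G_i}} & \text{uniform DIF} \\[2ex] \dfrac{e^{b_{0j} + b_{1j} X_i + b_{2j} G_i + b_{3j} X_i G_i}}{1 + e^{b_{0j} + b_{1j} X_i + b_{2j} G_i + b_{3j} X_i G_i}} & \text{non-uniform DIF} \end{cases}$$
where $X_i$ is the total score, $G_i$ is group membership ($G_i = 0$ for the reference group,
$G_i = 1$ for the focal group), and $X_i G_i$ is their interaction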
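To see what the two DIF terms do, the three nested models can be evaluated for chosen coefficients. The Python sketch below uses hypothetical coefficient values and illustrative function names; it does not fit the model to data.

```python
from math import exp, log


def p_correct(x, g, b0, b1, b2=0.0, b3=0.0):
    """Probability of a correct answer for total score x and group g
    (0 = reference, 1 = focal) under the logistic regression DIF model."""
    eta = b0 + b1 * x + b2 * g + b3 * x * g
    return 1.0 / (1.0 + exp(-eta))


def log_odds(p):
    return log(p / (1.0 - p))


def group_gap(x, **coef):
    """Focal-minus-reference difference in log odds at total score x."""
    return log_odds(p_correct(x, 1, **coef)) - log_odds(p_correct(x, 0, **coef))


scores = (5, 10, 15)
# Hypothetical coefficients
no_dif = dict(b0=-2.0, b1=0.2)
uniform = dict(b0=-2.0, b1=0.2, b2=-0.5)
non_uniform = dict(b0=-2.0, b1=0.2, b2=-0.5, b3=0.05)

for name, coef in [("no DIF", no_dif), ("uniform", uniform),
                   ("non-uniform", non_uniform)]:
    gaps = [round(group_gap(x, **coef), 3) for x in scores]
    print(f"{name:12s} log-odds gap at scores {scores}: {gaps}")
```

Under uniform DIF the gap between the groups is the same at every total score (equal to b2); under non-uniform DIF it changes with the score (b2 + b3 x).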
Interpretation of parameters
Intercept $b_{0j}$
- Determines the probability of answering item j correctly for a reference-group ($G_i = 0$)
respondent with zero total score ($X_i = 0$)
- $P(Y_{ij} = 1 | X_i = 0, G_i = 0) = \dfrac{e^{b_{0j}}}{1 + e^{b_{0j}}}$
Effect of total score $b_{1j}$
- Gives the log odds ratio for answering item j correctly, comparing two respondents
from the reference group differing by one point in total score
Effect of group membership $b_{2j}$
- Gives the log odds ratio for answering item j correctly, comparing two respondents
from the focal and reference group with the same total score (for $b_{3j} = 0$; otherwise at $X_i = 0$)
- $b_{0j} + b_{2j}$ is the intercept (baseline log odds) for the focal group
Effect of interaction $b_{3j}$
- Indicates how the effect of total score differs between the focal and reference group
- $b_{1j} + b_{3j}$ is the effect of total score for the focal group
DIF effect size
Determined by the change in Nagelkerke's $R^2$ between the models with and without the DIF terms (Nagelkerke, 1991)
• Proportional reduction in "error variance"
• Different cut-off values:
Zumbo & Thomas (1997):
  $R^2$ change < 0.13             Cat. A = negligible effect
  $R^2$ change ∈ [0.13, 0.26)     Cat. B = moderate effect
  $R^2$ change ≥ 0.26             Cat. C = large effect
Jodoin & Gierl (2001):
  $R^2$ change < 0.035            Cat. A = negligible effect
  $R^2$ change ∈ [0.035, 0.07)    Cat. B = moderate effect
  $R^2$ change ≥ 0.07             Cat. C = large effect
Nagelkerke, N. J. (1991). A note on a general definition of the coefficient of
determination. Biometrika, 78(3), 691-692.
Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based
approach for studying DIF. Working paper.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an
effect size measure with the logistic regression procedure for DIF detection. Applied
Measurement in Education, 14(4), 329-349.
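The two sets of cut-offs lead to different labels for the same effect size, which a short lookup makes visible. This is a hypothetical helper in Python; the scheme names are labels chosen for this sketch.

```python
# Cut-off values from the two classification schemes on this slide
CUTOFFS = {
    "zumbo-thomas": (0.13, 0.26),
    "jodoin-gierl": (0.035, 0.07),
}


def r2_effect_category(r2_change, scheme="zumbo-thomas"):
    """Classify a change in Nagelkerke's R^2 as A (negligible),
    B (moderate) or C (large) under the chosen scheme."""
    moderate, large = CUTOFFS[scheme]
    if r2_change < moderate:
        return "A"
    if r2_change < large:
        return "B"
    return "C"


# Hypothetical effect size for one item
r2_change = 0.05
for scheme in CUTOFFS:
    print(scheme, r2_effect_category(r2_change, scheme))
```

The same change of 0.05 is negligible under the Zumbo and Thomas cut-offs but moderate under the Jodoin and Gierl cut-offs.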
Statistical significance
The statistical significance is determined by test of submodel
• Likelihood-ratio test
• Wald’s test
Testing:
Any DIF: $H_0: b_2 = 0 \text{ and } b_3 = 0$ vs. $H_1: b_2 \neq 0 \text{ or } b_3 \neq 0$
Uniform DIF: $H_0: b_2 = 0 \mid b_3 = 0$ vs. $H_1: b_2 \neq 0 \mid b_3 = 0$
Non-uniform DIF: $H_0: b_3 = 0$ vs. $H_1: b_3 \neq 0$
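Once the nested models have been fitted, the likelihood ratio test only needs their maximized log-likelihoods. The Python sketch below assumes hypothetical log-likelihood values and uses closed-form chi-square tail probabilities, which are available for 1 and 2 degrees of freedom: the two cases needed here.

```python
from math import erfc, exp, sqrt


def lrt_p_value(loglik_null, loglik_alt, df):
    """Likelihood ratio statistic 2 * (l_alt - l_null) and its chi-square
    p-value for df = 1 or df = 2."""
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat < 0:
        raise ValueError("the larger model cannot have a lower log-likelihood")
    if df == 1:
        p_value = erfc(sqrt(stat / 2.0))
    elif df == 2:
        p_value = exp(-stat / 2.0)
    else:
        raise ValueError("only df = 1 and df = 2 are handled in this sketch")
    return stat, p_value


# Hypothetical maximized log-likelihoods of the three nested models for one item
ll_no_dif = -520.4       # b2 = b3 = 0
ll_uniform = -517.9      # b3 = 0
ll_non_uniform = -514.1  # full model

any_dif = lrt_p_value(ll_no_dif, ll_non_uniform, df=2)
uniform_dif = lrt_p_value(ll_no_dif, ll_uniform, df=1)
non_uniform_dif = lrt_p_value(ll_uniform, ll_non_uniform, df=1)

for name, (stat, p) in [("any DIF", any_dif), ("uniform DIF", uniform_dif),
                        ("non-uniform DIF", non_uniform_dif)]:
    print(f"{name:16s} LR = {stat:.2f}, p = {p:.4f}")
```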
Reparametrization
$$P(Y_{ij} = 1 | Z_i, G_i) = \frac{e^{(a_j + a_{jDIF} G_i)(Z_i - b_j - b_{jDIF} G_i)}}{1 + e^{(a_j + a_{jDIF} G_i)(Z_i - b_j - b_{jDIF} G_i)}} = \frac{e^{a_{jG_i}(Z_i - b_{jG_i})}}{1 + e^{a_{jG_i}(Z_i - b_{jG_i})}}$$
• $Z_i$ standardized total score
• $a_{jG_i}$ discrimination of item j for group $G_i$
• $b_{jG_i}$ difficulty of item j for group $G_i$
Interpretation:
$b_{j0} = b_j$: difficulty of item j for the reference group
$b_{j1} = b_{j0} + b_{jDIF}$: difficulty of item j for the focal group
$a_{j0} = a_j$: discrimination of item j for the reference group
$a_{j1} = a_{j0} + a_{jDIF}$: discrimination of item j for the focal group
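Expanding the product shows how the two parametrizations map onto each other. The Python sketch below converts regression coefficients into discrimination and difficulty, assuming the regression model uses the standardized total score Z as its covariate; the function names and coefficient values are hypothetical.

```python
def to_irt_parametrization(b0, b1, b2, b3):
    """Convert coefficients of b0 + b1*Z + b2*G + b3*Z*G into the
    (a, b, aDIF, bDIF) parametrization (a + aDIF*G) * (Z - b - bDIF*G)."""
    a_ref = b1                    # discrimination, reference group
    a_foc = b1 + b3               # discrimination, focal group
    b_ref = -b0 / a_ref           # difficulty, reference group
    b_foc = -(b0 + b2) / a_foc    # difficulty, focal group
    return {"a": a_ref, "b": b_ref,
            "aDIF": a_foc - a_ref, "bDIF": b_foc - b_ref}


def linear_predictor_regression(z, g, b0, b1, b2, b3):
    return b0 + b1 * z + b2 * g + b3 * z * g


def linear_predictor_irt(z, g, a, b, aDIF, bDIF):
    return (a + aDIF * g) * (z - b - bDIF * g)


# Hypothetical regression coefficients for one item
coef = dict(b0=-0.5, b1=1.0, b2=-0.3, b3=0.2)
irt = to_irt_parametrization(**coef)
print(irt)

# Both parametrizations give the same linear predictor
for g in (0, 1):
    print(g, linear_predictor_regression(1.3, g, **coef),
          linear_predictor_irt(1.3, g, **irt))
```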
Logistic regression - pros and cons
Pros:
+ Detects DIF in medium-size samples
+ Detects both uniform and non-uniform DIF
Cons:
− Does not account for possible guessing
− Does not account for possible inattention
Generalized logistic regression for DIF detection
= Extension of logistic regression model for DIF detection accounting for
guessing and inattention (Drabinova & Martinkova, 2017)
In more detail:
$$P(Y_{ij} = 1 | X_i, G_i) = c_{jG_i} + (d_{jG_i} - c_{jG_i}) \frac{e^{a_{jG_i}(X_i - b_{jG_i})}}{1 + e^{a_{jG_i}(X_i - b_{jG_i})}}$$
Drabinova, A., & Martinkova, P. (2017). Detection of Differential Item Functioning with
Nonlinear Regression: A Non-IRT Approach Accounting for Guessing. Journal of Educational
Measurement, 54(4), 498-517.
Generalized logistic regression for DIF detection
• Also called 4PL non-IRT model
• Offers wide range of models which can be obtained by
• fixing parameters to selected value
(e.g. d = 1 to get 3PL model)
• fixing parameters between groups
(e.g. common guessing and inattention)
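The effect of fixing parameters can be checked by evaluating the curve directly. The Python sketch below is only an illustration of the 4PL form with hypothetical parameter values; it is not the difNLR estimation routine.

```python
from math import exp


def p_4pl(x, a, b, c=0.0, d=1.0):
    """Generalized logistic (4PL) probability of a correct answer:
    lower asymptote c (guessing), upper asymptote d (inattention)."""
    return c + (d - c) / (1.0 + exp(-a * (x - b)))


# Hypothetical item parameters for the two groups; guessing and inattention
# are held common, while the difficulty differs between groups
reference = dict(a=1.2, b=0.0, c=0.2, d=0.95)
focal = dict(a=1.2, b=0.5, c=0.2, d=0.95)

for z in (-3.0, 0.0, 3.0):
    print(f"z = {z:+.1f}: reference {p_4pl(z, **reference):.3f}, "
          f"focal {p_4pl(z, **focal):.3f}")

# Fixing d = 1 gives the 3PL form; fixing also c = 0 gives the
# two-parameter logistic curve used in ordinary logistic regression
print(p_4pl(0.0, a=1.2, b=0.0, c=0.2, d=1.0), p_4pl(0.0, a=1.2, b=0.0))
```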
R software
Logistic regression for DIF detection
• R package difR (Magis, Beland, Tuerlinckx, & De Boeck, 2010)
• difLogistic() function
Generalized logistic regression
• R package difNLR (Hladka & Martinkova, 2019)
• difNLR() function
ShinyItemAnalysis offers both methods.
Hladka, A., & Martinkova, P. (2019). difNLR: DIF and DDF detection by non-linear
regression models. R package version 1.3.0.
Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an
R package for the detection of dichotomous differential item functioning. Behavior Research
Methods, 42(3), 847-862.
IRT-based methods
Methods based on IRT models:
• Lord’s (Wald’s) test: Difference between parameters
• Raju’s test: Area between the curves (difference or absolute difference)
• Likelihood ratio test
Lord’s (Wald) test
DIF detection based on testing the difference in parameters between the reference and focal
group (Lord, 1980)
$$\frac{(b_{jR} - b_{jF})^2}{\mathrm{var}(b_{jR}) + \mathrm{var}(b_{jF})} \xrightarrow{D} \chi^2_1$$
Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Hillsdale, NJ: Lawrence Erlbaum Associates.
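For a single difficulty parameter the statistic is a squared standardized difference. The Python sketch below uses hypothetical estimates and standard errors and assumes the two groups' estimates are already on a common scale; the p-value uses P(χ²₁ > x) = erfc(√(x/2)).

```python
from math import erfc, sqrt


def lord_test(b_ref, b_foc, var_ref, var_foc):
    """Lord's (Wald) statistic for a difference in difficulty and its
    chi-square (1 df) p-value."""
    stat = (b_ref - b_foc) ** 2 / (var_ref + var_foc)
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical difficulty estimates and standard errors for one item
b_ref, se_ref = 0.2, 0.15
b_foc, se_foc = 0.8, 0.20

stat, p_value = lord_test(b_ref, b_foc, se_ref ** 2, se_foc ** 2)
print(f"Lord's statistic = {stat:.2f}, p = {p_value:.4f}")
```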
Raju’s test
Method based on area between two characteristic curves
(Raju, 1988, 1990)
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika,
53(4), 495-502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas
between two item response functions. Applied Psychological Measurement, 14(2), 197-207.
Raju’s test - unsigned area between curves
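Unsigned area (UA) between two item characteristic curves:
$$UA = \int_{-\infty}^{\infty} \left| P(Y = 1 | \theta, G = R) - P(Y = 1 | \theta, G = F) \right| \, d\theta$$
$$= \begin{cases} |b_R - b_F| & \text{1PL} \\[1ex] \left| \dfrac{2(a_R - a_F)}{a_R a_F} \log\left(1 + \exp\left(\dfrac{a_R a_F (b_R - b_F)}{a_R - a_F}\right)\right) - (b_R - b_F) \right| & \text{2PL} \\[2ex] (1 - c) \left| \dfrac{2(a_R - a_F)}{a_R a_F} \log\left(1 + \exp\left(\dfrac{a_R a_F (b_R - b_F)}{a_R - a_F}\right)\right) - (b_R - b_F) \right| & \text{3PL (common guessing parameter } c) \end{cases}$$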
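The closed form can be checked against a direct numerical integration of the absolute difference between the two curves. The Python sketch below does this with a trapezoidal rule; the parameter values, integration limits, and function names are assumptions made for the illustration.

```python
from math import exp, log1p


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return x + log1p(exp(-x)) if x > 0 else log1p(exp(x))


def raju_unsigned_area(a_r, b_r, a_f, b_f, c=0.0):
    """Closed-form unsigned area between two logistic curves that share
    the guessing parameter c."""
    if a_r == a_f:  # equal discriminations: curves never cross
        return (1.0 - c) * abs(b_r - b_f)
    diff_a = a_r - a_f
    prod_a = a_r * a_f
    inner = 2.0 * diff_a / prod_a * softplus(prod_a * (b_r - b_f) / diff_a)
    return (1.0 - c) * abs(inner - (b_r - b_f))


def icc(theta, a, b, c=0.0):
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))


def numeric_unsigned_area(a_r, b_r, a_f, b_f, c=0.0,
                          lower=-40.0, upper=40.0, steps=40000):
    """Trapezoidal approximation of the integral of |P_R - P_F|."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lower + i * h
        gap = abs(icc(theta, a_r, b_r, c) - icc(theta, a_f, b_f, c))
        total += gap if 0 < i < steps else gap / 2.0
    return total * h


# Hypothetical 2PL parameters for the reference and focal group
ua_formula = raju_unsigned_area(1.2, 0.5, 0.8, 0.0)
ua_numeric = numeric_unsigned_area(1.2, 0.5, 0.8, 0.0)
print(f"closed form: {ua_formula:.4f}, numerical: {ua_numeric:.4f}")
```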
Likelihood ratio test
DIF detection based on likelihood ratio test of submodel
(Thissen, Steinberg, & Wainer, 1988, 1993)
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study
of group difference in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp.
147-169). Lawrence Erlbaum Associates, Inc.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning
using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.),
Differential item functioning (pp. 67-113). Lawrence Erlbaum Associates, Inc.
Pros and cons
Pros:
+ Applicable for 1PL-4PL IRT models
+ More precise estimate of latent trait
Cons:
− Computationally demanding
− Needs large sample size
Conclusion
DIF/DDF analysis should be used routinely in test development
• to check for fairness with respect to groups
• to inform teaching
DIF detection methods in binary items
• Delta-Plot
• Mantel-Haenszel test
• Logistic regression
• Generalized logistic regression
• IRT-based methods: Lord's (Wald) test, Raju's test, LRT
References [1]
Amtmann, D., Cook, K. F., Jensen, M. P., Chen, W.-H., Choi, S., Revicki, D., ... et al.
(2010). Development of a PROMIS item bank to measure pain interference. Pain,
150(1), 173–182.
Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude.
Journal of Educational Measurement, 10(2), 95–105.
Cramp, A., & McDougall, J. (2018). Doing theory on education: Using popular culture to
explore key debates. Routledge.
Drabinova, A., & Martinkova, P. (2017). Detection of differential item functioning with
nonlinear regression: A non-IRT approach accounting for guessing. Journal of
Educational Measurement, 54(4), 498–517.
Hladka, A., & Martinkova, P. (2019). difNLR: DIF and DDF detection by non-linear
regression models. [Computer software manual]. Retrieved from
https://CRAN.R-project.org/package=difNLR (R package version 1.3.0)
Holland, P. W., & Thayer, D. T. (1985). An alternate definition of the ETS delta scale of
item difficulty. ETS Research Report Series, 1985(2), i–10.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect
size measure with the logistic regression procedure for DIF detection. Applied
Measurement in Education, 14(4), 329–349.
Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Hillsdale, NJ: Lawrence Erlbaum Associates.
References [2]
Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an
R package for the detection of dichotomous differential item functioning. Behavior
Research Methods, 42(3), 847–862.
Magis, D., & Facon, B. (2014). deltaPlotR: An R package for differential item functioning
analysis with Angoff’s Delta plot. Journal of Statistical Software, 59(1), 1–19.
Martiniello, M., & Wolf, M. (2012). Exploring ELLs’ understanding of word problems in
mathematics assessments: The role of text complexity and student background
knowledge. In S. Celedon-Pattichis & N. Ramirez (Eds.), Beyond good teaching:
Strategies that are imperative for English language learners in the mathematics
classroom. Reston, VA: NCTM.
Martinkova, P., Drabinova, A., Liaw, Y.-L., Sanders, E. A., McFarland, J. L., & Price, R. M.
(2017). Checking equity: Why differential item functioning analysis should be a
routine part of developing conceptual assessments. CBE—Life Sciences Education,
16(2), rm2.
Martinkova, P., Hladka, A., Leupen, S., Stepanek, L., & Kralıckova, M. (2019). Towards
better admission tests: Routinizing detailed validation of entrance exams in medical
education.
(Submitted)
Nagelkerke, N. J. (1991). A note on a general definition of the coefficient of
determination. Biometrika, 78(3), 691–692.
References [3]
Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., Cella, D., & Group,
P. C. (2011). Item banks for measuring emotional distress from the patient-reported
outcomes measurement information system (PROMIS®): Depression, anxiety, and
anger. Assessment, 18(3), 263–283.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53(4),
495–502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas
between two item response functions. Applied Psychological Measurement, 14(2),
197–207.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using
logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of
group difference in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity
(pp. 147–169). Lawrence Erlbaum Associates, Inc.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning
using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.),
Differential item functioning (pp. 67–113). Lawrence Erlbaum Associates, Inc.
Zumbo, B., & Thomas, D. (1997). A measure of effect size for a model-based approach for
studying DIF.
(Working paper)