1 9/23/2015 Patient Reported Outcomes Measurement Information System (PROMIS) Patient Reported...
-
Upload
arron-simmons -
Category
Documents
-
view
214 -
download
0
Transcript of 1 9/23/2015 Patient Reported Outcomes Measurement Information System (PROMIS) Patient Reported...
1 04/19/23
Patient Reported Outcomes Patient Reported Outcomes Measurement Measurement
Information System (PROMIS)Information System (PROMIS)National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) U01 grant
Ron D. Hays, Ph.D., UCLA Ron D. Hays, Ph.D., UCLA
September 29, 2006 September 29, 2006
GIM/HSR Research Seminar SeriesGIM/HSR Research Seminar Series
12:02pm-12:59 pm12:02pm-12:59 pm
2 04/19/23
NIH Director Elias A. Zerhouni, MD
“There is a pressing need to better quantify clinically important symptoms and outcomes that are now difficult to measure. Clinical outcome measures, such as x-rays and lab tests, have minimally immediate relevance to the day-to-day functioning of patients with chronic diseases such as arthritis, multiple sclerosis, and asthma, as well as chronic pain conditions.
Often, the best way patients can judge the effectiveness of treatments is by perceived changes in symptoms. One main goal of the PROMIS initiative is to develop a set of publicly available computerized adaptive tests for the clinical research community.”
3 04/19/23
NIH RM04-011: Dynamic Assessment of Patient-Reported Chronic Disease Outcomes
Translation arm of re-engineering clinical research enterprise
Chronic diseases and their treatment affect “quality of life”
Improved assessment of “subjective” clinical outcomes
- Self-reported symptoms and other health-related quality of life domains
4 04/19/23
ADL – Activities of Daily LivingIADL – Instrumental Activities of Daily LivingG – Global Item
Satisfaction
Satisfaction
Satisfaction
Satisfaction
Performance
G Symptoms
G Activities: Instrumental Activities of Daily Living
[IADL] (e.g. errands)
Other
Social Support
G Anxiety
G Anger/Aggression
G Depression
G Fatigue
G Substance Abuse
Positive Psychological Functioning
Negative Impacts of Illness
Subjective Well-Being (positive affect)
Positive Impacts of Illness
Meaning and Coherence (spirituality)
G Emotional Distress
Mastery and Control (self-efficacy)
Cognitive Function
G Central: neck and back (twisting, bending, etc)
G Lower Extremities: walking, arising, etc [mobility]
G Upper Extremities: grip, buttons, etc [dexterity]
G Function/Disability
G Physical Health
G Mental Health
G Social Health
Satisfaction
G Health
G Pain
Patient-Reported Outcomes (PROs)Preliminary PROMIS Domains (light shade)
G Role Participation
1108054
5 04/19/23
Initial PROMIS Domains
Physical functioning (4)
Pain (3)
Fatigue (2)
Social/role participation (2)
Emotional distress
Anxiety
Depression
Anger
Alcohol abuse
6 04/19/23
Objectives
Develop and test a large bank of items measuring health-related quality of life
Create a publicly available, adaptable and sustainable system allowing clinical researchers access to a
* common item repository
* computer adaptive testing (CAT) platform
for efficient assessment across a range of chronic diseases
7 04/19/23
What would youlike to measure in your study?
Domain definitions
item content review
item characteristics
Click below for:
or select a box on the right and proceed
http://www.nihpromis.org
pain
emotional distress
physicalfunction
fatigue
social roleparticipation
8 04/19/23
Collaborative Agreement
Steering Committee
6 Primary Research Sites
Statistical Coordinating Center
Scientific Advisory Board
9 04/19/23
PROMIS Network
UNC –Chapel Hill ● UNC –Chapel Hill ●
● Duke University● Duke University
● Stanford University● Stanford University
Stony Brook University
●
Stony Brook University
●● University of Pittsburgh● University of Pittsburgh
● University of Washington● University of Washington
CORE ♥CORE ♥ ● NIHNIH● NIHNIH
10 04/19/23
Primary Research Sites
Duke (Kevin Weinfurt, evaluation committee, participation and data quality committee, use in clinical trials, cancer supplement)
Stanford (Jim Fries, physical function, domain hierarchy)
Stony Brook (Arthur Stone, fatigue, ecological momentary assessment)
UNC (Darren DeWalt, social/role participation, pediatrics, literacy)
University of Pittsburgh (Paul Pilkonis, emotional distress, sleep)
University of Washington (Dagmar Amtmann, pain, universal access)
11 04/19/23
Statistical Coordinating Center
Northwestern (David Cella and cast)
UCLA (Hays, Liu, Reise, Spritzer, Morales)
Other consultants (e.g., Dennis Revicki)
Westat
12 04/19/23
What PROMIS Promises
P recisionP recisionR epositoryR epository
Outcome toolsOutcome tools
M ethodologiesM ethodologies
I nterpretabilityI nterpretability
S oftwareS oftware
13 04/19/23
Precision
Fewer items needed for equal precision
Making assessment briefer
Error is understood at the individual level
Enabling individual assessment
14 04/19/23
Item Bank(IRT-calibrated items)
ItemResponseTheory(IRT) 0.0
0.5
1.0
1.5
2.0
2.5
-3 -2 -1 0 1 2 3
Theta
Info
rma
tio
n
0.0
0.2
0.4
0.6
0.8
1.0
-3 -2 -1 0 1 2 3
Theta
Pro
ba
bil
ity o
f R
esp
on
se
Short FormInstruments
CAT
Items fromInstrument
A
Item Pool
Items fromInstrument
B
Items fromInstrument
C
NewItems
SecondaryData Analysis
CognitiveTesting
FocusGroups
Content ExpertReview
Questionnaireadministered to largerepresentative sample
15 04/19/23
0%10%20%30%40%50%60%70%80%90%
100%
-4
-3.5 -3
-2.5 -2
-1.5 -1
-0.5 0
0.5 1
1.5 2
2.5 3
3.5 4
Level of Physical Functioning
Pro
ba
bil
ity
of
"Ye
s"
Re
sp
on
se
Item 1 (Difficulty = -1) Item 3 (Difficulty = 1)
Item Characteristic Curves (able to climb flight of stairs versus run a mile)
16 04/19/23
Item Characteristic Curves(2-Parameter Model)
17 04/19/23
Item Banks
no pain
mild pain
moderate pain
severe pain extreme pain
Pain Item Bank
Item1
Item2
Item3
Item4
Item5
Item6
Item7
Item8
Item9
Itemn
An item bank is comprised of a large collection of items measuring a single domain (e.g., pain).
18 04/19/23
Fatigue Measure and Standard Error Comparision by Test Length
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
-4 -3 -2 -1 0 1 2 3 4
Fatigue Measure
Sta
nd
ard
Err
or
5 Item CAT 10 Item CAT 72 Item Bank 6 Item SF 13 Item Scale
Fatigue Measure and Standard Fatigue Measure and Standard Error Comparison by Test Error Comparison by Test
LengthLength
19 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
Emotional Distress
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel nervous?
All of the time
Most of the time
Little of the time
Some of the time
None of the time
20 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
Emotional Distress
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel nervous?
Some of the time
21 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
Emotional Distress
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel nervous?
Some of the time
22 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
Emotional Distress
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel hopeless?
All of the time
Most of the time
Little of the time
Some of the time
None of the time
23 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
Emotional Distress
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel hopeless?
Some of the time
24 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
Emotional Distress
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel worthless?
All of the time
Most of the time
Little of the time
Some of the time
None of the time
25 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
Emotional Distress
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel worthless?
Little of the time
26 04/19/23
Item Bank(IRT-calibrated emotional distress items)
-3 -2 -1 0 1 2 3
Severe high moderate low very low
0.0
0.2
0.4
0.6
0.8
1.0
-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
How often did you feel worthless?
Little of the time
Target in onemotional
distress score
27 04/19/23
Repository
28 04/19/23
Repository: PROMIS Item Library
Literature searches and investigator contributions to the PROMIS domains
Relational database of more than 7,000 items
Catalog characteristics of items including
Context
Stem
Response options
Time frame
Instrument of origin (if applicable) Intellectual property status Track modifications to items
29 04/19/23
Outcome Tools
30 04/19/23
Item LibraryItem LibraryItem LibraryItem Library
CATCATCATCAT Build-a-PROBuild-a-PROBuild-a-PROBuild-a-PROPick-a-PROPick-a-PROPick-a-PROPick-a-PRO
31 04/19/23
Methodologies
32 04/19/23
Methodologies
Qualitative Item Review
Expert item review of 6,871 items
26 focus groups; over 120 patients interviewed; over 700 surveys
Analysis Plan
Classical test theory and IRT analyses
Sampling Plan
11,500 people; 951 items; minimum n = 500/item
33 04/19/23
Interpretation
34 04/19/23
LowLowLowLow HighHighHighHigh
PersonPerson Fatigue Score Fatigue ScorePersonPerson Fatigue Score Fatigue Score
Interpretation
Q Q QQ Q Q Q Q QQ Q Q Q Q QQ Q Q Q Q QQ Q Q Q Q Q Q Q Q LikelyLikely““I get tiredI get tiredwhen I runwhen I runa marathon”a marathon”
LikelyLikely““I get tiredI get tiredwhen I runwhen I runa marathon”a marathon”
UnlikelyUnlikely““I get tiredI get tiredwhen I getwhen I getout of a out of a chair” chair”
UnlikelyUnlikely““I get tiredI get tiredwhen I getwhen I getout of a out of a chair” chair” ItemItem Location LocationItemItem Location Location
Q Q QQ Q QQ Q QQ Q Q Q Q QQ Q Q
Q Q QQ Q Q Q Q QQ Q Q Q Q QQ Q QQ Q QQ Q Q
Q Q QQ Q QQ Q QQ Q QQ Q QQ Q Q
Q Q Q Q
Q Q Q Q
Q Q Q Q
35 04/19/23
LowLowLowLow HighHighHighHigh
PRO Bank PRO Bank PersonPerson Score ScorePRO Bank PRO Bank PersonPerson Score Score
Interpretation Aids
30 40 50 60 70
M = 50, SD = 10
T = (z * 10) + 50
36 04/19/23
LowLowLowLow HighHighHighHigh
Example of high fatigue
Fatigue Score=60Fatigue Score=60Fatigue Score=60Fatigue Score=60
30 40 50 60 70
This patient’s fatigue score is 60, significantly worse than average (50). People who score 60 on fatigue tend to answer questions as follows:
…”I have been too tired to climb one flight of stairs: VERY MUCH
…”I have had enough energy to go out with my family: A LITTLE BIT
Click here if you would like to see this patient’s individual answers
37 04/19/23
LowLowLowLow HighHighHighHigh
Example of low fatigue
Fatigue Score=40Fatigue Score=40Fatigue Score=40Fatigue Score=40
30 40 50 60 70
This patient’s fatigue score is 40, significantly better than average (50). People who
score 40 on fatigue tend to answer questions as follows:
…”I have been too tired to climb one flight of stairs: SOMEWHAT…”I have had enough energy to go out with my family: VERY MUCH
Click here if you would like to see this patient’s individual answers
38 04/19/23
Software
39 04/19/23
Software
Survey Construction Report Module
Clinical/PRO Data
Study Protocols
Coming 2007-2008 to www.nihpromis.org
Item Banks
Patients Clinicians
Item Library
Laptop PhonePDAWeb
Survey Module
Web Laptop PDA IVR
CAT CAT Pick-a-PROPick-a-PROBuild-a-PROBuild-a-PRO
Domains and itemDomains and item banks banks
40 04/19/23
Web-based administration: Emotional distress (Anger) item
41 04/19/23
Four possible administration formats
Automatic advance, not allowed to go back
Automatic advance, allowed to go back
Click to continue after response, not allowed to go back
Click to continue, allowed to go back
42 04/19/23
Evaluated administration format
806 participants in Polimetrix PollingPlace registry
56 items measuring the performance of social/role activities
Items rated on a 5-point frequency scale ranging from ”never” to “always”
56 items measuring satisfaction with social/role activities
Items rated on a 5-point extent scale ranging from “not at all” to “very much”
43 04/19/23
Analysis Plan
Examined differences in:
Time spent
Mean domain scores
Variance in scores
Reliability
44 04/19/23
Administration format conclusions
Use automatic advance rather than continue button
Use back button to guard against accidental key entry
Response time cost was minimal
No effect on missing data or scores
45 04/19/23
PROMIS Sampling
Administer a large number of items to a range of population subgroups (general population and clinical) to permit the estimation of item parameters for item banks in five health-related quality of life domains.
46 04/19/23
Number of items
1013 items
784 items: 14 item banks (56 items per bank) in 5 domains
167 legacy items
62 demographic Items
146-202 items administered to any one respondent
n = 11,500 overall (500 observations per item minimum)
47 04/19/23
Target Polimetrix General Population Demographics
GenderMale 50%
Female 50%
Age
18-29 = 20%
30-44 = 20%
45-59 = 20%
60-74 = 20%
75+ = 20%
Ethnicity (match the general population)
Black = 12.3%
Hispanic or Latino = 12.5%
Education Min 25% High school graduate or less
48 04/19/23
SamplesN
General population (Polimetrix) 7,500
Cancer (Duke, Polimetrix) 1000
Heart Disease (Duke, Polimetrix) 500
Rheumatoid arthritis (Stanford) 500
Osteoarthritis (Stanford, Polimetrix) 500
Psychiatric (Pittsburgh, Polimetrix) 500
Spinal Cord Injury (Polimetrix) 500
Chronic Obstructive Pulmonary Disease (Polimetrix)
500
TOTAL 11,500
49 04/19/23
Domain Sub-Domain Items Form Sample N
Emotional Distress
Anxiety 56A Gen. Pop. 1 500
Depression 56
Anger/Aggression 56B Gen. Pop. 2 500
Alcohol Abuse 56
Physical Function
Part I 56C Gen. Pop. 3 500
Part II 56
Part III 56 G Gen. Pop. 7 500
Fatigue Impact 56
D Gen. Pop. 4 500Experience 56
Social Role Impact 56
E Gen. Pop. 5 500Experience 56
Pain
Interference 56F Gen. Pop. 6 500
Quality 56
Behavior 56 G Gen. Pop. 7 (above)
Full Bank Administration
50 04/19/23
Item Number → 1 1 1 1 1 1 5 5 5
Sample Size ↓ 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 4 5 6
General Pop. 8 250 H H H H H H H
General Pop. 9 250 I I I I I I I
General Pop. 10 250 J J J J J J J
Heart Disease250 H H H H H H H
250 J J J J J J J
Cancer 250 I I I I I I I
250 K K K K K
Block Administration
51 04/19/23
Item Number → 1 1 1 1 1 1 5 5 5
Sample Size ↓ 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 4 5 6
General Pop. 8 250 H H H H H H H
General Pop. 9 250 I I I I I I I
General Pop. 10 250 J J J J J J J
Heart Disease250 H H H H H H H
250 J J J J J J J
Cancer 250 I I I I I I I
250 K K K K K
Between Banks Block Administration Strategy
Item 5 is administered in Block Forms H and I
to General Population Sample 8
as well as Heart Disease
and Cancer Samples
and General Population Sample 9
52 04/19/23
Types of AnalysesTypes of Analyses
• Classical Test Theory StatisticsClassical Test Theory Statistics
• IRT Model AssumptionsIRT Model Assumptions
• Model FitModel Fit
• Differential Item FunctioningDifferential Item Functioning
• Item CalibrationItem Calibration
53 04/19/23
Classical Test Theory StatisticsClassical Test Theory Statistics
• Out of rangeOut of range
• Item frequencies and distributionsItem frequencies and distributions
• Inter-item correlationsInter-item correlations
• Item-scale correlationsItem-scale correlations
• Internal consistency reliabilityInternal consistency reliability
54 04/19/23
IRT Model AssumptionsIRT Model Assumptions
• (Uni)dimensionality(Uni)dimensionality
• Local independenceLocal independence
• MonotonicityMonotonicity
55 04/19/23
Sufficient UnidimensionalitySufficient Unidimensionality
• Confirmatory factor modelsConfirmatory factor models
One factorOne factor
Bifactor (general and group factors)Bifactor (general and group factors)
56 04/19/23
Local IndependenceLocal Independence
• After controlling for dominant factor(s), item After controlling for dominant factor(s), item pairs should not be associated.pairs should not be associated.
Look at residual correlations (> 0.20)Look at residual correlations (> 0.20)
57 04/19/23
MonotonicityMonotonicity
• Probability of selecting a response category Probability of selecting a response category indicative of better health should increase as indicative of better health should increase as underlying health increases.underlying health increases.
• Item response function graphs withItem response function graphs with
y-axis: proportion positive for item stepy-axis: proportion positive for item step
x-axis: raw scale score minus item scorex-axis: raw scale score minus item score
58 04/19/23
59 04/19/23
Category Response Curves for Category Response Curves for Samejima’s Graded Response ModelSamejima’s Graded Response Model
0.0
0.2
0.4
0.6
0.8
1.0
-3 -2 -1 0 1 2 3
Theta (q)
P(X
=k|q
)
P (X = 3|q)P (X = 1|q)
P (X = 2|q)
P (X = 4|q)
60 04/19/23
Model FitModel Fit
• Compare observed and expected response Compare observed and expected response frequencies by item and response categoryfrequencies by item and response category
• Items that do not fit and less discriminating Items that do not fit and less discriminating items identified and reviewed by content items identified and reviewed by content expertsexperts
61 04/19/23
Differential Item FunctioningDifferential Item Functioning
• Uniform DIF Uniform DIF
Threshold parameterThreshold parameter
• Non-uniform DIF Non-uniform DIF
Discrimination parameter Discrimination parameter
• Gender, race/ethnicity, age, education, Gender, race/ethnicity, age, education, diseasedisease
62 04/19/23
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 4
Trait level
Pro
ba
bili
ty o
f "Y
es
" R
es
po
ns
e
DIF – Location(Item 1)
DIF – Slope(Item 2)
Hispanics
Whites
Whites
Hispanics
Dichotomous Items Showing DIF(2-Parameter Model)
63 04/19/23
Item CalibrationItem Calibration
• Item parameters (threshold, discrimination)Item parameters (threshold, discrimination)
• Mean differences for studied disease groupsMean differences for studied disease groups
64 04/19/23
Questions?
Public website: http://www.nihpromis.org/
Peer-reviewed manuscripts, e.g.:
Hays, R. D. et al. (in press). Item response theory analyses of physical functioning items in the Medical Outcomes Study. Medical Care.
Reeve, B. B., et al. (submitted). Psychometric evaluation and calibration of health-related quality of life items banks: Plans for the Patient-Reported Outcome Measurement Information System (PROMIS)
65 04/19/23
Acknowledgements
“Slides” in this presentation were lifted or adapted from PROMIS presentations by David Cella, Richard Gershon, Bryce Reeve, and Jim Fries.
66 04/19/23
Appendices
Pre-Application Meeting for the RFA-RM-04-011: Dynamic Assessment of Patient-Reported Chronic Disease OutcomesMonday, January 26, 2004Deborah N. Ader, Ph.D. and Lawrence J. Fine, M.D., Dr.PHTotal Running Time: 02:40:08
http://videocast.nih.gov/PastEvents.asp?c=4&s=151