1
Locating and Assessing the Usefulness of Health Measures for Health Disparities Research
Anita L. Stewart, Ph.D.
University of California, San Francisco
Clinical Research with Diverse Communities
EPI 222, Spring
April 26, 2005
2
Outline
Locating measures
Basic psychometric properties
Rationale for multi-item measures
Additional measurement considerations in health disparities research
Steps in selecting measures for your study
4
Need Measures with Good Psychometric Properties
Measure assesses concept of interest
Low levels of missing data
Good variability
Evidence of reliability
Evidence of validity
Responsive to change (for interventions)
5
Inappropriate Measures can Result in:
Conceptual inadequacy
– Measuring wrong concept for your study
Poor data quality (e.g., missing data)
Poor variability
Poor reliability and validity
Inability to detect true associations among variables
– e.g., no measured change in outcome when change occurred
6
Good Variability
All (or nearly all) scale levels are represented
Distribution approximates bell-shaped normal
7
Indicators of Variability
Range of scores (possible, observed)
Mean, median, mode
Standard deviation (standard error)
Skewness, kurtosis
% at floor (lowest score)
% at ceiling (highest score)
Inter-quartile range
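The indicators above can be computed with a few lines of Python. This is a minimal sketch using only the standard library; the function name `variability_summary` and the example scores are made up for illustration, and kurtosis and the standard error are left out for brevity.

```python
import statistics


def variability_summary(scores, possible_min, possible_max):
    """Return basic variability indicators for a list of scale scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    pop_sd = statistics.pstdev(scores)
    # Quartile cut points (default "exclusive" method of statistics.quantiles)
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    # Moment-based skewness without small-sample correction (an assumption;
    # statistical packages often apply a correction)
    skewness = (
        sum((x - mean) ** 3 for x in scores) / (n * pop_sd ** 3) if pop_sd else 0.0
    )
    return {
        "possible_range": (possible_min, possible_max),
        "observed_range": (min(scores), max(scores)),
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 denominator)
        "skewness": skewness,
        "pct_floor": 100 * sum(x == possible_min for x in scores) / n,
        "pct_ceiling": 100 * sum(x == possible_max for x in scores) / n,
        "iqr": q3 - q1,
    }


# Made-up scores on a 2-10 scale, for illustration only
example_scores = [2, 4, 5, 6, 6, 7, 8, 8, 9, 10, 10, 10]
summary = variability_summary(example_scores, possible_min=2, possible_max=10)
```

A high percentage at the ceiling together with negative skewness would suggest the measure has little room to show improvement in this sample.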
8
Reliability
Extent to which an observed score is free of random error
Population-specific; reliability increases with:
– sample size
– variability in scores (dispersion)
– a person’s level on the scale
9
Reliability Coefficient
Typically ranges from .00 - 1.00
Higher scores indicate better reliability
Types of reliability tests
– Internal-consistency
– Test-retest
– Inter-rater
– Intra-rater
10
Internal Consistency Reliability: Cronbach’s Alpha
Requires multiple items supposedly measuring same construct to calculate
Extent to which all items measure the same construct (same latent variable)
Internal consistency reliability is a function of:
– Number of items
– Average correlation among items
– Variability of items in your sample
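As a rough illustration, Cronbach's alpha can be computed from the item variances and the variance of the summed score: alpha = k/(k-1) × (1 − Σ item variances / variance of total), where k is the number of items. The sketch below assumes complete data (no missing responses) and items already scored in the same direction; the function name and the data are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_variances = [statistics.variance(scores) for scores in items]
    # Summed scale score for each respondent
    totals = [sum(responses) for responses in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses from six respondents to three items
item_1 = [1, 2, 3, 4, 5, 3]
item_2 = [2, 2, 3, 5, 5, 3]
item_3 = [1, 3, 3, 4, 4, 2]
alpha = cronbach_alpha([item_1, item_2, item_3])
```

With these made-up responses the items move closely together, so alpha comes out high; in a real sample it should be read against the minimum standards for the intended use.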
11
Minimum Standards for Internal Consistency Reliability
For group comparisons (e.g., regression, correlational analyses)
– .70 or above is minimum (Nunnally, 1978)
– .80 is optimal
– above .90 is unnecessary
For individual assessment (e.g., treatment decisions)
– .90 or above (.95) is preferred (Nunnally, 1978)
12
Reliable Scale?
NO!
There is no such thing as a “reliable” scale
We only have accumulated “evidence” of reliability in a variety of populations in which it has been tested
13
Validity
Does a measure (or instrument) measure what it is supposed to measure?
And… Does a measure NOT measure what it is NOT supposed to measure?
14
Validation of Measures is an Iterative, Lengthy Process
Validity is not a property of the measure
– validity is a property of a measure for a particular purpose and sample
– validation studies for one purpose and sample may not serve another purpose or sample
Accumulation of evidence:
– Different samples
– Longitudinal designs
15
Three Major Forms of Measurement Validity
Content
Criterion
Construct
16
Construct Validity Basics
A process of answering the following questions:
What is the hypothesis?
What are the results?
Do the results support (confirm) the hypothesis?
17
Construct Validity: NOTE
Sometimes the hypothesis is that the measure will NOT be correlated with certain other measures, or will be less correlated with some than with others
THUS, observing a low or non-significant correlation can confirm construct validity
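A small sketch of this logic: state the expected pattern of correlations first, then check whether the observed correlations match it. The data below are made up, and the names `new_scale`, `related_measure`, and `unrelated_measure` are hypothetical placeholders, not measures from the lecture.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up scores for six respondents, for illustration only
new_scale = [2, 4, 5, 7, 9, 10]
related_measure = [1, 3, 6, 6, 8, 10]    # hypothesized to correlate highly
unrelated_measure = [7, 9, 6, 10, 6, 8]  # hypothesized NOT to correlate

r_convergent = pearson_r(new_scale, related_measure)
r_discriminant = pearson_r(new_scale, unrelated_measure)

# Hypothesis stated in advance: the scale correlates more strongly with
# the related measure than with the unrelated one
hypothesis_supported = r_convergent > abs(r_discriminant)
```

Here the near-zero correlation with the unrelated measure is evidence for construct validity, because it is what the hypothesis predicted.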
18
Outline
Locating measures
Basic psychometric properties
Rationale for multi-item measures
Additional measurement considerations in health disparities research
Steps in selecting measures for your study
19
Single- and Multi-Item Measures
A single-item measure consists of only one item
Response choices are interpretable
Example: How would you rate your health?
1 - Excellent
2 - Very good
3 - Good
4 - Fair
5 - Poor
20
Multi-Item Measures or Scales
Multi-item measures are created by combining two or more items into an overall measure or scale score
Summated score, scale score
– A score in which multiple items are “summed” or combined
21
Example of a 2-item Measure or Scale
How much of the time .... tired?
1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time
How much of the time…. full of energy?
1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time
22
Step 1: Reverse One Item So They Are All in the Same Direction
How much of the time .... tired?
1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time
How much of the time…. full of energy?
1=5 All of the time
2=4 Most of the time
3=3 Some of the time
4=2 A little of the time
5=1 None of the time
Reverse “energy” item so high score = more energy
24
Step 2: Sum the Two Items
How much of the time .... tired?
1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time
How much of the time…. full of energy?
1=5 All of the time
2=4 Most of the time
3=3 Some of the time
4=2 A little of the time
5=1 None of the time
Highest score = 10 (tired none of the time, full of energy all of the time)
25
Step 2: Sum the Two Items
How much of the time .... tired?
1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time
How much of the time…. full of energy?
1=5 All of the time
2=4 Most of the time
3=3 Some of the time
4=2 A little of the time
5=1 None of the time
Lowest score = 2 (tired all of the time, full of energy none of the time)
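Steps 1 and 2 can be sketched in Python as follows. This is only an illustration of the reverse-then-sum logic for the two items shown; the function names `reverse_score` and `energy_fatigue_score` are hypothetical and this is not an official scoring algorithm for any published instrument.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # response choices run from 1 to 5


def reverse_score(response, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a response so that 1 becomes 5, 2 becomes 4, and so on."""
    return low + high - response


def energy_fatigue_score(tired, full_of_energy):
    """Sum the two items so that a higher score means more energy.

    `tired` is used as answered (5 = none of the time); `full_of_energy`
    is reversed first (1 = all of the time becomes 5).
    """
    return tired + reverse_score(full_of_energy)


highest = energy_fatigue_score(tired=5, full_of_energy=1)  # tired none, energy all
lowest = energy_fatigue_score(tired=1, full_of_energy=5)   # tired all, energy none
```

Writing the reversal as `low + high - response` keeps the rule correct if the number of response choices changes.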
26
Advantages of Multi-Item Measures
More scale values (enhances sensitivity)
– Moved from 2 items with 1-5 levels to 1 scale with 9 levels (2 – 10)
Improves score distribution (more normal)
Reduces number of variables needed to measure one concept
Improves reliability (reduces random error)
Can estimate a score if some items are missing
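One common way to estimate a score when some items are missing is to prorate: take the mean of the answered items and multiply by the total number of items. The sketch below assumes a rule that at least half of the items must be answered; that threshold and the function name `prorated_sum` are assumptions for illustration, so check the scoring manual of the measure you actually use.

```python
def prorated_sum(responses, min_answered_fraction=0.5):
    """Estimate a summated score when some items are missing (None).

    Items must already be reverse-scored where needed. Returns None when
    too few items were answered to estimate a score.
    """
    answered = [r for r in responses if r is not None]
    if not responses or len(answered) / len(responses) < min_answered_fraction:
        return None
    mean_answered = sum(answered) / len(answered)
    return mean_answered * len(responses)


complete = prorated_sum([3, 4])                # nothing missing
estimated = prorated_sum([4, None, 5, 3])      # one of four items missing
too_sparse = prorated_sum([4, None, None, None])
```

A single-item measure offers no such fallback: if the one item is skipped, the score is simply missing.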
27
Outline
Locating measures
Basic psychometric properties
Rationale for multi-item measures
Additional measurement considerations in health disparities research
Steps in selecting measures for your study
28
Additional Measurement Issues: Health Disparities Research
Measurement adequacy and equivalence in diverse groups
29
Group Comparisons Are Even More Problematic
Health disparities studies involve comparing mean levels of health
Requires conceptual equivalence
Also, if psychometric properties are not comparable across groups…
– potential true differences may be obscured
– observed group differences may be inaccurate
30
Why Not Use Culture-Specific Measures?
Measurement goal is to identify measures that can be used across all groups, yet maintain sensitivity to diversity and have minimal bias
Most health disparities studies require comparing mean scores across diverse groups
– need comparable measures
31
Issues Concerning Group Comparisons
Disparities in observed scores can be due to
– culturally- or group-mediated differences in true score (true differences)
-- OR --
– bias: systematic differences between group observed scores not attributable to true scores
32
Bias - A Special Concern
Measurement bias may make group comparisons invalid
Bias can be due to group differences in:
– the meaning of concepts or items
– the extent to which measures represent concepts
– cognitive processes of responding
– appropriateness of methods
33
Psychometric Adequacy in One Group
Conceptual
– Adequacy in 1 group: concept meaningful within one group
– Equivalence across groups: concept equivalent across groups
Psychometric
– Adequacy in 1 group: psychometric properties meet minimal standards within one group
– Equivalence across groups: psychometric properties invariant (equivalent) across groups
34
Psychometric Adequacy in a Diverse Group
Psychometric properties meet minimal standards
– Adequate reliability/reproducibility
– Confirmation of theoretically-based factor structure
– Construct validity evidence
– Responsiveness to change evidence
35
Psychometric Adequacy in a Diverse Group (cont.)
Measures have similar measurement properties in a diverse group as in original mainstream groups on which the measures were developed, i.e., similar
– reliability
– factor structure
– construct validity
– responsiveness to change
36
Psychometric Equivalence
Conceptual
– Adequacy in 1 group: concept meaningful within one group
– Equivalence across groups: concept equivalent across groups
Psychometric
– Adequacy in 1 group: psychometric properties meet minimal standards within one group
– Equivalence across groups: psychometric properties invariant (equivalent) across groups
37
Equivalence of Factor Structure: Psychometric Invariance
Psychometric invariance (equivalence)
– Important properties of theoretically-based factor structure of measurement model do not vary across groups
38
Methods for Assessing Equivalence of Factor Structure
Exploratory factor analysis
– Two or more groups
– Subjective comparison of factor structure
Confirmatory factor analysis
– Two or more groups
– Test for equivalence of factor structure
» test fit of theoretical model to data
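A formal test of equivalence needs multi-group confirmatory factor analysis in dedicated software. As a lighter supplement to the subjective comparison of exploratory results, one option (not from the lecture itself) is Tucker's congruence coefficient, which summarizes how similar two sets of factor loadings are. The loadings below are made up, and the function name is hypothetical.

```python
import math


def congruence_coefficient(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors.

    Each list holds the loadings of the same items, in the same order,
    on the corresponding factor in two groups.
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("Both groups need loadings for the same items")
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = math.sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    return numerator / denominator


# Made-up loadings of four items on one factor in two groups
group_1 = [0.72, 0.68, 0.80, 0.65]
group_2 = [0.70, 0.61, 0.77, 0.30]  # last item loads much lower here

phi = congruence_coefficient(group_1, group_2)
```

Values close to 1 indicate a similar loading pattern. The coefficient only describes similarity; it does not replace a test of model fit across groups.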
39
Outline
Locating measures
Basic psychometric properties
Rationale for multi-item measures
Additional measurement considerations in health disparities research
Steps in selecting measures for your study
40
The Problem
You are beginning a study
You know the concepts (variables) of interest
Question: Which measure of ________ should I use?
» A popular measure
» One that a colleague used successfully
» Create your own
41
Basic Steps in Selecting Appropriate Measures
1. Specify context (research question, target group)
2. Define concept for your study
3. Review potential measures for:
a) conceptual match to your definition
b) adequate psychometric properties in your target group
5. Pretest potential measures in your target group
6. Choose best ones based on pretest results OR
7. Adapt if necessary to address problems
42
1. Specify Context
A. Research question and how concept fits research
B. Nature of target population
C. Practical constraints
43
1A. Context: How Concept Fits Research Question
State problem or question being addressed
Describe purpose of measure
– Evaluate intervention (outcome)
– Describe population
– Covariate
– Independent variable
44
Outcome Measures of Interventions: Entire Study Depends on These
Requires special attention to selecting the best measure that …
– taps content areas that the intervention is likely to change
– has good variability at baseline, room to improve
– has excellent reliability and validity
– is appropriate and acceptable to target population
– is sensitive to change
45
Main Dependent Variable of Non-intervention Studies
Pay special attention to selecting the best measure that …
– taps full content of concept
– has good variability (variance to predict)
– has evidence of reliability and validity
– is appropriate and acceptable to target population
46
1B. Context: Nature of Population
Describe known characteristics of your target population
– Age (range, mean)
– Range of health states
»chronic conditions, frailty
– SES (e.g. educational level)
– % with literacy problems
– Racial/ethnic and language diversity
47
1C. Context: Practical Constraints
Time frame for completing study
Personnel available
– Research assistants, interviewers
Other costs
– Data entry, mailings, phone, coding
Preferred method of administration
Acceptable respondent burden
48
Step 2: Define Each Concept For Your Study
Define each concept from your perspective, taking into account
– Your study questions
– Your target population
For outcome concepts:
– Describe how the intervention or independent variables might affect it
– Describe specific types of changes you expect
49
Define Each Concept (cont.)
Include response dimension in definition (what is it about the concept you are interested in?)
– Frequency
– Intensity
– Proportion of time
– Whether they have condition/symptom
50
Example: Defining Pain in Your Study
Context: clinical intervention to minimize stomach pain
Define exactly how you expect to reduce pain:
– eliminate pain completely?
– reduce severity of pain when it occurs?
– reduce frequency of pain?
– change quality of pain?
Concept you aim to improve varies across these
51
Step 3. Review Potential Measures
Identify candidate measures for all domains or concepts in your framework
For health outcomes:
– Generic or condition-specific profiles of multiple domains OR measures of single domains
Redundancy OK for now
Do NOT develop your own questions unless it is absolutely necessary
52
Locating Specific Measures
Reference databases
– MEDLINE, PubMed, PsycINFO, others
Compendia of measures
– Books that compile and review various measures
Web is fast becoming the best resource
– Specific measures
– Web resources from measurement core
Identify researchers doing work in a field and contact them for their measures
53
Review Potential Measures for:
Conceptual appropriateness & relevance
– in your study
– in target group
Clear scoring rules
Psychometric adequacy in target group(s)
Practicality
Acceptability
– To respondents and interviewers
54
Conceptual Relevance
Example: you are interested in reports of perceived discrimination in the health care setting
In reviewing measures of discrimination, most are about
– Discrimination over the lifecourse
– Discrimination in various life settings (work, school)
Not relevant for your purpose
55
Psychometric Adequacy for Your Study
In samples similar to yours:
– good variability (e.g., no floor or ceiling effects)
– low percent of missing data
– good reliability
– good validity
As an outcome for your planned intervention
– responsiveness, sensitivity to change in similar population
– able to detect expected magnitude of change
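When item-level data from a similar sample (or your own pretest) are available, the percent of missing data per item is easy to check. A minimal sketch follows; the 10% cutoff, the function names, and the responses are all hypothetical and only illustrate the screening idea.

```python
def percent_missing(responses_by_item):
    """Map each item name to the percent of respondents with no answer (None)."""
    return {
        item: 100 * sum(value is None for value in values) / len(values)
        for item, values in responses_by_item.items()
    }


def flag_items(responses_by_item, max_pct_missing=10.0):
    """List items whose missing-data rate exceeds a chosen threshold."""
    rates = percent_missing(responses_by_item)
    return [item for item, pct in rates.items() if pct > max_pct_missing]


# Made-up pretest responses from ten respondents; None marks a skipped item
pretest = {
    "tired": [1, 2, 3, 4, 5, 3, 2, 4, 1, 5],
    "full_of_energy": [2, None, 3, None, 4, 1, None, 5, 2, 3],
}

missing_rates = percent_missing(pretest)
items_to_review = flag_items(pretest)
```

Items that many respondents skip are candidates for cognitive interviewing, since skipping often signals a confusing or distressing question.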
56
Limited Data on Measurement Properties of Many Measures
Not easy to find this information
Many studies do not report any psychometric properties
– Assume the properties from original study carry over
57
Limited Data on Measurement Properties of Many Measures (cont.)
Especially in diverse populations:
– Few studies test measures across diverse groups
– Even when diverse groups are included in research
» sample sizes usually too small to conduct measurement studies by subgroups
58
Review Measures for Practicality
Method of administration appropriate for your study
Scoring rules clearly documented, or computer scoring algorithm available
Measure available at cost you can afford
You are allowed to adapt it if necessary
Costs of administration within study resources
59
Practical Considerations
Once you have decided on the measures, you must think about:
• Obtaining permission
• Method of administration
• Data collection
• Scoring
• Availability of translations if needed
60
Practical - Scoring
Know ahead of time how you plan to score the items
– Count of “correct” answers?
– Sum Likert items into a summated scale?
Are scoring instructions or computer scoring programs available?
Can scoring programs be purchased from developers?
Do you have a scoring codebook?
61
Review Measures for Availability of Translations if Needed
If you need the questionnaire in another language, are there translations available?– Official (published and tested)
– Unofficial (by some other researcher)
62
Translation Availability
Is the measure available in the language of your target populations?
Yes
• Know the method of translation
• Assess adequacy or quality of translation
No
• Perform double translation
• Use bilingual, bicultural translators
63
Review Measures for Acceptability
Acceptability is the ease with which a measure can be used in your setting and population
Acceptability to target population
– respondent burden (length, time needed), distress
– burden for sickest, oldest, least educated
– culturally sensitive
Acceptability to interviewers
– interviewer burden
– do they like administering the questionnaire?
– amount of training needed
64
Respondent Burden
Diverse populations may have more difficulty with instruments, take longer to complete
Perceived burden
– a function of item difficulty, distress due to content, perceived value of survey, expectations of length
– is as important as actual burden
65
5. Choose Best Measures to Pretest in Your Target Population
Select best measures for all concepts in your conceptual framework
– existing instrument in its entirety
– subscales of relevant domains (e.g., only those that meet your needs)
66
Pretest
Pretesting essential for priority measures (e.g., outcomes)
Pretest is to identify:
– problems with method of administration
– unacceptable respondent burden
– problems with questions or response choices
» Hard to understand, complex, vague
– words and phrases that do not mean what you intended to target population
67
Types of Pretests
General pretest, small (N=10)
Cognitive interviewing (N=5-10 each group)
Large pretest (N=100)
– test measurement properties prior to major study
68
General Pretest (Small): Debriefing Pretest
Goal
– Find out how well subjects do with the procedures
– Estimate time needed to complete instrument
– Identify serious problems
Procedures
– Subjects answer entire questionnaire
– At end, debrief
– Close to true task
69
Debriefing Questions After Administration of Survey
Ask respondents:
Were any questions confusing?
Which words were hard to understand?
Which questions were difficult to answer or caused distress?
Was the questionnaire too long?
Were the instructions confusing?
70
Problems with General Pretests
Respondents…
– often don’t understand the task
– don’t want to appear as if they didn’t understand
– have a hard time telling you anything was wrong
– find it easier to say everything was fine
71
Pretest Several Measures of Same Concept?
If you are unsure about which of several measures will be appropriate for your study
– pilot test all you are considering
– can use pilot test results to select best one
Saves time
– if you test only one measure and it has many problems, you have to repeat the entire process for the next candidate measure
72
Conduct Pretests in All Diverse Groups Being Included in Your Study
Important to recruit people from each of your target populations
– Won’t learn anything if you just recruit friends, persons easy to recruit
73
Cognitive Interviewing
In-depth interviews with individuals using open-ended probes to assess
– how items are interpreted
– adequacy of response choices
Typically 1.5 hr interview
74
Cognitive Interviewing Helps You Learn About the 4 Steps in Answering Questions
Interpret and understand the question
– as intended by the researchers
Retrieve the information
– various schemas used to access memory
Judgment formation - formulate an answer
– calculate or judge the correct information
Edit response - decide what to report
– is answer embarrassing, socially undesirable?
75
Summary
Selecting best measures is critical to validity of research
Very little published information on measurement properties in diverse groups
– New area of focus and policy attention
– Raises issues of conceptual and psychometric adequacy and equivalence
Pretesting is the most important thing you can do
76
Conclusions
Methods described here are “ideal”
– Impractical for most researchers
Apply these methods to your most important measures
– e.g., outcomes, key independent variables
Keep learning
– Good, appropriate measures remain the foundation of excellent research