1
Class 9
Steps in Creating and Testing Scale Scores, and Presenting Measurement Data
December 1, 2005
Anita L. Stewart
Institute for Health & Aging
University of California, San Francisco
2
Overview
Steps in creating and testing scales in your sample
– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample
Presenting measurement results
Presenting change scores
5
Preparing Surveys for Data Entry: 4 Steps
Review surveys for data quality
Reclaim missing and ambiguous data
Address ambiguities in the questionnaire prior to data entry
Code open-ended items
6
Review Surveys for Data Quality
Examine each survey in detail as soon as it is returned, and mark any:
– Missing data
– Inconsistent or ambiguous answers
– Skip patterns that were not followed
7
Reclaim Missing and Ambiguous Data
Go over problems with the respondent
– If the survey is returned in person, review it then
– If mailed, call the respondent ASAP and go over missing and ambiguous answers
– If you cannot reach the respondent by telephone, make a copy for your files and mail back the survey with a request to clarify missing data
8
Address Ambiguities in the Questionnaire Prior to Data Entry
When two choices are circled for one question, randomly choose one (flip a coin)
Clarify entries that might not be clear to data entry person
9
Code Open-Ended Items
Open-ended responses have no numeric code
– e.g., name of physician, reason for visiting physician
Goal of coding open-ended items
– Create meaningful categories from variety of responses
– Minimize number of categories for better interpretability
– Assign a numeric score for data entry
10
Example of Open-Ended Responses
1. What things do you think are important for doctors at this clinic to do to give you high quality care?
Listen to your patients more often
Pay more attention to the patient
Not to wait so long
Be more caring toward the patient
Not to have so many people at one time
Spend more time with the patients
Be more understanding
11
Process of Coding Open-Ended Data
Develop classification scheme
– Review responses from 25 or more questionnaires
– Begin a classification scheme
– Assign unique numeric codes to each category
– Maintain a list of codes and the verbatim answers for each
– Add new codes as new responses are identified
If a response cannot be classified, assign a unique code and address it later
12
Example of Open-Ended Codes
Communication = 1
– Listen to your patients more often = 1
– Pay more attention to the patient = 1
Access to care = 2
– Not to wait so long = 2
– Not to have so many people at one time = 2
Allow more time = 3
– Spend more time with the patients = 3
Emotional support = 4
– Be more understanding = 4
– Be more caring toward the patient = 4
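A coding scheme like this can be kept as a lookup table so that the list of codes and verbatim answers stays in one place. The following is only a minimal sketch in Python: the names `CATEGORY_CODES`, `VERBATIM_TO_CATEGORY`, `code_response`, and the placeholder code 99 for not-yet-classified responses are illustrative assumptions, not part of the handout.

```python
# Hypothetical codebook for the open-ended item in the example.
CATEGORY_CODES = {
    "Communication": 1,
    "Access to care": 2,
    "Allow more time": 3,
    "Emotional support": 4,
}

# Running list of verbatim answers and the category each was assigned to.
VERBATIM_TO_CATEGORY = {
    "listen to your patients more often": "Communication",
    "pay more attention to the patient": "Communication",
    "not to wait so long": "Access to care",
    "not to have so many people at one time": "Access to care",
    "spend more time with the patients": "Allow more time",
    "be more understanding": "Emotional support",
    "be more caring toward the patient": "Emotional support",
}

UNCLASSIFIED = 99  # assumed placeholder code for responses to address later


def code_response(verbatim):
    """Return the numeric code for a verbatim answer, or UNCLASSIFIED."""
    key = verbatim.strip().lower()
    category = VERBATIM_TO_CATEGORY.get(key)
    if category is None:
        return UNCLASSIFIED
    return CATEGORY_CODES[category]


print(code_response("Not to wait so long"))
print(code_response("Free parking"))
```

New verbatim answers are added to the lookup as they are classified, which keeps the code list and the answers behind each code together.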
13
Verify Assigned Codes
Ideally, have a second person independently classify each response according to final codes
Investigator can review a small subset of questionnaires to assure that coding assignment criteria are clear and are being followed
14
Reliability of Open-Ended Codes
Depends on quality of question, of codes assigned, and the training and supervision of coders
Initial coder and second coder should be concordant in over 90% of cases
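Concordance between two coders can be checked with a simple percent-agreement calculation. This is a sketch only; the function name and the example codes are made up for illustration, and percent agreement does not correct for chance agreement.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of responses given the same code by both coders."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same responses")
    if not coder_a:
        raise ValueError("No responses to compare")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes assigned to the same 10 responses by two coders.
first_coder = [1, 1, 2, 2, 3, 4, 4, 1, 2, 3]
second_coder = [1, 1, 2, 2, 3, 4, 1, 1, 2, 3]

agreement = percent_agreement(first_coder, second_coder)
print(f"{agreement:.0f}% agreement")  # 9 of 10 match, so not yet over 90%
```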
15
Data Entry
Set up file
Double entry of about 10% of surveys
– SAS or SPSS will compare two for accuracy
» Acceptable: 0-5% error
» If 6% or greater, consider re-entering data
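The slides name SAS or SPSS for this comparison; the same check can be sketched in Python. The data layout (survey ID mapped to a list of entered values), the function name, and the sample values are assumptions for illustration.

```python
def entry_error_rate(first_entry, second_entry):
    """Percentage of fields that differ between two keyings of the same surveys.

    Each argument maps a survey ID to the list of values entered for it.
    """
    total = 0
    mismatches = 0
    for survey_id, first_values in first_entry.items():
        second_values = second_entry[survey_id]
        if len(first_values) != len(second_values):
            raise ValueError(f"Survey {survey_id}: field counts differ")
        for a, b in zip(first_values, second_values):
            total += 1
            mismatches += a != b
    if total == 0:
        raise ValueError("No fields to compare")
    return 100.0 * mismatches / total


# Hypothetical double entry of two surveys with 10 items each.
first = {101: [1, 3, 2, 5, 4, 1, 2, 3, 4, 5], 102: [2, 2, 3, 1, 5, 4, 3, 2, 1, 1]}
second = {101: [1, 3, 2, 5, 4, 1, 2, 3, 4, 5], 102: [2, 2, 3, 1, 5, 4, 3, 2, 1, 2]}

rate = entry_error_rate(first, second)
print(f"{rate:.1f}% of fields differ")
if rate >= 6.0:
    print("Consider re-entering the data")
```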
16
Item Naming Conventions
Optimal coding is to assign raw items their questionnaire number
– Can always link back to questionnaire easily
Some people assign a variable name to the questionnaire item
– This will drive you crazy
17
Print Frequencies of Each Item and Review: Range Checks
Verify that responses for each item are within acceptable range
– Out of range values can be checked on original questionnaire
» corrected or considered “missing”
– Sometimes out of range values mean that an item has been entered in the wrong column
» a check on data entry quality
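Besides reading printed frequencies, a range check can be scripted. The sketch below assumes each record is a dictionary of item values with `None` for missing; the item names and valid ranges are hypothetical.

```python
# Hypothetical valid ranges, keyed by questionnaire item number.
VALID_RANGES = {"q1": (1, 5), "q2": (1, 5), "q3": (0, 10)}


def out_of_range(records, valid_ranges):
    """Return (record index, item, value) for every value outside its range.

    None is treated as missing and skipped.
    """
    problems = []
    for i, record in enumerate(records):
        for item, (low, high) in valid_ranges.items():
            value = record.get(item)
            if value is None:
                continue
            if not low <= value <= high:
                problems.append((i, item, value))
    return problems


records = [
    {"q1": 3, "q2": 5, "q3": 7},
    {"q1": 6, "q2": 2, "q3": None},
    {"q1": 1, "q2": 0, "q3": 11},
]

for index, item, value in out_of_range(records, VALID_RANGES):
    print(f"record {index}: {item} = {value} is out of range")
```

Each flagged value is then looked up on the original questionnaire and either corrected or set to missing.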
18
Print Frequencies of Each Item and Review: Consistency Checking
Determine that skip patterns were followed
Number of responses within a skip pattern needs to equal number who answered “skip in” question appropriately
19
Print Frequencies of Each Item and Review: Consistency Checking (cont.)
1. Did your doctor prescribe any medications? (yes, no)
1a. If yes, did your doctor explain the side effects of the medication?
If 75 respondents (of 90) said yes to 1, expect 75 responses to question 1a.
– Often will find that more people (e.g., 80) answered the second question than were supposed to
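The counts in this example can be reproduced with a small consistency check. This is a sketch under assumed names (`q1` for the filter item, `q1a` for the follow-up) and made-up records constructed to match the 75-versus-80 situation described.

```python
def check_skip_pattern(records, filter_item, follow_up_item, skip_in_value="yes"):
    """Compare who 'skipped in' on the filter item with who answered the follow-up."""
    should_have_skipped = []  # answered the follow-up without skipping in
    missing_follow_up = []    # skipped in but left the follow-up blank
    n_skip_in = 0
    n_follow_up = 0
    for i, record in enumerate(records):
        skipped_in = record.get(filter_item) == skip_in_value
        answered = record.get(follow_up_item) is not None
        n_skip_in += skipped_in
        n_follow_up += answered
        if answered and not skipped_in:
            should_have_skipped.append(i)
        elif skipped_in and not answered:
            missing_follow_up.append(i)
    return {
        "n_skip_in": n_skip_in,
        "n_follow_up": n_follow_up,
        "should_have_skipped": should_have_skipped,
        "missing_follow_up": missing_follow_up,
    }


# Hypothetical data: 90 respondents, 75 said yes, but 80 answered 1a.
records = (
    [{"q1": "yes", "q1a": "yes"}] * 75
    + [{"q1": "no", "q1a": "no"}] * 5
    + [{"q1": "no", "q1a": None}] * 10
)

result = check_skip_pattern(records, "q1", "q1a")
print(result["n_skip_in"], result["n_follow_up"])
print("Check these questionnaires:", result["should_have_skipped"])
```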
20
Print Frequencies of Each Item and Review: Consistency Checking (cont.)
Need to go back to the questionnaires of those with problems
– check whether initial “filter” item was incorrectly answered or whether respondent inadvertently answered subset
– sometimes you won’t know which was correct
Hopefully this was caught during initial review of questionnaire and corrected by asking respondent
21
Overview
Interpreting cognitive interviewing results
Steps in creating and testing scales in your sample
– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample
Presenting measurement results
Presenting change scores
22
Deriving Scale Scores
Develop a “codebook” of scoring rules
– Handout: Summary of variables and variable coding
Create scores with computer algorithms in SAS, SPSS, or other program
Review scores to detect programming errors
Revise computer algorithms as needed
Review final scores
23
Codebook of Scoring Rules (Handout)
Codebook is a scoring guide for entire instrument
– Scale or subscale name, description of scale, item numbers, item scoring (e.g., reverse some items if needed), what a high score means, missing data rules
– Special coding of certain items
Sometimes rules conform to published scoring rules
24
Variable Naming Conventions (Variables of Composite of Items)
Assigning variable names is an important step
– make them as meaningful as possible
– plan them for all questionnaires at the beginning
For study with more than one source of data, a suffix can indicate which point in time and which questionnaire
– B for baseline, 6 for 6-month, Y for one year
– M for medical history, L for lab tests
25
Variable Naming Conventions (Variables of Composite of Items) (cont.)
Medical History Questionnaire
Baseline: HYPERTMB
6 months: HYPERTM6
26
Variable Naming Conventions (cont.)
A prefix can help sort variable groupings alphabetically
– e.g., S for symptoms: SPAINB, SFATIGB, SSOBB
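One way to keep such names consistent is to build them from their parts. The helper below is a hypothetical sketch: the function name and the ordering (prefix, stem, source letter, time suffix) are assumptions inferred from the examples HYPERTMB and SPAINB.

```python
# Suffix letters taken from the naming convention described in the slides.
TIME_SUFFIX = {"baseline": "B", "6-month": "6", "one-year": "Y"}
SOURCE_CODE = {"medical history": "M", "lab tests": "L"}


def variable_name(stem, source=None, timepoint=None, prefix=""):
    """Build a variable name as prefix + stem + source letter + time suffix."""
    parts = [prefix, stem]
    if source is not None:
        parts.append(SOURCE_CODE[source])
    if timepoint is not None:
        parts.append(TIME_SUFFIX[timepoint])
    return "".join(parts).upper()


print(variable_name("HYPERT", source="medical history", timepoint="baseline"))
print(variable_name("HYPERT", source="medical history", timepoint="6-month"))
print(variable_name("PAIN", timepoint="baseline", prefix="S"))
```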
27
Creating Likert Scale Scores
Translate codebook scoring rules into program code (SAS, SPSS):
– Determine direction of scoring of final measure
– Reverse all items that are not already in that direction
– Average remaining items
– Apply missing data rule, e.g., set the score to missing if more than 50% of items are missing
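The slides name SAS or SPSS; the same steps can be sketched in Python. The item names, the 1-5 response range, and the function name are assumptions for illustration, and the 50% threshold is the example rule from the slide, not a fixed standard.

```python
def score_scale(responses, reverse_items, low=1, high=5, max_missing=0.5):
    """Average the items of one respondent after reversing where needed.

    responses: dict mapping item name to value, with None for missing.
    Returns None when more than max_missing of the items are missing.
    """
    values = []
    for item, value in responses.items():
        if value is None:
            continue
        if item in reverse_items:
            value = low + high - value  # e.g., on a 1-5 item, 2 becomes 4
        values.append(value)
    n_missing = len(responses) - len(values)
    if not values or n_missing / len(responses) > max_missing:
        return None
    return sum(values) / len(values)


# Hypothetical 4-item scale in which q2 is worded in the opposite direction.
respondent = {"q1": 4, "q2": 2, "q3": None, "q4": 5}
print(score_scale(respondent, reverse_items={"q2"}))

mostly_missing = {"q1": 4, "q2": None, "q3": None, "q4": None}
print(score_scale(mostly_missing, reverse_items={"q2"}))  # None
```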
28
Review Summary Statistics of Derived Scores to Detect Programming Errors
Run raw data through program
– can be a preliminary subset of raw data to debug program
Review summary statistics of scores to determine accuracy of program
– Do the mean, SD make sense?
– Is the observed range appropriate? Are there any scores outside the possible range?
– Does the % missing seem about right?
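These checks can be collected into one summary per derived score. The sketch below uses only the Python standard library; the function name and the example scores (with one deliberately impossible value) are illustrative assumptions.

```python
import statistics


def summarize_scores(scores, possible_min, possible_max):
    """Summary statistics for one derived score; None marks a missing score."""
    observed = [s for s in scores if s is not None]
    return {
        "n": len(observed),
        "pct_missing": 100.0 * (len(scores) - len(observed)) / len(scores),
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),
        "observed_range": (min(observed), max(observed)),
        "out_of_range": [s for s in observed if not possible_min <= s <= possible_max],
    }


# Hypothetical scores on a scale whose possible range is 1-5.
scores = [3.0, 4.5, None, 2.0, 5.5, 4.0]
summary = summarize_scores(scores, possible_min=1, possible_max=5)
for name, value in summary.items():
    print(name, value)
```

A non-empty `out_of_range` list, as here, points to a programming or data entry error to trace back.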
29
Revise Computer Algorithms As Needed
For those that don’t make sense, review programming statements
Locate errors and correct
30
Review Final Scores
Review scores again
Repeat process until you are satisfied that the computer algorithm is producing accurate scores
– For a complete test of programming accuracy, calculate a few scores by hand from one or two questionnaires
» Make sure those respondents’ scores match what you get
31
Overview
Steps in creating and testing scales in your sample
– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample
Presenting measurement results
Presenting change scores
32
Testing Scaling Properties and Reliability in Your Sample for Multi-Item Scales
Obtain item-scale correlations
– Part of internal consistency reliability program
Calculate reliability in your sample (regardless of known reliability in other studies)
– internal-consistency for multi-item scales
– test-retest if you obtained it
33
Review Results (Handout)
Item-scale correlations
– Be sure each item correlates at least .30 and preferably .40 with the total scale (corrected for overlap)
Internal consistency (Cronbach’s alpha)
– Should be at least .70
– If lower, see if modifying items (above) will improve it
Test-retest reliability
– Should meet standards
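Statistical packages report these values directly; the sketch below shows what is being computed, in plain Python. It assumes complete data (no missing items) with items already scored in the same direction, and the three-item example data are made up. "Corrected for overlap" is implemented here by correlating each item with the sum of the other items.

```python
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_scale(items):
    """Correlate each item with the sum of the remaining items."""
    n = len(items[0])
    correlations = []
    for j, column in enumerate(items):
        rest = [sum(col[i] for k, col in enumerate(items) if k != j) for i in range(n)]
        correlations.append(pearson(column, rest))
    return correlations


def cronbach_alpha(items):
    """Cronbach's alpha from item variances and the variance of the total."""
    k = len(items)
    n = len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


# Hypothetical data: one list per item, five respondents in the same order.
items = [[4, 3, 5, 2, 4], [5, 3, 4, 2, 4], [4, 2, 5, 1, 3]]
print([round(r, 2) for r in corrected_item_scale(items)])
print(round(cronbach_alpha(items), 2))
```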
34
Overview
Steps in creating and testing scales in your sample
– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample
Presenting measurement results
Presenting change scores
35
Presenting Measurement Results (Handout)
Present for each final scale:
– % missing
– Mean, standard deviation
– Observed range, possible range
– Floor and ceiling effects, skewness statistic
– Reliability information
» Internal consistency reliability
» Range of item-scale correlations
» Number of item-scale correlations > .30
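Floor and ceiling effects and skewness can be computed as in the sketch below. Two assumptions are worth flagging: floor and ceiling are taken as the percentage of respondents at the lowest and highest possible score, and skewness is the simple moment-based statistic, which can differ slightly from the sample-adjusted value a statistical package reports. The example scores are made up.

```python
import statistics


def floor_ceiling(scores, possible_min, possible_max):
    """Percentage of respondents at the lowest and highest possible score."""
    n = len(scores)
    pct_floor = 100.0 * sum(s == possible_min for s in scores) / n
    pct_ceiling = 100.0 * sum(s == possible_max for s in scores) / n
    return pct_floor, pct_ceiling


def skewness(scores):
    """Moment-based skewness: mean cubed deviation divided by SD cubed."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return sum((s - mean) ** 3 for s in scores) / (len(scores) * sd ** 3)


# Hypothetical scores on a scale with possible range 1-5.
scores = [1, 2, 3, 4, 5, 5, 5, 5, 4, 3]
print(floor_ceiling(scores, 1, 5))
print(round(skewness(scores), 2))  # negative: scores pile up at the high end
```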
36
Overview
Steps in creating and testing scales in your sample
– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample
Presenting measurement results
Presenting change scores
37
Types of Change Scores
Measured change
– Difference in scores between baseline and follow-up
Percentage change
– Measured change divided by baseline score
Perceived change
– Single question asking respondent whether and how much they think they changed (from some prior time period)
38
Measured Change
Difference in scores from baseline to follow-up
Example: Pain during the past 2 weeks, 0-10 cm Visual Analog Scale
– Time 1 (baseline) - score of 5
– Time 2 (follow-up) - score of 8
– Difference = +3 or -3 depending on which way we subtract
39
Measured Change: What is Missing?
How should we interpret a change score of +3 or -3?
Depends on:
– Direction of scores (is higher score better or worse)
– Which was subtracted from which?
» Follow-up minus baseline? (T2 - T1)
» Baseline minus follow-up? (T1 - T2)
40
Measured Change: What is Missing?
How to calculate depends on what you want the change score to indicate
– positive score is improvement or worsening
Positive score to indicate improvement:
– high score is better
» Subtract time 1 from time 2 (T2 - T1)
» Positive change score = improvement
– high score is worse
» Subtract time 2 from time 1 (T1 - T2)
» Positive change score = improvement
41
Example of Change Score
You want a positive change to indicate improvement
– high score is better
Subtract score nearest “worst” end from score nearest “best” end
(worst) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (best)
time 1 and time 2 are marked on the scale, with time 2 four points nearer the “best” end
Time 2 minus Time 1 = change of +4 (improved by 4 points)
42
Example of Change Score
You want a positive change to indicate improvement
– high score is worse
Subtract score nearest “best” end from score nearest “worst” end
(best) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (worst)
time 1 and time 2 are marked on the scale, with time 2 four points nearer the “best” end
Time 1 minus Time 2 = change of +4 (improved by 4 points)
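The rule illustrated in these two examples (follow-up minus baseline when a high score is better, baseline minus follow-up when a high score is worse) can be written as one small function. This is a sketch; the function name and the specific time 1 and time 2 values are assumptions chosen to give a 4-point improvement.

```python
def change_score(time1, time2, high_is_better):
    """Change score in which a positive value always means improvement."""
    if high_is_better:
        return time2 - time1
    return time1 - time2


# High score is better: moving from 8 to 12 is an improvement of 4.
print(change_score(8, 12, high_is_better=True))

# High score is worse: moving from 16 to 12 is also an improvement of 4.
print(change_score(16, 12, high_is_better=False))

# Pain on a 0-10 scale (high is worse) going from 5 to 8 is a worsening.
print(change_score(5, 8, high_is_better=False))
```

Stating the subtraction order and the score direction once, in code and in table footnotes, avoids ambiguity about the sign.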
43
Interpreting Change Scores: What is Wrong?
A study predicting utilization of health care (outpatient visits) over a 1-year period as a function of self-efficacy
A results sentence:
– “Reduced utilization at one year was associated with level of self efficacy at baseline (p < .01) and with 6-month changes in self efficacy (p < .05).”
44
Interpreting Change Scores: Making it Clearer
“Reduced outpatient visits at one year were associated with lower levels of self efficacy at baseline (p < .01) and with 6-month improvements in self efficacy.”
Old way:
– “Reduced utilization at one year was associated with level of self efficacy at baseline (p < .01) and with 6-month changes in self-efficacy.”
45
Presenting Change Scores in Tables: What is Wrong?
Change in anxiety over a 1-year period for two groups
                   1 year change in anxiety    p
Exercise group     -40                         < .001
Education group    +4                          ns
46
Presenting Change Scores in Tables: Making it Clearer
Change in anxiety over a 1-year period for two groups
                   1 year change* in anxiety   p
Exercise group     -40                         < .001
Education group    +4                          ns
*Change scores are 1-year minus baseline; negative score indicates decrease in anxiety
47
Percentage Change
Measured change divided by baseline score
Example: pain measure, higher is more pain
– change score of -2, baseline score of 6
– 2/6 = 33% reduction in pain
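The calculation in this example, as a minimal Python sketch (the function name is illustrative; the follow-up score of 4 is implied by a baseline of 6 and a change of -2):

```python
def percent_change(baseline, follow_up):
    """Measured change (follow-up minus baseline) as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("Percentage change is undefined for a baseline of 0")
    return 100.0 * (follow_up - baseline) / baseline


# Pain measure, higher is more pain: baseline 6, follow-up 4.
change = percent_change(6, 4)
print(f"{change:.0f}%")  # negative value = reduction in pain
```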
48
Example of Percentage Change: Problem with Likert Scales
You want a positive change to indicate improvement (and high score is better)
Subtract score nearest “worst” end from score nearest “best” end
(worst) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (best)
time 1 = 8, time 2 = 12
Time 2 minus Time 1 = change of +4 (improved by 4 points)
4 / 8 = 50% improvement
49
Example of Percentage Change: Problem with Likert Scales (cont.)
You want a positive change to indicate improvement
– high score is worse
Subtract score nearest “best” end from score nearest “worst” end
(best) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (worst)
time 2 = 12, time 1 = 16
Time 1 minus Time 2 = change of +4 (improved by 4 points)
4 / 16 = 25% improvement
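The problem shown in these two slides is that the same 4-point improvement gives a different percentage depending only on which way the scale is scored. The sketch below reproduces the two calculations; the function names are illustrative, and the reversal formula (lowest plus highest possible score, minus the score) assumes the 4-20 range shown on the scale.

```python
def percent_improvement(time1, time2, high_is_better):
    """Improvement (positive = better) as a percentage of the time 1 score."""
    change = time2 - time1 if high_is_better else time1 - time2
    return 100.0 * change / time1


def reverse_score(score, possible_min=4, possible_max=20):
    """Re-express a score on the same scale scored in the opposite direction."""
    return possible_min + possible_max - score


# Scored so that high is better: 8 at time 1, 12 at time 2.
print(percent_improvement(8, 12, high_is_better=True))

# The same two answers scored so that high is worse: 16 and 12.
t1, t2 = reverse_score(8), reverse_score(12)
print(t1, t2)
print(percent_improvement(t1, t2, high_is_better=False))
```

Because the scale has no true zero, the baseline used as the denominator is arbitrary, and so is the resulting percentage.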
50
Percentage Change Scores Only Work for Ratio-Level Measures
Can do percentage change only on scales with a true zero
– Zero represents the absence of the trait in question
Ratio scores: weight in pounds
Person weighs 150 pounds
– Gains 10, gained about 6.7% of original weight (10/150)
– Loses 10, lost about 6.7% of original weight (10/150)
51
Perceived Change (Retrospective Change)
How much has your physical functioning changed since your surgery?
1 - very much worse
2 - much worse
3 - worse
4 - no change
5 - better
6 - much better
7 - very much better
52
Perceived/Retrospective Change
Perceived change enables respondent to define physical functioning in terms of what it means to them
Measured change is a change on specific questions that were contained in the particular measure, e.g.
– Difficulty walking
– Difficulty climbing stairs
If the person had no change in these particular items, their measured change score will be 0 (no change)
If the same person became much worse in terms of bending over, they will report that they became worse
53
Perceived/Retrospective Change
Recommend including both types of measures to assess change
– Measured change enables
» Comparison with other studies
» May be more sensitive because has more scale levels (if multi-item measure)
– Perceived/Retrospective change enables
» Person to report on domain using their own definition
» Picks up changes “unmeasured” by particular measure