
1

Class 9

Steps in Creating and Testing Scale Scores, and Presenting Measurement Data

December 1, 2005

Anita L. Stewart Institute for Health & Aging

University of California, San Francisco

2

Overview

Steps in creating and testing scales in your sample

– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample

Presenting measurement results
Presenting change scores

3

Overview

Steps in creating and testing scales in your sample

– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample

Presenting measurement results
Presenting change scores

4

Preparing Raw Data

Prepare surveys for data entry
Data entry
Range and consistency checks

5

Preparing Surveys for Data Entry: 4 Steps

Review surveys for data quality
Reclaim missing and ambiguous data
Address ambiguities in the questionnaire prior to data entry
Code open-ended items

6

Review Surveys for Data Quality

Examine each survey in detail as soon as it is returned, and mark any:
– Missing data

– Inconsistent or ambiguous answers

– Skip patterns that were not followed

7

Reclaim Missing and Ambiguous Data

Go over problems with respondent
– If survey is returned in person, review it then and there

– If mailed, call respondent ASAP, go over missing and ambiguous answers

– If you cannot reach by telephone, make a copy for your files and mail back the survey with request to clarify missing data

8

Address Ambiguities in the Questionnaire Prior to Data Entry

When two choices are circled for one question, randomly choose one (flip a coin)

Clarify entries that might not be clear to data entry person

9

Code Open-Ended Items

Open-ended responses have no numeric code
– e.g., name of physician, reason for visiting physician
Goal of coding open-ended items:

– create meaningful categories from variety of responses

– minimize number of categories for better interpretability

– Assign a numeric score for data entry

10

Example of Open-Ended Responses

1. What things do you think are important for doctors at this clinic to do to give you high quality care?

Listen to your patients more often
Pay more attention to the patient
Not to wait so long
Be more caring toward the patient
Not to have so many people at one time
Spend more time with the patients
Be more understanding

11

Process of Coding Open-Ended Data

Develop classification scheme
– Review responses from 25 or more questionnaires
– Begin a classification scheme
– Assign unique numeric codes to each category
– Maintain a list of codes and the verbatim answers for each
– Add new codes as new responses are identified

If a response cannot be classified, assign a unique code and address it later

12

Example of Open-Ended Codes

Communication = 1
  Listen to your patients more often = 1
  Pay more attention to the patient = 1
Access to care = 2
  Not to wait so long = 2
  Not to have so many people at one time = 2
Allow more time = 3
  Spend more time with the patients = 3
Emotional Support = 4
  Be more understanding = 4
  Be more caring toward the patient = 4
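A scheme like this lends itself to a simple lookup table so that new verbatim answers are coded consistently. A minimal sketch in Python (the class materials refer to SAS/SPSS; the dictionary entries mirror the example above, and the unclassified code 99 is an illustrative placeholder):

```python
# Illustrative lookup table pairing verbatim answers with numeric codes
# (1 = Communication, 2 = Access to care, 3 = Allow more time, 4 = Emotional support)
OPEN_ENDED_CODES = {
    "listen to your patients more often": 1,
    "pay more attention to the patient": 1,
    "not to wait so long": 2,
    "not to have so many people at one time": 2,
    "spend more time with the patients": 3,
    "be more understanding": 4,
    "be more caring toward the patient": 4,
}

UNCLASSIFIED = 99  # unique placeholder code for responses to address later

def code_response(verbatim: str) -> int:
    """Return the numeric code for a verbatim answer, or the placeholder."""
    return OPEN_ENDED_CODES.get(verbatim.strip().lower(), UNCLASSIFIED)
```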

13

Verify Assigned Codes

Ideally, have a second person independently classify each response according to final codes

Investigator can review a small subset of questionnaires to assure that coding assignment criteria are clear and are being followed

14

Reliability of Open-Ended Codes

Depends on quality of question, of codes assigned, and the training and supervision of coders

Initial coder and second coder should be concordant in over 90% of cases
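The 90% standard can be checked with a simple percent-agreement calculation between the two coders. A sketch, assuming the two sets of codes are stored as parallel lists (names and data are illustrative):

```python
def percent_agreement(coder1, coder2):
    """Percent of responses given the same code by both coders."""
    if len(coder1) != len(coder2):
        raise ValueError("Both coders must code the same set of responses")
    matches = sum(a == b for a, b in zip(coder1, coder2))
    return 100.0 * matches / len(coder1)

# Example: 9 of 10 responses coded identically -> 90% agreement
print(percent_agreement([1, 1, 2, 2, 3, 4, 4, 1, 2, 3],
                        [1, 1, 2, 2, 3, 4, 3, 1, 2, 3]))
```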

15

Data Entry

Set up file
Double entry of about 10% of surveys
– SAS or SPSS will compare the two entries for accuracy
» Acceptable: 0-5% error
» If 6% or greater, consider re-entering data
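The slide refers to SAS or SPSS for the comparison; the same idea can be sketched in Python with pandas, assuming the two independent entries of the subsample are saved as CSV files keyed by a respondent ID (file and column names are illustrative):

```python
import pandas as pd

# Two independent entries of the same ~10% subsample of surveys
entry1 = pd.read_csv("entry1.csv").set_index("resp_id").sort_index()
entry2 = pd.read_csv("entry2.csv").set_index("resp_id").sort_index()

# Cell-by-cell comparison; a mismatch in any cell is a data entry error
mismatch = (entry1 != entry2) & ~(entry1.isna() & entry2.isna())
error_rate = 100.0 * mismatch.values.sum() / mismatch.size
print(f"Data entry error rate: {error_rate:.1f}%")  # consider re-entry if >= 6%
```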

16

Item Naming Conventions

Optimal coding is to assign raw items their questionnaire number
– Can always link back to questionnaire easily

Some people assign a variable name to the questionnaire item
– This will drive you crazy

17

Print Frequencies of Each Item and Review: Range Checks

Verify that responses for each item are within acceptable range
– Out of range values can be checked on original questionnaire
» corrected or considered “missing”
– Sometimes out of range values mean that an item has been entered in the wrong column
» a check on data entry quality
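Range checks can be automated once the acceptable range for each item is taken from the codebook. A minimal pandas sketch, assuming the raw items sit in a DataFrame indexed by respondent ID (item names and ranges are illustrative):

```python
import pandas as pd

# Acceptable ranges from the codebook (illustrative item names)
RANGES = {"q1": (1, 5), "q2": (1, 5), "q3": (0, 10)}

def flag_out_of_range(df: pd.DataFrame) -> pd.DataFrame:
    """List respondent/item pairs whose values fall outside the allowed range."""
    problems = []
    for item, (low, high) in RANGES.items():
        bad = df.loc[df[item].notna() & ~df[item].between(low, high), item]
        problems += [(resp_id, item, value) for resp_id, value in bad.items()]
    return pd.DataFrame(problems, columns=["resp_id", "item", "value"])
```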

18

Print Frequencies of Each Item and Review: Consistency Checking

Determine that skip patterns were followed
Number of responses within a skip pattern needs to equal the number who answered the “skip in” question appropriately

19

Print Frequencies of Each Item and Review: Consistency Checking (cont.)

1. Did your doctor prescribe any medications? (yes, no)

1a. If yes, did your doctor explain the side effects of the medication?

If 75 respondents (of 90) said yes to question 1, expect 75 responses to question 1a.
– Often you will find that more people (e.g., 80) answered the second question than were supposed to
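This kind of consistency check can also be run programmatically. A sketch for the example above, assuming the filter item and its follow-up are columns q1 and q1a with 1 = yes (names and codes are illustrative):

```python
import pandas as pd

def check_skip_pattern(df: pd.DataFrame) -> pd.DataFrame:
    """Compare counts and list respondents who answered q1a but not yes to q1."""
    n_yes = int((df["q1"] == 1).sum())
    n_followup = int(df["q1a"].notna().sum())
    print(f"{n_yes} answered yes to q1; {n_followup} answered q1a")
    # These respondents should have skipped q1a but answered it anyway
    return df[(df["q1"] != 1) & df["q1a"].notna()]
```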

20

Print Frequencies of Each Item and Review: Consistency Checking (cont.)

Need to go back to the questionnaires of those with problems
– check whether the initial “filter” item was incorrectly answered or whether the respondent inadvertently answered the subset
– sometimes you won’t know which was correct
Hopefully this was caught during the initial review of the questionnaire and corrected by asking the respondent

21

Overview

Interpreting cognitive interviewing results
Steps in creating and testing scales in your sample
– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample

Presenting measurement results
Presenting change scores

22

Deriving Scale Scores

Develop a “codebook” of scoring rules
– Handout: Summary of variables and variable coding
Create scores with computer algorithms in SAS, SPSS, or other program
Review scores to detect programming errors
Revise computer algorithms as needed
Review final scores

23

Codebook of Scoring Rules (Handout)

Codebook is a scoring guide for entire instrument
– Scale or subscale name, description of scale, item numbers, item scoring (e.g., reverse some items if needed), what a high score means, missing data rules
– Special coding of certain items
Sometimes rules conform to published scoring rules

24

Variable Naming Conventions (Variables of Composite of Items)

Assigning variable names is an important step
– make them as meaningful as possible
– plan them for all questionnaires at the beginning

For study with more than one source of data, a suffix can indicate which point in time and which questionnaire
– B for baseline, 6 for 6-month, Y for one year
– M for medical history, L for lab tests

25

Variable Naming Conventions (Variables of Composite of Items) (cont.)

Medical History Questionnaire

HYPERTMB – Baseline
HYPERTM6 – 6 months

26

Variable Naming Conventions (cont.)

A prefix can help sort variable groupings alphabetically
– e.g., S for symptoms

SPAINB, SFATIGB, SSOBB

27

Creating Likert Scale Scores

Translate codebook scoring rules into program code (SAS, SPSS):
– Determine direction of scoring of final measure
– Reverse all items that are not already in that direction
– Average remaining items
– Apply missing data rule, e.g., if more than 50% missing
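These rules translate directly into a small scoring routine. A sketch in Python rather than SAS/SPSS, assuming 1-5 response items, with some items reverse-scored and a rule that sets the scale to missing when more than 50% of its items are missing (item names are illustrative):

```python
import pandas as pd

ITEMS = ["q1", "q2", "q3", "q4"]   # items in the scale (illustrative)
REVERSE = ["q2", "q4"]             # items not already in the chosen direction
MIN_VAL, MAX_VAL = 1, 5            # response range for these items

def score_scale(df: pd.DataFrame) -> pd.Series:
    """Average of items after reversal; missing if >50% of items are missing."""
    items = df[ITEMS].copy()
    # Reverse-score so every item runs in the direction of the final measure
    items[REVERSE] = (MIN_VAL + MAX_VAL) - items[REVERSE]
    # Average whatever items were answered
    score = items.mean(axis=1, skipna=True)
    # Missing data rule: more than half of the items missing -> score is missing
    return score.mask(items.isna().mean(axis=1) > 0.5)
```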

28

Review Summary Statistics of Derived Scores to Detect Programming Errors

Run raw data through program
– can be a preliminary subset of raw data to debug the program
Review summary statistics of scores to determine accuracy of program
– Do the mean, SD make sense?
– Is the observed range appropriate? Are there any scores outside the possible range?
– Does the % missing seem about right?
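A quick way to do this review is to print summary statistics for every derived score in one table. A brief pandas sketch (column names are illustrative):

```python
import pandas as pd

def review_scores(df: pd.DataFrame, scale_cols) -> pd.DataFrame:
    """Summary statistics used to eyeball each derived scale score."""
    summary = df[scale_cols].describe().T[["mean", "std", "min", "max"]]
    summary["pct_missing"] = 100 * df[scale_cols].isna().mean()
    return summary  # check means/SDs, observed ranges, and % missing
```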

29

Revise Computer Algorithms As Needed

For those that don’t make sense, review programming statements

Locate errors and correct

30

Review Final Scores

Review scores again
Repeat process until you are satisfied that the computer algorithm is producing accurate scores
– For a complete test of programming accuracy, calculate a few scores by hand from one or two questionnaires
» Make sure those respondents’ scores match what you get

31

Overview

Steps in creating and testing scales in your sample

– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample

Presenting measurement results
Presenting change scores

32

Testing Scaling Properties and Reliability in Your Sample for Multi-Item Scales

Obtain item-scale correlations
– Part of internal consistency reliability program
Calculate reliability in your sample (regardless of known reliability in other studies)
– internal-consistency for multi-item scales
– test-retest if you obtained it

33

Review Results (Handout)

Item-scale correlations
– Be sure each item correlates at least .30 and preferably .40 with the total scale (corrected for overlap)
Internal consistency (Cronbach’s alpha)
– Should be at least .70
– If lower, see if modifying items (above) will improve it
Test-retest reliability
– Should meet standards
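Both statistics can be computed directly from the item matrix: Cronbach's alpha, and each item's correlation with the sum of the remaining items (i.e., corrected for overlap). A sketch using listwise deletion for simplicity (the DataFrame of items and its column names are assumptions):

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of items (listwise deletion of missing data)."""
    x = items.dropna()
    k = x.shape[1]
    return (k / (k - 1)) * (1 - x.var(ddof=1).sum() / x.sum(axis=1).var(ddof=1))

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Each item's correlation with the total of the other items."""
    x = items.dropna()
    return pd.Series({c: x[c].corr(x.drop(columns=c).sum(axis=1)) for c in x.columns})

# Flag items falling below the .30 (preferably .40) standard, e.g.:
# low_items = corrected_item_total(items_df) < .30
```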

34

Overview

Steps in creating and testing scales in your sample

– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample

Presenting measurement results
Presenting change scores

35

Presenting Measurement Results (Handout)

Present for each final scale:
– % missing
– Mean, standard deviation
– Observed range, possible range
– Floor and ceiling effects, skewness statistic
– Reliability information

» Internal consistency reliability

» Range of item-scale correlations

» Number of item-scale correlations > .30
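The floor and ceiling effects are simply the percent of respondents at the lowest and highest possible score, and skewness comes out of the same descriptive pass. A sketch for one scale, assuming the possible range is known from the codebook (names are illustrative):

```python
import pandas as pd

def describe_scale(score: pd.Series, possible_min: float, possible_max: float) -> dict:
    """Descriptives typically presented for a final scale score."""
    valid = score.dropna()
    return {
        "pct_missing": 100 * score.isna().mean(),
        "mean": valid.mean(),
        "sd": valid.std(ddof=1),
        "observed_range": (valid.min(), valid.max()),
        "possible_range": (possible_min, possible_max),
        "pct_floor": 100 * (valid == possible_min).mean(),
        "pct_ceiling": 100 * (valid == possible_max).mean(),
        "skewness": valid.skew(),
    }
```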

36

Overview

Steps in creating and testing scales in your sample

– Preparing raw data
– Deriving scale scores
– Testing scaling properties in your sample

Presenting measurement results
Presenting change scores

37

Types of Change Scores

Measured change
– Difference in scores between baseline and follow-up
Percentage change
– Measured change divided by baseline score
Perceived change
– Single question asking respondent whether and how much they think they changed (from some prior time period)

38

Measured Change

Difference in scores from baseline to follow-up
Example: Pain during the past 2 weeks, 0-10 cm Visual Analog Scale
– Time 1 (baseline): score of 5
– Time 2 (follow-up): score of 8
– Difference = +3 or -3 depending on which way we subtract

39

Measured Change: What is Missing?

How should we interpret a change score of +3 or -3?

Depends on:
– Direction of scores (is higher score better or worse)
– Which was subtracted from which?
» Follow-up minus baseline? (T2 - T1)
» Baseline minus follow-up? (T1 - T2)

40

Measured Change: What is Missing?

How to calculate depends on what you want the change score to indicate
– positive score is improvement or worsening

Positive score to indicate improvement:
– high score is better
» Subtract time 1 from time 2 (T2 - T1)
» Positive change score = improvement
– high score is worse
» Subtract time 2 from time 1 (T1 - T2)
» Positive change score = improvement
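The two subtraction rules can be wrapped in one helper so that a positive change always means improvement, whichever direction the scale runs. A small sketch (the higher_is_better flag is an illustrative name):

```python
def change_score(baseline: float, followup: float, higher_is_better: bool) -> float:
    """Change score signed so that a positive value indicates improvement."""
    if higher_is_better:
        return followup - baseline   # T2 - T1
    return baseline - followup       # T1 - T2

# Pain VAS (higher is worse): baseline 5, follow-up 8 -> -3, i.e., worsening
print(change_score(5, 8, higher_is_better=False))
# Scale where higher is better: baseline 8, follow-up 12 -> +4, i.e., improvement
print(change_score(8, 12, higher_is_better=True))
```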

41

Example of Change Score

You want a positive change to indicate improvement
– high score is better

Subtract score nearest “worst” end from score nearest “best” end

(worst) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (best)
time 1 = 8, time 2 = 12

Time 2 minus Time 1 = change of +4 (improved by 4 points)

42

Example of Change Score

You want a positive change to indicate improvement
– high score is worse

Subtract score nearest “best” end from score nearest “worst” end

(best) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (worst)
time 2 = 12, time 1 = 16

Time 1 minus Time 2 = change of +4 (improved by 4 points)

43

Interpreting Change Scores: What is Wrong?

A study predicting utilization of health care (outpatient visits) over a 1-year period as a function of self-efficacy

A results sentence:
– “Reduced utilization at one year was associated with level of self efficacy at baseline (p < .01) and with 6-month changes in self efficacy (p < .05).”

44

Interpreting Change Scores: Making it Clearer

“Reduced outpatient visits at one year were associated with lower levels of self efficacy at baseline (p < .01) and with 6-month improvements in self efficacy.”

Old way:
– “Reduced utilization at one year was associated with level of self efficacy at baseline (p < .01) and with 6-month changes in self-efficacy.”

45

Presenting Change Scores in Tables: What is Wrong?

Change in anxiety over a 1-year period for two groups

                   1-year change in anxiety     p
Exercise group     -40                          < .001
Education group    +4                           ns

46

Presenting Change Scores in Tables: Making it Clearer

Change in anxiety over a 1-year period for two groups

                   1-year change* in anxiety    p
Exercise group     -40                          < .001
Education group    +4                           ns

*Change scores are 1-year minus baseline; negative score indicates decrease in anxiety

47

Percentage Change

Measured change divided by baseline score

Example: pain measure, higher is more pain
– change score of -2, baseline score of 6
– 2/6 = 33% reduction in pain

48

Example of Percentage Change: Problem with Likert Scales

You want a positive change to indicate improvement (and high score is better)

Subtract score nearest “worst” end from score nearest “best” end

(worst) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (best)
time 1 = 8, time 2 = 12

Time 2 minus Time 1 = change of +4 (improved by 4 points)
4 / 8 = 50% improvement

49

Example of Percentage Change: Problem with Likert Scales (cont.)

You want a positive change to indicate improvement
– high score is worse

Subtract score nearest “best” end from score nearest “worst” end

(best) 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (worst)
time 2 = 12, time 1 = 16

Time 1 minus Time 2 = change of +4 (improved by 4 points)
4 / 16 = 25% improvement

50

Percentage Change Scores Only Work for Ratio-Level Measures

Can do percentage change only on scales with a
– True zero (zero represents the absence of the trait in question)
Ratio scores, e.g., weight in pounds
Person weighs 150 pounds
– Gains 10 pounds: gained about 7% of original weight
– Loses 10 pounds: lost about 7% of original weight

51

Perceived Change (Retrospective Change)

How much has your physical functioning changed since your surgery?
1 - very much worse
2 - much worse
3 - worse
4 - no change
5 - better
6 - much better
7 - very much better

52

Perceived/Retrospective Change

Perceived change enables respondent to define physical functioning in terms of what it means to them
Measured change is a change on specific questions that were contained in the particular measure, e.g.
– Difficulty walking
– Difficulty climbing stairs
If the person had no change on these particular items, their measured change score will be 0 (no change)
If the same person became much worse in terms of bending over, they will report on the perceived change item that they became worse

53

Perceived/Retrospective Change (cont.)

Recommend including both types of measures to assess change
– Measured change enables
» comparison with other studies
» may be more sensitive because it has more scale levels (if multi-item measure)
– Perceived/Retrospective change enables
» person to report on domain using their own definition
» picks up changes “unmeasured” by particular measure

54

Next Week: Class 10

Future directions
Factor analysis

55

Homework for Next Week

Final paper (see outline - Handout)