Reliability, Validity, & Scaling

Page 1: Reliability, Validity, & Scaling

Reliability, Validity, & Scaling

Page 2: Reliability, Validity, & Scaling

Reliability
• Repeatedly measure unchanged things.
• Do you get the same measurements?
• Charles Spearman, Classical Measurement Theory.
• If perfectly reliable, then the correlation between true scores and measurements = +1.
• r < 1 because of random error.
• Error is symmetrically distributed about 0.

Page 3: Reliability, Validity, & Scaling

True Scores and Measurements
• Reliability is the squared correlation between true scores and measurement scores.
• Reliability is the proportion of the variance in the measurement scores that is due to differences in the true scores rather than to random error.








r_XX = (r_XT)²   (the reliability coefficient r_XX equals the squared correlation between measurements X and true scores T)
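The identity behind these bullets can be checked with simple arithmetic. A minimal Python sketch (the variance values are invented for illustration; the deck itself contains no code):

```python
# Classical test theory: X = T + E, with error E uncorrelated with T,
# so Var(X) = Var(T) + Var(E).  Reliability is the proportion of
# measurement variance due to true-score differences.

var_true = 9.0   # hypothetical true-score variance
var_error = 3.0  # hypothetical random-error variance

reliability = var_true / (var_true + var_error)  # r_XX = 0.75
corr_true_measured = reliability ** 0.5          # r_XT = sqrt(r_XX)

print(reliability)
print(round(corr_true_measured, 3))
```

With no error variance the correlation would be +1, matching the "perfectly reliable" bullet on the previous slide.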

Page 4: Reliability, Validity, & Scaling

• Systematic error
  – not random
  – measuring something else, in addition to the construct of interest
• Reliability cannot be known, only estimated.


Page 5: Reliability, Validity, & Scaling

Test-Retest Reliability
• Measure subjects at two points in time.
• Correlate (r) the two sets of measurements.
• r of about .70 is OK for research instruments.
• Need it higher for practical applications and important decisions.
• M and SD should not vary much from Time 1 to Time 2, usually.

Page 6: Reliability, Validity, & Scaling

Alternate/Parallel Forms
• Estimate reliability with r between forms.
• M and SD should be the same for both forms.
• Pattern of correlations with other variables should be the same for both forms.

Page 7: Reliability, Validity, & Scaling

Split-Half Reliability

• Divide items into two random halves.
• Score each half.
• Correlate the half scores to get the half-test reliability coefficient, r_hh.
• Correct with the Spearman-Brown formula:

r_sb = 2·r_hh / (1 + r_hh)
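The Spearman-Brown step-up is a one-line function. A sketch (the function name is my own):

```python
def spearman_brown(r_hh):
    """Step a half-test reliability r_hh up to full-test length:
    r_sb = 2 * r_hh / (1 + r_hh)."""
    return 2 * r_hh / (1 + r_hh)

print(round(spearman_brown(0.6), 3))  # a half-test r of .60 steps up to .75
```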



Page 8: Reliability, Validity, & Scaling

Cronbach’s Coefficient Alpha

• The obtained value of r_sb depends on how you split the items into halves.
• Conceptually: find r_sb for all possible pairs of split halves and compute the mean of these.
• But you don't really compute it this way.
• This is a lower bound for the true reliability; that is, it underestimates true reliability.
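In practice alpha is computed directly from item variances rather than from split halves. A plain-Python sketch of the variance formula, α = k/(k−1)·(1 − Σ var(item)/var(total)); the example data are invented:

```python
def cronbach_alpha(items):
    """items: one list of scores per item, all lists of equal length."""
    k = len(items)
    n = len(items[0])

    def svar(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(svar(it) for it in items) / svar(totals))

# Three items answered identically by four respondents:
# a perfectly internally consistent (if unrealistic) scale
print(round(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]), 3))
```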

Page 9: Reliability, Validity, & Scaling

Maximized Lambda4

• This is the best estimator of reliability.
• Compute r_sb for all possible pairs of split halves.
• The largest r_sb = the estimated reliability.
• If more than a few items, this is unreasonably tedious.
• But there are ways to estimate it.
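For a handful of items, maximized λ4 can be brute-forced exactly as the bullets describe. A sketch (feasible only for small numbers of items; the data are invented):

```python
from itertools import combinations

def max_lambda4(items):
    """Brute-force maximized lambda4: the largest Spearman-Brown-corrected
    split-half correlation over all possible splits."""
    k = len(items)
    n = len(items[0])

    def corr(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / (va * vb) ** 0.5

    best = -1.0
    for half in combinations(range(k), k // 2):
        rest = [j for j in range(k) if j not in half]
        a = [sum(items[j][i] for j in half) for i in range(n)]
        b = [sum(items[j][i] for j in rest) for i in range(n)]
        r = corr(a, b)
        best = max(best, 2 * r / (1 + r))
    return best

# Three perfectly parallel items -> estimated reliability of 1.0
print(round(max_lambda4([[1, 2, 3, 4], [1, 2, 3, 4], [2, 4, 6, 8]]), 3))
```

The number of splits grows combinatorially with the item count, which is the tedium the slide warns about.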

Page 10: Reliability, Validity, & Scaling

Construct Validity
• To what extent are we really measuring/manipulating the construct of interest?
• Face Validity – do others agree that it sounds valid?

Page 11: Reliability, Validity, & Scaling

Content Validity
• Detail the population of things (behaviors, attitudes, etc.) that are of interest.
• Consider your operationalization of the construct (the details of how you proposed to measure it) as a sample of that population.
• Is your sample representative of the population? Ask experts.

Page 12: Reliability, Validity, & Scaling

Criterion-Related Validity
• Established by demonstrating that your operationalization has the expected pattern of correlations with other variables.
• Concurrent Validity – demonstrate the expected correlation with other variables measured at the same time.
• Predictive Validity – demonstrate the expected correlation with other variables measured later in time.

Page 13: Reliability, Validity, & Scaling

• Convergent Validity – demonstrate the expected correlation with measures of other constructs.

• Discriminant Validity – demonstrate the expected lack of correlation with measures of other constructs.

Page 14: Reliability, Validity, & Scaling

Scaling
• Scaling = construction of instruments for measuring abstract constructs.
• I shall discuss the creation of a Likert scale, my favorite type of scale.

Page 15: Reliability, Validity, & Scaling

Likert Scales
• Define the concept.
• Generate potential items:
  – About 100 statements.
  – On some, agreement indicates being high on the measured attribute.
  – On others, agreement indicates being low on the measured attribute.

Page 16: Reliability, Validity, & Scaling

Likert Response Scale
• Use a multi-point response scale like this:

1. People should make certain that their actions never intentionally harm others even to a small degree.

   Strongly Disagree – Disagree – Neutral – Agree – Strongly Agree

Page 17: Reliability, Validity, & Scaling

Evaluate the Potential Items
• Get judges to evaluate each item on a 5-point scale:
  – 1 – Agreement = very low on attribute
  – 2 – Agreement = low on attribute
  – 3 – Agreement tells you nothing
  – 4 – Agreement = high on attribute
  – 5 – Agreement = very high on attribute
• Select items with very high or very low means and little variability among the judges.
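The selection rule on this slide can be sketched as a small filter. The cutoffs below (mean ≤ 1.5 or ≥ 4.5, SD ≤ 0.75) are illustrative assumptions, not values from the deck:

```python
def keep_item(judge_ratings, lo=1.5, hi=4.5, max_sd=0.75):
    """Retain an item only if the judges' mean rating is extreme
    (agreement clearly signals low or high standing on the attribute)
    and the judges agree (small SD).  Thresholds are illustrative."""
    n = len(judge_ratings)
    m = sum(judge_ratings) / n
    sd = (sum((x - m) ** 2 for x in judge_ratings) / (n - 1)) ** 0.5
    return (m <= lo or m >= hi) and sd <= max_sd

print(keep_item([5, 5, 4, 5]))  # extreme mean, little variability: keep
print(keep_item([1, 3, 5, 3]))  # mean near 3, judges disagree: drop
```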

Page 18: Reliability, Validity, & Scaling

Alternate Method of Item Evaluation

• Ask some judges to respond to the items in the way they think someone high in the attribute would respond.

• Ask other judges to respond as would one low in the attribute.

• Prefer items that best discriminate between these two groups.

• Also ask judges to identify items that are unclear or confusing.

Page 19: Reliability, Validity, & Scaling

Pilot Test the Items

• Administer to a sample of persons from the population of interest

• Conduct an item analysis (more on this later)

• Prefer items which have high item-total correlations

• Consider conducting a factor analysis (more on this later)

Page 20: Reliability, Validity, & Scaling

Administer the Final Scale

• On each item, the response which indicates the least amount of the attribute is scored as 1.
• The next-least response is scored as 2, and so on.
• Respondent's total score = sum of item scores or mean of item scores.
• Dealing with nonresponses on some items.
• Reflecting items (reverse scoring).
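The scoring rules above can be sketched in a few lines. Using the mean of answered items is one common way to handle nonresponse (an assumption here, not the deck's prescription); the item numbers and responses are invented:

```python
def scale_score(responses, reverse_items=(), n_points=5, use_mean=True):
    """responses: {item_number: response, 1..n_points}; items the
    respondent skipped are simply absent.  Reverse-keyed items are
    reflected so a higher score always means more of the attribute."""
    scored = [(n_points + 1 - r) if item in reverse_items else r
              for item, r in responses.items()]
    # The mean of answered items tolerates nonresponse;
    # the sum requires complete data.
    return sum(scored) / len(scored) if use_mean else sum(scored)

# Item 2 is reverse-keyed, so its response of 1 reflects to 5
print(round(scale_score({1: 5, 2: 1, 3: 4}, reverse_items={2}), 3))
```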

Page 21: Reliability, Validity, & Scaling

Item Analysis
• You believe the scale is unidimensional: each item measures the same thing.
• Item scores should be well correlated.
• Evaluate this belief with an item analysis:
  – Is the scale internally consistent?
  – If so, it is also reliable.
  – Are there items that do not correlate well with the others?
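The deck runs this analysis in PASW/SPSS; the key statistic, the corrected item-total correlation, is easy to compute directly. A plain-Python sketch with invented data:

```python
def corrected_item_totals(items):
    """For each item, correlate it with the total of the *other* items
    (the 'Corrected Item-Total Correlation' column in PASW/SPSS output)."""
    k = len(items)
    n = len(items[0])

    def corr(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / (va * vb) ** 0.5

    out = []
    for j in range(k):
        rest = [sum(items[m][i] for m in range(k) if m != j) for i in range(n)]
        out.append(corr(items[j], rest))
    return out

# The third (hypothetical) item runs against the other two,
# so its item-total correlation comes out negative.
print([round(r, 3) for r in corrected_item_totals(
    [[1, 2, 3, 4], [2, 2, 3, 5], [4, 3, 3, 1]])])
```

An item with a low or negative value here is a candidate for rewriting, reverse scoring, or removal.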

Page 22: Reliability, Validity, & Scaling

Item Analysis of Idealism Scale

• Bring KJ-Idealism.sav into PASW.
• Available at

Page 23: Reliability, Validity, & Scaling

• Click Analyze, Scale, Reliability Analysis.

Page 24: Reliability, Validity, & Scaling

• Select all ten items and scoot them to the Items box on the right.

• Click the Statistics box.

Page 25: Reliability, Validity, & Scaling

• Check “Scale if item deleted” and then click Continue.

Page 26: Reliability, Validity, & Scaling

• Back on the initial window, click OK.
• Look at the output.
• The Cronbach alpha is .744, which is acceptable for a research instrument.


Reliability Statistics

Cronbach's Alpha    N of Items
.744                10

Page 27: Reliability, Validity, & Scaling

Item-Total Statistics

Item   Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
       Item Deleted    Item Deleted        Total Correlation   if Item Deleted
 1     32.42           23.453              .444                .718
 2     32.79           22.702              .441                .717
 3     32.79           21.122              .604                .690
 4     32.33           22.436              .532                .705
 5     32.33           22.277              .623                .695
 6     32.07           24.807              .337                .733
 7     34.29           24.152              .247                .749
 8     32.49           24.332              .308                .736
 9     33.38           22.063              .406                .725
10     33.43           24.650              .201                .755

Page 28: Reliability, Validity, & Scaling

Troublesome Items

• Items 7 and 10 are troublesome.
• Deleting them would increase alpha.
• But not by much, so I retained them.
• Item 7 stats are especially distressing:
• "Deciding whether or not to perform an act by balancing the positive consequences of the act against the negative consequences of the act is immoral."

Page 29: Reliability, Validity, & Scaling

What Next?

• I should attempt to rewrite item 7 to make it clearer that it applies to ethical decisions, not other cost-benefit analyses.

• But this is not my scale.
• And who has the time?

Page 30: Reliability, Validity, & Scaling

Scale Might Not Be Unidimensional

• If the items are measuring two or more different things, alpha may well be low.

• You need to split the scale into two or more subscales.

• Factor analysis can be helpful here (but no promises).