Reliability Psych 395 - DeShon. How Do We Judge Psychological Measures? Two concepts: Reliability...

Reliability

Psych 395 - DeShon

How Do We Judge Psychological Measures?

Two concepts: Reliability and Validity

Reliability: How consistent is the assessment over time, items, or raters? How reproducible are the measurements? How much measurement error is involved?

Validity: How well does an assessment measure what it is supposed to measure? How accurate is our assessment?

– An assessment is valid if it measures what it purports to measure.

Correlation Review

Some Data from the 2005 Baseball Season

Team          Payroll in $M (rank)   Win %    ERA    Attendance (millions)
Yankees       208.31 (1st)           58.6%    4.52   4.09
Red Sox       123.51 (2nd)           58.6%    4.74   2.85
White Sox     75.18 (13th)           61.1%    3.61   2.34
Tigers        69.09 (15th)           43.8%    4.51   2.02
Devil Rays    29.68 (30th)           41.4%    5.39   1.14

Questions We Might Ask …

How strongly is payroll associated with winning percentage?

How strongly is payroll associated with making the playoffs?

How can we answer these?

Option 1: Plot the Data

Option 2: Quantify the Association with the Correlation Coefficient

The Correlation Coefficient

Credited to Karl Pearson (1896)

Measures the degree of linear association between two variables.

Ranges from -1.0 to 1.0

Sign refers to direction

– Negative: As X increases, Y decreases
– Positive: As X increases, Y increases

One Formula

Symbolized by r

Covariance of X and Y divided by the product of the SDs of X and Y.

$$ r = \frac{\mathrm{cov}_{XY}}{s_X s_Y} $$

Calculation of r for Payroll (X) and Winning Percentage (Y)

covXY = 1.13

sX = 34.23

sY = .07

$$ r = \frac{\mathrm{cov}_{XY}}{s_X s_Y} = \frac{1.13}{(34.23)(0.07)} = \frac{1.13}{2.40} = .47 $$

Calculation of r for Payroll (X) and Making Post-Season (Y)

Y coded so that 1 = Playoffs, 0 = No

covXY = 8.24

sX = 34.23

sY = .45

$$ r = \frac{\mathrm{cov}_{XY}}{s_X s_Y} = \frac{8.24}{(34.23)(0.45)} = \frac{8.24}{15.42} = .53 $$
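
As a check on the arithmetic in the two payroll calculations, here is a minimal Python sketch that plugs the summary values quoted on the slides into r = cov / (sX × sY). The function name `correlation_from_summary` is an illustrative assumption, not something from the course materials.

```python
def correlation_from_summary(cov_xy: float, sd_x: float, sd_y: float) -> float:
    """Pearson r: the covariance divided by the product of the two SDs."""
    return cov_xy / (sd_x * sd_y)


# Summary values quoted on the slides (payroll is X in both cases).
r_win_pct = correlation_from_summary(1.13, 34.23, 0.07)   # Y = winning percentage
r_playoffs = correlation_from_summary(8.24, 34.23, 0.45)  # Y = 1 if playoffs, else 0

print(round(r_win_pct, 2), round(r_playoffs, 2))
```

Both results round to the values reported on the slides (.47 and .53).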

Examples of Correlations

Source: Meyer et al. (2001)

Associations                                                                  r
Test Anxiety and Grades                                                      -.17
SAT and Grades in College                                                     .20
GRE Quant. and Graduate School GPA                                            .22
Quality of Marital Relationships and Quality of Parent-Child Relationships    .22
Alcohol and Aggressive Behavior                                               .23
Height and Weight                                                             .44
Gender and Height                                                             .67

Commonly Used Rule of Thumb

+/- .10 is Small
+/- .30 is Medium
+/- .50 is Large

Use these with care. These guidelines only provide a loose framework for thinking about the size of correlations.

Sources: Cohen (1988) and Kline (2004)

[Figures: eleven scatterplots, one each for r = 0, .10, .20, .30, .40, .50, .60, .70, .80, .90, and 1.0. The axes are labeled "observed" and "true", and both run from -4 to 4.]
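
The scatterplots for r = 0 through r = 1.0 can be imitated with simulated data. The sketch below is an assumption about how such plots might be generated, not the method used for the slides: it draws standardized "observed" scores, mixes in independent noise so that the population correlation equals a chosen target, and then reports the sample correlation. The seed, sample size, and function names are arbitrary choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def simulate_pair(target_r, n, rng):
    """Draw n standardized (observed, true) pairs with population correlation target_r."""
    observed = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    weight = math.sqrt(1 - target_r ** 2)
    true_scores = [target_r * o + weight * e for o, e in zip(observed, noise)]
    return observed, true_scores


rng = random.Random(395)  # arbitrary seed so the run is repeatable
sample_rs = {}
for target in (0.0, 0.3, 0.5, 0.9):
    obs, tru = simulate_pair(target, 20000, rng)
    sample_rs[target] = pearson_r(obs, tru)
    print(f"target r = {target:.2f}, sample r = {sample_rs[target]:.3f}")
```

With a sample this large, each sample correlation should land close to its target; plotting `obs` against `tru` would give a cloud that tightens toward a line as r grows.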

Now Back to Reliability

Classical Test Theory

X = T + E

where

X = Observed Score

T = True Score

E = Error score

Consider the Construct of Self-Esteem

Global self-esteem reflects a person’s overall evaluation of value and worth.

William James (1890) argued that self-esteem was the result of an individual’s perceived successes divided by their pretensions

Rosenberg (1965) defined global self-esteem as an individual’s overall judgment of adequacy

We can’t directly observe self-esteem

Measuring Self-Esteem

We can ask people questions that reflect individual differences in self-esteem.

– “I feel that I have a number of good qualities”
– “I see myself as a person with high self-esteem”

We assume that a “hidden” self-esteem variable causes people to respond to these questions.

We do not want to assume that these items are perfect indicators of an individual’s level of self-esteem.

Classical Test Theory

X = T + E

where

X = Observed Score

T = True Score

E = Error score

Classical Test Theory Assumptions

1. True scores and errors are uncorrelated (independent)

2. Errors across people average to zero

3. Across repeated measurements, a person’s average score is approximately equal to his/her true score.

Thinking about Total Variability

If X = T + E, then:

var (X) = var (T) + var (E)
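
The step from X = T + E to the variance decomposition relies on the first Classical Test Theory assumption (true scores and errors are uncorrelated), which makes the covariance term drop out. Written out:

```latex
\begin{aligned}
\operatorname{var}(X) &= \operatorname{var}(T + E) \\
                      &= \operatorname{var}(T) + \operatorname{var}(E) + 2\operatorname{cov}(T, E) \\
                      &= \operatorname{var}(T) + \operatorname{var}(E)
                         \quad \text{because } \operatorname{cov}(T, E) = 0
\end{aligned}
```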

                  

Reliability Coefficients

Reliability coefficients reflect the proportion of true score variance to observed score variance

Therefore reliabilities range from 0.0 (no true score variance) to 1.0 (all true-score variance)

$$ r_{xx} = \frac{\mathrm{var}(T)}{\mathrm{var}(X)} $$

Classic Definition of Reliability

The ratio of true score variance to total score variance.

Test 1: Total Variance = 10; True Score Variance = 9.

Test 2: Total Variance = 20; True Score Variance = 15.

Which Test is More Reliable?
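
One way to work the comparison is to apply the ratio definition directly. The short Python sketch below does that for the two tests; the function name `reliability` is an illustrative assumption.

```python
def reliability(true_score_variance: float, total_variance: float) -> float:
    """Classic definition: true score variance divided by total score variance."""
    return true_score_variance / total_variance


test_1 = reliability(9, 10)    # Test 1: 9 / 10
test_2 = reliability(15, 20)   # Test 2: 15 / 20

print(test_1, test_2)
```

Test 2 has more true score variance in absolute terms, but Test 1 has the larger ratio (.90 versus .75), so Test 1 is the more reliable of the two.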

Reliability

More technical: To what extent do observed scores reflect true scores?

How consistent is the assessment?

Three Kinds of Reliability

Internal Consistency (Content)
– Random error affects responses to items on an assessment

Test-Retest (Time)
– The construct stays the same. However, random errors vary from one occasion to the next.

Inter-Rater (Observer Biases)

Internal Consistency

Use a 5-item measure of Self-Esteem.
– 1. I feel that I am a person of worth, at least on an equal basis with others.
– 2. I feel that I have a number of good qualities.
– 3. All in all, I am inclined to feel that I am a failure.
– 4. I am able to do things as well as most other people.
– 5. I feel I do not have much to be proud of.

Response Options (1 = Strongly Disagree to 5 = Strongly Agree)

Internal Consistency

Correlate All Items (N = 450)

          Item 1   Item 2   Item 3   Item 4   Item 5
Item 1      -
Item 2     .70       -
Item 3     .38      .45       -
Item 4     .50      .51      .41       -
Item 5     .32      .25      .43      .25       -

Summary Statistics of those Correlations

Average: .42
Standard Deviation: .13
Minimum: .25 (Items 4 & 5; Items 2 & 5 tie at .25)
Maximum: .70 (Items 1 & 2)
Standardized Alpha = .78

Alpha is an index of how strongly the items on a measure are associated with each other.

Coefficient Alpha

Coefficient Alpha (α)

$$ \alpha = \frac{k \, \bar{r}_{ij}}{1 + (k - 1) \, \bar{r}_{ij}} $$

Where you need (1) the # of items (called k) and (2) the average inter-item correlation ($\bar{r}_{ij}$). This formula yields the standardized alpha.
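
To make the standardized alpha formula concrete, the following Python sketch applies it to the ten inter-item correlations from the 5-item self-esteem example (N = 450). The function name `standardized_alpha` is an illustrative assumption; the correlations are copied from the matrix in this deck.

```python
from statistics import mean


def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized alpha from the number of items k and the mean inter-item r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Lower-triangle correlations from the 5-item self-esteem matrix.
inter_item_rs = [0.70, 0.38, 0.45, 0.50, 0.51, 0.41, 0.32, 0.25, 0.43, 0.25]

mean_r = mean(inter_item_rs)
alpha = standardized_alpha(5, mean_r)

print(round(mean_r, 2), round(alpha, 2))
```

The output matches the summary statistics reported for that matrix: an average correlation of .42 and a standardized alpha of .78.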

Coefficient Alpha versus Split-half reliability Estimates…

Split-Half Reliability – Divide the items on the assessment into 2 halves and then correlate the two halves.

Problem: Estimates fluctuate depending on what items get split into which halves.

Alpha is the average of all possible split-half reliabilities.

Sample Matrix

          Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
John        4        3        5        5        3        2
Paul        4        5        5        3        4        4
Ringo       2        2        2        1        2        3
George      4        4        3        2        5        4
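
The sample matrix can be used to see why split-half estimates fluctuate. The Python sketch below is an illustration under stated assumptions: it forms half-test totals two ways (odd versus even items, first three versus last three items) and correlates the halves without any length correction. With only four respondents the numbers are purely illustrative, and the helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


responses = {
    "John":   [4, 3, 5, 5, 3, 2],
    "Paul":   [4, 5, 5, 3, 4, 4],
    "Ringo":  [2, 2, 2, 1, 2, 3],
    "George": [4, 4, 3, 2, 5, 4],
}


def half_total(items, positions):
    """Sum the responses at the given zero-based item positions."""
    return sum(items[i] for i in positions)


def split_half_r(first_positions, second_positions):
    """Correlate two half-test totals across all respondents."""
    half_a = [half_total(r, first_positions) for r in responses.values()]
    half_b = [half_total(r, second_positions) for r in responses.values()]
    return pearson_r(half_a, half_b)


r_odd_even = split_half_r([0, 2, 4], [1, 3, 5])    # items 1, 3, 5 vs. 2, 4, 6
r_first_last = split_half_r([0, 1, 2], [3, 4, 5])  # items 1-3 vs. 4-6

print(round(r_odd_even, 3), round(r_first_last, 3))
```

The two splits give different correlations from the same data, which is the problem with relying on any single split.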

Real Results

10-item measure of Self-Esteem for 451 women.

Correlate the average of the odd-numbered items with the average of the even-numbered items: r = .79

Correlate the average of the first five items with the average of the last five items: r = .67

Average Inter-Item r = .46

Standardized Alpha = .89

Caveats about Coefficient Alpha ….

Recall what goes into the Alpha calculation:
– Number of items
– Average inter-item correlation

There are at least two things to think about when considering Coefficient Alpha…
– Length of the Assessment
– Dimensionality

Pay Attention to the Length of the Assessment.

Hold the average inter-item correlation constant (e.g., .42) but increase the number of items…

Items Standardized Alpha

5 .78

6 .81

7 .84

8 .85

9 .87

10 .88

100 .99

Now let’s use something like an average inter-item correlation of .15

Items Standardized Alpha

5 .47

10 .64

15 .73

20 .78

25 .82

30 .84

100 .95
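
Both length tables follow from the standardized alpha formula alone. As a sketch, the Python below recomputes them for average inter-item correlations of .42 and .15; the function and variable names are illustrative assumptions.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized alpha from the number of items k and the mean inter-item r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Test lengths shown in the two tables, keyed by the average inter-item r.
lengths_by_r = {
    0.42: (5, 6, 7, 8, 9, 10, 100),
    0.15: (5, 10, 15, 20, 25, 30, 100),
}

table = {
    mean_r: {k: round(standardized_alpha(k, mean_r), 2) for k in lengths}
    for mean_r, lengths in lengths_by_r.items()
}

for mean_r, row in table.items():
    print(f"average inter-item r = {mean_r}")
    for k, alpha in row.items():
        print(f"  {k:>3} items: alpha = {alpha:.2f}")
```

Even with weakly related items (r = .15), alpha passes .80 once the scale reaches 25 items, which is why length has to be considered when interpreting alpha.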

With enough items it is possible to achieve very high alpha coefficients…

Dimensionality of the Measure

Let’s Get To The Same Average Inter-Item Correlation in Two Ways

Example from Schmitt (1996)

Item Pool #1 (Average inter-item r = .5)

Item    1.    2.    3.    4.    5.    6.
1.     1.0
2.      .8   1.0
3.      .8    .8   1.0
4.      .3    .3    .3   1.0
5.      .3    .3    .3    .8   1.0
6.      .3    .3    .3    .8    .8   1.0

Item Pool #2 (Average inter-item r = .5)

Item    1.    2.    3.    4.    5.    6.
1.     1.0
2.      .5   1.0
3.      .5    .5   1.0
4.      .5    .5    .5   1.0
5.      .5    .5    .5    .5   1.0
6.      .5    .5    .5    .5    .5   1.0

Let’s Calculate Alphas…

Item Pool #1: Average Inter-item r = .50, number of items = 6.

Item Pool #2: Average Inter-item r = .50, number of items = 6.

Standardized Alpha for Item Pool #1 = Standardized Alpha for Item Pool #2 = .86

Same alphas but the underlying correlation matrices are quite different…
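
The equal alphas can be verified from the two matrices. The Python sketch below rebuilds the 15 off-diagonal correlations of each item pool, averages them, and applies the standardized alpha formula. The helper names are illustrative assumptions; the .8/.3 cluster pattern and the flat .5 pattern are read off the two item pool matrices.

```python
from itertools import combinations
from statistics import mean


def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized alpha from the number of items k and the mean inter-item r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


def pool_1_r(i: int, j: int) -> float:
    """Item Pool #1: items 1-3 and items 4-6 form two clusters."""
    same_cluster = (i <= 3) == (j <= 3)
    return 0.8 if same_cluster else 0.3


pairs = list(combinations(range(1, 7), 2))  # the 15 distinct item pairs

pool_1 = [pool_1_r(i, j) for i, j in pairs]
pool_2 = [0.5 for _ in pairs]               # Item Pool #2: every pair correlates .5

alpha_1 = standardized_alpha(6, mean(pool_1))
alpha_2 = standardized_alpha(6, mean(pool_2))

print(round(mean(pool_1), 2), round(alpha_1, 2))
print(round(mean(pool_2), 2), round(alpha_2, 2))
```

Both pools average .5 and both yield an alpha of .86, even though Pool #1 contains two distinct clusters of items and Pool #2 does not.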

Alpha does NOT index unidimensionality

What is unidimensionality?

Unidimensionality can be rigorously defined as the existence of one latent trait underlying the set of items (Hattie, 1985, p. 152).

Simply put, all of the items forming the instrument measure just one thing.

Turns out that 100% “pure” unidimensionality is hard to achieve for personality and attitude measures.

Try to get items that are as close as possible to a unidimensional set.

A Few Tips

Think about the construct.

Pay attention to the number of items on a scale and the average inter-item correlation.

Always look at the inter-item correlation matrix.

Motto: An essential ingredient in the research process is the judgment of the scientist. (Jacob Cohen, 1923-1998).

Question: What is a good alpha level?

Answer: It depends….

Reliability Standards

Reliability Standards
– .7 for research
– .9 for actual decisions

But…
– “Does a .50 reliability coefficient stink? To answer this question, no authoritative source will do. Rather, it is for the user to determine what amount of error variance he or she is willing to tolerate, given the specific circumstances of the study.” Pedhazur and Schmelkin (1991, p. 110)

Test-Retest Reliability

The extent to which scores at one time point do not perfectly correlate with scores at another time point is an indicator of error

Correlation is an estimate of the reliability ratio

This assumes the underlying construct is stable.
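
A small simulation can show why a test-retest correlation estimates the reliability ratio when the construct is stable. The Python sketch below is an illustration under assumed values, not course data: each simulated person keeps the same true score on both occasions, and fresh random error is added each time. The standard deviations, seed, and sample size are arbitrary assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


rng = random.Random(2004)      # arbitrary seed so the run is repeatable
n_people = 20000
sd_true, sd_error = 3.0, 1.5   # assumed values: var(T) = 9, var(E) = 2.25

# X = T + E on each occasion; T is stable, E is redrawn.
true_scores = [rng.gauss(0, sd_true) for _ in range(n_people)]
time_1 = [t + rng.gauss(0, sd_error) for t in true_scores]
time_2 = [t + rng.gauss(0, sd_error) for t in true_scores]

reliability_ratio = sd_true ** 2 / (sd_true ** 2 + sd_error ** 2)  # var(T) / var(X)
retest_r = pearson_r(time_1, time_2)

print(round(reliability_ratio, 3), round(retest_r, 3))
```

If the true scores were allowed to change between occasions, the retest correlation would drop below the reliability ratio, which is why the stability assumption matters.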

Test-Retest Reliability

What Time Interval? Long enough that memory biases are not present but short enough that there is no expectation of true change.

Cattell et al. (1970, p. 320): “When the lapse of time is insufficient for people themselves to change.”

Watson (2004) suggested 2 weeks.

Inter-Rater Reliability

Just like test-retest reliability

Correlation of ratings from 2 or more judges

Correlation is an estimate of the reliability ratio

Question: What is one undesirable consequence of measurement error?

Researchers are often concerned about attenuation in predictor-criterion associations due to measurement error.

Assume that measures of X and Y have alphas of .60 and .70, respectively. An estimate of the upper limit on the observed correlation between X and Y is .65

Take the square root of the product of the two reliabilities

Measure 1 Measure 2 Upper Limit

.50 .85 .65

.60 .85 .71

.70 .85 .77

.80 .85 .82

.90 .85 .87
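
The upper limits above come from taking the square root of the product of the two reliabilities. A minimal Python sketch (the function name `max_observed_r` is an illustrative assumption) reproduces both the .60/.70 example and the table:

```python
import math


def max_observed_r(rel_x: float, rel_y: float) -> float:
    """Upper limit on the observed correlation given the two reliabilities."""
    return math.sqrt(rel_x * rel_y)


example = round(max_observed_r(0.60, 0.70), 2)  # alphas of .60 and .70

limits = {
    rel_1: round(max_observed_r(rel_1, 0.85), 2)
    for rel_1 in (0.50, 0.60, 0.70, 0.80, 0.90)
}

print(example)
for rel_1, limit in limits.items():
    print(f"{rel_1:.2f}  0.85  {limit:.2f}")
```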

Correcting Correlations for Attenuation

$$ r_c = \frac{r_{xy}}{\sqrt{r_{xx} \, r_{yy}}} $$

r_xy = observed correlation between x and y

r_xx and r_yy = reliability coefficients of x and y

Applying the Formula

Reliability (Measure 1)   Reliability (Measure 2)   Observed Correlation   Corrected
.50                       .60                       .40                    .73
.60                       .70                       .40                    .62
.70                       .80                       .40                    .53
.80                       .90                       .40                    .47
.90                       .90                       .40                    .44
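
The corrected values can be recomputed by dividing the observed correlation by the square root of the product of the two reliabilities. The Python sketch below does this for each row; the function name `correct_for_attenuation` is an illustrative assumption.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation divided by the square root of the product of reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


# (reliability of measure 1, reliability of measure 2) for each table row.
rows = [(0.50, 0.60), (0.60, 0.70), (0.70, 0.80), (0.80, 0.90), (0.90, 0.90)]
observed_r = 0.40

corrected = [round(correct_for_attenuation(observed_r, a, b), 2) for a, b in rows]
print(corrected)
```

The same observed correlation of .40 implies a much larger corrected correlation when the measures are unreliable (.73) than when they are reliable (.44).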

Standard Error of Measurement

Estimating the precision of individual scores

Standard Error of Measurement = Standard deviation of the error around any individual’s true score

± 2 SEM captures 95% of the error

Calculation of the Standard Error of Measurement (SEM)

$$ SEM = s_x \sqrt{1 - r_{xx}} $$

s_x = SD of test scores

r_xx = test reliability

Good reliability, low SEM: $$ SEM = 10 \sqrt{1 - .84} = 4 $$

Poor reliability, high SEM: $$ SEM = 10 \sqrt{1 - .19} = 9 $$
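
The two SEM examples (an SD of 10 with reliabilities of .84 and .19) can be checked with a few lines of Python. The function name `standard_error_of_measurement` is an illustrative assumption.

```python
import math


def standard_error_of_measurement(sd_x: float, reliability: float) -> float:
    """SEM = SD of test scores times the square root of (1 - reliability)."""
    return sd_x * math.sqrt(1 - reliability)


sem_good = standard_error_of_measurement(10, 0.84)  # good reliability
sem_poor = standard_error_of_measurement(10, 0.19)  # poor reliability

print(round(sem_good, 2), round(sem_poor, 2))
```

With the same SD, dropping the reliability from .84 to .19 more than doubles the SEM, from 4 to 9.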

Standard Error of Measurement Assumptions

• a reliability coefficient based on an appropriate measure

• the sample appropriately represents the population

$$ SEM = s_x \sqrt{1 - r_{xx}} $$

Confidence Bands

There are additional complexities involved in setting confidence bands around observed scores but we won’t cover them in PSY 395 (see Nunnally & Bernstein, 1994, p. 259)

SEM Confidence Interval
– 95% Confidence: Z = 1.96 (Often round to 2)
– 68% Confidence: Z = 1.0

$$ CI = \text{Observed Score} \pm Z_{\text{Confidence}} \times SEM $$

Consider 2 Tests

Case 1

The CAT (Creative Analogies Test) has 100 items. Assume the SEM of this test is 10.

Amy scored 75

The 95% Band = Score ± (2 × SEM)

So the 95% Confidence Band around her score is 55 to 95

Case 2

The CAT-2 (Creative Analogies Test V.2) also has 100 items. Assume the SEM of this test is 2

Amy scored 75

95% Confidence Band around her score is 71 to 79

Why? Recall the 95% Band = Score ± (2 × SEM)
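
Both of Amy's confidence bands follow from the same rule: observed score plus or minus Z times the SEM, with Z rounded to 2 for 95% confidence. A minimal Python sketch (the function name `confidence_band` is an illustrative assumption):

```python
def confidence_band(observed: float, sem: float, z: float = 2.0):
    """Return (lower, upper) bounds: observed score plus or minus z times the SEM."""
    margin = z * sem
    return observed - margin, observed + margin


cat_band = confidence_band(75, 10)   # CAT: SEM = 10
cat2_band = confidence_band(75, 2)   # CAT-2: SEM = 2

print(cat_band, cat2_band)
```

The same observed score of 75 carries a 40-point band on the CAT but only an 8-point band on the CAT-2.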

Which test should be used to make decisions about Graduate School Admission? Why?

Decisions….Decisions…

                                      TRUTH
Test Decision         Doesn’t Have “it”    Has “it”
Doesn’t Have “it”     Correct              False Negative
Has “it”              False Positive       Correct

Cut Scores

Cut scores are set values that determine who “passes” and who “fails” the test.
– Commonly used for licensure or certification (bar exam, medical licensure, civil service)

What is the impact of the standard error of measurement on interpreting cut scores?
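
One way to think about this question is to ask whether a person's confidence band straddles the cut score. The Python sketch below uses Amy's score of 75 and the two SEMs from the CAT and CAT-2 cases; the cut score of 70 is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
def band_straddles_cut(observed: float, sem: float, cut_score: float, z: float = 2.0) -> bool:
    """True when the cut score falls strictly inside the observed score's band."""
    lower = observed - z * sem
    upper = observed + z * sem
    return lower < cut_score < upper


CUT_SCORE = 70  # hypothetical cut score, not taken from the slides

cat_uncertain = band_straddles_cut(75, 10, CUT_SCORE)   # CAT, SEM = 10: band 55 to 95
cat2_uncertain = band_straddles_cut(75, 2, CUT_SCORE)   # CAT-2, SEM = 2: band 71 to 79

print(cat_uncertain, cat2_uncertain)
```

Under these assumed numbers, the CAT band includes the cut score, so a pass decision could easily be an error, while the CAT-2 band lies entirely above it.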

The smaller the SEM the better. Why?