Reliability - The extent to which a test or instrument gives consistent measurement

• Reliability - The extent to which a test or instrument gives consistent measurement; the strength of the relation between observed scores and true scores.

• Test-retest reliability (coefficient of stability) - Correlate two administrations of the same test.

• Parallel form reliability (coefficient of equivalence) - Correlate two forms of the same test.

• Split half reliability (Spearman-Brown prophecy formula) - Correlate two halves of the test.

• Internal consistency reliability (Cronbach α) - Correlate every item with every other item.
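As a rough illustration of the split-half and internal-consistency estimates listed above, the sketch below applies the Spearman-Brown correction to a half-test correlation and computes Cronbach's α from item scores. The function names and the small score matrix are hypothetical and are not taken from the slides.

```python
from statistics import pvariance


def spearman_brown(r_half, factor=2.0):
    """Project the reliability of a test lengthened by `factor`.

    With factor=2 this steps a split-half correlation up to full length.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def cronbach_alpha(items):
    """Cronbach's alpha; items[i][p] is person p's score on item i."""
    k = len(items)
    # Total score per person: sum across items.
    totals = [sum(person) for person in zip(*items)]
    item_var_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1.0 - item_var_sum / pvariance(totals))


# Hypothetical data: 3 items answered by 5 people.
scores = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]

print(spearman_brown(0.6))               # half-test r = 0.6 stepped up: 0.75
print(round(cronbach_alpha(scores), 3))  # alpha for the toy items
```

A half-test correlation of 0.6 corresponds to a projected full-length reliability of 0.75, which is why the uncorrected split-half correlation understates the reliability of the whole test.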

Reliability is the extent to which your observed score represents your true score.

E = X – T

[Figure: true score T with observed scores X1 and X2]

The test yielding the score X1 is more reliable than the one yielding X2.

Reliability is the extent to which individual differences or rank ordering of individuals based on the observed scores represent that based on the true scores.

One operationalization of this definition is the correlation between observed scores and true scores, ρXT, which is called the reliability index.

Another operationalization is the squared correlation between observed scores and true scores, ρ²XT: the proportion of observed-score variance that is true-score variance, or the proportion of consistent rank ordering.

[Figure: diagram of T and X]

In practice, it is the extent to which two tests yield similar results or a similar rank ordering of the individuals, ρXX'.

[Figure: diagram of X and X']

Estimation methods: test-retest, parallel form, split half, internal consistency.
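The link between the two-test correlation and the variance-ratio definition can be checked on toy data. This is a minimal sketch, assuming the classical model X = T + E with errors that are uncorrelated with T and with each other; the score vectors are hypothetical and were chosen so those assumptions hold exactly.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pvar(x):
    """Population variance."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)


# Hypothetical true scores and two error sets with equal variance,
# uncorrelated with T and with each other.
T = [-2, -2, 2, 2, -2, -2, 2, 2]
E1 = [-1, 1, -1, 1, -1, 1, -1, 1]
E2 = [-1, -1, -1, -1, 1, 1, 1, 1]

X = [t + e for t, e in zip(T, E1)]        # form X
X_prime = [t + e for t, e in zip(T, E2)]  # parallel form X'

r_parallel = pearson(X, X_prime)  # correlation between the two forms
var_ratio = pvar(T) / pvar(X)     # true-score variance / observed variance
print(r_parallel, var_ratio)
```

Under these assumptions both quantities come out to 0.8, which is why a correlation between two observable forms can stand in for a ratio that involves the unobservable true score.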

When ρXX' = 1,

1. the measurement has been made without error (E = 0 for all examinees).

2. X = T for all examinees.

3. all observed-score variance reflects true-score variance.

4. all differences between observed scores are true-score differences.

5. the correlation between observed scores and true scores is 1.

6. the correlation between observed scores and errors is zero.

When ρXX' = 0,

1. only random error is included in the measurement.

2. X = E for all examinees.

3. all observed-score variance reflects error variance.

4. all differences between observed scores are errors of measurement.

5. the correlation between observed scores and true scores is 0.

6. the correlation between observed scores and errors is 1.

When ρXX' is between 0 and 1,

1. the measurement includes some error and some truth.

2. X = T + E.

3. observed-score variance includes true-score variance and error variance.

4. differences between scores reflect true-score differences and error.

5. the correlation between observed scores and true scores is the square root of the reliability (the reliability index).

6. the correlation between observed scores and errors is the square root of 1 – reliability.
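The intermediate case can be illustrated numerically. The sketch below assumes X = T + E with errors uncorrelated with true scores; the four-person data set is hypothetical and was constructed so that the reliability is exactly 0.8.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pvar(x):
    """Population variance."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)


# Hypothetical scores: errors have mean zero and are uncorrelated with T.
T = [-2, -2, 2, 2]
E = [-1, 1, -1, 1]
X = [t + e for t, e in zip(T, E)]  # observed scores

reliability = pvar(T) / pvar(X)    # 4 / 5 = 0.8
r_xt = pearson(X, T)               # compare with sqrt(reliability)
r_xe = pearson(X, E)               # compare with sqrt(1 - reliability)

print(reliability)
print(round(r_xt, 4), round(sqrt(reliability), 4))
print(round(r_xe, 4), round(sqrt(1 - reliability), 4))
```

In this toy case the squared correlations with T and with E add up to 1, mirroring the split of observed-score variance into true-score and error variance.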

• Validity - The extent to which a test or instrument truly measures what it is expected to measure. For example, using a bathroom scale to measure weight is valid, whereas using a bathroom scale to measure height is invalid.

• Content validity refers to the extent to which the items on a test are representative of a specified content domain. Achievement and aptitude (but not personality and attitude) tests are concerned with content validity.

• Construct validity refers to the extent to which items on a test are representative of the underlying construct, e.g., personality or attribute. Personality and attitude tests are concerned with construct validity. The process to establish construct validity is referred to as construct validation.

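• Criterion-related validity refers to the extent to which a test correlates with a criterion that the test is intended to predict. It includes predictive validity, where the criterion is a future behavior, and concurrent validity, where the criterion is measured at about the same time as the test.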

Construct Validity: Internal Structure

[Figure: factor model with two latent factors, Authoritative Parenting and Authoritarian Parenting, and the indicators Warmth, Inductive Reasoning, Democratic Participation, Easygoing Responsiveness, Physical Punishment, Non-Reasoning, Authoritarian Directiveness, and Verbal Hostility (loading values omitted)]

Construct Validity: Network Relations

[Figure: path diagram with the constructs Communication Avoidance, Social Withdrawal, Assertive Leadership, Behavioral Aggression, Verbal Aggression, Perceived Social Competence (Time 1 and Time 2), and Peer Acceptance (Time 1 and Time 2); two constructs are marked "Single Indicator" (coefficient values omitted)]

A Synthetic Multitrait-Multimethod Matrix

                   Method 1          Method 2          Method 3
          Traits   A1    B1    C1    A2    B2    C2    A3    B3    C3
Method 1  A1      (.95)
          B1       .28  (.86)
          C1       .58   .39  (.92)
Method 2  A2       .86   .32   .57  (.95)
          B2       .30   .90   .40   .39  (.76)
          C2       .52   .31   .86   .55   .26  (.84)
Method 3  A3       .73   .10   .43   .64   .17   .37  (.48)
          B3       .10   .63   .17   .22   .67   .19   .15  (.41)
          C3       .35   .16   .52   .31   .17   .56   .41   .30  (.58)

A Synthetic Multitrait-Multimethod Matrix

                   Method 1          Method 2          Method 3
          Traits   A1    B1    C1    A2    B2    C2    A3    B3    C3
Method 1  A1       RL
          B1       DV    RL
          C1       DV    DV    RL
Method 2  A2       CV    HH    HH    RL
          B2       HH    CV    HH    DV    RL
          C2       HH    HH    CV    DV    DV    RL
Method 3  A3       CV    HH    HH    CV    HH    HH    RL
          B3       HH    CV    HH    HH    CV    HH    DV    RL
          C3       HH    HH    CV    HH    HH    CV    DV    DV    RL
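One way to read the two matrices together is to group the correlations by cell type and compare the group averages. The sketch below does this for the values in the first matrix. It assumes the usual reading of the codes, which the slides do not spell out: RL is a reliability on the diagonal, CV is a same-trait, different-method correlation (convergent validity), DV is a different-trait, same-method correlation, and HH is a different-trait, different-method correlation. The helper name `classify` is hypothetical.

```python
from statistics import mean

labels = ["A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"]

# Lower triangle of the synthetic matrix; the last entry of each row
# is the parenthesized diagonal value.
rows = [
    [.95],
    [.28, .86],
    [.58, .39, .92],
    [.86, .32, .57, .95],
    [.30, .90, .40, .39, .76],
    [.52, .31, .86, .55, .26, .84],
    [.73, .10, .43, .64, .17, .37, .48],
    [.10, .63, .17, .22, .67, .19, .15, .41],
    [.35, .16, .52, .31, .17, .56, .41, .30, .58],
]


def classify(a, b):
    """Cell type for two labels such as 'A1' (trait letter, method digit)."""
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and same_method:
        return "RL"
    if same_trait:
        return "CV"
    if same_method:
        return "DV"
    return "HH"


groups = {"RL": [], "CV": [], "DV": [], "HH": []}
for i, row in enumerate(rows):
    for j, r in enumerate(row):
        groups[classify(labels[i], labels[j])].append(r)

summary = {kind: round(mean(values), 3) for kind, values in groups.items()}
for kind, avg in summary.items():
    print(kind, len(groups[kind]), avg)
```

For these values the average same-trait, different-method correlation is higher than either average different-trait correlation, which is the pattern usually taken as evidence of convergent and discriminant validity.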

Criterion-Related Validity

[Figure: diagram pairing A-Level with University GPA and SAT with A-Level; labels: Concurrent, Predictive]

Restriction of Range Effect

[Figure: Criterion plotted against Test Scores, with a qualifying score separating rejected from selected examinees. Annotations: distribution of criterion scores for the selected group; distribution of scores on the criterion if no examinees were excluded]
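The restriction of range effect can be reproduced with a small simulation. This is a sketch under stated assumptions, not data from the slides: test and criterion scores are drawn from a bivariate normal distribution with an assumed correlation of 0.6, and the qualifying score of 0 is a hypothetical cutoff.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
rho = 0.6               # assumed test-criterion correlation, full group
n = 5000

test = [rng.gauss(0, 1) for _ in range(n)]
# Criterion = rho * test + independent noise, scaled to unit variance.
criterion = [rho * t + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for t in test]

cutoff = 0.0  # hypothetical qualifying score
selected = [(t, c) for t, c in zip(test, criterion) if t >= cutoff]
sel_test, sel_crit = zip(*selected)

r_full = pearson(test, criterion)         # no examinees excluded
r_selected = pearson(sel_test, sel_crit)  # selected group only

print(f"full group r = {r_full:.2f}, selected group r = {r_selected:.2f}")
```

Because criterion scores are normally available only for the selected group, the validity coefficient computed there tends to understate the correlation that would be observed if no examinees were excluded.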
