Reliability

Discrepancies between true ability and measurement of ability constitute errors of measurement.

In Psychological Testing, ERROR does not imply that a mistake has been made. It implies that there will always be inaccuracy in measurements.

Tests that are relatively free of measurement error are deemed to be reliable.

Tests that have too much error are deemed to be unreliable.

It is assumed that each person has a true score that would be obtained if there were no errors in measurement.

The difference between the true score and the observed score results from measurement error.

X - T = E

Where:
X - observed score
T - true score
E - error

It is assumed that the true score for an individual will not change with repeated applications of the same test.

Because of random error, however, repeated applications of the same test can produce different scores.

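Over repeated applications, the observed scores form a distribution around the true score. The standard deviation of this distribution is the standard error of measurement.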

Remember that the standard deviation tells us about the average deviation around the mean.

The standard error of measurement tells us, on average, how much an observed score varies from the true score.

In practice, the standard deviation of the observed score and the reliability of the test are used to estimate the standard error of measurement.
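As a rough illustration of that estimate, the sketch below uses the commonly cited formula SEM = SD × sqrt(1 - r), where SD is the standard deviation of the observed scores and r is the reliability coefficient. The function name and the example values (SD = 10, r = .84) are made up for illustration and do not come from any particular test.

```python
import math


def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    """Estimate the SEM from the observed-score SD and a reliability coefficient.

    Assumes the common formula SEM = SD * sqrt(1 - r).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


# Hypothetical test: observed-score SD of 10 and reliability of .84.
sem = standard_error_of_measurement(10.0, 0.84)
print(round(sem, 2))
```

Note that a perfectly reliable test (r = 1) gives an SEM of 0, while a test with r = 0 gives an SEM equal to the full standard deviation of the observed scores.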

Federal government guidelines require that a test be reliable before one can use it to make employment and educational placement decisions (Heubert and Hauser, 1999).

Models of Reliability

Time Sampling: The Test-Retest Method

The test-retest method is used to evaluate the error associated with administering a test at two different times.

Administer the same test on two well-specified occasions and find the correlation between the scores from the two administrations.
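A minimal sketch of the correlation step, assuming the Pearson product-moment correlation is the statistic used. The helper name pearson_r and the two score lists are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores of the same five people on two occasions.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 3))
```

The closer the coefficient is to 1, the more consistently people keep their relative standing across the two administrations.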


Item Sampling: Parallel Forms Method (also called Equivalent Forms Reliability)

• Determines the error variance that is attributable to the selection of one particular set of items

• Compares two equivalent forms of a test that measure the same attribute

• The Pearson product-moment correlation between scores on the two forms is used as the reliability estimate


• Split Half Method

• A test is given and is divided into halves that are scored separately. The results of one half of the test are then compared with the results of the other.

• Odd-even system

• Correlation between the 2 halves

• Kuder-Richardson 20 Formula (KR20)

• Used to calculate the reliability of a test in which the items are dichotomous, scored 0 or 1 (usually for right or wrong)

• Uses the sum, over all items, of the proportion of people passing each item times the proportion of people failing each item
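The KR20 calculation can be sketched as follows, assuming the usual form KR20 = (N / (N - 1)) × (1 - Σpq / S²), where N is the number of items, p and q are the proportions passing and failing each item, and S² is the variance of the total scores. The population variance is used here as an assumption (some texts use the sample variance), and the response matrix is invented for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """KR20 for a people-by-items matrix of 0/1 scores.

    Assumes the population variance of total scores (an illustrative choice).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    total_scores = [sum(person) for person in responses]
    total_variance = pvariance(total_scores)

    # Sum of p * q across items: proportion passing times proportion failing.
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(person[item] for person in responses) / n_people
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical data: five people (rows) answering four right/wrong items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```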


• Split Half Method

• Spearman-Brown Formula: used to correct for the half length of the test
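A short sketch of the split-half correction, assuming the standard Spearman-Brown form for doubling test length: corrected r = 2r / (1 + r), where r is the correlation between the two halves. The helper names and the example values are hypothetical.

```python
def odd_even_totals(item_scores):
    """Split one person's item scores into odd-numbered and even-numbered half totals."""
    odd_total = sum(item_scores[0::2])   # items 1, 3, 5, ...
    even_total = sum(item_scores[1::2])  # items 2, 4, 6, ...
    return odd_total, even_total


def spearman_brown(half_correlation: float) -> float:
    """Estimate full-length reliability from the correlation between two halves."""
    return 2.0 * half_correlation / (1.0 + half_correlation)


# One hypothetical person's scores on a four-item test.
print(odd_even_totals([1, 0, 1, 1]))  # (2, 1)

# If the two halves correlate .60, the corrected estimate for the full test is .75.
print(spearman_brown(0.60))
```

The correction is needed because each half is only half as long as the full test, and shorter tests tend to be less reliable.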

Kuder-Richardson 21 (KR21)

A special case of the reliability formula that does not require the calculation of the p’s and q’s; instead, it uses the mean test score.

Assumes that all items are of equal difficulty, or that the average difficulty level is 50%.
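For illustration, KR21 is often written as KR21 = (N / (N - 1)) × (1 - M(N - M) / (N × S²)), where N is the number of items, M is the mean total score, and S² is the variance of the total scores. The sketch below assumes that form; the function name and the example values are hypothetical.

```python
def kr21(n_items: int, mean_score: float, score_variance: float) -> float:
    """KR21 from the number of items, the mean total score, and the total-score variance."""
    if n_items < 2 or score_variance <= 0:
        raise ValueError("need at least 2 items and a positive variance")
    correction = mean_score * (n_items - mean_score) / (n_items * score_variance)
    return (n_items / (n_items - 1)) * (1.0 - correction)


# Hypothetical four-item test with a mean total score of 2 and a variance of 2.
print(round(kr21(4, 2.0, 2.0), 3))
```

Because only the mean and the variance are needed, this is quicker to compute by hand than KR20, at the cost of the equal-difficulty assumption.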

Coefficient Alpha (Cronbach’s Alpha)

The most general method of finding estimates of reliability through internal consistency.
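A minimal sketch of coefficient alpha, assuming the usual form alpha = (N / (N - 1)) × (1 - Σ item variances / total-score variance). Unlike KR20, the items do not have to be scored 0 or 1. Population variances are used here as an assumption, and the rating data are invented.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha for a people-by-items matrix of numeric item scores."""
    n_items = len(responses[0])
    # Variance of each item across people (columns of the matrix).
    item_variances = [
        pvariance([person[item] for person in responses])
        for item in range(n_items)
    ]
    total_variance = pvariance([sum(person) for person in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical data: five people rating three items on a 1-to-5 scale.
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 3],
    [4, 4, 4],
]
print(round(cronbach_alpha(ratings), 3))
```

When every item is scored 0 or 1, each item variance equals p × q, so this calculation gives the same value as KR20.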

How reliable is reliable?

What is “high enough”?

The answer depends on the use of the test.

Reliability estimates in the range of .70 to .80 are good enough for the purposes of basic research.

In clinical settings, a reliability of .90 may not be good enough; a reliability greater than .95 should be required.
