Validity

It is “the degree to which a certain inference from a test is appropriate or meaningful” (Drummond, 2000).

It is the extent to which a test does the job desired of it; the evidence may be either empirical or logical (Lyman, 1991).

It is the extent to which a test measures what it is supposed to measure (Murphy & Davidshofer, 1998).

Types of Validity

Type: Content
Purpose: To determine whether the test items match the set of goals and objectives.
Procedure: Compare the test blueprint with the school, course, or program objectives. Use a panel of experts in the content area (e.g., teachers, professors).
Types of tests: Survey achievement tests, criterion-referenced tests, examinations.

Type: Criterion (predictive)
Purpose: To determine whether there is a relationship between a test and a criterion measure to be obtained in the future.
Procedure: Correlate test scores with a criterion measure obtained after a period of time.
Types of tests: Scholastic aptitude tests, general aptitude batteries, prognostic tests, readiness tests, personality tests.

Type: Construct
Purpose: To determine whether a construct exists and to understand the traits or concepts that make up the set of scores or items.
Procedure: Conduct multivariate statistical analyses, such as discriminant analysis and multivariate analysis of variance.
Types of tests: Intelligence tests, aptitude tests, personality tests.
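The predictive-validity procedure (correlating test scores with a criterion measure obtained later) can be sketched in Python. This is only an illustration: the function name pearson_r, the variable names, and all of the scores below are hypothetical and do not come from the cited sources.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: aptitude scores at admission, and the criterion
# (grade average) collected for the same six students one year later.
aptitude_scores = [52, 61, 47, 70, 58, 65]
later_grades = [2.6, 3.1, 2.4, 3.6, 2.9, 3.0]

validity_coefficient = pearson_r(aptitude_scores, later_grades)
print(f"Predictive validity coefficient: {validity_coefficient:.2f}")
```

The closer the coefficient is to 1, the better the earlier test scores track the later criterion; a value near 0 would suggest the test has little predictive value for that criterion.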

Reliability

It refers to the degree to which test scores are consistent, dependable, or repeatable; it is a function of the degree to which test scores are free from error (Drummond, 2000).

It refers to the consistency of test scores obtained by the same persons when reexamined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions (Anastasi and Urbina, 1997).

The concept of reliability underlies the computation of the error of measurement of a single score, from which we can predict the range of fluctuation likely to occur in a single individual’s score as a result of irrelevant chance factors.
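The range of fluctuation in a single score is commonly quantified with the standard error of measurement, SEM = SD × sqrt(1 − reliability). The sketch below assumes that formula; the function names and the numbers (a standard deviation of 15, a reliability of .91, an observed score of 110) are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.0):
    """Range observed +/- z * SEM around an individual's observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test: SD = 15, reliability = .91, so SEM = 15 * sqrt(.09) = 4.5
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(110, 15, 0.91)
print(f"SEM = {sem:.1f}; band of one SEM around 110: {low:.1f} to {high:.1f}")
```

With z = 1.0 the band spans one SEM on either side of the observed score; a more reliable test yields a smaller SEM and therefore a narrower band.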

The other concept of reliability is the internal consistency of a test, which is estimated from the number of items in the test and the average intercorrelation among all items.
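One common way to express reliability in terms of the number of items and their average intercorrelation is the standardized alpha coefficient, alpha = k × r / (1 + (k − 1) × r). The source does not name a formula, so treating this one as the intended estimate is an assumption; the function name and the example values are hypothetical.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized alpha from the item count k and the mean inter-item r.

    alpha = k * r / (1 + (k - 1) * r)
    """
    if n_items < 1:
        raise ValueError("a test needs at least one item")
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


# Hypothetical tests whose items all intercorrelate .30 on average:
# adding items raises the estimate even though the items are no better.
for k in (5, 10, 20):
    print(f"{k:2d} items -> alpha = {standardized_alpha(k, 0.30):.2f}")
```

The loop illustrates why longer tests tend to be more reliable when the average intercorrelation stays the same.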

Methods of Estimating Reliability

Method: Test-retest
Procedure: Same test given twice, with a time interval between testings.
Coefficient: Stability.
Problems: Memory effect; practice effect; change over time.

Method: Alternate forms
Procedure: Equivalent tests given with time between testings.
Coefficient: Equivalence and stability.
Problems: Hard to develop two equivalent tests; may reflect change in behavior over time.

Method: Internal consistency
Procedure: One test given at one time only (test divided into parts in split-half).
Coefficient: Equivalence and internal consistency.
Problems: Uses shortened forms (split-half); only good if traits are unitary or homogeneous; gives a high estimate on a speeded test; hard to compute by hand.
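The split-half procedure can be sketched as follows. Because each half is a shortened form of the test, the half-test correlation is usually stepped up with the Spearman-Brown formula, r_full = 2 × r_half / (1 + r_half). The odd-even split, the function names, and the response data are all assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Return (half-test r, Spearman-Brown corrected r) for an odd-even split.

    item_scores holds one list of item scores per examinee, all from a
    single administration of the test.
    """
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Step the half-length correlation up to the full test length.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical right/wrong (1/0) responses of five examinees to six items.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
]

r_half, r_full = split_half_reliability(responses)
print(f"Half-test r = {r_half:.2f}; corrected full-test estimate = {r_full:.2f}")
```

The same pearson_r helper would also give a test-retest (stability) coefficient if it were applied to total scores from two administrations of the same test.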
