6. Validity

8/2/2019 6. Validity

1/31

PSY 6535 Psychometric Theory

Validity Part 1


2/31

Overview

Content validity

Criterion-related validity


3/31

Issues of Validity

Does the test actually measure what it is

purported to measure? Do differences in test

scores reflect true differences in the

underlying construct?

Are inferences based on the test scores

justified?


4/31

Example:

Validity of a Measure

The use of the polygraph (lie detector test) is

not nearly as valid as some say and can easily

be beaten and should never be admitted into

evidence in courts of law, say psychologists

from two scientific communities who were

surveyed on the validity of polygraphs.

Source: APA News Release


5/31

Validity is About Inferences.

Cronbach (1971): Validation is the process of

collecting evidence to support the types of

inferences that are drawn from test scores.

Validity is the degree to which all of the

accumulated evidence supports the intended

interpretation of test scores for the intended

purpose. (AERA, APA, NCME, 1999, p. 11).


6/31

Validity for what?

Inferences and decisions based on test scores

A person with this score is likely to

Be a better parent

Do well in law school

Be most satisfied as an engineer

Steal from his/her employer


7/31

Types of Validity

Construct (general evidence gathering)

Content (more theory-based)

Criterion-related (more data-based)


8/31

Content Validity of a Measure

Collectively, do the items adequately

represent all of the domains of the construct

of interest?

Starting Point: A Well-Defined Construct.

Often have a panel of experts judge whether

items adequately sample the domain of

interest.


9/31

Example: 1st Grade Math Objectives

What 1st Graders in School District X Should:

A. Be able to add any two positive numbers

whose sum is 20 or less.

B. Subtract any two numbers (each less than

15) whose difference is a positive number.


10/31

Item Pool: Which Are Content Valid?

1. 13 + 2 = ____

2. 12 − 5 = ____

3. 10 − 13 = ____

4. 26 − 15 = ____

5. 13 + 4 − 7 = ____

6. Sammy has 10 pennies. He lost 2. How many pennies does

Sammy have now?

A. 2 pennies; B. 8 pennies; C. 10 pennies; D. 12 pennies


11/31

Example: Depression (Modified from the DSM-IV)

A complex of symptoms marked by:

Disruptions in appetite and weight

Insomnia or hypersomnia

Loss of interest or pleasure in activities

Loss of energy

Feelings of worthlessness

Feels sad or empty nearly every day

Frequent death-related thoughts


12/31

Item Pool: Which Are Content Valid?

I feel blue or sad.

I feel nervous when speaking to someone in

authority.

I have crying spells.

I'm always willing to admit it when I make a

mistake.

I felt that everything I did was an effort.

I never resent being asked to return a favor.

I experience spells of terror or panic.


13/31

Assessing Content Validity

Steps for assessing content validity:

1. Describe the content domain

2. Determine the areas of the content domain that are measured

by each item

3. Compare the structure of the test with the structure of the content domain

Challenges:

Difficulty in defining the domain

Categorizing the content domain and mapping items to the categories

Ensuring representativeness


14/31

Contamination & Deficiency

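[Figure: two overlapping circles, Construct and Measure. The overlap is Relevance (Content Validity). The part of the measure outside the construct is Measure Contamination. The part of the construct outside the measure is Measure Deficiency.]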


15/31

What do we want?

A measure that samples from all important

domains or aspects (Low Deficiency)

A measure that does not include anything

irrelevant (Low Contamination)

That is, a measure that adequately captures

all of the domains of the construct that it is

intended to measure. (High Content Validity)
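
The deficiency and contamination ideas above can be sketched as a simple bookkeeping exercise. This is only an illustration: the domain labels are paraphrased from the depression example, and the item-to-domain judgments are hypothetical stand-ins for what an expert panel would supply.

```python
# Hypothetical content domains, paraphrased from the depression example.
DOMAINS = {
    "appetite/weight", "sleep", "loss of interest", "loss of energy",
    "worthlessness", "sad mood", "death-related thoughts",
}

# Hypothetical expert judgments: item -> domain it taps (None = no domain).
ITEM_JUDGMENTS = {
    "I feel blue or sad.": "sad mood",
    "I have crying spells.": "sad mood",
    "I felt that everything I did was an effort.": "loss of energy",
    "I feel nervous when speaking to someone in authority.": None,
    "I never resent being asked to return a favor.": None,
}


def content_audit(domains, judgments):
    """Return (deficiency, contamination) for a set of item judgments.

    deficiency: domains that no item was judged to cover.
    contamination: items judged to cover none of the domains.
    """
    covered = {d for d in judgments.values() if d is not None}
    deficiency = sorted(domains - covered)
    contamination = sorted(i for i, d in judgments.items() if d is None)
    return deficiency, contamination


if __name__ == "__main__":
    missing, irrelevant = content_audit(DOMAINS, ITEM_JUDGMENTS)
    print("Uncovered domains (deficiency):", missing)
    print("Irrelevant items (contamination):", irrelevant)
```

A measure with high content validity would leave both lists empty, or nearly so.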


16/31

Criterion-related Evidence for a Measure

What should this test predict? What inferences are we

going to use this test to make?

Criterion-related validation is data based.

Does the test actually predict behavior that it is

supposed to predict?

Correlate an honesty test with employee theft

Correlate a pencil and paper measure of delinquency

with arrest records

Correlate a measure of study habits with actual grades


17/31

Two Main Types of

Criterion-Related Validity

Predictive validity: future criteria

Concurrent validity: current criteria


18/31

Criterion-related validity:

Concurrent validity

Students who have been admitted to Wayne

State take the SAT. Their GPA is recorded at the

same time. The correlation between the test scores and

performance is computed. This correlation is

sometimes called a validity coefficient.


19/31

Criterion-related validity:

Predictive validity

Students take the SAT (or ACT) during High

School and then some are selected into Wayne

State. Later, their SAT scores are correlated with their college GPA.

This correlation is also sometimes called a

validity coefficient.

If SAT scores and college GPA are correlated,

then the SAT has some degree of predictive

validity for predicting college GPA.
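
A validity coefficient of this kind is an ordinary Pearson correlation between predictor and criterion. The sketch below shows the computation. The SAT and GPA values are invented for illustration and are not real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical scores for eight students (illustrative only).
    sat = [1050, 1120, 1200, 1280, 1340, 1400, 1460, 1520]
    gpa = [2.6, 3.1, 2.9, 3.0, 3.4, 3.2, 3.7, 3.6]
    print(f"validity coefficient r = {pearson_r(sat, gpa):.2f}")
```

Whether the result counts as concurrent or predictive evidence depends only on when the criterion (GPA) was collected relative to the test, not on the arithmetic.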


20/31

Problem:

Small Samples = Imprecise Estimates

Sample Size   Observed Correlation   Lower Bound of 95% CI   Upper Bound of 95% CI

10            .50                    -.33                    .89

20            .50                    .04                     .79

50            .50                    .25                     .69

100           .50                    .33                     .64

200           .50                    .39                     .60

400           .50                    .42                     .57

1000          .50                    .45                     .55
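
One common way to get intervals like these is the Fisher z transformation. The sketch below assumes that method. The slide does not say how its bounds were computed, and for the smallest samples this approach gives somewhat different bounds than the table, so treat it as an approximation rather than a reproduction.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via the Fisher z transformation.

    r: observed correlation, n: sample size (must exceed 3),
    z_crit: normal critical value (1.96 for a 95% interval).
    """
    if n <= 3:
        raise ValueError("sample size must be greater than 3")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


if __name__ == "__main__":
    for n in (10, 20, 50, 100, 200, 400, 1000):
        lo, hi = fisher_ci(0.50, n)
        print(f"n={n:5d}  r=.50  95% CI = [{lo:.2f}, {hi:.2f}]")
```

The pattern matches the slide's point: the interval around the same observed r narrows steadily as the sample grows.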


21/31

Problem: Range Restriction

Range Restriction: The variance in scores in the

sample at hand is smaller than the variance in

scores in the population of interest.

Range restriction is thought to reduce the observed correlation between test scores and

criterion measures. (Exceptions are possible.)

In the previous examples, where was the

restriction, and why was it there?


22/31

Example: range restriction

[Figure: scatterplot of job performance (y-axis) against general cognitive ability (x-axis)]


23/31


[Figure: scatterplot with job performance on the y-axis]



24/31


[Figure: scatterplot with job performance on the y-axis]



25/31

When/where might we find

range restriction?

Sample of employees chosen based on high test

scores and interview scores (high scores on

predictor)

Sample of current employees promoted due to

high performance (high scores on criterion

measure)

In both cases variability is being reduced (either

in the predictor variable or in the criterion

variable)
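
The effect of selecting on the predictor can be seen in a small simulation. This is a sketch under assumed conditions, not a model of any real hiring data: scores are simulated as bivariate normal with a population correlation of .50, and only the top 20% on the predictor are "hired". The function name and parameter values are illustrative choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_restriction(n=5000, rho=0.5, keep_top=0.2, seed=0):
    """Return (full-sample r, r among the top `keep_top` on the predictor)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)  # predictor, e.g. a test score
        # Criterion built to correlate rho with the predictor.
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    full_r = pearson_r(xs, ys)
    cutoff = sorted(xs)[int(n * (1 - keep_top))]  # selection threshold
    kept = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    restricted_r = pearson_r([x for x, _ in kept], [y for _, y in kept])
    return full_r, restricted_r


if __name__ == "__main__":
    full_r, restricted_r = simulate_restriction()
    print(f"r in full applicant pool:   {full_r:.2f}")
    print(f"r among selected employees: {restricted_r:.2f}")
```

Under these assumptions the correlation among the selected group comes out noticeably smaller than in the full pool, which is the reduction in variability the slide describes.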


26/31

Measurement Error

Reliability: Index of the presence of

measurement error (1.0 reliability = No error)

Unreliability in the predictor and criterion serves

to reduce (attenuate) their observed correlation

Researchers are often concerned about

attenuation in predictor-criterion associations


27/31

When/where might we find unreliability?

Everywhere!

Tests used as predictors (e.g., measures of

depression)

Criterion measures (e.g., ratings of client

well-being)

Unreliability is a concern for both

predictors and criteria. Unreliability in

both can reduce correlations


28/31

Assume that measures of X and Y have

alphas of .60 and .70, respectively. The

observed r between X and Y is .40. However,

we might want to know how much this

correlation is depressed by

measurement error.


29/31

Correction for Attenuation

$$r_c = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}$$

Where:

r_c = correlation corrected for attenuation

r_xy = observed correlation between x and y

r_xx and r_yy = reliability coefficients for x and y
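
As a worked example, the correction divides the observed correlation by the square root of the product of the two reliabilities. Plugging in the values from the preceding slide (alphas of .60 and .70, observed r of .40), and writing the corrected correlation as r_c:

```latex
r_c = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
    = \frac{.40}{\sqrt{(.60)(.70)}}
    = \frac{.40}{\sqrt{.42}}
    \approx .62
```

So an observed correlation of .40 corresponds to an estimated correlation of about .62 between error-free versions of X and Y.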


30/31

Correcting for Measurement Error

Reliability of Measure x   Reliability of Measure y   Observed Correlation   Corrected Correlation

.50                        .60                        .40                    .73

.60                        .70                        .40                    .62

.70                        .80                        .40                    .53

.80                        .90                        .40                    .47

.90                        .90                        .40                    .44
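
The rows above can be checked with a few lines of code. This is a minimal sketch of the standard correction for attenuation (observed r divided by the square root of the product of the reliabilities). The function name is an illustrative choice.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between x and y with measurement error removed.

    r_xy: observed correlation; r_xx, r_yy: reliabilities of x and y.
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    rows = [(0.50, 0.60), (0.60, 0.70), (0.70, 0.80), (0.80, 0.90), (0.90, 0.90)]
    for r_xx, r_yy in rows:
        r_c = correct_for_attenuation(0.40, r_xx, r_yy)
        print(f"r_xx={r_xx:.2f}  r_yy={r_yy:.2f}  observed=.40  corrected={r_c:.2f}")
```

The less reliable the two measures, the larger the gap between the observed and corrected values. With perfect reliability in both, the correction leaves r unchanged.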


31/31

Summary Issues

Criterion-related Validity: What sample will we use?

Small Samples: More imprecision in the correlation estimate

Issues of Generalization

What is our Criterion? How do we measure it?

Variability is needed for both Predictor and Criterion variables

Attenuation Due to Measurement Error

Predictor-Criterion Overlap

Same items on both measures: bad!
