6. Validity

  • 8/2/2019 6. Validity

    1/31

    PSY 6535 Psychometric Theory

    Validity Part 1

  • 8/2/2019 6. Validity

    2/31

    Overview

    Content validity

    Criterion-related validity

  • 8/2/2019 6. Validity

    3/31

    Issues of Validity

    Does the test actually measure what it is

    purported to measure?

    Do differences in test scores reflect true

    differences in the underlying construct?

    Are inferences based on the test scores

    justified?

  • 8/2/2019 6. Validity

    4/31

    Example:

    Validity of a Measure

    The use of the polygraph (lie detector test) is

    not nearly as valid as some say and can easily

    be beaten and should never be admitted into

    evidence in courts of law, say psychologists

    from two scientific communities who were

    surveyed on the validity of polygraphs. (APA News Release)

  • 8/2/2019 6. Validity

    5/31

    Validity is About Inferences.

    Cronbach (1971): Validation is the process of

    collecting evidence to support the types of

    inferences that are drawn from test scores.

    Validity is the degree to which all of the

    accumulated evidence supports the intended

    interpretation of test scores for the intended

    purpose. (AERA, APA, NCME, 1999, p. 11).

  • 8/2/2019 6. Validity

    6/31

    Validity for what?

    Inferences and decisions based on test scores

    A person with this score is likely to:

    Be a better parent

    Do well in law school

    Be most satisfied as an engineer

    Steal from his/her employer

  • 8/2/2019 6. Validity

    7/31

    Types of Validity

    Content (more theory-based)

    Criterion-related (more data-based)

    Construct (general evidence gathering)

  • 8/2/2019 6. Validity

    8/31

    Content Validity of a Measure

    Collectively, do the items adequately

    represent all of the domains of the construct

    of interest?

    Starting Point: A Well-Defined Construct.

    Often have a panel of experts judge whether

    items adequately sample the domain of

    interest.

  • 8/2/2019 6. Validity

    9/31

    Example: 1st Grade Math Objectives

    What 1st graders in School District X should be able to do:

    A. Add any two positive numbers

    whose sum is 20 or less.

    B. Subtract any two numbers (each less than

    15) whose difference is a positive number.

  • 8/2/2019 6. Validity

    10/31

    Item Pool: Which Are Content Valid?

    1. 13 + 2 = ____

    2. 12 - 5 = ____

    3. 10 - 13 = ____

    4. 26 - 15 = ____

    5. 13 + 4 - 7 = ____

    6. Sammy has 10 pennies. He lost 2. How many pennies does

    Sammy have now?

    A. 2 pennies; B. 8 pennies; C. 10 pennies; D. 12 pennies
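
    As a rough illustration, the two objectives can be written as an explicit rule and applied to the numeric items. This is only a sketch: the function name meets_objectives is hypothetical, the operator missing from items 2 to 5 is assumed to be subtraction, and the word problem (item 6) is left out because judging its format still calls for expert review.

```python
import re

def meets_objectives(expression: str) -> bool:
    """Return True if a two-operand item falls inside objective A or B."""
    # Accept only "<int> <+ or -> <int>"; longer expressions fit neither objective.
    match = re.fullmatch(r"\s*(\d+)\s*([+-])\s*(\d+)\s*", expression)
    if match is None:
        return False
    left, op, right = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        # Objective A: two positive numbers whose sum is 20 or less.
        return left > 0 and right > 0 and left + right <= 20
    # Objective B: both numbers less than 15, with a positive difference.
    return left < 15 and right < 15 and left - right > 0

# Items 1-5, with the missing operator assumed to be a minus sign.
items = ["13 + 2", "12 - 5", "10 - 13", "26 - 15", "13 + 4 - 7"]
results = {item: meets_objectives(item) for item in items}

for item, ok in results.items():
    print(f"{item:12s} within objectives: {ok}")
```

    A rule like this only screens for relevance to the stated objectives; it says nothing about whether the item pool as a whole represents the domain.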

  • 8/2/2019 6. Validity

    11/31

    Example: Depression (Modified from the DSM-IV)

    A complex of symptoms marked by:

    Disruptions in appetite and weight

    Insomnia or hypersomnia

    Loss of interest or pleasure in activities

    Loss of energy

    Feelings of worthlessness

    Feeling sad or empty nearly every day

    Frequent death-related thoughts

  • 8/2/2019 6. Validity

    12/31

    Item Pool: Which Are Content Valid?

    I feel blue or sad.

    I feel nervous when speaking to someone in

    authority.

    I have crying spells.

    I'm always willing to admit it when I make a

    mistake.

    I felt that everything I did was an effort.

    I never resent being asked to return a favor.

    I experience spells of terror or panic.

  • 8/2/2019 6. Validity

    13/31

    Assessing Content Validity

    Steps for assessing content validity:

    1. Describe the content domain

    2. Determine the areas of the content domain that are measured

    by each item

    3. Compare the structure of the test with the structure of the

    content domain

    Challenges:

    Defining the domain

    Categorizing the content domain and mapping items to the categories

    Ensuring representativeness

  • 8/2/2019 6. Validity

    14/31

    Contamination & Deficiency

    [Figure: overlap between Construct and Measure. Labeled regions: Relevance (Content Validity), Measure Contamination, Measure Deficiency.]

  • 8/2/2019 6. Validity

    15/31

    What do we want?

    A measure that samples from all important

    domains or aspects (Low Deficiency)

    A measure that does not include anything

    irrelevant (Low Contamination)

    That is, a measure that adequately captures

    all of the domains of the construct that it is

    intended to measure. (High Content Validity)
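
    One way to make deficiency and contamination concrete is to map each item to a construct domain and compare the two sets. The sketch below uses the depression example; the domain labels and the item-to-domain assignments are illustrative assumptions, not expert judgments from the lecture.

```python
# Domains of the construct (labels are shorthand for the symptom list).
construct_domains = {
    "appetite_weight", "sleep", "loss_of_interest", "loss_of_energy",
    "worthlessness", "sad_mood", "death_thoughts",
}

# Hypothetical expert mapping: item -> domain it measures (None = no domain).
item_to_domain = {
    "I feel blue or sad.": "sad_mood",
    "I felt that everything I did was an effort.": "loss_of_energy",
    "I feel nervous when speaking to someone in authority.": None,
    "I experience spells of terror or panic.": None,
}

# Domains touched by at least one item.
covered = {d for d in item_to_domain.values() if d is not None}

# Relevance: construct domains the measure actually samples.
relevant = covered & construct_domains
# Deficiency: construct domains that no item samples.
deficiency = construct_domains - covered
# Contamination: items that map to nothing inside the construct.
contaminating_items = [
    item for item, d in item_to_domain.items() if d not in construct_domains
]

print("Relevant domains:   ", sorted(relevant))
print("Deficient domains:  ", sorted(deficiency))
print("Contaminating items:", contaminating_items)
```

    With this toy mapping the measure is both deficient (most symptom domains are unsampled) and contaminated (two items fall outside the construct).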

  • 8/2/2019 6. Validity

    16/31

    Criterion-related Evidence for a Measure

    What should this test predict? What inferences are we

    going to use this test to make?

    Criterion-related validation is data based.

    Does the test actually predict behavior that it is

    supposed to predict?

    Correlate an honesty test with employee theft

    Correlate a pencil and paper measure of delinquency

    with arrest records

    Correlate a measure of study habits with actual grades

  • 8/2/2019 6. Validity

    17/31

    Two Main Types of

    Criterion-Related Validity

    Predictive validity: future criteria

    Concurrent validity: current criteria

  • 8/2/2019 6. Validity

    18/31

    Criterion-related validity:

    Concurrent validity

    Students who have been admitted to Wayne

    State take the SAT. Their GPA is recorded at the

    same time. The correlation between the test scores and

    performance is computed. This correlation is

    sometimes called a validity coefficient.

  • 8/2/2019 6. Validity

    19/31

    Criterion-related validity:

    Predictive validity

    Students take the SAT (or ACT) during High

    School and then some are selected into Wayne

    State. Later, their SAT scores are correlated with

    their college GPA.

    This correlation is also sometimes called a

    validity coefficient.

    If SAT scores and college GPA are correlated,

    then the SAT has some degree of predictive

    validity for predicting college GPA.
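
    A validity coefficient of this kind is an ordinary Pearson correlation between predictor and criterion. The sketch below computes one by hand; the SAT scores and GPAs are made-up values for illustration only, and pearson_r is a helper written for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: SAT taken in high school, GPA recorded later in college.
sat_scores = [1010, 1100, 1180, 1250, 1320, 1400]
college_gpa = [2.6, 3.1, 2.9, 3.3, 3.2, 3.8]

validity_coefficient = pearson_r(sat_scores, college_gpa)
print(f"Validity coefficient: {validity_coefficient:.2f}")
```

    The same computation serves concurrent validity; only the timing of the criterion measurement differs.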

  • 8/2/2019 6. Validity

    20/31

    Problem:

    Small Samples = Imprecise Estimates

    Sample Size   Observed Correlation   Lower Bound of 95% CI   Upper Bound of 95% CI
    10            .50                    -.33                    .89
    20            .50                     .04                    .79
    50            .50                     .25                    .69
    100           .50                     .33                    .64
    200           .50                     .39                    .60
    400           .50                     .42                    .57
    1000          .50                     .45                    .55
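
    Intervals like these can be approximated with the Fisher z transformation. The sketch below uses the common normal-theory version; the slides do not say which method produced the table, so the bounds it prints may differ somewhat from the tabled values, especially for the smallest samples. The function name correlation_ci is an assumption made for this example.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                    # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)          # standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

sample_sizes = (10, 20, 50, 100, 200, 400, 1000)
intervals = {n: correlation_ci(0.50, n) for n in sample_sizes}

for n, (lower, upper) in intervals.items():
    print(f"n = {n:4d}   r = .50   95% CI [{lower:.2f}, {upper:.2f}]")
```

    Whatever the exact method, the pattern is the same: the interval narrows as the sample grows.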

  • 8/2/2019 6. Validity

    21/31

    Problem: Range Restriction

    Range Restriction: The variance in scores in the

    sample at hand is smaller than the variance in

    scores in the population of interest.

    Range restriction is thought to reduce the

    observed correlation between test scores and

    criterion measures. (Exceptions are possible)

    In the previous examples, where was the

    restriction, and why was there restriction?

  • 8/2/2019 6. Validity

    22/31

    Example: range restriction

    [Figure: Job performance vs. general cognitive ability]

  • 8/2/2019 6. Validity

    23/31

    Example: range restriction

    [Figure: Job performance vs. general cognitive ability]

  • 8/2/2019 6. Validity

    24/31

    Example: range restriction

    [Figure: Job performance vs. general cognitive ability]

  • 8/2/2019 6. Validity

    25/31

    When/where might we find

    range restriction?

    Sample of employees chosen based on high test

    scores and interview scores (high scores on

    predictor)

    Sample of current employees promoted due to

    high performance (high scores on criterion

    measure)

    In both cases variability is being reduced (either

    in the predictor variable or in the criterion

    variable)
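
    A small simulation can show the effect of selecting on the predictor. This is a sketch under stated assumptions: a population correlation of .50 between cognitive ability and job performance, normally distributed scores, and hiring of only the top 20% on ability. All of these values are illustrative choices, not figures from the lecture.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
population_r = 0.50      # assumed correlation in the applicant population
n = 20000

# Simulate applicants: performance = r * ability + independent noise.
ability = [rng.gauss(0, 1) for _ in range(n)]
noise_sd = math.sqrt(1 - population_r ** 2)
performance = [population_r * a + noise_sd * rng.gauss(0, 1) for a in ability]

full_r = pearson_r(ability, performance)

# Keep only the top 20% on the predictor (direct range restriction).
cutoff = sorted(ability)[int(0.8 * n)]
hired = [(a, p) for a, p in zip(ability, performance) if a >= cutoff]
restricted_r = pearson_r([a for a, _ in hired], [p for _, p in hired])

print(f"All applicants: r = {full_r:.2f}")
print(f"Hired only:     r = {restricted_r:.2f}")
```

    The correlation among those hired should come out noticeably lower than in the full applicant pool, even though the underlying relationship has not changed.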

  • 8/2/2019 6. Validity

    26/31

    Measurement Error

    Reliability: Index of the presence of

    measurement error (1.0 reliability = No error)

    Unreliability in the predictor and criterion serves

    to reduce (attenuate) their observed correlation.

    Researchers are often concerned about

    attenuation in predictor-criterion associations.

  • 8/2/2019 6. Validity

    27/31

    When/where might we find unreliability?

    Everywhere!

    Tests used as predictors (e.g., measures of

    depression)

    Criterion measures (e.g., ratings of client

    well-being)

    Unreliability is a concern for both

    predictors and criteria. Unreliability in

    both can reduce correlations.

  • 8/2/2019 6. Validity

    28/31

    Assume that measures of X and Y have

    alphas of .60 and .70, respectively. The

    observed r between X and Y is .40. However,

    we might want to know how much this

    correlation is depressed by

    measurement error.

  • 8/2/2019 6. Validity

    29/31

    Correction for Attenuation

    $$ r_c = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} $$

    Where:

    $r_c$ = correlation corrected for attenuation

    $r_{xy}$ = observed correlation between x and y

    $r_{xx}$ and $r_{yy}$ = reliability coefficients for x and y

  • 8/2/2019 6. Validity

    30/31

    Correcting for Measurement Error

    Reliability Measure x   Reliability Measure y   Observed Correlation   Corrected Correlation
    .50                     .60                     .40                    .73
    .60                     .70                     .40                    .62
    .70                     .80                     .40                    .53
    .80                     .90                     .40                    .47
    .90                     .90                     .40                    .44
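
    The corrected values follow directly from dividing the observed correlation by the square root of the product of the two reliabilities. A minimal sketch that recomputes the table rows is shown here; the function name correct_for_attenuation is chosen for this example.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation with measurement error removed."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must be in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

observed_r = 0.40
# (reliability of x, reliability of y) pairs from the table rows.
reliabilities = [(0.50, 0.60), (0.60, 0.70), (0.70, 0.80),
                 (0.80, 0.90), (0.90, 0.90)]

corrected = [round(correct_for_attenuation(observed_r, r_xx, r_yy), 2)
             for r_xx, r_yy in reliabilities]

for (r_xx, r_yy), r_c in zip(reliabilities, corrected):
    print(f"r_xx = {r_xx:.2f}  r_yy = {r_yy:.2f}  corrected r = {r_c:.2f}")
```

    The second row is the earlier example with alphas of .60 and .70: an observed correlation of .40 corresponds to a corrected correlation of about .62.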

  • 8/2/2019 6. Validity

    31/31

    Summary Issues

    Criterion-related Validity

    What sample will we use?

    Small Samples: More Imprecision in the correlation estimate

    Issues of Generalization

    What is our Criterion? How do we measure it?

    Variability is needed for both Predictor and Criterion variables

    Attenuation Due to Measurement Error

    Predictor-Criterion Overlap

    Same items on both measures: bad!