Research Design Part 2 Variability, Validity, Reliability.
Research Design Part 2: Variability, Validity, Reliability
Objectives
Refine research purpose & questions
Variability
Validity: external, internal, criterion, content, construct
Reliability: test-retest, inter-rater, internal consistency, instrument
Variability
Different values of the independent variable
Three sources: systematic, error, and extraneous
Variability
1. Systematic: variability within the independent variables
Design the study to maximize systematic variability (e.g., rewards vs. management styles)
Select the right sample & methods
Variability
2. Error: sampling & measurement error
Eliminate as many error conditions as possible (similar leagues, ages, abilities)
Increase the reliability of the instrument
Variability
3. Extraneous: control as much as possible
Not a planned part of the research
Influences outcomes in ways we don't want
Examples follow…
Variability
3. Extraneous: examples
Measure teaching techniques for validity & reliability between 2 sections of 497 to see the level of comprehension.
Measure differences between a week of TRX & a week of CrossFit using the same fitness assessment at the end.
Main Function of Research…
Maximize systematic variability, control extraneous variability, & minimize error variability
Validity & Reliability
Validity: degree to which something measures what it is supposed to measure
Reliability: consistency or repeatability of results
Validity & Reliability
Reliable but not valid: you hit the target consistently but miss the center; you are consistently and systematically measuring the wrong value for all respondents.
Valid but not reliable: random hits spread around the center that seldom hit it; you get a good group average, but not a consistent one.
Neither reliable nor valid: hits are spread across the target and consistently miss the center.
Reliable and valid: you consistently hit the center of the target.
Validity & Reliability
Can a measurement/instrument be reliable but not valid? Yes (e.g., weighing on a broken scale).
Can a measurement/instrument be valid but not reliable?
To be useful, a test/measurement must be both valid and reliable.
Validity
External validity
Internal validity
Test/criterion validity
Content validity
Construct validity
External Validity
Generalizability of the results
Population external validity
Characteristics & results can only be generalized to those with similar characteristics
Does the sample represent the entire population?
Demographics (e.g., psych experiments with college students)
Use multiple PE classes, intramural leagues, sports, teams, conferences
Control through sampling
External Validity
Ecological external validity
Conditions of the research are generalizable to settings with similar characteristics
Physical surroundings, time of day (AM vs. PM)
More common in testing (e.g., GRE)
Internal Validity
Confidence in the cause and effect relationship in a study.
Strongest when the study’s design (subjects, instruments/measurements,
and procedures) effectively controls possible sources of error so that
those sources are not reasonably related to the study’s results.
The key question that you should ask in any experiment is:
“Could there be an alternative cause, or causes, that explain my
observations and results?”
Internal Validity
History
Extraneous incidents/events that occur during the research and affect results
Only impacts studies across time
Examples: attendance at football games after a coaching change; a survey at IHSA about parent behavior when a parent fight breaks out across the gym in the middle of the survey
Internal Validity
Selection
If there are systematic differences between groups of subjects:
Gender (boys more active than girls), higher motivation level, more positive attitude toward the study
Compare GRE scores & grad school performance between sequences
Occurs when random sampling isn't used
Internal Validity
Statistical regression
In a pre-test/post-test design, those scoring extremely high or low on the first test will often "regress to the mean" on the second test
Extreme scores are based more on luck than actual performance
The regression effect, not the treatment, causes the change
Don't group the high/low scorers for the post-test
Note: the less reliable the instrument, the greater the regression.
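Regression to the mean is easy to demonstrate with a quick simulation. The sketch below (hypothetical numbers, plain Python) gives every subject a fixed true score plus random measurement error on each occasion, groups the top 10% of pre-test scorers, and shows their post-test mean falling back toward the population mean even though no treatment was applied.

```python
import random

random.seed(1)

# Each subject: fixed true score (mean 50) plus independent
# measurement error on each testing occasion.
n = 1000
true_score = [random.gauss(50, 10) for _ in range(n)]
pre = [t + random.gauss(0, 10) for t in true_score]
post = [t + random.gauss(0, 10) for t in true_score]

# Group the top 10% of PRE-test scorers (an extreme group).
cutoff = sorted(pre, reverse=True)[n // 10 - 1]
top = [i for i in range(n) if pre[i] >= cutoff]

pre_mean = sum(pre[i] for i in top) / len(top)
post_mean = sum(post[i] for i in top) / len(top)

# With no treatment at all, the extreme group's post-test mean
# regresses toward the overall mean of 50.
print(f"top group pre-test mean:  {pre_mean:.1f}")
print(f"top group post-test mean: {post_mean:.1f}")
```

Shrinking the error standard deviation in the simulation shrinks the gap between the two means, which is exactly the note above: a more reliable instrument means less regression.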
Internal Validity
Pre-testing
A pre-test can increase/decrease motivation
Gives subjects opportunities to practice
Practice can be a positive so they get a true score (pedometers; A. McGee thesis)
The instrument can make people think after the pre-test (motivation instruments)
Internal Validity
Instrumentation
Changes in calibration of the exam/instrument (experimental research)
Changes in observer scoring: fatigue/boredom (reality judging shows)
Maturation (experimental research)
Internal Validity
Diffusion of intervention (experimental research)
Attrition/mortality: subjects drop out or are lost
Examples: low scorers on the GRE drop out of grad school; coaching techniques & loss of players
Internal Validity
Experimenter effect
The presence or demeanor of the researcher impacts results positively or negatively
Examples: the course instructor is the PI on course evals; a coach or teacher conducting the study; the teacher staying in the room while students complete the PAQ-C
Subject effect
Subjects' behaviors change because they are subjects; subjects want to present themselves in the best light (Hawthorne effect)
Test/Criterion Validity
Degree to which a measure/test is related to some recognized standard or criterion
To establish criterion validity: create an intelligence test, then compare subjects' scores on our test with their scores on an established IQ test
Other examples: use 2 motivation instruments; give subjects our intelligence test and the IQ test at the same time; use the abbreviated Myers-Briggs (126 vs. 72 items) at the same time
Content Validity
Also called face validity
Degree to which a test adequately samples what is covered in a course
Usually used in education
Does a measurement appear to measure what it purports to measure?
No statistical measure/systematic procedure to test this
Content Validity
Often, experts (panel) are used to verify the content validity of measurements in research studies
Content validity is useful, but not the strongest/most credible way of evaluating a measurement
Examples: rewards listing, competency categories
Construct Validity
Degree to which a test/measurement measures a hypothetical construct; the overall quality of measurement
Construct variables: recruitment, motivation, mental preparation
Examples: Do the selected variables completely measure recruitment? How well does the instrument measure mental preparation? Do the questions adequately test motivation?
Construct Validity
Threats to construct validity
Using only one method to measure the construct
Inadequate explanation of a construct (e.g., depression = lethargy, loss of appetite, difficulty concentrating, etc.)
Measuring just one construct & making broad inferences: using 1 item to measure personality (the Myers-Briggs uses 4 dichotomies)
Validity Overview
Content: test content
Test/criterion: recognized standards
Construct: how well constructs describe the relationship
Reliability
Degree to which a test/measurement yields consistent and repeatable results
Often reported as a correlation coefficient (e.g., Cronbach's alpha)
Cronbach's alpha (internal consistency):
α ≥ 0.9: Excellent
0.8 ≤ α < 0.9: Good
0.7 ≤ α < 0.8: Acceptable
0.6 ≤ α < 0.7: Questionable
0.5 ≤ α < 0.6: Poor
α < 0.5: Unacceptable
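As a concrete illustration, Cronbach's alpha can be computed by hand from an item-response matrix. The sketch below uses made-up Likert responses for a hypothetical 4-item scale (rows = respondents, columns = items); it is plain Python, not any particular statistics package.

```python
def cronbach_alpha(data):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum(item vars)/total var)."""
    k = len(data[0])                       # number of items
    def var(xs):                           # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[j] for row in data]) for j in range(k)]
    total_var = var([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses for a 4-item scale.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Because these made-up respondents answer all four items similarly, alpha lands in the "Excellent" band of the table above.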
Look at articles.
4 Sources of Measurement Error
1. The participants: health, motivation, mood, fatigue, anxiety
4 Sources of Measurement Error
2. The testing: changes in time limits, changes in directions, how rigidly the instructions were followed, atmosphere/conditions of the test
4 Sources of Measurement Error
3. The instrumentation: sampling of items, calibration of (mechanical) instruments, poor questions
4. The scoring: different scoring procedures; competence, experience, and dedication of scorers (e.g., GRE)
Methods to Establish Reliability
Test-Retest Reliability (stability)
Alternate forms
Internal consistency
Agreement/ Inter-rater Reliability
Test-Retest Reliability (1)
Repeat the test on the same subjects at a later time (usually retest on a different day)
Use correlation coefficients between subjects' two scores
Used extensively in fitness & motor-skills tests
Used less for pencil-and-paper tests
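In practice, test-retest reliability reduces to a Pearson correlation between the two administrations. A minimal sketch with hypothetical fitness-test scores (the numbers are invented for illustration):

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

day1 = [22, 35, 28, 41, 30, 25, 38]   # e.g., push-up counts, day 1
day2 = [24, 33, 27, 43, 31, 24, 36]   # same subjects, a different day
print(f"test-retest r = {pearson_r(day1, day2):.2f}")
```

A coefficient near 1.0 indicates stable scores across days; large day-to-day swings would pull it down.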
Alternate Forms Reliability (2)
Construct 2 tests that measure the same thing
Give the 2 tests to the same individuals at about the same time
Highly used on standardized tests (e.g., the CPRP/CPRE exam, 125 questions)
Rarely used on physical tests because of the difficulty of developing 2 equivalent tests
Internal Consistency Reliability (3)
Split-half reliability
Similar to alternate forms except 1 form is used
Divide the form into comparable halves (even questions vs. odd questions)
Do not use first half vs. second half because of testing fatigue
Correlate the number of odd and even items answered correctly
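The odd/even split can be sketched in a few lines. The data below are a hypothetical 8-item test scored 1 (correct) / 0 (incorrect) for 6 subjects. The Spearman-Brown correction at the end is a standard refinement, not something from these slides: it estimates the full-length test's reliability from the half-test correlation.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

answers = [            # rows = subjects, columns = items 1..8
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]
odd = [sum(row[0::2]) for row in answers]    # items 1, 3, 5, 7
even = [sum(row[1::2]) for row in answers]   # items 2, 4, 6, 8
r_half = pearson_r(odd, even)
r_full = 2 * r_half / (1 + r_half)           # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")
```

The correction always raises the half-test coefficient, reflecting that a longer test is more reliable than either half alone.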
Internal Consistency Reliability (3)
Average inter-item correlation
Identify the question numbers that measure a construct
Correlate the responses to these questions
Common in psychological tests
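The procedure is: correlate every pair of items that target the construct, then average the coefficients. A minimal sketch with hypothetical 5-point responses to three questions (the question labels and data are invented):

```python
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

items = {              # 5 respondents per question, same order
    "q1": [4, 2, 5, 3, 1],
    "q2": [5, 2, 5, 3, 2],
    "q3": [4, 3, 4, 3, 2],
}
pairs = [pearson_r(items[a], items[b]) for a, b in combinations(items, 2)]
avg_r = sum(pairs) / len(pairs)
print(f"average inter-item r = {avg_r:.2f}")
```

A high average suggests the questions hang together as one construct; a low one suggests some items are measuring something else.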
Inter-Rater Reliability (4)
Two or more persons rate or observe
Common in observational research & performance-based assessments involving judgments (e.g., GRE writing exam scoring)
Will be expressed as a correlation coefficient or a percentage of agreement
Does not indicate anything about the consistency of performances
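The percentage-of-agreement version is the simplest to compute: count how often the raters give the identical score. A sketch with hypothetical scores from two raters on a 1-5 rubric:

```python
# Two raters scoring the same 10 performances on a 1-5 rubric
# (hypothetical scores).
rater_a = [4, 3, 5, 2, 4, 3, 5, 1, 4, 3]
rater_b = [4, 3, 4, 2, 4, 3, 5, 2, 4, 3]

agree = sum(a == b for a, b in zip(rater_a, rater_b))
pct = 100 * agree / len(rater_a)
print(f"agreement: {pct:.0f}%")   # 8 of 10 ratings match: 80%
```

Simple agreement ignores chance matches; correlation-based coefficients (or chance-corrected ones) are alternatives when the rating scale supports them.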
In Summary
Pick variables that have a chance of varying (systematic variability)
Pick a reliable instrument (error variability, statistical regression, reliability)
Use random sampling whenever possible (extraneous variability, internal validity)
Control external validity through the sampling process at multiple sites (population external validity)
In Summary
Control external validity through similar environmental processes (ecological external validity)
Make sure survey measures what it is supposed to (content & construct validity)
Fully plan data collection process (reliability)