Research Design Part 2 Variability, Validity, Reliability.
Research Design Part 2: Variability, Validity, Reliability
Objectives
Refine research purpose & questions
Variability
Validity: external, internal, criterion, content, construct
Reliability: test-retest, inter-rater, internal consistency, instrument
Variability
Different values of the independent variable
Three sources: systematic, error, and extraneous
Variability
1. Systematic: variability within the independent variables
Design the study to maximize systematic variability (e.g., rewards vs. management styles)
Select the right sample & methods
Variability
2. Error: sampling & measurement error
Eliminate as many error conditions as possible (similar leagues, ages, abilities)
Increase the reliability of the instrument
Variability
3. Extraneous: control as much as possible
Not a planned part of the research
Influences outcomes in ways we don't want
Examples follow…
Variability
3. Extraneous: examples
Measure teaching techniques for validity & reliability between 2 sections of 497 to see the level of comprehension.
Measure differences between a week of TRX & a week of CrossFit using the same fitness assessment at the end.
Main Function of Research…
Maximize systematic variability, control extraneous variability, & minimize error variability
Validity & Reliability
Validity: degree to which something measures what it is supposed to measure
Reliability: consistency or repeatability of results
Validity & Reliability
Reliable but not valid: you hit the target consistently but miss the center; you are consistently and systematically measuring the wrong value for all respondents.
Valid but not reliable: random hits spread around the center that seldom hit it; you get a good group average, but not a consistent one.
Neither reliable nor valid: hits are spread across the target and consistently miss the center.
Reliable and valid: you consistently hit the center of the target.
Validity & Reliability
Can a measurement/instrument be reliable but not valid? Yes (e.g., weighing on a broken scale).
Can a measurement/instrument be valid but not reliable?
To be useful, a test/measurement must be both valid and reliable.
Validity
External validity
Internal validity
Test/criterion validity
Content validity
Construct validity
External Validity
Generalizability of the results
Population external validity
Characteristics & results can only be generalized to those with similar characteristics
Does the sample represent the entire population?
Demographics (e.g., psych experiments with college students)
Use multiple PE classes, intramural leagues, sports, teams, conferences
Control through sampling
External Validity
Ecological external validity
Conditions of the research are generalizable to settings with similar characteristics
Physical surroundings, time of day (AM vs. PM)
More common in testing (e.g., GRE)
Internal Validity
Confidence in the cause and effect relationship in a study.
Strongest when the study’s design (subjects, instruments/measurements,
and procedures) effectively controls possible sources of error so that
those sources are not reasonably related to the study’s results.
The key question that you should ask in any experiment is:
“Could there be an alternative cause, or causes, that explain my
observations and results?”
Internal Validity
History
Extraneous incidents/events that occur during the research and affect results
Only impacts studies across time
Examples: attendance at football games after a coaching change; a survey at IHSA about parent behavior when a parent fight breaks out across the gym in the middle of the survey
Internal Validity
Selection
If there are systematic differences between groups of subjects:
Gender (boys more active than girls), higher motivation level, more positive attitude toward the study
Compare GRE scores & grad school performance between sequences
Occurs when random sampling isn't used
Internal Validity
Statistical regression
In a pre-test/post-test design, those scoring extremely high or low on the first test will often "regress to the mean" on the second test
Extreme scores are based more on luck than actual performance
The regression effect, not the treatment, causes the change
Don't group the high/low scorers for the post-test
Note: the less reliable the instrument, the greater the regression.
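Regression to the mean is easy to demonstrate with a quick simulation. The sketch below (hypothetical numbers, plain Python) gives every subject a fixed true score plus random measurement error on each occasion, groups the top 10% of pre-test scorers, and shows their post-test mean falling back toward the population mean even though no treatment was applied.

```python
import random

random.seed(1)

# Each subject: fixed true score (mean 50) plus independent
# measurement error on each testing occasion.
n = 1000
true_score = [random.gauss(50, 10) for _ in range(n)]
pre = [t + random.gauss(0, 10) for t in true_score]
post = [t + random.gauss(0, 10) for t in true_score]

# Group the top 10% of PRE-test scorers (an extreme group).
cutoff = sorted(pre, reverse=True)[n // 10 - 1]
top = [i for i in range(n) if pre[i] >= cutoff]

pre_mean = sum(pre[i] for i in top) / len(top)
post_mean = sum(post[i] for i in top) / len(top)

# With no treatment at all, the extreme group's post-test mean
# regresses toward the overall mean of 50.
print(f"top group pre-test mean:  {pre_mean:.1f}")
print(f"top group post-test mean: {post_mean:.1f}")
```

Shrinking the error standard deviation in the simulation shrinks the gap between the two means, which is exactly the note above: a more reliable instrument means less regression.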
Internal Validity
Pre-testing
A pre-test can increase/decrease motivation
Gives subjects opportunities to practice
Practice can be a positive so they get a true score (pedometers; A. McGee thesis)
The instrument can make people think after the pre-test (motivation instruments)
Internal Validity
Instrumentation
Changes in calibration of the exam/instrument (experimental research)
Changes in observer scoring: fatigue/boredom (reality judging shows)
Maturation (experimental research)
Internal Validity
Diffusion of intervention (experimental research)
Attrition/mortality: subjects drop out or are lost
Examples: low scorers on the GRE drop out of grad school; coaching techniques & loss of players
Internal Validity
Experimenter effect
The presence or demeanor of the researcher impacts results positively or negatively
Examples: the course instructor is the PI on course evals; a coach or teacher conducting the study; the teacher staying in the room while students complete the PAQ-C
Subject effect
Subjects' behaviors change because they are subjects; subjects want to present themselves in the best light (Hawthorne effect)
Test/Criterion Validity
Degree to which a measure/test is related to some recognized standard or criterion
To establish criterion validity: create an intelligence test, then compare subjects' scores on our test with their scores on an established IQ test
Other examples: use 2 motivation instruments; give subjects our intelligence test and the IQ test at the same time; use the abbreviated Myers-Briggs (126 vs. 72 items) at the same time
Content Validity
Also called face validity
Degree to which a test adequately samples what is covered in a course
Usually used in education
Does a measurement appear to measure what it purports to measure?
No statistical measure/systematic procedure to test this
Content Validity
Often, experts (panel) are used to verify the content validity of measurements in research studies
Content validity is useful, but not the strongest/most credible way of evaluating a measurement
Examples: rewards listing, competency categories
Construct Validity
Degree to which a test/measurement measures a hypothetical construct; the overall quality of measurement
Construct variables: recruitment, motivation, mental preparation
Examples: Do the selected variables completely measure recruitment? How well does the instrument measure mental preparation? Do the questions adequately test motivation?
Construct Validity
Threats to construct validity
Using only one method to measure the construct
Inadequate explanation of a construct (e.g., depression = lethargy, loss of appetite, difficulty concentrating, etc.)
Measuring just one construct & making broad inferences: using 1 item to measure personality (the Myers-Briggs uses 4 dichotomies)
Validity Overview
Content: test content
Test/criterion: recognized standards
Construct: how well constructs describe the relationship
Reliability
Degree to which a test/measurement yields consistent and repeatable results
Often reported as a correlation coefficient (e.g., Cronbach's alpha)
Cronbach's alpha (internal consistency):
α ≥ 0.9: Excellent
0.8 ≤ α < 0.9: Good
0.7 ≤ α < 0.8: Acceptable
0.6 ≤ α < 0.7: Questionable
0.5 ≤ α < 0.6: Poor
α < 0.5: Unacceptable
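As a concrete illustration, Cronbach's alpha can be computed by hand from an item-response matrix. The sketch below uses made-up Likert responses for a hypothetical 4-item scale (rows = respondents, columns = items); it is plain Python, not any particular statistics package.

```python
def cronbach_alpha(data):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum(item vars)/total var)."""
    k = len(data[0])                       # number of items
    def var(xs):                           # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[j] for row in data]) for j in range(k)]
    total_var = var([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses for a 4-item scale.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Because these made-up respondents answer all four items similarly, alpha lands in the "Excellent" band of the table above.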
Look at articles.
4 Sources of Measurement Error
1. The participants: health, motivation, mood, fatigue, anxiety
4 Sources of Measurement Error
2. The testing: changes in time limits, changes in directions, how rigidly the instructions were followed, atmosphere/conditions of the test
4 Sources of Measurement Error
3. The instrumentation: sampling of items, calibration of (mechanical) instruments, poor questions
4. The scoring: different scoring procedures; competence, experience, and dedication of scorers (e.g., GRE)
Methods to Establish Reliability
Test-Retest Reliability (stability)
Alternate forms
Internal consistency
Agreement/ Inter-rater Reliability
Test-Retest Reliability (1)
Repeat the test on the same subjects at a later time (usually retest on a different day)
Use correlation coefficients between subjects' two scores
Used extensively in fitness & motor-skills tests
Used less for pencil-and-paper tests
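In practice, test-retest reliability reduces to a Pearson correlation between the two administrations. A minimal sketch with hypothetical fitness-test scores (the numbers are invented for illustration):

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

day1 = [22, 35, 28, 41, 30, 25, 38]   # e.g., push-up counts, day 1
day2 = [24, 33, 27, 43, 31, 24, 36]   # same subjects, a different day
print(f"test-retest r = {pearson_r(day1, day2):.2f}")
```

A coefficient near 1.0 indicates stable scores across days; large day-to-day swings would pull it down.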
Alternate Forms Reliability (2)
Construct 2 tests that measure the same thing
Give the 2 tests to the same individuals at about the same time
Highly used on standardized tests (e.g., the CPRP/CPRE exam, 125 questions)
Rarely used on physical tests because of the difficulty of developing 2 equivalent tests
Internal Consistency Reliability (3)
Split-half reliability
Similar to alternate forms except 1 form is used
Divide the form into comparable halves (even questions vs. odd questions)
Do not use first half vs. second half because of testing fatigue
Correlate the number of odd and even items answered correctly
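The odd/even split can be sketched in a few lines. The data below are a hypothetical 8-item test scored 1 (correct) / 0 (incorrect) for 6 subjects. The Spearman-Brown correction at the end is a standard refinement, not something from these slides: it estimates the full-length test's reliability from the half-test correlation.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

answers = [            # rows = subjects, columns = items 1..8
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]
odd = [sum(row[0::2]) for row in answers]    # items 1, 3, 5, 7
even = [sum(row[1::2]) for row in answers]   # items 2, 4, 6, 8
r_half = pearson_r(odd, even)
r_full = 2 * r_half / (1 + r_half)           # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")
```

The correction always raises the half-test coefficient, reflecting that a longer test is more reliable than either half alone.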
Internal Consistency Reliability (3)
Average inter-item correlation
Identify the question numbers that measure a construct
Correlate the responses to these questions
Common in psychological tests
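The procedure is: correlate every pair of items that target the construct, then average the coefficients. A minimal sketch with hypothetical 5-point responses to three questions (the question labels and data are invented):

```python
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

items = {              # 5 respondents per question, same order
    "q1": [4, 2, 5, 3, 1],
    "q2": [5, 2, 5, 3, 2],
    "q3": [4, 3, 4, 3, 2],
}
pairs = [pearson_r(items[a], items[b]) for a, b in combinations(items, 2)]
avg_r = sum(pairs) / len(pairs)
print(f"average inter-item r = {avg_r:.2f}")
```

A high average suggests the questions hang together as one construct; a low one suggests some items are measuring something else.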
Inter-Rater Reliability (4)
Two or more persons rate or observe
Common in observational research & performance-based assessments involving judgments (e.g., GRE writing exam scoring)
Will be expressed as a correlation coefficient or a percentage of agreement
Does not indicate anything about the consistency of performances
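The percentage-of-agreement version is the simplest to compute: count how often the raters give the identical score. A sketch with hypothetical scores from two raters on a 1-5 rubric:

```python
# Two raters scoring the same 10 performances on a 1-5 rubric
# (hypothetical scores).
rater_a = [4, 3, 5, 2, 4, 3, 5, 1, 4, 3]
rater_b = [4, 3, 4, 2, 4, 3, 5, 2, 4, 3]

agree = sum(a == b for a, b in zip(rater_a, rater_b))
pct = 100 * agree / len(rater_a)
print(f"agreement: {pct:.0f}%")   # 8 of 10 ratings match: 80%
```

Simple agreement ignores chance matches; correlation-based coefficients (or chance-corrected ones) are alternatives when the rating scale supports them.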
In Summary
Pick variables that have a chance of varying (systematic variability)
Pick a reliable instrument (error variability, statistical regression, reliability)
Use random sampling whenever possible (extraneous variability, internal validity)
Control external validity through the sampling process at multiple sites (population external validity)
In Summary
Control external validity through similar environmental processes (ecological external validity)
Make sure survey measures what it is supposed to (content & construct validity)
Fully plan data collection process (reliability)