Reliability - The extent to which a test or instrument gives consistent measurement
Uploaded by orla-simmons
• Reliability: the extent to which a test or instrument gives consistent measurements; equivalently, the strength of the relation between observed scores and true scores.
• Test-retest reliability (coefficient of stability): correlate two administrations of the same test.
• Parallel-form reliability (coefficient of equivalence): correlate two forms of the same test.
• Split-half reliability (Spearman-Brown prophecy formula): correlate two halves of the test, then use the Spearman-Brown formula to estimate the reliability of the full-length test.
• Internal consistency reliability (Cronbach's α): based on the correlation of every item with every other item.
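The estimators in the list above can be sketched in a few lines. This is a minimal sketch, assuming each examinee's item scores arrive as a list of numbers; the function names are hypothetical, the split half uses an odd-even item split (one of several possible splits), and the scores in the usage block are invented for illustration.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two score lists (test-retest or parallel forms)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_part, k=2.0):
    """Reliability of a test k times as long as the part whose reliability is r_part."""
    return k * r_part / (1 + (k - 1) * r_part)


def split_half(items):
    """Odd-even split-half reliability, stepped up with Spearman-Brown (k = 2).

    items: one list of item scores per examinee (hypothetical input format).
    """
    odd = [sum(row[0::2]) for row in items]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in items]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


def cronbach_alpha(items):
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items[0])
    item_vars = [statistics.pvariance([row[j] for row in items]) for j in range(k)]
    total_var = statistics.pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical item scores: 5 examinees x 4 items (illustrative only).
    scores = [[3, 4, 3, 4], [2, 2, 3, 2], [4, 5, 4, 5], [1, 2, 1, 2], [3, 3, 4, 3]]
    print(f"split-half reliability: {split_half(scores):.2f}")
    print(f"Cronbach's alpha:       {cronbach_alpha(scores):.2f}")
```

Test-retest and parallel-form reliability need only `pearson_r` applied to the two sets of total scores; population variances are used in `cronbach_alpha`, but sample variances give the same ratio.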
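Reliability is the extent to which your observed score represents your true score. The error of measurement is the difference between the two: $E = X - T$.
[Figure: a true score T and two observed scores, X1 and X2]
The test yielding the score X1 is more reliable than the one yielding X2, because X1 lies closer to T (smaller error).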
Reliability is the extent to which individual differences, or the rank ordering of individuals, based on observed scores reflect those based on true scores.
One operationalization of this definition is the correlation between observed scores and true scores, $\rho_{XT}$, which is called the reliability index.
Another is the squared correlation between observed scores and true scores, $\rho_{XT}^2 = \sigma_T^2 / \sigma_X^2$: the proportion of observed-score variance that is true-score variance, or the proportion of consistent rank ordering.
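In practice, true scores cannot be observed, so reliability is estimated as the extent to which two tests yield similar results, or a similar rank ordering of individuals: the correlation between two parallel measurements, $\rho_{XX'}$. Under classical test theory, $\rho_{XX'} = \rho_{XT}^2$.
[Figure: four ways of obtaining the paired scores $X$ and $X'$: test-retest, parallel form, split half, internal consistency]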
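The identity $\rho_{XX'} = \rho_{XT}^2 = \sigma_T^2 / \sigma_X^2$ can be checked by simulation. This is a minimal sketch, assuming normally distributed true scores and independent normal errors on each form; the sample size, standard deviations, seed, and function names are arbitrary illustrative choices.

```python
import math
import random
import statistics


def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def simulate_parallel_forms(n=100_000, sd_true=1.0, sd_error=0.5, seed=1):
    """Simulate T, X = T + E and a parallel form X' = T + E' with independent errors."""
    rng = random.Random(seed)
    t = [rng.gauss(0.0, sd_true) for _ in range(n)]
    x = [ti + rng.gauss(0.0, sd_error) for ti in t]
    x_prime = [ti + rng.gauss(0.0, sd_error) for ti in t]
    return t, x, x_prime


if __name__ == "__main__":
    t, x, x_prime = simulate_parallel_forms()
    # Theory: sigma_T^2 / sigma_X^2 = 1.0 / (1.0 + 0.25) = 0.8
    print(f"var(T) / var(X): {statistics.pvariance(t) / statistics.pvariance(x):.3f}")
    print(f"rho_XT squared:  {corr(x, t) ** 2:.3f}")
    print(f"rho_XX' (forms): {corr(x, x_prime):.3f}")
```

All three printed quantities should land close to the theoretical 0.8; only the correlation between the two forms is available with real data, which is why it serves as the practical estimate.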
When $\rho_{XX'} = 1$:
1. the measurement has been made without error ($E = 0$ for all examinees).
2. $X = T$ for all examinees.
3. all observed-score variance reflects true-score variance.
4. all differences between observed scores are true-score differences.
5. the correlation between observed scores and true scores is 1.
6. the correlation between observed scores and errors is 0.
When $\rho_{XX'} = 0$:
1. only random error is included in the measurement.
2. $X = E$ for all examinees.
3. all observed-score variance reflects error variance.
4. all differences between observed scores are errors of measurement.
5. the correlation between observed scores and true scores is 0.
6. the correlation between observed scores and errors is 1.
When $0 < \rho_{XX'} < 1$:
1. the measurement includes some error and some truth.
2. $X = T + E$.
3. observed-score variance includes true-score and error variance: $\sigma_X^2 = \sigma_T^2 + \sigma_E^2$.
4. differences between scores reflect both true-score differences and error.
5. the correlation between observed scores and true scores is the square root of the reliability, $\rho_{XT} = \sqrt{\rho_{XX'}}$ (the reliability index).
6. the correlation between observed scores and errors is the square root of 1 minus the reliability, $\rho_{XE} = \sqrt{1 - \rho_{XX'}}$.
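The three cases can be checked numerically from the true-score and error variances alone. This is a minimal sketch, assuming the classical model $X = T + E$ with $T$ and $E$ uncorrelated; `ctt_summary` is a hypothetical helper name and the variances in the usage block are arbitrary.

```python
import math


def ctt_summary(var_true, var_error):
    """Reliability, corr(X, T) and corr(X, E), assuming X = T + E with T, E uncorrelated."""
    var_obs = var_true + var_error  # sigma_X^2 = sigma_T^2 + sigma_E^2
    rel = var_true / var_obs        # rho_XX' = sigma_T^2 / sigma_X^2
    rho_xt = math.sqrt(rel)         # reliability index
    rho_xe = math.sqrt(1.0 - rel)   # correlation of observed scores with error
    return rel, rho_xt, rho_xe


if __name__ == "__main__":
    # Error-free, mixed, and pure-error measurement (hypothetical variances).
    for var_t, var_e in [(1.0, 0.0), (3.0, 1.0), (0.0, 1.0)]:
        rel, rho_xt, rho_xe = ctt_summary(var_t, var_e)
        print(f"rel={rel:.2f}  rho_XT={rho_xt:.3f}  rho_XE={rho_xe:.3f}")
```

For $\sigma_T^2 = 3$ and $\sigma_E^2 = 1$ this gives a reliability of 0.75, $\rho_{XT} \approx 0.866$ and $\rho_{XE} = 0.5$; in every case $\rho_{XT}^2 + \rho_{XE}^2 = 1$.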
• Validity: the extent to which a test or instrument truly measures what it is intended to measure. For example, using a bathroom scale to measure weight is valid, whereas using it to measure height is invalid.
• Content validity refers to the extent to which the items on a test are representative of a specified content domain. Achievement and aptitude tests (but not personality and attitude tests) are concerned with content validity.
• Construct validity refers to the extent to which the items on a test are representative of the underlying construct, e.g., personality or attitude. Personality and attitude tests are concerned with construct validity. The process of establishing construct validity is referred to as construct validation.
• Criterion-related validity, which includes predictive validity and concurrent validity, refers to the extent to which a test correlates with a criterion measure of the behavior it is intended to predict: the criterion is measured later for predictive validity and at about the same time for concurrent validity.
Construct Validity: Internal Structure
[Figure: factor model of two parenting factors, Authoritative Parenting and Authoritarian Parenting, with the indicators Warmth, Inductive Reasoning, Democratic Participation, Easygoing Responsiveness, Physical Punishment, Non-Reasoning, Authoritarian Directiveness, and Verbal Hostility, annotated with factor loadings]
Construct Validity: Network Relations
[Figure: path model involving Communication Avoidance, Social Withdrawal, Assertive Leadership, Behavioral Aggression, and Verbal Aggression together with Perceived Social Competence and Peer Acceptance at Time 1 and Time 2, annotated with loadings and path coefficients]
A Synthetic Multitrait-Multimethod Matrix
(reliabilities in parentheses on the diagonal)

              Method 1          Method 2          Method 3
Traits        A1    B1    C1    A2    B2    C2    A3    B3    C3
Method 1  A1 (.95)
          B1  .28  (.86)
          C1  .58   .39  (.92)
Method 2  A2  .86   .32   .57  (.95)
          B2  .30   .90   .40   .39  (.76)
          C2  .52   .31   .86   .55   .26  (.84)
Method 3  A3  .73   .10   .43   .64   .17   .37  (.48)
          B3  .10   .63   .17   .22   .67   .19   .15  (.41)
          C3  .35   .16   .52   .31   .17   .56   .41   .30  (.58)
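A Synthetic Multitrait-Multimethod Matrix
(RL = reliability; DV = heterotrait-monomethod, used to judge discriminant validity; CV = convergent validity, monotrait-heteromethod; HH = heterotrait-heteromethod)

              Method 1          Method 2          Method 3
Traits        A1    B1    C1    A2    B2    C2    A3    B3    C3
Method 1  A1  RL
          B1  DV    RL
          C1  DV    DV    RL
Method 2  A2  CV    HH    HH    RL
          B2  HH    CV    HH    DV    RL
          C2  HH    HH    CV    DV    DV    RL
Method 3  A3  CV    HH    HH    CV    HH    HH    RL
          B3  HH    CV    HH    HH    CV    HH    DV    RL
          C3  HH    HH    CV    HH    HH    CV    DV    DV    RL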
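The cell labels can be applied to the numeric matrix to run one of the usual multitrait-multimethod checks associated with Campbell and Fiske: each convergent validity (CV) should exceed the heterotrait-heteromethod (HH) correlations in its row and column of the same heteromethod block. This is a minimal sketch using the synthetic correlations given above; the names `LABELS`, `LOWER`, `cell_type` and `convergent_beats_hh` are hypothetical, and the other Campbell-Fiske comparisons (for example CV against DV values) are not implemented.

```python
# Lower triangle of the synthetic MTMM matrix; the diagonal holds reliabilities.
LABELS = ["A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"]  # trait letter + method number
LOWER = [
    [.95],
    [.28, .86],
    [.58, .39, .92],
    [.86, .32, .57, .95],
    [.30, .90, .40, .39, .76],
    [.52, .31, .86, .55, .26, .84],
    [.73, .10, .43, .64, .17, .37, .48],
    [.10, .63, .17, .22, .67, .19, .15, .41],
    [.35, .16, .52, .31, .17, .56, .41, .30, .58],
]


def r(i, j):
    """Look up a correlation from the lower triangle (the matrix is symmetric)."""
    return LOWER[max(i, j)][min(i, j)]


def cell_type(i, j):
    """Label a cell RL, DV, CV or HH, matching the legend of the label matrix."""
    same_trait = LABELS[i][0] == LABELS[j][0]
    same_method = LABELS[i][1] == LABELS[j][1]
    if same_trait and same_method:
        return "RL"  # reliability (monotrait-monomethod)
    if same_method:
        return "DV"  # heterotrait-monomethod
    if same_trait:
        return "CV"  # convergent validity (monotrait-heteromethod)
    return "HH"      # heterotrait-heteromethod


def convergent_beats_hh():
    """For every CV cell, test whether it exceeds all HH values in its row
    and column within the same heteromethod block."""
    n = len(LABELS)
    results = {}
    for i in range(n):
        for j in range(i):
            if cell_type(i, j) != "CV":
                continue
            rivals = [
                r(a, b)
                for a in range(n)
                for b in range(n)
                if LABELS[a][1] == LABELS[i][1]   # rows measured by method of i
                and LABELS[b][1] == LABELS[j][1]  # columns measured by method of j
                and (a == i or b == j)            # same row or same column
                and cell_type(a, b) == "HH"
            ]
            results[(LABELS[i], LABELS[j])] = r(i, j) > max(rivals)
    return results


if __name__ == "__main__":
    for pair, ok in convergent_beats_hh().items():
        print(pair, "passes" if ok else "fails")
```

For this synthetic matrix all nine convergent validities pass the comparison.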
Criterion-Related Validity
[Figure: concurrent validity versus predictive validity, illustrated with SAT scores, A-Level results, and university GPA]
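Restriction of Range Effect
[Figure: scatterplot of criterion scores against test scores. Examinees scoring below the qualifying score are rejected and those above it are selected; the distribution of criterion scores for the selected group is compared with the distribution of criterion scores if no examinees were excluded.]
Because selection truncates the range of test scores, the test-criterion correlation observed in the selected group understates the validity of the test in the full group of examinees.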
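The restriction of range effect can be reproduced by simulation. This is a minimal sketch, assuming standardized test scores and a criterion that are bivariate normal with a population correlation of 0.6; the sample size, correlation, qualifying score, seed, and function names are hypothetical choices for illustration.

```python
import math
import random
import statistics


def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def range_restriction_demo(n=50_000, rho=0.6, qualifying_score=0.5, seed=7):
    """Compare the test-criterion correlation in the full group with the
    correlation in the group selected at or above the qualifying score."""
    rng = random.Random(seed)
    test = [rng.gauss(0.0, 1.0) for _ in range(n)]  # standardized test scores
    # Criterion built so that its population correlation with the test is rho.
    crit = [rho * t + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for t in test]
    selected = [(t, c) for t, c in zip(test, crit) if t >= qualifying_score]
    r_full = corr(test, crit)
    r_selected = corr([t for t, _ in selected], [c for _, c in selected])
    return r_full, r_selected, len(selected)


if __name__ == "__main__":
    r_full, r_sel, n_sel = range_restriction_demo()
    print(f"full group:     r = {r_full:.2f}")
    print(f"selected group: r = {r_sel:.2f} (n = {n_sel})")
```

With these settings the selected group's correlation comes out well below the full-group value of about 0.6; raising the qualifying score restricts the range further and lowers the observed correlation more.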