Measurement in Psychology I: RELIABILITY Lawrence R. Gordon.
Do you support the civil union legislation?
What are some of the ways in which you can ask this question?
How do you measure the response (operational definitions)?
Levels of Measurement
Nominal scales: giving names to data, putting into categories
Examples: sex, race labels; baseball uniform numbers
Ordinal scales: numbers give order but not distance
Examples: mailbox numbers; class rankings
Levels of Measurement (cont.)
Interval scales: numbers indicate order and distance (they are separated by equal distances or intervals)
Example: Fahrenheit temperature
Ratio scales: numbers indicate order, distance, AND have a true zero point (zero = there isn’t any)
Examples: height; weight; miles per hour; time
Levels of Measurement Example
Auto race which started at 2 pm

Driver    Make       Finish Order   Finish Time   Elapsed Time (hours)
Mary      Corvette   1              3:00          1.00
Joe       Mustang    2              3:15          1.25
Tom       BMW        3              3:30          1.50
Ann       Ferrari    4              4:00          2.00
Level:
Nominal   Nominal    Ordinal        Interval      Ratio
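A minimal sketch of why the last two columns sit at different levels, using the race data from the table above. The variable and function names are illustrative assumptions, not part of the original slides. Clock time has no true zero, so only differences between finish times are meaningful; elapsed time starts at zero, so ratios are meaningful too.

```python
from datetime import datetime

# Race start and finish times taken from the slide's table (start = 2 pm,
# written here on the same 12-hour clock face as the finish times).
START = datetime.strptime("2:00", "%H:%M")
finish_times = {"Mary": "3:00", "Joe": "3:15", "Tom": "3:30", "Ann": "4:00"}

def elapsed_hours(finish):
    """Hours between the race start and a finish time given as 'H:MM'."""
    delta = datetime.strptime(finish, "%H:%M") - START
    return delta.total_seconds() / 3600

elapsed = {driver: elapsed_hours(t) for driver, t in finish_times.items()}

# Interval-level statement: Joe finished a quarter of an hour after Mary.
gap_joe_mary = elapsed["Joe"] - elapsed["Mary"]

# Ratio-level statement: Ann took twice as long as Mary.
# (Saying "4:00 is 1.33 times 3:00" would be meaningless for clock time.)
ratio_ann_mary = elapsed["Ann"] / elapsed["Mary"]

print(elapsed)
print(f"Joe - Mary: {gap_joe_mary} h; Ann / Mary: {ratio_ann_mary}")
```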
Closed vs. Open Responses
Closed responses (a.k.a. forced choice)
Example: rate civil union support on a scale of 1 to 9
Advantages
• you know what the responses will be (or what they should be!) because of restrictions on choice
• easy to empirically evaluate (relatively)
• gives data that directly answer the question as you asked it
• coding usually not necessary
Closed vs. Open Responses
Closed responses (a.k.a. forced choice)
Disadvantages
• may not be sensitive enough to get some interesting information
• will not give you as clear an indication of what participants think/feel/report
“Do you agree that same-sex couples should have the right to marry/civil union?”
Disagree Completely   1 2 3 4 5 6 7 8 9   Agree Completely
Support Civil Union (histogram)
[Figure: two histograms titled "Attitudes toward Civil Union", labeled Psyc 109, Fall 2001 and Psyc 109, Fall 2002. x-axis: Agreement, 1 to 9 (9 = 'Agree Completely'); y-axis: Frequency (of 195), 0 to 140.]
Support Civil Union (area graph)
[Figure: two area graphs titled "Attitudes toward Civil Union", labeled Psyc 109, Fall 2001 and Psyc 109, Fall 2002. x-axis: Agreement, from Disagree Completely through Midpoint to Agree Completely; y-axis: Frequency (of 195), 0 to 140.]
Compare the Graphs: Same Info
[Figure: the histogram and the area graph of "Attitudes toward Civil Union" (Psyc 109, Fall 2002) shown side by side. x-axis: Agreement, 1 to 9 (9 = 'Agree Completely'); y-axis: Frequency (of 195), 0 to 140.]
Closed vs. Open Responses
Open responses (a.k.a. free response)
• Example: Do you support the civil union legislation? Why?
• Example from the survey used the first day: “Please describe yourself in 12 words or less”
• more on this in a bit...
Advantages
• gives any answer participant wants
• not restricted by choices
Closed vs. Open Responses
Open responses (cont.)
Disadvantages
• have to code to empirically evaluate (time intensive, need to find people who will do it)
• reliability issues!
Reliability
Consistency (stays the same)
Repeatable (get the same results again and again)
Measures need to be reliable to be good measures
Now, some nitty-gritty...
Reliability (cont.)
Measuring closed responses
• you don’t need to put things into categories
• reliable over time (do you get the same answers again and again?)
• if the answers vary greatly from one time of measurement to the next, the measurement is not reliable
Reliability (cont.)
Measuring closed responses (cont.)
• scales (sets of questions designed to measure something) need to be given multiple times, or in multiple forms, and the answers must remain similar for the scale to be reliable
• Example (personality scale?)
Types of reliability
• Stability (“test-retest reliability”)
• Equivalence (“parallel forms reliability”)
• Consistency (“split-half reliability”)
• Homogeneity (“internal consistency reliability”)
Reliability Quick Example
Any test, scale, inventory with items: E.g., a 50-item test, scored 0-50:

                   Form A            9/4            9/4, Form A
    Examinee     9/4    9/25    Form A  Form B     Odd    Even
1   George       27     35      27      33         15     12
2   Alice        49     46      49      40         30     19
3   Mary         30     35      30      27         13     17
4   Larry        16     10      16      19          7      9
5   Linda        27     24      27      20         10     17
6   Doug         40     42      40      48         22     18
7   Chuck        21     18      21      35         10     11
8   Judy         42     39      42      35         19     23

Test-retest: Form A, 9/4 vs 9/25 (r = .92), Stability
Parallel forms: Form A vs Form B, 9/4 (r = .69), Equivalence
Cross form: Form A 9/25 vs Form B 9/4 (r = .72), Stability & Equivalence
Split-half: Odd vs Even, Form A 9/4 (r = .79), Consistency
Alpha reliability: no example here, since it needs data from all 50 items; Internal consistency
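The coefficients in the quick example can be checked with a short script. This is a sketch under stated assumptions: the scores are my transcription of the eight examinees in the table, Form B is taken as given on 9/4 (as the table's column header indicates), and the function names are my own. On this reading the raw odd-even correlation is about .65, and the slide's .79 appears to be that value after the Spearman-Brown step-up to full test length. That is an inference from the numbers, not something the slide states.

```python
from math import sqrt

# Scores transcribed from the slide's table (8 examinees, 50-item test).
form_a_0904 = [27, 49, 30, 16, 27, 40, 21, 42]   # Form A, given 9/4
form_a_0925 = [35, 46, 35, 10, 24, 42, 18, 39]   # Form A, given 9/25
form_b_0904 = [33, 40, 27, 19, 20, 48, 35, 35]   # Form B, given 9/4
odd_items = [15, 30, 13, 7, 10, 22, 10, 19]      # Form A 9/4, odd items
even_items = [12, 19, 17, 9, 17, 18, 11, 23]     # Form A 9/4, even items

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)

test_retest = pearson_r(form_a_0904, form_a_0925)   # stability
parallel = pearson_r(form_a_0904, form_b_0904)      # equivalence
cross_form = pearson_r(form_a_0925, form_b_0904)    # stability & equivalence
half_r = pearson_r(odd_items, even_items)           # raw odd-even correlation
split_half = spearman_brown(half_r)                 # consistency

for name, r in [("test-retest", test_retest), ("parallel forms", parallel),
                ("cross form", cross_form), ("odd-even (raw)", half_r),
                ("split-half (stepped up)", split_half)]:
    print(f"{name}: r = {r:.2f}")
```

With only eight examinees these coefficients are rough illustrations; a real reliability study would use a much larger sample.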
Reliability (cont.)
Measuring open responses
• Will often code into categories (Examples)
• How do you assess reliability?
Reliability (cont.)
Measuring open responses (cont.)
• Does everyone put the response into the same category? If yes, you have good inter-coder reliability
• more specific operational definitions will increase this reliability
• Coding personality responses into categories
• Using positive, negative, and neutral descriptors
Reliability (cont.)
Measuring behavioral responses through observation
• special cases of open response, can’t really control what participants do
• coding and/or rating what you observe
• reliability of ratings (interrater reliability? If all raters agree on the rating, then yes.)
• need to be very clear on operational definitions
• Baggage claim study (Scherer & Ceschi, 2000)
Assessing Reliability
Steps
• decide on operational definitions of your variables and scale(s) of measurement
• train your coders/raters, answer questions, and alleviate confusion
• do the coding and rating
• compare responses
• were the measurements reliable?
Reliability Exercise
Measuring your personality
Looking for “big” traits
• defining big traits and training coders
The Big Five Personality Factors
1. Open to Experience (O) vs. Closed to Experience (NO)
2. Conscientious (C) vs. Nonconscientious (NC)
3. Extraverted (E) vs. Introverted (NE)
4. Agreeable (A) vs. Unagreeable (NA)
5. Neurotic (N) vs. Nonneurotic (NN)
Which one best fits the description?
Do the coding!
Reliability Exercise
Measuring your personality
Looking for “big” traits
• compare responses to other coders
• intercoder reliability
  List number on which you agreed
  List number on which you disagreed
  Calculate the percentages
• were the measurements reliable?
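The agree/disagree tally in this exercise can be sketched in a few lines. The two lists of codes below are hypothetical, made up purely for illustration, and the function name is my own; the labels follow the Big Five abbreviations used in the exercise.

```python
# Hypothetical codes assigned by two coders to the same ten self-descriptions
# (illustrative data only, not from the class exercise).
coder_1 = ["E", "A", "N", "O", "C", "NE", "A", "E", "NN", "C"]
coder_2 = ["E", "A", "N", "C", "C", "NE", "NA", "E", "NN", "C"]

def percent_agreement(codes_a, codes_b):
    """Return (number agreed, number disagreed, percent agreed)."""
    if not codes_a or len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same non-empty set of responses")
    agreed = sum(a == b for a, b in zip(codes_a, codes_b))
    disagreed = len(codes_a) - agreed
    return agreed, disagreed, 100 * agreed / len(codes_a)

agreed, disagreed, pct = percent_agreement(coder_1, coder_2)
print(f"agreed on {agreed}, disagreed on {disagreed}: {pct:.0f}% agreement")
```

Simple percent agreement does not correct for agreement expected by chance, so treat it as a first look rather than a final verdict on intercoder reliability.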
And for next time…is reliability enough?
If your measurement is reliable, does that mean that it is good?
Does being reliable make your measurement valid?