Developing a Measure: scales, validity and reliability
Types of Measures
1. Observational
2. Physiological and neuroscientific
3. Self-report (the majority of social & behavioral science research)
Self-report measures
People’s replies to written questionnaires or interviews. Can measure:
▪ thoughts (cognitive self-reports)
▪ feelings (affective self-reports)
▪ actions (behavioral self-reports)
Self-Report
Self-reported momentary emotions: Positive and Negative Affect Schedule (PANAS) (Watson, Clark & Tellegen, 1988)
Indicate the extent you feel this way right now: enthusiastic
Not at all enthusiastic 1 2 3 4 5 Very enthusiastic
Indicate the extent you feel this way right now: upset
Not at all upset 1 2 3 4 5 Very upset
Scales of Measurement
Nominal: Hot = 1, Warm = 3, Cold = 2
Ordinal: 1st Place Sample, 2nd Place Sample, 3rd Place Sample, 4th Place Sample, 5th Place Sample
Interval and Ratio (figure labels: "Thing being measured", "Interval")
Scales of Measurement: Four Types
The distinction between scales is due to the meaning of the numbers.
1. Nominal Scale: numbers assigned are only labels.
2. Ordinal Scale: a rank ordering.
3. Interval Scale: each number is equidistant from the next, but there is no true zero point (majority of measures).
4. Ratio Scale: each number is equidistant and there is a true zero point.
Scales of Measurement
Type of Scale Determines Statistics and Power
Scale      Statistics                              Power
Nominal    Chi-square                              Low
Ordinal    Rank-order tests                        Moderate
Interval   Parametric tests (F-tests, t-tests)     High
Ratio      Parametric tests and math operations    High
Attributes of Good Measures
Valid: the measure assesses the construct it is intended to and is not influenced by other factors.
Reliable: the measure is consistent; it provides the same result repeatedly.
Reliability and Validity
Reliable but not Valid
Dependable measure, but doesn’t measure what it should.
Example: Arm length to measure self-esteem.
Valid but not Reliable
Measures what it should, but not dependably.
Example: Stone as a measure of weight in Great Britain.
Reliability vs. Validity Visual
Central dot = construct we are seeking to measure
Reliability Assessments 1
Test-Retest Reliability
Measure administered at two points in time to assess consistency. Works best for things that do not change over time (e.g., intelligence).
Internal Consistency Reliability
Judgments of consistency of results across items in the same test administration session.
1. Intercorrelation: Cronbach’s α (> .65 is preferred)
2. Split-halves reliability
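As a rough illustration of the intercorrelation approach, the sketch below computes Cronbach’s α from raw item scores using the usual formula α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the response data are made up for this example; missing values are assumed to have been removed beforehand.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumes complete data (no missing responses) and at least two items.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents x 4 items on a 1-5 scale
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

With these invented responses the items move together closely, so α comes out well above the .65 guideline; real scales will usually be lower.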
Types of Validity
Content Validity
Does the measure represent the range of possible items that it should cover, based on the meaning of the measure?
Predictive Validity
The measure predicts criterion measures that are assessed at a later time. Ex: Does an aptitude assessment predict later success?
Construct Validity
Does the measure actually tap into the intended construct?
Developing Items for a New Measure
Guided spontaneous responses from individuals in the sample population (thought listings, essay questions…)
Face valid items: develop items that appear to measure your construct.
Pilot test a larger set of items and choose those that are more reliable & valid.
Reverse-coded items indicate whether participants are paying attention.
Use common response scale types
Likert Scale: To what extent do you agree with the following statement… (0 to 9, strongly disagree-strongly agree)
Semantic Differential: What is your response to (insert person, object, place, issue)? (-5 to +5, good-bad, like-dislike, warm-cold)
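Before reverse-coded items can be combined with the rest of a scale, they have to be flipped back so that high numbers mean the same thing on every item. A minimal sketch of that step, assuming a response scale with known endpoints; the helper name `reverse_code` is hypothetical:

```python
def reverse_code(score, scale_min, scale_max):
    """Flip a response so the top of the scale becomes the bottom and vice versa."""
    if not scale_min <= score <= scale_max:
        raise ValueError("score is outside the response scale")
    return scale_min + scale_max - score


# 0-to-9 Likert scale: a 0 (strongly disagree) becomes a 9
print(reverse_code(0, 0, 9))
# -5-to-+5 semantic differential: the midpoint stays where it is
print(reverse_code(0, -5, 5))
```

The same formula works for any scale endpoints, which is why the minimum and maximum are passed in rather than hard-coded.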
Pitfalls of New Measures
The measure exists already in the literature
Restriction of range: responses either at high or low end of scale (skew).
Can you trust responses? Social desirability, demand characteristics & satisficing.
Simple things I have learned.
1. Develop subjective and objective versions of a new scale.
Example: Contact with Blacks scale:
Objective: % of your neighborhood growing up
Subjective: No Blacks to a lot of Blacks
2. Using 5+ similarly worded items greatly increases reliability and the likelihood of success.
3. Human targets are rarely evaluated below the midpoint of the scale, so use more scale points (9 instead of 5 points).
**Most Important** If you have a larger study ready and a great idea for a new scale comes up, build something and give it a shot!
A Few Types of Non-scale Measures
▪ Response time measures
▪ Physiological measures
▪ Neuroscience: fMRI and other brain imaging
▪ Indirect measures: projective tests, etc.
▪ Facial and other behavior coding schemes (verbal/nonverbal)
▪ Cognitive measures (memory, perception…)
▪ Task performance: academic, physical…
▪ Game theory: prisoner’s dilemma…
SPSS: Reliability
Cronbach’s α: Analyze > Scale > Reliability Analysis
Pull over all scale items
Click Statistics, select inter-item correlations
OK
Try Van Camp, Barden & Sloan (2010) data file. Centrality1-Centrality8. Compare to manuscript.
Many other reliability analyses involve correlations (test-retest, split halves) or probabilities (inter-rater reliability).
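For the correlation-based analyses, test-retest reliability is typically the Pearson correlation between scores from the two administrations. The sketch below shows one way to compute it by hand; the function name `pearson_r` and both score lists are invented for illustration and are not from the Van Camp, Barden & Sloan data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same 5 people at two points in time
time1 = [10, 12, 14, 16, 18]
time2 = [11, 12, 15, 15, 19]
r = pearson_r(time1, time2)
print(round(r, 2))
```

The same function could be applied to split-halves reliability by correlating the summed scores of two halves of a scale.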
Case Processing Summary
                      N      %
Cases   Valid         109    86.5
        Excluded(a)   17     13.5
        Total         126    100.0
a. Listwise deletion based on all variables in the procedure.
Reliability Statistics
Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.706               .743                                           8
Inter-Item Correlation Matrix
centrality1rev centrality2 centrality3 centrality4rev centrality5 centrality6 centrality7 centrality8rev
centrality1rev 1.000 .244 .069 .297 .082 .170 .148 .208
centrality2 .244 1.000 .298 .323 .509 .411 .588 .031
centrality3 .069 .298 1.000 .206 .398 .337 .398 .042
centrality4rev .297 .323 .206 1.000 .213 .160 .350 .284
centrality5 .082 .509 .398 .213 1.000 .589 .637 -.063
centrality6 .170 .411 .337 .160 .589 1.000 .475 .075
centrality7 .148 .588 .398 .350 .637 .475 1.000 -.041
centrality8rev .208 .031 .042 .284 -.063 .075 -.041 1.000
SPSS-Output
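The "Cronbach's Alpha Based on Standardized Items" figure in the output can be checked by hand from the inter-item correlation matrix, using the standard formula α = k·r̄ / (1 + (k − 1)·r̄), where r̄ is the mean of the off-diagonal correlations. A short sketch of that check; the variable names are chosen here for illustration:

```python
# Upper triangle of the inter-item correlation matrix, copied from the SPSS output
upper = [
    [.244, .069, .297, .082, .170, .148, .208],  # centrality1rev vs items 2-8
    [.298, .323, .509, .411, .588, .031],        # centrality2 vs items 3-8
    [.206, .398, .337, .398, .042],              # centrality3 vs items 4-8
    [.213, .160, .350, .284],                    # centrality4rev vs items 5-8
    [.589, .637, -.063],                         # centrality5 vs items 6-8
    [.475, .075],                                # centrality6 vs items 7-8
    [-.041],                                     # centrality7 vs centrality8rev
]

k = len(upper) + 1                       # number of items (8)
pairs = [r for row in upper for r in row]
assert len(pairs) == k * (k - 1) // 2    # 28 unique item pairs

mean_r = sum(pairs) / len(pairs)         # average inter-item correlation
alpha_std = k * mean_r / (1 + (k - 1) * mean_r)
print(round(alpha_std, 3))  # 0.743, matching the Reliability Statistics table
```

The near-zero and negative correlations in the centrality8rev row pull the average down, which suggests that item contributes little to the scale's consistency.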
END
Advanced Scale Development Techniques
Factor Analysis: determines the factor structure of measures (does your measure assess one construct or multiple constructs? Is your proposed construct coherent?)
Multi-trait Multi-method Matrix: using a combination of existing measures and manipulations to establish convergent/divergent validity with the measure.
Reliability Assessments 2
Inter-rater Reliability
Independent judges score participant responses and the % of agreement is assessed to indicate reliability. Used particularly for measures requiring coding (video coding, spontaneous responses…).
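Percent agreement between two judges can be sketched in a few lines. The example also computes Cohen's kappa, which is not mentioned in the slides but is a common chance-corrected companion to raw agreement. The function names and the coded behaviors are hypothetical.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of responses that both judges coded identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both judges must code the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: probability both judges pick the same category independently
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two judges for six video clips
judge_a = ["smile", "frown", "smile", "neutral", "smile", "frown"]
judge_b = ["smile", "frown", "neutral", "neutral", "smile", "smile"]

agreement = percent_agreement(judge_a, judge_b)
kappa = cohens_kappa(judge_a, judge_b)
print(round(agreement, 2), round(kappa, 2))
```

Kappa is lower than raw agreement here because some matches would occur even if the judges coded at random.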