IS 4800 Empirical Research Methods for Information Science Class Notes Feb. 24, 2012
IS 4800 Empirical Research Methods for Information Science
Class Notes March 16, 2012
Instructor: Prof. Carole Hafner, 446 [email protected] Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
Outline
• Sampling and statistics (cont.)
• T test for paired samples
• T test for independent means
• Analysis of Variance
• Two-way Analysis of Variance
Relationship Between Population and Samples When a Treatment Had No Effect
[Figure: a single Population from which Sample 1 (mean M1) and Sample 2 (mean M2) are both drawn.]
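Relationship Between Population and Samples When a Treatment Had An Effect
[Figure: a control group population (mean $\mu_c$) with its control group sample (mean $M_c$), and a separate treatment group population (mean $\mu_t$) with its treatment group sample (mean $M_t$).]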
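Population
Mean $\mu$? Variance $\sigma^2$?
Sampling
Sample of size N: $M = \frac{\sum X}{N}$, $SD^2 = \frac{\sum (X - M)^2}{N}$
Mean values from all possible samples of size N, aka “distribution of means”: $\sigma^2_M = \frac{\sigma^2}{N}$
$Z_M = \frac{M - \mu}{\sigma_M}$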
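The "distribution of means" idea can be checked by brute force on a tiny population. The sketch below is illustrative only: the population values and the sample size are made up, and samples are drawn with replacement so that every possible ordered sample can be enumerated. Under those assumptions the mean of all sample means equals the population mean, and their variance equals the population variance divided by N.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical toy population, chosen only so the arithmetic is easy to follow.
population = [2, 4, 6, 8]
N = 2  # sample size

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divides by population size)

# Every possible ordered sample of size N, drawn with replacement.
sample_means = [mean(s) for s in product(population, repeat=N)]

mu_M = mean(sample_means)            # mean of the distribution of means
sigma2_M = pvariance(sample_means)   # variance of the distribution of means

print("population mean:", mu, " mean of sample means:", mu_M)
print("population variance / N:", sigma2 / N, " variance of sample means:", sigma2_M)
```

Enumerating all samples is only practical for toy sizes like this one; it is used here so the result is exact rather than approximate.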
Z tests and t-tests
t is like Z:
$Z = \frac{M - \mu}{\sigma_M}$
$t = \frac{M - \mu}{S_M}$ ($\mu = 0$ for paired samples)
We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance while $\sigma_M$ is based on a known population variance.
$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$
$S^2_M = \frac{S^2}{N}$
T-test with paired samples
• Given info about the population of change scores and the sample size we will be using (N), we can compute the distribution of means
  – $\mu = 0$; $\sigma^2$ is unknown (?), so $S^2$ estimates $\sigma^2$ from the sample: $S^2 = SS/df$, with $df = N - 1$
  – $S^2_M = S^2/N$
• Now, given a particular sample of change scores of size N, we compute its mean $M$
• and finally determine the probability that this mean occurred by chance: $t = M / S_M$
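A worked version of the paired-samples steps may help. This is a minimal sketch using only the standard library. The before/after scores are invented for illustration, and the cutoff is the usual two-tailed table value for df = 4 at the .05 level.

```python
from math import sqrt

# Hypothetical before/after scores for the same 5 participants (illustrative only).
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 16]

changes = [a - b for a, b in zip(after, before)]  # change scores
N = len(changes)
df = N - 1

M = sum(changes) / N                      # mean of the change scores
SS = sum((x - M) ** 2 for x in changes)   # sum of squared deviations
S2 = SS / df                              # estimate of the population variance
S2_M = S2 / N                             # variance of the distribution of means
S_M = sqrt(S2_M)

t = (M - 0) / S_M                         # mu = 0 under the null hypothesis

cutoff = 2.776  # t table: df = 4, two-tailed, alpha = .05
print(f"M = {M:.2f}, S2 = {S2:.2f}, S_M = {S_M:.2f}, t({df}) = {t:.2f}")
print("beyond cutoff" if abs(t) > cutoff else "not beyond cutoff")
```

With these made-up scores the change scores are 2, 3, 1, 3, 3, so M = 2.4, SS = 3.2, S² = 0.8, S²_M = 0.16, and t(4) = 6.0.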
t test for independent samples
Given two samples
Estimate population variances (assume same)
Estimate variances of distributions of means
Estimate variance of differences between means (mean = 0)
This is now your comparison distribution
Estimating the Population Variance
$S^2$ is an estimate of $\sigma^2$
$S^2 = \frac{SS}{N - 1}$ for one sample (take sq root for S)
For two independent samples – “pooled estimate”:
$S^2_{Pooled} = \frac{df_1}{df_{Total}} S_1^2 + \frac{df_2}{df_{Total}} S_2^2$
$df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$
From this calculate variance of sample means, needed to compute t statistic: $S^2_M = \frac{S^2}{N}$
$S^2_{difference} = \frac{S^2_{Pooled}}{N_1} + \frac{S^2_{Pooled}}{N_2}$
t test for independent samples, continued
This is your comparison distribution: the distribution of differences between means
NOT normal, is a ‘t’ distribution
Shape changes depending on df
$df = (N_1 - 1) + (N_2 - 1)$
Compute $t = (M_1 - M_2) / S_{difference}$
Determine if beyond cutoff score for test parameters (df, sig, tails) from lookup table.
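The independent-samples procedure can be sketched in a few lines. The function name `independent_t` and the two score lists are hypothetical, introduced only for illustration. The computation follows the pooled-estimate steps in these notes: estimate each variance, pool by degrees of freedom, then form the variance of the difference between means.

```python
from math import sqrt

def independent_t(sample1, sample2):
    """Return (t, df) for two independent samples using the pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2

    # Per-sample estimates of the population variance: S^2 = SS / (N - 1)
    s2_1 = sum((x - m1) ** 2 for x in sample1) / df1
    s2_2 = sum((x - m2) ** 2 for x in sample2) / df2

    # Pooled estimate: each sample weighted by its share of the degrees of freedom
    s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2

    # Variance of the distribution of differences between means
    s2_difference = s2_pooled / n1 + s2_pooled / n2

    t = (m1 - m2) / sqrt(s2_difference)
    return t, df_total

# Hypothetical scores for a treatment group and a control group (illustrative only).
treatment = [6, 8, 7, 9, 10]
control = [4, 5, 6, 5, 5]
t, df = independent_t(treatment, control)
print(f"t({df}) = {t:.3f}")
```

For these invented scores the means are 8 and 5, the two variance estimates are 2.5 and 0.5, the pooled estimate is 1.5, and the variance of the difference is 0.6. The resulting t is then compared with the table cutoff for df = 8.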
ANOVA: When to use
• Categorical IV, numerical DV (same as t-test)
• HOWEVER:
  – There are more than 2 levels of IV so:
  – (M1 – M2) / Sm won’t work
ANOVA Assumptions
• Populations are normal
• Populations have equal variances
• More or less..
Basic Logic of ANOVA
• Null hypothesis
  – Means of all groups are equal.
• Test: do the means differ more than expected given the null hypothesis?
• Terminology
  – Group = Condition = Cell
Accompanying Statistics
• Experimental
  – Between-subjects
    • Single factor, N-level (for N>2)
      – One-way Analysis of Variance (ANOVA)
    • Two factor, two-level (or more!)
      – Factorial Analysis of Variance
      – AKA N-way Analysis of Variance (for N IVs)
      – AKA N-factor ANOVA
  – Within-subjects
    • Repeated-measures ANOVA (not discussed)
      – AKA within-subjects ANOVA
ANOVA: Single factor, N-level (for N>2)
• The Analysis of Variance is used when you have more than two groups in an experiment
  – The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
  – The analysis of variance may be used with unequal sample size (weighted or unweighted means analysis)
  – When there are just 2 groups, ANOVA is equivalent to the t test for independent means
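One-Way ANOVA – Assuming Null Hypothesis is True…
Within-Group Estimate Of Population Variance
$\sigma^2_{est,1}$, $\sigma^2_{est,2}$, $\sigma^2_{est,3}$ combine into $\sigma^2_{est,within}$
Between-Group Estimate Of Population Variance
$M_1$, $M_2$, $M_3$ give $\sigma^2_{est,between}$
$F = \frac{\sigma^2_{est,between}}{\sigma^2_{est,within}}$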
Justification for F statistic
Calculating F
Example
Example
Using the F Statistic
• Use a table for F(BDF, WDF)
  – And also α
• BDF = between-groups degrees of freedom = number of groups – 1
• WDF = within-groups degrees of freedom = Σ df for all groups = N – number of groups
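To make BDF, WDF, and the F ratio concrete, here is a minimal sketch of a one-way ANOVA computed by hand. The function name `one_way_anova` and the three small groups are hypothetical. The sketch uses the sums-of-squares form (mean square between divided by mean square within), which is the usual way the between-group and within-group variance estimates are computed.

```python
def one_way_anova(groups):
    """Return (F, BDF, WDF) for a list of groups of scores."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    bdf = k - 1            # between-groups degrees of freedom
    wdf = n_total - k      # within-groups degrees of freedom
    f_ratio = (ss_between / bdf) / (ss_within / wdf)
    return f_ratio, bdf, wdf

# Hypothetical scores for three conditions (illustrative only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
F, BDF, WDF = one_way_anova(groups)
print(f"F({BDF},{WDF}) = {F:.2f}")
```

For these invented groups the between-groups sum of squares is 26 on 2 df and the within-groups sum of squares is 6 on 6 df, so F(2,6) = 13. That value would then be compared with the table entry for F(2,6) at the chosen α.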
One-way ANOVA in SPSS
Data
[Figure: chart of mean Performance (axis 0 to 6) for the 1 Day, 2 Day, and 3 Day groups.]
Analyze/Compare Means/One Way ANOVA…
SPSS Results…
ANOVA
Performance
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   24.813           2    12.406        9.442   .001
Within Groups    27.594           21   1.314
Total            52.406           23

F(2,21)=9.442, p<.05
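The reported F can be reproduced from the sums of squares and degrees of freedom in the SPSS output: each mean square is its sum of squares divided by its df, and F is the ratio of the two mean squares. The short check below uses only the numbers printed in the table, so small differences in the last digit come from rounding in the printed values.

```python
# Values copied from the SPSS one-way ANOVA output in these notes.
ss_between, df_between = 24.813, 2
ss_within, df_within = 27.594, 21

ms_between = ss_between / df_between   # Mean Square = Sum of Squares / df
ms_within = ss_within / df_within
F = ms_between / ms_within             # F = MS between / MS within

print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F({df_between},{df_within}) = {F:.2f}")
```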
Factorial Designs
• Two or more nominal independent variables, each with two or more levels, and a numeric dependent variable.
• Factorial ANOVA teases apart the contribution of each variable separately.
• For N IVs, aka “N-way” ANOVA
Factorial Designs
• Adding a second independent variable to a single-factor design results in a FACTORIAL DESIGN
• Two components can be assessed
  – The MAIN EFFECT of each independent variable
    • The separate effect of each independent variable
    • Analogous to separate experiments involving those variables
  – The INTERACTION between independent variables
    • When the effect of one independent variable changes over levels of a second
    • Or, when the effect of one variable depends on the level of the other variable.
Example
• Wait Time Sign in Student Center vs. No Sign
• Dependent variable: Satisfaction
Example of An Interaction - Student Center Sign – 2 Genders x 2 Sign Conditions
[Figure: plot of the value of the dependent variable (Satisfaction, axis 0 to 12) against the level of independent variable A, with one series per level of the second variable. The two variables are Gender (F, M) and Sign condition (No Sign, Sign).]
Two-way ANOVA in SPSS
Analyze/General Linear Model/Univariate
Results
Tests of Between-Subjects Effects
Dependent Variable: Performance
Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model          26.507(a)                 5    5.301         3.685     .018
Intercept                210.855                   1    210.855       146.547   .000
TrainingDays             20.728                    2    10.364        7.203     .005
Trainer                  .002                      1    .002          .001      .974
TrainingDays * Trainer   1.680                     2    .840          .584      .568
Error                    25.899                    18   1.439
Total                    401.250                   24
Corrected Total          52.406                    23
a. R Squared = .506 (Adjusted R Squared = .369)
Results
Degrees of Freedom
• df for between-group variance estimates for main effects
  – Number of levels – 1
• df for between-group variance estimates for interaction effect
  – Total num cells – df for both main effects – 1
  – e.g. 2x2 => 4 – (1+1) – 1 = 1
• df for within-group variance estimate
  – Sum of df for each cell = N – num cells
• Report: “F(bet-group, within-group)=F, Sig.”
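The degrees-of-freedom rules for a two-way design are simple enough to encode directly. The helper below is a hypothetical illustration (the name `two_way_df` and the dictionary keys are not from the course materials). It is applied to the 2x2 example from the slide, with a made-up N of 20, and to the N = 24 design with 3 levels of TrainingDays and 2 levels of Trainer used in these notes.

```python
def two_way_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects ANOVA."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                          # main effect A: number of levels - 1
    df_b = levels_b - 1                          # main effect B: number of levels - 1
    df_interaction = cells - (df_a + df_b) - 1   # cells - df of both main effects - 1
    df_within = n_total - cells                  # N - number of cells
    return {"A": df_a, "B": df_b, "A*B": df_interaction, "within": df_within}

# 2x2 example from the slide; N = 20 is an assumed value for illustration.
print(two_way_df(2, 2, 20))

# TrainingDays (3 levels) x Trainer (2 levels), N = 24.
print(two_way_df(3, 2, 24))
```

For the 3x2 case this gives df of 2, 1, and 2 for the two main effects and the interaction, and 18 within groups, which agrees with the df column of the SPSS output.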
Publication format
From the Tests of Between-Subjects Effects table (Dependent Variable: Performance): TrainingDays has df = 2, F = 7.203, Sig. = .005; Error has df = 18.
N=24, 2x3=6 cells => df TrainingDays=2, df within-group variance=24-6=18
=> F(2,18)=7.20, p<.05
Reporting rule
• IF you have a significant interaction
• THEN
  – If 2x2 study: do not report main effects, even if significant
  – Else: must look at patterns of means in cells to determine whether to report main effects or not.
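The reporting rule reads like a small decision procedure, so it can be written as one. This is only a sketch of the rule as stated on the slide. The function name, the labels "A", "B", and "A*B", the returned dictionary layout, and the default α of .05 are all assumptions made for illustration.

```python
def effects_to_report(p_a, p_b, p_ab, is_2x2, alpha=0.05):
    """Apply the slide's reporting rule to the Sig. values of a two-way ANOVA."""
    significant_main = [name for name, p in (("A", p_a), ("B", p_b)) if p < alpha]

    if p_ab >= alpha:
        # No significant interaction: report whichever main effects are significant.
        return {"report": significant_main, "check_cell_means": []}
    if is_2x2:
        # Significant interaction in a 2x2 study: do not report main effects.
        return {"report": ["A*B"], "check_cell_means": []}
    # Significant interaction in a larger design: the pattern of cell means
    # decides whether a significant main effect should also be reported.
    return {"report": ["A*B"], "check_cell_means": significant_main}

print(effects_to_report(0.34, 0.12, 0.41, is_2x2=False))  # nothing significant
print(effects_to_report(0.04, 0.12, 0.01, is_2x2=True))   # interaction only
print(effects_to_report(0.04, 0.02, 0.41, is_2x2=False))  # both main effects
```

The function cannot make the final call in the non-2x2 case. It only flags which main effects need a look at the cell means, because that judgment depends on the data pattern rather than on the p values.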
Results?
                         Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.41
n.s.
Results?
                         Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.02
Significant interaction between TrainingDays and Trainer, F(2,22)=.584, p<.05
Results?
                         Sig.
TrainingDays             0.34
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effect of Trainer, F(1,22)=.001, p<.05
Results?
                         Sig.
TrainingDays             0.04
Trainer                  0.12
TrainingDays * Trainer   0.01
Significant interaction between TrainingDays and Trainer, F(2,22)=.584, p<.05
Do not report TrainingDays as significant
Results?
                         Sig.
TrainingDays             0.04
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effects for both TrainingDays, F(2,22)=7.20, p<.05, and Trainer, F(1,22)=.001, p<.05
“Factorial Design”
• Not all cells in your design need to be tested
  – But if they are, it is a “full factorial design”, and you do a “full factorial ANOVA”
[Figure: 2x2 design grid crossing Real-Time / Retrospective with Agent / Text; one cell is marked X.]
Higher-Order Factorial Designs
• More than two independent variables are included in a higher-order factorial design
  – As factors are added, the complexity of the experimental design increases
    • The number of possible main effects and interactions increases
    • The number of subjects required increases
    • The volume of materials and amount of time needed to complete the experiment increases