IS 4800 Empirical Research Methods for Information Science Class Notes March 16, 2012

IS 4800 Empirical Research Methods for Information Science Class Notes March 16, 2012 Instructor: Prof. Carole Hafner, 446 WVH [email protected] Tel: 617-373-5116 Course Web site: www.ccs.neu.edu/course/is4800sp12/

IS 4800 Empirical Research Methods for Information Science

Class Notes March 16, 2012

Instructor: Prof. Carole Hafner, 446 WVH [email protected] Tel: 617-373-5116

Course Web site: www.ccs.neu.edu/course/is4800sp12/

Outline

• Sampling and statistics (cont.)

• T test for paired samples

• T test for independent means

• Analysis of Variance

• Two-way Analysis of Variance

Relationship Between Population and Samples When a Treatment Had No Effect

[Figure: a single population from which Sample 1 (mean M1) and Sample 2 (mean M2) are both drawn.]

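Relationship Between Population and Samples When a Treatment Had An Effect

[Figure: a control group population (mean $\mu_c$) with its control group sample (mean $M_c$), and a separate treatment group population (mean $\mu_t$) with its treatment group sample (mean $M_t$).]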

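Population

Mean? Variance? $\mu$, $\sigma^2$

Sampling

Sample of size N: $M = \frac{\sum X}{N}$, $SD^2 = \frac{\sum (X - M)^2}{N}$

Mean values from all possible samples of size N, aka “distribution of means”: $\mu_M = \mu$, $\sigma_M^2 = \frac{\sigma^2}{N}$

$Z = \frac{M - \mu_M}{\sigma_M}$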
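The claim that the distribution of means has the population mean and a variance of $\sigma^2/N$ can be checked by brute force on a tiny population. The sketch below is only an illustration: the six-value population and the sample size of 2 are made-up assumptions, and sampling is done with replacement.

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 4, 5, 6]  # hypothetical population, not from the slides
n = 2                            # sample size N (assumed for illustration)

# Every possible sample of size n, drawn with replacement, reduced to its mean.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)               # population mean
sigma2 = pvariance(population)      # population variance (divides by N)
mu_m = mean(sample_means)           # mean of the distribution of means
sigma2_m = pvariance(sample_means)  # variance of the distribution of means

print("mu =", mu, " mu_M =", mu_m)
print("sigma^2 / N =", sigma2 / n, " sigma^2_M =", sigma2_m)
```

The two printed pairs should agree, which is what the formulas for the distribution of means state.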

Z tests and t-tests

t is like Z:

$Z = \frac{M - \mu}{\sigma_M}$

$t = \frac{M - \mu}{S_M}$, with $\mu = 0$ for paired samples

We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance while $\sigma_M$ is based on a known population variance.

$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$

$S_M^2 = \frac{S^2}{N}$

T-test with paired samples

Given info about the population of change scores and the sample size we will be using (N), we can compute the distribution of means.

Population of change scores: $\mu = 0$, $\sigma^2 = ?$

$S^2$ = estimate of $\sigma^2$ from the sample $= SS/df$, with $df = N - 1$

$S_M^2 = S^2/N$

Now, given a particular sample of change scores of size N, we compute its mean and finally determine the probability that this mean occurred by chance:

$t = \frac{M}{S_M}$
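The paired-samples procedure can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `paired_t` and the before/after scores are hypothetical, and the cutoff 2.776 is the standard two-tailed .05 value for df = 4 from a t table.

```python
import math

def paired_t(before, after):
    """Return (t, df) for a paired-samples t test on change scores."""
    changes = [a - b for a, b in zip(after, before)]
    n = len(changes)
    m = sum(changes) / n                     # M, mean change score
    ss = sum((x - m) ** 2 for x in changes)  # SS, sum of squared deviations
    df = n - 1
    s2 = ss / df                             # S^2, estimated population variance
    s_m = math.sqrt(s2 / n)                  # S_M, SD of the distribution of means
    t = (m - 0) / s_m                        # mu = 0 under the null hypothesis
    return t, df

# Hypothetical scores for five participants measured twice.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 17, 12]

t, df = paired_t(before, after)
cutoff = 2.776  # two-tailed, alpha = .05, df = 4 (standard t table)
print(f"t({df}) = {t:.3f}, beyond cutoff: {abs(t) > cutoff}")
```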

t test for independent samples

Given two samples

Estimate population variances (assume same)

Estimate variances of distributions of means

Estimate variance of differences between means (mean = 0)

This is now your comparison distribution

Estimating the Population Variance

$S^2$ is an estimate of $\sigma^2$

$S^2 = SS/(N-1)$ for one sample (take the square root for $S$)

For two independent samples, the “pooled estimate”:

$S^2_{Pooled} = \frac{df_1}{df_{Total}} S_1^2 + \frac{df_2}{df_{Total}} S_2^2$

$df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$

From this calculate the variance of the sample means, needed to compute the t statistic: $S_M^2 = S^2/N$

$S^2_{Difference} = \frac{S^2_{Pooled}}{N_1} + \frac{S^2_{Pooled}}{N_2}$
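As a worked check of the pooled estimate, the sketch below starts from summary statistics rather than raw scores. The function names and the input values (variances 4.0 and 2.5, sample sizes 3 and 5) are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
def pooled_variance(s2_1, n1, s2_2, n2):
    """Weight each sample's variance estimate by its share of the total df."""
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    return (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2

def variance_of_difference(s2_pooled, n1, n2):
    """Variance of the distribution of differences between means."""
    return s2_pooled / n1 + s2_pooled / n2

# Hypothetical summary statistics for two independent samples.
s2_pooled = pooled_variance(4.0, 3, 2.5, 5)
s2_difference = variance_of_difference(s2_pooled, 3, 5)

print("pooled S^2 =", round(s2_pooled, 4))
print("S^2 difference =", round(s2_difference, 4))
```

With these inputs the pooled estimate is (2/6)(4.0) + (4/6)(2.5) = 3.0, and the variance of the difference is 3.0/3 + 3.0/5 = 1.6. The larger sample pulls the pooled value toward its own variance.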

t test for independent samples, continued

Distribution of differences between means: this is your comparison distribution. It is NOT normal; it is a ‘t’ distribution.

Shape changes depending on df

$df = (N_1 - 1) + (N_2 - 1)$

Compute $t = (M_1 - M_2)/S_{Difference}$. Determine whether it is beyond the cutoff score for the test parameters (df, sig, tails) from a lookup table.
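The whole independent-samples procedure, from raw scores to the table lookup, might look like the following sketch. The function name, the two small samples, and the dictionary are illustrative assumptions; the cutoffs are the standard two-tailed .05 values from a t table, limited here to df 1 through 10.

```python
import math
import statistics

# Two-tailed cutoffs at alpha = .05 from a standard t table (df 1..10 only).
T_CUTOFF_05_TWO_TAILED = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
                          6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def independent_t(sample1, sample2):
    """Return (t, df) for a t test for independent means with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    # statistics.variance divides by N - 1, i.e. it returns S^2 = SS / df.
    s2_pooled = ((df1 / df_total) * statistics.variance(sample1)
                 + (df2 / df_total) * statistics.variance(sample2))
    s_difference = math.sqrt(s2_pooled / n1 + s2_pooled / n2)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / s_difference
    return t, df_total

# Hypothetical scores for two independent groups.
group_a = [4, 6, 8]
group_b = [1, 2, 3, 4, 5]

t, df = independent_t(group_a, group_b)
significant = abs(t) > T_CUTOFF_05_TWO_TAILED[df]
print(f"t({df}) = {t:.3f}, significant at .05 (two-tailed): {significant}")
```

Here the means differ by 3 points, yet t falls just short of the df = 6 cutoff of 2.447, which shows how small samples widen the comparison distribution.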

ANOVA: When to use

• Categorical IV, numerical DV (same as t-test)

• HOWEVER:
 – There are more than 2 levels of IV, so:
 – $(M_1 - M_2)/S_{Difference}$ won’t work

ANOVA Assumptions

• Populations are normal

• Populations have equal variances

• More or less..

Basic Logic of ANOVA

• Null hypothesis
 – Means of all groups are equal.

• Test: do the means differ more than expected given the null hypothesis?

• Terminology
 – Group = Condition = Cell

Accompanying Statistics

• Experimental
 – Between-subjects
   • Single factor, N-level (for N>2)
     – One-way Analysis of Variance (ANOVA)
   • Two factor, two-level (or more!)
     – Factorial Analysis of Variance
     – AKA N-way Analysis of Variance (for N IVs)
     – AKA N-factor ANOVA
 – Within-subjects
   • Repeated-measures ANOVA (not discussed)
     – AKA within-subjects ANOVA

ANOVA: Single factor, N-level (for N>2)

• The Analysis of Variance is used when you have more than two groups in an experiment
 – The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
 – The analysis of variance may be used with unequal sample size (weighted or unweighted means analysis)
 – When there are just 2 groups, ANOVA is equivalent to the t test for independent means

One-Way ANOVA – Assuming Null Hypothesis is True…

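Within-Group Estimate of Population Variance: the estimates from each group, $\sigma^2_{est\,1}$, $\sigma^2_{est\,2}$, $\sigma^2_{est\,3}$, are combined into $\sigma^2_{est\,within}$

Between-Group Estimate of Population Variance: the group means $M_1$, $M_2$, $M_3$ give $\sigma^2_{est\,between}$

$F = \frac{\sigma^2_{est\,between}}{\sigma^2_{est\,within}}$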
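The F ratio can be computed directly from raw scores. The sketch below uses the usual sums-of-squares form of the two variance estimates; the function name `one_way_f` and the three small groups are hypothetical, picked so the result is easy to verify by hand.

```python
from statistics import mean

def one_way_f(groups):
    """Return F = (between-group estimate) / (within-group estimate)."""
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_mean = mean(all_scores)

    # Between-group estimate: how far the group means spread around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # Within-group estimate: spread of scores around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    ms_within = ss_within / (n_total - k)

    return ms_between / ms_within

# Hypothetical data: three groups of three scores each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_f(groups)
print("F =", round(f_ratio, 3))
```

For these groups the between-group estimate is 26/2 = 13 and the within-group estimate is 6/6 = 1, so F = 13. If the null hypothesis were true, both estimates would target the same population variance and F would be near 1.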

Justification for F statistic

Calculating F

Example

Example

Using the F Statistic

• Use a table for F(BDF, WDF)
 – And also α

BDF = between-groups degrees of freedom = number of groups - 1

WDF = within-groups degrees of freedom = Σ df for all groups = N - number of groups
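The two degrees-of-freedom values used to index the F table follow from the group sizes alone. A small sketch, with a hypothetical helper name and hypothetical group sizes:

```python
def anova_df(group_sizes):
    """Return (BDF, WDF) for a one-way ANOVA given the size of each group."""
    bdf = len(group_sizes) - 1              # number of groups - 1
    wdf = sum(n - 1 for n in group_sizes)   # sum of df over all groups
    return bdf, wdf

# Hypothetical design: three groups of eight participants.
bdf, wdf = anova_df([8, 8, 8])
print(f"Look up F({bdf},{wdf}) at the chosen alpha")
```

Unequal group sizes work the same way: `anova_df([5, 7, 9])` gives WDF = 4 + 6 + 8 = 18, which equals N minus the number of groups (21 - 3).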

One-way ANOVA in SPSS

Data

[Figure: chart of mean Performance (y-axis, 0 to 6) for the 1 Day, 2 Day, and 3 Day groups.]

Analyze/Compare Means/One Way ANOVA…

SPSS Results…

ANOVA

Performance

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   24.813           2    12.406        9.442   .001
Within Groups    27.594           21   1.314
Total            52.406           23

F(2,21)=9.442, p<.05
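The columns of the SPSS table are related by simple arithmetic: each Mean Square is its Sum of Squares divided by its df, and F is the between-groups Mean Square divided by the within-groups Mean Square. The sketch below recomputes those values from the sums of squares and df reported above; only the variable names are invented.

```python
# Values copied from the SPSS one-way ANOVA table.
ss_between, df_between = 24.813, 2
ss_within, df_within = 27.594, 21

ms_between = ss_between / df_between   # Mean Square, Between Groups
ms_within = ss_within / df_within      # Mean Square, Within Groups
f_ratio = ms_between / ms_within       # F

print("MS between =", ms_between)
print("MS within  =", ms_within)
print("F          =", f_ratio)
print("SS total   =", ss_between + ss_within)
```

The recomputed values agree with the table up to rounding in the last printed digit.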

Factorial Designs

• Two or more nominal independent variables, each with two or more levels, and a numeric dependent variable.

• Factorial ANOVA teases apart the contribution of each variable separately.

• For N IVs, aka “N-way” ANOVA

Factorial Designs

• Adding a second independent variable to a single-factor design results in a FACTORIAL DESIGN

• Two components can be assessed
 – The MAIN EFFECT of each independent variable
   • The separate effect of each independent variable
   • Analogous to separate experiments involving those variables
 – The INTERACTION between independent variables
   • When the effect of one independent variable changes over levels of a second
   • Or, when the effect of one variable depends on the level of the other variable.

Example

Wait Time Sign in Student Center vs. No Sign

Satisfaction

Example of An Interaction - Student Center Sign: 2 Genders x 2 Sign Conditions

[Figure: value of the dependent variable (y-axis, 0 to 12) plotted against the level of independent variable A (Level 1, Level 2). Labels: F, M, No Sign, Sign.]
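In a 2x2 design, main effects and the interaction can be read off the four cell means. The sketch below uses made-up satisfaction means for the gender by sign example (the original chart values are not recoverable), so every number in it is an assumption for illustration.

```python
# Hypothetical cell means for satisfaction, NOT the values from the slide.
cell_means = {
    ("F", "No Sign"): 4.0, ("F", "Sign"): 8.0,
    ("M", "No Sign"): 6.0, ("M", "Sign"): 6.0,
}

# Simple effect of the sign within each gender.
sign_effect_f = cell_means[("F", "Sign")] - cell_means[("F", "No Sign")]
sign_effect_m = cell_means[("M", "Sign")] - cell_means[("M", "No Sign")]

# Main effect of the sign: difference between the marginal means.
main_effect_sign = (sign_effect_f + sign_effect_m) / 2

# Main effect of gender: F marginal mean minus M marginal mean.
mean_f = (cell_means[("F", "No Sign")] + cell_means[("F", "Sign")]) / 2
mean_m = (cell_means[("M", "No Sign")] + cell_means[("M", "Sign")]) / 2
main_effect_gender = mean_f - mean_m

# Interaction: does the sign effect depend on gender? (difference of differences)
interaction_contrast = sign_effect_f - sign_effect_m

print("sign effect for F:", sign_effect_f, " for M:", sign_effect_m)
print("main effect of sign:", main_effect_sign)
print("main effect of gender:", main_effect_gender)
print("interaction contrast:", interaction_contrast)
```

With these numbers the sign raises satisfaction by 4 points for F and not at all for M, so the lines in a plot would not be parallel. A nonzero difference of differences is the pattern an interaction test evaluates.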

Two-way ANOVA in SPSS

Analyze/General Linear Model/Univariate

Results

Tests of Between-Subjects Effects

Dependent Variable: Performance

Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model          26.507(a)                 5    5.301         3.685     .018
Intercept                210.855                   1    210.855       146.547   .000
TrainingDays             20.728                    2    10.364        7.203     .005
Trainer                  .002                      1    .002          .001      .974
TrainingDays * Trainer   1.680                     2    .840          .584      .568
Error                    25.899                    18   1.439
Total                    401.250                   24
Corrected Total          52.406                    23

a. R Squared = .506 (Adjusted R Squared = .369)

Results

Degrees of Freedom

• df for between-group variance estimates for main effects
 – Number of levels - 1

• df for between-group variance estimates for interaction effect
 – Total num cells - df for both main effects - 1
 – e.g. 2x2 => 4 - (1+1) - 1 = 1

• df for within-group variance estimate
 – Sum of df for each cell = N - num cells

• Report: “F(bet-group, within-group)=F, Sig.”
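These degrees-of-freedom rules for a two-factor design translate directly into code. A minimal sketch follows; the function name, the dictionary keys, and the sample sizes are hypothetical.

```python
def factorial_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects ANOVA."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                          # main effect of factor A
    df_b = levels_b - 1                          # main effect of factor B
    df_interaction = cells - (df_a + df_b) - 1   # A * B interaction
    df_within = n_total - cells                  # within-group (error)
    return {"A": df_a, "B": df_b, "A*B": df_interaction, "within": df_within}

# The 2x2 case from the bullet above, with a hypothetical N of 20.
print(factorial_df(2, 2, 20))

# A hypothetical 3x2 design with 24 participants.
print(factorial_df(3, 2, 24))
```

The interaction df computed this way always equals the product of the two main-effect df, which gives a quick sanity check.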

Publication format

Tests of Between-Subjects Effects

Dependent Variable: Performance

Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model          26.507(a)                 5    5.301         3.685     .018
Intercept                210.855                   1    210.855       146.547   .000
TrainingDays             20.728                    2    10.364        7.203     .005
Trainer                  .002                      1    .002          .001      .974
TrainingDays * Trainer   1.680                     2    .840          .584      .568
Error                    25.899                    18   1.439
Total                    401.250                   24
Corrected Total          52.406                    23

a. R Squared = .506 (Adjusted R Squared = .369)

N=24, 2x3=6 cells => df TrainingDays=2, df within-group variance=24-6=18

=> F(2,18)=7.20, p<.05
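Producing the publication string from a table row is mechanical, so it can be wrapped in a small helper. The function `report_f` below is a hypothetical convenience, not an SPSS feature; it assumes a fixed alpha of .05 and writes "n.s." for non-significant effects.

```python
def report_f(df_between, df_within, f_value, sig, alpha=0.05):
    """Format one ANOVA effect as 'F(bet-group, within-group)=F, Sig.'."""
    if sig < alpha:
        verdict = f"p<{alpha:.2f}".replace("0.", ".")   # e.g. 'p<.05'
    else:
        verdict = "n.s."
    return f"F({df_between},{df_within})={f_value:.2f}, {verdict}"

# TrainingDays row of the table: df = 2, error df = 18, F = 7.203, Sig. = .005
print(report_f(2, 18, 7.203, 0.005))

# TrainingDays * Trainer row: df = 2, F = .584, Sig. = .568
print(report_f(2, 18, 0.584, 0.568))
```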

Reporting rule

• IF you have a significant interaction

• THEN
 – If 2x2 study: do not report main effects, even if significant
 – Else: must look at patterns of means in cells to determine whether to report main effects or not.
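The reporting rule can be written as a small decision function. This is only a sketch under assumptions: the function name and the dictionary layout are invented, alpha is fixed at .05, and the "look at patterns of means" branch is represented by flagging the main effect for manual inspection, since that judgment cannot be automated from Sig. values alone.

```python
def effects_to_report(sig, is_2x2, alpha=0.05):
    """sig maps effect name -> Sig. value; the key 'interaction' is the A*B term."""
    main = [name for name, p in sig.items()
            if name != "interaction" and p < alpha]
    if sig["interaction"] < alpha:
        if is_2x2:
            return ["interaction"]  # main effects are not reported
        # Larger designs: the cell means must be inspected by hand.
        return ["interaction"] + [f"{name} (inspect cell means first)"
                                  for name in main]
    return main  # no interaction: report any significant main effects

# Hypothetical Sig. values for a 2x2 study with factors A and B.
print(effects_to_report({"A": 0.34, "B": 0.12, "interaction": 0.41}, True))
print(effects_to_report({"A": 0.04, "B": 0.12, "interaction": 0.01}, True))
print(effects_to_report({"A": 0.04, "B": 0.02, "interaction": 0.41}, True))
```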

Results?

                         Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.41

n.s.

Results?

                         Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.02

Significant interaction between TrainingDays and Trainer, F(2,18)=.584, p<.05

Results?

                         Sig.
TrainingDays             0.34
Trainer                  0.02
TrainingDays * Trainer   0.41

Main effect of Trainer, F(1,18)=.001, p<.05

Results?

                         Sig.
TrainingDays             0.04
Trainer                  0.12
TrainingDays * Trainer   0.01

Significant interaction between TrainingDays and Trainer, F(2,18)=.584, p<.05

Do not report TrainingDays as significant

Results?

                         Sig.
TrainingDays             0.04
Trainer                  0.02
TrainingDays * Trainer   0.41

Main effects for both TrainingDays, F(2,18)=7.20, p<.05, and Trainer, F(1,18)=.001, p<.05

“Factorial Design”

• Not all cells in your design need to be tested
 – But if they are, it is a “full factorial design”, and you do a “full factorial ANOVA”

[Table: a 2x2 design crossing Agent / Text with Real-Time / Retrospective; one cell is marked X.]

Higher-Order Factorial Designs

• More than two independent variables are included in a higher-order factorial design
 – As factors are added, the complexity of the experimental design increases
   • The number of possible main effects and interactions increases
   • The number of subjects required increases
   • The volume of materials and amount of time needed to complete the experiment increases
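How fast the design grows can be made concrete by enumerating the effects. In the sketch below the third factor (Gender) and all level counts are hypothetical additions for illustration; for k factors there are k main effects and 2^k - k - 1 interactions.

```python
from itertools import combinations
from math import prod

def list_effects(factors):
    """Return (main effects, interactions) for a full factorial design."""
    main = [(f,) for f in factors]
    interactions = [combo
                    for size in range(2, len(factors) + 1)
                    for combo in combinations(factors, size)]
    return main, interactions

# Hypothetical three-factor design: factor name -> number of levels.
levels = {"TrainingDays": 3, "Trainer": 2, "Gender": 2}

main, interactions = list_effects(list(levels))
cells = prod(levels.values())

print("main effects:", [" * ".join(e) for e in main])
print("interactions:", [" * ".join(e) for e in interactions])
print("cells to fill with subjects:", cells)
```

Going from two factors to three raises the number of interactions from 1 to 4, and each added factor multiplies the number of cells by its number of levels.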