STAT 3130

Guest Speaker: Ashok Krishnamurthy, Ph.D.

Department of Mathematical and Statistical Sciences

24 January 2011

Correspondence: Ashok.Krishnamurthy@ucdenver.edu

Outline

• A brief overview of STAT 3120

• One-way Analysis of Variance (ANOVA)

• ANOVA example

• Implementing ANOVA in R statistical programming language

A quick review of STAT 3120

• Catalog description

– A SAS/SPSS based course aimed at providing students with a foundation in statistical methods, including review of descriptive statistics, confidence intervals, hypothesis testing, t-tests, basic Regression and Chi-Square tests.

Statistical Inference

• Statistical inference is the process of drawing conclusions from data that are subject to random variation.

• The conclusion of a statistical inference is a statistical proposition.

– Estimating the mean and variance of a distribution.

– Confidence interval estimation of the mean and variance of a distribution.

– Hypothesis tests on the mean and variance of a distribution.

Theory of point estimation

• There is at least one parameter $\theta$ whose value is to be approximated on the basis of a sample.

• The approximation is done using an appropriate statistic.

• This statistic is called a point estimator for $\theta$.

CI for population mean, $\mu$

CI for population variance
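As a minimal sketch of the two intervals named on these slides, the R code below computes the usual normal-theory confidence intervals for a mean (with $\sigma$ unknown) and for a variance. The sample values, `alpha`, and all variable names are illustrative assumptions, not data from the lecture.

```r
# Hypothetical sample (illustrative values, not from the lecture)
x <- c(12.1, 11.4, 13.2, 12.8, 11.9, 12.5, 13.0)
n <- length(x)
alpha <- 0.05

# CI for the mean with sigma unknown: xbar +/- t_{alpha/2, n-1} * s / sqrt(n)
ci_mean <- mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)

# CI for the variance: ((n-1) s^2 / chi2_{upper}, (n-1) s^2 / chi2_{lower})
ci_var <- (n - 1) * var(x) / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)

print(ci_mean)
print(ci_var)
```

Both intervals assume the sample comes from a normal population; the interval for the variance is particularly sensitive to that assumption.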

CI for difference between two normal population means

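$\sigma_1, \sigma_2$ known but unequal:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$\sigma_1, \sigma_2$ known and equal ($\sigma_1 = \sigma_2 = \sigma$):

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$\sigma_1, \sigma_2$ unknown but equal:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where

$$df = n_1 + n_2 - 2, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$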

CI for difference between two normal population means

$\sigma_1, \sigma_2$ unknown and unequal:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $df = ?$
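The pooled-variance interval can be checked numerically with a short R sketch. The two samples below are hypothetical values chosen only for illustration. The code computes the pooled interval by hand and compares it with `t.test(..., var.equal = TRUE)`; for the unequal-variance case it relies on `t.test()`'s default Welch approximation, which is one common choice of degrees of freedom and not necessarily the one intended on the slide.

```r
# Hypothetical samples (illustrative values, not from the lecture)
x1 <- c(12.1, 11.4, 13.2, 12.8, 11.9, 12.5)
x2 <- c(10.2, 11.1, 9.8, 10.9, 10.4)
n1 <- length(x1)
n2 <- length(x2)
alpha <- 0.05

# Pooled-variance interval (sigma1, sigma2 unknown but assumed equal)
df_pooled <- n1 + n2 - 2
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / df_pooled
half_width <- qt(1 - alpha / 2, df_pooled) * sqrt(sp2 * (1 / n1 + 1 / n2))
ci_pooled <- (mean(x1) - mean(x2)) + c(-1, 1) * half_width

# The same interval from the built-in two-sample t procedure
ci_builtin <- t.test(x1, x2, var.equal = TRUE, conf.level = 1 - alpha)$conf.int

# Unequal variances: t.test() defaults to the Welch approximation,
# whose degrees of freedom are generally not an integer
welch <- t.test(x1, x2, var.equal = FALSE, conf.level = 1 - alpha)
welch_df <- unname(welch$parameter)

print(ci_pooled)
print(welch_df)
```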

Hypothesis Testing

• In the estimation problem there is no preconceived notion concerning the actual value of the parameter $\theta$.

• In contrast, when testing a hypothesis on $\theta$, there is a preconceived notion concerning its value.

• There are two theories:

– The hypothesis proposed by the experimenter, denoted H1

– The negation of H1, denoted H0

Tests concerning the mean of one normal population

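$$\begin{array}{lll}
H_0: \mu \le \mu_0 & \qquad H_0: \mu \ge \mu_0 & \qquad H_0: \mu = \mu_0 \\
H_1: \mu > \mu_0 & \qquad H_1: \mu < \mu_0 & \qquad H_1: \mu \ne \mu_0
\end{array}$$

Test Statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

(or)

$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$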
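A minimal R sketch of the $T$ statistic, assuming $\sigma$ is unknown and a right-tailed alternative. The sample, the null value `mu0`, and the variable names are hypothetical and serve only to show the calculation next to the built-in `t.test()`.

```r
# Hypothetical sample and null value (illustrative, not from the lecture)
x <- c(5.2, 4.8, 6.1, 5.9, 5.4, 6.3, 5.7)
mu0 <- 5
n <- length(x)

# T = (xbar - mu0) / (s / sqrt(n)), referred to a t distribution on n - 1 df
T_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Right-tailed p-value for H1: mu > mu0
p_right <- pt(T_stat, df = n - 1, lower.tail = FALSE)

# Built-in equivalent
builtin <- t.test(x, mu = mu0, alternative = "greater")

print(c(T = T_stat, p = p_right))
```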

Tests concerning the difference between two normal population means

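$$\begin{array}{lll}
H_0: \mu_1 \le \mu_2 & \qquad H_0: \mu_1 \ge \mu_2 & \qquad H_0: \mu_1 = \mu_2 \\
H_1: \mu_1 > \mu_2 & \qquad H_1: \mu_1 < \mu_2 & \qquad H_1: \mu_1 \ne \mu_2
\end{array}$$

Independent samples of sizes n1 and n2.

Test Statistic

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

(or)

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

(or)

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$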

Tests concerning variance of one normal population

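$$\begin{array}{lll}
H_0: \sigma^2 \le \sigma_0^2 & \qquad H_0: \sigma^2 \ge \sigma_0^2 & \qquad H_0: \sigma^2 = \sigma_0^2 \\
H_1: \sigma^2 > \sigma_0^2 & \qquad H_1: \sigma^2 < \sigma_0^2 & \qquad H_1: \sigma^2 \ne \sigma_0^2
\end{array}$$

Test Statistic

$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$$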
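The chi-square statistic above is straightforward to compute directly. The sketch below assumes a hypothetical sample and a hypothetical null value `sigma0_sq`, and evaluates a right-tailed p-value on $n - 1$ degrees of freedom.

```r
# Hypothetical sample and null variance (illustrative, not from the lecture)
x <- c(9.8, 10.4, 11.1, 9.5, 10.9, 10.2, 11.6, 9.9)
sigma0_sq <- 0.25
n <- length(x)

# Chi-square statistic: (n - 1) S^2 / sigma0^2
chi_sq <- (n - 1) * var(x) / sigma0_sq

# Right-tailed p-value for H1: sigma^2 > sigma0^2
p_right <- pchisq(chi_sq, df = n - 1, lower.tail = FALSE)

print(c(chi_sq = chi_sq, p = p_right))
```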

Tests concerning ratio of variances of two normal populations

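$$\begin{array}{lll}
H_0: \sigma_x^2 \le \sigma_y^2 & \qquad H_0: \sigma_x^2 \ge \sigma_y^2 & \qquad H_0: \sigma_x^2 = \sigma_y^2 \\
H_1: \sigma_x^2 > \sigma_y^2 & \qquad H_1: \sigma_x^2 < \sigma_y^2 & \qquad H_1: \sigma_x^2 \ne \sigma_y^2
\end{array}$$

Test Statistic

$$F = \frac{\dfrac{(n - 1)S_x^2}{\sigma_x^2}\Big/(n - 1)}{\dfrac{(m - 1)S_y^2}{\sigma_y^2}\Big/(m - 1)} = \frac{S_x^2/\sigma_x^2}{S_y^2/\sigma_y^2} = \frac{S_x^2}{S_y^2}$$

Here $n$ and $m$ are the sizes of the samples from populations $x$ and $y$; the last step uses $\sigma_x^2 = \sigma_y^2$ under $H_0$.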
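As a small illustration of the variance-ratio statistic, the R sketch below computes $F = S_x^2 / S_y^2$ by hand and compares it with the built-in `var.test()`. Both samples are hypothetical values, and the right-tailed alternative is chosen only for the example.

```r
# Hypothetical samples (illustrative values, not from the lecture)
x <- c(21.3, 19.8, 24.1, 18.7, 22.9, 25.4, 20.2)
y <- c(20.1, 20.9, 21.4, 19.7, 20.5, 21.0)
n <- length(x)
m <- length(y)

# F statistic: ratio of the sample variances
F_stat <- var(x) / var(y)

# Right-tailed p-value on (n - 1, m - 1) degrees of freedom
p_right <- pf(F_stat, df1 = n - 1, df2 = m - 1, lower.tail = FALSE)

# Built-in equivalent
builtin <- var.test(x, y, alternative = "greater")

print(c(F = F_stat, p = p_right))
```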

[Figure: decision chart for choosing an analysis tool. The chart asks, in order: How many dependent variables? What type of outcome? How many predictors? What type of predictors? If categorical predictor, how many categories? If categorical predictor, same participants or different in each category? Does the data meet parametric assumptions? The answers lead to one of the following analysis tools: Independent T-test, Mann-Whitney Test, Paired T-test, Wilcoxon Rank Sum, One-Way ANOVA, Kruskal-Wallis Test, Repeated Measures ANOVA, Friedman's ANOVA, Pearson Correlation or Regression, Spearman Correlation or Kendall's Tau, Independent Factorial ANOVA or Regression, Factorial Repeated Measures ANOVA, Factorial Mixed ANOVA, Multiple Regression, Multiple Regression/ANCOVA, Pearson Chi-Square or Likelihood Ratio, Logistic Regression, Logistic Regression/Discriminant, Loglinear Analysis, MANOVA, Factorial MANOVA, MANCOVA.]

Comparing several means

• It is often necessary to compare many populations for a quantitative variable.

• That is, we may want to compare the mean outcome over several populations to determine whether they have the same mean outcome and if not, where differences exist.

• The standard method of analysis for these types of problems is the one-way Analysis of Variance, often abbreviated ANOVA.

Can we just use several pairwise t-tests?

• You might be tempted to use t-tests to make such comparisons. Why would this be difficult?

# groups    # pairwise tests
3           3
4           6
5           10
6           15
7           21

and so on….
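The counts in the table are the number of ways to choose 2 groups out of k. The R sketch below reproduces them and, as an added illustration, shows how the chance of at least one false rejection grows with the number of tests. The error-rate formula assumes the tests are independent and each run at `alpha = 0.05`; that is a simplifying assumption, since pairwise t-tests on shared groups are not independent.

```r
# Number of groups and the corresponding number of pairwise comparisons
k <- 3:7
n_tests <- choose(k, 2)

# Approximate familywise error rate, ASSUMING independent tests at level alpha
alpha <- 0.05
fwer <- 1 - (1 - alpha)^n_tests

print(data.frame(groups = k,
                 pairwise_tests = n_tests,
                 familywise_error = round(fwer, 3)))
```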

One-way ANOVA Contd.

• The method of ANOVA allows for comparison of the means of more than two independent groups.

• In particular, it tests the following hypotheses for comparing means over k groups:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

$$H_1: \text{At least two means are different}$$

Assumptions of ANOVA

• Populations have normal distributions

• Population standard deviations are equal

• Observations are independent, both within and between samples
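One possible way to examine the first two assumptions in R is sketched below, using the elk ages from the ANOVA example in these slides entered by hand. The choice of the Shapiro-Wilk and Bartlett tests is an assumption of this sketch, not something prescribed in the lecture, and independence has to be judged from the study design, not from the data.

```r
# Elk ages by region, entered by hand from the ANOVA example in these slides
elk <- data.frame(
  Region = factor(rep(c("A", "B", "C"), times = c(6, 5, 6))),
  Age = c(4, 10, 11, 9, 8, 6, 7, 3, 8, 4, 8, 5, 6, 4, 2, 4, 3)
)

fit <- aov(Age ~ Region, data = elk)

# Normality: Shapiro-Wilk test on the residuals of the one-way model
normality <- shapiro.test(residuals(fit))

# Equal standard deviations: Bartlett's test across regions
equal_var <- bartlett.test(Age ~ Region, data = elk)

# Sample standard deviation within each region, for an informal comparison
group_sd <- tapply(elk$Age, elk$Region, sd)

print(normality)
print(equal_var)
print(group_sd)
```

With samples this small, these tests have little power, so plots of the residuals are a useful complement.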

One-way ANOVA Contd.

• A rejection of the null hypothesis tells us that there is at least one group with a differing mean (though there could be more than one group that is different).

• If we do not reject the null hypothesis, then we can only conclude that there is no significant difference among the groups.

One-way ANOVA procedure

• Total variation in a measured response is partitioned into components that can be attributed to recognizable sources of variation.

• For example, suppose we wish to investigate the sulfur content of 5 coal seams in a certain geographical region. Then we would test,

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$$

$$H_1: \mu_i \ne \mu_j \text{ for some } i \text{ and } j$$

ANOVA Table

Sources of variation          df    Sum of Squares (SS)    Mean Sum of Squares (MS)    F
Between groups (Treatment)
Within groups (Error)                                                                  *
Total                                                      *                           *

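Computational Shortcuts

$$SS_{Total} = SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2 - \frac{T^2}{N}$$

$$SS_{Treatment} = SSB = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{T^2}{N}$$

$$SS_{Error} = SSW = SST - SSB$$

Here $T_i$ is the total of the $n_i$ observations in group $i$, $T$ is the grand total of all observations, and $N = n_1 + \cdots + n_k$.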

ANOVA Table

Sources of variation    df       Sum of Squares (SS)    Mean Sum of Squares (MS)    F
Between groups          k - 1    SSB                    MSB = SSB / (k - 1)         F = MSB / MSW
Within groups           N - k    SSW                    MSW = SSW / (N - k)         *
Total                   N - 1    TSS                    *                           *

* See page 682 for a general format of a One-Way ANOVA Table

ANOVA Example

A biologist is doing research on elk in their natural Colorado habitat. Three regions are under study, each region having about the same amount of forage and natural cover.

To determine if there is a difference in elk life spans between the three regions, samples of 6, 5, and 6 mature elk from the three regions are tranquilized and have a tooth removed.

A laboratory examination of the teeth reveals the ages of the elk. Results for each sample are given in the table below.

ANOVA Example Contd.

Region    Age
A         4
A         10
A         11
A         9
A         8
A         6
B         7
B         3
B         8
B         4
B         8
C         5
C         6
C         4
C         2
C         4
C         3

Are there differences in age (elk life spans) over the different regions?

If so, where are such differences occurring?

ANOVA Example Contd.

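$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2 - \frac{T^2}{N} = \left(4^2 + 10^2 + 11^2 + \cdots + 3^2\right) - \frac{102^2}{17} = 114$$

$$SSB = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{T^2}{N} = \left(\frac{48^2}{6} + \frac{30^2}{5} + \frac{24^2}{6}\right) - \frac{102^2}{17} = 48$$

$$SSW = SST - SSB = 114 - 48 = 66$$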
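The arithmetic above can be reproduced with the computational shortcuts in a few lines of R. This is a sketch with the elk ages entered by hand; the variable names (`T_total`, `T_i`, `n_i`, and so on) are chosen here for illustration. It also carries the calculation through to the mean squares and the F statistic of the ANOVA table.

```r
# Elk ages and regions, entered by hand from the example table
age <- c(4, 10, 11, 9, 8, 6, 7, 3, 8, 4, 8, 5, 6, 4, 2, 4, 3)
region <- factor(rep(c("A", "B", "C"), times = c(6, 5, 6)))

N <- length(age)                    # total number of observations
k <- nlevels(region)                # number of groups
T_total <- sum(age)                 # grand total T
T_i <- tapply(age, region, sum)     # group totals T_i
n_i <- tapply(age, region, length)  # group sizes n_i

# Sums of squares via the computational shortcuts
SST <- sum(age^2) - T_total^2 / N
SSB <- sum(T_i^2 / n_i) - T_total^2 / N
SSW <- SST - SSB

# Mean squares, F statistic, and p-value on (k - 1, N - k) df
MSB <- SSB / (k - 1)
MSW <- SSW / (N - k)
F_stat <- MSB / MSW
p_value <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)

print(c(SST = SST, SSB = SSB, SSW = SSW))
print(c(MSB = MSB, MSW = MSW, F = F_stat, p = p_value))
```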

R Programming Language

• Free software for statistical computing and graphics: http://www.r-project.org/

• Based on the S language, which was developed at Bell Laboratories

• Considered a baby version of S/S+

• S+ sells for about $2000/year subscription

R code to run an ANOVA (elk data)

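> # Read the elk data (columns Region and Age) from a comma-separated file
> elk <- read.csv("elk.csv", header = TRUE)
> # Side-by-side boxplots of age for each region
> boxplot(elk$Age ~ elk$Region, ylab = "Age", xlab = "Region", main = "Boxplot for Elk data")
> # Fit the one-way ANOVA and print the ANOVA table
> Elk.ANOVA <- aov(elk$Age ~ elk$Region)
> summary(Elk.ANOVA)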

R output for ANOVA (elk data)

Source       Df  Sum Sq  Mean Sq  F value  Pr(>F)
elk$Region    2  48.000   24.000   5.0909  0.0218 *
Residuals    14  66.000    4.714
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
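For readers without the `elk.csv` file, the same table can be reproduced by entering the data directly. The sketch below assumes the file holds one `Region` and one `Age` column as in the example table, and uses the `data =` argument of `aov()`, so the first row of the output is labelled `Region` instead of `elk$Region`.

```r
# Elk data entered by hand (assumed to match the contents of elk.csv)
elk <- data.frame(
  Region = factor(rep(c("A", "B", "C"), times = c(6, 5, 6))),
  Age = c(4, 10, 11, 9, 8, 6, 7, 3, 8, 4, 8, 5, 6, 4, 2, 4, 3)
)

# Fit the one-way ANOVA of Age on Region
fit <- aov(Age ~ Region, data = elk)

# summary() returns a list whose first element is the ANOVA table
tab <- summary(fit)[[1]]
print(tab)

# Individual entries can be pulled out by column name
f_value <- tab$`F value`[1]
p_value <- tab$`Pr(>F)`[1]
```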

Next class: Post-hoc tests

• Bonferroni correction

• Tukey’s HSD test

• Fisher’s LSD

• Newman-Keuls test

• Scheffé method
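As a brief preview of two of these procedures, the R sketch below applies Tukey's HSD and Bonferroni-adjusted pairwise t-tests to the elk data, entered by hand. It only shows the function calls available in base R; how to read and compare the results is left to the post-hoc discussion.

```r
# Elk data entered by hand from the ANOVA example
elk <- data.frame(
  Region = factor(rep(c("A", "B", "C"), times = c(6, 5, 6))),
  Age = c(4, 10, 11, 9, 8, 6, 7, 3, 8, 4, 8, 5, 6, 4, 2, 4, 3)
)
fit <- aov(Age ~ Region, data = elk)

# Tukey's HSD: simultaneous intervals for all pairwise differences in means
tukey <- TukeyHSD(fit)

# Pairwise t-tests (pooled SD) with Bonferroni-adjusted p-values
bonf <- pairwise.t.test(elk$Age, elk$Region, p.adjust.method = "bonferroni")

print(tukey)
print(bonf)
```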

Fixed versus random effects

• When we consider the effect of a factor, it can be either fixed or random. If we are interested in the particular levels of a factor, then it is fixed, e.g., gender, socio-economic class, fertilizer, drug. If we are not interested in the particular levels, but rather have selected the levels to make inference about the factor, then the factor is random.

• For example, what if there was an effect of hospital on a person’s recovery? A random sample of hospitals would allow us to study this relationship. Here we are interested in whether there is a relationship rather than describing an effect for each individual hospital. These types of models are called random effects models.