STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two...

40
STAT 135 Lab 7 Distributions derived from the normal distribution, and comparing independent samples. Rebecca Barter March 16, 2015

Transcript of STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two...

Page 1: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

STAT 135 Lab 7Distributions derived from the normal

distribution, and comparing independent samples.

Rebecca Barter

March 16, 2015

Page 2: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The χ2 distribution

Page 3: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The χ2 distribution

We have seen several instances of test statistics which follow a χ2

distribution, for example, the generalized likelihood ratio, Λ

−2 log(Λ) ∼ χ2df

and the Pearson χ2 goodness-of-fit test statistic

X 2 =n∑

i=1

(Oi − Ei )2

Ei∼ χ2

n−1−dim(θ)

It turns out that the χ2 distribution is very related to the N(0, 1)distribution.

Page 4: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The χ2 distribution

The χ2 distribution can be generated from the sum of squarednormals:

If Z ∼ N(0, 1), then Z 2 ∼ χ21

and more generally, if Zi ∼ N(0, 1), then

Z 21 + ...+ Z 2

n ∼ χ2n

from which it is easy to conclude that if U ∼ χ2n and V ∼ χ2

m, then

U + V ∼ χ2m+n

that is, to get the distribution of the sum of χ2 random variables,just add the degrees of freedom.

Page 5: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The χ2 distribution

The χ2n distribution is also related to the gamma distribution:

U ∼ χ2n ⇐⇒ U ∼ Gamma

(n

2,

1

2

)so we can write the density of the χ2

n distribution as

f (x) =1

2n/2Γ(n2 )x

n2−1e−

x2 , x ≥ 0

Page 6: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The t distribution

Page 7: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The t distribution

If Z ∼ N(0, 1) is independent of U ∼ χ2n, then

Z√Un

∼ tn

The tn distribution has density

fn(x) =Γ((n + 1)/2)√

nπΓ(n/2)

(1 +

t2

n

)−(n+1)/2

Page 8: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The t distribution

Note that if Xi ∼ N(µ, σ2), then

Xn − µσ/√n∼ N(0, 1)

However, if we don’t know σ, we can estimate it by

s2 =1

n − 1

n∑i=1

(Xi − X )2

and our distribution becomes

Xn − µs/√n∼ tn−1

Page 9: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The F -distribution

Page 10: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

The F-distribution

If U ∼ χ2m and V ∼ χ2

n are independent, then

F =U/m

V /n∼ Fm,n

The Fm,n distribution has density

f (w) =Γ((m + n)/2)

Γ(m/2)Γ(n/2)

(mn

)m/2wm/2−1

(1 +

m

nw)−(m+n)/2

Page 11: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Moment generating functions

Page 12: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Moment generating functions

The moment generating function (MGF) for a random variable Xis given by

M(t) = E(etX)

and the MGF of a random variable uniquely determines thecorresponding distribution.

Page 13: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Moment generating functions

The MGF of X is defined by

M(t) = E(etX)

A useful formula:

I The kth derivative of the MGF of X, evaluated at zero, isequal to the kth moment of X :

E(X k)

=dkM

dtk

∣∣∣t=0

Page 14: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Moment generating functions

Some nice properties of the moment generating function (showthem yourself!):

I If X has MGF MX (t), and Y = a + bX , then Y has MGF

MY (t) = eatMX (bt)

I If X and Y are independent with MGF’s MX and MY andZ = X + Y , then

MZ (t) = MX (t)MY (t)

Page 15: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Exercise

Page 16: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Exercise: moment generating functions

For each of the following distributions, calculate the momentgenerating function, and the first two moments.

1. X ∼ Poisson(λ)

2. Z = X + Y , where X ∼ Poisson(λ) and Y ∼ Poisson(µ) areindependent

3. Z ∼ N(0, 1)

Page 17: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing Independent Samples from two normalpopulations with common but unknown

variance

I t-test

Page 18: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples

So far, we have been working towards making inferences aboutparameters from a single population.

For example, we have looked at conducting hypothesis tests for the(unknown) population mean µ, such as

H0 : µ = 0

H1 : µ > 0

Given a sample X1, ...,Xn from our population of interest, we wouldlook at the sample mean, Xn, to see if we have enough evidenceagainst H0 in favor of H1 (for example, by calculating a p-value)

Page 19: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples

What if we were instead interested in comparing parameters fromtwo different populations?

Suppose that:

I the first population has (unknown) mean µ1, and

I the second population has (unknown) mean µ2.

Then we might test the hypothesis:

H0 : µ1 = µ2

against

H1 : µ1 > µ2 or H1 : µ1 < µ2 or H1 : µ1 6= µ2

Page 20: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two normalpopulations with common but unknown variance

Suppose that we have observed samples:

I X1 = x1, ...,Xn = xn from pop 1 with unknown variance σ2XI Y1 = y1, ....,Ym = ym from pop 2 with unknown variance σ2YI Assume common variance: σ2X = σ2Y

We might want to test

H0 : µ1 = µ2

againstH1 : µ1 > µ2

What test statistic might we look at to see if we have evidenceagainst H0 in favor of H1?

Page 21: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two normalpopulations with common but unknown variance

I If H0 was true, we might expect Xn − Ym ≈ 0

I if H1 was true, we might expect Xn − Ym > 0.

Note that our p-value (the probability, assuming H0 true, of seeingsomething at least as extreme as what we have observed) would begiven by

PH0(X − Y > x − y)

To calculate this probability, we need to identify the distribution ofX − Y .

Page 22: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two normalpopulations with common but unknown variance

It turns out that

t =(Xn − Ym)

s√

1n + 1

m

∼ tn+m−2

where the denominator is an estimate of the standard deviation ofX − Y (since the true variance is unknown), and

s2 =(n − 1)s2X + (m − 1)s2Y

n + m − 2

is the pooled variance of X1, ...,Xn and Y1, ...,Ym

Page 23: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two normalpopulations with common but unknown variance

To testH0 : µ1 = µ2 against H1 : µ1 > µ2

Our p-value can be calculated by

PH0(X − Y > x − y) = P

(X − Y )

s√

1n + 1

m

>(x − y)

s√

1n + 1

m

= P

tn+m−2 >(x − y)

s√

1n + 1

m

This test is called a two-sample t-test

Page 24: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Exercise

Page 25: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Exercise: comparing independent samples from two normalpopulations with common but unknown variance

Suppose I am interested in comparing the average delivery time fortwo pizza companies. To test this hypothesis, I ordered 7 pizzasfrom pizza company A and recorded the delivery times:

(20.4 , 24.2 , 15.4 , 21.4 , 20.2 , 18.5 , 21.5)

and 5 pizzas from pizza company B:

(20.2 , 16.9 , 18.5 , 17.3 , 20.5)

I know that the delivery times follow normal distributions, but Idon’t know the mean or variance.Do I have enough evidence to conclude that the average deliverytimes for each company are different?

Page 26: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

A nonparametric test for comparing IndependentSamples from two arbitrary populations

I Mann-Whitney test

Page 27: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing Independent Samples from two arbitrarypopulations

Suppose that we are interested in comparing two populations, butwe don’t have any information about the distribution of our twopopulations.

We observe data

I X1 = x1, ...,Xn = xn IID with unknown cts distribution F

I Y1 = y1, ...,Ym = ym IID with unknown cts distribution G

Suppose that we want to test the null hypothesis

H0 : F = G

We could test H0 using a nonparametric test (a test that makes nodistributional assumptions on the data) based on ranks.

Page 28: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two arbitrarypopulations

Let Ri be the rank of Xi (when the ranks are taken over all of theXi ’s and Yi ’s combined).

What is the expected value of∑n

i=1 Ri under H0?

First note that since under H0 all ranks are equally likely,

P(R1 = k) =1

n + m

Thus

E (R1) = 1P(R1 = 1) + 2P(R1 = 2) + ...+ (n + m)P(R1 = n + m)

=1

n + m(1 + 2 + ...+ (n + m))

=1

n + m

((n + m)(n + m + 1)

2

)=

n + m + 1

2

Page 29: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two arbitrarypopulations

We just showed that

E (R1) =n + m + 1

2

So that if H0 : F = G is true, we have

E

(n∑

i=1

Ri

)=

n(n + m + 1)

2

Thus a value of∑n

i=1 Ri that is significantly larger or smaller thann(n+m+1)

2 would provide evidence against H0 : F = G .

Page 30: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Example: Comparing independent samples from twoarbitrary populations

Suppose, for example, that we had

X = (6.2, 3.7, 4.7, 1.3)

Y = (1.2, 0.8, 1.4, 2.5, 1.1)

Then our ranks are

rank(X ) = (9, 7, 8, 4)

rank(Y ) = (3, 1, 5, 6, 2)

and we have that the sum of the ranks of the Xi ’s and Yi ’s are

sum of rank(X ) :=n∑

i=1

Ri = 28

sum of rank(Y ) =m∑i=1

R ′i = 17

Page 31: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two arbitrarypopulations

In our example, we saw that the sums of the ranks of the Xi ’s waslarger than the sums of the ranks of the Yi ’s.

But how can we tell if this difference is enough to conclude thatH0 : F = G is false (that the two samples came from differentdistributions)?

Page 32: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two arbitrarypopulations

Suppose that

I the rank of Xi is Ri

I the rank of Yi is R ′iNote that under H0, if n = m, we would expect that

n∑i=1

Ri =m∑i=1

R ′i

Recall that

E

(n∑

i=1

Ri

)=

n(n + m + 1)

2

similarly we should have that the expected sum of the R ′i ’s is

E

(m∑i=1

R ′i

)=

m(n + m + 1)

2

Page 33: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two arbitrarypopulations

Thus if H0 were true, we would expect that

n∑i=1

Ri +m∑i=1

R ′i ≈ n1(n + m + 1)

where n1 is the sample size of the smaller population:n1 = min(n,m).Based on this idea, we can use the Mann-Whitney test to testH0.

Page 34: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Comparing independent samples from two arbitrarypopulations

The Mann-Whitney test can be conducted as follows

1. Concatenate the Xi and Yj into a single vector Z

2. Let n1 be the sample size of the smaller sample

3. Compute R = sum of the ranks of the smaller sample in Z

4. Compute R ′ = n1(m + n + 1)− R

5. Compute R∗ = min(R,R ′)

6. Compare the value of R ′ to critical values in a table: if thevalue is less than or equal to the tabulated value, reject thenull that F = G

Page 35: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate
Page 36: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate
Page 37: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Example: Comparing independent samples from twoarbitrary populations

Back to our example, where

X = (6.2, 3.7, 4.7, 1.3) rank(X ) = (9, 7, 8, 4)

Y = (1.2, 0.8, 1.4, 2.5, 1.1) rank(Y ) = (3, 1, 5, 6, 2)

Here n1 = 4.The smaller sample is X and so the sum of the ranks of X are:

R = sum of rank(X ) :=n∑

i=1

Ri = 28

Next, we compute

R ′ = n1(m + n + 1)− R = 4× (5 + 4 + 1)− 28 = 12

Page 38: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Example: Comparing independent samples from twoarbitrary populations

R = sum of rank(X ) :=n∑

i=1

Ri = 28

R ′ = n1(m + n + 1)− R = 4× (5 + 4 + 1)− 28 = 12

so thatR∗ = min(R,R ′) = 12

The critical value in the table for a two-sided test, with α = 0.05(n1 = 4, n2 = 5) is 11. Thus, our value is not less than or equal tothe critical value, so we fail to reject H0.

Page 39: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Exercise

Page 40: STAT 135 Lab 7 Distributions derived from the normal ... · Comparing independent samples from two arbitrary populations The Mann-Whitney test can be conducted as follows 1.Concatenate

Exercise: Rice Chapter 11, Exercise 21

A study was done to compare the performances of engine bearingsmade of different compounds. Ten bearings of each type weretested. The times until failure (in units of millions of cycles) foreach type are given below.

Type I 3.03 5.53 5.60 9.30 9.9212.51 12.95 15.21 16.04 16.84

Type II 3.19 4.26 4.47 4.53 4.674.69 12.78 6.79 9.37 12.75

1. Use normal theory to test the hypothesis that there is nodifference between the two types of bearings.

2. Test the same hypothesis using a nonparametric method.

3. Which of the methods – that of part (a) or that of part (b) –do you think is better in this case?