Assignment #7
Chapter 12: 18, 24 Chapter 13: 28 Due this Friday Nov. 20th by 2pm in your TA’s homework box
Assignment #8
Chapter 14: 26 Chapter 15: 18, 27 Due next Friday Nov. 27th by 2pm in your TA’s homework box
Reading
For Today: Chapter 16 For Thursday: Chapter 17
Lab Report
• Posted on website
• Dates
– Rough draft due to TA’s homework box Monday Nov. 16th
– Rough draft returned in your registered lab section next week
– Final draft due at the start of your registered lab section the week of Nov. 30th
• 10% of course grade
– Rough draft - 5%
– Final draft - 5%
– If you’re happy with your rough draft mark, you can tell your TA to use it for the final draft
• Read the “Writing a Lab Report” section of your lab notebook for guidance!!
Chapter 15 Review
Null hypothesis for simple ANOVA
• H0 : Variance among groups = 0
OR
• H0 : µ1 = µ2 = µ3 = µ4 = ... = µk
Key to ANOVA
If there really are no differences among populations, then differences in sample means should be due to sampling error alone. We can estimate how much variation in group means ought to be present if it is due to sampling error alone.
\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} + \mathrm{Var}[\mu_i]

• \sigma_{\bar{X}}^2 : the variance among groups (the variance of the group means)
• \sigma^2 : the variance within groups
• \mathrm{Var}[\mu_i] : the variance due to differences among the true population means
  – 0 if H0 is true
  – >0 if HA is true
If H0 is true:

\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}

If HA is true:

\sigma_{\bar{X}}^2 > \frac{\sigma^2}{n}

Is \sigma_{\bar{X}}^2 > \frac{\sigma^2}{n}? Equivalently, is n\sigma_{\bar{X}}^2 > \sigma^2?
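One way to check this sampling-error baseline is a small simulation. The sketch below is a minimal Python illustration; the population standard deviation, group size, number of groups, and seed are made-up values, not from the lecture.

import numpy as np

rng = np.random.default_rng(1)
sigma, n, k, reps = 2.0, 10, 5, 2000  # made-up: population SD, per-group n, number of groups, replicates

# Under H0 every group is drawn from the same population, so across many
# replicates the variance of the k group means should be close to sigma^2 / n.
group_means = rng.normal(0.0, sigma, size=(reps, k, n)).mean(axis=2)
print(group_means.var(axis=1, ddof=1).mean())  # observed variance among group means
print(sigma ** 2 / n)                          # expected value under H0: 0.4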
n\sigma_{\bar{X}}^2 is estimated by the “Mean Squares Group” (MS_{group})

\sigma^2 is the variance within groups, estimated by the “Mean Squares Error” (MS_{error})

Population parameters: n\sigma_{\bar{X}}^2 and \sigma^2
Estimates from the sample: MS_{group} and MS_{error}
Mean squares group
SS_{group} = \sum n_i (\bar{X}_i - \bar{X})^2

df_{group} = k - 1

MS_{group} = \frac{SS_{group}}{df_{group}}
Mean squares error

Error sum of squares:

SS_{error} = \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2 = \sum df_i s_i^2 = \sum s_i^2 (n_i - 1)

Error degrees of freedom:

df_{error} = \sum df_i = \sum (n_i - 1) = N - k
MSerror is like the pooled variance in a 2-sample t-test:
MS_{error} = \frac{SS_{error}}{df_{error}} = \frac{\sum s_i^2 (n_i - 1)}{N - k}

s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}
Test statistic: F
If H0 is true:

F = \frac{n\sigma_{\bar{X}}^2}{\sigma^2} = 1

If HA is true:

F = \frac{n\left(\sigma^2/n + \mathrm{Var}[\mu_i]\right)}{\sigma^2} > 1

Estimate of F:

F = \frac{MS_{group}}{MS_{error}}
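As a rough sketch of the bookkeeping (assuming NumPy; the helper name anova_from_summary is made up for illustration), F can be computed directly from per-group means, standard deviations, and sample sizes:

import numpy as np

def anova_from_summary(means, sds, ns):
    """Single-factor ANOVA pieces from group summary statistics."""
    means, sds, ns = map(np.asarray, (means, sds, ns))
    grand_mean = np.sum(ns * means) / np.sum(ns)
    ss_group = np.sum(ns * (means - grand_mean) ** 2)   # SS_group
    df_group = len(means) - 1                           # k - 1
    ss_error = np.sum((ns - 1) * sds ** 2)              # SS_error
    df_error = np.sum(ns) - len(means)                  # N - k
    ms_group = ss_group / df_group
    ms_error = ss_error / df_error
    return ms_group, ms_error, ms_group / ms_error      # MS_group, MS_error, F

Plugging the caffeine summary table below into this sketch should reproduce the hand calculation that follows.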
Assumptions of ANOVA
(1) Random samples
(2) Normal distributions for each population
(3) Equal variances for all populations.
Example: Does the mean amount of nectar taken by bees depend on the concentration of caffeine in the nectar?
Hypotheses
H0: Mean amount of nectar does not differ among caffeine concentrations HA: Mean amount of nectar differs among caffeine concentrations
Example:
caffeine concentration   mean     standard deviation   n
50 ppm                    0.008   0.289                5
100 ppm                  -0.172   0.169                5
150 ppm                   0.376   0.309                5
200 ppm                   0.378   0.393                5
Does the mean amount of nectar taken by bees depend on the concentration of caffeine in the nectar?
Mean squares Error
SS_{error} = \sum df_i s_i^2 = 4(0.289)^2 + 4(0.169)^2 + 4(0.309)^2 + 4(0.393)^2 = 1.4482

df_{error} = 4 + 4 + 4 + 4 = 16

MS_{error} = \frac{1.4482}{16} = 0.0905
Mean Squares Group:
\bar{X} = \frac{5(0.008) + 5(-0.172) + 5(0.376) + 5(0.378)}{5 + 5 + 5 + 5} = 0.1475

SS_{group} = \sum n_i (\bar{X}_i - \bar{X})^2
           = 5(0.008 - 0.1475)^2 + 5(-0.172 - 0.1475)^2 + 5(0.376 - 0.1475)^2 + 5(0.378 - 0.1475)^2
           = 1.1344
Mean Squares Group:
df_{group} = k - 1 = 4 - 1 = 3

MS_{group} = \frac{SS_{group}}{df_{group}} = \frac{1.1344}{3} = 0.3781
The test statistic for ANOVA is F
F = \frac{MS_{group}}{MS_{error}} = \frac{0.3781}{0.0905} = 4.18

MS_{group} is always in the numerator; MS_{error} is always in the denominator.
Compare to Fα(1),df_group,df_error
F0.05(1),3,16 = 3.24 F0.025(1),3,16 = 4.08 F0.01(1),3,16 = 5.29
Since 4.08 < 4.18 < 5.29, we have 0.025 > P > 0.01, and we can reject the null hypothesis. The mean amount of nectar taken differs for at least one of the caffeine concentrations.
ANOVA table
Source   SS       df   MS       F      P
Group    1.1344    3   0.3781   4.18   <0.025
Error    1.4482   16   0.0905
Total    2.5826   19
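To turn F = 4.18 with df = 3 and 16 into a P-value rather than bracketing it from the table, one option (a hedged check, assuming SciPy is available) is the upper tail of the F distribution:

from scipy import stats

# Upper-tail probability of F = 4.18 with 3 and 16 degrees of freedom;
# it should fall between 0.01 and 0.025, consistent with the table lookup above.
p = stats.f.sf(4.18, 3, 16)
print(p)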
In-class exercise
Does PLP1 gene expression differ among people with schizophrenia, bipolar disorder and control groups?
Group           mean     standard deviation   n
control         -0.004   0.218                15
schizophrenia   -0.195   0.182                15
bipolar         -0.263   0.151                15
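A small sketch for checking your hand calculation afterwards, using the same summary-statistic recipe as above (NumPy assumed; nothing here is from the slides):

import numpy as np

# Summary statistics from the table above
means = np.array([-0.004, -0.195, -0.263])
sds   = np.array([0.218, 0.182, 0.151])
ns    = np.array([15, 15, 15])

grand_mean = np.sum(ns * means) / np.sum(ns)
ms_group = np.sum(ns * (means - grand_mean) ** 2) / (len(means) - 1)
ms_error = np.sum((ns - 1) * sds ** 2) / (np.sum(ns) - len(means))
F = ms_group / ms_error   # compare with the critical value for df = 2 and 42
print(F)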
Variation Explained: R2
R^2 = \frac{SS_{group}}{SS_{total}}
The fraction of variation in Y that is “explained” by differences among groups
Kruskal-Wallis test
• A non-parametric test similar to a single factor ANOVA
• Uses the ranks of the data points
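A minimal sketch of how this is usually run in practice (assuming SciPy; the three groups are made-up numbers, not data from the lecture):

from scipy import stats

# Made-up example groups; kruskal pools and ranks all the observations,
# then asks whether at least one group tends to have higher ranks.
g1 = [2.1, 3.4, 1.9, 2.8]
g2 = [3.9, 4.2, 3.7, 4.5]
g3 = [2.5, 2.9, 3.1, 2.2]
H, p = stats.kruskal(g1, g2, g3)
print(H, p)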
Multiple comparisons
Probability of at least one Type I error in N tests = 1 - (1 - α)^N
For 20 tests, the probability of at least one Type I error is ~65%.
For 100 tests, ~99.4%!
"Bonferroni correction" for multiple comparisons
Uses a smaller α value:
\alpha' = \frac{\alpha}{\text{number of tests}}
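Both quantities are one-liners to compute; a small sketch using the 20-test case from above:

alpha, n_tests = 0.05, 20

# Probability of at least one Type I error across n_tests independent tests
print(1 - (1 - alpha) ** n_tests)   # about 0.64, i.e. roughly 65%

# Bonferroni-corrected significance level used for each individual test
print(alpha / n_tests)              # 0.0025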
Which groups are different?
After finding evidence for differences among means with ANOVA, sometimes we want to know: Which groups are different from which others?
One method for this: the Tukey-Kramer test
The Tukey-Kramer test
Done after finding variation among groups with single-factor ANOVA.
Compares all group means to all other group means.
With the Tukey-Kramer method, the probability of making at least one Type I error throughout the course of testing all pairs of means is no greater than the significance level α.
Groups which cannot be distinguished share the same letter.
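One common implementation is pairwise_tukeyhsd from statsmodels; the sketch below uses made-up measurements and group labels, so the resulting groupings are only illustrative.

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up measurements and their group labels
values = np.array([1.1, 0.9, 1.3, 2.4, 2.1, 2.6, 1.2, 1.0, 1.4])
groups = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])

# All pairwise comparisons, holding the family-wise Type I error rate at alpha
result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)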
Another imaginary example:
H0 : µ1 = µ2   Cannot reject
H0 : µ1 = µ3   Cannot reject
H0 : µ2 = µ3   Reject
Chapter 16: Correlation between numerical variables
Two variables: Which test?
Response variable   Explanatory: Categorical         Explanatory: Numerical
Categorical         Contingency analysis             Logistic regression, Survival analysis
Numerical           t-test, Analysis of variance     Regression, Correlation
Scatter plot
Tattersall et al. (2004) Journal of Experimental Biology 207:579-585
Correlation: r
• r is called the “correlation coefficient”
• Describes the relationship between two numerical variables
• Parameter: ρ (rho) Estimate: r
• -1 ≤ ρ ≤ 1 and -1 ≤ r ≤ 1
Estimating the correlation coefficient
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}

The numerator is the “sum of products”; each term under the square root is a “sum of squares”.
Standard error of r
SE_r = \sqrt{\frac{1 - r^2}{n - 2}}
Example
How strong is the association between the number of encounters with aggressive adults as a chick and future aggressive behavior?
Example
Number of visits   Future aggressive behavior
1                  -0.80
7                  -0.92
15                 -0.80
4                  -0.46
11                 -0.47
14                 -0.46
23                 -0.23
14                 -0.16
9                  -0.23
5                  -0.23
4                  -0.16
10                 -0.10
13                 -0.10
13                  0.04
14                  0.13
12                  0.19
13                  0.25
9                   0.23
8                   0.15
18                  0.23
22                  0.31
22                  0.18
23                  0.17
31                  0.39
X is events experienced while nestling, Y is future behavior
\sum X = 315       \sum Y = -2.85
\sum X^2 = 5329    \sum Y^2 = 3.5553
\sum XY = -4.32    n = 24
Shortcuts
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}

\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n}

\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n}
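These identities are easy to check numerically; a quick sketch using the first five (X, Y) pairs from the chick data above (NumPy assumed):

import numpy as np

# First five (number of visits, future aggression) pairs from the table above
x = np.array([1.0, 7.0, 15.0, 4.0, 11.0])
y = np.array([-0.80, -0.92, -0.80, -0.46, -0.47])
n = len(x)

sum_of_products = np.sum((x - x.mean()) * (y - y.mean()))
shortcut        = np.sum(x * y) - np.sum(x) * np.sum(y) / n
print(np.isclose(sum_of_products, shortcut))   # True: both forms give the same value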
Finding r
\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n} = -4.32 - \frac{(315)(-2.85)}{24} = 33.086

\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = 5329 - \frac{(315)^2}{24} = 1194.625

\sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} = 3.5553 - \frac{(-2.85)^2}{24} = 3.217

r = \frac{33.086}{\sqrt{(1194.625)(3.217)}} = 0.534

SE_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{1 - (0.534)^2}{24 - 2}} = 0.180
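The same arithmetic in a few lines of Python, plugging in the summary sums listed above (standard library only; just a check of the hand calculation):

from math import sqrt

# Summary sums for the chick aggression example
sum_x, sum_y   = 315.0, -2.85
sum_x2, sum_y2 = 5329.0, 3.5553
sum_xy, n      = -4.32, 24

sp  = sum_xy - sum_x * sum_y / n     # sum of products, ~33.09
ssx = sum_x2 - sum_x ** 2 / n        # sum of squares for X, ~1194.6
ssy = sum_y2 - sum_y ** 2 / n        # sum of squares for Y, ~3.217

r    = sp / sqrt(ssx * ssy)          # ~0.534
se_r = sqrt((1 - r ** 2) / (n - 2))  # ~0.180
print(r, se_r)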
If ρ = 0,...

r is normally distributed with mean 0, and the test statistic

t = \frac{r}{SE_r}

has a t-distribution with df = n - 2.
Example
• Are the effects of new mutations on mating success and productivity correlated?
• Data from various visible mutations in Drosophila melanogaster
Hypotheses
H0: Mating success and productivity are not related (ρ = 0).
HA: Mating success and productivity are correlated (ρ ≠ 0).
X is productivity, Y is the mating success
\sum X = -24.228       \sum Y = 9.498
\sum X^2 = 35.1808     \sum Y^2 = 4.5391
\sum XY = -4.62741     n = 31
Finding r
\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n} = -4.627 - \frac{(-24.228)(9.498)}{31} = 2.796

\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = 35.1808 - \frac{(-24.228)^2}{31} = 16.245

\sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} = 4.5391 - \frac{(9.498)^2}{31} = 1.6289

r = \frac{2.796}{\sqrt{(16.245)(1.6289)}} = 0.5435

SE_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{0.7045}{29}} = 0.1558

t = \frac{r}{SE_r} = \frac{0.5435}{0.1558} = 3.49

df = n - 2 = 31 - 2 = 29
t=3.49 is greater than t0.05(2), 29 = 2.045, so we can reject the null hypothesis and say that productivity and male mating success are correlated across genotypes.
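A compact check of the whole chain (r, SE, t, and a two-tailed P-value) from the summary sums, assuming SciPy for the t distribution:

from math import sqrt
from scipy import stats

# Summary sums for the mutation example
sum_x, sum_y   = -24.228, 9.498
sum_x2, sum_y2 = 35.1808, 4.5391
sum_xy, n      = -4.62741, 31

sp = sum_xy - sum_x * sum_y / n
r  = sp / sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))   # ~0.54
t  = r / sqrt((1 - r ** 2) / (n - 2))                                   # ~3.49
p  = 2 * stats.t.sf(abs(t), n - 2)                                      # two-tailed P
print(r, t, p)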
Correlation assumes...
• Random sample
• X is normally distributed with equal variance for all values of Y
• Y is normally distributed with equal variance for all values of X
Bivariate Normal Distribution
• The relationship between X and Y is linear
• The cloud of points in a scatter plot of X and Y has a circular or elliptical shape
• The frequency distributions of X and Y separately are normal
Most frequent departures from the bivariate normal distribution
Spearman's rank correlation
• An alternative to correlation that does not make so many assumptions
Example: Spearman's rs
Versions of the trick:
1. Boy climbs up rope, climbs down again
2. Boy climbs up rope, seems to vanish, re-appears at top, climbs down again
3. Boy climbs up rope, seems to vanish at top
4. Boy climbs up rope, vanishes at top, reappears somewhere the audience was not looking
5. Boy climbs up rope, vanishes at top, reappears in a place which has been in full view
Example: Spearman's rs
Hypotheses H0: The difficulty of the described trick is not correlated with the time elapsed since it was observed. HA: The difficulty of the described trick is correlated with the time elapsed since it was observed.
Years elapsed   Rank (years)   Impressiveness score   Rank (impressiveness)
2               1              1                      2
5               3.5            1                      2
5               3.5            1                      2
4               2              2                      5
17              5.5            2                      5
17              5.5            2                      5
31              13             3                      7
20              7              4                      12.5
22              8              4                      12.5
25              9              4                      12.5
28              10.5           4                      12.5
29              12             4                      12.5
34              14.5           4                      12.5
43              17             4                      12.5
44              18             4                      12.5
46              19             4                      12.5
34              14.5           4                      12.5
28              10.5           5                      19.5
39              16             5                      19.5
50              20.5           5                      19.5
50              20.5           5                      19.5
Finding rs
\sum (R_i - \bar{R})(S_i - \bar{S}) = \sum R_i S_i - \frac{(\sum R_i)(\sum S_i)}{n} = 566

\sum (R_i - \bar{R})^2 = \sum R_i^2 - \frac{(\sum R_i)^2}{n} = 767.5

\sum (S_i - \bar{S})^2 = \sum S_i^2 - \frac{(\sum S_i)^2}{n} = 678.5

r_S = \frac{566}{\sqrt{(767.5)(678.5)}} = 0.784
rS(0.05, 21) = 0.434 and rS(0.01, 21) = 0.550. Since rS = 0.784 is greater than 0.550, P < 0.01, so we reject the null hypothesis. There is a positive correlation between the impressiveness score and the number of years elapsed.
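For reference, scipy.stats.spearmanr assigns average ranks to ties in the same way as the hand calculation, so running it on the raw (years, score) pairs from the table should give roughly the same rS (a hedged check, assuming SciPy is available):

from scipy import stats

# (years elapsed, impressiveness score) for the 21 accounts in the table above
years = [2, 5, 5, 4, 17, 17, 31, 20, 22, 25, 28, 29, 34, 43, 44, 46, 34, 28, 39, 50, 50]
score = [1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5]

rs, p = stats.spearmanr(years, score)
print(rs, p)   # rs should come out near the hand-calculated 0.78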
Spearman’s rank correlation for n >100
SE[r_S] = \sqrt{\frac{1 - r_S^2}{n - 2}}

t = \frac{r_S}{SE[r_S]}

df = n - 2
Attenuation: the estimated correlation will be lower if X or Y is estimated with error.
[Figure: scatter plots comparing the real correlation, Y estimated with measurement error, and both X and Y estimated with measurement error]
Correlation depends on range