Elementary Statistics for the Biological and Life Sciences STAT 205 University of South Carolina...

Elementary Statistics for the Biological and Life Sciences, STAT 205, University of South Carolina, Columbia, SC. © 2010, University of South Carolina. All rights reserved, except where previous rights exist. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photoreproduction, recording, or scanning) without the prior written consent of the University of South Carolina.


STAT205 – Elementary Statistics for the Biological and Life Sciences 2

Chapter 9: Inferences for Paired Samples

Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.


Independence Violations

In some settings, the independence between samples that the 2-sample t-test assumes is violated, which invalidates the methods of Chapter 7.

Secs. 7.9–7.10 go into more detail on model violations.

One special case for which we can provide a solution is that of PAIRED DATA.


Paired Data

Suppose the effect of some treatment or stimulus is studied on the same subjects (say, “before”–“after”, “right”–“left”, etc.).

Independence is clearly violated!

But (!), since the data are so clearly “paired,” the differences

$$d = Y_1 - Y_2$$

can still inform us about the treatment effect.


Paired Data Model

Suppose $Y_{i1} \sim$ i.i.d. $N(\mu_1, \sigma_1^2)$ is paired with $Y_{i2} \sim$ i.i.d. $N(\mu_2, \sigma_2^2)$ at each $i = 1, \ldots, n$.

Then, for $d_i = Y_{i1} - Y_{i2}$, we know from Rule E1 in Ch. 3 that

$$\mu_d = E\{d_i\} = E\{Y_{i1} - Y_{i2}\} = E\{Y_{i1}\} - E\{Y_{i2}\} = \mu_1 - \mu_2.$$

In fact, under this model $d_i \sim$ i.i.d. $N(\mu_d, \sigma_d^2)$. ($\sigma_d^2$ is a more complicated function of the model parameters, because it also depends on the correlation between the two members of each pair.)
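A quick simulation can make the model concrete. The sketch below assumes a hypothetical mechanism that the slides do not specify: each pair shares a normal "subject effect" $s_i$, so $Y_{i1} = \mu_1 + s_i + e_{i1}$ and $Y_{i2} = \mu_2 + s_i + e_{i2}$. All parameter values are made up for illustration.

```python
import random
import statistics

# Hypothetical paired-data model (an assumption for illustration, not from the text):
# Y_i1 = mu1 + s_i + e_i1 and Y_i2 = mu2 + s_i + e_i2, where the subject effect s_i
# is shared by both members of pair i, which makes Y_i1 and Y_i2 correlated.
random.seed(205)
mu1, mu2 = 5.0, 3.0
sigma_s, sigma_e = 2.0, 1.0
n = 100_000

d = []
for _ in range(n):
    s = random.gauss(0.0, sigma_s)
    y1 = mu1 + s + random.gauss(0.0, sigma_e)
    y2 = mu2 + s + random.gauss(0.0, sigma_e)
    d.append(y1 - y2)  # the shared subject effect cancels in the difference

mean_d = statistics.fmean(d)     # should be close to mu1 - mu2 = 2
var_d = statistics.variance(d)   # should be close to 2 * sigma_e**2 = 2
print(f"mean of d = {mean_d:.3f}, variance of d = {var_d:.3f}")
```

Here each $Y$ has variance $\sigma_s^2 + \sigma_e^2 = 5$, so two independent measurements would give $d_i$ a variance of 10; the shared subject effect brings it down to about 2. This is one way to see why $\sigma_d^2$ depends on the within-pair correlation and not only on $\sigma_1^2$ and $\sigma_2^2$.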


Sample Mean Difference

If $d_i \sim$ i.i.d. $N(\mu_d, \sigma_d^2)$, $i = 1, \ldots, n$, then

$$\bar d \sim N\left(\mu_d, \frac{\sigma_d^2}{n}\right),$$

so we can make inferences on $\mu_d$ using our previous application of the t-distribution:

$$\frac{\bar d - \mu_d}{SE(\bar d)} \sim t(n-1), \qquad \text{where } SE(\bar d) = \frac{S_d}{\sqrt{n}} \text{ and } S_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar d)^2}.$$
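The quantities above can be computed directly from the paired observations. A minimal sketch in Python (standard library only); the function name `paired_summary` and the toy data are hypothetical, for illustration only:

```python
import math
import statistics


def paired_summary(y1, y2):
    """Return (n, d_bar, s_d, SE(d_bar)) for paired observations y1[i], y2[i]."""
    if len(y1) != len(y2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(y1, y2)]  # d_i = Y_i1 - Y_i2
    n = len(d)
    d_bar = statistics.mean(d)           # sample mean difference
    s_d = statistics.stdev(d)            # S_d, with divisor n - 1
    se = s_d / math.sqrt(n)              # SE(d_bar) = S_d / sqrt(n)
    return n, d_bar, s_d, se


# Hypothetical toy data (not from the text): the differences are 1, 2, 0, 3
n, d_bar, s_d, se = paired_summary([5, 7, 6, 9], [4, 5, 6, 6])
print(n, d_bar, round(s_d, 4), round(se, 4))
```

`statistics.stdev` uses the $n - 1$ divisor, which matches the definition of $S_d$ above.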


Conf. Interval on $\mu_d$

Using the t-distribution feature for $(\bar d - \mu_d)/SE(\bar d)$ yields our typical form of confidence interval on $\mu_d$:

$$\bar d \pm t_{\alpha/2}\, SE(\bar d),$$

where df = n – 1 = (# pairs) – 1.


Example 9.3

Ex. 9.3: $Y_1$ = wt. loss after appetite inhib.; $Y_2$ = wt. loss in same woman after placebo:


Example 9.3 – Conf. Interval

We have df = n–1 = 9–1 = 8, so for a 95% conf. interval on $\mu_d$, we employ $t_{.025}$ = 2.306 (from Table 4).

The 95% conf. interval is

$$\bar d \pm t_{.025}\,SE(\bar d) = \bar d \pm t_{.025}\frac{s_d}{\sqrt{n}} = 1.00 \pm (2.306)\frac{0.729}{\sqrt{9}} = 1.00 \pm (2.306)(0.24) = 1.00 \pm 0.55,$$

or 0.45 < $\mu_d$ < 1.55 kg.
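The same arithmetic as a small Python function. This is a sketch: rather than computing a t quantile, it takes the critical value $t_{\alpha/2}$ (df = n − 1) as an input, for example read from Table 4. The name `paired_t_ci` is hypothetical.

```python
import math


def paired_t_ci(d_bar, s_d, n, t_crit):
    """Confidence interval d_bar +/- t_crit * s_d / sqrt(n) for mu_d.

    t_crit is the upper alpha/2 critical point of t(n - 1), e.g. from Table 4.
    """
    se = s_d / math.sqrt(n)      # SE(d_bar)
    half_width = t_crit * se
    return d_bar - half_width, d_bar + half_width


# Example 9.3: n = 9 pairs, d_bar = 1.00 kg, s_d = 0.729 kg, t_.025 with df = 8 is 2.306
lo, hi = paired_t_ci(1.00, 0.729, 9, 2.306)
print(f"{lo:.2f} < mu_d < {hi:.2f}")
```

Carrying full precision (SE = 0.243 rather than 0.24) gives a half-width of about 0.56, i.e. roughly 0.44 < $\mu_d$ < 1.56 kg; the slide's 0.45 to 1.55 reflects rounding SE to 0.24 before multiplying.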


Hypothesis Tests on $\mu_d$

To test $H_0: \mu_d = 0$, find

$$t_s = \frac{\bar d - 0}{SE(\bar d)}.$$

Then, reject $H_0$ vs.

• $H_A: \mu_d \neq 0$, when $P = 2P\{t(n-1) > |t_s|\} \le \alpha$

• $H_A: \mu_d > 0$, when $P = P\{t(n-1) > t_s\} \le \alpha$

• $H_A: \mu_d < 0$, when $P = P\{t(n-1) < t_s\} \le \alpha$
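Without a TI-84 or R at hand, $P\{t(n-1) > t\}$ can be approximated numerically. The sketch below is an assumption of this edit, not the course's method: it integrates the Student t density with Simpson's rule (standard library only). The function names and the `alternative` labels are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P{t(df) > t}, via Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3               # P{0 < t(df) < |t|}
    upper = 0.5 - area                 # P{t(df) > |t|}
    return upper if t >= 0 else 1.0 - upper


def paired_t_test(d_bar, s_d, n, alternative="two-sided"):
    """Return (t_s, P) for H0: mu_d = 0, using df = n - 1."""
    t_s = (d_bar - 0) / (s_d / math.sqrt(n))
    df = n - 1
    if alternative == "two-sided":     # H_A: mu_d != 0
        p = 2 * t_upper_tail(abs(t_s), df)
    elif alternative == "greater":     # H_A: mu_d > 0
        p = t_upper_tail(t_s, df)
    elif alternative == "less":        # H_A: mu_d < 0
        p = 1.0 - t_upper_tail(t_s, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t_s, p


# Example 9.6 (squirrels): d_bar = 72, s_d = 148, n = 11, two-sided test
t_s, p = paired_t_test(72, 148, 11, "two-sided")
print(f"t_s = {t_s:.3f}, P = {p:.4f}")
```

For Example 9.6 this should land close to the exact P = 0.1382 reported with that example; a statistics package would normally be used instead of hand-rolled integration.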


t-Test Rejection Regions

To test $H_0: \mu_d = 0$ using rejection regions, reject $H_0$ vs.

• $H_A: \mu_d \neq 0$, when $|t_s| \ge t_{\alpha/2}$ (with df = n–1)

• $H_A: \mu_d > 0$, when $t_s \ge t_\alpha$ (with df = n–1)

• $H_A: \mu_d < 0$, when $t_s \le -t_\alpha$ (with df = n–1)
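The three rejection-region rules translate directly into comparisons. In this sketch (hypothetical function name and `alternative` labels) the caller supplies the Table 4 critical value: $t_{\alpha/2}$ for the two-sided test, $t_\alpha$ for a one-sided test, both with df = n − 1.

```python
def reject_h0(t_s, t_crit, alternative="two-sided"):
    """Rejection-region rule for H0: mu_d = 0.

    t_crit is the upper critical point with df = n - 1 from Table 4:
    t_{alpha/2} for the two-sided test, t_alpha for a one-sided test.
    """
    if alternative == "two-sided":   # H_A: mu_d != 0
        return abs(t_s) >= t_crit
    if alternative == "greater":     # H_A: mu_d > 0
        return t_s >= t_crit
    if alternative == "less":        # H_A: mu_d < 0
        return t_s <= -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


# Example 9.6 at alpha = 0.10 (two-sided): t_.05 with df = 10 is 1.812
print(reject_h0(1.613, 1.812))
```

This returns `False` for Example 9.6, agreeing with the P-value conclusion there.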


Example 9.6

Ex. 9.6: $Y_1$ = squirrel dist. to person chasing; $Y_2$ = squirrel dist. to nearest tree (n = 11).

Same squirrel each time, so data are paired:


Example 9.6 (cont’d)

(Note: in Fig. 9.3 we find that $Y_2$ does not appear normal, but the differences $d_i$ do. So, we continue with the t-test.)

Set $\alpha$ = 0.10. Test $H_0: \mu_d = 0$ vs. $H_A: \mu_d \neq 0$.

We find

$$t_s = \frac{\bar d}{s_d/\sqrt{n}} = \frac{72}{148/\sqrt{11}} = \frac{72}{44.62} = 1.613.$$

Apply the P-value approach: find

$$P = 2P\{t(n-1) > |t_s|\} = 2P\{t(10) > 1.613\}.$$


Example 9.6 – P-value

From Table 4:

$P\{t(10) > 1.812\} = 0.05$
$P\{t(10) > 1.613\}$ is between 0.05 and 0.10
$P\{t(10) > 1.372\} = 0.10$

So 0.10 < P < 0.20. Since P > $\alpha$ = 0.10, we fail to reject $H_0$ and conclude there is no significant difference in mean distances.

We can find the exact P = 0.1382 via TI-84 or R.


More on Paired Design

As $n \to \infty$, the CLT allows use of the t-distribution for paired data even when the differences are not normal, so these inferences remain available in large samples.

In small samples, a distribution-free approach is possible (as we’ll see in Sec. 9.4).

Additional features of the paired design are discussed in Sec. 9.3.


Sign Test

In small samples with non-normal paired differences, a distribution-free approach is available, known as the SIGN TEST.

• For paired data $Y_{i1}, Y_{i2}$, find $d_i = Y_{i1} - Y_{i2}$ and take $W_i$ = {sign of $d_i$}. Under $H_0$: no difference between $Y_{i1}$ & $Y_{i2}$, we expect positive and negative $d_i$ to be equally likely, so that $P\{W_i > 0\} = P\{W_i < 0\} = 1/2$.

• Ignore any $d_i = 0$. Let $n_d$ = # non-zero $d_i$’s.
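The bookkeeping in these two bullets is easy to automate. A minimal sketch; `sign_counts` is a hypothetical helper and the toy data are made up for illustration. It counts the positive and negative differences and drops the zeros.

```python
def sign_counts(y1, y2):
    """Return (N_plus, N_minus, n_d) for paired data; zero differences are ignored."""
    if len(y1) != len(y2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(y1, y2)]       # d_i = Y_i1 - Y_i2
    n_plus = sum(1 for di in d if di > 0)     # number of W_i > 0
    n_minus = sum(1 for di in d if di < 0)    # number of W_i < 0
    n_d = n_plus + n_minus                    # number of non-zero d_i
    return n_plus, n_minus, n_d


# Toy data: the differences are 2, 0, -2, 2, 3, so the zero is dropped and n_d = 4
print(sign_counts([3, 5, 2, 8, 4], [1, 5, 4, 6, 1]))
```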


Sign test (cont’d)

To set up the sign test:

a) Select $\alpha$.

b) Determine $H_A$ from subject-matter principles. Possibilities are

“directional”:
$H_A$: effect in group 1 > effect in group 2
$H_A$: effect in group 1 < effect in group 2

“non-directional”:
$H_A$: effect in group 1 ≠ effect in group 2


Sign Test Statistic

The test statistic is $B_s$, and it depends on $H_A$. Let $N_+$ = {# $W_i > 0$} and $N_-$ = {# $W_i < 0$}. Then,

$$B_s = \begin{cases} N_+ & \text{if } H_A: Y_1 > Y_2 \\ N_- & \text{if } H_A: Y_1 < Y_2 \\ \max\{N_+, N_-\} & \text{if } H_A: Y_1 \neq Y_2 \end{cases}$$

Reject $H_0$ in favor of $H_A$ when $B_s$ reaches or exceeds a critical point from Table 7 (e.g., if $H_A: Y_1 > Y_2$, reject when $B_s \ge b_\alpha$).
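The case definition of $B_s$ maps onto a three-way branch. In this sketch the function name and the labels "greater", "less", "two-sided" are hypothetical stand-ins for the three forms of $H_A$.

```python
def sign_test_statistic(n_plus, n_minus, alternative):
    """B_s for the sign test.

    alternative: 'greater' (H_A: Y1 > Y2), 'less' (H_A: Y1 < Y2),
    or 'two-sided' (H_A: Y1 != Y2).
    """
    if alternative == "greater":
        return n_plus
    if alternative == "less":
        return n_minus
    if alternative == "two-sided":
        return max(n_plus, n_minus)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Example 9.12 counts: N_plus = 9, N_minus = 2, with H_A: "close" > "poor"
b_s = sign_test_statistic(9, 2, "greater")
print(b_s)
```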


(Portion of) Table 7, p. 684


Sign Test P-value

Notice that this is a BInS setting: $B_s$ is the number of “successes” among $n_d$ binary trials where, under $H_0$, P{success} = 1/2.

So, if $H_0$ is true, $B_s \sim Bin(n_d, \tfrac12)$. Thus for

$H_A$: effect 1 > effect 2, set $P = P\{Bin(n_d, \tfrac12) \ge B_s\}$,
$H_A$: effect 1 < effect 2, set $P = P\{Bin(n_d, \tfrac12) \ge B_s\}$ (recall that here $B_s = N_-$),
$H_A$: effect 1 ≠ effect 2, set $P = 2P\{Bin(n_d, \tfrac12) \ge B_s\}$,

and reject $H_0$ when $P \le \alpha$.
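The binomial tail can be summed exactly with `math.comb`. A sketch with a hypothetical function name; capping the two-sided value at 1 is an addition of this sketch (doubling can exceed 1 when $n_d$ is even and $B_s = n_d/2$), not something stated on the slide.

```python
from math import comb


def sign_test_p_value(b_s, n_d, alternative):
    """Sign-test P-value using B_s ~ Bin(n_d, 1/2) under H0.

    For 'greater' and 'less', B_s is N_plus or N_minus respectively, so both
    use the upper tail P{Bin(n_d, 1/2) >= B_s}.
    """
    upper_tail = sum(comb(n_d, k) for k in range(b_s, n_d + 1)) / 2 ** n_d
    if alternative in ("greater", "less"):
        return upper_tail
    if alternative == "two-sided":
        # Cap at 1: when n_d is even and B_s = n_d / 2, doubling would exceed 1.
        return min(1.0, 2 * upper_tail)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Example 9.12: B_s = 9 with n_d = 11 non-zero differences, directional H_A
p = sign_test_p_value(9, 11, "greater")
print(round(p, 4))
```

For Example 9.12 this reproduces the hand calculation $67/2^{11} \approx 0.033$.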


Example 9.12

Ex. 9.12: Y = skin graft survival (days).

Group 1: HL-antigen compatibility “close”
Group 2: HL-antigen compatibility “poor”


Example 9.12 (cont’d)

With this small data set, normality is brought into question (and the data have a ‘censored’ feature; see patients #3 and #10). So, a sign test is used.

Set $\alpha$ = 0.05. Take $H_0$: “close” = “poor” vs. $H_A$: “close” > “poor” (since we expect poorer survival in the “poor” group).

In Table 9.7 we see $N_+$ = 9 (and $N_-$ = 2, so $n_d$ = 11). So take $B_s$ = 9.


Example 9.12 – P-value

Let B ~ Bin(11, 0.5), so that the P-value here is

$$\begin{aligned}
P &= P\{B \ge 9\} = P\{B = 9\} + P\{B = 10\} + P\{B = 11\} \\
&= {}_{11}C_{9}\left(\tfrac12\right)^{9}\left(\tfrac12\right)^{2} + {}_{11}C_{10}\left(\tfrac12\right)^{10}\left(\tfrac12\right)^{1} + {}_{11}C_{11}\left(\tfrac12\right)^{11}\left(\tfrac12\right)^{0} \\
&= \frac{11!}{9!\,2!}\left(\tfrac12\right)^{11} + \frac{11!}{10!\,1!}\left(\tfrac12\right)^{11} + \frac{11!}{11!\,0!}\left(\tfrac12\right)^{11} \\
&= (55 + 11 + 1)\left(\tfrac12\right)^{11} = \frac{67}{2^{11}} = 0.033.
\end{aligned}$$


Example 9.12 (concluded)

Since P = 0.033 < 0.05 = $\alpha$, we reject $H_0$ and conclude that graft survival is significantly higher in the “close” group. (Note that the binomial P-value can be computed via TI-84.)

To use the rejection region approach for these data: reject $H_0$ if $B_s \ge b_{.05}$ = 9 from Table 7. Since $B_s$ = 9 ≥ 9, we still reject $H_0$.


Chapter 10: Categorical Data and Contingency Table Analysis

(Coverage order: Secs. 10.7 → 10.2 → 10.3 → 10.1)

Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.


Sec. 10.7: 2-Sample Proportion Data

Returning to the independent (two-)sample case, suppose now the data are from a BInS setting: $Y_1 \sim Bin(n_1, p_1)$ indep. of $Y_2 \sim Bin(n_2, p_2)$.

Of interest is the difference $p_1 - p_2$.

A good point estimator for $p_1 - p_2$ is the difference in sample proportions

$$\hat p_1 - \hat p_2 = \frac{Y_1}{n_1} - \frac{Y_2}{n_2}.$$


Conf. Intervals for $p_1 - p_2$

But (!) for building conf. intervals on $p_1 - p_2$ we apply our previous AC strategy (add one success and one failure to each sample) and start with

$$\tilde p_1 - \tilde p_2 = \frac{Y_1 + 1}{n_1 + 2} - \frac{Y_2 + 1}{n_2 + 2}.$$

Then, find

$$SE(\tilde p_1 - \tilde p_2) = \sqrt{\frac{\tilde p_1(1 - \tilde p_1)}{n_1 + 2} + \frac{\tilde p_2(1 - \tilde p_2)}{n_2 + 2}}.$$


Agresti-Caffo Conf. Intervals

DEF’N: When $Y_1 \sim Bin(n_1, p_1)$ indep. of $Y_2 \sim Bin(n_2, p_2)$, the 95% AGRESTI-CAFFO CONFIDENCE INTERVAL for $p_1 - p_2$ is

$$\tilde p_1 - \tilde p_2 \pm z_{\alpha/2}\,SE(\tilde p_1 - \tilde p_2) = \frac{Y_1 + 1}{n_1 + 2} - \frac{Y_2 + 1}{n_2 + 2} \pm z_{\alpha/2}\sqrt{\frac{\tilde p_1(1 - \tilde p_1)}{n_1 + 2} + \frac{\tilde p_2(1 - \tilde p_2)}{n_2 + 2}},$$

where at $\alpha$ = 0.05 we use $z_{0.025}$ = 1.96. (Generalizations exist for other values of $\alpha$.)
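The definition above, written as a short Python function (standard library only). The name `agresti_caffo_ci` is hypothetical; the default `z = 1.96` gives the 95% interval in the definition, and other values of `z` are outside what the slide states.

```python
import math


def agresti_caffo_ci(y1, n1, y2, n2, z=1.96):
    """Agresti-Caffo interval for p1 - p2: add 1 success and 1 failure to each group."""
    p1 = (y1 + 1) / (n1 + 2)      # adjusted estimate for group 1
    p2 = (y2 + 1) / (n2 + 2)      # adjusted estimate for group 2
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Ex. 10.37: 44 of 160 angina-free on Timolol, 19 of 147 on placebo
lo, hi = agresti_caffo_ci(44, 160, 19, 147)
print(f"{lo:.3f} < p1 - p2 < {hi:.3f}")
```

Without intermediate rounding the lower limit comes out at about 0.055 rather than 0.056; the worked example rounds the estimate to .144 and the margin to .088 first.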


Example 10.37

Ex. 10.37 (from Ex. 10.11 – see below):
$Y_1$ = # patients angina-free after Timolol trt.
$Y_2$ = # patients angina-free after placebo.

Data from Ex. 10.11 (Table 10.4):


Example 10.37 (cont’d)

The Agresti-Caffo point estimator is

$$\tilde p_1 - \tilde p_2 = \frac{44 + 1}{160 + 2} - \frac{19 + 1}{147 + 2} = \frac{45}{162} - \frac{20}{149} = .278 - .134 = .144.$$

Associated SE is

$$SE(\tilde p_1 - \tilde p_2) = \sqrt{\frac{(.278)(.722)}{162} + \frac{(.134)(.866)}{149}}.$$


Example 10.37 (concluded)

From this the 95% conf. interval is

$$\tilde p_1 - \tilde p_2 \pm z_{\alpha/2}\,SE(\tilde p_1 - \tilde p_2) = 0.144 \pm (1.96)\sqrt{\frac{(.278)(.722)}{162} + \frac{(.134)(.866)}{149}} = 0.144 \pm (1.96)(0.0449) = 0.144 \pm 0.088,$$

or 0.056 < $p_1 - p_2$ < 0.232.


Sec. 10.2: Testing $p_1$ vs. $p_2$

For testing $H_0: p_1 = p_2$, we introduce a new construction: the contingency table.

DEF’N: A 2×2 CONTINGENCY TABLE is a tabular arrangement of count data representing how the success & failure frequencies relate to an explanatory factor.

For testing $H_0: p_1 = p_2$, the column factor delineates Group 1 vs. Group 2 and the row factor delineates success vs. failure.


2×2 Contingency Table

Basic structure of a 2×2 contingency table:

$$\begin{array}{l|cc}
 & \text{Grp. 1} & \text{Grp. 2} \\ \hline
\#\text{ Successes} & Y_1 & Y_2 \\
\#\text{ Failures} & n_1 - Y_1 & n_2 - Y_2 \\ \hline
\text{(Col.) Total} & n_1 & n_2
\end{array}$$

Notice that we can read the sample proportions straight from the table:

$$\hat p_1 = \frac{Y_1}{n_1}, \quad \hat p_2 = \frac{Y_2}{n_2}.$$


Example 10.11

Ex 10.11: (Ex. 10.37, cont’d) Angina expt.

                 Timolol   Placebo   (Row) Tot.
# Angina-free       44        19          63
# Angina           116       128         244
(Col.) Tot.        160       147         307

Of interest is testing whether angina status is associated with Timolol trt., i.e., do the row and column factors “interact”?


Testing $p_1$ vs. $p_2$

To test $H_0: p_1 = p_2$, there are many available approaches. We employ the contingency table since it can be extended to more than 2 row or column levels (see Sec. 10.5).

The table allows for construction of a statistic that compares the “observed” data against their “expected” values under a pre-specified model, say, the model under $H_0: p_1 = p_2$.


Pearson’s $\chi^2$ Statistic

DEF’N: PEARSON’S $\chi^2$ (CHI-SQUARE) STATISTIC is

$$X_s^2 = \sum \frac{(O - E)^2}{E}.$$

$X_s^2$ is sometimes called a “goodness-of-fit” statistic (for reasons explained in Sec. 10.1).

For application in a 2×2 contingency table, the “O” values are the four counts in the table ($Y_1$, $Y_2$, $n_1 - Y_1$, $n_2 - Y_2$), and the “E” values are their expected values under $H_0: p_1 = p_2$.
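In code, the statistic is a one-line sum over cells. A sketch with a hypothetical function name; the expected counts plugged in for the angina data are the ones derived later in this section (Exs. 10.14-10.15), written here as exact fractions.

```python
def pearson_chi_square(observed, expected):
    """Pearson's X_s^2 = sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("need one expected count per observed count")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Angina data (Ex. 10.11): O in the order Y1, Y2, n1 - Y1, n2 - Y2
observed = [44, 19, 116, 128]
# E = (row total)(col. total)/(grand total), as worked out in Exs. 10.14-10.15
expected = [63 * 160 / 307, 63 * 147 / 307, 244 * 160 / 307, 244 * 147 / 307]
x2 = pearson_chi_square(observed, expected)
print(round(x2, 2))
```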


“E” values

But, what are the “E” values under $H_0: p_1 = p_2$?

Well, if $H_0$ is true, we expect both $Y_1/n_1$ and $Y_2/n_2$ to estimate the same value, say, p.

We can estimate this common p using a weighted (“pooled”) estimator:

$$\hat p_{pool} = \frac{n_1\hat p_1 + n_2\hat p_2}{n_1 + n_2} = \frac{n_1(Y_1/n_1) + n_2(Y_2/n_2)}{n_1 + n_2} = \frac{Y_1 + Y_2}{n_1 + n_2}.$$
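A short numerical check that the weighted average of the two sample proportions collapses to (total successes)/(total trials). The helper name `pooled_proportion` is hypothetical; the counts are the angina data of Ex. 10.11.

```python
def pooled_proportion(y1, n1, y2, n2):
    """Pooled estimate of the common p under H0: p1 = p2."""
    return (y1 + y2) / (n1 + n2)


# Angina data (Ex. 10.11): 44 of 160 (Timolol), 19 of 147 (placebo)
p_pool = pooled_proportion(44, 160, 19, 147)
# Same value as the weighted average n1*p1_hat + n2*p2_hat over n1 + n2
weighted = (160 * (44 / 160) + 147 * (19 / 147)) / (160 + 147)
print(round(p_pool, 4), round(weighted, 4))
```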


“E” successes

Now, if there are $n_1$ total obsv’ns for Grp. 1, then we “expect” $n_1\hat p_{pool}$ of these to be successes. This is

$$n_1\hat p_{pool} = \frac{n_1(Y_1 + Y_2)}{n_1 + n_2}.$$

Similarly, with $n_2$ total obsv’ns in Grp. 2 we expect $n_2\hat p_{pool}$ successes:

$$n_2\hat p_{pool} = \frac{n_2(Y_1 + Y_2)}{n_1 + n_2}.$$


“E” failures

For the expected # of failures, just subtract the “E” successes from each total, $n_j$:

$$n_1 - n_1\hat p_{pool} = \frac{n_1(n_1 + n_2 - Y_1 - Y_2)}{n_1 + n_2} = \frac{n_1(n_1 - Y_1 + n_2 - Y_2)}{n_1 + n_2}$$

and

$$n_2 - n_2\hat p_{pool} = \frac{n_2(n_1 - Y_1 + n_2 - Y_2)}{n_1 + n_2}.$$


Expected 2×2 Table

The result is an “expected” 2×2 contingency table:

$$\begin{array}{l|cc|c}
 & \text{Grp. 1} & \text{Grp. 2} & \text{Row tot.} \\ \hline
\#\text{ success} & \dfrac{n_1(Y_1+Y_2)}{n_1+n_2} & \dfrac{n_2(Y_1+Y_2)}{n_1+n_2} & Y_1 + Y_2 \\
\#\text{ failure} & \dfrac{n_1(n_1-Y_1+n_2-Y_2)}{n_1+n_2} & \dfrac{n_2(n_1-Y_1+n_2-Y_2)}{n_1+n_2} & n_1 - Y_1 + n_2 - Y_2 \\ \hline
\text{Col. tot.} & n_1 & n_2 & n_1 + n_2
\end{array}$$

Notice the similar structure of each “E”:

E = (Row Total)(Col. Total)/(Grand Total)
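The (Row Total)(Col. Total)/(Grand Total) rule, applied cell by cell. A sketch with a hypothetical helper name; it is written for a table given as a list of rows, so it would also accept tables with more rows or columns, although that extension is an assumption here and not shown on these slides.

```python
def expected_counts(table):
    """E = (row total)(col. total)/(grand total) for every cell of a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# "O" table for the angina experiment: rows = angina-free / angina, columns = Timolol / placebo
observed = [[44, 19], [116, 128]]
expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The printed values match the "E" table of Exs. 10.14-10.15, and the row and column totals of `expected` equal those of `observed`, which is the double-check described in the text.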


Examples 10.14-10.15

Exs. 10.14-10.15 (10.11 cont’d): Angina expt. “O” table was

                 Timolol   Placebo   Row Tot.
# Angina-free       44        19         63
# Angina           116       128        244
Col. Tot.          160       147        307

“E” table is

                 Timolol   Placebo   Row Tot.
# Angina-free     32.83     30.17        63
# Angina         127.17    116.83       244
Col. Tot.          160       147        307

(cf. Table 10.7)

e.g., E = (63)(160)/307 = 32.83


2×2 Table

In the 2×2 table of expected counts, note that:

• the “E” values need not be integers, and we do NOT round them;

• the row and column totals do not change (they are designed not to). This is a quick way to double-check the calculations!


χ²(ν) Distribution

To find the P-value, we need the null reference distribution of Xs².

DEF’N (p.394): The χ²(ν) DISTRIBUTION with ν df is the limiting distribution of Pearson’s Xs² statistic under Ho.

NOTATION: Xs² ~ χ²(ν)

In the special case of a 2×2 contingency table, ν = 1.


Properties of χ²(ν)

The χ²(ν) dist’n is

• always ≥ 0

• skewed right

• has integer df’s

• has upper-α critical point χ²α(ν), given in Table 9 (must bracket P-values)

• computable in DoStat
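Outside DoStat, the upper tail P{χ²(ν) ≥ x} for integer ν can be computed with nothing but the Python standard library. This is a minimal sketch under our own assumptions (the name `chi2_sf` is hypothetical): it starts from the closed forms P{χ²(1) ≥ x} = erfc(√(x/2)) and P{χ²(2) ≥ x} = e^(−x/2), then steps up two df at a time with the standard incomplete-gamma recurrence.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P{chi2(df) >= x} for a positive integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms: df = 1 uses erfc, df = 2 uses exp(-x/2)
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    # evaluated on the log scale to avoid overflow for large x or df
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


# A few Table 9-style checks: each upper-tail probability should be close to the stated alpha
print(round(chi2_sf(6.635, 1), 4))   # alpha = 0.01, df = 1
print(round(chi2_sf(4.605, 2), 4))   # alpha = 0.10, df = 2
print(round(chi2_sf(7.815, 3), 4))   # alpha = 0.05, df = 3
```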


(Portion of) Table 9, p. 686


Rejecting Ho

So, to test Ho: p1 = p2 vs. HA: p1 ≠ p2, set α and find Xs² = ∑(O–E)²/E.

P-value approach: find P = P{χ²(1) ≥ Xs²} via computer or bracket it via Table 9, and reject Ho if P ≤ α.

Rejection region approach: find χ²α(1) from Table 9 and reject Ho if Xs² ≥ χ²α(1).

(Notice: a 1-tailed table look-up for a 2-sided test!)
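Why a 1-tailed look-up gives a 2-sided test: with 1 df, χ²(1) is the square of a standard normal Z, so P{χ²(1) ≥ x} = P{|Z| ≥ √x} and χ²α(1) = (z for upper tail α/2)². The sketch below (function names are ours, hypothetical) uses Python's `statistics.NormalDist` to run both approaches on the angina statistic of Ex. 10.16, whose unrounded value is about 9.978.

```python
from statistics import NormalDist


def chi2_1_critical(alpha):
    """Upper-alpha critical point of chi2(1): the squared upper-(alpha/2) normal point."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


def chi2_1_pvalue(x2):
    """P{chi2(1) >= x2} = P{|Z| >= sqrt(x2)} for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(x2 ** 0.5))


alpha = 0.01
x2 = 9.978                      # angina statistic, reported as 10.0 in Ex. 10.16
crit = chi2_1_critical(alpha)   # about 6.63, the Table 9 entry
p = chi2_1_pvalue(x2)

# The two approaches always agree: x2 >= crit exactly when p <= alpha
print(f"critical value = {crit:.2f}, P = {p:.4f}")
print("reject Ho" if x2 >= crit else "fail to reject Ho")
```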


One-sided testing

To find a one-sided P-value:

• for HA: p1 > p2, use P = ½P{χ²(1) ≥ Xs²} if p̂1 > p̂2 (otherwise, report P > 0.50).

• for HA: p1 < p2, use P = ½P{χ²(1) ≥ Xs²} if p̂1 < p̂2 (otherwise, report P > 0.50).
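The one-sided rule can be sketched as follows, assuming p̂1 = Y1/n1 and p̂2 = Y2/n2 come from the two columns of the table (the function and argument names are hypothetical). Since Xs² is the square of a signed z-type statistic, halving the χ²(1) tail is correct only when the data point toward HA. Otherwise, the one-sided P-value is 1 minus that half, which is why it always exceeds 0.50.

```python
import math


def one_sided_p(y1, n1, y2, n2, x2, alternative="greater"):
    """One-sided P-value from Pearson's Xs^2 for comparing p1 and p2.

    alternative="greater" means HA: p1 > p2; "less" means HA: p1 < p2.
    """
    p1_hat, p2_hat = y1 / n1, y2 / n2
    half_tail = 0.5 * math.erfc(math.sqrt(x2 / 2))   # (1/2) P{chi2(1) >= x2}
    agrees = p1_hat > p2_hat if alternative == "greater" else p1_hat < p2_hat
    # Data pointing away from HA give a one-sided P-value above 0.50
    return half_tail if agrees else 1 - half_tail


# Angina data: 44 of 160 angina-free on Timolol vs. 19 of 147 on placebo
print(f"{one_sided_p(44, 160, 19, 147, 9.978, 'greater'):.5f}")
print(f"{one_sided_p(44, 160, 19, 147, 9.978, 'less'):.5f}")
```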


Example 10.16

Ex. 10.16 (10.11 cont’d): Angina expt. Set α = 0.01. From the O and E values computed in Ex. 10.15, we find

$$X_s^2 = \sum \frac{(O-E)^2}{E} = \frac{(44-32.83)^2}{32.83} + \frac{(116-127.17)^2}{127.17} + \frac{(19-30.17)^2}{30.17} + \frac{(128-116.83)^2}{116.83} = 10.0$$
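The same sum can be computed cell by cell in a short Python sketch (`pearson_x2` is our own name). It recomputes each E from the margins instead of using the rounded values, so it returns about 9.98, which rounds to the 10.0 above.

```python
def pearson_x2(observed):
    """Pearson's statistic: sum over all cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count for cell (i, j)
            x2 += (o - e) ** 2 / e
    return x2


angina = [[44, 19],     # angina-free: Timolol, Placebo
          [116, 128]]   # angina:      Timolol, Placebo
x2 = pearson_x2(angina)
print(round(x2, 2))     # 9.98
```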


Example 10.16 (cont’d)

To test Ho: p1 = p2 vs. HA: p1 ≠ p2, we bracket P = P{χ²(1) ≥ 10.0} from Table 9:

P{χ²(1) ≥ 6.63} = 0.01

P{χ²(1) ≥ 10.0} = between 0.01 and 0.001

P{χ²(1) ≥ 10.83} = 0.001

So, 0.001 < P < 0.01 (two-sided).


Example 10.16 (concluded)

Since P < 0.01 = α, we reject Ho and conclude there is a significant difference in angina response after Timolol trt.

Can find exact P = 0.0016 via TI-84/R.

(A one-sided HA is not unreasonable here, but it’s easy to mess up the P-value, so be careful!)


Pearson’s X² for 2×2 Table

When using the Pearson Xs² statistic in 2×2 tables, note that:

• χ²(1) is only an approximation in finite samples. For it to be valid, a standard rule-of-thumb is to require E ≥ 1 for every cell, and average E ≥ 5 (i.e., here Ē = {n1+n2}/4 ≥ 5).

• This method is antiquated for testing p1 = p2 (esp. against 1-sided alternatives). A better method is Fisher’s Exact test; see Sec. 10.4.
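The rule-of-thumb is easy to automate. A minimal sketch follows, with hypothetical function names. Because the E’s always sum to n, the average E in a 2×2 table is n/4, which is where the “{n1+n2}/4 ≥ 5” form comes from.

```python
def expected_counts(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_approx_ok(observed):
    """Rule of thumb: every expected count E >= 1 and the average E >= 5."""
    cells = [e for row in expected_counts(observed) for e in row]
    # The E's sum to n, so in a 2x2 table the average E is n/4
    return min(cells) >= 1 and sum(cells) / len(cells) >= 5


print(chi2_approx_ok([[44, 19], [116, 128]]))   # angina data, n = 307: True
print(chi2_approx_ok([[3, 1], [2, 4]]))         # made-up table, n = 10: False
```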


Sec. 10.3: Testing Association

The layout of the 2×2 table can apply to more than just tests of p1 = p2.

What if the row factor represents more than just success-vs.-failure? (Not uncommon!)

In this case, we have a single sample with n observations and with two explanatory factors (each having two levels).


General 2×2 table (cf. Table 10.13):

                            column factor
                       level C1   level C2   row tot.
row       level R1         a          b         a+b
factor    level R2         c          d         c+d
          col. tot.       a+c        b+d         n

Notice that n = a + b + c + d.

Natural question: does the column factor affect the row factor, and/or vice versa?

General 2×2 Table


Testing Association

Statistically, asking if the 2 factors interrelate is an issue of “association”:

• Ho: there is no association between the row and column factors

• HA: there is some association between the row and column factors

(An older term for “no association” is “independence,” but don’t confuse this with statistical independence from Chap. 3.)


Pearson’s X² for Association

We can test the association hypotheses using Pearson’s statistic, Xs² = ∑(O–E)²/E.

Here, the “O” terms are just a, b, c, and d.

The “E” terms are calculated as in Sec. 10.2:

E = (Row Total)(Col. Total)/(Grand Total)

For instance, in the (C1,R1) cell we have E11 = (a+b)(a+c)/n, etc.


Rejecting Ho

As in Sec. 10.2, Xs² ~ χ²(1) under Ho, so for fixed α, reject Ho as follows:

P-value approach: find P = P{χ²(1) ≥ Xs²} via computer or bracket it via Table 9, and reject Ho if P ≤ α.

Rejection region approach: find χ²α(1) from Table 9 and reject Ho if Xs² ≥ χ²α(1).

(Again: a 1-tailed table look-up for a 2-sided test.)


Example 10.21

Ex. 10.21: Hair color & eye color in n = 6800 German males.

“O” values in Table 10.11:


Example 10.21 (cont’d)

We find the “E” values as:

               Dark hair   Light hair   row total
Dark eye          485.84       371.16         857
Light eye       3,369.16     2,573.84       5,943
col. total         3,855        2,945       6,800

Set α = 0.05. Of interest is testing whether a significant association exists between hair color and eye color.


Example 10.21 – X² Statistic

Given the O’s and the E’s, the test statistic is

$$X_s^2 = \sum \frac{(O-E)^2}{E} = \frac{(726-485.84)^2}{485.84} + \cdots + \frac{(2814-2573.84)^2}{2573.84} = 313.63$$

The P-value is P = P{χ²(1) ≥ 313.63}.
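A Python sketch of the full calculation follows. Only the counts 726 and 2814 appear above. The other two counts (131 and 3129) are recovered here from the margins of the “E” table (e.g., 857 − 726 = 131), since O and E share the same row and column totals. The helper name is ours.

```python
import math


def pearson_x2(observed):
    """Pearson's statistic, sum of (O - E)^2 / E, with E from the table margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(observed, row_totals)
               for o, c in zip(row, col_totals))


hair_eye = [[726, 131],     # dark eye:  dark hair, light hair
            [3129, 2814]]   # light eye: dark hair, light hair
x2 = pearson_x2(hair_eye)
p = math.erfc(math.sqrt(x2 / 2))   # P{chi2(1) >= x2}
print(f"X2 = {x2:.2f}, P = {p:.3g}")
```

The P-value is far below any table entry, which is the point of the next slide.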


Example 10.21 – P-value

From Table 9:

P{χ²(1) ≥ 15.14} = 0.0001

P{χ²(1) ≥ 313.63} = below 0.0001

So, P < 0.0001 < 0.05 = α: we reject Ho and conclude that a significant association exists between hair color and eye color in these males.

Can find P < 0.0001 via TI-84/R.


Notes on the χ² Test

Some notes:

• χ²(1) is only an approximation in finite samples. The rule-of-thumb E ≥ 1 for every cell, and average E ≥ 5 (here, n/4 ≥ 5), still applies.

• By contrast, Ex. 10.21 illustrates that Xs² is very sensitive when n is large.

• The 2×2 table structure allows for a variety of “conditional” probability descriptions of the data; see pp. 413-416.


Notes on the χ² Test (cont’d)

• We can extend the 2×2 table into an r×c CONTINGENCY TABLE for cases when more than 2 levels exist for either factor. Xs² is still useful here; see Sec. 10.5.

• Many variants of Pearson’s Xs² exist. One seen often is Gs² = 2∑∑ O ln{O/E}, known as the ‘Likelihood-ratio test,’ ‘LR test,’ or ‘G test.’ While this has useful properties, it usually performs worse than Xs² for contingency tables and so is NOT recommended.
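For comparison, here is a minimal Python sketch of Gs² = 2∑∑ O ln{O/E} (function names are ours). On the angina data it gives roughly 10.24, slightly above Xs² ≈ 9.98. Cells with O = 0 are skipped because O ln(O/E) tends to 0 as O tends to 0.

```python
import math


def expected_counts(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def g2_statistic(observed):
    """Likelihood-ratio (G) statistic: 2 * sum over cells of O * ln(O / E)."""
    expected = expected_counts(observed)
    return 2 * sum(o * math.log(o / e)
                   for o_row, e_row in zip(observed, expected)
                   for o, e in zip(o_row, e_row)
                   if o > 0)   # O = 0 cells contribute 0 (limit of O ln O)


angina = [[44, 19], [116, 128]]
print(f"G2 = {g2_statistic(angina):.2f}")
```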


Phi-Divergence Statistic

An alternative competitor to Pearson’s Xs² for r×c tables that can be recommended is known as the PHI-DIVERGENCE STATISTIC:

$$C_s^2 = \frac{8}{3}\sum\left(O\sqrt{\frac{O}{E}} - O\right)$$

In the 2×2 case, Cs² ~ χ²(1) under Ho and so Cs² is used in the same fashion as Xs². (It can also be extended to the r×c case.)
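A Python sketch of Cs², assuming the formula reads (8/3)∑(O√(O/E) − O) (our reading of the slide). Written that way it is the λ = 1/2 member of the Cressie–Read power-divergence family, 2/(λ(λ+1)) ∑ O[(O/E)^λ − 1]. On the angina data it gives about 10.08, between Xs² ≈ 9.98 and Gs² ≈ 10.24. The function names are hypothetical.

```python
import math


def expected_counts(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def phi_divergence(observed):
    """C_s^2 = (8/3) * sum over cells of (O * sqrt(O / E) - O)."""
    expected = expected_counts(observed)
    return 8 / 3 * sum(o * math.sqrt(o / e) - o
                       for o_row, e_row in zip(observed, expected)
                       for o, e in zip(o_row, e_row))


angina = [[44, 19], [116, 128]]
print(f"C2 = {phi_divergence(angina):.2f}")   # refer to chi2(1), as with Xs^2
```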


Sec. 10.1: Goodness-of-Fit

Pearson’s original idea was to use Xs² to assess divergence in the “O”s against a modeled value for “E”.

Any model could be proposed, not just for r×c tables. In this sense, Xs² measures the goodness-of-fit of the model for “E”.


Example 10.1

Ex. 10.1: In genetics, we believe that offspring characters appear in regular “ratios.” E.g., in snapdragons, a cross of two pink (hybrid) parents produces offspring with

P{Red} = 1/4, P{Pink} = 1/2, P{White} = 1/4

(i.e., “1:2:1”). Or, so we think!

Can this model be supported by data? (We’ll see, later…)


Testing Goodness-of-Fit

To use Xs² to test a model’s goodness-of-fit:

• Set α.

• Designate K > 1 categories that the model can predict (e.g., Red/Pink/White ⇒ K = 3).

• Collect data (the Ok’s) from a sample of size n.

• Determine the Ek’s for each kth category using the model’s predictions.

• Calculate Xs² = ∑(Ok–Ek)²/Ek.
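The steps above can be sketched in Python as follows (the function name `gof_statistic` is ours). Each E is n times the model probability for its category, and Xs² is summed over the K categories. The data are the snapdragon counts used in Exs. 10.4–10.5.

```python
def gof_statistic(observed, probs):
    """Pearson goodness-of-fit statistic and expected counts for K categories."""
    n = sum(observed)
    expected = [n * p for p in probs]          # E_k = n * P(category k)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, expected


obs = [54, 122, 58]           # Red, Pink, White
model = [0.25, 0.5, 0.25]     # Mendelian 1:2:1 ratios
x2, exp_counts = gof_statistic(obs, model)
print(exp_counts)             # [58.5, 117.0, 58.5]
print(round(x2, 3))           # 0.564
```

The model probabilities must sum to 1, so that the E’s sum to n just as the O’s do.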


Testing Goodness-of-Fit (cont’d)

• The pertinent hypotheses are
Ho: model fit is adequate vs.
HA: model fit is poor

• Under Ho, Xs² ~ χ²(K–1), so the P-value is P = P{χ²(K–1) ≥ Xs²}. Reject Ho if P ≤ α.

• Or, use the Rejection Region approach: reject Ho if Xs² ≥ χ²α(K–1).


Examples 10.4–10.5

Exs. 10.4–10.5 (10.1 cont’d): Set α = 0.10. Suppose a sample of n = 234 snapdragons crossed from pink parents yields:

Color      Red    Pink   White
Observed   54     122    58
Expected   58.5   117    58.5

ER = n·P(Red) = (234)(0.25) = 58.5

EP = n·P(Pink) = (234)(0.5) = 117

EW = n·P(White) = (234)(0.25) = 58.5


Examples 10.4–10.5 (cont’d)

Test the goodness-of-fit of the (Mendelian) hypotheses:

Ho: model fit to Mendelian ratios is adequate

vs.

HA: model fit to Mendelian ratios is poor


Example 10.5 – X² Statistic

We calculate

$$X_s^2 = \sum_{k=1}^{3}\frac{(O_k-E_k)^2}{E_k} = \frac{(54-58.5)^2}{58.5} + \frac{(122-117)^2}{117} + \frac{(58-58.5)^2}{58.5} = 0.56$$

Reject Ho if P = P{χ²(K–1) ≥ Xs²} = P{χ²(2) ≥ 0.56} is less than or equal to α = 0.10.


Example 10.5 – P-value

From Table 9:

P{χ²(2) ≥ 3.22} = 0.20

P{χ²(2) ≥ 0.56} = above 0.20

So, P > 0.20 > 0.10 = α: we fail to reject Ho and conclude the model fit appears adequate.

Can find exact P ≈ 0.754 (using the unrounded Xs² = 0.564) via TI-84/R.
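With K = 3 categories (df = 2), the χ² tail has the closed form P{χ²(2) ≥ x} = e^(−x/2). Both the exact P-value and the critical point χ²α(2) = −2 ln α therefore need no table. For example, −2 ln 0.20 ≈ 3.22 reproduces the Table 9 entry above. This short Python sketch runs both approaches on the snapdragon data.

```python
import math

# Snapdragon data (Ex. 10.5): K = 3 categories, so df = K - 1 = 2
observed = [54, 122, 58]
expected = [58.5, 117, 58.5]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.10
p_value = math.exp(-x2 / 2)        # P{chi2(2) >= x2}
critical = -2 * math.log(alpha)    # chi2_alpha(2), about 4.61

print(f"X2 = {x2:.3f}, P = {p_value:.3f}, critical value = {critical:.2f}")
print("reject Ho" if p_value <= alpha else "fail to reject Ho")
```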


Caveats

Some warnings: Goodness-of-fit tests require

• categorical data (i.e., counts, not continuous measurements)

• large n

• objectively defined categories

So, they cannot be applied haphazardly!


Chapter 12: Linear Regression and Correlation

Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.


Predictor Variables

In Chap. 10 we introduced the idea that a (categorical) response, Y, could depend on levels of an external variable.

Why not extend this idea to when Y is a continuous (normal) measurement?

We say Y is a RESPONSE VARIABLE, dependent upon an explanatory PREDICTOR VARIABLE, X.


Simple Linear Model

DEF’N: The SIMPLE LINEAR MODEL relating Y and X is

Y = b0 + b1X.

• b0 is the Y-INTERCEPT of the model, the point where the line crosses the Y-axis.

• b1 is the SLOPE of the model, the change in Y for a given unit change in X (“rise” over “run”).


[Figure: the line Y = b0 + b1X plotted against X, crossing the Y-axis at b0, with the slope shown as a rise of ΔY = b1 over a run of ΔX = 1.]


Linear Regression

DEF’N: The LEAST SQUARES (LS) REGRESSION LINE is a data-dependent fit of a linear model. It has coefficients

$$\text{(slope)}\quad b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

$$\text{(intercept)}\quad b_0 = \bar{y} - b_1\bar{x}$$
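The two LS formulas translate directly into a few lines of Python. In this sketch the function name `least_squares` is ours, and the check data are made up so that they lie exactly on Y = 2 + 3X. Any correct implementation must therefore return b0 = 2 and b1 = 3.

```python
def least_squares(xs, ys):
    """LS coefficients: b1 = Sxy / Sxx and b0 = ybar - b1 * xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))   # sum (x_i - xbar)(y_i - ybar)
    sxx = sum((x - x_bar) ** 2 for x in xs)                        # sum (x_i - xbar)^2
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up check data lying exactly on Y = 2 + 3X
b0, b1 = least_squares([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)   # 2.0 3.0
```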


Example 12.3

Ex. 12.3: Y = snake weight (g), X = snake length (cm)

Notice that the data appear as (xi, yi) pairs:


Example 12.4

Ex. 12.4 (12.3 cont’d): Snake data. A scatterplot shows a clear linear relation:


Example 12.4 (cont’d)

Table 12.3 summarizes the LS calculations:


Example 12.4 – LS Coefficients

From Table 12.3 we see

x̄ = 63, ȳ = 152, ∑(xi − x̄)(yi − ȳ) = 1237 and ∑(xi − x̄)² = 172,

so that the LS coefficients are

b1 = 1237/172 = 7.192 and

b0 = 152 – (7.192)(63) = –301.096.

Thus, the LS line is Ŷ = –301.096 + 7.192X. Note that these operations are available in TI-84/R.
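The same arithmetic can be run from the summary values in Table 12.3. One small point shows up: carrying b1 at full precision gives b0 ≈ −301.087, while the slide’s −301.096 comes from rounding b1 to 7.192 before multiplying by 63. The difference is only rounding.

```python
# Summary values from Table 12.3 (snake data)
x_bar, y_bar = 63, 152
sxy, sxx = 1237, 172

b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept, using the unrounded slope
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"b0 with b1 rounded first = {y_bar - 7.192 * x_bar:.3f}")
```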


Example 12.4 – Interpretations

Interpretation of LS coefficients:

• b1 = 7.192 indicates that a 1 cm increase in snake length corresponds to an estimated 7.192 g increase in snake weight.

• b0 is the estimated weight of a snake whose length is 0 cm. Clearly, this is a poor EXTRAPOLATION (p. 546) away from the bulk of the data. Indeed, would we ever see Y < 0?


Residuals

DEF’N: A PREDICTED VALUE (a.k.a. a FITTED VALUE) is an estimate of yi based on a prediction/fitted regression equation, b0 + b1xi.

NOTATION: ŷi

DEF’N: A RESIDUAL is the departure from Y of a fitted value: Residi = yi − ŷi.

See Figure 12.6


Figure 12.6


SS(Resid.)

DEF’N: The RESIDUAL SUM OF SQUARES (a.k.a. SUM OF SQUARED ERRORS, or SSE) is

$$\text{SS(Resid.)} = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

DEF’N: The LEAST SQUARES CRITERION states that the optimal fit of a model to data occurs when SS(Resid.) is minimized.

Notice that under our linear model,

SS(Resid.) = ∑(yi – b0 – b1xi)².
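Fitted values, residuals, and SS(Resid.) are computed in this Python sketch. The four data points are made up for illustration (they are not the snake data). Their LS line is b1 = Sxy/Sxx = 9.5/5 = 1.9 and b0 = ȳ − b1x̄ = 4.75 − (1.9)(2.5) = 0. Nudging either coefficient away from the LS values makes SS(Resid.) larger, which is the least squares criterion in action. The function names are hypothetical.

```python
def fitted_and_residuals(xs, ys, b0, b1):
    """Fitted values y-hat_i = b0 + b1*x_i and residuals y_i - y-hat_i."""
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    return fitted, resid


def ss_resid(xs, ys, b0, b1):
    """SS(Resid.) = sum of (y_i - b0 - b1*x_i)^2."""
    _, resid = fitted_and_residuals(xs, ys, b0, b1)
    return sum(r * r for r in resid)


xs = [1, 2, 3, 4]          # made-up illustration data
ys = [2, 4, 5, 8]
fitted, resid = fitted_and_residuals(xs, ys, 0.0, 1.9)
print([round(r, 2) for r in resid])           # [0.1, 0.2, -0.7, 0.4]
print(round(ss_resid(xs, ys, 0.0, 1.9), 2))   # 0.7

# Least squares criterion: moving away from the LS coefficients increases SS(Resid.)
print(ss_resid(xs, ys, 0.1, 1.9) > ss_resid(xs, ys, 0.0, 1.9))   # True
print(ss_resid(xs, ys, 0.0, 2.0) > ss_resid(xs, ys, 0.0, 1.9))   # True
```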


Example 12.5

Ex. 12.5 (12.3 cont’d): Snake data. Table 12.4 shows the calculations that lead to SS(Resid.):


Std. Deviation

We can use SS(Resid.) to update the accuracy of our measure of variability. Recall that to estimate the variation of Y we used the SD

$$\text{SD} = S_Y = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}$$

But, this ignores any effect X has on Y. Since SS(Resid.) incorporates the effect of X, it serves as a basis for more accurate estimates of variation.


Residual SD

DEF’N: The RESIDUAL STANDARD DEVIATION from an LS fit is

$$S_{Y|X} = \sqrt{\frac{\text{SS(Resid.)}}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2}}$$

Notice in SY|X that the df have changed from n – 1 to n – 2, now “incorporating” the fitting of 2 model parameters, b0 & b1.


Example 12.6

Ex. 12.6 (12.3 cont’d): Snake data. With n = 9 data pairs, we found SS(Resid.) = 1093.66. Thus

$$S_{Y|X} = \sqrt{\frac{1093.66}{7}} = \sqrt{156.238} = 12.499 \text{ g}$$

Compare this with the larger SD for these data:

$$S_Y = \sqrt{\frac{9990}{9 - 1}} = \sqrt{1248.75} = 35.338 \text{ g}$$
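A minimal Python sketch of the two SD calculations above, using only the summary values quoted on these slides (n = 9, SS(Resid.) = 1093.66, and $\sum (y_i - \bar{y})^2 = 9990$). The function names `sd_y` and `residual_sd` are illustrative, not from any library.

```python
import math

def sd_y(ss_y: float, n: int) -> float:
    """Ordinary SD of Y: ignores X, so it uses n - 1 degrees of freedom."""
    return math.sqrt(ss_y / (n - 1))

def residual_sd(ss_resid: float, n: int) -> float:
    """Residual SD S_{Y|X} from an LS line: two coefficients fitted, so n - 2 df."""
    return math.sqrt(ss_resid / (n - 2))

# Snake data summaries (Ex. 12.6)
n = 9
s_y = sd_y(9990, n)
s_y_given_x = residual_sd(1093.66, n)
print(f"S_Y = {s_y:.3f} g, S_Y|X = {s_y_given_x:.3f} g")  # S_Y = 35.338 g, S_Y|X = 12.499 g
```

The only difference between the two functions is the divisor: n − 1 when X is ignored, n − 2 once the two LS coefficients have been fitted.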


The Linear Statistical Model

To perform inferences in a linear regression, we need a statistical model. We start with:

DEF’N: A CONDITIONAL MEAN is the expected value of a variable, Y, conditional on another variable, X.

NOTATION: $\mu_{Y|X}$

DEF’N: A CONDITIONAL STD. DEVIATION is the SD of a variable, Y, conditional on another variable, X.

NOTATION: $\sigma_{Y|X}$


Linear (Regression) Model

DEF’N: The LINEAR (REGRESSION) MODEL of Y on X assumes

$$Y = \mu_{Y|X} + \varepsilon,$$

where the conditional mean is linear:

$$\mu_{Y|X} = \beta_0 + \beta_1 X,$$

and $\varepsilon$ is a random error term with $\mu_\varepsilon = 0$ and $\sigma_\varepsilon = \sigma_{Y|X}$.


Linear Prediction

In the linear regression model, we use the LS coefficients, $b_0$ & $b_1$, to estimate $\beta_0$ & $\beta_1$, and $S_{Y|X}$ to estimate $\sigma_{Y|X}$.

Thus, in principle we could estimate (or “predict”) $\mu_{Y|X}$ at any X = x via

$$\hat{\mu}_{Y|X=x} = b_0 + b_1 x$$

Careful: when making predictions on $\mu_{Y|X}$ outside the range of the observed $x_i$ values, the “extrapolation” can be very poor!


Example 12.12

Ex. 12.12 (12.3 cont’d): For the snake data, we saw $b_0$ = –301.096 and $b_1$ = 7.192 (and $S_{Y|X}$ = 12.499).

If, e.g., we wished to predict the weight of a snake x = 68 cm long (i.e., estimate $\mu_{Y|X}$ at x = 68), we would use

$$\hat{\mu}_{Y|X=68} = -301.096 + (7.192)(68) = 187.96 \text{ g}$$
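The prediction step is a single line of arithmetic; the sketch below wraps it in a hypothetical helper `predict_mean` that can also warn about extrapolation. The observed range of snake lengths is in Table 12.3 and is not reproduced here, so `x_min`/`x_max` are optional, caller-supplied assumptions.

```python
import warnings
from typing import Optional

def predict_mean(x: float, b0: float, b1: float,
                 x_min: Optional[float] = None,
                 x_max: Optional[float] = None) -> float:
    """Estimated conditional mean: mu-hat_{Y|X=x} = b0 + b1 * x.

    x_min / x_max are the smallest and largest observed X values (optional);
    when both are given, a warning flags extrapolation beyond them.
    """
    if x_min is not None and x_max is not None and not (x_min <= x <= x_max):
        warnings.warn(f"x = {x} lies outside the observed range [{x_min}, {x_max}]; "
                      "extrapolation can be very poor")
    return b0 + b1 * x

mu_hat_68 = predict_mean(68, b0=-301.096, b1=7.192)
print(f"{mu_hat_68:.2f} g")  # 187.96 g
```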


Normal Error Model

$\beta_1$ indicates how the mean of Y changes with (unit) increases in X, and thus has important biological interest.

To make inferences on $\beta_1$ we update the linear statistical model:

$$Y = \mu_{Y|X} + \varepsilon$$

$$\mu_{Y|X} = \beta_0 + \beta_1 X \qquad \text{(conditional mean is linear)}$$

$$\varepsilon \sim N(0, \sigma_{Y|X}^2) \qquad \text{(conditional SD is constant)}$$
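To make the normal error model concrete, here is a minimal simulation sketch. Assumptions (not from the slides): the "true" values β0 = −301.096, β1 = 7.192 and σ_{Y|X} = 12.5 are borrowed from the snake-data estimates purely for scale, the X values are a hypothetical fixed grid of 2000 lengths from 60 to 80 cm (fixed, since the model is conditional on X), and the function names are illustrative. Refitting by least squares should land near the true β1 and σ_{Y|X}, though not exactly on them.

```python
import random

def simulate_normal_error_model(xs, beta0, beta1, sigma, seed=None):
    """Y_i = beta0 + beta1 * x_i + eps_i, with independent eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

def ls_fit(xs, ys):
    """Return LS estimates b0, b1 and the residual SD S_{Y|X} (n - 2 df)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / ss_x
    b0 = y_bar - b1 * x_bar
    ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, (ss_resid / (n - 2)) ** 0.5

# Hypothetical setup: fixed X grid (cm) and "true" parameters chosen for scale only
xs = [60 + 20 * i / 1999 for i in range(2000)]
ys = simulate_normal_error_model(xs, beta0=-301.096, beta1=7.192, sigma=12.5, seed=1)
b0, b1, s_y_given_x = ls_fit(xs, ys)
print(f"b1 = {b1:.3f}, S_Y|X = {s_y_given_x:.2f}")  # near 7.192 and 12.5
```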


Figure 12.9


Confidence Interval for $\beta_1$

Under the normal error model, $b_1$ is unbiased for $\beta_1$, with

$$SE(b_1) = \frac{S_{Y|X}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \frac{S_{Y|X}}{\sqrt{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}}$$

Using this, a 1 – α conf. interval for $\beta_1$ is

$$b_1 \pm t_{\alpha/2}\, SE(b_1)$$

where $t_{\alpha/2}$ has df = n – 2 (same as $S_{Y|X}$).


Example 12.18

Ex. 12.18 (12.3 cont’d). For the Snake Data (n = 9), we had $b_1$ = 7.19 and $S_{Y|X}$ = 12.499. For a 95% conf. interval on $\beta_1$, we need $t_{.05/2} = t_{.025}$ = 2.365 (df = 9 – 2 = 7). Also, from Table 12.3,

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 172$$

The 95% interval is then

$$b_1 \pm t_{.025}\, SE(b_1) = 7.19 \pm (2.365)\frac{12.499}{\sqrt{172}} = 7.19 \pm (2.365)(0.953) = 7.19 \pm 2.25,$$

i.e., 4.94 < $\beta_1$ < 9.44 (g per cm).
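A stdlib-only sketch of the interval calculation in Ex. 12.18. The critical value is passed in from a t table (2.365 for df = 7, as on the slide) rather than computed; `slope_ci` is a hypothetical name.

```python
import math

def slope_ci(b1: float, s_y_given_x: float, ss_x: float, t_crit: float):
    """Return SE(b1) and the interval b1 +/- t_crit * SE(b1).

    SE(b1) = S_{Y|X} / sqrt(sum (x_i - x_bar)^2); t_crit is t_{alpha/2} with n - 2 df.
    """
    se_b1 = s_y_given_x / math.sqrt(ss_x)
    margin = t_crit * se_b1
    return se_b1, (b1 - margin, b1 + margin)

se, (lo, hi) = slope_ci(b1=7.19, s_y_given_x=12.499, ss_x=172, t_crit=2.365)
print(f"SE(b1) = {se:.3f}; 95% CI: ({lo:.2f}, {hi:.2f})")  # SE(b1) = 0.953; 95% CI: (4.94, 9.44)
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, 7)` returns the same critical value (about 2.365), which avoids the table lookup.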


Testing $\beta_1$

• Similarly, we can use the t-dist’n to test $H_o$: $\beta_1 = 0$. (Why 0? At $\beta_1 = 0$, there is no (linear) effect of X on Y.)

• The test statistic is

$$t_s = \frac{b_1 - 0}{SE(b_1)}$$

• Under $H_o$, $t_s \sim t(n-2)$.

We can use either the P-value approach or the rejection region approach.


P-values for Testing $\beta_1$

To test $H_o$: $\beta_1 = 0$ vs.

• $H_A$: $\beta_1 \neq 0$, reject $H_o$ when $P = 2P\{t(n-2) > |t_s|\} \le \alpha$

• $H_A$: $\beta_1 > 0$, reject $H_o$ when $P = P\{t(n-2) > t_s\} \le \alpha$

• $H_A$: $\beta_1 < 0$, reject $H_o$ when $P = P\{t(n-2) < t_s\} \le \alpha$


Rejection Regions for Testing $\beta_1$

To test $H_o$: $\beta_1 = 0$ using rejection regions, if:

• $H_A$: $\beta_1 \neq 0$, reject $H_o$ when $|t_s| \ge t_{\alpha/2}$ (with df = n – 2)

• $H_A$: $\beta_1 > 0$, reject $H_o$ when $t_s \ge t_{\alpha}$ (with df = n – 2)

• $H_A$: $\beta_1 < 0$, reject $H_o$ when $t_s \le -t_{\alpha}$ (with df = n – 2)
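The three rejection rules can be written as one small function. This is a sketch: the `alternative` labels are illustrative naming, and the critical value is supplied from a t table with df = n − 2 rather than computed. For the usage line, t_{.05} = 1.895 is the standard one-sided table value for df = 7.

```python
def reject_h0(ts: float, t_crit: float, alternative: str) -> bool:
    """Rejection-region decision for Ho: beta1 = 0, given t_s = b1 / SE(b1).

    t_crit is the upper-tail table value with df = n - 2:
    pass t_{alpha/2} for "two-sided", and t_{alpha} for "greater" or "less".
    """
    if alternative == "two-sided":   # H_A: beta1 != 0
        return abs(ts) >= t_crit
    if alternative == "greater":     # H_A: beta1 > 0
        return ts >= t_crit
    if alternative == "less":        # H_A: beta1 < 0
        return ts <= -t_crit
    raise ValueError('alternative must be "two-sided", "greater", or "less"')

# Snake data (Ex. 12.18): t_s = 7.545, alpha = 0.05, df = 7, one-sided t_{.05} = 1.895
decision = reject_h0(7.545, 1.895, "greater")
print(decision)  # True
```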


Example 12.18 (cont’d)

Ex. 12.18 (cont’d): For the Snake Data, set α = 0.05. A natural alternative to $H_o$: $\beta_1 = 0$ is $H_A$: $\beta_1 > 0$ (why?), so if the linear regression model is valid, we find

$$t_s = \frac{b_1}{SE(b_1)} = \frac{7.19}{12.499/\sqrt{172}} = \frac{7.19}{0.953} = 7.545$$

with df = n – 2 = 9 – 2 = 7.


Example 12.18 – P-value

For the P-value, use Table 4: $P\{t(7) \ge 5.408\} = 0.0005$, and since 7.545 > 5.408, $P\{t(7) \ge 7.545\} < 0.0005$.

So, P < 0.0005 < 0.05 = α: we reject $H_o$ and conclude that mean snake weight increases significantly with increasing snake length.

One can find the exact value P ≈ 0.00007 via a TI-84 or R.
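The exact P-value quoted above comes from software. As a stdlib-only cross-check, the sketch below uses the classical closed-form finite series for the Student t distribution with integer df, written in terms of θ = arctan(t/√df); the function names are hypothetical. In R, `pt(7.545, df = 7, lower.tail = FALSE)` computes the same upper-tail area.

```python
import math

def t_central_area(t: float, df: int) -> float:
    """A(t|df) = P{-|t| < T < |t|} for Student's t with integer df >= 1.

    Closed-form finite series in theta = atan(|t| / sqrt(df)); no special functions needed.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    cos_sq = math.cos(theta) ** 2
    series = 0.0
    if df % 2 == 1:
        # odd df: (2/pi) * {theta + sin(theta) * [cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ...]}
        term = math.cos(theta)
        for k in range(1, (df - 1) // 2 + 1):
            series += term
            term *= cos_sq * (2 * k) / (2 * k + 1)
        return (2 / math.pi) * (theta + math.sin(theta) * series)
    # even df: sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
    term = 1.0
    for k in range(1, df // 2 + 1):
        series += term
        term *= cos_sq * (2 * k - 1) / (2 * k)
    return math.sin(theta) * series

def t_upper_tail(t: float, df: int) -> float:
    """P{t(df) > t} for any real t."""
    tail_beyond_abs = (1 - t_central_area(t, df)) / 2   # P{T > |t|}
    return tail_beyond_abs if t >= 0 else 1 - tail_beyond_abs

# Ex. 12.18: one-sided test (H_A: beta1 > 0), t_s = 7.545 with df = 7
p_value = t_upper_tail(7.545, 7)
print(f"P = {p_value:.5f}")  # P = 0.00007
```

The same function reproduces the Table 4 entry used on the slide: `t_upper_tail(5.408, 7)` is about 0.0005.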


$r^2$

DEF’N: The COEFFICIENT OF DETERMINATION is

$$r^2 = \frac{\left[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

Properties of $r^2$:
• 0 ≤ $r^2$ ≤ 1
• Interpret as the % of variation in Y that is explained by variation in X (through the fitted line)
• BADLY over-used


Example 12.22

Ex. 12.22 (12.3 cont’d): Snake data. From Table 12.3 we can find

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 1237, \quad \sum_{i=1}^{n} (x_i - \bar{x})^2 = 172, \quad \text{and} \quad \sum_{i=1}^{n} (y_i - \bar{y})^2 = 9990,$$

so $r^2 = (1237)^2/[(172)(9990)] = 0.8905$ (≈ 89% of variation in snake weight is explained by variation in snake length).
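A short sketch reproducing $r^2$ from the three sums in Table 12.3. It also uses the standard LS identity SS(Resid.) = $\sum(y_i-\bar y)^2 - [\sum(x_i-\bar x)(y_i-\bar y)]^2/\sum(x_i-\bar x)^2$, which recovers (up to rounding) the SS(Resid.) = 1093.66 used in Ex. 12.6 and shows that $r^2 = 1 - \text{SS(Resid.)}/\sum(y_i-\bar y)^2$. The helper name `r_squared` is illustrative.

```python
def r_squared(s_xy: float, ss_x: float, ss_y: float) -> float:
    """Coefficient of determination from the three centered sums."""
    return s_xy ** 2 / (ss_x * ss_y)

# Snake data sums (Table 12.3)
s_xy, ss_x, ss_y = 1237, 172, 9990
r2 = r_squared(s_xy, ss_x, ss_y)

# Standard LS identity: SS(Resid.) = SS_Y - S_XY^2 / SS_X, so r^2 = 1 - SS(Resid.) / SS_Y
ss_resid = ss_y - s_xy ** 2 / ss_x
print(round(r2, 4), round(1 - ss_resid / ss_y, 4), round(ss_resid, 2))  # 0.8905 0.8905 1093.67
```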


Random Predictors

The linear regression model with $\mu_{Y|X} = \beta_0 + \beta_1 X$ is conditional on X. This is an important distinction.

If X is itself random (which is not uncommon in biology), inferences on $\beta_1$ and/or prediction on $\mu_Y$ are invalid (uninterpretable, really) unless we impose the conditioning.

For cases when we focus on the joint relation between X and Y, rather than predicting Y from X, we need a different statistical model.


Bivariate Model

DEF’N: The BIVARIATE RANDOM SAMPLING MODEL views the pairs $(X_i, Y_i)$ as joint random variables, sampled from a popl’n of pairs with means $\mu_X$, $\mu_Y$, SD’s $\sigma_X$, $\sigma_Y$, and a population correlation parameter, $\rho$.

In this model, –1 ≤ $\rho$ ≤ 1, and $\rho$ measures the level of linear dependence between X and Y:
• $\rho \approx \pm 1$ ⇒ X & Y highly (linearly) dependent/related
• $\rho = 0$ ⇒ X & Y uncorrelated (no linear relation); under bivariate normality, this means X & Y are independent


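Sample Correlation

To estimate $\rho$ we use the SAMPLE CORRELATION COEFFICIENT

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \,\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Computing formula:

$$r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]\left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$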
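To check that the definitional and computing formulas agree, here is a sketch that evaluates both on a small hypothetical data set (five made-up (x, y) pairs, for illustration only). The function names are illustrative.

```python
import math

def pearson_r_definitional(xs, ys):
    """r from centered sums: S_XY / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(ss_x * ss_y)

def pearson_r_computing(xs, ys):
    """r from raw sums, following the computing formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))

# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_def = pearson_r_definitional(xs, ys)
r_comp = pearson_r_computing(xs, ys)
print(round(r_def, 4), round(r_comp, 4))  # 0.7746 0.7746
```

The computing formula subtracts large, nearly equal quantities, so with big or poorly scaled values it can lose precision; the centered (definitional) form is the safer default. On Python 3.10+, `statistics.correlation(xs, ys)` also returns Pearson's r.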


Properties of r

Properties of the sample correlation coefficient, r:
• $r = \pm\sqrt{r^2}$, taking the sign of $b_1$
• as n → ∞, E[r] ≈ $\rho$
• related to LS regression coeffs.: $b_1 = r\, S_Y / S_X$
• test of $H_o$: $\beta_1 = 0$ is numerically equivalent to a test of $H_o$: $\rho = 0$; use

$$t_s = \frac{b_1}{SE(b_1)} = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$$

• see plotted illustrations in Fig. 12.15
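The equivalence of the two t statistics, and the identity $b_1 = r\,S_Y/S_X$, can be checked numerically with the snake-data sums (n = 9; 1237, 172, and 9990 from Table 12.3). A minimal sketch: working from the unrounded sums gives about 7.546, slightly above the slide's 7.545, which used rounded intermediate values.

```python
import math

# Snake data (n = 9): centered sums from Table 12.3
n, s_xy, ss_x, ss_y = 9, 1237.0, 172.0, 9990.0

# Regression route: t_s = b1 / SE(b1), with SE(b1) = S_{Y|X} / sqrt(SS_X)
b1 = s_xy / ss_x
ss_resid = ss_y - s_xy ** 2 / ss_x              # SS(Resid.) of the LS line
se_b1 = math.sqrt(ss_resid / (n - 2)) / math.sqrt(ss_x)
ts_regression = b1 / se_b1

# Correlation route: t_s = r * sqrt(n - 2) / sqrt(1 - r^2)
r = s_xy / math.sqrt(ss_x * ss_y)
ts_correlation = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# b1 = r * S_Y / S_X, with sample SDs (the n - 1 divisors cancel)
s_x, s_y = math.sqrt(ss_x / (n - 1)), math.sqrt(ss_y / (n - 1))

print(round(ts_regression, 3), round(ts_correlation, 3))  # 7.546 7.546
print(round(b1, 3), round(r * s_y / s_x, 3))              # 7.192 7.192
```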


Figure 12.15


Example 12.27

Ex. 12.27 (from Ex. 12.19): Is calcium in blood related to blood pressure?

Y = calcium conc. in blood platelets
X = b.p. (avg. of diastolic & systolic)


Example 12.27 (cont’d)

We are told that

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 2792.5, \quad \sum_{i=1}^{n} (x_i - \bar{x})^2 = 2397.5, \quad \text{and} \quad \sum_{i=1}^{n} (y_i - \bar{y})^2 = 9562.97$$

So,

$$r = \frac{2792.5}{\sqrt{(2397.5)(9562.97)}} = 0.5832$$

estimates the population correlation $\rho$.


Regression vs. Correlation

The best way to contrast regression and correlation is to:

• use (conditional) regression analysis when prediction of Y from X is desired, but

• use correlation analysis when association between Y and X is under study.


Bivariate Normal Model

We can build 1 – α conf. intervals on $\rho$ if we extend the model to include bivariate normality.

Assume the pair (X, Y) is jointly (bivariate) normal, with $Y \sim N(\mu_Y, \sigma_Y^2)$, $X \sim N(\mu_X, \sigma_X^2)$, and Corr(X, Y) = $\rho$.

Unfortunately, there is no easy way to build good intervals directly on $\rho$. Instead, we transform between different scales for $\rho$.


Fisher Z-Transform

DEF’N: The FISHER Z-TRANSFORM is

$$Z(r) = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right),$$

with INVERSE Z-TRANSFORM

$$r = \frac{e^{2Z} - 1}{e^{2Z} + 1}.$$

Under the bivariate normal model, approximately

$$Z(r) \sim N\left(Z(\rho),\ \frac{1}{n-3}\right).$$


Confidence Interval on $\rho$

Using the Z-transform, we can build a conf. interval on $Z(\rho)$:

$$Z(r) \pm z_{\alpha/2}\frac{1}{\sqrt{n-3}}$$

Then, invert this into a 1 – α conf. intv’l on $\rho$:

$$\frac{\exp\left\{2\left[Z(r) - z_{\alpha/2}\frac{1}{\sqrt{n-3}}\right]\right\} - 1}{\exp\left\{2\left[Z(r) - z_{\alpha/2}\frac{1}{\sqrt{n-3}}\right]\right\} + 1} < \rho < \frac{\exp\left\{2\left[Z(r) + z_{\alpha/2}\frac{1}{\sqrt{n-3}}\right]\right\} - 1}{\exp\left\{2\left[Z(r) + z_{\alpha/2}\frac{1}{\sqrt{n-3}}\right]\right\} + 1}$$


Example 12.30

Ex. 12.30 (12.27 cont’d): Calcium/b.p. data. n = 38 with r = 0.5832. The Z-transform gives

$$Z(0.5832) = \frac{1}{2}\ln\left(\frac{1.5832}{0.4168}\right) = \frac{1}{2}\ln(3.7985) = \frac{1.3346}{2} = 0.6673$$

So, a 95% conf. interval on $Z(\rho)$ is

$$0.6673 \pm z_{.025}\frac{1}{\sqrt{35}} = 0.6673 \pm (1.96)(0.169) = 0.6673 \pm 0.3313,$$

or $0.3360 < Z(\rho) < 0.9986$.

Don’t stop here!


Example 12.30 – Conf. Limits

Now apply the inverse Z-transform:

$$\text{lower limit on } \rho \text{ is } \frac{e^{2(0.3360)} - 1}{e^{2(0.3360)} + 1} = \frac{1.958 - 1}{1.958 + 1} = \frac{0.958}{2.958} = 0.32$$

$$\text{upper limit on } \rho \text{ is } \frac{e^{2(0.9986)} - 1}{e^{2(0.9986)} + 1} = \frac{7.368 - 1}{7.368 + 1} = \frac{6.368}{8.368} = 0.76$$

So, report 0.32 < $\rho$ < 0.76.
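The whole Fisher-Z procedure fits in a few lines, since $Z(r) = \tfrac12\ln\frac{1+r}{1-r}$ is exactly $\operatorname{artanh}(r)$ and the inverse transform $(e^{2Z}-1)/(e^{2Z}+1)$ is $\tanh(Z)$. A sketch, assuming Python 3.8+ for `statistics.NormalDist` (used for $z_{\alpha/2}$); `fisher_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist

def fisher_ci(r: float, n: int, conf: float = 0.95) -> tuple:
    """Approximate conf. interval for rho under the bivariate normal model.

    Z(r) = 0.5 * ln((1 + r) / (1 - r)) equals atanh(r), and the inverse
    transform (e^{2Z} - 1) / (e^{2Z} + 1) equals tanh(Z).
    """
    z_r = math.atanh(r)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    half_width = z_crit / math.sqrt(n - 3)              # SE of Z(r) is 1/sqrt(n - 3)
    return math.tanh(z_r - half_width), math.tanh(z_r + half_width)

# Ex. 12.27 / 12.30: calcium vs. blood pressure, n = 38
r = 2792.5 / math.sqrt(2397.5 * 9562.97)
lo, hi = fisher_ci(r, n=38)
print(f"r = {r:.4f}; 95% CI for rho: ({lo:.2f}, {hi:.2f})")  # r = 0.5832; 95% CI for rho: (0.32, 0.76)
```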


Notes on r

Some final notes on r:
• Always plot the data! Why? Because r is VERY sensitive to extreme observations and outliers (see Fig. 12.19), so BE CAREFUL!
• r is also known as the Pearson Product-Moment Correlation Coefficient.
• A distribution-free version of r exists, known as Spearman’s Rank Correlation Coefficient.
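Spearman's coefficient is Pearson's r computed on the ranks of the data (with tied values given their average rank). The sketch below, on hypothetical made-up data with one extreme Y value, shows why the note about outliers matters: the outlier pulls Pearson's r well below 1, while the rank-based coefficient is unaffected. Function names are illustrative; if SciPy is available, `scipy.stats.spearmanr` computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation from centered sums."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: a perfectly monotone pattern, with one extreme Y value last
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 2, 3, 4, 5, 6, 7, 100]
r_p = pearson_r(xs, ys)
print(f"Pearson r = {r_p:.3f}, Spearman = {spearman_rho(xs, ys):.3f}")  # Pearson r = 0.624, Spearman = 1.000
```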


Figure 12.19