Page 1:

Nonparametric Statistical Methods

Svetlana Stoyanchev, Luke Schordine, Li Ouyang, Valencia Joseph, Minghui Lu, Rachel Merrill, Kleva Costa, Michael Johnes, Jane Cerise

Statistics and Data Analysis

Chapter 14, December 13, 2007

Page 2:

Why use nonparametric methods?

Lawyers’ income (http://www.nalp.org/)

Make very few assumptions about the data distribution:
• Ordinal scale data can be analyzed
• Not all data are normally distributed

Page 3:

Inference on a single sample: Sign Test

Use the median μ as the measure of center instead of the mean.

s+ = the number of xi's that exceed μ0
s- = n - s+

Reject H0 if s+ is large (or s- is small). How large should s+ be in order to reject H0 at a given significance level α?

H0: μ = μ0 vs. H1: μ > μ0

[Figure: a skewed distribution with its mean and median marked]

Page 4:

Random sample: X1, X2, …, Xn from a distribution with median μ

Let P(Xi > μ0) = p and P(Xi < μ0) = 1 - p

H0: μ = μ0 vs. H1: μ > μ0

is equivalent to

H0: p = 1/2 vs. H1: p > 1/2

S+ ~ Bin(n, p),  S- ~ Bin(n, 1 - p)

Apply the test of binomial proportion from Chapter 9!

Inference on a single sample: Sign Test

(S+ and S- denote the random variables corresponding to the observed counts s+ and s-; when H0 is true, p = 1/2 and both are Bin(n, 1/2).)
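Since S+ ~ Bin(n, 1/2) under H0, the exact one-sided p-value for an observed count s+ is just the binomial upper-tail probability (for a two-sided test, double the smaller tail):

$$P\text{-value} = P(S^+ \ge s^+ \mid H_0) = \sum_{k = s^+}^{n} \binom{n}{k} \left(\frac{1}{2}\right)^{n}$$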

Page 5:

Inference on a single sample: Sign Test

Rejection criterion for H0:

One-sided test: H0: μ = μ0 vs. H1: μ > μ0 (or μ < μ0)
Reject H0 if s+ (respectively s-) is greater than or equal to the upper α critical point of the Bin(n, 1/2) distribution.

Two-sided test: H0: μ = μ0 vs. H1: μ ≠ μ0
Reject H0 if smax is greater than or equal to the upper α/2 critical point of the Bin(n, 1/2) distribution or, equivalently, if smin is less than or equal to the lower α/2 critical point, where

smax = max(s+, s-),  smin = min(s+, s-),  0 ≤ smin ≤ smax ≤ n.

Page 6:

When n > 20, the distribution of S+ (and S-) can be approximated by a normal distribution with mean n/2 and variance n/4.

Can use a Z-test with Z statistic (the 1/2 is a continuity correction):

z = (s+ - n/2 - 1/2) / sqrt(n/4)

Reject H0 when z ≥ zα (upper one-sided test) or when |z| ≥ zα/2 (two-sided test).

Inference on a single sample: Sign Test

Page 7:

Confidence interval for μ

Ordered data values: x(1) ≤ x(2) ≤ … ≤ x(n)

Let b be the lower α/2 critical point of the Bin(n, 1/2) distribution, i.e. P(S ≤ b) ≈ α/2 for S ~ Bin(n, 1/2) (because of discreteness, only certain levels are attainable). Then a (1 − α)-level CI for μ is

[ x(b+1) , x(n−b) ].

Example: temperature measurements (ordered):
198.0 199.0 200.5 200.8 201.3 202.2 202.5 203.4 203.7 206.3

H0: μ = 200 vs. H1: μ ≠ 200

Compute a 95% CI for the median temperature. Because of the discreteness of the binomial distribution, an exact 95% CI cannot be found, so we use the closest attainable level.

The lower 1.1% critical point is 1 (from Table A.1 with n = 10, p = .5); the upper 1.1% critical point is 9 (by symmetry). With α/2 = 0.011, the attained confidence level is 1 − 0.022 = 0.978.

97.8% CI = [ x(2), x(9) ] = [199.0, 203.7]
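The same sign test and a distribution-free CI for the median can be reproduced in SAS with PROC UNIVARIATE (a minimal sketch; the data set name temp and variable name x are ours):

data temp;
   input x @@;
   datalines;
198.0 199.0 200.5 200.8 201.3 202.2 202.5 203.4 203.7 206.3
;

* MU0=200 requests the tests of location (including the sign test) of H0: median = 200;
* CIPCTLDF requests distribution-free confidence limits for the percentiles,
  including the median, based on order statistics as above;
proc univariate data=temp mu0=200 cipctldf;
   var x;
run;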

Page 8:

14.1.2: Wilcoxon Signed Rank Test

Designed by Frank Wilcoxon (1892-1965) to improve on the Sign Test

Takes into account not only whether xi is greater or less than μ0, but also the magnitude of the difference

di = xi − μ0.

Page 9:

Frank Wilcoxon: The Man Behind the Test

• Born in Ireland, grew up in the Catskills in New York
• Earned a B.S. at Penn. Military Academy, a master's at Rutgers, and a Ph.D. at Cornell, all in chemistry
• Worked as a research scientist at several laboratories
• Became interested in statistical methods after reading R.A. Fisher's Statistical Methods for Research Workers
• In response to Fisher's Student's t-tests, he developed nonparametric tests for paired and unpaired data sets

Source: http://www.wikipedia.org

Page 10:

Wilcoxon Signed Rank Test

Wilcoxon's one-sample (paired-difference) test; it assumes the distribution is symmetric about the median.

Assign a rank ri to each difference di based on its absolute value |di|, with the smallest |di| receiving rank 1.

Take the sums of the ranks of the positive and negative differences (w+ and w-, respectively). Writing Zi = 1 if the difference with rank i is positive and Zi = 0 otherwise,

W+ = Σ_{i=1}^{n} i·Zi

and, under H0 (so that E(Z1) = E(Z2) = … = E(Zn) = 1/2),

E(W+) = E(Σ i·Zi)
      = E(1·Z1 + 2·Z2 + … + n·Zn)
      = 1·E(Z1) + 2·E(Z2) + … + n·E(Zn)
      = (1 + 2 + 3 + … + n)·E(Z1)
      = n(n+1)/4.

Page 11:

Wilcoxon Signed Rank Test

For large n the test uses the standard normal (Z) distribution. Under H0,

E(W+) = n(n+1)/4,   Var(W+) = n(n+1)(2n+1)/24,

and the test statistic (with continuity correction) is

z = [ w+ − n(n+1)/4 − 1/2 ] / sqrt( n(n+1)(2n+1)/24 ).

Page 12:

Wilcoxon Signed Rank Test

For paired data, use a two-tailed Z-test of H0: the median difference μ = 0.

Reject H0 if |z0| ≥ zα/2.

Advantages:
• Uses the ranks of the differences, not just their signs
• Leads to increased power relative to the sign test

Disadvantages:
• Symmetry about the median is assumed but may not hold
• When the symmetry assumption fails, the Type I error rate can be inflated

Page 13:

Worked example: paired data for 16 subjects.

Subj. i   XA   XB   di = XA − XB   |di|   rank ri   signed rank Wi
  1       78   78        0           0      --          --
  2       24   24        0           0      --          --
  3       64   62       +2           2       1          +1
  4       45   48       −3           3       2          −2
  5       64   68       −4           4       3.5        −3.5
  6       52   56       −4           4       3.5        −3.5
  7       30   25       +5           5       5          +5
  8       50   44       +6           6       6          +6
  9       64   56       +8           8       7          +7
 10       50   40      +10          10       8.5        +8.5
 11       78   68      +10          10       8.5        +8.5
 12       22   36      −14          14      10         −10
 13       84   68      +16          16      11         +11
 14       40   20      +20          20      12         +12
 15       90   58      +32          32      13         +13
 16       72   32      +40          40      14         +14

W = Σ Wi = 67.0,   N = 14 (the two zero differences are dropped)

Adapted from http://faculty.vassar.edu/lowry/ch12a.html

Page 14:

Wilcoxon Signed Rank Test: Example

For the present example, with N = 14, W = 67, and σW = sqrt( N(N+1)(2N+1)/6 ) = 31.86, the result is

z = (W − 0.5) / σW = (67 − 0.5) / 31.86 = 2.09.

(Subtracting 0.5 is a continuity correction, applied because the observed W is greater than μW = 0.)

Since Z0>Zα/2=1.96, reject the null hypothesis.

Page 15:

Wilcoxon Signed Rank Test: Confidence Interval

To get a 95% CI for the center of the distribution (the median, which equals the mean under the symmetry assumption), take all pairwise (Walsh) averages of the data:

X̄ij = (xi + xj) / 2,   1 ≤ i ≤ j ≤ n.

Order these N = n(n+1)/2 Walsh averages from least to greatest.

(1 − α)-level CI: [ X̄(w+1) , X̄(N−w) ], where w is the lower α/2 critical point of the null distribution of the signed rank statistic W+.

Page 16:

14.2 Inferences for Two Independent Samples

By Li Ouyang

Page 17:

Problem: is one population stochastically larger than another? How to solve? Two equivalent nonparametric tests: the Wilcoxon rank sum test and the Mann-Whitney U-test.

14.2.1 Wilcoxon-Mann-Whitney Test

1st: the Wilcoxon rank sum test. Assumption: no ties in the two samples x1, x2, …, xn1 and y1, y2, …, yn2.

1. Rank all N = n1 + n2 observations in ascending order.
2. Denote w1 = sum of the ranks of the x's and w2 = sum of the ranks of the y's. Since the ranks range over the integers 1, 2, …, N, we have

w1 + w2 = 1 + 2 + … + N = N(N+1)/2.

3. Reject H0 if w1 is large (or if w2 is small).

Note: at significance level α, when n1 ≠ n2 the null distributions of W1 and W2 are not the same.

Page 18:

2nd: the Mann-Whitney U-test.

1. Compare each xi with each yj. Let u1 = number of pairs with xi > yj and u2 = number of pairs with xi < yj, so that u1 + u2 = n1n2.
2. Reject H0 if u1 is large (or u2 is small).

The two test statistics are related as follows:

u1 = w1 − n1(n1+1)/2,   u2 = w2 − n2(n2+1)/2

Advantage of the Mann-Whitney form: U1 and U2 have the same null distribution, with range [0, n1n2].

P-value = P{U ≥ u1} = P{U ≤ u2}. At significance level α, we reject H0 if the P-value ≤ α or, equivalently, if u1 ≥ u_{n1,n2,α}, where u_{n1,n2,α} denotes the upper α critical point of the null distribution of U.

Page 19:

For large n1 and n2, the null distribution of U is approximately normal with

E(U) = n1n2/2,   Var(U) = n1n2(N+1)/12.

Z-test (large sample). Test statistic (the 1/2 is a continuity correction):

z = [ u1 − n1n2/2 − 1/2 ] / sqrt( n1n2(N+1)/12 )

We reject H0 at significance level α if z ≥ zα or, equivalently, if

u1 ≥ u_{n1,n2,α} ≈ n1n2/2 + 1/2 + zα · sqrt( n1n2(N+1)/12 ).

Two-sided test. Test statistics: umax = max(u1, u2) and umin = min(u1, u2), with

P-value = 2 P{U ≥ umax} = 2 P{U ≤ umin}.

Page 20:

Example: Failure Times of Capacitors (Wilcoxon-Mann-Whitney Test)

18 capacitors: 8 in a control group and 10 in a thermally stressed group. Perform the Wilcoxon-Mann-Whitney test to determine whether thermal stress significantly reduces the time to failure of the capacitors; α = 0.05.

n1 = 8, n2 = 10. The rank sums are
w1 = 4 + 8 + 10 + 11 + 13 + 14 + 17 + 18 = 95
w2 = 1 + 2 + 3 + 5 + 6 + 7 + 9 + 12 + 15 + 16 = 76

Times to failure (and ranks) for the two capacitor groups:

Control group:  5.2 (4)   8.5 (8)   9.8 (10)  12.3 (11)  17.1 (13)  17.9 (14)  23.7 (17)  29.8 (18)
Stressed group: 1.1 (1)   2.3 (2)   3.2 (3)    6.3 (5)    7.0 (6)    7.2 (7)    9.1 (9)   15.2 (12)  18.3 (15)  21.1 (16)

u1 = w1 − n1(n1+1)/2 = 95 − (8)(9)/2 = 59
u2 = w2 − n2(n2+1)/2 = 76 − (10)(11)/2 = 21

Page 21:

Let F1 be the c.d.f. of the control group and F2 the c.d.f. of the stressed group.

H0: F1 = F2  vs.  H1: F1 < F2 (thermal stress reduces the time to failure)

Check that u1 + u2 = n1n2 = 80. From Table A.11, P-value = 0.051.

Large-sample Z-test:

z = [ u1 − n1n2/2 − 1/2 ] / sqrt( n1n2(N+1)/12 )
  = [ 59 − (8)(10)/2 − 1/2 ] / sqrt( (8)(10)(19)/12 )
  = 1.643

Conclusion: this yields P-value = 1 − Φ(1.643) = 0.0502.

Table A.11: Upper-Tail Probabilities of the Null Distribution of the Wilcoxon-Mann-Whitney Statistic

n1   n2   w1    u1   P(W ≥ w1) = P(U ≥ u1)
 8    8   84    48   0.052
 8    8   87    51   0.025
 8    8   90    54   0.010
 8    8   92    56   0.005
 8    9   89    53   0.057
 8    9   93    57   0.023
 8    9   96    60   0.010
 8    9   98    62   0.006
 8   10   95    59   0.051
 8   10   98    62   0.027
 8   10  102    66   0.010
 8   10  104    68   0.006

Page 22:

Null distribution of the Wilcoxon-Mann-Whitney Test Statistic

Two r.v.'s, X and Y, with c.d.f.'s F1 and F2, respectively.

Assumption: under H0, all N = n1 + n2 observations come from the common distribution F1 = F2. Therefore, all possible orderings of these observations, with n1 coming from F1 and n2 coming from F2, are equally likely. There are

N! / (n1! n2!)

such orderings in total. For example, with n1 = 2 and n2 = 3 there are 5!/(2!3!) = 10 orderings.

All possible orderings for n1 = 2, n2 = 3 (ranks 1-5):

Ordering     w1   u1        Ordering     w1   u1
x x y y y     3    0        y x y x y     6    3
x y x y y     4    1        y x y y x     7    4
x y y x y     5    2        y y x x y     7    4
x y y y x     6    3        y y x y x     8    5
y x x y y     5    2        y y y x x     9    6

Null distribution of W1 and U1 (n1 = 2, n2 = 3):

w1   u1   P(W1 = w1) = P(U1 = u1)
 3    0    0.1
 4    1    0.1
 5    2    0.2
 6    3    0.2
 7    4    0.2
 8    5    0.1
 9    6    0.1

Page 23:

14.2.2 Wilcoxon-Mann-Whitney Confidence Interval

Assumption: F1 and F2 belong to a location parameter family with location parameters θ1 and θ2 (the respective population medians):

F1(x) = F(x − θ1)  and  F2(y) = F(y − θ2),

where F is a common but unknown distribution function.

How to calculate a CI for θ1 − θ2?

Step 1: Calculate all N = n1n2 pairwise differences dij = xi − yj (1 ≤ i ≤ n1, 1 ≤ j ≤ n2) and order them: d(1) ≤ d(2) ≤ … ≤ d(N).

Step 2: Let u = u_{n1,n2,1−α/2} be the lower α/2 critical point of the null distribution of U.

The 100(1 − α)% CI for θ1 − θ2 is given by d(u+1) ≤ θ1 − θ2 ≤ d(N−u).
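In recent SAS releases this interval can also be obtained directly: the HL option of PROC NPAR1WAY computes the Hodges-Lehmann estimate of the shift θ1 − θ2 together with its confidence interval. A minimal sketch, assuming the failure times of the next slide have been read into a data set named capacitor with variables group and failtime (names are ours):

proc npar1way data=capacitor wilcoxon hl alpha=0.05;
   class group;       * control vs. stressed;
   var failtime;      * HL reports the estimate and CI for the location shift;
run;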

Page 24:

Example: Find a 95% CI for the difference between the median failure times of the control group and the thermally stressed group of capacitors.

n1 = 8, n2 = 10, N = n1n2 = 80.

The lower 2.2% critical point of the distribution of U is 17, and the upper 2.2% critical point is 80 − 17 = 63.

α/2 = 0.022, so 1 − α = 1 − 0.044 = 0.956.

Therefore [d(18), d(63)] = [−1.1, 14.7] is a 95.6% CI.

Differences dij = xi − yj between the two groups (rows: control xi; columns: stressed yj)

 xi \ yj   1.1   2.3   3.2   6.3   7.0   7.2   9.1  15.2  18.3  21.1
  5.2      4.1   2.9   2.0  -1.1  -1.8  -2.0  -3.9 -10.0 -13.1 -15.9
  8.5      7.4   6.2   5.3   2.2   1.5   1.3  -0.6  -6.7  -9.8 -12.6
  9.8      8.7   7.5   6.6   3.5   2.8   2.6   0.7  -5.4  -8.5 -11.3
 12.3     11.2  10.0   9.1   6.0   5.3   5.1   3.2  -2.9  -6.0  -8.8
 17.1     16.0  14.8  13.9  10.8  10.1   9.9   8.0   1.9  -1.2  -4.0
 17.9     16.8  15.6  14.7  11.6  10.9  10.7   8.8   2.7  -0.4  -3.2
 23.7     22.6  21.4  20.5  17.4  16.7  16.5  14.6   8.5   5.4   2.6
 29.8     28.7  27.5  26.6  23.5  22.8  22.6  20.7  14.6  11.5   8.7

Table A.11 excerpt:

n1   n2   u1 (80 − u1)   P(W ≥ w1) = P(U ≥ u1)
 8   10   59 (21)        0.051
 8   10   62 (18)        0.027
 8   10   63 (17)        0.022
 8   10   66 (14)        0.010
 8   10   68 (12)        0.006

Page 25:

Example Using SAS

Two groups, A and B. Both groups are exposed to a chemical that encourages tumor growth; group B has been treated with a drug to prevent tumor formation.

The masses (in grams) of tumors in each group are

Group A: 3.1 2.2 1.7 2.7 2.5

Group B: 0.0 0.0 1.0 2.3

We want to see if there are any differences in tumor mass between group A & B.

Thus we will use the Wilcoxon test: put all the data in increasing order and calculate the ranks.

Mass: 0.0 0.0 1.0 1.7 2.2 2.3 2.5 2.7 3.1

Group: B B B A A B A A A

Rank: 1.5 1.5 3 4 5 6 7 8 9
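As a quick check before running SAS, the rank sum for group B can be read directly from the ranks above; it is the Wilcoxon statistic that PROC NPAR1WAY reports below:

$$w_B = 1.5 + 1.5 + 3 + 6 = 12$$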

Page 26:

SAS Program

data Tumor;
   input Group $ Mass @@;
   datalines;
A 3.1 A 2.2 A 1.7 A 2.7 A 2.5
B 0.0 B 0.0 B 1.0 B 2.3
;

proc npar1way data=Tumor wilcoxon;
   title "Non Parametric Test to Compare Tumor Masses";
   class Group;
   var Mass;
   exact wilcoxon;
run;

proc univariate data=Tumor normal plot;
   title "More Descriptive Statistics";
   class Group;
   var Mass;
run;

Page 27:

The NPAR1WAY Procedure

Wilcoxon Scores (Rank Sums) for Variable Mass Classified by Variable Group

Group   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
A       5        33.0             25.0              4.065437          6.60
B       4        12.0             20.0              4.065437          3.00

Wilcoxon Two-Sample Test

Statistic                         12.0000

Normal Approximation
  Z                               -1.8448
  One-Sided Pr < Z                 0.0325
  Two-Sided Pr > |Z|               0.0651

t Approximation
  One-Sided Pr < Z                 0.0511
  Two-Sided Pr > |Z|               0.1023

Exact Test
  One-Sided Pr <= S                0.0317
  Two-Sided Pr >= |S - Mean|       0.0635

Z includes a continuity correction of 0.5.

Kruskal-Wallis Test

Chi-Square 3.8723

DF 1

PR>Chi-Square 0.0491

Page 28:

The Univariate Procedure

Tests for Location: Mu0=0

Tests Statistic p Value

Student’s t T 10.3479 Pr > |t| 0.0005

Sign M 2.5 Pr >= |M| 0.0625

Signed Rank S 7.5 Pr >= |S| 0.0625

Page 29:

INFERENCES FOR SEVERAL INDEPENDENT SAMPLES

- The Kruskal-Wallis test is a generalization of the Wilcoxon-Mann-Whitney test for a ≥ 2 independent samples
- It is also a nonparametric alternative to the ANOVA F-test for a one-way layout

Page 30:

The steps of the test:

1) First rank all N observations from smallest to largest, giving tied values the average of the ranks they would otherwise receive (the average of all N ranks is (N+1)/2).

2) Calculate the rank sums ri = Σ_{j=1}^{ni} rij and the averages r̄i = ri/ni, i = 1, 2, …, a.

3) Calculate the test statistic

kw = [12 / (N(N+1))] Σ_{i=1}^{a} ni (r̄i − (N+1)/2)²  =  [12 / (N(N+1))] Σ_{i=1}^{a} ri²/ni − 3(N+1).

4) Reject H0 for large values of kw (i.e., if kw > χ²_{a−1, α}, the upper α critical point of the chi-square distribution with a − 1 degrees of freedom).

Page 31:

The Pedagogy Problem

Consider Example 14.9 on page 581 of the text, in which four methods of teaching the concept of percentage to sixth graders are compared. There are 28 classes, 7 using each method: the Case Method, the Formula Method, the Equation Method, and the Unitary Analysis Method.

Page 32:

DATA Test_Score;
   INPUT Method $ Score @@;
   DATALINES;
C 14.59 C 23.44 C 25.43 C 18.15 C 20.82 C 14.06 C 14.26
F 20.27 F 26.84 F 14.71 F 22.34 F 19.49 F 24.92 F 20.20
E 27.82 E 24.92 E 28.68 E 23.32 E 32.85 E 33.90 E 23.42
U 33.16 U 26.93 U 30.43 U 36.43 U 37.04 U 29.76 U 33.88
;

PROC NPAR1WAY DATA=Test_Score WILCOXON;
   CLASS Method;
   VAR Score;
   *EXACT WILCOXON;
RUN;

The Program

Page 33:

A Note About the Program

You might have noticed the asterisk in the program line:

*EXACT WILCOXON;

The asterisk turns the line into a comment. Without it, SAS attempts to find an exact p-value for the test, which can take a very long time. When the computation is feasible, this command is highly recommended; here we settle for a quicker approximation.

Page 34:

The Output The NPAR1WAY Procedure

Wilcoxon Scores (Rank Sums) for Variable Score Classified by Variable Method

Method   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
C        7        49.00           101.50             18.845498        7.000000
F        7        66.50           101.50             18.845498        9.500000
E        7       125.50           101.50             18.845498       17.928571
U        7       165.00           101.50             18.845498       23.571429

Average scores were used for ties.

Kruskal-Wallis Test

Chi-Square         18.1390
DF                 3
Pr > Chi-Square    0.0004
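As a check, kw can be computed by hand from the rank sums above (N = 28, each ni = 7, average rank (N+1)/2 = 14.5, and mean scores 7.00, 9.50, 17.93, 23.57):

$$kw = \frac{12}{N(N+1)}\sum_{i=1}^{a} n_i\left(\bar r_i - \frac{N+1}{2}\right)^2 = \frac{12}{28\cdot 29}\Big[7(7-14.5)^2 + 7(9.5-14.5)^2 + 7(17.93-14.5)^2 + 7(23.57-14.5)^2\Big] \approx 18.13,$$

which agrees with the SAS chi-square of 18.1390 up to the tie correction that SAS applies.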

Page 35:

A Note About the Output

We see that the value of kw is 18.1390, a value large enough to yield an approximate p-value of 0.0004... an extremely small value. At a level of significance of 5%, or even 1%, there is a strong suggestion that the methods are not equally effective, and that the Unitary Analysis Method seems to be the best choice.

Page 36:

Multiple comparisons: use the rank averages to check which pairs of treatment groups differ.

Test statistic: r̄i − r̄j (the difference in their rank averages). For large ni's, R̄i − R̄j is approximately normally distributed, with variance [N(N+1)/12](1/ni + 1/nj) under H0. Therefore

zij = (r̄i − r̄j) / sqrt( [N(N+1)/12] (1/ni + 1/nj) ).

Treatments i and j are declared different if |zij| > q_{a,∞,α}/√2, where q_{a,∞,α} is the upper α critical point of the Studentized range distribution.

Page 37:

INFERENCES FOR SEVERAL MATCHED SAMPLES

The Friedman test is a generalization of the sign test to a ≥ 2 matched samples.

It is also a nonparametric alternative to the ANOVA F-test for a randomized block design.

Since it is used for a block design, rankings are done separately within each block.

The steps for the test: rank the observations from the a treatments separately within each block; where needed, tied observations receive the average of the ranks they would otherwise get (the average of all ranks within a block is (a+1)/2).

Page 38:

14.4.1 Friedman Test

Example 14.11

Ryan and Joiner give data on the percentage drip loss in meat loaves. The goal was to compare the eight oven positions, which might differ due to temperature variations. Three batches of eight loaves were baked. The loaves from each batch were randomly placed in the eight positions.

Analyze the data using the Friedman test.

Here the oven positions are treatments and batches are blocks.

Page 39:

14.4.1 Friedman Test

Example 14.11, SAS

data meatloaf;
   input ovenbatch ovenposition driploss @@;
   datalines;
1 1 7.33 1 2 3.22 1 3 3.28 1 4 6.44
1 5 3.83 1 6 3.28 1 7 5.06 1 8 4.44
2 1 8.11 2 2 3.72 2 3 5.11 2 4 5.78
2 5 6.50 2 6 5.11 2 7 5.11 2 8 4.28
3 1 8.06 3 2 4.28 3 3 4.56 3 4 8.61
3 5 7.72 3 6 5.56 3 7 7.83 3 8 6.33
;

proc rank data=meatloaf out=rankings;
   by ovenbatch;
   var driploss;
   ranks drip;
run;

proc print data=rankings;
run;

proc means data=rankings sum;
   class ovenposition;
   var drip;
run;

proc freq data=rankings;
   tables ovenbatch*ovenposition*driploss / cmh2;
run;

proc freq data=meatloaf;
   tables ovenbatch*ovenposition*driploss / cmh2 scores=rank;
run;

The Friedman test is identical to the ANOVA CMH statistic when the analysis uses rank scores (SCORES=RANK)

Page 40:

14.4.1 Friedman Test

Example 14.11, SAS results

Obs ovenbatch ovenposition driploss drip

1 1 1 7.33 8.0

2 1 2 3.22 1.0

3 1 3 3.28 2.5

4 1 4 6.44 7.0

5 1 5 3.83 4.0

6 1 6 3.28 2.5

7 1 7 5.06 6.0

8 1 8 4.44 5.0

9 2 1 8.11 8.0

10 2 2 3.72 1.0

11 2 3 5.11 4.0

12 2 4 5.78 6.0

13 2 5 6.50 7.0

14 2 6 5.11 4.0

15 2 7 5.11 4.0

16 2 8 4.28 2.0

17 3 1 8.06 7.0

18 3 2 4.28 1.0

19 3 3 4.56 2.0

20 3 4 8.61 8.0

21 3 5 7.72 5.0

22 3 6 5.56 3.0

23 3 7 7.83 6.0

24 3 8 6.33 4.0

Analysis Variable : drip Rank for Variable driploss

ovenposition   N Obs   Sum

1 3 23.0000000

2 3 3.0000000

3 3 8.5000000

4 3 21.0000000

5 3 16.0000000

6 3 9.5000000

7 3 16.0000000

8 3 11.0000000

Summary Statistics for ovenposition by drip

Controlling for ovenbatch

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic Alternative Hypothesis DF Value Prob

1 Nonzero Correlation 1 0.1488 0.6997

2 Row Mean Scores Differ 7 17.9393 0.0122

Total Sample Size = 24

Page 41:

Calculate the Friedman statistic:

fr = [12 / (a b (a+1))] Σ_{i=1}^{a} ri² − 3 b (a+1)

Reject H0 for large values of fr. The null distribution of fr can be approximated by the chi-square distribution with a − 1 degrees of freedom; thus reject H0 if fr > χ²_{a−1, α}.

Multiple comparisons are similar to those for the Kruskal-Wallis test: declare treatments i and j different if

|ri − rj| > (q_{a,∞,α} / √2) · sqrt( b a (a+1) / 6 ).
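For Example 14.11, the rank sums in the SAS output above are r1, …, r8 = 23, 3, 8.5, 21, 16, 9.5, 16, 11 with a = 8 oven positions and b = 3 batches, so

$$fr = \frac{12}{(8)(3)(9)}\big(23^2 + 3^2 + 8.5^2 + 21^2 + 16^2 + 9.5^2 + 16^2 + 11^2\big) - (3)(3)(9) \approx 98.58 - 81 = 17.58,$$

close to the tie-corrected CMH row-mean-scores statistic of 17.9393 reported by SAS and well above χ²_{7, 0.05} = 14.07, so H0 is rejected.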

Page 42:

Rank Correlation Methods

The Pearson correlation coefficient ρ measures only the degree of linear association between random variables that are (jointly) normally distributed; it cannot handle the nonlinear case.

Spearman's rank correlation coefficient ρs and Kendall's rank correlation coefficient τ measure the degree of monotone (increasing or decreasing) association between two variables.

Extreme (1 or -1) correlation does not imply a cause-effect relationship.

Zero correlation does not imply independence. A "strong" correlation is not necessarily statistically significant, and vice versa.

Page 43:

Researchers at the European Centre for Road Safety Testing are trying to find out how the age of cars affects their braking capability. They test a group of ten cars of differing ages and find out the minimum stopping distances that the cars can achieve. The results are set out in the table below:

Car   Age (months) Xi   Min. stopping at 40 kph (m) Yi   Age rank ui   Stopping rank vi   Difference di = ui − vi
A           9                 28.4                          1              1                    0
B          15                 29.3                          2              2                    0
C          24                 37.6                          3              7                   -4
D          30                 36.2                          4              4.5                 -0.5
E          38                 36.5                          5              6                   -1
F          46                 35.3                          6              3                    3
G          53                 36.2                          7              4.5                  2.5
H          60                 44.1                          8              8                    0
I          64                 44.8                          9              9                    0
J          76                 47.2                         10             10                    0

Σ di² = 32.5

14.5.1 Spearman’s Rank Correlation Coefficient

Page 44:

14.5.1 Spearman's Rank Correlation Coefficient

• Ho: X and Y are independent => ρs = 0
• Ha: X and Y are positively (monotonically) associated <=> ρs > 0

rs = 1 − 6 Σ_{i=1}^{n} di² / (n(n² − 1)) = 1 − (6)(32.5) / ((10)(99)) = 0.803

P-value = 0.0081

For large samples (n ≥ 10), under Ho, rs ~ Normal(0, 1/(n−1)) approximately, so

z = rs √(n − 1) = 0.803 · √9 = 2.409.

Since −1 ≤ rs ≤ 1, the value rs = 0.803 indicates a strong positive association between car age and minimum stopping distance; in other words, the older the car, the longer the distance we could expect it to take to stop.

Page 45:

Car   Age (months) Xi   Min. stopping at 40 kph (m) Yi   Concordant pairs Nci   Discordant pairs Ndi   Tied pairs Nti
A           9                 28.4                             9                      0                    0
B          15                 29.3                             8                      0                    0
C          24                 37.6                             3                      4                    0
D          30                 36.2                             4                      1                    1
E          38                 36.5                             3                      2                    0
F          46                 35.3                             4                      0                    0
G          53                 36.2                             3                      0                    0
H          60                 44.1                             2                      0                    0
I          64                 44.8                             1                      0                    0
J          76                 47.2                             0                      0                    0

                                                           Nc = 37                Nd = 7               Nt = 1

Nci = #{ j > i : xj > xi and yj > yi }
Ndi = #{ j > i : xj > xi and yj < yi }
Nti = #{ j > i : xj = xi or yj = yi }

14.5.2 Kendall's Rank Correlation Coefficient

τ̂ = (Nc − Nd) / N,   where N = Nc + Nd + Nt (here N = 45)

Page 46:

14.5.2 Kendall’s Rank Correlation Coefficient

• Ho: X and Y are independent => τ = 0

• Ha: X and Y are positively associated <=> τ > 0

τ̂ = (Nc − Nd) / sqrt( (N − Tx)(N − Ty) ) = (37 − 7) / sqrt( (45 − 0)(45 − 1) ) = 0.67

For large samples (n ≥ 10), under Ho,

τ̂ ~ Normal( 0, 2(2n + 5) / (9 n (n − 1)) ) approximately,

so

z = τ̂ · sqrt( 9 n (n − 1) / (2 (2n + 5)) ) = 0.67 · sqrt( (9)(10)(9) / ((2)(25)) ) = 2.697

P-value = 0.00355

Tied pairs: Tx = Σ_g C(ag, 2) = 0 and Ty = Σ_h C(bh, 2) = C(2, 2) = 1, where ag and bh are the sizes of the tied groups among the x's and y's, respectively (the two stopping distances of 36.2 form the single tied pair in y).

Page 47:

Kendall τ and Spearman ρs imply different interpretations: While Spearman ρs can be thought of as the regular Pearson ρ but computed just from ranks of variables, Kendall τ rather represents a probability.

Spearman's rank correlation coefficient is related to Kendall's coefficient of concordance w by rs = 2w − 1 when a = 2 (two sets of rankings).

A piece of SAS code:

PROC CORR DATA=CAR SPEARMAN KENDALL;

which will generate both rank correlation coefficients in a single call.

Page 48:

14.5.3 Kendall’s Coefficient of Concordance

This measures the degree to which several judges agree on the ranking of a set of subjects. Suppose three employers rank six candidates for a job, with the following results:

Candidate    a    b    c    d    e    f
Judge A      1    6    3    2    4    5
Judge B      1    5    6    4    2    3
Judge C      6    3    2    5    4    1
Rank sum     8   14   11   11   10    9

• Ho: the judges assign ranks at random (the judges are in disagreement)
• Ha: the judges do not assign ranks at random (the judges are in agreement)

Page 49:

14.5.3 Kendall’s Coefficient of Concordance

Notation: a = number of treatments (candidates), b = number of blocks (judges), ri = rank sum for candidate i, fr = Friedman statistic.

w = (observed agreement) / (maximum possible agreement)
  = Σ_{i=1}^{a} ( ri − b(a+1)/2 )²  /  [ b²a(a² − 1)/12 ]
  = fr / ( b(a − 1) )

For the data above (a = 6, b = 3):

fr = [12 / (a b (a+1))] Σ ri² − 3 b (a+1)
   = [12 / ((6)(3)(7))] (8² + 14² + 11² + 11² + 10² + 9²) − (3)(3)(7)
   = 65.05 − 63 = 2.05 < χ²_{5, 0.05} = 11.07

w = fr / (b(a − 1)) = 2.05 / ((3)(5)) = 0.1367

0 ≤ w ≤ 1, with small values indicating disagreement and large values indicating agreement.

Conclusion: we cannot reject the null hypothesis; there is no significant agreement among the employers' rankings of the candidates.

Page 50:

14.5 Rank Correlation Methods

Examples 14.12 and 14.13

Data are given on the yearly alcohol consumption from wine in liters per person

and yearly heart disease deaths per 100,000 people for 19 countries.

Test if there is an association between these two variables using Spearman’s rank correlation coefficient.

Test if there is an association between these two variables using Kendall’s rank correlation coefficient (Kendall’s tau).

Page 51:

14.5 Rank Correlation Methods

Examples 14.12 and 14.13 in SAS

data wineheart;

input country $ alcohol deaths @@;

datalines;

australia 2.5 211 austria 3.9 167 belgium 2.9 131

canada 2.4 191 denmark 2.9 220 finland 0.8 297

france 9.1 71 iceland 0.8 211 ireland 0.7 300

italy 7.9 107 netherlands 1.8 167 newzealand 1.9 266

norway 0.8 227 spain 6.5 86 sweden 1.6 207

switzerland 5.8 115 uk 1.3 285 us 1.2 199

wgermany 2.7 172

;

proc corr data=wineheart spearman;

run;

proc corr data=wineheart kendall;

run;

2 Variables: alcohol deaths

Simple Statistics

Variable N Mean Std Dev Median Minimum Maximum

alcohol 19 3.02632 2.50972 2.40000 0.70000 9.10000

deaths 19 191.05263 68.39629 199.00000 71.00000 300.00000

Spearman Correlation Coefficients, N = 19 Prob > |r| under H0: Rho=0

              alcohol      deaths

alcohol       1.00000     -0.82886
                           <.0001

deaths       -0.82886      1.00000
              <.0001

Kendall Tau b Correlation Coefficients, N = 19  Prob > |r| under H0: Rho=0

              alcohol      deaths

alcohol       1.00000     -0.69644
                           <.0001

deaths       -0.69644      1.00000
              <.0001

Page 52:

14.5 Rank Correlation Methods

Example

data brakestats;

input car $ age stoppingdistance @@;

datalines;

a 9 28.4 b 15 29.3 c 24 37.6 d 30 36.2 e 38 36.5

f 46 35.3 g 53 36.2 h 60 44.1 i 64 44.8 j 76 47.2

;

proc corr data=brakestats spearman kendall;

run;

2 Variables: age stoppingdistance

Simple Statistics

Variable N Mean Std Dev Median Minimum Maximum

age 10 41.50000 22.11209 42.00000 9.00000 76.00000

stoppingdistance 10 37.56000 6.23773 36.35000 28.40000 47.20000

Spearman Correlation Coefficients, N = 10 Prob > |r| under H0: Rho=0

                      age    stoppingdistance

age               1.00000         0.80244
                                  0.0052

stoppingdistance  0.80244         1.00000
                  0.0052

Kendall Tau b Correlation Coefficients, N = 10  Prob > |r| under H0: Rho=0

                      age    stoppingdistance

age               1.00000         0.67420
                                  0.0071

stoppingdistance  0.67420         1.00000
                  0.0071

Page 53:

14.5.3 Kendall’s Coefficient of Concordance

Kendall's coefficient of concordance is closely related to the Friedman statistic, so we can calculate the coefficient of concordance once we obtain the Friedman statistic using SAS.

Example:

data election;

input judge $ candidate $ candrank @@;

datalines;

a a 1 a b 6 a c 3 a d 2 a e 4 a f 5

b a 1 b b 5 b c 6 b d 4 b e 2 b f 3

c a 6 c b 3 c c 2 c d 5 c e 4 c f 1

;

proc freq data=election;

tables judge*candidate*candrank

/cmh2 scores=rank noprint;

run;

Summary Statistics for candidate by candrank

Controlling for judge

Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)

Statistic Alternative Hypothesis DF Value Prob

1 Nonzero Correlation 1 0.0667 0.7963

2 Row Mean Scores Differ 5 2.0476 0.8425

Total Sample Size = 18

Page 54:

Resampling Methods

“Resampling” is generating the sampling distribution by drawing repeated random samples from the observed sample itself. 1

This is useful for assessing the accuracies

(e.g. the bias and standard error) of complex statistics.

• Permutation Test• Bootstrap Method• Jackknife Method

Page 55:

Permutation Test

Developed by R.A. Fisher (1890-1962) and E.J.G. Pitman (1897-1993) in the 1930s. [2]

Draws SRS (Simple Random Samples) without replacement

Tests whether two samples X and Y , of size n1 and n2 respectively, are drawn from the same common distribution.

Hypotheses: Ho: Differences between the samples are due to chance.

Ha1: Y tends to have greater values than X , not simply due to chance

Ha2: Y tends to have smaller values than X , not simply due to chance

Ha3: There are differences between X and Y , not due to chance.

This method may be used to compare many different test statistics. To illustrate this method, however, let us consider the permutation test based on the difference between the sample averages.

d = ȳ − x̄

Page 56:

Permutation Test: Methodology

1. Pool the samples into one group (of size n1 + n2).

2. List all of the possible regroupings of the observations into two groups of sizes n1 and n2; there are C(n1+n2, n1) of them.

3. For each possible regrouping i, compute the sample averages x̄i and ȳi, and then compute the difference di = ȳi − x̄i.

4. To assess how "unusual" the originally observed difference d = ȳ − x̄ is, compute a p-value (a proportion) as follows:

For Ha1: p-value = (# of regroupings with di ≥ d) / C(n1+n2, n1)
For Ha2: p-value = (# of regroupings with di ≤ d) / C(n1+n2, n1)
For Ha3: p-value = (# of regroupings with |di| ≥ |d|) / C(n1+n2, n1)
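For small samples the enumeration in step 2 is feasible by hand. In the capacitor subset used in Example 14.15 below, n1 = n2 = 3, so there are only

$$\binom{n_1 + n_2}{n_1} = \binom{6}{3} = 20$$

equally likely regroupings, and the exact permutation p-value is a multiple of 1/20 = 0.05 (PROC MULTTEST below approximates it by Monte Carlo resampling rather than full enumeration).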

Page 57:

Bootstrap Method

Invented by B. Efron (1938- ) in the late 1970s.

Draws a very large number of SRS with replacement (note the difference from the permutation test, which resamples without replacement).

Heavily computer-based method of deriving robust estimates of the standard errors of sample statistics.
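In its most common use one draws B bootstrap samples, recomputes the statistic θ̂*b on each, and takes the standard deviation of the replicates as the standard error estimate (a standard formula, stated here for completeness):

$$\widehat{se}_{boot} = \sqrt{ \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat\theta^*_b - \bar\theta^* \right)^2 }, \qquad \bar\theta^* = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^*_b .$$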

Page 58:

Jackknife Method [1]

First implemented by R.E. von Mises [2] (1883-1953), then developed (separately) by Tukey (1915-2000) and Quenouille in the 1950s.

Resamples by deleting one observation at a time

This method is also useful for estimating the standard error of a statistic, say t = t(x1, x2, …, xn), based on a random sample of size n drawn from some distribution F.

First, calculate the n "leave-one-out" values of the statistic, denoted by

t*i = t(x1, x2, …, x(i−1), x(i+1), …, xn),   i = 1, 2, …, n.

Let

t̄* = (1/n) Σ_{i=1}^{n} t*i

and let s_{t*} be the standard deviation of t*1, t*2, …, t*n.

The jackknife estimate of SE(t) is given by

jse(t) = sqrt( [(n−1)/n] Σ_{i=1}^{n} (t*i − t̄*)² ) = (n−1) s_{t*} / √n.
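As a sanity check of the formula, take t = x̄ (the sample mean). The leave-one-out values are t*i = (n x̄ − xi)/(n − 1), and the jackknife estimate reduces to the familiar standard error of the mean:

$$jse(\bar x) = \sqrt{ \frac{n-1}{n} \sum_{i=1}^{n} \left( \frac{\bar x - x_i}{n-1} \right)^2 } = \sqrt{ \frac{1}{n(n-1)} \sum_{i=1}^{n} (x_i - \bar x)^2 } = \frac{s}{\sqrt{n}} .$$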

Page 59:

14.6 Resampling Methods

SAS can be used to perform permutation, bootstrap, and jackknife resampling.

For the most part, macros are required. These can be written by the user and are also readily available on the web.

PROC MULTTEST can be used to perform several tests incorporating permutation or bootstrap resampling.

In the following two examples, we use permutation and bootstrap resampling to obtain t-test p-value adjustments.

Page 60:

14.6 Resampling Methods

Example 14.15: permutation test

data capacitor;

Input group $ failtime @@;

Datalines;

control 17.9 control 23.7 control 29.8

stressed 15.2 stressed 18.3 stressed 21.1

;

Proc multtest data=capacitor permutation nsample=25000

out=results outsamp=samp;

test mean(failtime /lower);

class group;

contrast 'a vs b' -1 1;

Run;

proc print data=samp(obs=18);

run;

proc print data=results;

run;

The PERMUTATION option in the PROC MULTTEST statement requests permutation resampling, and NSAMPLE=25000 requests 25000 permutation samples. The OUTSAMP=SAMP option creates an output SAS data set containing the 25000 permutation samples.

The TEST statement requests a t-test of the group means of failtime; the test is lower-tailed. The grouping variable in the CLASS statement is group, and the coefficients across the groups are -1 and 1, as specified in the CONTRAST statement. (See Chapter 12.)

PROC PRINT displays the first 18 observations of the samp data set containing the permutation samples, followed by the results data set.

Page 61:

14.6 Resampling Methods

Obs _sample_ _class_ _obs_ Failtime

1 1 control 6 21.1

2 1 control 5 18.3

3 1 control 3 29.8

4 1 stressed 2 23.7

5 1 stressed 1 17.9

6 1 stressed 4 15.2

7 2 control 5 18.3

8 2 control 2 23.7

9 2 control 6 21.1

10 2 stressed 3 29.8

11 2 stressed 4 15.2

12 2 stressed 1 17.9

13 3 control 2 23.7

14 3 control 1 17.9

15 3 control 6 21.1

16 3 stressed 4 15.2

17 3 stressed 3 29.8

18 3 stressed 5 18.3

Model Information

Test for continuous variables Mean t-test

Tails for continuous tests Lower-tailed

Strata weights None

P-value adjustment Permutation

Center continuous variables No

Number of resamples 25000

Seed 356405001

Contrast Coefficients

Contrast

group

control stressed

a vs b -1 1

Continuous Variable Tabulations

Variable group NumObs Mean Standard Deviation

failtime control 3 23.8000 5.9506

failtime stressed   3   18.2000   2.9513

p-Values

Variable Contrast Raw Permutation

failtime a vs b 0.1090 0.1474

Page 62:

14.6 Resampling Methods

Example 14.17: bootstrap test

data capacitor;

Input group $ failtime @@;

Datalines;

control 17.9 control 23.7 control 29.8

stressed 15.2 stressed 18.3 stressed 21.1

;

Proc multtest data=capacitor bootstrap nsample=25

outsamp=res nocenter out=outboot;

test mean(failtime /lower);

class group;

contrast 'a vs b' -1 1;

Run;

proc print data=res(obs=18);

run;

proc print data=outboot;

run;

The BOOTSTRAP option in the PROC MULTTEST statement requests bootstrap resampling, and NSAMPLE=25 requests 25 bootstrap samples. The OUTSAMP=RES option creates an output SAS data set containing the 25 bootstrap samples.

The TEST statement requests a t-test of the group means of failtime; the test is lower-tailed. The grouping variable in the CLASS statement is group, and the coefficients across the groups are -1 and 1, as specified in the CONTRAST statement. (See Chapter 12.)

PROC PRINT displays the first 18 observations of the Res data set containing the bootstrap samples.

Page 63:

14.6 Resampling Methods

Obs _sample_ _class_ _obs_ failtime

1 1 control 6 21.1

2 1 control 6 21.1

3 1 control 6 21.1

4 1 stressed 2 23.7

5 1 stressed 1 17.9

6 1 stressed 6 21.1

7 2 control 4 15.2

8 2 control 6 21.1

9 2 control 2 23.7

10 2 stressed 1 17.9

11 2 stressed 3 29.8

12 2 stressed 6 21.1

13 3 control 2 23.7

14 3 control 4 15.2

15 3 control 3 29.8

16 3 stressed 3 29.8

17 3 stressed 3 29.8

18 3 stressed 1 17.9

Model Information

Test for continuous variables Mean t-test

Tails for continuous tests Lower-tailed

Strata weights None

P-value adjustment Bootstrap

Center continuous variables No

Number of resamples 25

Seed 270752001

Contrast Coefficients

Contrast

group

control stressed

a vs b -1 1

Continuous Variable Tabulations

Variable group NumObs Mean Standard Deviation

failtime control 3 23.8000 5.9506

failtime stressed 3 18.2000 2.9513

p-Values

Variable Contrast Raw Bootstrap

failtime a vs b 0.1090 0.0400

Page 64:

Works Cited

1. Tamhane, Ajit and Dorothy Dunlop. Statistics and Data Analysis. Upper Saddle River, NJ. Prentice Hall, Inc. 2000.

2. “Resampling (statistics)”. Wikipedia. <http://en.wikipedia.org/wiki/Resampling_(statistics)>. 2007.

3. “Ch.14: Nonparametric Statistical Method” Group project, Wei Zhu, instructor. 2006.