1 Nonparametric Methods I Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University [email protected] http://tigpbp.iis.sinica.edu.tw/ courses.htm

1

Nonparametric Methods I

Henry Horng-Shing Lu

Institute of Statistics

National Chiao Tung University

[email protected]

http://tigpbp.iis.sinica.edu.tw/courses.htm

2

Parametric vs. Nonparametric

MLE: probability distribution and likelihood

Bayes: conditional, prior and posterior distributions

Distribution free?

http://en.wikipedia.org/wiki/Non-parametric_statistics

3

Motivation (1)

In many applications, direct access to a measurement is not possible. However, an estimate of the measurement is needed.

Most of the time, large-scale repetition of an experiment is not economically feasible.

What can one do?

4

Motivation (2)

Q1: What estimator can be used for the problem of interest?

Q2: Having chosen an estimator, how accurate is it? What are the bias and variance of the estimator?

Q3: How to make inference? What is the confidence interval? What is the p-value for a hypothesis test?

5

References

B. Efron (1979) Computers and the theory of statistics: thinking the unthinkable, SIAM Review, 21, 460-480.

B. Efron and R. J. Tibshirani (1993) An Introduction to the Bootstrap. Chapman & Hall.

J. I. De la Rosa and G. A. Fleury (2006) Bootstrap methods for a measurement estimation problem. IEEE Transactions on Instrumentation and Measurement, 55, 3, 820–827.

http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Jackknife

http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29

6

Resampling Techniques

Data resampling

PART 1: Jackknife (resampling without replacement)

PART 2: Bootstrap (resampling with replacement)
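The difference between the two resampling schemes can be seen on a tiny sample. The following is a minimal sketch in base R; the data values and the seed are made up for illustration.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)   # made-up sample

# Jackknife: leave one observation out each time (no value can repeat)
jack_samples <- lapply(seq_along(x), function(i) x[-i])

# Bootstrap: draw n values with replacement (values may repeat)
boot_sample <- sample(x, size = length(x), replace = TRUE)

print(jack_samples[[1]])   # the sample without x[1]
print(boot_sample)
```

The jackknife always produces exactly n subsamples of size n - 1, while the bootstrap can produce as many resamples of size n as desired.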

7

PART 1: Jackknife

Naming

Illustration

Math Expression

Examples

R codes

C codes

8

Why the funny name of Jackknife?

Jackknife: a pocket knife

http://en.wikipedia.org/wiki/Jackknife

Mosteller and Tukey (1977, p. 133) described a predecessor resampling method, the jackknife, in the following way: “The name ‘jackknife’ is intended to suggest the broad usefulness of a technique as a substitute for specialized tools that may not be available, just as the Boy Scout’s trustworthy tool serves so variedly…”

http://mrw.interscience.wiley.com/emrw/9780470013199/esbs/article/bsa321/current/abstract

9

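Illustration of Jackknife

[Figure: a sample $X_1, X_2, \ldots, X_n$ is drawn from the population (sampling). Resampling $n$ times, leaving out one observation each time, gives $(X_2, \ldots, X_n)$, $(X_1, X_3, \ldots, X_n)$, ..., $(X_1, X_2, \ldots, X_{n-1})$. The statistics $\hat\theta_{(1)}, \hat\theta_{(2)}, \ldots, \hat\theta_{(n)}$ computed from these subsamples are used for inference about the population.]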

10

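Math Expression

$$X_1, \ldots, X_n \overset{iid}{\sim} F(x), \quad \text{e.g. } F(x) = N(\mu, \sigma^2), \quad \theta(F) = (\mu, \sigma^2).$$

We can estimate $\theta$ by the following:

$$\hat\theta_1(X_1, \ldots, X_n) = (\bar X, S^2) \quad \left(\text{where } S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2\right),$$

$$\hat\theta_2 = \left(\text{median}, \left(\frac{q_3 - q_1}{c}\right)^2\right),$$

$$\hat\theta_3 = \left(\frac{\bar X + \text{median}}{2}, \left(\frac{q_3 - q_1}{c}\right)^2\right),$$

and so on.

Rules of judgement:

$$bias = E(\hat\theta) - \theta$$

$$Variance = Var(\hat\theta)$$

$$MSE = bias^2 + Variance$$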
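When the true distribution is known, the rules of judgement can be checked by simulation. The sketch below compares two estimators of $\sigma^2$ by Monte Carlo, under assumed values that are not from the slides: a normal population with $\mu = 0$, $\sigma = 2$, sample size 50, and 2000 repetitions. The constant $c = 1.349$ is the interquartile range of the standard normal distribution.

```r
set.seed(123)                     # arbitrary seed
mu <- 0; sigma <- 2; n <- 50      # assumed true values for the simulation
n_rep <- 2000                     # number of simulated samples

# Two estimators of sigma^2
est_s2  <- function(x) var(x)                # S^2
est_iqr <- function(x) (IQR(x) / 1.349)^2    # ((q3 - q1) / c)^2 with c = 1.349

samples <- replicate(n_rep, rnorm(n, mu, sigma), simplify = FALSE)

# Monte Carlo approximation of bias, variance and MSE for one estimator
judge <- function(est, theta) {
  values   <- vapply(samples, est, numeric(1))
  bias     <- mean(values) - theta
  variance <- mean((values - mean(values))^2)
  c(bias = bias, variance = variance, mse = bias^2 + variance)
}

res_s2  <- judge(est_s2,  sigma^2)
res_iqr <- judge(est_iqr, sigma^2)
print(rbind(S2 = res_s2, IQR = res_iqr))
```

In practice the true $\theta$ is unknown and the experiment cannot be repeated, which is why resampling methods are needed.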

11

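An Example of Jackknife (1)

$$X_1, \ldots, X_n \overset{iid}{\sim} F(x) \quad (\text{e.g. } N(\mu, \sigma^2)), \quad \hat\theta = \bar X.$$

Leaving out one observation at a time from $X_1, X_2, \ldots, X_n$:

$$\hat\theta_{(1)} = \frac{X_2 + \cdots + X_n}{n-1}, \quad \hat\theta_{(2)} = \frac{X_1 + X_3 + \cdots + X_n}{n-1}, \quad \ldots, \quad \hat\theta_{(n)} = \frac{X_1 + \cdots + X_{n-1}}{n-1}.$$

$$\hat\theta_{(\cdot)} = \frac{\hat\theta_{(1)} + \hat\theta_{(2)} + \cdots + \hat\theta_{(n)}}{n} = \bar X \quad (\text{HW}),$$

$$bias = (n-1)(\hat\theta_{(\cdot)} - \hat\theta) = (n-1)(\bar X - \bar X) = 0,$$

$$se^2 = \frac{n-1}{n}\sum_{i=1}^{n} (\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2 = \frac{s^2}{n} \quad (\text{HW}).$$

The equalities marked HW are left as homework.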
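The two homework identities for the sample mean can be checked numerically. This is a minimal sketch on simulated data (the sample and seed are assumptions); it uses the shortcut that the leave-one-out mean equals (sum of all values minus the omitted value) divided by n - 1.

```r
set.seed(42)                        # arbitrary seed
x <- rnorm(20, mean = 5, sd = 2)    # simulated data; any numeric sample works
n <- length(x)

theta_hat  <- mean(x)
theta_jack <- (sum(x) - x) / (n - 1)   # leave-one-out means
theta_dot  <- mean(theta_jack)

bias_jack <- (n - 1) * (theta_dot - theta_hat)
se_jack   <- sqrt((n - 1) / n * sum((theta_jack - theta_dot)^2))

# The jackknife bias should be zero (up to rounding) and the jackknife
# standard error should match s / sqrt(n)
print(c(bias = bias_jack, se = se_jack, s_over_sqrt_n = sd(x) / sqrt(n)))
```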

12

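An Example of Jackknife (2)

$$X_1, \ldots, X_n \overset{iid}{\sim} F(x) \quad (\text{e.g. } N(\mu, \sigma^2)), \quad n = 100, \quad \hat\theta = \text{median} = \frac{X_{(50)} + X_{(51)}}{2}.$$

Each leave-one-out sample has 99 observations. Writing its order statistics as $X^*_{(1)}, \ldots, X^*_{(99)}$, the median is the 50th:

$$(X_2, \ldots, X_n) \to X^*_{(1)}, \ldots, X^*_{(99)} \to \hat\theta_{(1)} = X^*_{(50)}$$

$$(X_1, X_3, \ldots, X_n) \to X^*_{(1)}, \ldots, X^*_{(99)} \to \hat\theta_{(2)} = X^*_{(50)}$$

$$\vdots$$

$$(X_1, \ldots, X_{n-1}) \to X^*_{(1)}, \ldots, X^*_{(99)} \to \hat\theta_{(n)} = X^*_{(50)}$$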
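The median example can be reproduced in a few lines. The sketch below uses simulated standard normal data with n = 100 (an assumption for illustration). It also shows a feature of this example: deleting one of the 50 smallest observations leaves $X_{(51)}$ as the median of the remaining 99, and deleting one of the 50 largest leaves $X_{(50)}$, so the leave-one-out medians take only two distinct values.

```r
set.seed(7)          # arbitrary seed
x <- rnorm(100)      # simulated data with n = 100
n <- length(x)

theta_hat  <- median(x)   # (X_(50) + X_(51)) / 2
theta_jack <- vapply(seq_len(n), function(i) median(x[-i]), numeric(1))

x_sorted <- sort(x)
print(sort(unique(theta_jack)))   # distinct leave-one-out medians
print(x_sorted[50:51])            # X_(50) and X_(51)
```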

13

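Summary of the Jackknife Method

$$\hat\theta_{(\cdot)} = \frac{\hat\theta_{(1)} + \hat\theta_{(2)} + \cdots + \hat\theta_{(n)}}{n},$$

$$bias_J = (n-1)(\hat\theta_{(\cdot)} - \hat\theta),$$

$$se_J^2 = \frac{n-1}{n}\sum_{i=1}^{n} (\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2,$$

$$MSE_J = bias_J^2 + se_J^2.$$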
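The summary formulas translate directly into a small reusable function. This is a sketch in base R; `jackknife_summary` is a hypothetical helper name, not a function from any package, and the exponential test data are made up.

```r
# Hypothetical helper: jackknife bias, standard error and MSE of theta(x)
jackknife_summary <- function(x, theta) {
  n <- length(x)
  theta_hat  <- theta(x)
  theta_jack <- vapply(seq_len(n), function(i) theta(x[-i]), numeric(1))
  theta_dot  <- mean(theta_jack)
  bias <- (n - 1) * (theta_dot - theta_hat)
  se   <- sqrt((n - 1) / n * sum((theta_jack - theta_dot)^2))
  list(theta_hat = theta_hat, bias = bias, se = se, mse = bias^2 + se^2)
}

set.seed(2024)   # arbitrary seed
x <- rexp(30)    # simulated skewed data

# The plug-in variance (divisor n) is a biased estimator of the variance
plug_in_var <- function(x) mean((x - mean(x))^2)
out <- jackknife_summary(x, plug_in_var)
print(out)
print(-var(x) / length(x))   # compare with out$bias
```

For the plug-in variance, the jackknife bias estimate works out to $-S^2/n$, so subtracting it from the plug-in estimate recovers the usual unbiased $S^2$.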

14

How do quartiles lead to an estimate?

Note that $q_3 = \Phi^{-1}(0.75) = 0.6745$ and $q_1 = \Phi^{-1}(0.25) = -0.6745$ are the upper and lower quartiles of a standard normal distribution.

Hence, the interquartile range is $IQR = q_3 - q_1 = 1.349$.

Therefore, $IQR/1.349$ can be used as an estimator of the standard deviation of a normal distribution.
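The constants above can be checked with `qnorm`, and the estimator can be tried on simulated data. The sketch assumes a normal sample with standard deviation 3 (chosen only for illustration).

```r
q3 <- qnorm(0.75)   # upper quartile of the standard normal
q1 <- qnorm(0.25)   # lower quartile of the standard normal
print(c(q1 = q1, q3 = q3, IQR = q3 - q1))

set.seed(10)                          # arbitrary seed
x <- rnorm(1000, mean = 0, sd = 3)    # simulated normal data, true sd = 3
sd_iqr <- IQR(x) / 1.349              # quartile-based estimate of sd
print(c(sd_iqr = sd_iqr, sd_usual = sd(x)))
```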

15

Jackknife by R

1. Open “R”

16

2. Install add-on packages

17

3. Select a mirror site, such as Taiwan (Taipeh)

18

4. Select the package “bootstrap”

19

20

5. Type: library(bootstrap)

21

If you want to see the manual, you can type “?jackknife”.
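The menu steps can also be done from the R console. The sketch below assumes internet access for the one-time install (left commented out) and uses made-up data; it calls `jackknife()` from the “bootstrap” package, which returns the components `jack.bias`, `jack.se` and `jack.values`.

```r
# install.packages("bootstrap")   # one-time install from CRAN (needs internet)

x <- c(8.2, 9.1, 7.7, 10.4, 9.8, 8.9, 9.5)   # made-up data

if (requireNamespace("bootstrap", quietly = TRUE)) {
  res <- bootstrap::jackknife(x, mean)   # jackknife of the sample mean
  print(res$jack.bias)                   # jackknife bias estimate
  print(res$jack.se)                     # jackknife standard error
} else {
  res <- NULL
  message("Package 'bootstrap' is not installed; skipping the example.")
}
```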

22

23

R-package

24

Select the menu to open the editor in R

25

You can edit your program in this box and then store this program.

26

You can save your program……

27

main.jackknife.function

28

(1) Use mouse to select the R commands you want to run.

(2) Press “F5” to run

29

output

30

Jackknife by C: define functions

31

32

33

An example for jackknife

34

35

36

37

PART 2: Bootstrap

Naming

Illustration

Math Expression

Examples

R codes: three approaches

Package(bootstrap)

Package(boot)

Write your own R codes

C codes

38

The Bootstrap

The bootstrap technique was proposed by Bradley Efron (1979, 1981, 1982).

Bootstrapping is an application of intensive computing to traditional inferential methods.

39

Why the funny name of bootstrap?

Bootstrap: http://www.concurringopinions.com/archives/Bootstrap_1.jpg

In the book ‘Singular Travels, Campaigns and Adventures of Baron Munchausen’ by R. E. Raspe (1786), the main character, finding himself in a deep hole, extracts himself using only the straps of his boots.

http://tigger.uic.edu/~slsclove/stathumr.htm

40

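Illustration of Bootstrap

[Figure: a sample $X_1, X_2, \ldots, X_n$ is drawn from the population (sampling). Resampling $B$ times with replacement gives $B$ data sets, each of the form $X_1^*, X_2^*, \ldots, X_n^*$. The statistics $\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*$ computed from these resamples are used for inference about the population.]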

41

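Math Expression

$$X_1, X_2, \ldots, X_n \overset{iid}{\sim} F(x) \quad (\text{e.g. } N(\mu, \sigma^2)),$$

$$\theta = \theta(F) \ \left(\text{e.g. } \int x \, dF(x)\right), \qquad \hat\theta = \theta(F_n) \ \left(\text{e.g. } \int x \, dF_n(x) = \bar X\right),$$

$$\text{where } F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1[X_i \le x].$$

Resampling with replacement:

$$X_1^*, X_2^*, \ldots, X_n^* \overset{iid}{\sim} F_n(x).$$

Repeat $B$ times and every time,

$$\hat\theta_b^* = \theta(F_n^*) \ \left(\text{e.g. } \int x \, dF_n^*(x) = \frac{X_1^* + X_2^* + \cdots + X_n^*}{n}\right), \quad b = 1, \ldots, B,$$

$$\text{where } F_n^*(x) = \frac{1}{n}\sum_{i=1}^{n} 1[X_i^* \le x].$$

$$\hat\theta_{(\cdot)}^* = \frac{\hat\theta_1^* + \hat\theta_2^* + \cdots + \hat\theta_B^*}{B},$$

$$bias_B = \hat\theta_{(\cdot)}^* - \hat\theta,$$

$$var_B = \frac{1}{B-1}\sum_{b=1}^{B} (\hat\theta_b^* - \hat\theta_{(\cdot)}^*)^2.$$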
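The expressions above can be followed line by line in a short loop. This is a minimal sketch, assuming simulated normal data, B = 2000, and the median as the statistic $\theta$; none of these choices come from the slides.

```r
set.seed(99)                        # arbitrary seed
x <- rnorm(50, mean = 10, sd = 2)   # simulated data
B <- 2000                           # number of bootstrap replications

theta_hat  <- median(x)             # statistic on the original sample
theta_star <- numeric(B)
for (b in seq_len(B)) {
  x_star        <- sample(x, replace = TRUE)   # X*_1, ..., X*_n from F_n
  theta_star[b] <- median(x_star)              # bootstrap replicate
}

theta_star_dot <- mean(theta_star)
bias_B <- theta_star_dot - theta_hat
var_B  <- sum((theta_star - theta_star_dot)^2) / (B - 1)
print(c(theta_hat = theta_hat, bias_B = bias_B, var_B = var_B))
```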

42

For example, $X_1, X_2, \ldots, X_n \overset{iid}{\sim} F(x)$ (e.g. $F(x) = N(\mu, 1)$).

If you want to know the population, you can calculate the mean or variance in expectation, but it is often not easy to do.

For example, $X$ is sampled from a population $F$ and

$$E(X) = \int x f(x) \, dx = \int x \, dF(x);$$

$$\text{median} = F^{-1}\left(\tfrac{1}{2}\right).$$
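When $F$ is known, these population quantities can be computed directly. The sketch below assumes $F = N(\mu, 1)$ with $\mu = 2$ (a value chosen only for illustration) and uses numerical integration for the mean and the quantile function for the median.

```r
mu <- 2   # assumed population mean for illustration

# E(X) = integral of x f(x) dx, computed by numerical integration
mean_pop <- integrate(function(x) x * dnorm(x, mean = mu, sd = 1),
                      lower = -Inf, upper = Inf)$value

# median = F^{-1}(1/2), computed with the normal quantile function
median_pop <- qnorm(0.5, mean = mu, sd = 1)

print(c(mean = mean_pop, median = median_pop))
```

With real data $F$ is unknown, so neither calculation is available; the bootstrap replaces $F$ with the empirical distribution $F_n$.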

43

STEP 1: When you get n data objects, how can you estimate the parameter of the population?

[Figure: Population → (step 1: sampling) → $X_1, X_2, \ldots, X_n$]

44

STEP 2: Resample the data B times with replacement. You then get many resampled data sets, and you use these resampled data sets in place of new samples drawn from the population.

[Figure: $X_1, X_2, \ldots, X_n$ → (step 2: resampling B times) → B resampled data sets, each of the form $X_1^*, X_2^*, \ldots, X_n^*$]

45

STEP 3: Regard $X_1, X_2, \ldots, X_n$ as the new population and resample it B times with replacement, $X_1^*, X_2^*, \ldots, X_n^* \overset{iid}{\sim} F_n(x)$. Then, you can calculate statistics.

[Figure: each resampled data set $X_1^*, X_2^*, \ldots, X_n^*$ → (step 3: statistics) → $\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*$]
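Treating the sample as the new population means drawing from the empirical distribution $F_n$. The sketch below, on made-up data, shows two equivalent ways to do that in base R: `sample()` with replacement, and inverting the empirical CDF at uniform random numbers (`quantile()` with `type = 1` is the inverse of the empirical CDF).

```r
set.seed(5)                              # arbitrary seed
x <- c(3.1, 4.7, 2.2, 5.9, 4.0, 3.6)     # made-up sample
n <- length(x)

Fn <- ecdf(x)      # empirical distribution function F_n
print(Fn(4.0))     # proportion of observations <= 4.0

# (a) Direct resampling with replacement
x_star_a <- sample(x, size = n, replace = TRUE)

# (b) Inverse-CDF sampling from F_n
u <- runif(n)
x_star_b <- unname(quantile(x, probs = u, type = 1))

print(x_star_a)
print(x_star_b)
```

Both resamples contain only values from the original sample, each value having probability 1/n per draw.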

46

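STEP 4: Make inference by resampling statistics, $\bar X_1^*, \bar X_2^*, \ldots, \bar X_B^*$.

$$\bar X_{(\cdot)}^* = \frac{\bar X_1^* + \bar X_2^* + \cdots + \bar X_B^*}{B} \xrightarrow[B \to \infty]{LLN} E_*(\bar X^*) = ?$$

$$E_*(\bar X^*) = E_*\left(\frac{X_1^* + X_2^* + \cdots + X_n^*}{n}\right) = \frac{E_*(X_1^*) + \cdots + E_*(X_n^*)}{n} = E_*(X_1^*) = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar X.$$

$$bias_B(\bar X) = \bar X_{(\cdot)}^* - \bar X \xrightarrow[B \to \infty]{LLN} \bar X - \bar X = 0.$$

$$var_B(\bar X) = \frac{1}{B-1}\sum_{b=1}^{B} (\bar X_b^* - \bar X_{(\cdot)}^*)^2 \xrightarrow[B \to \infty]{LLN} E_*(\bar X^{*2}) - (E_* \bar X^*)^2 = V_*(\bar X^*) = ?$$

$$V_*(\bar X^*) = V_*\left(\frac{X_1^* + X_2^* + \cdots + X_n^*}{n}\right) = \frac{V_*(X_1^*) + V_*(X_2^*) + \cdots + V_*(X_n^*)}{n^2} = \frac{V_*(X_1^*)}{n} = \frac{1}{n} \cdot \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2 = \frac{n-1}{n} \cdot \frac{S^2}{n},$$

where $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2$ is the sample variance.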
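The two limits derived above can be checked numerically for the sample mean. This is a sketch on simulated data with a large B (B = 20000, n = 25 and the seed are assumptions); the bootstrap variance should be close to $(n-1)S^2/n^2$ and the bootstrap bias close to zero.

```r
set.seed(314)                      # arbitrary seed
x <- rnorm(25, mean = 1, sd = 2)   # simulated data
n <- length(x)
B <- 20000                         # large B so the LLN limits are visible

# Each column of the matrix is one bootstrap sample of size n
x_star    <- matrix(sample(x, n * B, replace = TRUE), nrow = n)
mean_star <- colMeans(x_star)      # bootstrap replicates of the mean

bias_B     <- mean(mean_star) - mean(x)
var_B      <- var(mean_star)       # divisor B - 1
var_theory <- (n - 1) * var(x) / n^2

print(c(bias_B = bias_B, var_B = var_B, var_theory = var_theory))
```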

47

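Summary of the Bootstrap Method

More generally, we can get $\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*$ by bootstrap.

$$\hat\theta_{(\cdot)}^* = \frac{\hat\theta_1^* + \hat\theta_2^* + \cdots + \hat\theta_B^*}{B} \xrightarrow[B \to \infty]{LLN} E_*(\hat\theta^*).$$

$$bias_B = \hat\theta_{(\cdot)}^* - \hat\theta,$$

$$se_B^2 = var_B = \frac{1}{B-1}\sum_{b=1}^{B} (\hat\theta_b^* - \hat\theta_{(\cdot)}^*)^2,$$

$$MSE_B = bias_B^2 + se_B^2 = bias_B^2 + var_B.$$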

48

Bootstrap by R

Approach 1: Use package “bootstrap”

Approach 2: Use package “boot”

Approach 3: Write your own R codes

49

http://finzi.psych.upenn.edu/R/library/bootstrap/DESCRIPTION

Approach 1

50

1. Install the add-on package

51

2. Select a mirror site, such as “Taiwan (Taipeh)”

52

3. Select the package “bootstrap”

53

54

4. Type: library(bootstrap)

55

If you want to see the manual, you can type “?bootstrap”.

56

bias

57

Use this package to do bootstrap
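A call to the package could look like the following sketch. It assumes the “bootstrap” package is installed and uses made-up data; `bootstrap()` takes the data, the number of replications `nboot`, and the statistic `theta`, and returns the replicates in the component `thetastar`.

```r
x <- c(8.2, 9.1, 7.7, 10.4, 9.8, 8.9, 9.5, 11.0, 8.4, 9.9)   # made-up data

if (requireNamespace("bootstrap", quietly = TRUE)) {
  set.seed(11)   # arbitrary seed
  res <- bootstrap::bootstrap(x, nboot = 1000, theta = median)
  theta_star <- res$thetastar                # 1000 bootstrap medians
  bias_B <- mean(theta_star) - median(x)     # bootstrap bias estimate
  se_B   <- sd(theta_star)                   # bootstrap standard error
  print(c(bias_B = bias_B, se_B = se_B))
} else {
  res <- NULL
  message("Package 'bootstrap' is not installed; skipping the example.")
}
```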

58

59


60

http://finzi.psych.upenn.edu/R/library/boot/DESCRIPTION

Approach 2

61

library(boot)

62

63

Arguments

sim: A character string indicating the type of simulation required. Possible values are "ordinary" (the default), "parametric", "balanced", "permutation", or "antithetic". Importance resampling is specified by including importance weights; the type of importance resampling must still be specified but may only be "ordinary" or "balanced" in this case.
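A sketch of the ordinary (nonparametric) bootstrap with `boot()` follows. The data and the helper name `med_fun` are made up for illustration. For this type of simulation the statistic must accept the data and a vector of resampled indices; the result stores the original statistic in `t0` and the replicates in the matrix `t`.

```r
x <- c(8.2, 9.1, 7.7, 10.4, 9.8, 8.9, 9.5, 11.0, 8.4, 9.9)   # made-up data

if (requireNamespace("boot", quietly = TRUE)) {
  set.seed(21)   # arbitrary seed
  # Statistic: median of the resampled observations (hypothetical helper)
  med_fun <- function(data, indices) median(data[indices])
  res <- boot::boot(data = x, statistic = med_fun, R = 1000, sim = "ordinary")
  bias_B <- mean(res$t) - res$t0   # bootstrap bias estimate
  se_B   <- sd(res$t[, 1])         # bootstrap standard error
  print(c(t0 = res$t0, bias_B = bias_B, se_B = se_B))
} else {
  res <- NULL
  message("Package 'boot' is not installed; skipping the example.")
}
```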

64

R code

Approach 3

65

An example

66

Run functions


67

Run main function
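The code shown on these slides is not reproduced in this transcript. As a stand-in, the following is a minimal sketch of a hand-written bootstrap in base R; `bootstrap_summary` and `sd_iqr` are hypothetical names, and the data are simulated. It returns the quantities from the bootstrap summary: bias, standard error and MSE.

```r
# Hypothetical helper: bootstrap bias, standard error and MSE of theta(x)
bootstrap_summary <- function(x, theta, B = 1000) {
  theta_hat  <- theta(x)
  theta_star <- replicate(B, theta(sample(x, replace = TRUE)))
  bias <- mean(theta_star) - theta_hat
  se   <- sd(theta_star)             # square root of var_B (divisor B - 1)
  list(theta_hat = theta_hat, bias = bias, se = se, mse = bias^2 + se^2)
}

set.seed(8)                          # arbitrary seed
x <- rnorm(40, mean = 0, sd = 3)     # simulated data, true sd = 3

# Statistic: quartile-based estimate of the standard deviation
sd_iqr <- function(x) IQR(x) / 1.349
out <- bootstrap_summary(x, sd_iqr, B = 2000)
print(out)
```

Any function of the sample can be passed as `theta`, which is the main appeal of writing the loop by hand.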

68

Bootstrap by C

69

70

Hands-on practice

An example

71

72

73

74

Exercises

Write your own programs similar to the examples presented in this talk.

Write programs for the examples mentioned on the reference web pages.

Write programs for other examples that you know.

Prove the theoretical statements in this talk.
