1
Nonparametric Methods I
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
http://tigpbp.iis.sinica.edu.tw/courses.htm
2
Parametric vs. Nonparametric
MLE: probability distribution and likelihood
Bayes: conditional, prior and posterior distributions
Distribution free?
http://en.wikipedia.org/wiki/Non-parametric_statistics
3
Motivation (1)
In many applications, direct access to a measurement is not possible. However, an estimate of the measurement is needed.
Most of the time, large-scale repetition of an experiment is not economically feasible.
What can one do?
4
Motivation (2)
Q1: What estimator can be used for the problem of interest?
Q2: Having chosen an estimator, how accurate is it? What are the bias and variance of the estimator?
Q3: How do we make inferences? What is the confidence interval? What is the p-value for a hypothesis test?
5
References
B. Efron (1979). Computers and the theory of statistics: thinking the unthinkable. SIAM Review, 21, 460-480.
B. Efron and R. J. Tibshirani (1993). An Introduction to the Bootstrap. Chapman & Hall.
J. I. De la Rosa and G. A. Fleury (2006). Bootstrap methods for a measurement estimation problem. IEEE Transactions on Instrumentation and Measurement, 55(3), 820–827.
http://en.wikipedia.org/wiki/Resampling_%28statistics%29#Jackknife
http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29
6
Resampling Techniques
Data resampling
PART 1: Jackknife (resampling without replacement)
PART 2: Bootstrap (resampling with replacement)
8
Why the funny name of Jackknife?
Jackknife: a pocket knife
http://en.wikipedia.org/wiki/Jackknife
Mosteller and Tukey (1977, p. 133) described a predecessor resampling method, the jackknife, in the following way:
“The name ‘jackknife’ is intended to suggest the broad usefulness of a technique as a substitute for specialized tools that may not be available, just as the Boy Scout’s trustworthy tool serves so variedly…”
http://mrw.interscience.wiley.com/emrw/9780470013199/esbs/article/bsa321/current/abstract
9
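Illustration of Jackknife
[Diagram] Population with unknown $\theta$ → (sampling) → $X_1, X_2, \ldots, X_n$ → (resampling, $n$ times, leaving out one observation each time) → $(X_2, \ldots, X_n)$, $(X_1, X_3, \ldots, X_n)$, ..., $(X_1, X_2, \ldots, X_{n-1})$ → (statistics) → $\hat\theta_{(1)}, \hat\theta_{(2)}, \ldots, \hat\theta_{(n)}$ → (inference) → estimate $\theta$.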
10
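Math Expression
$$X_1, \ldots, X_n \overset{iid}{\sim} F(x), \quad \text{e.g. } F(x) = N(\mu, \sigma^2), \quad \theta(F) = (\mu, \sigma^2).$$
We can estimate $\theta$ by the following:
$$\hat\theta_1 = \hat\theta(X_1, \ldots, X_n) = (\bar X, S^2), \quad \text{where } S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar X)^2,$$
$$\hat\theta_2 = \left( \text{median}, \ \left( \frac{q_3 - q_1}{c} \right)^2 \right),$$
and so on (for example, a $\hat\theta_3$ that combines $\bar X$, the median, and the quartiles). Here $q_1$ and $q_3$ are the sample quartiles and $c$ is a normalizing constant (1.349 for a normal distribution).
Rules of judgement:
$$bias = E(\hat\theta) - \theta,$$
$$Variance = Var(\hat\theta),$$
$$MSE = bias^2 + Variance.$$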
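The three rules of judgement can be checked by simulation when the true distribution is known. The sketch below is an illustration added here, not part of the slides: it assumes samples from N(0, 1), compares the sample mean and the sample median as estimators of $\mu = 0$, and uses the hypothetical helper name `monte_carlo_mse`. It is written in Python with only the standard library.

```python
import random
import statistics

def monte_carlo_mse(estimator, true_value, n=30, reps=2000, seed=0):
    """Approximate bias, variance, and MSE of `estimator` by simulating N(0, 1) samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(estimator(sample))
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value                        # E(theta_hat) - theta
    variance = statistics.pvariance(estimates, mu=mean_est)  # Var(theta_hat)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse

for name, est in [("mean", statistics.fmean), ("median", statistics.median)]:
    bias, var, mse = monte_carlo_mse(est, true_value=0.0)
    # mse agrees with bias**2 + var up to floating-point error
    print(f"{name}: bias={bias:.4f} var={var:.4f} mse={mse:.4f}")
```

In practice the true $\theta$ is unknown, which is why the jackknife and bootstrap estimate these quantities from the sample alone.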
11
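An Example of Jackknife (1)
$$X_1, \ldots, X_n \overset{iid}{\sim} F(X) \ (\text{e.g. } N(\mu, \sigma^2)), \quad \hat\theta = \bar X.$$
Leave out one observation at a time from $X_1, X_2, \ldots, X_n$:
$$\hat\theta_{(1)} = \frac{X_2 + \cdots + X_n}{n-1}, \quad \hat\theta_{(2)} = \frac{X_1 + X_3 + \cdots + X_n}{n-1}, \quad \ldots, \quad \hat\theta_{(n)} = \frac{X_1 + \cdots + X_{n-1}}{n-1}.$$
Then
$$\hat\theta_{(\cdot)} = \frac{\hat\theta_{(1)} + \hat\theta_{(2)} + \cdots + \hat\theta_{(n)}}{n} = \bar X,$$
$$bias = (n-1)(\hat\theta_{(\cdot)} - \hat\theta) = (n-1)(\bar X - \bar X) = 0,$$
$$se^2 = \frac{n-1}{n} \sum_{i=1}^{n} (\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2 = \frac{s^2}{n}.$$
HW: verify the last equality in the bias and in the standard error.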
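The two closed forms for the sample mean (zero jackknife bias, and a jackknife standard error equal to $s/\sqrt{n}$) can be checked numerically. The sketch below is an added illustration in Python; the function name `jackknife_mean` and the data values are made up for the example.

```python
import math
import statistics

def jackknife_mean(x):
    """Jackknife bias and standard error for the sample mean."""
    n = len(x)
    total = sum(x)
    # theta_hat_(i): mean of the sample with X_i left out
    loo = [(total - xi) / (n - 1) for xi in x]
    theta_hat = total / n
    theta_dot = sum(loo) / n
    bias = (n - 1) * (theta_dot - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in loo))
    return theta_hat, bias, se

x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # illustrative data, not from the slides
theta_hat, bias, se = jackknife_mean(x)
print("theta_hat:", theta_hat)
print("jackknife bias:", bias)
print("jackknife se:", se, " s/sqrt(n):", statistics.stdev(x) / math.sqrt(len(x)))
```

Up to floating-point error, the bias prints as zero and the two standard errors agree.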
12
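An Example of Jackknife (2)
$$X_1, \ldots, X_n \overset{iid}{\sim} F(X) \ (\text{e.g. } N(\mu, \sigma^2)), \quad n = 100, \quad \hat\theta = \widehat{\text{median}} = \frac{X_{(50)} + X_{(51)}}{2}.$$
Leaving out one observation at a time gives $n$ samples of size 99, $X_1^*, \ldots, X_{99}^*$, and the median of each is its 50th order statistic:
Leave out $X_1$: $\hat\theta_{(1)} = X_{(50)}^*$
Leave out $X_2$: $\hat\theta_{(2)} = X_{(50)}^*$
...
Leave out $X_n$: $\hat\theta_{(n)} = X_{(50)}^*$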
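For the median with $n = 100$, each leave-one-out sample has 99 values, so its median is the 50th order statistic of that sample. The sketch below is an added illustration in Python with simulated N(0, 1) data (an assumption). It shows a consequence of this setup: when there are no ties, every leave-one-out median equals either $X_{(50)}$ or $X_{(51)}$ of the full sample.

```python
import random
import statistics

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]  # simulated data, n = 100

ordered = sorted(x)
x50, x51 = ordered[49], ordered[50]            # X_(50) and X_(51) of the full sample
theta_hat = (x50 + x51) / 2                    # sample median for even n

# theta_hat_(i): median of the 99 values that remain after dropping X_i
loo_medians = [statistics.median(x[:i] + x[i + 1:]) for i in range(len(x))]

print("sample median:", theta_hat)
print("distinct leave-one-out medians:", sorted(set(loo_medians)))
print("X_(50), X_(51):", x50, x51)
```

Dropping one of the 50 smallest observations shifts the 50th order statistic up to $X_{(51)}$; dropping one of the 50 largest leaves it at $X_{(50)}$.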
13
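Summary of the Jackknife Method
$$\hat\theta_{(\cdot)} = \frac{\hat\theta_{(1)} + \hat\theta_{(2)} + \cdots + \hat\theta_{(n)}}{n},$$
$$bias_J = (n-1)(\hat\theta_{(\cdot)} - \hat\theta),$$
$$se_J = \left[ \frac{n-1}{n} \sum_{i=1}^{n} (\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2 \right]^{1/2},$$
$$MSE_J = bias_J^2 + se_J^2.$$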
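The summary formulas apply to any statistic, not only the mean. The sketch below is an added illustration in Python; `jackknife` and `plug_in_variance` are hypothetical helper names and the data are made up. It applies the formulas to the variance estimator with divisor $n$. Subtracting the estimated bias from $\hat\theta$ is a standard use of $bias_J$ that the slide does not show; for this statistic it recovers the usual sample variance with divisor $n - 1$.

```python
import math
import statistics

def jackknife(x, statistic):
    """Jackknife bias, standard error, and MSE for an arbitrary statistic of a list."""
    n = len(x)
    theta_hat = statistic(x)
    # theta_hat_(i): statistic recomputed with the i-th observation left out
    loo = [statistic(x[:i] + x[i + 1:]) for i in range(n)]
    theta_dot = sum(loo) / n
    bias = (n - 1) * (theta_dot - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in loo))
    return {"theta_hat": theta_hat, "bias": bias, "se": se, "mse": bias ** 2 + se ** 2}

def plug_in_variance(x):
    """Variance estimator with divisor n (biased downward)."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # illustrative data, not from the slides
res = jackknife(x, plug_in_variance)
corrected = res["theta_hat"] - res["bias"]
print(res)
print("bias-corrected:", corrected, " sample variance:", statistics.variance(x))
```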
14
How do quartiles lead to an estimate?
Note that $q_3 = \Phi^{-1}(0.75) = 0.6745$ and $q_1 = \Phi^{-1}(0.25) = -0.6745$ are the upper and lower quartiles of a standard normal distribution.
Hence, the interquartile range is IQR $= q_3 - q_1 = 1.349$.
Therefore, IQR/1.349 can be used as an estimator of $\sigma$, the standard deviation of a normal distribution.
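The constant 1.349 and the resulting estimator can be reproduced in a few lines. The sketch below is an added illustration in Python; the helper name `iqr_sigma` and the simulated N(10, 2²) sample are assumptions. Note that `statistics.quantiles` uses one particular sample-quartile convention, and other conventions give slightly different values.

```python
import random
import statistics
from statistics import NormalDist

# Quartiles of the standard normal distribution
q3 = NormalDist().inv_cdf(0.75)
q1 = NormalDist().inv_cdf(0.25)
c = q3 - q1
print("q3:", round(q3, 4), "q1:", round(q1, 4), "IQR:", round(c, 3))

def iqr_sigma(x):
    """Estimate sigma of a normal sample by (sample IQR) / (standard normal IQR)."""
    quartiles = statistics.quantiles(x, n=4)  # [Q1, Q2, Q3]
    return (quartiles[2] - quartiles[0]) / c

rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(5000)]  # true sigma = 2
print("IQR-based estimate:", iqr_sigma(sample))
print("sample standard deviation:", statistics.stdev(sample))
```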
37
PART 2: Bootstrap
Naming
Illustration
Math Expression
Examples
R codes
  Three approaches: Package(bootstrap), Package(boot), Write your own R codes
C codes
38
The Bootstrap
The bootstrap technique was proposed by Bradley Efron (1979, 1981, 1982).
Bootstrapping is an application of intensive computing to traditional inferential methods.
39
Why the funny name of bootstrap?
Bootstrap: http://www.concurringopinions.com/archives/Bootstrap_1.jpg
In the book ‘Singular Travels, Campaigns and Adventures of Baron Munchausen’ by R. E. Raspe (1786), the main character, finding himself in a deep hole, extracts himself using only the straps of his boots.
http://tigger.uic.edu/~slsclove/stathumr.htm
40
Illustration of Bootstrap
[Diagram] Population with unknown $\theta$ → (sampling) → $X_1, X_2, \ldots, X_n$ → (resampling with replacement, $B$ times) → $B$ resamples $X_1^*, X_2^*, \ldots, X_n^*$ → (statistics) → $\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*$ → (inference) → estimate $\theta$.
41
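Math Expression
$$X_1, X_2, \ldots, X_n \overset{iid}{\sim} F(x) \ (\text{e.g. } N(\mu, \sigma^2)),$$
$$\theta = \theta(F) \ \left(\text{e.g. } \int x \, dF(x)\right), \quad \hat\theta = \theta(F_n) \ \left(\text{e.g. } \int x \, dF_n(x) = \bar X\right),$$
$$\text{where } F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1[X_i \le x].$$
Resampling with replacement:
$$X_1^*, X_2^*, \ldots, X_n^* \overset{iid}{\sim} F_n(x).$$
Repeat $B$ times and every time,
$$\hat\theta_b^* = \theta(F_n^*) \ \left(\text{e.g. } \int x \, dF_n^*(x) = \frac{X_1^* + X_2^* + \cdots + X_n^*}{n}\right), \quad b = 1, \ldots, B,$$
$$\text{where } F_n^*(x) = \frac{1}{n} \sum_{i=1}^{n} 1[X_i^* \le x].$$
Then
$$\hat\theta_{(\cdot)}^* = \frac{\hat\theta_1^* + \hat\theta_2^* + \cdots + \hat\theta_B^*}{B},$$
$$bias_B = \hat\theta_{(\cdot)}^* - \hat\theta,$$
$$var_B = \frac{1}{B-1} \sum_{b=1}^{B} (\hat\theta_b^* - \hat\theta_{(\cdot)}^*)^2.$$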
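The empirical distribution function $F_n$ puts mass $1/n$ on each observation, so the plug-in mean $\int x \, dF_n(x)$ is simply the sample mean. The sketch below is an added illustration in Python; the helper name `ecdf` and the data are made up for the example.

```python
import bisect

def ecdf(sample):
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts how many sorted values are <= x
        return bisect.bisect_right(ordered, x) / n

    return F_n

x = [3.0, 1.0, 4.0, 1.0, 5.0]  # illustrative data, not from the slides
F_n = ecdf(x)
for point in (0.5, 1.0, 3.5, 5.0):
    print(f"F_n({point}) = {F_n(point)}")

# Plug-in mean: each observation carries probability mass 1/n under F_n
plug_in_mean = sum(xi * (1 / len(x)) for xi in x)
print("plug-in mean:", plug_in_mean, " sample mean:", sum(x) / len(x))
```

Drawing $X_1^*, \ldots, X_n^*$ iid from $F_n$ is the same as drawing $n$ values from the data with replacement.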
42
For example, $X_1, X_2, \ldots, X_n \overset{iid}{\sim} F(x)$ (e.g. $F(x) = N(\mu, 1)$).
To learn about the population, you can calculate the mean or variance as an expectation, but this is often not easy to do.
For example, if $X$ is sampled from a population with parameter $\theta$, then
$$E(X) = \int x f(x) \, dx = \int x \, dF(x);$$
$$\text{median} = F^{-1}\left(\tfrac{1}{2}\right);$$
$$\theta = \theta(F).$$
43
STEP 1: When you get $n$ data objects, how can you estimate the parameter of the population?
[Diagram] Population → (step 1: sampling) → $X_1, X_2, \ldots, X_n$
44
STEP 2: Resample the data $B$ times with replacement. This gives many resampled data sets, which are used in place of new samples drawn from the population.
[Diagram] $X_1, X_2, \ldots, X_n$ → (step 2: resampling, $B$ times) → $B$ resamples $X_1^*, X_2^*, \ldots, X_n^*$
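Resampling with replacement means drawing $n$ values from the observed data, where each draw can pick any observation again. The sketch below is an added illustration in Python; the helper name `bootstrap_resamples`, the seed, and the data are assumptions made for the example.

```python
import random

def bootstrap_resamples(x, B, seed=0):
    """Draw B resamples of size len(x) from x, with replacement."""
    rng = random.Random(seed)
    n = len(x)
    # random.choices samples with replacement, so values may repeat within a resample
    return [rng.choices(x, k=n) for _ in range(B)]

x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # illustrative data, not from the slides
resamples = bootstrap_resamples(x, B=5)
for b, resample in enumerate(resamples, start=1):
    print(f"resample {b}: {resample}")
```

Each resample has the same size as the original data, and typically some observations appear more than once while others are missing.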
45
STEP 3: Regard $X_1, X_2, \ldots, X_n$ as the new population and resample it $B$ times with replacement, $X_1^*, X_2^*, \ldots, X_n^* \sim F_n(x)$. Then, you can calculate statistics.
[Diagram] $B$ resamples $X_1^*, X_2^*, \ldots, X_n^*$ → (step 3: statistics) → $\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*$
46
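STEP 4: Make inference by resampling statistics, $\bar X_1^*, \bar X_2^*, \ldots, \bar X_B^*$.
$$\bar X_{(\cdot)}^* = \frac{\bar X_1^* + \bar X_2^* + \cdots + \bar X_B^*}{B} \xrightarrow[B \to \infty]{LLN} E_*(\bar X^*) = ?$$
$$E_*(\bar X^*) = E_*\left( \frac{X_1^* + X_2^* + \cdots + X_n^*}{n} \right) = \frac{E_*(X_1^*) + \cdots + E_*(X_n^*)}{n} = E_*(X_1^*) = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar X.$$
$$bias_B(\bar X) = \bar X_{(\cdot)}^* - \bar X \xrightarrow[B \to \infty]{LLN} \bar X - \bar X = 0.$$
$$var_B(\bar X^*) = \frac{1}{B-1} \sum_{b=1}^{B} (\bar X_b^* - \bar X_{(\cdot)}^*)^2 \xrightarrow[B \to \infty]{LLN} E_*\left( \bar X^* - E_*(\bar X^*) \right)^2 = V_*(\bar X^*) = ?$$
$$V_*(\bar X^*) = V_*\left( \frac{X_1^* + X_2^* + \cdots + X_n^*}{n} \right) = \frac{V_*(X_1^*) + V_*(X_2^*) + \cdots + V_*(X_n^*)}{n^2} = \frac{V_*(X_1^*)}{n} = \frac{1}{n} \cdot \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2 = \frac{(n-1) S^2}{n^2}.$$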
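For a very small sample the two limits can be checked exactly, without simulation, by listing all $n^n$ equally likely resamples. The sketch below is an added illustration in Python; the four data values are made up.

```python
import itertools
import statistics

x = [1.0, 4.0, 7.0, 8.0]  # illustrative data, not from the slides
n = len(x)

# Every ordered resample of size n drawn with replacement has probability 1 / n**n
resample_means = [sum(r) / n for r in itertools.product(x, repeat=n)]

e_star = statistics.fmean(resample_means)      # E_*(X-bar*)
v_star = statistics.pvariance(resample_means)  # V_*(X-bar*)

s2 = statistics.variance(x)                    # S^2 with divisor n - 1
print("E_*:", e_star, " X-bar:", statistics.fmean(x))
print("V_*:", v_star, " (n-1) S^2 / n^2:", (n - 1) * s2 / n ** 2)
```

With a finite number $B$ of random resamples, $\bar X_{(\cdot)}^*$ and $var_B$ only approximate these exact values.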
47
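Summary of the Bootstrap Method
More generally, we can get $\hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*$ by bootstrap.
$$\hat\theta_{(\cdot)}^* = \frac{\hat\theta_1^* + \hat\theta_2^* + \cdots + \hat\theta_B^*}{B} \xrightarrow[B \to \infty]{LLN} E_*(\hat\theta^*),$$
$$bias_B = \hat\theta_{(\cdot)}^* - \hat\theta,$$
$$var_B = \frac{1}{B-1} \sum_{b=1}^{B} (\hat\theta_b^* - \hat\theta_{(\cdot)}^*)^2, \quad se_B = \sqrt{var_B},$$
$$MSE_B = bias_B^2 + se_B^2 = bias_B^2 + var_B.$$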
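The summary formulas translate directly into a short routine. The sketch below is an added illustration in Python, not the R code referred to in the slides; the function name `bootstrap`, the choice $B = 2000$, the seeds, and the simulated data are all assumptions.

```python
import math
import random
import statistics

def bootstrap(x, statistic, B=2000, seed=0):
    """Bootstrap estimates of bias, variance, standard error, and MSE of a statistic."""
    rng = random.Random(seed)
    n = len(x)
    theta_hat = statistic(x)
    # theta*_b: the statistic recomputed on the b-th resample drawn with replacement
    reps = [statistic(rng.choices(x, k=n)) for _ in range(B)]
    theta_star_dot = sum(reps) / B
    bias = theta_star_dot - theta_hat
    var = sum((t - theta_star_dot) ** 2 for t in reps) / (B - 1)
    return {"theta_hat": theta_hat, "bias": bias, "var": var,
            "se": math.sqrt(var), "mse": bias ** 2 + var}

data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(50)]  # simulated sample, n = 50

res = bootstrap(x, statistics.median)
print("median:", res)

# For the mean, var_B should be close to (n - 1) S^2 / n^2
res_mean = bootstrap(x, statistics.fmean)
n = len(x)
print("mean var_B:", res_mean["var"],
      " (n-1) S^2 / n^2:", (n - 1) * statistics.variance(x) / n ** 2)
```

Passing a different function as `statistic` gives the same summaries for any other estimator.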
48
Bootstrap by R
Approach 1: Use package “bootstrap”
Approach 2: Use package “boot”
Approach 3: Write your own R codes
63
Arguments
sim: A character string indicating the type of simulation required. Possible values are "ordinary" (the default), "parametric", "balanced", "permutation", or "antithetic". Importance resampling is specified by including importance weights; the type of importance resampling must still be specified but may only be "ordinary" or "balanced" in this case.