Kernel estimators
ESSI SYRJÄLÄ
Introduction
Kernel estimators are non-parametric regression estimators
Kernel estimators smooth out the contribution of each observed data point over a local neighborhood of that data point
The contribution of a data point to the estimate at some point depends on how far apart the two points are.
More generally
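Estimate of f: $\hat f_h(x) = \frac{1}{nh} \sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right) Y_j = \frac{1}{n} \sum_{j=1}^{n} w_j Y_j$, where $w_j = K\left(\frac{x - x_j}{h}\right) / h$
K is a kernel with $\int K = 1$, and h controls the smoothness of the fitted curve; it is called the bandwidth, window width or smoothing parameter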
If the observations are spaced very unevenly, the estimator can give poor results
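-> Nadaraya–Watson estimator: $\hat f_h(x) = \frac{\sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right) Y_j}{\sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right)}$, a true weighted average of the $Y_j$ whose weights sum to one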
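The Nadaraya–Watson estimator can be sketched by hand in a few lines of R. This is only an illustration under stated assumptions: nw_estimate is a hypothetical helper name (not a function from any package), the kernel is taken to be Gaussian, and h = 1 is an arbitrary choice. Here h is the standard deviation of the Gaussian kernel, whereas ksmooth scales its bandwidth argument differently, so the same number does not give the same curve in both.

```r
# Nadaraya-Watson estimate at the points x0, using a Gaussian kernel.
# nw_estimate is a hypothetical helper, not part of any package.
nw_estimate <- function(x0, x, y, h) {
  sapply(x0, function(pt) {
    w <- dnorm((pt - x) / h) / h   # weights w_j = K((x - x_j) / h) / h
    sum(w * y) / sum(w)            # weighted average; weights are normalised to sum to one
  })
}

# The trees data ship with base R (package datasets)
grid <- seq(min(trees$Girth), max(trees$Girth), length.out = 100)
fit <- nw_estimate(grid, trees$Girth, trees$Height, h = 1)

plot(Height ~ Girth, trees, main = "Nadaraya-Watson, h=1")
lines(grid, fit, lwd = 2, col = "red")
```

Because the estimate is a weighted average with non-negative weights, it always stays between the smallest and largest observed response.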
Basic asymptotics
The optimal choice of h gives: $MSE(x) = E[(f(x) - \hat f_h(x))^2] = O(n^{-4/5})$
-> when the sample size n increases, the mean squared error decreases at a rate proportional to $n^{-4/5}$.
-> for a typical parametric estimator, $MSE(x) = O(n^{-1})$ -> the kernel estimator is less efficient.
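To get a feel for the two rates, the orders $n^{-4/5}$ and $n^{-1}$ can be tabulated for a few sample sizes. This is a rough sketch only: the constants hidden in the O() notation are ignored, and the sample sizes are arbitrary.

```r
# Compare the orders of the two MSE rates (constants in O() are ignored)
n <- c(100, 1000, 10000, 100000)
rates <- data.frame(
  n = n,
  kernel = n^(-4/5),     # order of the kernel estimator's MSE
  parametric = n^(-1)    # order of a typical parametric estimator's MSE
)
rates$ratio <- rates$kernel / rates$parametric   # equals n^(1/5)
print(rates)
```

The ratio of the two orders is $n^{1/5}$, so the gap between the kernel and the parametric rate widens as n grows, but only slowly.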
Kernel estimator
Estimator requires two choices: the kernel and the smoothing parameter
The choice of kernel is less important than the choice of smoothing parameter -> too small: undersmoothing, too large: oversmoothing
We prefer a smooth and compact kernel; the optimal choice is the Epanechnikov kernel: $K(x) = \frac{3}{4}(1 - x^2)$ for $|x| < 1$, and 0 otherwise
There are also other kernels, such as the Gaussian and the uniform.
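The kernels named above can be written down and compared directly. The sketch below uses the standard textbook definitions on the scaled distance u; the names k_epan, k_unif and k_gauss are made up for this illustration.

```r
# Three common kernels as functions of the scaled distance u
# (k_epan, k_unif and k_gauss are hypothetical names for this sketch)
k_epan  <- function(u) ifelse(abs(u) < 1, 0.75 * (1 - u^2), 0)  # Epanechnikov
k_unif  <- function(u) ifelse(abs(u) < 1, 0.5, 0)               # uniform (box)
k_gauss <- function(u) dnorm(u)                                 # Gaussian

# Each kernel should integrate to one
int_epan  <- integrate(k_epan, -1, 1)$value
int_unif  <- integrate(k_unif, -1, 1)$value
int_gauss <- integrate(k_gauss, -Inf, Inf)$value
print(c(int_epan, int_unif, int_gauss))

# Compare the shapes
u <- seq(-3, 3, length.out = 301)
plot(u, k_epan(u), type = "l", lwd = 2, ylab = "K(u)", main = "Kernels")
lines(u, k_unif(u), lty = 2)
lines(u, k_gauss(u), lty = 3)
legend("topright", c("Epanechnikov", "Uniform", "Gaussian"),
       lty = c(1, 2, 3), lwd = c(2, 1, 1))
```

The Epanechnikov and uniform kernels are compact (zero outside |u| < 1), while the Gaussian kernel gives every observation some weight, however small.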
R-code: Choices of bandwidth
library(faraway)
data(trees)
attach(trees)
plot(Height ~ Girth, trees, main = "bandwidth=1")
# The default is a uniform ("box") kernel, but it is quite rough,
# so we change it to the normal kernel
lines(ksmooth(Girth, Height, kernel = "normal", bandwidth = 1), lwd = 2, col = "red")
plot(Height ~ Girth, trees, main = "bandwidth=3")
lines(ksmooth(Girth, Height, kernel = "normal", bandwidth = 3), lwd = 2, col = "red")
plot(Height ~ Girth, trees, main = "bandwidth=7")
lines(ksmooth(Girth, Height, kernel = "normal", bandwidth = 7), lwd = 2, col = "red")
Kernel estimates with different bandwidths
R-code
install.packages("sm")
library(sm)
# Cross-validated choice of bandwidth
hm <- hcv(Girth, Height, display = "lines")  # hm = 2.291831
# This uses a Gaussian kernel
sm.regression(Girth, Height, h = hm, xlab = "girth", ylab = "height")
Cross-validation criterion as a function of the smoothing parameter, and the kernel estimate at the cross-validated value of the smoothing parameter
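The idea behind the cross-validation criterion can also be sketched by hand: leave each observation out in turn, predict it from the remaining ones, and average the squared prediction errors. The sketch below assumes a Nadaraya–Watson fit with a Gaussian kernel and a simple grid of candidate bandwidths; cv_score is a hypothetical helper, and hcv in the sm package uses its own implementation and search, so the selected value need not match the one reported above.

```r
# Leave-one-out cross-validation score for a Nadaraya-Watson fit
# with a Gaussian kernel (cv_score is a hypothetical helper)
cv_score <- function(h, x, y) {
  n <- length(x)
  pred <- sapply(seq_len(n), function(j) {
    w <- dnorm((x[j] - x[-j]) / h)   # weights from all points except j
    sum(w * y[-j]) / sum(w)          # prediction at x[j] without using y[j]
  })
  mean((y - pred)^2)
}

hs <- seq(0.5, 5, by = 0.1)   # candidate bandwidths, range chosen for illustration
cv <- sapply(hs, cv_score, x = trees$Girth, y = trees$Height)

plot(hs, cv, type = "l", xlab = "h", ylab = "CV score")
h_best <- hs[which.min(cv)]   # bandwidth with the smallest CV score on this grid
print(h_best)
```

A very small h makes each prediction depend on only a few neighbours (undersmoothing), and a very large h pulls every prediction towards the overall mean (oversmoothing); the criterion is smallest somewhere in between.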
Exercise
Use the data set ais from the package alr3. Find the best value for the smoothing parameter (bandwidth), first by plotting the fit with different bandwidths and then by cross-validation. Note that you have to give hcv a start value and an end value for the search (see ?hcv).
Then do the same using only the females (the observations where sex is female).
References
Faraway, Julian J. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC, 2006.
Wikipedia. Kernel density estimation. Edited 1.4.2015. http://en.wikipedia.org/wiki/Kernel_density_estimation
Wikipedia. Big O notation. Edited 12.3.2015. http://en.wikipedia.org/wiki/Big_O_notation#Usage