
Nonparametric Techniques


Nonparametric Techniques

w/o assuming any particular distribution:

the underlying function may not be known (e.g., multi-modal densities)

too many parameters

Estimating the density distribution directly

Transform into a lower-dimensional space where parametric techniques may apply
(more on this later, on dimension reduction)


Example

Estimate the population growth, annual rainfall, etc. in the US

p(x,y) dx dy is the probability of rainfall in [x, x+dx] × [y, y+dy]


Example (cont.)

A simple parametric model for p(x,y) probably does not exist

Instead:

partition the area into a lattice

at each (x,y), count the amount of rain r(x,y)

do that for a whole year

normalize so that Σ r(x,y) = 1


Density estimation

[Figure: probability density p(x) plotted against the value x, with the interval [x_i, x_j] marked]

$P = \int_{x_i}^{x_j} p(x)\, dx$


Density estimation

From equation: $P = \int_{x_i}^{x_j} p(x)\, dx \approx p(x)\,(x_j - x_i)$

From observation: $P \approx k/n$

Hence: $p(x) \approx \dfrac{k/n}{x_j - x_i} = \dfrac{k/n}{V}$
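To make the $p(x) \approx (k/n)/V$ idea concrete, here is a minimal sketch of a histogram-style density estimate: count how many of the n samples fall in each cell of width V, then divide by nV. Python/NumPy, the bin edges, and the test data are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def histogram_density(samples, edges):
    """Estimate p(x) on each cell [edges[j], edges[j+1]) as (k/n)/V."""
    n = len(samples)
    k, _ = np.histogram(samples, bins=edges)   # k: samples falling in each cell
    V = np.diff(edges)                         # V: cell widths
    return (k / n) / V                         # p(x) ~= (k/n)/V

# Example: estimate a standard normal density from 10,000 draws
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
edges = np.linspace(-4, 4, 41)                 # 40 cells of width 0.2
p_hat = histogram_density(x, edges)
print(p_hat.max())                             # should be near 1/sqrt(2*pi) ~ 0.40
```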


Comparison

In reality:

The number of training samples is limited

if V is too small, k becomes erratic (what does k = 0 mean?)

if V is too large, the estimate is not representative of p at x

In theory:

If n becomes infinitely large, k/n approaches the probability; $p(x) \approx \dfrac{k/n}{V} = \dfrac{k/n}{x_j - x_i}$ is then only a space average

Hence, V must be allowed to go to zero as n goes to infinity


In Theory

Theoretically, we can use a sequence of samples with increasing size for estimation

Then $p_n(x) \to p(x)$ provided that

(1) $\lim_{n\to\infty} V_n = 0$

(2) $\lim_{n\to\infty} k_n = \infty$

(3) $\lim_{n\to\infty} k_n / n = 0$


Two different approaches

Constrain the region size:
shrink the region to maintain good locality (Parzen windows)

Constrain the sample size:
enlarge the number of samples to maintain good resolution ($k_n$-nearest-neighbors)


Parzen Windows

Use a windowing function, e.g.

$\varphi(u) = \begin{cases} 1 & |u| \le 1/2 \\ 0 & \text{otherwise} \end{cases}$  or  $\varphi(u) = \dfrac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$

A sequence of regions can be defined by the window widths $h_n$, with volume $V_n = h_n^d$

By definition

$k_n = \sum_{i=1}^{n} \varphi\!\left(\dfrac{x - x_i}{h_n}\right)$

$p_n(x) = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{V_n}\, \varphi\!\left(\dfrac{x - x_i}{h_n}\right) = \dfrac{1}{n h_n} \sum_{i=1}^{n} \varphi\!\left(\dfrac{x - x_i}{h_n}\right) \quad (d = 1)$
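A minimal numerical sketch of the Parzen estimate above, assuming a Gaussian window and Python/NumPy (neither is mandated by the slides); the window widths and test data are illustrative only.

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Parzen estimate p_n(x) = (1/n) * sum_i (1/h) * phi((x - x_i)/h), 1-D case."""
    u = (x - samples) / h                            # scaled distance to every sample
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian window phi(u)
    return phi.sum() / (len(samples) * h)            # average of the per-sample windows

rng = np.random.default_rng(1)
data = rng.normal(size=500)                 # samples from an "unknown" density
for h in (0.05, 0.3, 1.0):                  # small h: spiky estimate; large h: oversmoothed
    print(h, parzen_estimate(0.0, data, h)) # estimates of p(0); true value ~ 0.40
```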


Parzen Window (cont.)

As n increases

the window becomes narrower (by $h_n$)

the window becomes taller (by $1/V_n$)

Sampling with a smaller aperture but higher focus
(the same 100 dollars collected from 100 people and from 1 person is different per person)

Either way, the scaled window always integrates to one:

$\int \dfrac{1}{V_n}\, \varphi\!\left(\dfrac{x - x_i}{h_n}\right) dx = \int \varphi(u)\, du = 1$


[Figure: $p_n(x)$ versus x]

Small n: large aperture, smoothed, fuzzy estimate

Large n: small aperture, sharp, erratic estimate


[Figure: Parzen estimates for n = 1, 16, 256, and n → ∞]


2D Sampling

Five samples

Windowing function: [shown in figure]


[Figure: estimates as the number of samples and the window size vary]


[Figure: estimates for n = 1, 16, 256, and n → ∞]


Does it work?

“Work” in the sense that if you are able to shrink the window size as much as you want (while simultaneously increasing the number of samples available), then the limit of the estimated profile should be the correct probability density

This implies (treating $p_n$ as a random variable)

$E(p_n(x)) \to p(x)$

$\mathrm{Var}(p_n(x)) \to 0$


Convergence of Mean

Will $p_n(x)$ go to $p(x)$?

$\bar{p}_n(x) = E[p_n(x)] = \dfrac{1}{n}\sum_{i=1}^{n} E\!\left[\dfrac{1}{V_n}\varphi\!\left(\dfrac{x - x_i}{h_n}\right)\right] = \int \dfrac{1}{V_n}\varphi\!\left(\dfrac{x - v}{h_n}\right) p(v)\, dv \;\to\; \int \delta(x - v)\, p(v)\, dv = p(x)$

If n goes to infinity, the $x_i$ will cover all possible x (summation becomes integration), weighted by the p(x) distribution: a sample v appears with probability p(v)


Convergence of Variance

Will $p_n(x)$ always end up at $p(x)$ for certain?

$\mathrm{Var}[p_n(x)] = \sum_{i=1}^{n} E\!\left[\left(\dfrac{1}{n V_n}\varphi\!\left(\dfrac{x - x_i}{h_n}\right) - \dfrac{1}{n}\bar{p}_n(x)\right)^{\!2}\right] = \dfrac{1}{n V_n}\int \dfrac{1}{V_n}\varphi^{2}\!\left(\dfrac{x - v}{h_n}\right) p(v)\, dv - \dfrac{1}{n}\bar{p}_n^{2}(x) \le \dfrac{\sup(\varphi)\, \bar{p}_n(x)}{n V_n} \;\to\; 0 \text{ as } n \to \infty$

For this, $n V_n$ must approach infinity, even though $V_n$ itself goes to zero


kn-nearest-neighbor

The Parzen window size is hard to estimate

Constrain the number of data items instead of the size of the window (e.g., $k_n = \sqrt{n}$)

Enlarge the window around x until it encloses that many samples, then

$p_n(x) = \dfrac{k_n / n}{V_n}$
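A minimal sketch of the $k_n$-nearest-neighbor estimate, assuming Python/NumPy as before; the choice $k_n = \sqrt{n}$ and the test data are illustrative.

```python
import numpy as np

def knn_density(x, samples, k):
    """k-NN estimate: grow the window around x until it holds k samples, then p ~= (k/n)/V."""
    n = len(samples)
    dists = np.sort(np.abs(samples - x))    # distances from x to every sample
    radius = dists[k - 1]                   # half-width of the smallest window holding k samples
    V = 2 * radius                          # 1-D "volume" of that window
    return (k / n) / V

rng = np.random.default_rng(2)
data = rng.normal(size=400)
k = int(np.sqrt(len(data)))                 # k_n = sqrt(n)
print(knn_density(0.0, data, k))            # estimate of p(0); true value ~ 0.40
```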


kn-nearest-neighbor

Intuitively, as n increases

$k_n$ should increase (for good representation)

$V_n$ should decrease (for good localization)

The following conditions guarantee convergence

$\lim_{n\to\infty} k_n = \infty \qquad \lim_{n\to\infty} \dfrac{k_n}{n} = 0$


Sharp spikes around data points:

with $k_n = 1$, the probability estimate is infinite at each data point
(the region size is zero while still capturing 1 sample)


[Figure: $k_n$-nearest-neighbor estimates for $(k_n, n) = (1, 1), (4, 16), (16, 256)$, and $n \to \infty$]


An Example

Estimating $P(\omega_i \mid x)$

n tagged samples; a volume V around x captures k samples, $k_i$ of which are labeled $\omega_i$

$p_n(x, \omega_i) = \dfrac{k_i / n}{V}$

$P_n(\omega_i \mid x) = \dfrac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \dfrac{(k_i/n)/V}{\sum_{j=1}^{c} (k_j/n)/V} = \dfrac{k_i}{k}$
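The $P_n(\omega_i \mid x) = k_i/k$ result is just the familiar k-nearest-neighbor vote. A minimal sketch, assuming Python/NumPy; the toy 1-D two-class data and the value of k are illustrative.

```python
import numpy as np

def knn_posterior(x, samples, labels, k, num_classes):
    """Estimate P(w_i | x) as k_i / k, where k_i counts class-i points among the k nearest."""
    nearest = np.argsort(np.abs(samples - x))[:k]            # indices of the k nearest samples
    counts = np.bincount(labels[nearest], minlength=num_classes)
    return counts / k                                        # [k_0/k, k_1/k, ...]

# Toy 1-D, two-class problem: class 0 centered at -1, class 1 centered at +1
rng = np.random.default_rng(3)
samples = np.concatenate([rng.normal(-1, 1, 200), rng.normal(1, 1, 200)])
labels = np.concatenate([np.zeros(200, int), np.ones(200, int)])
print(knn_posterior(0.8, samples, labels, k=15, num_classes=2))  # mostly class 1
```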


Comparison

Parametric

simple and analytical

may not fit real-world densities well

Non-parametric

flexible; can fit any density

need to remember all samples


One Final Note

Here we discussed the Parzen window and the $k_n$-nearest-neighbor rule as ways to estimate a single probability density

The same rules are equally useful for labeling a sample against multiple probable classes (densities)

More on that under linear discriminant functions


More Realistic Scenarios

Drake’s Equation

rate of star formation, fraction of stars having planets, average # of planets per star that can support life, fraction of such planets that actually develop life, fraction of those that actually develop civilizations, fraction of civilizations that develop communication, length of time such civilizations actually release signals


More Realistic Scenarios

Chance that a person develops cancer (ancestry, birth place, how raised, living habits, education history, work history, exercise habits, income, debt, food intake, etc.)

Chance that a person contributes to a political campaign (…)


Curse of Dimensionality

Not possible to estimate distributions in such a high-dimensional space

The number of samples needed is, in general, infinitely large


Practical Usage

X = rand(3,3)

Sampling based on a certain distribution (the default is uniform)

Need to evaluate certain expectations

technology advances from alien contact

life expectancy (for the cancer case)

amount of money for political campaigns


General Idea

Finite number of samples: use the sample mean/variance to estimate the population mean/variance

$z^{(l)},\; l = 1, \dots, L$

Samples may not be independent

Some distributions (e.g., uniform) are easier to sample from than others

f(z) may be small in regions where p(z) is large, and vice versa
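As a concrete illustration of estimating an expectation from samples, here is a minimal Python/NumPy sketch; the target function f and the sampling distribution p are made-up examples, not from the slides.

```python
import numpy as np

# Estimate E[f(z)] = integral f(z) p(z) dz by the sample mean (1/L) * sum_l f(z^(l)),
# with z^(l) drawn from p(z). Here p is standard normal and f(z) = z**2, so E[f] = 1.
rng = np.random.default_rng(4)
L = 100_000
z = rng.normal(size=L)                   # z^(l), l = 1..L, drawn from p(z)
f = z**2
print(f.mean(), f.std() / np.sqrt(L))    # estimate of E[f] and its standard error
```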


From One to Another


z: uniform

y: any known distribution

Sampling z uniformly and pushing it through the inverse CDF of y == sampling y based on p(y)
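A minimal sketch of this uniform-to-target transformation (inverse transform sampling), assuming Python/NumPy; the exponential target is chosen only for illustration.

```python
import numpy as np

# Target: exponential p(y) = exp(-y), y >= 0, with CDF F(y) = 1 - exp(-y).
# Draw z ~ Uniform(0,1) and set y = F^{-1}(z) = -log(1 - z); then y is distributed as p(y).
rng = np.random.default_rng(5)
z = rng.uniform(size=100_000)       # uniform samples
y = -np.log(1.0 - z)                # pushed through the inverse CDF
print(y.mean())                     # should be close to 1, the exponential mean
```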


Multi-Dimensional

Much more difficult

Do not know the form

Cannot get enough samples to populate the landscape

How to generate IID samples?


Rejection Sampling

A real distribution p(z)

A proposal distribution q(z), with k chosen so that $kq(z) \ge p(z)$ everywhere

Procedure

Generate $z_0$ from q(z)

Generate $u_0$ from $[0, kq(z_0)]$ uniformly

Reject the sample if $u_0 > p(z_0)$; otherwise, accept
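A minimal sketch of this procedure, assuming Python/NumPy; the target p, the uniform proposal q, and the constant k are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(6)

def p(z):                        # un-normalized target: a two-bump density, max value ~ 1
    return np.exp(-0.5 * (z - 2)**2) + np.exp(-0.5 * (z + 2)**2)

def sample_q():                  # proposal q(z): uniform on [-6, 6], so q(z) = 1/12
    return rng.uniform(-6, 6)

k = 24.0                         # chosen so that k * q(z) = 2 >= p(z) everywhere

accepted = []
while len(accepted) < 10_000:
    z0 = sample_q()                          # generate z0 from q(z)
    u0 = rng.uniform(0, k * (1 / 12))        # generate u0 uniformly from [0, k*q(z0)]
    if u0 <= p(z0):                          # reject if u0 > p(z0), otherwise accept
        accepted.append(z0)

print(np.mean(accepted))         # should be near 0 by symmetry of the two bumps
```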


Importance Sampling

A real distribution p(z)

A proposal distribution q(z)

Procedure

Generate $z^{(l)}$ from q(z); nothing is rejected

Weight each sample by $p(z^{(l)})/q(z^{(l)})$, the importance weight, to account for sampling from the "wrong" distribution
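A minimal sketch of an importance-weighted expectation estimate, assuming Python/NumPy; the target p, proposal q, and function f are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
L = 100_000

def p(z):                        # target density: standard normal
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def q(z):                        # proposal density: normal with mean 1, std 2
    return np.exp(-0.5 * ((z - 1) / 2)**2) / (2 * np.sqrt(2 * np.pi))

z = rng.normal(1, 2, size=L)     # draw z^(l) from q; nothing is rejected
w = p(z) / q(z)                  # importance weights p(z^(l)) / q(z^(l))
f = z**2                         # we want E_p[z^2], which equals 1 for the target
print(np.mean(w * f))            # importance-sampling estimate, should be near 1
```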


MCMC

Imagine

a very high-dimensional space

samples occupy a low-dimensional manifold in such a high-dimensional space

Choose a random start point

Wander about in the space, seeking out places with samples

With the right "seek" strategy, samples generated along the walk have the right population characteristics


MCMC

Successive sampling points are NOT independent, but form a Markov chain

A candidate z* is generated at each step and accepted if its acceptance probability exceeds a threshold (drawn uniformly at random)

It can be shown that the distribution of $z^{(t)}$ tends to p(z) as $t \to \infty$

So the distribution of the steps z after some initial (burn-in) steps can be used to approximate p(z)

For the Metropolis algorithm, q has to be symmetrical: q(a|b) = q(b|a)


Metropolis-Hastings

f(x): proportional to p(x), the target distribution

Given:

$x_0$: first sample

Q(x'|x): Markov process to generate the next sample (x') given the current sample (x); Q must be symmetrical (e.g., Gaussian)

Iteration:

pick x' from Q(x'|x)

compute r = f(x')/f(x); if r >= 1, accept; otherwise accept with probability r; if rejected, keep x' = x


Intuition


Gibbs Sampling

A special case of MCMC / Metropolis-Hastings

From $x^{(i)}$ to $x^{(i+1)}$ by component-wise sampling: the j-th variable of $x^{(i+1)}$ is drawn conditioned on

variables 1 to j-1 from the (i+1)-th iteration

variables j+1 to n from the i-th iteration
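A minimal sketch of component-wise Gibbs updates for a 2-D case, assuming Python/NumPy; the bivariate Gaussian target with correlation rho is an illustrative choice whose conditionals are known in closed form.

```python
import numpy as np

# Gibbs sampling from a bivariate Gaussian: zero mean, unit variances, correlation rho.
# Each conditional p(x1 | x2) and p(x2 | x1) is Gaussian with mean rho*other, variance 1 - rho^2.
rng = np.random.default_rng(9)
rho = 0.8
x1, x2 = 0.0, 0.0
samples = []
for _ in range(20_000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))   # update x1 given the current x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))   # update x2 given the freshly updated x1
    samples.append((x1, x2))

samples = np.array(samples[1_000:])          # drop burn-in steps
print(np.corrcoef(samples.T)[0, 1])          # should be close to rho = 0.8
```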


Slice Sampling

Random walk under the probability curve

Start from an $x_0$ with $f(x_0) > 0$

Randomly select a height y, $0 < y \le f(x_0)$

Randomly select an x' lying within the slice $\{x : f(x) \ge y\}$; repeat


[Figure: the slice at height y under the curve f, around the current point $x_0$ with value $f(x_0)$]
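A minimal sketch of this slice-sampling loop, assuming Python/NumPy; the target f and the stepping width w are illustrative, and the slice is located by a simple step-out/shrink search, which is one common way to realize "select x' within the slice".

```python
import numpy as np

rng = np.random.default_rng(10)

def f(x):                               # un-normalized target density (standard-normal shape)
    return np.exp(-0.5 * x**2)

def slice_step(x, w=2.0):
    """One slice-sampling update: pick a height under f(x), then a point inside that slice."""
    y = rng.uniform(0, f(x))            # randomly select a height y, 0 < y <= f(x)
    lo, hi = x - w, x + w               # initial bracket around x
    while f(lo) >= y:                   # step out to the left until outside the slice
        lo -= w
    while f(hi) >= y:                   # step out to the right until outside the slice
        hi += w
    while True:                         # shrink: propose uniformly, narrow bracket on rejection
        x_new = rng.uniform(lo, hi)
        if f(x_new) >= y:               # x_new lies within the slice: accept it
            return x_new
        if x_new < x:
            lo = x_new
        else:
            hi = x_new

x = 0.5                                 # start from an x0 with f(x0) > 0
chain = []
for _ in range(20_000):
    x = slice_step(x)
    chain.append(x)
print(np.std(chain))                    # near 1 for the standard-normal-shaped target
```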