Nonparametric Techniques - UCSByfwang/courses/cs290i_prann/pdf/nonparametric.pdfThe window becomes...

Nonparametric Techniques

PR, ANN, & ML

Nonparametric Techniques

Without assuming any particular distribution

the underlying function may not be known (e.g., multi-modal densities)

too many parameters

Estimating the density distribution directly

Transform into a lower-dimensional space where parametric techniques may apply (more on this later, under dimension reduction)

Example

Estimate the population growth, annual rainfall, etc. in the US

p(x,y) dx dy is the probability of rainfall in the region [x, x+dx] × [y, y+dy]

Example (cont.)

A simple parametric model for p(x,y) probably does not exist

Instead

partition the area into a lattice

At each (x,y), count the amount of rain r(x,y)

Do that for a whole year

Normalize so that $\sum_{x,y} r(x,y) = 1$
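The lattice procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings are synthetic, and the function name `lattice_density` and the edge lists are hypothetical, not part of the slides.

```python
import random

def lattice_density(samples, x_edges, y_edges):
    """Accumulate weighted samples on a lattice and normalize so the cells sum to 1."""
    nx, ny = len(x_edges) - 1, len(y_edges) - 1
    grid = [[0.0] * ny for _ in range(nx)]
    for x, y, amount in samples:
        # Skip readings that fall outside the partitioned area.
        if not (x_edges[0] <= x < x_edges[-1] and y_edges[0] <= y < y_edges[-1]):
            continue
        # Index of the lattice cell that contains (x, y).
        i = max(k for k in range(nx) if x_edges[k] <= x)
        j = max(k for k in range(ny) if y_edges[k] <= y)
        grid[i][j] += amount
    total = sum(sum(row) for row in grid)
    if total == 0:
        raise ValueError("no rainfall recorded inside the lattice")
    return [[cell / total for cell in row] for row in grid]

# Hypothetical data: (x, y, rainfall amount) readings collected over a year.
rng = random.Random(0)
readings = [(rng.uniform(0, 4), rng.uniform(0, 3), rng.uniform(0, 10)) for _ in range(1000)]
r = lattice_density(readings, x_edges=[0, 1, 2, 3, 4], y_edges=[0, 1, 2, 3])
```

Each entry of `r` is the fraction of the year's rain that fell in one cell, so the entries sum to 1.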

Density estimation

[Figure: the density p(x) plotted as probability against value (x), with the interval from x_i to x_j marked]

$P = \int_{x_i}^{x_j} p(x)\,dx$

Density estimation

From equation

$P = \int_{x_i}^{x_j} p(x)\,dx \approx p(x)\,(x_j - x_i)$

From observation (k of the n samples fall in the interval)

$P \approx k/n$

Hence

$p(x) \approx \frac{k/n}{x_j - x_i} = \frac{k/n}{V}$
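As a small numerical illustration of p(x) ≈ (k/n)/V, the sketch below counts how many samples fall in an interval and divides by its width. The data are synthetic draws from a standard normal (an assumption made only so the answer can be compared with a known density), and `interval_density` is a hypothetical helper name.

```python
import random

def interval_density(samples, x_i, x_j):
    """Estimate p(x) on [x_i, x_j) as (k/n)/V, with V = x_j - x_i."""
    n = len(samples)
    k = sum(1 for s in samples if x_i <= s < x_j)  # samples falling in the interval
    V = x_j - x_i
    return (k / n) / V

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # hypothetical samples from N(0, 1)
estimate = interval_density(data, -0.1, 0.1)
# For comparison, the true standard normal density at 0 is 1/sqrt(2*pi), about 0.399.
```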

Comparison

In reality:

The number of training samples is limited

if V is too small, k becomes erratic (what does k = 0 mean?)

if V is too large, p(x) is not representative

In theory:

If n becomes infinitely large, k/n approaches the probability P; p(x) = (k/n)/V is then only a space average

Hence, V must be allowed to go to zero as n goes to infinity

$p(x) \approx \frac{k/n}{x_j - x_i} = \frac{k/n}{V}$

In Theory

Theoretically, we can use a sequence of samples with increasing size for estimation

Then $p_n(x) \to p(x)$ if

(1) $\lim_{n \to \infty} V_n = 0$

(2) $\lim_{n \to \infty} k_n = \infty$

(3) $\lim_{n \to \infty} k_n / n = 0$
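As a worked check of the three conditions, consider one particular shrinking schedule. The choice $V_n = V_1/\sqrt{n}$ is an assumption made here for illustration, and $k_n \approx n V_n p(x)$ is the expected count in a small region around a point where $p(x) > 0$.

```latex
V_n = \frac{V_1}{\sqrt{n}} \to 0, \qquad
k_n \approx n\,V_n\,p(x) = \sqrt{n}\,V_1\,p(x) \to \infty, \qquad
\frac{k_n}{n} \approx \frac{V_1\,p(x)}{\sqrt{n}} \to 0 .
```

So the region shrinks, the count inside it still grows, and the fraction of samples inside it vanishes, which is what conditions (1) to (3) ask for.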

Two different approaches

Constrain the region size

Shrink the region to maintain good locality (Parzen windows)

Constrain the sample size

Enlarge the number of samples to maintain good resolution (k_n-nearest-neighbors)

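Parzen Windows

Use a windowing function, e.g.

$\varphi(x) = 1$ if $|x| \le 1/2$, $0$ otherwise; or $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$

A sequence of n regions can be defined

$\varphi_n(x) = \varphi(x / h_n), \quad h_n = h_1 / \sqrt{n}$

By definition

$k_n = \sum_{i=1}^{n} \varphi_n(x - x_i) = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$

$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(x - x_i)$, where $\delta_n(x) = \frac{1}{V_n}\,\varphi(x / h_n)$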
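The Parzen estimate above can be sketched in one dimension as follows. This is a minimal illustration, not the course's implementation: it assumes 1-D data so that V_n = h_n, uses synthetic standard-normal samples, and the function names and the value of `h1` are hypothetical.

```python
import math
import random

def box_window(u):
    """Unit window in 1-D: 1 if |u| <= 1/2, else 0."""
    return 1.0 if abs(u) <= 0.5 else 0.0

def gaussian_window(u):
    """Standard normal window."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_estimate(x, samples, h1=1.0, window=gaussian_window):
    """1-D Parzen estimate p_n(x) = (1/n) * sum_i (1/V_n) * window((x - x_i)/h_n)."""
    n = len(samples)
    h_n = h1 / math.sqrt(n)  # the window narrows as n grows
    V_n = h_n                # in one dimension the volume is the width
    return sum(window((x - x_i) / h_n) for x_i in samples) / (n * V_n)

rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical N(0, 1) samples
p_gauss = parzen_estimate(0.0, data, h1=10.0)
p_box = parzen_estimate(0.0, data, h1=10.0, window=box_window)
# Both values should land near the true density at 0, which is about 0.399.
```

The Gaussian window gives a smoother estimate than the box window for the same width, because every sample contributes a little rather than all or nothing.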

Parzen Window (cont.)

As n increases

The window becomes narrower (by h_n)

The window becomes taller (by 1/V_n)

Sampling with smaller aperture but higher focus

The same 100 dollars collected from 100 people and from 1 person is different (per person)

The area under each window stays the same:

$\int \delta_n(x - x_i)\,dx = \int \frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right) dx = \int \varphi(u)\,du = 1$

[Figure: p_n(x) plotted against x]

Small n: large aperture, smoothed, fuzzy estimate

Large n: small aperture, sharp, erratic estimate

[Figure: Parzen-window estimates for n = 1, n = 16, n = 256, and n -> infinity]

2D Sampling

Five samples

Windowing func:

[Figure: estimates arranged by # of samples and window size]

[Figure: estimates for n = 1, n = 16, n = 256, and n -> infinity]

Does it work?

“Work” in the sense that if you are able to shrink the window size as much as you want (while simultaneously increasing the number of samples available), then the limit of the profile should be the correct probability density

This implies (treating p_n(x) as a random variable), as n -> infinity

E(p_n(x)) -> p(x)

Var(p_n(x)) -> 0

Convergence of Mean

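Will p_n(x) go to p(x)?

$\bar{p}_n(x) = E[p_n(x)] = \frac{1}{n} \sum_{i=1}^{n} E\!\left[\frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right)\right]$

$= \int \frac{1}{V_n}\,\varphi\!\left(\frac{x - v}{h_n}\right) p(v)\,dv$

$= \int \delta_n(x - v)\,p(v)\,dv \to p(x)$

If n goes to infinity

x_i will cover all possible x (summation to integration) with p(x) distribution (weighted by p(x))

Sample v appears with probability p(v)

As V_n -> 0 the window $\delta_n$ approaches a delta function centered at x, so the integral approaches p(x)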

Convergence of Variance

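Will p_n(x) always end up at p(x) for certain?

$\sigma_n^2(x) = \sum_{i=1}^{n} E\!\left[\left(\frac{1}{n V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right) - \frac{1}{n}\,\bar{p}_n(x)\right)^{2}\right]$

$= n\,E\!\left[\frac{1}{n^2 V_n^2}\,\varphi^2\!\left(\frac{x - x_i}{h_n}\right)\right] - \frac{1}{n}\,\bar{p}_n^2(x)$

$= \frac{1}{n V_n} \int \frac{1}{V_n}\,\varphi^2\!\left(\frac{x - v}{h_n}\right) p(v)\,dv - \frac{1}{n}\,\bar{p}_n^2(x)$

$\le \frac{1}{n V_n}\,\sup(\varphi) \int \frac{1}{V_n}\,\varphi\!\left(\frac{x - v}{h_n}\right) p(v)\,dv = \frac{\sup(\varphi)\,\bar{p}_n(x)}{n V_n}$

-> 0 as n -> infinity

nV_n must approach infinity, even when V_n goes to zero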

kn-nearest-neighbor

Parzen window size is hard to estimate

Constrain the number of data items instead of the size of the window

enlarge the window around x to enclose that many samples (e.g., $k_n = \sqrt{n}$), then

$p_n(x) = \frac{k_n / n}{V_n}$
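A minimal 1-D sketch of the k_n-nearest-neighbor estimate follows. It assumes the window is the symmetric interval around x that just reaches the k-th nearest sample, and it defaults to k_n = sqrt(n) rounded; the function name and the synthetic uniform data are hypothetical.

```python
import math
import random

def knn_density(x, samples, k=None):
    """1-D k_n-nearest-neighbor estimate p_n(x) = (k_n/n)/V_n."""
    n = len(samples)
    if k is None:
        k = max(1, round(math.sqrt(n)))  # k_n = sqrt(n), rounded
    k = min(k, n)
    distances = sorted(abs(x - x_i) for x_i in samples)
    radius = distances[k - 1]            # distance to the k-th nearest sample
    V_n = 2.0 * radius                   # interval [x - radius, x + radius]
    if V_n == 0.0:
        return math.inf                  # x coincides with at least k samples
    return (k / n) / V_n

rng = random.Random(3)
data = [rng.random() for _ in range(4000)]  # hypothetical samples, uniform on [0, 1)
p_mid = knn_density(0.5, data, k=400)       # the true density at 0.5 is 1
```

The window size is set by the data: where samples are dense the interval is short, and where they are sparse it grows until it holds k of them.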

kn-nearest-neighbor

Intuitively, as n increases

k_n should increase (for good representation)

V_n should decrease (for good localization)

The following conditions guarantee convergence

$\lim_{n \to \infty} k_n = \infty$

$\lim_{n \to \infty} k_n / n = 0$

Sharp spikes around data points:

with k_n = 1, the probability estimate is infinite at each data point (the region size needed to capture 1 sample is zero)

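[Figure: k_n-nearest-neighbor estimates for (n = 1, k_n = 1), (n = 16, k_n = 4), (n = 256, k_n = 16), and (n -> infinity, k_n -> infinity)]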

An Example

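Estimating $P(\omega_i \mid x)$

n tagged samples

a volume V around x captures k samples, $k_i$ of them are $\omega_i$

$p_n(x, \omega_i) = \frac{k_i / n}{V}$

$P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{(k_i / n)/V}{\sum_{j=1}^{c} (k_j / n)/V} = \frac{k_i}{k}$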
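The result k_i / k can be sketched directly: take the k samples nearest to x and report the fraction carrying each tag. The 1-D distance, the function name `knn_posterior`, and the tagged data below are assumptions for illustration only.

```python
from collections import Counter

def knn_posterior(x, tagged_samples, k):
    """Estimate P(class | x) as k_i / k from the k samples nearest to x (1-D)."""
    nearest = sorted(tagged_samples, key=lambda s: abs(s[0] - x))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}

# Hypothetical tagged samples: (value, class label).
tagged = [(0.1, "a"), (0.3, "a"), (0.4, "b"), (0.9, "b"), (1.2, "b"), (1.5, "a")]
posterior = knn_posterior(0.5, tagged, k=5)
# The five nearest samples carry two "a" tags and three "b" tags.
```

Labeling x with the class that has the largest k_i is then the k-nearest-neighbor classification rule.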

Comparison

Parametric

simple and analytical

may not fit real-world densities well

Non-parametric

flexible and can fit all densities

need to remember all samples

One Final Note

Here we talk about the Parzen window and the k_n-nearest-neighbor rule as ways to estimate a single probability density

These rules are equally useful for labeling a sample against multiple probable classes (densities)

More on that under linear discriminant functions

More Realistic Scenarios

Drake’s Equation

Rate of star formation, fraction of stars having planets, average # of planets per star that support life, fraction of such planets that actually develop life, fraction of those that actually develop civilization, fraction of such civilizations that have communication, length of time such civilizations actually release signals

More Realistic Scenarios

Chance that a person develops cancer (ancestry, birth place, how raised, living habits, education history, work history, exercise habits, income, debt, food intake, etc.)

Chance that a person contributes to a political campaign (…)

Curse of Dimensionality

Not possible to estimate distributions in such a high-dimensional space

# of samples needed is generally prohibitively large (it grows exponentially with the number of dimensions)
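A back-of-the-envelope sketch makes the growth concrete. It assumes a lattice-style estimate with a fixed number of bins per axis and a hypothetical target of about 10 samples per cell; both function names are made up for this illustration.

```python
def cells_needed(bins_per_axis, dims):
    """Number of lattice cells when each of `dims` axes is split into `bins_per_axis` bins."""
    return bins_per_axis ** dims

def samples_needed(bins_per_axis, dims, per_cell=10):
    """Rough sample budget if every cell should receive about `per_cell` samples."""
    return per_cell * cells_needed(bins_per_axis, dims)

# With 10 bins per axis, going from 3 to 10 dimensions multiplies the budget by 10**7.
budget_3d = samples_needed(10, 3)
budget_10d = samples_needed(10, 10)
```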

Practical Usage

X = rand(3,3)

Sampling based on a certain distribution (default is uniform)

Need to evaluate a certain expectation

Technology advances by alien contact

Life expectancy (for the cancer case)

Amount of money for political campaigns

General Idea

Finite number of samples: use the sample mean/variance to estimate the population mean/variance

z(l), l = 1, …, L

Samples may not be independent

Some distributions (e.g., uniform) are easier to sample than others

f(z) may be small in regions where p(z) is large and vice versa, so many samples may be needed
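The idea of replacing a population mean by a sample mean over z(l), l = 1, …, L can be sketched as below. The integrand z^2 and the uniform sampler are assumptions chosen because the exact answer (1/3) is known; `monte_carlo_mean_var` is a hypothetical helper.

```python
import random

def monte_carlo_mean_var(f, sampler, L):
    """Sample mean and unbiased sample variance of f(z) over L draws z = sampler()."""
    values = [f(sampler()) for _ in range(L)]
    mean = sum(values) / L
    var = sum((v - mean) ** 2 for v in values) / (L - 1)
    return mean, var

rng = random.Random(4)
# Hypothetical example: for z uniform on [0, 1), E[z^2] = 1/3 and Var[z^2] = 4/45.
mean, var = monte_carlo_mean_var(lambda z: z * z, rng.random, L=50000)
```

The accuracy of the sample mean depends on the variance of f(z) and on L, not on the dimension of z, which is why sampling remains usable where lattice methods are not.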

From One to Another

z: uniform

y: any known distribution

Sample z uniformly == Sample y based on p(y)
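One standard way to turn uniform samples into samples from a known distribution is to push z through the inverse of the target's cumulative distribution function. The extracted slide text does not show the formula, so treating this as the intended mapping is an assumption; the exponential target and the function name are chosen only for illustration.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw y ~ Exp(rate) by pushing a uniform z through the inverse CDF."""
    z = rng.random()                  # z uniform on [0, 1)
    return -math.log(1.0 - z) / rate  # inverse of F(y) = 1 - exp(-rate * y)

rng = random.Random(5)
draws = [sample_exponential(2.0, rng) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)  # the true mean is 1/rate = 0.5
```

This works whenever the inverse CDF can be written down, which is rarely the case in many dimensions.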

Multi-Dimensional

Much more difficult

Do not know the form

Cannot get enough samples to populate the landscape

How to generate IID samples?

Rejection Sampling

A real distribution p(z)

A proposal distribution q(z)

Procedure

Generate z0 from q(z)

Generate u0 from [0, k q(z0)] uniformly (k chosen so that k q(z) >= p(z) for all z)

Reject the sample if u0 > p(z0)

Otherwise, accept
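The procedure above can be sketched as follows. The target (a Beta(2, 2) density), the uniform proposal, and the constant k = 1.5 are assumptions picked so that k q(z) >= p(z) holds everywhere; the function names are hypothetical.

```python
import random

def rejection_sample(p, q_sample, q_pdf, k, rng):
    """Draw one sample from p using proposal q, assuming k*q(z) >= p(z) everywhere."""
    while True:
        z0 = q_sample(rng)                    # candidate from the proposal
        u0 = rng.uniform(0.0, k * q_pdf(z0))  # height under the envelope k*q
        if u0 <= p(z0):                       # keep points that fall under the target curve
            return z0

def beta22(z):
    """Beta(2, 2) density on [0, 1]; its maximum is 1.5 at z = 0.5."""
    return 6.0 * z * (1.0 - z) if 0.0 <= z <= 1.0 else 0.0

rng = random.Random(6)
draws = [
    rejection_sample(beta22, lambda r: r.random(), lambda z: 1.0, 1.5, rng)
    for _ in range(20000)
]
sample_mean = sum(draws) / len(draws)  # Beta(2, 2) has mean 0.5
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)  # true value 0.05
```

On average a fraction 1/k of the candidates is accepted, so a loose envelope (large k) wastes most of the work.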

Importance Sampling

A real distribution p(z)

A proposal distribution q(z)

Procedure

Generate samples z(l) from q(z), nothing rejected

p(z(l))/q(z(l)): importance weight to account for sampling from the wrong distribution
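A minimal sketch of the weighted estimate follows, assuming p is normalized so the weighted average of f can be used directly. The Beta(2, 2) target, the uniform proposal, and f(z) = z^2 are illustrative assumptions with a known answer (0.3); the function name is hypothetical.

```python
import random

def importance_estimate(f, p_pdf, q_sample, q_pdf, L, rng):
    """Estimate E_p[f(z)] from L draws of the proposal q, weighted by p/q."""
    total = 0.0
    for _ in range(L):
        z = q_sample(rng)             # every draw is kept
        weight = p_pdf(z) / q_pdf(z)  # importance weight
        total += weight * f(z)
    return total / L

def beta22(z):
    """Beta(2, 2) density on [0, 1]."""
    return 6.0 * z * (1.0 - z) if 0.0 <= z <= 1.0 else 0.0

rng = random.Random(7)
estimate = importance_estimate(
    lambda z: z * z, beta22, lambda r: r.random(), lambda z: 1.0, 50000, rng
)
# Under Beta(2, 2), E[z^2] = variance + mean^2 = 0.05 + 0.25 = 0.3.
```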

MCMC

Imagine

A very high-dimensional space

Samples occupy a low-dimensional manifold in such a high-dimensional space

Choose a random start point

Wander about in the space, seeking out places with samples

With the right “seek” strategy, samples generated along the walk have the right population characteristics

MCMC

Successive sampling points are NOT independent, but form a Markov chain

z* is generated at each step and accepted if its acceptance probability exceeds a threshold (in the Metropolis algorithm the threshold is a fresh uniform random number in (0, 1))

It can be shown that the distribution of z(t) tends to p(z) as t -> infinity

So the distribution of the z's after some initial steps can be used to approximate p(z)

For the Metropolis algorithm, q has to be symmetrical: q(a|b) = q(b|a)

Metropolis-Hastings

f(x): proportional to p(x), the target distribution

Given:

x0: first sample

Q(x'|x): Markov process to generate the next sample (x') given the current sample (x); here Q must be symmetrical (e.g., Gaussian)

Iteration:

Pick x' from Q(x'|x)

Compute r = f(x')/f(x); if r >= 1 accept x', otherwise accept it with probability r. If rejected, x' = x
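The iteration above, with a symmetric Gaussian proposal, can be sketched as a random-walk sampler. The target (proportional to a normal density centered at 3), the step size, and the number of discarded initial steps are assumptions for illustration; the function names are hypothetical.

```python
import math
import random

def metropolis(f, x0, n_steps, step, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal Q(x'|x)."""
    x = x0
    chain = []
    for _ in range(n_steps):
        x_new = rng.gauss(x, step)  # propose x' from Q(x'|x)
        r = f(x_new) / f(x)         # f only needs to be proportional to p
        if r >= 1.0 or rng.random() < r:
            x = x_new               # accept the proposal
        chain.append(x)             # on rejection the current sample repeats
    return chain

def unnormalized_normal(x):
    """Proportional to the N(3, 1) density; the normalizing constant is left out on purpose."""
    return math.exp(-0.5 * (x - 3.0) ** 2)

rng = random.Random(8)
chain = metropolis(unnormalized_normal, x0=0.0, n_steps=60000, step=1.0, rng=rng)
kept = chain[5000:]                 # discard some initial steps
chain_mean = sum(kept) / len(kept)  # should be close to 3
chain_var = sum((x - chain_mean) ** 2 for x in kept) / len(kept)  # should be close to 1
```

Only the ratio f(x')/f(x) is ever used, so the normalizing constant of p never has to be computed.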

Intuition

Gibbs Sampling

Special case of MCMC Metropolis-Hastings

From x(i) to x(i+1) by component-wise sampling; the j-th variable in x(i+1) depends on

variables 1 to j-1 from the (i+1)-th iteration

variables j+1 to n from the (i)-th iteration
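A two-variable sketch shows the component-wise update. The target, a zero-mean, unit-variance bivariate normal with correlation rho, is an assumption chosen because each conditional is itself a normal, N(rho * other, 1 - rho^2); the function name is hypothetical.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, rng, start=(0.0, 0.0)):
    """Gibbs sampler for a zero-mean, unit-variance bivariate normal with correlation rho."""
    x1, x2 = start
    sd = math.sqrt(1.0 - rho * rho)   # standard deviation of each conditional
    chain = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)  # component 1 uses component 2 from the previous iteration
        x2 = rng.gauss(rho * x1, sd)  # component 2 uses the freshly updated component 1
        chain.append((x1, x2))
    return chain

rng = random.Random(9)
chain = gibbs_bivariate_normal(0.6, 40000, rng)
corr_estimate = sum(a * b for a, b in chain) / len(chain)  # E[x1 * x2] equals rho here
```

Every draw is accepted, because each component is sampled exactly from its conditional distribution given the others.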

Slice Sampling

Random walk under the probability curve

Start from an x0 with f(x0) > 0

Randomly select a height y, 0 < y <= f(x0)

Randomly select x' lying within the slice at height y, repeat

[Figure: curve f(x) with the point x0, its height f(x0), the sampled height y, and the slice marked]
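The steps above leave open how to pick a point inside the slice. The sketch below uses one common answer, a shrinking bracket: draw uniformly from a bounded interval [lo, hi] and, on a miss, shrink the interval toward the current point. The bracket, the unnormalized normal target, and the function name are assumptions for illustration, not taken from the slides.

```python
import math
import random

def slice_sample(f, x0, n_steps, lo, hi, rng):
    """1-D slice sampler on a bounded bracket [lo, hi], using shrinkage to land in the slice."""
    x = x0
    chain = []
    for _ in range(n_steps):
        y = rng.uniform(0.0, f(x))  # height under the curve at the current x
        left, right = lo, hi
        while True:
            x_new = rng.uniform(left, right)
            if f(x_new) >= y:       # x_new lies within the slice {x : f(x) >= y}
                x = x_new
                break
            # Missed the slice: shrink the bracket toward the current point and retry.
            if x_new < x:
                left = x_new
            else:
                right = x_new
        chain.append(x)
    return chain

rng = random.Random(10)
chain = slice_sample(lambda x: math.exp(-0.5 * x * x), 0.0, 30000, -10.0, 10.0, rng)
chain_mean = sum(chain) / len(chain)  # the target is proportional to N(0, 1)
chain_var = sum((x - chain_mean) ** 2 for x in chain) / len(chain)
```

The current point always lies in the slice, so the shrinking loop is guaranteed to terminate.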