Kernel adaptive filtering: lecture slides for EEL6502, Spring 2011, by Sohan Seth.

Transcript of the 16 slides.

Page 1:

Kernel adaptive filtering

Lecture slides for EEL6502

Spring 2011

Sohan Seth

Page 2:

The big picture

Adaptive filters are linear.

How do we learn (continuous) nonlinear structures?

Page 3:

A particular approach

Assume a parametric model, e.g. a neural network.

Universality: the parametric model should be able to approximate any continuous function. A neural network is a universal approximator for a sufficiently large number of hidden units.

The idea: nonlinearly map the signal to a higher-dimensional space and apply a linear filter there.
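A sketch of this structure, in assumed notation (input u_n, fixed nonlinear map \varphi, weight vector \Omega, desired response d_n):

    y_n = \Omega^\top \varphi(u_n), \qquad e_n = d_n - y_n

The output is nonlinear in u_n but linear in \Omega, so the familiar linear adaptation machinery still applies to \Omega.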

Page 4:

It's difficult: a parametric nonlinearity (such as a neural network) gives a nonlinear performance surface with local minima. Can we learn nonlinear structure using our knowledge of linear adaptive filtering?

A different approach: fix the nonlinear mapping and use linear filtering on the mapped signal. The filter order is then the dimension of the mapped space.

How do we choose the mapping? We need to guarantee universal approximation!

Page 5:

A ‘trick’y solution

The optimal filter exists in the span of the mapped input data (the representer theorem), so the output is a projection onto that span. Only the inner product between mapped samples matters, not the mapping itself, even when the mapping is infinite dimensional.

This is a top-down design: start from the inner product and never evaluate the map explicitly.
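A minimal sketch of why only inner products are needed (the expansion coefficients a_i are assumed notation): if the optimal filter lies in the span of the mapped inputs,

    \Omega = \sum_{i=1}^{N} a_i\, \varphi(u_i)
    \quad\Rightarrow\quad
    y = \Omega^\top \varphi(u)
      = \sum_{i=1}^{N} a_i\, \langle \varphi(u_i), \varphi(u) \rangle
      = \sum_{i=1}^{N} a_i\, \kappa(u_i, u),

so the mapping \varphi never needs to be evaluated, only the kernel \kappa.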

Page 6:

Inner product and pd kernel are equivalent

An inner product satisfies:

1. Symmetry
2. Linearity
3. Positive definiteness

Inner product space: a linear space equipped with an inner product.

A positive definite (pd) kernel is an inner product in some space. Use a pd kernel to implicitly construct the nonlinear mapping; the pd condition and two standard examples are sketched below.
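As a sketch, the pd condition and two common kernel examples (the Gaussian width \sigma and polynomial degree p are assumed symbols):

    \sum_{i=1}^{N}\sum_{j=1}^{N} c_i\, c_j\, \kappa(x_i, x_j) \ge 0
    \quad \text{for all } c_i \text{ and } x_i,

    \kappa(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)
    \quad \text{or} \quad
    \kappa(x, y) = (x^\top y + 1)^p.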

Page 7:

How do things work?

Mercer decomposition: a generalization of the eigenvalue decomposition to functional spaces. Take a positive definite kernel; it then decomposes into eigenfunctions and nonnegative eigenvalues, which define the implicit mapping (sketched below). The number of eigenfunctions, and hence the number of parameters to learn, can be infinite.

This is a bottom-up design: the nonlinearity is implicit in the choice of kernel.
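A sketch of the decomposition (eigenvalues \lambda_i and eigenfunctions \psi_i are assumed notation):

    \kappa(x, y) = \sum_{i=1}^{\infty} \lambda_i\, \psi_i(x)\, \psi_i(y),
    \qquad
    \varphi(x) = \big(\sqrt{\lambda_1}\,\psi_1(x),\ \sqrt{\lambda_2}\,\psi_2(x),\ \ldots\big),

so that \kappa(x, y) = \langle \varphi(x), \varphi(y) \rangle, and the mapping can indeed be infinite dimensional.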

Page 8:

Functional view

We never explicitly evaluate the mapping into the feature space; it is applied implicitly through the kernel function. The price is that we need to remember all the input data and the coefficients.

Universality is guaranteed through the kernel.
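A minimal sketch of the functional view in Python; the Gaussian kernel and the names centers and coeffs are assumptions, not from the slides:

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
        return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                      / (2.0 * sigma ** 2))

    def filter_output(u, centers, coeffs, sigma=1.0):
        """Evaluate f(u) = sum_i a_i k(u_i, u). The stored inputs (centers)
        and coefficients stand in for the never-evaluated feature-space
        weight vector."""
        return sum(a * gaussian_kernel(c, u, sigma)
                   for a, c in zip(coeffs, centers))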

Page 9:

Ridge regression

How do we find the weights? The regularized least-squares solution requires inverting the autocorrelation matrix in the feature space. Problem: how do we invert an infinite-dimensional matrix? Regularization, together with the Gram-matrix form of the solution, resolves this (identities sketched below).
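A hedged reconstruction of the standard kernel ridge regression identities (with \Phi = [\varphi(u_1), \ldots, \varphi(u_N)], Gram matrix K = \Phi^\top \Phi, desired vector d, and regularization parameter \lambda):

    \min_{\Omega}\ \sum_{i=1}^{N} \big(d_i - \Omega^\top \varphi(u_i)\big)^2
        + \lambda\, \|\Omega\|^2
    \quad\Rightarrow\quad
    \Omega = \Phi\, (K + \lambda I)^{-1} d,

    f(u) = \Omega^\top \varphi(u) = \sum_{i=1}^{N} a_i\, \kappa(u_i, u),
    \qquad a = (K + \lambda I)^{-1} d.

Only the N x N Gram matrix is ever inverted, never the infinite-dimensional feature-space matrix.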

Page 10:

Online learning

LMS update rule:

    w_n = w_{n-1} + \eta\, e_n\, u_n

LMS update rule in feature space:

    \Omega_n = \Omega_{n-1} + \eta\, e_n\, \varphi(u_n)

How do we compute these, when \varphi(u_n) cannot be evaluated? Set the initial weight \Omega_0 to 0; the weights then stay a finite sum of mapped inputs (see the sketch below).
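Unrolling the feature-space update from \Omega_0 = 0 gives an output computable purely through kernel evaluations (a standard derivation, sketched in the assumed notation):

    \Omega_n = \eta \sum_{i=1}^{n} e_i\, \varphi(u_i)
    \quad\Rightarrow\quad
    y_n = \Omega_{n-1}^\top \varphi(u_n)
        = \eta \sum_{i=1}^{n-1} e_i\, \kappa(u_i, u_n).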

Page 11:

Kernel-LMS

Initialize:

    f_1 = \eta\, d_1\, \kappa(u_1, \cdot)

Iterate for n > 1:

    y_n = f_{n-1}(u_n) = \eta \sum_{i=1}^{n-1} e_i\, \kappa(u_i, u_n)
    e_n = d_n - y_n
    f_n = f_{n-1} + \eta\, e_n\, \kappa(u_n, \cdot)

The usable step-size range is set by the largest eigenvalue of the feature-space autocorrelation matrix, which is unknown in practice.

Practical issues (a runnable sketch follows the list):

1. Need to choose a kernel.
2. Need to select a step size.
3. Need to store all past inputs and errors.
4. No explicit regularization (see the self-regularization slides).
5. O(n) time complexity for each iteration.
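A runnable sketch of KLMS along these lines; the Gaussian kernel, the step size eta, and the width sigma are assumed choices, not values from the slides:

    import numpy as np

    def klms(inputs, desired, eta=0.2, sigma=1.0):
        """Kernel LMS: store each input as a center whose coefficient is
        eta times the prediction error at that step."""
        centers, coeffs, preds = [], [], []
        for u, d in zip(inputs, desired):
            # Current output y = sum_i a_i k(u_i, u); zero before any center.
            y = sum(a * np.exp(-np.sum((c - u) ** 2) / (2.0 * sigma ** 2))
                    for a, c in zip(coeffs, centers))
            e = d - y                  # prediction error
            centers.append(u)          # remember the input ...
            coeffs.append(eta * e)     # ... with coefficient eta * e
            preds.append(y)
        return centers, coeffs, preds

    # Toy usage (assumed example): one-step prediction of a nonlinear series.
    rng = np.random.default_rng(0)
    x = np.sin(np.linspace(0, 20, 200)) ** 3 + 0.05 * rng.standard_normal(200)
    U = np.stack([x[i:i + 5] for i in range(len(x) - 5)])  # 5-tap inputs
    d = x[5:]                                              # desired: next sample
    _, _, preds = klms(U, d)
    print("MSE over the last 100 samples:",
          np.mean((np.array(preds[-100:]) - d[-100:]) ** 2))

Note how the inner sum grows with every sample: that growing sum is exactly issue 5 above, the O(n) cost per iteration.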

Page 12:

Functional approximation

The kernel should be universal, i.e. able to approximate any continuous function; the Gaussian kernel is the usual example. How do we choose the kernel parameter, e.g. the Gaussian width?

Page 13:

Implementation details

Choosing the best value of the kernel width (a large width over-smooths; a small width over-fits):

1. Cross validation: accurate but time consuming.
2. Thumb rules: fast but not accurate.

Limiting the network size:

1. Importance estimation.
2. Close centers are redundant and can be merged (a sketch follows the list).
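A minimal sketch of the "close centers are redundant" idea, in the style of a novelty criterion; the distance threshold delta and the merge rule are assumptions:

    import numpy as np

    def update_dictionary(centers, coeffs, u, alpha, delta=0.1):
        """Add (u, alpha) as a new center only if u is farther than delta
        from every stored center; otherwise fold alpha into the nearest one."""
        if centers:
            dists = [np.linalg.norm(c - u) for c in centers]
            nearest = int(np.argmin(dists))
            if dists[nearest] < delta:    # a close center exists: redundant
                coeffs[nearest] += alpha  # merge the coefficient instead
                return centers, coeffs
        centers.append(u)                 # novel input: grow the network
        coeffs.append(alpha)
        return centers, coeffs

This bounds the network size on stationary inputs at the cost of a small approximation error controlled by delta.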

Page 14:

Self-regularization: over-fitting

With as many parameters as samples (N parameters to fit N samples), the solution can over-fit. How do we remove the over-fitting? And how does KLMS do it without an explicit regularizer?

Page 15:

Ill-posed-ness

Ill-posed-ness appears because small singular values of the autocorrelation matrix blow up when the inverse is taken. How do we remove it?

Tikhonov regularization: solve a regularized problem, which weights the inverse of the small singular values downward (sketched below).
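A sketch of how Tikhonov regularization reweights the singular values (data matrix X with singular values s_i, left singular vectors q_i, right singular vectors v_i; \lambda is the regularization parameter; notation assumed):

    \min_{w}\ \|d - X^\top w\|^2 + \lambda\, \|w\|^2
    \quad\Rightarrow\quad
    w = \sum_i \frac{s_i}{s_i^2 + \lambda}\, \langle d, v_i \rangle\, q_i.

The least-squares factor 1/s_i is replaced by s_i/(s_i^2 + \lambda): large singular values are nearly untouched, while the contribution of small ones is driven toward zero instead of blowing up.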

Page 16:

Self-regularization: well-posed-ness

How does KLMS do it? The step size acts as a regularizer: keeping it small constrains the norm of the expected solution, much like an explicit regularizer on that solution. However, large singular values might be suppressed as well.

More information on the course website!

Username:
Password: