Kernel adaptive filtering: lecture slides for EEL6502, Spring 2011, by Sohan Seth.

Transcript of the 16 slides.

Page 1:

Kernel adaptive filtering

Lecture slides for EEL6502

Spring 2011

Sohan Seth

Page 2:

The big picture

Adaptive filters are linear.

How do we learn (continuous) nonlinear structures?

Page 3:

A particular approach

Assume a parametric model, e.g. a neural network.

Universality: the parametric model should be able to approximate any continuous function. A neural network is a universal approximator for a sufficiently large number of hidden units.

The idea: nonlinearly map the signal to a higher-dimensional space and apply a linear filter there.
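A sketch of this structure, in assumed notation (input u_n, fixed nonlinear map \varphi, weight vector \Omega, desired response d_n):

    y_n = \Omega^\top \varphi(u_n), \qquad e_n = d_n - y_n

The output is nonlinear in u_n but linear in \Omega, so the familiar linear adaptation machinery still applies to \Omega.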

Page 4:

It's difficult: a parametric nonlinearity (such as a neural network) gives a nonlinear performance surface with local minima. Can we learn nonlinear structure using our knowledge of linear adaptive filtering?

A different approach: fix the nonlinear mapping and use linear filtering on the mapped signal. The filter order is then the dimension of the mapped space.

How do we choose the mapping? We need to guarantee universal approximation!

Page 5:

A ‘trick’y solution

The optimal filter exists in the span of the mapped input data (the representer theorem), so the output is a projection onto that span. Only the inner product between mapped samples matters, not the mapping itself, even when the mapping is infinite dimensional.

This is a top-down design: start from the inner product and never evaluate the map explicitly.
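A minimal sketch of why only inner products are needed (the expansion coefficients a_i are assumed notation): if the optimal filter lies in the span of the mapped inputs,

    \Omega = \sum_{i=1}^{N} a_i\, \varphi(u_i)
    \quad\Rightarrow\quad
    y = \Omega^\top \varphi(u)
      = \sum_{i=1}^{N} a_i\, \langle \varphi(u_i), \varphi(u) \rangle
      = \sum_{i=1}^{N} a_i\, \kappa(u_i, u),

so the mapping \varphi never needs to be evaluated, only the kernel \kappa.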

Page 6:

Inner product and pd kernel are equivalent

An inner product satisfies:

1. Symmetry
2. Linearity
3. Positive definiteness

Inner product space: a linear space equipped with an inner product.

A positive definite (pd) kernel is an inner product in some space. Use a pd kernel to implicitly construct the nonlinear mapping; the pd condition and two standard examples are sketched below.
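As a sketch, the pd condition and two common kernel examples (the Gaussian width \sigma and polynomial degree p are assumed symbols):

    \sum_{i=1}^{N}\sum_{j=1}^{N} c_i\, c_j\, \kappa(x_i, x_j) \ge 0
    \quad \text{for all } c_i \text{ and } x_i,

    \kappa(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)
    \quad \text{or} \quad
    \kappa(x, y) = (x^\top y + 1)^p.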

Page 7:

How do things work?

Mercer decomposition: a generalization of the eigenvalue decomposition to functional spaces. Take a positive definite kernel; it then decomposes into eigenfunctions and nonnegative eigenvalues, which define the implicit mapping (sketched below). The number of eigenfunctions, and hence the number of parameters to learn, can be infinite.

This is a bottom-up design: the nonlinearity is implicit in the choice of kernel.
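A sketch of the decomposition (eigenvalues \lambda_i and eigenfunctions \psi_i are assumed notation):

    \kappa(x, y) = \sum_{i=1}^{\infty} \lambda_i\, \psi_i(x)\, \psi_i(y),
    \qquad
    \varphi(x) = \big(\sqrt{\lambda_1}\,\psi_1(x),\ \sqrt{\lambda_2}\,\psi_2(x),\ \ldots\big),

so that \kappa(x, y) = \langle \varphi(x), \varphi(y) \rangle, and the mapping can indeed be infinite dimensional.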

Page 8:

Functional view

We never explicitly evaluate the mapping into the feature space; it is applied implicitly through the kernel function. The price is that we need to remember all the input data and the coefficients.

Universality is guaranteed through the kernel.
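A minimal sketch of the functional view in Python; the Gaussian kernel and the names centers and coeffs are assumptions, not from the slides:

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
        return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                      / (2.0 * sigma ** 2))

    def filter_output(u, centers, coeffs, sigma=1.0):
        """Evaluate f(u) = sum_i a_i k(u_i, u). The stored inputs (centers)
        and coefficients stand in for the never-evaluated feature-space
        weight vector."""
        return sum(a * gaussian_kernel(c, u, sigma)
                   for a, c in zip(coeffs, centers))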

Page 9:

Ridge regression

How do we find the weights? The regularized least-squares solution requires inverting the autocorrelation matrix in the feature space. Problem: how do we invert an infinite-dimensional matrix? Regularization, together with the Gram-matrix form of the solution, resolves this (identities sketched below).
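A hedged reconstruction of the standard kernel ridge regression identities (with \Phi = [\varphi(u_1), \ldots, \varphi(u_N)], Gram matrix K = \Phi^\top \Phi, desired vector d, and regularization parameter \lambda):

    \min_{\Omega}\ \sum_{i=1}^{N} \big(d_i - \Omega^\top \varphi(u_i)\big)^2
        + \lambda\, \|\Omega\|^2
    \quad\Rightarrow\quad
    \Omega = \Phi\, (K + \lambda I)^{-1} d,

    f(u) = \Omega^\top \varphi(u) = \sum_{i=1}^{N} a_i\, \kappa(u_i, u),
    \qquad a = (K + \lambda I)^{-1} d.

Only the N x N Gram matrix is ever inverted, never the infinite-dimensional feature-space matrix.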

Page 10:

Online learning

LMS update rule:

    w_n = w_{n-1} + \eta\, e_n\, u_n

LMS update rule in feature space:

    \Omega_n = \Omega_{n-1} + \eta\, e_n\, \varphi(u_n)

How do we compute these, when \varphi(u_n) cannot be evaluated? Set the initial weight \Omega_0 to 0; the weights then stay a finite sum of mapped inputs (see the sketch below).
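Unrolling the feature-space update from \Omega_0 = 0 gives an output computable purely through kernel evaluations (a standard derivation, sketched in the assumed notation):

    \Omega_n = \eta \sum_{i=1}^{n} e_i\, \varphi(u_i)
    \quad\Rightarrow\quad
    y_n = \Omega_{n-1}^\top \varphi(u_n)
        = \eta \sum_{i=1}^{n-1} e_i\, \kappa(u_i, u_n).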

Page 11:

Kernel-LMS

Initialize:

    f_1 = \eta\, d_1\, \kappa(u_1, \cdot)

Iterate for n > 1:

    y_n = f_{n-1}(u_n) = \eta \sum_{i=1}^{n-1} e_i\, \kappa(u_i, u_n)
    e_n = d_n - y_n
    f_n = f_{n-1} + \eta\, e_n\, \kappa(u_n, \cdot)

The usable step-size range is set by the largest eigenvalue of the feature-space autocorrelation matrix, which is unknown in practice.

Practical issues (a runnable sketch follows the list):

1. Need to choose a kernel.
2. Need to select a step size.
3. Need to store all past inputs and errors.
4. No explicit regularization (see the self-regularization slides).
5. O(n) time complexity for each iteration.
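A runnable sketch of KLMS along these lines; the Gaussian kernel, the step size eta, and the width sigma are assumed choices, not values from the slides:

    import numpy as np

    def klms(inputs, desired, eta=0.2, sigma=1.0):
        """Kernel LMS: store each input as a center whose coefficient is
        eta times the prediction error at that step."""
        centers, coeffs, preds = [], [], []
        for u, d in zip(inputs, desired):
            # Current output y = sum_i a_i k(u_i, u); zero before any center.
            y = sum(a * np.exp(-np.sum((c - u) ** 2) / (2.0 * sigma ** 2))
                    for a, c in zip(coeffs, centers))
            e = d - y                  # prediction error
            centers.append(u)          # remember the input ...
            coeffs.append(eta * e)     # ... with coefficient eta * e
            preds.append(y)
        return centers, coeffs, preds

    # Toy usage (assumed example): one-step prediction of a nonlinear series.
    rng = np.random.default_rng(0)
    x = np.sin(np.linspace(0, 20, 200)) ** 3 + 0.05 * rng.standard_normal(200)
    U = np.stack([x[i:i + 5] for i in range(len(x) - 5)])  # 5-tap inputs
    d = x[5:]                                              # desired: next sample
    _, _, preds = klms(U, d)
    print("MSE over the last 100 samples:",
          np.mean((np.array(preds[-100:]) - d[-100:]) ** 2))

Note how the inner sum grows with every sample: that growing sum is exactly issue 5 above, the O(n) cost per iteration.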

Page 12:

Functional approximation

The kernel should be universal, i.e. able to approximate any continuous function; the Gaussian kernel is the usual example. How do we choose the kernel parameter, e.g. the Gaussian width?

Page 13:

Implementation details

Choosing the best value of the kernel width (a large width over-smooths; a small width over-fits):

1. Cross validation: accurate but time consuming.
2. Thumb rules: fast but not accurate.

Limiting the network size:

1. Importance estimation.
2. Close centers are redundant and can be merged (a sketch follows the list).
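A minimal sketch of the "close centers are redundant" idea, in the style of a novelty criterion; the distance threshold delta and the merge rule are assumptions:

    import numpy as np

    def update_dictionary(centers, coeffs, u, alpha, delta=0.1):
        """Add (u, alpha) as a new center only if u is farther than delta
        from every stored center; otherwise fold alpha into the nearest one."""
        if centers:
            dists = [np.linalg.norm(c - u) for c in centers]
            nearest = int(np.argmin(dists))
            if dists[nearest] < delta:    # a close center exists: redundant
                coeffs[nearest] += alpha  # merge the coefficient instead
                return centers, coeffs
        centers.append(u)                 # novel input: grow the network
        coeffs.append(alpha)
        return centers, coeffs

This bounds the network size on stationary inputs at the cost of a small approximation error controlled by delta.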

Page 14:

Self-regularization: over-fitting

With as many parameters as samples (N parameters to fit N samples), the solution can over-fit. How do we remove the over-fitting? And how does KLMS do it without an explicit regularizer?

Page 15:

Ill-posed-ness

Ill-posed-ness appears because small singular values of the autocorrelation matrix blow up when the inverse is taken. How do we remove it?

Tikhonov regularization: solve a regularized problem, which weights the inverse of the small singular values downward (sketched below).
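A sketch of how Tikhonov regularization reweights the singular values (data matrix X with singular values s_i, left singular vectors q_i, right singular vectors v_i; \lambda is the regularization parameter; notation assumed):

    \min_{w}\ \|d - X^\top w\|^2 + \lambda\, \|w\|^2
    \quad\Rightarrow\quad
    w = \sum_i \frac{s_i}{s_i^2 + \lambda}\, \langle d, v_i \rangle\, q_i.

The least-squares factor 1/s_i is replaced by s_i/(s_i^2 + \lambda): large singular values are nearly untouched, while the contribution of small ones is driven toward zero instead of blowing up.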

Page 16:

Self-regularization: well-posed-ness

How does KLMS do it? The step size acts as a regularizer: keeping it small constrains the norm of the expected solution, much like an explicit regularizer on that solution. However, large singular values might be suppressed as well.

More information on the course website!

Username:
Password: