Elements of Pattern Recognition
CNS/EE-148 -- Lecture 5
M. Weber, P. Perona
What is Classification?
• We want to assign objects to classes based on a selection of attributes (features).
• Examples:
– (age, income) → {credit worthy, not credit worthy}
– (blood cell count, body temp) → {flu, hepatitis B, hepatitis C}
– (pixel vector) → {Bill Clinton, coffee cup}
• Feature vector can be continuous, discrete or mixed.
What is Classification?
• Want to find a function from measurements to class labels, i.e. a decision boundary:

$$c: \mathbb{R}^2 \to \{C_0, C_1, C_2, \dots\}$$
[Figure: space of feature vectors — clusters "Signal 1", "Signal 2", and "Noise" in the (x1, x2) feature plane]
• Statistical methods use pdf: p(C,x)
• Assume p(C,x) known for now
Some Terminology
• p(C) is called a prior or a priori probability
• p(x|C) is called a class-conditional density
or likelihood of C with respect to x
• p(C|x) is called a posterior or
a posteriori probability
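These three quantities are linked by Bayes' rule, which underlies everything that follows:

$$p(C \mid x) = \frac{p(x \mid C)\, p(C)}{p(x)}, \qquad p(x) = \sum_i p(x \mid C_i)\, p(C_i)$$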
Examples
• One measurement, symmetric cost, equal priors
[Figure: class-conditional densities p(x|C1) and p(x|C2) over x, with a poorly placed ("bad") decision threshold]
$$P(\mathrm{error} \mid x) = \begin{cases} P(C_1 \mid x) & \text{if } c(x) = C_2 \\ P(C_2 \mid x) & \text{if } c(x) = C_1 \end{cases}$$

$$P(\mathrm{error}) = \int P(\mathrm{error} \mid x)\, p(x)\, dx$$
Examples
• One measurement, symmetric cost, equal priors
[Figure: the same two densities with the optimal ("good") threshold at their crossing point]
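A minimal numerical sketch (not from the lecture; means, variance, and thresholds are assumed for illustration) of how the two thresholds compare:

```python
# Evaluate P(error) = ∫ P(error|x) p(x) dx for two 1D Gaussians with equal
# priors, comparing an arbitrary "bad" threshold against the optimal midpoint.
import numpy as np
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 2.0, 1.0          # assumed class means and common std
x = np.linspace(-6.0, 8.0, 10_000)
dx = x[1] - x[0]
p1 = 0.5 * norm.pdf(x, mu1, sigma)       # p(x|C1) p(C1), equal priors
p2 = 0.5 * norm.pdf(x, mu2, sigma)       # p(x|C2) p(C2)

def p_error(threshold):
    # decide C1 left of the threshold, C2 right of it;
    # at each x the error is the probability mass of the losing class
    err = np.where(x < threshold, p2, p1)
    return err.sum() * dx

print("bad threshold (0.2):    ", p_error(0.2))
print("optimal midpoint (1.0): ", p_error((mu1 + mu2) / 2))
```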
How to Make the Best Decision? (Bayes Decision Theory)
• Define a cost function for mistakes, e.g.

$$L(C_i, C_j) = 1 - \delta_{ij}$$

• Minimize expected loss (risk) over the entire p(C,x):

$$R = E\big[L(C, c(x))\big] = E\Big[E\big[L(C, c(x)) \mid x\big]\Big] = \int E\big[L(C, c(x)) \mid x\big]\; p(x)\, dx$$

$$E\big[L(C, c(x)) \mid x\big] = \sum_{i=1}^{N} L(C_i, c(x))\; p(C_i \mid x)$$

• Sufficient to assure an optimal decision for each individual x.
• Result: decide according to the maximum posterior probability:

$$c(x) = \arg\max_i\; p(C_i \mid x)$$
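For a single x, the minimum-risk rule is a few lines of code; a sketch with made-up posteriors and the 0/1 loss (any loss matrix works the same way):

```python
# Minimum-risk decision for one x: pick the action j minimizing the
# conditional risk sum_i L(C_i, C_j) p(C_i | x).
import numpy as np

posteriors = np.array([0.7, 0.2, 0.1])   # p(C_i | x), invented for illustration
L = 1.0 - np.eye(3)                      # 0/1 loss: L(C_i, C_j) = 1 - delta_ij

conditional_risk = L.T @ posteriors      # risk of deciding each class
print(np.argmin(conditional_risk))       # with 0/1 loss: the argmax posterior (here 0)
```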
Two Classes, C1, C2
• It is helpful to consider the likelihood ratio:

$$\frac{p(C_1 \mid x)}{p(C_2 \mid x)} = \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}$$

• Use the known priors p(Ci) or ignore them.
• For a more elaborate loss function (the proof is easy), decide C1 if:

$$g(x) \equiv \frac{p(x \mid C_1)}{p(x \mid C_2)} \;\geq\; \frac{l_{12} - l_{22}}{l_{21} - l_{11}} \cdot \frac{p(C_2)}{p(C_1)}$$

• g(x) is called a discriminant function
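A hedged sketch of this test (densities, losses, and priors are all assumed; l_ij is read here as the loss of deciding C_i when the true class is C_j):

```python
# Likelihood-ratio test with a general loss matrix for two 1D Gaussian classes.
from scipy.stats import norm

l11, l22 = 0.0, 0.0
l12, l21 = 5.0, 1.0                       # deciding C1 on a true C2 costs 5x more
p1, p2 = 0.5, 0.5                         # priors
mu1, mu2, sigma = 0.0, 2.0, 1.0           # class-conditional parameters

threshold = (l12 - l22) / (l21 - l11) * (p2 / p1)

def decide(x):
    g = norm.pdf(x, mu1, sigma) / norm.pdf(x, mu2, sigma)   # likelihood ratio g(x)
    return "C1" if g >= threshold else "C2"

print(decide(0.0), decide(1.0))   # boundary sits near x ≈ 0.2, not the midpoint 1.0
```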
Discriminant Functions for Multivariate Gaussian Class Conditional Densities
• Two multivariate Gaussians in d dimensions
• Since log is monotonic, we can look at log g(x):

$$\log g(x) = \log \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)} = g_1(x) - g_2(x)$$

$$g_i(x) = -\frac{1}{2}\underbrace{(x - \mu_i)^T \Sigma_i^{-1}(x - \mu_i)}_{\text{Mahalanobis distance}^2} - \underbrace{\frac{d}{2}\log 2\pi}_{\text{superfluous}} - \frac{1}{2}\log|\Sigma_i| + \log p(C_i)$$
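A direct transcription of g_i(x) into Python (all parameters invented for illustration):

```python
# Gaussian log-discriminant g_i(x): Mahalanobis term + log-det + log prior.
import numpy as np

def g(x, mu, Sigma, prior):
    d = len(mu)
    diff = x - mu
    maha2 = diff @ np.linalg.solve(Sigma, diff)     # (x-mu)^T Sigma^{-1} (x-mu)
    return (-0.5 * maha2
            - 0.5 * d * np.log(2 * np.pi)           # class-independent ("superfluous")
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

x = np.array([0.8, 0.5])
g1 = g(x, np.array([0.0, 0.0]), np.eye(2), 0.5)
g2 = g(x, np.array([2.0, 1.0]), np.eye(2), 0.5)
print("decide C1" if g1 > g2 else "decide C2")
```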
Mahalanobis Distance
• iso-distance lines = iso-probability lines
• Decision surface:
$$d_1^2(x) - d_2^2(x) = \text{const.}, \qquad \text{where}\quad d_i^2(x) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)$$

[Figure: iso-distance ellipses around μ1 and μ2 in the (x1, x2) plane, with the decision surface between them]
Case 1: $\Sigma_i = \sigma^2 I$
• Discriminant functions…

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1}(x - \mu_i) - \frac{1}{2}\log|\Sigma_i| + \log p(C_i)$$

• …simplify to (dropping class-independent terms):

$$g_1(x) - g_2(x) = -\frac{\|x - \mu_1\|^2}{2\sigma^2} + \frac{\|x - \mu_2\|^2}{2\sigma^2} + \log\frac{p(C_1)}{p(C_2)}$$

$$= -\frac{1}{2\sigma^2}\left(x^T x - 2\mu_1^T x + \mu_1^T \mu_1 - x^T x + 2\mu_2^T x - \mu_2^T \mu_2\right) + \log\frac{p(C_1)}{p(C_2)}$$

$$= \frac{(\mu_1 - \mu_2)^T x}{\sigma^2} - \frac{\mu_1^T \mu_1 - \mu_2^T \mu_2}{2\sigma^2} + \log\frac{p(C_1)}{p(C_2)}$$
Decision Boundary
Setting $g_1(x) - g_2(x) = 0$:

$$\frac{(\mu_1 - \mu_2)^T x}{\sigma^2} - \frac{\mu_1^T \mu_1 - \mu_2^T \mu_2}{2\sigma^2} + \log\frac{p(C_1)}{p(C_2)} = 0$$

$$\Leftrightarrow\quad (\mu_1 - \mu_2)^T x = \frac{1}{2}\left(\|\mu_1\|^2 - \|\mu_2\|^2\right) - \sigma^2 \log\frac{p(C_1)}{p(C_2)}$$
• If μ2 = 0, we obtain

$$\mu_1^T x = \frac{1}{2}\,\mu_1^T \mu_1 - \sigma^2 \log\frac{p(C_1)}{p(C_2)}$$

The matched filter! With an expression for the threshold.
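A minimal sketch of the resulting detector (template, noise level, and priors are assumed): correlate the known template with the observation and compare against the derived threshold:

```python
# Matched filter: with Sigma_i = sigma^2 I and mu2 = 0, decide C1 (signal)
# when the correlation mu1^T x exceeds the threshold from the slide.
import numpy as np

rng = np.random.default_rng(0)
mu1 = np.array([1.0, 2.0, -1.0, 0.5])     # known signal template
sigma, p1, p2 = 1.0, 0.5, 0.5

threshold = 0.5 * mu1 @ mu1 - sigma**2 * np.log(p1 / p2)

x = mu1 + sigma * rng.standard_normal(mu1.shape)   # noisy observation of the signal
print("C1 (signal)" if mu1 @ x >= threshold else "C2 (noise)")
```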
Two Signals and Additive White Gaussian Noise
[Figure: signal templates μ1, μ2 and an observation x in the (x1, x2) plane, showing the vectors μ1 − μ2 and x − μ2]

$$(\mu_1 - \mu_2)^T (x - \mu_2) \;\gtrless\; \frac{1}{2}\,\|\mu_1 - \mu_2\|^2 - \sigma^2 \log\frac{p(C_1)}{p(C_2)}$$

The right-hand side is the detection threshold.
Case 2: Σi = Σ
• Two classes, 2D measurements, p(x|C) are multivariate Gaussians with equal covariance matrices.
• Derivation is similar:
– Quadratic term vanishes since it is independent of class
– We obtain a linear decision surface
• Matlab demo
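The Matlab demo itself isn't reproduced here; a rough Python stand-in (all parameters are synthetic assumptions) shows the linear rule that falls out:

```python
# Equal-covariance Gaussian classes yield a linear discriminant
# w^T x + w0 with w = Sigma^{-1} (mu1 - mu2).
import numpy as np

mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # shared covariance (assumed)
p1 = p2 = 0.5

w = np.linalg.solve(Sigma, mu1 - mu2)
w0 = -0.5 * (mu1 + mu2) @ w + np.log(p1 / p2)

def decide(x):
    return "C1" if w @ x + w0 > 0 else "C2"

print(decide(mu1), decide(mu2))           # each mean falls on its own side
```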
Case 3: General Covariance Matrix
• See transparency: the quadratic terms no longer cancel, and the decision surfaces become quadrics (e.g. ellipses, parabolas, hyperbolas)
Isn’t this too simple?
• Not at all…
• It is true that images form complicated manifolds (from a pixel point of view, translation, rotation and scaling are all highly non-linear operations)
• The high dimensionality helps
Assume Unknown Class Densities
• In real life, we do not know the class conditional densities.
• But we do have example data.
• This puts us in the typical machine learning scenario: we want to learn a function, c(x), from examples.
• Why not just estimate class densities from examples and apply the previous ideas?
– To learn even a Gaussian (a simple density) in N dimensions, we need at least on the order of N² samples!
• 10×10 pixels → 10,000 examples!
– Avoid estimating densities whenever you can! (too general)
– The posterior is generally simpler than the class conditional (see transparency)
Remember PCA?
• Principal components are eigenvectors of the covariance matrix
• Use reconstruction error for recognition (e.g. Eigenfaces)
– good:
• reduces dimensionality
– bad:
• no model within subspace
• linearity may be inappropriate
• covariance not appropriate to optimize discrimination
$$C = \frac{1}{N}\sum_i (x_i - \mu)(x_i - \mu)^T = U S U^T, \qquad x \approx \mu + U \hat{z}$$

[Figure: data cloud with mean μ and principal direction u1 in the (x1, x2) plane, with a sample x and its projection]
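As a concrete illustration of recognition by reconstruction error (not the lecture's code; the data and dimensions are made up):

```python
# Project onto the top-k principal components; samples near the training
# subspace reconstruct well, outliers do not.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 10))  # rank-3 data in 10-D
mu = X.mean(axis=0)

C = np.cov(X.T)                           # covariance matrix
eigvals, U = np.linalg.eigh(C)            # eigh returns ascending eigenvalues
Uk = U[:, -3:]                            # top k = 3 principal components

def reconstruction_error(x):
    z = Uk.T @ (x - mu)                   # subspace coordinates
    x_hat = mu + Uk @ z                   # x ≈ mu + U z, as on the slide
    return np.linalg.norm(x - x_hat)

print(reconstruction_error(X[0]))                          # ~0: in the subspace
print(reconstruction_error(10 * rng.standard_normal(10)))  # typically much larger
```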
Fisher’s Linear Discriminant
• Goal: Reduce dimensionality before training classifiers etc. (Feature Selection)
• Similar goal as PCA!
• Fisher has classification in mind…
• Find projection directions such that separation is easiest
• Eigenfaces vs. Fisherfaces
Fisher’s Linear Discriminant
• Assume we have n d-dimensional samples x1,…,xn
• n1 from set (class) X1 and n2 from set X2
• we form linear combinations:
• and obtain y1,…,yn
• only direction of w is important
$$y = w^T x$$
Objective for Fisher
• Measure the separation as the distance between the means after projecting (k = 1, 2):

$$\tilde{m}_k = \frac{1}{n_k}\sum_{y \in Y_k} y = \frac{1}{n_k}\sum_{x \in X_k} w^T x = w^T m_k$$

• Measure the scatter after projecting:

$$\tilde{s}_k^2 = \sum_{y \in Y_k} \left(y - \tilde{m}_k\right)^2$$

• Objective becomes to maximize

$$J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$$
• We need to make the dependence on w explicit:

$$\tilde{s}_k^2 = \sum_{x \in X_k}\left(w^T x - w^T m_k\right)^2 = w^T\left[\sum_{x \in X_k}(x - m_k)(x - m_k)^T\right] w \equiv w^T S_k\, w$$

• Defining the within-class scatter matrix, S_W = S_1 + S_2, we obtain

$$\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_W\, w$$

• Similarly for the separation (between-class scatter matrix):

$$(\tilde{m}_1 - \tilde{m}_2)^2 = \left(w^T m_1 - w^T m_2\right)^2 = w^T (m_1 - m_2)(m_1 - m_2)^T w = w^T S_B\, w$$

• Finally we can write

$$J(w) = \frac{w^T S_B\, w}{w^T S_W\, w}$$
Fisher’s Solution
• $J(w) = \dfrac{w^T S_B\, w}{w^T S_W\, w}$ is called a generalized Rayleigh quotient. Any w that maximizes J must satisfy the generalized eigenvalue problem

$$S_B\, w = \lambda\, S_W\, w$$

• Since S_B has rank 1 and S_B w is always in the direction of (m1 − m2), we are done:

$$w = S_W^{-1}(m_1 - m_2)$$
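The closed-form solution is a one-liner; a minimal sketch on synthetic two-class data (all parameters assumed):

```python
# Fisher's linear discriminant: w = S_W^{-1} (m1 - m2).
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.standard_normal((50, 2))                          # class 1 samples
X2 = rng.standard_normal((50, 2)) + np.array([3.0, 2.0])   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)     # within-class scatter

w = np.linalg.solve(Sw, m1 - m2)     # only the direction matters

y1, y2 = X1 @ w, X2 @ w              # 1-D projections y = w^T x
print("projected means:", y1.mean(), y2.mean())
```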
Comments on FLD
• We did not follow Bayes Decision Theory
• FLD is useful for many types of densities
• Fisher can be extended (see demo):
– more than one projection direction
– more than two clusters
• Let’s try it out: Matlab Demo
Fisher vs. Bayes
• Assume we do have identical Gaussian class densities; then Bayes says:

$$w^T x + w_0 = 0, \qquad w = \Sigma^{-1}(\mu_1 - \mu_2)$$

• while Fisher says:

$$w = S_W^{-1}(m_1 - m_2)$$

• Since S_W is proportional to the covariance matrix, w is in the same direction in both cases.
• Comforting...
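A quick numerical check of this claim (synthetic data; the covariance and means are assumed): the cosine between the two directions should be close to 1:

```python
# Compare the Bayes direction Sigma^{-1}(mu1 - mu2) with the Fisher
# direction S_W^{-1}(m1 - m2) estimated from samples.
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
L = np.linalg.cholesky(Sigma)                    # Sigma = L L^T
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])

X1 = rng.standard_normal((5000, 2)) @ L.T + mu1
X2 = rng.standard_normal((5000, 2)) @ L.T + mu2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

w_bayes = np.linalg.solve(Sigma, mu1 - mu2)
w_fisher = np.linalg.solve(Sw, m1 - m2)

cos = w_bayes @ w_fisher / (np.linalg.norm(w_bayes) * np.linalg.norm(w_fisher))
print(cos)    # ≈ 1: same direction up to scale
```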
What have we achieved?
• Found out that maximum posterior strategy is optimal. Always.
• Looked at different cases of Gaussian class densities, where we could derive simple decision rules.
• Gaussian classifiers do reasonable jobs!
• Learned about FLD, which is useful and often preferable to PCA.
Just for Fun: Support Vector Machine
• Very fashionable… state of the art?
• Does not model densities
• Fits decision surface directly
• Maximizes margin → reduces “complexity”
• Decision surface only depends on nearby samples
• Matlab Demo
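The Matlab demo isn't reproduced here; a rough scikit-learn stand-in (assuming scikit-learn is available; data synthetic) shows that the fitted boundary depends only on the support vectors:

```python
# Fit a maximum-margin (linear) SVM and inspect its support vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((50, 2)),
               rng.standard_normal((50, 2)) + np.array([4.0, 4.0])])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("number of support vectors:", len(clf.support_vectors_))
print(clf.predict([[0.0, 0.0], [4.0, 4.0]]))   # -> [0 1]
```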
Learning Algorithms
[Diagram: a learning algorithm receives examples (xi, yi) drawn from p(x, y), searches a set of functions, and outputs a learned function y = f(x)]
Assume Unknown Class Densities
• SVM examples
• Densities are hard to estimate → avoid it
– example from Ripley
• Give intuitions on overfitting
• Need to learn:
– standard machine learning problem
– training/test sets