Transcript of Fundamentals of Media Processing — research.nii.ac.jp/~satoh/fmp/FMP_2012_10_23.pdf
Fundamentals of Media Processing
Shin'ichi Satoh, Kazuya Kodama, Hiroshi Mo, Duy-Dinh Le
Course web page
http://research.nii.ac.jp/~satoh/fmp/
Today's topics
Face detection and recognition
- PCA and the Eigenface method
- LDA and the Fisherface method
- Relation to the Bayes decision rule assuming a normal distribution
Applications of eigenvalue decomposition
- Searching documents by latent semantic information: Latent Semantic Indexing (LSI)
- Estimating 3-D structure from video: the Factorization method
Face detection and recognition
Assume that we have a set of face images (with identities)
Face detection: decide whether a given unknown image is a face or not
Face recognition: decide the identity of a given face image
Image as intensity data
Image conversion
original → gray scale → crop → discretize → quantize
x=[x1 x2 x3 ... xn]
The Space of Faces
An image is a point in a high-dimensional space: an N x M image is a point in R^(NM)
We can define vectors in this space as we did in the 2-D case
[Thanks to Chuck Dyer, Steve Seitz, Nishino]
Multivariate normal distribution
We typically assume a normal distribution. Let's assume p(x|non-face) yields a uniform distribution, and p(x|face) yields a normal distribution.
The multivariate normal distribution:
p(x) = 1 / ((2π)^(d/2) |Σ|^(1/2)) exp( −(1/2) (x − μ)^T Σ^(−1) (x − μ) )
Normal distribution
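As a concrete check, the density above can be evaluated directly with NumPy. This is only a sketch; the function name and the 1-D sanity-check values are made up for illustration:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density p(x) for a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    maha = diff @ np.linalg.solve(sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm

# Sanity check against the 1-D standard normal: p(0) = 1/sqrt(2*pi)
p = mvn_pdf(np.array([0.0]), np.array([0.0]), np.array([[1.0]]))
print(round(p, 4))  # 0.3989
```

For large d (e.g. raw image vectors), one would use `slogdet` and a Cholesky factorization instead of `det` for numerical stability.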
Eigenface method
Key idea: observe the distribution of face images in the image space
- find principal components by principal component analysis (PCA)
- project unknown images onto the space spanned by the obtained principal components
- face detection: decide "face" if the image is close enough to the space
- face recognition: decide the identity of the closest face image within the space
Eigenface method
Assume we have n samples of faces: X = [x1 x2 x3 ... xn]
Mean: μ = E[x] = (1/n) Σ_{i=1}^{n} xi
We can then obtain the covariance matrix:
Σ = E[(x − μ)(x − μ)^T] = (1/n) X X^T − μ μ^T
Eigenface method
We then apply eigenvalue decomposition to the covariance matrix:
Σ φi = λi φi
where the λi are eigenvalues and the φi are eigenvectors.
Retain the m eigenvectors (φi, i = 1, ..., m) corresponding to the m largest eigenvalues.
Each image x can then be converted into an m-dimensional vector:
x' = [φ1 ... φm]^T (x − μ)
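The three steps above (covariance, eigendecomposition, projection) can be sketched in NumPy. The data here are random stand-ins for face vectors, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 64, 20, 5                    # pixels per image, images, kept components
X = rng.normal(size=(d, n))            # columns = toy "face" vectors

mu = X.mean(axis=1, keepdims=True)     # mean face
C = (X - mu) @ (X - mu).T / n          # covariance matrix
lam, phi = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1][:m]      # indices of the m largest
Phi = phi[:, order]                    # d x m basis (the "eigenfaces")

x_new = rng.normal(size=(d, 1))        # an unknown image
x_proj = Phi.T @ (x_new - mu)          # its m-dimensional representation
print(x_proj.shape)  # (5, 1)
```

`eigh` is used because the covariance matrix is symmetric; in the real Eigenface setting one diagonalizes the small n x n matrix X^T X instead, since d >> n for images.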
Covariance matrix and its algebraic/geometric interpretation
What is the quadratic form φ^T Σ φ ?
[figure: samples x1, x2, x3 projected onto a direction φ through the mean μ, giving y1, y2, y3]
Covariance matrix and its algebraic/geometric interpretation
How do we maximize φ^T Σ φ with respect to φ, subject to ||φ|| = 1?
Lagrange multipliers method (a very good exercise, please try!)
J. M. Rehg © 2002
Principal Component Analysis
PCA is an orthonormal projection of a random vector X onto a lower-dimensional subspace Y that minimizes the mean square error.
Y = U^T X,  X = U Y + e
U = [u1 u2 ... um],  m ≤ n
Σ = E[X X^T],  Σ ui = λi ui
[figure: 2-D data axes x1, x2 with principal axes y1, y2]
Principal Component Analysis
Equivalently, PCA yields a distribution for Y with:
- uncorrelated components
- maximum variance
(the same projection Y = U^T X, where the ui are the eigenvectors of Σ = E[X X^T])
Classification Using PCA
Detection of faces: based on the distance from the face space
Recognition of faces: based on the distance within the face space
With y = U^T (x − μ) and reconstruction error e = x − (μ + U y):
- detection: is e^T e < θ_D ? (θ_D a threshold)
- recognition: is (y − yi)^T (y − yi) < θ_R for some known face yi ?
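Both criteria can be sketched in a few lines, assuming an orthonormal face-space basis U and a gallery already projected into face space. All names, sizes, and data below are hypothetical stand-ins:

```python
import numpy as np

def pca_face_scores(x, mu, U, gallery_y):
    """Distance from face space (detection) and nearest gallery face (recognition)."""
    y = U.T @ (x - mu)               # project onto the face space
    x_hat = mu + U @ y               # reconstruction from face space
    e = np.linalg.norm(x - x_hat)    # distance FROM face space
    dists = np.linalg.norm(gallery_y - y[:, None], axis=0)  # distances WITHIN
    return e, int(np.argmin(dists))

rng = np.random.default_rng(1)
d, m = 30, 4
U, _ = np.linalg.qr(rng.normal(size=(d, m)))   # orthonormal basis (toy)
mu = rng.normal(size=d)
gallery = rng.normal(size=(m, 3))              # 3 known faces' coordinates
x = mu + U @ gallery[:, 1]                     # a face lying exactly in the space
e, who = pca_face_scores(x, mu, U, gallery)
print(round(e, 6), who)  # 0.0 1
```

A point constructed inside the face space has (numerically) zero reconstruction error and is recognized as its gallery identity; a non-face would give a large e.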
Eigenfaces
PCA extracts the eigenvectors of the covariance matrix
- gives a set of vectors v1, v2, v3, ...
- each one of these vectors is a direction in face space
- what do these look like?
Projecting onto the Eigenfaces
The eigenfaces v1, ..., vK span the space of faces
A face x is converted to eigenface coordinates by wi = vi^T (x − μ), giving x → (w1, ..., wK)
Fisherface method
Eigenface finds principal components that maximize the variance of the face distribution
What happens if we use identity information?
- it would be reasonable to maximize the variance between different people, while minimizing the variance within the same person
- find such components by Linear Discriminant Analysis (LDA)
- project unknown images onto the space spanned by the obtained components
- face recognition (only): decide the identity of the closest face image within the space
Fisherface - 2
[figure: poor projection vs. good projection]
FisherFace - 3
N sample images: {x1, ..., xN}
c classes: {χ1, ..., χc}
Average of each class: μi = (1/Ni) Σ_{xk ∈ χi} xk
Total average: μ = (1/N) Σ_{k=1}^{N} xk
FisherFace - 4
Scatter of class i: Si = Σ_{xk ∈ χi} (xk − μi)(xk − μi)^T
Within-class scatter: S_W = Σ_{i=1}^{c} Si
Between-class scatter: S_B = Σ_{i=1}^{c} Ni (μi − μ)(μi − μ)^T
Total scatter: S_T = S_B + S_W
FisherFace - 5
After projection: yk = W^T xk
Between-class scatter (of y's): S̃_B = W^T S_B W
Within-class scatter (of y's): S̃_W = W^T S_W W
FisherFace - 6
Good separation
[figure: class scatters S1, S2 and between-class scatter S_B, with S_W = S1 + S2]
FisherFace - 7
The wanted projection:
W_opt = argmax_W |S̃_B| / |S̃_W| = argmax_W |W^T S_B W| / |W^T S_W W| = [w1 w2 ... wm]
How is it found? As the solution of the generalized eigenvalue problem:
S_B wi = λi S_W wi,  i = 1, ..., m
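The generalized eigenvalue problem can be exercised on toy two-class data. This sketch solves it via S_W^(−1) S_B with plain NumPy; the data (two Gaussian clouds separated along the first axis) and sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two classes of 50 points each in 3-D, separated along the first axis
X1 = rng.normal(loc=[0, 0, 0], size=(50, 3))
X2 = rng.normal(loc=[4, 0, 0], size=(50, 3))
mu1, mu2 = X1.mean(0), X2.mean(0)
mu = np.vstack([X1, X2]).mean(0)

# Within- and between-class scatter, as defined on the slides
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
S_B = 50 * np.outer(mu1 - mu, mu1 - mu) + 50 * np.outer(mu2 - mu, mu2 - mu)

# Generalized problem S_B w = lambda S_W w, solved via S_W^{-1} S_B
lam, W = np.linalg.eig(np.linalg.solve(S_W, S_B))
w = np.real(W[:, np.argmax(np.real(lam))])   # top discriminant direction
w /= np.linalg.norm(w)
print(np.round(np.abs(w), 2))
```

Since the classes differ only along the first axis, the recovered direction w is dominated by its first component. For symmetric definite matrices, `scipy.linalg.eigh(S_B, S_W)` solves the same problem more robustly.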
Experimental Results - 1
Variation in facial expression, eyewear, and lighting
Input: 160 images of 16 people
Train: 159 images; Test: 1 image
With and without glasses, 3 lighting conditions, 5 expressions
Experimental Results - 2
Singular Value Decomposition
A = U Σ V^T, where U and V are orthogonal and Σ is diagonal
Some Properties of SVD
• That is, Ak is the optimal approximation in terms of the approximation error measured by the Frobenius norm, among all matrices of rank k
• Forms the basis of LSI (Latent Semantic Indexing) in information retrieval
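This Eckart–Young optimality can be checked numerically: the Frobenius error of the rank-k truncation equals the norm of the discarded singular values. The small example matrix below is chosen arbitrarily:

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0],
              [1.0, 1.0, 3.0],
              [1.0, 1.0, -1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

# Eckart-Young: the Frobenius error equals the discarded singular values
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt((s[k:] ** 2).sum())))  # True
```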
Low rank approximation by SVD
Applications of SVD
Pseudoinverse Range, null space and rank Matrix approximation Other examples
http://en.wikipedia.org/wiki/Singular_value_decomposition
LSI (Latent Semantic Indexing)
- Introduction
- Latent Semantic Indexing
- LSI Query
- Updating
- An example
Problem Introduction
The traditional term-matching method doesn't work well in information retrieval.
We want to capture concepts instead of words. Concepts are reflected in words; however, one term may have multiple meanings, and different terms may have the same meaning.
LSI (Latent Semantic Indexing)
LSI approach tries to overcome the deficiencies of term-matching retrieval by treating the unreliability of observed term-document association data as a statistical problem.
The goal is to find effective models to represent the relationship between terms and documents. Hence a set of terms, which is by itself incomplete and unreliable, will be replaced by some set of entities which are more reliable indicants.
LSI, the Method
- Document-term matrix M
- Decompose M by SVD
- Approximate M using the truncated SVD
LSI, the Method (cont.)
Each row and column of M gets mapped into the k-dimensional LSI space by the SVD.
Query
A query q is also mapped into this space, by
q_k = q^T U_k Σ_k^(−1)
Compare the similarity in the new space.
Intuition: dimension reduction through LSI brings together "related" axes in the vector space.
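The whole pipeline fits in a few lines. The 4-term x 3-document matrix and the query below are invented for illustration (rows = terms, columns = documents):

```python
import numpy as np

# Toy term-document matrix: 4 terms (rows) x 3 documents (columns)
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Uk, Sk = U[:, :k], np.diag(s[:k])      # truncated factors
docs = Vt[:k, :].T                     # each row: a document in LSI space

# Map a query into the k-dimensional LSI space: q_k = q^T U_k Sigma_k^{-1}
q = np.array([1.0, 1.0, 0.0, 0.0])     # query over the first two terms
qk = q @ Uk @ np.linalg.inv(Sk)

# Rank documents by cosine similarity to the query in LSI space
sims = docs @ qk / (np.linalg.norm(docs, axis=1) * np.linalg.norm(qk))
print(np.round(sims, 2))
```

In a real system A would be weighted (e.g. tf-idf) and k chosen in the hundreds; here k = 2 just makes the dimension reduction visible.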
Example
Example (cont.)
Example (cont. Mapping)
Example (cont. Query)
Query: Application and Theory
Example (cont. Query)
How to set the value of k?
LSI is useful only if k << n. If k is too large, it doesn't capture the underlying latent semantic space; if k is too small, too much is lost.
There is no principled way of determining the best k.
How well does LSI work?
The effectiveness of LSI compared to regular term matching depends on the nature of the documents. Typical improvement: 0 to 30% better precision. The advantage is greater for texts in which synonymy and ambiguity are more prevalent, and best when recall is high.
The costs of LSI might outweigh the improvement: SVD is computationally expensive, so LSI is of limited use for really large document collections, and an inverted index is not possible.
References
Mini tutorial on the Singular Value Decomposition:
http://www.cs.brown.edu/research/ai/dynamics/tutorial/Postscript/SingularValueDecomposition.ps
Basics of linear algebra:
http://www.stanford.edu/class/cs229/section/section_lin_algebra.pdf
Factorization
C. Tomasi and T. Kanade. Shape and motion from image streams under orthography---a factorization method. International Journal on Computer Vision, 9(2):137-154, November 1992. also as Technical Report TR-92-1270, Cornell University, March 1992.
Motivation
Given: an image sequence of a particular object
Automatically extract the 3-D structure of the object
Useful for object identification, 3-D model construction for CG, CAD, etc.
Example
First Step: Feature Point Selection and Tracking
Feature Point Selection; Feature Point Tracking
Feature Point Tracking
Select good points to track.
We regard a given image sequence as the following spatio-temporal function: I(x, y, t)
Pixel correspondence: I(x, y, t + τ) = I(x − ξ, y − η, t)
If we define: J(x) = I(x, y, t + τ) and I(x − d) = I(x − ξ, y − η, t), with d = (ξ, η)
The problem is then to find the d that minimizes:
ε = ∫∫_W [I(x − d) − J(x)]^2 w dx
If we assume that the displacement vector d is small,
I(x − d) ≈ I(x) − g · d
where g = (∂I/∂x, ∂I/∂y)^T
then what we need to solve becomes: G d = e
where G = ∫∫_W g g^T w dx and e = ∫∫_W (I − J) g w dx
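The linearized system G d = e can be exercised on a synthetic translated patch. The window size, the smooth image function, and the true shift below are all made up; the recovered d should approximate the true shift:

```python
import numpy as np

# Synthetic window: a smooth patch I and a copy J shifted by d_true
def patch(dx, dy, size=15):
    y, x = np.mgrid[0:size, 0:size].astype(float)
    return np.sin(0.5 * (x - dx)) + np.cos(0.4 * (y - dy))

d_true = np.array([0.2, -0.1])
I = patch(0.0, 0.0)
J = patch(d_true[0], d_true[1])          # J(x) = I(x - d_true)

# Gradients g = (I_x, I_y) via finite differences
Iy, Ix = np.gradient(I)

# Accumulate G = sum g g^T and e = sum (I - J) g over the window (w = 1)
G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
              [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
e = np.array([np.sum((I - J) * Ix), np.sum((I - J) * Iy)])

d = np.linalg.solve(G, e)                # estimated displacement
print(np.round(d, 2))
```

The estimate is slightly biased by the linearization and the finite-difference gradients; real trackers iterate this solve and warp J toward I each time.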
Feature Point Selection
Good feature points should provide a stable solution for tracking: from G d = e,
G should have a numerically stable inverse.
Let the two eigenvalues of G be λ1, λ2. The criterion for feature point selection is:
min(λ1, λ2) > λ (a threshold)
Measurement Matrix
In the f-th frame, the p-th point has image coordinates (u_fp, v_fp); in the (f+1)-th frame, (u_(f+1)p, v_(f+1)p).
Registered measurement matrix (2F x P), with each row centered on its mean:
W̃ = [Ũ; Ṽ],  ũ_fp = u_fp − (1/P) Σ_p u_fp,  ṽ_fp = v_fp − (1/P) Σ_p v_fp
f = 1, ..., F;  p = 1, ..., P
The Rank Theorem
Under orthographic projection,
ũ_fp = i_f^T s_p,  ṽ_fp = j_f^T s_p
thus
W̃ = R S,  R = [i_1 ... i_F j_1 ... j_F]^T (2F x 3),  S = [s_1 ... s_P] (3 x P)
so the registered measurement matrix W̃ has rank at most 3.
The Rank Theorem
How do we decompose W̃? By using the singular value decomposition (SVD):
W̃ = U Σ V^T
then, keeping the three largest singular values,
W̃ ≈ A B,  with R̂ = A = U' Σ'^(1/2) and Ŝ = B = Σ'^(1/2) V'^T
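A numerical sketch of the rank theorem and the SVD factorization on synthetic data. The camera rows and shape points are random stand-ins, and no metric constraints are enforced, so R̂ and Ŝ recover W̃ only up to an invertible 3 x 3 ambiguity:

```python
import numpy as np

rng = np.random.default_rng(4)
F, P = 6, 10
S = rng.normal(size=(3, P))                  # 3-D shape (points as columns)
S -= S.mean(axis=1, keepdims=True)           # centered, as registration implies
R = rng.normal(size=(2 * F, 3))              # stacked i_f^T and j_f^T rows

W_tilde = R @ S                              # registered measurement matrix
print(np.linalg.matrix_rank(W_tilde))        # 3, by the rank theorem

# Factor via SVD, keeping the three largest singular values
U, s, Vt = np.linalg.svd(W_tilde, full_matrices=False)
R_hat = U[:, :3] * np.sqrt(s[:3])            # A = U' Sigma'^{1/2}
S_hat = np.sqrt(s[:3])[:, None] * Vt[:3, :]  # B = Sigma'^{1/2} V'^T
```

In the full method of Tomasi and Kanade, the remaining ambiguity is resolved by imposing that each i_f, j_f pair be orthonormal (the metric constraints).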
Results
Results
Results
Schedule (topics subject to change)
10/9 (today) orientation, Bayes decision theory, probability distribution, normal distribution
10/16 random vector, linear algebra, orthogonal expansions, principal component analysis (PCA)
10/23 Applications of eigenvalue decomposition (or Singular Value Decomposition): Latent Semantic Indexing and the Factorization method
10/30 no class
11/6 advanced pattern recognition #1
11/13 advanced pattern recognition #2
11/20 Nonparametric methods (Parzen windows, k-nearest neighbor estimate, k-nearest neighbor classification)
11/27 advanced pattern recognition #3
12/4 no class
12/11 Nonparametric methods cont'd, Clustering (k-means, agglomerative hierarchical clustering)
Schedule cont'd
12/18 advanced pattern recognition #4
12/25 advanced pattern recognition #5
2013/1/8 no class
1/15 (Fundamentals of) Signal processing
1/22 (Fundamentals of) Image processing
1/29 3-D/4-D Image processing
2/5 Project
2/12 Discussion