Large-Scale Face Manifold Learning - Brown...
Transcript of Large-Scale Face Manifold Learning - Brown...
![Page 1: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/1.jpg)
1
Large-Scale Face Manifold Learning
Sanjiv Kumar
Google Research New York, NY
* Joint work with A. Talwalkar, H. Rowley and M. Mohri
![Page 2: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/2.jpg)
2
50 x 50 pixel faces
50 x 50 pixel random images
Space of face images significantly smaller than 2562500
Face Manifold Learning
Want to recover the underlying (possibly nonlinear) space !
ℜ2500
(Dimensionality Reduction)
![Page 3: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/3.jpg)
3
Dimensionality Reduction
• Linear Techniques – PCA, Classical MDS – Assume data lies in a subspace – Directions of maximum variance
• Nonlinear Techniques – Manifold learning methods
• LLE • ISOMAP • Laplacian Eigenmaps
– Assume local linearity of data – Need densely sampled data as input
[Roweis & Saul ’00]
[Tenanbaum et al. ’00]
[Belkin & Niyogi ’01]
Bottleneck: Computational Complexity ≈ O(n3) !
![Page 4: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/4.jpg)
4
Outline
• Manifold Learning – ISOMAP
• Approximate Spectral Decomposition – Nystrom and Column-Sampling approximations
• Large-scale Manifold learning – 18M face images from the web – Largest study so far ~270 K points
• People Hopper – A Social Application on Orkut
![Page 5: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/5.jpg)
5
• Find the low-dimensional representation that best preserves geodesic distances between points
ISOMAP [Tanenbaum et al., ’00]
![Page 6: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/6.jpg)
6
• Find the low-dimensional representation that best preserves geodesic distances between points
ISOMAP [Tanenbaum et al., ’00]
Recovers true manifold asymptotically !
Output co-ordinates Geodesic distance
![Page 7: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/7.jpg)
7
i j
Given n input images:
• Find t nearest neighbors for each image : O(n2)
• Find shortest path distance for every (i, j), Δij : O(n2 log n)
• Construct n × n matrix G with entries as centered Δij
2 – G ~ 18M x 18M dense matrix
• Optimal k reduced dims: Uk Σk1/2
O(n3) ! Eigenvectors Eigenvalues
[Tanenbaum et al., ’00] ISOMAP
![Page 8: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/8.jpg)
8
Spectral Decomposition • Need to do eigen-decomposition of symmetric positive
semi-definite matrix
• For , G ≈ 1300 TB – ~100,000 x 12GB RAM machines
• Iterative methods – Jacobi, Arnoldi, Hebbian – Need matrix-vector products and several passes over data – Not suitable for large dense matrices
• Sampling-based methods – Column-Sampling Approximation – Nystrom Approximation
€
G[ ] n×n
[Golub & Loan, ’83][Gorell, ’06]
Relationship and comparative performance? [Frieze et al., ’98]
[Williams & Seeger, ’00]
O(n3)
![Page 9: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/9.jpg)
9
Approximate Spectral Decomposition
• Sample l columns randomly without replacement
l
C
• Column-Sampling Approximation – SVD of C
• Nystrom Approximation – SVD of W [Frieze et al., ’98]
[Williams & Seeger, ’00][Drineas & Mahony, ’05]
l
![Page 10: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/10.jpg)
10
Column-Sampling Approximation
![Page 11: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/11.jpg)
11
Column-Sampling Approximation
![Page 12: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/12.jpg)
12
Column-Sampling Approximation
O(nl 2) !
O(l 3) !
[n × l ]
[l × l ]
![Page 13: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/13.jpg)
13
Nystrom Approximation
C
l
l
![Page 14: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/14.jpg)
14
Nystrom Approximation l
l
O(l 3) !
C
![Page 15: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/15.jpg)
15
Nystrom Approximation l
l
C
Not Orthonormal !
O(l 3) !
![Page 16: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/16.jpg)
16
Nystrom Vs Column-Sampling
• Experimental Comparison – A random set of 7K face images – Eigenvalues, eigenvectors, and low-rank approximations
[Kumar, Mohri & Talwalkar, ICML ’09]
![Page 17: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/17.jpg)
17
Eigenvalues Comparison
% deviation from exact
![Page 18: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/18.jpg)
18
Eigenvectors Comparison
Principal angle with exact
![Page 19: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/19.jpg)
19
Low-Rank Approximations
Nystrom gives better reconstruction than Col-Sampling !
![Page 20: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/20.jpg)
20
Low-Rank Approximations
![Page 21: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/21.jpg)
21
Low-Rank Approximations
![Page 22: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/22.jpg)
22
Orthogonalized Nystrom
Nystrom-orthogonal gives worse reconstruction than Nystrom !
![Page 23: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/23.jpg)
23
Low-Rank Approximations Matrix Projection
![Page 24: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/24.jpg)
24
Low-Rank Approximations Matrix Projection
![Page 25: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/25.jpg)
25
Low-Rank Approximations Matrix Projection
€
˜ G nys = C ln
W −2⎛
⎝ ⎜
⎞
⎠ ⎟ CTG
€
˜ G col = C CTC( )−1CTG
![Page 26: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/26.jpg)
26
Col-Sampling gives better Reconstruction than Nystrom !
Low-Rank Approximations Matrix Projection
– Theoretical guarantees in special cases [Kumar et al., ICML ’09]
![Page 27: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/27.jpg)
27
How many columns are needed? Columns needed to get 75% relative accuracy
• Sampling Methods – Theoretical analysis of uniform sampling method – Adaptive sampling methods – Ensemble sampling methods
[Deshpande et al. FOCS ’06] [Kumar et al., ICML ’09]
[Kumar et al., AISTATS ’09]
[Kumar et al., NIPS ’09]
![Page 28: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/28.jpg)
28
So Far …
• Manifold Learning – ISOMAP
• Approximate Spectral Decomposition – Nystrom and Column-Sampling approximations
• Large-scale Face Manifold learning – 18 M face images from the web
• People Hopper – A Social Application on Orkut
![Page 29: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/29.jpg)
29
Large-Scale Face Manifold Learning
• Construct Web dataset – Extracted 18M faces from 2.5B internet images – ~15 hours on 500 machines – Faces normalized to zero mean and unit variance
• Graph construction – Exact search ~3 months (on 500 machines) – Approx Nearest Neighbor – Spill Trees (5 NN, ~2 days) – New methods for hashing based kNN search – Less than 5 hours!
[Liu et al., ’04]
[Talwalkar, Kumar & Rowley, CVPR ’08]
[CVPR ’10] [ICML ’10] [ICML ’11]
![Page 30: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/30.jpg)
30
Neighborhood Graph Construction
• Connect each node (face) with its neighbors
• Is the graph connected? – Depth-First-Search to find largest connected component – 10 minutes on a single machine – Largest component depends on number of NN ( t )
![Page 31: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/31.jpg)
31
Samples from connected components
From Largest Component
From Smaller Components
![Page 32: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/32.jpg)
32
Graph Manipulation
• Approximating Geodesics – Shortest paths between pairs of face images – Computing for all pairs infeasible
• Key Idea: Need only a few columns of G for sampling-based decomposition – require shortest paths between a few ( l ) nodes and all
other nodes – 1 hour on 500 machines (l = 10K)
• Computing Embeddings (k = 100) – Nystrom: 1.5 hours, 500 machine – Col-Sampling: 6 hours, 500 machines – Projections: 15 mins, 500 machines
O(n2 log n) !
![Page 33: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/33.jpg)
33
18M-Manifold in 2D
Nystrom Isomap
![Page 34: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/34.jpg)
34
Shortest Paths on Manifold
18M samples not enough!
![Page 35: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/35.jpg)
35
Summary
• Large-scale nonlinear dimensionality reduction using manifold learning on 18M face images
• Fast approximate SVD based on sampling methods
• Open Questions – Does a manifold really exist or data may form clusters in
low dimensional subspaces? – How much data is really enough?
![Page 36: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/36.jpg)
36
People Hopper
• A fun social application on Orkut
• Face manifold constructed with Orkut database – Extracted 13M faces from about 146M profile images – ~3 days on 50 machines – Color face image (40x48 pixels) 5760-dim vector – Faces normalized to zero mean and unit variance in
intensity space
• Shortest path search using bidirectional Dijkstra
• Users can opt-out – Daily incremental graph update
![Page 37: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/37.jpg)
37
People Hopper Interface
![Page 38: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/38.jpg)
38
From the Blogs
![Page 39: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/39.jpg)
39
CMU-PIE Dataset
• 68 people, 13 poses, 43 illuminations, 4 expressions
• 35,247 faces detected by a face detector
• Classification and clustering on poses
![Page 40: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/40.jpg)
40
Clustering • K-means clustering after transformation (k = 100)
– K fixed to be the same as number of classes
• Two metrics Purity - points within a cluster come from the same class Accuracy - points from a class form a single cluster
Matrix G is not guaranteed to be positive semi-definite in Isomap ! - Nystrom: EVD of W (can ignore negative eigenvalues) - Col-sampling: SVD of C (signs are lost) !
![Page 41: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/41.jpg)
41
Optimal 2D embeddings
![Page 42: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/42.jpg)
42
Laplacian Eigenmaps
Minimize weighted distances between neighbors
• Find t nearest neighbors for each image : O(n2)
• Compute weight matrix W:
• Compute normalized laplacian
• Optimal k reduced dims: Uk
O(n3) Bottom eigenvectors of G
[Belkin & Niyogi, ’01]
where
![Page 43: Large-Scale Face Manifold Learning - Brown Universitycs.brown.edu/about/partners/symposia/ipp43/kumar.slides.pdf · Large-Scale Face Manifold Learning • Construct Web dataset –](https://reader033.fdocuments.us/reader033/viewer/2022041610/5e37473ae961b3075e4dd86f/html5/thumbnails/43.jpg)
43
Different Sampling Procedures