
1

Large-Scale Face Manifold Learning

Sanjiv Kumar

Google Research, New York, NY

* Joint work with A. Talwalkar, H. Rowley and M. Mohri

2

Face Manifold Learning

[Images: 50 × 50 pixel faces vs. 50 × 50 pixel random images]

The space of face images is significantly smaller than 256^2500, the number of possible 50 × 50 images.

Want to recover the underlying (possibly nonlinear) space embedded in ℝ^2500!

(Dimensionality Reduction)

3

Dimensionality Reduction

•  Linear Techniques
   –  PCA, Classical MDS
   –  Assume data lies in a subspace
   –  Directions of maximum variance

•  Nonlinear Techniques
   –  Manifold learning methods
      •  LLE [Roweis & Saul '00]
      •  ISOMAP [Tenenbaum et al. '00]
      •  Laplacian Eigenmaps [Belkin & Niyogi '01]
   –  Assume local linearity of data
   –  Need densely sampled data as input

Bottleneck: Computational Complexity ≈ O(n^3)!

4

Outline

•  Manifold Learning
   –  ISOMAP

•  Approximate Spectral Decomposition
   –  Nystrom and Column-Sampling approximations

•  Large-Scale Manifold Learning
   –  18M face images from the web
   –  Largest study so far: ~270K points

•  People Hopper – A Social Application on Orkut

5

ISOMAP [Tenenbaum et al., '00]

•  Find the low-dimensional representation that best preserves geodesic distances between points

6

ISOMAP [Tenenbaum et al., '00] (contd.)

•  Recovers the true manifold asymptotically!

[Figure: geodesic distances on the manifold and the recovered output co-ordinates]

7

ISOMAP [Tenenbaum et al., '00]

Given n input images (a toy sketch follows the list):

•  Find t nearest neighbors for each image: O(n^2)

•  Find the shortest-path distance Δ_ij for every pair (i, j): O(n^2 log n)

•  Construct the n × n matrix G with entries the centered Δ_ij^2
   –  G ~ 18M × 18M dense matrix

•  Optimal k reduced dims: U_k Σ_k^{1/2}, from the top-k eigenvectors and eigenvalues of G: O(n^3)!
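A minimal sketch of this pipeline in Python (numpy/scipy/scikit-learn). This is a toy, single-machine version for small n, not the distributed implementation from the talk; the function name and the t/k defaults are illustrative:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, t=5, k=2):
    """Toy Isomap on an (n, d) data matrix X; assumes the t-NN graph
    is connected. The O(n^3) eigendecomposition in the last step is
    the bottleneck addressed by the sampling methods that follow."""
    n = X.shape[0]
    # 1. t-nearest-neighbor graph with Euclidean edge weights: O(n^2)
    A = kneighbors_graph(X, n_neighbors=t, mode='distance')
    # 2. Geodesic (shortest-path) distances for all pairs: O(n^2 log n)
    D = shortest_path(A, method='D', directed=False)
    # 3. Center the squared distances (classical MDS): G = -1/2 H D^2 H
    H = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * H @ (D ** 2) @ H
    # 4. Embedding = U_k Sigma_k^{1/2} from the top-k eigenpairs: O(n^3)
    vals, vecs = np.linalg.eigh(G)
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
```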

8

Spectral Decomposition

•  Need the eigen-decomposition of a symmetric positive semi-definite n × n matrix G: O(n^3)

•  For n = 18M, G ≈ 1300 TB
   –  ~100,000 × 12GB-RAM machines

•  Iterative methods [Golub & Van Loan '83] [Gorrell '06]
   –  Jacobi, Arnoldi, Hebbian
   –  Need matrix-vector products and several passes over data
   –  Not suitable for large dense matrices

•  Sampling-based methods [Frieze et al. '98] [Williams & Seeger '00]
   –  Column-Sampling Approximation
   –  Nystrom Approximation
   –  Relationship and comparative performance?

9

Approximate Spectral Decomposition

•  Sample l columns of G randomly without replacement to form the n × l matrix C; W is the l × l block of G at the sampled rows and columns

•  Column-Sampling Approximation – SVD of C [Frieze et al., '98]

•  Nystrom Approximation – SVD of W [Williams & Seeger, '00] [Drineas & Mahoney, '05]

10

Column-Sampling Approximation

•  Approximate the eigenvectors of G by the left singular vectors of C, and the eigenvalues by its scaled singular values (see the sketch below)

•  SVD of C [n × l]: O(nl^2)!
   –  vs. O(l^3) for an SVD of the [l × l] block W
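A small sketch of the column-sampling approximation (numpy; the sqrt(n/l) eigenvalue scaling follows Kumar, Mohri & Talwalkar, ICML '09; the function name is illustrative):

```python
import numpy as np

def column_sampling_decomposition(G, l, seed=0):
    """Approximate the eigendecomposition of a PSD n x n matrix G
    from l uniformly sampled columns: O(nl^2) instead of O(n^3)."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    idx = rng.choice(n, size=l, replace=False)   # sample without replacement
    C = G[:, idx]                                # n x l column matrix
    U, s, _ = np.linalg.svd(C, full_matrices=False)  # O(nl^2)
    approx_vals = np.sqrt(n / l) * s             # scaled singular values
    return U, approx_vals                        # U is orthonormal
```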

13

Nystrom Approximation

•  Eigen-decompose only the l × l block W: O(l^3)!

•  Extrapolate to all n points through C (see the sketch below)
   –  Approximate eigenvalues: (n/l) Σ_W
   –  Approximate eigenvectors: sqrt(l/n) C U_W Σ_W^{-1} – Not Orthonormal!
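A matching sketch of the Nystrom approximation (numpy; the (n/l) and sqrt(l/n) scalings again follow the ICML '09 paper; the function name is illustrative):

```python
import numpy as np

def nystrom_decomposition(G, l, seed=0):
    """Nystrom approximation: eigendecompose only the l x l block W,
    O(l^3), then extrapolate through C. The resulting eigenvector
    estimates are NOT orthonormal."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    idx = rng.choice(n, size=l, replace=False)
    C = G[:, idx]                        # n x l sampled columns
    W = C[idx, :]                        # l x l intersection block
    vals, U_W = np.linalg.eigh(W)        # O(l^3)
    order = np.argsort(vals)[::-1]
    vals, U_W = vals[order], U_W[:, order]
    pos = vals > 1e-12                   # ignore negative/zero eigenvalues
    approx_vals = (n / l) * vals[pos]
    approx_vecs = np.sqrt(l / n) * (C @ U_W[:, pos]) / vals[pos]
    return approx_vecs, approx_vals
```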

16

Nystrom vs. Column-Sampling

•  Experimental Comparison
   –  A random set of 7K face images
   –  Eigenvalues, eigenvectors, and low-rank approximations

[Kumar, Mohri & Talwalkar, ICML ’09]

17

Eigenvalues Comparison

[Plot: % deviation from the exact eigenvalues]

18

Eigenvectors Comparison

[Plot: principal angle with the exact eigenvectors]

19

Low-Rank Approximations

Nystrom gives better reconstruction than Col-Sampling!


22

Orthogonalized Nystrom

Nystrom-orthogonal gives worse reconstruction than Nystrom!

23

Low-Rank Approximations: Matrix Projection


G̃_nys = C ( (l/n) W^{-2} ) C^T G

G̃_col = C (C^T C)^{-1} C^T G

26

Low-Rank Approximations: Matrix Projection

•  Col-Sampling gives better reconstruction than Nystrom! (see the sketch below)
   –  Theoretical guarantees in special cases [Kumar et al., ICML '09]
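A quick sketch (numpy; the helper name is illustrative) that evaluates both matrix-projection reconstructions from the formulas above, using pseudo-inverses for numerical safety:

```python
import numpy as np

def matrix_projection_errors(G, idx):
    """Relative Frobenius reconstruction error of the Nystrom and
    column-sampling matrix projections; idx holds the l sampled columns."""
    n, l = G.shape[0], len(idx)
    C = G[:, idx]
    W = C[idx, :]
    # G_nys = C ((l/n) W^{-2}) C^T G
    W_pinv = np.linalg.pinv(W)
    G_nys = C @ ((l / n) * (W_pinv @ W_pinv)) @ (C.T @ G)
    # G_col = C (C^T C)^{-1} C^T G
    G_col = C @ np.linalg.pinv(C.T @ C) @ (C.T @ G)
    err = lambda A: np.linalg.norm(G - A) / np.linalg.norm(G)
    return err(G_nys), err(G_col)
```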

27

How many columns are needed?

[Plot: columns needed to get 75% relative accuracy]

•  Sampling Methods
   –  Theoretical analysis of the uniform sampling method [Kumar et al., AISTATS '09] [Kumar et al., ICML '09]
   –  Adaptive sampling methods [Deshpande et al., FOCS '06]
   –  Ensemble sampling methods [Kumar et al., NIPS '09]

28

So Far…

•  Manifold Learning
   –  ISOMAP

•  Approximate Spectral Decomposition
   –  Nystrom and Column-Sampling approximations

•  Large-Scale Face Manifold Learning
   –  18M face images from the web

•  People Hopper – A Social Application on Orkut

29

Large-Scale Face Manifold Learning

•  Construct Web dataset [Talwalkar, Kumar & Rowley, CVPR '08]
   –  Extracted 18M faces from 2.5B internet images
   –  ~15 hours on 500 machines
   –  Faces normalized to zero mean and unit variance

•  Graph construction
   –  Exact search: ~3 months (on 500 machines)
   –  Approximate Nearest Neighbor – Spill Trees [Liu et al., '04] (5 NN, ~2 days)
   –  New methods for hashing-based kNN search [CVPR '10] [ICML '10] [ICML '11]
   –  Less than 5 hours!

30

Neighborhood Graph Construction

•  Connect each node (face) with its neighbors

•  Is the graph connected?
   –  Depth-First-Search to find the largest connected component (see the sketch below)
   –  10 minutes on a single machine
   –  Largest component depends on the number of NN (t)
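A connectivity check along these lines (scipy; scipy's connected_components plays the role of the DFS mentioned above):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def largest_component(A):
    """A: sparse adjacency matrix of the t-NN graph. Returns the node
    indices of the largest connected component."""
    _, labels = connected_components(A, directed=False)
    sizes = np.bincount(labels)                  # size of each component
    return np.flatnonzero(labels == sizes.argmax())
```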

31

Samples from connected components

[Images: faces from the largest component vs. faces from smaller components]

32

Graph Manipulation

•  Approximating Geodesics
   –  Shortest paths between pairs of face images
   –  Computing for all pairs is infeasible: O(n^2 log n)!

•  Key Idea: Need only a few columns of G for the sampling-based decomposition (see the sketch below)
   –  Require shortest paths between a few (l) nodes and all other nodes
   –  1 hour on 500 machines (l = 10K)

•  Computing Embeddings (k = 100)
   –  Nystrom: 1.5 hours, 500 machines
   –  Col-Sampling: 6 hours, 500 machines
   –  Projections: 15 mins, 500 machines
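A sketch of the key idea (scipy; a single-machine stand-in for the distributed computation): run Dijkstra only from the l landmark nodes, producing exactly the l columns of the geodesic matrix that the sampling methods need:

```python
from scipy.sparse.csgraph import dijkstra

def sampled_geodesic_columns(A, landmark_idx):
    """A: sparse t-NN graph; landmark_idx: indices of the l sampled
    nodes. Returns the n x l block of shortest-path distances, i.e.
    l columns of the full geodesic matrix."""
    D_l = dijkstra(A, directed=False, indices=landmark_idx)  # l x n
    return D_l.T                                             # n x l
```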

33

18M-Manifold in 2D

[Figure: 2D embedding of the 18M faces via Nystrom Isomap]

34

Shortest Paths on Manifold

18M samples not enough!

35

Summary

•  Large-scale nonlinear dimensionality reduction using manifold learning on 18M face images

•  Fast approximate SVD based on sampling methods

•  Open Questions
   –  Does a manifold really exist, or does the data form clusters in low-dimensional subspaces?
   –  How much data is really enough?

36

People Hopper

•  A fun social application on Orkut

•  Face manifold constructed from the Orkut database
   –  Extracted 13M faces from about 146M profile images
   –  ~3 days on 50 machines
   –  Color face image (40 × 48 pixels) → 5760-dim vector
   –  Faces normalized to zero mean and unit variance in intensity space

•  Shortest path search using bidirectional Dijkstra

•  Users can opt out – daily incremental graph update

37

People Hopper Interface

38

From the Blogs

39

CMU-PIE Dataset

•  68 people, 13 poses, 43 illuminations, 4 expressions

•  35,247 faces detected by a face detector

•  Classification and clustering on poses

40

Clustering

•  K-means clustering after transformation (k = 100)
   –  K fixed to be the same as the number of classes

•  Two metrics (see the sketch below)
   –  Purity: points within a cluster come from the same class
   –  Accuracy: points from a class form a single cluster

•  Matrix G is not guaranteed to be positive semi-definite in Isomap!
   –  Nystrom: EVD of W (can ignore negative eigenvalues)
   –  Col-sampling: SVD of C (signs are lost)!
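A sketch of the purity metric as described above (numpy; assumes integer class labels):

```python
import numpy as np

def purity(cluster_ids, class_ids):
    """Fraction of points whose cluster's majority class matches
    their own class label."""
    total = 0
    for c in np.unique(cluster_ids):
        members = class_ids[cluster_ids == c]
        total += np.bincount(members).max()      # majority-class count
    return total / len(class_ids)
```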

41

Optimal 2D embeddings

42

Laplacian Eigenmaps [Belkin & Niyogi, '01]

Minimize weighted distances between neighbors

•  Find t nearest neighbors for each image: O(n^2)

•  Compute weight matrix W

•  Compute the normalized graph Laplacian G

•  Optimal k reduced dims: U_k, the bottom k eigenvectors of G: O(n^3) (see the sketch below)
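A minimal sketch of these steps (numpy/scipy/scikit-learn). The slide's weight formula did not survive extraction, so a standard heat-kernel weight is assumed here; sigma is an illustrative parameter:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, t=5, k=2, sigma=1.0):
    """Toy Laplacian Eigenmaps: t-NN graph, assumed heat-kernel
    weights, normalized Laplacian, bottom-k nontrivial eigenvectors."""
    # t-NN graph with heat-kernel weights on the edges: O(n^2)
    A = kneighbors_graph(X, n_neighbors=t, mode='distance')
    A.data = np.exp(-A.data ** 2 / sigma ** 2)
    A = 0.5 * (A + A.T)                  # symmetrize the graph
    # Normalized graph Laplacian
    L = laplacian(A, normed=True)
    # Bottom eigenvectors, skipping the trivial constant one: O(n^3)
    vals, vecs = np.linalg.eigh(L.toarray())
    return vecs[:, 1:k + 1]              # U_k
```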

43

Different Sampling Procedures