Transcript of "Manifold learning" by Jan Kamenický
Page 1: Manifold learning
Jan Kamenický
Page 2: Nonlinear dimensionality reduction

Many features ⇒ many dimensions

Dimensionality reduction
◦ Feature extraction (useful representation)
◦ Classification
◦ Visualization
Page 3: Manifold learning

What manifold?
◦ Low-dimensional embedding of high-dimensional data lying on a smooth nonlinear manifold

Linear methods fail
◦ e.g. PCA
Page 4: Manifold learning

Unsupervised methods
◦ Without any a priori knowledge

ISOMAP
◦ Isometric mapping

LLE
◦ Locally linear embedding
Page 5: ISOMAP

Core idea
◦ Use geodesic distances on the manifold instead of Euclidean distances

Classical MDS
◦ Maps the data to the lower-dimensional space
Page 6: Estimating geodesic distances

Select neighbours
◦ k-nearest neighbours
◦ ε-distance neighbourhood

Create a weighted neighbourhood graph
◦ Weights = Euclidean distances

Estimate the geodesic distances as shortest paths in the weighted graph
◦ Dijkstra's algorithm
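The graph construction above can be sketched in a few lines of NumPy (an illustrative sketch; the function name and the convention of using `np.inf` for missing edges are mine, not the slides'):

```python
import numpy as np

def knn_graph(X, k):
    """Weighted k-nearest-neighbour graph with Euclidean edge weights.
    Returns an (N, N) matrix; np.inf marks "no edge"."""
    N = X.shape[0]
    # Pairwise Euclidean distances between all points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.full((N, N), np.inf)
    for i in range(N):
        # k nearest neighbours of point i (index 0 in argsort is i itself).
        nn = np.argsort(D[i])[1:k + 1]
        G[i, nn] = D[i, nn]
        G[nn, i] = D[i, nn]  # symmetrize so the graph is undirected
    return G
```

The ε-neighbourhood variant would simply keep every edge with `D[i, j] < eps` instead of the k smallest per row.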
![Page 7: Jan Kamenický. Many features ⇒ many dimensions Dimensionality reduction ◦ Feature extraction (useful representation) ◦ Classification ◦ Visualization.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649f505503460f94c72a75/html5/thumbnails/7.jpg)
Dijkstra’s algorithm 1) Set distances (0 for initial, ∞ for all other nodes),
set all nodes as unvisited 2) Select unvisited node with smallest distance as
active 3) Update all unvisited neighbours of the active
node (if the computed distance is smaller) 4) Mark active node as visited (it has now minimal
distance), repeat from 2) as necessary
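The four steps above map directly onto a heap-based implementation (a sketch using Python's `heapq`; the adjacency-list format is an assumption of mine):

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from `source` in a graph given as an
    adjacency list {node: [(neighbour, weight), ...]}."""
    # 1) 0 for the initial node, infinity for all others; nothing visited.
    dist = {v: float('inf') for v in adj}
    dist[source] = 0.0
    visited = set()
    heap = [(0.0, source)]
    while heap:
        # 2) take the unvisited node with the smallest tentative distance
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        # 4) its distance is now final
        visited.add(u)
        # 3) relax all edges out of the active node
        for v, w in adj[u]:
            if v not in visited and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```

A binary heap has no cheap decrease-key, so outdated entries are simply left in the heap and skipped when popped.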
Page 8: Dijkstra's algorithm

Time complexity
◦ O(|E|) decrease-key operations + O(|V|) extract-min operations on the priority queue

Implementation
◦ Sparse edges
◦ Fibonacci heap as the priority queue
◦ O(|E| + |V| log |V|)

Geodesic distances in ISOMAP
◦ O(N² log N)
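The O(N² log N) figure comes from running Dijkstra once per data point over the sparse k-NN graph. A self-contained sketch on the matrix representation (names and `np.inf` convention are mine):

```python
import heapq
import numpy as np

def geodesic_distances(G):
    """All-pairs shortest paths on a weighted graph matrix G
    (np.inf = no edge): one Dijkstra run per source node."""
    N = G.shape[0]
    D = np.full((N, N), np.inf)
    for s in range(N):
        D[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > D[s, u]:
                continue  # stale heap entry, already improved
            for v in np.where(np.isfinite(G[u]))[0]:
                nd = d + G[u, v]
                if nd < D[s, v]:
                    D[s, v] = nd
                    heapq.heappush(heap, (nd, v))
    return D
```

With k edges per node each run costs O(kN + N log N), giving O(N² log N) over all N sources.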
![Page 9: Jan Kamenický. Many features ⇒ many dimensions Dimensionality reduction ◦ Feature extraction (useful representation) ◦ Classification ◦ Visualization.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649f505503460f94c72a75/html5/thumbnails/9.jpg)
Input◦ Dissimilarities (distances)
Output◦ Data in a low-dimensional embedding, with
distances corresponding to the dissimilarities
Many types of MDS◦ Classical◦ Metric / non-metric (number of dissimilarity
matrices, symmetry, etc.)
Multidimensional scaling (MDS)
Page 10: Classical MDS

◦ Quantitative similarity
◦ Euclidean distances (output)
◦ One distance matrix (symmetric)

Minimizing the stress function
Page 11: Classical MDS

We can optimize directly
◦ Compute the double-centered distance matrix B = −½ J D⁽²⁾ J, where D⁽²⁾ holds the squared distances and J = I − (1/N)𝟙𝟙ᵀ is the centering matrix
◦ Note: for Euclidean distances, B = X_c X_cᵀ, the Gram matrix of the centered data
◦ Perform the SVD (eigendecomposition) of B: B = V Λ Vᵀ
◦ Compute the final data: Y = V_q Λ_q^{1/2}, using the q largest eigenvalues and their eigenvectors
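The steps above fit in a short NumPy function (a sketch; since B is symmetric, `eigh` is used in place of a general SVD, and the function name is mine):

```python
import numpy as np

def classical_mds(D, q):
    """Classical MDS: double-center the squared distance matrix,
    eigendecompose, and scale the top eigenvectors."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered matrix
    vals, vecs = np.linalg.eigh(B)           # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:q]         # q largest eigenvalues
    L = np.sqrt(np.maximum(vals[idx], 0.0))  # clip tiny negatives
    return vecs[:, idx] * L                  # final coordinates Y
```

On exact Euclidean input distances this recovers a configuration whose pairwise distances match the input (up to rotation, translation, and reflection).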
Page 12: MDS and PCA correspondence

◦ Covariance matrix S = (1/N) X_cᵀ X_c
◦ The MDS output is the projection of the centered X onto the eigenvectors of N·S, i.e. the result of the PCA of X
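The correspondence can be checked numerically: classical MDS on the Euclidean distance matrix of X reproduces the PCA projection of X, up to a sign per axis (the data and variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
N = len(X)
Xc = X - X.mean(axis=0)                   # centered data

# PCA: project onto top eigenvectors of the covariance
# (eigenvectors of N*S are the same as those of S).
S = Xc.T @ Xc / N
w, V = np.linalg.eigh(S)                  # ascending eigenvalues
P = Xc @ V[:, ::-1][:, :2]                # top-2 principal components

# Classical MDS on the Euclidean distance matrix of the same data.
D = np.linalg.norm(Xc[:, None] - Xc[None, :], axis=-1)
J = np.eye(N) - np.ones((N, N)) / N
B = -0.5 * J @ D**2 @ J
vals, vecs = np.linalg.eigh(B)
Y = vecs[:, ::-1][:, :2] * np.sqrt(vals[::-1][:2])

# The two embeddings agree up to a sign flip per axis.
same = np.allclose(np.abs(P), np.abs(Y), atol=1e-6)
```

This works because B = X_c X_cᵀ shares its nonzero eigenvalues with X_cᵀX_c = N·S, and the scaled eigenvectors of B are exactly the principal-component scores.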
Page 13: ISOMAP
Page 14: ISOMAP
Page 15: ISOMAP

How many dimensions to use?
◦ Residual variance

Short-circuiting
◦ Too large a neighbourhood (not enough data)
◦ Non-isometric mapping
◦ Totally destroys the final embedding
Page 16: ISOMAP modifications

Conformal ISOMAP
◦ Modified weights in the geodesic distance estimate: w(i,j) = d(i,j) / √(M(i)·M(j)), where M(i) is the mean distance from point i to its neighbours
◦ Magnifies regions with high density
◦ Shrinks regions with low density
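Assuming the de Silva/Tenenbaum rescaling w(i,j) = d(i,j)/√(M(i)·M(j)), the reweighting is a small transformation of the neighbourhood graph (a sketch; the function name and `np.inf` edge convention are mine):

```python
import numpy as np

def c_isomap_weights(G):
    """Rescale Euclidean edge weights for C-ISOMAP: divide each edge
    d(i,j) by sqrt(M(i)*M(j)), M(i) = mean distance from i to its
    neighbours.  G uses np.inf for missing edges."""
    finite = np.isfinite(G)
    # Mean distance from each point to its graph neighbours.
    M = np.array([G[i, finite[i]].mean() for i in range(len(G))])
    W = np.full_like(G, np.inf)
    idx = np.where(finite)
    W[idx] = G[idx] / np.sqrt(M[idx[0]] * M[idx[1]])
    return W
```

Dense regions have small M(i), so their edges grow (the region is magnified); sparse regions have large M(i), so their edges shrink.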
Page 17: C-ISOMAP
Page 18: ISOMAP modifications

Landmark ISOMAP
◦ Use only the geodesic distances from several landmark points (on the manifold)
◦ Use Landmark-MDS to find the embedding (involves triangulation of the non-landmark data)
◦ Significantly faster, but a higher chance of short-circuiting; the number of landmarks has to be chosen carefully
Page 19: ISOMAP modifications

Kernel ISOMAP
◦ Ensures that B (the double-centered distance matrix) is positive semidefinite, using the constant-shifting method
Page 20: Locally linear embedding

Core idea
◦ Estimate each point as a linear combination of its neighbours, and find the best such weights
◦ The same linear representation will hold in the low-dimensional space
Page 21: LLE

Find the weights W_ij by constrained minimization
◦ Minimize Σ_i |x_i − Σ_j W_ij x_j|², subject to Σ_j W_ij = 1, with W_ij = 0 for non-neighbours

Neighbourhood-preserving mapping
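The constrained minimization decouples into one small linear solve per point via its local Gram matrix (an illustrative sketch; the function name and the regularization constant are my choices, though some regularization is standard practice for stability):

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Reconstruction weights W: each x_i as an affine combination of
    its k nearest neighbours, minimizing |x_i - sum_j W_ij x_j|^2
    subject to sum_j W_ij = 1."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(D[i])[1:k + 1]            # k nearest neighbours
        Z = X[nn] - X[i]                          # neighbours shifted to x_i
        C = Z @ Z.T                               # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)        # regularize (stability)
        w = np.linalg.solve(C, np.ones(k))        # solve C w = 1
        W[i, nn] = w / w.sum()                    # enforce sum-to-one
    return W
```

The sum-to-one constraint makes the weights invariant to translations of the data, which is what lets the same W be reused in the low-dimensional space.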
Page 22: LLE

Low-dimensional representation Y
◦ Minimize Φ(Y) = Σ_i |y_i − Σ_j W_ij y_j|² = tr(Yᵀ M Y), where M = (I − W)ᵀ(I − W)

We take the eigenvectors of M corresponding to its q+1 smallest eigenvalues, discarding the bottom (constant) one

In practice, different algebra is used to improve numerical stability and speed
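The eigenvector step above can be sketched with a dense solver (illustrative only; as the slide notes, real implementations use sparse solvers for stability and speed, and the function name is mine):

```python
import numpy as np

def lle_embedding(W, q):
    """Embedding Y minimizing tr(Y^T M Y) with M = (I - W)^T (I - W):
    the eigenvectors for the q+1 smallest eigenvalues of M, with the
    bottom (constant) eigenvector discarded."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)    # ascending eigenvalues
    return vecs[:, 1:q + 1]           # drop the constant eigenvector
```

The discarded eigenvector is constant with eigenvalue 0 because the rows of W sum to one, so it carries no coordinate information.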
Page 23: LLE
Page 24: LLE
Page 25: ISOMAP vs LLE

ISOMAP
◦ Preserves global geometric properties (geodesic distances), especially for faraway points

LLE
◦ Preserves local neighbourhood correspondence only
◦ Overcomes non-isometric mappings
◦ The manifold is not explicitly required
◦ Difficult to estimate q (the number of dimensions)
Page 26: The end