CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in...
Transcript of CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in...
![Page 1: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/1.jpg)
CS184A/284AAI in Biology and Medicine
Dimension Reduction and Data Visualization
![Page 2: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/2.jpg)
Visualizing Data using t-SNE
![Page 3: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/3.jpg)
Visualization and Dimensionality Reduction
Intuition behind t-SNE
Visualizing representations
![Page 4: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/4.jpg)
Visualization is key to understand data easily
QuestionIs the relation linear?
![Page 5: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/5.jpg)
![Page 6: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/6.jpg)
Dimensionality Reduction is a helpful tool forvisualization● Dimensionality reduction algorithms
○ Map high-dimensional data to a lower dimension○ While preserving structure
● They are used for○ Visualization○ Performance○ Curse of dimensionality
● A ton of algorithms exist● t-SNE is specialised for visualization● ... and has gained a lot of popularity
![Page 7: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/7.jpg)
Dimensionality Reduction techniques solveoptimization problems
Three approaches for Dimensionality Reduction:
● Distance preservation● Topology preservation● Information preservation
t-SNE is distance-based but tends to preserve topology
![Page 8: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/8.jpg)
SNE computes pair-wise similaritiesSNE converts euclidean distances to similarities, that can be interpreted as probabilities.
Hence the name Stochastic Neighbor Embedding...
![Page 9: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/9.jpg)
Pair-wise similarities should stay the same
![Page 10: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/10.jpg)
Kullback-Leiber Divergence measures thefaithfulness with which qj|i models pj|i
![Page 11: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/11.jpg)
Some remaining questions
Why radial basis function (exponential)?
2. Why probabilities?
3. How do you choose i?
![Page 12: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/12.jpg)
Why radial basis function (exponential)?Focus on local geometry.
This is why t-SNE can be interpreted as topology-based
![Page 13: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/13.jpg)
Why probabilities?
Small distance does not mean proximity on manifold.
Probabilities are appropriate to model this uncertainty
![Page 14: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/14.jpg)
How do you choose σi?
![Page 15: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/15.jpg)
The entropy of p increases with σi
Entropy
H(p) = -Σi pi log2 pi
![Page 16: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/16.jpg)
Perplexity, a smooth measure of the # of neighbors.Perplexity
Perp(P) = 2H(P)
![Page 17: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/17.jpg)
![Page 18: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/18.jpg)
The "Crowding problem"
![Page 19: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/19.jpg)
Mismatched Tails can Compensate for MismatchedDimensionalities
![Page 20: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/20.jpg)
Last but not least: Optimization
![Page 21: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/21.jpg)
![Page 22: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/22.jpg)
Visualizing representations
![Page 23: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/23.jpg)
Mapping raw data to distributed representations● Feature engineering is often laborious.● New tendency is to automatically learn adequate features or representations.● Ultimate goal: enable AI to extract useful features from raw sensory data.
● t-SNE can be used to make sense of the learned representations!
TextImagesOther sensory inputs
High dimensional vectors
![Page 24: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/24.jpg)
Using t-SNE to explore a Word embedding
![Page 25: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/25.jpg)
Explore a Wikipedia article embedding.
![Page 26: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/26.jpg)
Exploring game state representations.Google Deepmind plays Atari games.
● A representation is learned with a convolutional neural network
● From 84x84x4 = 28.224 pixel values to 512 neurons.
● Predicts expected score if a certain action is taken.
![Page 27: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/27.jpg)
Exploring game state representations.Google Deepmind plays Atari games.
![Page 28: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/28.jpg)
Using t-SNE to explore image representations.Classifying dogs and cats.
Each data point is an image of a dog or a catred = cats, blue = dogs
![Page 29: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/29.jpg)
Using t-SNE to explore image representations.Classifying dogs and cats.
RepresentationConvolutional net trained for Image Classification (1000 classes)
https://indico.io/blog/visualizing-with-t-sne/
![Page 30: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/30.jpg)
Using t-SNE to explore image representations.Classifying dogs and cats.
RepresentationConvolutional net trained for Image Classification (1000 classes)
https://indico.io/blog/visualizing-with-t-sne/
![Page 31: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/31.jpg)
Conclusion● The t-SNE algorithm reduces dimensionality while preserving local similarity.
● The t-SNE algorithm has been build heuristically.
● t-SNE is commonly used to visualize representations.
![Page 32: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/32.jpg)
AcknowledgementSimon Carbonnelle for slides
![Page 33: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data](https://reader035.fdocuments.us/reader035/viewer/2022081404/5f04c1717e708231d40f8b87/html5/thumbnails/33.jpg)