Auto-Encoders and PCA, a brief psychological background


Description

A psychological background on how we think and store memories, explaining the motivation behind autoencoders, followed by a comparison of the reconstruction error of PCA against that of autoencoders.

Transcript of Auto-Encoders and PCA, a brief psychological background

Page 1: Auto-Encoders and PCA, a brief psychological background

Auto-Encoders and PCA, a brief psychological background: Self-taught Learning

Page 2: Auto-Encoders and PCA, a brief psychological background

• How do humans learn? And why not replicate that?

• How do babies think?


Slide 2 of 77

Page 3: Auto-Encoders and PCA, a brief psychological background

• “We might expect that babies would have really powerful learning mechanisms. And in fact, the baby's brain seems to be the most powerful learning computer on the planet.

• But real computers are actually getting to be a lot better. And there's been a revolution in our understanding of machine learning recently. And it all depends on the ideas of this guy, the Reverend Thomas Bayes, who was a statistician and mathematician in the 18th century."

Alison Gopnik is an American professor of psychology and affiliate professor of philosophy at the University of California, Berkeley.

How do babies think

Slide 3 of 77

Page 4: Auto-Encoders and PCA, a brief psychological background

• “And essentially what Bayes did was to provide a mathematical way using probability theory to characterize, describe, the way that scientists find out about the world.

• So what scientists do is they have a hypothesis that they think might be likely to start with. They go out and test it against the evidence.

• The evidence makes them change that hypothesis. Then they test that new hypothesis and so on and so forth.”

Alison Gopnik is an American professor of psychology and affiliate professor of philosophy at the University of California, Berkeley.

How do babies think

Slide 4 of 77

Page 5: Auto-Encoders and PCA, a brief psychological background

• P(ω | X) ∝ P(X | ω) * P(ω)

• Posterior ∝ Likelihood * Prior

• If this is how our brains work, why not continue in this way?

Bayes’ Theorem

Slide 5 of 77
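A minimal sketch of this update rule (an illustration, not material from the slides): estimating a coin's unknown bias ω by repeatedly multiplying a prior over a few candidate hypotheses by the likelihood of each observation, then renormalizing. All numbers here are made up.

```python
# Bayesian updating sketch: Posterior ∝ Likelihood * Prior.
import numpy as np

omegas = np.array([0.2, 0.5, 0.8])      # candidate hypotheses for P(heads)
posterior = np.array([1/3, 1/3, 1/3])   # prior P(omega): no initial preference

for x in [1, 1, 0, 1]:                  # observed evidence: heads=1, tails=0
    likelihood = np.where(x == 1, omegas, 1 - omegas)  # P(x | omega)
    posterior = likelihood * posterior                 # proportional update
    posterior /= posterior.sum()                       # normalize to sum to 1

print(dict(zip(omegas.tolist(), posterior.round(3))))  # belief shifts toward 0.8
```

Each observation plays the role of the scientist's experiment: the hypothesis that best explains the evidence gains probability, which is exactly the loop Gopnik describes.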

Page 6: Auto-Encoders and PCA, a brief psychological background

• P(ω | X) ∝ P(X | ω) * P(ω)

• To build the likelihood, we need tons of data (the Law of Large Numbers).

• Not just any data: labeled data!

• We need to solve for features.

• How should we decide which features to use?

Bayes’ Theorem – Issues

Slide 6 of 77


Page 11: Auto-Encoders and PCA, a brief psychological background

Vision Example

Slide 11 of 77

Page 12: Auto-Encoders and PCA, a brief psychological background

Vision Example

Slide 12 of 77

Page 13: Auto-Encoders and PCA, a brief psychological background

Vision Example

Slide 13 of 77

Page 14: Auto-Encoders and PCA, a brief psychological background

Vision Example

Slide 14 of 77

Page 15: Auto-Encoders and PCA, a brief psychological background

Vision Example

Slide 15 of 77

Page 16: Auto-Encoders and PCA, a brief psychological background

Feature Representation – Vision

Slide 16 of 77

Page 17: Auto-Encoders and PCA, a brief psychological background

Feature Representation – Audio

Slide 17 of 77

Page 18: Auto-Encoders and PCA, a brief psychological background

Feature Representation – NLP

Slide 18 of 77

Page 19: Auto-Encoders and PCA, a brief psychological background

The “One Learning Algorithm” Hypothesis

Slide 19 of 77

Page 20: Auto-Encoders and PCA, a brief psychological background

The “One Learning Algorithm” Hypothesis

Slide 20 of 77

Page 21: Auto-Encoders and PCA, a brief psychological background

The “One Learning Algorithm” Hypothesis

Slide 21 of 77

Page 22: Auto-Encoders and PCA, a brief psychological background

On Computer Perception

• The adult visual system computes an incredibly complicated function of the input.

• We can try to implement most of this incredibly complicated function (hand-engineer features).

• OR, we can learn this function instead.

Slide 22 of 77


Page 25: Auto-Encoders and PCA, a brief psychological background

Self-taught Learning

Slide 23 of 77


Page 27: Auto-Encoders and PCA, a brief psychological background

First Stage of Visual Processing – V1

Slide 24 of 77

Page 28: Auto-Encoders and PCA, a brief psychological background

Feature Learning via Sparse Coding

• Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection).

• Input: images X(1), X(2), …, X(m) (each in R^{n×n}).

• Learn: a dictionary of bases Φ1, Φ2, …, Φk (also in R^{n×n}), so that each input X can be approximately decomposed as:

• X ≈ Σ_{j=1}^{k} a_j Φ_j, s.t. the coefficients a_j are mostly zero ("sparse").

Slide 25 of 77
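A concrete illustration of this objective (a sketch assuming scikit-learn's DictionaryLearning as the solver; the presentation does not specify an implementation, and the patches here are random stand-ins):

```python
# Learn k bases Phi_j so that each flattened n*n patch X is approximated by a
# mostly-zero combination sum_j a_j * Phi_j.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n, m, k = 8, 500, 64                  # patch side, number of patches, bases
X = rng.standard_normal((m, n * n))   # stand-in for real image patches

learner = DictionaryLearning(
    n_components=k,                   # the k bases Phi_1..Phi_k
    alpha=1.0,                        # weight of the sparsity penalty on a_j
    transform_algorithm="lasso_lars", # sparse solve for the coefficients
    max_iter=100,
    random_state=0,
)
A = learner.fit_transform(X)          # coefficients a_j (m x k), mostly zeros
Phi = learner.components_             # dictionary (k x n*n), one basis per row

X_hat = A @ Phi                       # X ≈ sum_j a_j * Phi_j for each patch
print("fraction of nonzero a_j:", np.mean(A != 0))
print("mean reconstruction error:", np.mean((X - X_hat) ** 2))
```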

Page 33: Auto-Encoders and PCA, a brief psychological background

Feature Learning via Sparse Coding

Slide 26 of 77

Page 34: Auto-Encoders and PCA, a brief psychological background

Feature Learning via Sparse Coding

Slide 27 of 77

Page 35: Auto-Encoders and PCA, a brief psychological background

Sparse Coding applied to Audio

Slide 28 of 77

Page 36: Auto-Encoders and PCA, a brief psychological background

Learning Features Hierarchy

Slide 29 of 77

Page 37: Auto-Encoders and PCA, a brief psychological background

Learning Features Hierarchy

Slide 30 of 77

Page 38: Auto-Encoders and PCA, a brief psychological background

Features Hierarchy: Trained on face images

Slide 31 of 77

Page 39: Auto-Encoders and PCA, a brief psychological background

Features Hierarchy: Trained on diff. categories

Slide 32 of 77

Page 40: Auto-Encoders and PCA, a brief psychological background

Applications in Machine learning

Slide 33 of 77

Page 41: Auto-Encoders and PCA, a brief psychological background

Phoneme Classification (TIMIT benchmark)

Slide 34 of 77

Page 42: Auto-Encoders and PCA, a brief psychological background

State-of-the-art

Slide 35 of 77

Page 43: Auto-Encoders and PCA, a brief psychological background

Brain Operation Modes

Slide 36 of 77

Page 44: Auto-Encoders and PCA, a brief psychological background

Brain Operation Modes

Slide 37 of 77

• Professor Daniel Kahneman, the hero of psychology.

• Won the Nobel Prize in Economics in 2002.

• He now teaches psychology at Princeton.

Page 45: Auto-Encoders and PCA, a brief psychological background

Brain Operation Modes

Slide 38 of 77

• What do you see?

• Angry Girl.

Page 46: Auto-Encoders and PCA, a brief psychological background

Brain Operation Modes

Slide 39 of 77

• Now, What do you see?

• Needs effort.

Page 47: Auto-Encoders and PCA, a brief psychological background

Slide 40 of 77

System One vs. System Two

Page 48: Auto-Encoders and PCA, a brief psychological background

System One

Slide 41 of 77

• It’s Automatic

• Perceiving things + Skills = Answer

• It is an intuitive process.

• Intuition is Recognition

Page 49: Auto-Encoders and PCA, a brief psychological background

System One: Memory

Slide 42 of 77

Page 50: Auto-Encoders and PCA, a brief psychological background

System One: Memory

Slide 43 of 77

• By the age of three we have all learned that "big things can't go inside small things".

• All of us have tried to save a favorite movie on the computer, and we know that those two hours require gobs of space.

Page 52: Auto-Encoders and PCA, a brief psychological background

System One: Memory

Slide 44 of 77

Page 53: Auto-Encoders and PCA, a brief psychological background

System One: Memory

Slide 45 of 77

• How do we cram the vast universe of our experience into a relatively small storage compartment between our ears?

• We cheat!

• We compress memories into a critical thread and key features.

• Ex: "Dinner was disappointing", "Tough steak".

• Later, when we want to remember our experience, our brains reweave, rather than retrieve, the scenes using the extracted features.

Page 56: Auto-Encoders and PCA, a brief psychological background

System One: Memory

Slide 46 of 77

Daniel Todd Gilbert is Professor of Psychology at Harvard University.

In this experiment, two groups of people sat down to watch a set of slides: the question group and the no-question group. The slides showed two cars approaching a yield sign; one car turns right, and then the two cars collide.

Page 59: Auto-Encoders and PCA, a brief psychological background

System One: Memory

Slide 47 of 77

• The no-question group wasn't asked any questions.

• The question group was asked the following question:

• Did another car pass by the blue car while it was stopped at the stop sign?

• Both groups were then asked to pick which set of slides they saw: the one with the yield sign or the one with the stop sign.


Page 61: Auto-Encoders and PCA, a brief psychological background

System One: Memory

Slide 47 of 77

• 90% of the no question group chose the yield sign

• 80% of the question group chose the stop sign

• The general finding: our brains compress experiences into key features and fill in details that were not actually stored. This is the basic idea behind autoencoders.

Page 62: Auto-Encoders and PCA, a brief psychological background

Sparse Auto-encoders

Slide 48 of 77

Page 63: Auto-Encoders and PCA, a brief psychological background

• An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation on a set of unlabeled training examples {x(1), x(2), x(3), …}, where x(i) ∈ R^n, by setting the target values to be equal to the inputs. [6]

• i.e. it uses y(i) = x(i)

• Original contributions to backpropagation were made by Hinton and colleagues in the 1980s, and more recently by Hinton, Salakhutdinov, Bengio, LeCun and Erhan (2006-2010).

Sparse Auto-encoder

Slide 49 of 77
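A minimal NumPy sketch of that definition (my illustration, not the presentation's code): a single-hidden-layer network trained by backpropagation with targets y(i) = x(i), so it learns to reconstruct its own input.

```python
# Tiny autoencoder: encode with (W1, b1), decode with (W2, b2), target y = x.
import numpy as np

rng = np.random.default_rng(0)
n, m, hidden = 64, 1000, 25           # input dim, examples, hidden units
X = rng.random((m, n))                # stand-in for unlabeled data in [0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0, 0.1, (n, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.1, (hidden, n)); b2 = np.zeros(n)

lr = 0.5
for step in range(2000):
    A = sigmoid(X @ W1 + b1)          # hidden code (the compressed "memory")
    X_hat = sigmoid(A @ W2 + b2)      # reconstruction ("reweaving" the input)
    # Backpropagate the squared reconstruction error (targets y = x).
    d_out = (X_hat - X) * X_hat * (1 - X_hat)
    d_hid = (d_out @ W2.T) * A * (1 - A)
    W2 -= lr * (A.T @ d_out) / m; b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_hid) / m; b1 -= lr * d_hid.mean(axis=0)

X_hat = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("mean reconstruction error:", np.mean((X - X_hat) ** 2))
```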

Page 64: Auto-Encoders and PCA, a brief psychological background

• Before we get further into the details of the algorithm, we need to quickly go through neural networks.

• To describe neural networks, we will begin by describing the simplest possible neural network. One that comprises a single "neuron." We will use the following diagram to denote a single neuron [5]

Neural Network

Single Neuron [8]

Slide 50 of 77

Page 65: Auto-Encoders and PCA, a brief psychological background

• This "neuron" is a computational unit that takes as input x1,x2,x3 (and a +1 intercept term), and outputs

• ℎ𝑊,𝑏 𝑋 = 𝑓 𝑊𝑇𝑥 = 𝑓( 𝑊𝑖𝑥𝑖 + 𝑏)

3𝑖=1 where 𝑓:ℜ → ℜ is called the activation function.

[5]

Neural Network

Slide 51 of 77

Page 66: Auto-Encoders and PCA, a brief psychological background

• The activation function can be: [8]

1) Sigmoid function: f(z) = 1 / (1 + exp(−z)), output in [0, 1].

Sigmoid Activation Function

Sigmoid Function [8]

Slide 52 of 77

Page 67: Auto-Encoders and PCA, a brief psychological background

• 2) Tanh function: f(z) = tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)), output in [−1, 1].

Tanh Activation Function

Tanh Function [8]

Slide 53 of 77
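Putting the last three slides together, a small sketch (assumed, not from the slides) of the single neuron with either activation function:

```python
# h_{W,b}(x) = f(sum_i W_i * x_i + b) for a three-input neuron.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in [0, 1]

def tanh(z):
    return np.tanh(z)                 # (e^z - e^-z)/(e^z + e^-z), output in [-1, 1]

def neuron(x, W, b, f):
    return f(W @ x + b)               # weighted sum plus intercept, then f

x = np.array([0.5, -1.0, 2.0])        # inputs x1, x2, x3
W = np.array([0.1, 0.4, -0.2])        # one weight per input
b = 0.3                               # weight on the +1 intercept term

print(neuron(x, W, b, sigmoid))       # ≈ 0.389
print(neuron(x, W, b, tanh))          # ≈ -0.422
```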

Page 68: Auto-Encoders and PCA, a brief psychological background

• Neural network parameters are:

• (W, b) = (W(1), b(1), W(2), b(2)), where we write W_ij^(l) to denote the parameter (or weight) associated with the connection between unit j in layer l and unit i in layer l + 1.

• b_i^(l) is the bias associated with unit i in layer l + 1.

• a_i^(l) denotes the activation (meaning output value) of unit i in layer l.

• Given a fixed setting of the parameters W, b, our neural network defines a hypothesis hW,b(x) that outputs a real number.

Neural Network Model

Slide 54 of 77
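In this notation, computing the hypothesis h_{W,b}(x) is a layer-by-layer forward pass; a minimal sketch (sizes chosen arbitrarily for illustration):

```python
# Forward pass: a^(2) from (W(1), b(1)), then a^(3) = h_{W,b}(x) from (W(2), b(2)).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    a2 = sigmoid(W1 @ x + b1)         # activations of the hidden layer (layer 2)
    a3 = sigmoid(W2 @ a2 + b2)        # output activation: the hypothesis h_{W,b}(x)
    return a3

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 4 hidden units -> 1 output

print(forward(np.array([1.0, 0.5, -0.5]), W1, b1, W2, b2))  # a real number
```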

Page 69: Auto-Encoders and PCA, a brief psychological background

Cost Function

Slide 55 of 77

Page 70: Auto-Encoders and PCA, a brief psychological background

• The autoencoder tries to learn a function h_{W,b}(x) ≈ x. In other words, it is trying to learn an approximation to the identity function, so that the output x̂ is similar to x.

• Placing constraints on the network, such as limiting the number of hidden units or imposing a sparsity constraint on the hidden units, leads it to discover interesting structure in the data, even if the number of hidden units is large.

Auto-encoders and Sparsity

Slide 56 of 77

Page 71: Auto-Encoders and PCA, a brief psychological background

• Assumptions:

1. We want the neurons to be inactive most of the time (a neuron is "active" (or "firing") if its output value is close to 1, and "inactive" if its output value is close to 0), and the activation function is the sigmoid function.

2. Recall that a_j^(2) denotes the activation of hidden unit j in layer 2 of the autoencoder.

3. a_j^(2)(x) denotes the activation of this hidden unit when the network is given a specific input x.

4. Let ρ̂_j = (1/m) Σ_{i=1}^{m} [a_j^(2)(x(i))] be the average activation of unit j (averaged over the training set).

• Objective:

• We would like to (approximately) enforce the constraint ρ̂_j = ρ, where ρ is a sparsity parameter, a small value close to zero.

Auto-encoders and Sparsity Algorithm

Slide 57 of 77

Page 72: Auto-Encoders and PCA, a brief psychological background

• To achieve this, we add an extra penalty term to our optimization objective that penalizes ρ̂_j deviating significantly from ρ.

• Σ_{j=1}^{s2} [ ρ log(ρ / ρ̂_j) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_j)) ], where s2 is the number of neurons in the hidden layer, and the index j sums over the hidden units in the network. [6]

• It can also be written Σ_{j=1}^{s2} KL(ρ || ρ̂_j), where KL(ρ || ρ̂_j) = ρ log(ρ / ρ̂_j) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_j)) is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean ρ and a Bernoulli random variable with mean ρ̂_j. [6]

• KL divergence is a standard function for measuring how different two distributions are.

Autoencoders and Sparsity Algorithm

Slide 58 of 77
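A short sketch of this penalty (illustrative only; the activation matrix below is random stand-in data rather than real hidden activations):

```python
# Summed Bernoulli KL divergence sum_j KL(rho || rho_hat_j) over hidden units.
import numpy as np

def sparsity_penalty(A2, rho=0.2):
    """A2: (m, s2) matrix of hidden activations a_j^(2)(x(i)) over m examples."""
    rho_hat = A2.mean(axis=0)         # average activation rho_hat_j of each unit
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return kl.sum()                   # sum over the s2 hidden units

rng = np.random.default_rng(0)
A2 = rng.random((1000, 25))           # stand-in sigmoid outputs in (0, 1)
print(sparsity_penalty(A2, rho=0.2))  # > 0 when units deviate from rho
print(sparsity_penalty(np.full((1000, 25), 0.2)))  # exactly 0 at rho_hat_j == rho
```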

Page 73: Auto-Encoders and PCA, a brief psychological background

• The KL penalty function has the following property: KL(ρ || ρ̂_j) = 0 if ρ̂_j = ρ, and otherwise it increases monotonically as ρ̂_j diverges from ρ.

• For example, if we plot KL(ρ || ρ̂_j) for a range of values of ρ̂_j (with ρ = 0.2), we see that the KL divergence reaches its minimum of 0 at ρ̂_j = ρ and approaches ∞ as ρ̂_j approaches 0 or 1.

• Thus, minimizing this penalty term has the effect of causing ρ̂_j to be close to ρ.

Auto-encoders and Sparsity Algorithm –cont’d

KL Function

Slide 59 of 77

Page 74: Auto-Encoders and PCA, a brief psychological background

Sparse Auto-encoders Cost Function to minimize

Slide 60 of 77

Page 75: Auto-Encoders and PCA, a brief psychological background

Gradient Checking

Slide 61 of 77

Page 76: Auto-Encoders and PCA, a brief psychological background

• We implemented a sparse autoencoder, trained on 8×8 image patches using the L-BFGS optimization algorithm.

Auto-encoder Implementation

A random sample of 200 patches from the dataset.

Slide 62 of 77
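A hedged sketch of this setup with SciPy's L-BFGS-B optimizer (the slides name L-BFGS but show no code, so everything below, including the plain reconstruction cost, is an assumption; the KL penalty from Slide 58 would be added to the cost in the full version):

```python
# Train an autoencoder by handing a flattened parameter vector to L-BFGS.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m, hidden = 64, 500, 25            # 8x8 patches flattened to 64 dims
X = rng.random((m, n))                # stand-in for real image patches

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta):
    i = 0
    W1 = theta[i:i + n * hidden].reshape(n, hidden); i += n * hidden
    b1 = theta[i:i + hidden]; i += hidden
    W2 = theta[i:i + hidden * n].reshape(hidden, n); i += hidden * n
    b2 = theta[i:i + n]
    return W1, b1, W2, b2

def cost_and_grad(theta):
    W1, b1, W2, b2 = unpack(theta)
    A = sigmoid(X @ W1 + b1)           # hidden activations
    X_hat = sigmoid(A @ W2 + b2)       # reconstruction
    diff = X_hat - X
    cost = 0.5 * np.mean(np.sum(diff ** 2, axis=1))
    # Backpropagated gradients of the reconstruction cost.
    d_out = diff * X_hat * (1 - X_hat) / m
    d_hid = (d_out @ W2.T) * A * (1 - A)
    grad = np.concatenate([(X.T @ d_hid).ravel(), d_hid.sum(axis=0),
                           (A.T @ d_out).ravel(), d_out.sum(axis=0)])
    return cost, grad

theta0 = rng.normal(0, 0.1, 2 * n * hidden + hidden + n)
res = minimize(cost_and_grad, theta0, jac=True, method="L-BFGS-B",
               options={"maxiter": 400})
print("final reconstruction cost:", res.fun)
```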

Page 77: Auto-Encoders and PCA, a brief psychological background

Auto-encoder Implementation

Slide 63 of 77

• We have trained it using digits from 0 to 9

Page 78: Auto-Encoders and PCA, a brief psychological background

AutoEncoder Visualization

Slide 64 of 77

Page 79: Auto-Encoders and PCA, a brief psychological background

Auto-encoder Implementation

Slide 65 of 77

• We have trained it with faces.

Page 80: Auto-Encoders and PCA, a brief psychological background

Auto-encoder with PCA flavor

Slide 66 of 77

[Figure: eigenvectors vs. percentage of variance retained]
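The quantity on the figure's axes, the percentage of variance retained by the top eigenvectors, can be computed directly; a sketch on stand-in data (not the presentation's dataset):

```python
# Percentage of variance retained when keeping the top-k principal components.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 64))             # stand-in data, one example per row
Xc = X - X.mean(axis=0)               # center before PCA

# Singular values of the centered data give the variance along each eigenvector.
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
retained = np.cumsum(s ** 2) / np.sum(s ** 2)

for k in (1, 10, 32, 64):
    print(f"top {k:2d} eigenvectors retain {100 * retained[k - 1]:.1f}% of the variance")
```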

Page 81: Auto-Encoders and PCA, a brief psychological background

Autoencoder Implementation

Slide 67 of 77

[Figure: panels labeled 50, 100, 150, 200, 300, 350]

Page 82: Auto-Encoders and PCA, a brief psychological background

Auto-encoder Performance

Slide 68 of 77

Page 83: Auto-Encoders and PCA, a brief psychological background

In Progress Work (Future Results)

• Given the small dataset available for facial features,

• we train the neural network on a random dataset, in the hope that the resulting weights form a good starting point for the tuning phase of the neural network,

• and we then fine-tune with the smaller dataset of facial features.

Slide 69 of 77

Page 84: Auto-Encoders and PCA, a brief psychological background

Wrap up

Slide 70 of 77

Page 85: Auto-Encoders and PCA, a brief psychological background

Slide 71 of 77

[Andrew Ng]

Page 86: Auto-Encoders and PCA, a brief psychological background

Data - Now

• Twitter: 7 terabytes of data / day

• Facebook: 500 terabytes of data / day

Slide 72 of 77


Page 89: Auto-Encoders and PCA, a brief psychological background

• The Square Kilometre Array telescope has been announced.

• It will generate 700 terabytes of data every second.

• In two days, it will generate data of the same size as today's entire internet.

• Do you know how long it would take Google, with all its resources, just to index the data this beast generates in a year? Three whole months, 90 days!

Data – Tomorrow

Slide 73 of 77


Page 92: Auto-Encoders and PCA, a brief psychological background

Slide 74 of 77

[Andrew Ng]

Page 93: Auto-Encoders and PCA, a brief psychological background

Thanks! Q?

Slide 75 of 77