An introduction to Deep Learning

Post on 11-Apr-2017


Who am I?

• David Rostcheck

• I am a data science consultant

• Follow my articles on LinkedIn


in some tests, Deep Learning has already shown abilities at the same level as humans

These include:

• computers that understand natural language

• autonomous vehicles

• programs that can identify what is occurring in a video

It’s notable that

these solutions to diverse problems

in very different fields

use the same powerful technology


a neural net is a mathematical abstraction of the brain

in the real brain,

the neurons send signals with frequencies,

not discrete signals

tools exist that try to simulate the brain in a way that’s

more faithful

to the real brain

Example: Numenta NuPIC, an implementation of Hierarchical Temporal Memory (HTM)

but the techniques of neural nets

are sufficient

to deliver results

similar to or better than humans

in specific cognitive tests


Deep Learning

what is it?

common point of view:

a neural net with distinct levels

is correct, but…

there is another point of view,

maybe more useful,

that we are going to present here

it comes from Vincent Vanhoucke, Principal Research Scientist at Google.

the following comes from

his course on Deep Learning, on Udacity

He thinks about Deep Learning as

a framework for calculating

linear and almost linear

equations in an efficient way

to develop this framework,

we are going to construct a classifier:

the simplest (and worst)


but wait a minute…


why a classifier?

Because classification (or more generally prediction) is a central technique in Machine Learning

with this, we can achieve ranking, regression, detection, reinforcement learning, and more…

we start with a linear equation, in vector form…

Think about constructing a simple classifier to predict, for each occurrence of X, which class it belongs to

to do this, we must learn the values of W and b
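The equation itself did not survive in this transcript. Judging by the "WX + b" form named later in the talk, it is presumably Y = WX + b, where X is the input vector, W a weight matrix, b a bias vector, and Y one score per class. A minimal sketch under that assumption, in plain Python lists to stay dependency-free (the weights and input are made-up values, not learned ones):

```python
def linear_scores(W, x, b):
    """Return y = W x + b: one score per class (one row of W per class)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Hypothetical example: 3 classes, 2 input features.
W = [[1.0, 2.0],
     [0.0, -1.0],
     [0.5, 0.5]]
b = [0.0, 1.0, -1.0]
x = [2.0, 1.0]

scores = linear_scores(W, x, b)
# The predicted class is the one with the highest score.
predicted_class = scores.index(max(scores))
```

Here the scores come out as 4.0, 0.0 and 0.5, so class 0 is predicted. Learning means adjusting W and b until the highest score lands on the correct class.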

Does it work well?


It’s the worst.


there are two problems…

No. 1:

it gives values,

and what we want

are probabilities

we can fix it with the “softmax” function:
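The softmax formula is missing from this transcript. The standard definition is S(y_i) = exp(y_i) / sum_j exp(y_j), which turns any list of scores into positive numbers that sum to 1. A minimal sketch (subtracting the maximum score first is a common numerical-stability trick added here, not something the talk mentions):

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # shifting by the max does not change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Example scores for 3 classes (illustrative values).
probs = softmax([4.0, 0.0, 0.5])
```

The largest score still gets the largest probability, but the outputs can now be read as probabilities.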

we express the correct values in a vector of values 1 (correct) and 0 (the others).

we call this “one-hot encoding”

to evaluate errors, we compare the probabilities with the correct values

using what we call “cross-entropy”
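As a rough illustration of these two ideas, the sketch below builds a one-hot vector and uses the usual cross-entropy definition, D(S, L) = -sum_i L_i log(S_i), where S holds the predicted probabilities and L the one-hot labels. The probability values are invented for the example.

```python
import math

def one_hot(label, num_classes):
    """Vector with 1 at the correct class and 0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def cross_entropy(probs, target):
    """Error between predicted probabilities and a one-hot target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

target = one_hot(0, 3)                              # class 0 is correct
good = cross_entropy([0.9, 0.05, 0.05], target)     # confident and right
bad = cross_entropy([0.1, 0.6, 0.3], target)        # mostly wrong
```

The error is small when the correct class gets a high probability and grows as that probability shrinks, which is what makes it a useful quantity to minimize.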

better, but…

there remains the second problem:

our equation is linear

and doesn’t represent non-linear equations well

this problem killed the perceptron (single level neural net)

it doesn’t help to just add levels to the network

because we can represent any combination of linear operations as another linear operation: we can reduce the new network to another WX + b with the same problem
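To see why stacking linear levels does not help, note that W2(W1 X + b1) + b2 = (W2 W1) X + (W2 b1 + b2), which is again of the form WX + b. The sketch below checks this numerically with made-up weights; the helper names are illustrative only.

```python
def matvec(M, v):
    """Matrix times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix times matrix."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise vector sum."""
    return [a + b for a, b in zip(u, v)]

# Two hypothetical linear levels.
W1, b1 = [[1.0, 2.0], [3.0, 4.0]], [1.0, -1.0]
W2, b2 = [[0.5, 0.0], [1.0, 1.0]], [0.0, 2.0]
x = [1.0, 1.0]

# Apply the two levels one after the other.
two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# Collapse them into a single W and b, then apply once.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)
```

Both paths give the same output, so the two-level linear network has no more expressive power than a single level.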

What do we do?

with no other option,

we have to introduce non-linear functions,

such as the logistic function

but it’s expensive to calculate, so we can use a simpler, cheaper non-linearity called a “Rectified Linear Unit”, or ReLU
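For comparison, here are the two functions side by side, using their standard definitions: the logistic function is 1 / (1 + exp(-x)), and ReLU is max(0, x). The input values are arbitrary.

```python
import math

def logistic(x):
    """Smooth S-shaped curve between 0 and 1; needs an exponential."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

values = [-2.0, 0.0, 3.0]
logistic_out = [logistic(v) for v in values]
relu_out = [relu(v) for v in values]
```

ReLU needs only a comparison, and its derivative is simply 0 or 1, which keeps both the forward calculation and the later derivative step cheap.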

now we can construct our neural net, in a way that’s efficient to calculate

we can express this in a modular way, as a series of linear or almost linear matrix operations ... which lets us use the power of a GPU
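One possible reading of this modular structure is a chain of small functions: linear, then ReLU, then linear, then softmax. The sketch below wires up such a chain with hypothetical weights. It uses plain Python lists for clarity; a real implementation would hand the same operations to a matrix library that can run on a GPU.

```python
import math

def linear(W, x, b):
    """Linear module: W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def relu(v):
    """Almost linear module: max(0, x) applied element by element."""
    return [max(0.0, a) for a in v]

def softmax(scores):
    """Output module: scores to probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 classes.
W1, b1 = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]], [0.0, 0.0, -1.0]
W2, b2 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], [0.0, 0.5]

x = [2.0, 1.0]
hidden = relu(linear(W1, x, b1))   # first level
scores = linear(W2, hidden, b2)    # second level
probs = softmax(scores)            # probabilities per class
```

Adding another level means inserting one more `relu(linear(...))` step into the chain, which is what makes the framework modular.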

this is good, but we are still lacking something…

to improve our estimation, we must minimize the error,

and this requires us to calculate the derivative of the function

think about the chain rule of calculus,

which can convert a derivative into a product (of other derivatives):

$$\frac{d}{dx} f(x) = \frac{d}{du} f(x) \cdot \frac{du}{dx}$$

that fits in our modular framework
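A small worked example of how the chain rule fits the modules: take f = logistic(u) with u = w·x + b. Each module only needs to know its own local derivative, and the overall derivative is their product. The values of w, b and x are arbitrary, and the finite-difference check is added here only to confirm the result.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def logistic_grad(u):
    """Local derivative of the logistic module: s * (1 - s)."""
    s = logistic(u)
    return s * (1.0 - s)

w, b, x = 2.0, -1.0, 0.5

u = w * x + b             # inner (linear) module
du_dx = w                 # local derivative of the linear module
df_du = logistic_grad(u)  # local derivative of the logistic module
df_dx = df_du * du_dx     # chain rule: product of the local derivatives

# Independent check with a central finite difference.
eps = 1e-6
numeric = (logistic(w * (x + eps) + b) - logistic(w * (x - eps) + b)) / (2 * eps)
```

With these values u is 0, the logistic derivative is 0.25, and the product is 0.5, which agrees with the numerical estimate. In a deeper network the same multiplication is repeated module by module, from the output back to the input.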

now we have it! a general, modular framework that incorporates everything we need!

and we can construct deep neural nets, adding more levels as we need them

…but wait a minute:

why do we like deep networks?

the most interesting problems,

like language and vision,

have very complex rules

we need a lot of parameters to represent them

yes, but why don’t we use wider networks?

why is it better to have deep ones?

deep networks are more efficient and better capture the structure inherent in many problems


the convolutional network, or convnet,

transforms the input

so that the translation

of the input does not matter

we use it for visual recognition

Let’s start with a photo:

We use a region (kernel) of the photo as the input to a small neural net with K outputs

we slide the window across the photo

this transforms the photo into another new one, with K color channels, and different dimensions

this operation is called

a convolution
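The sliding-window operation can be sketched as follows, under simplifying assumptions that are not from the talk: a single-channel photo, a square window, stride 1, no padding, and a "small neural net" that is just one linear level with K outputs. As in most deep learning code, the window is applied without flipping the kernel.

```python
def convolve(image, kernels, biases, size, stride=1):
    """Slide a size x size window over the image.

    Each of the K kernels is a flat list of size*size weights that are
    shared by every window position. Returns rows x cols x K values.
    """
    height, width = len(image), len(image[0])
    output = []
    for top in range(0, height - size + 1, stride):
        out_row = []
        for left in range(0, width - size + 1, stride):
            patch = [image[top + i][left + j]
                     for i in range(size) for j in range(size)]
            out_row.append([sum(w * p for w, p in zip(kernel, patch)) + bias
                            for kernel, bias in zip(kernels, biases)])
        output.append(out_row)
    return output

# Hypothetical 3 x 4 photo and K = 2 kernels over 2 x 2 windows.
image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0]]
kernels = [[1, 0, 0, 1],               # adds the window's diagonal
           [0.25, 0.25, 0.25, 0.25]]   # averages the window
biases = [0.0, 0.0]

result = convolve(image, kernels, biases, size=2)
```

The 3 x 4 input becomes a 2 x 3 output with 2 channels: different dimensions and K channels, as described above. Because the same weights are used at every position, a pattern is detected wherever it appears in the photo.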

if the region (the “kernel”) has

the same size as the original,

what did we obtain?


in this case,

there is only one window position, so we recover an ordinary (fully connected) neural net level with K outputs


Contact: twitter: @davidrostcheck

Articles: