Intro to Deep Learning

Introduction to Neural Networks and Theano

Kushal AroraKarora@cise.ufl.edu

Outline1. Why Care?

2. Why it works?

3. Logistic Regression to Neural Networks

4. Deep Networks and Issues

5. Autoencoders and Stacked Autoencoders

6. Why Deep Learning Works

7. Theano Overview

8. Code Hands On

Why Care?● Has bettered state of art of various tasks

– Speech Recognition

● Microsoft Audio Video Indexing Service speech system uses Deep Learning● Brought down word error rate by 30% compared to SOTA GMM

– Natural Language Understanding/Processing

● Neural net language models are current state of art in language modeling, Sentiment Analysis, Paraphrase Detection and many other NLP task

● SENNA system uses neural embedding for various NLP tasks like POS tagging, NER, chunking etc.

● SENNA is not only better than SOTA but also much faster

– Object Recognition

● Breakthrough started in 2006 with MNIST dataset. Still the best (0..27% error )● SOTA error rate over ImageNet dataset down to 15.3% compared to 26.1%

Why Care?

● MultiTask and Transfer Learning

– Ability to exploit commonalities of different learning task by transferring learned knowledge

– Example Google Image Search (Woman in a red dress in a garden)

– Other example is Google's Image Caption (http://googleresearch.blogspot.com/2014/11/apictureisworththousandcoherent.html)

Why it works?● Learning Representations

– Time to move over hand crafted features

● Distributed Representation

– More robust and expressive features

● Learning Multiple Level of Representation

– Learning multiple level of abstraction

#1 Learning Representation

• Handcrafting features is inefficient and time consuming

• Must be done again for each task and domain

• The features are often incomplete and highly correlated

#2 Distributed Representation

● Features can be nonmutually exclusive example language.

● Need to move beyond onehot representations such as from clustering algorithms, knearest neighbors etc

● O(n) parameters/examples for O(N) input regions but we can do better O(k) parameters/inputs for O(2k) input regions

● Similar to multiclustering, where multiple clustering algorithms are applied in parallel or same clustering algorithm is applied to different input region

#3 Learning multiple level of representation

● Deep architectures promote reuse of features

● Deep architecture can potentially lead to progressively more abstract features at higher layer

● Good intermediate representation can be shared across tasks

● Insufficient depth can be exponentially inefficient

The Basics

● Building block of the network is a neuron.

● The basic terminologies– Input

– Activation layer

– Output

● Activation function is usually of the form

inputactivation function

output

h(w ,b)=f (wT . x+b)

f ( z)=1

1+e−z

Logistic Regression● Logistic regression is a probabilistic, linear classifier

● Parameterized by W and b

● Each output is probability of input belonging to class yi

● Prob of x being member of class Yi is calculated as

P (Y =Y i∣x)=softmax i(Wx+b)

softmax i(Wx+b)=eW i

eW jTx+bi

y pred=argmaxY iP (Y=Y i∣x)

Logistic Regression – Training● The loss function for logistic regression is negative log likelihood, defined as

● The loss function minimization is done using stochastic gradient descent or mini batch stochastic gradient descent.

● The function is convex and will always reach global minima which is not true for other architectures we will discuss.

● Cannot solve famous XOR problem.

Multilayer preceptron

● An MLP can be viewed as a logistic regression classifier where the input is first transformed using a learned non-linear transformation.

● The non linear transformation can be a sigmoid or a tanh function.

● Due to addition of non linear hidden layer, the loss function now is non convex

● Though there is no sure way to avoid minima, some empirical value initializations of the weight matrix helps.

Multilayer preceptron – Training

● Let D be the size of input vector and there be L output labels. The output vector (of size L) will be given as

where and are activation functions.

● The hard part of training is calculating the gradient for stochastic gradient descent and of course avoid the local minima.

● Backpropagation used as an optimization technique to avoid calculation gradient at each step. (It is like Dynamic programming for derivative chain rule recursion)

Deep Neural Nets and issues● Generally networks with more than 23 hidden layers are

called deep nets.

● Pre 2006 they performed worse than shallow networks.

● Though hard to analyze the exact cause the experimental results suggested that gradientbased training for deep nets got stuck in local minima due to gradient diffusion

● It was difficult to get good generalization.

● Random initialization which worked for shallow network couldn't be used deep networks.

● The issue was solved using a unsupervised pretraining of hidden layers followed by a fine tuning or supervised training.

AutoEncoders

● Multilayer neural nets with target output = input.

● Reconstruction = decoder(encoder(input))

● Objective is to minimize the reconstruction error.

● PCA can be seen as a auto-encoder with and

● So autoencoder could be seen as an non-linear PCA which tries of learn latent representation of input.

a= tanh (Wx+b)

x '=tanh (W Tx+c)

L=‖x '−x‖2

a=Wx x '=W T a

AutoEncoders Training

Stacked Autoencoders● One way of creating deep networks.

● Other is Deep Belief Networks that is made of stacked RBMs trained greedily.

● Training is divided into two steps– Pre Training

– Fine Tuning

● In Pre Training autoencoders are recursively trained one at a time.

● In second phase, i.e. fine tuning, the whole network is trained using to minimize negative log likelihood of predictions.

● Pre Training is unsupervised but post training is supervised

PreTraining

Why PreTraining Works

● Hard to know exactly as deep nets are hard to analyze

● Regularization Hypothesis

– PreTraining acts as adding regularization term leading to better generalization.

– It can be seen as good representation for P(X) leads to good representation of P(Y|X)

● Optimization Hypothesis

– PreTraining leads to weight initialization that restricts the parameter space near better local minima

– These minimas are not achievable via random initialization

Other Type of Neural Networks

● Recurrent Neural Nets

● Recursive Neural Networks

● Long Short Term Memory Architecture

● Convolutional Neural Networks

Libraries to use

● Theano/Pylearn2 (U Motreal, Python)

● Caffe (UCB, C++, Fast, CUDA based)

● Torch7 (NYU, Lua, supported by Facebook)

Introduction to TheanoWhat is Theano?

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multidimensional arrays efficiently.

Why should it be used?

– Tighter integration with numpy

– Transparent use of GPU

– Efficient symbolic representation

– Does lots of heavy lifting like gradient calculation for you

– Uses simple abstractions for tensors and matrices so you don't have to deal with it

Theano Basics (Tutorial)

● How To: – Declare Expression– Compute Gradient

● How expression is evaluated. (link)

● Debugging

Implementations● Logistic Regression

● Multilayer preceptron

● AutoEncoders

● Stacked AutoEncoders

Refrences● [Bengio07] Bengio, P. Lamblin, D. Popovici and H. Larochelle, Greedy

LayerWise Training of Deep Networks, in Advances in Neural Information Processing Systems 19 (NIPS‘06), pages 153160, MIT Press 2007.

● [Bengio09] Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 1(2) pages 1127.

● [Xavier10] Bengio, X. Glorot, Understanding the difficulty of training deep feedforward neuralnetworks, AISTATS 2010

● Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." Pattern Analysis and Machine Intelligence, IEEE Transactions on 35.8 (2013): 17981828.

● Deep learning for NLP (http://nlp.stanford.edu/courses/NAACL2013/)

Thank You!

Intro to Deep Learning

Engineering

Transcript of Intro to Deep Learning

Deep Learning for Computer Visionintrotodeeplearning.com/2017/lectures/6S191-Deep...Intro to Deep Learning Convolutional Neural Networks: Layers • INPUT [32x32x3] will hold the raw

List of Deep Learning and NLP Resourcescs- · List of Deep Learning and NLP Resources Dragomir Radev dragomir.radev@yale.edu May 3, 2017 * Intro + ...

EXCELLENCE Professional Hands-On Trainings in Cyber ...SKILLS Intro to Deep Learning Deep Learning for Computer Vision Intro to Accelerated GPU Computing AI & Deep Learning with TensorFlow

Deep Learning Lab Course 2017 (Deep Learning …ais.informatik.uni-freiburg.de/teaching/ws17/deep...Deep Learning Lab Course 2017 (Deep Learning Practical) Labs: (Computer Vision)

Deep Learning Intro - Data Science AfricaDeep Learning Intro Max Welling Special thanks to Efstratios Gavves whose slides I am using and who has helped with the practicals Also thanks

Introduction to Deep Learning with Tensor ow · Preliminary Intro to DL with Tensor ow Introduction to Deep Learning with Tensor ow Yunhao (Robin) Tang Department of IEOR, Columbia

Applied Deep Learning for Sequences Class 1 -- Intro

An Intro to Deep Learning for NLPmausam/courses/col772/spring2019/lectures/09-dlintro.pdf · An Intro to Deep Learning for NLP Mausam Disclaimer: this is an outsider’s understanding.

CS 1678: Intro to Deep Learning

Georgia Tech cse6242 - Intro to Deep Learning and DL4J

20151216 Computer Vision Meetup Deep Learning Deep Learning - Rene...Deep Learning . René Donner Deep Learning Overview 2 ... coursera ... 20151216 Computer Vision Meetup Deep Learning

Large Scale Data Analysis Using Deep Learningukang/courses/17S-DL/L2-intro...Deep learning has a long and rich history with varying popularity over time 2. Deep learning has become

Deep Learning Intro

Deep Reinforcement Learningintrotodeeplearning.com/materials/2018_6S191_Lecture5.pdf · Lex Fridman: fridman@mit.edu January 2018 Course 6.S191: Intro to Deep Learning Deep Reinforcement

Deep Learning Intro - Georgia Tech - CSE6242 - March 2015

Intro to Deep Reinforcement Learning

Intro to Deep Learning

Drago's long list of Deep Learning and NLP Resourcesradev/intronlp/dlnlp2016.pdfDrago's long list of Deep Learning and NLP Resources November 26, 2016 * Intro

Convolutional Neural Networks for Biomedical Image ...Deep Learning Intro Perceptron and MLP intro Convolutional NN intro Deep CNN Tools and methods for Deep CNNs. Artificial Neural

Victoria Dean Multimodal Learningintrotodeeplearning.com/2017/lectures/Multimodal learning...soaring in a sunny sky. MIT 6.S191 | Intro to Deep Learning | IAP 2017 MIT 6.S191 | Intro