Transcript of "Introduction of Deep Learning", HPC Advisory Council Switzerland Conference 2016

Page 1:

Page 2:

Introduction of Deep Learning

Zaikun Xu

USI, Master of Informatics

HPC Advisory Council Switzerland Conference 2016

Page 3:

Overview

• Introduction to deep learning

• Big data + big computational power (GPUs)

• Latest updates of deep learning

• Examples

• Feature extraction

Page 4:

Machine Learning

source : http://www.cs.utexas.edu/~eladlieb/RLRG.html

Reinforcement learning

Page 5:

Inspiration from the human brain

• Over 100 billion neurons

• Around 1,000 connections per neuron

• Electrical signals are transmitted through axons and can be inhibitory or excitatory

• Activation of a neuron depends on its inputs

Page 6:

Artificial Neuron

• Circles correspond to neurons

• Arrows correspond to synapses

• Weights correspond to the strength of the electrical signal

source : http://cs231n.stanford.edu/

Page 7:

Nonlinearity

• A neural network can learn complicated, structured data thanks to non-linear activation functions
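As a minimal sketch (assuming a sigmoid nonlinearity, which the slides do not fix), a single artificial neuron weights its inputs, adds a bias, and applies the activation:

    import numpy as np

    def sigmoid(z):
        # squashes any real number into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(x, w, b):
        # weighted sum of inputs (the "synapse" weights), then the nonlinearity
        return sigmoid(np.dot(w, x) + b)

    x = np.array([0.5, -1.2, 3.0])   # inputs
    w = np.array([0.4, 0.6, -0.1])   # weights
    b = 0.1                          # bias
    print(neuron(x, w, b))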

Page 8:

Perceptron

• Perceptron with linear activation

• In 1969, Minsky and Papert showed that a perceptron with linear activation cannot learn the XOR function and takes very long to train

Winter of AI

Page 9:

Back-Propagation

• 1970: Geoffrey Hinton earned a bachelor's degree in psychology and went on to study neural networks at the University of Edinburgh

• 1986: Geoffrey Hinton and David Rumelhart published the Nature paper "Learning Representations by Back-propagating Errors"

• Hidden layers added

• Computation depends linearly on the number of neurons

• And, of course, computers by then were much faster
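A hedged numpy sketch (not the paper's code) of back-propagation in action: one hidden layer learning the XOR function that the linear perceptron could not:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # one hidden layer: 2 -> 4 -> 1
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

    lr = 1.0
    for step in range(5000):
        # forward pass: propagate information from input to output
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass: propagate errors back and update weights
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

    # usually close to [0, 1, 1, 0]; an unlucky seed can land in a poor
    # local optimum, which is exactly the training difficulty noted later
    print(out.round(2).ravel())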

Page 10:

Convolutional NN

• Yann LeCun, born in Paris, was a post-doc of Hinton at the University of Toronto

• LeNet for handwritten digit recognition at Bell Labs, 5% error rate

• Attack skill: convolution, pooling

Page 11:

Local connection

Page 12:

Weight sharing

Page 13:

Convolution

• The resulting feature map is too big

Page 14:

Pooling


http://vaaaaaanquish.hatenablog.com/entry/2015/01/26/060622
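A minimal numpy sketch of both operations, assuming a single 3x3 filter and 2x2 max pooling: the shared kernel is the weight sharing, the 3x3 window is the local connection, and pooling shrinks the feature map:

    import numpy as np

    def conv2d(image, kernel):
        # slide one shared kernel over the image: local connections + weight sharing
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def max_pool(fmap, size=2):
        # keep the strongest response in each size x size block: feature map shrinks
        H, W = fmap.shape
        return fmap[:H - H % size, :W - W % size] \
            .reshape(H // size, size, W // size, size).max(axis=(1, 3))

    img = np.random.rand(8, 8)
    edge = np.array([[1, 0, -1]] * 3, dtype=float)  # a simple vertical-edge filter
    print(max_pool(conv2d(img, edge)).shape)        # (3, 3)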

Page 15:

Support Vector Machine

• Vladimir Vapnik, born in the Soviet Union and a colleague of Yann LeCun, invented the SVM in 1963

• Attack skill: max-margin + kernel mapping

Page 16:

Another winter looming

• SVM: good at "capacity control", easy to use and reproduce

• NN: good at modelling complicated structure

• SVM achieved a 0.8% error rate in 1998 and 0.56% in 2002 on the same handwritten-digit recognition task

• NN gets stuck in local optima, is hard and slow to train, and tends to overfit

Page 17:

Recurrent Neural Nets

Model sequential data

source : http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/

Page 18:

Vanishing gradient
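The title refers to the vanishing-gradient problem: back-propagating through many steps multiplies the gradient by a local derivative at every step, and for sigmoid that factor is at most 0.25, so the signal from early steps decays exponentially. A one-line numeric illustration:

    # back-propagating through T steps multiplies the gradient by a local
    # derivative at every step; for sigmoid that factor is at most 0.25
    grad = 1.0
    for t in range(20):
        grad *= 0.25          # worst-case sigmoid derivative
    print(grad)               # ~9.1e-13: the error signal from early steps vanishes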

Page 19:

LSTM

• http://www.cs.toronto.edu/~graves/handwriting.html

Page 20:

10 years of funding

• Hinton was happy and rebranded his neural networks as "deep learning". The story goes on…

Page 21:

Other Pioneers

Page 22:

Data preparation

• Input: can be images, video, text, sounds, …

• Output: can be a translation into another language, the location of an object, a caption describing an image, …

Training data / Validation data / Test data → Model

Page 23:

Training

• Forward pass: propagating information from input to output

Page 24:

Training

• Backward pass: propagating errors back and updating weights accordingly

Page 25:

Parameter update

• Gradient descent (SGD, mini-batch)

• x = x - lr * dx

animation by Alec Radford
source : http://cs231n.stanford.edu/
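As a sketch, the whole update rule fits in a few lines; grad_fn here is a stand-in for a back-propagation step, and the toy usage just fits a mean:

    import numpy as np

    def sgd(x, grad_fn, data, lr=0.01, batch_size=32, epochs=10):
        # vanilla mini-batch gradient descent: x = x - lr * dx
        n = len(data)
        for _ in range(epochs):
            np.random.shuffle(data)
            for i in range(0, n, batch_size):
                dx = grad_fn(x, data[i:i + batch_size])  # gradient on one mini-batch
                x = x - lr * dx
        return x

    data = np.random.normal(5.0, 1.0, size=1000)
    grad = lambda x, batch: 2 * (x - batch.mean())   # d/dx of mean squared error
    print(sgd(0.0, grad, data, lr=0.05))             # converges near 5.0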

Page 26:

Parameter update

• Gradient descent

• x = x - lr * dx

animation by Alec Radford

• Second-order methods? Quasi-Newton methods

Page 27:

Problems

• While neural networks can approximate any function from input X to output y, they are very hard to train with multiple layers

• Initialization

• Pre-training

• More data / dropout to avoid overfitting

• Acceleration with GPUs

• Support for training in parallel

Page 28:

ImageNet

• >1M well-labeled images across 1,000 classes

source : https://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/

Page 29:

Big data + Big computational power

Page 30:

GPUs

source : http://techgage.com/article/gtc-2015-in-depth-recap-deep-learning-quadro-m6000-autonomous-driving-more/

Page 31:

Parallelization

• Data parallelism: same weights but different data; gradients need to be synchronized

• Does not scale well with cluster size

• Works well on smaller clusters and is easy to implement

source : Nvidia
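A toy sketch of the synchronization step, with the cross-worker allreduce replaced by plain averaging (a real system would use MPI or NCCL, and the workers would run concurrently):

    import numpy as np

    def data_parallel_step(weights, shards, grad_fn, lr=0.01):
        # each worker holds the same weights but computes gradients on its own shard
        local_grads = [grad_fn(weights, shard) for shard in shards]
        # synchronization point: average gradients across workers (the allreduce)
        avg_grad = np.mean(local_grads, axis=0)
        return weights - lr * avg_grad

    # toy usage: minimize mean squared distance to the data on 4 "workers"
    rng = np.random.default_rng(0)
    shards = [rng.normal(3.0, 1.0, size=250) for _ in range(4)]  # one shard per worker
    grad = lambda w, shard: 2 * (w - shard.mean())
    w = 0.0
    for _ in range(100):
        w = data_parallel_step(w, shards, grad, lr=0.05)
    print(w)   # close to 3.0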

Page 32:

Parallelization

• Model parallelism: same data, but different parts of the weights

• The parameters of a big neural net cannot fit into the memory of a single GPU

Figure from Yi Wang

Page 33:

Model+Data Parallelization

Krizhevsky et al. 2014

(figure labels: "computation heavy" / "weight heavy")

Page 34:

• Trained on 30M moves + computation + a clear reward scheme

Page 35:

• Task: a classifier to recognize handwritten digits

Feature extraction

source :https://indico.io/blog/visualizing-with-t-sne/

Page 36:

Hierarchical feature extraction

• source : http://www.rsipvision.com/exploring-deep-learning/

Page 37:

Transfer learning

• Use features learned on ImageNet to train your own classifier on other tasks

• 9 categories + 200,000 images

• Extract features from the last 2 layers + a linear SVM
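A hedged sketch of the recipe with scikit-learn's LinearSVC; the random matrices are stand-ins for the CNN activations, which in practice would come from the last layers of an ImageNet-pre-trained network:

    import numpy as np
    from sklearn.svm import LinearSVC

    # stand-in for CNN activations: in practice these would be features
    # extracted from the last two layers of an ImageNet-pre-trained network
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(1000, 4096))   # 4096-d "fc7-style" features
    y_train = rng.integers(0, 9, size=1000)   # 9 categories, as on the slide
    X_test = rng.normal(size=(200, 4096))
    y_test = rng.integers(0, 9, size=200)

    clf = LinearSVC().fit(X_train, y_train)   # linear SVM on frozen deep features
    print(clf.score(X_test, y_test))          # ~chance here; real features do far better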

Page 38:

Word embedding

• Embed words into a vector space such that similar words lie closer to each other

source : http://anthonygarvan.github.io/wordgalaxy/

• Calculate cosine similarity

• Paris - France + Italy = Rome
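A sketch with hand-made 3-d vectors (real embeddings are learned from text, e.g. by word2vec), showing cosine similarity and the analogy above:

    import numpy as np

    # toy hand-made vectors; real embeddings are learned from large text corpora
    emb = {
        "paris":  np.array([0.9, 0.1, 0.8]),
        "france": np.array([0.8, 0.0, 0.1]),
        "italy":  np.array([0.7, 0.1, 0.0]),
        "rome":   np.array([0.8, 0.2, 0.7]),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # analogy: Paris - France + Italy should land near Rome
    query = emb["paris"] - emb["france"] + emb["italy"]
    best = max(emb, key=lambda w: cosine(emb[w], query))
    print(best)   # "rome"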

Page 39:

Examples

Page 40:

Go

• 3^361 different board configurations, roughly 10^170 legal positions

Brute force does not work!!!

• Monte Carlo Tree Search

A heuristic search method

Still slow, since the search tree is so big

Page 41:

AlphaGo

• Offline learning: from human Go players, learn a deep-learning-based policy network and a rollout policy

• Use self-play results to train a value network, which evaluates the current situation

Page 42:

Online Playing

• When playing against Lee Sedol, the policy network suggests possible moves (about 10 to 20) and the value network calculates the weight of each position

• MCTS updates the weight of each possible move and selects one

• The engineering effort on CPU+GPU systems with data and model parallelism matters

Page 43:

Image recognition

Train in a supervised fashion

Page 44:

Caption generation

Page 45:

Neural MT

• The main idea behind RNNs is to compress a sequence of input symbols into a fixed-dimensional vector by using recursion

source : https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/
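A minimal sketch of that recursion: every symbol updates a fixed-size hidden state, so a sequence of any length is compressed into one vector:

    import numpy as np

    def rnn_encode(xs, Wx, Wh, b):
        # compress a variable-length sequence into one fixed-dimensional vector
        h = np.zeros(Wh.shape[0])
        for x in xs:                       # recursion over the sequence
            h = np.tanh(Wx @ x + Wh @ h + b)
        return h                           # final state summarizes the whole input

    rng = np.random.default_rng(0)
    d_in, d_h = 5, 8
    Wx, Wh, b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
    seq = [rng.normal(size=d_in) for _ in range(12)]   # 12 input symbols
    print(rnn_encode(seq, Wx, Wh, b).shape)            # (8,) regardless of length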

Page 46:

Neural arts

Page 47:

Go deeper

Updates of deep learning

• Deep Residual Learning for Image Recognition, MSRA, 2015

Page 48:

Go multi-modal

Page 49:

Go end-to-end

source : https://devblogs.nvidia.com/parallelforall/deep-speech-accurate-speech-recognition-gpu-accelerated-deep-learning/#.VO56E8ZtEkM.google_plusone_share

Page 50:

Go with attention

Page 51:

Go with specialized chips

• Prototypes

Page 52:

Go where

• Video understanding


• Deep reinforcement learning

• Natural language understanding

• Unsupervised learning !!!

Page 53:

Action classification

Page 54:

Page 55:

Page 56:

Summary

• "The takeaway is that deep learning excels in tasks where the basic unit, a single pixel, a single frequency, or a single word/character, has little meaning in and of itself, but a combination of such units has a useful meaning." (Dallin Akagi)

• Structured data + well-defined cost functions/rewards

• Learning in an end-to-end fashion is a nice trend

Page 57:

Thoughts

• Deep learning will change our lives in a positive way

• We learn fast either by competing with friends like AlphaGo, or by letting them teach us directly in an effective way

• What might happen in the future?

• Still a long way to go towards real AI

Page 58:

List of materials

• https://github.com/ChristosChristofidis/awesome-deep-learning

Page 59:

Acknowledgement

• Tim Dettmers

• Lyu xiyu
