Machine Learning - 國立臺灣大學 speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture...

Machine Learning and having it deep and structured Hung-yi Lee


Page 1: Machine Learning - 國立臺灣大學 · 2015-10-08

Machine Learning and having it deep and structured

Hung-yi Lee

Page 2:

What is Machine Learning?

Page 3:

You know how to program …

• You can ask computers to do lots of things for you.

• However, a computer can only do what you ask it to do.

• A computer can never solve a problem that you can't solve yourself.

Page 4:

Some tasks are very complex

• One day, you are asked to write a program for speech recognition.

Find the common patterns in the waveforms below.

It seems impossible to write a program for speech recognition.

[Figure: four waveforms, each a different utterance of 你好 ("Hello")]

You quickly get lost in the exceptions and special cases.

Page 5:

Let the machine learn by itself

A large amount of audio data with labels, e.g. utterances of 你好 ("Hello"), 大家好 ("Hello, everyone"), 人帥真好 ("It's good to be handsome"), each labeled with what was said: You said "你好".

You write the program for learning.

Learning ......

Page 6:

Learning ≈ Looking for a Function

• Speech Recognition: f(audio of 你好) = "你好"

• Handwritten Recognition: f(image of a handwritten digit) = "2"

• Weather forecast: f(weather today) = "sunny tomorrow"

• Play video games: f(positions and number of enemies) = "fire"

Page 7:

Framework

Model (Hypothesis Function Set): f1, f2, ……

Training Data: (x^1, ŷ^1), (x^2, ŷ^2), ……

x: function input (e.g. an utterance of 大家好)

ŷ: label, the desired function output (e.g. "你好", "2")

Training: pick the "best" function f* from the model

Testing: y = f*(x)
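As a toy sketch of this framework (the hypothesis set and the data below are made up for illustration, not from the lecture): define a small set of candidate functions, measure how well each fits the labeled training pairs, pick the best one, then apply it to new input.

```python
# Sketch of the framework: hypothesis set -> pick best f* -> test.

def f1(x):  # hypothesis 1
    return 2 * x

def f2(x):  # hypothesis 2
    return x + 1

hypothesis_set = [f1, f2]
training_data = [(1, 2), (2, 4), (3, 6)]  # labeled pairs (x, y_hat)

def loss(f, data):
    """Total squared error of f on the labeled data."""
    return sum((f(x) - y_hat) ** 2 for x, y_hat in data)

# Training: pick the best function f* from the hypothesis set
f_star = min(hypothesis_set, key=lambda f: loss(f, training_data))

# Testing: y = f*(x) on a new input
print(f_star(10))  # f1 fits the data exactly, so this prints 20
```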

Page 8:

This Course

Machine Learning

Deep Learning, Structured Learning

Page 9:

Deep Learning

Page 10:

What is Deep Learning?

• Production line (生產線): a very complex function built by cascading simple functions

Simple Function 1 → Simple Function 2 → …… → Simple Function N → "Hello"

Model (Hypothesis Function Set)

End-to-end training: what each function should do is learned automatically

Page 11:

Deep v.s. Shallow - Speech Recognition

• Shallow Approach

Each box is a simple function in the production line:

Waveform → DFT → spectrogram → filter bank → log → DCT → MFCC → GMM → "Hello"

(DFT, filter bank, log, DCT: hand-crafted; GMM: learned from data)
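As a rough illustration of this hand-crafted pipeline, here is a toy numpy sketch. It is a simplification under stated assumptions: the equal-width filter bank stands in for real mel-scaled triangular filters, and framing/windowing is omitted; the function name `mfcc_like` and all sizes are made up for illustration.

```python
import numpy as np

def mfcc_like(frame, n_filters=8, n_coeffs=4):
    """Toy version of the hand-crafted pipeline on the slide:
    DFT -> filter bank -> log -> DCT -> coefficients.
    Simplification: equal-width bands instead of mel-spaced triangular filters."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2      # DFT -> power spectrum
    bands = np.array_split(spectrum, n_filters)     # crude "filter bank"
    energies = np.array([b.sum() for b in bands])
    log_e = np.log(energies + 1e-10)                # log compression
    # DCT-II to decorrelate the log filter-bank energies
    n = np.arange(n_filters)
    return np.array([np.sum(log_e * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters)))
                     for k in range(n_coeffs)])

frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)  # 25 ms of a 440 Hz tone
print(mfcc_like(frame).shape)  # (4,)
```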

Page 12:

Deep v.s. Shallow - Speech Recognition

• Deep Learning

Waveform → f1 → f2 → f3 → f4 → f5 → "Hello"

All functions are learned from data.

Less engineering labor, but the machine learns more.

"Bye bye, MFCC" - Li Deng, Interspeech 2014

Page 13:

Deep v.s. Shallow - Image Recognition

• Shallow Approach: a production line of simple functions, some hand-crafted and some learned from data

http://www.robots.ox.ac.uk/~vgg/research/encoding_eval/

Page 14:

Deep v.s. Shallow - Image Recognition

• Deep Learning

Image → f1 → f2 → f3 → f4 → "monkey"

All functions are learned from data.

Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (pp. 818-833).

Page 15:

What is Deep Learning?

• Production line

• Deep learning usually refers to neural-network-based approaches

Simple Function 1 → Simple Function 2 → …… → Simple Function N → "Hello"

A very complex function

Model (Hypothesis Function Set)

End-to-end training: what each function should do is learned automatically

Page 16:

Inspired by Human Brains

Page 17:

A Neuron for Machine

z = w1 x1 + w2 x2 + …… + wN xN + b    (w1, ……, wN: weights; b: bias)

a = σ(z), where σ(z) = 1 / (1 + e^(-z)) is the sigmoid function (the activation function)

Each neuron is a very simple function.
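Written out directly as code (the inputs, weights, and bias below are arbitrary example values):

```python
import math

def neuron(x, w, b):
    """A single neuron: z = w1*x1 + ... + wN*xN + b, output a = sigmoid(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum plus bias
    return 1 / (1 + math.exp(-z))                  # sigmoid activation

# z = 0.5*1.0 + (-0.5)*2.0 + 0.5 = 0, and sigmoid(0) = 0.5
print(neuron(x=[1.0, 2.0], w=[0.5, -0.5], b=0.5))  # 0.5
```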

Page 18:

Deep Learning

• Cascading the neurons to form a neural network.

Input (x1, x2, ……, xN) → Layer 1 → Layer 2 → …… → Layer L → Output (y1, y2, ……, yM)

The layers between input and output are hidden layers.

A neural network is a complex function f: R^N → R^M.

Each layer is a simple function in the production line.
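A minimal numpy sketch of such a network as composed layer functions; the layer sizes and the random weights below are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer(x, W, b):
    """One layer of the production line: a simple function x -> sigmoid(Wx + b)."""
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
N, H, M = 3, 4, 2                                  # input dim, hidden width, output dim
W1, b1 = rng.normal(size=(H, N)), np.zeros(H)
W2, b2 = rng.normal(size=(M, H)), np.zeros(M)

def network(x):
    """The complex function f: R^N -> R^M, built by cascading simple layers."""
    return layer(layer(x, W1, b1), W2, b2)

y = network(np.array([1.0, 0.0, -1.0]))
print(y.shape)  # (2,)
```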

Page 19:

Ups and downs of Deep Learning

• 1960s: Perceptron (single-layer neural network)

• 1969: Perceptron has limitations

• 1980s: Multi-layer perceptron

• Not significantly different from today's DNNs

• 1986: Backpropagation

• Usually more than 3 hidden layers was not helpful

• 1989: 1 hidden layer is "good enough" — why deep?

• 2006: RBM initialization (breakthrough)

• 2009: GPUs

• 2011: Starts to become popular in speech recognition

• 2012: Wins the ILSVRC competition (image)

Page 20:

Become very Popular

Page 21:

Why Deep Learning?

• Speech recognition

Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech 2011.

Deeper is Better.

Page 22:

Why Deeper is Better?

[Diagram: over the same inputs x1, x2, ……, xN, a shallow network (one hidden layer) v.s. a deep network (many hidden layers)]

One possible answer: deep works better simply because it uses more parameters.

Page 23:

Universality Theorem

Reference: http://neuralnetworksanddeeplearning.com/chap4.html

Any continuous function f: R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons).

What is the reason to be deep?

Why “Deep” neural network not “Fat” neural network?

Page 24:

Fat + Short v.s. Thin + Tall

[Diagram: over the same inputs x1, x2, ……, xN, a shallow (fat + short) network v.s. a deep (thin + tall) network]

If they have the same number of parameters, which one is better?

Page 25:

Fat + Short v.s. Thin + Tall - Toy Example

Target function f: R^2 → {0, 1}: the input is a 2-D point (x, y), the output is 0 or 1.

Sample 100,000 points as training data.

Page 26:

Fat + Short v.s. Thin + Tall - Toy Example

1 hidden layer: (A) 125 neurons (B) 500 neurons (C) 2500 neurons

3 hidden layers: Q: is the number of parameters closest to (A), (B), or (C)?
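One way to check is to count parameters directly: a fully connected layer from n_in to n_out neurons has n_in × n_out weights plus n_out biases. The widths of the 3-hidden-layer network are not given on the slide; width 125 per layer in the last call below is only an assumption for illustration.

```python
def n_params(layer_sizes):
    """Number of weights + biases in a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 2-D input, 1 output, 1 hidden layer:
for w in (125, 500, 2500):
    print(w, n_params([2, w, 1]))       # (A) 501, (B) 2001, (C) 10001

# 3 hidden layers, width 125 each (an assumption, not from the slide):
print(n_params([2, 125, 125, 125, 1]))  # 32001
```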

Page 27:

Fat + Short v.s. Thin + Tall - Handwriting Digit Classification

• Same number of parameters

Deeper: using fewer parameters to achieve the same performance

Page 28:

Fat + Short v.s. Thin + Tall - Speech Recognition

• Word error rate (WER): 1 hidden layer v.s. multiple layers

Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech 2011.

Page 29:

Think about Logic Circuits ……

• A two-layer circuit of logic gates can represent any Boolean function.

• Using multiple layers of logic gates to build some functions is much simpler (fewer gates needed).

• E.g. parity check:

1 0 1 0 → Circuit → 1 (even)

0 0 0 1 → Circuit → 0 (odd)

For an input sequence with d bits, a two-layer circuit needs O(2^d) gates.

Page 30:

Think about Logic Circuits ……

• A two-layer circuit of logic gates can represent any Boolean function.

• Using multiple layers of logic gates to build some functions is much simpler (fewer gates needed).

• E.g. parity check: chain the input bits through XOR gates

With multiple layers, we need only O(d) gates.
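The O(d) construction is just a chain of XOR gates, one per input bit; a sketch (the final inversion matches the slide's convention that the output is 1 for an even number of 1s):

```python
def parity_even(bits):
    """Return 1 if the number of 1s is even, else 0.
    A chain of XOR gates: O(d) operations for d input bits."""
    acc = 0
    for b in bits:
        acc ^= b          # one XOR gate per input bit
    return 1 - acc        # XOR of all bits is 1 when the count of 1s is odd

print(parity_even([1, 0, 1, 0]))  # even number of 1s -> 1
print(parity_even([0, 0, 0, 1]))  # odd number of 1s  -> 0
```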

Page 31:

Back to Deep Learning

• Some functions can be easily represented by deep structure

• Perhaps the functions that can be naturally decomposed into several steps

• E.g. image

http://neuralnetworksanddeeplearning.com/chap1.html

Page 32:

Back to Deep Learning

• Some functions can be easily represented by deep structure

• Perhaps the functions that can be naturally decomposed into several steps

• E.g. image

• Representing the same functions with a shallow structure needs many more parameters

• More parameters imply more training data is needed

• To achieve the same performance, deep learning needs less training data

• With the same amount of data, deep learning can achieve better performance

Page 33:

Size of Training Data

• Different numbers of training examples: 100,000 / 50,000 / 20,000

1 hidden layer (more parameters) v.s. 3 hidden layers (fewer parameters)

Page 34:

Size of Training Data

• Handwriting digit classification

Deeper: using less training data to achieve the same performance

Page 35:

Why wasn't deep popular before?

• In the past, deep usually did not work ……

We will come back to this issue in the following lectures.

Page 36:

What is Structured Learning?

Page 37:

In the real world ……

f: X → Y

X (input domain): sequence, graph structure, tree structure ……

Y (output domain): sequence, graph structure, tree structure ……

Page 38:

Retrieval

f: X → Y

X: "Machine learning" (keyword)

Y: a list of web pages (search result)

Page 39:

Translation

f: X → Y

X: "Machine learning and having it deep and structured" (one kind of sequence)

Y: "機器學習及其深層與結構化" (another kind of sequence)

Page 40:

Speech Recognition

f: X → Y

X: speech signal (one kind of sequence)

Y: "大家好,歡迎大家來修機器學習及其深層與結構化" ("Hello everyone, welcome to Machine Learning and Having It Deep and Structured") (another kind of sequence)

Page 41:

Speech Summarization

f: X → Y

X: recorded lectures

Y: summary — select the most informative segments to form a compact version

Page 42:

Object Detection

f: X → Y

X: image

Y: object positions (e.g. Haruhi, Mikuru)

Page 43:

Pose Estimation

f: X → Y

X: image

Y: pose

Source of images: http://groups.inf.ed.ac.uk/calvin/Publications/eichner-techreport10.pdf

Page 44:

Concluding Remarks

Machine Learning

Deep Learning, Structured Learning

Page 45:

Reference

• No textbook

• Deep Learning

• "Neural Networks and Deep Learning", written by Michael Nielsen, http://neuralnetworksanddeeplearning.com/

• "Deep Learning" (not finished yet), written by Yoshua Bengio, Ian J. Goodfellow, and Aaron Courville, http://www.iro.umontreal.ca/~bengioy/dlbook/

• Structured Learning: no suggested reference

Page 46:

Thank you!

Page 47:

Human Brains are Deep