Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning...

29
Topic 9 Deep Learning for Audio Applications (some slides are adapted from Kevin Duh’s slides on Deep Learning and Neural Networks)

Transcript of Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning...

Page 1: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Topic 9

Deep Learning for Audio Applications

(some slides are adapted from Kevin Duh’s slides on

Deep Learning and Neural Networks)

Page 2: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Audio Classification Tasks

• Music genre, mood, artist, composer, instrument classification

• Auto tagging, i.e., labeling music with words

• Chord recognition

• Acoustic event detection

• Speech/speaker recognition

• General flowchart

ECE 477 - Computer Audition, Zhiyao Duan 2019 2

Feature extraction

Feature selection / transformation

Training data Test data

Classifier training

Classifier

Test results

Page 3: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Features that we have studied

• Raw input: audio waveform or spectrogram

• Feature output:

– RMS, Zero Crossing Rate

– Spectral centroid, spread, skewness, kurtosis, flatness, irregularity, roll-off, flux, etc.

– Harmonic features

– MFCC, LPC, PLP, etc.

• Hand-crafted / engineered / pre-defined

• Hard to decide what features to use for a task

• Question: can computers learn features directly from data?

ECE 477 - Computer Audition, Zhiyao Duan 2019 3

Page 4: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Feature / Representation Learning

• Learn a transformation of "raw" inputs to a representation that can be effectively exploited in a task

• Automatic / does not rely on human knowledge

• Target for a specific task

ECE 477 - Computer Audition, Zhiyao Duan 2019 4

Training data Test data

Feature learning

Rep’s

Test data features

Page 5: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Methods Viewed as Feature Learning

• Principal Component Analysis (PCA)

– Learns a linear transformation, where rows of 𝑾 are the orthogonal directions of greatest variance in the training data

𝑓 𝒙 = 𝑾𝒙 + 𝒃

• Dictionary Learning (e.g., NMF)

– Learns a linear transformation, where the input, transformation matrix, and activation matrix (i.e., features), are all non-negative

𝒙 = 𝑾𝒉(𝑿 = 𝑾𝑯)

ECE 477 - Computer Audition, Zhiyao Duan 2019 5

Page 6: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Are linear features good enough?

• Probably not…

• The world is complex and often nonlinear.

ECE 477 - Computer Audition, Zhiyao Duan 2019 6

Can you define a linear transformation on the images to discriminate “2”s from non-“2”s?

𝑓 𝒙 =

𝑖

𝑤𝑖𝑥𝑖 + 𝑏

where 𝒙 is a vector representing all pixel values of a image.

Page 7: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Are these features highly nonlinear…

…to the waveform or spectrogram?

• RMS, ZCR

• Spectral centroid, spread, skewness, kurtosis, flatness, flux

• Harmonic features

• Cepstrum: ℱ−1 log ℱ 𝑥 𝑡 2 2

• MFCC

• LPC

ECE 477 - Computer Audition, Zhiyao Duan 2019 7

Page 8: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Can we learn highly non-linear features?

Deep neural networks!

ECE 477 - Computer Audition, Zhiyao Duan 2019 8

Page 9: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

ECE 477 - Computer Audition, Zhiyao Duan 2019

Biological Analogy

9

Page 10: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

History of Neural Networks

• 1943 – first neural network computing model by McCulloch and Pitts

• 1958 – Perceptron by Rosenblatt

• 1960’s – a big wave

• 1969 – Minsky & Papert’s book “Perceptrons”

• 1970’s – “winter” of neural networks

• 1975 – Backpropagation algorithm by Werbos

• 1980’s – another big wave

• 1990’s – overtaken by SVM proposed in 1993 by Vapnik

• 2006 – a fast learning algorithm for training deep belief networks by Hinton

• 2010’s – another big wave

• 2018 Turing Award – Hinton, Bengio & LeCun

• ????????

ECE 477 - Computer Audition, Zhiyao Duan 2019 10

Page 11: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

ECE 477 - Computer Audition, Zhiyao Duan 2019

Perceptron

else 0

0 if 11

n

i

ii bxw

w1

w3

w2

w4

w5

x1

x2

x3

x4

x5

1

b

Inp

uts

Bias

11

Page 12: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

ECE 477 - Computer Audition, Zhiyao Duan 2019

A compromise function

• Perceptron

• Sigmoid (Logistic)

)( bsignoutput T xw

)(1

1)(

b

TT

eboutput

xwxw

12

Page 13: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Limitations of 1-layer Nets

• Only express linearly separable cases

– For example, they are good as logic operators “AND”, “NOT”, and “OR”

• How about “XOR”, which is not a linearly separable case?

ECE 477 - Computer Audition, Zhiyao Duan 2019 13

-0.8

1

0.5

x1

0.5x2

AND

?

1

?

x1

?x2

XOR

else 0

0 if 11

n

i

ii bxw

else 0

0 if 11

n

i

ii bxw

Page 14: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

2-layer Nets

ECE 477 - Computer Audition, Zhiyao Duan 2019 14

wij

𝑓 𝑥 = 𝜎

𝑗

𝑎𝑗ℎ𝑗 = 𝜎

𝑗

𝑎𝑗𝜎

𝑖

𝑤𝑖𝑗𝑥𝑖

x1

x2

x3

+1 +1

a3

a2

a1

𝑓 𝑥

h1

h2

h3

Input layer Hidden layer Hidden layer output can be viewed as “features” calculated from the raw input

Page 15: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Richer Representation with More Layers

• 1-layer nets only model linear hyperplanes

• 2-layer nets can approximate any continuous function, given enough hidden nodes

• >3-layer nets can do so with fewer nodes and weights

ECE 477 - Computer Audition, Zhiyao Duan 2019 15

Page 16: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Richer Representations

ECE 477 - Computer Audition, Zhiyao Duan 2019 16

(Lee et al., 2009)

Page 17: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

How to learn the weights?

• Given training data - input and label pairs

𝒙(𝑖), 𝑦(𝑖)

• Adjust network weights to minimize

difference (error) between 𝑓 𝒙(𝑖) and 𝑦(𝑖)

– Calculate derivative of error w.r.t. weights

– Back Propagation algorithm

• See derivation on white board

ECE 477 - Computer Audition, Zhiyao Duan 2019 17

Page 18: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Problems of BP for deep networks

• Vanishing gradient problem

– Gradients vanishes when they are propagated back to early layers, hence their weights are hard to adjust

• Many local minima

– Which will trap gradient decent methods

ECE 477 - Computer Audition, Zhiyao Duan 2019 18

Page 19: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Recurrent Neural Network (RNN)

• Network with loops

• Good for sequential data

ECE 477 - Computer Audition, Zhiyao Duan 2019 19

Page 20: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Another Look of RNN

• Unfolding in time

• Training: Back Propagation Through Time (BPTT)

ECE 477 - Computer Audition, Zhiyao Duan 2019 20

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 21: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Bidirectional RNN

ECE 477 - Computer Audition, Zhiyao Duan 2019 21

Figure from [Graves, 2008]

Page 22: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Deep RNN (DRNN)

• RNN lacks hierarchical processing of inputs

ECE 477 - Computer Audition, Zhiyao Duan 2019 22

Recurrent at the 𝑙-th layer Recurrent at all layers(stacked RNN)

Page 23: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

DRNN for Melody/Background Separation

• [Huang et al., 2014]

ECE 477 - Computer Audition, Zhiyao Duan 2019 23

Page 24: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

DRNN for Melody/Background Separation

ECE 477 - Computer Audition, Zhiyao Duan 2019 24

Magnitude spectrum of mixture signal

Page 25: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

DRNN for Melody/Background Separation

• Training objectives

– Reconstruction

– Reconstruction & Discrimination

ECE 477 - Computer Audition, Zhiyao Duan 2019 25

Page 26: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Vanishing Gradient Problem of RNN

• Memory is very short

ECE 477 - Computer Audition, Zhiyao Duan 2019 26

Darkness indicates the influence of input at time 1Figure from [Graves, 2008]

Page 27: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

Long Short-Term Memory (LSTM)

• A memory cell

– It can remember things for a long time

ECE 477 - Computer Audition, Zhiyao Duan 2019 27

Page 28: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

LSTM for Onset Detection

• [Eyben, et al., 2010]

ECE 477 - Computer Audition, Zhiyao Duan 2019 28

Page 29: Topic 9 - University of Rochesterzduan/teaching/ece477/lectures/Topic 9 - Deep Learning for...•General flowchart ECE 477 - Computer Audition, Zhiyao Duan 2019 2 Feature extraction

LSTM for Onset Detection

ECE 477 - Computer Audition, Zhiyao Duan 2019 29

Network input(Spectrogram)

Network output(onset strength curve)