-
“Deep” Learning
-
Big picture: natural language analyzers
Natural language input signal:
- Web page
- Question
[figure: natural language analyzer]
-
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
-
classification: predict a label l for an input document d
-
How to define f(l, d): linear models
Linear models: f(l, d) = w . g(l,d)
g(l, d): binary feature vector, e.g.  0 0 0 1 0 0 1 0 0 0 0 …
w: weight vector, e.g.  0.4 -1.2 0.2 0.2 -0.4 -1.0 5.1 1.1 2.3 0.8 -0.1 …
(number of features = number of weights)
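A minimal numpy sketch of this scoring rule, reusing the illustrative feature and weight values above (the feature extractor g itself is not shown; the vectors are hard-coded):

```python
import numpy as np

# g(l, d): binary indicator features for a (label, document) pair
g = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)

# w: one learned weight per feature
w = np.array([0.4, -1.2, 0.2, 0.2, -0.4, -1.0, 5.1, 1.1, 2.3, 0.8, -0.1])

# f(l, d) = w . g(l, d): the score of label l for document d
score = w.dot(g)
print(score)  # 0.2 + 5.1 = 5.3
```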
-
How to define f(l, d): linear models
Linear models: f(l, d) = w . g(l,d)
- Easy to implement
- Easy to optimize
-
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
-
Linear models: f(l, d) = w . g(l,d) = w(l) . x(d)
e.g., y1 = x1 w1,1 + x2 w2,1 + x3 w3,1 + x4 w4,1 + x5 w5,1 = w(1) . x(d)
[figure: weight matrix W with entries w1,1 … w5,3, rows indexed by the 5 features and columns by the 3 labels]
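The same linear model written as a single matrix-vector product that scores all labels at once; the shapes follow the 5-feature, 3-label picture above, and the random numbers are placeholders:

```python
import numpy as np

num_features, num_labels = 5, 3

x = np.random.randn(num_features)              # x(d): feature vector for document d
W = np.random.randn(num_features, num_labels)  # W[i, j] = w_{i,j}: weight of feature i for label j

y = W.T @ x              # y[j] = x1*w_{1,j} + ... + x5*w_{5,j} = w(j) . x(d)
prediction = y.argmax()  # argmax_l f(l, d)
```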
-
neural network v1.0: linear model
Linear models: f(l, d) = w . g(l,d) = w(l) . x(d)
[figure: the linear model drawn as a one-layer network, y = W x, with input layer x, output layer y, and weight matrix W (same W as on the previous slide)]
-
neural network v2.0: representations
-
neural network v2.1: representations
-
neural network v3.0: complex functions
-
neural network v3.5: “deeper” networks
[figure: x → W1 → h1 → W2 → h2 → W3 → y]
y = W3 h2 = W3 a2( W2 a1( W1 x ) )
Wait, but why do we need more layers?
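A minimal numpy sketch of that forward pass; the slide leaves the activations a1 and a2 unspecified, so tanh is used here purely as one common choice:

```python
import numpy as np

def forward(x, W1, W2, W3):
    """y = W3 a2( W2 a1( W1 x ) ) with a1 = a2 = tanh."""
    h1 = np.tanh(W1 @ x)   # first hidden layer
    h2 = np.tanh(W2 @ h1)  # second hidden layer
    return W3 @ h2         # output layer

# toy dimensions: 5 inputs -> 4 hidden -> 4 hidden -> 3 outputs
x  = np.random.randn(5)
W1 = np.random.randn(4, 5)
W2 = np.random.randn(4, 4)
W3 = np.random.randn(3, 4)
y = forward(x, W1, W2, W3)
```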
-
neural network v4.0: recurrent neural networks
Big idea: use hidden layers to represent sequential inputs.
-
neural network v4.0: recurrent neural networks
Figure credits: Christopher Olah
How to compute the hidden layers?
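One common answer to the question above is the simple (Elman-style) recurrence: each hidden state is a function of the current input and the previous hidden state. A minimal sketch, with tanh as an assumed nonlinearity:

```python
import numpy as np

def rnn_hidden_states(xs, Wxh, Whh, h0):
    """h_t = tanh(Wxh x_t + Whh h_{t-1}) for each input vector x_t."""
    h, hs = h0, []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h)
        hs.append(h)
    return hs

# toy usage: three 4-dim inputs, 5-dim hidden states
xs = [np.random.randn(4) for _ in range(3)]
hs = rnn_hidden_states(xs, np.random.randn(5, 4), np.random.randn(5, 5), np.zeros(5))
```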
-
neural network v4.1: output sequences
Figure credits: Andrej Karpathy
-
neural network v4.1: output sequences
Figure credits: Andrej Karpathy
Example: Character-level language models
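A minimal sketch of what a character-level RNN language model does at each step: read one character, update the hidden state, output a distribution over the next character, and sample from it. The weights here are random, so the samples are gibberish; the Wikipedia-style sample on the next slide comes from a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = list("abcdefghijklmnopqrstuvwxyz ")
V, H = len(vocab), 32

# randomly initialized parameters (a trained model would learn these)
Wxh = rng.normal(0, 0.1, (H, V))
Whh = rng.normal(0, 0.1, (H, H))
Why = rng.normal(0, 0.1, (V, H))

def step(char_idx, h):
    x = np.zeros(V)
    x[char_idx] = 1.0                   # one-hot input character
    h = np.tanh(Wxh @ x + Whh @ h)      # update hidden state
    logits = Why @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                        # softmax over the next character
    return p, h

# sample a short string one character at a time
h, idx, out = np.zeros(H), vocab.index("a"), []
for _ in range(20):
    p, h = step(idx, h)
    idx = rng.choice(V, p=p)
    out.append(vocab[idx])
print("".join(out))
```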
-
neural network v4.1: output sequences
Credits: Andrej Karpathy
Sample output: Copyright was the succession of independence in the slop of Syrian influence that was a famous German movement based on a more popular servicious, non-doctrinal and sexual power post. Many governments recognize the military housing of the [[Civil Liberaliza
-
neural network v4.2: Long Short-Term Memory (LSTM)
Figure credits: Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
[figure: regular RNN cell vs. LSTM cell]
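A minimal sketch of a single LSTM step, following the standard gate equations in Olah's post; the stacked-parameter layout and the names used here are mine, not the slide's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of the
    input (i), forget (f), output (o) gates and candidate (g)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # shape (4*H,)
    i = sigmoid(z[0*H:1*H])             # input gate
    f = sigmoid(z[1*H:2*H])             # forget gate
    o = sigmoid(z[2*H:3*H])             # output gate
    g = np.tanh(z[3*H:4*H])             # candidate cell values
    c = f * c_prev + i * g              # new cell state: forget some, write some
    h = o * np.tanh(c)                  # new hidden state
    return h, c

# toy usage
H, D = 8, 5
h, c = np.zeros(H), np.zeros(H)
W, U, b = np.random.randn(4*H, D), np.random.randn(4*H, H), np.zeros(4*H)
h, c = lstm_cell(np.random.randn(D), h, c, W, U, b)
```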
-
neural network v4.3: bidirectional RNNs
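A minimal sketch of the bidirectional idea: run one RNN left-to-right, another right-to-left, and concatenate the two hidden states at each position (the tanh recurrence and the dimensions are assumptions):

```python
import numpy as np

def rnn_states(xs, Wxh, Whh, h0):
    h, hs = h0, []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h)
        hs.append(h)
    return hs

def bidirectional_states(xs, fwd, bwd, h0):
    """Concatenate forward and backward RNN states at each position."""
    f_states = rnn_states(xs, *fwd, h0)               # left to right
    b_states = rnn_states(xs[::-1], *bwd, h0)[::-1]   # right to left, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(f_states, b_states)]

# toy usage: six 4-dim inputs, 5-dim states per direction -> 10-dim outputs
xs = [np.random.randn(4) for _ in range(6)]
fwd = (np.random.randn(5, 4), np.random.randn(5, 5))
bwd = (np.random.randn(5, 4), np.random.randn(5, 5))
states = bidirectional_states(xs, fwd, bwd, np.zeros(5))
```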
-
neural network v4.4: attention
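A minimal sketch of one common formulation, dot-product attention: score each hidden state against a query, softmax the scores into weights, and average the states with those weights (the slide's figure may use a different scoring function):

```python
import numpy as np

def attend(query, states):
    """query: vector of size d; states: list of hidden vectors of size d."""
    S = np.stack(states)                 # (T, d)
    scores = S @ query                   # one dot-product score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax -> attention weights
    context = weights @ S                # weighted average of the states
    return context, weights

# toy usage: attend over five 8-dim states
states = [np.random.randn(8) for _ in range(5)]
context, weights = attend(np.random.randn(8), states)
```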
-
neural network v5: convolutional neural networks
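A minimal sketch of the usual CNN-for-text building block: slide a filter of fixed width over the word vectors of a sentence, apply a nonlinearity, and max-pool over positions (the filter width, ReLU, and dimensions are illustrative choices):

```python
import numpy as np

def conv1d_maxpool(X, filters, width=3):
    """X: (T, d) word vectors; filters: (num_filters, width*d).
    Returns one feature per filter: its max response over all windows."""
    T, d = X.shape
    windows = np.stack([X[t:t + width].reshape(-1) for t in range(T - width + 1)])
    responses = np.maximum(windows @ filters.T, 0.0)   # ReLU on each window's response
    return responses.max(axis=0)                       # max-pool over positions

# toy usage: a 10-word sentence with 8-dim embeddings, 16 filters of width 3
X = np.random.randn(10, 8)
filters = np.random.randn(16, 3 * 8)
features = conv1d_maxpool(X, filters)   # 16 pooled features
```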
-
neural network v5.1: recursive NNs
-
neural network v6: dropout
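A minimal sketch of (inverted) dropout on a hidden layer: during training, zero each unit with probability p and rescale the survivors so the expected activation is unchanged; at test time, do nothing. The rate p and the inverted-scaling convention are standard choices, not something the slide specifies:

```python
import numpy as np

def dropout(h, p=0.5, train=True, rng=None):
    """Randomly zero units of h with probability p during training."""
    if not train:
        return h                          # no dropout at test time
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p       # keep each unit with probability 1-p
    return h * mask / (1.0 - p)           # rescale so the expectation matches h
```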
-
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
-
How to train NN models?
• argmax_l f(d, l) only tells us which label to predict.
• Supervised learning (need input/output pairs)
• Loss functions
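A minimal sketch of one standard choice of loss for this setup: the negative log-likelihood (cross-entropy) of the correct label under a softmax over the scores f(d, l). The slide only says "loss function"; cross-entropy is an assumption here:

```python
import numpy as np

def cross_entropy_loss(scores, gold):
    """scores[l] = f(d, l) for every label l; gold = index of the correct label."""
    z = scores - scores.max()
    log_probs = z - np.log(np.exp(z).sum())   # log-softmax over labels
    return -log_probs[gold]                   # negative log-likelihood of the gold label
```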
-
How to optimize?
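A minimal sketch of the optimization loop these slides build toward: compute the loss, get a gradient, and take a small step downhill. To keep the sketch self-contained, the gradient is estimated by finite differences here; real systems use backpropagation instead:

```python
import numpy as np

def sgd_step(w, loss_fn, lr=0.1, eps=1e-6):
    """One gradient-descent step on parameter vector w."""
    grad = np.zeros_like(w)
    base = loss_fn(w)
    for i in range(w.size):
        w_eps = w.copy()
        w_eps[i] += eps
        grad[i] = (loss_fn(w_eps) - base) / eps   # finite-difference gradient
    return w - lr * grad                          # step against the gradient

# toy usage: fit w so that w . x is close to 1 for a fixed x
x = np.array([0.5, -1.0, 2.0])
loss = lambda w: (w @ x - 1.0) ** 2
w = np.zeros(3)
for _ in range(100):
    w = sgd_step(w, loss)
```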
-
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
-
Major results: language modeling
-
Major results: image classification
Krizhevsky et al. (2012)
-
Major results: ImageNet
Krizhevsky et al. (2012)
-
Major results: ImageNet
Krizhevsky et al. (2012): sample convolutional filters
-
Major results: speech recognition
-
Major results: translation
-
Major results: dependency parsing
Chen and Manning (2014)
-
Major results: dependency parsing
Dyer et al. (2015)
-
Important things we didn’t cover
• Dark knowledge
• Connec…
-
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results