Context-Aware Smoothing for Neural Machine Translation

Kehai Chen¹, Rui Wang², Masao Utiyama², Eiichiro Sumita² and Tiejun Zhao¹

¹Harbin Institute of Technology, China
²National Institute of Information and Communications Technology, Japan

The 8th International Joint Conference on Natural Language Processing



Content

• Motivation

• Related Works

• Context-Aware Representation

• NMT with Context-Aware Smoothing

• Experimental Results

• Conclusion

11/28/2017 1



Motivation 1: Polysemous words

Two bilingual parallel sentence pairs:

Src1 (pinyin): 他们 (tamen) 想 (xiang) 通过 (tongguo) 打 (da) 比赛 (bisai) 来 (lai) 解决 (jiejue) 矛盾 (maodun)
Trg1: They want to solve the dispute by playing the game

Src2 (pinyin): 他们 (tamen) 正在 (zhengzai) 因为 (yinwei) 争执 (zhengzhi) 而 (er) 打 (da) 对方 (duifang)
Trg2: They are beating each other for a dispute

The same source word 打 (da) aligns to "playing" in the first pair and to "beating" in the second: a word's meaning depends on its specific context, yet a single embedding v_da has to cover both senses.

h_j = f_enc(v_j, h_{j-1})

Learning a sentence-specific word representation v_j yields a better source representation, and in turn a better target translation.

Motivation 1: enhance the word representations of polysemous words.




Motivation 2: OOV words

[Diagram: attention-based encoder-decoder NMT with a word embedding layer, encoder layer, attention, and decoder layer; the gray parts indicate the NMT parameters affected by the OOV.]

• The source sentence includes an OOV, and a single vector v_unk represents all OOVs. This:
• breaks the structure of the sentence;
• yields a poor source representation;
• … ;
• and harms the prediction of the target word.



Related Works

• Translation granularity for NMT
--- Smaller translation granularities for OOVs: word, sub-word (BPE), character.
--- Sennrich et al. (2016), Costa-jussà and Fonollosa (2016), Li et al. (2016), …

• Source representation for NMT
--- RNN- or CNN-based encoders learn the source representation over a sequence of fixed word vectors.
--- Bahdanau et al. (2015), Sutskever et al. (2014), …

• This work focuses on enhancing the word embedding layer:
--- learning a sentence-specific representation for a polysemous or OOV word from its context words;
--- the context-aware representation enhances the word embedding layer and thereby improves translations (even though an RNN encoder can itself capture word context).


Context-Aware Representation

x1 x2 x3 x4 unk x6 x7 x8 x9

If there is an OOV "unk" (or a polysemous word) in the sentence, the surrounding word vectors v1 … v9 carry context information about it.

When people read a natural language sentence, especially one containing an OOV or polysemous word, they intuitively infer the meaning of such words from their context words.



Context-Aware Representation

x1 x2 x3 x4 x5 x6 x7 … xJ

• We define the context L_j of a source word x_j within a fixed-size window of 2n words (the n historical words and the n future words):

L_j = x_{j-n}, …, x_{j-1}, x_{j+1}, …, x_{j+n}

• Taking x_5 as an example, its context L_5 with n = 2 is:

L_5 = x_3, x_4, x_6, x_7
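This windowed context can be sketched in a few lines of Python (a minimal illustration; the function name `context_window` and the skip-at-boundary behavior are assumptions, not from the paper):

```python
def context_window(sentence, j, n):
    """Return the 2n-word context L_j around position j (0-indexed),
    excluding x_j itself; positions outside the sentence are skipped."""
    left = sentence[max(0, j - n):j]       # up to n historical words
    right = sentence[j + 1:j + 1 + n]      # up to n future words
    return left + right

# For x_5 (index 4) with n = 2 in a sentence x_1 ... x_8:
words = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
print(context_window(words, 4, 2))  # ['x3', 'x4', 'x6', 'x7']
```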



Context-Aware Representation

Feedforward Context-of-Words Model (FCWM)

• Context words L_j of x_j: L_j = x_{j-n}, …, x_{j-1}, x_{j+1}, …, x_{j+n}
• Input layer (concatenation of the context word vectors):
L_j = [v_{j-n} : … : v_{j-1} : v_{j+1} : … : v_{j+n}]
• Non-linear output layer:
V_{L_j} = σ(W_1 L_j + b_1), written compactly as V_{L_j} = φ_1(L_j; θ_1)

Convolutional Context-of-Words Model (CCWM)

• Input layer (matrix of the context word vectors):
M = [v_{j-n}, …, v_{j-1}, v_{j+1}, …, v_{j+n}]
• Convolution layer (filter width k):
L = ψ(W_2 M + b_2), L = [L_1, …, L_{2n-k+1}]
• Pooling layer:
P_l = max[L_{2l-1}, L_{2l}], P = [P_1, …, P_{(2n-k+1)/2}]
• Non-linear output layer:
V_{L_j} = σ(W_3 ave(Σ_{l=1}^{(2n-k+1)/2} P_l) + b_3), written compactly as V_{L_j} = φ_2(L_j; θ_2)
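A minimal NumPy sketch of both context-of-words models may help make the layers concrete. The weight shapes, the sigmoid choice for σ, the tanh choice for ψ, and the handling of an odd leftover convolution window in the pooling step are all assumptions for illustration, not details from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fcwm(context_vecs, W1, b1):
    """FCWM: V_Lj = sigma(W1 . [v_{j-n} : ... : v_{j+n}] + b1)."""
    Lj = np.concatenate(context_vecs)          # input layer: concatenation
    return sigmoid(W1 @ Lj + b1)               # non-linear output layer

def ccwm(context_vecs, W2, b2, W3, b3, k=2):
    """CCWM: convolution over the context, pairwise max pooling,
    then an averaged non-linear output layer."""
    M = np.stack(context_vecs)                 # (2n, d) context matrix
    windows = [np.concatenate(M[i:i + k]) for i in range(len(M) - k + 1)]
    L = [np.tanh(W2 @ w + b2) for w in windows]             # convolution layer
    P = [np.maximum(L[2 * l], L[2 * l + 1])                 # pairwise max pool
         for l in range(len(L) // 2)]                       # (odd leftover dropped)
    return sigmoid(W3 @ np.mean(P, axis=0) + b3)            # output layer

# toy dimensions: n = 2 (four context words), embedding dim d = 4, filter k = 2
rng = np.random.default_rng(0)
d = 4
ctx = [rng.standard_normal(d) for _ in range(4)]
W1, b1 = rng.standard_normal((d, 4 * d)), np.zeros(d)
W2, b2 = rng.standard_normal((d, 2 * d)), np.zeros(d)
W3, b3 = rng.standard_normal((d, d)), np.zeros(d)
print(fcwm(ctx, W1, b1).shape, ccwm(ctx, W2, b2, W3, b3).shape)  # (4,) (4,)
```

Both functions map the 2n context vectors to a single d-dimensional context-aware vector, matching φ_1 and φ_2 above.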


NMT for OOV Smoothing: CARNMT-Enc

Encoder. Standard NMT:
h_j = f_enc(v_j, h_{j-1})
This work:
h_j = f_enc(v_j, h_{j-1})             if x_j ∈ V_s
h_j = f_enc(φ_e(L_{x_j}), h_{j-1})    if x_j ∉ V_s

Decoder. Standard NMT:
p(y_i | y_{<i}, x) = g(v_{y_{i-1}}, s_i, c_i)
This work:
p(y_i | y_{<i}, x) = g(v_{y_{i-1}}, s_i, c_i)        if y_{i-1} ∈ V_t
p(y_i | y_{<i}, x) = g(φ_d(L_{y_{i-1}}), s_i, c_i)   if y_{i-1} ∉ V_t

[Diagram: CARNMT-Enc — on the source side, the OOV's embedding is replaced by its context-aware representation φ_e(L_u) before entering the encoder.]
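The encoder-side switch above can be sketched as follows. All names here are hypothetical, and the averaging `phi_e` is only a stand-in for the trained FCWM/CCWM φ_e:

```python
import numpy as np

vocab = {"the", "cat", "sat", "on", "mat"}
embed = {w: np.full(3, float(i)) for i, w in enumerate(sorted(vocab))}

def phi_e(context):
    """Stand-in for the trained phi_e (FCWM/CCWM): here simply the
    average of the in-vocabulary context embeddings."""
    vecs = [embed[w] for w in context if w in vocab]
    return np.mean(vecs, axis=0)

def encoder_input(sentence, j, n=2):
    """CARNMT-Enc switch: v_j if x_j is in V_s, else phi_e(L_xj)."""
    word = sentence[j]
    if word in vocab:                          # x_j in V_s
        return embed[word]
    context = sentence[max(0, j - n):j] + sentence[j + 1:j + 1 + n]
    return phi_e(context)                      # x_j not in V_s: smoothed vector

sentence = ["the", "cat", "zorp", "on", "mat"]   # "zorp" is OOV
v = encoder_input(sentence, 2)                    # context-aware vector for "zorp"
```

The same switch, applied to y_{i-1} with φ_d on the target side, gives CARNMT-Dec.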


NMT for OOV Smoothing: CARNMT-Dec

[Diagram: CARNMT-Dec — the encoder and decoder equations are as on the previous slide; here only the target side is smoothed, feeding φ_d(L_u) to the decoder when the previous target word y_{i-1} is an OOV.]


NMT for OOV Smoothing: CARNMT-Both

[Diagram: CARNMT-Both — combines CARNMT-Enc and CARNMT-Dec, applying φ_e(L_u) on the source side and φ_d(L_u) on the target side.]


NMT for Smoothing All Words: CARNMT-ALL

Encoder. Standard NMT:
h_j = f_enc(v_j, h_{j-1})
This work:
h_j = f_enc(φ_e(L_{x_j}), h_{j-1})    for every source word x_j

Decoder. Standard NMT:
p(y_i | y_{<i}, x) = g(v_{y_{i-1}}, s_i, c_i)
This work:
p(y_i | y_{<i}, x) = g(φ_d(L_{y_{i-1}}), s_i, c_i)    for every target word

[Diagram: CARNMT-ALL — every source embedding v_j is concatenated with its context-aware representation v_{L_j} = φ_e(L_j), and every target embedding with φ_d(L'_i).]
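Since the diagram shows every embedding paired with its context-aware vector (v_j : v_{L_j}), the all-words case can be sketched as a concatenation over the whole sentence. The function name and the averaging stand-in for φ_e are assumptions for illustration:

```python
import numpy as np

def all_smooth_inputs(vecs, phi_e, n=2):
    """CARNMT-ALL sketch: every encoder input is the word's own embedding
    concatenated with its context-aware representation, [v_j : v_Lj]."""
    out = []
    for j, v in enumerate(vecs):
        ctx = vecs[max(0, j - n):j] + vecs[j + 1:j + 1 + n]  # window L_j
        out.append(np.concatenate([v, phi_e(ctx)]))
    return out

# stand-in phi_e: average of the context vectors (the real model uses FCWM/CCWM)
phi_e = lambda ctx: np.mean(ctx, axis=0)
vecs = [np.full(3, float(i)) for i in range(4)]
enhanced = all_smooth_inputs(vecs, phi_e)
print(enhanced[0].shape)  # (6,)
```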


Experimental Settings

• Training data: 1.42M Chinese-to-English parallel sentence pairs from the LDC corpus.
• NIST 2002 (MT02) is the validation set and NIST 2003–2008 (MT03–08) are the test sets; case-insensitive 4-gram NIST BLEU (Papineni et al., 2002) is the evaluation metric.
• Vocabulary size: 30k; maximum sentence length: 80; mini-batch size: 80; word embedding dimension: 620; hidden layer dimension: 1000; dropout on all layers; optimizer: Adadelta.
• Baselines: standard attentional NMT (Bahdanau et al., 2015); subword-based NMT (Sennrich et al., 2016); character-based NMT (Costa-jussà and Fonollosa, 2016); replacing unk with a semantically similar in-vocabulary word (Li et al., 2016).


Experimental Results: Chinese-to-English Translation

• Moses vs. NMT → both are strong baselines.
• CARNMT-Enc/Dec vs. Bahdanau et al. (2015) → our method effectively smooths the negative effect of OOVs (Motivation 2).
• CARNMT-Both vs. CARNMT-Enc/Dec → source-side smoothing is orthogonal to target-side smoothing.
• ALLSmooth vs. CARNMT-Both → in-vocabulary smoothing is also beneficial for NMT (Motivation 1).





Experimental Results: FCWM vs. CCWM

• The CCWM learns a semantic representation of the context directly to smooth the word vector, while the FCWM predicts a word's semantic representation from its context.


Experimental Results: Translation Quality for Sentences with Different Numbers of OOVs

Number of sentences per OOV count: 2306, 1827, 1121, 678, 391, 215, 123, 59, 37, 24, 29

• With no OOVs, ALLSmooth is better than the baseline (Bahdanau et al., 2015), while CARNMT-Enc and CARNMT-Dec perform similarly to it.
• As the number of OOVs increases, the gap between our methods and the other methods (except PBSMT) becomes larger, especially beyond five OOVs.
• When a sentence contains more than seven OOVs, PBSMT is better than all NMT models.





Experimental Results: Translation Example

• The negative effect of OOVs exists in NMT: the OOV "jiyuqi" itself is not translated, and the phrase "lizheng yousuo zuowei" (the red part of the English output) is not translated either.
• After smoothing the negative effect of the OOV, the model recovers the translation "striving to accomplish something" for "lizheng yousuo zuowei".


Conclusion

• Experimental results showed that the negative effect of OOVs decreases the translation performance of NMT, and that the existing RNN encoder cannot adequately address the problem.
• The learned context-aware representation (CAR) was integrated into the encoder to smooth word representations, which in turn enhanced the NMT decoder.
• The proposed method greatly alleviates the negative effect of OOVs and enhances the word representations of in-vocabulary words, thus improving translations.


Q&A

Thanks!