MixPoet: Diverse Poetry Generation via Learning ...

28
MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space Xiaoyuan Yi 1 , Ruoyu Li 2 , Cheng Yang 3 , Wenhao Li 1 , Maosong Sun 1* 1 Tsinghua University 2 6ESTATES PTE LTD 3 Beijing University of Posts and Telecommunications AAAI 2020

Transcript of MixPoet: Diverse Poetry Generation via Learning ...

Page 1: MixPoet: Diverse Poetry Generation via Learning ...

MixPoet: Diverse Poetry Generation via Learning

Controllable Mixed Latent Space

Xiaoyuan Yi1, Ruoyu Li2, Cheng Yang3, Wenhao Li1, Maosong Sun1*

1Tsinghua University2 6ESTATES PTE LTD

3Beijing University of Posts and Telecommunications

AAAI 2020

Page 2: MixPoet: Diverse Poetry Generation via Learning ...

CONTENTS

Background & Motivation

Methodology

Experiment & Conclusion

Jiuge (九歌) System

01

• 04

• 03

• 02

Page 3: MixPoet: Diverse Poetry Generation via Learning ...

Elegant Expressions

Colorful Contents

Diverse Styles

01 Background & Motivation

Poetry

• Human Writing Mechanism

• Computational Creativity

• Humanizing AI

Exploration

Application• Entertainments

• Advertising

• Poetry Education

• Literary Research

Page 4: MixPoet: Diverse Poetry Generation via Learning ...

01 Background & Motivation

(Yan, 2016 ;Wang et al., 2016; Yang et al., 2018)

?

Primary Criteria

of

Poetry Quality

Diversity

(Zhang and Lapata, 2014; Hopkins and Kiela, 2017)1Fluency

Rhyme Rhythm

2Topic

Relevance

(Ghazvininejad et al., 2016; Yi et al. 2018; Li et al. 2018)3Context

Coherence

Poems generated with different topic words

should be distinguishable from each other.

With the same topic word, the model

should be able to generate distinct poems.

Inter-Topic

Diversity

Intra-Topic

Diversity

Page 5: MixPoet: Diverse Poetry Generation via Learning ...

01 Background & Motivation

Most Frequent K Words

Acc

um

ula

tive

Fre

quen

cyExisting models tend to

• remember common patterns in the corpus

• produce repetitive and generic contents

The most frequent 20 words account for

20% of all generated contents!

Input keyword:desolation Input keyword:autumn lake

Distinct input keywords

• Repetitive words

• Identical lines

Page 6: MixPoet: Diverse Poetry Generation via Learning ...

01 Background & Motivation

Human-authored poems are highly diverse!

Each human poet has his/her own styles.

One possible solution: modeling each individual poet.

(Wei et al., 2018; Tikhonov and Yamshchikov, 2018)

• Sparsity of data

• Inconsistency of a poet’s style

Page 7: MixPoet: Diverse Poetry Generation via Learning ...

01 Background & Motivation

Diverse styles

Living experience

Historical background

Love experienceGender

School of literary

Factors

Composition manners of human poets

Thoughts Feelings Expressions…

塞上长城空自许,镜中衰鬓已先斑。陆游南宋《书愤五首·其一》

黄沙百战穿金甲,不破楼兰终不还王昌龄盛唐《从军行七首·其四》

人闲桂花落,夜静春山空。王维 《鸟鸣涧》

孤城向水闭,独鸟背人飞。刘长卿《馀干旅舍》

Page 8: MixPoet: Diverse Poetry Generation via Learning ...

01 Background & Motivation

Model each influence factor!

• Gather poems that share the same factor value

• Focus on the characteristics of each poem

instead of each poet

Sparse data

Changeable styles of one poet

Model each individual poet.

factor 1

factor 2

factor 3

factor M style 2

style 4

style 1

style 3

style 5

style N

Page 9: MixPoet: Diverse Poetry Generation via Learning ...

CONTENTS

Background & Motivation

Methodology

Experiment & Conclusion

Jiuge (九歌) System

01

• 04

• 03

• 02

Page 10: MixPoet: Diverse Poetry Generation via Learning ...

02 Methodology

𝑝 𝑥, 𝑦, 𝑧 𝑤 = 𝑝 𝑦 𝑤 ∗ 𝑝 𝑧 𝑤, 𝑦 ∗ 𝑝(𝑥|𝑤, 𝑦, 𝑧)

Overview

poem𝑥

user-specified keyword𝑤

𝑀 𝑦1, … , 𝑦𝑖 , … , 𝑦𝑀factors

• Poetry style is tightly coupled with

semantics! (Embler,1967)

Each factor 𝑦𝑖 is discretized into 𝐾𝑖 classes

𝑖=1

𝑚

𝐾𝑖 factor mixtures

New distinctive styles!

to capture factor-related properties

𝑧 latent variable

𝑝 𝑥, 𝑦 𝑤 = න𝑧

𝑝 𝑥, 𝑦, 𝑧 𝑤 𝑑𝑧

latent space!

Previous work:

[𝑧; 𝑦] (Hu et al., 2017)• Interpretability & controllability

Page 11: MixPoet: Diverse Poetry Generation via Learning ...

user

02 Methodology

𝑝 𝑥, 𝑦, 𝑧 𝑤 = 𝑝 𝑦 𝑤 ∗ 𝑝 𝑧 𝑤, 𝑦 ∗ 𝑝(𝑥|𝑤, 𝑦, 𝑧)

Figure1: A graphical illustration of MixPoet

specify𝑤

Overview

infer

𝑝 𝑦 𝑤𝑧 sample 𝑥

𝑤

determine

determine

𝑦

specify

different mixtures of 𝑦𝑖 intra-topic diversity √

different 𝑤 inferred 𝑦𝑖 inter-topic diversity √

+

+

Page 12: MixPoet: Diverse Poetry Generation via Learning ...

02 Methodology

Semi-Supervised Conditional VAE

For labelled data

log 𝑝 𝑥, 𝑦 𝑤 ≥ −𝐿(𝑥, 𝑦, 𝑤)

= 𝐸𝑞 𝑧 𝑥, 𝑤, 𝑦 log 𝑝 𝑥 𝑧, 𝑤, 𝑦 − 𝐾𝐿[𝑞(𝑧|𝑥, 𝑤, 𝑦)| 𝑝 𝑧 𝑤, 𝑦 + log 𝑝(𝑦|𝑤)

log 𝑝 𝑥 𝑤 ≥ −𝑈(𝑥,𝑤)

= 𝐸𝑞 𝑦 𝑥,𝑤 −𝐿 𝑥, 𝑦, 𝑤 + 𝐻(𝑞(𝑦|𝑥, 𝑤))

Decoder Recognition Network

Prior Network

Classifier

For unlabelled data (following (Kingma et al. 2014))

Classifier

𝐿 = 𝐸𝑝𝑙(𝑥,𝑤,𝑦) 𝐿 𝑥,𝑤, 𝑦 − 𝛼 ∗ log 𝑞 𝑦 𝑥,𝑤 + 𝛽 ∗ 𝐸𝑝𝑢 𝑥,𝑤 [𝑈(𝑥, 𝑤)]

Total loss

Page 13: MixPoet: Diverse Poetry Generation via Learning ...

02 Methodology

Latent Space Mixture

To incorporate multiple factors 𝑧 = [𝑧1, 𝑧2, … , 𝑧𝑀]

(1) independence of influence factors 𝑦1, 𝑦2

(2) conditional independence of subspaces 𝑧1, 𝑧2

e.g., M = 2

𝑝 𝑧 𝑤, 𝑦 = 𝑝 𝑧1 𝑤, 𝑦1 𝑝(𝑧2|𝑤, 𝑦2)

Assume:

Independently specify the values of different factor

A latent space which mixes the properties of multiple factors!

𝑧1 𝑧2

𝑧

𝑧2𝑧1

Page 14: MixPoet: Diverse Poetry Generation via Learning ...

02 Methodology

(1) Mixture for Isotropic Gaussian Space (MixPoet-IG)

Assume 𝑧~𝑁(𝜇, 𝜎2𝐼) 𝐾𝐿[𝑞(𝑧|𝑥, 𝑤, 𝑦)| 𝑝 𝑧 𝑤, 𝑦

= 𝐾𝐿[𝑞(𝑧1|𝑥, 𝑤, 𝑦1)| 𝑝 𝑧1 𝑤, 𝑦1+𝐾𝐿[𝑞(𝑧2|𝑥, 𝑤, 𝑦2)| 𝑝 𝑧2 𝑤, 𝑦2

Analytically minimize

these two KL terms!

Independent dimensions of the latent variable Not expressive enough(Dilokthanakul et al. 2017)

A complex latent space with

(1) independent subspaces (𝑧1, 𝑧2)

(2) entangled internal dimensions of each subspace

Independent control of each factor

Expressive and distinguishable latent representations

Page 15: MixPoet: Diverse Poetry Generation via Learning ...

02 Methodology

(2) Adversarial Mixture for Universal Space (MixPoet-AUS)

Universal Approximator (Makhzani et al. 2015)

𝑞 𝑧 𝑐, 𝜂 = 𝛿(𝑧 − 𝑓(𝑐, 𝜂))

𝜂~𝑁(0,1)sample 𝑓transformation

e.g., MLP𝑤, 𝑦1condition

𝑧1~𝑞(𝑧1|𝑤, 𝑦1)

Independent samples of 𝑧1, 𝑧2 from learned arbitrary complex distributions!get√

𝜂~𝑁(0,1)

simple distributioncondition

A learned arbitrary complex distribution

Page 16: MixPoet: Diverse Poetry Generation via Learning ...

= 𝐸𝑞 𝑧 𝑥, 𝑤, 𝑦1, 𝑦2 [log𝑞 𝑧 𝑥, 𝑤, 𝑦1, 𝑦2

𝑝 𝑧1 𝑤, 𝑦1 𝑝(𝑧2|𝑤, 𝑦2)]

≈ 𝐸𝑞 𝑧 𝑥, 𝑤, 𝑦1, 𝑦2 [log𝐶(𝑧, 𝑦1, 𝑦2)

1 − 𝐶( 𝑧1; 𝑧2 , 𝑦1, 𝑦2)]

𝐾𝐿[𝑞(𝑧|𝑥, 𝑤, 𝑦1, 𝑦2)| 𝑝 𝑧1 𝑤, 𝑦1 𝑝(𝑧2|𝑤, 𝑦2)

02 Methodology

(2) Adversarial Mixture for Universal Space (MixPoet-AUS)

𝐾𝐿[𝑞(𝑧|𝑥, 𝑤, 𝑦)| 𝑝 𝑧 𝑤, 𝑦 No analytical form!

Density Ratio Loss (Rosca et al. 2017)

Conditional latent discriminator

Page 17: MixPoet: Diverse Poetry Generation via Learning ...

02 Methodology

(2) Adversarial Mixture for Universal Space (MixPoet-AUS)

𝑧1~𝑞(𝑧1|𝑤, 𝑦1)

𝑧2~𝑞(𝑧2|𝑤, 𝑦2)𝑧~𝑞(𝑧|𝑥, 𝑤, 𝑦1, 𝑦2) 𝑧

𝑧1

𝑧2

𝑧2𝑧1

Use adversarial training to

minimize the ratio loss

𝐶

𝑧 𝑧2𝑧1

𝑦1, 𝑦2

1/0

When the discriminator is cheated

𝐶(∙) ≈ 0.5

𝐾𝐿[𝑞(𝑧|𝑥, 𝑤, 𝑦1, 𝑦2)| 𝑝 𝑧1 𝑤, 𝑦1 𝑝 𝑧2 𝑤, 𝑦2 ≈ 0

Page 18: MixPoet: Diverse Poetry Generation via Learning ...

CONTENTS

Background & Motivation

Methodology

Experiment & Conclusion

Jiuge (九歌) System

01

• 04

• 03

• 02

Page 19: MixPoet: Diverse Poetry Generation via Learning ...

03 Experiment & Analysis

Dataset

Two typical factors

Living Experience

(Dilthey 1985)

Historical Background

(Owen 1990)

Military Career (MC)

Countryside Life (CL)

Others

Prosperous Times (PT)

Troubled Times (TT)

Table 1: Statistics of labelled poems.

• 49,451 labelled poems

• 117k unlabelled poems

3 ∗ 2 = 6 styles!

Page 20: MixPoet: Diverse Poetry Generation via Learning ...

03 Experiment & Analysis

Experiment Results

Figure 2: Factor control accuracy.

Table 3: Human evaluation results of quality.

Table 2: Automatic evaluation results of diversity.

controllable

to some

extent!

Page 21: MixPoet: Diverse Poetry Generation via Learning ...

Acc

um

ula

tive

Fre

quen

cy

Most Frequent K Words

Figure 4 (b): Accumulative frequency of the most

frequent k words in generated poems.

03 Experiment & Analysis

Analysis

Figure 4 (a): Visualization of samples of z conditioned

on the keyword ‘spring wind’ and different mixtures.

Page 22: MixPoet: Diverse Poetry Generation via Learning ...

03 Experiment & Analysis

Case Study

Figure 5: Poems generated by baseline Basic (a), USPG (b) and MixPoet with different mixtures (c, d). Repetitive

phrases are marked in red. Phrases meeting different factor classes are marked in corresponding colors.

Page 23: MixPoet: Diverse Poetry Generation via Learning ...

03 Experiment & Analysis

General Comparison

Models Inter-topic

diversity

Intra-topic

diversity

Main contributor to

diversity

Interpretability of

styles

CVAE (Yang et al., 2018a) √ √ Randomness resulted by

different samples from the

latent space.

×

MRL (Yi et al., 2018) √ √ √ √ × Encourage the generation of

high-TF-IDF words.×

USPG (Yang et al., 2018b) √ √ √ √ Disentangle poetry space into

different clusters×

MixPoet (Ours) √ √ √ √ √ √ Mix different influence

factors conditioned latent

subspaces.

Different factor

mixture

MixPoet

• Best intra-topic diversity

• Satisfactory inter-topic diversity

• Interpretable and distinguishable styles

Page 24: MixPoet: Diverse Poetry Generation via Learning ...

03 Experiment & Analysis

Conclusion

A novel poetry generation model – MixPoet

Semi-supervised structure

Absorb and mix multiple factors which influence human poetry composition.

Controllability of factor-related properties

More distinguishable latent space

Improve both intra-topic & inter-topic diversity

Future Work

More factors, e.g., love experience, school of literary, gender, and age

Finer-granularity discretization, e.g., more classes for each factor (even continuous values?)

Dependence of influence factors, e.g., living experience is related to gender

Page 25: MixPoet: Diverse Poetry Generation via Learning ...

CONTENTS

Background & Motivation

Methodology

Experiment & Conclusion

Jiuge (九歌) System

01

• 04

• 03

• 02

Page 26: MixPoet: Diverse Poetry Generation via Learning ...

Jiuge

A Chinese poetry generation system developed by THUNLP lab.

• Support most popular genres of Chinese poetry

• Multi-style choices

• Human-machine collaborative creation

• Automatic Recommendation of related human-

authored poems.

• Page View > 7 million

(九歌)

Page 27: MixPoet: Diverse Poetry Generation via Learning ...

Jiuge (九歌)

https://jiuge.thunlp.cn/Online system:

Github: https://github.com/thunlp-aipoet

Source codes and data and of Jiuge systems will be

gradually released via our github page!

Page 28: MixPoet: Diverse Poetry Generation via Learning ...

Thanks!

Xiaoyuan Yi PhD Student

THUNLP Lab, Tsinghua University

Mail: [email protected]

[email protected]

Any questions or suggestions, please feel free to email us!