
• Article •

Emotional dialogue generation via multiple classifiers based on a generative adversarial network

Wei CHEN, Xinmiao CHEN, Xiao SUN*

School of Computer and Information, Hefei University of Technology, Hefei, 230601 China

* Corresponding author, [email protected]

Abstract Background Human-machine dialogue generation is an essential research direction of natural language processing. Generating high-quality, diverse, fluent, and emotional conversations is a challenging task. With the continuous advancement of research on artificial intelligence and deep learning, the end-to-end neural network model provides an extensible framework for conversation generation, making it possible for machines to understand semantics and automatically generate responses. However, the neural network model also brings new problems and challenges: the basic conversational model tends to produce universal, meaningless, and relatively "safe" answers. Methods Based on the generative adversarial network (GAN), a new emotional dialogue generation framework, EMC-GAN, is proposed to complete the task of emotional dialogue generation. The model includes one generative model and three discriminative models. The generator is based on the basic sequence-to-sequence (Seq2Seq) dialogue generation model, while the discriminative part of the framework consists of a basic discriminative model, an emotion discriminative model, and a fluency discriminative model. The basic discriminative model distinguishes generated (fake) sentences from real sentences in the training corpus. The emotion discriminative model evaluates whether the emotion of the generated dialogue text matches the specified emotion and directs the generative model to generate dialogue text of the specified emotion category. The fluency discriminative model scores the fluency of the generated dialogue and guides the generator to produce more fluent sentences. Results The experimental results show that our model is superior to other similar models in emotion accuracy, fluency, and consistency. Conclusions The proposed EMC-GAN model can generate consistent, smooth, and fluent dialogue text with a specified emotion, and achieves better performance on emotion accuracy, consistency, and fluency.

Keywords Emotional dialogue generation; Sequence to sequence model; Emotion classification;

Generative adversarial networks; Multiple classifier

1 Introduction


The technology related to human-machine dialogue has been used in many products, such as intelligent voice assistants and online customer service, so people have put forward higher requirements and expectations for the quality of human-machine dialogue. There are many related lines of research on dialogue systems, such as dialogue systems with commonsense knowledge[1], dialogue systems with audio context[2], latent-variable task-oriented dialogue systems[3], and dialogue systems combining text and images[4]. For more related research, please refer to the survey of Ma et al.[5] At present, dialogue generation mainly follows three approaches: rule-based systems[6], information-retrieval systems[7], and generation-based systems. This work is based on the latter approach. On the machine translation task, a large amount of research has been done on the Seq2Seq model, such as implementations using the recurrent neural network (RNN)[8], long short-term memory (LSTM)[9], and the attention mechanism[10]. Vinyals et al. first applied the Seq2Seq structure to the dialogue generation task[11].

The basic Seq2Seq model has a primary drawback when used to generate text: the performance of the model is usually evaluated only at the sentence level. Since then, many studies have tried to solve these problems using generative adversarial networks (GAN)[12], which have achieved great success in computer vision. Yu et al. proposed a framework better suited to generating conversations based on GAN, called SeqGAN[13]. By modeling the data generator as a stochastic policy in reinforcement learning[14-15], SeqGAN bypasses the generator differentiation problem by performing policy gradient updates directly. Li et al. proposed using adversarial training based on reinforcement learning for open-domain dialogue generation[16]. Cui et al. proposed the Dual Adversarial Learning (DAL) framework, which improves both the diversity and the overall quality of the generated responses[17].

People with emotional intelligence can know and show their own emotions, identify others' emotions, control emotions, and use feelings and emotions to spur adaptive behavior[18]. It is equally essential to give machines emotion in human-machine dialogue. Ghosh et al. proposed an LSTM-based model for generating text with emotion[19]. Rashkin et al. introduced a new dataset with emotional annotations that can be used to provide retrieval candidates or to fine-tune the dialogue model, leading to more empathetic responses[20]. The Emotional Chatting Machine, proposed by Zhou et al., can generate dialogue texts that are appropriate both in content and in emotion[21]. Wang et al. proposed the SentiGAN framework, which enables the model to generate diverse, high-quality texts with specific sentiment labels through a penalty mechanism[22]. In previous work, we presented a model built on LSTM and changed the training corpus to handle the emotion factors in dialogue texts: the input was adapted to the original sentence together with the sentence carrying the emotion label, and the sentence with the emotion label was used as the output[23].

We introduce a new emotional dialogue generation model based on a generative adversarial network (EMC-GAN) to implement the emotional dialogue generation task in this work. Since it is challenging for the basic dialogue generation model to capture the emotional features of dialogue text, we address this problem by decomposing the emotional dialogue task. Several different models are trained to generate dialogue texts with different emotions, and each model focuses on creating one kind of emotional dialogue text. In this way, the interference and influence of other emotions are excluded when generating dialogue text of the specified emotion, which improves the accuracy of generating dialogue texts with the specified emotion. The proposed framework


includes a generative model and multiple discriminative models. The generative model is constructed on the basis of the basic Seq2Seq dialogue generation model[24]; the discriminative part of the framework is composed of a basic discriminative model, an emotion discriminative model, and a fluency discriminative model. These models serve, respectively, to distinguish the generated text from the original text and to guide the generated dialogue text to be more fluent and to carry a specific emotion. The resulting EMC-GAN model can produce coherent, smooth, and fluent dialogue texts with the specified emotion, and performs better than other models in emotional accuracy, coherence, and fluency.

Figure 1 EMC-GAN overall framework.

2 Methods

The proposed emotional dialogue generation framework EMC-GAN includes one generative model and three discriminative models. The generative model $G_e(Y|X;\theta_g^e)$ is a dialogue generation model based on the basic Seq2Seq model. It generates coherent and fluent target sentences with the specified emotion category $e$ for the input source sentences. The discriminative part of EMC-GAN includes a basic discriminative model $D_e(X,Y;\theta_d^e)$, an emotion discriminative model $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$, and a fluency discriminative model $D_e^{\mathrm{fluency}}(X,Y;\theta_d^e)$. The basic discriminative model plays the same role as in a general GAN-based dialogue generation model: it distinguishes generated fake sentences from real sentences in the training corpus and guides the generator to generate dialogue texts that are closer to human dialogue texts. The emotion discriminative model is a binary classifier of text sequences that can distinguish whether the emotion of the generated dialogue text matches the specified emotion $e$; it gives the confidence probability that the emotion category of the input dialogue text is the specified emotion category. The fluency discriminative model scores the fluency of input dialogue texts and guides the generator to create more fluent dialogue texts.

2.1 Generative model for emotional dialogue text


The goal of the generative model $G_e(Y|X;\theta_g^e)$ is to generate the target sequence with emotion $e$ for the input source sequence, where $\theta_g^e$ is the parameter set of the generative model. At each time-step $t$, the generative model $G_e(Y|X;\theta_g^e)$ produces a sentence sequence $S_t = Y_{1:t} = \{y^{<1>}, y^{<2>}, \ldots, y^{<t>}\}$, where $y^{<t>}$ is a word token in the existing vocabulary. Eq. (1) and Eq. (2) give the penalty-based loss function[22]:

$$V_{G_e}(S_t, y^{<t+1>}) = \lambda_1 \cdot V_e(S_t, y^{<t+1>}) + \lambda_2 \cdot V_e^{\mathrm{emotion}}(S_t, y^{<t+1>}) + \lambda_3 \cdot V_e^{\mathrm{fluency}}(S_t, y^{<t+1>}) \quad (1)$$

where $V_{G_e}(S_t, y^{<t+1>})$ is the total penalty score for the sentence sequence calculated by the multiple discriminative models, $V_e(S_t, y^{<t+1>})$ is the penalty score calculated by the basic discriminative model $D_e(X,Y;\theta_d^e)$, $V_e^{\mathrm{emotion}}(S_t, y^{<t+1>})$ is the penalty score for the emotion of the generated dialogue text calculated by the emotion discriminative model $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$, and $V_e^{\mathrm{fluency}}(S_t, y^{<t+1>})$ is the penalty score for the fluency of the sentence calculated by the fluency discriminative model $D_e^{\mathrm{fluency}}(X,Y;\theta_d^e)$, with $\lambda_1 + \lambda_2 + \lambda_3 = 1$. In this paper, we set $\lambda_1 = 0.5$ and $\lambda_2 = \lambda_3 = 0.25$. The loss function $L(y^{<t+1>})$ based on the penalty score is:

$$L(y^{<t+1>}) = G_e(y^{<t+1>}|S_t;\theta_g^e) \cdot V_{G_e}(S_t, y^{<t+1>}) \quad (2)$$

where $G_e(y^{<t+1>}|S_t;\theta_g^e)$ is the probability of choosing the $(t+1)$-th word given the sequence $S_t$. The generative model uses the penalty to minimize the loss:

$$J_{G_e}(\theta_g^e) = \mathbb{E}_{Y\sim P_{G_e}}[L(y)] = \sum_{t=0}^{T_y-1} L(y^{<t+1>}) \quad (3)$$

The penalty is calculated as follows:

$$V_e(S_t, y^{<t+1>}) = \begin{cases} \dfrac{1}{N}\displaystyle\sum_{n=1}^{N}\left(1 - D_e(X, Y_{t+1};\theta_d^e)\right), & t < T_y \\[2mm] 1 - D_e(X, Y_{t+1};\theta_d^e), & t = T_y \end{cases} \quad (4)$$

where $T_y$ is the maximum length of the target sequence and $N$ is the number of Monte Carlo search samples. The penalty score of a partial sequence is calculated as the average over multiple samples to reduce the error introduced by sampling.
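To make the penalty-based objective concrete, the following is a minimal NumPy sketch of Eqs. (1)-(4). It assumes the three discriminators are supplied as black-box scoring functions that return probabilities in [0, 1]; the weights, rollout function, and sample count are placeholders rather than the authors' implementation.

```python
import numpy as np

def penalty_score(src, partial_resp, next_token, rollout_fn, discriminators,
                  weights=(0.5, 0.25, 0.25), n_samples=8, is_final=False):
    """Total penalty V_{G_e}(S_t, y_{t+1}) of Eq. (1), combining Eq. (4)-style
    scores from the (assumed) basic, emotion, and fluency discriminators."""
    seq = partial_resp + [next_token]
    scores = []
    for d in discriminators:  # order: (basic, emotion, fluency)
        if is_final:
            # Complete sequence: single evaluation (second case of Eq. (4)).
            scores.append(1.0 - d(src, seq))
        else:
            # Partial sequence: average over Monte Carlo completions (first case).
            rollouts = [rollout_fn(src, seq) for _ in range(n_samples)]
            scores.append(np.mean([1.0 - d(src, y) for y in rollouts]))
    return float(np.dot(weights, scores))  # lambda_1..lambda_3 weighting

def generator_loss(token_probs, penalties):
    """Eqs. (2)-(3): sum over time-steps of p(y_{t+1} | S_t) * V_{G_e}(S_t, y_{t+1})."""
    return float(np.sum(np.asarray(token_probs) * np.asarray(penalties)))
```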

2.1.1 The baseline model of dialogue generation

The basic Seq2Seq model is used as the benchmark model. This model uses an encoder-decoder network with deep LSTM units as the underlying architecture for dialogue text generation. Adding an effective attention mechanism to the model captures more correspondence information between the source sentence and the target sentence. The overall structure of this model is shown in Figure 2. The dialogue generation model in this paper has the same network structure as the Seq2Seq baseline model.

Both the encoder and the decoder are implemented with LSTM. For the encoder, at each time-step a token of the source sequence is input to the encoder network; after the input is complete, the encoder generates a semantic vector $C$ over all time-step inputs, which represents the input source sequence. The decoder's first state is provided by the generated semantic vector $C$. The decoder decodes the semantic vector and outputs a token $y^{<t>}$ at each time-step. In this way, the encoder-decoder-based dialogue generation framework produces the output sequence $Y = (y^{<1>}, y^{<2>}, \ldots, y^{<T_y>})$ for the input sequence $X = (x^{<1>}, x^{<2>}, \ldots, x^{<T_x>})$.

Figure 2 Encoder-Decoder with attention mechanism.
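As a reference point, the following is a minimal tf.keras sketch of such an LSTM encoder-decoder (without attention); the vocabulary size and layer widths are illustrative assumptions, not hyperparameters taken from the paper.

```python
import tensorflow as tf

vocab_size, emb_dim, hidden_dim = 20000, 128, 256  # illustrative sizes

# Encoder: embeds the source tokens; the final LSTM state plays the role of
# the semantic vector C described above.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(hidden_dim, return_state=True)(enc_emb)

# Decoder: initialised with the encoder state, predicts the next token at each step.
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(vocab_size, emb_dim)(dec_inputs)
dec_out, _, _ = tf.keras.layers.LSTM(
    hidden_dim, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(vocab_size)(dec_out)

seq2seq = tf.keras.Model([enc_inputs, dec_inputs], logits)
seq2seq.compile(optimizer="adam",
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

During training, the decoder input would be the target sequence shifted right (teacher forcing); the attention mechanism described in the next subsection is added on top of this skeleton.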

2.1.2 Attention mechanism of the generative model

The core task of the attention mechanism is to compute the context vector. The context vector $\mathrm{context}^{<t>}$ directs the decoder during decoding, telling it which word tokens of the input sentence should be focused on. Figure 3 shows the calculation of the context vector in the attention mechanism. To preserve the hidden states of the source sequence, the Seq2Seq model with attention uses a bidirectional LSTM network[25] to extract the hidden state of the source sequence at each time-step. In the bidirectional LSTM network, as Eq. (5) shows:

$$a^{<t>} = \begin{pmatrix} \overrightarrow{a}^{<t>} \\ \overleftarrow{a}^{<t>} \end{pmatrix} \quad (5)$$

$a^{<t>}$ contains two parts, $\overrightarrow{a}^{<t>}$ and $\overleftarrow{a}^{<t>}$, representing the forward sequence feature and the backward sequence feature, respectively. The two hidden state vectors are calculated as follows:

$$\overrightarrow{a}^{<t>} = \mathrm{BiLSTM}_{\mathrm{Pre}}\left(\overrightarrow{a}^{<t-1>}, x^{<t>}\right) \quad (6)$$
$$\overleftarrow{a}^{<t>} = \mathrm{BiLSTM}_{\mathrm{Post}}\left(\overleftarrow{a}^{<t+1>}, x^{<t>}\right) \quad (7)$$

Through the intermediate state vector $a^{<t>}$ we obtain not only the historical information of the sequence before time-step $t$ but also the future information of the sequence. To distinguish the intermediate state vector of the decoder from that of the encoder, we use $S^{<t>}$ to denote the decoder state at time-step $t$. The intermediate state vector of the decoder at the previous time-step, $S^{<t-1>}$, is concatenated with the intermediate state vector of the encoder at each time-step:
$$e^{<t,t'>} = \left(S^{<t-1>}, a^{<t'>}\right) \quad (8)$$


where the vector $e^{<t,t'>}$ represents the concatenation of the intermediate state vector of the decoder at time-step $t-1$ and the intermediate state vector of the encoder at time-step $t'$. As Eq. (9) shows, the attention weight $\alpha^{<t,t'>}$ represents the degree to which the decoder at time-step $t$ focuses on the intermediate state vector $a^{<t'>}$ of the encoder at time-step $t'$. As Eq. (10) shows, the context vector $\mathrm{context}^{<t>}$ is obtained by multiplying each attention weight $\alpha^{<t,t'>}$ with its corresponding hidden state vector $a^{<t'>}$ and summing them up from time-step 1 to $T_x$.

Figure 3 Context vector in attention mechanism.

$$\alpha^{<t,t'>} = \frac{\exp\left(e^{<t,t'>}\right)}{\sum_{\tau=1}^{T_x}\exp\left(e^{<t,\tau>}\right)} \quad (9)$$
$$\mathrm{context}^{<t>} = \sum_{t'=1}^{T_x} \alpha^{<t,t'>} a^{<t'>} \quad (10)$$
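For illustration, here is a small NumPy sketch of Eqs. (8)-(10); a simple dot-product score stands in for the unspecified scoring function that maps each concatenated pair $e^{<t,t'>}$ to a scalar, so that scoring choice is an assumption.

```python
import numpy as np

def attention_context(decoder_state_prev, encoder_states):
    """Compute attention weights (Eq. 9) and the context vector (Eq. 10).

    decoder_state_prev: shape (hidden,)      -- S^{<t-1>}
    encoder_states:     shape (T_x, hidden)  -- a^{<t'>} for t' = 1..T_x
    """
    # Scalar score per source position; a dot product stands in for the
    # learned scoring of the concatenated pair (S^{<t-1>}, a^{<t'>}).
    scores = encoder_states @ decoder_state_prev          # (T_x,)
    # Softmax over the source positions (Eq. 9).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of encoder states (Eq. 10).
    context = weights @ encoder_states                    # (hidden,)
    return weights, context

# Example: 5 source positions, hidden size 4 (toy values).
rng = np.random.default_rng(0)
alpha, ctx = attention_context(rng.normal(size=4), rng.normal(size=(5, 4)))
```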

2.2 Discriminative model

Deep discriminative models implemented with convolutional neural networks (CNN)[26] and recurrent convolutional neural networks (RCNN)[27] perform well on complex sequence classification tasks. We use a CNN as the core structure of the discriminative models in this work. In addition, a highway network[28] is added to the discriminative models to improve the training speed. The emotion discriminative model and the fluency discriminative model are both pre-trained and do not participate in the adversarial training process of the model.

2.2.1 Basic discriminative model


The CNN-based text classification model proposed by Zhang and LeCun[29] is used as the main structure of the basic discriminative model $D_e(X,Y;\theta_d^e)$, which is employed to distinguish generated fake sentences from real sentences in the dataset. The loss function of the basic discriminative model is the binary cross-entropy:
$$J_{D_e}(\theta_d^e) = -\left[y\cdot\log(p) + (1-y)\cdot\log(1-p)\right] \quad (11)$$
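A minimal tf.keras sketch of such a CNN text classifier trained with the binary cross-entropy of Eq. (11) is given below; the filter widths, embedding size, and simplified highway-style gate are illustrative assumptions rather than the exact configuration used by the authors.

```python
import tensorflow as tf

vocab_size, emb_dim, seq_len = 20000, 128, 40   # assumed sizes

tokens = tf.keras.Input(shape=(seq_len,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, emb_dim)(tokens)

# Parallel convolutions over n-gram windows, then max-over-time pooling.
pooled = []
for width in (2, 3, 4):
    c = tf.keras.layers.Conv1D(64, width, activation="relu")(x)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(c))
h = tf.keras.layers.Concatenate()(pooled)

# Simplified highway-style gate: output = gate*transform + (1-gate)*input.
transform = tf.keras.layers.Dense(h.shape[-1], activation="relu")(h)
gate = tf.keras.layers.Dense(h.shape[-1], activation="sigmoid")(h)
h = gate * transform + (1.0 - gate) * h

prob_real = tf.keras.layers.Dense(1, activation="sigmoid")(h)
discriminator = tf.keras.Model(tokens, prob_real)
discriminator.compile(optimizer="adam", loss="binary_crossentropy")  # Eq. (11)
```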

The adversarial training process of the proposed emotional dialogue generation model EMC-GAN is shown in Table 1.

Table 1 The adversarial training of EMC-GAN

Algorithm 1 Adversarial training of the model
Input: $G_e(Y|X;\theta_g^e)$; $D_e(X,Y;\theta_d^e)$, $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$, $D_e^{\mathrm{fluency}}(X,Y;\theta_d^e)$;
       real dialogue texts (the emotion category of target sentence $Y$ is $e$): R{X,Y}
Output: trained dialogue generator $G_e(Y|X;\theta_g^e)$
1:  Initialize $G_e(Y|X;\theta_g^e)$ and $D_e(X,Y;\theta_d^e)$ with random weights;
2:  Pre-train $G_e(Y|X;\theta_g^e)$ using MLE on training data R{X,Y};
3:  Generate fake dialogue texts F{X,Y} using $G_e(Y|X;\theta_g^e)$;
4:  Pre-train $D_e(X,Y;\theta_d^e)$ using {R,F};
5:  while the model has not converged do
6:      for each generative step do
7:          Generate fake dialogue texts F using $G_e(Y|X;\theta_g^e)$
8:          Calculate the penalty $V_{G_e}$ by Eq. (1)
9:          Update $G_e(Y|X;\theta_g^e)$ by Eq. (3)
10:     end
11:     for each discriminative step do
12:         Generate fake dialogue texts F using $G_e(Y|X;\theta_g^e)$
13:         Update $D_e(X,Y;\theta_d^e)$ using {R,F} by Eq. (11)
14:     end
15: end
16: return

2.2.2 Emotion discriminative model

To generate dialogue responses with the specified emotion, the emotion discriminative model $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$ guides the generative model to generate, for the input source sentence, a target sentence with the specified emotion. The emotion discriminative model can distinguish whether the emotion category of the input dialogue text is the specified emotion category. The training process of the emotion discriminative model is shown in Table 2. For the emotion discriminative model $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$ with specified emotion $e = e_3$, for example, the real dialogue texts are $R = \{\mathrm{Dialogue}_{e_3}\}$ and the fake dialogue texts are those of the other emotion categories, $F = \{\mathrm{Dialogue}_{e_1}, \mathrm{Dialogue}_{e_2}, \ldots, \mathrm{Dialogue}_{e_6}\}$. The emotion discriminator is used to discriminate between the real dialogue texts $R$ and the fake dialogue texts $F$, and it gives the confidence probability that the input dialogue text is real. This model is similar to the basic discriminative model and is trained in advance. Its emotion accuracy across the different categories is about 70%-85%, which the experimental results show is sufficient to guide the generative model to generate sentences with the desired emotion.

Table 2 The training process of the emotion discriminative model

Algorithm 2 Emotion discriminative model training
Input: $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$;
       real dialogue texts (R) with emotion category $e$: R{$\mathrm{Dialogue}_e$};
       fake dialogue texts (F) with other emotion categories: F{$\mathrm{Dialogue}_{e_1}$, $\mathrm{Dialogue}_{e_2}$, ...}
Output: trained emotion discriminator $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$
1: Initialize $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$ with random weights;
2: while the model has not converged do
3:     for each emotion discriminative step do
4:         Update $D_e^{\mathrm{emotion}}(X,Y;\theta_d^e)$ using {R,F} by Eq. (11)
5:     end
6: end
7: return

2.2.3 Fluency discriminative model

The sentence fluency evaluation algorithm in this work is based on the sentence fluency evaluation method proposed by Liu[30]. The process of sentence fluency evaluation is shown in Table 3; the algorithm employs an N-gram statistical language model[31] to evaluate the fluency of sentences, using the transition probabilities of all trigrams to measure the fluency of the whole sentence. First, we count all bigrams in the dialogue texts of the dataset and their numbers of occurrences and save the results in n_gram2_count, taking the bigram as the dictionary key and its number of occurrences as the value. All trigrams and their numbers of occurrences are counted in the same way and saved in n_gram3_count. The transition probabilities of all trigrams are then calculated from the dictionaries n_gram2_count and n_gram3_count, as shown:

$$p(x_i, x_j, x_k) = p(x_k \mid x_i, x_j) = \frac{\mathrm{count}(x_i, x_j, x_k)}{\mathrm{count}(x_i, x_j)} \quad (12)$$

where $x_i$, $x_j$, and $x_k$ are adjacent words in the sentence. The calculated results are saved in n_gram3_prob. Finally, the trigrams are sorted in descending order of transition probability and saved to the list sorted_n_gram3_prob. In general, sentences whose n-gram tuples have higher transition probabilities are more fluent. Two transition probability thresholds, reward_prob and penalty_prob, are used to decide whether the currently generated sentence is fluent: the first 40% of trigrams in sorted_n_gram3_prob are considered smooth and the last 20% are considered awkward, so reward_prob is the minimum transition probability among the first 40% of trigrams, and penalty_prob is the maximum transition probability among the last 20% of trigrams.

When evaluating the fluency of a sentence $X = \{x_1, x_2, \ldots, x_m\}$, the fluency score is initialized to $fluency(X) = 0$ and all trigrams are traversed. If the length of the sentence is less than 3, the algorithm directly sets the fluency score of this short sentence to 0, because we do not expect the model to use such short sentences as responses to the input source sentence. If the transition probability of the current trigram is higher than reward_prob, the trigram is relatively smooth, and the ratio of its transition probability to reward_prob is added to the current fluency score $fluency(X)$. If the transition probability of the current trigram is lower than penalty_prob, the trigram is relatively awkward, and the ratio of its transition probability to penalty_prob is subtracted from the current fluency score $fluency(X)$. If the bigram corresponding to a trigram does not exist, the trigram is assigned a tiny transition probability (0.02) when calculating fluency. Finally, the computed fluency score $fluency(X)$ divided by the number of trigrams in the sentence is the final result.

Table 3 The fluency score of dialogue computed by the fluency discriminative model

Algorithm 3 Calculate fluency score
Input: corpus; sentence $X = \{x_1, x_2, \ldots, x_m\}$ to be evaluated
Output: the fluency score of the input sentence
1:  Count the bigrams of the dialogue corpus
2:  Bigram count dict n_gram2_count = {"[x_i, x_j]": the number of [x_i, x_j] in corpus}
3:  Count the trigrams of the dialogue corpus
4:  Trigram count dict n_gram3_count = {"[x_i, x_j, x_k]": the number of [x_i, x_j, x_k] in corpus}
5:  Calculate the trigram transition probabilities by Eq. (12)
6:  Trigram transition probability dict n_gram3_prob = {"[x_i, x_j, x_k]": p(x_i, x_j, x_k)}
7:  Sort n_gram3_prob by transition probability in descending order and save the sorted probabilities to the list sorted_n_gram3_prob
8:  size = len(sorted_n_gram3_prob)
9:  reward_prob = sorted_n_gram3_prob[int(size * 0.4)]
10: penalty_prob = sorted_n_gram3_prob[int(size * (1 - 0.2))]
11: p(x_i | Tri-gram) = p(x_i, x_{i+1}, x_{i+2})
12: fluency(X) = fluency({x_1, x_2, ..., x_m}) = 0
13: if m < 3 then
14:     return fluency(X)
15: for i = 1 to m - 2 do
16:     if p(x_i | Tri-gram) >= reward_prob then
17:         fluency(X) += p(x_i | Tri-gram) / reward_prob
18:     else if p(x_i | Tri-gram) <= penalty_prob then
19:         fluency(X) -= p(x_i | Tri-gram) / penalty_prob
20: end
21: fluency(X) = fluency(X) / (m - 2)
22: return fluency(X)
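The following is a runnable Python sketch of Algorithm 3 under stated assumptions: the corpus is a list of pre-tokenized sentences, and the 0.02 fallback probability for unseen trigrams matches the value quoted in the text; the variable names mirror the algorithm but are otherwise illustrative.

```python
from collections import Counter

UNSEEN_PROB = 0.02  # fallback transition probability for unseen trigrams

def build_ngram_stats(corpus):
    """corpus: list of token lists. Returns trigram transition probabilities
    and the (reward_prob, penalty_prob) thresholds from Algorithm 3."""
    bigrams, trigrams = Counter(), Counter()
    for sent in corpus:
        bigrams.update(zip(sent, sent[1:]))
        trigrams.update(zip(sent, sent[1:], sent[2:]))
    probs = {t: c / bigrams[t[:2]] for t, c in trigrams.items()}   # Eq. (12)
    ordered = sorted(probs.values(), reverse=True)
    reward_prob = ordered[int(len(ordered) * 0.4)]
    penalty_prob = ordered[int(len(ordered) * 0.8)]
    return probs, reward_prob, penalty_prob

def fluency(sentence, probs, reward_prob, penalty_prob):
    """Fluency score of a token list, following Algorithm 3."""
    m = len(sentence)
    if m < 3:
        return 0.0
    score = 0.0
    for tri in zip(sentence, sentence[1:], sentence[2:]):
        p = probs.get(tri, UNSEEN_PROB)
        if p >= reward_prob:
            score += p / reward_prob
        elif p <= penalty_prob:
            score -= p / penalty_prob
    return score / (m - 2)

# Example with a toy corpus of tokenized sentences.
corpus = [["i", "love", "you", "very", "much"], ["i", "love", "it"],
          ["you", "love", "me"], ["i", "really", "love", "you"]]
probs, rp, pp = build_ngram_stats(corpus)
print(fluency(["i", "love", "you"], probs, rp, pp))
```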

3 Experiments

In the process of emotional dialogue generation, the generative model generates target sentences

according to the input source sentences and a specified emotion category. The resulting sentence sequence

should be consistent, fluent, and have the specified emotion category.

3.2 Datasets

The dialogue dataset is a series of dialogue pairs with emotion category labels $\{X, Y\}$, where $X = \{e_x, x^{<1>}, x^{<2>}, \ldots, x^{<T_x>}\}$ is the source sentence sequence, $Y = \{e_y, y^{<1>}, y^{<2>}, \ldots, y^{<T_y>}\}$ is the target sentence sequence (the dialogue response), and $e_x$ and $e_y$ are the emotion category labels of the source sentence and the target sentence, respectively. The generative model is intended to produce a target sentence with a specified emotion for an input source sentence with any emotion. To construct the corresponding dataset for each emotion-specific generative model, we divide the dataset into multiple sub-datasets according to the emotion of the target sentence; within each sub-dataset, all target sentence sequences have the same emotion label $e_y$.

NLPCC Weibo (NLPW): this dataset is built from conversations extracted from Weibo comments and contains 1,119,200 dialogue pairs with six emotion category labels (anger, disgust, happiness, like, sadness, and other), similar to our previous work[23].

Xiaohuangji (XHJ): this dataset contains 454,130 dialogue pairs. However, this dialogue corpus has no corresponding emotion labels. We therefore use the open-source natural language processing tool HanLP[32] to train an emotion classification model, a naive Bayes classifier trained with the NLPW dataset, which classifies each sentence into one of the six emotions of the NLPW dataset.
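As a rough illustration of this labeling step, the sketch below trains a naive Bayes emotion classifier with scikit-learn as a stand-in for the HanLP classifier actually used in the paper; the tiny in-memory sample data and whitespace-tokenized input are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Assumed format: pre-segmented NLPW sentences with labels, XHJ sentences without.
nlpw_sentences = ["今天 好 开心", "真 讨厌 这个 天气"]
nlpw_labels = ["happiness", "disgust"]
xhj_sentences = ["我 也 想 去", "太 生气 了"]

# Bag-of-words + multinomial naive Bayes, standing in for the HanLP classifier.
clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"), MultinomialNB())
clf.fit(nlpw_sentences, nlpw_labels)

# Pseudo-label the unlabeled XHJ dialogue texts.
xhj_labels = clf.predict(xhj_sentences)
print(list(zip(xhj_sentences, xhj_labels)))
```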

The distributions of the two datasets, NLPW and XHJ, over the different emotion categories are shown in Figure 4. Table 4 shows the emotion distribution of the different emotion sub-datasets.


Figure 4 Dataset emotion distribution.

Table 4 Sub-dataset emotion distribution (rows: source emotion; columns: target emotion)

Dataset  Source emotion   Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Other            0.419   0.271   0.290    0.301    0.291   0.276
         Like             0.187   0.381   0.182    0.180    0.159   0.276
         Sadness          0.095   0.083   0.212    0.098    0.111   0.099
         Disgust          0.149   0.110   0.145    0.259    0.199   0.120
         Anger            0.065   0.040   0.064    0.080    0.147   0.057
         Happiness        0.085   0.115   0.108    0.082    0.094   0.171
XHJ      Other            0.450   0.393   0.377    0.388    0.374   0.382
         Like             0.132   0.227   0.119    0.124    0.109   0.144
         Sadness          0.087   0.079   0.172    0.080    0.072   0.106
         Disgust          0.173   0.155   0.170    0.246    0.183   0.160
         Anger            0.102   0.091   0.109    0.110    0.212   0.101
         Happiness        0.056   0.053   0.053    0.052    0.049   0.108

3.4 Experimental setup

The basic Seq2Seq dialogue generation model is the benchmark model for experimental comparison. In addition, the hybrid-neural-network-based emotional dialogue generation model EHMCG[23] and the emotional dialogue generation model EM-SeqGAN[33] are compared with the proposed EMC-GAN model. This work mainly analyzes and evaluates the dialogue texts generated by the different models on three indicators: emotion accuracy, coherence, and fluency. TensorFlow[34] is used to build our model. For the penalty score calculated by the discriminators, the ratio between the coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ (in Eq. (1)) is set to 2:1:1; these coefficients weight the three evaluation dimensions in guiding the training of the generative model. The numbers of training iterations of the generative model and the discriminative model are 5 and 10, respectively.

3.5 Emotion accuracy


After generating dialogue text, we annotate the emotion categories of the generated dialogue text with the HanLP emotion classifier. If the emotion category of the generated sentence is the same as that of the target sentence, the emotion category of the generated target sentence conforms to the expectation.
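In other words, emotion accuracy is the fraction of generated responses whose predicted emotion matches the specified target emotion. A minimal sketch of this computation, reusing the assumed classifier interface from Section 3.2, is:

```python
def emotion_accuracy(generated_sentences, target_emotions, classifier):
    """Fraction of generated responses whose predicted emotion label equals
    the specified target emotion. `classifier` is assumed to be any object
    with a predict() method, e.g. the pseudo-labeling classifier above."""
    predicted = classifier.predict(generated_sentences)
    matches = sum(p == t for p, t in zip(predicted, target_emotions))
    return matches / len(target_emotions)
```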

Table 5 shows the emotion accuracy of the dialogue texts generated by the different dialogue generation models. Compared with the other models, the proposed EMC-GAN model has the highest emotion accuracy for every emotion of each sub-dataset. For the NLPW dataset, the EMC-GAN model has high emotion accuracy on the emotions "Other", "Like", "Sadness", and "Disgust", ranging from 0.588 to 0.740, while the emotion accuracy on "Anger" and "Happiness" is only 0.392 and 0.236, respectively. For the XHJ dataset, the EMC-GAN model has higher emotion accuracy on every emotion, reaching 0.701-0.870. Compared with the other emotion categories, the emotion "Other" has the highest emotion accuracy on both datasets; the reason may be that "Other" has the largest amount of training data. Since "Other" represents all remaining emotions, the emotion classification model may also tend to judge the emotion category of an input sentence as "Other".

Table 5 The emotion accuracy of the generated dialogue text

Dataset  Model       Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Seq2Seq     0.286   0.121   0.089    0.128    0.212   0.191
         EHMCG       0.421   0.354   0.374    0.289    0.211   0.195
         EM-SeqGAN   0.572   0.487   0.548    0.376    0.295   0.201
         EMC-GAN     0.740   0.687   0.723    0.588    0.392   0.236
XHJ      Seq2Seq     0.293   0.176   0.064    0.227    0.177   0.094
         EHMCG       0.563   0.379   0.374    0.458    0.385   0.295
         EM-SeqGAN   0.647   0.563   0.487    0.567    0.426   0.375
         EMC-GAN     0.870   0.794   0.761    0.732    0.765   0.701

3.6 Coherence evaluation

The coherence of the dialogue text, i.e., whether it answers the question or relates to the context of the source sentence, is one of the essential aspects in evaluating the performance of a dialogue generation model. At present, there is no sufficiently good automatic model for evaluating the coherence of generated dialogue text, so we use human judgment to assess the coherence of the generated dialogue text. The evaluation options and corresponding evaluation scores are shown in Table 6. The evaluation score ranges from 1 to 5, and a higher score means better coherence of the dialogue text.

The coherence evaluation scores of the dialogue texts generated by the different dialogue generation models are shown in Table 7. The proposed EMC-GAN model has higher coherence scores than the other models on all emotion categories of both datasets, and it obtains higher coherence scores on the XHJ dataset than on the NLPW dataset. The generated dialogue texts of the "Other", "Like", and "Sadness" emotion categories obtain the higher coherence scores in both datasets. Notably, the coherence evaluation scores of the EMC-GAN model on the XHJ dataset reach 3.407 and 3.180 for "Other" and "Sadness", respectively, which indicates that the generated dialogue sentences have fairly good coherence.

Table 6 The evaluation options and corresponding scores for human evaluation

Option   very good   good   normal   bad   very bad
Score    5           4      3        2     1

Table 7 The coherence evaluation score of the generated dialogue

Dataset  Model       Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Seq2Seq     1.306   1.067   1.451    1.085    1.181   1.051
         EHMCG       1.403   1.192   1.387    1.236    1.335   1.096
         EM-SeqGAN   1.732   1.563   1.734    1.522    1.403   1.225
         EMC-GAN     2.277   2.875   2.115    1.881    1.972   1.245
XHJ      Seq2Seq     1.127   1.361   1.229    1.111    1.147   1.263
         EHMCG       1.256   1.452   1.248    1.324    1.223   1.371
         EM-SeqGAN   1.820   1.726   1.339    1.514    1.330   1.207
         EMC-GAN     3.407   2.542   3.180    1.931    1.561   2.255

3.7 Fluency evaluation

Similar to coherence, the fluency of the generated dialogue text is an essential factor in evaluating the performance of a dialogue generation model, as it largely reflects the model's text production capability. The fluency discriminative model evaluates the fluency of the generated dialogue texts in this process, and the fluency evaluation method is shown in Algorithm 3. In addition, to improve the reliability of the fluency evaluation, we also adopt human evaluation.

The fluency scores obtained by the fluency discriminative model are shown in Table 8: the proposed EMC-GAN model has a higher fluency score than the other models for each dataset and emotion category. For the NLPW dataset, the EMC-GAN model gets higher fluency scores on the emotions "Sadness" and "Anger", while the fluency score on the emotion "Other" is lower. The generated dialogue texts for the XHJ dataset get higher fluency scores than those for the NLPW dataset, and the fluency of the sentences improves noticeably. For the XHJ dataset, the fluency score on the emotion "Other" is also relatively low, while the other emotions have higher fluency scores.

Table 9 shows the fluency scores of the different models as evaluated by humans; this fluency score is assigned in the same way as the coherence evaluation score. Compared with the other models, the proposed EMC-GAN model gets a higher fluency score for each dataset and emotion category, and it scores higher on every emotion of the XHJ dataset than on the NLPW dataset. For the XHJ dataset, our EMC-GAN model gets its highest fluency evaluation score of 4.480 on the emotion "Sadness", while the fluency evaluation scores on the emotions "Disgust" and "Anger" are lower, at 2.835 and 2.960, respectively.

Table 8 The fluency score of the generated dialogue evaluated by Algorithm 3

Dataset  Model       Other    Like     Sadness  Disgust  Anger    Happiness
NLPW     Seq2Seq     -0.189   -0.193   -0.192   -0.193   -0.192   -0.194
         EHMCG        0.405    0.727    1.133    0.526    1.238    0.943
         EM-SeqGAN    0.586    0.875    1.875    1.034    1.337    1.237
         EMC-GAN      0.854    1.710    2.617    1.512    2.498    1.706
XHJ      Seq2Seq     -0.124   -0.123   -0.124   -0.125   -0.124   -0.123
         EHMCG        3.356    5.528    7.526    6.776    5.882    4.581
         EM-SeqGAN    4.652    7.238    8.832    7.774    6.237    6.735
         EMC-GAN      6.300    9.239   11.33    10.33    10.26    11.26

Table 9 The fluency score of the generated dialogue evaluated by humans

Dataset  Model       Other   Like    Sadness  Disgust  Anger   Happiness
NLPW     Seq2Seq     1.193   1.267   1.251    1.202    1.114   1.351
         EHMCG       1.253   1.196   1.325    1.269    1.156   1.183
         EM-SeqGAN   2.013   2.162   2.067    1.849    1.758   1.657
         EMC-GAN     2.424   2.875   2.365    2.476    2.272   1.984
XHJ      Seq2Seq     1.287   1.111   1.567    1.311    1.265   1.187
         EHMCG       1.257   1.284   1.732    1.455    1.325   1.173
         EM-SeqGAN   2.471   2.648   2.741    1.846    1.775   2.659
         EMC-GAN     3.717   3.760   4.480    2.835    2.960   3.326

3.8 Results analysis

The emotion accuracy error in this experiment mainly comes from errors in the datasets and in the emotion classification model. As the accuracy of the emotion classification model on the NLPW dataset is 64%, there are some errors in the emotion categories of the dialogue texts. Since the dialogue texts in the XHJ dataset do not have corresponding emotion category labels, the HanLP tool is used to train an emotion classifier with the NLPW dataset as the training corpus and to add emotion labels to the dialogue texts in the XHJ dataset. In addition, to reduce the influence of the emotion classification model on the emotion evaluation of the dialogue texts, we use the same emotion classification model to classify the generated dialogue texts.

From the experimental results, our model performs better on the XHJ dataset than on the NLPW dataset. Analyzing the dialogue texts in the NLPW dataset, we found that the sentence quality is generally poor. Compared with general dialogue text, the sentences in the NLPW dataset are shorter and more awkward, which may be caused by the irregularity of comment sentences on Weibo. The NLPW dataset also contains a large number of bigrams and trigrams that occur only once, which is not conducive to building the language model and makes it difficult for the dialogue generation model to converge. Analyzing the dialogue texts generated by the baseline model shows that the majority of the generated sentences are sequences of random words, and the fluency of the generated sentences is particularly bad. As can be seen in Table 8, the dialogue texts generated by the Seq2Seq model get low fluency scores, and its fluency scores on different emotions are roughly the same. This is because, in the fluency evaluation, a trigram is assigned a small transition probability (0.02) when its corresponding bigram does not exist; since the fluency of the generated dialogue texts is so poor, the transition probabilities of most trigrams are close to this minimum value.

Some samples of generated dialogue text are taken from the test set. Table 10 shows dialogue texts generated by the Seq2Seq and EMC-GAN models; the emotion column in the table is the emotion category that the generated response is required to have. It can be seen that the responses generated by Seq2Seq are relatively weak in terms of emotional and content consistency and do not satisfy the multiple evaluation dimensions well. Our model is better in emotion accuracy, coherence, and fluency.

Table 10 Cases of generated dialogue from Seq2Seq and EMC-GAN (Post, specified Emotion, Response)

Seq2Seq
  Post: 我命运多坎坷啊你都预见不到! (How rough my fate has been, you never saw it coming!)
    Emotion: Anger | Response: 关我什么事? (What is it to me?)
  Post: 懒得理你。 (Talk to the hand.)
    Emotion: Disgust | Response: 切~,我是傻。 (Tsk~, I am stupid.)
  Post: 你吹牛。 (You brag.)
    Emotion: Happiness | Response: 就是啊、已经注意到了。 (Yeah, I've noticed.)
  Post: 我也要参加。 (I'll be there, too.)
    Emotion: Like | Response: 谢谢您的关注!祝您周末愉快。 (Thank you for your attention! Have a nice weekend.)
  Post: 牙齿不痛了? (No pain in your tooth?)
    Emotion: Sadness | Response: 我爬山好痛苦... (It was painful climbing the mountain...)
  Post: 来的话,如何找你? (How can I get to you if I come?)
    Emotion: Other | Response: 星期六晚。 (Saturday night.)

EMC-GAN
  Post: 不是我吓你,你晚上真的会做噩梦,小狗会变成鬼找你。 (I'm not trying to scare you, but you will really have nightmares at night, and the puppy will turn into a ghost to find you.)
    Emotion: Anger | Response: 吓到我了! (Scared me!)
  Post: 青岛,一座来过就不想再来的城市! (Qingdao, a city you never want to visit again once you have been there!)
    Emotion: Disgust | Response: 你怎么这么说话呢?赤裸裸的羡慕嫉妒。 (Why are you talking like this? Naked envy and jealousy.)
  Post: 又是第一名。 (Number one again.)
    Emotion: Happiness | Response: 感觉心情不错很好。 (I feel in a very good mood.)
  Post: 我爱你。 (I love you.)
    Emotion: Like | Response: 真的啊~,超级无敌帅。 (Really, super invincible handsome.)
  Post: 我不是故意的亲爱的。 (I didn't mean to, honey.)
    Emotion: Sadness | Response: 太伤心了… (It's so sad...)
  Post: 你是机器人嘛?回复这么快。 (Are you a robot? You reply so quickly.)
    Emotion: Other | Response: 我又不是,我怎么知道。 (I am not, so how would I know.)

4 Conclusion

The new emotional dialogue generation framework EMC-GAN proposed in this work uses multiple classifiers to improve the generated dialogues along different feature dimensions. The generative model generates the target sentences for the input source sentences. The basic discriminative model distinguishes generated fake sentences from real sentences in the dataset. The emotion discriminative model evaluates whether the emotion of the generated dialogue text matches the specified emotion. The fluency discriminative model judges whether the input dialogue is fluent and gives a fluency score for the input dialogue text. According to the experimental results, the EMC-GAN model can generate dialogue text with a specified emotion, and compared with the other models, the dialogue texts generated by EMC-GAN are smoother and more fluent. However, the accuracy of the emotion classifier should be improved to obtain more realistic dialogue text. In addition, other features of the sentences, such as novelty and variability, will be considered in future work to make the final dialogue text more fluent and natural.

References

1 Young T, Cambria E, Chaturvedi I, Zhou H, Biswas S, Huang M. Augmenting end-to-end dialogue systems with commonsense knowledge. Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA, 2018: 4970-4977
2 Young T, Pandelea V, Poria S, Cambria E. Dialogue systems with audio context. Neurocomputing, 2020, 388: 102-109. DOI: 10.1016/j.neucom.2019.12.126
3 Xu H, Peng H, Xie H, Cambria E, Zhou L, Zheng W. End-to-end latent-variable task-oriented dialogue system with exact log-likelihood optimization. World Wide Web, 2020, 23(3): 1989-2002. DOI: 10.1007/s11280-019-00688-8
4 Zhang Z, Liao L, Huang M, Zhu X, Chua T S. Neural multimodal belief tracker with adaptive attention for dialogue systems. The World Wide Web Conference. San Francisco, CA, USA, 2019: 2401-2412. DOI: 10.1145/3308558.3313598
5 Ma Y, Nguyen K L, Xing F Z, Cambria E. A survey on empathetic dialogue systems. Information Fusion, 2020
6 Adrian P, Harold B. Rule Responder: rule-based agents for the semantic-pragmatic web. International Journal on Artificial Intelligence Tools, 2011, 20: 1043-1081. DOI: 10.1142/S0218213011000528
7 Xu M, Li P, Yang H, Ren P, Ren Z, Chen Z, Ma J. A neural topical expansion framework for unstructured persona-oriented dialogue generation. The 24th European Conference on Artificial Intelligence (ECAI). Santiago, Chile, 2020
8 Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar, ACL, 2014: 1724-1734. DOI: 10.3115/v1/d14-1179
9 Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems. Quebec, Canada, 2014: 3104-3112
10 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. 3rd International Conference on Learning Representations. San Diego, CA, 2015
11 Vinyals O, Le Q. A neural conversational model. arXiv preprint arXiv:1506.05869, 2015
12 Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Advances in Neural Information Processing Systems. Montreal, Quebec, 2014: 2672-2680
13 Yu L, Zhang W, Wang J, Yu Y. SeqGAN: sequence generative adversarial nets with policy gradient. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, California, AAAI, 2017: 2852-2858
14 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 1996, 4: 237-285
15 Sutton R S, McAllester D A, Singh S P, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems 12, NIPS Conference. Denver, Colorado, USA, 1999: 1057-1063
16 Li J, Monroe W, Shi T, Jean S, Ritter A, Jurafsky D. Adversarial learning for neural dialogue generation. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark, ACL, 2017: 2157-2169. DOI: 10.18653/v1/d17-1230
17 Cui S, Lian R, Jiang D, Song Y, Bao S, Jiang Y. DAL: dual adversarial learning for dialogue generation. arXiv preprint arXiv:1906.09556, 2019
18 Salovey P, Mayer J D. Emotional intelligence. Imagination, Cognition and Personality, 1990, 9(3): 185-211. DOI: 10.2190/DUGG-P24E-52WK-6CDG
19 Ghosh S, Chollet M, Laksana E, Morency L, Scherer S. Affect-LM: a neural language model for customizable affective text generation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada, 2017: 634-642. DOI: 10.18653/v1/P17-1059
20 Rashkin H, Smith E M, Li M, Boureau Y L. Towards empathetic open-domain conversation models: a new benchmark and dataset. In: Proceedings of the 57th Conference of the Association for Computational Linguistics. Florence, Italy, 2019, 1: 5370-5381. DOI: 10.18653/v1/p19-1534
21 Zhou H, Huang M, Zhang T, Zhu X, Liu B. Emotional chatting machine: emotional conversation generation with internal and external memory. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, 2018: 730-739
22 Wang K, Wan X. SentiGAN: generating sentimental texts via mixture adversarial networks. IJCAI. Stockholm, Sweden, 2018: 4446-4452. DOI: 10.24963/ijcai.2018/618
23 Sun X, Peng X, Ding S. Emotional human-machine conversation generation based on long short-term memory. Cognitive Computation, 2018, 10(3): 389-397. DOI: 10.1007/s12559-017-9539-4
24 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014
25 Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013: 6645-6649. DOI: 10.1109/ICASSP.2013.6638947
26 Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training GANs. Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016. Barcelona, Spain, 2016: 2226-2234
27 Lai S, Xu L, Liu K, Zhao J. Recurrent convolutional neural networks for text classification. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. Austin, Texas, AAAI, 2015: 2267-2273
28 Srivastava R K, Greff K, Schmidhuber J. Highway networks. arXiv preprint arXiv:1505.00387, 2015
29 Zhang X, LeCun Y. Text understanding from scratch. arXiv preprint arXiv:1502.01710, 2015
30 Liu D. Approaches to Chinese word analysis, utterance segmentation and automatic evaluation of machine translation. Dissertation for the Master Degree. Beijing: Institute of Automation, Chinese Academy of Sciences, 2004
31 Doddington G. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the Second International Conference on Human Language Technology Research. San Diego, California, 2002: 138-145. DOI: 10.5555/1289189.1289273
32 Zhang H P, Yu H K, Xiong D Y, Liu Q. HHMM-based Chinese lexical analyzer ICTCLAS. In: Proceedings of the Second Workshop on Chinese Language Processing. Sapporo, Japan, ACL, 2003, 17: 184-187. DOI: 10.3115/1119250.1119280
33 Sun X, Chen X, Pei Z, Ren F. Emotional human machine conversation generation based on SeqGAN. First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia). IEEE, Beijing, 2018: 1-6. DOI: 10.1109/ACIIAsia.2018.8470388
34 Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Derek G. TensorFlow: a system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016). Savannah, GA, USA, 2016: 265-283