The Kyutech corpus and topic segmentation using a combined method
Takashi Yamamura, Kazutaka Shimada and Shintaro Kawahara
Kyushu Institute of Technology
Today's Topic
▶ 1. Release the Kyutech corpus
  Japanese conversation corpus about a decision-making task
  The first Japanese corpus for summarization
  Freely available to anyone on the web
▶ 2. Evaluate three topic segmentation methods
  Topic segmentation has an important role in meeting summarization.
  Previous studies: LCSeg and TopicTiling
  The combined method based on LCSeg and TopicTiling
Introduction
Multi-party Conversation Understanding
▶ Summarization of multi-party conversation
  Useful for understanding the content of a conversation
▶ There are some meeting corpora in English.
  The AMI Corpus (Carletta, 2007)
  The ICSI Corpus (Janin et al., 2003)
▶ Problem
  No Japanese meeting corpus for summarization
Background
Release “the Kyutech corpus”
Outline
▶ The Kyutech corpus
  9 conversations
    • 4 scenarios with different settings
  Transcription
    • Transcription of the conversation
  Topic annotation
    • Annotation of topic tags for each utterance
  Reference summary generation
    • Abstractive, hand-written summaries
The Kyutech corpus
Reference summary (schematic)
Speaker  Topic  Utterance
A        Topic  U1
B        Topic  U2
D        Topic  U3
C        Topic  U4
D        Topic  U5
Summary
Task
▶ A decision-making task with four participants
  Decide on a new restaurant for a virtual shopping mall
  Discussion based on a document
    • Candidate and existing restaurants in the shopping mall
    • Statistical information about the mall (e.g. target customers)
The Kyutech corpus
[Figures: three candidate restaurants; gender distribution of hourly target customers]
Transcription
▶ The transcription of the conversations
  Utterances separated by a 0.2-sec interval
  Annotated with some tags (e.g. filler, falter, question)
  Added tags for sentence-level identification
The Kyutech corpus
Speaker  Start      End        Utterance                      Sentence
D        00:24.490  00:25.530  (F ahh), in this condition +   S1
D        00:26.585  00:27.615  which one is suitable (Q) /    S1
C        00:29.985  00:31.195  I think the ramen is better /  S2
A        00:31.815  00:33.965  me too /                       S3
(F) : Filler
(Q) : Question
+ : Links to the next utterance
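The sentence-level tags in the transcription table can be read mechanically. The following is a minimal sketch, not the corpus's own tooling: it assumes that a trailing "+" links an utterance to the next one (as the slide states) and that a trailing "/" closes a sentence (an assumption inferred from the S1 to S3 grouping). The `Utterance` class and `group_sentences` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start: str
    end: str
    text: str  # ends with "+" (continues) or "/" (assumed sentence end)


def group_sentences(utterances):
    """Merge utterances into sentences using the trailing "+" and "/" markers."""
    sentences, current = [], []
    for utt in utterances:
        body = utt.text.rstrip()
        marker = body[-1] if body and body[-1] in "+/" else ""
        # Drop the marker itself before joining the pieces.
        current.append(body.rstrip("+/").rstrip())
        if marker != "+":  # anything other than "+" closes the sentence
            sentences.append((utt.speaker, " ".join(current)))
            current = []
    if current:  # transcript ended on a "+" with nothing following
        sentences.append((utterances[-1].speaker, " ".join(current)))
    return sentences


rows = [
    Utterance("D", "00:24.490", "00:25.530", "(F ahh), in this condition +"),
    Utterance("D", "00:26.585", "00:27.615", "which one is suitable (Q) /"),
    Utterance("C", "00:29.985", "00:31.195", "I think the ramen is better /"),
    Utterance("A", "00:31.815", "00:33.965", "me too /"),
]
sentences = group_sentences(rows)
```

With the four example rows, D's two utterances merge into one sentence, and C and A each contribute one, which matches the three sentences marked on the slide.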
Topic Annotation
▶ Annotation of topic tags for each utterance
  It is important to consider topics in summarization.
  Topic tags
    • Express the topic of an utterance
    • 28 topic tags were created by 4 annotators, including the authors
The Kyutech corpus
CandX  Closed  Exist4  ClEx       Area    Atomos  Access
CandY  Exist1  Exist5  Mall       People  Time    Meeting
CandZ  Exist2  Exist6  OtherMall  Price   Seat    Chat
CandS  Exist3  Exists  Location   Menu    Sell    Vague
The tags cover:
  - the candidate restaurants (CandX, CandY, CandZ, CandS)
  - the existing or closed restaurants (e.g. Exist1 to Exist6, Closed)
  - the shopping mall (e.g. Mall, OtherMall)
  - the details of the restaurant (e.g. Price, Menu, Seat)
  - the proceedings and final decision (Meeting)
  - not related to the task (Chat)
  - others and unknown (Vague)
Annotation of Topic Tags
▶ Multiple topic tags for each utterance
  Main tag
    • Essential topic tag: the main topic of an utterance
  Additional tag
    • Optional topic tag: a more detailed topic within the main topic
Topic Annotation
Discussion about the menu of the existing restaurant
ID  Main    Addition  Utterance
A   Exist1            what do you think of "Kaibutsu" (Q) /
C   Exist1  Menu      it has a wide variety on the menu /
Discussion about the existing restaurants from the point of view of the menu
ID  Main    Addition  Utterance
B   Menu    Exist1    in the point of view of menu, "Kaibutsu" looks good /
D   Menu    Exist2    I wonder "FamilyPlate" looks good, also /
Exist1 : the existing restaurant 1
Exist2 : the existing restaurant 2
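One way to hold a main tag plus optional additional tags per utterance is sketched below. This is an illustrative data structure, not the corpus's released file format: the `TaggedUtterance` class is hypothetical, and the tag inventory is copied from the 28 names on the tag slide.

```python
from dataclasses import dataclass, field

# The 28 topic tags as they are spelled on the slide.
TOPIC_TAGS = {
    "CandX", "CandY", "CandZ", "CandS", "Closed", "Exist1", "Exist2",
    "Exist3", "Exist4", "Exist5", "Exist6", "Exists", "ClEx", "Mall",
    "OtherMall", "Location", "Area", "People", "Price", "Menu", "Atomos",
    "Time", "Seat", "Sell", "Access", "Meeting", "Chat", "Vague",
}


@dataclass
class TaggedUtterance:
    speaker: str
    text: str
    main: str                                        # essential tag
    additional: list = field(default_factory=list)   # optional tags

    def __post_init__(self):
        # Reject tags that are not part of the inventory.
        for tag in [self.main, *self.additional]:
            if tag not in TOPIC_TAGS:
                raise ValueError(f"unknown topic tag: {tag}")

    def tags(self):
        """All tags of the utterance, ignoring the main/additional role."""
        return {self.main, *self.additional}


examples = [
    TaggedUtterance("A", 'what do you think of "Kaibutsu" (Q) /', "Exist1"),
    TaggedUtterance("C", "it has a wide variety on the menu /", "Exist1", ["Menu"]),
    TaggedUtterance("B", 'in the point of view of menu, "Kaibutsu" looks good /',
                    "Menu", ["Exist1"]),
]
```

The second and third examples carry the same set of tags but swap the main and additional roles, which is the distinction the two example tables illustrate.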
Process
▶ Topic annotation process
  Step 1: annotation by 2 annotators
    • Investigate the result of topic annotation by 2 annotators
  Step 2: final judgment of each tag by 3 authors
The Kyutech corpus
Step 1: Annotation by 2 annotators
▶ Main tag and additional tag for each utterance
  Each annotator selects at least one suitable topic tag.
▶ The agreement score between the 2 annotators
  The rate at which the tags from the 2 annotators include the same tag
    • 0.879
Topic Annotation
    Annotator1     Annotator2
ID  Main    Add    Main     Add   Utterance
A   Exist4  Sell   Exist4   Sell  …... "FamilyPlate" made the biggest sale in the restaurants +
D   Exist4  Sell   Exist4   -     (L uhn) /
A   Exist4  Sell   Meeting  -     and the restaurant is … +
A   Exist4  Sell   Meeting  -     the reason, what is the reason (Q) /
D   Exist4  Menu   People   -     many menus and branches (? Maybe) /
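The agreement score can be sketched as follows. This is one reading of "the rate that the same tag from 2 annotators is included", namely the fraction of utterances on which the two annotators share at least one tag; the slides do not spell out the exact formula, so treat the definition as an assumption. The function name `agreement_score` is hypothetical, and the per-annotator tag sets below are read off the five example rows (the column assignment of the flattened table is itself an assumption).

```python
def agreement_score(tags_a, tags_b):
    """Fraction of utterances where the two annotators share at least one tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same utterances")
    if not tags_a:
        return 0.0
    shared = sum(1 for a, b in zip(tags_a, tags_b) if set(a) & set(b))
    return shared / len(tags_a)


# Main and additional tags pooled per utterance, one set per row of the table.
annotator1 = [
    {"Exist4", "Sell"}, {"Exist4", "Sell"}, {"Exist4", "Sell"},
    {"Exist4", "Sell"}, {"Exist4", "Menu"},
]
annotator2 = [
    {"Exist4", "Sell"}, {"Exist4"}, {"Meeting"}, {"Meeting"}, {"People"},
]
score = agreement_score(annotator1, annotator2)
```

On these five rows only the first two utterances share a tag, so the excerpt is a deliberately hard case; the 0.879 reported on the slide is computed over the corpus, not over this excerpt.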
Step 2: Final judgment of each tag
▶ Determination of the final tags by the authors
  Based on each topic tag from the annotators
  Extension: main tag and 2 additional tags
▶ The agreement score
  The rate at which one or more tags from the annotators are included
    • 0.965
Topic Annotation
    Annotator1        Annotator2        Final tag (modified topic tags)
ID  Main    Addition  Main    Addition  Main    Addition1  Addition2
D   Exist4  Menu      Exist4  -         Exist4  Menu       -
A   Exist4  Menu      People  -         Exist4  Menu       -
A   Exist4  People    People  -         Exist4  Menu       People
Reference summary
▶ Reference summary of the conversation
  Size: 250 to 500 characters
  Based on the guideline of the AMI corpus
The Kyutech corpus
Understandable for somebody who was not present during the meeting
Data
▶ Open resources
  9 conversations (total utterances: 4,509)
  Transcription
  Topic annotation
  Reference summaries
▶ Currently unpublished resources
  Questionnaire
  Audio-visual data recording the conversation
    • A four-direction camera and a video camera
The Kyutech corpus
http://www.pluto.ai.kyutech.ac.jp/~shimada/resources.html
Outline
▶ Topic segmentation
  Divide the conversation into topic segments
  The first process in conversation summarization
    • It is possible to generate a summary covering all topics.
      - (Banerjee et al., 2015), (Oya et al., 2014)
  Previous studies: LCSeg and TopicTiling
Background
[Figure: The Kyutech corpus → Topic Segmentation → Topic Segments → Summary Generation → Final Summaries: Summary (Topic1), Summary (Topic2), ..., Summary (TopicN)]
LCSeg
▶ Lexical Cohesion Segmentation (Galley et al., 2003)
  Text segmentation method based on lexical cohesion
  Compute cohesion between sentences with lexical chains
    • Lexical chain: a chain of occurrences of the same word
Topic Segmentation
ID  Utterance
C   I guess "Kaibutsu" is suitable as the new restaurant /
A   I'm with you, "Kaibutsu" is better /
A   "Kaibutsu" has a wide variety on the menu /
C   right, there are many menus /
= = = = = = = SEGMENT = = = = = = =
B   I guess so, but I suppose Chinese food is better /
D   I'd prefer Chinese food /
Segment at the break of lexical chains
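The idea of segmenting where lexical chains break can be sketched in a few lines. This is a deliberately simplified illustration, not LCSeg as published: LCSeg weights chains and compares windows by cosine similarity, whereas this sketch only counts the chains that span each gap between utterances. The crude plural strip, the `hiatus` value of 2, and all function names are assumptions made for the example; the quotation marks around "Kaibutsu" are dropped in the input strings.

```python
import re
from collections import defaultdict


def tokenize(utterance):
    """Lowercase word set with a crude plural strip (a stand-in for stemming)."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words}


def build_chains(utterances, hiatus=2):
    """Return (start, end) utterance spans of repeated words.

    A gap of more than `hiatus` utterances between two occurrences
    splits the chain."""
    positions = defaultdict(list)
    for i, utt in enumerate(utterances):
        for word in tokenize(utt):
            positions[word].append(i)
    chains = []
    for idxs in positions.values():
        start = prev = idxs[0]
        for i in idxs[1:]:
            if i - prev > hiatus:
                if prev > start:
                    chains.append((start, prev))
                start = i
            prev = i
        if prev > start:
            chains.append((start, prev))
    return chains


def chains_crossing(utterances, hiatus=2):
    """Number of chains spanning the gap between utterance i and i + 1."""
    chains = build_chains(utterances, hiatus)
    return [sum(1 for s, e in chains if s <= i < e)
            for i in range(len(utterances) - 1)]


conversation = [
    "I guess Kaibutsu is suitable as the new restaurant",
    "I'm with you, Kaibutsu is better",
    "Kaibutsu has a wide variety on the menu",
    "right, there are many menus",
    "I guess so, but I suppose Chinese food is better",
    "I'd prefer Chinese food",
]
counts = chains_crossing(conversation)
boundary = counts.index(min(counts))  # gap with the fewest chains crossing it
```

On the slide's example, no chain spans the gap after the fourth utterance, so the sketch places the boundary where the slide draws the SEGMENT line. Without the `hiatus` limit, distant repetitions such as "better" would bridge the boundary.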
TopicTiling
▶ TopicTiling (Riedl and Biemann, 2012)
  Text segmentation method using the LDA topic model
  Topic model
    • Assumes that one document has multiple topics
  Latent Dirichlet Allocation (LDA)
    • Estimates the word distributions representing topics
Topic Segmentation
Topic vector of each sentence:
topic1  topic2  topic3  ...  topicN
0.05    0.10    0.02    ...  0.04
0.20    0.01    0.20    ...  0.07
:
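Once each sentence has a topic vector, a TopicTiling-style segmenter compares neighbouring vectors and looks for valleys in the similarity curve. The sketch below assumes the topic vectors are already available (no LDA is trained here), uses made-up 3-topic vectors rather than values from the slide, and scores each valley with a depth score measured against the nearest peak on each side. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gap_similarities(topic_vectors):
    """Similarity between each pair of adjacent sentences."""
    return [cosine(topic_vectors[i], topic_vectors[i + 1])
            for i in range(len(topic_vectors) - 1)]


def depth_scores(sims):
    """Depth of each gap relative to the nearest peak on its left and right."""
    depths = []
    for i, s in enumerate(sims):
        left = i
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = i
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        depths.append((sims[left] - s + sims[right] - s) / 2)
    return depths


# Hypothetical topic vectors: two sentences on topic 1, then two on topic 2.
vectors = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
]
sims = gap_similarities(vectors)
depths = depth_scores(sims)
boundary = depths.index(max(depths))  # deepest valley
```

The deepest valley falls between the second and third sentence, where the dominant topic changes. A full system would also need a threshold on the depth scores to decide how many boundaries to keep.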
Combined Method
▶ Combine LCSeg and TopicTiling
  Merge the characteristics of the two methods (words, topic model)
  Use the cohesion between sentences from each method
  Compute a new score with a weight factor
    • $w_f$ : a trade-off parameter
Topic Segmentation
$\cos_C(A, B) = w_f \times \cos_L(A, B) + (1 - w_f) \times \cos_T(A, B)$
  ($\cos_C$: Combined, $\cos_L$: LCSeg, $\cos_T$: TopicTiling)
  Larger $w_f$ : increase in the weight of LCSeg
  Smaller $w_f$ : increase in the weight of TopicTiling
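The combined score is a linear interpolation of the two cohesion values, which a short sketch makes concrete. The function name `combined_cohesion` and the restriction of the weight factor to the range 0 to 1 are assumptions; the slide only calls it a trade-off parameter.

```python
def combined_cohesion(cos_l, cos_t, wf):
    """Weighted mix of LCSeg cohesion (cos_l) and TopicTiling cohesion (cos_t)."""
    if not 0.0 <= wf <= 1.0:
        raise ValueError("wf is assumed to lie in [0, 1]")
    return wf * cos_l + (1.0 - wf) * cos_t


# Hypothetical cohesion values for one gap between sentences A and B.
lcseg_only = combined_cohesion(0.2, 0.8, wf=1.0)   # reduces to the LCSeg score
topic_only = combined_cohesion(0.2, 0.8, wf=0.0)   # reduces to the TopicTiling score
balanced = combined_cohesion(0.2, 0.8, wf=0.5)     # equal weight on both
```

At the two extremes the combined method reduces to one of its components, so sweeping the weight factor compares all three methods on a single axis.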
Experiment
▶ Data set
  The Kyutech corpus: 8 conversations
    • Excluding one conversation used as the development data
▶ Two criteria
  The F-measure for complete and partial matching
Topic Segmentation
ID  Main
A   Meeting
A   Meeting
A   Sell
D   Sell
A   Sell
D   Sell
A   Sell
[Figure labels: the complete matching; the partial matching]
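A boundary-level F-measure can be sketched as below. The slides do not define the two matching criteria, so this example rests on two labeled assumptions: reference boundaries are taken where the main tag changes, and partial matching is modeled as accepting a predicted boundary within a small `tolerance` of a reference boundary (with `tolerance=0` standing in for complete matching). Both function names are hypothetical.

```python
def boundaries_from_tags(main_tags):
    """Gap indices i where the main tag changes between utterance i and i + 1."""
    return [i for i in range(len(main_tags) - 1)
            if main_tags[i] != main_tags[i + 1]]


def boundary_f_measure(reference, predicted, tolerance=0):
    """F-measure over boundary positions.

    A predicted boundary counts as a hit when a not-yet-matched reference
    boundary lies within `tolerance` utterances of it."""
    unmatched = sorted(reference)
    hits = 0
    for p in sorted(predicted):
        match = next((r for r in unmatched if abs(r - p) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


main_tags = ["Meeting", "Meeting", "Sell", "Sell", "Sell", "Sell", "Sell"]
reference = boundaries_from_tags(main_tags)       # change from Meeting to Sell
exact = boundary_f_measure(reference, [2], tolerance=0)
relaxed = boundary_f_measure(reference, [2], tolerance=1)
```

A prediction that is off by one utterance scores zero under the strict criterion and full marks under the relaxed one, which is why reporting both gives a fuller picture of segmentation quality.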
Experimental Result
▶ LCSeg > Combined > TopicTiling
  TopicTiling-based methods had low accuracy
    • The Kyutech corpus is not large enough to apply statistical methods
Topic Segmentation
[Figure: results by the number of topics in LDA and by the value of the weight factor]
Summary and Future Work
▶ Summary
  Release the Kyutech corpus
    • The first Japanese conversation corpus for summarization
  Evaluate three topic segmentation methods
    • Combine two different text segmentation methods
▶ Future work
  Scaling up the Kyutech corpus
  Other annotations for summarization
    • Dialogue acts (communicative functions (Bunt, 2000))
  Abstractive summarization using the segmented topics
END
Conclusion