Event-Based Extractive Summarization
![Page 1: Event-Based Extractive Summarization](https://reader035.fdocuments.us/reader035/viewer/2022062517/56813691550346895d9e1b47/html5/thumbnails/1.jpg)
Event-Based Extractive Summarization
E. Filatova and V. Hatzivassiloglou Department of Computer Science
Columbia University
(ACL 2004)
2/23
Abstract
Most approaches to extractive summarization define a set of features upon which selection of sentences is based, using algorithms independent of the features.
We propose a new set of features based on low-level atomic events that describe relationships between important actors.
Our experimental results indicate not only that event-based features offer an improvement in summary quality over words as features, but that this effect is more pronounced for more sophisticated summarization methods.
3/23
Introduction
- Identify what information is important and should be included in the summary
- Break the input text into textual units (sentences, clauses, etc.)
- Score every textual unit according to what information (features) it covers
- Choose the textual unit that should be added to the summary
- Rescore the textual units based on what information is already covered by the summary
- Repeat until we reach the desired length
4/23
General Summarization Model
T_i: textual unit i; c_i: concept i
Each concept c_i has an associated weight w_i indicating its importance
(The slide shows a matrix whose rows are textual units and whose columns are concepts.)
5/23
General Summarization Model
Using the above matrix, we can formulate the extractive summarization problem as extracting the minimal number of textual units that cover all the concepts that are interesting or important.
To account for the cost of long summaries, we can constrain the total length of the summary or balance it against the total weight of covered concepts.
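This model can be made concrete with a small sketch (illustrative data and function names, not the paper's implementation): each textual unit covers a set of concepts, each concept carries a weight, and a unit's score is the total weight of the not-yet-covered concepts it contains.

```python
# Hypothetical concept weights and coverage sets, standing in for the
# unit-by-concept matrix in the model above.
concept_weights = {"c1": 3.0, "c2": 2.0, "c3": 1.0}
unit_concepts = {
    "t1": {"c1", "c2"},
    "t2": {"c2", "c3"},
    "t3": {"c3"},
}

def unit_weight(unit, covered=frozenset()):
    # A unit's weight is the summed weight of its concepts,
    # ignoring concepts already covered by the summary so far.
    return sum(concept_weights[c] for c in unit_concepts[unit] if c not in covered)

print(unit_weight("t1"))                  # 5.0
print(unit_weight("t2", covered={"c2"}))  # 1.0
```

The `covered` argument is what lets the adaptive selection algorithms later in the talk discount information the summary already contains.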
6/23
Associating Concepts with Features
Lexical features:
- Words: tf*idf weights show which words are important
- Words used in titles and section headings (Luhn '59)
- Presence of cue phrases in the textual unit: in conclusion, significant (Kupiec et al. '95)
- Co-occurrence of particular terms: lexical chains (Barzilay & Elhadad '97), topic signatures (Lin & Hovy '00)
Non-lexical features:
- Textual unit's position in the input text: headline, first sentence in the paragraph (Baxendale '58)
- Rhetorical representation of the source text (Marcu '97)
We suggest using atomic events as features for singling out the important sentences.
7/23
Atomic Events
Atomic event = relation + connector (a potential label for the relation)
- A relation is a pair of named entities or significant nouns
- For the input text, get all possible pairs of named entities within one sentence
- For every relation, analyze all the verbs and action-denoting nouns in between the named entities; these verbs/nouns can be used as labels for the extracted relations
- Some important words are not marked as named entities but are highly likely to be among the most frequently used nouns
- The top ten most frequent nouns are added to the relation list
8/23
Atomic Events
Algorithm for atomic event extraction:
- Analyze each input sentence one at a time; ignore sentences that do not contain at least two named entities or frequent nouns
- Extract all possible pairs (relations) of named entities / frequent nouns in the sentence, along with the words in between them (connectors)
- For each relation, count how many times it is used in the input texts
- Keep only connectors that are content verbs or action nouns, according to WordNet's noun hierarchy; for each connector, calculate how many times it is used with the extracted relations
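The pairing step of this algorithm can be sketched as follows. This is a simplified illustration: entity recognition is assumed done elsewhere (entities are passed in as a set), tokenization is naive, and the WordNet filter on connectors is omitted.

```python
from itertools import combinations

def extract_relations(tokens, entities):
    """Pair up entity mentions in one tokenized sentence (relations) and
    collect the words between each pair (candidate connectors).
    Simplified sketch: the real method also filters connectors to content
    verbs and action nouns via WordNet's noun hierarchy."""
    positions = [i for i, tok in enumerate(tokens) if tok in entities]
    if len(positions) < 2:
        # Ignore sentences without at least two entities / frequent nouns.
        return []
    events = []
    for i, j in combinations(positions, 2):
        relation = (tokens[i], tokens[j])
        connectors = tokens[i + 1:j]   # words between the two mentions
        events.append((relation, connectors))
    return events

sent = "China_Airlines plane crashed in Taipei".split()
print(extract_relations(sent, {"China_Airlines", "Taipei"}))
# [(('China_Airlines', 'Taipei'), ['plane', 'crashed', 'in'])]
```

Counting how often each relation and each connector recur across the input texts then yields the frequencies used on the next slide.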
9/23
Atomic Events
Calculate normalized frequencies for all relations:
normalized frequency of a relation = n / N, where
n – frequency of the current relation in a topic
N – overall frequency of all relations in a topic
Calculate normalized frequencies for all connectors:
normalized frequency of a connector = c / S, where
c – frequency of the current connector in a relation
S – overall frequency of all connectors for that relation
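Both normalizations amount to dividing each raw count by the relevant total, as in this small sketch (the counts are made up for illustration):

```python
def normalize(counts):
    # Turn raw frequencies into normalized frequencies: count / total.
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

# Illustrative relation counts for one topic (not the paper's real data).
relation_counts = {
    ("China Airlines", "Taiwan"): 5,
    ("China Airlines", "Taipei"): 3,
    ("Bali", "Taipei"): 2,
}
print(normalize(relation_counts)[("China Airlines", "Taiwan")])  # 0.5
```

The same `normalize` step applies to connector counts, except the total is taken per relation rather than per topic.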
10/23
Atomic Events

| Norm. Rel. Freq. | First Element | Second Element |
|---|---|---|
| 0.0212 | China Airlines | Taiwan |
| 0.0191 | China Airlines | Taipei |
| 0.0170 | China Airlines | Monday |
| 0.0170 | Taiwan | Monday |
| 0.0170 | Bali | Taipei |
| 0.0148 | Taipei | Taiwan |
| 0.0148 | Bali | Taiwan |
| 0.0148 | Taipei | Monday |
| 0.0127 | International Airport | Taiwan |
11/23
Atomic Events

| Relation | Connector | Norm. Conn. Freq. |
|---|---|---|
| China Airlines - Taiwan | crashed/VBD | 0.0312 |
| China Airlines - Taiwan | trying/VBG | 0.0312 |
| China Airlines - Taiwan | burst/VBP | 0.0267 |
| China Airlines - Taiwan | land/VB | 0.0267 |
| China Airlines - Taipei | burst/VBP | 0.0311 |
| China Airlines - Taipei | crashed/VBD | 0.0311 |
| China Airlines - Taipei | crashed/VBN | 0.0198 |
12/23
Atomic Events
Atomic event score: the score of an atomic event predicts how important the event is for the collection of texts.

Atomic event score = (normalized freq. of the relation) * (normalized freq. of the connector)
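The score is a simple product of the two normalized frequencies. As a sketch, plugging in the figures from the tables on the previous slides for the relation (China Airlines, Taiwan) with connector "crashed":

```python
def event_score(rel_freq, conn_freq):
    # Atomic event score = normalized relation frequency
    #                      * normalized connector frequency.
    return rel_freq * conn_freq

# Relation (China Airlines, Taiwan): normalized frequency 0.0212;
# its connector crashed/VBD: normalized frequency 0.0312.
print(round(event_score(0.0212, 0.0312), 6))  # 0.000661
```

These event scores then play the role of concept weights w_i in the general summarization model.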
13/23
Textual Unit Selection: Static Greedy Algorithm
1. For every textual unit, calculate its weight as the sum of the weights of all the concepts covered by this textual unit
2. Choose the textual unit with the maximum weight and add it to the final output
3. Continue extracting the remaining textual units in order of total weight until the summary reaches the desired length
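The static variant scores each unit exactly once, so it can be sketched as a single sort (illustrative data and names; a simplified word-count length limit stands in for the paper's length constraint):

```python
def static_greedy(unit_concepts, concept_weights, length_limit):
    """Static greedy selection: rank textual units once by total concept
    weight, then take them in order until the next unit would exceed
    the length limit (measured here in words)."""
    ranked = sorted(unit_concepts,
                    key=lambda u: sum(concept_weights[c] for c in unit_concepts[u]),
                    reverse=True)
    summary, length = [], 0
    for unit in ranked:
        if length + len(unit.split()) > length_limit:
            break
        summary.append(unit)
        length += len(unit.split())
    return summary

weights = {"c1": 1.0, "c2": 2.0}                 # hypothetical concept weights
units = {"a b": {"c1"}, "c d e": {"c1", "c2"}}   # unit text -> covered concepts
print(static_greedy(units, weights, 5))          # ['c d e', 'a b']
```

Because the scores are never updated, a second unit covering the same concepts as the first still keeps its full weight; the adaptive variant on the next slide addresses this.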
14/23
Textual Unit Selection: Adaptive Greedy Algorithm
1. For each textual unit, calculate its weight as the sum of the weights of all concepts it covers
2. Choose the textual unit with the maximum weight and add it to the output; add the concepts covered by this textual unit to the list of concepts covered in the final output
3. Recalculate the weights of the textual units: subtract from each unit's weight the weight of all concepts in it that are already covered in the output
4. Continue extracting textual units in order of their total weight (back to step 2) until the summary is of the desired length
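The steps above can be sketched as a loop that rescores after every pick (again with illustrative data and a word-count length limit):

```python
def adaptive_greedy(unit_concepts, concept_weights, length_limit):
    """Adaptive greedy selection: after each pick, already-covered concepts
    no longer contribute weight, so unit scores are recomputed each round."""
    covered, summary, length = set(), [], 0
    remaining = dict(unit_concepts)
    while remaining and length < length_limit:
        # Rescore: only concepts not yet covered count toward a unit's weight.
        best = max(remaining,
                   key=lambda u: sum(concept_weights[c] for c in remaining[u] - covered))
        if length + len(best.split()) > length_limit:
            break
        covered |= remaining.pop(best)
        summary.append(best)
        length += len(best.split())
    return summary

weights = {"c1": 5.0, "c2": 1.0}
units = {"x y": {"c1"}, "z w": {"c1"}, "p q": {"c2"}}
print(adaptive_greedy(units, weights, 4))  # ['x y', 'p q']
```

Note the contrast with the static variant: once "x y" covers c1, the redundant "z w" drops to zero weight, so the lower-weight but novel "p q" is chosen instead.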
15/23
Textual Unit Selection: Modified Adaptive Greedy Algorithm
1. For every textual unit, calculate its weight as the sum of the weights of all concepts it covers
2. Consider only those textual units that contain the highest-weight concept not yet covered; of these, choose the one with the highest total weight and add it to the final output, adding the concepts it covers to the list of concepts covered in the final output
3. Recalculate the weights of the textual units: subtract from each unit's weight the weight of all concepts in it that are already covered in the output
4. Continue extracting textual units in order of their total weight (back to step 2) until the summary is of the desired length
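The modification is the candidate filter in step 2, sketched below under the same simplifying assumptions as the previous variants:

```python
def modified_adaptive_greedy(unit_concepts, concept_weights, length_limit):
    """Modified adaptive greedy: first restrict candidates to units that
    contain the highest-weight uncovered concept, then pick the candidate
    with the highest remaining total weight."""
    covered, summary, length = set(), [], 0
    remaining = dict(unit_concepts)
    while remaining and length < length_limit:
        uncovered = {c for cs in remaining.values() for c in cs} - covered
        if not uncovered:
            break
        # Step 2's filter: only units containing the top uncovered concept.
        top = max(uncovered, key=lambda c: concept_weights[c])
        candidates = [u for u in remaining if top in remaining[u]]
        best = max(candidates,
                   key=lambda u: sum(concept_weights[c] for c in remaining[u] - covered))
        if length + len(best.split()) > length_limit:
            break
        covered |= remaining.pop(best)
        summary.append(best)
        length += len(best.split())
    return summary

weights = {"c1": 5.0, "c2": 4.0, "c3": 1.0}
units = {"a a": {"c2", "c3"}, "b b": {"c1"}}
print(modified_adaptive_greedy(units, weights, 4))  # ['b b', 'a a']
```

Here the filter makes the single top concept c1 drive the first choice: "b b" is picked before "a a" even though both units have the same total weight of 5.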
16/23
Experiment: Input Data
- The document sets used in the evaluation of multi-document summarization during the first Document Understanding Conference (DUC 2001)
- 30 test document sets, each with approximately 10 news stories on different events
- For each document set, three human-constructed summaries are provided for each of the target lengths 50, 100, 200, and 400 words
17/23
Experiment: Evaluation Metric
- ROUGE (Lin and Hovy, 2003), a recall-based measure
Summary Length
- For each document set, four summaries of lengths 50, 100, 200, and 400 words are created
- ROUGE evaluation has not yet been tested extensively, and ROUGE scores are difficult to interpret as they are not absolute and not comparable across source document sets
18/23
Experiment Results: Static Greedy Algorithm
19/23
Experiment Results: Adaptive Greedy Algorithm
20/23
Experiment
21/23
Experiment Results: Modified Greedy Algorithm
22/23
Experiment Results: Comparison with DUC Systems
- In DUC 2003 the task was to create summaries of length 100 only
- Using events as features and the adaptive greedy algorithm, in 14 out of 30 cases our system outperforms the median of the ROUGE scores of all 15 participating systems on that specific document set
- The suitability of the event-based summarizer varies according to the type of documents being summarized
23/23
Conclusion
- Our experimental results indicate that events are indeed effective features, at least in comparison with words
- With all three of our summarization algorithms, we achieved a gain in performance when using events
- Our approach to defining and extracting events can be improved in many ways:
  - Matching connectors that are similar in meaning
  - Representing paraphrases of the same event
  - Methods for detecting and prioritizing special event components such as time and location phrases
  - Merging information across related atomic events
  - Partial matches between atomic events and input sentences