Introduction to Automatic Summarization
Li Sujian ( 李素建 )
Peking University, China
lisujian@pku.edu.cn
Room 1108, Winslow Building, RPI
Table of contents
1. Motivation
2. Overview of summarization systems
3. Summarization techniques
4. Building a summarization system
5. Evaluating summaries
6. The future
Information overload
The problem:
- 4 billion URLs indexed by Google
- 200 TB of data on the Web [Lyman and Varian 03]

Possible approaches:
- information retrieval
- document clustering
- information extraction
- question answering
- text summarization
http://time.com/3522586/texas-hospital-ebola-mistakes-apology/
http://www.telegraph.co.uk/women/womens-life/11167681/John-Grisham-paedophile-row-sex-offenders-are-not-good-guys.html
http://www.reuters.com/article/2014/10/16/us-usa-florida-election-idUSKCN0I51EC20141016
MILAN, Italy, April 18. A small airplane crashed into a government building in the heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading.
Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available.
(Radev, 2004)
How many victims?
Was it a terrorist act?
What was the target?
What happened?
Says who?
When, where?
1. How many people were injured?
2. How many people were killed? (age, number, gender, description)
3. Was the pilot killed?
4. Where was the plane coming from?
5. Was it an accident (technical problem, illness, terrorist act)?
6. Who was the pilot? (age, number, gender, description)
7. When did the plane crash?
8. How tall is the Pirelli building?
9. Who was on the plane with the pilot?
10. Did the plane catch fire before hitting the building?
11. What was the weather like at the time of the crash?
12. When was the building built?
13. What direction was the plane flying?
14. How many people work in the building?
15. How many people were in the building at the time of the crash?
16. How many people were taken to the hospital?
17. What kind of aircraft was used?
Questions
- What kinds of summaries do people want?
- What are summarizing, abstracting, gisting, ...?
- How sophisticated must summarization systems be? Are statistical techniques sufficient? Or do we need symbolic techniques and deep understanding as well?
- What milestones would mark quantum leaps in summarization theory and practice?
- How do we measure summarization quality?
Table of contents
1. Motivation
2. Overview of summarization systems
3. Summarization techniques
4. Building a summarization system
5. Evaluating summaries
6. The future
https://www.youtube.com/watch?v=qn_ZUf3r9zg
Definitions

Summary definition (Sparck Jones, 1999): "a reductive transformation of source text to summary text through content condensation by selection and/or generalization on what is important in the source."
Schematic summary processing model

Source text → (Interpretation) → Source representation → (Transformation) → Summary representation → (Generation) → Summary text
Summarizing factors (Sparck Jones 2007)

Input
- subject type: domain
- genre: newspaper articles, editorials, letters, reports, ...
- form: regular text structure; free-form
- source size: single doc; multiple docs (few; many)

Purpose
- situation: embedded in a larger system (MT, IR) or not?
- audience: focused or general
- usage: IR, sorting, skimming, ...

Output
- completeness: include all aspects, or focus on some?
- format: paragraph, table, etc.
- style: informative, indicative, aggregative, critical, ...
Examples

Exercise: summarize the following texts for the following readers:
- text 1: Coup Attempt
- text 2: children's story
- reader 1: your friend, who knows nothing about South Africa
- reader 2: someone who lives in South Africa and knows the political position
- reader 3: your 4-year-old niece
- reader 4: the Library of Congress
90 Soldiers Arrested After Coup Attempt In Tribal Homeland
MMABATHO, South Africa (AP)
About 90 soldiers have been arrested and face possible death sentences stemming from a coup attempt in Bophuthatswana, leaders of the tribal homeland said Friday. Rebel soldiers staged the takeover bid Wednesday, detaining homeland President Lucas Mangope and several top Cabinet officials for 15 hours before South African soldiers and police rushed to the homeland, rescuing the leaders and restoring them to power. At least three soldiers and two civilians died in the uprising. Bophuthatswana's Minister of Justice G. Godfrey Mothibe told a news conference that those arrested have been charged with high treason and if convicted could be sentenced to death. He said the accused were to appear in court Monday. All those arrested in the coup attempt have been described as young troops, the most senior being a warrant officer. During the coup rebel soldiers installed as head of state Rocky Malebane-Metsing, leader of the opposition Progressive Peoples Party. Malebane-Metsing escaped capture and his whereabouts remained unknown, officials said. Several unsubstantiated reports said he fled to nearby Botswana. Warrant Officer M.T.F. Phiri, described by Mangope as one of the coup leaders, was arrested Friday in Mmabatho, capital of the nominally independent homeland, officials said. Bophuthatswana, which has a population of 1.7 million spread over seven separate land blocks, is one of 10 tribal homelands in South Africa. About half of South Africa's 26 million blacks live in the homelands, none of which are recognized internationally. Hennie Riekert, the homeland's defense minister, said South African troops were to remain in Bophuthatswana but will not become a ``permanent presence.'' Bophuthatswana's Foreign Minister Solomon Rathebe defended South Africa's intervention. ``The fact that ... the South African government (was invited) to assist in this drama is not anything new nor peculiar to Bophuthatswana,'' Rathebe said. ``But why South Africa, one might ask? 
Because she is the only country with whom Bophuthatswana enjoys diplomatic relations and has formal agreements.'' Mangope described the mutual defense treaty between the homeland and South Africa as ``similar to the NATO agreement,'' referring to the Atlantic military alliance. He did not elaborate. Asked about the causes of the coup, Mangope said, ``We granted people freedom perhaps ... to the extent of planning a thing like this.'' The uprising began around 2 a.m. Wednesday when rebel soldiers took Mangope and his top ministers from their homes to the national sports stadium. On Wednesday evening, South African soldiers and police stormed the stadium, rescuing Mangope and his Cabinet. South African President P.W. Botha and three of his Cabinet ministers flew to Mmabatho late Wednesday and met with Mangope, the homeland's only president since it was declared independent in 1977. The South African government has said, without producing evidence, that the outlawed African National Congress may be linked to the coup. The ANC, based in Lusaka, Zambia, dismissed the claims and said South Africa's actions showed that it maintains tight control over the homeland governments. The group seeks to topple the Pretoria government. The African National Congress and other anti-government organizations consider the homelands part of an apartheid system designed to fragment the black majority and deny them political rights in South Africa.
If You Give a Mouse a Cookie
Laura Joffe Numeroff © 1985
If you give a mouse a cookie, he’s going to ask for a glass of milk.
When you give him the milk, he’ll probably ask you for a straw.
When he’s finished, he’ll ask for a napkin.
Then he’ll want to look in the mirror to make sure he doesn’t have a milk mustache.
When he looks into the mirror, he might notice his hair needs a trim.
So he’ll probably ask for a pair of nail scissors.
When he’s finished giving himself a trim, he’ll want a broom to sweep up.
He’ll start sweeping.
He might get carried away and sweep every room in the house.
He may even end up washing the floors as well.
When he’s done, he’ll probably want to take a nap.
You’ll have to fix up a little box for him with a blanket and a pillow.
He’ll crawl in, make himself comfortable, and fluff the pillow a few times.
He’ll probably ask you to read him a story.
When you read to him from one of your picture books, he'll ask to see the pictures.
When he looks at the pictures, he’ll get so excited that he’ll want to draw one of his own. He’ll ask for paper and crayons.
He’ll draw a picture. When the picture is finished, he’ll want to sign his name, with a pen.
Then he’ll want to hang his picture on your refrigerator. Which means he’ll need Scotch tape.
He’ll hang up his drawing and stand back to look at it. Looking at the refrigerator will remind him that he’s thirsty.
So…he’ll ask for a glass of milk.
And chances are that if he asks for a glass of milk, he’s going to want a cookie to go with it.
‘Genres’ of Summary?

- Indicative vs. informative: used for quick categorization vs. content processing.
- Extract vs. abstract: lists fragments of text vs. re-phrases content coherently.
- Generic vs. query-oriented: provides the author’s view vs. reflects the user’s interest.
- Background vs. just-the-news: assumes the reader’s prior knowledge is poor vs. up-to-date.
- Single-document vs. multi-document source: based on one text vs. fuses together many texts.
A Summarization Machine

[Figure: a "summarization machine" takes a document (or multiple documents) plus a query and produces extracts or abstracts. Control dimensions include extract vs. abstract, indicative vs. informative, generic vs. query-oriented, background vs. just-the-news, and length (headline, very brief, brief 10%, long 50%, 100%). Intermediate notations include case frames, templates, core concepts, core events, relationships, clause fragments, and index terms.]
The Modules of the Summarization Machine

[Figure: processing pipeline. A document (or multiple documents) passes through EXTRACTION to produce extracts; FILTERING reduces multi-document extracts; INTERPRETATION maps extracts into deeper notations (case frames, templates, core concepts, core events, relationships, clause fragments, index terms); GENERATION produces abstracts from those representations.]
Table of contents
1. Motivation
2. Overview of summarization systems
3. Summarization techniques
4. Building a summarization system
5. Evaluating summaries
6. The future
Computational Approach

Top-down: "I know what I want!"
- User needs: only certain types of info
- System needs: particular criteria of interest, used to focus search

Bottom-up: "I'm dead curious: what's in the text?"
- User needs: anything that's important
- System needs: generic importance metrics, used to rate content
Review of Methods

Bottom-up methods:
- Text location: title, position
- Cue phrases
- Word frequencies
- Internal text cohesion: word co-occurrences, local salience, co-reference of names and objects, lexical similarity, semantic representation/graph centrality
- Discourse structure centrality

Top-down methods:
- Information extraction templates
- Query-driven extraction: query expansion lists, co-reference with query names, lexical similarity to the query
Query-Driven vs. Text-Driven Focus

Top-down: query-driven focus
- Criteria of interest encoded as search specs.
- System uses the specs to filter or analyze text portions.
- Examples: templates with slots with semantic characteristics; termlists of important terms.

Bottom-up: text-driven focus
- Generic importance metrics encoded as strategies.
- System applies the strategies over a representation of the whole text.
- Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.
Bottom-Up, using Info. Retrieval

IR task: Given a query, find the relevant document(s) from a large set of documents.

Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents).

Questions:
1. IR techniques work on large volumes of data; can they scale down accurately enough?
2. IR works on words; do abstracts require abstract representations?
Top-Down, using Info. Extraction

IE task: Given a template and a text, find all the information relevant to each slot of the template and fill it in.

Summ-IE task: Given a query, select the best template, fill it in, and generate the contents.

Questions:
1. IE works only for very particular templates; can it scale up?
2. What about information that doesn't fit into any template? Is this a generic limitation of IE?
Paradigms: NLP/IE vs. IR/Statistics

NLP/IE:
- Approach: try to 'understand' the text; re-represent content using a 'deeper' notation, then manipulate that.
- Need: rules for text analysis and manipulation, at all levels.
- Strengths: higher quality; supports abstracting.
- Weaknesses: speed; still needs to scale up to robust open-domain summarization.

IR/Statistics:
- Approach: operate at the lexical level; use word frequency, collocation counts, etc.
- Need: large amounts of text.
- Strengths: robust; good for query-oriented summaries.
- Weaknesses: lower quality; inability to manipulate information at abstract levels.
Toward the Final Answer...

Problem: What if neither IR-like nor IE-like methods work? Sometimes counting and templates are insufficient, and then you need to do inference to understand.

Solution: semantic analysis of the text (NLP), using adequate knowledge bases that support inference (AI).

Word counting vs. inference:
Mrs. Coolidge: "What did the preacher preach about?"
Coolidge: "Sin."
Mrs. Coolidge: "What did he say?"
Coolidge: "He's against it."
The Optimal Solution...
Combine strengths of both paradigms…
...use IE/NLP when you have suitable template(s),
...use IR when you don’t…
…but how exactly to do it?
Progress of automatic summarization methods

- 1950s: position, frequency, and cue-phrase based (Luhn, 1958; Edmundson, 1969)
- 1970s: document content (Schank and Abelson, 1977; DeJong, 1979)
- 1990s: information-extraction based (Hovy and Lin, 1997; Kupiec et al., 1995)
- Current: semantic analysis, graph theory, QA, machine learning, ...
Overview of Extraction Methods

- Word frequencies throughout the text
- Position in the text: lead method; optimal position policy; title/heading method
- Cue phrases in sentences
- Cohesion: links among words: word co-occurrence, coreference, lexical chains
- Discourse structure of the text
- Information extraction: parsing and analysis
- Statistical scoring

Scoring techniques:
- Word frequencies throughout the text (Luhn, 58)
- Position in the text (Edmundson, 69)
- Title method (Edmundson, 69)
- Cue phrases in sentences (Edmundson, 69)
Luhn 58

The very first work in automated summarization. Computes measures of significance.

Words:
- stemming (differ, difference)
- bag of words

[Figure: word frequency plotted against word rank; "the resolving power of words" (Luhn, 58) peaks at mid-frequency words, between an upper cutoff for common words and a lower cutoff for rare ones.]
Luhn 58

Sentences:
- concentration of high-score words
- cutoff values established in experiments with 100 human subjects

[Figure: a 7-word sentence span bracketing 4 significant words among all words; SCORE = 4²/7 ≈ 2.3]
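The bracketed-span scoring above can be sketched in code (a simplified reading of Luhn's method; the gap limit of 4 non-significant words and the choice of the significant-word set are assumptions):

```python
def luhn_score(sentence_words, significant, max_gap=4):
    """Score one sentence: square of the number of significant words in its
    densest cluster, divided by the cluster's span length (e.g. 4**2/7)."""
    positions = [i for i, w in enumerate(sentence_words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    # A cluster is a run of significant words separated by <= max_gap others.
    start = end = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - end <= max_gap:
            end, count = pos, count + 1
        else:
            best = max(best, count ** 2 / (end - start + 1))
            start = end = pos
            count = 1
    return max(best, count ** 2 / (end - start + 1))
```

With 4 significant words spread over a 7-word span, this reproduces the 4²/7 ≈ 2.3 example above.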
Edmundson 69

- Cue method: stigma words ("hardly", "impossible"); bonus words ("significant")
- Key method: similar to Luhn
- Title method: title + headings
- Location method: sentences under headings; sentences near the beginning or end of the document and/or paragraphs (also [Baxendale 58])
Edmundson 69

Linear combination of four features:

α₁C + α₂K + α₃T + α₄L

Manually labelled training corpus. Key not important!

[Figure: bar chart of extraction accuracy (0-100%) for RANDOM, KEY, TITLE, CUE, LOCATION, C + K + T + L, and C + T + L.]
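A toy version of the linear combination, with illustrative (not Edmundson's exact) feature definitions; the default weights drop the Key feature, echoing the C + T + L result above:

```python
def edmundson_score(sentence, sent_index, n_sents, title_words,
                    bonus, stigma, keywords, w=(1.0, 0.0, 1.0, 1.0)):
    """Weighted sum alpha1*C + alpha2*K + alpha3*T + alpha4*L for one sentence.
    Feature definitions here are simplified stand-ins."""
    words = set(sentence.lower().split())
    c = len(words & bonus) - len(words & stigma)                # Cue
    k = len(words & keywords)                                   # Key
    t = len(words & {tw.lower() for tw in title_words})         # Title
    l = 1 if sent_index == 0 or sent_index == n_sents - 1 else 0  # Location
    wc, wk, wt, wl = w
    return wc * c + wk * k + wt * t + wl * l
```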
Lin & Hovy 97

Optimum position policy (OPP): measure the yield of each sentence position against keywords (signature words) from the Ziff-Davis corpus.

Preferred order:
[(T) (P2,S1) (P3,S1) (P2,S2) {(P4,S1) (P5,S1) (P3,S2)} {(P1,S1) (P6,S1) (P7,S1) (P1,S3) (P2,S3) …]
Kupiec et al. 95

Extracts of roughly 20% of the original text.

Feature set:
- sentence length: |S| > 5
- fixed phrases: 26 manually chosen
- paragraph: sentence position in the paragraph
- thematic words: binary: whether the sentence is included in the set of highest-scoring sentences
- uppercase words: not common acronyms

Corpus: 188 document + summary pairs from scientific journals.
Kupiec et al. 95

Uses a Bayesian classifier:

P(s ∈ S | F₁, F₂, ..., Fₖ) = P(F₁, F₂, ..., Fₖ | s ∈ S) · P(s ∈ S) / P(F₁, F₂, ..., Fₖ)

Assuming statistical independence of the features:

P(s ∈ S | F₁, F₂, ..., Fₖ) = ∏ⱼ₌₁ᵏ P(Fⱼ | s ∈ S) · P(s ∈ S) / ∏ⱼ₌₁ᵏ P(Fⱼ)
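The independence-factored posterior can be computed directly from estimated probability tables (a sketch; the table layout is an assumption):

```python
def kupiec_score(feature_values, prior, cond_probs, feature_probs):
    """Unnormalized posterior P(s in S | F1..Fk) under independence:
    prior * prod_j P(Fj | s in S) / P(Fj).
    feature_values: {feature_name: value}
    cond_probs[f][v]: P(F_f = v | s in S); feature_probs[f][v]: P(F_f = v)."""
    score = prior
    for f, v in feature_values.items():
        score *= cond_probs[f][v] / feature_probs[f][v]
    return score
```

Sentences are then ranked by this score and the top ones extracted.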
Centroid (Radev, 2004)

Centroids consist of words which are central not only to one article in a cluster, but to all the articles. The hypothesis is that sentences containing words from the centroid are more indicative of the topic of the cluster.

A centroid is a pseudo-document consisting of the words whose Count*IDF scores are above a predefined threshold in the documents that constitute the cluster.
- Count: average number of occurrences of a word across the entire cluster
- IDF: computed from a large corpus
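A minimal sketch of centroid construction under these definitions (the IDF table and the threshold are assumed given):

```python
from collections import Counter

def build_centroid(cluster_docs, idf, threshold):
    """cluster_docs: list of token lists (one per document in the cluster).
    idf: dict word -> IDF value from a large corpus.
    Returns {word: count*idf} for words scoring above the threshold."""
    counts = Counter(w for doc in cluster_docs for w in doc)
    n = len(cluster_docs)
    centroid = {}
    for word, c in counts.items():
        score = (c / n) * idf.get(word, 0.0)  # average count * IDF
        if score > threshold:
            centroid[word] = score
    return centroid
```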
Topic Signature Method (Hovy and Lin, 98)

Claim: Can approximate script identification at the lexical level, using automatically acquired 'word families'.

Idea: Create topic signatures: each concept is defined by the frequency distribution of its related words (concepts):

TS = {topic, signature} = {topic, (t₁,w₁) (t₂,w₂) ...}

Example: restaurant-visit → waiter + menu + food + eat ...

(The inverse of query expansion in IR.)
Example Signatures

Rank | aerospace  | banking       | environment   | telecommunication
-----|------------|---------------|---------------|------------------
1    | contract   | bank          | epa           | at&t
2    | air_force  | thrift        | waste         | network
3    | aircraft   | banking       | environmental | fcc
4    | navy       | loan          | water         | cbs
5    | army       | mr.           | ozone         |
6    | space      | deposit       | state         | bell
7    | missile    | board         | incinerator   | long-distance
8    | equipment  | fslic         | agency        | telephone
9    | mcdonnell  | fed           | clean         | telecommunication
10   | northrop   | institution   | landfill      | mci
11   | nasa       | federal       | hazardous     | mr.
12   | pentagon   | fdic          | acid_rain     | doctrine
13   | defense    | volcker       | standard      | service
14   | receive    | henkel        | federal       | news
15   | boeing     | banker        | lake          | turner
16   | shuttle    | khoo          | garbage       | station
17   | airbus     | asset         | pollution     | nbc
18   | douglas    | brunei        | city          | sprint
19   | thiokol    | citicorp      | law           | communication
20   | plane      | billion       | site          | broadcasting
21   | engine     | regulator     | air           | broadcast
22   | million    | national_bank | protection    | programming
23   | aerospace  | greenspan     | violation     | television
24   | corp.      | financial     | management    | abc
25   | unit       | vatican       | reagan        | rate
The MEAD Summarizer

Extractive summarization as a classification task: approximate it by classifying individual sentences, f: s → {0, 1}.

MEAD's classification function gives each sentence a score based on a linear combination of features (position, centroid, first-sentence similarity):

Score(s) = w_P · P(s) + w_C · C(s) + w_F · F(s)
The Redundancy Problem

Sentences with duplicate information content:
- "Solemn ceremony marks handover."
- "A solemn historic ceremony has marked the resumption of the exercise of sovereignty over Hong Kong."

MEAD's method for combating redundancy:
Do until # of sentences = max number for the summary:
  For each sentence sᵢ in order of score:
    if sᵢ is too similar to the already-selected sentences, skip it
    else add sᵢ
Redundancy between sentences

Example: similar sentences are both assigned a high score.
- "The top fine for smoking at work or in public places would be 200 rand (dlrs 35)."
- "The maximum fine for smoking at work or in public places would be 200 rand (dlrs 35)."

MMR (maximal marginal relevance) algorithm: when selecting a new sentence for the summary, measure its similarity with the already-extracted sentences; if the similarity is greater than a threshold, the sentence is redundant. The threshold is set to 0.7.
Carbonell & Goldstein 98

MMR: The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in re-ranking retrieved documents and in selecting appropriate passages for text summarization.

MMR ≝ Arg max_{Dᵢ ∈ R\S} [ λ · Sim₁(Dᵢ, Q) − (1 − λ) · max_{Dⱼ ∈ S} Sim₂(Dᵢ, Dⱼ) ]

where:
- C = document collection
- Q = user query
- R = IR(C, Q, θ)
- S = already retrieved documents
- Sim₁, Sim₂ = similarity metrics used
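A greedy selection loop implementing the MMR criterion above (similarity values are assumed precomputed; `k` and `lam` are illustrative parameters):

```python
def mmr_select(candidates, query_sim, pair_sim, lam=0.7, k=3):
    """candidates: sentence/document ids; query_sim[i] = Sim1(Di, Q);
    pair_sim[(i, j)] = Sim2(Di, Dj) with i < j. Returns k ids, MMR-greedy."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(i):
            # Redundancy: max similarity to anything already selected.
            redundancy = max((pair_sim[tuple(sorted((i, j)))] for j in selected),
                             default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 1 this reduces to pure relevance ranking; lower λ values favour novelty.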
Paice 90

Survey up to 1990. Techniques that (mostly) failed:
- frequency-keyword
- title-keyword
- syntactic criteria
- indicator phrases ("The purpose of this article is to review...")

Problems with extracts:
- lack of balance
- lack of cohesion: anaphoric reference, lexical or definite reference, rhetorical connectives
Linguistic/Semantic Methods

- Co-reference / lexical chains
- Rhetorical analysis
- Information extraction
- ...
Cohesion-based methods

Claim: Important sentences/paragraphs contain the most highly connected entities in more or less elaborate semantic structures.

Classes of approaches:
- word co-occurrences
- local salience and grammatical relations
- co-reference
- lexical similarity (WordNet, lexical chains)
- combinations of the above
Barzilay and Elhadad 97

"Dr. Kenny" appears once in both sequences and so does "machine", but sequence 1 is about the machine, and sequence 2 is about the doctor.
Barzilay and Elhadad 97

Chain computing: procedure for constructing lexical chains, using three types of relations:
- extra-strong (repetitions)
- strong (WordNet relations, within 7 sentences)
- medium-strong (link between synsets is longer than one, within 3 sentences)
Barzilay and Elhadad 97
Lexical chains [Stairmand 96]
Mr. Kenny is the person that invented the anesthetic machine which uses micro-computers to control the rate at which an anesthetic is pumped into the blood. Such machines are nothing new. But his device uses two micro-computers to achieve much closer monitoring of the pump feeding the anesthetic into the patient.
Barzilay and Elhadad 97

Scoring chains:
- Length of the chain
- Homogeneity index = 1 − (# distinct words in chain / Length)
- Score = Length * Homogeneity
- A chain is strong if Score > Average(scores) + 2 * st.dev(scores).
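The scoring rule and strong-chain criterion above, sketched directly (chains are represented simply as lists of word occurrences):

```python
from statistics import mean, pstdev

def chain_score(chain_words):
    """Score = Length * Homogeneity, Homogeneity = 1 - distinct/length."""
    length = len(chain_words)
    return length * (1 - len(set(chain_words)) / length)

def strong_chains(chains):
    """Keep chains scoring more than two standard deviations above the mean."""
    scores = [chain_score(c) for c in chains]
    cutoff = mean(scores) + 2 * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > cutoff]
```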
Barzilay and Elhadad 97

Extracting important sentences:
- Heuristic 1: For each chain in the summary representation, choose the sentence that contains the first appearance of a chain member in the text.
- Heuristic 2: Based on the notion of representative words: for each chain in the summary representation, choose the sentence that contains the first appearance of a representative chain member in the text.
- Heuristic 3: For each chain, find the text unit where the chain is highly concentrated, and extract the sentence with the first chain appearance in this central unit. Concentration is computed as the number of chain-member occurrences in a segment divided by the number of nouns in the segment.
Discourse-based method

Tree-like representation of texts in the style of Rhetorical Structure Theory (Mann and Thompson, 88).

Claim: The multi-sentence coherence structure of a text can be constructed, and the 'centrality' of the textual units in this structure reflects their importance.

Use the discourse representation to determine the most important textual units. Attempts: (Marcu, 97) for English.
Rhetorical Structure Theory

Mann & Thompson 88: a rhetorical relation holds between two non-overlapping text spans:
- Nucleus: the core idea, the writer's purpose
- Satellite: refers to the nucleus in context, for justifying, evidencing, contradicting, etc.

The nucleus of a rhetorical relation is comprehensible independent of the satellite, but not vice versa. Not all rhetorical relations are nucleus-satellite relations: Contrast is a multinuclear relation.
Rhetorical Structure Theory

RST parsing:
- Break the text into elementary discourse units (EDUs).
- Use cue phrases (discourse markers) and a notion of semantic similarity to construct rhetorical relations.
- Rhetorical relations can be assembled into rhetorical structure trees (RS-trees) by recursively applying individual relations across the whole text.
Rhetorical parsing (Marcu,97)
[With its distant orbit {– 50 percent farther from the sun than Earth –} and slim atmospheric blanket,1] [Mars experiences frigid weather conditions.2] [Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles.3] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,4] [but any liquid water formed that way would evaporate almost instantly5] [because of the low atmospheric pressure.6]
[Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,7] [most Martian weather involves blowing dust or carbon dioxide.8] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.9] [Yet even on the summer pole, {where the sun remains in the sky all day long,} temperatures never warm enough to melt frozen water.10]
Rhetorical parsing

Use discourse markers to hypothesize rhetorical relations:
- rhet_rel(CONTRAST, 4, 5)
- rhet_rel(CONTRAST, 4, 6)
- rhet_rel(EXAMPLE, 9, [7,8])
- rhet_rel(EXAMPLE, 10, [7,8])

Use semantic similarity to hypothesize rhetorical relations:

if similar(u1, u2) then
    rhet_rel(ELABORATION, u2, u1)
    rhet_rel(BACKGROUND, u1, u2)
else
    rhet_rel(JOIN, u1, u2)

- rhet_rel(JOIN, 3, [1,2])
- rhet_rel(ELABORATION, [4,6], [1,2])

Use the hypotheses to derive a valid discourse representation of the original text.
[Figure: RS-tree for the Mars text. Unit 2 ("Mars experiences frigid weather conditions") is the root nucleus; unit 3 elaborates it; unit 1 provides Background/Justification; units 4-6 form an Evidence span (4-5 in Contrast, 5-6 linked by Cause); units 7-10 form an Example span, with 7 a Concession to 8 and 9-10 linked by Antithesis.]
Marcu 97

[Figure: the same RS-tree drawn schematically over units 1-10, with Elaboration, Background/Justification, Contrast, Evidence, Cause, Concession, Antithesis, and Example relations, and nuclei promoted up the tree.]
Summarization = selection of the most important units
2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6
Information Extraction Method

Idea: content selection using templates.
- Predefine a template whose slots specify what is of interest.
- Use a canonical IE system to extract the relevant information from a (set of) document(s) and fill the template.
- Generate the content of the template as the summary.

Previous IE work:
- FRUMP (DeJong, 78): 'sketchy scripts' of terrorism, natural disasters, political visits, ...
- (Mauldin, 91): templates for conceptual IR.
- (Rau and Jacobs, 91): templates for business.
- (McKeown and Radev, 95): templates for news.
Information Extraction method

Example template:

MESSAGE:ID          TSL-COL-0001
SECSOURCE:SOURCE    Reuters
SECSOURCE:DATE      26 Feb 93, early afternoon
INCIDENT:DATE       26 Feb 93
INCIDENT:LOCATION   World Trade Center
INCIDENT:TYPE       Bombing
HUM TGT:NUMBER      AT LEAST 5
Full Generation Example

Challenge: pack content densely!

Example (McKeown and Radev, 95):
- Traverse templates and assign values to 'realization switches' that control local choices such as tense and voice.
- Map the modified templates into a representation of Functional Descriptions (the input representation to Columbia's NL generation system FUF).
- FUF maps Functional Descriptions into English.
Generation Example (McKeown and Radev, 95)

NICOSIA, Cyprus (AP) – Two bombs exploded near government ministries in Baghdad, but there was no immediate word of any casualties, Iraqi dissidents reported Friday. There was no independent confirmation of the claims by the Iraqi National Congress. Iraq's state-controlled media have not mentioned any bombings.

Note: multiple sources and disagreement; explicit mention of "no information".
Degree Centrality

Problem formulation:
- Represent each sentence by a vector.
- Denote each sentence as a node of a graph.
- Cosine similarity determines the edges between nodes.

Since we are interested in significant similarities, we can eliminate low values in the similarity matrix by defining a threshold.
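A small sketch of this formulation with sparse word-count vectors (the threshold value is illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts word -> weight)."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def degree_centrality(vectors, threshold=0.1):
    """Degree of each sentence = number of neighbours above the threshold."""
    n = len(vectors)
    return [sum(1 for j in range(n)
                if j != i and cosine(vectors[i], vectors[j]) > threshold)
            for i in range(n)]
```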
LexRank

A centrality vector p gives the LexRank of each sentence (similar to PageRank), defined as the stationary distribution of the sentence-similarity matrix B: p = Bᵀp.

Perron-Frobenius theorem: an irreducible and aperiodic Markov chain is guaranteed to converge to a stationary distribution.
What Should B Satisfy?

- Stochastic matrix / Markov chain property.
- Irreducible: any state is reachable from any other state.
- Aperiodic.

If a Markov chain has reducible or periodic components, a random walker may get stuck in these components and never visit the other parts of the graph.
LexRank

B is a stochastic matrix, but is it irreducible and aperiodic? Dampness (Page et al. 1998) guarantees both, by mixing B with a uniform jump probability.
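A power-iteration sketch of damped LexRank (assuming the row-stochastic similarity matrix B has already been built; d is the uniform-jump probability):

```python
def lexrank(B, d=0.15, tol=1e-8, max_iter=1000):
    """Iterate p = d/N + (1 - d) * B^T p until convergence.
    B: row-stochastic n x n similarity matrix (list of lists)."""
    n = len(B)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [d / n + (1 - d) * sum(B[j][i] * p[j] for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p
```

Sentences are then ranked by their p values and the top ones extracted.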
Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction

Authors: Xiaojun Wan, Jianwu Yang, Jianguo Xiao
Build Sentence-Sentence Graph
Sentence relations: U = [U_ij]_{n×n}, with
U_ij = sim(s_i, s_j) if i ≠ j, and U_ij = 0 if i = j.
Each sentence s_i is represented as a vector of term weights (wt_1, wt_2, ..., wt_n), where
wt_t = tf_t × isf_t,  isf_t = 1 + log(N / n_t)
(tf_t: frequency of term t in s_i; N: total number of sentences; n_t: number of sentences containing term t).
Build Word-Word Graph
Word relations: V = [V_ij]_{n×n}, with
V_ij = sim(t_i, t_j) if i ≠ j, and V_ij = 0 if i = j.
Word similarity computation: based on a dictionary (WordNet) or based on a corpus (mutual information).
Build Sentence-Word Graph
Relation between sentences S = {s_i | 1 ≤ i ≤ m} and words T = {t_j | 1 ≤ j ≤ n}: W = [W_ij]_{m×n}, with W_ij = aff(s_i, t_j).
Affinity computation:
aff(s_i, t_j) = (tf_{t_j} × isf_{t_j}) / Σ_{t ∈ s_i} (tf_t × isf_t)
Document Model
Assumption 1: If a sentence is important, its closely connected sentences are also important; if a word is important, its closely related words are also important.
Assumption 2: The more important words a sentence contains, the more important the sentence is; the more frequently a word occurs in important sentences, the more important the word is.
Reinforcement Algorithm
The assumptions translate into score propagation:
u(s_i) ∝ Σ_j U_ji u(s_j)      v(t_j) ∝ Σ_i V_ij v(t_i)
u(s_i) ∝ Σ_j W_ij v(t_j)      v(t_j) ∝ Σ_i W_ij u(s_i)
Combined, with weights α and β on the two contributions:
u(s_i) = α Σ_{j=1}^{m} U_ji u(s_j) + β Σ_{j=1}^{n} W_ij v(t_j)
v(t_j) = α Σ_{i=1}^{n} V_ij v(t_i) + β Σ_{i=1}^{m} W_ij u(s_i)
In matrix form:
u = α U^T u + β W v
v = α V^T v + β W^T u
Then we can simultaneously rank sentences (u) and words (v).
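The matrix-form updates can be sketched as a simple fixed-point iteration; the L1 normalization step, the weights α and β, and the iteration count are assumptions here, since the slides do not fix them:

```python
def reinforce(U, V, W, alpha=0.6, beta=0.4, iters=50):
    """Iterative reinforcement ranking (sketch).
    U: m x m sentence-sentence, V: n x n word-word, W: m x n sentence-word.
    Returns sentence scores u and word scores v, each L1-normalized."""
    m, n = len(W), len(W[0])
    u = [1.0 / m] * m
    v = [1.0 / n] * n
    for _ in range(iters):
        # u = alpha * U^T u + beta * W v
        nu = [alpha * sum(U[j][i] * u[j] for j in range(m))
              + beta * sum(W[i][j] * v[j] for j in range(n))
              for i in range(m)]
        # v = alpha * V^T v + beta * W^T u
        nv = [alpha * sum(V[i][j] * v[i] for i in range(n))
              + beta * sum(W[i][j] * u[i] for i in range(m))
              for j in range(n)]
        # Normalize so the scores stay bounded across iterations.
        su, sv = sum(nu), sum(nv)
        u = [x / su for x in nu]
        v = [x / sv for x in nv]
    return u, v
```

Sentences and words reinforce each other through W, so both rankings emerge from the same iteration.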
Table of contents
1. Motivation
2. Definition, genres and types of summaries.
3. Summarization techniques
4. Building a summarization system.
5. Evaluating summaries
6. The future.
Document Understanding Conference
NIST (National Institute of Standards and Technology)
Summarization tasks:
2001-2002: 100-word, query-independent single-document and multi-document summarization
2003: query-dependent summarization
2004: multi-lingual summarization
2005-2007: query-focused multi-document summarization
Query-focused Multi-document Summarization
Task description
System implementation:
Feature-driven system design
Improvement based on machine learning
Post-processing
Task Description
A combination of QA and summarization.
Input: a topic query and 25 related documents.
Output: a summary of no more than 250 words.
News documents from the Associated Press, New York Times and Xinhua News Agency, related to recent important events.
Evaluation using ROUGE and Pyramid; each topic has four human summaries.
DUC corpus example (1)
Each topic is composed of a topic query and 25 related documents.
Query sample:
<topic>
<num> D0601A </num>
<title> Native American Reservation System - pros and cons </title>
<narr>
Discuss conditions on American Indian reservations or among Native American communities. Include the benefits and drawbacks of the reservation system. Include legal privileges and problems.
</narr>
</topic>
DUC corpus example (2)
<DOC>
<DOCNO> APW20000416.0024 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 2000-04-16 12:56 </DATE_TIME>
<HEADLINE> Clinton To Visit Navajo Community </HEADLINE>
By CHRIS ROBERTS, Associated Press Writer
<TEXT>
<P>ALBUQUERQUE, N.M. (AP) -- At Mesa Elementary School in far northwest New Mexico, Navajo children line up to use the few computers connected to the Internet. Their time online must be short for everyone to get a chance.
</P>
… …
… …
<P>
Navajo Nation: http://www.navajo.org/nnhomepg.html
</P>
</TEXT>
</DOC>
Document sample:
Summarization process
Preprocessing: parse the documents and extract the required content.
Feature design and sentence scoring: extract the features (position, length, ...) for each sentence; score each sentence based on the features.
Postprocessing: sentence simplification, redundancy removal, reordering.
Preprocessing
Input: documents in XML format.
1) Parse the XML and extract the text in the documents.
2) Rule-based sentence segmentation.
3) POS tagging, syntactic parsing.
Output: segmented sentences.
Preprocessing Sample
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE DOCSENT SYSTEM "/home/lsjian/mead/dtd/docsent.dtd">
<DOCSENT DID='APW20000416.0024' DOCNO='APW20000416.0024' LANG='ENG' CORR-DOC='FT.c'>
<BODY>
<HEADLINE>
<S PAR='1' RSNT='1' SNO='1'> Clinton To Visit Navajo Community </S>
</HEADLINE>
<TEXT>
<S PAR='2' RSNT='1' SNO='2'> ALBUQUERQUE, N.M. </S>
<S PAR='2' RSNT='2' SNO='3'> At Mesa Elementary School in far northwest New Mexico, Navajo children line up to use the few computers connected to the Internet. </S>
… …
<S PAR='2' RSNT='27' SNO='28'> Clinton's plans Monday including speaking at Shiprock Boys/Girls Club, then joining an evening Webcast at Dine Tribal College in Shiprock that will involve high school student online at Lake Valley Navajo School, about 55 miles away. ------ On the Net: Navajo Nation: http://www.navajo.org/nnhomepg.html </S>
</TEXT></BODY></DOCSENT>
Sentence Scoring
Feature design: lexical, syntactic, semantic; query-dependent and query-independent.
Linear combination of features: a linear function of weighted features,
S = a1*f1 + a2*f2 + ...
with weights set by hand, through experience and trial and error.
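A minimal sketch of the linear scoring function; the particular feature list and weight values below are invented for illustration:

```python
# Linear combination of sentence features: S = a1*f1 + a2*f2 + ...
def score_sentence(features, weights):
    return sum(a * f for a, f in zip(weights, features))

# e.g. [query overlap, centroid, position, length], hand-tuned weights
# (these numbers are illustrative, not the system's actual weights):
weights = [0.4, 0.3, 0.2, 0.1]
score = score_sentence([2.0, 1.0, 0.0, 3.0], weights)
```

Every sentence is reduced to a feature vector and ranked by this single scalar score.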
Features: query-dependent
These describe the overlap between each sentence and the topic query (the QA side).
Frequency-based: word overlap between sentence and query.
Named-entity-based: overlap of named entities between sentence and query.
WordNet-based: similarity between sentence and query computed over WordNet.
Features: query-independent
These describe the importance of each sentence within the documents (the summarization side).
Centroid: sum of word centroid values.
Tf*isf: sum of word tf*isf values.
Named entities: number of named entities.
Stop words: number of stop words.
Position: position of the sentence in its document.
Length: number of words in the sentence.
Weights Learning with a regression model
Defect of human weights: subjectivity.
Solution: learn the weights. The scoring process can be seen as a mapping from a feature vector to a score, and a regression model is used to train the feature weights.
Sentence Scoring based on regression model
Process:
A training corpus {(score(s), V(s)) | s ∈ D} is used to train the regression function f0 : V(s) → score(s).
Rank the test sentences with score^(s*) = f0(V(s*)).
Keys:
Model selection: Support Vector Regression.
Acquisition of the training corpus: use human summaries.
Postprocessing
Simple processing: extract the highest-scored sentences until the length requirement is reached.
Problems:
1: redundancy
2: meaningless words in sentences (handled by rules)
3: coherence
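The simple extraction step, plus a common greedy fix for the redundancy problem (skip any sentence too similar to one already chosen), can be sketched as follows; the thresholds and names are illustrative assumptions:

```python
def select_summary(sentences, scores, sim, max_words=250, max_sim=0.7):
    """Greedy extraction with redundancy removal (sketch): take sentences in
    score order, skip any too similar to an already selected one, and skip
    sentences that would exceed the length limit."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, words = [], 0
    for i in order:
        length = len(sentences[i].split())
        if words + length > max_words:
            continue  # a shorter sentence further down may still fit
        if any(sim[i][j] > max_sim for j in chosen):
            continue  # redundant with an already selected sentence
        chosen.append(i)
        words += length
    return [sentences[i] for i in chosen]
```

This addresses problem 1 above; problems 2 and 3 are handled by the simplification rules and reordering described next.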
Sentence simplification
Delete meaningless words in sentences: news-specific noisy words and content-irrelevant words.
Rule-based method:
The dateline at the beginning of a news story, e.g. "ALBUQUERQUE, N.M. (AP)";
Initial words such as "and", "also", "besides", "though", "in addition", "somebody said", "somebody says";
"somebody (pronoun)/It is said/reported/noticed/thought that";
Parenthesized content in capitalized letters; ...
Sentence ordering
Sentence ordering by score alone: no logic in the content.
Temporal sentence ordering: acquire the time stamp from the original texts; order sentences by the publication time of their documents; for sentences from the same document, keep their order of occurrence in the document.
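The temporal ordering rule above can be sketched as a two-key sort (the triple layout is an assumption for illustration):

```python
def order_sentences(selected):
    """selected: list of (doc_time, position, sentence) triples, where doc_time
    is the publication time of the source document and position is the
    sentence's index within it."""
    # Sort by publication time first, then by in-document occurrence order.
    return [s for _, _, s in sorted(selected, key=lambda t: (t[0], t[1]))]
```

Sentences from the same document keep their original order because the second sort key is their in-document position.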
Table of contents
1. Motivation.
2. Genres and types of summaries.
3. Approaches and paradigms.
4. Summarization techniques
5. Evaluating summaries.
6. The future.
How Can You Evaluate a Summary?
When you generate a summary:
1. obtain the gold-standard summaries (human);
2. choose a granularity (clause; sentence; paragraph);
3. create a similarity measure for that granularity (word overlap; multi-word overlap; perfect match);
4. measure the similarity of each unit in the new summary to the most similar unit(s) in the gold standard;
5. measure recall and precision.
e.g., (Kupiec et al., 95).
Human summaries
Two annotators A and B extract sentences from documents as summaries.
How well do they agree? Measure with the kappa value.
One scale (Fleiss): ≥ 0.75 excellent; 0.40 to 0.75 fair to good; < 0.40 poor.
Another scale (Landis and Koch): 0 to 0.20 slight; 0.20 to 0.40 fair; 0.40 to 0.60 moderate; 0.60 to 0.80 substantial; > 0.80 almost perfect.
Kappa
κ = (P(A) − P(E)) / (1 − P(E))
P(A): the observed agreement among raters.
P(E): the probability of chance agreement.

Agreement table (A's judgments in rows, B's in columns):
           B: Yes   B: No
A: Yes       20       5
A: No        10      15
P(A) = (20 + 15) / 50 = 0.70
P(E) = 0.5 × 0.6 + 0.5 × 0.4 = 0.3 + 0.2 = 0.5
κ = (0.70 − 0.5) / (1 − 0.5) = 0.4
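The worked example can be checked with a small kappa routine for a 2x2 agreement table:

```python
def kappa_2x2(yy, yn, ny, nn):
    """Kappa from a 2x2 agreement table of counts:
    yy = both say Yes, yn = A Yes / B No, ny = A No / B Yes, nn = both No."""
    total = yy + yn + ny + nn
    p_a = (yy + nn) / total                 # observed agreement P(A)
    a_yes = (yy + yn) / total               # fraction of Yes judgments by A
    b_yes = (yy + ny) / total               # fraction of Yes judgments by B
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement P(E)
    return (p_a - p_e) / (1 - p_e)
```

With the table above, kappa_2x2(20, 5, 10, 15) reproduces κ = 0.4.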
Evaluation
Manual:
Linguistic quality (readability): grammaticality, non-redundancy, referential clarity, focus, structure; each rated on a five-point scale (1 very poor, 5 very good)
Responsiveness (content)
Pyramid: based on SCUs
Automatic:
ROUGE: ROUGE-2, ROUGE-SU4
BE (basic elements)
Pyramid (1)
The pyramid method is designed to address the observation that summaries from different humans always have partly overlapping content.
The pyramid method includes a manual annotation method to represent Summary Content Units (SCUs) and to quantify the proportion of model summaries that express this content.
Each SCU has a weight representing the number of models it occurs in, from 1 to maxn, where maxn is the total number of models. There are very few SCUs expressed in all models (i.e., weight = maxn), and increasingly many SCUs at each lower weight, with the most SCUs at weight = 1; stacking the tiers by weight yields the pyramid shape.
(Figure: pyramid tiers labeled by weight, from maxn at the top down to 1 at the base.)
Pyramid (2)
The approach involves two phases of manual annotation: pyramid construction, and annotation against the pyramid to determine which SCUs in the pyramid have been expressed in the peer summary.
The total weight of a peer summary is the sum of the weights of the SCUs it expresses.
ROUGE basics
ROUGE (Recall-Oriented Understudy for Gisting Evaluation): recall-oriented, within-sentence word overlap with the model summary (or summaries).
Models: no theoretical limit to their number; system output was compared to 4 models, manual summaries to 3 models.
Uses n-grams; correlates reasonably with human coverage judgements; does not address summary discourse characteristics, and suffers from lack of text cohesion or coherence.
ROUGE v1.2.1 measures:
ROUGE-1,2,3,4: n-gram matching where n = 1, 2, 3, 4
ROUGE-LCS: longest common subsequence
ROUGE: Recall-Oriented Understudy for Gisting Evaluation
ROUGE: n-gram co-occurrence metrics measuring content overlap.
Counts of n-gram overlaps between candidate and model summaries, divided by the total number of n-grams in the model summaries.
ROUGE
R_n = ( Σ_{j=1}^{h} Σ_{i ∈ N_n} min(X_n(i), M_n(i,j)) ) / ( Σ_{j=1}^{h} Σ_{i ∈ N_n} M_n(i,j) )

where N_n is the set of all n-grams and i is one member of N_n; X_n(i) is the number of times the n-gram i occurred in the candidate summary, and M_n(i,j) is the number of times the n-gram i occurred in the j-th model (human) reference summary; there are h human summaries in total.
Example
Peer: A B C D E F G A B
H1: A B G A D E C D (7 bigrams)
H2: A C E F G A D (6 bigrams)
How do we compute the ROUGE-2 score?
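Working through the formula on this example (a sketch; the function names are ours): the peer's bigrams matched against H1 give 4 clipped hits (AB, GA, DE, CD) and against H2 give 3 (EF, FG, GA), over 7 + 6 = 13 model bigrams, so ROUGE-2 = 7/13:

```python
from collections import Counter
from fractions import Fraction

def ngrams(tokens, n=2):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(peer, models, n=2):
    """Multi-reference ROUGE-n: clipped n-gram hits over all model n-grams."""
    X = ngrams(peer, n)
    hit = total = 0
    for model in models:
        M = ngrams(model, n)
        hit += sum(min(X[g], M[g]) for g in M)   # clipped matches per model
        total += sum(M.values())                 # model n-gram count
    return Fraction(hit, total)

peer = "A B C D E F G A B".split()
h1 = "A B G A D E C D".split()
h2 = "A C E F G A D".split()
score = rouge_n(peer, [h1, h2])  # 4 hits vs H1 + 3 vs H2 over 13 = 7/13
```

Note that the repeated bigram AB in the peer is clipped to the one occurrence in H1, which is exactly the role of the min() in the formula.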
Table of contents
1. Motivation.
2. Genres and types of summaries.
3. Approaches and paradigms.
4. Summarization methods.
5. Evaluating summaries.
6. The future.
In the last decade there has been a surge of interest in automatic summarizing. … There has been some progress, but there is much to do.
Karen Sparck Jones
Karen Sparck Jones, Automatic summarising: the state of the art, Information Processing & Management, 43(6): 1449-1481, 2007.
The Future (1) — There’s much to do!
Data preparation:
Collect large sets of texts with abstracts, in all genres.
Build large corpora of <Text, Abstract, Extract> tuples.
Investigate relationships between extracts and abstracts (using <Extract, Abstract> tuples).
Topic identification:
Develop new identification methods (discourse, etc.).
Develop heuristics for method combination (train heuristics on <Text, Extract> tuples).
The Future (2)
Concept interpretation (fusion):
Investigate types of fusion (semantic, evaluative, ...).
Create large collections of fusion knowledge/rules.
Study incorporation of the user's knowledge in interpretation.
Generation:
Develop sentence generation rules (using <Extract, Abstract> pairs).
Evaluation:
Develop better automatic evaluation metrics.
Reference
Karen Sparck Jones, Automatic summarising: the state of the art, Information Processing & Management, 43(6): 1449-1481, 2007.
Lin C. Y. , Hovy E., Manual and automatic evaluation of summaries, Proceedings of the Workshop on Automatic Summarization (including DUC 2002), Philadelphia, July 2002, pp. 45-51. Association for Computational Linguistics.
http://haydn.isi.edu/ROUGE/
Passonneau, R.J. et al. (2005), Applying the Pyramid method in DUC 2005, DUC 2005, 25-32.
Jaime Carbonell and Jade Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," in Proceedings of the 21st ACM-SIGIR International Conference on Research and Development in Information Retrieval, Melbourne, Australia, 1998.
Radev, D.R., Jing, H. and Budzikowska, M. (2000) ‘Centroid-based summarisation of mul-tiple documents: sentence extraction, utility-based evaluation, and user studies’, ANLP/NAACL-00, 2000, 21-30.
J.M. Conroy et al., Topic-focused multi-document summarization using an approximate oracle score, ACL 2006.
Paice, C.D. 1990. Constructing Literature Abstracts by Computer: Techniques and Prospects. Information Processing and Management 26(1): 171–186.
Barzilay, R. and M. Elhadad. 1997. Using Lexical Chains for Text Summarization. In Proceedings of the Workshop on Intelligent Scalable Text Summarization at the ACL/EACL Conference, 10–17. Madrid, Spain.
Edmundson, H.P. 1968. New Methods in Automatic Extracting. Journal of the ACM 16(2), 264–285.
Hovy, E.H. and Lin, C-Y. 1998. Automated Text Summarization in SUMMARIST. In M. Maybury and I. Mani (eds), Intelligent Scalable Text Summarization. Forthcoming.
Kupiec, J., J. Pedersen, and F. Chen. 1995. A Trainable Document Summarizer. In Proceedings of the Eighteenth Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), 68–73. Seattle, WA.
Luhn, H.P. 1959. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 159–165.
Lin, C-Y. 1995. Topic Identification by Concept Generalization. In Proceedings of the Thirty-third Conference of the Association of Computational Linguistics (ACL-95), 308–310. Boston, MA.
Mann, W.C. and S.A. Thompson. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text 8(3), 243–281. Also available as USC/Information Sciences Institute Research Report RR-87-190
Marcu, D. 1998. Improving Summarization Through Rhetorical Parsing Tuning. Proceedings of the Workshop on Very Large Corpora. Montreal, Canada.
McKeown, K.R. and D.R. Radev. 1995. Generating Summaries of Multiple News Articles. In Proceedings of the Eighteenth Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), 74–82. Seattle, WA.
Salton, G., J. Allen, C. Buckley, and A. Singhal. 1994. Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts. Science 264: 1421–1426.
Schank, R.C. and R.P. Abelson. 1977. Scripts, Plans, Goals, and Understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
DeJong, G. 1978. Fast Skimming of News Stories: The FRUMP System. Ph.D. diss. Yale University.