Multiple Alternative Sentence Compressions (MASC) · 2007. 4. 26. · 1 Multiple Alternative...
Multiple Alternative Sentence Compressions (MASC)
A Framework for Automatic Summarization
Nitin Madnani, David Zajic, Bonnie Dorr
Necip Fazil Ayan, Jimmy Lin
University of Maryland, College Park
Outline
• Problem Description
• MASC Architecture
• MASC Results
• Improving Candidate Selection
• Summary & Future Work
Problem Description
• Sentence-level extractive summarization
  – Source sentences contain a mixture of relevant/non-relevant and novel/redundant information.
• Compression
  – A single output compression can't provide the best compression of each sentence for every user need.
• Multiple Alternative Sentence Compression
  – Generation of multiple candidate compressions of source sentences.
  – Feature-based selection to choose among candidates.
MASC Architecture
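Sentence Filtering → Sentence Compression → Candidate Selection
• Documents → Sentence Filtering → Sentences
• Sentences → Sentence Compression (HMM Hedge, Trimmer, Topiary) → Candidates
• Candidates + Task-Specific Features (e.g. query) → Candidate Selection → Summary
(Zajic et al., 2005; Zajic et al., 2006)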
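The three-stage architecture can be sketched as a small driver. This is a minimal sketch, assuming hypothetical stage interfaces (`SentenceFilter`, `Compressor`, `CandidateSelector` and the toy stand-ins are illustrative, not the MASC code); the point it shows is that every compressor may contribute several candidates per sentence, and only the selector decides among them.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stage interfaces: the slide names the stages, not their signatures.
SentenceFilter = Callable[[List[str]], List[str]]
Compressor = Callable[[str], List[str]]
CandidateSelector = Callable[[List[str], Dict[str, object]], List[str]]


def masc_pipeline(documents: List[str], sentence_filter: SentenceFilter,
                  compressors: Sequence[Compressor], selector: CandidateSelector,
                  task_features: Dict[str, object]) -> List[str]:
    """Filter sentences, pool alternative compressions from every compressor, then select."""
    candidates: List[str] = []
    for sentence in sentence_filter(documents):
        for compress in compressors:
            # Each compressor may return several alternative compressions of one sentence.
            candidates.extend(compress(sentence))
    return selector(candidates, task_features)


# Toy stand-ins for illustration only.
def split_sentences(docs: List[str]) -> List[str]:
    return [s.strip() for d in docs for s in d.split(".") if s.strip()]


def prefix_compressor(sentence: str) -> List[str]:
    words = sentence.split()
    return [" ".join(words[:k]) for k in (3, 5)]


def pick_longest(cands: List[str], features: Dict[str, object]) -> List[str]:
    return [max(cands, key=len)]


summary = masc_pipeline(["Illegal fireworks injured hundreds of people and started six fires."],
                        split_sentences, [prefix_compressor], pick_longest, {"query": None})
```

Keeping compression and selection as separate callables mirrors the diagram: new compressors or new selection features can be swapped in without touching the other stages.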
HMM Hedge Architecture
• Sentence → Part of Speech Tagger¹ → Sentence with Verb Tags (VERB, VERB) → HMM Hedge → Compressions
• HMM Hedge uses a Headline Language Model and a Story Language Model
• Language models based on 242,918 AP headlines and stories from the Tipster Corpus
¹TreeTagger (Schmid, 1994)
HMM Hedge: Multiple Alternative Compressions
• Calculate the best compression at each word length from 5 to 15 words
• Calculate the 5 best compressions at each word length
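The per-length n-best bookkeeping can be illustrated in isolation. This sketch assumes the decoder has already produced scored `(compression, score)` pairs (e.g. log-probabilities); the HMM decoding itself is not reproduced, and `nbest_by_length` is a hypothetical helper. With `n=1` it yields the best compression per length; with `n=5`, the 5 best.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def nbest_by_length(scored: List[Tuple[str, float]], min_len: int = 5,
                    max_len: int = 15, n: int = 5) -> Dict[int, List[str]]:
    """Keep the n highest-scoring compressions at each word length in [min_len, max_len].

    `scored` holds (compression, score) pairs from some decoder; higher is better.
    """
    buckets: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for text, score in scored:
        length = len(text.split())
        if min_len <= length <= max_len:
            buckets[length].append((text, score))
    return {length: [t for t, _ in sorted(items, key=lambda p: p[1], reverse=True)[:n]]
            for length, items in sorted(buckets.items())}


scored = [("a b c d e", -3.0), ("a b c d f", -1.0), ("a b c", -0.5), ("a b c d e f", -2.0)]
best_per_length = nbest_by_length(scored, n=1)
```

Here the 3-word string is discarded because it falls outside the 5 to 15 word range.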
Trimmer Architecture
• Sentence → Entity Tagger¹ → Sentence with Entity Tags (e.g. PERSON, TIME EXPR)
• Sentence → Parser² → Parse
• Trimmer takes the tagged sentence and its parse and produces Compressions
¹BBN IdentiFinder (Bikel et al., 1999)
²Charniak Parser (Charniak, 2000)
Multi-candidate Trimmer
• How to generate multiple candidate compressions?
  – Use the state of the parse tree after each rule application as a candidate
  – Use rules that generate multiple candidates
  – 9 single-output rules, 3 multi-output rules
    • Zajic et al., 2005, 2006; Zajic, 2007
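The first strategy, treating the state after each rule application as a candidate, can be sketched as follows. This is a simplified illustration: it operates on flat token lists rather than a parse tree, and `drop_trailing_time` and `drop_determiners` are hypothetical toy rules, not Trimmer's actual rule set.

```python
from typing import Callable, List, Optional

# A rule returns the trimmed tokens, or None when it does not apply.
Rule = Callable[[List[str]], Optional[List[str]]]


def collect_rule_states(tokens: List[str], rules: List[Rule]) -> List[List[str]]:
    """Apply rules in order; every state reached after a successful rule is a candidate."""
    candidates = [tokens]
    state = tokens
    for rule in rules:
        new_state = rule(state)
        if new_state is not None and new_state != state:
            state = new_state
            candidates.append(state)
    return candidates


# Hypothetical toy rules over flat tokens (Trimmer itself edits a parse tree).
def drop_trailing_time(tokens: List[str]) -> Optional[List[str]]:
    return tokens[:-2] if tokens[-2:] == ["on", "Sunday"] else None


def drop_determiners(tokens: List[str]) -> Optional[List[str]]:
    return [t for t in tokens if t.lower() not in {"the", "a", "an"}]


states = collect_rule_states("The latest flood crest passed Chongqing on Sunday".split(),
                             [drop_trailing_time, drop_determiners])
```

Each successive state is shorter, so a single pass of single-output rules already yields a ladder of candidate compressions of decreasing length.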
Trimmer Rule: Root-S
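• Select a node to be the root of the compression
• Consider any S node with NP and VP children
• Example: "The latest flood crest passed Chongqing in southwest China and waters were rising in Yichang on the middle reaches of the Yangtze, state television reported Sunday."
  – Parse nodes: S1, S, S2 CC S3, NP ("state television"), VP ("reported Sunday")
  – S2 = "The latest flood crest passed Chongqing in southwest China"
  – S3 = "waters were rising in Yichang on the middle reaches of the Yangtze"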
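The Root-S condition ("any S node with NP and VP children") amounts to a tree search. The sketch below uses a hypothetical minimal `Node` class and a simplified version of the example tree (inner phrases flattened, S1 as the parser's top node); it returns every S node that could serve as the root of a compression.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""  # non-empty only for leaves

    def text(self) -> str:
        return self.word or " ".join(child.text() for child in self.children)


def root_s_candidates(node: Node) -> List[Node]:
    """Pre-order list of S nodes that have both an NP child and a VP child."""
    found: List[Node] = []
    if node.label == "S" and {"NP", "VP"} <= {c.label for c in node.children}:
        found.append(node)
    for child in node.children:
        found.extend(root_s_candidates(child))
    return found


def phrase(label: str, words: str) -> Node:
    return Node(label, [Node("WORD", word=w) for w in words.split()])


# Simplified version of the slide's example tree (inner phrases flattened).
s2 = Node("S", [phrase("NP", "the latest flood crest"), phrase("VP", "passed Chongqing")])
s3 = Node("S", [phrase("NP", "waters"), phrase("VP", "were rising in Yichang")])
main = Node("S", [Node("S", [s2, phrase("CC", "and"), s3]),
                  phrase("NP", "state television"), phrase("VP", "reported Sunday")])
tree = Node("S1", [main])
roots = [n.text() for n in root_s_candidates(tree)]
```

The coordinating S (children S, CC, S) is skipped because it has no NP/VP children, while the main clause and both conjuncts qualify; a multi-output version of the rule could keep all three as candidates.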
Trimmer Rule: Conjunction
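• Example: "Illegal fireworks injured hundreds of people and started six fires"
  – Parse: S → NP ("Illegal fireworks") VP, where VP → VP ("injured hundreds of people") CC ("and") VP ("started six fires")
• The conjunction rule removes the right child, the left child, or neither child.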
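As a multi-output rule, the conjunction rule turns one coordination into three candidates. This sketch works on pre-split strings rather than parse nodes (a simplification), and `conjunction_candidates` is a hypothetical helper; dropping the CC whenever one conjunct is removed is an assumption about how the output is realized.

```python
from typing import List


def conjunction_candidates(prefix: str, left: str, cc: str, right: str) -> List[str]:
    """Multi-output conjunction rule on 'prefix left CC right'.

    Returns one candidate per option: remove neither, remove the right child,
    remove the left child. The CC is dropped whenever a conjunct is removed.
    """
    return [
        f"{prefix} {left} {cc} {right}",  # remove neither child
        f"{prefix} {left}",               # remove right child
        f"{prefix} {right}",              # remove left child
    ]


candidates = conjunction_candidates("Illegal fireworks", "injured hundreds of people",
                                    "and", "started six fires")
```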
Topiary Architecture
• Document → Topic Assignment¹ (using the Document Corpus) → Topic Terms
• Sentence → Trimmer → Compressions
• Topic Terms + Compressions → Topiary → Candidates
¹BBN Unsupervised Topic Detection
Topiary Examples (DUC2004)
• PINOCHET: wife appealed saying he too sick to be extradited to face charges
• MAHATHIR ANWAR_IBRAHIM: Lawyers went to court to demand client's release
  – Mahathir Mohamad is the former Prime Minister of Malaysia
  – Anwar bin Ibrahim is a former deputy prime minister and finance minister of Malaysia, convicted of corruption in 1998
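The headline format above ("TOPIC TERMS: compression") can be sketched as follows. This is a hypothetical illustration, not Topiary's actual algorithm: it assumes topic terms already present in the compression are skipped, the rest are upper-cased and prefixed in order, and a term is dropped if it would push the headline past a character budget (75 characters is assumed here as a default).

```python
from typing import List


def topiary_headline(topic_terms: List[str], compression: str, limit: int = 75) -> str:
    """Prefix topic terms absent from the compression, within a character limit (assumed)."""
    present = {w.lower() for w in compression.split()}
    chosen: List[str] = []
    for term in topic_terms:
        if term.lower() in present:
            continue  # the compression already mentions this topic
        trial = " ".join(chosen + [term.upper()]) + ": " + compression
        if len(trial) <= limit:
            chosen.append(term.upper())
    return (" ".join(chosen) + ": " + compression) if chosen else compression


headline = topiary_headline(["Mahathir", "Anwar_Ibrahim"], "Lawyers went to court", limit=100)
```

In the real system the compression itself can also be trimmed further to make room for topic terms; this sketch keeps the compression fixed.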
Selector Architecture
• Candidates + Features → Relevance & Centrality Scorer¹ (using Query, Document, Document Set) → Candidates + More Features
• Candidates + More Features → Sentence Selector (Feature Weights: ?) → Summary
• Sentence Selector includes a Cull & Rescore step
¹Uniform Retrieval Architecture (URA), UMD's software infrastructure for IR tasks.
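A selector of this shape can be sketched as greedy selection under a linear feature score. Everything here is an assumption for illustration: the `Candidate` record, scoring as a weighted sum of features, "cull" meaning discard the other compressions of the selected sentence, and "rescore" meaning recompute summary-dependent features (the toy `word_overlap` redundancy) after each pick.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    text: str
    sentence_id: int           # index of the source sentence this compresses
    static: Dict[str, float]   # features that do not depend on the summary so far


# Features that must be recomputed as the summary grows (hypothetical interface).
DynamicFeatures = Callable[[Candidate, List[Candidate]], Dict[str, float]]


def linear_score(c: Candidate, summary: List[Candidate],
                 weights: Dict[str, float], dynamic: DynamicFeatures) -> float:
    feats = {**c.static, **dynamic(c, summary)}
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def select_summary(cands: List[Candidate], weights: Dict[str, float],
                   dynamic: DynamicFeatures, max_words: int) -> List[Candidate]:
    summary: List[Candidate] = []
    pool = list(cands)
    used = 0
    while pool:
        # Rescore every remaining candidate against the current summary.
        best = max(pool, key=lambda c: linear_score(c, summary, weights, dynamic))
        length = len(best.text.split())
        if used + length > max_words:
            pool = [c for c in pool if c is not best]  # does not fit; try the next one
            continue
        summary.append(best)
        used += length
        # Cull: discard the other compressions of the same source sentence.
        pool = [c for c in pool if c.sentence_id != best.sentence_id]
    return summary


def word_overlap(c: Candidate, summary: List[Candidate]) -> Dict[str, float]:
    """Toy redundancy feature: fraction of candidate words already in the summary."""
    seen = {w for s in summary for w in s.text.lower().split()}
    words = c.text.lower().split()
    return {"redundancy": sum(w in seen for w in words) / len(words)}
```

The open question on the slide, how to set the feature weights, is exactly what this sketch leaves as an input.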
Evaluation of Headline Generation Systems
[Figure: DUC2004 test data, ROUGE-1 recall (unigrams; y-axis 0.16 to 0.30) for First 75, UTD Topics, HMM Hedge, Trimmer and Topiary without MASC, and HMM Hedge, Trimmer and Topiary with MASC]
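ROUGE-N recall, used for both evaluations, counts reference n-grams matched by the system output (with clipped counts) divided by the total number of reference n-grams. The sketch below is a simplified reimplementation (whitespace tokens, lowercasing, no stemming or stopword options), not the official ROUGE toolkit.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: List[str], n: int = 1) -> float:
    """Clipped n-gram matches divided by the total n-grams in the references."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


ref = ["illegal fireworks injured hundreds of people"]
r1 = rouge_n_recall("fireworks injured hundreds", ref, 1)  # 3 of 6 unigrams
r2 = rouge_n_recall("fireworks injured hundreds", ref, 2)  # 2 of 5 bigrams
```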
Evaluation of Multi-Document Summarization Systems
[Figure: DUC2006 test data, ROUGE-2 recall (y-axis 0.05 to 0.075) for No Compression, HMM Hedge and Trimmer]
Tuning Feature Weights with ΔROUGE
Initialize: S = {}, H = {}   (S = summary, H = hypotheses)
Repeat until |S| > L:
  C ← current k-best candidates c1, c2, …, ck
  for c ∈ C: ΔROUGE(c) = R2R(S ∪ {c}) − R2R(S)   (R2R: ROUGE-2 recall)
  Add hypotheses (c, ΔROUGE(c)) to H
  S ← S ∪ {c1}
  Update remaining candidates
w_opt ← powellROUGE(H, w0)
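The hypothesis-collection loop can be sketched directly from the steps above. Assumptions, all illustrative: candidates are `(text, features)` pairs, "k-best" is by a linear score under the current weights, c1 is the top candidate under those weights, |S| is measured in words, and "update remaining candidates" just removes c1. The final `powellROUGE(H, w0)` step fits weights to H with Powell's method; its objective is not given on the slide, so it is left out here (a routine such as SciPy's `minimize` with `method="Powell"` could serve once an objective is defined).

```python
from collections import Counter
from typing import Dict, List, Tuple

Cand = Tuple[str, Dict[str, float]]  # (candidate text, feature values)


def bigram_counts(sentences: List[str]) -> Counter:
    counts: Counter = Counter()
    for s in sentences:
        toks = s.lower().split()
        counts.update(zip(toks, toks[1:]))  # bigrams never cross sentence boundaries
    return counts


def rouge2_recall(summary: List[str], refs: List[str]) -> float:
    """Simplified ROUGE-2 recall: whitespace tokens, no stemming."""
    cand = bigram_counts(summary)
    matched = total = 0
    for ref in refs:
        ref_counts = bigram_counts([ref])
        matched += sum(min(n, cand[g]) for g, n in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


def delta_rouge_hypotheses(cands: List[Cand], weights: Dict[str, float], refs: List[str],
                           k: int, max_words: int):
    def score(c: Cand) -> float:
        return sum(weights.get(f, 0.0) * v for f, v in c[1].items())

    summary: List[str] = []                          # S
    hyps: List[Tuple[Dict[str, float], float]] = []  # H: (features, ΔROUGE)
    pool = list(cands)
    while pool and sum(len(s.split()) for s in summary) <= max_words:  # until |S| > L
        top_k = sorted(pool, key=score, reverse=True)[:k]              # C
        base = rouge2_recall(summary, refs)
        for text, feats in top_k:
            hyps.append((feats, rouge2_recall(summary + [text], refs) - base))
        best = top_k[0]      # c1: assumed to be the top candidate under current weights
        summary.append(best[0])
        pool.remove(best)    # "update remaining candidates" (simplified)
    return summary, hyps
```

Each hypothesis pairs a candidate's features with the ROUGE-2 gain it would actually bring, so a weight optimizer can learn to rank high-gain candidates first even when the current weights do not.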
Optimization Results
| ROUGE | Manual | ΔROUGE (k=10) |
|---|---|---|
| 1 | 0.363 | 0.403 |
| 2 | 0.081 | 0.104 |
| SU-4 | 0.126 | 0.154 |

DUC2007 data, all differences significant at p < 0.05
Manual: feature weights optimized manually to maximize ROUGE-2 recall on the final system output
Key insights for ΔROUGE optimization:
• Uses multiple alternative sentence compressions
• Directly optimizes the candidate selection process
Redundancy
• Candidate words can be emitted by two disparate word distributions, S = Summary (REDUNDANT) and L = General English language (NON-REDUNDANT):
$$P(w) = \lambda P(w \mid S) + (1-\lambda) P(w \mid L), \qquad P(w \mid S) = \frac{n(w,S)}{|S|}, \qquad P(w \mid L) = \frac{n(w,L)}{|L|}$$
• Assuming candidate words are i.i.d., the redundancy feature for a given candidate c is:
$$R(c) = \log P(c) = \log \prod_{w \in c} \left[ \lambda P(w \mid S) + (1-\lambda) P(w \mid L) \right]$$
• Other documents in the same cluster are used to represent the general language.
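The redundancy feature R(c) translates directly into code as a sum of log mixture probabilities. In this sketch, λ = 0.5 and the probability floor (to avoid log 0 for words unseen in both S and L) are assumptions; the slide gives neither, and tuning λ is listed as future work.

```python
import math
from collections import Counter
from typing import List


def redundancy(candidate: List[str], summary: List[str], general: List[str],
               lam: float = 0.5, floor: float = 1e-9) -> float:
    """R(c) = sum over w in c of log(lam * P(w|S) + (1 - lam) * P(w|L))."""
    s_counts, l_counts = Counter(summary), Counter(general)
    total = 0.0
    for w in candidate:
        p_s = s_counts[w] / len(summary) if summary else 0.0   # n(w,S) / |S|
        p_l = l_counts[w] / len(general) if general else 0.0   # n(w,L) / |L|
        # The floor is an assumption that keeps log() defined for unseen words.
        total += math.log(max(lam * p_s + (1 - lam) * p_l, floor))
    return total


summary_words = "fireworks injured people".split()
cluster_words = "fireworks started fires in the city".split()
r_seen = redundancy(["fireworks"], summary_words, cluster_words)  # log(0.5*1/3 + 0.5*1/6)
r_new = redundancy(["fires"], summary_words, cluster_words)       # log(0.5*0 + 0.5*1/6)
```

A word already in the summary gets extra mass from the λ·P(w|S) term, which is how the feature separates redundant from non-redundant content.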
Incorporating Paraphrases
• Redundancy uses bags-of-words to compute $P(w \mid S) = \frac{n(w,S)}{|S|}$
• Not useful if a candidate word is a paraphrase of a summary word (it is classified as non-redundant)
• Add another bag-of-words $P = \{\text{a paraphrase for } w,\ \forall w \in S\}$
• Use $n(w,P)$ for the redundancy computation if $n(w,S) = 0$
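The paraphrase fallback for P(w|S) can be sketched as follows. The one-paraphrase-per-word mapping `paraphrase` is a hypothetical input, and dividing the fallback count n(w,P) by |S| is an assumption, since the slide only says to use n(w,P) when n(w,S) = 0.

```python
from collections import Counter
from typing import Dict, List


def p_w_given_summary(w: str, summary: List[str], paraphrase: Dict[str, str]) -> float:
    """P(w|S) = n(w,S)/|S|, falling back to n(w,P)/|S| when n(w,S) = 0 (assumed denominator)."""
    if not summary:
        return 0.0
    s_counts = Counter(summary)
    if s_counts[w] > 0:
        return s_counts[w] / len(summary)
    # P: the bag of paraphrases of the summary words.
    p_bag = Counter(paraphrase[x] for x in summary if x in paraphrase)
    return p_bag[w] / len(summary)


summary_words = ["prices", "increased", "sharply"]
p_climbed = p_w_given_summary("climbed", summary_words, {"increased": "climbed"})
```

Without the fallback, "climbed" would get P(w|S) = 0 and be treated as non-redundant even though it repeats "increased".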
Generating Paraphrases
• Leverage a phrase-based MT system
  – Use E-F correspondences extracted from word-aligned bitext
  – Pivot each pair of E-F correspondences with a common foreign side to get an E-E correspondence:
$$c(e_1, e_2) = \sum_{f} c(e_1, f)\, c(f, e_2)$$
• Example
  上升 ||| climbed ||| 1.0
  上升 ||| increased ||| 2.0
  上升 ||| uplifted ||| 1.0
  increased ||| climbed ||| 2.0
  climbed ||| uplifted ||| 1.0
  . . .
  uplifted ||| increased ||| 2.0
• Pick the most frequent correspondence for w
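The pivot formula and the "most frequent correspondence" choice can be checked against the example table. This sketch assumes the counts are symmetric, i.e. c(e, f) = c(f, e) is the count of the F-E entry; `pivot` and `best_paraphrase` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pivot(phrase_table: List[Tuple[str, str, float]]) -> Dict[Tuple[str, str], float]:
    """c(e1, e2) = sum over f of c(e1, f) * c(f, e2), from F-E entries (f, e, count)."""
    by_foreign: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for f, e, count in phrase_table:
        by_foreign[f].append((e, count))
    pairs: Dict[Tuple[str, str], float] = defaultdict(float)
    for entries in by_foreign.values():
        for e1, c1 in entries:
            for e2, c2 in entries:
                if e1 != e2:
                    pairs[(e1, e2)] += c1 * c2
    return dict(pairs)


def best_paraphrase(w: str, pairs: Dict[Tuple[str, str], float]) -> str:
    """Most frequent E-E correspondence for w (w itself if there is none)."""
    options = {e2: c for (e1, e2), c in pairs.items() if e1 == w}
    return max(options, key=options.get) if options else w


table = [("上升", "climbed", 1.0), ("上升", "increased", 2.0), ("上升", "uplifted", 1.0)]
pairs = pivot(table)
```

The pivoted counts reproduce the example: c(increased, climbed) = 2.0 × 1.0 = 2.0, so "increased" is picked as the paraphrase for "climbed".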
Paraphrase Results
• Using paraphrases yields no significant improvements
• Unrelated to the quality of the paraphrases
• Anomalous cases occur extremely rarely
  – The original bag-of-words is sufficient to capture candidate redundancy almost all the time
DUC 2007 Results
• Systems 7, 36
• Main:
  – Responsiveness = 3.089 (4th)
  – ROUGE-2 = 0.108 (8th)
  – ROUGE-SU4 = 0.158 (11th)
• Update:
  – Responsiveness = 2.800 (2nd)
  – ROUGE-2 = 0.086 (9th)
  – ROUGE-SU4 = 0.124 (8th)
Summary
• MASC with feature-based candidate selection improves headline generation and shows promise for multi-document summarization.
• Optimizing for ΔROUGE provides significant improvements over the previous approach (manually optimized feature weights).
• The redundancy feature works at the lexical as well as the document level.
• Using paraphrases requires a novel formulation.
Future Work
• Fully explore the Trimmer search space
• Split the redundancy feature into its components and tune λ automatically
• Use an n-gram LM to estimate P(w|L)
• Continue to experiment with paraphrase-based approaches to redundancy
  – Scale up to phrase-level paraphrases
  – Use a combination of high-coverage and high-quality paraphrases