Lecture 7: Centrality (cont.)
Slides modified from Lada Adamic and Dragomir Radev.
Automatic summarization Dragomir R. Radev University of Michigan [email protected].
Outline
• What is summarization
• Genres of summarization (single-doc, multi-doc, query-based, etc.)
• Extractive vs. non-extractive summarization
• Evaluation metrics
• Current systems
  – Marcu/Knight
  – MEAD/Lemur
  – NewsInEssence/NewsBlaster
• What is possible and what is not
Goal of summarization
• Preserve the “most important information” in a document.
• Make use of redundancy in text
• Maximize information density
Compression Ratio = |S| / |D|

Retention Ratio = i(S) / i(D)

Goal: i(S) / i(D) > |S| / |D|

(S = summary, D = original document, i(·) = amount of information)
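As a sketch, the two ratios can be written directly in code; i(·) here is whatever informativeness measure the caller supplies (a hypothetical stand-in, since the slides leave it abstract):

```python
def compression_ratio(summary_words: int, doc_words: int) -> float:
    """|S| / |D|: fraction of the document's length kept in the summary."""
    return summary_words / doc_words

def retention_ratio(info_summary: float, info_doc: float) -> float:
    """i(S) / i(D): fraction of the document's information kept."""
    return info_summary / info_doc

# The goal: retain proportionally more information than text,
# i.e. retention_ratio > compression_ratio.
cr = compression_ratio(100, 1000)
rr = retention_ratio(0.6, 1.0)
assert rr > cr  # information density was increased
```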
Sentence-extraction based (SE) summarization
• Classification problem: f: 2^S → {0,1}
• Approximation: f: s_i → {0,1}
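The approximation above — classifying each sentence independently instead of scoring whole subsets — can be sketched in a few lines; the `score` function is a hypothetical stand-in for a trained classifier:

```python
def extractive_summary(sentences, score, threshold=0.5):
    """Approximate f: s_i -> {0,1}: include sentence i
    iff its (assumed pre-trained) score clears a threshold."""
    return [s for s in sentences if score(s) >= threshold]
```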
Typical approaches to SE summarization
• Manually-selected features: position, overlap with query, cue words, structure information, overlap with centroid
• Reranking: maximal marginal relevance [Carbonell/Goldstein98]
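A minimal sketch of greedy MMR reranking in the spirit of Carbonell & Goldstein (1998), assuming precomputed query and inter-sentence similarities; the dictionaries and λ value are illustrative, not from the original paper:

```python
def mmr_select(candidates, query_sim, pair_sim, k=3, lam=0.5):
    """Greedy maximal marginal relevance: repeatedly pick the candidate
    that balances relevance to the query against redundancy with the
    sentences already selected.

    candidates: sentence ids
    query_sim:  id -> similarity to the query
    pair_sim:   (id, id) -> inter-sentence similarity
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr(c):
            redundancy = max((pair_sim.get((c, s), 0.0) for s in selected),
                             default=0.0)
            return lam * query_sim[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ near 1 the reranker behaves like plain relevance ranking; lowering λ penalizes sentences similar to ones already chosen.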
Non-SE summarization
• Discourse-based [Marcu97]
• Lexical chains [Barzilay&Elhadad97]
• Template-based [Radev&McKeown98]
Evaluation metrics
• Intrinsic measures
  – Precision, recall
  – Kappa
  – Relative utility [Radev&al.00]
  – Similarity measures (cosine, overlap, BLEU)
• Extrinsic measures
  – Classification accuracy
  – Informativeness for question answering
  – Relevance correlation
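One of the intrinsic similarity measures, cosine over bag-of-words term vectors, can be sketched as a comparison between a system summary and a human reference:

```python
import math
from collections import Counter

def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term vectors:
    dot product of term counts over the product of vector norms."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```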
Web resources
http://www.summarization.com
http://duc.nist.gov
http://www.newsinessence.com
http://www.clsp.jhu.edu/ws2001/groups/asmd/
http://www.cs.columbia.edu/~jing/summarization.html
http://www.dcs.shef.ac.uk/~gael/alphalist.html
http://www.csi.uottawa.ca/tanka/ts.html
http://www.ics.mq.edu.au/~swan/summarization/
Summarization architecture
• What do human summarizers do?
  – A: Start from scratch: analyze, transform, synthesize (top down)
  – B: Select material and revise: “cut and paste summarization” (Jing & McKeown, 1999)
• Automatic systems:
  – Extraction: selection of material
  – Revision: reduction, combination, syntactic transformation, paraphrasing, generalization, sentence reordering
[Figure: spectrum from extracts to abstracts, ordered by complexity and required knowledge — lexical level (bag of words, bigrams/trigrams) → local context (referential links, dependency structure) → discourse structure → global world knowledge. Operations range from sentence selection and sentence reduction through sentence combination, syntactic transformation, lexical paraphrase, generalization/specification, and sentence reordering. Applications: ad hoc IR, QA.]
Examples of generative models in summarization systems
• Sentence selection
• Sentence / document reduction
• Headline generation
Ex. 1: Sentence selection
• Conroy et al. (DUC 2001):
  – HMM at the sentence level; each state has an associated feature vector (position, length, # content terms)
  – Compute the probability of each sentence being a summary sentence
• Kraaij et al. (DUC 2001):
  – Rank sentences according to their posterior probability given a mixture model
+ Grammaticality is OK
– Lacks aggregation, generalization, multi-document summarization (MDS)
Knight & Marcu (AAAI 2000)
• Compression: delete substrings in an informed way (based on the parse tree)
  – Required: PCFG parser, tree-aligned training corpus
  – Channel model: probabilistic model for expansion of a parse tree
  – Results: much better than NP baseline
+ Tight control on grammaticality
+ Mimics revision operations by humans
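A toy illustration of compression-by-deletion on a parse tree. Unlike Knight & Marcu, who score deletions with a learned channel model over a PCFG parse, this sketch simply drops a hypothetical fixed set of constituent labels:

```python
# Toy parse tree: (label, child, child, ...); a leaf is a 1-tuple (word,).
def compress(tree, deletable=("PP", "SBAR", "ADVP")):
    """Delete whole subtrees whose labels are in `deletable`,
    returning the remaining words in order. The label set is an
    illustrative stand-in for a learned deletion model."""
    label, *children = tree
    if not children:          # leaf: yield the word itself
        return [label]
    words = []
    for child in children:
        if child[0] in deletable:
            continue          # drop this constituent entirely
        words.extend(compress(child, deletable))
    return words
```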
Daumé & Marcu (ACL 2002)
• Document compression, noisy channel
  – Based on syntactic structure and discourse structure (extension of the Knight & Marcu model)
  – Required: discourse & syntactic parsers
  – Training corpus where EDUs in summaries are aligned with the documents
– Cannot handle interesting document lengths (due to complexity)
Berger & Mittal (SIGIR 2000)
• Input: web pages (often not running text)
  – Trigram language model
  – IBM Model 1–like channel model: choose a length, draw a word from the source model and replace it with a similar word; independence assumption
  – Trained on the Open Directory
+ Non-extractive
– Grammaticality and coherence are disappointing: summaries are indicative only
s* = arg max_{s ∈ S} P(d | s) · P(s)
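Over a finite candidate set, this decoding rule reduces, in log space, to a one-line argmax; the scoring functions below are hypothetical stand-ins for trained channel and source models:

```python
def best_summary(candidates, channel_lp, source_lp):
    """Noisy-channel decoding sketch: pick the candidate summary s
    maximizing log P(d|s) + log P(s). `channel_lp` and `source_lp`
    are assumed pre-trained log-probability functions."""
    return max(candidates, key=lambda s: channel_lp(s) + source_lp(s))
```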
Zajic, Dorr & Schwartz (DUC 2002)
• Headline generation from a full story: maximize P(S|H)·P(H)
• Channel model based on an HMM combining a bigram model of headline words and a unigram model of story words; bigram language model
• Decoding parameters are crucial to produce good results (length, position, strings)
+ Good results in fluency and accuracy
Conclusions
• Fluent headlines within reach of simple generative models
• High quality summaries (coverage, grammaticality, coherence) require higher level symbolic representations
• Cut & paste metaphor divides the work into manageable sub-problems
• Noisy channel method effective, but not always efficient
Open issues
• Audience (user model)
• Types of source documents
• Dealing with redundancy
• Information ordering (e.g., temporal)
• Coherent text
• Cross-lingual summarization (Norbert Fuhr)
• Use summaries to improve IR (or CLIR) – relevance correlation
• LM for text generation
• Possibly not a well-defined problem (low inter-judge agreement)
• Develop models with more linguistic structure
• Develop integrated models, e.g., by using priors (Rosenfeld)
• Build efficient implementations
• Evaluation: define a manageable task