
Automatic summarization

Dragomir R. Radev

University of Michigan

[email protected]

Outline

• What is summarization
• Genres of summarization (single-doc, multi-doc, query-based, etc.)
• Extractive vs. non-extractive summarization
• Evaluation metrics
• Current systems
  – Marcu/Knight
  – MEAD/Lemur
  – NewsInEssence/NewsBlaster
• What is possible and what is not

Goal of summarization

• Preserve the “most important information” in a document.

• Make use of redundancy in text

• Maximize information density

Compression Ratio = |S| / |D|

Retention Ratio = i(S) / i(D)

Goal: i(S) / i(D) > |S| / |D|
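As a sketch (reading S as the summary, D as the full document, and i(·) as an information measure the slides leave abstract), the two ratios might be computed as follows; the `info()` stand-in and its stopword list are illustrative only:

```python
# Minimal sketch of the compression and retention ratios.
# i(.) is abstract in the slides; info() below (content-word count)
# is a hypothetical stand-in used only for illustration.

STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is"}

def info(text):
    """Stand-in for i(.): count of non-stopword tokens."""
    return sum(1 for w in text.lower().split() if w not in STOPWORDS)

def compression_ratio(summary, document):
    """|S| / |D|, measured in words."""
    return len(summary.split()) / len(document.split())

def retention_ratio(summary, document):
    """i(S) / i(D)."""
    return info(summary) / info(document)

doc = "The cat sat on the mat and the cat slept in the sun"
summ = "The cat sat on the mat"
print(compression_ratio(summ, doc))  # ~0.46: kept 6 of 13 words
print(retention_ratio(summ, doc))    # higher than the compression ratio
```

A good extract keeps the retention ratio above the compression ratio, matching the stated goal.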

Sentence-extraction based (SE) summarization

• Classification problem: f : 2^S → {0, 1}

• Approximation: f : s_i → {0, 1}
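A minimal sketch of the per-sentence approximation: score each sentence independently with hand-picked features (here position and centroid overlap) and keep the top k. The features and weights are illustrative, not taken from any cited system:

```python
# Sketch of f: s_i -> {0, 1}: score sentences independently, keep top k.
# Features (position, centroid overlap) and weights are illustrative.
from collections import Counter

def centroid(sentences):
    """Document term frequencies, a crude stand-in for a centroid."""
    c = Counter()
    for s in sentences:
        c.update(s.lower().split())
    return c

def score(sentence, position, n, cent):
    pos_feat = 1.0 - position / n           # earlier sentences score higher
    overlap = sum(cent[w] for w in sentence.lower().split())
    return 2.0 * pos_feat + 0.1 * overlap   # made-up weights

def extract(sentences, k=2):
    cent = centroid(sentences)
    scored = [(score(s, i, len(sentences), cent), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:k]
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]  # document order
```

In effect, `extract()` labels the top-k sentences 1 and the rest 0, then emits the selected sentences in document order.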

Typical approaches to SE summarization

• Manually-selected features: position, overlap with query, cue words, structure information, overlap with centroid

• Reranking: maximal marginal relevance [Carbonell/Goldstein98]
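A sketch of MMR reranking in the spirit of [Carbonell/Goldstein98]: greedily pick the candidate that balances similarity to the query against similarity to sentences already selected. The bag-of-words cosine and the lambda value are simplifications:

```python
# Sketch of maximal marginal relevance (MMR) reranking: reward query
# relevance, penalize redundancy with already-selected sentences.
import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query, candidates, k=2, lam=0.7):
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr_score(c):
            redundancy = max((cosine(c, s) for s in selected), default=0.0)
            return lam * cosine(c, query) - (1 - lam) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With lam near 1 the ranking is purely relevance-driven; lowering lam pushes near-duplicate sentences down the list.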

Non-SE summarization

• Discourse-based [Marcu97]

• Lexical chains [Barzilay&Elhadad97]

• Template-based [Radev&McKeown98]

Evaluation metrics

• Intrinsic measures
  – Precision, recall
  – Kappa
  – Relative utility [Radev&al.00]
  – Similarity measures (cosine, overlap, BLEU)

• Extrinsic measures
  – Classification accuracy
  – Informativeness for question answering
  – Relevance correlation
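For the intrinsic case, sentence-level precision and recall against a human extract can be sketched as follows (sentences compared by index; kappa and relative utility refine this picture by correcting for chance agreement and giving partial credit):

```python
# Sketch: precision/recall of a system extract against a reference
# extract, with sentences identified by their index in the document.
def precision_recall(system, reference):
    sys_set, ref_set = set(system), set(reference)
    tp = len(sys_set & ref_set)          # sentences both extracts chose
    p = tp / len(sys_set) if sys_set else 0.0
    r = tp / len(ref_set) if ref_set else 0.0
    return p, r
```

For example, `precision_recall([0, 2, 5], [0, 1, 2])` gives (2/3, 2/3): sentences 0 and 2 are shared.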

Relevance correlation (RC)

r = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² )
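The formula above is Pearson's correlation coefficient; for relevance correlation, x and y would be the relevance scores of documents retrieved from full texts versus from their summaries. A direct transcription:

```python
# Pearson's r, transcribing the formula above.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den
```

A summary that preserves retrieval behavior yields r close to 1.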

Web resources

http://www.summarization.com
http://duc.nist.gov
http://www.newsinessence.com
http://www.clsp.jhu.edu/ws2001/groups/asmd/
http://www.cs.columbia.edu/~jing/summarization.html
http://www.dcs.shef.ac.uk/~gael/alphalist.html
http://www.csi.uottawa.ca/tanka/ts.html
http://www.ics.mq.edu.au/~swan/summarization/

Generative probabilistic models for summarization

Wessel Kraaij

TNO TPD

Summarization architecture

• What do human summarizers do?
  – A: Start from scratch: analyze, transform, synthesize (top down)
  – B: Select material and revise: “cut and paste summarization” (Jing & McKeown, 1999)
• Automatic systems:
  – Extraction: selection of material
  – Revision: reduction, combination, syntactic transformation, paraphrasing, generalization, sentence reordering

[Figure: spectrum from extracts to abstracts, ordered by increasing complexity. Required knowledge ranges from the lexical level (bag of words) through local context (bigrams/trigrams, dependency structure) and discourse (referential links, structure) to global world knowledge. Operations along the spectrum: ad hoc IR, QA, sentence selection, sentence reduction, sentence combination, syntactic transformation, lexical paraphrase, generalization/specification, sentence reordering.]

Examples of generative models in summarization systems

• Sentence selection

• Sentence / document reduction

• Headline generation

Ex. 1: Sentence selection

• Conroy et al. (DUC 2001)
  – HMM on the sentence level; each state has an associated feature vector (position, length, number of content terms)
  – Compute the probability of being a summary sentence
• Kraaij et al. (DUC 2001)
  – Rank sentences according to posterior probability given a mixture model

+ Grammaticality is OK
– Lacks aggregation, generalization, MDS (multi-document summarization)

Ex. 2: Sentence reduction

Knight & Marcu (AAAI 2000)

• Compression: delete substrings in an informed way (based on the parse tree)
  – Required: PCFG parser, tree-aligned training corpus
  – Channel model: probabilistic model for expansion of a parse tree
  – Results: much better than NP baseline

+ Tight control on grammaticality
+ Mimics revision operations by humans

Daumé & Marcu (ACL 2002)

• Document compression, noisy channel
  – Based on syntactic structure and discourse structure (extension of the Knight & Marcu model)
  – Required: discourse and syntactic parsers
  – Training corpus where EDUs in summaries are aligned with the documents

– Cannot handle interesting document lengths (due to complexity)

Ex. 3: Headline generation

Berger & Mittal (SIGIR 2000)

• Input: web pages (often not running text)
  – Trigram language model
  – IBM Model 1-like channel model: choose a length, draw a word from the source model and replace it with a similar word (independence assumption)
  – Trained on the Open Directory

+ Non-extractive
– Grammaticality and coherence are disappointing: indicative

s* = argmax_{s ∈ S} P(d | s) P(s)
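The decision rule can be sketched by scoring each candidate summary with the sum of channel and source log-probabilities; the candidates and probabilities below are invented purely to illustrate the argmax:

```python
# Toy noisy-channel decision rule: s* = argmax_s P(d|s) P(s),
# computed in log space. All probabilities here are made up.
import math

def best_summary(candidates, channel_logp, source_logp):
    """channel_logp[s] ~ log P(d|s); source_logp[s] ~ log P(s)."""
    return max(candidates,
               key=lambda s: channel_logp[s] + source_logp[s])

cands = ["short headline", "another headline"]
channel = {"short headline": math.log(0.6), "another headline": math.log(0.5)}
source = {"short headline": math.log(0.2), "another headline": math.log(0.4)}
print(best_summary(cands, channel, source))  # the second: 0.5*0.4 > 0.6*0.2
```

The source model rewards fluent summaries, the channel model rewards faithfulness to the document, and the argmax trades the two off.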

Zajic, Dorr & Schwartz (DUC 2002)

• Headline generation from a full story: choose the headline H maximizing P(S|H)P(H)

• Channel model based on an HMM combining a bigram model of headline words with a unigram model of story words; bigram language model as the source model

• Decoding parameters are crucial to produce good results (length, position, strings)

+ Good results in fluency and accuracy

Conclusions

• Fluent headlines within reach of simple generative models

• High quality summaries (coverage, grammaticality, coherence) require higher level symbolic representations

• Cut & paste metaphor divides the work into manageable sub-problems

• Noisy channel method effective, but not always efficient

Open issues

• Audience (user model)
• Types of source documents
• Dealing with redundancy
• Information ordering (e.g., temporal)
• Coherent text
• Cross-lingual summarization (Norbert Fuhr)
• Use summaries to improve IR (or CLIR): relevance correlation
• LM for text generation
• Possibly not a well-defined problem (low inter-judge agreement)
• Develop models with more linguistic structure
• Develop integrated models, e.g., by using priors (Rosenfeld)
• Build efficient implementations
• Evaluation: define a manageable task