The Pythy Summarization System: Microsoft Research at DUC 2007


Page 1: The Pythy Summarization System: Microsoft Research at DUC 2007

The PYTHY Summarization System: Microsoft Research at DUC 2007

Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi,

Hisami Suzuki, and Lucy Vanderwende

Microsoft Research

April 26, 2007

Page 2

DUC Main Task Results

• Automatic Evaluations (30 participants)

• Human Evaluations

• Did pretty well on both measures

Criterion    Rank   Score
ROUGE-2      2      0.12028
ROUGE-SU4    3      0.17074

Criterion    Rank
Pyramid      1=
Content      5=

Page 3

Overview of PYTHY

• Linear sentence ranking model

• Learns to rank sentences based on:
  • ROUGE scores against model summaries
  • Semantic Content Unit (SCU) weights of sentences selected by past peers
• Considers simplified sentences alongside original sentences

Score(s) = Σ_{k=1..K} w_k · f_k(s)
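The linear ranking score can be sketched in a few lines of Python; the feature names and weights below are invented for illustration, standing in for the feature inventory described on later slides.

```python
def score_sentence(features, weights):
    """Score(s) = sum_k w_k * f_k(s): a weighted sum over a shared
    feature inventory (hypothetical feature names)."""
    return sum(weights[k] * value for k, value in features.items())

# Toy example: three made-up features for one candidate sentence.
sent_features = {"topic_freq": 0.4, "position": 1.0, "length_ok": 1.0}
weights = {"topic_freq": 2.0, "position": 0.5, "length_ok": 0.3}
print(score_sentence(sent_features, weights))  # 0.8 + 0.5 + 0.3 = 1.6
```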

Page 4

PYTHY Training (pipeline diagram): Docs → Sentences + Simplified Sentences → Feature inventory → Ranking/Training → Model. Training targets: ROUGE Oracle, Pyramid/SCU, ROUGE × 2.
Page 5

PYTHY Testing (pipeline diagram): Docs → Sentences + Simplified Sentences → Feature inventory → Model → Search with Dynamic Scoring → Summary.

Page 6

Sentence Simplification

• Extension of the simplification method from DUC06
• Provides sentence alternatives rather than deterministically simplifying a sentence
• Uses syntax-based heuristic rules
• Simplified sentences evaluated alongside originals
• In DUC 2007:
  • Average new candidates generated: 1.38 per sentence
  • Simplified sentences generated for 61% of all sentences
  • Simplified sentences in final output: 60%


Page 7

Sentence-Level Features

• SumFocus features: SumBasic (Nenkova et al. 2006) + task focus
  • cluster frequency and topic frequency
  • only these were used in MSR's DUC06 system
• Other content-word unigrams: headline frequency
• Sentence length features (binary)
• Sentence position features (real-valued and binary)
• N-grams (bigrams, skip bigrams, multiword phrases)
• All tokens (topic and cluster frequency)
• Simplified sentences (binary, and ratio of relative length)
• Inverse document frequency (idf)


Page 8

Pairwise Ranking

• Define preferences for sentence pairs
  • defined using human summaries and SCU weights
• Log-linear ranking objective used in training
• Maximize the probability of choosing the better sentence from each pair of comparable sentences


[Ofer et al. 03], [Burges et al. 05]
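A minimal sketch of a pairwise log-linear ranking loss of this kind (a RankNet-style logistic loss over feature differences; the feature name is hypothetical):

```python
import math

def pairwise_loss(w, f_better, f_worse):
    """Negative log-probability of preferring the better sentence under a
    log-linear model: -log sigma(w . (f_better - f_worse))."""
    margin = sum(w[k] * (f_better.get(k, 0.0) - f_worse.get(k, 0.0)) for k in w)
    return math.log(1.0 + math.exp(-margin))

w = {"topic_freq": 1.0}
# Correctly ordered pair gives a smaller loss than the reversed pair.
loss = pairwise_loss(w, {"topic_freq": 0.9}, {"topic_freq": 0.1})
# loss = log(1 + e^{-0.8}) ~ 0.371
```

Training would minimize this loss summed over all preference pairs, e.g. with gradient descent.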

Page 9

ROUGE Oracle Metric

• Find an oracle extractive summary
  • the summary with the highest average ROUGE-2 and ROUGE-SU4 scores
• All sentences in the oracle are considered "better" than any sentence not in the oracle
• Approximate greedy search used to find the oracle summary
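The approximate greedy search can be sketched as below; the toy scorer counts covered reference words, standing in for the averaged ROUGE-2/ROUGE-SU4 score against model summaries.

```python
def greedy_oracle(sentences, score_summary, budget):
    """Greedily add the sentence that most improves the summary score,
    approximating the best extractive (oracle) summary."""
    summary = []
    while len(summary) < budget:
        best = max((s for s in sentences if s not in summary),
                   key=lambda s: score_summary(summary + [s]), default=None)
        if best is None:
            break
        summary.append(best)
    return summary

# Toy stand-in scorer: distinct reference words covered by the summary.
ref = {"rain", "flood", "storm"}
def score_summary(summary):
    return len(ref & {w for s in summary for w in s.split()})

sents = ["storm hits coast", "rain and flood", "sunny day"]
print(greedy_oracle(sents, score_summary, 2))
# ['rain and flood', 'storm hits coast']
```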


Page 10

Pyramid-Derived Metric

• University of Ottawa SCU-annotated corpus (Copeck et al. 06)
• Some sentences in the 05 & 06 document collections are:
  • known to contain certain SCUs
  • known not to contain any SCUs
• Sentence score is the sum of the weights of all its SCUs
  • for un-annotated sentences, the score is undefined
• A sentence pair s1 > s2 is constructed for training iff w(s1) > w(s2)
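The pair-construction rule can be sketched as follows; sentence labels and weights are toy values, and un-annotated sentences (weight `None`) generate no pairs, matching the "undefined score" case.

```python
def scu_pairs(annotated):
    """Build preference pairs (s_i, s_j) with s_i > s_j iff w(s_i) > w(s_j).
    Sentences with weight None (un-annotated) are excluded entirely."""
    scored = [(s, w) for s, w in annotated.items() if w is not None]
    return [(si, sj) for si, wi in scored for sj, wj in scored if wi > wj]

# Toy example: "c" is un-annotated, so it appears in no pair.
print(scu_pairs({"a": 3, "b": 1, "c": None}))  # [('a', 'b')]
```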


Page 11

Model Frequency Metrics

• Based on unigram and skip-bigram frequency
• Computed for content words only
• Sentence s_i is "better" than s_j if w(s_i) > w(s_j), where


w(s) = Σ_k p̂_models(c_k), summing over the content words c_k of s
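A sketch of this sentence score, assuming the model-frequency estimate p̂ is relative frequency over the human (model) summaries; the word lists and stopword set below are invented.

```python
from collections import Counter

def model_freq_score(sentence_words, model_summaries, stopwords):
    """w(s) = sum over content words of their relative frequency in the
    human model summaries; higher-scoring sentences rank as 'better'."""
    counts = Counter(w for summary in model_summaries for w in summary
                     if w not in stopwords)
    total = sum(counts.values())
    return sum(counts[w] / total for w in sentence_words if w not in stopwords)

models = [["flood", "hits", "city"], ["flood", "damage"]]
print(model_freq_score(["flood", "city"], models, {"hits"}))  # 2/4 + 1/4 = 0.75
```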

Page 12

Combining multiple metrics

• From ROUGE oracle: all sentences in the oracle summary are better than other sentences
• From SCU annotations: sentences with higher avg SCU weights are better
• From model frequency: sentences with words occurring in the models are better
• Combined loss: add the losses according to all metrics

D_1 = {(i, j) : s_i > s_j under the ROUGE oracle}
D_2 = {(i, j) : s_i > s_j under SCU weights}
D_3 = {(i, j) : s_i > s_j under model frequency}

L = L(D_1) + L(D_2) + L(D_3)


Page 13

PYTHY Testing (pipeline diagram, repeated): Docs → Sentences + Simplified Sentences → Feature inventory → Model → Search with Dynamic Scoring → Summary.

Page 14

Dynamic Sentence Scoring

• Eliminate redundancy by re-weighting
• Similar to SumBasic (Nenkova et al. 2006): re-weighting given previously selected sentences
• Discounts for features that decompose into word frequency estimates
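A sketch of SumBasic-style re-weighting, using the common form of the SumBasic update that squares a word's probability once it has appeared in a selected sentence; the probabilities below are toy values.

```python
def discount(word_probs, selected_sentence):
    """After selecting a sentence, square the probability of each of its
    words so that redundant follow-up sentences score lower."""
    for w in set(selected_sentence):
        if w in word_probs:
            word_probs[w] **= 2
    return word_probs

probs = {"flood": 0.5, "storm": 0.4}
discount(probs, ["flood", "the"])
print(probs)  # {'flood': 0.25, 'storm': 0.4}
```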


Page 15

Search

• The search constructs partial summaries and scores them:
• The score of a summary does not decompose into an independent sum of sentence scores
  • global dependencies make exact search hard
• Used multiple beams, one for each length of partial summary [McDonald 2007]

Score(s_1, …, s_n) = Σ_{i=1..n} score(s_i | s_1, …, s_{i-1})
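One way to sketch the multiple-beams idea, keeping a separate beam for each partial-summary length; the scorer is a toy stand-in for the learned model with dynamic scoring.

```python
def beam_search(sentences, score, max_len, beam_width=4):
    """Keep the top-scoring partial summaries at each length, since global
    dependencies make exact search over summaries hard."""
    beams = {0: [[]]}  # one beam per partial-summary length
    for length in range(max_len):
        candidates = [partial + [s]
                      for partial in beams.get(length, [])
                      for s in sentences if s not in partial]
        candidates.sort(key=score, reverse=True)
        beams[length + 1] = candidates[:beam_width]
    all_partials = [p for beam in beams.values() for p in beam]
    return max(all_partials, key=score)

# Toy scorer: distinct words covered (a non-decomposable summary score).
coverage = lambda summary: len({w for s in summary for w in s.split()})
print(beam_search(["a b", "b c", "c d"], coverage, 2))
```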


Page 16

Impact of Sentence Simplification

              No Simplified        Simplified
              R-2      R-SU4       R-2      R-SU4
SumFocus      0.078    0.132       0.078    0.134
PYTHY         0.089    0.140       0.096    0.147

• Trained on 05 data, tested on 06 data


Page 19

Evaluating the Metrics

             Num     Train   Content Only       All Words
Criterion    Pairs   Acc     R-2      R-SU4     R-2      R-SU4
Oracle       941K    93.1    0.076    0.107     0.093    0.143
SCUs         430K    62.0    0.078    0.108     0.086    0.134
Model Freq.  6.3M    96.9    0.076    0.106     0.096    0.147
All          7.7M    94.2    0.076    0.107     0.096    0.147

Trained on 05 data, tested on 06 data; includes simplified sentences


Page 21

Update Summarization Pilot

• SVM novelty classifier trained on the TREC 02 & 03 novelty track

                       ROUGE-2    ROUGE-SU4
PYTHY + Novelty (1)    0.07135    0.11164
PYTHY + Novelty (.5)   0.07879    0.12929
PYTHY + Novelty (.1)   0.08721    0.12958
PYTHY                  0.08686    0.12876
SumFocus               0.07002    0.11033

Score(s_i | PrevS) = Score_PYTHY(s_i | PrevS) · Pr(novel(s_i) | BG)^λ
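The combination can be sketched as below, under the assumption that the (1)/(.5)/(.1) runs vary an exponent λ on the novelty probability; that reading of the run labels is an interpretation, not stated explicitly on the slide.

```python
def update_score(pythy_score, p_novel, lam):
    """Combine the PYTHY sentence score with an SVM novelty probability,
    damped by an assumed exponent lam (e.g. 1, 0.5, or 0.1)."""
    return pythy_score * (p_novel ** lam)

# With lam = 0.5, a low novelty probability is penalized more gently.
print(update_score(0.5, 0.25, 0.5))  # 0.5 * 0.25**0.5 = 0.25
print(update_score(0.5, 0.25, 1.0))  # 0.5 * 0.25 = 0.125
```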

Page 22

Summary and Future Work

• Summary
  • Combination of different target metrics for training
  • Many sentence features
  • Pair-wise ranking function
  • Dynamic scoring
• Future work
  • Boost robustness
    • sensitive to cluster properties (e.g., size)
  • Improve grammatical quality of simplified sentences
  • Reconcile novelty and (ir)relevance
  • Learn features over whole summaries rather than individual sentences

Page 23

Thank You