Multi-Document Summarization of Evaluative Text


Transcript of Multi-Document Summarization of Evaluative Text

Page 1: Multi-Document Summarization of Evaluative Text

21/04/23 EACL 2006 1

Giuseppe Carenini, Raymond T. Ng, Adam Pauls

Computer Science Dept., University of British Columbia

Vancouver, CANADA

Multi-Document Summarization of Evaluative Text


Page 3: Multi-Document Summarization of Evaluative Text

Motivation and Focus

Large amounts of information in text form are constantly being produced: news, reports, reviews, blogs, emails…

Pressing need to summarize

Considerable prior work, but limited to factual information

Page 4: Multi-Document Summarization of Evaluative Text

Our Focus

Evaluative documents (good vs. bad, right vs. wrong) about a single entity:

● Customer reviews (e.g. Amazon.com)
● Travel logs about a destination
● Teaching evaluations
● User studies (!)
● …

Page 5: Multi-Document Summarization of Evaluative Text


Our Focus

We want to do this:

“The Canon G3 is a great camera. . .”

“Though great, the G3 has bad menus. . .”

“I love the Canon G3! It . . .”

Most users liked the Canon G3. Even though some did not like the menus, many . . .

Page 6: Multi-Document Summarization of Evaluative Text

Two Approaches

Automatic summarizers generally produce two types of summaries:

1. Extracts: a representative subset of text from the original corpus
2. Abstracts: generated text which contains the most relevant info from the original corpus

Page 7: Multi-Document Summarization of Evaluative Text

Two Approaches (cont'd)

Extract-based summarizers generally fare better for factual summarization (cf. DUC 2005)

But extracts aren't well suited to capturing evaluative info:
● Can't express the distribution of opinions ("some/all")
● Can't aggregate opinions either numerically or conceptually

So we tried both

Page 8: Multi-Document Summarization of Evaluative Text

Two Approaches (cont'd)

Extract-based approach (MEAD*):
● Based on the MEAD (Radev et al. 2003) framework for summarization
● Augmented with knowledge of evaluative info (I'll explain later)

Abstract-based approach (SEA):
● Based on the GEA (Carenini & Moore, 2001) framework for generating evaluative arguments about an entity

Page 9: Multi-Document Summarization of Evaluative Text

Pipeline Approach (for both)

[Flowchart: Evaluative Documents → Extraction of evaluative info → Organization of extracted info → Selection of extracted info → Presentation of extracted info]
[Extraction of evaluative info: shared between MEAD* and SEA]

Page 10: Multi-Document Summarization of Evaluative Text

Extracting evaluative info

We adopt previous work of Hu & Liu (2004) (but many others exist…)

Their approach extracts:
● What features of the entity are evaluated
● The strength and polarity of the evaluation on the [-3 … +3] interval

Approach is (mostly) unsupervised

Page 11: Multi-Document Summarization of Evaluative Text

Examples

• "the menus are easy to navigate and the buttons are easy to use. it is a fantastic camera …"
• "… the canon computer software used to download, sort, … is very easy to use. the only two minor issues i have with the camera are the lens cap (it is not very snug and can come off too easily) …"

Page 12: Multi-Document Summarization of Evaluative Text

Feature Discovery

• "the menus are easy to navigate and the buttons are easy to use. it is a fantastic camera …"
• "… the canon computer software used to download, sort, … is very easy to use. the only two minor issues i have with the camera are the lens cap (it is not very snug and can come off too easily) …"

Page 13: Multi-Document Summarization of Evaluative Text

Strength/Polarity Determination

• "the menus are easy to navigate (+2) and the buttons are easy to use (+2). it is a fantastic (+3) camera …"
• "… the canon computer software used to download, sort, … is very easy to use (+3). the only two minor issues i have with the camera are the lens cap (it is not very snug (-2) and can come off too easily (-2)) …"
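The annotated sentences above can be captured in a simple data structure; the sketch below is illustrative (the tuple layout and names are ours, not Hu & Liu's):

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical representation of the extraction output for the two example
# sentences above: each sentence yields (feature, polarity/strength) pairs,
# with strengths on the [-3, +3] interval.
Evaluation = Tuple[str, int]

sentence_evals: List[List[Evaluation]] = [
    [("menus", +2), ("buttons", +2), ("camera", +3)],
    [("software", +3), ("lens cap", -2), ("lens cap", -2)],
]

# Collect every evaluation of each feature across the corpus.
by_feature = defaultdict(list)
for evals in sentence_evals:
    for feature, strength in evals:
        by_feature[feature].append(strength)

print(dict(by_feature))
# {'menus': [2], 'buttons': [2], 'camera': [3], 'software': [3], 'lens cap': [-2, -2]}
```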

Page 14: Multi-Document Summarization of Evaluative Text

Pipeline Approach (for both)

[Flowchart: Evaluative Documents → Extraction of evaluative info → Organization of extracted info → Selection of extracted info → Presentation of extracted info]
[Extraction: shared; Organization: partially shared]

Page 15: Multi-Document Summarization of Evaluative Text

Organizing Extracted Info

Extraction provides a bag of features. But:
● features are redundant
● features may range from concrete and specific (e.g. "resolution") to abstract and general (e.g. "image")

Solution: map features to a hierarchy [Carenini, Ng, & Zwart 2005]

Page 16: Multi-Document Summarization of Evaluative Text

Feature Ontology

[Figure: feature ontology for the Canon G3 Digital Camera. Child nodes include User Interface (with Buttons, Menus, Lever), Convenience (with Menu), and Battery (with Battery Life, Battery Charging System), … Each node carries the list of polarity/strength evaluations mapped to it, e.g. [+2,+2,+2,+3,+3] and [+1] on interface/convenience nodes and [-1,-1,-2] on battery nodes; general mentions such as "canon", "canon g3", and "digital camera" map to the root, which carries [-1,-1,+1,+2,+2,+3,+3,+3].]
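One plausible in-memory form of the feature ontology above; the node class and the exact assignment of evaluation lists to nodes are illustrative:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FeatureNode:
    """A node in the feature ontology: its own evaluations plus sub-features."""
    name: str
    evals: List[int] = field(default_factory=list)      # polarity/strength, -3..+3
    children: List["FeatureNode"] = field(default_factory=list)

# Fragment of the Canon G3 ontology from the figure; which evaluation list
# sits on which node is an illustrative guess.
camera = FeatureNode(
    "Canon G3 Digital Camera",
    evals=[-1, -1, +1, +2, +2, +3, +3, +3],   # "canon", "canon g3", "digital camera"
    children=[
        FeatureNode("User Interface", evals=[+1], children=[
            FeatureNode("Buttons"), FeatureNode("Menus"), FeatureNode("Lever")]),
        FeatureNode("Convenience", evals=[+1], children=[
            FeatureNode("Menu", evals=[+2, +2, +2, +3, +3])]),
        FeatureNode("Battery", evals=[-1, -1, -2], children=[
            FeatureNode("Battery Life"), FeatureNode("Battery Charging System")]),
    ],
)
```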

Page 17: Multi-Document Summarization of Evaluative Text


Organization: SEA vs. MEAD*

SEA operates only on the hierarchical data and forgets about raw extracted features

MEAD* operates on the raw extracted features and only uses hierarchy for sentence ordering (I'll come back to this)

Page 18: Multi-Document Summarization of Evaluative Text

Pipeline Approach (for both)

[Flowchart: Evaluative Documents → Extraction of evaluative info → Organization of extracted info → Selection of extracted info → Presentation of extracted info]
[Extraction: shared; Organization: partially shared; Selection: not shared]

Page 19: Multi-Document Summarization of Evaluative Text

Feature Selection: SEA

We define a measure of importance (moi) for each feature f_i in the hierarchy of features:

moi(f_i) = dir_moi(f_i)                                                  if f_i is a leaf node
moi(f_i) = α · dir_moi(f_i) + (1 − α) · Σ_{f_k ∈ children(f_i)} moi(f_k)   otherwise

where the direct measure of importance sums the squared polarity/strength evaluations ps_k in the set PS_i attached to f_i:

dir_moi(f_i) = Σ_{ps_k ∈ PS_i} (ps_k)²

[Example subtree: Canon G3 Digital Camera [-1,-1,+1,+2,+2,+3,+3,+3] with children User Interface and Convenience [+1]]
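The measure of importance above can be sketched directly in code; `alpha = 0.9` is an illustrative weight, not necessarily the value used by SEA, and nodes are plain `(evals, children)` pairs:

```python
def dir_moi(evals):
    """Direct importance of a feature: sum of squared polarity/strength scores."""
    return sum(ps * ps for ps in evals)

def moi(node, alpha=0.9):
    """Measure of importance: a leaf's own evaluations, or a blend of a node's
    direct importance and the importance inherited from its children."""
    evals, children = node
    if not children:
        return dir_moi(evals)
    return alpha * dir_moi(evals) + (1 - alpha) * sum(moi(c, alpha) for c in children)

# Subtree from the ontology example: Convenience [+1] -> Menu [+2,+2,+2,+3,+3].
menu = ([+2, +2, +2, +3, +3], [])
convenience = ([+1], [menu])

print(dir_moi(menu[0]))   # 4 + 4 + 4 + 9 + 9 = 30
print(moi(convenience))   # 0.9 * 1 + 0.1 * 30 ≈ 3.9
```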

Page 20: Multi-Document Summarization of Evaluative Text

Selection Procedure

Straightforward greedy selection would not work: if a node derives most of its importance from its child(ren), including both the node and the child(ren) would be redundant.

⇒ Dynamic greedy selection. Until the desired number of features is selected:
• The most important node is selected
• That node is removed from the tree
• The importance of the remaining nodes is recomputed

Similar to the redundancy reduction step in many automatic summarization algorithms
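The dynamic greedy loop can be sketched as follows; splicing a removed node's children up to its parent is one plausible way to keep the tree connected, not necessarily the paper's exact bookkeeping:

```python
def dir_moi(evals):
    return sum(ps * ps for ps in evals)

def moi(name, tree, alpha=0.9):
    evals, kids = tree[name]
    if not kids:
        return dir_moi(evals)
    return alpha * dir_moi(evals) + (1 - alpha) * sum(moi(k, tree, alpha) for k in kids)

def select_features(tree, k):
    """Dynamic greedy selection: pick the most important node, remove it,
    recompute importance, repeat. `tree` maps name -> (evals, child names)
    and is consumed by the loop."""
    selected = []
    while len(selected) < k and tree:
        best = max(tree, key=lambda name: moi(name, tree))
        selected.append(best)
        _, orphans = tree.pop(best)
        # Re-attach the removed node's children to its parent so the
        # remaining tree stays connected.
        for evals, kids in tree.values():
            if best in kids:
                kids.remove(best)
                kids.extend(orphans)
    return selected

tree = {
    "camera":  ([+3], ["menu", "battery"]),
    "menu":    ([+2, +2], []),
    "battery": ([-2, -2], []),
}
# "camera" wins first (moi = 0.9*9 + 0.1*(8+8) = 9.7); after its removal the
# leaves are re-scored and "menu" (8) is taken next.
print(select_features(dict(tree), 2))   # ['camera', 'menu']
```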

Page 21: Multi-Document Summarization of Evaluative Text

Feature Selection: MEAD*

MEAD* selects sentences, not features

Calculate a score for each sentence s_i from the polarity/strength evaluations ps_k it contains:

score(s_i) = Σ_{ps_k ∈ s_i} |ps_k|

e.g. for "the menus are easy to navigate (+2) and the buttons are easy to use (+2).", features(s_i) = {menus, buttons} and score(s_i) = 4

Break ties with the MEAD centroid (a standard component in multi-document summarization)
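A sketch of the sentence score, assuming strengths are summed in absolute value so that strongly negative sentences also rank high:

```python
def score(sentence_evals):
    """Sum the strengths of the evaluations a sentence contains, in absolute
    value (our reading), so strongly negative sentences also score high."""
    return sum(abs(ps) for _, ps in sentence_evals)

# "the menus are easy to navigate (+2) and the buttons are easy to use (+2)."
print(score([("menus", +2), ("buttons", +2)]))   # 4
```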

Page 22: Multi-Document Summarization of Evaluative Text

Feature Selection: MEAD*

We want to extract sentences for the most important features, and only one sentence per feature

Put each sentence s_i into a "bucket" for each feature in features(s_i)

[Example: "the menus are easy to navigate (+2) and the buttons are easy to use (+2)." goes into both the menus bucket and the buttons bucket; the menus bucket also holds "I like the menus …"]

Page 23: Multi-Document Summarization of Evaluative Text


Feature Selection: MEAD*

Take the (single) highest scoring sentence from the “fullest” buckets until desired summary length is reached
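Putting the buckets and the score together, a sketch of the MEAD* selection loop (the sentence data and tie-breaking details are illustrative; the real system also breaks ties with the MEAD centroid):

```python
from collections import defaultdict

def score(evals):
    return sum(abs(ps) for _, ps in evals)

def select_sentences(sentences, max_sentences):
    """sentences: list of (text, [(feature, strength), ...]) pairs.
    Visit buckets from fullest to emptiest, taking the single best unused
    sentence from each, until the length budget is reached."""
    buckets = defaultdict(list)
    for text, evals in sentences:
        for feature, _ in evals:
            buckets[feature].append((text, evals))
    chosen, used = [], set()
    for feature in sorted(buckets, key=lambda f: -len(buckets[f])):
        if len(chosen) >= max_sentences:
            break
        candidates = [s for s in buckets[feature] if s[0] not in used]
        if candidates:
            text, _ = max(candidates, key=lambda s: score(s[1]))
            chosen.append(text)
            used.add(text)
    return chosen

sentences = [
    ("the menus are easy to navigate and the buttons are easy to use.",
     [("menus", +2), ("buttons", +2)]),
    ("i love the menus.", [("menus", +3)]),
    ("the lens cap is not very snug.", [("lens cap", -2)]),
]
# The menus bucket is fullest, so its best sentence goes first; that sentence
# also covers buttons, so the buttons bucket is skipped.
print(select_sentences(sentences, 2))
```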

Page 24: Multi-Document Summarization of Evaluative Text

Pipeline Approach (for both)

[Flowchart: Evaluative Documents → Extraction of evaluative info → Organization of extracted info → Selection of extracted info → Presentation of extracted info]
[Extraction: shared; Organization: partially shared; Selection: not shared; Presentation: not shared]

Page 25: Multi-Document Summarization of Evaluative Text


Presentation: MEAD*

Display selected sentences in order from most general (top of feature hierarchy) to most specific

That's it!

Page 26: Multi-Document Summarization of Evaluative Text

Presentation: SEA

SEA (Summarizer of Evaluative Arguments) is based on GEA (Generator of Evaluative Arguments) (Carenini & Moore, 2001)

GEA takes as input:
● a hierarchical model of the features of an entity
● objective values (good vs. bad) for each feature of the entity

Adaptation is (in theory) straightforward

Page 27: Multi-Document Summarization of Evaluative Text


Possible GEA Output

The Canon G3 is a good camera. Although the interface is poor, the image quality is excellent.

Page 28: Multi-Document Summarization of Evaluative Text

Target SEA Summary

Most users thought the Canon G3 was a good camera. Although several users did not like the interface, almost all users liked the image quality.

Page 29: Multi-Document Summarization of Evaluative Text

Extra work

● What GEA gives us:
  – High-level text plan (i.e. content selection and ordering)
  – Cue phrases for argumentation strategy ("In fact", "Although", etc.)
● What GEA does not give us:
  – Appropriate micro-planning (lexicalization)
● Need to give an indication of the distribution of customer opinions

Page 30: Multi-Document Summarization of Evaluative Text

Microplanning (incomplete!)

We generate one clause for each selected feature

Each clause includes 3 key pieces of information:
1. Distribution of customers who evaluated the feature ("many", "most", "some", etc.)
2. Name of the feature ("menus", "image quality", etc.)
3. Aggregate of opinions ("excellent", "fair", "poor", etc.)

→ "most users found the menus to be poor"

Page 31: Multi-Document Summarization of Evaluative Text

Microplanning

● Distribution is (roughly) based on the fraction of customers who evaluated the feature (+ disagreement …)
● Name of the feature is straightforward
● Aggregate of opinions is based on a function similar in form to the measure of importance, but averaging polarity/strength over all evaluations rather than summing
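A sketch of these lexical choices; the thresholds and word lists are illustrative, not the ones SEA actually uses:

```python
def quantifier(n_evaluators, n_customers):
    """Map the fraction of customers who evaluated a feature to a quantifier."""
    frac = n_evaluators / n_customers
    if frac > 0.8:
        return "almost all"
    if frac > 0.6:
        return "most"
    if frac > 0.4:
        return "many"
    if frac > 0.2:
        return "several"
    return "some"

def evaluation_word(evals):
    """Map the *average* polarity/strength (not the moi-style sum) to a word."""
    avg = sum(evals) / len(evals)
    for bound, word in [(2.5, "excellent"), (1.5, "very good"), (0.5, "good"),
                        (-0.5, "fair"), (-1.5, "poor")]:
        if avg >= bound:
            return word
    return "terrible"

def clause(feature, evals, n_customers):
    return (f"{quantifier(len(evals), n_customers)} users found "
            f"the {feature} to be {evaluation_word(evals)}")

# 7 of 10 customers evaluated the menus, mostly negatively.
print(clause("menus", [-2, -1, -1, -2, -1, -1, -1], 10))
# most users found the menus to be poor
```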

Page 32: Multi-Document Summarization of Evaluative Text


Microplanning

We “glue” clauses together using cue phrases from GEA

Also perform basic aggregation

Page 33: Multi-Document Summarization of Evaluative Text

Formative Evaluation

Goal: test users' perceived effectiveness

Participants: 28 undergrad students

Procedure:
● Pretend to work for the manufacturer
● Given 20 reviews (from either the Camera or DVD corpus) and asked to generate a summary (~100 words) for the marketing dept
● After 20 mins, given a summary of the 20 reviews
● Asked to fill out a questionnaire assessing summary effectiveness (multiple choice and open form)

Page 34: Multi-Document Summarization of Evaluative Text

Formative Evaluation (cont'd)

Conditions: each user given one of 4 summaries:
1. Topline summary (human)
2. Baseline summary (vanilla MEAD)
3. MEAD* summary
4. SEA summary

Page 35: Multi-Document Summarization of Evaluative Text

Quantitative Results

Responses on a scale from 1 (Strongly disagree) to 5 (Strongly agree)

                     SEA    MEAD*  DUC (median)  MEAD   Human
Grammaticality       3.43   2.71   3.86          3.14   4.29
Non-redundancy       3.14   3.86   4.44          3.57   4.43
Referential clarity  3.86   4.00   2.98          3.00   4.71
Focus                4.14   3.71   3.16          2.29   4.14
Structure            2.29   3.00   2.10          1.86   4.43
Linguistic Avg.      3.37   3.46   3.31          2.77   4.40
Recall               2.33   2.57                 1.57   3.57
Precision            4.17   3.50                 2.17   3.86
Accuracy             4.00   3.57                 2.57   4.29
Content Avg.         3.50   3.21                 2.10   3.90
Overall              3.14   3.14                 2.14   4.43
Macro Avg.           3.39   3.34                 2.48   4.24


Page 44: Multi-Document Summarization of Evaluative Text

Qualitative Results: MEAD*

Surprising: many participants didn't notice or didn't mind verbatim text extraction

Two major complaints about content:
1. Summary was not representative (a negative sentence was extracted even though the majority were positive)
2. Evaluations of some features were repeated

(2) could be addressed, but (1) can only partially be fixed with pure extraction

Page 45: Multi-Document Summarization of Evaluative Text

Qualitative Results: SEA

● Some complaints about the "robotic" feel of the summary, and about repetition / lack of pronouns
  → Need to do more complex microplanning
● Some wanted more details (which "manual features …"?)
  Note: this complaint was absent with MEAD*
● Some disagreed with the feature selection (precision/recall), but this is a problem even with human summaries

Page 46: Multi-Document Summarization of Evaluative Text

Conclusions

Extraction works surprisingly well even for evaluative summarization

Topline > MEAD*, SEA > Baseline

Need to combine the strengths of SEA and MEAD* for evaluative summarization:
● Need the detail, variety, and natural-sounding text provided by extraction
● Need to generate opinion distributions
● Need argument structure from SEA (?)

Page 47: Multi-Document Summarization of Evaluative Text

Other Future Work

● Automatically induce the feature hierarchy
● Produce summaries tailored to user preferences about the evaluated entity
● Summarize corpora of evaluative documents about more than one entity

Page 48: Multi-Document Summarization of Evaluative Text

Examples: MEAD*

Bottom line , well made camera , easy to use, very flexible and powerful features to include the ability to use external flash and lense / filters choices . It has a beautiful design , lots of features, very easy to use , very configurable and customizable , and the battery duration is amazing! Great colors , pictures and white balance. The camera is a dream to operate in automode , but also gives tremendous flexibility in aperture priority , shutter priority, and manual modes . I ’d highly recommend this camera for anyone who is looking for excellent quality pictures and a combination of ease of use and the flexibility to get advanced with many options to adjust if you like.

Page 49: Multi-Document Summarization of Evaluative Text

Examples: SEA

Almost all users loved the Canon G3 possibly because some users thought the physical appearance was very good. Furthermore, several users found the manual features and the special features to be very good. Also, some users liked the convenience because some users thought the battery was excellent. Finally, some users found the editing/viewing interface to be good despite the fact that several customers really disliked the viewfinder. However, there were some negative evaluations. Some customers thought the lens was poor even though some customers found the optical zoom capability to be excellent. Most customers thought the quality of the images was very good.

Page 50: Multi-Document Summarization of Evaluative Text

Examples: MEAD

I am a software engineer and am very keen into technical details of everything i buy , i spend around 3 months before buying the digital camera ; and i must say , g3 worth every single cent i spent on it . I do n’t write many reviews but i ’m compelled to do so with this camera . I spent a lot of time comparing different cameras , and i realized that there is not such thing as the best digital camera . I bought my canon g3 about a month ago and i have to say i am very satisfied .

Page 51: Multi-Document Summarization of Evaluative Text

Examples: Human

The Canon G3 was received exceedingly well. Consumer reviews from novice photographers to semi-professional all listed an impressive number of attributes, they claim makes this camera superior in the market. Customers are pleased with the many features the camera offers, and state that the camera is easy to use and universally accessible. Picture quality, long lasting battery life, size and style were all highlighted in glowing reviews. One flaw in the camera frequently mentioned was the lens which partially obstructs the view through the view finder, however most claimed it was only a minor annoyance since they used the LCD screen.

Page 52: Multi-Document Summarization of Evaluative Text

Microplanning

We "glue" clauses together using cue phrases from GEA:
● "Although", "however", etc. indicate opposing evidence
● "Because", "in particular" indicate supporting evidence
● "Furthermore" indicates elaboration

We also perform basic aggregation:

"most users found the menus to be poor" + "most users found the buttons to be poor"
→ "most users found the menus and buttons to be poor"
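The aggregation step can be sketched as merging clauses that share a quantifier and an evaluation; the triple representation is ours:

```python
from itertools import groupby

def aggregate(clauses):
    """clauses: (quantifier, feature, evaluation) triples. Clauses sharing a
    quantifier and an evaluation are merged by conjoining their features."""
    key = lambda c: (c[0], c[2])
    merged = []
    # groupby needs its input sorted by the same key it groups on.
    for (quant, evaln), group in groupby(sorted(clauses, key=key), key=key):
        features = [c[1] for c in group]
        if len(features) == 1:
            joined = features[0]
        elif len(features) == 2:
            joined = f"{features[0]} and {features[1]}"
        else:
            joined = ", ".join(features[:-1]) + ", and " + features[-1]
        merged.append(f"{quant} users found the {joined} to be {evaln}")
    return merged

print(aggregate([("most", "menus", "poor"), ("most", "buttons", "poor")]))
# ['most users found the menus and buttons to be poor']
```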