A Summarization Journey

Search and Information Extraction Lab

IIIT Hyderabad

Information Overload

Explosive growth of information on the web

Failure of information retrieval systems to satisfy the user’s information need

Need for sophisticated information access solutions

Summarization

A summary is a condensed version of the source document(s) with a recognizable genre. Its purpose is to give the reader an exact and concise idea of the contents of the source.

Text interpretation

Extraction of Relevant information

Condensing Extracted Information

Summary Generation

Flavors of Summarization

Progressive

Single document

Query Focused

Opinion/Sentiment

Comparative

Guided

Personalized

Extract vs. Abstract

Extract: An extract is a summary consisting entirely of material from the input text.

Abstract: An abstract is a summary at least some of whose material is not present in the input, e.g. paraphrases of content, subject categories.
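The distinction can be checked mechanically. The following is a minimal sketch, assuming that "material from the input" means whole sentences copied verbatim and that sentences can be split on terminal punctuation; both are simplifications, and `split_sentences` and `is_extract` are hypothetical helper names.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def is_extract(summary, source):
    """True if every summary sentence appears verbatim in the source."""
    source_sentences = set(split_sentences(source))
    return all(s in source_sentences for s in split_sentences(summary))

source = "The match ended late. Rain delayed play twice. The crowd stayed."
print(is_extract("Rain delayed play twice. The crowd stayed.", source))  # an extract
print(is_extract("Rain caused two delays.", source))                     # an abstract
```

A summary that fails this check contains at least some material not present in the input, which is what the slide calls an abstract.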

Towards Abstraction

Personalized, Cross-Lingual Summarization

Guided Summarization

Code Summarization

Comparison Summarization

Blog Summarization

Progressive Summarization

Abstractive

Single Document, Query Focused Multi Document Summarization

Technological Aspects

Summarization

Support Vector Regression

Relevance-based Language Models

External Knowledge – Web, Wikipedia

User Modeling

Statistics – word and document

Similarity measures, Novelty detection

Graph Clustering – Topic identification

EXTRACTIVE SUMMARIZERS

Query Focused Summarization

Documents should be ranked in order of probability of relevance to the request or information need, as calculated from whatever evidence is available to the system

Query-dependent ranking: Relevance-Based Language Models (RBLM), Language Models (PHAL)

Query-independent ranking: Sentence Prior

RBLM is an IR approach that computes the conditional probabilities of relevance from the document and the query.

PHAL is a probabilistic extension to HAL spaces. HAL constructs dependencies of a term w on other terms based on their occurrence in its context in the corpus.
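The HAL construction can be sketched in a few lines. This is an illustration only: it assumes the usual HAL weighting, where a word at distance d inside a window of size K contributes K - d + 1, it counts only preceding context words, and it turns weights into probabilities by simple row normalization. The exact formulation used in the PHAL feature is not given on the slides, and `build_hal` and `normalize` are hypothetical names.

```python
from collections import defaultdict

def build_hal(tokens, window=3):
    """Accumulate distance-weighted co-occurrence: closer words weigh more."""
    hal = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            if i - d < 0:
                break
            hal[w][tokens[i - d]] += window - d + 1
    return hal

def normalize(hal):
    """Row-normalize weights into P(context_word | w) (an assumed scheme)."""
    probs = {}
    for w, row in hal.items():
        total = sum(row.values())
        probs[w] = {c: v / total for c, v in row.items()}
    return probs

tokens = "the query focused summary answers the query".split()
hal = build_hal(tokens, window=2)
probs = normalize(hal)
print(dict(hal["query"]))
print(probs["query"])
```

Each row of `probs` is a distribution over the words that tend to precede a term, which is one way to read "dependencies of a term w on other terms".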

DUC Performance

38 systems participated in 2006

Significant difference between first two systems

Extract vs. Abstract Summarization

We conducted a study (post TAC 2006)

Generated best possible extracts

Calculated the scores for these extracts

Evaluation with respect to the reference summaries

                 ROUGE-2    ROUGE-SU4
Human Answers    0.1025     0.1624
Best Answers     0.09965    0.15407
HAL Feature      0.07618    0.13805
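For readers unfamiliar with the metric, ROUGE-2 is based on bigram overlap with the reference summaries. The sketch below computes a simplified ROUGE-2 recall; it assumes whitespace tokenization and lowercasing only, whereas the official toolkit offers further options (such as stemming), so its numbers will not match the table above. `bigrams` and `rouge2_recall` are hypothetical names.

```python
from collections import Counter

def bigrams(text):
    """Count adjacent word pairs after lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(candidate, references):
    """Clipped bigram matches divided by total reference bigrams."""
    cand = bigrams(candidate)
    matched = total = 0
    for ref in references:
        ref_counts = bigrams(ref)
        total += sum(ref_counts.values())
        matched += sum(min(c, cand[bg]) for bg, c in ref_counts.items())
    return matched / total if total else 0.0

score = rouge2_recall("the cat sat on the mat", ["the cat lay on the mat"])
print(round(score, 3))
```

In this toy case three of the five reference bigrams are matched. An oracle "best possible extract" could be built by greedily adding the source sentence that most increases this score.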

Cross-Lingual Summarization

A bridge between CLIR and MT

Extended our mono-lingual summarization framework to a cross-lingual setting in the RBLM framework

Designed a cross-lingual experimental setup using the DUC 2005 dataset

Experiments were conducted for the Telugu-English language pair

Comparison with the mono-lingual baseline shows about 90% of its performance in ROUGE-SU4 and about 85% in ROUGE-2 F-measures

Progressive Summarization

Emerging area of research in summarization

Summarization with a sense of prior knowledge

Introduced as “Update Summarization” at DUC 2007, TAC 2008, TAC 2009

Generate a short summary of a set of newswire articles, under the assumption that the user has already read a given set of earlier articles.

To keep track of temporal news stories

Key challenge

To detect information that is not only relevant but also new, given the prior knowledge of the reader

Relevant and new vs. non-relevant and new vs. relevant and redundant

Three-level approach to Novelty Detection

Sentence Scoring: developing new features that capture novelty along with relevance of a sentence (NF, NW)

Ranking: sentences are re-ranked based on the amount of novelty they contain (ITSim, CoSim)

Summary Generation: a selected pool of sentences that contain novel facts; all remaining sentences are filtered out
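The re-ranking step can be illustrated with a small sketch. The slides name CoSim without defining it; the code below assumes it is the cosine similarity between a candidate sentence and the sentences the reader has already seen, and it combines relevance and novelty as relevance × (1 − redundancy), which is an assumed scoring rule rather than the one used in the submitted runs. `cosine` and `rerank_by_novelty` are hypothetical names.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rerank_by_novelty(candidates, seen_sentences):
    """candidates: list of (sentence, relevance). Returns sentences, most novel-and-relevant first."""
    seen = [Counter(s.lower().split()) for s in seen_sentences]
    scored = []
    for sentence, relevance in candidates:
        vec = Counter(sentence.lower().split())
        # Redundancy is the highest similarity to anything already read.
        redundancy = max((cosine(vec, s) for s in seen), default=0.0)
        scored.append((relevance * (1.0 - redundancy), sentence))
    return [s for _, s in sorted(scored, reverse=True)]

seen = ["the storm hit the coast on monday"]
candidates = [
    ("the storm hit the coast on monday", 0.9),         # relevant but redundant
    ("rescue teams reached the flooded villages", 0.6),  # relevant and new
]
print(rerank_by_novelty(candidates, seen))
```

The redundant sentence drops to the bottom despite its higher relevance score, which is the behaviour the update task asks for.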

Evaluations

TAC 2008 Update Summarization data for training: 48 topics

Each topic is divided into clusters A and B, each with 10 documents

The summary for cluster A is a normal summary and the summary for cluster B is an update summary

TAC 2009 Update Summarization data for testing: 44 topics

The baseline summarizer generates a summary by picking the first 100 words of the last document

Run1 – DFS + SL1

Run2 – PHAL + KL
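The baseline summarizer mentioned above is simple enough to write down. This sketch assumes that documents are passed in chronological order, so that the "last document" is the final element, and that words are whitespace-separated; `baseline_summary` is a hypothetical name.

```python
def baseline_summary(documents, max_words=100):
    """Return the first max_words words of the last document in the list."""
    if not documents:
        return ""
    words = documents[-1].split()
    return " ".join(words[:max_words])

docs = [
    "Earlier article text that the reader has already seen.",
    " ".join(f"w{i}" for i in range(150)),  # placeholder text for the most recent article
]
summary = baseline_summary(docs)
print(len(summary.split()))
```

Because newswire articles tend to put the key facts first, this kind of lead baseline is a reasonable reference point for comparing runs.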

Personalized Summarization

Perception of text differs with the background of the reader

Need for incorporating user background in the summarization process

Summarization is a function not only of the input text but also of the reader

Tennis player

Hotel manager

Politician

Web-based profile creation: personal information available on the web, such as a conference page, a project page, an online paper, or even a weblog.

Estimate the user model P(w|Mu) to incorporate the user in the sentence extraction process
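One way to picture the user model is as a unigram language model estimated from the profile text. The slides do not say how the model is estimated; the sketch below assumes a maximum-likelihood estimate smoothed by linear interpolation with a background collection, and scores a sentence by its average log-probability under that model. The function names, the interpolation weight `lam`, and the sample texts are all illustrative assumptions.

```python
import math
from collections import Counter

def estimate_user_model(profile_text, background_text, lam=0.7):
    """Return p(w) = lam * P_ml(w | profile) + (1 - lam) * P_ml(w | background)."""
    prof = Counter(profile_text.lower().split())
    back = Counter(background_text.lower().split())
    n_prof, n_back = sum(prof.values()), sum(back.values())

    def p(word):
        p_prof = prof[word] / n_prof if n_prof else 0.0
        p_back = back[word] / n_back if n_back else 0.0
        return lam * p_prof + (1 - lam) * p_back

    return p

def sentence_score(sentence, p, floor=1e-9):
    """Average log-probability of the sentence's words under the user model."""
    words = sentence.lower().split()
    if not words:
        return float("-inf")
    return sum(math.log(max(p(w), floor)) for w in words) / len(words)

p_user = estimate_user_model(
    "tennis serve grand slam tennis match",              # hypothetical web profile text
    "the hotel booking tennis election match policy",    # hypothetical background text
)
print(sentence_score("tennis match", p_user) > sentence_score("hotel booking", p_user))
```

For a reader with a tennis-heavy profile, sentences about tennis receive higher scores than sentences about hotels, so the same input text yields a different extract for each reader.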

Opinion Summarization

Sentiment Analysis

User-generated content is growing rapidly through blogs

Sentiment analysis provides better access to information

Sentiment

Textual information on the Web can be categorized as facts and opinions

Computational study of opinions and sentiments from a market perspective

Optimization of sentiment in the summary to the maximum extent

Sentiment summarization as a two-stage classification problem at the sentence level

Opinion/Fact classification, then Polarity Estimation (Positive/Negative)
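The two-stage structure can be sketched as follows. This is a toy illustration only: it stands in tiny hand-written word lists for the trained classifiers a real system would use, and the lexicons and function names are hypothetical.

```python
# Toy lexicons; a real system would learn both stages from labeled data.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def is_opinion(sentence):
    """Stage 1: treat a sentence as an opinion if it contains a sentiment word."""
    words = set(sentence.lower().split())
    return bool(words & (POSITIVE | NEGATIVE))

def polarity(sentence):
    """Stage 2: compare counts of positive and negative words."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

def classify(sentence):
    """Run stage 2 only on sentences that stage 1 labels as opinions."""
    if not is_opinion(sentence):
        return "fact"
    return polarity(sentence)

for s in ["the phone has a 5 inch display",
          "the battery life is great",
          "the camera is terrible"]:
    print(s, "->", classify(s))
```

Separating the stages means polarity is only estimated for sentences already judged to be opinions, so factual sentences never receive a sentiment label.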

SEMI ABSTRACTIVE SUMMARIZERS

Comparative Summarization

Summaries for comparing multiple items belonging to a category

The category “Mobile phones” will have “Nokia” and “BlackBerry” as its items

Comparative summaries provide the properties or facts common to these items and their corresponding values with respect to each item, e.g. “Memory”, “Display”, “Battery Life”

Comparative Summary Generation

Attribute Extraction: find the attributes of the product class

Attribute Ranking: rank the attributes according to their importance in comparison

Summary Generation: find the occurrence of attributes in various products
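The ranking and generation steps can be sketched on toy data. The sketch assumes attribute extraction has already produced attribute-value pairs per product, and it ranks attributes by how many products mention them, which is only one possible notion of "importance in comparison". The attribute values below are made-up placeholders, and the function names are hypothetical.

```python
def rank_attributes(products):
    """Rank attributes by how many products mention them (ties broken alphabetically)."""
    counts = {}
    for attrs in products.values():
        for name in attrs:
            counts[name] = counts.get(name, 0) + 1
    return sorted(counts, key=lambda a: (-counts[a], a))

def comparative_table(products, top_k=3):
    """Build rows of [attribute, value per product] for the top-ranked attributes."""
    attributes = rank_attributes(products)[:top_k]
    names = sorted(products)
    rows = [["Attribute"] + names]
    for attr in attributes:
        rows.append([attr] + [products[n].get(attr, "-") for n in names])
    return rows

# Placeholder values for illustration only, not real product specifications.
products = {
    "Nokia": {"Memory": "16 GB", "Display": "4.5 in", "Battery Life": "10 h"},
    "BlackBerry": {"Memory": "8 GB", "Battery Life": "12 h"},
}
for row in comparative_table(products):
    print(" | ".join(row))
```

Attributes shared by more items come first, and a missing value is shown as "-" so that gaps in the source text stay visible in the comparison.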

Guided Summarization

Query Focused Summarization

User’s information need is expressed as a query along with a narrative

A set of documents related to the topic is given

Goal is to produce a short, coherent summary focused on answering the query

Guided Summarization

Each topic is classified into a set of predefined categories

Each category has a template of important aspects about the topic

Summary is expected to answer all the aspects of the template while containing other relevant information

Guided Summarization

Encourages deeper linguistic and semantic analysis of the source documents instead of relying only on document word frequencies to select important concepts

Shares similarity with information extraction

Specific information from unstructured text is identified and consequently classified into a set of semantic labels (templates)

Makes information more suitable for other information processing tasks

A guided summarization system has to produce a readable summary encompassing all the information about the templates

Very few investigations have explored the potential of merging summarization with information extraction techniques

Our Approach

Building a domain model: essential background knowledge for information extraction

Sentence Annotations: to identify sentences having answers to aspects of the template

Concept Mining: to use semantic concepts instead of words to calculate sentence importance

Summary Extraction: modification of the summary extraction algorithm to adapt to the requirements using sentence annotations
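The final step, summary extraction driven by sentence annotations, might look like the greedy sketch below. It assumes each sentence already carries an importance score and a set of template aspects it answers, and it prefers sentences that cover aspects not yet answered. This selection rule, the aspect labels, and the sample sentences are illustrative assumptions, not the lab's actual algorithm.

```python
def guided_extract(sentences, template, max_sentences=3):
    """sentences: list of (text, score, aspects), where aspects is a set of template labels.

    Greedily pick the sentence covering the most uncovered aspects,
    breaking ties by importance score. Returns (summary, uncovered aspects).
    """
    remaining = set(template)
    pool = list(sentences)
    summary = []
    while pool and len(summary) < max_sentences:
        best = max(pool, key=lambda s: (len(s[2] & remaining), s[1]))
        summary.append(best[0])
        remaining -= best[2]
        pool.remove(best)
    return summary, remaining

template = {"WHAT", "WHEN", "WHERE"}  # hypothetical aspect labels
sentences = [
    ("A train derailed near the river.", 0.8, {"WHAT", "WHERE"}),
    ("The accident happened on Tuesday.", 0.5, {"WHEN"}),
    ("Officials held a press conference.", 0.9, set()),
]
summary, uncovered = guided_extract(sentences, template, max_sentences=2)
print(summary)
print(uncovered)
```

The highest-scoring sentence is skipped here because it answers no aspect of the template, which shows how annotations can override a purely frequency-based ranking.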

THANKS
