Methods for Automatic Evaluation of Sentence Extract Summaries

Slide 1: Methods for Automatic Evaluation of Sentence Extract Summaries
G.Ravindra+, N.Balakrishnan+, K.R.Ramakrishnan*
+ Supercomputer Education & Research Center
* Department of Electrical Engineering
Indian Institute of Science, Bangalore, India

Slide 2: Agenda
- Introduction to text summarization: need for summarization, types of summaries
- Evaluating extract summaries: challenges in manual and automatic evaluation
- Fuzzy summary evaluation
- Complexity scores

Slide 3: What is Text Summarization?
- Reductive transformation of source text into summary text by content generalization and/or selection
- Loss of information: what can be lost and what must not be lost, how much can be lost, and what the size of the summary should be
- Types of summaries: extracts and abstracts
- Genre influences the performance of a summarization algorithm; newswire stories, for example, favor sentence-position heuristics

Slide 4: Need for Summarization
- Explosive growth in the availability of digital textual data: books in digital libraries, mailing-list archives, on-line news portals
- Duplication of textual segments across books; e.g., 10 introductory books on quantum physics share a number of paragraphs that are syntactically different but semantically the same
- Hand-held devices: small screens, limited memory, and low power, hence limited processing capability; e.g., streaming a book from a digital library to a hand-held device
- Information is produced faster than it can be consumed

Slide 5: Types of Summaries
- Extracts: text selection, e.g., paragraphs from books, sentences from editorials, phrases from e-mails; built with statistical techniques
- Abstracts: text selection followed by generalization; require linguistic processing, e.g., converting a sentence into a phrase
- Generic summaries: independent of genre
- Indicative summaries: give a general idea of the topic discussed in the text being summarized
- Informational summaries: serve as a surrogate for the original text

Slide 6: Evaluating Extract Summaries
- Manual evaluation: human judges score a summary on a well-defined scale against well-defined criteria
- Subject to each judge's understanding of the subject and to the judge's opinions; guidelines constrain opinions
- Individual judges' scores are combined to produce the final score
- Re-evaluation might produce different scores
- Poses logistic problems for researchers

Slide 7: Automatic Evaluation
- Machine-based evaluation: consistent over multiple runs, fast, and free of logistic problems
- Suitable for researchers experimenting with new algorithms
- Flip side: not as accurate as human evaluation, so it should be used as a precursor to a detailed human evaluation, and it must handle various sentence constructs and linguistic variants algorithmically

Slide 8: Fuzzy Summary Evaluation (FuSE)
- Proposes fuzzy union theory to quantify the similarity of two extract summaries
- Similarity is evaluated between the reference (human-generated) summary and the candidate (machine-generated) summary
- Each sentence is a fuzzy set: every sentence in the reference summary has a membership grade in every sentence of the candidate summary
- The membership grade of one sentence in another is a hamming distance between the two sentences based on collocations
- The membership grade of a reference sentence in the candidate summary is the union of its membership grades across all candidate summary sentences
- The membership grades are combined into an f-score value

Slide 9: Fuzzy F-score
Let C = {c_1, ..., c_m} be the candidate sentence set, R = {r_1, ..., r_h} the reference sentence set, \mu(x, y) the membership grade of sentence x in sentence y, and \bigcup the fuzzy union, so that \mu(c_j, R) = \bigcup_{i=1}^{h} \mu(c_j, r_i) and \mu(r_i, C) = \bigcup_{j=1}^{m} \mu(r_i, c_j). Then

  fuzzy precision  P_f = \frac{1}{|C|} \sum_{j=1}^{m} \mu(c_j, R)
  fuzzy recall     R_f = \frac{1}{|R|} \sum_{i=1}^{h} \mu(r_i, C)
  fuzzy F-score    F_f = \frac{2 P_f R_f}{P_f + R_f}

Slide 10: Choice of Union Operator
- Proposes Frank's S-norm operator, which combines partial matches non-linearly
- The membership grade of a sentence in a summary becomes dependent on the sentence's length
- This automatically builds a brevity bonus into the scheme

Slide 11: Frank's S-norm Operator

  S_s(a, b) = 1 - \log_s\left(1 + \frac{(s^{1-a} - 1)(s^{1-b} - 1)}{s - 1}\right),  s > 0, s \neq 1

The base s is derived from a damping coefficient, the mean of the non-zero membership grades for a sentence, the sentence length, and the length of the longest sentence.
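The following is a minimal Python sketch of the FuSE scoring scheme of Slides 8-11. The bigram-overlap membership function and the fixed base s = 2 are assumptions made here for illustration; the paper's exact collocation-based hamming distance and its rule for deriving s from the damping coefficient and sentence lengths are not recoverable from this transcript.

```python
import math

def bigrams(sentence):
    """Approximate collocations as adjacent word pairs (bigrams)."""
    words = sentence.lower().split()
    return set(zip(words, words[1:]))

def membership(x, y):
    """Membership grade of sentence x in sentence y, in [0, 1].
    Stand-in (assumption) for the paper's collocation-based hamming distance."""
    bx, by = bigrams(x), bigrams(y)
    if not bx or not by:
        return 0.0
    return len(bx & by) / max(len(bx), len(by))

def frank_union(a, b, s=2.0):
    """Frank's S-norm (t-conorm) with base s (s > 0, s != 1)."""
    return 1.0 - math.log(
        1.0 + (s ** (1.0 - a) - 1.0) * (s ** (1.0 - b) - 1.0) / (s - 1.0), s)

def grade_in_summary(sentence, summary, s=2.0):
    """Union of the sentence's membership grades across a summary."""
    grade = 0.0  # identity of the union: S(a, 0) = a
    for other in summary:
        grade = frank_union(grade, membership(sentence, other), s)
    return grade

def fuse_score(reference, candidate, s=2.0):
    """Fuzzy precision, recall, and F-score as defined on Slide 9."""
    p = sum(grade_in_summary(c, reference, s) for c in candidate) / len(candidate)
    r = sum(grade_in_summary(r_, candidate, s) for r_ in reference) / len(reference)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(fuse_score(
    ["tropical storm gilbert destroyed parts of havana"],
    ["hurricane gilbert devastated dominican republic and parts of cuba"]))
```

The union's behaviour depends on the base s; Slide 11 derives s per sentence from its length and mean grade, which is what makes the grade length-dependent. Here s is fixed for simplicity.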
Slide 12: Characteristics of Frank's Base
(figure)

Slide 13: Performance of FuSE for Various Sentence Lengths
(figure)

Slide 14: Dictionary-enhanced Fuzzy Summary Evaluation (DeFuSE)
- FuSE does not capture sentence similarity based on synonymy and hypernymy
- Identifying synonymous words makes evaluation more accurate
- Identifying hypernymous word relationships allows gross information to be considered during evaluation (a sketch of hypernym lookup follows the conclusion)
- Note: very deep hypernymy trees can cause topic drift and hence improper evaluation

Slide 15: Use of WordNet
(figure)

Slide 16: Example: Use of Hypernymy
  HURRICANE GILBERT DEVASTATED DOMINICAN REPUBLIC AND PARTS OF CUBA
  → (PHYSICAL PHENOMENON) GILBERT (DESTROY, RUIN) (REGION) AND PARTS OF (REGION)
  TROPICAL STORM GILBERT DESTROYED PARTS OF HAVANA
  → TROPICAL (PHYSICAL PHENOMENON) GILBERT DESTROYED PARTS OF (REGION)

Slide 17: Complexity Score
- Attempts to quantify a summarization algorithm by the difficulty of generating a summary of a given accuracy
- Generating a 9-sentence summary from a 10-sentence document is very easy: an algorithm that randomly selects 9 sentences has a worst-case accuracy of about 90%, and a complicated AI+NLP-based algorithm cannot do much better
- If a 2-sentence summary is to be generated from a 10-sentence document, there are C(10,2) = 45 possible candidates, only one of which is accurate

Slide 18: Computing Complexity Score
The probability of generating a summary of length m_1 containing l_1 accurate sentences, when the human summary has h sentences and the document being summarized has n sentences, is the hypergeometric probability

  P(l_1) = \frac{\binom{h}{l_1} \binom{n - h}{m_1 - l_1}}{\binom{n}{m_1}}

(a numeric sketch follows the conclusion)

Slide 19: Complexity Score (contd.)
- To compare two summaries of equal length, the performance of one is measured relative to this random-selection baseline

Slide 20: Complexity Score (contd.)
- The complexity of generating a 10% extract with 12 correct sentences is higher than that of generating a 30% extract with 12 correct sentences

Slide 21: Conclusion
- Summary evaluation is as complicated as summary generation
- Fuzzy schemes are well suited to evaluating extract summaries
- Using synonymy and hypernymy relations improves evaluation accuracy
- The complexity score is a new way of looking at summary evaluation
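For the DeFuSE idea of Slides 14-16, here is a minimal sketch of hypernym lookup using NLTK's WordNet interface. The traversal depth and the flat class set are assumptions for illustration; the slides do not specify how DeFuSE walks the hierarchy or chooses substitution classes.

```python
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def hypernym_classes(word, depth=5):
    """Collect hypernyms up to `depth` levels above each sense of `word`.
    Shallow climbs generalize usefully; very deep climbs cause the topic
    drift warned about on Slide 14."""
    classes = set()
    for synset in wn.synsets(word):
        frontier = [synset]
        for _ in range(depth):
            frontier = [h for s in frontier for h in s.hypernyms()]
            classes.update(s.name() for s in frontier)
    return classes

# 'hurricane' climbs through 'cyclone', 'windstorm' and 'storm' before
# reaching 'atmospheric_phenomenon' and 'physical_phenomenon', the class
# shown in the Slide 16 example.
print(hypernym_classes("hurricane"))
```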
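And a numeric sketch of the random-selection baseline of Slides 17-18, using the hypergeometric form reconstructed above (the exact relative-performance formula of Slide 19 is not given in the transcript; the n = 120, h = 12 figures below are hypothetical):

```python
from math import comb

def summary_probability(n, h, m1, l1):
    """Probability that a random m1-sentence extract of an n-sentence
    document contains exactly l1 of the h human-chosen sentences."""
    return comb(h, l1) * comb(n - h, m1 - l1) / comb(n, m1)

# Slide 17's cases for a 10-sentence document:
print(summary_probability(10, 9, 9, 9))  # 0.1: a random 9-sentence pick is
                                         # perfect one time in ten, and never
                                         # worse than 8 of 9 correct
print(summary_probability(10, 2, 2, 2))  # ~0.022: only 1 of the C(10,2)=45
                                         # 2-sentence candidates is accurate

# Slide 20's point, with hypothetical n=120, h=12: a fully correct 10%
# (12-sentence) extract is far less likely, i.e. more complex to achieve,
# than a 30% (36-sentence) extract containing the same 12 correct sentences.
print(summary_probability(120, 12, 12, 12) < summary_probability(120, 12, 36, 12))  # True
```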