TEXT SUMMARIZATION
Saturday, April 15, 2023
Text Summarization
For Review and Feedback
By: Aman Sadhwani
What is Text Summarization, and why do we need it?
• A summary is a text that captures the main and most important sentences of the original text. In text summarization, the summary is generated by a computer.
• In recent years, the amount of textual information has been growing rapidly. Reading all of it becomes harder for the user and leads to loss of interest. Text summarization was introduced to address this problem.
Types of Text Summarization
1) Extraction: In extractive text summarization, the summary is generated by selecting a set of words, phrases, sentences or paragraphs from the original document.
2) Abstraction: Abstractive methods build a semantic representation of the text and then use natural language processing techniques to generate a summary that is closer to one written manually. This kind of summary may contain words that are not found in the original document. This method is an area of active research and is in greater demand.
Proposed System
We have developed and compared two text summarization techniques:
1) Reduction based
2) Intersection based
How the Reduction Algorithm Works
Step 1 - Takes a text as input.
Step 2 - Splits it into one or more paragraphs.
Step 3 - Splits each paragraph into one or more sentences.
Step 4 - Splits each sentence into one or more words.
Step 5 - Gives each sentence a weightage (a floating-point value) by comparing its words to a pre-defined dictionary called "stopWords.txt". If a word of a sentence matches any word in the pre-defined dictionary, that word is considered low weighted.
Step 6 - Prepares an ordered list of weighted sentences (relatively high weighted sentences come first and low weighted sentences come last).
Step 7 - Stores each sentence from the ordered list in the output variable (a list) until it reaches the reduction ratio (a formula determines the maximum number of sentences to put in the output list).
Step 8 - Returns the output list.
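The steps above can be sketched in a few lines of Python 3. This is only an illustrative sketch, not the implementation used in the project. The slides do not give the contents of "stopWords.txt", the weighting rule, or the reduction-ratio formula, so the small `STOP_WORDS` set, the "share of non-stop words" weight, and the `int(len(sentences) * ratio)` limit are all assumptions, and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for the contents of "stopWords.txt".
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it", "on", "was"}


def split_sentences(text):
    """Steps 2-3: split text into paragraphs, then into sentences (naive split)."""
    sentences = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                sentences.append(sentence)
    return sentences


def sentence_weight(sentence, stop_words=STOP_WORDS):
    """Steps 4-5: assumed weight = share of words that are NOT stop words."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    if not words:
        return 0.0
    content_words = [w for w in words if w not in stop_words]
    return len(content_words) / len(words)


def reduce_text(text, ratio=0.5):
    """Steps 6-8: rank sentences by weight and keep the top share of them."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    ranked = sorted(sentences, key=sentence_weight, reverse=True)
    # Assumed formula: keep ratio * sentence count, but at least one sentence.
    limit = max(1, int(len(sentences) * ratio))
    return ranked[:limit]


if __name__ == "__main__":
    sample = ("The cat is on the mat. "
              "Quantum computers factor large integers quickly. "
              "It is what it is.")
    for line in reduce_text(sample, ratio=0.5):
        print(line)
```

As in Step 7, the output keeps the weight order rather than the original sentence order. A real stop-word file would be loaded from disk instead of being hard-coded.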
How the Intersection Algorithm Works
1. Split the input text into paragraphs.
2. Split each paragraph into sentences.
3. Split each sentence into words.
4. Calculate the intersection between two sentences.
5. Remove non-alphabetic characters from each sentence.
6. Convert the content into a dictionary.
7. Build the sentence dictionary.
8. Return the best sentence in each paragraph.
9. Get the best sentences according to the dictionary.
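One possible reading of these steps is sketched below in Python 3. It is not the project's code. The slides do not say how the intersection is scored or how the dictionary is keyed, so the following are assumptions: the score is the number of shared words divided by the average sentence length, a sentence's rank is the sum of its scores against every other sentence, and the dictionary key is the sentence with non-alphabetic characters removed. All function names are hypothetical.

```python
import re


def split_paragraphs(text):
    """Step 1: split the input text on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Step 2: naive sentence split on end punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def format_key(sentence):
    """Step 5: strip non-alphabetic characters to get a dictionary key."""
    return re.sub(r"[^A-Za-z]+", "", sentence)


def intersection_score(sent1, sent2):
    """Steps 3-4: shared words divided by the average word-set size (assumed)."""
    words1 = set(re.findall(r"[a-z]+", sent1.lower()))
    words2 = set(re.findall(r"[a-z]+", sent2.lower()))
    if not words1 or not words2:
        return 0.0
    return len(words1 & words2) / ((len(words1) + len(words2)) / 2)


def build_sentence_dictionary(text):
    """Steps 6-7: map each sentence key to its total score against all others."""
    sentences = [s for p in split_paragraphs(text) for s in split_sentences(p)]
    scores = {}
    for i, sentence in enumerate(sentences):
        total = sum(intersection_score(sentence, other)
                    for j, other in enumerate(sentences) if i != j)
        scores[format_key(sentence)] = total
    return scores


def summarize(text):
    """Steps 8-9: pick the highest-scoring sentence of every paragraph."""
    scores = build_sentence_dictionary(text)
    summary = []
    for paragraph in split_paragraphs(text):
        sentences = split_sentences(paragraph)
        if sentences:
            summary.append(max(sentences,
                               key=lambda s: scores.get(format_key(s), 0.0)))
    return summary


if __name__ == "__main__":
    sample = ("Cats chase mice. Cats chase birds. Birds sleep.\n\n"
              "Python is a language. Python is popular. "
              "A popular language helps.")
    for line in summarize(sample):
        print(line)
```

The sketch uses only the standard library and compares raw word sets without any stop-word handling.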
Flow Chart
Screen shots
Conclusion
Looking at the last table, we can say that intersection is faster than reduction, but reduction creates a better summary than intersection.
Intersection works fine on some documents but generates only one or two lines of summary on others.
This is because intersection is the most basic algorithm for text summarization. Unlike reduction, it does not use any NLP libraries.
Hardware & Software Requirements
Minimum Hardware Requirements
Processor: Intel Pentium II or higher
RAM: 128 MB or higher
Monitor, keyboard, mouse
Printer (optional)
Hard disk: 20 GB or higher
Software Requirements
OS: Windows XP or higher
Java installed on the machine
Python 2.7 installed on the machine
Tools used
NetBeans
Python 2.7 IDLE
References
http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html
http://www.iajet.org/iajet_files/vol.1/no.4/Text%20Summarization%20Extraction%20System%20TSES%20Using%20Extracted%20Keywords_doc.pdf
http://en.wikipedia.org/wiki/Sentiment_analysis
Future Enhancements
Support for summarizing multiple file types.
User-wise document management.
Multi-document summarization.
Improved summarization algorithms.
THANK YOU