A Framework for Automated Corpus Generation for Semantic Sentiment Analysis Amna Asmi and Tanko...
A Framework for Automated Corpus Generation for Semantic Sentiment Analysis
Amna Asmi and Tanko Ishaya, Member, IAENG
Proceedings of the World Congress on Engineering 2012 Vol I
WCE 2012, July 4 - 6, 2012, London, U.K.
Introduction
• A variety of corpora exist (WordNet, SentiWordNet and Multi-Perspective Question Answering (MPQA))
• Some corpora are not large enough
• Generation and annotation are time consuming and inconsistent
• This paper presents a framework for automated generation of a corpus for semantic sentiment analysis of user-generated web content
Existing corpora
• MPQA
• Movie Review (Pang et al., 2002)
• Varbrul (Sankoff and Cedergren; program based on multivariate analysis)
• Fidditch (automated parser for English)
• Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM)
• International Corpus of English (ICE)
Existing Techniques for Sentiment Analysis
• Direction-based text, including opinions, sentiments, affects and biases
• Opinion mining using ML techniques (supervised/unsupervised) at document, sentence or clause level
• Polarity, degree of polarity, features, subjectivity, relationships, identification, affect types, mood classification and ordinal scale
Annotation Process
• Methodology
  • Grabbing URL, author, subject, text, comments
  • Text broken into sentences
  • Each sentence is processed with the Stanford Dependencies parser and Penn Treebank tagging, and broken down into clauses
  • Subject-Verb-Object triplet extracted
  • Rules according to POS, negation, punctuation and conjunction are specified using SentiWordNet and WordNet
  • Rules used to extract sentiment, and define polarity and intensity
  • Based on subject and object, and topic/title of sentence of post, subjectivity is calculated
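As a rough illustration of the rule-based steps listed above, the following minimal sketch splits text into sentences and scores each one with simple negation and intensity rules. It is not the authors' rule set: `TOY_LEXICON`, `NEGATIONS`, `INTENSIFIERS` and both function names are hypothetical, the scores are made-up stand-ins for SentiWordNet values, and a regular-expression tokenizer is used in place of the Stanford parser, so no Subject-Verb-Object triplets are extracted.

```python
import re

# Hypothetical toy lexicon standing in for SentiWordNet scores (values are invented).
TOY_LEXICON = {"good": 0.6, "better": 0.5, "helpful": 0.5, "painful": -0.7, "worse": -0.6}
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_polarity(sentence):
    """Sum lexicon scores; a negation flips and an intensifier scales the next scored word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score, negate, scale = 0.0, False, 1.0
    for tok in tokens:
        if tok in NEGATIONS or tok.endswith("n't"):
            negate = True
        elif tok in INTENSIFIERS:
            scale = INTENSIFIERS[tok]
        elif tok in TOY_LEXICON:
            value = TOY_LEXICON[tok] * scale
            score += -value if negate else value
            negate, scale = False, 1.0  # rules apply to one sentiment word only
    return score


if __name__ == "__main__":
    for s in split_sentences("The treatment was not good. The doctor was very helpful."):
        print(s, sentence_polarity(s))
```

The sign of the score plays the role of polarity and its magnitude the role of intensity; a real implementation would draw both from SentiWordNet and apply the rules per clause rather than per sentence.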
Tools used
• WordNet
• SentiWordNet
• Stanford Parser
• Penn Treebank
• UMLS (Unified Medical Language System)
Framework
• Repository:
  • WordNet, SentiWordNet dictionaries, UMLS Metathesaurus
  • Rules for sentence, polarity, subjectivity and sentiment identification and analysis
• Data Pre-processor:
  • Input: Unstructured data from medical forum (http://www.medhelp.org/forums/list)
  • Input cleaned and filtered
  • Captures thread structure and comments of forum, and arranges other info like author, topic, date
  • Spell checks
  • Split into set of posts and sent to post pre-processor
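The Data Pre-processor steps above can be pictured with a small sketch, assuming each forum post has already been fetched as an (author, date, raw HTML) tuple. The `Post` and `Thread` classes and the `clean` and `build_thread` functions are hypothetical names introduced here for illustration; spell checking is left out, and nothing below reflects the actual structure of the MedHelp pages.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class Post:
    author: str
    date: str
    text: str


@dataclass
class Thread:
    url: str
    topic: str
    posts: list = field(default_factory=list)


def clean(raw):
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def build_thread(url, topic, raw_posts):
    """raw_posts: iterable of (author, date, raw_html) tuples in thread order."""
    thread = Thread(url=url, topic=clean(topic))
    for author, date, raw in raw_posts:
        text = clean(raw)
        if text:  # filter out posts that are empty after cleaning
            thread.posts.append(Post(author=author, date=date, text=text))
    return thread
```

Each `Post` in `thread.posts` would then be handed to the post pre-processor one at a time, with the thread-level metadata (URL, topic) kept alongside it.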
Framework
• Post Pre-Processor
  • Splits texts into sentences using Penn Treebank tagger
  • Passes sentences to syntactic parser iteratively
  • Keeps track of start and end of post
• Syntactic Parser (SP)
  • Collects sentences iteratively and invokes POS tagger
  • Named entities and idioms are identified
  • Identifies dependencies/relationships
  • Classifies sentence as a question, assertion, comparison, confirmation seeking or confirmation providing
Framework
• Sentiment Analyser (SA)
  • Extracts sentiment-oriented words from each sentence by using relationship info (dependencies within)
  • Polarity Calculator (PC) identifies positive and negative words
    • Synonyms used if word is not found
    • Collects synonyms from SentiWordNet
    • Uses UMLS Metathesaurus if synonym not found
    • Rules for polarity identification used
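The fallback order described for the Polarity Calculator (direct lookup, then SentiWordNet synonyms, then the UMLS Metathesaurus) might look roughly like the sketch below. The three dictionaries are hypothetical stand-ins with invented entries; real SentiWordNet and UMLS lookups would replace them, and `lookup_polarity` is an illustrative name, not part of the framework.

```python
# Hypothetical stand-ins: real SentiWordNet and UMLS Metathesaurus lookups would replace these.
POLARITY = {"pain": -0.6, "relief": 0.7}
SENTI_SYNONYMS = {"ache": ["pain"], "comfort": ["relief"]}
UMLS_SYNONYMS = {"cephalalgia": ["headache", "pain"]}


def lookup_polarity(word):
    """Return (score, source), trying the fallbacks in the order the slide lists them."""
    word = word.lower()
    if word in POLARITY:
        return POLARITY[word], "direct"
    fallbacks = (("sentiwordnet-synonym", SENTI_SYNONYMS), ("umls-synonym", UMLS_SYNONYMS))
    for source, table in fallbacks:
        for synonym in table.get(word, []):
            if synonym in POLARITY:
                return POLARITY[synonym], source
    return 0.0, "unknown"  # no polarity found anywhere; treated as neutral here
```

Returning the source alongside the score is a design choice of this sketch: it makes it easy to see how often the medical thesaurus, rather than the general lexicon, supplied the polarity.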
Framework
• Subjectivity Calculator (SC)
  • Considers POS and relationships
  • Identifies all sentences related to topic
  • Takes nouns and associated info (synonyms, homonyms, meronyms, holonyms and hyponyms)
• Sentiment Analyser:
  • Takes polarities of sentences marked by SC for post polarity calculation
  • Takes aggregate of all polarities of sentences related to post
  • Generates sentiment frame info for each sentence
  • Frame contains type, subject, object/feature, sentiment-oriented word(s), sentiment type (absolute / relative), strength (very weak, weak, average, strong, very strong), polarity of sentence, post index and sentence index
  • Forwards calculated values and info to Sentiment Frame Manager
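One way to picture the sentiment frame and the post-level aggregation is the sketch below. The field names follow the list above, but the class itself, the `on_topic` flag (standing in for "marked by SC") and the choice of a plain sum as the aggregate are all assumptions; the slides do not say how the aggregate is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentimentFrame:
    sentence_type: str          # e.g. "question" or "assertion"
    subject: str
    obj: str                    # object/feature
    sentiment_words: List[str]
    sentiment_type: str         # "absolute" or "relative"
    strength: str               # "very weak" ... "very strong"
    polarity: float
    post_index: int
    sentence_index: int
    on_topic: bool = True       # hypothetical flag: sentence was marked by the SC


def post_polarity(frames, post_index):
    """Assumed aggregate: sum the polarities of on-topic sentences in one post."""
    return sum(f.polarity for f in frames if f.post_index == post_index and f.on_topic)
```

A mean or a strength-weighted sum would fit the slide's wording equally well; the sum is used here only because it is the simplest aggregate.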
Framework
• Sentiment Frame Manager
  • Stores all information to a physical location
  • Loads all frames in tree structure into runtime memory on program load
  • Keeps track of changes and appends changes
  • Stored in an XML file
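A minimal sketch of how frames could be kept as a tree and written to XML, using Python's standard `xml.etree.ElementTree` module. The element names (`corpus`, `post`, `sentence`), the subset of frame fields stored, and the function names are assumptions made for illustration; the slides do not specify the XML schema.

```python
import xml.etree.ElementTree as ET


def frames_to_xml(frames):
    """Build a corpus > post > sentence tree from a list of frame dicts."""
    root = ET.Element("corpus")
    posts = {}
    for frame in frames:
        post_id = str(frame["post_index"])
        if post_id not in posts:
            posts[post_id] = ET.SubElement(root, "post", index=post_id)
        sentence = ET.SubElement(posts[post_id], "sentence", index=str(frame["sentence_index"]))
        for key in ("subject", "object", "polarity", "strength"):
            ET.SubElement(sentence, key).text = str(frame[key])
    return ET.ElementTree(root)


def save(tree, path):
    """Write the whole tree to an XML file."""
    tree.write(path, encoding="utf-8", xml_declaration=True)


def load(path):
    """Parse the XML file back into an in-memory tree."""
    return ET.parse(path)
```

This version rewrites the whole file on every save; tracking and appending only the changes, as the slide describes, would need extra bookkeeping that is not shown here.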
Future Work
• Currently being evaluated using medical-based forums
• Plans to make it general purpose