Proposal final

Aspect Level Sentiment Analysis for Arabic Language

By: Mahmoud Mohamed Hassan Mahmoud El Razzaz

A proposal for a thesis to be submitted for the fulfillment of the M.Sc. degree in computer science.

Supervised by: Prof. Dr. Hesham Hefny, Dr. Mohamed Farouk

Cairo University, Institute of Statistical Studies and Research, Department of Computer and Information Science

Cairo, Egypt, October 2013

Description: the proposal I presented for my master's thesis in computer science at ISSR, Cairo University.


1. INTRODUCTION

Sentiment analysis, also called opinion mining, is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes. It represents a large problem space. The field goes by many names that denote slightly different tasks, e.g., sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysis, emotion analysis, and review mining. However, they now all fall under the umbrella of sentiment analysis or opinion mining. In industry, the term sentiment analysis is more commonly used, whereas in academia both sentiment analysis and opinion mining are frequently employed. They basically represent the same field of study.

Although linguistics and natural language processing (NLP) have a long history, little research had been done on people’s opinions and sentiments before the year 2000. Since then, the field has become a very active research area. There are several reasons for this. First, it has a wide range of applications in almost every domain. The industry surrounding sentiment analysis has also flourished due to the proliferation of commercial applications, which provides a strong motivation for research. Second, it offers many challenging research problems that had never been studied before.

Third, for the first time in human history, we now have a huge volume of opinionated data in the social media on the Web. Without this data, a lot of research would not have been possible. Not surprisingly, the inception and the rapid growth of sentiment analysis coincide with those of the social media. In fact, sentiment analysis is now right at the center of the social media research. Hence, research in sentiment analysis not only has an important impact on NLP, but may also have a profound impact on management sciences, political science, economics, and social sciences as they are all affected by people’s opinions.

2. The research problem (problem definition)

With the explosive growth of social media (e.g., reviews, forum discussions, blogs, micro-blogs, Twitter, comments, and postings in social network sites) on the Web, individuals and organizations are increasingly using the content in these media for decision making. Nowadays, if one wants to buy a consumer product, one is no longer limited to asking one’s friends and family for opinions because there are many user reviews and discussions in public forums on the Web about the product. For an organization, it may no longer be necessary to conduct surveys, opinion polls, and focus groups in order to gather public opinions because there is an abundance of such information publicly available. However, finding and monitoring opinion sites on the Web and distilling the information contained in them remains a formidable task because of the proliferation of diverse sites. Each site typically contains a huge volume of opinion text that is not always easily deciphered in long blogs and forum postings. The average human reader will have difficulty identifying relevant sites and extracting and summarizing the opinions in them.

3. The objectives of the study

I want to study the feasibility of constructing an automated sentiment classification system, to find the best accuracy that can be obtained from such a system if it is feasible, and to examine the effect of the domain of the data on the accuracy of the classification.
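The third objective, the effect of the data domain on accuracy, could be measured by scoring the classifier separately on each domain. The following is only a minimal sketch under stated assumptions: the function name `accuracy_by_domain`, the domain names, and the toy records are all hypothetical and are not results or data from this proposal.

```python
from collections import defaultdict


def accuracy_by_domain(records):
    """Return {domain: accuracy} for (domain, gold_label, predicted_label) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, gold, predicted in records:
        total[domain] += 1
        if gold == predicted:
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


# Hypothetical toy records, for illustration only (not experimental results).
sample = [
    ("books", "positive", "positive"),
    ("books", "negative", "positive"),
    ("hotels", "negative", "negative"),
    ("hotels", "positive", "positive"),
]
scores = accuracy_by_domain(sample)
print(scores)
```

Comparing the per-domain values side by side is one simple way to see whether a classifier that does well in one domain degrades in another.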

4. Related work / Literature review

Various machine learning and non-machine learning techniques have been used for classifying sentiment in English texts. Many of these techniques are discussed in Bing Liu's “Sentiment Analysis and Opinion Mining”.

For the Arabic language, many researchers have started to apply sentiment classification in the past few years, such as:

Document Level Sentiment Classification for Arabic Language:

Mohamed Elarnaoty et al., who presented “A Machine Learning Approach for Opinion Holder Extraction in Arabic Language”, 2012 [1], and Mohamed Aly et al., who presented “A Large Scale Arabic Book reviews Data Set”, 2013 [2].


Sentence level Sentiment Classification for Arabic Language:

N. Farra et al., in “Sentence-Level and Document-Level Sentiment Mining for Arabic Texts”, Proceedings of the International Conference on Data Mining Workshops, pages 1114-1119, IEEE, 2010 [3].

Aspect Level Sentiment Classification for Arabic Language:

Some researchers have built aspect level sentiment classifiers for the English language, as in Tun Thura Thet et al., “Aspect-based sentiment analysis of movie reviews on discussion boards”, Journal of Information Science, 2010 [4].

However, to the best of our knowledge, aspect level sentiment classification has not yet been examined for the Arabic language.

Finally, some researchers have surveyed the work done so far on subjectivity and sentiment analysis (SSA) of Arabic and its key issues:

A survey of sentiment and subjectivity analysis of Arabic was presented by Mohamed Korayem et al. in “Subjectivity and Sentiment Analysis of Arabic: A Survey”, 2012 [5]. Furthermore, the difficulties of applying sentiment classification to the Arabic language were discussed by Soha Ahmed et al. in “Key Issues in Conducting Sentiment Analysis on Arabic Social Media Text”, 2012 [6].

SSA for the Arabic language has also been applied in the domain of social media by Muhammad Abdul-Mageed et al. in “SAMAR: Subjectivity and sentiment analysis for Arabic social media” [7].


Work plan

1. Overview of Data collection

2. Overview of data preprocessing (entity extraction, entity categorization, feature selection, and feature extraction)

3. Overview of the Sentiment Analysis levels and techniques

4. The proposed approach for Sentiment Analysis: Aspect Level Sentiment classification.

5. Testing the proposed approach and comparing the results with related work.

6. Conclusion and future work.
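To illustrate what aspect level classification (item 4 of the work plan) produces compared with a single document level label, here is a minimal sketch. It is not the proposed approach: it swaps in a simple lexicon-and-window heuristic, uses English placeholder tokens instead of Arabic, and the names `ASPECT_TERMS`, `POLARITY`, and `aspect_sentiments` are hypothetical. A real Arabic system would additionally need Arabic tokenization, normalization, and an Arabic sentiment lexicon or a trained model.

```python
# Hypothetical aspect list and polarity lexicon (English stand-ins for Arabic terms).
ASPECT_TERMS = {"food", "service"}
POLARITY = {"excellent": 1, "good": 1, "bad": -1, "slow": -1}


def aspect_sentiments(tokens, window=2):
    """Label each aspect term by summing the polarity of words within `window` tokens of it."""
    results = {}
    for i, token in enumerate(tokens):
        if token not in ASPECT_TERMS:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        score = sum(POLARITY.get(t, 0) for t in tokens[lo:hi])
        if score > 0:
            results[token] = "positive"
        elif score < 0:
            results[token] = "negative"
        else:
            results[token] = "neutral"
    return results


review = "the food was excellent but the service was bad".split()
labels = aspect_sentiments(review)

# A document level score sums polarity over the whole review, ignoring aspects.
document_score = sum(POLARITY.get(t, 0) for t in review)
print(labels, document_score)
```

In this toy review the document level score cancels out to zero, while the aspect level output separates a positive opinion about one aspect from a negative opinion about another, which is the distinction the proposed thesis targets.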

References:

[1] Mohamed Elarnaoty, Samir AbdelRahman, and Aly Fahmy: “A Machine Learning Approach for Opinion Holder Extraction in Arabic Language” 2012.

[2] Mohamed Aly and Amir Atiya: “A Large Scale Arabic Book reviews Data Set” 2013.

[3] N. Farra, E. Challita, R. Assi, and H. Hajj: “Sentence-Level and Document-Level Sentiment Mining for Arabic Texts”, in Proceedings of the International Conference on Data Mining Workshops, pages 1114-1119, IEEE, 2010.

[4] Tun Thura Thet, Jin-Cheon Na and Christopher S.G. Khoo: “Aspect-based sentiment analysis of movie reviews on discussion boards” Journal of Information Science 2010.

[5] Mohamed Korayem et al.: “Subjectivity and Sentiment Analysis of Arabic: A Survey” 2012.

[6] Soha Ahmed, Michel Pasquier, and Ghassan Qadah: “Key Issues in Conducting Sentiment Analysis on Arabic Social Media Text” 2012.


[7] Muhammad Abdul-Mageed, Mona Diab, and Sandra Kübler: “SAMAR: Subjectivity and sentiment analysis for Arabic social media”.
