  • 8/11/2019 Literature Review and General Observation of Recent Research in the Emerging Field of Sentiment Analysis

    Literature Review and General Observation of Recent Research in the Emerging Field of

    Sentiment Analysis

    By Paul Prae

    October 5th, 2010

    The recent data explosion has spawned an incredible increase in innovation. While many new

    fields are emerging, many old fields have been redefined. The internet is the catalyst for these changes.

    This massive network holds the data that some of these new fields are focused on leveraging. Much of

    this data is organized and retrieved through methods that focus on definitions and context. However,

    these methods leave out one of the most important aspects of the creators of this data: emotion. The

    emotional subjectivity of human beings drives the choices we make. Sentiment analysis is the field that

    brings together this growth in digital data and the emotions of the people who create and use it. This

    paper covers the general concepts behind sentiment analysis and its uses in current society. It also

    focuses on the benefits of sentiment analysis for corporations and consumers.

    Sentiment analysis is a newer field that has only recently moved from the academic realm into

    corporate use. Much of the current published research on the subject was developed by research

    facilities strongly associated with companies such as IBM. "The sentiment detection of texts has

    witnessed a booming interest in recent years" (Tang et al., 2009) with "[t]he emergence of new social

    media such as blogs, message boards, news, and web content dramatically changing the ecosystems of

    corporations" (Cai et al., 2010). The academic contributors to the subject have combined many specific

    areas of linguistics, computer science, artificial intelligence, and psychology. More specifically, it is "a

    discipline at the crossroads of NLP [natural language processing] and IR [information retrieval], and as

    such it shares a number of characteristics with other tasks such as information extraction and

    text-mining" (Tang et al., 2009). Machine learning techniques, basic statistical analysis, and linguistic

    semantic representation are also well represented in the designs of the field. As with many new fields,

    sentiment analysis is a combination of a few novel concepts reapplied to a wide range of specific

    aspects of other older fields.

    Sentiment analysis is a system of techniques that are organized and applied differently depending

    on the designer. Before looking at how scientists and developers are currently searching for sentiment in

    text, it is best to understand where they search and why. The internet is an ever-expanding search

    space. Searching and analyzing all possible sources of relevant information would be enormously

    complex. Companies, scientists, and software developers must choose a subset of this massive search

    space to apply their software. It is important that a search space is chosen that will have the highest

    concentration of easily accessible relevant data. This paper will discuss some of the problems that highly

    unstructured text and noisy, useless, or irrelevant text can cause. The following graph from Alta Plana's

    Text Analytics 2009 research study, which surveyed 116 companies that use text analytics software,

    lists some of the top areas that companies use as the source of the text. Notice that content generated

    by general user discussion in open social settings dominates the list.

    The importance of the data generated by the Web 2.0 phenomenon is readily apparent. Cai

    (2010) describes this importance: "The widespread availability of consumer generated media (CGM)

    such as blogs, message boards, and news articles post great opportunities as well as risks to today's

    enterprises." By 2009, companies were already acting on this realization. The complexity issue is

    still relevant even when narrowing the search space to a single source of information. Facebook is a

    good example of an extraordinarily popular social media platform that generates a large amount of text

    that could be analyzed through its API. The search space here involves "[m]ore than 500 million active

    users," "over 900 million [Facebook-specific] objects" (pages, groups, events and community pages),

    and the "[m]ore than 30 billion pieces of content (web links, news stories, blog posts, notes, photo

    albums, etc.) shared each month" (http://www.facebook.com/press/info.php?statistics, 2010). The

    useful data is just as plentiful as the irrelevant. There are endless amounts of both being produced in

    outlets across the internet. It is the relevant subjective human opinion that is a rich and useful source for

    marketing intelligence, social psychologists, and others interested in extracting and mining opinions,

    views, moods, and attitudes (Tang et al., 2009). With this information sentiment analysis can begin.

    The challenge that exists after the search space is established is to locate the relevant data. After

    the relevant data is established it can then be assessed for sentiment. These two stages are commonly

    referred to as subjectivity classification and sentiment classification. "Subjectivity classification is a task

    to investigate whether a paragraph presents the opinion of its author or reports facts ... Subjectivity

    classification can prevent the polarity [i.e., sentiment] classifier from considering irrelevant or even

    potentially misleading text" (Tang et al., 2009). Depending on the application, contextual matching or

    a similar technique may then be applied to the data already deemed subjective. Ensuring that the

    extracted sections of the original document are contextually relevant guarantees that the topics

    discussed in the text are the ones that matter to the results the designer is examining. This concept is

    common in automated advertising displays: "Contextual advertising is a major type of online advertising

    in which ads are placed on Web pages according to their content" (Qiu et al., 2010). After the process

    has narrowed the initial data down to the relevant snippets, sentiment classification can begin.
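
    The subjectivity classification stage described above can be illustrated with a minimal sketch. It
    assumes a small hand-picked set of opinion cue words (SUBJECTIVE_CUES is hypothetical, as are the
    function names); the systems surveyed by Tang et al. (2009) use far richer features and trained
    classifiers. The sketch only shows the idea of discarding factual sentences before any polarity
    is assigned.

```python
import re

# Hypothetical cue words that suggest an opinion rather than a reported fact.
SUBJECTIVE_CUES = {"love", "hate", "great", "terrible", "frustrating", "fun", "think", "feel"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains at least one cue word."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(words & SUBJECTIVE_CUES)


def subjective_sentences(text):
    """Keep only the sentences that pass the subjectivity filter."""
    return [s for s in split_sentences(text) if is_subjective(s)]


review = (
    "The phone has a 5 inch screen. I love the camera. "
    "It ships with a charger. The battery life is terrible."
)
print(subjective_sentences(review))
```

    Only the sentences that survive this filter would be handed to the sentiment classifier, which is
    what keeps factual or irrelevant text from distorting the polarity result.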

    Sentiment classification varies somewhat from one designer's approach to the next but ultimately

    serves the same abstract purpose. "Sentiment analysis traditionally emphasizes on classification of web

    comments into positive, neutral, and negative categories" (Cai et al., 2010). There are several variations

    of this tradition. A more common trend in recent research is to get more specific in defining the

    sentiment spectrum. "Sentiment classification includes two kinds of classification forms, i.e., binary

    sentiment classification and multi-class sentiment classification" (Tang et al., 2009). This multi-class

    sentiment approach will likely be the standard of the future. Human emotion spans a much more

    complicated spectrum than the simple black and white notions of positive and negative. Human beings

    have the strange capability to love and to hate something at the same time. Take this simulated example

    that I might hear from a roommate who has just purchased a recent video game: "I hate

    that I am not acquiring the same kill to death ratio in the new Call of Duty. The new user interface is

    quite frustrating. I love the challenge though. It will be fun to learn a new UI." Here the user expresses

    both negative and positive sentiments about the same product. This is easy for humans to decipher but much

    more complicated for a machine. This and many other problems are being addressed in current

    research.
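
    To make the difficulty concrete, the following sketch scores the video game comment sentence by
    sentence. It assumes a four-word toy lexicon (LEXICON is hypothetical) and a made-up "mixed" label;
    it is not any published method. A single document-level sum of the scores comes out to zero, which
    hides the fact that the writer holds both strong negative and strong positive opinions.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight.
LEXICON = {"hate": -1, "frustrating": -1, "love": 1, "fun": 1}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_score(sentence):
    """Sum the lexicon weights of the words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def classify(text):
    """Label text as positive, negative, neutral, or mixed (both polarities present)."""
    scores = [sentence_score(s) for s in split_sentences(text)]
    has_positive = any(score > 0 for score in scores)
    has_negative = any(score < 0 for score in scores)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


comment = (
    "I hate that I am not acquiring the same kill to death ratio in the new Call of Duty. "
    "The new user interface is quite frustrating. I love the challenge though. "
    "It will be fun to learn a new UI."
)
print([sentence_score(s) for s in split_sentences(comment)])
print(classify(comment))
```

    Keeping the per-sentence scores, rather than collapsing them into one number, is one simple way a
    multi-class scheme can represent a comment that is neither purely positive nor purely negative.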

    A few different approaches have been developed to create more accurate results. General

    polarity-based sentiment classification is a great step forward from the previous context-only

    approaches. Cai (2010) mentions that "[s]uch analysis is useful, but it lacks insights on the drivers

    behind the sentiments." His group developed a better solution: "To address this problem, we introduce

    our sentiment analysis approach which combines a unique sentiment classification approach with a topic

    detection approach that discovers terms that are highly correlated to different sentiment classification

    categories." This yields results that point to the original reasons for a given sentiment. More

    elaborate designs break the content down in greater detail, allowing for more specific results.
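
    The idea of finding terms that are strongly associated with one sentiment category can be sketched
    with simple counting. This is not the algorithm of Cai et al. (2010); it is a stand-in that assumes
    comments have already been labeled and uses a relative-frequency threshold. The names driver_terms,
    min_count, and min_share, the stopword list, and the sample comments are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "is", "a", "and", "i", "it", "to", "of", "my", "was", "but"}


def tokenize(text):
    """Lowercase the text and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def driver_terms(labeled_comments, min_count=2, min_share=0.8):
    """Return, per label, terms that occur mostly in comments with that label."""
    per_label = defaultdict(Counter)
    totals = Counter()
    for text, label in labeled_comments:
        terms = set(tokenize(text))  # count each term once per comment
        per_label[label].update(terms)
        totals.update(terms)
    drivers = {}
    for label, counts in per_label.items():
        drivers[label] = sorted(
            term
            for term, n in counts.items()
            if n >= min_count and n / totals[term] >= min_share
        )
    return drivers


comments = [
    ("The battery died after a week", "negative"),
    ("Battery drains fast and the screen cracked", "negative"),
    ("Great camera and a bright screen", "positive"),
    ("I love the camera", "positive"),
    ("The screen is sharp", "positive"),
]
print(driver_terms(comments))
```

    In this toy data "battery" surfaces as a driver of negative comments and "camera" as a driver of
    positive ones, while "screen" is dropped because it appears under both labels.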

    After the sentiments are established, each sentiment analysis system uses the results in

    ways appropriate to the application. Qiu (2010) developed an approach titled Dissatisfaction-oriented

    Advertising based on Sentiment Analysis, or DASA, that combines traditional sentiment analysis with basic

    keyword matching. In this approach the software detects negative sentiment toward certain products. The

    advertising on the web page that contains the text then displays a product that is strong in the attributes

    that the original text complained about. The example used in Qiu's (2010) paper is one in which a

    writer on a forum complains about the safety of a car. After the comment is posted and a new user

    loads the forum page, the advertisements are reselected based on the new comment. The new

    advertisements now include a Volvo ad that highlights new safety features and a history of safe

    production standards. This process is shown in the following diagram from the same research paper.
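
    The flow described above (detect a complaint, identify the attribute complained about, then choose
    an ad that is strong in that attribute) can be sketched as follows. This is only an illustration of
    the idea, not the DASA implementation from Qiu et al. (2010). The word lists, the ad inventory, and
    the advertiser names are hypothetical.

```python
import re

# Hypothetical word lists and ad inventory, for illustration only.
NEGATIVE_WORDS = {"unsafe", "dangerous", "unreliable", "worried", "poor", "bad"}
ASPECT_KEYWORDS = {
    "safety": {"safety", "unsafe", "dangerous", "crash", "brakes", "airbag"},
    "fuel economy": {"mileage", "fuel", "gas"},
}
ADS = [
    {"advertiser": "Car maker A (hypothetical)", "strengths": {"safety"}},
    {"advertiser": "Car maker B (hypothetical)", "strengths": {"fuel economy"}},
]


def complained_aspects(comment):
    """Return aspects mentioned in sentences that also contain a negative word."""
    aspects = set()
    for sentence in re.split(r"(?<=[.!?])\s+", comment):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & NEGATIVE_WORDS:
            for aspect, keywords in ASPECT_KEYWORDS.items():
                if words & keywords:
                    aspects.add(aspect)
    return aspects


def select_ads(comment, ads=ADS):
    """Pick advertisers whose strengths cover an aspect the comment complains about."""
    aspects = complained_aspects(comment)
    return [ad["advertiser"] for ad in ads if ad["strengths"] & aspects]


post = "My car feels unsafe on the highway. The brakes are bad. The mileage is fine."
print(select_ads(post))
```

    The mileage sentence carries no negative word, so only the safety-oriented ad is selected. A purely
    topical matcher would have had no reason to prefer it over an ad for the very car being criticized.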

    Sentiment analysis can be applied in many industries. Any company under the

    scrutiny of public opinion should be analyzing all the relevant data it can obtain. As Nick Bilton of the

    New York Times mentions, "When people want to know how the media business will deal with the

    internet, the best way to begin to understand the sweeping changes is to recognize that the consumer of

    entertainment and information is now in the center." Current applications take this into account and focus

    on the subjective views of users and consumers in the areas that enterprises are most interested

    in surveying. The most popular and basic use of sentiment analysis involves "mining text of written

    reviews from customers for certain products or services, and classifying the reviews into positive or

    negative opinions" (Ye et al., 2009). It is this type of classification that has become one of the foci of

    recent research endeavors sponsored by companies that realize the potential value of sentiment analysis

    on their data (Ye et al., 2009). Companies with a heavy online presence have a wealth of data to which

    this research could readily be applied.

    These same companies can choose to use text analytics software in different ways to meet

    different goals. Another graph from Alta Plana's Text Analytics 2009 research study shows the wide

    array of end goals that companies may be looking to meet when using text analytics software.

    The most common use shown above is branding and reputation management. Most

    applications of sentiment analysis in recent research follow a similar trend. The technologies

    surrounding text analytics will be sought by many industries, and for different applications in each

    industry. As a result, different algorithms, different techniques, and sometimes just small

    alterations will be required before sentiment analysis software built for one industry can be

    applied to another. This may also mean that the text analytics software industry can support

    lucrative consulting firms similar to those that are currently faring well in the general management

    information systems sector.

    The massive information systems that corporations already have could integrate aspects of

    existing processes with sentiment analysis. Newly refined systems could extend the capabilities of search

    engines, classify reviews, summarize reviews, track opinions in online discussions, analyze survey

    responses, implement online message sentiment filtering, create e-mail message classification systems,

    and support many more techniques yet to be discovered (Tang et al., 2009). This may result in more efficient

    communication for the public relations departments and better products created by the development

    teams. Companies will be able to navigate through all available data and find comparisons of specific

    product features from competitors. "For a product manufacturer, the comparison enables it to easily

    gather marketing intelligence and product benchmarking information" (Tang et al., 2009). Sentiment

    analysis will allow businesses to use their existing text data in ways that benefit several

    departments within the traditional business structure. Businesses require only new software plus the

    necessary hardware to handle the new processing techniques and storage of the results.

    Marketing companies and advertising branches of businesses are obvious beneficiaries of the

    conclusions derived from sentiment analysis. Major search engines and e-mail hosts such as

    Yahoo and Google, as well as social media companies such as Facebook, have been implementing

    contextually relevant advertising to users for years now. The current web landscape demands relevancy

    and personalized information for users and potential consumers. "The tradeoff between financial revenue

    and market share triggers the emergence of relevant advertising to emphasize the relevance between ads

    and Web pages for the sake of consumers" (Qiu et al., 2010). Qiu (2010) goes on to mention that

    "[t]argeted advertising is of great importance for internet companies to gain revenue from both

    advertisers and consumers. Previous approaches focus only on the topical relevance while the

    consumers' attitudes are ignored. These approaches fail to meet the actual needs of consumers

    especially when they may have negative attitudes towards the mentioned topics." Cai (2010) adds that a

    company's resistance to these new trends could have a serious impact on its competitive market

    advantages. Leveraging the massive amount of data that is produced by the consumer voice could

    catalyze the growth of a company. The corresponding danger is that ignoring the

    ever-increasing volume of public opinion could leave a company socially outcast. It is clearly to the

    benefit of companies to implement sentiment analysis if they have the relevant

    information available for such a process. The branding and marketing aspects of businesses revolve

    around consumer psychology. Sentiment analysis could reveal this psychology in a form that could

    be used for further analysis and study.

    It is important to notice that implementing such technology on the business side also

    benefits the consumer. Depending on the industry and the manner in which

    sentiment analysis is being applied, a system for presenting the results and organized conclusions from

    the analysis could be created. Ye (2009) describes the relationship: "With the results of sentiment

    classification, consumers would know the necessary information to determine which products to

    purchase and sellers would know the response from their customers and the performances of their

    competitors." It then turns into a cyclical system that should result in higher quality products over time. It

    is an efficient way to crowdsource useful data without the users putting forth any extra effort. The users

    could even be unaware that they are improving their future shopping experiences. The users and

    creators of the text to be analyzed will silently be benefiting two parties while expressing their natural

    opinions.

    Sentiment analysis is a useful tool for all users of the internet. Emotional classification and

    organization of content will be a beneficial contribution to the vast reservoir of data the internet holds.

    The field has made steady progress over the last twenty years but still has much room to grow and

    improve. This is an exciting pursuit for those involved. The companies and researchers supporting the

    improvement of sentiment analysis will be contributing to an improved environment for all users. Users

    should enjoy communicating with a machine that understands their emotional needs and can

    offer effective solutions to their problems. This is the basic result of sentiment analysis even in its

    current form. It will only evolve to meet our needs more effectively.

    List of references

    Bilton, N. (2010, September 13). A Tech World That Centers on the User. New York Times: New

    York Edition, p. B1.

    Cai, K., Spangler, S., Chen, Y., & Zhang, L. (2010). Leveraging sentiment analysis for topic detection.

    Web Intelligence and Agent Systems: An International Journal, 8(2010), 291-302.

    Grimes, S. (2009). Text Analytics 2009: User Perspectives on Solutions and Providers. Alta Plana.

    Published under the Creative Commons Attribution 3.0 License.

    Kho, N. D. (2010). Customer experience and sentiment analysis. KMWorld, February 2010, 10-20.

    Li, N., Liang, X., Li, X., Wang, C., & Wu, D. (2009). Network Environment and Financial Risk Using

    Machine Learning and Sentiment Analysis. Human and Ecological Risk Assessment, 15,

    227-252.

    Qiu, G., He, X., Zhang, F., Shi, Y., Bu, J., & Chen, C. (2010). DASA: Dissatisfaction-oriented

    Advertising based on Sentiment Analysis. Expert Systems with Applications, 37(2010),

    6182-6191.

    Tang, H., Tan, S., & Cheng, X. (2009). A survey on sentiment detection of reviews. Expert Systems

    with Applications, 36(2009), 10760-10773.

    Ye, Q., Zhang, Z., & Law, R. (2009). Sentiment classification of online reviews to travel destinations by

    supervised machine learning approaches. Expert Systems with Applications, 36(2009),

    6527-6535.