Sentiment Analysis on Amazon Movie Reviews Dataset

Post on 08-Feb-2017



SENTIMENT ANALYSIS: AMAZON MOVIE REVIEW DATASET

IS 688 – WEB MINING

INSTRUCTOR: CHRISTOPHER MARKSON

TEAM MEMBERS: Maham | Amit | Mashael | Karan | Nidhish

OUTLINE

• Data Source, Collection & Parsing

• Model Selection & Optimizing Parameters

• Methods / Code Sample

• Results Overview & Value

DATA SOURCE, COLLECTION & PARSING

Amazon movie reviews dataset, published by Jure Leskovec, Assistant Professor of Computer Science at Stanford University, on his personal site.

PROBLEMS

• Format was not R-friendly

• Only partial information was available; context data were missing

• We had reviews but no information about the movies

WORKAROUND / SOLUTION

• Wrote a parser in R to convert the JSON .txt file into CSV

• Developed a Node.js middleware to gather information about each movie
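The transcript does not include the parser itself, and the team wrote theirs in R. The Python sketch below shows the same JSON-to-CSV step. It assumes the input holds one JSON review object per line and writes one CSV row per review. The field names in FIELDS are hypothetical placeholders, not the dataset's confirmed keys.

```python
import csv
import io
import json

# Hypothetical field names; the real keys in the source file may differ.
FIELDS = ["productId", "userId", "score", "summary", "text"]

def json_lines_to_csv(json_lines, out):
    """Write one CSV row per JSON object, assuming one review object per line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Missing keys become empty cells instead of raising KeyError.
        writer.writerow({field: record.get(field, "") for field in FIELDS})

if __name__ == "__main__":
    sample = [
        '{"productId": "B0001", "userId": "U1", "score": 5.0, "summary": "Great", "text": "Loved it"}',
        '{"productId": "B0001", "userId": "U2", "score": 2.0, "summary": "Meh", "text": "Too long, slow"}',
    ]
    buf = io.StringIO()
    json_lines_to_csv(sample, buf)
    print(buf.getvalue())
```

The csv module handles quoting automatically, so review text containing commas (such as "Too long, slow") stays in a single column.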

PREPARED FILES

After parsing the reviews and gathering additional movie data through Amazon Web Services, we produced the following two files:

• Reviews

• Movie Details

MODEL SELECTION & OPTIMIZATION

• Basic sentiment score for each review, using the syuzhet package

• Offers several methods, including bing, afinn, nrc, and stanford; the AFINN lexicon assigns weights to 2,477 words and phrases

• Mainly uses the coreNLP and stringr libraries; tracks the emotional trajectory of each review

• Word cloud for each movie, using the wordcloud package

• Combined all reviews of a movie into one variable, calculated term frequencies, and generated word cloud images

• Also used tm (text mining), SnowballC (text stemming), and RColorBrewer (color palettes)

• Pointwise Mutual Information (PMI) sentiment score for each movie, using the RCurl package

• Wrote our own function

• Scored Movie_Title vs. Excellent/Poor and Movie_Genre vs. Excellent/Poor

• Final score was the ratio of the Movie_Title score to the Movie_Genre score


• Aggregated all the sentiment scores

• Took the median of all users' review scores

• Took the median of all users' review-text sentiment scores

• Assigned an overall sentiment score to each movie by taking the median of:

• the aggregated user review score,

• the aggregated user review-text sentiment score, and

• the Movie_Title vs. Genre PMI score

METHODS / CODE SAMPLE

Basic Sentiment Score
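The original R code for this step is not reproduced in the transcript. The Python sketch below shows the idea behind lexicon-based scoring such as the afinn method:

- Each known word carries a weight.
- A text's score is the sum of its words' weights.
- Scoring sentence by sentence gives an emotional trajectory.

The LEXICON weights are illustrative values on an AFINN-like -5 to +5 scale, not entries copied from AFINN. The regex tokenizer and sentence splitter are simplifying assumptions.

```python
import re

# Hypothetical word weights on an AFINN-like -5..+5 scale (not real AFINN values).
LEXICON = {
    "excellent": 4, "great": 3, "good": 2, "love": 3, "loved": 3,
    "boring": -3, "bad": -3, "poor": -2, "slow": -1, "awful": -4,
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score_text(text, lexicon=LEXICON):
    """Sum the weights of known words; unknown words contribute 0."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))

def sentence_trajectory(review, lexicon=LEXICON):
    """Score each sentence separately to trace the review's emotional arc."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]
    return [score_text(s, lexicon) for s in sentences]

if __name__ == "__main__":
    review = "The plot was slow. But the acting was great and I loved the ending!"
    print(sentence_trajectory(review))  # [-1, 6]
    print(score_text(review))           # 5
```

This sketch ignores negation ("not good" still scores +2) and multi-word phrases. Real lexicons and the syuzhet methods may handle these differently.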

WordCloud
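The team prepared word clouds with R's tm, SnowballC, and wordcloud packages. The Python sketch below covers the counting step: it combines one movie's reviews into a single string and counts term frequencies after dropping stop words. The short STOPWORDS set is a hypothetical stand-in for a full stop-word list, and the stemming step is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (a real one is much longer).
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "of", "to", "i", "this", "but"}

def term_frequencies(reviews, stopwords=STOPWORDS):
    """Combine all reviews of one movie and count how often each term appears."""
    combined = " ".join(reviews).lower()      # all review text in one variable
    tokens = re.findall(r"[a-z]+", combined)  # letters only; drops punctuation and digits
    return Counter(tok for tok in tokens if tok not in stopwords)

if __name__ == "__main__":
    reviews = ["Great acting, great story.", "The story was slow but the ending was great."]
    freqs = term_frequencies(reviews)
    print(freqs.most_common(2))  # [('great', 3), ('story', 2)]
```

A word-cloud renderer sizes words by these counts. In Python, for example, the wordcloud package's `WordCloud.generate_from_frequencies` accepts such a mapping.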


Aggregation
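The Python sketch below shows the aggregation described under Model Selection & Optimization, using `statistics.median`. The function and parameter names are hypothetical. It assumes each movie's review scores, review-text sentiment scores, and PMI score have already been computed.

```python
from statistics import median

def movie_sentiment(review_scores, text_scores, pmi_score):
    """Overall movie score: median of three per-movie components.

    review_scores: users' review scores for the movie
    text_scores:   sentiment scores of each review's text
    pmi_score:     the movie's Movie_Title vs. Genre PMI score
    """
    review_aggr = median(review_scores)  # aggregated user review score
    text_aggr = median(text_scores)      # aggregated review-text sentiment score
    return median([review_aggr, text_aggr, pmi_score])

if __name__ == "__main__":
    print(movie_sentiment([5, 4, 2, 5], [3, -1, 6], 1.2))  # 3
```

The slides do not say whether the three components were rescaled to a common range before taking the median. This sketch uses them as-is, so the result depends on their differing scales.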

PMI
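The slides do not give the exact PMI formula. The Python sketch below assumes Turney-style PMI-IR semantic orientation, with "excellent" and "poor" as reference words, which matches the Excellent/Poor pairs described above. The team fetched web data with RCurl, but the search service and query syntax are not stated. The functions therefore take hit counts as plain numbers, and the names SMOOTH, semantic_orientation, and title_genre_ratio are hypothetical.

```python
import math

# Small constant added to every hit count so that zero counts do not
# cause division by zero or log2(0).
SMOOTH = 0.01

def semantic_orientation(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor):
    """PMI-IR style orientation of a query phrase (a movie title or a genre).

    SO = log2( hits(phrase NEAR "excellent") * hits("poor")
               / (hits(phrase NEAR "poor") * hits("excellent")) )
    Positive values lean toward "excellent", negative toward "poor".
    """
    numerator = (hits_near_excellent + SMOOTH) * (hits_poor + SMOOTH)
    denominator = (hits_near_poor + SMOOTH) * (hits_excellent + SMOOTH)
    return math.log2(numerator / denominator)

def title_genre_ratio(title_so, genre_so):
    """Final score as the slide describes it: Movie_Title score / Movie_Genre score."""
    if genre_so == 0:
        raise ValueError("genre orientation is 0, so the ratio is undefined")
    return title_so / genre_so

if __name__ == "__main__":
    # Hypothetical hit counts, for illustration only.
    title_so = semantic_orientation(800, 200, 10_000, 10_000)
    genre_so = semantic_orientation(600, 300, 10_000, 10_000)
    print(round(title_so, 2), round(genre_so, 2))
    print(round(title_genre_ratio(title_so, genre_so), 2))
```

A plain ratio is ill-defined when the genre orientation is zero and changes sign when exactly one orientation is negative. The guard above only catches the zero case.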

RESULT OVERVIEW & VALUE


The Count of Monte Cristo [Region 2]

Far from Home

Phonics Volume 1


• Alongside aggregate user reviews, Amazon could present:

• an overall rating score, and

• a word cloud specific to that product

• This would save users the time of reading through every review and let them quickly grasp the overall user sentiment toward that product.

THANK YOU