Sentiment Analysis on Amazon Movie Reviews Dataset


Transcript of Sentiment Analysis on Amazon Movie Reviews Dataset

Page 1: Sentiment Analysis on Amazon Movie Reviews Dataset

SENTIMENT ANALYSIS

AMAZON MOVIE REVIEW DATASET

IS 688 – WEB MINING

INSTRUCTOR: CHRISTOPHER MARKSON

TEAM MEMBERS: Maham | Amit | Mashael | Karan | Nidhish

Page 2: Sentiment Analysis on Amazon Movie Reviews Dataset

OUTLINE

• Data Source, Collection & Parsing

• Model Selection & Optimizing Parameters

• Methods / Code Sample

• Results Overview & Value

Page 3: Sentiment Analysis on Amazon Movie Reviews Dataset

DATA SOURCE, COLLECTION & PARSING

Amazon movie reviews, published by Jure Leskovec, Assistant Professor of Computer Science at Stanford University, on his personal site.

Page 4: Sentiment Analysis on Amazon Movie Reviews Dataset

PROBLEMS

• Format was not R-friendly

• Only partial information was available; the data context was missing

• We had reviews but no information about the movie

Page 5: Sentiment Analysis on Amazon Movie Reviews Dataset

WORKAROUND / SOLUTION

• Wrote a parser in R to convert the JSON txt file into CSV

• Developed a Node.js middleware to gather information about each movie
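
The team's parser was written in R and is not reproduced in this transcript. As a rough illustration of the conversion step only, the Python sketch below assumes the text file holds one JSON object per line. The function name and the field names (`productId`, `score`, `text`) are hypothetical, not taken from the dataset.

```python
import csv
import io
import json


def json_lines_to_csv(lines, fields):
    """Convert an iterable of JSON strings (one object per line) to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Keep only the requested fields; missing ones become empty cells.
        writer.writerow({f: record.get(f, "") for f in fields})
    return out.getvalue()


# Made-up sample records, for illustration only.
sample = [
    '{"productId": "B001", "score": 5.0, "text": "Excellent film"}',
    "",
    '{"productId": "B002", "score": 2.0, "text": "Poor, slow plot"}',
]
csv_text = json_lines_to_csv(sample, ["productId", "score", "text"])
print(csv_text)
```

The `csv` module quotes any field that contains a comma, so review text survives the round trip into a spreadsheet or into R's `read.csv`.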

Page 6: Sentiment Analysis on Amazon Movie Reviews Dataset

PREPARED FILES

After parsing and gathering more data using Amazon Web Service, we had the following two files:

• Reviews

• Movie Details

Page 7: Sentiment Analysis on Amazon Movie Reviews Dataset

MODEL SELECTION & OPTIMIZATION

• Basic Sentiment Score for Each Review, using Syuzhet package

• Provides 4 methods: bing, afinn, nrc, and stanford; the AFINN lexicon has 2,477 weighted words and phrases

• Mainly uses the coreNLP and stringr libraries; can also trace the emotional trajectory of a review

• Create WordCloud for Each Movie, using wordcloud package

• Combined all reviews into one variable, calculated term frequency & generated WordCloud images

• Used tm (text mining), SnowballC (text stemming), RColorBrewer (color palettes) alongside

• Pointwise Mutual Information (PMI) Sentiment Score for Each Movie, using RCurl package

• Wrote our own function

• Movie_Title vs Excellent/Poor, Movie_Genre vs Excellent/Poor

• Final score was the ratio of the Movie_Title score to the Movie_Genre score

Page 8: Sentiment Analysis on Amazon Movie Reviews Dataset

MODEL SELECTION & OPTIMIZATION

• Aggregated all the Sentiment Scores

• Took the median of all users' review scores

• Took the median of all users' review text sentiment scores

• Assigned an overall Sentiment Score to each movie

• Took the median of

• User Review Score Aggr,

• User Review Text Sentiment Score Aggr,

• Movie_Title vs Genre PMI Score

Page 9: Sentiment Analysis on Amazon Movie Reviews Dataset

METHODS / CODE SAMPLE

Basic Sentiment Score
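
The code shown on this slide is not preserved in the transcript. As a minimal sketch of how a weighted-lexicon score of the AFINN kind works, the Python below sums per-word weights over a review. The lexicon and its weights are a made-up toy, not the real AFINN list, and this is not the Syuzhet implementation.

```python
import re

# Toy lexicon with invented weights, for illustration only (NOT the real AFINN list).
TOY_LEXICON = {
    "excellent": 3,
    "good": 2,
    "love": 3,
    "poor": -2,
    "boring": -2,
    "terrible": -3,
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon weights of the words in a review; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


reviews = [
    "Excellent cast and a good story.",
    "Boring plot, terrible ending.",
    "It arrived on time.",
]
scores = [sentiment_score(review) for review in reviews]
print(scores)
```

A positive total suggests a favourable review, a negative total an unfavourable one, and a review with no lexicon words scores zero.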

WordCloud
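
The word cloud code from this slide is also missing from the transcript. The step it rests on, combining a movie's reviews and counting term frequency, can be sketched as follows. This is a Python illustration with a tiny stand-in stop-word list and no stemming; the team used the R packages tm, SnowballC, and wordcloud, and rendering the image is left out here.

```python
import re
from collections import Counter

# Tiny stand-in stop-word list; a real run would use a full list.
STOP_WORDS = {"the", "a", "and", "of", "is", "it", "this", "was", "to"}


def term_frequencies(reviews, stop_words=STOP_WORDS):
    """Combine all reviews for one movie and count how often each term occurs."""
    combined = " ".join(reviews).lower()
    tokens = re.findall(r"[a-z']+", combined)
    return Counter(token for token in tokens if token not in stop_words)


# Made-up reviews for a single movie.
movie_reviews = [
    "The story is great and the cast is great.",
    "Great story, weak ending.",
]
freq = term_frequencies(movie_reviews)
print(freq.most_common(2))
```

The resulting counts are what a word cloud library would map to font size, with the most frequent terms drawn largest.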

Page 10: Sentiment Analysis on Amazon Movie Reviews Dataset

METHODS / CODE SAMPLE

Aggregation
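
The aggregation code on this slide did not survive in the transcript. Following the description on the earlier slide, a minimal Python sketch is: take the median of the users' review scores, the median of the review text sentiment scores, and then the median of those two values together with the PMI score. The function name and sample numbers are hypothetical, and the sketch uses the values as given because the slides do not say whether the three scores were rescaled to a common range first.

```python
from statistics import median


def overall_score(review_scores, text_sentiment_scores, pmi_score):
    """Median of: median review score, median text sentiment score, PMI score."""
    review_aggr = median(review_scores)
    text_aggr = median(text_sentiment_scores)
    return median([review_aggr, text_aggr, pmi_score])


# Made-up numbers for one movie.
result = overall_score(
    review_scores=[5, 4, 2, 5],
    text_sentiment_scores=[3, 1, -1],
    pmi_score=2.0,
)
print(result)
```

The median keeps a handful of extreme reviews from dominating a movie's overall score, which a mean would not.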

PMI
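
The team's own PMI function is not preserved in the transcript. As a sketch under stated assumptions, the Python below uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))), with probabilities estimated from hit counts. How the "Excellent" and "Poor" scores were combined is not stated on the slides; the sketch assumes the difference PMI(term, excellent) minus PMI(term, poor). All counts and function names are made up for illustration.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information in bits, estimated from co-occurrence counts."""
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def polarity(with_excellent, with_poor, term, excellent, poor, total):
    """Assumed combination: PMI(term, excellent) - PMI(term, poor)."""
    return pmi(with_excellent, term, excellent, total) - pmi(with_poor, term, poor, total)


# Made-up hit counts.
TOTAL, EXCELLENT, POOR = 1_000_000, 10_000, 10_000
title_polarity = polarity(40, 10, 1_000, EXCELLENT, POOR, TOTAL)
genre_polarity = polarity(100, 50, 5_000, EXCELLENT, POOR, TOTAL)

# Final score as described on the slides: title score relative to genre score.
final_score = title_polarity / genre_polarity
print(title_polarity, genre_polarity, final_score)
```

Dividing by the genre score reads the title's polarity relative to what is typical for its genre; a real implementation would need to guard against a genre score of zero.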

Page 11: Sentiment Analysis on Amazon Movie Reviews Dataset

RESULT OVERVIEW & VALUE

Page 12: Sentiment Analysis on Amazon Movie Reviews Dataset

RESULT OVERVIEW & VALUE

The Count of Monte Cristo [Region 2]

Far from Home

Phonics Volume 1

Page 13: Sentiment Analysis on Amazon Movie Reviews Dataset

RESULT OVERVIEW & VALUE

• Alongside the aggregate user reviews, Amazon can present

• overall rating score, and

• Word Cloud local to that product

• This saves users the time of reading through all the reviews and lets them easily picture the overall user sentiment about that product.

Page 14: Sentiment Analysis on Amazon Movie Reviews Dataset

THANK YOU