Prediction of box office revenue of movies using hype analysis of Twitter data
PREDICTION OF BOX OFFICE SUCCESS OF MOVIES USING HYPE ANALYSIS OF TWITTER DATA (PREDICTING THE FUTURE)
By
SAMEER THIGALE, TUSHAR PRASAD
MIT COLLEGE OF ENGINEERING, PUNE
Internal Guide:
PROF. REENA PAGARE
Sponsored Organization:
PERSISTENT SYSTEMS LIMITED
A BRIEF OUTLINE
• Presence of “rich insights” in social networks
• The Hypothesis:
“A Movie Well Talked About is Well Watched”
• Pre-release buzz: a success factor
LITERATURE SURVEY
REFERENCE DESCRIPTION
[1] Forecasting: Methods and Applications, by Spyros M., Steven W., Rob H., 3rd Edition, Wiley Publication (book)
– Basic concepts of statistics like correlation
– Study of forecasting models
– Linear regression
– Time series regression
[2] Predicting the Future with Social Media, by S. Asur, B. Huberman, HP Labs, HP Journal, Jan 2012
– The various factors that could be considered for calculating the success rate might be attention seeking, distribution, polarity, type of film etc.
– Prediction can be made using linear regression.
EXISTING MODELS
• HOLLYWOOD STOCK EXCHANGE (HSX.COM)
– Uses Virtual Stocks to predict revenue
– Accuracy 90%, confidence: medium
• INTERNET MOVIE DB (IMDB.COM)
– Uses clicks, reviews, blogs, star casts to predict
• BoxOfficeMojo.com
– Uses clicks, reviews, blogs, star casts to predict
But none of the leading movie database sites uses social media to make predictions. Why?
PROBLEM DEFINITION
• To demonstrate that the amount of attention a subject receives correlates strongly with its future ranking.
• To show that a simple regression model built from the Twitter chatter can outperform market-based predictions.
• To demonstrate how the model built can also be extended to products of consumer interest.
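The first objective is a claim about correlation. A minimal sketch of how such a correlation could be measured, assuming Pearson's coefficient is the measure (the slides do not say which one was used). The function name `pearson` and the sample values are hypothetical and are not taken from the project's dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented illustration values: pre-release tweets per hour vs. revenue (million USD).
tweet_rate = [120.0, 340.0, 560.0, 900.0, 1500.0]
revenue = [2.1, 6.0, 9.8, 17.5, 30.2]
r = pearson(tweet_rate, revenue)
print(f"r = {r:.3f}")
```

A value of r close to 1 would support the hypothesis; a value near 0 would not.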
Technical Keywords: Statistical prediction, Social network analysis, Regression
THE DATASET
• 100,000+ unique users
• Dataset of 6 weeks
• 4 million tweets
MOVIE NAME
Jupiter Ascending
Shamitabh
SpongeBob: Sponge out of water
LoveSick
Fifty Shades of Grey
Birdman
American Sniper
Foxcatcher
Hot Tub Time Machine 2
Chappie Movie
Badlapur
MODEL EMPLOYED
• MULTIPLE LINEAR REGRESSION
– BASED ON FINDING “A STRAIGHT LINE PREDICTING Y(INCOME)”
A: AVG COUNT OF TWEETS PER HOUR
P: CALCULATED USING SENTIMENT ANALYSIS; RANGE: 0 TO 4 (0: VERY NEGATIVE, 4: VERY POSITIVE)
D: NUMBER OF THEATRES MOVIE IS RELEASED IN
C: CATEGORY OF MOVIE: ACTION, THRILLER, COMEDY, ANIMATION, ROMANCE
E: STAR CAST, DIVIDED INTO 3 CATEGORIES DEPENDING ON TWITTER FOLLOWER COUNT
S: SEQUEL; RANGE: 0 IF NOT SEQUEL, 1 IF SEQUEL
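A multiple linear regression over such parameters can be sketched as an ordinary least squares fit. The sketch below is an illustration under stated assumptions, not the project's implementation: it uses only three numeric features (A, P, D), the helper names `solve`, `fit` and `predict` are hypothetical, and the data is synthetic, generated from made-up coefficients so that the fit can be checked. The categorical parameters C and E would first need a numeric encoding, which is omitted here.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, b1, b2, ...] for the feature columns in `rows`.
    """
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """Evaluate the fitted line: intercept + sum(b_i * x_i)."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Synthetic rows of (A, P, D): tweet rate in hundreds per hour, sentiment score
# on the 0 to 4 scale, theatre count in thousands. Not the project's data.
rows = [
    (1.2, 2.1, 0.5),
    (3.4, 2.8, 1.8),
    (5.6, 1.9, 2.5),
    (9.0, 3.2, 3.1),
    (15.0, 2.5, 3.6),
    (7.3, 3.6, 2.0),
]
# Targets built from made-up coefficients, so the fit should recover them.
targets = [0.5 + 1.0 * a + 2.0 * p + 4.0 * d for a, p, d in rows]

coeffs = fit(rows, targets)
print([round(c, 6) for c in coeffs])
```

With real data the targets would be observed revenues, and the fitted coefficients would then be used by `predict` for an unreleased movie.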
CONTRIBUTION
• In our model we use multiple linear regression for forecasting, which yields a better and more accurate outcome than using complicated neural networks, pattern recognition and other AI concepts.
• The model is robust and can be extended to other consumer products by just changing the regression parameters.
DEMO
SYSTEM ARCHITECTURE
PLATFORM AND TECHNOLOGY
• OPERATING SYSTEM AND ARCHITECTURE INDEPENDENT
– TESTED ON WINDOWS XP+, UBUNTU 12.04 LTS+
– BOTH 32-BIT AND 64-BIT ARCHITECTURE
• SOFTWARE REQUIREMENTS (MINIMUM):
– JDK 8
– MYSQL 5+
SALIENT FEATURES
• Client-server architecture
• Accurate prediction
• Displays
– Sentiment of tweets
– tag cloud of tweets
– Location of tweet
– Rate of tweets per hour
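The rate of tweets per hour can be derived from tweet timestamps alone. A minimal sketch, assuming each tweet carries a timestamp and that the rate is averaged over every clock hour from the first to the last observed tweet; the function names and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Count tweets in each clock hour; keys are the start of the hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def average_rate(timestamps):
    """Average tweets per hour, counting empty hours inside the observed span."""
    counts = tweets_per_hour(timestamps)
    if not counts:
        return 0.0
    first, last = min(counts), max(counts)
    hours = int((last - first).total_seconds() // 3600) + 1
    return sum(counts.values()) / hours


# Invented timestamps for illustration only.
sample = [
    datetime(2015, 2, 13, 10, 5),
    datetime(2015, 2, 13, 10, 40),
    datetime(2015, 2, 13, 11, 15),
    datetime(2015, 2, 13, 13, 0),
    datetime(2015, 2, 13, 13, 30),
    datetime(2015, 2, 13, 13, 59),
]
print(average_rate(sample))
```

Counting the empty hour (12:00 in the sample) keeps quiet periods from inflating the average.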
PROUDLY BUILT ON THE OPEN SOURCE MODEL. ALL OPEN-SOURCE TOOLS USED. SOFTWARE LICENSED UNDER GNU GPL.
RESULTS
Features                          R2
Avg tweet rate                    0.02
Avg tweet rate + theatre count    0.91
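R2 here is the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of that calculation follows; the function name `r_squared` and the sample values are hypothetical and unrelated to the figures above.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Invented values: a perfect fit scores 1, predicting the mean scores 0.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

A model that does no better than predicting the mean scores 0, which is why adding the theatre count moves the figure so sharply.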
Movie Name                 Release Date   What we predicted (in USD)   What actually happened!
Fifty Shades of Grey       13-Feb-2015    80,214,910                   85,043,000
Shamitabh                  06-Feb-2015    243,661                      241,720
Kingsman: Secret Service   13-Feb-2015    34,345,613                   36,225,000
Hot Tub Time Machine 2     20-Feb-2015    30,255,168                   ???? (IMDB SAYS 25M)
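The gap between prediction and outcome can be expressed as a signed percentage error. The sketch below uses the three rows of the table that have a confirmed actual figure; Hot Tub Time Machine 2 is left out because its actual revenue is not confirmed there. The function name `percent_error` is hypothetical.

```python
# (predicted, actual) in USD, copied from the results table.
results = {
    "Fifty Shades of Grey": (80_214_910, 85_043_000),
    "Shamitabh": (243_661, 241_720),
    "Kingsman: Secret Service": (34_345_613, 36_225_000),
}


def percent_error(predicted, actual):
    """Signed error as a percentage of the actual value."""
    return (predicted - actual) / actual * 100.0


for title, (predicted, actual) in results.items():
    print(f"{title}: {percent_error(predicted, actual):+.2f}%")
```

A negative value means the model under-predicted. On these three rows the error stays within roughly 6% of the actual revenue.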
APPLICATIONS
• Forecasting products of consumer interest given the chatter
– Movies
– Elections
– ICC World Cup
– Epidemiology (Google Flu trends)
• For theatre owners to predict the number of shows to be scheduled
– Similarly for retailers of the respective products
LIMITATIONS
• Data cleaning limitations
– Presence of reference to two or more movies
– Presence of sarcastic tweets
– Emoticons
• CONSTRAINTS:
– Due to Twitter API limitations only 1% of tweets can be caught (can be improved by Firehose access)
– Only tweets in English language accepted
Such a wonderful movie #Humshakal is!
I <3 d mve #Shamitabh
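One way to handle tweets that reference two or more movies is to keep only tweets that mention exactly one tracked movie. This is a sketch of that idea, not the project's cleaning step: the `TRACKED` hashtag list, the function names and the second and third sample tweets are hypothetical.

```python
import re

# Hypothetical mapping of tracked hashtags to movie titles; the project's
# real keyword list is not given in the slides.
TRACKED = {
    "#shamitabh": "Shamitabh",
    "#birdman": "Birdman",
    "#chappie": "Chappie",
}
HASHTAG = re.compile(r"#\w+")


def movies_mentioned(text):
    """Return the set of tracked movies whose hashtag appears in the tweet."""
    tags = {tag.lower() for tag in HASHTAG.findall(text)}
    return {TRACKED[tag] for tag in tags if tag in TRACKED}


def unambiguous(tweets):
    """Keep (tweet, movie) pairs for tweets that mention exactly one movie."""
    kept = []
    for text in tweets:
        movies = movies_mentioned(text)
        if len(movies) == 1:
            kept.append((text, next(iter(movies))))
    return kept


sample = [
    "I <3 d mve #Shamitabh",
    "Watched #Birdman and #Chappie back to back",
    "No movie mentioned here",
]
print(unambiguous(sample))
```

Sarcasm and emoticons are not addressed by this filter; they affect the sentiment score rather than the attribution of a tweet to a movie.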
FUTURE SCOPE
• Estimating from “negative hype”
– E.g. revenue of #PK increased due to the #PKDebate
• Correlating success of songs to success of movie
– Famous example of the song “Tum Hi Ho”
• Correlating “structure” of retweets and “favorited” tweets
THANK YOU!