Prediction of box office revenue of movies using hype analysis of Twitter data


PREDICTION OF BOX OFFICE SUCCESS OF MOVIES USING HYPE ANALYSIS OF TWITTER DATA (PREDICTING THE FUTURE)

By

SAMEER THIGALE, TUSHAR PRASAD

MIT COLLEGE OF ENGINEERING, PUNE

Internal Guide:

PROF. REENA PAGARE

Sponsored Organization:

PERSISTENT SYSTEMS LIMITED


A BRIEF OUTLINE

• Presence of “rich insights” in social networks

• The Hypothesis: “A Movie Well Talked About is Well Watched”

• Pre-release buzz: a success factor


LITERATURE SURVEY


REFERENCE / DESCRIPTION

[1] Forecasting: Methods and Applications, by Spyros M., Steven W. and Rob H., 3rd edition, Wiley (book)

Basic statistical concepts such as correlation; study of forecasting models; linear regression; time series regression.

[2] Predicting the Future with Social Media, by S. Asur and B. Huberman, HP Labs, HP Journal, Jan 2012

Factors that could be considered when calculating the success rate include attention, distribution, polarity and type of film. Prediction can be made using linear regression.


EXISTING MODELS

• HOLLYWOOD STOCK EXCHANGE (HSX.COM)

– Uses virtual stocks to predict revenue

– Accuracy 90%, confidence: medium

• INTERNET MOVIE DB (IMDB.COM)

– Uses clicks, reviews, blogs, star casts to predict

• BoxOfficeMojo.com

– Uses clicks, reviews, blogs, star casts to predict


But none of the leading movie database sites use social media to make predictions. Why?


PROBLEM DEFINITION

• To demonstrate that the amount of attention a subject receives is strongly correlated with its future ranking.

• To show that a simple regression model built from Twitter chatter can outperform market-based predictions.

• To demonstrate how the model can also be extended to other products of consumer interest.
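The first objective is a correlation claim. A standard way to measure it is the Pearson correlation coefficient between a per-movie attention measure (for example, average tweets per hour) and the per-movie revenue or ranking. The sketch below is our own stdlib-only illustration; the function name `pearson` is hypothetical and not taken from the project.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels out.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

A value close to +1 would support the hypothesis; with only a handful of movies, however, the coefficient is very sensitive to individual data points.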


Technical keywords: statistical prediction, social network analysis, regression


THE DATASET

• 100,000+ unique users

• Dataset spanning 6 weeks: 4 million tweets


MOVIE NAME

Jupiter Ascending

Shamitabh

SpongeBob: Sponge Out of Water

LoveSick

Fifty Shades of Grey

Birdman

American Sniper

Foxcatcher

Hot Tub Time Machine 2

Chappie Movie

Badlapur


MODEL EMPLOYED

• MULTIPLE LINEAR REGRESSION

– BASED ON FINDING “A STRAIGHT LINE PREDICTING Y (INCOME)” (WITH SEVERAL FEATURES, THE “LINE” BECOMES A HYPERPLANE)
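For reference, the textbook form of multiple linear regression with k features, and its ordinary least squares estimate, is given below. This is standard notation, not copied from the slides: y_i is the income of movie i, x_ij are its features, x_i = (1, x_i1, ..., x_ik)^T, X is the n × (k+1) matrix whose rows are the x_i^T (so its first column is all ones), and X^T X is assumed invertible.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n

\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^{\top} \boldsymbol{\beta} \right)^2
  = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```

With a single feature this reduces to the straight line y = β0 + β1 x; with k features the fitted surface is a hyperplane in k dimensions.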


MODEL EMPLOYED

A: AVERAGE COUNT OF TWEETS PER HOUR

P: CALCULATED USING SENTIMENT ANALYSIS; RANGE 0 TO 4 (0: VERY NEGATIVE, 4: VERY POSITIVE)

D: NUMBER OF THEATRES THE MOVIE IS RELEASED IN

C: CATEGORY OF MOVIE: ACTION, THRILLER, COMEDY, ANIMATION, ROMANCE

E: STAR CAST, DIVIDED INTO 3 CATEGORIES DEPENDING ON TWITTER FOLLOWER COUNT

S: SEQUEL; 0 IF NOT A SEQUEL, 1 IF A SEQUEL
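The variables above have to be turned into a numeric row before they can enter the regression. The sketch below shows one way to do that, under several assumptions of ours that the slides do not specify: A is the tweet count divided by the hours observed, P is the mean of per-tweet sentiment scores on the 0 to 4 scale, C and E are one-hot encoded with ACTION and star-cast tier 0 as dropped baseline levels (so the dummies are not collinear with the intercept), and the Twitter-follower thresholds that assign E are applied upstream. `MovieObservation` and `feature_row` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence

# C: the five categories from the slide; ACTION is the dropped baseline level.
CATEGORIES = ["ACTION", "THRILLER", "COMEDY", "ANIMATION", "ROMANCE"]


@dataclass
class MovieObservation:
    tweet_sentiments: Sequence[int]  # one score per tweet, 0 (very negative) to 4 (very positive)
    hours_observed: float            # length of the tracking window, in hours
    theatres: int                    # D: number of theatres
    category: str                    # C: one of CATEGORIES
    star_cast_tier: int              # E: 0, 1 or 2 (follower-based tiering assumed done upstream)
    is_sequel: bool                  # S


def feature_row(m: MovieObservation) -> List[float]:
    """Return [A, P, D, four C dummies, two E dummies, S] for one movie."""
    if m.hours_observed <= 0 or not m.tweet_sentiments:
        raise ValueError("need a positive window and at least one tweet")
    if m.category not in CATEGORIES or m.star_cast_tier not in (0, 1, 2):
        raise ValueError("unknown category or star-cast tier")
    a = len(m.tweet_sentiments) / m.hours_observed         # A: average tweets per hour
    p = sum(m.tweet_sentiments) / len(m.tweet_sentiments)  # P: mean sentiment score
    c = [1.0 if m.category == cat else 0.0 for cat in CATEGORIES[1:]]
    e = [1.0 if m.star_cast_tier == tier else 0.0 for tier in (1, 2)]
    return [a, p, float(m.theatres), *c, *e, 1.0 if m.is_sequel else 0.0]


if __name__ == "__main__":
    obs = MovieObservation([4, 3, 2, 3], 2.0, 3000, "COMEDY", 2, True)
    print(feature_row(obs))  # [2.0, 3.0, 3000.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Whether E should be one-hot or treated as an ordinal 0/1/2 value is a modelling choice; one-hot makes no assumption that the tiers are evenly spaced.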


CONTRIBUTION

• Our model uses multiple linear regression for forecasting, which we found gives accurate results while being much simpler than neural networks, pattern recognition and other AI techniques.

• The model is robust and can be extended to other consumer products simply by changing the regression parameters.
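Because the regression parameters are just a fitted coefficient vector, changing them for a new product amounts to refitting on a new feature matrix. A minimal stdlib-only sketch of that fit, solving the normal equations by Gaussian elimination, is shown below; it is our illustration, not the project's own implementation, and `fit_ols` and `predict` are hypothetical names.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear or there are too few rows")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimising squared error via (X^T X) b = X^T y."""
    if not rows or len(rows) != len(y):
        raise ValueError("need one target value per non-empty feature row")
    X = [[1.0, *map(float, r)] for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return _solve(xtx, xty)


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Apply fitted coefficients to one feature row (without the leading 1)."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


if __name__ == "__main__":
    rows = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
    y = [1 + 2 * a + 3 * b for a, b in rows]  # exact linear data: intercept 1, slopes 2 and 3
    print(fit_ols(rows, y))  # approximately [1.0, 2.0, 3.0]
```

Solving the normal equations is the simplest approach to write out, but it is less numerically stable than QR or SVD; with real revenue figures, a library routine such as NumPy's `numpy.linalg.lstsq` is the safer choice.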


DEMO


SYSTEM ARCHITECTURE


PLATFORM AND TECHNOLOGY

• OPERATING SYSTEM AND ARCHITECTURE INDEPENDENT

– TESTED ON WINDOWS XP+, UBUNTU 12.04 LTS+

– BOTH 32-BIT AND 64-BIT ARCHITECTURE

• SOFTWARE REQUIREMENTS (MINIMUM):

– JDK 8

– MYSQL 5+


SALIENT FEATURES

• Client-server architecture

• Accurate prediction

• Displays

– Sentiment of tweets

– Tag cloud of tweets

– Location of tweets

– Rate of tweets per hour
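A tag cloud sizes each word by how often it occurs in the collected tweets. A minimal sketch of the counting step is below; the tokenisation rule, the link stripping and the tiny stop-word list are our assumptions, and `tag_cloud_counts` is a hypothetical name. Rendering the cloud itself is left to the front end.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative stop-word list only; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "to", "of", "and", "i", "in", "for", "rt"}
TOKEN = re.compile(r"#?\w+")  # words, optionally prefixed by '#' so hashtags stay intact
LINK = re.compile(r"https?://\S+")


def tag_cloud_counts(tweets: Iterable[str], top_n: int = 20) -> List[Tuple[str, int]]:
    """Return the top_n (token, count) pairs across all tweets, most frequent first."""
    counts: Counter = Counter()
    for text in tweets:
        text = LINK.sub(" ", text.lower())  # drop links before tokenising
        for tok in TOKEN.findall(text):
            if tok not in STOP_WORDS and len(tok) > 1:
                counts[tok] += 1
    return counts.most_common(top_n)
```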

PROUDLY BUILT ON THE OPEN-SOURCE MODEL USING ONLY OPEN-SOURCE TOOLS. SOFTWARE LICENSED UNDER THE GNU GPL.


RESULTS

Features | R²
Avg tweet rate | 0.02
Avg tweet rate + theatre count | 0.91
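R² measures the share of the variance in revenue that the fitted model explains: R² = 1 - SS_res / SS_tot. The sketch below computes it from actual and fitted values, together with the adjusted R², which penalises extra features and is worth reporting when the number of movies is small relative to the number of features. Both function names are hypothetical.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(fitted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)               # total variation around the mean
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # variation left unexplained
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p features (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than features + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

On the movies used for fitting, R² can only stay the same or rise when a feature is added, which is why the adjusted version is useful for comparing feature sets such as the two rows of the table.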


Movie name | Release date | What we predicted (USD) | What actually happened (USD)
Fifty Shades of Grey | 13-Feb-2015 | 80,214,910 | 85,043,000
Shamitabh | 06-Feb-2015 | 243,661 | 241,720
Kingsman: The Secret Service | 13-Feb-2015 | 34,345,613 | 36,225,000
Hot Tub Time Machine 2 | 20-Feb-2015 | 30,255,168 | Not yet known (IMDb says 25M)
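The gap between prediction and outcome can be summarised as a signed percentage error relative to the actual figure. The sketch below computes it for the three movies whose outcome is known, using the figures from the results table; Hot Tub Time Machine 2 is left out because its actual value is not available. `percent_error` is a hypothetical helper.

```python
# (predicted, actual) in USD, copied from the results table.
RESULTS = {
    "Fifty Shades of Grey": (80_214_910, 85_043_000),
    "Shamitabh": (243_661, 241_720),
    "Kingsman: The Secret Service": (34_345_613, 36_225_000),
}


def percent_error(predicted: float, actual: float) -> float:
    """Signed error relative to the actual value, in percent (negative = under-prediction)."""
    return 100.0 * (predicted - actual) / actual


if __name__ == "__main__":
    errors = [percent_error(p, a) for p, a in RESULTS.values()]
    for movie, err in zip(RESULTS, errors):
        print(f"{movie}: {err:+.2f}%")
    # Fifty Shades of Grey: -5.68%, Shamitabh: +0.80%, Kingsman: The Secret Service: -5.19%
    mape = sum(abs(e) for e in errors) / len(errors)
    print(f"Mean absolute percentage error: {mape:.2f}%")  # 3.89%
```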


APPLICATIONS

• Forecasting products of consumer interest given the chatter around them

– Movies

– Elections

– ICC World Cup

– Epidemiology (Google Flu Trends)

• For theatre owners, to predict the number of shows to schedule

– Similarly for retailers of the respective products


LIMITATIONS

• Data cleaning limitations:

– Tweets that refer to two or more movies

– Sarcastic tweets

– Emoticons

• CONSTRAINTS:

– Due to Twitter API limitations, only 1% of tweets can be captured (this can be improved with Firehose access)

– Only English-language tweets are accepted


Such a wonderful movie #Humshakal is!

I <3 d mve #Shamitabh
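The first data-cleaning limitation, tweets that mention two or more movies, can be handled crudely by keeping only tweets whose hashtags match exactly one tracked movie. The sketch below does that; it assumes movies are identified by hashtag only (plain-text titles are not matched), the tracked hashtag set is hypothetical, and it does nothing about sarcasm or emoticons. `single_movie` is a hypothetical name.

```python
import re
from typing import Optional, Set

HASHTAG = re.compile(r"#(\w+)")  # captures the tag text without the '#'


def single_movie(text: str, tracked: Set[str]) -> Optional[str]:
    """Return the one tracked hashtag (lower-case, no '#') a tweet mentions, else None."""
    found = {tag.lower() for tag in HASHTAG.findall(text)} & tracked
    return next(iter(found)) if len(found) == 1 else None


if __name__ == "__main__":
    tracked = {"shamitabh", "badlapur", "birdman"}  # hypothetical tracked-movie hashtags
    print(single_movie("I <3 d mve #Shamitabh", tracked))                  # shamitabh
    print(single_movie("#Shamitabh or #Badlapur this weekend?", tracked))  # None
```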


FUTURE SCOPE

• Estimating from “negative hype”

– E.g., the revenue of #PK increased due to the #PKDebate

• Correlating the success of a movie's songs with the success of the movie

– A famous example is the song “Tum Hi Ho”

• Correlating the “structure” of retweets and “favorited” tweets


THANK YOU!
