Forecasting with Twitter data Presented by : Thusitha Chandrapala 20064923 MARTA ARIAS, ARGIMIRO ARRATIA, and RAMON XURIGUERA

What information do Twitter messages carry?

•Twitter information
▫Sentiment analysis: Are people happy or unhappy about a certain topic?
▫Volume: Number of tweets about a given topic

•Does Twitter really help in predicting time series data?
▫Moving stream of info.

Motivation of the paper

•Use three different forecasting model families, vary parameters systematically, and analyze under which conditions Twitter information is actually useful

•Test for non-linearity and causality between Twitter data and the target

•Introduce the summary tree

Related work

•Stock market prediction
▫Bollen et al.: Twitter -> sentiment -> predict Dow Jones Industrial Average
▫Wolfram et al.: Twitter as an additional source of features, no sentiment analysis

•Movie box office income
▫Mishne et al.: correlation, blog posts
▫Asur et al.: predict sales

Work flow

1) Collecting data
2) Cleaning and preprocessing
3) Sentiment analysis
4) Prediction model

Preprocessing:

•Language detection

•Negation handling: treat “I like this…” and “I don’t like this…” as two distinct features

•Relevance filtering and topic classification using LDA
▫Latent Dirichlet Allocation
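
One common way to get the effect described in the negation-handling bullet is to prefix every token that follows a negation word with a marker, so that "like" and "NOT_like" end up as separate features. The sketch below is only an illustration of that idea: the negation word list, the `NOT_` prefix, and the rule that punctuation ends the negation scope are assumptions, not details taken from the paper.

```python
import re

# Hypothetical negation cues; the slides do not list the ones actually used.
NEGATIONS = {"not", "no", "never", "don't", "can't", "cannot", "won't", "isn't", "didn't"}
PUNCTUATION = set(".,!?;")

def mark_negation(text):
    """Prefix tokens after a negation word with NOT_ until the next punctuation mark."""
    # Normalize curly apostrophes so that "don’t" matches "don't".
    text = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+|[.,!?;]", text)
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
            out.append(tok)
        elif tok in NEGATIONS:
            negated = True           # following tokens are marked as negated
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out
```

With this scheme, `mark_negation("I like this")` and `mark_negation("I don't like this")` produce different tokens for "like", which is what lets a bag-of-words classifier tell the two sentences apart.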

Sentiment classification

•Whether the text contains negative or positive impressions on a given subject

•Approach 1:
▫Automatic tagging to extract training instances
  :) :D - Happy sentiment
  :( - Unhappy sentiment
▫Binary classification problem: Use naïve Bayes to train the classifier
▫Use different dictionaries as features
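
Approach 1 can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: it assumes whitespace tokenization, bag-of-words features instead of the dictionaries mentioned on the slide, and Laplace (add-one) smoothing. The function names are hypothetical.

```python
import math
from collections import Counter

POSITIVE = (":)", ":D")   # emoticons named on the slide
NEGATIVE = (":(",)

def auto_label(tweet):
    """Return (label, tokens) from emoticons, or None if unlabeled or ambiguous."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:            # no emoticon, or both kinds present
        return None
    for e in POSITIVE + NEGATIVE:     # strip the emoticons so they are not features
        tweet = tweet.replace(e, " ")
    return ("pos" if has_pos else "neg"), tweet.lower().split()

def train_nb(labeled):
    """Collect class and word counts from a list of (label, tokens) pairs."""
    class_counts = Counter()
    word_counts = {"pos": Counter(), "neg": Counter()}
    for label, tokens in labeled:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Pick the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in ("pos", "neg"):
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:          # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best
```

Tweets without an emoticon are skipped during training and are the ones the trained classifier is then applied to.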

Sentiment index

•A time series of sentiment values
▫The daily value is calculated from the daily % of +/- tweets over the total number of messages on a specific topic
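
As a rough illustration, the index can be computed by grouping classified tweets by day. The sketch assumes the index is the fraction of positive tweets among all tweets on the topic that day; the slide does not say exactly how the positive and negative percentages are combined, so treat that choice, and the function name, as assumptions.

```python
from collections import defaultdict

def sentiment_index(tweets):
    """tweets: iterable of (day, label) pairs, label being 'pos' or 'neg'.

    Returns {day: fraction of that day's tweets labeled 'pos'}, ordered by day.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for day, label in tweets:
        total[day] += 1
        if label == "pos":
            positive[day] += 1
    return {day: positive[day] / total[day] for day in sorted(total)}
```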

Training the model

•ARMA: Autoregressive Moving Average
▫y[t] = a·x[t] + b·x[t-1] + … + m·y[t-1] + n·y[t-2] + …

•Simplified prediction:
▫A binary prediction, which says whether y[t] > y[t-1]
▫Use past values of the series itself and of the Twitter time series
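
The simplified prediction task can be set up as an ordinary classification dataset. The sketch below assumes both series are equal-length lists sampled on the same days and uses only past values (lags 1 to `lags`) of each; the helper name and the `lags` parameter are illustrative, not taken from the paper.

```python
def make_dataset(y, x, lags):
    """Build (features, label) rows for predicting whether y[t] > y[t-1].

    y: target series, x: Twitter series (volume or sentiment index).
    Features for time t are the previous `lags` values of y followed by
    the previous `lags` values of x; the label is 1 if y rose, else 0.
    """
    rows = []
    for t in range(lags, len(y)):
        features = list(y[t - lags:t]) + list(x[t - lags:t])
        label = 1 if y[t] > y[t - 1] else 0
        rows.append((features, label))
    return rows
```

Dropping the `x` part of the feature vector gives the baseline without Twitter data, so the two variants can be compared on the same rows.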

Model parameters

•Target time series
▫Stock market: Returns
▫Movie box office: Revenue

•Twitter series
▫Volume
▫Sentiment index

•Forecasting model family
▫Linear models
▫Support vector machines
▫Neural networks

•Result: Does including Twitter data increase classification accuracy by 5%?
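
The systematic variation of parameters amounts to enumerating every combination of the options above. A minimal sketch follows; it covers only the three parameters listed on this slide, and it reads the 5% criterion as an absolute gain of at least 0.05 in accuracy, which is an assumption.

```python
from itertools import product

TARGETS = ["stock returns", "box office revenue"]
TWITTER_SERIES = ["volume", "sentiment index"]
MODEL_FAMILIES = ["linear model", "support vector machine", "neural network"]

def experiment_grid():
    """Return one configuration dict per combination of the three parameters."""
    return [
        {"target": target, "twitter_series": series, "model": model}
        for target, series, model in product(TARGETS, TWITTER_SERIES, MODEL_FAMILIES)
    ]

def twitter_helps(acc_with, acc_without, threshold=0.05):
    """True if adding Twitter data raises accuracy by at least `threshold`."""
    return acc_with - acc_without >= threshold
```

Each configuration would be trained twice, with and without the Twitter series, and the two accuracies passed to `twitter_helps`.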

Study details

•Stock market prediction targets
▫Companies: Apple, Google, …
▫General market indices: S&P 100, S&P 500

•Box office data
▫Daily sales revenue series

Summary Tree

•Helps to identify model parameters that lead to consistently +/- results

•Decision tree structure
▫Nodes are different parameters
▫Leaves: Result

Summary Tree

Results: Stock market data

•Summary of prediction results:
▫Generally, linear models do not provide a significant performance improvement with either Twitter volume or sentiment-based information.
▫Non-linear models can give an improvement!
▫Neural network based models gave the best performance

Results: Stock market data

Results: Movie box office

•Summary:
▫Sentiment analysis did not have a positive impact
▫Volume information had a positive impact with linear regression and SVM

Conclusion

•In general, Twitter information, when used with non-linear models, increases the prediction accuracy for long-term stock market predictions

•Twitter volume had a linear relationship with movie sales, but sentiment analysis had none

Appendix

•Logarithmic returns of the series

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$$
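
For a price series P, the return at time t can be computed either as the simple return (P[t] - P[t-1]) / P[t-1] or as the logarithmic return ln(P[t] / P[t-1]); the two are close when price changes are small. The helper names below are illustrative, and which of the two the paper feeds to its models is not stated on the slide.

```python
import math

def simple_returns(prices):
    """(P[t] - P[t-1]) / P[t-1] for each consecutive pair of prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]

def log_returns(prices):
    """ln(P[t] / P[t-1]) for each consecutive pair of prices."""
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]
```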

Testing model adequacy

•Testing the relationship between the Twitter time series and the time series to be forecast

•Neglected nonlinearity
▫Are the two time series non-linearly related?

•Granger causality
▫X -> Y or Y -> X?
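
For reference, the textbook form of a Granger causality test of X -> Y compares two regressions of y on lagged values. The lag order p, the sample size T, and the coefficient names are notation introduced here; the slides do not say which variant of the test the paper uses.

```latex
\begin{align*}
\text{restricted:}\quad   y_t &= c + \sum_{i=1}^{p} a_i\, y_{t-i} + e_t \\
\text{unrestricted:}\quad y_t &= c + \sum_{i=1}^{p} a_i\, y_{t-i} + \sum_{i=1}^{p} b_i\, x_{t-i} + u_t \\
H_0 &: b_1 = \dots = b_p = 0 \\
F &= \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
\end{align*}
```

Rejecting H_0 means that past values of x improve the prediction of y beyond what past values of y alone provide. Swapping the roles of x and y tests the Y -> X direction.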