Twitter Sentiment & Investing - modeling stock price movements with twitter sentiment.

WILL TWITTER MAKE YOU A BETTER INVESTOR? A LOOK AT SENTIMENT, USER REPUTATION AND THEIR EFFECT ON THE STOCK MARKET Eric D. Brown

description

In this presentation, I provide an overview of my research into using Twitter sentiment and message volume as inputs for modeling stock price movements. A quick-and-dirty linear regression model using Twitter sentiment, the number of tweets per day, the VIX closing price and the VIX price change delivers a simple model for the S&P 500 SPY ETF that predicts the direction of the next day's move with an accuracy of 57% over 6 months (tested on out-of-sample data). This model was built using data from July 11 2011 to August 11 2011.

Transcript of Twitter Sentiment & Investing - modeling stock price movements with twitter sentiment.

Page 1: Twitter Sentiment & Investing - modeling stock price movements with twitter sentiment.

WILL TWITTER MAKE YOU A BETTER INVESTOR?

A LOOK AT SENTIMENT, USER REPUTATION AND THEIR EFFECT ON THE STOCK MARKET

Eric D. Brown

Page 2:

Background

• Sentiment has long been an underlying factor in the investing world
  • Consumer Confidence Index
  • Investors Intelligence Sentiment Index
  • “Market Sentiment”

• Rather than waiting days, weeks or months, can the ‘sentiment of now’ be used to improve trading performance and investing decisions?

• Can Twitter be used to determine the ‘sentiment of now’?

Page 3:

Background

The thoughts driving this research are:

• Can analysis of publicly available Twitter Messages provide insight for decision making for investing?

• Do Twitter messages (and their subsequent sentiment) have any effect on movement in the stock market?

• Can Twitter messages be mined and analyzed to predict movements in the stock market?

• Does a Twitter user’s reputation have an effect on how people perceive and use their shared investing ideas?

Page 4:

Research Method

Twitter Data Collection

Sentiment Analysis

Social Analysis

Stock Market Data

Price & Volume Analysis

Positive Correlation of sentiment and message volume with price/volume

Reputation of Twitter user

Understanding of the predictive capabilities of Twitter Sentiment and the effect of user reputation on investing decision support

Page 5:

Research Method

• Data Collection
  • Using Twitter API to collect tweets (tweet, sender, date, time)
  • Tweets referencing companies and sectors are collected and stored in a MySQL database for future study
  • Using the nomenclature made popular by StockTwits (www.stocktwits.com). Example: The stock symbol for Apple is AAPL. Users following the StockTwits nomenclature add a “$” to the symbol – “$AAPL”.
  • Stocktwits.com describes their purpose as a place to:
    • …share ideas, market insights and trades on stocks, futures and the market in general *.
  • Using Yahoo Finance data feed to gather Stock Market data (price and volume)
    • Provides historical data
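The “$” nomenclature lends itself to a simple pattern match. The following is a minimal sketch, not the collection code used in this research: the helper name `extract_cashtags` and the pattern itself (a “$”, a letter, then up to seven more letters or underscores) are assumptions made for illustration.

```python
import re

# Illustrative pattern (an assumption): "$", then a letter, then up to seven
# more letters or underscores, ending at a word boundary.
CASHTAG = re.compile(r"(?<![A-Za-z0-9_])\$([A-Za-z][A-Za-z_]{0,7})\b")


def extract_cashtags(text):
    """Return the upper-cased symbols that follow a "$" in a tweet."""
    return [symbol.upper() for symbol in CASHTAG.findall(text)]


print(extract_cashtags("buy the dips WORKING :-) $ES_F $SPY"))
print(extract_cashtags("MAKE $1 MILLION A YEAR"))
```

Dollar amounts such as “$1” are skipped because the pattern requires a letter after the “$”, but slang such as “$wag” still matches, so a pattern like this cannot separate spam from genuine stock references on its own.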

Page 6:

Research Method

• Sentiment Analysis
  • Using a Naïve Bayesian text classification algorithm to determine sentiment of collected Tweets
  • Naïve Bayesian is being used for simplicity but also because many researchers have pointed out very minor differences between it and other sentiment analysis methods
  • A subset of the data collected has been manually assigned ‘sentiment’ to build the necessary training dataset
  • Using the Python Natural Language Toolkit, the Bayesian classification is performed
  • For each tweet, the overall score is calculated and assigned.
  • Ideally, tweets will fall into +1 (Bullish), 0 (Neutral), -1 (Bearish) buckets.
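To make the classification step concrete, here is a minimal sketch of a naive Bayes text classifier in plain Python. It is a stand-in for the NLTK classifier named on this slide, not the code used in the research: the function names, the whitespace tokenizer and the add-one smoothing are all assumptions. The six training rows are example tweets quoted elsewhere in this deck.

```python
import math
from collections import Counter, defaultdict

# Mapping from class label to the +1 / 0 / -1 score described on the slide.
SCORES = {"Bullish": 1, "Neutral": 0, "Bearish": -1}


def tokenize(text):
    """Assumed tokenizer: lower-case and split on whitespace."""
    return text.lower().split()


def train(labeled_tweets):
    """Count messages and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # unseen words are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("buy the dips WORKING :-) $ES_F $SPY", "Bullish"),
    ("$KFT - Kraft Foods Stock Analysis - CCI is bullish and rising", "Bullish"),
    ("$K - Kellogg Stock Analysis - bearish stochastic crossdown", "Bearish"),
    ("$SPY - Might be trying to roll a little.", "Bearish"),
    ("Sold my $SPY Jun 03 2011 133.0 Puts for 78c made 7c", "Neutral"),
    ("Durable Goods as a Leading Market Indicator $DIA $SPY $QQQ", "Neutral"),
]
model = train(training)
label = classify(model, "CCI is bullish and rising")
print(label, SCORES[label])
```

A Spam label would need special handling before scoring, since it has no place on the +1/0/-1 scale; the sketch simply leaves it out.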

Page 7:

Sentiment Classification Process

Training Dataset

Twitter Data

Bayes Classifier

Trained Classifier

Classified Twitter Messages

Page 8:

Current Dataset

• Twitter Dataset:
  • May 1 2011 to Dec 31 2011
  • 473,901 Tweets
  • No deduplication performed

• Training Dataset:
  • 5000 messages randomly selected from collected Tweets
  • Messages have been manually coded as Bullish, Bearish, Neutral or Spam
    • 544 Bullish (10.88%)
    • 638 Bearish (12.76%)
    • 3454 Neutral (69.08%)
    • 364 Spam (7.28%)

Page 9:

Bullish Examples

• Markets seems like consolidating before another rally higher $SPX $SPY $QQQ

• $KFT - Kraft Foods Stock Analysis - CCI is bullish and rising

• If this daily candle ends like this do not go short! 3 white soldiers (bullish) $SPX $SPY http://t.co/oWWUqnu

• buy the dips WORKING :-) $ES_F $SPY

Page 10:

Bearish Examples

• $K - Kellogg Stock Analysis - bearish stochastic crossdown

• $SPY - Might be trying to roll a little.

• warning sign as $IWM didn’t make a higher high, unlike $QQQ and $SPY

• RT @grassosteve: $SPX Jobs although important only 1 aspect of the weakness in the mkts, i would sell pops still levels up 1216 1228 123 ...

Page 11:

Neutral Examples

• Stocks and Inflation: What the Market Is Really Telling Us http://seekingalpha.com/a/5tap $TLT $TLH $SPY

• NEW POST: UPDATED- MEAN REVERSION TRADE ON THE RUSSELL 2000 http://bit.ly/l2Q1Rf $IWM $TNA $SPY $QQQ

• Sold my $SPY Jun 03 2011 133.0 Puts for 78c made 7c

• Durable Goods as a Leading Market Indicator http://seekingalpha.com/a/5u8j $DIA $SPY $QQQ

Page 12:

Spam Examples

• Hop up out the bed turn my $wag on. ;*

• MILLIONAIRE SECRETS CLUB - MAKE $1 MILLION A YEAR http://goo.gl/2Yv8s $OXY $PBR $PDCO $pennystocks $POT $PRU $QCOM $QSII axsc

• HOME TYPERS NEEDED - MAKE $1000s WEEKLY - PAYS DAILY http://goo.gl/hoNUn $SNP $SOHU $SPLS $SPRD $SPX $SPY $STO $stocks sg2

• UNLIMITED FREE TV SHOWS on YOUR PC - 12,000 FREE CHANNELS http://goo.gl/v55Nw $CBG $CBS $CF $CLR $CMCSA $COF $COP $CROX $CTIC qika

Page 13:

Sentiment Analysis using full training Set

• From May 1 to Dec 31 2011:

Class         Tweets    Share
All Tweets   473,901
Bullish      103,770   21.90%
Bearish       84,454   17.82%
Neutral      224,300   47.33%
Spam          61,371   12.95%
None               6   0.001%
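The shares in this table follow directly from the counts. A short sketch of the arithmetic, using the counts as reported on the slide (the variable names are illustrative):

```python
# Classified tweet counts as reported on the slide.
counts = {
    "Bullish": 103770,
    "Bearish": 84454,
    "Neutral": 224300,
    "Spam": 61371,
    "None": 6,
}
total = sum(counts.values())

print(f"{'All Tweets':<12}{total:>9,d}")
for label, n in counts.items():
    # Share of all tweets, in percent.
    print(f"{label:<12}{n:>9,d}{100 * n / total:>9.3f}%")
```

The five classes add up to the 473,901 tweets collected, so every tweet is accounted for.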

Page 14:

A look at the Market & Sentiment

• I ran a short analysis on the S&P 500 ETF (SPY) between July 11 and August 11 2011

• This date range was chosen mainly because of a very volatile move down

• 32 days of data

• 26,307 tweets

• On Jul 11 SPY is 131.40

• On Aug 11 SPY is 117.33

Page 15:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 16:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 17:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 18:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 19:

But…is the Classifier accurate?

• We can determine the sentiment of a tweet…but is it really accurate?

• The Python Natural Language Toolkit provides a method to determine the accuracy of a classifier against a labeled dataset
  • Build training dataset as normal
  • Use training dataset as the “input data”
  • Run all messages through classifier and determine accuracy
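The accuracy check described here amounts to counting how often the classifier's label matches the manually coded label. A minimal sketch follows; `keyword_classifier` is a hypothetical stand-in for the trained classifier, and the four rows are example tweets quoted elsewhere in this deck.

```python
def accuracy(classify, labeled_messages):
    """Fraction of (text, label) pairs for which classify(text) == label."""
    correct = sum(1 for text, label in labeled_messages if classify(text) == label)
    return correct / len(labeled_messages)


def keyword_classifier(text):
    """Hypothetical stand-in for a trained classifier."""
    lowered = text.lower()
    if "bullish" in lowered:
        return "Bullish"
    if "bearish" in lowered:
        return "Bearish"
    return "Neutral"


training = [
    ("$KFT - Kraft Foods Stock Analysis - CCI is bullish and rising", "Bullish"),
    ("$K - Kellogg Stock Analysis - bearish stochastic crossdown", "Bearish"),
    ("$SPY - Might be trying to roll a little.", "Bearish"),
    ("Sold my $SPY Jun 03 2011 133.0 Puts for 78c made 7c", "Neutral"),
]
print(accuracy(keyword_classifier, training))
```

NLTK offers `nltk.classify.accuracy(classifier, gold)` for the same calculation, which is presumably the method the slide refers to. Because the same messages are used for training and for scoring, the figure describes how well the classifier reproduces its own training labels.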

Page 20:

Sentiment Classification Process

Training Dataset

Twitter Data

Bayes Classifier

Trained Classifier

Classified Twitter Messages

Page 21:

Classification Accuracy

Training Dataset

Training Dataset

Bayes Classifier

Trained Classifier

Accuracy

Page 22:

Accuracy of Training Dataset

• If you recall, our training dataset is:
  • 5000 messages randomly selected from collected Tweets
  • Messages have been manually coded as Bullish, Bearish, Neutral or Spam
    • 544 Bullish (10.88%)
    • 638 Bearish (12.76%)
    • 3454 Neutral (69.08%)
    • 364 Spam (7.28%)

• Running the accuracy method of the Python NLTK delivers a 54.18% accuracy.

Page 23:

How can we improve accuracy?

• If we think about the theory behind the Bayesian classifier, it may shine some light on the inaccuracies.

• The Bayesian classifier is a probability-based method and is only as good as the data used to train it.

• Research suggests that non-symmetric training datasets or features can throw the Bayesian classifier off.

• The training dataset used is non-symmetric.

• What if we create a symmetric dataset with the same number of Bullish, Bearish, Neutral and Spam messages?

Page 24:

Equivalent Sized Training Dataset

• Rebuild Training Dataset
  • Randomly select 500 tweets from each category of the training dataset

• Re-run the accuracy method.
  • Accuracy = 91.94%

• An improvement from 54.18% to 91.94%

• What will this improved accuracy do for the overall dataset?
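Rebuilding the training set with equal class sizes is a per-class random draw. The sketch below shows one way to do it, assuming sampling without replacement and a hypothetical helper named `balanced_sample`; the message texts are placeholders and only the class counts come from the slides.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(labeled_messages, per_class, seed=0):
    """Draw up to per_class messages from each label without replacement."""
    by_label = defaultdict(list)
    for text, label in labeled_messages:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    sample = []
    for label in sorted(by_label):
        rows = by_label[label]
        sample.extend(rng.sample(rows, min(per_class, len(rows))))
    rng.shuffle(sample)
    return sample


# Class counts from the 5000-message training set; texts are placeholders.
counts = {"Bullish": 544, "Bearish": 638, "Neutral": 3454, "Spam": 364}
messages = [
    (f"{label} message {i}", label)
    for label, n in counts.items()
    for i in range(n)
]
sample = balanced_sample(messages, per_class=500)
print(Counter(label for _, label in sample))
```

With the class counts reported for the 5000-message set, Spam has only 364 messages, so this sketch caps that class at 364. How the slides arrived at 500 per class is not stated.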

Page 25:

Sentiment Analysis using Symmetric training Set

• From May 1 to Dec 31 2011:

Class         Tweets    Share
All Tweets   473,901
Bullish      110,141   23.24%
Bearish      103,233   21.78%
Neutral      212,509   44.84%
Spam          48,012   10.13%
None               6   0.001%

Page 26:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 27:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 28:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 29:

S&P 500 ETF (SPY) - 7/11 to 8/11

Page 30:

Statistically Speaking

• There seems to be some correlation between Twitter & Stock data

• Correlation of TransformedBBN and Closing Price
  • Correlation = 0.495
  • P-Value = 0.004

• Correlation of Num Tweets and Daily Volume
  • Correlation = 0.648
  • P-Value = 0.000
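The correlation figures quoted here are presumably Pearson coefficients. As a reminder of what that calculation involves, here is a minimal sketch in plain Python; the `pearson` helper is illustrative and the two series are placeholder values, not the study data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Placeholder series: daily tweet counts and daily volume would go here.
num_tweets = [1, 2, 3, 4, 5]
daily_volume = [2, 4, 6, 8, 10]
print(pearson(num_tweets, daily_volume))
```

A value near +1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means no linear relationship.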

Page 31:

Correlations?

• Basic Analysis shows some correlation between price/sentiment and volume/tweet volume.

• Using Time Series analysis, a cross-correlation analysis can be completed to determine how these variables are related at different ‘lag’ periods.

• Using Cross-correlation analysis we can get the Cross-Correlation Coefficient (CCF) which gives us an idea of how well two variables are correlated at lag time r.

• If a correlation is found at a negative lag time r, that variable is a candidate for use in predicting the output variable.
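A lagged correlation can be sketched by shifting one series against the other before correlating them. The code below assumes the sign convention implied by the slide, where a negative lag pairs earlier values of the input with later values of the output. The helper names and the series are placeholders.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def lagged_correlation(xs, ys, lag):
    """Correlate x[t + lag] with y[t]; lag=-1 pairs yesterday's x with today's y."""
    if lag < 0:
        xs, ys = xs[:lag], ys[-lag:]
    elif lag > 0:
        xs, ys = xs[lag:], ys[:-lag]
    return pearson(xs, ys)


# Placeholder series in which y repeats x one step later.
x = [1, 3, 2, 5, 4, 6]
y = [0, 1, 3, 2, 5, 4]
for lag in (-2, -1, 0):
    print(lag, round(lagged_correlation(x, y, lag), 3))
```

In this toy example the correlation peaks at lag -1, which is the pattern that would make x a candidate predictor of the next value of y. Statistical packages differ in how they sign the lag, so the convention should be checked before comparing numbers.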

Page 32:

Correlations

• Closing Price Cross Correlation Coefficient

Lag Time r
Variable        -6      -5      -4      -3      -2      -1       0
Sentiment     0.299   0.399   0.599   0.620   0.663   0.599   0.495
Num Tweets   -0.497  -0.461  -0.406  -0.533  -0.577  -0.627  -0.714

• Volume Cross Correlation Coefficient

Lag Time r
Variable        -6      -5      -4      -3      -2      -1       0
Sentiment    -0.307  -0.394  -0.527  -0.574  -0.617  -0.605  -0.557
Num Tweets    0.373   0.443   0.485   0.552   0.613   0.642   0.648

Page 33:

What now?

• Correlations exist between price/volume and sentiment/message volume.

• The main reason I’m looking at sentiment is to determine if it can somehow be used to predict price movement.

• So…let’s build a model using Linear Regression (note…linear regression isn’t a likely fit but a good place to start).

• I want the model to predict Closing Price…so let’s start with a simple model using Sentiment only

Page 34:

Building Models

• Using Sentiment to predict Price:
  • The regression equation is
    • Closing Price = 133 + 47.3 TransformedBBN

• The p-value is less than 0.05, so we should be good but R-Squared is 24.5%…which tells me this isn’t a very good model.
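For a single predictor, the regression line and its R-Squared can be computed by hand. The following is a minimal least-squares sketch with illustrative helper names and placeholder data; it does not reproduce the slide's fit, which would need the original daily series.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Placeholder data lying exactly on y = 1 + 2x.
sentiment = [0, 1, 2, 3]
closing_price = [1, 3, 5, 7]
intercept, slope = fit_line(sentiment, closing_price)
print(intercept, slope, r_squared(sentiment, closing_price, intercept, slope))
```

With one predictor, R-Squared equals the square of the correlation coefficient: 0.495 squared is about 0.245, which matches the 24.5% reported for the sentiment-only model.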

Page 35:

Building Models (cont)

• What other variables can be used?
  • Volume?
  • Volume change?
  • Number of Tweets?
  • Volatility measurements?

• There are a lot of combinations…but let’s keep it simple. Let’s select the Number of Tweets and re-run the analysis.

Page 36:

Building Models (cont)

• Using Sentiment + Number of Tweets:
  • The regression equation is
    • Closing Price = 139 + 41.4 TransformedBBN - 0.00856 Num Tweets

• P-values look good still and R-Squared is up to 69.6%.

Page 37:

Building Models (cont)

• What else can we add?

• How about a measure of volatility?

• One often quoted measure is the VIX. This is a measurement of implied volatility of S&P 500 index options.

• Often referred to as the ‘fear index’.

• On 7/11, the VIX was 18.39

• On 8/11 the VIX was 39.00

Page 38:

Building Models (cont)

• Using Sentiment, Number of Tweets, the VIX Closing Price + the change in price for the VIX
  • The regression equation is
    • Closing Price = 148 - 0.00197 Num Tweets + 12.1 TransformedBBN - 0.713 VIX Closing Price + 0.128 VIX Price Change

• P-values are good & R-Squared hits 97.3%...which tells us that this might be a good model.
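With several predictors, the coefficients come from solving the least-squares normal equations. The sketch below does this in plain Python on placeholder data with two predictors. The helper names are illustrative and the statistics package used for the slides is not stated, so this shows the technique rather than the original workflow.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Placeholder data generated from y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, -2, 0, 2]
coefficients = fit_ols(rows, ys)
print([round(c, 6) for c in coefficients])
```

For the model on this slide, each row would hold the day's Num Tweets, TransformedBBN, VIX Closing Price and VIX Price Change, and `ys` would hold the closing prices.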

Page 39:

Checking the model

• Our equation is:
  • Closing Price = 148 + 12.1 TransformedBBN - 0.00197 Num Tweets - 0.713 VIX Closing Price + 0.128 VIX Price Change

• So…for August 11, our variables are:
  • TransformedBBN = -0.075
  • Num Tweets = 1357
  • VIX Closing Price = 39
  • VIX Price Change = -3.99

• Our predicted closing price for August 12 is then: 116.65
  • The prediction is for a move down from the August 11 price of 117.33

• The actual closing price on August 12 was 118.20
  • Our directional prediction was incorrect and our price prediction was incorrect.
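The prediction can be re-derived by plugging the August 11 values into the equation. This sketch uses the coefficients exactly as printed on the slide; the dictionary and function names are illustrative.

```python
# Coefficients as printed on the slide (rounded for display).
COEFFICIENTS = {
    "TransformedBBN": 12.1,
    "Num Tweets": -0.00197,
    "VIX Closing Price": -0.713,
    "VIX Price Change": 0.128,
}
INTERCEPT = 148.0


def predict_close(inputs):
    """Evaluate the regression equation for one day's inputs."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in inputs.items())


august_11 = {
    "TransformedBBN": -0.075,
    "Num Tweets": 1357,
    "VIX Closing Price": 39,
    "VIX Price Change": -3.99,
}
predicted = predict_close(august_11)
print(round(predicted, 2))
print("down" if predicted < 117.33 else "up")
```

With the printed coefficients the result is about 116.10 rather than 116.65. The gap is presumably due to rounding in the displayed coefficients (the intercept alone is shown to the nearest whole number). The predicted direction, down from 117.33, is the same either way.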

Page 40:

Checking the Model

• A few more predictions (50% accuracy):
  • For August 15 – prediction is up. Predicted price is 118.44
    • Actual is 120.62. Price moved up.
  • For August 16 – prediction is up. Predicted price is 121.69
    • Actual is 119.59. Price moved down.
  • For August 17 – prediction is up. Predicted price is 121.07
    • Actual is 119.67. Price moved up.
  • For August 18 – prediction is up. Predicted price is 121.73
    • Actual is 114.51. Price moved down
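The 50% figure can be checked by comparing the predicted and actual direction for each day. The sketch assumes that the price quoted with each prediction is the model's predicted close and that direction is judged against the previous day's actual close. The figures are taken from the slides.

```python
def direction(new_price, old_price):
    """Label a move relative to the previous close."""
    return "up" if new_price > old_price else "down"


previous_close = 118.20  # actual close on August 12
days = [
    # (date, predicted close, actual close)
    ("August 15", 118.44, 120.62),
    ("August 16", 121.69, 119.59),
    ("August 17", 121.07, 119.67),
    ("August 18", 121.73, 114.51),
]

hits = 0
for date, predicted, actual in days:
    predicted_direction = direction(predicted, previous_close)
    actual_direction = direction(actual, previous_close)
    hits += predicted_direction == actual_direction
    print(date, predicted_direction, actual_direction)
    previous_close = actual

directional_accuracy = hits / len(days)
print(directional_accuracy)
```

Two of the four calls match the actual move, which gives the 50% quoted above.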

Page 41:

Checking the model

• The model works for predicting direction on a few dates immediately…but what about the rest of the time?

• Looking at the rest of the year, there are 99 trading days.
  • For 53 days, the prediction was correct.
  • For 46 days, the prediction was incorrect.

• The model gives a 53.54% accuracy.
  • Not great…but better than 50%.
  • With proper risk management of investments, a 3.54% “edge” on the market might be perfectly acceptable.

• FYI - This model applied to a full 6 month data set gives an accuracy of 57.06%

Page 42:

Checking the model

• As a test, I removed the Tweet Sentiment and the Number of Tweets from consideration and re-ran the analysis to create a model.

• The model uses the VIX and VIX Price change only and gives R-Squared of 95.5%.

• The regression equation is
  • Closing Price = 149 - 0.847 VIX Closing Price + 0.120 VIX Price Change

• This model gives an accuracy of 48.48%…so there seems to be some value in sentiment/tweet volume.

Page 43:

Next Steps

• There seems to be some correlation between Twitter & Stock data

• Begin building more complex predictive models using Time Series modeling and prediction methods (ARIMA, etc).

• Continue analysis of sentiment and price movements.

• Begin Social Network Analysis of twitter users for reputation, etc

Page 44:

Thank you

Eric D. Brown

ericbrown.com