
Confidential and Proprietary - © Cerebellum Capital, 2009-2011

Predictive Analytics in the Land of the Vampire Squid

Dr. David Andre
CEO, Cerebellum Capital
dandre at cerebellumcapital.com

Predictive Analytics World, 2011 San Francisco


What I did after getting my PhD:


What I moved to in early 2008


Overview

• Perils and opportunities of Wall Street

• Seven ways to get it very wrong

• Our approach

• Takeaways


Computers tend to win in the end when the contests are heavily influenced by speed, memory, and probability calculations

Watson AI has come a long way:


So…Is The Finance Domain Worth the Effort?

• Asset Management is a $20 Trillion - $50 Trillion Domain, depending on how you count.

• Machines are taking over trading at a fantastic rate.

• Many excellent algorithms exist for high-frequency trading, many using predictive analytics in a relatively simple way.

• Yet there is a widespread belief that computers can’t design meaningful new strategies – this belief is oddly ubiquitous in the field of quantitative finance.

• Furthermore, Finance is nearly a perfect test-bed for AI.


Finance seems deep & rich for AI

• Huge datasets in both time & companies

• Huge amounts of data about companies are flooding the web

• Easily measured metrics for success

• Huge rewards if you get it right

• Nearly every aspect of AI can be useful:

– Machine Learning (prediction & estimation)

– Planning (trade planning)

– Optimization (Portfolio optimization)

– Knowledge Engineering

– Text & Speech understanding


Wait – aren’t the markets efficient? Maybe, unless:

• You have better info or can use the info to make better predictions

• You can reach the right answer faster or you can trade faster

• Markets or investors allow a bookie to be a middle man, making both sides happy and taking a cut

• You’re first to the inefficiency!


Four ways being smart helps funds win

• Deducing (sometimes using data available only to a few) who’s buying/selling what/when allows high-frequency front-running.

• Predicting who will buy/sell based on recent price moves is technical trading.

• Predicting relationships between assets that are semi-constant yields statistical arbitrage and pairs trades.

• Better estimating a company’s true value yields longer-term profits.


Seven traps of using predictive analytics on Wall St.

1. Lemons & Butter

2. Know thyself (or how to overfit without really trying)

3. Broker is not a noun, it’s an adjective

4. The test set must be IID and drawn from the same distribution…

5. What, my reality isn’t the world’s?!?

6. The big kahuna

7. Time moves forward one moment at a time


Lemons and butter


Know thyself

• Most Predictive Analytics researchers know well to test out-of-sample

• However, you have to test the whole system out-of-sample (including the humans) to really get a fair test

• A canonical way to overfit is to run the whole process more than once (even if it includes cross-validation and out-of-sample testing).

• Another is to throw away ideas that don’t work and keep only the ones that do.


Great way to fool yourself

1. Add a new input stream.

2. Train/learn in sample.

3. Test out of sample.

4. Does this out-of-sample result look good? YES: keep the most recently added input stream. NO: discard the most recently added input stream.

5. Do the out-of-sample results always look good? NO: continue (back to step 1). YES: done.
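The loop on this slide can be simulated to see why it misleads. The sketch below is illustrative only: it assumes every input stream is pure noise, uses a simple signed-vote model, and splits the days into hypothetical train, test, and holdout ranges. None of these details come from the talk. The "out-of-sample" score that drives each keep/discard decision climbs, while a holdout that was never consulted gives an unbiased estimate.

```python
import random

def accuracy(streams, signs, target, days):
    """Fraction of days where the signed vote of the streams matches the target."""
    hits = 0
    for t in days:
        vote = sum(s * x[t] for s, x in zip(signs, streams))
        pred = 1 if vote >= 0 else -1
        hits += pred == target[t]
    return hits / len(days)

def fool_yourself(n_candidates=200, n_days=300, seed=0):
    rng = random.Random(seed)
    target = [rng.choice((-1, 1)) for _ in range(n_days)]
    train = range(0, 100)       # in sample
    test = range(100, 200)      # "out of sample", but reused for every decision
    holdout = range(200, 300)   # never looked at during the search
    kept, signs, best = [], [], 0.0
    for _ in range(n_candidates):
        # Add a new input stream (pure noise by construction).
        stream = [rng.choice((-1, 1)) for _ in range(n_days)]
        # Train/learn in sample: pick the sign that fits the training days.
        corr = sum(stream[t] * target[t] for t in train)
        sign = 1 if corr >= 0 else -1
        # Test out of sample, then keep the stream only if the result looks better.
        score = accuracy(kept + [stream], signs + [sign], target, test)
        if score > best:
            kept.append(stream)
            signs.append(sign)
            best = score
    return best, accuracy(kept, signs, target, holdout)

reused_score, fresh_score = fool_yourself()
print(f"score on the reused test set: {reused_score:.2f}")
print(f"score on the untouched holdout: {fresh_score:.2f}")
```

Because the target is random, any accuracy above one half on the reused test set is selection bias rather than skill.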


Broker?

• Simple trading algorithms offered by most brokers are easily manipulated by the HFT community.

• If you aren’t the shark, you are getting eaten.

• The rules are set up to favor the bankers.

• Brokers all have different symbology.

• Trades do get messed up, and usually not in your favor.

• The big brokers won’t take you if you’re not big, and it’s hard to get big without the advantages they offer.


The same distribution…

• For the guarantees of theoretical ML to hold, the test set must be drawn from the same distribution as the training set.

• This assumption holds less well in finance than almost anywhere else.

• Sarbanes-Oxley?

• New scrutiny and regulation on shorting?

• Context matters, e.g. tightness of credit.

• Other funds/traders are figuring things out, so alpha changes over time in a very complex way…
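One rough way to check the same-distribution assumption is to compare a feature's values in the training period with its values in the test period. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) in plain Python. Choosing this statistic is an assumption made for illustration, not a method from the talk, and the sample values are made up.

```python
def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

# Hypothetical daily returns from two periods (made-up numbers).
train_returns = [0.0, 0.1, 0.2, 0.3]
test_returns = [0.2, 0.3, 0.4, 0.5]
print(ks_statistic(train_returns, test_returns))
```

A value near 0 means the two samples look alike; a value near 1 means they barely overlap. A small value does not prove the relationship being traded is stable, so this is at best a first warning sign.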


The same distribution?


The same distribution? Not so much


What? I can’t trade for free?

• Most academic or online sources present alpha without taking into account real costs

• Costs to borrow money and obtain leverage

• Minimum commission costs

• Slippage is real, especially in non-S&P 100 names

• Costs to borrow assets can be very high (and some assets can’t be borrowed at all)
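The costs listed above can be folded into a backtest with a small cost model. This is a minimal sketch: the `CostModel` class, its field names, the 360-day convention, and every default rate are hypothetical placeholders chosen for illustration, not real broker terms or anything stated in the talk.

```python
from dataclasses import dataclass

@dataclass
class CostModel:
    # All defaults are made-up placeholders, not real broker quotes.
    commission_per_share: float = 0.005
    min_commission: float = 1.00      # minimum commission per order
    slippage_bps: float = 5.0         # per side, in basis points of notional
    financing_rate: float = 0.03      # annual rate on borrowed cash (leverage)
    stock_borrow_rate: float = 0.02   # annual fee to borrow shares for a short

def round_trip_cost(shares, price, days_held, borrowed_fraction=0.0,
                    is_short=False, model=CostModel()):
    """Dollar cost of entering and exiting one position."""
    notional = shares * price
    # Two orders (entry and exit), each subject to the minimum commission.
    commission = 2 * max(model.min_commission, shares * model.commission_per_share)
    slippage = 2 * notional * model.slippage_bps / 10_000
    financing = notional * borrowed_fraction * model.financing_rate * days_held / 360
    borrow = notional * model.stock_borrow_rate * days_held / 360 if is_short else 0.0
    return commission + slippage + financing + borrow

def net_pnl(gross_pnl, **trade):
    """Gross profit minus the round-trip cost of the trade."""
    return gross_pnl - round_trip_cost(**trade)

# A 1% gross gain on a $5,000 position held 10 days with half the cash borrowed.
print(net_pnl(50.0, shares=100, price=50.0, days_held=10, borrowed_fraction=0.5))
```

With these placeholder rates, roughly $9 of the $50 gross gain goes to costs, and the minimum commission alone is twice what the per-share rate would charge.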


What? I can’t trade for free?

Without realistic trading constraints

With realistic trading constraints


The Big Kahuna

• You may not be as clever as you think, and lots of other hedge funds might be trading on very similar names.

• If they have to get out of their positions (say, due to a liquidity crisis), your positions can get pounded.

• If you can manage to hang in there, it can be worth it – but you have to be willing to accept 20 or 30% down!


Time moves forward one moment at a time

• It is remarkably easy to time-travel.

• And, despite there being “so much data”, there is no way to get it faster – it comes only one minute per minute, one day per day.

• Time travel is letting any information from the future get into the simulated past.

• This can include even things like processor speed, so it’s very difficult to police.


Some examples of time-travel

• Survivor bias – can’t throw out Enron or Lehman.

• Using a modern computer? They didn’t have that in 1995.

• Using a cleaned dataset? When did it get cleaned?

• Stock universes? Are you selecting stocks based on inclusion in the S&P 500 today and looking at their past?

• Throwing out models that used to look great

• Trading as if you could know the current price at the instant you trade.
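The stock-universe example can be made concrete with a point-in-time membership lookup. In this sketch the record layout, the `universe_as_of` helper, and the tickers and dates are all hypothetical: the idea is only that a backtest should ask who was in the index on the simulated date, not who is in it today.

```python
from datetime import date

# Hypothetical membership records: (ticker, date added, date removed or None).
MEMBERSHIPS = [
    ("AAA", date(1990, 1, 2), None),
    ("BBB", date(1995, 6, 1), date(2001, 12, 3)),   # dropped after failing
    ("CCC", date(2005, 3, 15), None),               # joined later
]

def universe_as_of(memberships, as_of):
    """Tickers that were members on the given date, using only dated records."""
    return {
        ticker
        for ticker, added, removed in memberships
        if added <= as_of and (removed is None or as_of < removed)
    }

print(sorted(universe_as_of(MEMBERSHIPS, date(2000, 1, 3))))
print(sorted(universe_as_of(MEMBERSHIPS, date(2010, 1, 4))))
```

A backtest for the year 2000 that used the 2010 membership would silently drop BBB, the company that later failed, and include CCC before it had qualified. That is survivor bias and time travel in one step.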


Our approach

• Focus on the long tail and automate discovery, not just tuning

• Search over strategy space

• Hybrid of humans and computers

• Time safety

• Bogosity detectors


A Decomposition of the Challenge

Raw Data Streams → Features → Predictors → Strategies → Allocate → Trade

• Feature 1 of Data Stream 1 → Predictor 1 → Strategy 1

• Feature K of Data Stream T → Predictor K → Strategy K

• Feature Y of Data Stream Z → Predictor N → Strategy W

• Hundreds of features, hundreds of predictors, hundreds of strategies

• Allocate to the best strategies so as to maximize returns, minimize risk, and keep portfolio balanced within risk constraints


Long Tail

Figure labels: “Cost-effective for most Quant funds” and “The other 99% of the market inefficiencies!”

Getting the cost to launch 1 additional strategy/leg toward zero


• Learn/discover the strategies automatically

• Humans find useful data sets

• Advanced program search & evaluation finds features and combines them


Search in program space…

• We focus on looking for diverse strategies, not just simple variants of a human-derived strategy.

• Essentially, we’re automating the process of science with respect to financial data.

• Good input is key to this: garbage in, garbage out.


Process Is What Matters

Figure: a grid crossing World Class and Weak Human Chess Players with World Class and Weak Computer Chess Players; three cells are marked “X” and one is marked “!”.

(2005 “Freestyle Tournament”, Playchess.com: winner was 2 weak humans + 3 weak laptops + Innovative Process)


Roll Forward Cross Validation

Figure: along the Time axis, the same three steps repeat as the window rolls forward:

1. Learn from past examples

2. Pick best predictors and strategies to use

3. Out of Sample Test
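The roll-forward scheme can be sketched as a loop over consecutive windows. The version below is a simplified illustration under stated assumptions: "learning" is reduced to ranking candidate strategies by their total return in the training window, the window sizes are arbitrary, and the function names and return series are hypothetical.

```python
def roll_forward_splits(n, train_size, test_size):
    """Yield (train, test) index ranges; each test range lies strictly after its train range."""
    start = 0
    while start + train_size + test_size <= n:
        split = start + train_size
        yield range(start, split), range(split, split + test_size)
        start += test_size  # roll the whole window forward by one test period

def roll_forward_eval(strategies, train_size, test_size):
    """Per window: pick the best strategy on past data, then score it on unseen data."""
    n = min(len(r) for r in strategies.values())
    results = []
    for train, test in roll_forward_splits(n, train_size, test_size):
        # Pick the best strategy using training days only.
        best = max(strategies, key=lambda name: sum(strategies[name][i] for i in train))
        # Out-of-sample test: days the selection step never saw.
        results.append((best, sum(strategies[best][i] for i in test)))
    return results

# Hypothetical per-day returns for two candidate strategies (made-up numbers).
strategies = {
    "a": [1, 1, 1, 1, -1, -1, -1, -1],
    "b": [-1, -1, -1, -1, 1, 1, 1, 1],
}
print(roll_forward_eval(strategies, train_size=4, test_size=2))
```

In this toy data the regime flips halfway through, so the strategy that looked best in the first training window loses in the test period that follows it. A single static train/test split would report only one of those outcomes.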


Time safety in the programming language

• Our strategy language and software infrastructure were built time-safety first.

• All strategies are coded in the same framework, with a “self-aware” representation so that the system can reason about the strategies.

• The code can’t “see ahead” in the data structures, so time safety is guaranteed.
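The talk does not show the strategy language itself, so the following is only a guess at the general idea, written in Python: strategies receive a view of the data that is indexed by lag from "now" and refuses any request for the future. The names `TimeSafeView`, `LookaheadError`, and `run_backtest` are hypothetical. Python cannot enforce this as strictly as a purpose-built language, since private attributes remain reachable.

```python
class LookaheadError(Exception):
    """Raised when a strategy asks for data from the simulated future."""

class TimeSafeView:
    """Read-only window onto a series; lag 0 is the current bar, lag 1 the one before."""

    def __init__(self, values, now):
        self._values = values
        self._now = now

    def __len__(self):
        return self._now + 1  # number of bars visible so far

    def __getitem__(self, lag):
        if lag < 0:
            raise LookaheadError(f"lag {lag} points {-lag} bar(s) into the future")
        if lag > self._now:
            raise IndexError("not enough history yet")
        return self._values[self._now - lag]

def run_backtest(prices, strategy):
    """Call the strategy once per bar, exposing only data up to that bar."""
    return [strategy(TimeSafeView(prices, t)) for t in range(len(prices))]

def momentum(view):
    """Go long after an up move, short otherwise; flat until there is history."""
    if len(view) < 2:
        return 0
    return 1 if view[0] > view[1] else -1

print(run_backtest([10, 11, 12, 11], momentum))
```

A strategy that tries `view[-1]` to peek at the next bar fails immediately with `LookaheadError` instead of quietly producing an impressive backtest.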


Detecting Bogosity

Figure labels: Candidate strategy, SPY


Candidate strategy: 12% of random strategies with the same structure beat it in returns, 29% in Sharpe!
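A check of this kind can be sketched as a randomization test. The talk does not say how its random strategies are generated, so this example assumes one simple reading of "same structure": the candidate's daily positions are shuffled in time, which keeps the same number of long, short, and flat days. The Sharpe ratio here is unannualized, and the function names and data are hypothetical.

```python
import random
import statistics

def sharpe(pnl):
    """Mean over standard deviation of per-period P&L (unannualized)."""
    sd = statistics.pstdev(pnl)
    return statistics.fmean(pnl) / sd if sd > 0 else 0.0

def bogosity(positions, returns, n_random=1000, seed=0):
    """Fraction of shuffled strategies that beat the candidate in return and in Sharpe."""
    rng = random.Random(seed)
    pnl = [p * r for p, r in zip(positions, returns)]
    cand_return, cand_sharpe = sum(pnl), sharpe(pnl)
    beat_return = beat_sharpe = 0
    shuffled = list(positions)
    for _ in range(n_random):
        rng.shuffle(shuffled)  # same mix of positions, random timing
        rand_pnl = [p * r for p, r in zip(shuffled, returns)]
        beat_return += sum(rand_pnl) > cand_return
        beat_sharpe += sharpe(rand_pnl) > cand_sharpe
    return beat_return / n_random, beat_sharpe / n_random

# Made-up daily returns and a candidate that happens to call every day correctly.
returns = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
positions = [1, -1, 1, -1, 1, -1]
print(bogosity(positions, returns))
```

If a sizeable share of the random strategies beat the candidate, as with the 12% and 29% on the slide, its backtest performance is weak evidence of real skill.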


Some Strategies that ran in February


Takeaways

1. Be rigorous about avoiding time-travel and fooling yourself

2. Learn/discover the right features

3. Use randomization to find the right complexity of model

4. The hybrid person/machine solution works best

5. Look where everyone else isn’t looking


Questions?