Transcript
Page 1: Haefele june27 1150am_room212_v2

Deep Value

Hadoop Summit June 2013

Deep Value, Inc.

Page 2: Haefele june27 1150am_room212_v2

Outline of talk

• Who are we

• What do we do

• What is HFT

• What is the structure of our technology effort

• How we use Hadoop

• Focus on what we've built at top level and lessons learned

• Next steps? Open source with founding team

Page 3: Haefele june27 1150am_room212_v2

Deep Value

• Started in 2006 to provide high-performance execution algorithms on a “paid for performance” basis.

• Execution algorithms take large client orders and split them into small pieces to execute through the day

• Routinely trade 0.5 – 1% of US stock market volume. The highest day in 2012 was ~4%; the highest so far this year is ~3%

• Provide exchange-sponsored execution algorithms to NYSE floor brokers.

• 45 people based in the US and India
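The order-splitting idea on this slide can be sketched as follows. This is a minimal illustration that assumes equal-sized child orders at evenly spaced times; the function name `slice_order` and its parameters are hypothetical and do not represent Deep Value's actual algorithms, which the talk does not describe in detail.

```python
from datetime import datetime, timedelta


def slice_order(total_shares, start, end, n_slices):
    """Split a parent order into n_slices child orders spread evenly over [start, end).

    Hypothetical helper for illustration only: real execution algorithms
    adapt slice sizes and timing to market conditions.
    """
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    if total_shares < 0:
        raise ValueError("total_shares must be non-negative")
    base, remainder = divmod(total_shares, n_slices)
    interval = (end - start) / n_slices
    children = []
    for i in range(n_slices):
        # Spread the leftover shares over the first `remainder` slices
        # so the child quantities always sum to the parent quantity.
        qty = base + (1 if i < remainder else 0)
        children.append((start + i * interval, qty))
    return children


# Example: a 10,000-share parent order worked over a 6.5-hour session.
market_open = datetime(2013, 6, 27, 9, 30)
market_close = market_open + timedelta(hours=6, minutes=30)
schedule = slice_order(10_000, market_open, market_close, 13)
```

Each entry in `schedule` is a `(send_time, quantity)` pair; the quantities always add up to the parent order size.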

Page 4: Haefele june27 1150am_room212_v2

What do we do

• Use sophisticated math and statistics to see patterns in the data and come up with trading tactics

• Use simulation to understand whether trading ideas in fact work.

• Core business is providing tools (algos) to mutual funds and others to avoid being gamed by pure HFT traders

• Ability to harness compute resources is a key determinant of success - Hadoop

• All compute resources are now cluster based and need a grid platform to utilize - Hadoop

Page 5: Haefele june27 1150am_room212_v2

What is HFT?

• Look at every order in the market and make real-time decisions on what to do next

• Looking to receive rebates by providing liquidity when sensible to do so

– Citibank stock was a favourite for many years due to its low price and thus large % spread

• Some amount of “sniffing out” of large orders

• Often a speed game – faster routers, shorter wires, FPGA

• We use smarts to try not to show our hand

Page 6: Haefele june27 1150am_room212_v2

Trading Systems

• Order Management Systems (OMS) / Execution Management Systems (EMS)

• Takes in market data representing every order placed in every market

• Sends out orders to market, manipulates those orders (replace/cancel) and receives fills

– Via a name-value protocol called FIX

• Fills represent actual trades

• Logs what it is doing via structured logging
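To make the "name-value protocol" point concrete, here is a minimal sketch of parsing a FIX message. FIX encodes each field as `tag=value` and separates fields with the SOH character (`\x01`); tag 35 is the message type (`D` is a New Order Single), 55 the symbol, 54 the side, 38 the quantity, 40 the order type and 44 the price. The sample message is illustrative and deliberately incomplete (a real message also carries body length, sender, target, sequence number, sending time and checksum fields), and `parse_fix` is a hypothetical helper that does no validation and ignores repeating groups.

```python
SOH = "\x01"  # FIX field delimiter


def parse_fix(raw):
    """Parse a FIX tag=value string into a {tag_number: value} dict.

    Sketch only: no checksum or body-length validation, and repeated
    tags (repeating groups) would overwrite each other.
    """
    fields = {}
    for part in raw.split(SOH):
        if not part:
            continue  # skip the empty piece after the trailing SOH
        tag, _, value = part.partition("=")
        fields[int(tag)] = value
    return fields


# Illustrative (incomplete) New Order Single: buy 100 IBM, limit 195.50.
sample = SOH.join(
    ["8=FIX.4.2", "35=D", "55=IBM", "54=1", "38=100", "40=2", "44=195.50"]
) + SOH

order = parse_fix(sample)
is_new_order = order[35] == "D"
```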

Page 7: Haefele june27 1150am_room212_v2

Cloe

Page 8: Haefele june27 1150am_room212_v2

Lessons from building grid

• Cluster-wide locks are the problem

– Focus on these in design

– Batch changes and get the lock once

• Build for the performance case, and let the failure case be potentially slower / more complex

– Regular message processing doesn't take cluster locks

• Hybrid of message passing & centralized control
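The "batch changes and get the lock once" lesson can be sketched as below. This is an assumption-laden illustration, not the grid's actual design: a local `threading.Lock` stands in for an expensive cluster-wide lock, and the class name `BatchedUpdater` and its methods are hypothetical. The pending buffer is assumed to be owned by a single thread, so queuing a change never touches the lock.

```python
import threading


class BatchedUpdater:
    """Queue changes locally, then apply them under one lock acquisition."""

    def __init__(self):
        # Stand-in for a cluster-wide lock (assumption for illustration).
        self._cluster_lock = threading.Lock()
        self.lock_acquisitions = 0
        self.shared_state = {}
        self._pending = []  # local, single-owner buffer

    def queue_change(self, key, value):
        # Fast path: regular processing does not take the lock.
        self._pending.append((key, value))

    def flush(self):
        """Apply all queued changes while holding the lock once."""
        if not self._pending:
            return 0
        batch, self._pending = self._pending, []
        with self._cluster_lock:
            self.lock_acquisitions += 1
            for key, value in batch:
                self.shared_state[key] = value
        return len(batch)


updater = BatchedUpdater()
for i in range(100):
    updater.queue_change(f"order-{i}", "ACK")
applied = updater.flush()  # one lock acquisition for 100 changes
```

Taking the lock per change would have cost 100 acquisitions here; batching pays that cost once per flush.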

Page 9: Haefele june27 1150am_room212_v2

Questions to solve: Hadoop

• What is the algorithm actually doing?

– Complexity e.g. feedback loops

– Testing against intentions

• Can we do better next time?

– Back-testing

– Improved research process

• Log and historical market data management

Page 10: Haefele june27 1150am_room212_v2

DV Research Process

• Want to be able to look at “raw” market data to prove ideas

– Typically non-programmers with statistical background

– R-project including R-Hadoop

• Want to be able to make a change to production code and test whether it works better via simulation

– Does it work better, how, when?

• Roll out code to production easily

Page 11: Haefele june27 1150am_room212_v2

Hadoop-ifying Cloe

• Realized we could run Cloe under Hadoop

• Drive “orders” into Cloe via Hadoop

• Pass in market data quote files via HBase

• Store simulation results in Hadoop/HBase

• Market Simulation Framework outputs fills

• Cascading to allow complex analysis by senior coders
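As a rough picture of what "orders in, quotes in, fills out" means in a simulation, here is a toy sketch. It is not the Market Simulation Framework from the talk: plain Python lists stand in for the Hadoop-driven orders and the HBase quote files, the names `Quote`, `Order`, `Fill` and `simulate` are hypothetical, and the matching rule (a limit order fills in full at the first quote that crosses its limit) ignores size, queue position and market impact.

```python
from collections import namedtuple

Quote = namedtuple("Quote", "ts bid ask")
Order = namedtuple("Order", "order_id side limit_price qty")
Fill = namedtuple("Fill", "order_id ts price qty")


def simulate(orders, quotes):
    """Replay quotes in order and emit a fill when a limit order is crossed."""
    fills = []
    open_orders = list(orders)
    for q in quotes:
        still_open = []
        for o in open_orders:
            if o.side == "BUY" and q.ask <= o.limit_price:
                fills.append(Fill(o.order_id, q.ts, q.ask, o.qty))
            elif o.side == "SELL" and q.bid >= o.limit_price:
                fills.append(Fill(o.order_id, q.ts, q.bid, o.qty))
            else:
                still_open.append(o)  # not marketable yet, keep working it
        open_orders = still_open
    return fills


quotes = [Quote(1, 10.00, 10.02), Quote(2, 9.98, 10.00), Quote(3, 10.03, 10.05)]
orders = [Order("A", "BUY", 10.00, 100), Order("B", "SELL", 10.03, 50)]
fills = simulate(orders, quotes)
```

The resulting `fills` list is the kind of output that could then be stored and analysed downstream.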

Page 12: Haefele june27 1150am_room212_v2
Page 13: Haefele june27 1150am_room212_v2

Lessons learned - Hadoop

• EC2 costs can mount quickly

– Had a hybrid plan (either our own hardware or EC2)

– Built our own 50-node cluster. See DV blog.

• Smaller files should go in HBase rather than directly in Hadoop, which has a NameNode limitation

– All file pointers are held in memory

• Different tasks with different resource requirements don't play nicely in a single cluster

– YARN should solve this.
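The NameNode limitation is easy to see with back-of-the-envelope arithmetic. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (each file and each block counts as one object); that figure is an approximation, not a number from the talk, and `estimate_namenode_bytes` is a hypothetical helper.

```python
def estimate_namenode_bytes(n_files, blocks_per_file=1, bytes_per_object=150):
    """Rough NameNode heap estimate: one object per file plus one per block.

    bytes_per_object=150 is a widely cited rule of thumb (assumption),
    and directories are ignored for simplicity.
    """
    objects = n_files * (1 + blocks_per_file)
    return objects * bytes_per_object


# Ten million small files, each fitting in a single block.
heap_bytes = estimate_namenode_bytes(10_000_000)
heap_gb = heap_bytes / 1e9  # about 3 GB of heap just for file metadata
```

The cost depends on the number of files, not their size, which is why many small files hurt more than a few large ones.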

Page 14: Haefele june27 1150am_room212_v2

Lessons learned – Hadoop...

• Make developer machine setup turn-key

– We use extensive scripting to make getting a dev environment running a one-step process

– Dev environment is kept close to the cluster environment

• Cascading is great for complex analysis

• Importance of cluster configuration

– Memory, threads, cores for your jobs

Page 15: Haefele june27 1150am_room212_v2

Next steps

• Considering open-sourcing via Apache license

• Bring some sanity to traditional execution technology space

• Looking for a founding team

• Please talk to me afterward if you're interested in investigating further

Page 16: Haefele june27 1150am_room212_v2

End