Complex Er[jl]ang Processing with StreamBase

Darach Ennis ([email protected]) @darachennis, StreamBase Systems. Erlang Factory London, June 10th 2011. Complex Er[jl]ang Processing with StreamBase: A DSL for Low Latency High Frequency Computing. Copyright 2011 © StreamBase Systems, Inc.

description

Slides from my Erlang Factory London 2011 presentation yesterday.

Transcript of Complex Er[jl]ang Processing with StreamBase

Page 1: Complex Er[jl]ang Processing with StreamBase

Copyright 2011 © StreamBase Systems, Inc.

Darach Ennis ([email protected]) @darachennis, StreamBase Systems

Erlang Factory London, June 10th 2011

Complex Er[jl]ang Processing with StreamBase: A DSL for Low Latency High Frequency Computing

Page 2: Complex Er[jl]ang Processing with StreamBase

Agenda

▪ What is ‘Complex Event Processing’
  • Specifically flow oriented event processing (there are others)
  • Streams & Operators. Windowing, Branching, Combining, Extending
▪ A day in the life of a flow programmer
  • Relativity - Data parallelism, concurrency, latency & throughput
  • Continuity - Continuous Streaming Map Reduce
  • Reliability - High availability, the low latency way
  • Flow, meets Function. Embed Erlang in process via Erjang
▪ Integration. Erlang - the ecosystem.
  • Calling Erlang from StreamBase - Simple & windowed functions
  • Client/Server - Pushing events to/from StreamBase
  • RabbitMQ - Messaging
▪ Theft. Erlang - the inspiration. Paxos, in StreamBase

Page 3: Complex Er[jl]ang Processing with StreamBase

High Level DSLs: Myth vs Reality

Myth: High level domain specific languages are too slow for HFT.

Reality: High level domain specific languages can deliver better performance than system programming languages when tailored to a specific task.

Page 4: Complex Er[jl]ang Processing with StreamBase

Complex Event Processing aka Event Processing

▪ Software organized by events (cf: object/function oriented)
  • What’s an event? What’s an object?
  • Something that can trigger processing, can include data.
  • Naturally, but not usually, represents “real world” events & observations.
▪ Complex Event Processing Platforms
  • Software stack for event based systems, event driven architectures
  • Event Programming Language - SQL-based, Rules-based, or State-based
  • Commercial and open source: StreamBase, Progress, Microsoft, IBM, Oracle, SAP, Esper, Drools and many more
▪ Adopted in financial services and other markets
  • System monitoring, industrial process control, logistics, defense/intelligence
▪ Other Event Processing Approaches:
  • Erlang, Scala/Akka, Actors, node.js, .NET Rx

Page 5: Complex Er[jl]ang Processing with StreamBase

What does a CEP DSL or Language offer?

▪ Continuously Observe, Orient, Decide, Act (OODA) on event streams
  • Continuous Incremental Query
  • Pattern matching within or across streams
  • Branch - Split, Causal Split, Filter.
  • Combine - Semi-Join, Union, Gather, Merge, Join, Pattern
  • Windows - Process sets of streaming data
    • Sliding or Tumbling, Overlapping or Non-Overlapping, Gaps or No Gaps
    • Finite (1 second, 1000 tuples), Infinite
    • Emission Policies: On Close, Every odd message
    • Predicate based - Roll your own window type
  • State Management - In memory, CSV files, CSV sockets, RDBMS, Parallel DBMSs, Column Stores, KV stores, NoSQL, NewSQL…
  • Nice to have:
    • Declarative concurrency, Interface Polymorphism, Distribution, Extensible

Wikipedia: John Boyd, OODA Loop

Page 6: Complex Er[jl]ang Processing with StreamBase

Challenges for CEP

▪ ‘Über’ Ultra Low Latency?
  • Sub-milli is standard, sub-100-micro is desirable. Less is more!
▪ Large Data Volumes
  • Hundreds of thousands of events, thousands of decisions, per thread.
  • Big Data. ~Hundreds of SMP CEP nodes.
▪ Demanding Operational Environment
  • 24x7, 365 - in critical environments (trading, surveillance, utilities)
▪ Sophisticated Data Processing (sometimes)
  • Options pricing, yield curves, risk metrics, smart grid capacity planning, fraud detection.
▪ How it’s done (QCON London 2011):
  • How LMAX did it? http://bit.ly/fUeS0P
  • How we did it? http://bit.ly/hM6NAP <- Our CTO Richard Tibbetts' talk

Page 7: Complex Er[jl]ang Processing with StreamBase

StreamBase Event Processing Platform

[Architecture diagram. Labels: Studio Integrated Development Environment; Applications; Adapters, StreamBase Server, Adapters; Visualization; StreamBase Component Exchange; StreamBase Frameworks]

Input Adapter(s): Inject streaming (market data) and static (reference data) sources.

Event Processing Server: High performance optimized engine can process events at market data speeds.

Output Adapter(s): Send results to systems, users, user screens and databases.

Developer Studio: Graphical StreamSQL for developing, back testing and deploying applications.

Page 8: Complex Er[jl]ang Processing with StreamBase

How did we do it?

▪ Compilation and Static Analysis
  • Design the language for it
▪ Modular abstraction, interfaces
  • Quants and Developers Collaborate, share code
▪ Bytecode generation and the Janino compiler
  • Optimized bytecodes, in-memory generation
▪ Garbage optimization
  • Pooling, data class, invasive collections
▪ Integrations, C++ and Java plugins
  • Efficient native interfaces, Hardware acceleration
▪ Adapter API, FIX Messaging
  • Threading and API structure for ultra low latency
▪ Parallelism, Clustering, Lanes and Tiers
  • Scalability, with a latency bias.
▪ Modularity through Named Data Formats, Schemas
  • Sharing data and semantics between apps

Page 9: Complex Er[jl]ang Processing with StreamBase

StreamBase StreamSQL EventFlow

Off The Shelf Business Logic

Rapid Deployment & Unit Testing

Off The Shelf Connectivity

Modularity & Polymorphism

Interfaces & Extension Points

Page 10: Complex Er[jl]ang Processing with StreamBase

Operators - Hi Erlang. Hello StreamBase

Did you just tell me to go flow myself?

Page 11: Complex Er[jl]ang Processing with StreamBase

Agenda

▪ What is ‘Complex Event Processing’
  • Specifically flow oriented event processing (there are others)
  • Streams & Operators. Windowing, Branching, Combining, Extending
▪ A day in the life of a flow programmer
  • Relativity - Data parallelism, concurrency, latency & throughput
  • Continuity - Continuous Streaming Map Reduce
  • Reliability - High availability, the low latency way
  • Flow, meets Function. Embed Erlang in process via Erjang
▪ Integration. Erlang - the ecosystem.
  • Calling Erlang from StreamBase - Simple & windowed functions
  • Client/Server - Pushing events to/from StreamBase
  • RabbitMQ - Messaging
▪ Theft. Erlang - the inspiration. Paxos, in StreamBase

Page 12: Complex Er[jl]ang Processing with StreamBase

A day in the life… Why a DSL?

▪ High level - Windowing, Combination & Pattern Matching Streams
▪ Graphical - ‘See’ the flow, dependencies, pathways
▪ Fast, Flexible SDLC - Deploy new algorithms, continuously
▪ Understandable - Rise to the abstraction
▪ Flexible

Page 13: Complex Er[jl]ang Processing with StreamBase

A day in the life.. More is more!

Page 14: Complex Er[jl]ang Processing with StreamBase

Make it work. Measure it. - Baseline ‘Noop’ Performance

WARNING: Micro-benchmark. YMMV

Page 15: Complex Er[jl]ang Processing with StreamBase

Concurrency - Baseline - Results

Nice. 1.1 million events/sec

Page 16: Complex Er[jl]ang Processing with StreamBase

Parallelize it - The ‘Convenient, but incorrect’ way

Execution Order & Concurrency:

Rule 1 - Each event processed to completion left-right
Rule 2 - Branches processed sequentially
Rule 3 - Outputs are processed sequentially
Rule 4 - Module output processed to completion immediately
Rule 5 - One operator executed at a time

Page 17: Complex Er[jl]ang Processing with StreamBase

Gah! WTF?!

Page 18: Complex Er[jl]ang Processing with StreamBase

Optimize it! Observation #1

Measure the cost of thread context switches. Dodge & Burn!

Page 19: Complex Er[jl]ang Processing with StreamBase

Optimize it! Observation #2 - It’s Map/Combine/Reduce

Page 20: Complex Er[jl]ang Processing with StreamBase

CSMR - A ‘pattern’ for low latency high throughput?

Page 21: Complex Er[jl]ang Processing with StreamBase

CSMR - A ‘pattern’ for low latency high throughput? Yup!

[Bar chart: Throughput (Mtps), millions of tuples / second, for Gather, Merge, Branch & Join, Filter Even Random and Baseline (Noop); series: Across Region, Within Region. Data labels as extracted: 4, 8, 4, 4, 10, 8, 20, 9.4, 18, 20.]

Benchmark HW: Dell RS510, 8 core, 32 GB RAM, 2.4GHz IA-64, OS: RHEL 5u3, SB 7.0.1.2

Page 22: Complex Er[jl]ang Processing with StreamBase

CSMR + GPUs / FPGAs? Sure!

We can build on SMP and cluster-wide distribution algorithms to further optimize interactions with accelerated compute technologies. We can exploit accelerated hardware messaging where regular network IO is insufficient for distributing results over 1GbE or other regular networking technologies. The pattern above is a continuous streaming variant of Map/Reduce in EventFlow.
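The Map/Combine/Reduce shape described on these slides can be sketched outside EventFlow. The following is a minimal single-threaded illustration in plain Python, not StreamBase code: the event shape, the `lane_for` router and the `csmr` generator are all hypothetical names chosen for this example. Events are routed to lanes by key (map), each lane keeps an incremental partial aggregate (combine), and a merged view is emitted after every event (reduce).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, float]  # (symbol, price): a hypothetical event shape


def lane_for(symbol: str, lanes: int) -> int:
    """Map step: deterministically route a key to one of `lanes` partitions."""
    return sum(symbol.encode()) % lanes


def csmr(events: Iterable[Event], lanes: int = 4) -> Iterator[Dict[str, float]]:
    """Continuously emit the running per-symbol total after every event.

    Each lane owns a partial aggregate for its keys (combine); the reduce
    step merges the lane partials into one view. Lanes are simulated on a
    single thread here; a real deployment would run them concurrently.
    """
    partials = [defaultdict(float) for _ in range(lanes)]
    for symbol, price in events:
        lane = partials[lane_for(symbol, lanes)]
        lane[symbol] += price  # combine: incremental update, no rescan
        merged: Dict[str, float] = {}
        for partial in partials:  # reduce: lanes hold disjoint keys
            merged.update(partial)
        yield merged


ticks = [("IBM", 10.0), ("MSFT", 5.0), ("IBM", 2.5)]
results = list(csmr(ticks))
print(results[-1])
```

Because every key always lands in the same lane, the partials never overlap and the reduce step is a plain merge. That property is what lets the lanes run without coordinating with each other.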

Page 23: Complex Er[jl]ang Processing with StreamBase

Similar Optimizations apply to GPUs too!

Page 24: Complex Er[jl]ang Processing with StreamBase

A day in the life.. Reliability

Page 25: Complex Er[jl]ang Processing with StreamBase

A day in the life.. Reliability

▪ Hot/Hot, Hot/Warm, Hot/Cold, None?!

Page 26: Complex Er[jl]ang Processing with StreamBase

A day in the life.. Reliability

Page 27: Complex Er[jl]ang Processing with StreamBase

A day in the life.. Reliable CSMR

Page 28: Complex Er[jl]ang Processing with StreamBase

Agenda

▪ What is ‘Complex Event Processing’
  • Specifically flow oriented event processing (there are others)
  • Streams & Operators. Windowing, Branching, Combining, Extending
▪ A day in the life of a flow programmer
  • Relativity - Data parallelism, concurrency, latency & throughput
  • Continuity - Continuous Streaming Map Reduce
  • Reliability - High availability, the low latency way
▪ Integration. Erlang - the ecosystem.
  • Calling Erlang from StreamBase - Simple & windowed functions
  • Client/Server - Pushing events to/from StreamBase
  • RabbitMQ - Messaging
▪ Theft. Erlang - the inspiration. Paxos, in StreamBase

Page 29: Complex Er[jl]ang Processing with StreamBase

Operators - Hi Erlang. Hello StreamBase

▪ Aggregate - A window of moving data
  • Time based - eg: the last 10 seconds, .001 seconds, day
  • Field based - eg: historical time (timestamp, …)
  • Tuple based - eg: the last 1000 tuples
  • Predicate based - expressions determine when to open, close and/or emit interesting results from a window
▪ Multi-dimensional
  • Give me the last second's worth of events or the last 10000, whichever happens first
▪ Grouping
  • Partition by Symbol implies a window per Symbol ‘concurrently - on the same thread’
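The multi-dimensional and grouping ideas on this slide can be illustrated with a short sketch. This is plain Python under stated assumptions, not the StreamBase Aggregate operator: the tick shape `(timestamp_seconds, symbol, price)` and the name `tumbling_windows` are hypothetical. It keeps one tumbling window per symbol on a single thread and closes a window on whichever dimension is hit first, elapsed time or tuple count.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Tick = Tuple[float, str, float]  # (timestamp_seconds, symbol, price)


def tumbling_windows(
    ticks: Iterable[Tick], max_seconds: float = 1.0, max_tuples: int = 3
) -> Iterator[Tuple[str, List[float]]]:
    """Yield (symbol, prices) whenever a per-symbol window closes.

    A window closes when it holds `max_tuples` events, or when an arriving
    event is `max_seconds` or more after the event that opened the window,
    whichever happens first. Time is only checked on arrival (no timer).
    """
    open_windows: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for ts, symbol, price in ticks:
        window = open_windows[symbol]  # grouping: one window per symbol
        if window and ts - window[0][0] >= max_seconds:
            yield symbol, [p for _, p in window]  # time dimension closed it
            window.clear()
        window.append((ts, price))
        if len(window) == max_tuples:
            yield symbol, [p for _, p in window]  # tuple dimension closed it
            window.clear()


sample = [
    (0.0, "A", 1.0), (0.1, "B", 9.0), (0.2, "A", 2.0),
    (0.3, "A", 3.0), (1.5, "B", 8.0), (1.6, "A", 4.0),
]
for symbol, prices in tumbling_windows(sample):
    print(symbol, prices)
```

Emitting on close is only one of the emission policies mentioned earlier in the deck; a sliding window would keep overlapping state instead of clearing it.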

Page 30: Complex Er[jl]ang Processing with StreamBase

Windows in the Wild - #1 Riak

▪ Riak Core: https://github.com/basho/riak_core
  • Mixes the ‘when’ window dimension (time) with the ‘what’ (aggregation)

Page 31: Complex Er[jl]ang Processing with StreamBase

Windows in the Wild - #2 tlack on BitBucket

▪ Thomas Lackner - tlack - EMA:
  • https://bitbucket.org/tlack/erlang-exponential-moving-average/overview
  • Mixes the ‘when’ window dimension (time) with the ‘what’ (aggregation) and the ‘how many’ (number of occurrences)
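For readers unfamiliar with the aggregation itself, here is a minimal sketch of an exponential moving average written as a pure function over a stream. It is the textbook recurrence, not the code from the repository linked above; the name `ema` and the choice to seed with the first value are assumptions made for this example. Keeping the aggregation free of any timing logic is one way to separate the 'what' from the 'when'.

```python
from typing import Iterable, Iterator


def ema(values: Iterable[float], alpha: float) -> Iterator[float]:
    """Yield the exponential moving average after each value.

    Uses ema_t = alpha * x_t + (1 - alpha) * ema_(t-1), seeded with the
    first value. `alpha` must lie in (0, 1]; larger means less smoothing.
    """
    average = None
    for x in values:
        average = x if average is None else alpha * x + (1 - alpha) * average
        yield average


print(list(ema([10.0, 20.0, 30.0], alpha=0.5)))
```

A windowing layer can then decide when to feed values in and when to emit, without the aggregation knowing about time or counts.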

Page 32: Complex Er[jl]ang Processing with StreamBase

Embedding Erlang (Erjang)

Prepend the dir defining our behaviors to the code path

Page 33: Complex Er[jl]ang Processing with StreamBase

Define Behavioral Contracts

Page 34: Complex Er[jl]ang Processing with StreamBase

Implement & Test in Erlang (or Erjang!)

Duh! Typo!

Page 35: Complex Er[jl]ang Processing with StreamBase

Expose to StreamBase [Call by Behaviour/Mod] #1

Page 36: Complex Er[jl]ang Processing with StreamBase

Expose To StreamBase [Call by Behaviour/Mod] #2

Page 37: Complex Er[jl]ang Processing with StreamBase

Use, Deploy & Run

Example: Continuous Streaming Map/Reduce with 1-second MACD compression. Linearly SMP scalable. Just add boxes to scale!

Wikipedia: MACD
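As a reminder of what the MACD calculation involves, the sketch below computes it the conventional way: a fast EMA minus a slow EMA, with a signal line that is an EMA of that difference. It is plain Python rather than the Erlang or EventFlow code from the talk. The 12/26/9 periods are the commonly used defaults, and the input is assumed to be already compressed to one price per second. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ema_series(values: Sequence[float], period: int) -> List[float]:
    """EMA with smoothing factor 2 / (period + 1), seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out: List[float] = []
    average = None
    for x in values:
        average = x if average is None else alpha * x + (1 - alpha) * average
        out.append(average)
    return out


def macd(
    prices: Sequence[float], fast: int = 12, slow: int = 26, signal: int = 9
) -> Tuple[List[float], List[float], List[float]]:
    """Return (macd_line, signal_line, histogram) for a price series."""
    fast_ema = ema_series(prices, fast)
    slow_ema = ema_series(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema_series(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# One price per second (assumed pre-compressed), steadily rising.
line, sig, hist = macd([float(p) for p in range(1, 51)])
print(round(line[-1], 3), round(sig[-1], 3))
```

Each symbol's MACD depends only on that symbol's prices, so the calculation partitions cleanly by key. That is the property the slide's scaling claim relies on.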

Page 38: Complex Er[jl]ang Processing with StreamBase

Run, Rabbit, Run, Rabbit, …

Page 39: Complex Er[jl]ang Processing with StreamBase

Agenda

▪ What is ‘Complex Event Processing’
  • Specifically flow oriented event processing (there are others)
  • Streams & Operators. Windowing, Branching, Combining, Extending
▪ A day in the life of a flow programmer
  • Relativity - Data parallelism, concurrency, latency & throughput
  • Continuity - Continuous Streaming Map Reduce
  • Reliability - High availability, the low latency way
▪ Integration. Erlang - the ecosystem.
  • Calling Erlang from StreamBase - Simple & windowed functions
  • Client/Server - Pushing events to/from StreamBase
  • RabbitMQ - Messaging
▪ Theft. Erlang - the inspiration. Paxos, in StreamBase

Page 40: Complex Er[jl]ang Processing with StreamBase

SB Paxos - Entrypoint (cc @kevsmith!)

Page 41: Complex Er[jl]ang Processing with StreamBase

SB Paxos - Autonomic, Self Healing Ring

Page 42: Complex Er[jl]ang Processing with StreamBase

SB Paxos - Generic FSM ‘Behaviour’

Page 43: Complex Er[jl]ang Processing with StreamBase

SB Paxos - Generic FSM ‘Behaviour’ - Paxos FSM

StateMap,State,Transition,NextState,Action,Description,
BasicPaxosMap,Initial,Bootstrap,Proposer,Prepare,"..."
BasicPaxosMap,Initial,PromiseOk,Accepter,PromiseOk,"..."
BasicPaxosMap,Initial,PromiseNotOk,Done,PromiseNotOk,"..."
BasicPaxosMap,Initial,Accepted,Learner,Response,"When ..."
BasicPaxosMap,Proposer,Accept,Proposer,Accept,"..."
BasicPaxosMap,Proposer,PromiseNotOk,Proposer,DoNothing,"..."
BasicPaxosMap,Proposer,Response,Done,Gone,DoStop,"..."
BasicPaxosMap,Accepter,Accept,Learner,Accept,"..."
BasicPaxosMap,Accepter,Accepted,Learner,Response,"..."
BasicPaxosMap,Learner,Accepted,Done,Response,"..."
BasicPaxosMap,Learner,Done,Gone,DoStop,"..."
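A table like the one on this slide can drive a generic state machine: the engine knows nothing about Paxos and simply looks up (state map, state, transition) to find the next state and the action to run. The sketch below shows that idea in plain Python. It is not the StreamBase module from the talk and it does not implement Paxos. The `Fsm` class and `load_table` helper are hypothetical, and only a handful of well-formed rows from the slide are used, with the Description column omitted.

```python
import csv
import io
from typing import Dict, Tuple

# A subset of the slide's rows, without the Description column.
STATE_MAP_CSV = """StateMap,State,Transition,NextState,Action
BasicPaxosMap,Initial,Bootstrap,Proposer,Prepare
BasicPaxosMap,Proposer,Accept,Proposer,Accept
BasicPaxosMap,Accepter,Accept,Learner,Accept
BasicPaxosMap,Learner,Accepted,Done,Response
BasicPaxosMap,Learner,Done,Gone,DoStop
"""

Key = Tuple[str, str, str]  # (state map, current state, transition)


def load_table(text: str) -> Dict[Key, Tuple[str, str]]:
    """Parse CSV rows into {(map, state, transition): (next_state, action)}."""
    table: Dict[Key, Tuple[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["StateMap"], row["State"], row["Transition"])
        table[key] = (row["NextState"], row["Action"])
    return table


class Fsm:
    """Generic table-driven state machine; behaviour lives in the table."""

    def __init__(self, table: Dict[Key, Tuple[str, str]], state_map: str,
                 state: str = "Initial") -> None:
        self.table = table
        self.state_map = state_map
        self.state = state

    def fire(self, transition: str) -> str:
        """Apply a transition, update the state, and return the action name."""
        key = (self.state_map, self.state, transition)
        if key not in self.table:
            raise ValueError(f"no transition {transition!r} from {self.state!r}")
        self.state, action = self.table[key]
        return action


fsm = Fsm(load_table(STATE_MAP_CSV), "BasicPaxosMap")
print(fsm.fire("Bootstrap"), fsm.state)
```

Keeping transitions as data means a new protocol is a new table rather than new engine code, which is similar in spirit to how an Erlang behaviour separates the generic part from the callback module.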

Page 44: Complex Er[jl]ang Processing with StreamBase

Shameless Plugs

▪ StreamBase
  • You could build one of these yourself, or use ours…
  • Download and test out the full product http://www.streambase.com
  • Build something and submit to the StreamBase Component Exchange http://sbx.streambase.com
  • Contact us to buy or to become an OEM partner; offices in London, Boston, New York
  • We’re hiring
  • We’re training
    • http://www.streambase.com/developers-training-events.htm
▪ DEBS - Distributed Event Based Systems
  • Academic (ACM) Conference outside NYC in July
▪ EPTS - Event Processing Technology Society
  • http://ep-ts.org industry consortium

Questions?

Page 45: Complex Er[jl]ang Processing with StreamBase

Acknowledgements

▪ Erlang, Erlang Solutions, Erlang UG London, & Erlang Factory
  • http://www.erlang.org/
  • http://www.erlang-solutions.com/
  • http://www.erlang-solutions.com/etc/usergroup/london
  • http://www.erlang-factory.com/
▪ Erjang - The Java based Erlang Virtual Machine
  • https://github.com/trifork/erjang/wiki/
▪ erlIDE
  • http://erlide.sourceforge.net/
▪ @tibbetts - I stole borrowed some of his slides!
▪ Download StreamBase and tell us what you think:
  • http://www.streambase.com

Page 46: Complex Er[jl]ang Processing with StreamBase

Download StreamBase and More Information http://www.streambase.com

Questions?