Streamy, Pipy, Analyticy

Copyright Push Technology 2012 LNUG London January 2013

Node.js Streams & Pipes revised for analytics

Transcript of Streamy, Pipy, Analyticy

Page 1: Streamy, Pipy, Analyticy

LNUG London, January 2013

Page 2: Streamy, Pipy, Analyticy

About me?

• Distributed Systems / HPC guy.
• Chief Scientist at Push Technology
• Responds to: Guinness, Whisky
• Twitter: @darachennis

Page 3: Streamy, Pipy, Analyticy

Streamy Pipy Analyticy

Page 4: Streamy, Pipy, Analyticy

EEP + ‘Streams & Pipes’ = CEP

• An experiment in Embedded Event Processing
• Sliding, Tumbling, Monotonic and Periodic windows
• Separate ‘window’ definition from operation
• Aggregate functions. Window of data produces scalar result
• But? No filtering, branching or combinators, no flows …
• That’s a job for Streams & Pipes. Let’s add that.

eep.js: Functional Operations on Streaming Data Windows

[Figure: diagram with nodes labelled S, C and Q and several windows labelled w]

Page 5: Streamy, Pipy, Analyticy

Windows  

Page 6: Streamy, Pipy, Analyticy

Windows + Aggregate Functions

• A window of data is a slice of data over time, number of events or some other dimension
• An aggregate function is something you do in the context of a window.

What is this?
• Average – Aggregate Function
• CPU – Data (events)
• On a second by second basis - Periodic time window

Example
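The "average CPU" example can be sketched as a small aggregate function object. This is only an illustration: the `init`/`accumulate`/`emit` method names and the `average` helper are assumptions made for this sketch and are not taken from the eep.js API.

```javascript
// Hypothetical aggregate function: the average of the values in one window.
// The init/accumulate/emit names are illustrative, not the eep.js API.
function average() {
  let sum = 0;
  let count = 0;
  return {
    init() { sum = 0; count = 0; },                     // a window opens
    accumulate(value) { sum += value; count += 1; },    // one event arrives
    emit() { return count === 0 ? NaN : sum / count; }  // the window closes
  };
}

// Average CPU load (%) over the samples collected in one window.
const avg = average();
avg.init();
[40, 60, 80].forEach((sample) => avg.accumulate(sample));
const cpuAverage = avg.emit();
```

The window decides when `init` and `emit` are called; the aggregate only knows how to fold events into a scalar.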

Page 7: Streamy, Pipy, Analyticy

Tumbling Windows

• Every N events, give me an average of the last N events
• Does not overlap windows
• ‘Closing’ a window, ‘Emits’ a result (the average)
• Closing a window, Opens a new window

What is a tumbling window?

[Figure: three tumbling windows on a timeline t0 … t11; each window of four events calls init() when it opens, x() once per event, and emit() when it closes]
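A minimal sketch of a count-based tumbling window, assuming an aggregate object with `init`, `accumulate` and `emit` methods. The `tumbling` and `sum` helpers are hypothetical names for this illustration, not the eep.js API.

```javascript
// Hypothetical count-based tumbling window; not the eep.js API.
// `aggregate` must provide init(), accumulate(value) and emit().
function tumbling(aggregate, size, onEmit) {
  let seen = 0;
  aggregate.init();                 // open the first window
  return function push(value) {
    aggregate.accumulate(value);
    seen += 1;
    if (seen === size) {            // window is full: close it
      onEmit(aggregate.emit());     // closing emits a result
      aggregate.init();             // and opens a new window
      seen = 0;
    }
  };
}

// A simple sum aggregate used for the demonstration.
function sum() {
  let total = 0;
  return {
    init() { total = 0; },
    accumulate(v) { total += v; },
    emit() { return total; }
  };
}

const tumblingResults = [];
const pushTumbling = tumbling(sum(), 4, (r) => tumblingResults.push(r));
for (let i = 1; i <= 9; i += 1) pushTumbling(i);
```

With a window size of 4, events 1 to 8 produce two results and the ninth event sits in a window that has not closed yet.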

Page 8: Streamy, Pipy, Analyticy

Sliding Windows

• Like tumbling, except can overlap.
• But typically O(N²), Keep N small. Except EEP.js. O(N) perf.
• Every event opens a new window.
• After N events, every subsequent event emits a result.
• Like all windows, cost of calculation amortized over events

What is a sliding window?

[Figure: overlapping sliding windows on a timeline t0 … t11; each event opens a new window with init(), and open windows receive x() per event]
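A naive sliding window can be sketched as follows. It assumes the first result is emitted once the window holds N events and that every later event emits again; `slidingNaive` and `total` are hypothetical names, not the eep.js API. Each event re-aggregates the whole buffer, which is the costly approach the bullets warn about.

```javascript
// Naive count-based sliding window (illustrative, not the eep.js API).
// Every event re-aggregates the whole window: O(size) work per event.
function slidingNaive(size, aggregateFn, onEmit) {
  const buffer = [];
  return function push(value) {
    buffer.push(value);
    if (buffer.length > size) buffer.shift(); // drop the oldest event
    if (buffer.length === size) onEmit(aggregateFn(buffer));
  };
}

const total = (values) => values.reduce((a, b) => a + b, 0);
const slidingResults = [];
const pushSliding = slidingNaive(4, total, (r) => slidingResults.push(r));
[1, 2, 3, 4, 5, 6].forEach((v) => pushSliding(v));
```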

Page 9: Streamy, Pipy, Analyticy

Periodic Windows

• Driven by ‘wall clock time’ in milliseconds
• Not monotonic, natch. Beware of NTP

What is a periodic window?

[Figure: periodic windows on a timeline t0, t1, t2, t3 …; each period calls init(), x() once per event, and emit() when the period ends]
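A periodic window might look like the sketch below. The `periodic` and `count` helpers and the `tick` method are assumptions for illustration, not the eep.js API. The clock defaults to `Date.now` and can be swapped for a fake clock so the behaviour is reproducible; with the real wall clock, an NTP adjustment can make a period appear shorter or longer than intended.

```javascript
// Hypothetical periodic (wall-clock) window; not the eep.js API.
// `now` defaults to Date.now and is injectable so the sketch can be tested.
function periodic(aggregate, intervalMs, onEmit, now = Date.now) {
  let openedAt = now();
  aggregate.init();
  return {
    push(value) { aggregate.accumulate(value); },
    // Call regularly (e.g. from setInterval); closes the window when due.
    tick() {
      if (now() - openedAt >= intervalMs) {
        onEmit(aggregate.emit());
        aggregate.init();
        openedAt += intervalMs;
      }
    }
  };
}

// Counts the events seen in each period.
function count() {
  let n = 0;
  return { init() { n = 0; }, accumulate() { n += 1; }, emit() { return n; } };
}

let fakeTime = 0;
const periodicResults = [];
const win = periodic(count(), 1000, (r) => periodicResults.push(r), () => fakeTime);
win.push('a');
win.push('b');
fakeTime = 1000; win.tick(); // first second elapsed: window closes
win.push('c');
fakeTime = 1500; win.tick(); // not due yet
fakeTime = 2000; win.tick(); // second window closes
```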

Page 10: Streamy, Pipy, Analyticy

Monotonic Windows

• Driven mad by ‘wall clock time’? Need a logical clock?
• No worries. Provide your own clock! E.g. Vector clock

What is a monotonic window?

[Figure: monotonic windows on a timeline t0, t1, t2, t3 … driven by a user-supplied (‘my’) clock; each window calls init(), x() once per event, and emit()]
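A monotonic window with a caller-supplied clock could be sketched like this. The clock interface (a `now()` method plus an `advance()` helper), the `monotonic` function and the `sumAggregate` helper are all assumptions for illustration, not the eep.js API. A simple counter stands in for the logical clock; a vector clock would need its own comparison logic.

```javascript
// Hypothetical monotonic window driven by a caller-supplied logical clock.
// The clock interface (now/advance) is an assumption, not the eep.js API.
function monotonic(aggregate, interval, onEmit, clock) {
  let openedAt = clock.now();
  aggregate.init();
  return function push(value) {
    if (clock.now() - openedAt >= interval) { // logical time has moved on
      onEmit(aggregate.emit());
      aggregate.init();
      openedAt = clock.now();
    }
    aggregate.accumulate(value);
  };
}

function sumAggregate() {
  let total = 0;
  return {
    init() { total = 0; },
    accumulate(v) { total += v; },
    emit() { return total; }
  };
}

// A counter that only ever moves forward, under the application's control.
const logicalClock = {
  time: 0,
  now() { return this.time; },
  advance() { this.time += 1; }
};

const monoResults = [];
const pushMono = monotonic(sumAggregate(), 2, (r) => monoResults.push(r), logicalClock);
pushMono(1); logicalClock.advance();            // logical time 1
pushMono(2); logicalClock.advance();            // logical time 2
pushMono(5);                                    // closes the first window
logicalClock.advance(); logicalClock.advance(); // logical time 4
pushMono(7);                                    // closes the second window
```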

Page 11: Streamy, Pipy, Analyticy

Slide better with Compensating Aggregates

[Figure: sliding windows on a timeline t0 … t11, annotated with init(), x(), a do { … } while (…) loop and compensate()]
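The idea of a compensating aggregate can be sketched for a sliding sum. Instead of re-summing the window on every event, the value that slides out is subtracted. The `slidingSum` helper and its ring buffer are assumptions for illustration, not the eep.js implementation.

```javascript
// Sliding sum with a compensating step (illustrative, not the eep.js API).
// Instead of re-summing the window, the value that slides out is subtracted.
function slidingSum(size, onEmit) {
  const ring = new Array(size).fill(0); // circular buffer of the last `size` values
  let next = 0;  // slot to overwrite next (holds the oldest value once full)
  let seen = 0;
  let total = 0;
  return function push(value) {
    if (seen >= size) total -= ring[next]; // compensate: remove the evicted value
    ring[next] = value;
    total += value;                        // accumulate the new value
    next = (next + 1) % size;
    seen += 1;
    if (seen >= size) onEmit(total);
  };
}

const compensated = [];
const pushCompensated = slidingSum(4, (r) => compensated.push(r));
[1, 2, 3, 4, 5, 6].forEach((v) => pushCompensated(v));
```

Each event now costs a constant amount of work regardless of window size, at the price of requiring an aggregate whose effect can be undone (a sum can, a maximum cannot without extra state).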

Page 12: Streamy, Pipy, Analyticy

Bad Sliding - O(N²)

Page 13: Streamy, Pipy, Analyticy

Good Sliding

• Takes us from O(N²) to O(N) for Sliding windows

Page 14: Streamy, Pipy, Analyticy

EEP.js  is  fast  

Page 15: Streamy, Pipy, Analyticy

Using  Sliding,  Tumbling  Windows  

Page 16: Streamy, Pipy, Analyticy

Using  Periodic,  Monotonic  Windows  

Page 17: Streamy, Pipy, Analyticy

Custom clocks (notion of time)

Page 18: Streamy, Pipy, Analyticy

EEP.js v0.1, v0.2 were ugly babies.

Sorry! Swear, the next version will be just as functional but pretty…

Page 19: Streamy, Pipy, Analyticy

Streams  &  Pipes  

Page 20: Streamy, Pipy, Analyticy

What about Streams & Pipes?

[Figure: the eep diagram (nodes S, C and Q, windows w) + ????]

Page 21: Streamy, Pipy, Analyticy

Streams & Pipes: Origins

• Do one thing. Do it well
• Compose sophisticated behaviors from simple parts
• Maximize reuse
• Unix, ‘Chain of Responsibility’ (GoF), Interceptor (POSA2), XPipe, Builder, …
• The ‘Assembly Line Principle’ is nothing new

Page 22: Streamy, Pipy, Analyticy

Streams & Pipes: Node.JS

• var events = require('events')
• Publish/Subscribe to event (streams)
• var stream = require('stream')
• Readable – Consume a (finite) set of events
• Writable – Produce a (finite) set of events
• readable.pipe(writeable)
• writeable.pipe(readable)
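A short sketch of both modules using only Node.js core. The event name `price`, the sample values and the variable names are made up for illustration, and `Readable.from` assumes a reasonably recent Node.js release.

```javascript
const { EventEmitter } = require('events');
const { Readable, Writable } = require('stream');

// Publish/subscribe: listeners subscribe by event name, emit() publishes.
const ticker = new EventEmitter();
const prices = [];
ticker.on('price', (p) => prices.push(p));
ticker.emit('price', 101.5); // listeners run synchronously inside emit()
ticker.emit('price', 102);

// Streams: a readable source piped into a writable sink.
const received = [];
const source = Readable.from(['a', 'b']); // finite set of chunks
const sink = new Writable({
  objectMode: true,
  write(chunk, _encoding, done) {
    received.push(chunk);
    done();
  }
});
source.pipe(sink); // chunks arrive asynchronously, after this call returns
```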

Page 23: Streamy, Pipy, Analyticy

Streams & Pipes: streams2

• Transform – Compress, Encrypt, Encode, …
• Duplex – Readable and Writable
• Passthrough – The canonical ‘noop’ transform
• Node.js Streams history (so far) http://bit.ly/XupqkO - by @izs
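A minimal sketch of a Transform and a PassThrough from Node.js core. The upper-casing transform and the sample strings are illustrative choices; the chunks are read back directly with `read()` rather than through a pipe to keep the example short.

```javascript
const { Transform, PassThrough } = require('stream');

// Transform: each chunk written in is rewritten before it can be read out.
const upperCase = new Transform({
  transform(chunk, _encoding, done) {
    done(null, chunk.toString().toUpperCase());
  }
});

// PassThrough: a Transform that forwards chunks unchanged.
const noop = new PassThrough();

upperCase.write('streamy');
noop.write('pipy');

// Both streams are Duplex: written on one side, read from the other.
const transformed = upperCase.read().toString();
const untouched = noop.read().toString();
```

Note that the values travel as Buffers and have to be converted back to strings, which is the copying and parsing overhead a JS-type pipeline would avoid.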

Page 24: Streamy, Pipy, Analyticy

Streams & Pipes: but …

• Oriented for IO, not compute/analytics
• Array-like buffers not individual datums
• @dominictarr event-streams? Array based
• ASCII, UTF-8, Binary - not JS types
• Often require copying, parsing, … (slow)
• So, streams & pipes for JS types? Yes!
• Do one thing. Do it well
• Compose sophisticated behaviors from simple parts
• Maximize reuse

Page 25: Streamy, Pipy, Analyticy

Introducing  Beam.js  

Page 26: Streamy, Pipy, Analyticy

Beams, Pipes

• Streams & Pipes for analytics
• Not designed for IO. Use Streams for that
• Not concerned with CEP.
• … Use EEP for that? :-)
• Not concerned with arrays of things
• … Use Dominic Tarr’s event-stream for that
• Beam
• Crunch events
• Pipeline, Branch & Combine

Page 27: Streamy, Pipy, Analyticy

Beams & Pipes.

• Streams & Pipes, reconsidered for JS types
• var Beam = require('beam');
• Beam.Source -- Push data in
• Beam.Sink -- Suck analysis out
• Beam.Operator -- OODA / PDCA
• Really Simple: ~150 LOC

Page 28: Streamy, Pipy, Analyticy

Beams & Pipes: Operators

• Three types of operator
• Transform – 1 in, 1 out. Output data/type may differ
• Filter – 1 in, 1 or none out. Output data/type same as input
• Custom – May transform, filter
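The three operator kinds can be sketched over plain JS values. This toy is deliberately not the beam.js API: `transform`, `filter`, `dedupe` and `pipeline` are hypothetical helpers written only to show the shape of each operator.

```javascript
// Toy operators over plain JS values; NOT the beam.js API.
// An operator receives a value and a `next` callback for its output.

// Transform: 1 in, 1 out; the output type may differ from the input.
const transform = (fn) => (value, next) => next(fn(value));

// Filter: 1 in, 0 or 1 out; the output is the unchanged input.
const filter = (pred) => (value, next) => { if (pred(value)) next(value); };

// Custom: free to keep state; this one lets each distinct value through once.
function dedupe() {
  const seen = new Set();
  return (value, next) => {
    if (!seen.has(value)) {
      seen.add(value);
      next(value);
    }
  };
}

// Chain operators left to right into a single push function.
function pipeline(operators, sink) {
  return operators.reduceRight(
    (next, op) => (value) => op(value, next),
    sink
  );
}

const out = [];
const push = pipeline(
  [filter((x) => x % 2 === 0), transform((x) => x * x), dedupe()],
  (v) => out.push(v)
);
[1, 2, 2, 3, 4].forEach((v) => push(v));
```

Values stay ordinary numbers from end to end; nothing is serialized into buffers along the way.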

Page 29: Streamy, Pipy, Analyticy

Example: Definitions

Page 30: Streamy, Pipy, Analyticy

Example:  Usage  

Page 31: Streamy, Pipy, Analyticy

Example:  Easy  to  debug  …  

Page 32: Streamy, Pipy, Analyticy

Example:  Streams  &  Beams  

Page 33: Streamy, Pipy, Analyticy

Branch

• You can define 1 or many
• They can overlap or not as you see fit
• It’s just an application of predicate (boolean) filters
• Simple
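Branching as an application of predicate filters can be sketched like this. The `branch` helper and its `{ when, to }` shape are assumptions for illustration, not the beam.js API.

```javascript
// Toy branching; NOT the beam.js API. Each branch is a predicate plus a sink.
function branch(branches) {
  return function push(value) {
    for (const { when, to } of branches) {
      if (when(value)) to(value); // branches may overlap: a value can match several
    }
  };
}

const evens = [];
const big = [];
const pushBranch = branch([
  { when: (x) => x % 2 === 0, to: (x) => evens.push(x) },
  { when: (x) => x > 3, to: (x) => big.push(x) }
]);
[1, 2, 3, 4, 5].forEach((v) => pushBranch(v));
```

The value 4 lands in both branches because the two predicates overlap; making them mutually exclusive is purely a matter of how the predicates are written.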

Page 34: Streamy, Pipy, Analyticy

Combine?

• You can combine many sources or branches into one
• Works like a union. First in, first out.
• You can write your own. It’s just an Operator
• You can branch from, combine to … any beam
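A union-style combine can be sketched in a few lines. The `combine` helper below is hypothetical and not the beam.js API; it hands out one push function per input and forwards everything to a single sink in arrival order.

```javascript
// Toy combine (union); NOT the beam.js API.
// Returns one push function per input; all feed the same sink in arrival order.
function combine(inputCount, sink) {
  return Array.from({ length: inputCount }, () => (value) => sink(value));
}

const merged = [];
const [fromA, fromB] = combine(2, (v) => merged.push(v));
fromA('a1');
fromB('b1');
fromA('a2');
```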

Page 35: Streamy, Pipy, Analyticy

Streams & Pipes, ++

• In Node.js the definition and usage of streams in a pipe are entangled.
• Typically, with Streams & Pipes for IO, you only ever want one.
• In algorithms you may want to reuse.
• Think about it …
• Event Emitter. 1 square … 2 branches?

Page 36: Streamy, Pipy, Analyticy

Pipes ++

• Beam Pipes are different (& really really really simple)
• You can define a filter once
• You can store it in a module
• Store like operations together
• Make libraries
• Use ‘em. Share ‘em.

Page 37: Streamy, Pipy, Analyticy

EEP  based  on  Beam  soon!  

Page 38: Streamy, Pipy, Analyticy

Until then?

• npm install beam
• Filter data events
• Transform data events
• Analyze, crunch all the things
• Branch all the things
• Combine all the things

Page 39: Streamy, Pipy, Analyticy

Beam futures?

• Taps – Convert events into beams
• Drain – Convert beams into events
• Beams
• Write Beam operators in ‘beam’
• Beams ‘inside’ beams
• Source.pipe(op).compile(); // Maybe?

Page 40: Streamy, Pipy, Analyticy

Questions

Page 41: Streamy, Pipy, Analyticy

Questions?

• Thank you for listening to, having me
• Le twitter: @darachennis
• https://github.com/darach/beam-js
• https://github.com/darach/eep-js
• npm install eep
• npm install beam
• EEP built on beam? EEP in other langs? Soon
• Fork it, Port it, Enjoy it!