Fabian Hueske – Juggling with Bits and Bytes

Transcript of Fabian Hueske – Juggling with Bits and Bytes

Page 1: Fabian Hueske – Juggling with Bits and Bytes

Juggling with Bits and Bytes
How Apache Flink operates on binary data

Fabian Hueske

[email protected]  @fhueske

Page 2: Fabian Hueske – Juggling with Bits and Bytes

Big Data frameworks on JVMs

• Many (open source) Big Data frameworks run on JVMs
  – Hadoop, Drill, Spark, Hive, Pig, and ...
  – Flink as well

• Common challenge: How to organize data in-memory?
  – In-memory processing (sorting, joining, aggregating)
  – In-memory caching of intermediate results

• Memory management of a system influences
  – Reliability
  – Resource efficiency, performance & performance predictability
  – Ease of configuration

Page 3: Fabian Hueske – Juggling with Bits and Bytes

The straightforward approach

Store and process data as objects on the heap
• Put objects in an array and sort it

A few notable drawbacks
• Predicting memory consumption is hard
  – If you fail, an OutOfMemoryError will kill you!
• High garbage collection overhead
  – Easily 50% of time spent on GC
• Objects have considerable space overhead
  – At least 8 bytes for each (nested) object! (Depends on arch)

Page 4: Fabian Hueske – Juggling with Bits and Bytes

FLINK’S APPROACH

Page 5: Fabian Hueske – Juggling with Bits and Bytes

Flink adopts DBMS technology

• Allocates a fixed number of memory segments upfront
• Data objects are serialized into memory segments
• DBMS-style algorithms work on binary representation
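The upfront allocation described on this slide can be sketched as a small segment pool. This is a minimal illustration, not Flink's memory manager: the class name SegmentPoolSketch and its methods are hypothetical, and plain byte arrays stand in for memory segments.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size pool of byte[] segments; not Flink's actual memory manager. */
class SegmentPoolSketch {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int total;

    SegmentPoolSketch(int numSegments, int segmentSize) {
        // All memory is allocated once, up front; nothing is allocated later.
        for (int i = 0; i < numSegments; i++) {
            free.push(new byte[segmentSize]);
        }
        this.total = numSegments;
    }

    /** Returns a free segment, or null when the budget is exhausted. */
    byte[] request() {
        return free.poll();
    }

    /** Hands a segment back for reuse instead of leaving it to the garbage collector. */
    void release(byte[] segment) {
        free.push(segment);
    }

    int available() {
        return free.size();
    }

    int used() {
        return total - free.size();
    }
}
```

Because the pool never grows, a caller that receives null knows it has run out of its budget and can react (for example by writing data to disk) rather than running into an OutOfMemoryError.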

Page 6: Fabian Hueske – Juggling with Bits and Bytes

Why is that good?

• Memory-safe execution
  – Used and available memory segments are easy to count
  – No parameter tuning for reliable operations!
• Efficient out-of-core algorithms
  – Memory segments can be efficiently written to disk
• Reduced GC pressure
  – Memory segments are off-heap or never deallocated
  – Data objects are short-lived or reused
• Space-efficient data representation
• Efficient operations on binary data

Page 7: Fabian Hueske – Juggling with Bits and Bytes

What does it cost?

• Significant implementation investment
  – Using java.util.HashMap vs.
  – Implementing a spillable hash table backed by byte arrays and a custom serialization stack
• Other systems use similar techniques
  – Apache Drill, Apache AsterixDB (incubating)
• Apache Spark is evolving in a similar direction

Page 8: Fabian Hueske – Juggling with Bits and Bytes

MEMORY ALLOCATION

Page 9: Fabian Hueske – Juggling with Bits and Bytes

Memory segments

• Unit of memory distribution in Flink
  – Fixed number allocated when worker starts
• Backed by a regular byte array (default 32 KB)
• On-heap or off-heap allocation
• R/W access through Java’s efficient unsafe methods
• Multiple memory segments can be logically concatenated to a larger chunk of memory
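A rough sketch of how several fixed-size segments can be addressed as one larger chunk of memory, on-heap or off-heap. It uses java.nio.ByteBuffer rather than the unsafe methods mentioned on the slide, and the class SegmentViewSketch with its methods is hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical view over several equally sized segments; uses ByteBuffer, not Unsafe. */
class SegmentViewSketch {
    private final ByteBuffer[] segments;
    private final int segmentSize;

    SegmentViewSketch(int numSegments, int segmentSize, boolean offHeap) {
        this.segmentSize = segmentSize;
        this.segments = new ByteBuffer[numSegments];
        for (int i = 0; i < numSegments; i++) {
            // allocate() is backed by a byte[] on the heap; allocateDirect() lives outside the heap.
            segments[i] = offHeap
                    ? ByteBuffer.allocateDirect(segmentSize)
                    : ByteBuffer.allocate(segmentSize);
        }
    }

    /** Writes one byte at a logical position that spans all segments. */
    void put(long position, byte value) {
        segments[(int) (position / segmentSize)].put((int) (position % segmentSize), value);
    }

    /** Reads one byte from a logical position that spans all segments. */
    byte get(long position) {
        return segments[(int) (position / segmentSize)].get((int) (position % segmentSize));
    }

    /** Writes a big-endian int byte by byte, so the value may cross a segment boundary. */
    void putInt(long position, int value) {
        for (int i = 0; i < 4; i++) {
            put(position + i, (byte) (value >>> (24 - 8 * i)));
        }
    }

    /** Reads a big-endian int that may cross a segment boundary. */
    int getInt(long position) {
        int result = 0;
        for (int i = 0; i < 4; i++) {
            result = (result << 8) | (get(position + i) & 0xFF);
        }
        return result;
    }
}
```

The caller only sees logical positions; whether a value sits in one segment or straddles two is hidden behind put and get.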

Page 10: Fabian Hueske – Juggling with Bits and Bytes

On-heap memory allocation

Page 11: Fabian Hueske – Juggling with Bits and Bytes

Off-heap memory allocation

Page 12: Fabian Hueske – Juggling with Bits and Bytes

On-heap vs. Off-heap

• No significant performance difference in micro-benchmarks
• Garbage Collection
  – Smaller heap -> faster GC
• Faster start-up time
  – A multi-GB JVM heap takes time to allocate

Page 13: Fabian Hueske – Juggling with Bits and Bytes

DATA SERIALIZATION

Page 14: Fabian Hueske – Juggling with Bits and Bytes

Custom de/serialization stack

• Many alternatives for Java object serialization
  – Dynamic: Kryo
  – Schema-dependent: Apache Avro, Apache Thrift, Protobufs
• But Flink has its own serialization stack
  – Operating on serialized data requires knowledge of layout
  – Control over layout can improve efficiency of operations
  – Data types are known before execution

Page 15: Fabian Hueske – Juggling with Bits and Bytes

Rich & extensible type system

• Serialization framework requires knowledge of types
• Flink analyzes return types of functions
  – Java: Reflection based type analyzer
  – Scala: Compiler information + CodeGen via Macros
• Rich type system
  – Atomics: Primitives, Writables, Generic types, …
  – Composites: Tuples, Pojos, CaseClasses
  – Extensible by custom types

Page 16: Fabian Hueske – Juggling with Bits and Bytes

Serializing a Tuple3<Integer, Double, Person>
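The figure for this slide is not part of the transcript. As a stand-in, here is a minimal sketch of writing such a tuple field by field into a compact byte layout. The Person fields (id, name), the length-prefixed UTF-8 string encoding, and all class and method names are assumptions for illustration; Flink's own serializers may lay the data out differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical field-by-field layout for a Tuple3<Integer, Double, Person>. */
class TupleSerializationSketch {
    /** Hypothetical POJO; the slide text does not show Person's fields. */
    static class Person {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Writes the fields one after another, without any per-object header. */
    static byte[] serialize(int f0, double f1, Person f2) {
        byte[] name = f2.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + 4 + name.length);
        buf.putInt(f0)             // 4 bytes: Integer field
           .putDouble(f1)          // 8 bytes: Double field
           .putInt(f2.id)          // 4 bytes: Person.id
           .putInt(name.length)    // 4 bytes: length of Person.name
           .put(name);             // variable: UTF-8 bytes of Person.name
        return buf.array();
    }

    /** Knowing the layout, one field can be read without rebuilding the whole tuple. */
    static double readDoubleField(byte[] record) {
        return ByteBuffer.wrap(record).getDouble(4);
    }

    /** Deserializes only the nested Person, which starts at offset 12 in this layout. */
    static Person readPerson(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        buf.position(12);
        int id = buf.getInt();
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        return new Person(id, new String(name, StandardCharsets.UTF_8));
    }
}
```

Fixed field offsets are what make it possible to operate on the serialized form: readDoubleField touches 8 bytes and creates no objects.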

Page 17: Fabian Hueske – Juggling with Bits and Bytes

OPERATING ON BINARY DATA

Page 18: Fabian Hueske – Juggling with Bits and Bytes

Data processing algorithms

• Flink’s algorithms are based on RDBMS technology
  – External Merge Sort, Hybrid Hash Join, Sort Merge Join, …
• Algorithms receive a budget of memory segments
  – Automatic decision about budget size
  – No fine-tuning of operator memory!
• Operate in-memory as long as data fits into budget
  – And gracefully spill to disk if data exceeds memory
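The "operate in memory, spill when the budget is exceeded" behaviour can be illustrated with a very small append-only buffer. This is a sketch under simplifying assumptions: a single ByteBuffer stands in for the segment budget, spilled data goes to a temporary file, and the class SpillingBufferSketch is hypothetical rather than one of Flink's operators.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only buffer with a fixed memory budget that spills to a temp file. */
class SpillingBufferSketch implements AutoCloseable {
    private ByteBuffer memory;       // the whole in-memory budget
    private Path spillFile;
    private FileChannel channel;
    private long spilledBytes = 0;

    SpillingBufferSketch(int budgetBytes) {
        try {
            memory = ByteBuffer.allocate(budgetBytes);
            spillFile = Files.createTempFile("spill-sketch", ".bin");
            channel = FileChannel.open(spillFile, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends a record; if it does not fit, the in-memory part is written to disk first. */
    void append(byte[] record) {
        if (record.length > memory.capacity()) {
            throw new IllegalArgumentException("record larger than the whole budget");
        }
        if (record.length > memory.remaining()) {
            spill();
        }
        memory.put(record);
    }

    /** Writes the buffered bytes to the spill file and makes the budget available again. */
    private void spill() {
        try {
            memory.flip();
            while (memory.hasRemaining()) {
                spilledBytes += channel.write(memory);
            }
            memory.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long bytesInMemory() {
        return memory.position();
    }

    long bytesOnDisk() {
        return spilledBytes;
    }

    @Override
    public void close() {
        try {
            channel.close();
            Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The buffer never holds more than its budget: once the budget is full, the data moves to disk and the same memory is reused.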

Page 19: Fabian Hueske – Juggling with Bits and Bytes

In-memory sort – Fill the sort buffer

Page 20: Fabian Hueske – Juggling with Bits and Bytes

In-memory sort – Sort the buffer

Page 21: Fabian Hueske – Juggling with Bits and Bytes

In-memory sort – Read sorted buffer
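The figures for the three sort slides are not part of the transcript. The fill, sort, and read steps can be sketched as follows, assuming (Integer, String) records and an index of fixed-length entries that each hold a 4-byte key and a 4-byte pointer, the two sizes named in the memory breakdown later in the talk. The class SortBufferSketch and the trick of packing each entry into a long are choices made for this illustration, not Flink's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sort buffer: a region of serialized records plus fixed-length index entries. */
class SortBufferSketch {
    private final ByteBuffer data;   // serialized records, appended in arrival order
    private final long[] index;      // per record: key in the high 32 bits, pointer in the low 32 bits
    private int count = 0;

    SortBufferSketch(int dataBytes, int maxRecords) {
        data = ByteBuffer.allocate(dataBytes);
        index = new long[maxRecords];
    }

    /** Step 1, fill: serialize the record and add a (key, pointer) entry. Returns false when full. */
    boolean add(int key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (count == index.length || data.remaining() < 8 + bytes.length) {
            return false;
        }
        int pointer = data.position();
        data.putInt(key).putInt(bytes.length).put(bytes);
        // Signed comparison of the packed long orders by key first, then by pointer.
        index[count++] = ((long) key << 32) | pointer;
        return true;
    }

    /** Step 2, sort: only the 8-byte index entries move; the serialized records stay in place. */
    void sort() {
        Arrays.sort(index, 0, count);
    }

    /** Step 3, read: follow the pointers in sorted order and deserialize each value. */
    List<String> readSortedValues() {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int pointer = (int) index[i];            // low 32 bits of the entry
            int length = data.getInt(pointer + 4);   // skip the 4-byte key
            byte[] bytes = new byte[length];
            for (int j = 0; j < length; j++) {
                bytes[j] = data.get(pointer + 8 + j);
            }
            result.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

Sorting touches only the small fixed-length entries, so no record is deserialized or copied until the sorted result is read.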

Page 22: Fabian Hueske – Juggling with Bits and Bytes

SHOW ME NUMBERS!

Page 23: Fabian Hueske – Juggling with Bits and Bytes

Sort benchmark

• Task: Sort 10 million Tuple2<Integer, String> records
  – String length 12 chars
    • Tuple has 16 Bytes of raw data
    • ~152 MB raw data
  – Integers uniformly, Strings long-tail distributed
  – Sort on Integer field and on String field
• Generated input provided as mutable object iterator
• Use JVM with 900 MB heap size
  – Minimum size to reliably run the benchmark

Page 24: Fabian Hueske – Juggling with Bits and Bytes

Sorting methods

1. Objects-on-Heap:
  – Put cloned data objects in ArrayList and use Java’s Collections.sort().
  – ArrayList is initialized with right size.
2. Flink-serialized (on-heap):
  – Using Flink’s custom serializers.
  – Integer with full binary sorting key, String with 8 byte prefix key.
3. Kryo-serialized (on-heap):
  – Serialize fields with Kryo.
  – No binary sorting keys, objects are deserialized for comparison.

• All implementations use a single thread
• Average execution time of 10 runs reported
• GC triggered between runs (does not go into reported time)
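The binary sorting keys used by the Flink-serialized variant can be illustrated with a small sketch: keys are byte strings whose unsigned byte-wise order agrees with the order of the original values, so two records can be compared without deserializing them. The encodings below (sign-bit flip for integers, zero-padded UTF-8 prefix for strings) and the class NormalizedKeySketch are assumptions for illustration; the slides do not show Flink's exact encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical binary sorting keys whose unsigned byte order follows the order of the values. */
class NormalizedKeySketch {
    /** Full 4-byte key for an int: flip the sign bit, then store big-endian. */
    static byte[] intKey(int value) {
        int flipped = value ^ Integer.MIN_VALUE;
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
    }

    /** 8-byte prefix key for a String: the first 8 UTF-8 bytes, zero-padded if shorter. */
    static byte[] stringPrefixKey(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(bytes, 8);
    }

    /** Byte-wise unsigned comparison; no object is deserialized. */
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

The integer key is a full key: it decides every comparison. The string key is only a prefix, so in this sketch two strings that share their first 8 bytes compare as equal, and a complete comparison of the records would still be needed to break the tie.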

Page 25: Fabian Hueske – Juggling with Bits and Bytes

Execution time

Page 26: Fabian Hueske – Juggling with Bits and Bytes

Garbage collection and heap usage

[Figure panels: Objects-on-heap, Flink-serialized]

Page 27: Fabian Hueske – Juggling with Bits and Bytes

Memory usage

• Breakdown: Flink serialized - Sort Integer
  – 4 bytes Integer
  – 12 bytes String
  – 4 bytes String length
  – 4 bytes pointer
  – 4 bytes Integer sorting key
  – 28 bytes * 10M records = 267 MB

               Object-on-heap    Flink-serialized   Kryo-serialized
Sort Integer   Approx. 700 MB    277 MB             266 MB
Sort String    Approx. 700 MB    315 MB             266 MB
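The per-record arithmetic behind the "~152 MB raw data" and "267 MB" figures can be rechecked with a few lines of code. The sketch assumes that MB on the slides means 2^20 bytes, which is the reading under which both figures come out; the class name MemoryMathSketch is hypothetical. The table reports 277 MB for Flink-serialized, Sort Integer, and the slides do not explain the difference from the 267 MB breakdown.

```java
/** Recomputes the record sizes quoted on the benchmark slides (MB read as 2^20 bytes). */
class MemoryMathSketch {
    static final long RECORDS = 10_000_000L;

    /** Total size in MB for the given number of bytes per record. */
    static double megabytes(long bytesPerRecord) {
        return bytesPerRecord * RECORDS / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long raw = 4 + 12;                           // Integer + 12-char String
        long flinkSortInteger = 4 + 12 + 4 + 4 + 4;  // + String length, pointer, Integer sorting key
        System.out.printf("raw data: %.1f MB%n", megabytes(raw));
        System.out.printf("Flink-serialized, sort on Integer: %.1f MB%n", megabytes(flinkSortInteger));
    }
}
```

With 16 bytes per record the total is a little under 153 MB, and with 28 bytes per record it is just over 267 MB.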

Page 28: Fabian Hueske – Juggling with Bits and Bytes

Going out-of-core

• Single thread HashJoin with 4 GB memory budget
• Build side varies, Probe side 64 GB

Page 29: Fabian Hueske – Juggling with Bits and Bytes

WHAT’S NEXT?

Page 30: Fabian Hueske – Juggling with Bits and Bytes

We’re not done yet!

• Serialization layouts tailored towards operations
  – More efficient operations on binary data
• Table API provides full semantics for execution
  – Use code generation to operate fully on binary data
• …

Page 31: Fabian Hueske – Juggling with Bits and Bytes

Summary

• Active memory management avoids OOMErrors
• Highly efficient data serialization stack
  – Facilitates operations on binary data
  – Makes more data fit into memory
• DBMS-style operators operate on binary data
  – High performance in-memory processing
  – Graceful destaging to disk if necessary
• Read Flink’s blog:
  – http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
  – http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
  – http://flink.apache.org/news/2015/09/16/off-heap-memory.html

Page 32: Fabian Hueske – Juggling with Bits and Bytes

Apache Flink

http://flink.apache.org  @ApacheFlink