CAP THEOREM Large Scale Data Management

Transcript of: Large Scale Data Management - CAP Theorem (pagesperso.lina.univ-nantes.fr/~molli-p/pmwiki/...)

Page 1:

CAP  THEOREM  Large  Scale  Data  Management  

Page 2:

Consistency, Availability, Partition-Tolerance

•  Conjecture by Eric Brewer at PODC 2000:
  –  It is impossible for a web service to provide the following three guarantees:
     •  Consistency
     •  Availability
     •  Partition-tolerance

•  Established as a theorem in 2002:
  –  Gilbert, Seth, and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, vol. 33, issue 2, 2002, p. 51-59.

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 3:

CAP  theorem  

•  Consistency - all nodes should see the same data at the same time

•  Availability - node failures do not prevent survivors from continuing to operate

•  Partition-tolerance - the system continues to operate despite arbitrary message loss

•  A distributed system can satisfy any two of these guarantees at the same time, but not all three

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 4:

Consistency  +  Availability  

•  Examples:
  –  Single-site databases
  –  Cluster databases
  –  LDAP
  –  xFS file system

•  Traits:
  –  2-phase commit (a minimal sketch follows below)
  –  cache validation protocols
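
The traits above can be made concrete with a small example. Below is a minimal, illustrative sketch of the two-phase commit message flow, assuming no network partitions (the CA setting): the coordinator commits only if every participant votes yes, which keeps replicas consistent but blocks if a participant is unreachable. The `Coordinator` and `Participant` names are assumptions for the example, not from the slides.

```python
# Minimal sketch of two-phase commit (2PC): the coordinator commits only
# when every participant votes yes, preserving consistency at the cost of
# blocking when a participant cannot be reached.

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "init"

    def prepare(self):
        # Vote yes if the local transaction can be made durable.
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run_transaction(self):
        # Phase 1: collect votes from every participant.
        votes = [p.prepare() for p in self.participants]
        # Phase 2: commit only if all voted yes, otherwise abort everywhere.
        if all(votes):
            for p in self.participants:
                p.commit()
            return "committed"
        for p in self.participants:
            p.abort()
        return "aborted"


if __name__ == "__main__":
    nodes = [Participant("db1"), Participant("db2")]
    print(Coordinator(nodes).run_transaction())  # -> committed
```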

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 5:

Consistency + Partition Tolerance

•  Examples:
  –  Distributed databases
  –  Distributed locking
  –  Majority protocols

•  Traits:
  –  Pessimistic locking
  –  Make minority partitions unavailable (quorums) - see the sketch below
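
A minimal sketch (not from the slides) of the quorum rule mentioned above: a replica serves a request only while it can reach a strict majority of the cluster, so the minority side of a partition becomes unavailable rather than inconsistent. Function and parameter names are illustrative.

```python
# Minimal sketch of "make minority partitions unavailable": a node only
# serves requests while it can reach a strict majority of the cluster.

def has_quorum(cluster_size: int, reachable_nodes: int) -> bool:
    """True if this side of the partition holds a strict majority."""
    return reachable_nodes > cluster_size // 2


def handle_request(cluster_size: int, reachable_nodes: int, op: str) -> str:
    if not has_quorum(cluster_size, reachable_nodes):
        # Sacrifice availability: refuse rather than risk inconsistency.
        return "unavailable (minority partition)"
    return f"executing {op} on majority side"


if __name__ == "__main__":
    # A 5-node cluster split 3/2 by a partition.
    print(handle_request(5, 3, "write x=1"))  # majority side keeps serving
    print(handle_request(5, 2, "read x"))     # minority side refuses
```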

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 6:

Availability + Partition Tolerance

•  Examples:
  –  Coda
  –  DNS
  –  Usenet

•  Traits:
  –  Expiration/leases
  –  Conflict resolution
  –  Optimistic replication (a merge sketch follows below)
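
As a concrete illustration of optimistic replication with conflict resolution, here is a minimal last-writer-wins merge sketch (one possible resolution policy, not prescribed by the slides): both sides of a partition keep accepting writes and reconcile when connectivity returns. The key names and timestamps are assumptions for the example.

```python
# Minimal sketch of optimistic replication with last-writer-wins (LWW)
# conflict resolution: divergent replicas are merged by keeping, for each
# key, the value carrying the highest timestamp.

def lww_merge(a: dict, b: dict) -> dict:
    """Merge two replicas of a key -> (value, timestamp) map."""
    merged = dict(a)
    for key, (value, ts) in b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged


if __name__ == "__main__":
    # Divergent writes accepted during a partition.
    replica1 = {"cart": (["book"], 10)}
    replica2 = {"cart": (["book", "pen"], 12)}
    print(lww_merge(replica1, replica2))  # newer write (ts=12) wins
```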

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 7:

Data  Store  and  CAP  

•  RDBMS: CA (master/slave replication, sharding)

•  Amazon Dynamo: AP (read-repair, application hooks)

•  Terracotta: CA (quorum vote, majority partition survival)

•  Apache Cassandra: AP (partitioning, read-repair)

•  Apache ZooKeeper: AP (consensus protocol)

•  Google BigTable: CA

•  Apache CouchDB: AP

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 8:

http://blog.nahurst.com/visual-guide-to-nosql-systems

Page 9:

Techniques  for  CAP  

•  Consistent hashing
•  Vector clocks (sketched below)
•  Sloppy quorum
•  Merkle trees
•  Gossip-based protocols
•  CRDTs
•  More on these later…
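
These techniques are covered later in the course. As a small taste, here is a minimal, illustrative vector-clock sketch (names are assumptions, not from the slides): each node increments its own entry on a local update, clocks are merged on message receipt, and incomparable clocks reveal concurrent updates that need conflict resolution.

```python
# Minimal vector-clock sketch: per-node counters, component-wise merge,
# and a comparison that detects causally concurrent updates.

def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s entry incremented."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def merge(a: dict, b: dict) -> dict:
    """Component-wise maximum, used when a message carrying `b` arrives."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happened_before(a: dict, b: dict) -> bool:
    """True if `a` causally precedes `b` (a <= b componentwise and a != b)."""
    keys = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in keys) and a != b


if __name__ == "__main__":
    c1 = increment({}, "node1")     # update on node1
    c2 = increment({}, "node2")     # concurrent update on node2
    print(happened_before(c1, c2))  # False
    print(happened_before(c2, c1))  # False -> concurrent: conflict to resolve
    print(merge(c1, c2))            # {'node1': 1, 'node2': 1}
```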

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 10:

Idea  of  the  proof  

•  http://www.youtube.com/watch?v=Jw1iFr4v58M

Page 11:

Atomic  Data  Object  

•  Atomic/Linearizable Consistency:
  –  There must exist a total order on all operations such that each operation looks as if it were completed at a single instant

  –  This is equivalent to requiring requests on the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time (see the sketch below)
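
A minimal sketch of that single-node picture, assuming a shared-memory setting: all operations are funnelled through one lock, so there is a total order in which each read and write appears to take effect at a single instant. The `AtomicRegister` name is illustrative.

```python
# Minimal sketch of the single-node view behind atomic/linearizable
# consistency: operations execute one at a time, giving a total order.

import threading

class AtomicRegister:
    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def write(self, value):
        with self._lock:          # operations take effect one at a time
            self._value = value

    def read(self):
        with self._lock:
            return self._value


if __name__ == "__main__":
    reg = AtomicRegister("v0")
    reg.write("v1")
    print(reg.read())  # "v1": every later read observes the completed write
```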

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 12:

Available  Data  Objects  

•  For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response
  –  That is, any algorithm used by the service must eventually terminate
  •  (In some ways, this is a weak definition of availability: it puts no bounds on how long the algorithm may run before terminating, and therefore allows unbounded computation)

•  (On the other hand, when qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate)

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 13:

Partition Tolerance

•  In order to model partition tolerance, the network is allowed to lose arbitrarily many messages sent from one node to another

•  When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost.

•  The atomicity requirement implies that every response will be atomic, even though arbitrary messages sent as part of the algorithm might not be delivered

•  The availability requirement therefore implies that every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost

•  Partition tolerance: no set of failures less than total network failure is allowed to cause the system to respond incorrectly

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 14:

Asynchronous  Network  Model  

•  There is no clock

•  Nodes must make decisions based only on the messages received and local computation

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 15:

Asynchronous  Networks:  impossibility  result  

•  Theorem 1: It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties in all fair executions (including those in which messages are lost):
  –  Availability
  –  Atomic consistency

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 16:

Asynchronous  Networks:  impossibility  result  

•  Proof (by contradiction):
  –  Assume an algorithm A exists that meets the three criteria: atomicity, availability and partition tolerance
  –  We construct an execution of A in which there exists a request that returns an inconsistent response
  –  Assume that the network consists of at least two nodes. Thus it can be divided into two disjoint, non-empty sets G1 and G2
  –  Assume all messages between G1 and G2 are lost.
  –  If a write occurs in G1 and a read occurs in G2, then the read operation cannot return the result of the earlier write operation.

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 17:

Asynchronous  Networks:  impossibility  result  

•  Formal proof:
  –  Let v0 be the initial value of the atomic object
  –  Let α1 be the prefix of an execution of A in which a single write of a value not equal to v0 occurs in G1, ending with the termination of the write operation
  –  Assume that no other client requests occur in either G1 or G2; assume that no messages from G1 are received in G2, and no messages from G2 are received in G1
  –  We know that the write operation will complete (by the availability requirement)

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 18:

Asynchronous  Networks:  impossibility  result  

•  Let α2 be the prefix of an execution in which a single read occurs in G2 and no other client requests occur, ending with the termination of the read operation

•  During α2, no messages from G2 are received in G1 and no messages from G1 are received in G2

•  We know that the read must return a value (by the availability requirement)

•  The value returned by this execution must be v0, as no write operation has occurred in α2

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 19:

Asynchronous  Networks:  impossibility  result  

•  Let α be an execution beginning with α1 and continuing with α2. To the nodes in G2, α is indistinguishable from α2, as all the messages from G1 to G2 are lost (in both α1 and α2, which together make up α), and α1 does not include any client requests to nodes in G2.

•  Therefore, in the execution α, the read request (from α2) must still return v0.

•  However, the read request does not begin until after the write request (from α1) has completed

•  This contradicts the atomicity property, proving that no such algorithm exists (the contradiction is summarized below)
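
In the slides' notation, with v1 standing for the value written in α1 (a symbol introduced here only for readability), the contradiction can be summed up as:

```latex
% Compact recap of the construction behind Theorem 1.
\[
\alpha = \alpha_1 \cdot \alpha_2, \qquad
\text{where in } \alpha_1:\ \mathrm{write}(v_1) \text{ completes in } G_1,\quad v_1 \neq v_0 .
\]
\[
\text{Nodes in } G_2 \text{ cannot distinguish } \alpha \text{ from } \alpha_2
\;\Longrightarrow\; \text{the read in } \alpha_2 \text{ returns } v_0 .
\]
\[
\text{But the read begins after } \mathrm{write}(v_1) \text{ has completed, and } v_0 \neq v_1
\;\Longrightarrow\; \text{atomicity is violated.}
\]
```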

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 20:

Asynchronous  Networks:  Impossibility  Result  

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 21:

Impossibility  results  

•  Corollary 1.1: It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:
  –  Availability, in all fair executions
  –  Atomic consistency, in fair executions in which no messages are lost

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 22:

Impossibility  results  

•  Proof:
  –  The main idea is that in the asynchronous model, an algorithm has no way of determining whether a message has been lost or has been arbitrarily delayed in the transmission channel
  –  Therefore, if there existed an algorithm that guaranteed atomic consistency in executions in which no messages were lost, there would exist an algorithm that guaranteed atomic consistency in all executions
  –  This would violate Theorem 1

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 23:

CAP  theorem  

•  While it is impossible to provide all three properties (atomicity, availability, and partition tolerance), any two of these properties can be achieved:
  –  Atomic, Partition Tolerant
  –  Atomic, Available
  –  Available, Partition Tolerant

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 24:

Atomic, Partition-Tolerant

•  If availability is not required, it is easy to achieve atomic data and partition tolerance

•  The trivial system that ignores all requests meets these requirements

•  Stronger liveness criterion: if all the messages in an execution are delivered, the system is available and all operations terminate

•  A simple centralized algorithm meets these requirements: a single designated node maintains the value of an object

•  A node receiving a request forwards it to the designated node, which sends a response. When the acknowledgement is received, the node sends a response to the client (see the sketch below)

•  Many distributed databases provide this guarantee, especially algorithms based on distributed locking or quorums: if certain failure patterns occur, the liveness condition is weakened and the service no longer returns responses. If there are no failures, then liveness is guaranteed.
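
A minimal sketch of the centralized algorithm described above (class names are illustrative, not from the slides): a single designated node holds the value, other nodes forward requests to it and answer clients only after its acknowledgement, and a node cut off by a partition simply does not respond, sacrificing availability but never atomicity.

```python
# Minimal sketch of the centralized (designated-node) algorithm for the
# atomic, partition-tolerant case: forward every request to one node and
# block (no response) when that node is unreachable.

class DesignatedNode:
    def __init__(self, initial):
        self.value = initial

    def handle(self, op, value=None):
        if op == "write":
            self.value = value
        return self.value  # acknowledgement carries the current value


class ForwardingNode:
    def __init__(self, designated, reachable=True):
        self.designated = designated
        self.reachable = reachable  # False models a partition

    def request(self, op, value=None):
        if not self.reachable:
            return None  # blocked: no response until the partition heals
        ack = self.designated.handle(op, value)
        return ack       # respond to the client only after the ack


if __name__ == "__main__":
    primary = DesignatedNode("v0")
    node = ForwardingNode(primary)
    cut_off = ForwardingNode(primary, reachable=False)
    print(node.request("write", "v1"))  # v1
    print(node.request("read"))         # v1, consistent
    print(cut_off.request("read"))      # None: unavailable, never inconsistent
```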

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 25:

Atomic,  Available  

•  If there are no partitions, it is possible to provide atomic, available data

•  A centralized algorithm with a single designated node maintaining the value of an object meets these requirements

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 26:

Available, Partition-Tolerant

•  It is possible to provide high availability and partition tolerance if atomic consistency is not required

•  If there are no consistency requirements, the service can trivially return v0, the initial value, in response to every request

•  It is possible to provide weakened consistency in an available, partition-tolerant setting

•  Web caches are one example of a weakly consistent network

Aleksandar Bradic, Vast.com - http://fr.slideshare.net/alekbr/cap-theorem

Page 27:

Partially Synchronous Model

•  The Gilbert and Lynch paper also analyzes CAP in the partially synchronous model: every node has a clock, and all clocks increase at the same rate

•  However, the clocks are not synchronized

•  While Theorem 1 still holds in the partially synchronous model, Corollary 1.1 does not

•  A weaker form of consistency (t-connected consistency) can be achieved

Page 28:

CAP  Conclusion  

•  It is possible to build large-scale distributed data management systems under the CAP theorem:
  –  One property has to be sacrificed.

Page 29:

Sacrificing one Property

•  If Consistency is sacrificed (AP):
  –  Consistency problems are pushed to the applications; they can be more difficult to solve, or not… high programming cost
  –  Deployment on asynchronous infrastructure…

•  If Availability is sacrificed (CP):
  –  Blocking protocols can really block the system
  –  Cheap programming cost on asynchronous infrastructure

•  If Partition tolerance is sacrificed (CA):
  –  Need to provide a quasi-synchronous model, where complex failures never happen
  –  Cheap programming cost with synchronous infrastructure…

Stonebraker, CACM, 2010

Page 30:

Challenges  

•  Whatever choice is made - CA, AP or CP

•  The scalability and throughput that can be achieved with the different approaches will make the difference

•  The balance between programming cost and scalability/efficiency will be the key

•  Nice challenges for scientists and engineers…

Page 31:

Clash  of  cultures  

•  Classic distributed systems: focused on ACID semantics
  –  A: Atomic
  –  C: Consistent
  –  I: Isolated
  –  D: Durable

•  "Modern" Internet systems: focused on BASE
  –  Basically Available
  –  Soft-state (or scalable)
  –  Eventually consistent

NoSQL (CouchDB…) vs NewSQL (VoltDB…)

Dan Pritchett, BASE: An ACID Alternative, ACM Queue, http://queue.acm.org/detail.cfm?id=1394128

Page 32:

http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/