Distributed Data Management
Serge Abiteboul, INRIA Saclay, Collège de France, ENS Cachan

Transcript of the slide deck follows.

Page 1:

Distributed Data Management

Serge Abiteboul, INRIA Saclay, Collège de France, ENS Cachan

 

Page 2:

Distributed computing

A distributed system is an application that coordinates the actions of several computers to achieve a specific task.

Distributed computing is a lot about data:
– System state
– Session state, including security information
– Protocol state
– Communication: exchanging data
– User profile
– ... and of course, the actual "application data"

Distributed computing is about querying, updating, communicating

data ⤳ distributed data management

Page 3:

Parallelism and distribution

Sequential access: 166 minutes (more than two and a half hours) to read a 1 TB disk

Parallel access: with 100 disks working in parallel, less than 2 minutes

Distributed access: with 100 computers, each with its own local disk, each CPU processes its own dataset

This is scalable.

[Figure: one machine reading a single 1 TB disk at 100 MB/s, versus many machines on a network, each reading its own 1 TB disk at 100 MB/s in parallel]

Page 4:

Organization

1. Data management architecture
2. Parallel architecture
Zoom on two technologies:
3. Cluster (French: grappe): MapReduce
4. P2P: storage and indexing
5. Limitations of distribution
6. Conclusion

Page 5:

Data  management  architecture  

Page 6:

Deployment  architecture  

Centralized
• Multi-user mainframe & terminals

Client-Server
• Multi-user server & workstations
• Client: application & graphical interface
• Server: database system

[Figure: centralized deployment (the application runs inside the database system) vs. client-server deployment (the application talks to a database server through an API such as JDBC)]

Page 7:

Deployment architecture – 3-tier

Client is a browser that
– Displays content, e.g., in HTML
– Communicates via HTTP

Central tier
– Generates the content for the client
– Runs the application logic
– Communicates with the database

Data server tier
– Serves data just as in the client-server case (see the sketch after the figure below)

[Figure: the browser connects over HTTP/TCP-IP to the application tier, which connects to the database server through an API such as JDBC]
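A minimal sketch of the 3-tier pattern, using only the Python standard library; the table name and port are hypothetical, and an embedded sqlite3 file stands in for the separate database server of a real deployment:

import http.server
import json
import sqlite3

# Data tier: an embedded SQLite database stands in for the database server.
conn = sqlite3.connect("demo.db", check_same_thread=False)
conn.execute("CREATE TABLE IF NOT EXISTS books(title TEXT, author TEXT)")
conn.commit()

class AppTier(http.server.BaseHTTPRequestHandler):
    # Central tier: runs the application logic, queries the data tier, and
    # generates the content (JSON here, HTML in the slides) for the client.
    def do_GET(self):
        rows = conn.execute("SELECT title, author FROM books").fetchall()
        body = json.dumps([{"title": t, "author": a} for t, a in rows]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Client tier: any browser pointed at http://localhost:8000/ plays the role
    # of the thin client that only displays content and communicates via HTTP.
    http.server.HTTPServer(("localhost", 8000), AppTier).serve_forever()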

Page 8:

Another  dimension:  Server  architecture    

• Example 1 – Server: a single machine
• Example 2 – Server: a parallel machine

[Figure: client-server deployment where the application connects either to a single database server machine (example 1) or to several DB server machines forming a parallel server (example 2)]

Page 9:

Server architecture: query server vs. page/object server

[Figure: with a query server, the application sends high-level queries through JDBC/ODBC to a relational server and receives answers; with a page or object server, the application sends requests and receives pages or objects, which it keeps in a local page cache]

Page 10:

Parallel  architecture  

Page 11:

Parallel  architecture  

The architecture of a server is typically
• multi-CPU, multi-memory, multi-disk
• based on a very fast network

Architectures
• Shared memory
• Shared disk
• Shared nothing
• Hybrid

[Figure: processor (P) and memory (M) layouts for the shared-memory, shared-disk, shared-nothing, and hybrid architectures]

Page 12:

Comparison  

Shared memory
– The bus becomes the bottleneck beyond 32–64 processors
– Used in practice in machines of 4 to 8 processors

Shared disk
– Inter-CPU communication is slower
– Good for fault tolerance
– The bottleneck is pushed to hundreds of processors

Shared nothing
– Only for very parallelizable applications
– Higher communication cost
– Scales to thousands of processors
– Suited to the analysis of large data sets

Page 13:

Main  memory  database  

Beyond 100 cores, beyond 10 TB of memory

Blade server: 8–16 blades
Blade: 4 CPUs, 0.5–1 TB RAM
CPU: 4–8 cores, 1–4 MB L2 cache
Core: 16–32 KB L1 cache
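Reading the table as blades per server, CPUs per blade, cores per CPU, and RAM per blade (an assumption about the original slide's layout), the headline numbers check out:

blades, cpus_per_blade, cores_per_cpu = 16, 4, 8     # upper ends of the ranges
ram_per_blade_tb = 1.0

total_cores = blades * cpus_per_blade * cores_per_cpu   # 512, i.e. beyond 100 cores
total_ram_tb = blades * ram_per_blade_tb                # 16 TB, i.e. beyond 10 TB
print(total_cores, total_ram_tb)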

Blades share nothing.
Complex programming is needed to leverage parallelism.
Issue: computing power and memory throughput increase, but latency improves much less.

Page 14:

Massive parallelism and partitioning

• Row store   • Column store

[Figure: horizontal partitioning (the rows of a table are split across machines) vs. vertical partitioning (the columns are split across machines)]

Page 15:

Row vs. column store

Rows
• Read/write a full tuple: fast
• Read/write one attribute for the entire relation: slow
• Limited compression
• Slow aggregation
• Suited to transactional applications

Columns
• Read/write a full tuple: slow
• Read/write one attribute for the entire relation: fast
• Excellent compression
• Fast aggregation
• Suited to decision-support applications
(a toy illustration follows)
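A toy illustration (not a real storage engine) of the trade-off listed above, with a hypothetical two-tuple relation stored both ways:

# Row store: one entry per tuple.
rows = [
    {"title": "Foundations of Databases", "year": 1995, "pages": 685},
    {"title": "Web Data Management",      "year": 2011, "pages": 450},
]

# Column store: one array per attribute.
columns = {
    "title": ["Foundations of Databases", "Web Data Management"],
    "year":  [1995, 2011],
    "pages": [685, 450],
}

# Aggregating one attribute: the column layout scans a single contiguous array
# (fast aggregation, compresses well); the row layout must touch every tuple.
assert sum(columns["pages"]) == sum(r["pages"] for r in rows) == 1135

# Reading back one full tuple: immediate in the row layout, but one lookup per
# attribute in the column layout (the "full tuple: slow" case above).
second_tuple = {attr: values[1] for attr, values in columns.items()}
print(second_tuple)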

Page 16:

Massive  parallelism  &  column  store  

Parallel RDBMS
– Teradata
– Netezza (IBM)
– DATAllegro (Microsoft)
– Open source: Hadoop (in a few minutes…)

Column store
– Sybase IQ
– Kickfire (Teradata)
– Open source: MonetDB

Parallelism & column store
– Exasol
– Vertica
– Greenplum (EMC)
– Open source: Hadoop HBase

Page 17:

Cluster:  MapReduce  

To process (e.g., to analyze) large quantities of data
• Use parallelism
• Push data to machines

   

Page 18:

MapReduce  

MapReduce: a computing model based on heavy distribution that scales to huge volumes of data
• 2004: Google publication
• 2006: open source implementation, Hadoop

Principles
• Data distributed on a large number of shared-nothing machines
• Parallel execution; processing pushed to the data

Page 19:

MapReduce  

Three operations on key-value pairs
– Map: user-defined (French: transformer)
– Shuffle: fixed behavior (French: mélanger)
– Reduce: user-defined (French: réduire)

Page 20:

MapReduce data flow

[Figure: MAP (user-defined) → SHUFFLE (fixed) → REDUCE (user-defined)]

Page 21:

MapReduce  example  

• Count the number of occurrences of each word in a large collection of documents

Page 22:

Map  

u1: jaguar world mammal felidae family
u2: jaguar atari keen use 68K family device
u3: mac os jaguar available price us 199 apple new family pack
u4: such ruling family incorporate jaguar their name

Map output (key-value pairs):
Jaguar 1, Atari 1, Felidae 1, Jaguar 1, …
Jaguar 1, Available 1, Apple 1, Jaguar 2, …

Page 23:

Shuffle  

Shuffle input (from the mappers):
Jaguar 1, Atari 1, Felidae 1, Jaguar 1, …, Jaguar 1, Available 1, Apple 1, Jaguar 2, …

Shuffle output (values grouped by key):
Jaguar 1,1,1,2
Mammal 1
Family 1,1,1
Available 1
…

Page 24:

Reduce  

Reduce input: Jaguar 1,1,1,2; Mammal 1; Family 1,1,1; Available 1; …

Reduce output: Jaguar 5; Mammal 1; Family 3; Available 1; …

Page 25:

MapReduce functionalities

Map: (K, V) → list(K', V'); typically:
– Filter, select a (new) key, project, transform
– Split results into M files for M reducers

Shuffle: list(K', V') → list(K', list(V'))
– Regroup the pairs with the same key

Reduce: (K', list(V')) → list(K'', V''); typically:
– Aggregation (COUNT, SUM, MAX)
– Combination, filtering (e.g., join)

Optional optimization, combine: list(V') → V'
– Run on a mapper to combine pairs with the same key into a single pair
(see the sketch below)
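A minimal, single-process sketch of the word-count example from the previous slides, with the three phases written out explicitly; in Hadoop, map and reduce run on many machines and the shuffle moves data over the network, so only the data flow is illustrated here:

from collections import defaultdict

documents = {
    "u1": "jaguar world mammal felidae family",
    "u2": "jaguar atari keen use 68K family device",
}

def map_fn(doc_id, text):
    # Map: (K, V) -> list(K', V'); emit (word, 1) for every word of the document.
    return [(word.lower(), 1) for word in text.split()]

def shuffle(pairs):
    # Shuffle: list(K', V') -> list(K', list(V')); regroup the pairs by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_fn(key, values):
    # Reduce: (K', list(V')) -> (K'', V''); here a SUM aggregation.
    # A combiner would apply this same function on each mapper's local output
    # before the shuffle, to reduce the data that has to be moved.
    return key, sum(values)

mapped = [pair for doc_id, text in documents.items() for pair in map_fn(doc_id, text)]
counts = dict(reduce_fn(k, vs) for k, vs in shuffle(mapped))
print(counts["jaguar"], counts["family"])   # 2 2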

Page 26:

Hadoop  

Open source Apache implementation in Java
– Main contributions from Yahoo

Main components
– Hadoop file system (HDFS)
– MapReduce (MR)
– Hive: simple data warehouse based on HDFS and MR
– HBase: key-value column store on HDFS
– ZooKeeper: coordination service for distributed applications
– Pig: dataflow language on HDFS and MR

☞ Java and C++ APIs
☞ Streaming API for other languages

Very active community

Page 27:

Pig Latin

N1NF (non-first-normal-form) data model

Example: for a given author, count how many editors this author has

Books = LOAD 'book.txt' AS (title: chararray, author: chararray, …);
Abiteboul = FILTER Books BY author == 'Serge Abiteboul';
Edits = LOAD 'editors.txt' AS (title: chararray, editor: chararray);
Joins = JOIN Abiteboul BY title, Edits BY title;
Groups = GROUP Joins BY Abiteboul::author;
Number = FOREACH Groups GENERATE group, COUNT(Joins.editor);
DUMP Number;

Compilation into MapReduce
[Figure: the operator pipeline LOAD, FILTER, LOAD, JOIN, GROUP, FOREACH, DUMP compiles into two MapReduce jobs (MAP, REDUCE, MAP, REDUCE)]

Page 28:

What’s  going  on  with  Hadoop  

• Limitations
– Simplistic data model & no ACID transactions
– Limited to batch operation
– Limited to extremely parallelizable applications

• Good recovery from failures
• Scales to huge quantities of data
– For smaller data, it is simpler to use large flash memory or a main memory database

• Main usage today (sources: TDWI, Gartner)
– Marketing and customer management
– Business insight discovery

Page 29:

Where does this technology fit?

[Figure: a data warehouse pipeline with ETL, Storage, and Business Intelligence stages, with marks indicating where the technology fits]

Page 30:

P2P:  storage  and  indexing  

To index large quantities of data
• Use existing resources
• Use parallelism
• Use replication

Page 31:

Peer-to-peer architecture

P2P: each machine is both a server and a client
Use the resources of the network
– Machines with free cycles, available memory/disk

• Communication: Skype
• Processing: SETI@home, Foldit
• Storage: eMule

Page 32:

Power  of  parallelism  

[Figure: performance, availability, etc. as a function of parallel load: a single server saturates while P2P scales]

Page 33:

Managing a large collection

[Figure: a large collection is split into portions; the portions are spread either over machines on a LAN or over peers across the Internet (the P2P approach)]

Page 34:

Difficulties

• Peers are autonomous, less reliable
• The network connection is much slower (WAN vs. LAN)
• Peers are heterogeneous
– Different processor & network speeds, available memory
• Peers come and go
– Possibly high churn (French: taux de désabonnement)
• Possibly a much larger number of peers
• Possible to have peers "nearby on the network"

Page 35:

And  the  index?  

Centralized index: a central server keeps a global index
– Napster

Pure P2P: communication is by flooding
– Each request is sent to all neighbors (modulo a time-to-live)
– Gnutella 0.4, Freenet

Structured P2P: no central authority; indexing uses an "overlay" network (French: réseau surimposé)
– Chord, Pastry, Kademlia
– Distributed hash table (DHT): index search in O(log n)

Page 36:

Implementation: Chord ring

Hashing is modulo 2^n
H distributes the peers around the (0 .. 2^n) ring: 0 ≤ H(pid) < 2^n
We assume no two peers have the same H(pid)
H distributes the keys around the ring in the same way

[Figure: ring of positions 0 to 2^n; a peer pid is placed at H(pid) and a key k at H(k); the region between two consecutive peers is labeled "in charge of all keys from H(Mi) to H(Mi+1)"]

Page 37:

Perform  a  search  in  log(n)  

• Each node has a routing table with "fingers"
• Example: key k with H(k) = 13
• 4 < 13 < 23
• Forward the query to the peer in charge of 4…

[Figure: the query "? H(k)" is routed around the ring]

Page 38:

Search  in  log(n)  

• Ask any peer for key k
• This peer knows log(n) peers and the smallest key of each
• Ask the peer whose key is immediately less than H(k)
• In the worst case, this divides the search space by 2
• After log(n) steps in the worst case, we find the peer in charge of k

• The same process is used to add an entry for k
• Or to find the values for key k
(a toy sketch follows)
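A simplified, in-memory sketch of such a DHT lookup (not the real Chord protocol, and the key-to-peer assignment convention may differ slightly from the figures): peers and keys are hashed onto a ring, each key is stored on the first peer at or after its hash, and routing jumps via finger-table entries so that each hop shrinks the remaining distance:

import hashlib

M = 16                               # ring of size 2^M positions
RING = 2 ** M

def h(value):
    # Hash a peer id or a key onto the ring [0, 2^M).
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % RING

peers = sorted(h(f"peer-{i}") for i in range(8))     # hypothetical peer ids

def successor(point):
    # The peer in charge of `point`: the first peer at or after it on the ring.
    return next((p for p in peers if p >= point), peers[0])

def in_interval(x, a, b):
    # True if x lies in the ring interval (a, b], walking clockwise from a.
    return a < x <= b if a < b else (x > a or x <= b)

def fingers(n):
    # Finger table of peer n: the successor of n + 2^i for i = 0 .. M-1.
    return [successor((n + 2 ** i) % RING) for i in range(M)]

def route(n, target, hops=0):
    # Route from peer n toward the peer in charge of ring position `target`;
    # each finger jump shrinks the remaining clockwise distance (O(log n) hops).
    if target == n:
        return n, hops
    succ = successor((n + 1) % RING)
    if in_interval(target, n, succ):
        return succ, hops + 1
    step = succ
    for f in fingers(n):
        if in_interval(f, n, target):
            step = f                 # furthest finger that does not overshoot
    return route(step, target, hops + 1)

peer, hops = route(peers[0], h("jaguar"))
print(peer, hops)                    # owning peer's position and number of hops

In the real protocol each peer stores only its own finger table, so every hop is a network message between peers; this toy uses global knowledge only to keep the code short.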

Page 39:

Joining  the  DHT  

M joins:
1) M computes H(M)
2) M contacts the peer Mi in charge of H(M)
3) M receives all the entries between H(M) and H(Mi+1)

[Figure: the ring before and after the join; before, Mi is in charge of all keys from H(Mi) to H(Mi+1); after, M is in charge of the keys from H(M) to H(Mi+1) and Mi keeps the keys from H(Mi) to H(M)]

Page 40:

Leaving  the  DHT  

M leaves:
1) M sends all its entries (the keys between H(M) and H(Mi+1)) to the previous peer on the ring

[Figure: the ring before and after the departure; before, M is in charge of the keys from H(M) to H(Mi+1) and the previous peer Mi of the keys from H(Mi) to H(M); after, Mi is in charge of all keys from H(Mi) to H(Mi+1)]

Page 41:

Issues  

• When peers come and go, maintenance of the finger tables is tricky
• A peer may leave without notice: the only solution is replication
– Use several hash functions H1, H2, H3 and maintain each piece of information on 3 machines

 

Page 42:

Advantages  &  disadvantages  

• Advantages
– Scaling
– Cost effective: takes advantage of existing resources
– Performance, availability, reliability (potentially, thanks to redundancy, but rarely the case in practice)

• Disadvantages
– Servers may be selfish, unreliable ⤳ hard to guarantee service quality
– Communication overhead
– Servers come and go ⤳ need replication ⤳ replication overhead
– Slower response
– Updates are expensive

Page 43:

Limitations of distribution: CAP theorem

Page 44:

Main  idea  

• Use heavy distribution
• Use heavy replication (at least for popular data)

• Is this the magical solution to any management of huge data?
• Yes, for very parallelizable problems and static data collections
• If there are many updates:
– Overhead: for each update, we have to perform as many updates as there are replicas
– Problem: the replicas start diverging (see the toy sketch below)
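A toy illustration of the two problems just listed, assuming three replicas of a single value (replica names are hypothetical):

replicas = {"r1": 0, "r2": 0, "r3": 0}

def update(value, reachable=("r1", "r2", "r3")):
    # Every logical update turns into one physical write per replica.
    for r in reachable:
        replicas[r] = value

update(1)                            # 1 logical update = 3 physical writes
update(2, reachable=("r1", "r2"))    # if a replica is unreachable or skipped...
print(replicas)                      # {'r1': 2, 'r2': 2, 'r3': 1} -- replicas diverge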

Page 45:

Properties of distributed data management systems

Scalability refers to the ability of a system to continuously evolve in order to support a growing number of tasks

Efficiency
– response time (or latency): the delay to obtain the first item, and
– throughput (or bandwidth): the number of items delivered in a given time unit (e.g., a second)

Page 46:

CAP properties

Consistency = all replicas of a fragment are always equal
– Not to be confused with ACID consistency
– Similar to ACID atomicity: an update atomically updates all replicas
– At a given time, all nodes see the same data

Availability
– The data service is always available and fully operational
– Even in the presence of node failures
– Involves several aspects:
• Failure recovery
• Redundancy: data replication on several nodes

 

Page 47:

CAP properties

Partition tolerance
– The system must respond correctly even in the presence of node failures
– Only accepted exception: a total network crash
– However, multiple partitions may often form; the system must
• prevent this case from ever happening
• or tolerate the forming and merging of partitions without producing failures

   

Page 48:

Distribution and replication: limitations

CAP theorem: any highly scalable distributed storage system using replication can only achieve a maximum of two of the three properties: consistency, availability, and partition tolerance

• Intuitive; the main issue is to formalize and prove the theorem
– Conjectured by Eric Brewer
– Proved by Seth Gilbert and Nancy Lynch

• In most cases, consistency is sacrificed
– Many applications can live with minor inconsistencies
– This leads to using weaker forms of consistency than ACID

Page 49:

Conclusion  

Page 50:

Trends  

– The cloud
– Massive parallelism
– Main memory DBMS
– Open source software

Page 51:

Trends (continued)

Big data (OLAP)
– Publication of larger and larger volumes of interconnected data
– Data analysis to increase its value
• Cleansing, duplicate elimination, data mining, etc.
– For massively parallel data, a simple structure is preferable for performance
• Key/value > relational or OLAP
• But a rich structure is essential for complex queries

Massive transactional systems (OLTP)
– Parallelism is expensive
– Approaches such as MapReduce are not suitable

 

Page 52:

3  principles?  

New massively parallel systems ignore the 3 principles
– Abstraction, universality & independence

Challenge: build the next generation of data management systems that would meet the requirements of extreme applications without sacrificing any of the three main database principles

     

Page 53:

References

Again, the Webdam book: webdam.inria.fr/Jorge

Partly based on a joint presentation with Fernando Velez at Data Tuesday, at Microsoft Paris

Also: Principles of Distributed Database Systems, Tamer Özsu and Patrick Valduriez, Prentice Hall

Page 54:

Merci ! (Thank you!)

Page 55:

Gerhard Weikum
• Max-Planck-Institut für Informatik
• Fellow: ACM, German Academy of Science and Engineering
• Previous positions: Prof. at Saarland University, ETH Zurich, MCC in Austin, Microsoft Research in Redmond
• PC chair of conferences such as ACM SIGMOD, Data Engineering, and CIDR
• President of the VLDB Endowment
• ACM SIGMOD Contributions Award in 2011
