So you want to liberate your data?

d60 developing smart software solutions
April 2012

Mogens Heller Grabe

[email protected] @mookid8000

http://mookid.dk/oncode

Agenda

• Data, queries, etc.
• Concurrency
• Aggregation
• Deployment
• Durability
• Things to be aware of

MongoDB

• Document database
• Currently in v. 2.0.4
• Developed by 10gen
• Open source
  – the server is GNU AGPL v3
  – the official clients are Apache v2
• Absolutely free to use
  – you can get a commercial version of the db, though
  – it has support, SSL, and more security features

Conceptual data organization

MongoDB: process → database → collection → document
Relational database: process → database → table → row

Data  

Example 1

• Install
• Mongo shell
• Show database contents
• Add and show a document

Queries

There are several other query operators as well: $gt, $gte, $lt, $lte, $exists, $all, etc.
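To make the operator semantics concrete without a server, here is a minimal in-memory sketch in plain JavaScript (the shell's language). The `matches` function and the sample documents are hypothetical; this is not how MongoDB evaluates queries, and it ignores nested fields and most array semantics.

```javascript
// Hypothetical in-memory matcher for a few MongoDB query operators.
// Top-level fields only; an illustration, not MongoDB's query evaluator.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    // A plain value (not an operator object) means equality.
    if (cond === null || typeof cond !== "object" || Array.isArray(cond)) {
      return value === cond;
    }
    // Several operators on one field must all hold (e.g. a range).
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$gt": return value !== undefined && value > arg;
        case "$gte": return value !== undefined && value >= arg;
        case "$lt": return value !== undefined && value < arg;
        case "$lte": return value !== undefined && value <= arg;
        case "$exists": return (field in doc) === arg;
        case "$all":
          return Array.isArray(value) && arg.every((a) => value.includes(a));
        default:
          throw new Error("unsupported operator: " + op);
      }
    });
  });
}

// Hypothetical sample documents.
const characters = [
  { firstName: "Ann", lastName: "Lee", age: 35, appearances: [1, 2, 3] },
  { firstName: "Bob", lastName: "Ray", age: 12, appearances: [1, 3] },
  { firstName: "Cy", lastName: "Poe", appearances: [2] },
];

const adults = characters.filter((c) => matches(c, { age: { $gte: 18 } }));
const between10And20 = characters.filter((c) => matches(c, { age: { $gt: 10, $lt: 20 } }));
const noAge = characters.filter((c) => matches(c, { age: { $exists: false } }));
const inOneAndThree = characters.filter((c) => matches(c, { appearances: { $all: [1, 3] } }));
```

In the shell, the first query would read `db.characters.find({age: {$gte: 18}})`, assuming a collection named `characters`.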

Indexes  

Updates

There are several other update modifiers as well: $inc, $set, $addToSet, $rename, etc.
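A minimal sketch of what these modifiers do to a document, again simulated on plain objects. `applyUpdate` and the sample document are hypothetical; MongoDB applies modifiers on the server and handles many more cases (nested paths, type errors, conflicting modifiers).

```javascript
// Hypothetical sketch of a few MongoDB update modifiers applied to a plain
// object in place. Top-level fields only; not MongoDB's implementation.
function applyUpdate(doc, update) {
  for (const [modifier, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (modifier) {
        case "$inc":
          // A missing field counts as 0.
          doc[field] = (doc[field] ?? 0) + arg;
          break;
        case "$set":
          doc[field] = arg;
          break;
        case "$addToSet":
          // Create the array if needed; append only values not already present.
          if (doc[field] === undefined) doc[field] = [];
          if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
          if (!doc[field].includes(arg)) doc[field].push(arg);
          break;
        case "$rename":
          // Here arg is the new field name.
          if (field in doc) {
            doc[arg] = doc[field];
            delete doc[field];
          }
          break;
        default:
          throw new Error("unsupported modifier: " + modifier);
      }
    }
  }
  return doc;
}

const doc = { _id: 1, name: "Ann", tags: ["a"] };
applyUpdate(doc, { $inc: { visits: 1 }, $addToSet: { tags: "b" } });
applyUpdate(doc, { $addToSet: { tags: "b" }, $rename: { name: "fullName" }, $set: { active: true } });
```

The first call corresponds to something like `db.people.update({_id: 1}, {$inc: {visits: 1}, $addToSet: {tags: "b"}})` in the shell, assuming a hypothetical `people` collection. Note that adding "b" a second time leaves `tags` unchanged.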

Example 2

• Import some data
• Query
• Update
• Index
• Query

ACID?

• Atomic: Yeah well, per document.
• Consistent: Yeah well, can be.
• Isolated: Yeah well, per document.
• Durable: Yeah well, can be; not by default, though...

Concurrency

• Pushing it down the stack
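One way to read "pushing it down the stack": instead of reading a document, changing it in application code and writing it back, let the database apply the change atomically (for example with $inc). The sketch below simulates two clients interleaving; the `Store` class and its methods are hypothetical stand-ins, not a driver API.

```javascript
// Hypothetical single-process stand-in for a database. Each method call is
// atomic, as a single-document operation is in MongoDB.
class Store {
  constructor() { this.docs = new Map(); }
  insert(doc) { this.docs.set(doc._id, { ...doc }); }
  // Clients get a copy, just as they would over the wire.
  findOne(id) { return { ...this.docs.get(id) }; }
  replace(doc) { this.docs.set(doc._id, { ...doc }); }
  // Server-side increment, the in-memory analogue of {$inc: {field: by}}.
  inc(id, field, by) { this.docs.get(id)[field] += by; }
}

// Read-modify-write in the application: two clients interleave.
const racy = new Store();
racy.insert({ _id: 1, views: 0 });
const copyA = racy.findOne(1);
const copyB = racy.findOne(1);
copyA.views += 1;
copyB.views += 1;
racy.replace(copyA);
racy.replace(copyB); // overwrites A's write: one increment is lost

// Pushing the change down to the store: both increments survive.
const atomic = new Store();
atomic.insert({ _id: 1, views: 0 });
atomic.inc(1, "views", 1);
atomic.inc(1, "views", 1);
```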

Concurrency

• Preserve invariants with update preconditions
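A sketch of the idea, assuming a hypothetical `accounts` collection with a `balance` field: put the invariant into the query part of the update, e.g. `db.accounts.update({_id: 1, balance: {$gte: 100}}, {$inc: {balance: -100}})`. Matching and modifying happen in one single-document operation, so no other writer can sneak in between. Simulated in memory:

```javascript
// Hypothetical in-memory collection. update() finds the first document
// matching `criteria` (a predicate standing in for a MongoDB query) and
// applies `change` in one step, returning n, the number of documents
// updated (0 or 1).
class Collection {
  constructor(docs) { this.docs = docs; }
  update(criteria, change) {
    const doc = this.docs.find(criteria);
    if (doc === undefined) return 0;
    change(doc);
    return 1;
  }
}

const accounts = new Collection([{ _id: 1, balance: 150 }]);

// The invariant "balance never goes negative" lives in the query itself.
function withdraw(id, amount) {
  return accounts.update(
    (d) => d._id === id && d.balance >= amount,
    (d) => { d.balance -= amount; }
  );
}

const first = withdraw(1, 100);  // precondition holds: n = 1
const second = withdraw(1, 100); // balance is now 50: n = 0, nothing changes
```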

Concurrency

• Use optimistic locking when replacing a document
  (and then check whether n, the number of documents updated, is 0 or 1...)
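A minimal sketch of optimistic locking, assuming a hypothetical `version` counter field (a convention, not a MongoDB feature): replace the document only where both `_id` and the version you read still match, roughly `db.people.update({_id: 1, version: 1}, newDoc)` in the shell, and treat n = 0 as a conflict. Simulated in memory:

```javascript
// Hypothetical in-memory collection. replace() swaps in a whole new document,
// but only for a document matching every field in `criteria`, and returns n
// (0 or 1), the number of documents replaced.
class Collection {
  constructor(docs) { this.docs = docs; }
  findOne(id) {
    const doc = this.docs.find((d) => d._id === id);
    return doc === undefined ? null : { ...doc }; // clients work on a copy
  }
  replace(criteria, newDoc) {
    const i = this.docs.findIndex((d) =>
      Object.entries(criteria).every(([k, v]) => d[k] === v));
    if (i === -1) return 0;
    this.docs[i] = { ...newDoc };
    return 1;
  }
}

// Save only if the version is still the one we read; bump it on success.
function save(coll, doc) {
  const n = coll.replace(
    { _id: doc._id, version: doc.version },
    { ...doc, version: doc.version + 1 }
  );
  if (n === 0) throw new Error("conflict: document " + doc._id + " changed since it was read");
}

const people = new Collection([{ _id: 1, name: "Ann", version: 1 }]);
const mine = people.findOne(1);
const theirs = people.findOne(1);

mine.name = "Anna";
save(people, mine); // succeeds: version 1 -> 2

theirs.name = "Annie";
let conflict = null;
try {
  save(people, theirs); // still expects version 1: n = 0
} catch (e) {
  conflict = e;
}
```

On a conflict, the caller would typically reload the document, reapply its change, and try again.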

Concurrency

• Use FindAndModify to “check out” documents
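FindAndModify locates a document and modifies it in one atomic operation, returning it, so two workers can never check out the same document. In the shell that would look like `db.jobs.findAndModify({query: {status: "pending"}, update: {$set: {status: "checkedOut", owner: "worker-a"}}, new: true})`, where the `jobs` collection and its fields are hypothetical. The same idea, simulated in memory:

```javascript
// Hypothetical in-memory collection. findAndModify() locates the first
// matching document and modifies it in the same step, returning the
// modified document (like the shell option new: true) or null.
class Collection {
  constructor(docs) { this.docs = docs; }
  findAndModify(criteria, change) {
    const doc = this.docs.find(criteria);
    if (doc === undefined) return null;
    change(doc);
    return { ...doc };
  }
}

const jobs = new Collection([
  { _id: 1, status: "pending" },
  { _id: 2, status: "pending" },
]);

// Each worker atomically claims one pending job, so no two get the same one.
function checkOut(worker) {
  return jobs.findAndModify(
    (j) => j.status === "pending",
    (j) => { j.status = "checkedOut"; j.owner = worker; }
  );
}

const a = checkOut("worker-a");
const b = checkOut("worker-b");
const c = checkOut("worker-c"); // null: nothing left to check out
```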

Aggregation

• Map/reduce

Aggregation

• Map/reduce
  – Map: for each document, emit 0 or more (key, value) tuples
  – Reduce: given a (key, value[]), return 1 value

Aggregation

m = function() {
    var doc = this;
    doc.appearances.forEach(function(a) {
        emit(a, {
            count: 1,
            names: [doc.firstName + " " + doc.lastName]
        });
    });
}

r = function(key, values) {
    var count = 0;
    var names = [];
    values.forEach(function(v) {
        count += v.count;
        names = names.concat(v.names);
    });
    return {count: count, names: names};
}
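To see what m and r compute without a server, here is a hypothetical in-memory driver (`mapReduce` and the sample documents are assumptions, not MongoDB code). It mirrors two behaviours documented for MongoDB's map/reduce: reduce is not called for a key that emitted only one value, and reduce may be called again on its own output (re-reduce). That is why r returns the same {count, names} shape that m emits.

```javascript
// The map and reduce functions from the slide, with straight quotes.
const m = function () {
  var doc = this;
  doc.appearances.forEach(function (a) {
    emit(a, {
      count: 1,
      names: [doc.firstName + " " + doc.lastName],
    });
  });
};

const r = function (key, values) {
  var count = 0;
  var names = [];
  values.forEach(function (v) {
    count += v.count;
    names = names.concat(v.names);
  });
  return { count: count, names: names };
};

// Hypothetical in-memory driver: runs map with `this` bound to each document,
// collects emitted values per key, then reduces each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  // map() calls a free emit(), so provide one globally while it runs.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc);
  } finally {
    delete globalThis.emit;
  }
  const results = [];
  for (const [key, values] of groups) {
    // Skip reduce for keys that emitted a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

const characters = [
  { firstName: "Ann", lastName: "Lee", appearances: [1, 2] },
  { firstName: "Bob", lastName: "Ray", appearances: [2] },
];
const perEpisode = mapReduce(characters, m, r);
```

For episode 2, `perEpisode` holds `{count: 2, names: ["Ann Lee", "Bob Ray"]}`; feeding an earlier result of r back into r gives the same totals as reducing all values at once.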

Example 3

• Use map/reduce to collect information on who appeared in each episode

Aggregation

• Aggregation framework (not available until 2.2)
  – declarative syntax for constructing an aggregation pipeline
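To show the shape of a declarative pipeline, here is a hypothetical in-memory stand-in for two stages, $unwind and $group, with accumulators in the spirit of $sum and $push. In the 2.2 framework the map/reduce above would roughly become an $unwind on appearances followed by a $group on it; this sketch is an illustration, not the framework's API or implementation.

```javascript
// $unwind: one output document per element of the array field; documents
// without the field are dropped.
const unwind = (field) => (docs) =>
  docs.flatMap((d) => (d[field] || []).map((v) => ({ ...d, [field]: v })));

// $group: bucket documents by key(doc), then compute each output field by
// running its accumulator over the bucket.
const group = (key, accumulators) => (docs) => {
  const buckets = new Map();
  for (const d of docs) {
    const k = key(d);
    if (!buckets.has(k)) buckets.set(k, []);
    buckets.get(k).push(d);
  }
  return [...buckets].map(([k, bucket]) => {
    const out = { _id: k };
    for (const [name, acc] of Object.entries(accumulators)) out[name] = acc(bucket);
    return out;
  });
};

// Accumulators in the spirit of $sum and $push.
const sum = (f) => (bucket) => bucket.reduce((total, d) => total + f(d), 0);
const push = (f) => (bucket) => bucket.map(f);

// A pipeline is a list of stages, each fed the previous stage's output.
const pipeline = (...stages) => (docs) => stages.reduce((acc, stage) => stage(acc), docs);

const whoAppeared = pipeline(
  unwind("appearances"),
  group((d) => d.appearances, {
    count: sum(() => 1),
    names: push((d) => d.firstName + " " + d.lastName),
  })
);

// Hypothetical sample documents.
const characters = [
  { firstName: "Ann", lastName: "Lee", appearances: [1, 2] },
  { firstName: "Bob", lastName: "Ray", appearances: [2] },
];
const perEpisode = whoAppeared(characters);
```

Unlike map/reduce, nothing here is hand-written iteration per document: the pipeline only declares which stages run in which order.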

Deployment

• Several configurations
  – we’ll check out replica sets and sharding

Replica sets

• Master-slave with automatic failover
  – Each mongod should be started with the --replSet argument
  – Additional nodes are added from the shell
  – Make sure the number of nodes is odd, possibly by adding an arbiter
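Why an odd number: a member can only become primary with votes from a strict majority of the voting members, and an arbiter votes without storing data. The arithmetic, as a small sketch (`majority` and `electableSide` are hypothetical helpers, not part of MongoDB):

```javascript
// An election needs votes from a strict majority of the voting members.
const majority = (votingMembers) => Math.floor(votingMembers / 2) + 1;

// After a network partition splits the set into two sides, which side (if
// any) can still elect a primary?
function electableSide(sideA, sideB) {
  const needed = majority(sideA + sideB);
  if (sideA >= needed) return "A";
  if (sideB >= needed) return "B";
  return null; // neither side has a majority: no primary, so no writes
}

const fourMembers = electableSide(2, 2);     // tied split of four members
const withArbiter = electableSide(2 + 1, 2); // same data nodes plus an arbiter
```

Since majority(3) = 2 and majority(4) = 3, a fourth member tolerates no more failures than three do, while the tied 2/2 split leaves the set without a primary.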

Replica sets

• Higher availability
• Scale out reads
• Backup without interfering with the primary

Sharding

• Auto-sharding
  – based on a user-defined shard key
  – can be defined per collection
  – requires special nodes: mongos (the router/load balancer) and a mongod that is configured to be a configuration server

Sharding

• Scale out writes
• Limitations:
  – Shard key is immutable
  – All inserts/updates must include the shard key
  – Cannot enforce (arbitrary) uniqueness across shards, only for the shard key
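A sketch of why these limitations follow from range-based sharding: the router has to compare each document's shard key against chunk ranges (kept on the config server) to pick one shard. The chunk table, shard names and the `userId` shard key below are hypothetical. Without the key there is no single shard to pick, and changing the key could mean the document belongs in a chunk on another shard, which is one way to see why the key is immutable.

```javascript
// Hypothetical chunk table for a collection sharded on userId: each chunk
// covers [min, max) of the shard key (null = unbounded) and lives on one shard.
const chunks = [
  { min: null, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: null, shard: "shard2" },
];

// What a router has to do for an insert: find the chunk owning the key.
function route(doc, shardKey) {
  if (!(shardKey in doc)) {
    throw new Error("document is missing shard key " + shardKey);
  }
  const k = doc[shardKey];
  const chunk = chunks.find(
    (c) => (c.min === null || k >= c.min) && (c.max === null || k < c.max)
  );
  return chunk.shard;
}

const low = route({ userId: 42 }, "userId");
const boundary = route({ userId: 1000 }, "userId");
const high = route({ userId: 7500 }, "userId");
```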

Sharding + replica sets

MongoDB’s durability story

• Memory-mapped files
• fsync
• Durability through replication
  – before 1.8
• Durability through journaling
  – an option since 1.8
  – replica sets still cool, though
  – default since 2.0 (64-bit builds)

MongoDB’s durability story

• Inserts and updates are unsafe by default!! (fire-and-forget: the driver does not wait for the server to report success or failure)
  – only purpose: get awesome benchmarks
  – bad: bites you in the a**
• Exposed differently in the drivers, but always maps to db.getLastError()
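A simulation of what "unsafe by default" means: the server records the outcome of the last write on the connection, and the client only learns about a failure if it asks, which is the role getLastError plays (the real command returns a document whose `err` field is null on success, and accepts options such as `w`, `j` and `fsync`). The `Server` class and `safeInsert` wrapper are hypothetical.

```javascript
// Hypothetical stand-in for a server connection: it remembers the outcome
// of the last write, and a plain insert returns nothing to the caller.
class Server {
  constructor() { this.docs = new Map(); this.lastError = null; }
  insert(doc) {
    // Unique _id index: a duplicate insert fails on the server side.
    if (this.docs.has(doc._id)) {
      this.lastError = "duplicate key: _id " + doc._id;
    } else {
      this.docs.set(doc._id, doc);
      this.lastError = null;
    }
  }
  getLastError() { return this.lastError; }
}

// "Safe mode" as a client-side wrapper: write, then ask for the outcome.
function safeInsert(server, doc) {
  server.insert(doc);
  const err = server.getLastError();
  if (err !== null) throw new Error(err);
}

const server = new Server();
server.insert({ _id: 1 });
server.insert({ _id: 1 }); // fails silently: the caller never finds out
```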

MongoDB’s durability story

• Conclusion: It’s cool that you can tweak it per operation, but it’s uncool that it’s unsafe by default.

Things to be aware of

• Safe mode is off by default
• 32/64 bit: 32-bit builds can only handle roughly 2 GB of data
• Memory-mapped files
• Global write lock
• Indexes should always fit in RAM

Thanks for listening!

[email protected] @mookid8000

http://mookid.dk/oncode

Image credits

The world’s most interesting man: http://i.qkme.me/3mwy.jpg
Bison: http://www.flickr.com/photos/johan-gril/5632513228/
Tired Fry: http://cdn.memegenerator.net/instances/400x/18731987.jpg

Thanks for letting me borrow your awesome images. If you ever meet me, I’ll buy you a beer. Seriously, I will.