THE NOSQL MOVEMENT (2)

GENOVEVA VARGAS SOLAR FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE

[email protected]

http://www.vargas-solar.com/bigdata-managment

THE NOSQL FAMILY

!  The NoSQL family covers document databases, key-value databases and graph databases

[Diagram: NoSQL family: Graph, Document, Key-value store]

GRAPH DATABASE

!  Use graph structures with nodes, edges, and properties to represent and store data

!  Nodes are similar in nature to the objects that object-oriented programmers are familiar with

!  Properties are pertinent pieces of information that relate to nodes

!  Edges are the lines that connect nodes to nodes or nodes to properties and they represent the relationship between the two

!  By definition, a graph database is any storage system that provides index-free adjacency

!  Every element contains a direct pointer to its adjacent element

!  No index lookups are necessary
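A minimal sketch of index-free adjacency, assuming a tiny in-memory property graph. The class names `Node` and `Edge`, the helpers `connect` and `neighbours`, and the sample data are illustrative assumptions, not the API of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A graph node carrying arbitrary properties."""
    name: str
    properties: dict = field(default_factory=dict)
    # Index-free adjacency: the node holds direct references to its edges.
    out_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    """A labelled, directed relationship between two nodes."""
    label: str
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)


def connect(source: Node, label: str, target: Node, **properties) -> Edge:
    """Create an edge and attach it directly to its source node."""
    edge = Edge(label, source, target, properties)
    source.out_edges.append(edge)
    return edge


def neighbours(node: Node, label: str) -> list:
    """Follow the node's own edge references; no global index is consulted."""
    return [edge.target.name for edge in node.out_edges if edge.label == label]


# Hypothetical sample data.
alice = Node("alice", {"age": 30})
bob = Node("bob")
carol = Node("carol")
connect(alice, "knows", bob, since=2010)
connect(alice, "knows", carol)
connect(bob, "knows", carol)

print(neighbours(alice, "knows"))  # ['bob', 'carol']
```

Finding the neighbours of a node costs time proportional to that node's own edges, independent of the total size of the graph.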

GRAPH

Figure credit for the graph slides: Takahiro Inoue, MongoDB leader, slideshare

UNDIRECTED GRAPH

DIRECTED GRAPH

MIXED GRAPH, MULTIGRAPH

SINGLE RELATIONAL GRAPH

MULTI RELATIONAL GRAPH

PROPERTY GRAPH

PROPERTY GRAPH: SUMMARY

GRAPH TRAVERSALS
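As a rough illustration of a traversal, here is a minimal breadth-first search over a directed graph stored as adjacency lists. The function name `bfs` and the sample graph are assumptions made for this sketch; they are not taken from the original figures.

```python
from collections import deque


def bfs(graph: dict, start: str) -> list:
    """Return the vertices reachable from start, in breadth-first order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# Hypothetical directed graph: vertex -> list of successors.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a"))  # ['a', 'b', 'c', 'd']
```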


THE NOSQL FAMILY

[Diagram: NoSQL family, Graph branch]

DOCUMENT DATABASE

!  A computer program designed for storing, retrieving, and managing document-oriented (semi-structured) information

!  A document encapsulates and encodes data (or information) in a standard format or encoding

!  XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on)

!  Documents are similar, in some ways, to records or rows in relational databases, but they are less rigid

!  Documents are not required to adhere to a standard schema, nor do they all have the same sections, slots, parts, keys, or the like

ORGANIZATION AND ACCESS

!  Ways of organizing documents include the notions of collections, tags, non-visible metadata, directory hierarchies, buckets

!  Documents are addressed in the database via a unique key that represents that document

!  This key is a simple string, e.g., a URI or path

!  This key can be used to retrieve the document from the database

!  The database retains an index on the key so that document retrieval is fast

!  Beyond the simple key-document (or key-value) lookup to retrieve a document, the database offers an API or query language to retrieve documents based on their contents

!  For example, you may want a query that gets you all the documents with a certain field set to a certain value
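The two access paths can be sketched as follows, assuming a toy in-memory store. The class `DocumentStore`, its methods and the sample keys are hypothetical; a real document database would expose its own API or query language.

```python
class DocumentStore:
    """Toy document store: a dict plays the role of the index on the key."""

    def __init__(self):
        self._by_key = {}

    def put(self, key: str, document: dict) -> None:
        self._by_key[key] = document

    def get(self, key: str):
        """Key-document lookup: direct access through the key index."""
        return self._by_key.get(key)

    def find(self, field_name: str, value) -> list:
        """Content-based query: keys of documents whose field equals value."""
        return sorted(
            key for key, doc in self._by_key.items()
            if doc.get(field_name) == value
        )


store = DocumentStore()
# Hypothetical documents addressed by path-like keys.
store.put("/books/1", {"title": "A", "topic": "NoSQL"})
store.put("/books/2", {"title": "B", "topic": "SQL"})
store.put("/notes/1", {"topic": "NoSQL", "pages": 3})

print(store.get("/books/2"))
print(store.find("topic", "NoSQL"))  # ['/books/1', '/notes/1']
```

The content-based query here scans every document; real systems typically let you declare secondary indexes to avoid the scan.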


DATA MODEL: DOCUMENT

!  An object with named attributes and «attachments»:

!  Identified by one unique ID and a version number

!  Different data types: Text, numbers, booleans, dates, lists, maps

!  Does not use locks for dealing with concurrency control: conflicts can be merged

!  Examples:

!  “Title”: “CouchDB: The Definitive Guide: Time to Relax (Animal Guide)”

!  “Authors”: [“Chris Anderson”, “Jan Lehnardt”, “Noah Slater”]

!  “Keywords”: [“NoSQL databases”, “Document databases”]
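A minimal sketch of lock-free, version-checked updates, assuming each document carries an ID and a version number as described above. `VersionedStore`, `ConflictError` and the naive `merge` policy are assumptions made for illustration; this is not CouchDB's actual implementation, and the check-then-write below is not atomic across threads.

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale version of the document."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def read(self, doc_id):
        return self._docs[doc_id]

    def write(self, doc_id, document, based_on=0):
        """Accept the write only if it is based on the current version."""
        current_version, _ = self._docs.get(doc_id, (0, None))
        if based_on != current_version:
            raise ConflictError(
                f"{doc_id}: current version is {current_version}, got {based_on}"
            )
        self._docs[doc_id] = (current_version + 1, dict(document))
        return current_version + 1


def merge(mine: dict, theirs: dict) -> dict:
    """Naive merge policy (an assumption): keep their fields, add ours on top."""
    merged = dict(theirs)
    merged.update(mine)
    return merged


store = VersionedStore()
title = "CouchDB: The Definitive Guide"
v1 = store.write("book", {"Title": title})
# Another writer updates the document first.
store.write("book", {"Title": title, "Keywords": ["NoSQL databases"]}, based_on=v1)

mine = {"Title": title, "Authors": ["Chris Anderson"]}
try:
    store.write("book", mine, based_on=v1)  # stale version: rejected, nobody blocked
except ConflictError:
    version, theirs = store.read("book")
    store.write("book", merge(mine, theirs), based_on=version)

version, document = store.read("book")
print(version, sorted(document))
```

No writer ever waits for a lock: a stale write is simply rejected, and the client decides how to merge and retry.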

11/01/15


FLEXIBLE DOCUMENT STRUCTURE

!  Different classes of tags can be represented as documents

!  Both documents can be inserted in the same collection


SIMPLE QUERY

!  db.tags.find({id: "tone/obituaries"})

!  Query operators (cf. http://docs.mongodb.org/manual/crud/)

!  db.tags.find({"section": {$exists: true}})

!  db.tags.find({"webtitle": /^Obit*/i})
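To show what these three queries select, here is a small in-memory emulation that does not require a MongoDB server. The `find` helper and the two sample tag documents are assumptions made for this sketch; only the field names `id`, `section` and `webtitle` come from the queries above.

```python
import re

# Hypothetical tag documents with different structures in one collection.
tags = [
    {"id": "tone/obituaries", "webtitle": "Obituaries", "type": "tone"},
    {"id": "world/france", "webtitle": "France", "type": "keyword", "section": "world"},
]


def find(collection, predicate):
    """Return the documents of the collection that satisfy the predicate."""
    return [doc for doc in collection if predicate(doc)]


# Equality on a field, like {id: "tone/obituaries"}.
by_id = find(tags, lambda doc: doc.get("id") == "tone/obituaries")

# Field presence, like {"section": {$exists: true}}.
with_section = find(tags, lambda doc: "section" in doc)

# Case-insensitive regular expression, like {"webtitle": /^Obit*/i}.
pattern = re.compile(r"^Obit*", re.IGNORECASE)
by_title = find(tags, lambda doc: pattern.search(doc.get("webtitle", "")) is not None)

print([doc["id"] for doc in by_id])
print([doc["id"] for doc in with_section])
print([doc["id"] for doc in by_title])
```

Note that in the regular expression `^Obit*` the `*` applies to the preceding `t` only, so the pattern matches any title starting with "Obi", in any letter case.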


MODIFYING THE DOCUMENT STRUCTURE

THE NOSQL FAMILY

[Diagram: NoSQL family, Document branch]

DEMO: COUCHDB


THE NOSQL FAMILY

[Diagram: NoSQL family, Key-value store branch. Labels shown: eventually consistent, hierarchical, hosted services, stores on disk, ordered stores, multivalue databases, object databases, tuple, tabular]

Data stores designed to scale simple OLTP-style application loads: read/write operations by thousands/millions of users

•  Data model
•  Consistency
•  Storage
•  Durability
•  Availability
•  Query support

(Katsov-2012)

Use the right tool for the right job…

How do I know which is the right tool for the right job?


PROBLEM STATEMENT: HOW MUCH TO GIVE UP?

!  CAP theorem1: a distributed system can guarantee at most two of the three properties: consistency, availability and partition tolerance

!  Many NoSQL systems relax consistency in favour of availability and partition tolerance

[Diagram: Consistency, Availability, Fault-tolerant partitioning]

1 Eric Brewer, "Towards robust distributed systems." PODC. 2000 http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf

COMPARING NOSQL & NEWSQL SYSTEMS

SYSTEM      CONCURRENCY CONTROL  DATA STORAGE  REPLICATION   TRANSACTION
Redis       Locks                RAM           Asynchronous  No
Scalaris    Locks                RAM           Synchronous   Local
Tokyo       Locks                RAM/Disk      Asynchronous  Local
Voldemort   MVCC                 RAM/BDB       Asynchronous  No
Riak        MVCC                 Plug-in       Asynchronous  No
Membrain    Locks                Flash+Disk    Synchronous   Local
Membase     Locks                Disk          Synchronous   Local
Dynamo      MVCC                 Plug-in       Asynchronous  No
SimpleDB    None                 S3            Asynchronous  No
MongoDB     Locks                Disk          Asynchronous  No
CouchDB     MVCC                 Disk          Asynchronous  No
Terrastore  Locks                RAM+          Synchronous   Local
HBase       Locks                Hadoop        Asynchronous  Local
HyperTable  Locks                Files         Synchronous   Local
Cassandra   MVCC                 Disk          Asynchronous  Local
BigTable    Locks+stamps         GFS           Both          Local
PNUTS       MVCC                 Disk          Asynchronous  Local
MySQL-C     ACID                 Disk          Synchronous   Yes
VoltDB      ACID/no lock         RAM           Synchronous   Yes
Clustrix    ACID/no lock         Disk          Synchronous   Yes
ScaleDB     ACID                 Disk          Synchronous   Yes
ScaleBase   ACID                 Disk          Asynchronous  Yes
NimbusDB    ACID/no lock         Disk          Synchronous   Yes

System categories shown on the slide: Key-Value, Document, Extended records, Relational

Cattell, Rick. "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27

CONCLUSIONS

!  Data are growing bigger and more heterogeneous, and they need new, adapted ways of being managed; thus the NoSQL movement is gaining momentum

!  Data heterogeneity implies different management requirements; this is where polyglot persistence comes in

!  Consistency – Availability – Fault tolerance theorem: find the balance!

!  Which data store to choose, according to its data model?

!  A lot of programming is implied…

Open opportunities if you’re interested in this topic!

POLYGLOT PERSISTENCE

GENOVEVA VARGAS SOLAR FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE

[email protected]

http://www.vargas-solar.com

THIS TALK IS ABOUT

An alternative for managing multiform and multimedia data collections according to different properties and requirements

POLYGLOT PERSISTENCE

!  Polyglot programming: applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems

!  Polyglot persistence: any decent-sized enterprise will have a variety of different data storage technologies for different kinds of data

!  A new strategic enterprise application should no longer be built assuming relational persistence support

!  The relational option might be the right one, but you should seriously look at other alternatives

M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012

DESIGNING AND BUILDING A POLYGLOT DATABASE


OBJECTIVE

!  Build a MyNet app, based on a polyglot database, that provides an integrated directory of my contacts, including their status and posts from several social networks

[Diagram: MyNet DB, MyNet App, Social network]

Requirements shown on the slide:

•  Analysis of contact networks: overlapping according to interests and post topics; top 10 most popular contacts
•  User sessions in different social networks
•  Integrating posts from all networks
•  Contact graph traversal for building groups out of common characteristics
•  Synchronizing posts to all social networks (SN); friends network
•  User account activity in different social networks
•  Directory synchronisation: integrating contacts’ information from all SN

DEPLOYING A POLYGLOT DATABASE

[Diagram labels: REST, JSON documents, MyNet]

MULTI-CLOUD POLYGLOT DATABASE

MyNetContacts

[UML class diagram]

Classes:

Contact: idContact: Integer, lastName: String, givenName: String, society: String
Basic Info: webSite: URI, socialNetworkID: URI
Group: groupName: String
Post: postID: Integer, timeStamp: Date, geoStamp: String
Content: contentID: Integer, text: String, image: Jpeg, video: Avi
Address: street: String, number: Integer, City: String, Zipcode: Integer
Phone: number: String, type: {pers, prof}
Email: email: String, type: {pers, prof}

Associations (with 1 and * multiplicities): <<hasBasicInfo>>, <<isContactof>>, <<isComposedof>>, <<publishesPost>>, <<hasContent>>, <<hasAddress>>, <<hasOtherPhones>>, <<hasMobilePhones>>, <<hasEmail>>
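The class diagram can be sketched as plain Python data classes. Attribute names follow the diagram (converted to snake case); the direction and multiplicity of each association are assumptions, since they are not recoverable from this transcript, and the sample values are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Address:
    street: str
    number: int
    city: str
    zipcode: int


@dataclass
class Phone:
    number: str
    type: str  # "pers" or "prof"


@dataclass
class Email:
    email: str
    type: str  # "pers" or "prof"


@dataclass
class BasicInfo:
    web_site: str
    social_network_id: str


@dataclass
class Content:
    content_id: int
    text: str
    image: Optional[bytes] = None  # Jpeg in the diagram
    video: Optional[bytes] = None  # Avi in the diagram


@dataclass
class Post:
    post_id: int
    time_stamp: datetime
    geo_stamp: str
    content: Content  # hasContent, assumed one content per post


@dataclass
class Contact:
    id_contact: int
    last_name: str
    given_name: str
    society: str
    # The list-valued associations below assume a "*" multiplicity.
    basic_info: List[BasicInfo] = field(default_factory=list)
    addresses: List[Address] = field(default_factory=list)
    mobile_phones: List[Phone] = field(default_factory=list)
    other_phones: List[Phone] = field(default_factory=list)
    emails: List[Email] = field(default_factory=list)
    posts: List[Post] = field(default_factory=list)
    contacts: List["Contact"] = field(default_factory=list)  # isContactof


@dataclass
class Group:
    group_name: str
    members: List[Contact] = field(default_factory=list)  # isComposedof


jane = Contact(1, "Doe", "Jane", "ACME")
jane.emails.append(Email("jane@example.org", "prof"))
jane.posts.append(Post(10, datetime(2014, 5, 1), "Grenoble", Content(100, "Hello")))
friends = Group("friends", [jane])

document = asdict(friends)  # nested dicts and lists, close to a JSON document
print(document["members"][0]["emails"][0]["email"])
```

Converting the object graph with `asdict` yields the kind of nested structure a document store would hold, while the same classes could be mapped to graph nodes and edges for the contact network.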

MANAGING A POLYGLOT DATABASE: QUERYING, INSERTING, MAINTAINING

GENERATING NOSQL PROGRAMS FROM HIGH LEVEL ABSTRACTIONS

[Diagram labels, from high-level to low-level abstractions: UML class diagram (application classes); Spring Roo; Java web app; Spring Data; graph database; relational database]

http://code.google.com/p/model2roo/

POLYGLOT DATABASE EVOLUTION

!  Problem statement:

!  Evolution of the application: modification of classes, new classes, new relationships among classes

!  Evolution of the “entities” managed in the polyglot database

!  Some entities change structure, some change values, …

!  The content of the stores starts drifting away from the application data structures

!  What is the current structure of the stored entities?

!  Are there elements that are not being accessed because they no longer correspond to the application data structures?
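One way to answer the last two questions is to compare the fields the application expects with the fields actually found in the stored entities. The sketch below is an assumption-laden illustration: the function `schema_drift`, the version of `Contact` used, and the stored documents are hypothetical, and this is not how ExSchema works.

```python
from dataclasses import dataclass, fields


@dataclass
class Contact:
    """Current application class (a hypothetical evolved version)."""
    idContact: int
    firstName: str
    lastName: str


def schema_drift(cls, documents):
    """Compare the fields expected by the class with those found in the store."""
    expected = {f.name for f in fields(cls)}
    found = set().union(*(doc.keys() for doc in documents))
    return {
        # Stored elements the application no longer accesses.
        "orphaned_fields": sorted(found - expected),
        # Entities written before the class gained some of its fields.
        "incomplete_documents": [
            doc.get("idContact") for doc in documents
            if not expected.issubset(doc.keys())
        ],
    }


# Hypothetical store content: the first entity predates the class evolution.
stored = [
    {"idContact": 1, "lastName": "Doe", "givenName": "Jane", "society": "ACME"},
    {"idContact": 2, "firstName": "John", "lastName": "Roe"},
]

print(schema_drift(Contact, stored))
```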

[UML class diagram]

Classes:

Contact (first box): idContact: Integer, firstName: String, lastName: String
Contact (second box): idContact: Integer, lastName: String, givenName: String, society: String
Basic Info: webSite: URI, socialNetworkID: URI
Post: postID: Integer, timeStamp: Date, geoStamp: String
Content: contentID: Integer, text: String
Address: street: String, number: Integer, City: String, Zipcode: Integer
Email: email: String, type: {pers, prof}
Language: ID: Integer, Lang: String

Associations (with 1 and * multiplicities): <<hasBasicInfo>>, <<publishesPost>>, <<hasContent>>, <<hasAddress>>, <<hasEmail>>, <<speaksLanguage>>

CRUD OPERATIONS

[Diagram: entities Contact, Group, Content and Post; a table with columns Id, FirstName, LastName, Society; label: Consistent view of data]

BACKGROUND

!  Classic protocols are not an option (e.g., two-phase commit)

!  Voting phase + commit phase = scalability issues

!  Limited transactional support by NoSQL solutions

!  Neo4j is one of the few that truly supports ACID (Atomicity, Consistency, Isolation, Durability)

!  Others provide transactions limited to single entities (MongoDB), no roll-back (Redis), etc.

!  Rely on BASE (Basically Available, Soft state, Eventual consistency) instead of ACID

EXAMPLE 1: SYNCHRONIZING REDIS+MYSQL

https://oracleus.activeevents.com/connect/sessionDetail.ww?SESSION_ID=477500

Updating REDIS #FAIL

Case 1: Redis has been updated, MySQL has not

begin MySQL transaction
  update MySQL
  update Redis
rollback MySQL transaction

Case 2: MySQL has been updated, Redis has not

begin MySQL transaction
  update MySQL
commit MySQL transaction
<< system crashes >>
update Redis

EXAMPLE 1: UPDATING REDIS RELIABLY

Step 1 (ACID)

begin MySQL transaction
  update MySQL
  queue CRUD event in MySQL
commit transaction

Queued CRUD event: Event Id; Operation: Create, Update, Delete; New entity state, e.g. JSON

Step 2

for each CRUD event in MySQL queue
  get next CRUD event from MySQL queue
  if CRUD event is not duplicate then
    update Redis (incl. eventID)
  end if
  begin MySQL transaction
    mark CRUD event processed
  commit transaction
end for each
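The two steps can be tried out with a self-contained sketch that uses stand-ins: an in-memory SQLite database replaces MySQL and a Python dict replaces Redis. Table names, function names and the sample entity are all assumptions made for illustration.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE crud_event ("
    "event_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "operation TEXT, state TEXT, processed INTEGER DEFAULT 0)"
)
db.commit()

cache = {}              # stand-in for Redis: entity id -> entity state
applied_events = set()  # event ids already applied, kept on the "Redis" side


def update_contact(contact_id, name):
    """Step 1: update the entity and queue the CRUD event in one transaction."""
    state = json.dumps({"id": contact_id, "name": name})
    with db:  # commits on success, rolls back if an exception is raised
        db.execute(
            "INSERT OR REPLACE INTO contact (id, name) VALUES (?, ?)",
            (contact_id, name),
        )
        db.execute(
            "INSERT INTO crud_event (operation, state) VALUES (?, ?)",
            ("Update", state),
        )


def process_events():
    """Step 2: apply each unprocessed event to the cache, skipping duplicates."""
    rows = db.execute(
        "SELECT event_id, operation, state FROM crud_event "
        "WHERE processed = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, operation, state in rows:
        if event_id not in applied_events:  # duplicate check keeps this idempotent
            entity = json.loads(state)
            if operation == "Delete":
                cache.pop(entity["id"], None)
            else:
                cache[entity["id"]] = entity
            applied_events.add(event_id)
        with db:
            db.execute(
                "UPDATE crud_event SET processed = 1 WHERE event_id = ?",
                (event_id,),
            )
    return len(rows)


update_contact(1, "Ada")
update_contact(1, "Ada Lovelace")
print(process_events())  # 2
print(cache[1]["name"])  # Ada Lovelace
```

If the processor crashes after updating the cache but before marking an event as processed, the event is read again on restart; the duplicate check then prevents it from being applied twice.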

EXAMPLE 1: UPDATING REDIS RELIABLY

[Diagram labels: EntityCRUDEvent Repository (columns: ID, JSON, Processed?); EntityCRUDEvent Processor; Redis updater; INSERT INTO ..; SELECT … FROM ..; apply(event); Timer; Step 1; Step 2]

EXAMPLE 1: TRACKING CHANGES (HIBERNATE)

EXAMPLE 1: SYNCHRONIZING REDIS+MYSQL

!  Tracking changes (Hibernate)…

EXAMPLE 2: SYNCHRONIZING MONGODB+RELATIONAL

!  Spring Data project

http://static.springsource.org/spring-data/data-mongodb/docs/current/reference/html/#mongo.cross.store

(Katsov-2012)

Use the right tool for a given job…

Lack of standardization of models and data storage technologies

+

Data stores designed to scale simple OLTP-style application loads: read/write operations by thousands/millions of users

QUALITY DRIVEN BENCHMARK1

CHARACTERISTIC              SUBCHARACTERISTIC     METRIC                        NOTE
Reliability                 Maturity              API changes
                            Availability          Downtime                      3
                            Fault tolerance       Node down throughput          3
                            Recoverability        Time to stabilize on node up  3
Performance and efficiency  Time behaviour        Throughput, latency           2
                            Resource utilisation  CPU, memory and disk usage    4

1 Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki
2 Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM symposium on Cloud computing. pp. 143–154. SoCC ’10, ACM, New York, NY, USA (2010)
3 Nelubin, D., Engber, B.: Failover Characteristics of leading NoSQL databases. Tech. rep., Thumbtack Technology (2013)
4 Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Parallel Computing 30(7) (Jul 2004)

QUALITY DRIVEN BENCHMARK

[Diagram: YCSB client (workload executor, client threads, DB interface layer, stats) running against a cloud serving store; labelled QDB]

Workload parameters:

•  Read/write mix
•  Record size
•  Popularity distribution

Client parameters:

•  DB to use
•  Workload to use
•  Target throughput
•  Number of threads

Measured: read latency, throughput

Linked data & temporal streams
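To make the role of these parameters concrete, here is a much simplified, single-threaded workload executor. It is only a sketch: it is not YCSB code, a Python dict stands in for the cloud serving store, the popularity distribution is assumed uniform, and target throughput and client threads are left out. The function name `run_workload` and its parameters are assumptions.

```python
import random
import time


def run_workload(store: dict, operations: int, read_ratio: float,
                 record_count: int, record_size: int, seed: int = 0) -> dict:
    """Load records, run a read/write mix, and report simple statistics."""
    rng = random.Random(seed)

    # Load phase: insert record_count records of record_size characters.
    for key in range(record_count):
        store[key] = "x" * record_size

    read_latencies = []
    reads = writes = 0
    started = time.perf_counter()
    for _ in range(operations):
        key = rng.randrange(record_count)  # uniform popularity (an assumption)
        if rng.random() < read_ratio:
            t0 = time.perf_counter()
            _ = store[key]
            read_latencies.append(time.perf_counter() - t0)
            reads += 1
        else:
            store[key] = "y" * record_size
            writes += 1
    elapsed = time.perf_counter() - started

    return {
        "reads": reads,
        "writes": writes,
        "throughput_ops_per_s": operations / elapsed if elapsed > 0 else float("inf"),
        "avg_read_latency_s": (
            sum(read_latencies) / len(read_latencies) if read_latencies else 0.0
        ),
    }


stats = run_workload({}, operations=1000, read_ratio=0.95,
                     record_count=100, record_size=10)
print(stats)
```

Swapping the dict for a client of a real data store, behind the same read and write calls, is the role the DB interface layer plays in the diagram.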

ONGOING WORK

!  QDB benchmark extends YCSB: FaultTolerance, Recoverability and TimeBehaviour

!  Pivot data model for representing the data models of NoSQL stores

!  Sample application: Shopping system1 (ProductInfo)

!  Data stores: MongoDB, Couchbase, VoltDB, Redis, Neo4j

!  Cluster of four Ubuntu 12.04 servers deployed with extra large VM instances (8 virtual cores and 14 GB of RAM) in Windows Azure2

!  Distributed polyglot (big) database engineering

!  Model2Roo: engineering data storage solutions for given data collections

!  ExSchema for supporting the maintenance of a polyglot storage solution

1 McMurtry, D., Oakley, A., Sharp, J., Subramanian, M., Zhang, H.: Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence. Microsoft patterns & practices, Microsoft (2013)
2 http://www.windowsazure.com/
3 http://forge.puppetlabs.com/puppetlabs/
4 Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki

IN BRIEF…

!  Many proposals, but no definite solution yet…

!  Research/Industry challenges

!  Open opportunities if you’re interested in this topic!


WHEN IS POLYGLOT PERSISTENCE PERTINENT?

!  An application that essentially composes and serves web pages

!  It only looks up page elements by ID, has different needs for availability and concurrency, and has no need to share all its data

!  A problem like this is much better suited to a NoSQL store than the corporate relational DBMS

!  Scaling to lots of traffic gets harder and harder to do with vertical scaling

!  Many NoSQL databases are designed to operate over clusters

!  They can tackle larger volumes of traffic and data than is realistic with a single server


Juan Carlos Castrejón, University of Grenoble, France

Javier Espinosa, University of Grenoble, France

Dr. Genoveva Vargas-Solar, CNRS, LIG-LAFMIA, France

[email protected]

http://www.vargas-solar.com/bigdata-management

REFERENCES

!  Eric A. Brewer, "Towards robust distributed systems." PODC. 2000

!  Rick Cattell, "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27

!  Juan Castrejon, Genoveva Vargas-Solar, Christine Collet, and Rafael Lozano, ExSchema: Discovering and Maintaining Schemas from Polyglot Persistence Applications, In Proceedings of the International Conference on Software Maintenance, Demo Paper, IEEE, 2013

!  M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012

!  C. Richardson, Developing polyglot persistence applications, http://fr.slideshare.net/chris.e.richardson/developing-polyglotpersistenceapplications-gluecon2013