THE NOSQL MOVEMENT (2)
GENOVEVA VARGAS SOLAR FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
http://www.vargas-solar.com/bigdata-managment
THE NOSQL FAMILY
! NoSQL concerns document databases, key-value databases and graph databases
[Figure: NoSQL families: graph, document, key-value store]
GRAPH DATABASE
! Use graph structures with nodes, edges, and properties to represent and store data
! Nodes are similar in nature to the objects that object-oriented programmers are familiar with
! Properties are pertinent information that relate to nodes
! Edges are the lines that connect nodes to nodes or nodes to properties and they represent the relationship between the two
! By definition, a graph database is any storage system that provides index-free adjacency
! Every element contains a direct pointer to its adjacent element
! No index lookups are necessary
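A minimal sketch of index-free adjacency, assuming a toy in-memory graph. The class names `Node` and `Edge`, the helper functions, and the sample data are hypothetical and do not come from any particular graph database. Each node keeps direct references to its edges, so a traversal only follows pointers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node holding properties and direct references to its edges."""
    name: str
    properties: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # outgoing Edge objects


@dataclass
class Edge:
    """A labelled relationship that points directly at its target node."""
    label: str
    target: Node


def connect(source: Node, label: str, target: Node) -> None:
    # The pointer is stored on the source node itself; no global index is kept.
    source.edges.append(Edge(label, target))


def neighbours(node: Node, label: str) -> list:
    # Traversal follows the stored references only.
    return [edge.target.name for edge in node.edges if edge.label == label]


alice = Node("alice", {"city": "Grenoble"})
bob = Node("bob")
carol = Node("carol")
connect(alice, "knows", bob)
connect(alice, "knows", carol)
connect(bob, "knows", carol)

print(neighbours(alice, "knows"))  # ['bob', 'carol']
```

The cost of finding a node's neighbours depends on that node's degree, not on the total size of the graph.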
DOCUMENT DATABASE
! A computer program designed for storing, retrieving, and managing document-oriented (semi-structured) information
! Document encapsulates and encodes data (or information) in some standard formats or encodings.
! XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on)
! Documents are similar, in some ways, to records or rows in relational databases, but they are less rigid
! They are not required to adhere to a standard schema, nor must they all have the same sections, slots, parts, or keys
ORGANIZATION AND ACCESS
! Ways of organizing documents include collections, tags, non-visible metadata, directory hierarchies, and buckets
! Documents are addressed in the database via a unique key that represents that document
! This key is a simple string e.g., URI or path
! This key can be used to retrieve the document from the database
! The database retains an index on the key such that document retrieval is fast
! Beyond the simple key-document (or key-value) lookup to retrieve a document, the database offers an API or query language to retrieve documents based on their contents
! For example, you may want a query that gets you all the documents with a certain field set to a certain value
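The two access paths above, lookup by key and query by content, can be sketched with a toy in-memory store. The `DocumentStore` class, its methods, and the sample keys are hypothetical; a real document database would expose its own API or query language.

```python
class DocumentStore:
    """Toy in-memory document store: unique key -> document (a dict)."""

    def __init__(self):
        # The dict plays the role of the index the database keeps on the key.
        self._docs = {}

    def put(self, key: str, document: dict) -> None:
        self._docs[key] = document

    def get(self, key: str):
        # Key-document lookup: direct access through the key.
        return self._docs.get(key)

    def find(self, field_name: str, value) -> list:
        # Content-based query: scan the documents and compare one field.
        return sorted(
            key for key, doc in self._docs.items() if doc.get(field_name) == value
        )


store = DocumentStore()
store.put("/tags/tone/obituaries", {"type": "tone", "webtitle": "Obituaries"})
store.put("/tags/world/france", {"type": "keyword", "section": "world"})
store.put("/tags/tone/reviews", {"type": "tone", "webtitle": "Reviews"})

print(store.get("/tags/world/france"))
print(store.find("type", "tone"))
```

The key here is a path-like string, as in the slide; the content query returns the keys of all documents whose `type` field equals the requested value.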
DATA MODEL: DOCUMENT
! An object with named attributes and «attachments»
! Identified by one unique ID and a version number
! Different data types: Text, numbers, booleans, dates, lists, maps
! Does not use locks for dealing with concurrency control: conflicts can be merged
! Examples:
! "Title": "CouchDB: The Definitive Guide: Time to Relax (Animal Guide)"
! "Authors": ["Chris Anderson", "Jan Lehnardt", "Noah Slater"]
! "Keywords": ["NoSQL databases", "Document databases"]
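A hedged sketch of lock-free concurrency control with an ID plus a version number. It assumes a simplified integer version and a hypothetical `VersionedStore` class; it is not CouchDB's actual API, which identifies revisions differently. A write based on a stale version is rejected, and the client merges and retries instead of waiting on a lock.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale version of the document."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def read(self, doc_id):
        # Unknown documents are reported as version 0 with no body.
        return self._docs.get(doc_id, (0, None))

    def write(self, doc_id, expected_version, body):
        current_version, _ = self.read(doc_id)
        if expected_version != current_version:
            raise ConflictError(
                f"{doc_id}: expected v{expected_version}, found v{current_version}"
            )
        self._docs[doc_id] = (current_version + 1, body)
        return current_version + 1


store = VersionedStore()
store.write("book-1", 0, {"Title": "CouchDB: The Definitive Guide",
                          "Keywords": ["NoSQL databases"]})

# Two clients read the same version of the document.
v_a, doc_a = store.read("book-1")
v_b, doc_b = store.read("book-1")

# Client A writes first and succeeds.
store.write("book-1", v_a,
            {**doc_a, "Keywords": doc_a["Keywords"] + ["Document databases"]})

# Client B writes from a stale version: the conflict is detected, then merged.
try:
    store.write("book-1", v_b, {**doc_b, "Keywords": doc_b["Keywords"] + ["CouchDB"]})
except ConflictError:
    v_now, doc_now = store.read("book-1")
    merged = doc_now["Keywords"] + [k for k in ["CouchDB"] if k not in doc_now["Keywords"]]
    store.write("book-1", v_now, {**doc_now, "Keywords": merged})

print(store.read("book-1"))
```

The merge policy used here (union of the keyword lists) is only one possible choice; which merge is correct depends on the application.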
11/01/15
FLEXIBLE DOCUMENT STRUCTURE
! Can represent different classes of tag as documents
! Both documents can be inserted in the same collection
SIMPLE QUERY
! db.tags.find({id: "tone/obituaries"})
! Query operators (cf. http://docs.mongodb.org/manual/crud/)
! db.tags.find({"section": {$exists: true}})
! db.tags.find({"webtitle": /^Obit/i})
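To make the semantics of the three queries concrete, here is a sketch that emulates them over a plain Python list. The sample `tags` documents and the `find` helper are hypothetical, and no MongoDB driver is involved; the documents deliberately have different fields, as allowed by the flexible document structure.

```python
import re

# Hypothetical tag documents; note that not all of them have a "section" field.
tags = [
    {"id": "tone/obituaries", "webtitle": "Obituaries"},
    {"id": "world/france", "webtitle": "France", "section": "world"},
    {"id": "tone/obituary-notices", "webtitle": "obituary notices", "section": "news"},
]


def find(collection, predicate):
    """Return the ids of the documents that satisfy the predicate."""
    return [doc["id"] for doc in collection if predicate(doc)]


# Equality on a field.
by_id = find(tags, lambda d: d.get("id") == "tone/obituaries")

# Existence of a field, whatever its value.
has_section = find(tags, lambda d: "section" in d)

# Case-insensitive prefix match on the webtitle field.
pattern = re.compile(r"^Obit", re.IGNORECASE)
obits = find(tags, lambda d: pattern.search(d.get("webtitle", "")) is not None)

print(by_id)
print(has_section)
print(obits)
```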
THE NOSQL FAMILY
[Figure: the NoSQL landscape: key-value store, eventually consistent, hierarchical, hosted services, stores on disk, ordered stores, multivalue databases, object databases, tuple, tabular]
Data stores designed to scale simple OLTP-style application loads
• Data model • Consistency • Storage • Durability
• Availability • Query support
Read/write operations by thousands/millions of users
(Katsov-2012)
Use the right tool for the right job…
How do I know which is the right tool for the right job?
PROBLEM STATEMENT: HOW MUCH TO GIVE UP?
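! CAP theorem [1]: a distributed system can guarantee at most two of three properties: consistency, availability, and partition tolerance
! Many NoSQL systems relax consistency to keep availability and partition tolerance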
[Figure: CAP triangle: consistency, availability, fault-tolerant partitioning]
[1] Eric Brewer, "Towards robust distributed systems." PODC, 2000. http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
COMPARING NOSQL & NEWSQL SYSTEMS
SYSTEM      CONCURRENCY CONTROL   DATA STORAGE   REPLICATION    TRANSACTION
Redis       Locks                 RAM            Asynchronous   No
Scalaris    Locks                 RAM            Synchronous    Local
Tokyo       Locks                 RAM/Disk       Asynchronous   Local
Voldemort   MVCC                  RAM/BDB        Asynchronous   No
Riak        MVCC                  Plug-in        Asynchronous   No
Membrain    Locks                 Flash+Disk     Synchronous    Local
Membase     Locks                 Disk           Synchronous    Local
Dynamo      MVCC                  Plug-in        Asynchronous   No
SimpleDB    None                  S3             Asynchronous   No
MongoDB     Locks                 Disk           Asynchronous   No
CouchDB     MVCC                  Disk           Asynchronous   No
SYSTEM       CONCURRENCY CONTROL   DATA STORAGE   REPLICATION    TRANSACTION
Terrastore   Locks                 RAM+           Synchronous    L
HBase        Locks                 Hadoop         Asynchronous   L
HyperTable   Locks                 Files          Synchronous    L
Cassandra    MVCC                  Disk           Asynchronous   L
BigTable     Locks+stamps          GFS            Both           L
PNUTS        MVCC                  Disk           Asynchronous   L
MySQL-C      ACID                  Disk           Synchronous    Y
VoltDB       ACID/no lock          RAM            Synchronous    Y
Clustrix     ACID/no lock          Disk           Synchronous    Y
ScaleDB      ACID                  Disk           Synchronous    Y
ScaleBase    ACID                  Disk           Asynchronous   Y
NimbusDB     ACID/no lock          Disk           Synchronous    Y
(Transaction column: L = local, Y = yes)
System categories: key-value, document, extended records, relational
Cattell, Rick. "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27
CONCLUSIONS
! Data are growing bigger and more heterogeneous, and they need new, adapted ways to be managed; hence the NoSQL movement is gaining momentum
! Data heterogeneity implies different management requirements; this is where polyglot persistence comes in
! Consistency, availability, fault tolerance (CAP theorem): find the balance!
! Which data store to choose, according to its data model?
! A lot of programming is implied …
Open opportunities if you’re interested in this topic!
POLYGLOT PERSISTENCE
GENOVEVA VARGAS SOLAR FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
http://www.vargas-solar.com
THIS TALK IS ABOUT
An alternative for managing multiform and multimedia data collections according to different properties and requirements
POLYGLOT PERSISTENCE
! Polyglot programming: applications should be written in a mix of languages, because different languages are suitable for tackling different problems
! Polyglot persistence: any decent-sized enterprise will have a variety of different data storage technologies for different kinds of data
! a new strategic enterprise application should no longer be built assuming a relational persistence support
! the relational option might be the right one - but you should seriously look at other alternatives
M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012
OBJECTIVE
! Build a MyNet app, based on a polyglot database, that maintains an integrated directory of my contacts, including their status and posts from several social networks
[Figure: Social network, MyNet App, MyNet DB]
[Figure: MyNet architecture (labels: REST, JSON documents, MyNet) with the following functions]
! Analysis on contact networks: overlapping according to interests and post topics; top 10 most popular contacts
! User sessions in different social networks
! Integrating posts from all networks
! Contact graph traversal for building groups out of common characteristics
! Synchronizing posts to all social networks (SN); friends network
! User account activity in different social networks
! Directory synchronisation: integrating contacts' information from all SN
MULTI-CLOUD POLYGLOT DATABASE
[UML class diagram: MyNetContacts]
Contact: idContact: Integer, lastName: String, givenName: String, society: String
Basic Info: webSite: URI, socialNetworkID: URI
Group: groupName: String
Post: postID: Integer, timeStamp: Date, geoStamp: String
Content: contentID: Integer, text: String, image: Jpeg, video: Avi
Address: street: String, number: Integer, City: String, Zipcode: Integer
Phone: number: String, type: {pers, prof}
Email: email: String, type: {pers, prof}
Associations (multiplicities 1 and *): <<hasBasicInfo>>, <<isContactof>>, <<isComposedof>>, <<publishesPost>>, <<hasContent>>, <<hasAddress>>, <<hasOtherPhones>>, <<hasMobilePhones>>, <<hasEmail>>
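As an illustration, part of the MyNetContacts class diagram can be written as Python dataclasses and serialized to a JSON document. This is a sketch under assumptions: the attribute names follow the diagram, but attaching `emails`, `mobilePhones` and `addresses` to `Contact` as lists is a guess at the association ends, and the sample values are invented.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Email:
    email: str
    type: str  # "pers" or "prof", as in the diagram


@dataclass
class Phone:
    number: str
    type: str  # "pers" or "prof"


@dataclass
class Address:
    street: str
    number: int
    city: str
    zipcode: int


@dataclass
class Contact:
    idContact: int
    lastName: str
    givenName: str
    society: str
    # Assumed association ends (one contact, many items):
    emails: List[Email] = field(default_factory=list)        # <<hasEmail>>
    mobilePhones: List[Phone] = field(default_factory=list)  # <<hasMobilePhones>>
    addresses: List[Address] = field(default_factory=list)   # <<hasAddress>>


contact = Contact(
    idContact=1,
    lastName="Doe",
    givenName="Jane",
    society="ACME",
    emails=[Email("jane@example.org", "prof")],
    addresses=[Address("Main Street", 10, "Grenoble", 38000)],
)

# asdict() converts nested dataclasses recursively, giving a JSON-ready dict.
document = json.dumps(asdict(contact), sort_keys=True)
print(document)
```

The same classes could be mapped to a relational or a graph store instead; the JSON form only illustrates the document-store option.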
GENERATING NOSQL PROGRAMS FROM HIGH LEVEL ABSTRACTIONS
[Figure: from high-level abstractions (UML class diagram of the application classes) to low-level abstractions (Java web app with Spring Data over a graph database and a relational database), generated through Spring Roo]
http://code.google.com/p/model2roo/
POLYGLOT DATABASE EVOLUTION
! Problem statement:
! Evolution of the application: modification of classes, new classes, new relationships among classes
! Evolution of the “entities” managed in the polyglot database
! Some change structure, change values, …
! The content of the stores starts drifting away from the application data structures
! Which is the current structure of the entities stored?
! Are there elements that are no longer accessed because they no longer correspond to the application data structures?
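The two questions above can be approached by inspecting the stored entities themselves. The following is a minimal sketch, not the ExSchema method: the functions `discover_schema` and `stale_fields` and the sample contacts are hypothetical. It infers the fields and value types actually present in a store and lists the fields the application classes no longer declare.

```python
from collections import defaultdict


def discover_schema(documents):
    """Return {field name: sorted list of type names observed in the store}."""
    observed = defaultdict(set)
    for doc in documents:
        for name, value in doc.items():
            observed[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in observed.items()}


def stale_fields(documents, application_fields):
    """Fields present in stored entities but absent from the application classes."""
    return sorted(set(discover_schema(documents)) - set(application_fields))


# Stored entities written by successive versions of the application.
stored_contacts = [
    {"idContact": 1, "lastName": "Doe", "givenName": "Jane", "society": "ACME"},
    {"idContact": 2, "lastName": "Roe", "firstName": "Rick"},
    {"idContact": "3", "lastName": "Poe", "firstName": "Pat"},
]
# Fields declared by the current application class.
application_fields = {"idContact", "firstName", "lastName"}

schema = discover_schema(stored_contacts)
print(schema["idContact"])                                # ['int', 'str']
print(stale_fields(stored_contacts, application_fields))  # ['givenName', 'society']
```

The mixed types reported for `idContact` show a change of values, and the stale fields show a change of structure.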
[UML class diagram: MyNetContacts entities under evolution]
Contact: idContact: Integer, firstName: String, lastName: String
Contact (second box): idContact: Integer, lastName: String, givenName: String, society: String
Basic Info: webSite: URI, socialNetworkID: URI
Post: postID: Integer, timeStamp: Date, geoStamp: String
Content: contentID: Integer, text: String
Address: street: String, number: Integer, City: String, Zipcode: Integer
Email: email: String, type: {pers, prof}
Language: ID: Integer, Lang: String
Associations (multiplicities 1 and *): <<hasBasicInfo>>, <<publishesPost>>, <<hasContent>>, <<hasAddress>>, <<hasEmail>>, <<speaksLanguage>>
CRUD OPERATIONS
[Figure: CRUD operations over the Contact, Group, Content and Post entities; table columns: Id, FirstName, LastName, Society; goal: a consistent view of data]
BACKGROUND
! Classic protocols are not an option (e.g., two-phase commit)
! Voting phase + commit phase = scalability issues
! Limited transactional support by NoSQL solutions
! Neo4j is one of the few that truly supports ACID (Atomicity, Consistency, Isolation, Durability)
! Others provide transactions limited to single entities (MongoDB), no roll-back (Redis), etc.
! Rely on BASE (Basically Available, Soft state, Eventual consistency) instead of ACID
EXAMPLE 1: SYNCHRONIZING REDIS+MYSQL
https://oracleus.activeevents.com/connect/sessionDetail.ww?SESSION_ID=477500
Updating Redis: failure scenarios (#FAIL)
Scenario 1:
  begin MySQL transaction
    update MySQL
    update Redis
  rollback MySQL transaction
  Result: Redis has been updated, MySQL has not
Scenario 2:
  begin MySQL transaction
    update MySQL
  commit MySQL transaction
  << system crashes >>
  update Redis
  Result: MySQL has been updated, Redis has not
EXAMPLE 1: UPDATING REDIS RELIABLY
Step 1 (ACID):
  begin MySQL transaction
    update MySQL
    queue CRUD event in MySQL
  commit transaction
CRUD event contents:
  Event Id
  Operation: Create, Update, Delete
  New entity state, e.g. JSON
Step 2:
  for each CRUD event in MySQL queue
    get next CRUD event from MySQL queue
    if CRUD event is not duplicate then
      update Redis (incl. eventID)
    end if
    begin MySQL transaction
      mark CRUD event processed
    commit transaction
  end for each
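The two steps can be sketched end to end with stand-ins, so that the example runs without external services. The assumptions are that SQLite (from the Python standard library) plays the role of MySQL and a plain dict plays the role of Redis; the table and function names are hypothetical.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE crud_event ("
    "event_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "operation TEXT, entity_json TEXT, processed INTEGER DEFAULT 0)"
)

cache = {}  # stand-in for Redis: key -> {"event_id": ..., "state": ...}


def update_contact(contact_id, name):
    # Step 1: the entity update and the queued CRUD event share one transaction,
    # so either both are committed or neither is.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO contact (id, name) VALUES (?, ?)",
            (contact_id, name),
        )
        db.execute(
            "INSERT INTO crud_event (operation, entity_json) VALUES (?, ?)",
            ("Update", json.dumps({"id": contact_id, "name": name})),
        )


def process_events():
    # Step 2: apply unprocessed events to the cache, in event order.
    rows = db.execute(
        "SELECT event_id, operation, entity_json FROM crud_event "
        "WHERE processed = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, operation, entity_json in rows:
        entity = json.loads(entity_json)
        key = f"contact:{entity['id']}"
        last_applied = cache.get(key, {}).get("event_id", 0)
        if event_id > last_applied:  # not a duplicate
            state = None if operation == "Delete" else entity
            cache[key] = {"event_id": event_id, "state": state}
        with db:
            db.execute(
                "UPDATE crud_event SET processed = 1 WHERE event_id = ?",
                (event_id,),
            )
    return len(rows)


update_contact(1, "Jane")
update_contact(1, "Jane Doe")
processed_first = process_events()
processed_again = process_events()
print(processed_first, processed_again, cache["contact:1"])
```

Storing the event ID next to the cached state makes the update idempotent: if a crash happens after the cache update but before the event is marked processed, replaying the event changes nothing.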
EXAMPLE 1: UPDATING REDIS RELIABLY
[Figure: Step 1: the EntityCRUDEvent Repository records events with INSERT INTO .. (columns: ID, JSON, Processed?). Step 2: a Timer drives the EntityCRUDEvent Processor, which reads events with SELECT … FROM.. and calls apply(event) on the Redis updater]
EXAMPLE 2: SYNCHRONIZING MONGODB+RELATIONAL
! Spring Data project
http://static.springsource.org/spring-data/data-mongodb/docs/current/reference/html/#mongo.cross.store
(Katsov-2012)
Use the right tool for a given job…
Lack of standardization of models and data storage technologies
+
Data stores designed to scale simple OLTP-style application loads
Read/write operations by thousands/millions of users
QUALITY DRIVEN BENCHMARK [1]
CHARACTERISTIC               SUBCHARACTERISTIC      METRIC
Reliability                  Maturity               API changes
                             Availability           Downtime [3]
                             Fault tolerance        Node down throughput [3]
                             Recoverability         Time to stabilize on node up [3]
Performance and efficiency   Time behaviour         Throughput, latency [2]
                             Resource utilisation   CPU, memory and disk usage [4]
[1] Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki
[2] Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp. 143–154. SoCC '10, ACM, New York, NY, USA (2010)
[3] Nelubin, D., Engber, B.: Failover Characteristics of leading NoSQL databases. Tech. rep., Thumbtack Technology (2013)
[4] Massie, M.L., Chun, B.N., Culler, D.E.: The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Parallel Computing 30(7) (Jul 2004)
QUALITY DRIVEN BENCHMARK
[Figure: QDB architecture built on the YCSB client]
! YCSB client components: workload executor, client threads, DB interface layer, stats
! Workload parameters: read/write mix, record size, popularity distribution
! Run parameters: DB to use, workload to use, target throughput, number of threads
! Measured: read latency, throughput
! Targets: cloud serving store; linked data & temporal streams
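A toy workload executor, to illustrate the kind of measurement involved. This is not the YCSB or QDB API: the function `run_workload`, its parameters, the uniform popularity distribution, and the dict used as the store are all assumptions made for the sketch.

```python
import random
import time


def run_workload(store, operations, read_ratio, keys, record_size=100, seed=0):
    """Run a read/write mix against a dict-like store and collect basic stats."""
    rng = random.Random(seed)
    reads = writes = 0
    latencies = []
    started = time.perf_counter()
    for _ in range(operations):
        key = rng.choice(keys)  # uniform popularity distribution (assumption)
        t0 = time.perf_counter()
        if rng.random() < read_ratio:
            store.get(key)
            reads += 1
        else:
            store[key] = "x" * record_size
            writes += 1
        latencies.append(time.perf_counter() - t0)
    elapsed = max(time.perf_counter() - started, 1e-9)  # avoid division by zero
    return {
        "reads": reads,
        "writes": writes,
        "throughput_ops_s": operations / elapsed,
        "avg_latency_s": sum(latencies) / len(latencies),
    }


stats = run_workload(
    store={},
    operations=1000,
    read_ratio=0.95,
    keys=[f"user{i}" for i in range(100)],
)
print(stats)
```

Replacing the dict with a client for a real data store, and injecting node failures during the run, would be the starting point for the fault tolerance and recoverability metrics.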
ONGOING WORK
! QDB benchmark extends YCSB: FaultTolerance, Recoverability and TimeBehaviour
! Pivot data model for representing the data models of NoSQL stores
! Sample application: Shopping system [1] (ProductInfo)
! Data stores: MongoDB, Couchbase, VoltDB, Redis, Neo4j
! Cluster of four Ubuntu 12.04 servers deployed with extra large VM instances (8 virtual cores and 14 GB of RAM) in Windows Azure [2]
! Distributed polyglot (big) database engineering
! Model2Roo: engineering data storage solutions for given data collections
! ExSchema for supporting the maintenance of a polyglot storage solution
[1] McMurtry, D., Oakley, A., Sharp, J., Subramanian, M., Zhang, H.: Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence. Microsoft patterns & practices, Microsoft (2013)
[2] http://www.windowsazure.com/
[3] http://forge.puppetlabs.com/puppetlabs/
[4] Yahoo Cloud Serving Benchmark, https://github.com/brianfrankcooper/YCSB/wiki
IN BRIEF…
! Many proposals, but no definite solution yet…
! Research/Industry challenges
! Open opportunities if you’re interested in this topic!
WHEN IS POLYGLOT PERSISTENCE PERTINENT?
! Applications that essentially compose and serve web pages
! They only look up page elements by ID, have different needs for availability and concurrency, and have no need to share all their data
! A problem like this is much better suited to a NoSQL store than the corporate relational DBMS
! Scaling to lots of traffic gets harder and harder to do with vertical scaling
! Many NoSQL databases are designed to operate over clusters
! They can tackle larger volumes of traffic and data than is realistic with a single server
Juan Carlos Castrejón, University of Grenoble, France
Javier Espinosa, University of Grenoble, France
Dr. Genoveva Vargas-Solar, CNRS, LIG-LAFMIA, France
http://www.vargas-solar.com/bigdata-management
REFERENCES
! Eric A. Brewer, "Towards robust distributed systems." PODC, 2000
! Rick Cattell, "Scalable SQL and NoSQL data stores." ACM SIGMOD Record 39.4 (2011): 12-27
! Juan Castrejon, Genoveva Vargas-Solar, Christine Collet, and Rafael Lozano, ExSchema: Discovering and Maintaining Schemas from Polyglot Persistence Applications, In Proceedings of the International Conference on Software Maintenance, Demo Paper, IEEE, 2013
! M. Fowler and P. Sadalage. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Limited, 2012
! C. Richardson, Developing polyglot persistence applications, http://fr.slideshare.net/chris.e.richardson/developing-polyglotpersistenceapplications-gluecon2013