Introduction to NoSQL Databases (kti.tugraz.at/staff/rkern/courses/dbase2/slides_nosql.pdf)
Introduction to NoSQL Databases
Roman Kern
KTI, TU Graz
2017-10-16
Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31
Introduction
Intro
Why NoSQL?
The birth of NoSQL
Term appeared in its current sense in 2009
Not only SQL
Common properties (pros)
  Non-relational
  Schema-less (schema-free)
  Good scalability
Potential downsides (cons)
  Limited query abilities
  Not standardised (evolving technology)
Motivations for starting NoSQL
1 Growth of data
  User-generated
  Machine-generated, e.g. log files, sensors
  Higher degree of connectedness
2 Need for flexibility
  ... instead of a rigid schema
  For semi-structured data (schema-free / schema-less)
3 No separation of data management and data processing
Data Management vs. Data Processing
Classic CRUD operations are no longer sufficient
  ... for advanced data analytics
  → need to combine both functionalities
Paradigm shift: Bring the code to the data
  i.e. the locality of the data is taken into consideration
  ... for the data processing
Example applications:
  Online transaction processing (OLTP) → relational databases
  Online analytical processing (OLAP) → data warehousing
  High performance, scalability → NoSQL
Scalability
Scale up (scale vertically) vs scale out (scale horizontally)
  Scale up: Add more hardware to a single machine
  Scale out: Add more machines
Degree of sharing
  Shared memory (single machine, single storage)
  Shared disk (multiple machines, single storage)
  Shared nothing (multiple machines, multiple storage)
Replication
In a distributed system, data is replicated between nodes
... thus data is stored multiple times
Types of replication
1 Synchronous (eager)
  All data is replicated to all nodes before ending the operation
  → complex, even impossible in some configurations
2 Asynchronous (lazy)
  Operation is finished before all data has been written by all nodes
  → potentially inconsistent
Options for write access
1 Single node accepts writing of data (master/slave, primary copy)
2 All nodes accept write operations (update anywhere)
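The difference between eager and lazy replication can be illustrated with a small in-memory sketch. It assumes a primary-copy setup; the class names Primary and Replica and the flush method (standing in for background replication) are made up for illustration and do not come from any real system.

```python
from collections import deque


class Replica:
    """A node holding a full copy of the data."""

    def __init__(self):
        self.data = {}


class Primary:
    """Single node that accepts writes (primary copy) and forwards them."""

    def __init__(self, replicas, eager):
        self.data = {}
        self.replicas = replicas
        self.eager = eager
        self.pending = deque()  # writes not yet applied by the replicas

    def write(self, key, value):
        self.data[key] = value
        if self.eager:
            # Synchronous: every replica applies the write before we return.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: return immediately, replicate later.
            self.pending.append((key, value))

    def flush(self):
        """Deliver queued writes; stands in for background replication."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


lazy_replica = Replica()
lazy = Primary([lazy_replica], eager=False)
lazy.write("x", 1)
stale_read = lazy_replica.data.get("x")   # replica has not seen the write yet
lazy.flush()
fresh_read = lazy_replica.data.get("x")   # replica has caught up

eager_replica = Replica()
eager = Primary([eager_replica], eager=True)
eager.write("x", 1)
eager_read = eager_replica.data.get("x")  # visible as soon as write() returns
```

In the lazy case a read from the replica between write and flush returns stale data, which is the "potentially inconsistent" window named above.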
Sharding
In a distributed system, each node may be responsible for different parts of the full data
  ... still data is replicated for redundancy
  Also known as: partitioning, fragmentation
  Advantage: improved efficiency (fewer resources)
Types of sharding:
1 Hash-based
  Hash key determines partition → no data locality
2 Range-based
  Assigns key ranges (binning) → rebalancing needed
3 Entity-group
  All data of a single transaction is assigned to a single partition → partitions cannot easily change
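A minimal sketch of the first two sharding types, assuming string keys. The function names hash_shard and range_shard and the example bounds are illustrative choices, not taken from a specific system.

```python
import bisect
import hashlib


def hash_shard(key, num_shards):
    """Hash-based: a stable hash of the key picks the partition."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def range_shard(key, upper_bounds):
    """Range-based: keys are binned by sorted, exclusive upper bounds."""
    return bisect.bisect_right(upper_bounds, key)


# Shard 0: keys < "h", shard 1: "h" <= key < "p", shard 2: everything else.
bounds = ["h", "p"]
neighbours = [range_shard(k, bounds) for k in ("alice", "bob", "carol")]
spread = [range_shard(k, bounds) for k in ("alice", "kim", "zoe")]
hashed = [hash_shard(k, 3) for k in ("alice", "bob", "carol")]
```

With range-based sharding, neighbouring keys land in the same shard (data locality), but a skewed key distribution overloads one shard and calls for rebalancing. With hash-based sharding the placement depends only on the hash, so neighbouring keys are generally scattered.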
ACID vs. BASE
ACID
  Atomicity
  Consistency
  Isolation
  Durability
BASE
  Basically Available
  Soft state
  Eventually consistent
Trade-offs for improved performance
Some database systems prefer performance over durability
Redundancy for improved performance (no normalisation)
CAP theorem
Not possible to achieve all three properties at the same time:
Consistency
  Reads are guaranteed to incorporate all previous writes (all nodes see the same data at the same time)
Availability
  Every query returns an answer, instead of an error (failures do not prevent the remaining system from being operational)
Partition tolerance
  The system runs, even if a part of the system is not reachable (e.g. due to network failure, message loss)
Implications of CAP
  One needs to find a trade-off between the properties, e.g. choose availability over consistency (as consistency is a major bottleneck for scalability)
Classification scheme of NoSQL systems
1 According to the data model
  Key-value
  Tabular (wide column)
  Document
  Graph
  Specialised, e.g. time-series, triples, objects, XML, files, ...
2 According to the CAP trade-off
  Available & partition tolerant
  Consistent & partition tolerant
  Not partition tolerant
3 According to the replication & sharding types
  lazy vs. eager
  hash-based vs. range-based vs. entity-group
Systems
NoSQL Systems
What types of NoSQL systems are out there?
Distributed File System
Data model: Folders & files (plus metadata, e.g. time of creation, ...)
Interface: File system operations
Variations:
  Network File System: (often) single storage
  Cluster File Systems: (multiple) storage
  Distributed File Systems: multiple, independent storage
Examples: NFS, GPFS, HDFS
Key/Value Store
Data model: Key → Value
  ... where the value is a (binary) opaque blob
  ... similar to hash tables
Interface: CRUD operations
Properties:
  Excellent scalability
  May support redundant storage
Examples: Amazon Dynamo (AP, lazy, hash-based), Redis (CP, lazy, hash-based), Riak (AP, lazy, hash-based), Memcached (CP), ...
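The key → value model with a CRUD interface can be sketched as a thin wrapper around a hash table. The class KeyValueStore below is a hypothetical illustration, not the API of any of the systems listed.

```python
import json


class KeyValueStore:
    """Key -> opaque bytes; the store never looks inside a value."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key}")
        self._data[key] = bytes(value)

    def read(self, key):
        return self._data[key]

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(f"unknown key: {key}")
        self._data[key] = bytes(value)

    def delete(self, key):
        del self._data[key]


store = KeyValueStore()
store.create("user:1", json.dumps({"name": "Ann"}).encode("utf-8"))
# The client, not the store, decodes the blob.
user = json.loads(store.read("user:1").decode("utf-8"))
store.update("user:1", b"\x00\x01")
blob = store.read("user:1")
store.delete("user:1")
```

Because the value is opaque, any filtering on its content has to happen in the client after a full read.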
Tabular / Wide Column
Data model: (Rowkey, Column, Timestamp) → Value
  ... where the value is a (binary) opaque blob
Interface: CRUD operations, scan operations
Properties:
  Allow vertical and horizontal partitioning
  ... adjacent rows are stored close to each other
  ... certain columns are stored close to each other, e.g. via column families
  Each cell might have multiple versions (timestamps)
Examples: Cassandra (AP, lazy, hash-based), Google BigTable (CP, eager, range-based), HBase (CP, eager, range-based), Parquet, ...
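The (row key, column, timestamp) → value model, including versioned cells and row scans, can be sketched as nested dictionaries. WideColumnTable and the "family:qualifier" column names are assumptions made for this illustration.

```python
from collections import defaultdict


class WideColumnTable:
    """row key -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest version at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, end_row, column):
        """Yield (row, latest value) for start_row <= row < end_row, in key order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row and column in self._rows[row]:
                yield row, self.get(row, column)


table = WideColumnTable()
table.put("user#1", "info:name", 1, b"Ann")
table.put("user#1", "info:name", 2, b"Anna")
table.put("user#2", "info:name", 1, b"Bob")
latest = table.get("user#1", "info:name")
older = table.get("user#1", "info:name", timestamp=1)
scanned = list(table.scan("user#1", "user#3", "info:name"))
```

A real system keeps rows physically sorted by key, so a scan does not need the sort step used here.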
Example of Cassandra Query Language
Document Storage
Data model: (Collection, Key) → Value
  ... where the value is understood by the system
Interface: CRUD operations, specialised queries (e.g. JavaScript)
Properties:
  Documents are schema-free, i.e. no need for schema migrations
  Documents may also be versioned
  Documents are often JSON
Examples: CouchDB (AP, lazy), MongoDB (CP, lazy/eager, range-based), Amazon SimpleDB (AP), Cloudant, RethinkDB (lazy/eager, range-based), ...
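In contrast to a key/value store, a document store understands the value and can query inside it. The sketch below assumes JSON-compatible documents; DocumentStore and its find method are hypothetical names, far simpler than the query facilities of the systems listed.

```python
import json


class DocumentStore:
    """(collection, key) -> JSON document that the store can inspect."""

    def __init__(self):
        self._collections = {}

    def put(self, collection, key, document):
        # Round-trip through JSON to keep an independent, JSON-compatible copy.
        copy = json.loads(json.dumps(document))
        self._collections.setdefault(collection, {})[key] = copy

    def get(self, collection, key):
        return self._collections[collection][key]

    def find(self, collection, **criteria):
        """Keys of documents whose top-level fields equal all criteria."""
        docs = self._collections.get(collection, {})
        return sorted(
            key
            for key, doc in docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        )


db = DocumentStore()
# Schema-free: the two documents do not share the same fields.
db.put("users", "u1", {"name": "Ann", "city": "Graz"})
db.put("users", "u2", {"name": "Bob", "city": "Wien", "tags": ["admin"]})
in_graz = db.find("users", city="Graz")
tags = db.get("users", "u2")["tags"]
```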
Key/Value Store vs. Document Storage vs. Tabular Storage
Key/Value store, if requirements are simple
Document store, if parts of the value need to be accessed
Document store, if documents are independent units
Tabular store, if multiple entries (e.g. rows) are updated at the same time
Tabular store, if only certain columns need to be retrieved
Things to watch out for
Maximum size of value depends on actual implementation
Avoid joins for optimal performance
Consistency vs. Availability vs. Partitioning
See also: http://blog.nahurst.com/visual-guide-to-nosql-systems
Graph Storage
Data model: G = (V, E)
  ... where each vertex or edge may have additional properties
Interface: Graph traversals, specialised queries & insert/update methods
Properties:
  Optimised for graph traversal, i.e. no joins needed
  Types of edges can be specified by the user
Examples: Neo4j (CA), OrientDB (CA), TitanDB, Giraph, InfiniteGraph (CA), ...
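A property graph G = (V, E) with typed edges and a traversal that follows edges directly, without joins, might look as follows. PropertyGraph and traverse are illustrative names; real graph databases expose much richer query languages.

```python
from collections import deque


class PropertyGraph:
    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = {}     # vertex id -> list of (edge type, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.edges.setdefault(vid, [])

    def add_edge(self, source, edge_type, target, **props):
        self.edges[source].append((edge_type, target, props))

    def traverse(self, start, edge_type, max_depth):
        """Breadth-first: vertices reachable via `edge_type` within max_depth hops."""
        seen = {start}
        queue = deque([(start, 0)])
        result = []
        while queue:
            vid, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the hop limit
            for etype, target, _props in self.edges[vid]:
                if etype == edge_type and target not in seen:
                    seen.add(target)
                    result.append(target)
                    queue.append((target, depth + 1))
        return result


g = PropertyGraph()
for name in ("ann", "bob", "carol", "dave"):
    g.add_vertex(name, kind="person")
g.add_edge("ann", "knows", "bob", since=2015)
g.add_edge("bob", "knows", "carol", since=2016)
g.add_edge("carol", "knows", "dave", since=2017)
g.add_edge("ann", "likes", "dave")
two_hops = g.traverse("ann", "knows", 2)
three_hops = g.traverse("ann", "knows", 3)
```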
Search Storage
Data model: documents, metadata
  ... often stored as Vector Space Model
Interface: specialised query languages
Properties:
  Documents may consist of multiple fields (facets)
  ... fields may be structured as well, e.g. date, integer, string
  Fine control over indexing process, i.e. how each field is indexed
Examples: Solr, Elasticsearch, ...
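Per-field control over indexing can be sketched with a tiny inverted index in which text fields are tokenised and structured fields are indexed as-is. This is a simplified boolean lookup, not a full vector space model with term weighting; SearchIndex and its parameters are assumptions for illustration.

```python
from collections import defaultdict


class SearchIndex:
    def __init__(self, text_fields):
        self.text_fields = set(text_fields)  # fields that are tokenised
        self._postings = defaultdict(set)    # (field, term) -> document ids

    def add(self, doc_id, document):
        for field, value in document.items():
            if field in self.text_fields:
                # Text field: lower-case and split on whitespace.
                terms = str(value).lower().split()
            else:
                # Structured field (e.g. integer, date): index the value as-is.
                terms = [value]
            for term in terms:
                self._postings[(field, term)].add(doc_id)

    def search(self, field, term):
        if field in self.text_fields:
            term = term.lower()
        return sorted(self._postings.get((field, term), set()))


index = SearchIndex(text_fields=["title"])
index.add(1, {"title": "Introduction to NoSQL", "year": 2017})
index.add(2, {"title": "NoSQL Graph Databases", "year": 2016})
by_term = index.search("title", "nosql")
by_year = index.search("year", 2017)
```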
Object Oriented Storage
Data model: classes, objects, relations
Interface: CRUD, traversal methods
Properties:
  Known model from OO programming
  Often strong coupling between DB system and programming language
Examples: db4o (CA), Versant (CA), Objectivity (CA), ...
XML Databases
Data model: XML, RDF (triples)
Interface: CRUD, query languages (XQuery, SPARQL, ...)
Properties:
  RDF-based systems are often called triple stores
  Often used in combination with semantic technologies
Examples: BaseX, MarkLogic (CA), AllegroGraph (CA), BigData, ...
Timeseries Databases
Data model: (timestamp) → value
Interface: CRUD, specialised query languages
Variations:
  Type of value is the same for all entries, typically simple, e.g. floating point number
  Complex value type, e.g. JSON
Properties:
  Optimised for time series data, i.e. small storage requirements
  Query for time ranges
  Operations on time series
Examples: InfluxDB, KairosDB, ...
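A time-range query and a simple operation on a series can be sketched with a sorted list and binary search. The class TimeSeries, integer timestamps and float values are assumptions for this example.

```python
import bisect


class TimeSeries:
    """timestamp -> value, kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        # Insert at the sorted position so out-of-order samples are handled.
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def range(self, start, end):
        """Samples with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))

    def mean(self, start, end):
        """Average value in the time range, or None if it is empty."""
        samples = self.range(start, end)
        if not samples:
            return None
        return sum(value for _, value in samples) / len(samples)


series = TimeSeries()
for timestamp, value in [(10, 1.0), (30, 3.0), (20, 2.0), (40, 4.0)]:
    series.append(timestamp, value)
window = series.range(20, 40)
average = series.mean(10, 40)
```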
In-Memory Databases
Data model: (key) → value
  ... but not limited to this model
Interface: CRUD, specialised query languages
Properties:
  Data is stored in RAM
  Often distributed over multiple machines (RAM is the new disk)
  In its purest form does not satisfy durability criteria
Examples: Hazelcast, Redis, SAP HANA, ...
API & Data Formats
NoSQL systems often use RESTful APIs
Direct match with data model and CRUD operations
Serialisation of objects
  Many techniques used
  e.g. Apache Avro, Protocol Buffers, ...
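How CRUD operations line up with a RESTful API can be sketched without any network code. The verb mapping below is an assumption (real systems differ, e.g. PUT instead of POST for creation), build_request is a hypothetical helper, and JSON stands in for serialisation formats such as Avro or Protocol Buffers.

```python
import json

# Assumed mapping of CRUD operations to HTTP methods.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, collection, key=None, document=None):
    """Return (HTTP method, resource path, serialised body) for a CRUD operation."""
    method = CRUD_TO_HTTP[operation]
    path = f"/{collection}" if key is None else f"/{collection}/{key}"
    body = None if document is None else json.dumps(document, sort_keys=True)
    return method, path, body


read_request = build_request("read", "users", "u1")
create_request = build_request("create", "users", document={"name": "Ann"})
```

The resource path mirrors the data model (collection, key), which is the "direct match" mentioned above.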
Features
Not all NoSQL systems support transactions
  Instead they support atomic single-operation transactions
  Therefore not all operations are supported
Not all NoSQL systems support security features
  e.g. access control
Cloud Database Solutions
Storage in the internet (cloud)
DBaaS - Database as a Service
Not limited to NoSQL, traditional SQL databases are available as well
Multi-tenancy as important feature (separation of multiple clients)
  Private OS - all separate (e.g. Amazon RDS)
  Private process - same machine (e.g. Compose)
  Private schema - same database (e.g. Google DataStore)
  Shared schema - same tables (most SaaS apps)
Current State
Current state of data storage systems
Depending on the actual requirements
... select a suitable storage solution
Or select multiple solutions for each sub-system
→ polyglot persistence
Future of NoSQL Systems
Outlook - NewSQL
Attempt to achieve consistency and availability for distributed systems
E.g. Google Spanner, CockroachDB
  ... CockroachDB builds on the Raft consensus algorithm
  ... Google Spanner relies on specialised hardware (synchronised clocks)
https://github.com/cockroachdb/cockroach
The End
Next: Graph Databases
Credits
Scalable Data Management: NoSQL Data Stores in Research and Practice
http://icde2016.fi/tutorials.php