Overview of NoSQL
Uploaded by sean-murphy
Overview of NoSQL...motivation, technologies, should you care?
Overview
● Evolution of/motivation for NoSQL databases
● Characterization of NoSQL databases
● Classification of NoSQL databases
● Popularity/usage of NoSQL systems
A brief history of NoSQL
● Originally coined in 1998 by Strozzi for a specific non-relational database
  ○ easy to use, free, text-based data storage, easy manipulation of contents of the database
● Reintroduced by Evans (Rackspace) in 2009 for a conference on open source distributed databases
  ○ in response to increased interest in non-RDBMS solutions
    ■ bringing together Cassandra, Mongo, Couch, etc.
● Has grown as a movement over the last 3 years
Current status
● Significant buzz within community in 2010
  ○ initial development of technology
  ○ pioneer deployments
  ○ lots of meetups/conferences/birds of a feather sessions
● Many key technologies evolved in late 2010 and 2011
  ○ more large deployments for some technologies
  ○ small companies with no legacy systems basing operations on NoSQL
Current status
● 2012
  ○ buzz/hype is fading
  ○ technology continues to mature
  ○ increased number of deployments
  ○ skills sought in job market
NoSQL - a negative definition
● NoSQL simply defined by being non-relational
  ○ diverse set of technologies falls into NoSQL camp
● Motivations mixed
  ○ open source
  ○ scale - TB, PB - particularly for read/write latency
  ○ increased flexibility over RDBMSs
  ○ ability to work with raw data
  ○ ACID not always most appropriate design choice
    ■ analytics data is excellent example
● Results in many different NoSQL technologies
Typical characteristics
● Don't use SQL!
● Open source
● Intended to deliver performance
  ○ in some dimension
● Typically JOIN not supported
  ○ JOINs incur a performance hit
● Consistency often relaxed
  ○ eventual consistency
● More flexibility in schema
  ○ if schema used at all!
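To make "eventual consistency" concrete, here is a minimal sketch, not taken from any particular database. It assumes a last-write-wins rule based on a version counter, which is only one of several possible conflict-resolution strategies; the names `Replica` and `sync` are hypothetical.

```python
class Replica:
    """One copy of the data; each value is stored with a version number."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Assumption: last write wins, so keep only the newest version.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries so both replicas end up with the newest version of each key."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("user:1", "alice", version=1)  # the write lands on one replica only
stale = r2.read("user:1")               # r2 has not seen the write yet
sync(r1, r2)                            # background replication catches up
fresh = r2.read("user:1")               # both replicas now agree
```

Between the write and the sync, a read from `r2` returns old data; that window is what relaxed consistency trades for availability and latency.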
Diversity of NoSQL databases
● 122 separate technologies listed on http://nosql-database.org/
  ○ mix of commercial, open source and some in between
● Vary in many dimensions:
  ○ architecture
  ○ interfaces
    ■ API/languages
  ○ internal data storage
  ○ distribution mechanisms
    ■ redundancy, reliability
  ○ usage - deployments & support community
  ○ maturity
Classification of NoSQL systems
● Column based solutions
● Document store solutions
● Key/Value solutions
● Graph based solutions
● Less significantly:
  ○ XML databases
  ○ Object databases
  ○ Multivalue databases
Column based solutions
● Structured data
  ○ similar to classical tables
● Generally much more flexible
  ○ no rigorous schema necessary
  ○ can typically add columns in ad hoc fashion
    ■ often without explicitly declaring column
● However, can result in very different usage
  ○ e.g. can have millions of columns associated with a given row
● Examples: Hadoop/HBase, Cassandra, Hypertable, SimpleDB
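The column model can be sketched as a map of row keys to sparse maps of columns. This is an illustrative in-memory toy, not the API of HBase or Cassandra; `ColumnFamily` and its methods are hypothetical names.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column store: each row holds only the columns written to it."""

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, row_key, column, value):
        # No schema: a new column exists as soon as it is written.
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def get_row(self, row_key):
        return dict(self.rows.get(row_key, {}))


users = ColumnFamily()
users.put("u1", "name", "Ada")
users.put("u1", "email", "ada@example.org")
users.put("u2", "name", "Linus")
users.put("u2", "last_login", "2012-03-01")  # column that u1 never declared

# A "wide row": one row key with many columns, here one per timestamp.
events = ColumnFamily()
for i in range(1000):
    events.put("sensor-7", f"t{i:06d}", i * 0.5)
```

Rows `u1` and `u2` carry different columns, and the `sensor-7` row shows the "very different usage" pattern where column names themselves carry data.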
Document based solutions
● Less structured data
  ○ DB composed of 'documents' containing arbitrary data
    ■ usually containing longer form content, e.g. CMS
● Documents contain some structure to support query/search/filter, etc.
● Somewhat less emphasis on a key
  ○ can be autogenerated
● Quite unlike classical databases
● Examples: MongoDB, CouchDB
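A rough sketch of the document model follows. It is a toy in-memory store, not the MongoDB or CouchDB API; `DocumentStore`, `insert` and `find` are hypothetical names, and the autogenerated key is assumed to be a random UUID.

```python
import uuid


class DocumentStore:
    """Toy document store: schemaless dicts with autogenerated keys."""

    def __init__(self):
        self.docs = {}  # document id -> document (a plain dict)

    def insert(self, doc):
        doc_id = uuid.uuid4().hex  # key is generated, not chosen by the caller
        self.docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria):
        # Return documents whose fields match every given criterion.
        return [
            d for d in self.docs.values()
            if all(d.get(field) == value for field, value in criteria.items())
        ]


cms = DocumentStore()
cms.insert({"type": "article", "title": "Intro to NoSQL", "tags": ["db"], "body": "..."})
cms.insert({"type": "article", "title": "Draft", "status": "draft"})
cms.insert({"type": "comment", "author": "sean", "text": "Nice talk"})

articles = cms.find(type="article")
drafts = cms.find(type="article", status="draft")
```

Each document has its own set of fields, and queries filter on whatever structure the documents happen to contain.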
Key/value stores
● DBs inspired by memcached
  ○ simple, fast key/value stores
● Attempt to retain most of DB in memory
  ○ fast response times
● Different designs for scalability
  ○ single node/multi node
● Much emphasis on the keys in this type of DB
● Write usually overwrites entire previous entry
● Examples: Redis, Couchbase/Membase, DynamoDB, Riak
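The key/value contract is small enough to sketch in a few lines. This is a single-node, in-memory illustration only; `KeyValueStore` is a hypothetical name and does not mirror the Redis or Riak client APIs.

```python
class KeyValueStore:
    """Toy in-memory key/value store with whole-value overwrite semantics."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # A write replaces the entire previous entry; there is no partial update.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        existed = key in self._data
        self._data.pop(key, None)
        return existed


cache = KeyValueStore()
cache.set("session:42", {"user": "ada", "cart": ["book"]})
cache.set("session:42", {"user": "ada"})  # the cart is gone, not merged
after_overwrite = cache.get("session:42")
```

Because the store knows nothing about the value's structure, all lookups go through the key, which is why key design gets so much emphasis.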
Graph based solutions
● Obviously different from previous categories
  ○ focus specifically on graphs
● Queries supported are graph-specific
  ○ e.g. get nodes related to specified node
● Typically support for solving standard graph problems
  ○ e.g. shortest path, general graph traversal
● Can deliver very significant performance gains over non-graph-specific solutions
  ○ for graph problems!
● Examples: Neo4j
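The two query types mentioned above (related nodes and shortest path) can be illustrated on a plain adjacency list. This is a sketch of the problems a graph database solves, not of how Neo4j implements or exposes them; the sample graph and function names are made up for illustration.

```python
from collections import deque

# Hypothetical social graph stored as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def neighbours(g, node):
    """Nodes directly related to the given node."""
    return g.get(node, [])


def shortest_path(g, start, goal):
    """Breadth-first search; returns a fewest-hops path or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours(g, node):
            if nxt in previous:
                continue
            previous[nxt] = node
            if nxt == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


path = shortest_path(graph, "alice", "erin")
```

In a relational system the same traversal would typically need one self-join per hop, which is where graph-specific storage gains its advantage.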
It's a noisy space...
● Very many candidate technologies
● Relatively small number of real-world solutions
● Differences between the classifications above are a matter of emphasis...
  ○ column based and document based arrive at semi-structured sweet spot from opposite ends of spectrum
● ...although this results in different preferred use cases...
  ○ document based solution better for document problems, e.g. CMS
Common techniques used
● Hashing techniques used to map data to nodes in cluster
● Internode communication via Gossip
● Common replication techniques
● Thrift is used in a few cases
● MapReduce often used to search over distributed system
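As an illustration of mapping data to nodes by hashing, here is a minimal consistent-hashing sketch. The slides do not say which hashing scheme each system uses, so consistent hashing with virtual nodes is an assumption here; `HashRing`, the node names and the `vnodes` parameter are hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]

ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

# Adding a node should only move the keys that the new node takes over.
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
after = {k: bigger.node_for(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
```

The point of the design is that growing the cluster relocates only a fraction of the keys, all of them onto the new node, rather than reshuffling everything as a plain `hash(key) % node_count` would.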
Comparison (oldish)...
Horses for courses...
● SQL is a perfectly good solution for many problems
  ○ tried and tested
● Some problems require an alternative solution
  ○ typically driven by scale and/or flexibility
● NoSQL offers (many) alternatives
  ○ although relatively easy to identify realistic options
● Column based approaches good for mostly structured data with enhanced flexibility
● Document based approaches good for document oriented problems
● Key/Value mostly intended for rapid response on more modest data sets
...so let's dive into one NoSQL database...
● Cassandra...