Is NoSQL The Future of Data Storage?
Introduction
• Gary Short
• Technical Evangelist for Developer Express
• C# MVP
• www.garyshort.org
• @garyshort
We'll Be Doing 3 Things
1. Define NoSQL databases
2. Look at scenarios where you can use NoSQL
3. Drill into a specific use case.
Where Does NoSQL Originate?
• 1998
– Open-source relational database
• Created by Carlo Strozzi
• Didn’t expose an SQL interface
• Called NoSQL
• The author said:
• “departs from the relational model altogether...”
• “...should have been called ‘NoREL’”
More Recently...
• Eric Evans reintroduced the term in 2009
– For an event organised by Johan Oskarsson (Last.fm) to discuss open-source distributed databases
• The term now labels a growing number of datastores that are:
– Open source
– Non-relational
– Distributed
– Often without ACID guarantees
Atlanta 2009
• No:sql(east) conference
• Billed as “conference of no-rel datastores”
• Worst tagline ever: SELECT fun, profit FROM real_world WHERE rel=false
Key Attributes of NoSQL Databases
• Don’t require fixed table schemas
• Non-relational
• (Usually) avoid join operations
• Scale horizontally
– Adding more nodes to a storage system.
Document Store
• RavenDB
• Apache Jackrabbit
• CouchDB
• MongoDB
• SimpleDB
• XML Databases
– MarkLogic Server
– eXist.
Graph Databases: What Does That Mean?
• Graph consists of
– Nodes (the ‘stations’ of the graph)
– Edges (lines between them)
• FlockDB
– Created by the Twitter folks
– Nodes = Users
– Edges = Nature of relationship between nodes.
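To make the node/edge model concrete, here is a minimal sketch of a graph whose edges carry a relationship type, in the spirit of the FlockDB description above. The class `TypedGraph`, its methods and the `"blocks"` relationship are hypothetical illustrations, not FlockDB's API. Each edge is indexed in both directions so that "who does X follow?" and "who follows X?" are both a single lookup.

```python
from collections import defaultdict


class TypedGraph:
    """Hypothetical in-memory graph: nodes are user ids, edges carry a relationship type."""

    def __init__(self):
        # (relationship, source) -> destinations, answers "who does X follow?"
        self._out = defaultdict(set)
        # (relationship, destination) -> sources, answers "who follows X?"
        self._in = defaultdict(set)

    def add_edge(self, source, relationship, destination):
        self._out[(relationship, source)].add(destination)
        self._in[(relationship, destination)].add(source)

    def remove_edge(self, source, relationship, destination):
        self._out[(relationship, source)].discard(destination)
        self._in[(relationship, destination)].discard(source)

    def outgoing(self, source, relationship):
        return set(self._out.get((relationship, source), ()))

    def incoming(self, destination, relationship):
        return set(self._in.get((relationship, destination), ()))


g = TypedGraph()
g.add_edge("alice", "follows", "bob")
g.add_edge("carol", "follows", "bob")
g.add_edge("alice", "blocks", "dave")
print(sorted(g.incoming("bob", "follows")))   # ['alice', 'carol']
print(sorted(g.outgoing("alice", "follows")))  # ['bob']
```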
Key/Value Stores
• Stored on disk
• Cached in RAM
• Eventually consistent
– Weak definition:
• “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent”
– Strong definition:
• “For a given update and a given replica, eventually either the update reaches the replica or the replica retires”
• Ordered
– Keys are kept in lexicographic order (which a plain distributed hash table does not provide), allowing ordered processing of key ranges
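The weak definition of eventual consistency, together with ordered keys, can be sketched with two in-memory replicas. This is a sketch under stated assumptions: conflicts are resolved by "last writer wins" on a caller-supplied timestamp (one common strategy, not something the slides specify), replicas converge by copying each other's entries, and a sorted Python list stands in for the ordered index a real store would maintain. `Replica`, `sync_from` and `range_scan` are hypothetical names.

```python
import bisect


class Replica:
    """Hypothetical key/value replica using last-writer-wins timestamps."""

    def __init__(self, name):
        self.name = name
        self._data = {}  # key -> (timestamp, value)
        self._keys = []  # keys kept in lexicographic order

    def put(self, key, value, timestamp):
        current = self._data.get(key)
        if current is None:
            bisect.insort(self._keys, key)
        # Newest timestamp wins; ties are broken by value so every replica
        # picks the same winner (assumes values are comparable).
        if current is None or (timestamp, value) > current:
            self._data[key] = (timestamp, value)

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Anti-entropy step: merge every entry held by another replica."""
        for key, (timestamp, value) in other._data.items():
            self.put(key, value, timestamp)

    def range_scan(self, start, end):
        """(key, value) pairs with start <= key < end, in lexicographic order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k][1]) for k in self._keys[lo:hi]]


a, b = Replica("a"), Replica("b")
a.put("user:1", "Ann", timestamp=1)
b.put("user:1", "Anne", timestamp=2)  # concurrent update on another replica
b.put("user:2", "Bob", timestamp=1)
print(a.get("user:1"), b.get("user:1"))  # Ann Anne  (replicas disagree for now)

# No further updates: the replicas exchange state and converge.
a.sync_from(b)
b.sync_from(a)
print(a.get("user:1"), b.get("user:1"))  # Anne Anne
print(a.range_scan("user:1", "user:3"))  # [('user:1', 'Anne'), ('user:2', 'Bob')]
```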
You Need Constant Consistency
• You’re dealing with financial transactions
• You’re dealing with medical records
• You’re dealing with bonded goods
• Best to use an RDBMS ☺
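The kind of guarantee that makes an RDBMS the safe choice for financial data can be shown with Python's built-in `sqlite3` module. The `accounts` table and `transfer` function are hypothetical. The point is atomicity: when the debit fails its `CHECK` constraint, the credit already applied in the same transaction is rolled back too, so money is never created or lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: balances may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, source, destination, amount):
    """Move `amount` between accounts: both updates commit, or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failing debit must undo an already-applied change.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, destination),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected an overdraft
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 30}
print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 30}
```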
You Need Horizontal Scalability
• You’re working across defined geographic regions
• You’re working with large quantities of data
• Game server sharding
• Use NoSQL
– Something like Cassandra.
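Horizontal scaling depends on spreading keys across nodes so that adding a node moves only a fraction of the data. Cassandra does this with a token ring; the sketch below is a generic consistent-hashing ring, not Cassandra's partitioner (which uses Murmur3 by default). The MD5-based `token` function, the 64 virtual tokens per node and the `player:N` keys are assumptions for illustration, loosely modelled on game-server sharding.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process for strings.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring with several virtual tokens per node."""

    def __init__(self, nodes, tokens_per_node=64):
        self.tokens_per_node = tokens_per_node
        self._ring = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.tokens_per_node):
            bisect.insort(self._ring, (token(f"{node}#{i}"), node))

    def node_for(self, key):
        # Owner = first token at or after the key's hash, wrapping past the end.
        index = bisect.bisect_left(self._ring, (token(key),)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"player:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved))  # roughly a quarter of the keys; the exact count depends on the hashes
```

Every key that moves lands on the new node; the other nodes keep the rest of their data, which is what makes adding capacity cheap.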
Frequently Written, Rarely Read
• Think web counters and the like
• Every time a user visits a page: ctr++
• But it’s only read when the report is run
• Use NoSQL (key-value storage/memcache).
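The access pattern is many blind increments and an occasional bulk read. memcached supports this with its `incr` command, which increments a stored counter on the server so the application never has to read before writing. The in-memory `PageCounter` below is a hypothetical stand-in for such a store, showing the shape of the workload rather than any client library's API.

```python
from collections import Counter


class PageCounter:
    """Hypothetical write-optimised counter: cheap increments, rare bulk reads."""

    def __init__(self):
        self._counts = Counter()

    def hit(self, page: str) -> None:
        # The ctr++ on every page view; no read is needed to perform the write.
        self._counts[page] += 1

    def report(self):
        # Only the reporting job reads, most-viewed pages first.
        return self._counts.most_common()


counter = PageCounter()
for page in ["/home", "/about", "/home", "/home", "/pricing"]:
    counter.hit(page)
print(counter.report())  # [('/home', 3), ('/about', 1), ('/pricing', 1)]
```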
Here Today Gone Tomorrow
• Transient data, like:
– Web Sessions
– Locks
– Short Term Stats
• Shopping cart contents
• Use NoSQL (Memcache).
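What makes this data a good fit for a cache is that each item can simply expire; memcached lets every item be stored with an expiration time. The `ExpiringStore` below is a hypothetical sketch of that behaviour: each entry carries a time-to-live and is dropped lazily the next time it is read. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time


class ExpiringStore:
    """Hypothetical session store where every entry has a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazily evict the expired entry when it is next read.
            del self._data[key]
            return None
        return value


now = [0.0]  # fake clock, in seconds
store = ExpiringStore(clock=lambda: now[0])
store.set("session:abc", {"cart": ["book"]}, ttl_seconds=1800)
print(store.get("session:abc"))  # {'cart': ['book']}
now[0] += 1801  # simulate just over 30 minutes passing
print(store.get("session:abc"))  # None
```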
Data Replication
• Same data in two or more locations
– Music Library
• Web browser
• iPhone App
• Use NoSQL (CouchDB)
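CouchDB replication reads a database's changes feed and records checkpoints, so a later replication only transfers what changed since the last run. The sketch below imitates that idea for the music-library example with two hypothetical replicas. It simplifies conflict handling: where CouchDB keeps losing revisions around as conflicts, this code just keeps a deterministic winner (higher revision, then larger body). `Replica`, `pull_from` and the track documents are assumptions, not CouchDB's API.

```python
class Replica:
    """Hypothetical document replica that records changes in an ordered log."""

    def __init__(self, name):
        self.name = name
        self.docs = {}         # doc_id -> (revision, body)
        self.change_log = []   # doc_ids, in the order they changed on this replica
        self.checkpoints = {}  # source name -> how far into source.change_log we have read

    def save(self, doc_id, body):
        revision = self.docs.get(doc_id, (0, ""))[0] + 1
        self._store(doc_id, revision, body)

    def _store(self, doc_id, revision, body):
        current = self.docs.get(doc_id)
        # Simplification: keep one deterministic winner instead of tracking conflicts.
        if current is None or (revision, body) > current:
            self.docs[doc_id] = (revision, body)
            self.change_log.append(doc_id)

    def pull_from(self, source):
        """Copy only what changed on `source` since the last checkpoint."""
        start = self.checkpoints.get(source.name, 0)
        for doc_id in source.change_log[start:]:
            revision, body = source.docs[doc_id]
            self._store(doc_id, revision, body)
        self.checkpoints[source.name] = len(source.change_log)


browser, phone = Replica("browser"), Replica("iphone")
browser.save("track:1", "Song A")
browser.save("track:2", "Song B")
phone.pull_from(browser)                # initial sync to the phone
phone.save("track:2", "Song B (live)")  # edit made on the phone
browser.pull_from(phone)                # replicated back to the browser
print(browser.docs["track:2"])          # (2, 'Song B (live)')
print(browser.docs == phone.docs)       # True
```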
Hit me Baby One More Time!
• High Availability
– High number of important transactions
• Online gambling
• Pay Per view
– Ahem!
• Online Auction
• Use NoSQL (Cassandra, with automatic clustering)
Give me a Real World Example
– Twitter: the challenges
• Needs to store many graphs
– Who you are following
– Who’s following you
– Who you receive phone notifications from, etc.
• To deliver a tweet requires rapid paging of followers
• Heavy write load as followers are added and removed
• Set arithmetic for @mentions (intersection of users).
What Did They Need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
– Arrive out of order
– Or be processed more than once
• Failures should result in redundant work
– Not lost work!
The Result was FlockDB
• Stores graph data
• Not optimised for graph traversal operations
• Optimised for large adjacency lists
– List of all edges in a graph
• Each entry is a set of endpoints (or a tuple, if the graph is directed)
• Optimised for fast read and write
• Optimised for page-able set arithmetic.
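"Page-able set arithmetic" means computing something like an intersection of two large follower lists one page at a time, for example the users who follow both of two accounts. The sketch below assumes each list is stored as a sorted list of integer user ids and uses a hypothetical cursor (the smallest id the next page may start from); it is an illustration of the technique, not FlockDB's paging scheme.

```python
import bisect


def intersect_page(a, b, cursor=0, page_size=2):
    """Hypothetical helper: one page of the intersection of two sorted id lists.

    `cursor` is the smallest id this page may start from. Returns
    (page, next_cursor); next_cursor is None when there are no more pages.
    """
    i = bisect.bisect_left(a, cursor)
    j = bisect.bisect_left(b, cursor)
    page = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            if len(page) == page_size:
                return page, a[i]  # resume from this common id next time
            page.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return page, None


followers_of_alice = [2, 3, 5, 8, 13, 21, 34]
followers_of_bob = [1, 2, 3, 5, 13, 34, 55]
cursor = 0
while cursor is not None:
    page, cursor = intersect_page(followers_of_alice, followers_of_bob, cursor)
    print(page)
# [2, 3]
# [5, 13]
# [34]
```

Each page only walks the part of the two lists it needs, so a client can stop after the first page without the whole intersection ever being computed.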
How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
– All queries can be answered by a single partition
• Write operations are idempotent
– Can be applied multiple times without changing the result
• And commutative
– Changing the order of operands doesn’t change the result.
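One common way to make graph writes both idempotent and commutative is to timestamp every write and let the newest write for each edge win. The sketch below assumes that scheme; `EdgeStore` and the `(source, destination, state, timestamp)` tuple layout are hypothetical, not FlockDB's API. It replays every possible arrival order of a batch that includes a duplicated write and checks that all orders end in the same state.

```python
from itertools import permutations


class EdgeStore:
    """Hypothetical graph partition whose writes are idempotent and commutative.

    Every write carries a timestamp; per (source, destination) edge, the write
    with the newest timestamp wins, whatever order the writes arrive in.
    """

    def __init__(self):
        self._edges = {}  # (source, destination) -> (timestamp, state)

    def apply(self, write):
        source, destination, state, timestamp = write
        current = self._edges.get((source, destination))
        # Equal timestamps are broken by comparing the state strings, so the
        # outcome never depends on arrival order.
        if current is None or (timestamp, state) > current:
            self._edges[(source, destination)] = (timestamp, state)

    def following(self, source):
        return {dst for (src, dst), (_, state) in self._edges.items()
                if src == source and state == "follow"}


writes = [
    ("alice", "bob", "follow", 1),
    ("alice", "carol", "follow", 2),
    ("alice", "bob", "unfollow", 3),
    ("alice", "bob", "follow", 1),  # the first write, delivered a second time
]

# Replay every possible arrival order; all of them end in the same state.
results = set()
for order in permutations(writes):
    store = EdgeStore()
    for write in order:
        store.apply(write)
    results.add(frozenset(store.following("alice")))
print(results)  # {frozenset({'carol'})}
```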
A Little More About Idempotency
• An idempotent operation can be applied several times with no change to the result
• An operation $\circ$ on a set $S$ is called idempotent if, for all $x \in S$, $x \circ x = x$
• Set union
– $A \cup B = \{x : x \in A \text{ or } x \in B\}$, so $A \cup A = A$
• Set intersection
– $A \cap B = \{x : x \in A \text{ and } x \in B\}$, so $A \cap A = A$
A Little More About Commutative
• Changing the order of operands doesn’t change the result.
3 + 2 = 5
• Can be combined with idempotency
• Let’s look at the follow command in Twitter
• Let X = follow person X
• Let Y = follow person Y
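• Then 3X + 2Y = 2Y + 3X (commutativity: the order doesn't matter)
• And 2X + 3Y = 3X + 2Y (idempotency: following someone twice or three times has the same effect as following them once)
• Note: this only holds when every operation is of the same kind; a follow and an unfollow of the same person do not commute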
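The follow arithmetic above can be checked directly by replaying commands against a set of followed users. `apply_commands` and the `("follow", person)` command format are hypothetical. Because adding to a set is idempotent and adds commute with each other, every ordering of 3X + 2Y and of 2X + 3Y ends in the same state; mixing follow and unfollow does not.

```python
from itertools import permutations


def apply_commands(commands):
    """Replay hypothetical (action, person) commands against an empty 'following' set."""
    following = set()
    for action, person in commands:
        if action == "follow":
            following.add(person)      # idempotent: adding twice == adding once
        elif action == "unfollow":
            following.discard(person)  # also idempotent on its own
        else:
            raise ValueError(f"unknown action: {action}")
    return following


three_x_two_y = [("follow", "X")] * 3 + [("follow", "Y")] * 2
two_x_three_y = [("follow", "X")] * 2 + [("follow", "Y")] * 3

# Every ordering of 3X + 2Y, and of 2X + 3Y, ends in the same state.
outcomes = {frozenset(apply_commands(p)) for p in permutations(three_x_two_y)}
outcomes |= {frozenset(apply_commands(p)) for p in permutations(two_x_three_y)}
print(len(outcomes), sorted(next(iter(outcomes))))  # 1 ['X', 'Y']

# Mixing different operations breaks commutativity.
print(apply_commands([("follow", "X"), ("unfollow", "X")]))  # set()
print(apply_commands([("unfollow", "X"), ("follow", "X")]))  # {'X'}
```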
Commutative Writes Help Bring Up Partitions
• A new partition can receive write traffic immediately
• It receives a dump of existing data in the background
• It goes live for reads as soon as the dump is complete
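Because writes commute, a fresh partition does not need the old data before it starts accepting new writes: live writes and the background dump can be applied in either order and give the same result. The sketch below assumes timestamped, newest-wins writes. `Partition`, `load_dump` and the `live_for_reads` flag are hypothetical names used to show the bring-up sequence, not FlockDB's implementation.

```python
class Partition:
    """Hypothetical new partition brought up while live writes continue."""

    def __init__(self):
        self._edges = {}  # (source, destination) -> (timestamp, state)
        self.live_for_reads = False

    def apply(self, source, destination, state, timestamp):
        # Newest timestamp wins, so writes can arrive in any order, any number of times.
        key = (source, destination)
        current = self._edges.get(key)
        if current is None or (timestamp, state) > current:
            self._edges[key] = (timestamp, state)

    def load_dump(self, dump):
        """Apply a background copy of existing data, then open for reads."""
        for source, destination, state, timestamp in dump:
            self.apply(source, destination, state, timestamp)
        self.live_for_reads = True

    def following(self, source):
        if not self.live_for_reads:
            raise RuntimeError("partition is still loading")
        return sorted(dst for (src, dst), (_, state) in self._edges.items()
                      if src == source and state == "follow")


dump = [("alice", "bob", "follow", 1), ("alice", "carol", "follow", 2)]
live = [("alice", "bob", "unfollow", 5), ("alice", "dave", "follow", 6)]

p = Partition()
for write in live:           # write traffic is accepted immediately...
    p.apply(*write)
p.load_dump(dump)            # ...while the older dump arrives afterwards
print(p.following("alice"))  # ['carol', 'dave']
```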