SMU No SQL Talk

Transcript of SMU No SQL Talk

Page 1: SMU No SQL Talk
Page 2: SMU No SQL Talk

No SQL is not about SQL

Page 3: SMU No SQL Talk

No SQL is a Zoo..

• Key-Value Stores
• Wide Column Stores
• Document Stores
• Graph Databases

Examples: BigTable, SimpleDB, Azure Table

Page 4: SMU No SQL Talk

Why not Traditional RDBMSs?

They offer incredibly useful guarantees and are battle-tested.

Page 5: SMU No SQL Talk

Referential Integrity

Page 6: SMU No SQL Talk

ACID Transactions

Page 7: SMU No SQL Talk

And SQL..

SQL is a powerful, expressive DSL (Domain Specific Language) that many, many people understand.

Page 8: SMU No SQL Talk

So Why No SQL?

Page 9: SMU No SQL Talk

Web Scale

Page 10: SMU No SQL Talk

Web scale can be done in SQL

Page 11: SMU No SQL Talk

How?

• Vertical Partitioning / Logical Sharding (Instagram)
• Caching (28 terabytes, Facebook, 2008)
• SQL + No SQL
• Think about your Architecture

Want to learn more? Spend time on http://highscalability.com/
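
One way to picture the logical sharding bullet is the sketch below. It is a minimal illustration under assumptions, not Instagram's actual scheme: the shard count, the host names, and the `shard_for`, `host_for`, and `locate` helpers are all hypothetical. The idea is that rows are assigned to many logical shards, and logical shards are mapped onto far fewer physical servers, so the key-to-shard mapping never changes when servers are added.

```python
# Hypothetical values: neither the shard count nor the host names come from the talk.
LOGICAL_SHARDS = 1024
PHYSICAL_HOSTS = ["db0.example.internal", "db1.example.internal"]


def shard_for(user_id: int) -> int:
    """Map a user id to a logical shard (simple modulo placement)."""
    return user_id % LOGICAL_SHARDS


def host_for(shard: int, hosts=PHYSICAL_HOSTS) -> str:
    """Map a logical shard to the physical host that currently holds it."""
    shards_per_host = -(-LOGICAL_SHARDS // len(hosts))  # ceiling division
    return hosts[shard // shards_per_host]


def locate(user_id: int, hosts=PHYSICAL_HOSTS) -> tuple:
    """Return (logical shard, physical host) for a user's rows."""
    shard = shard_for(user_id)
    return shard, host_for(shard, hosts)
```

The application only ever computes the logical shard; which server holds that shard is a separate lookup that operations staff can change.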

Page 12: SMU No SQL Talk

But a reasonable question is..

How much time should we be devoting to managing scaling problems versus adding business value to these systems?

Page 13: SMU No SQL Talk

So what are we giving up?

Page 14: SMU No SQL Talk

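CAP

[Figure: CAP diagram with the three properties Consistency, Availability, and Partition tolerance. Systems shown on the diagram: RDBMSs (MySQL, SQL Server, Oracle), MongoDB, HBase (Hadoop), Google BigTable, Redis, Dynamo, CouchDB, Cassandra, Voldemort, SimpleDB.]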

Page 15: SMU No SQL Talk

FriendsWhoCook.com

A social network of friends who enjoy cooking great food.

- Add my Recipes
- Add my friends
- Show my friends
- Like / Comment on my Friend’s Recipes
- Search recipes of my friends, their friends, and so on by.

Page 16: SMU No SQL Talk
Page 17: SMU No SQL Talk
Page 18: SMU No SQL Talk
Page 19: SMU No SQL Talk

Problem 1: Store Recipes

Page 20: SMU No SQL Talk

Fairly Simple Object

class Recipe {
    Image Photo
    List<Comments> Comments
    List<Ingredients> Ingredients
    List<ProfileId> Likes
    Category RecipeCategory
}

Page 21: SMU No SQL Talk

Becomes a complex RDBM’ess

Page 22: SMU No SQL Talk

Object-Relational Impedance Mismatch
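
To make the mismatch concrete, here is a rough sketch of how the single Recipe object might be spread across relational tables. The table and column names are assumptions made up for illustration (the slide's actual schema diagram is not in the transcript), and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema: each nested list on the object needs its own table.
SCHEMA = """
CREATE TABLE Category         (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Recipe           (ID INTEGER PRIMARY KEY, Photo BLOB,
                               CategoryID INTEGER REFERENCES Category(ID));
CREATE TABLE RecipeIngredient (ID INTEGER PRIMARY KEY, Name TEXT,
                               RecipeID INTEGER REFERENCES Recipe(ID));
CREATE TABLE RecipeComment    (ID INTEGER PRIMARY KEY, Body TEXT,
                               RecipeID INTEGER REFERENCES Recipe(ID));
CREATE TABLE RecipeLike       (RecipeID INTEGER REFERENCES Recipe(ID),
                               ProfileID INTEGER);
"""


def save_recipe(conn, recipe: dict) -> int:
    """Shred one in-memory recipe into rows across several tables."""
    cur = conn.execute("INSERT INTO Category (Name) VALUES (?)",
                       (recipe["category"],))
    category_id = cur.lastrowid
    cur = conn.execute("INSERT INTO Recipe (Photo, CategoryID) VALUES (?, ?)",
                       (recipe["photo"], category_id))
    recipe_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO RecipeIngredient (Name, RecipeID) VALUES (?, ?)",
        [(name, recipe_id) for name in recipe["ingredients"]])
    conn.executemany(
        "INSERT INTO RecipeComment (Body, RecipeID) VALUES (?, ?)",
        [(body, recipe_id) for body in recipe["comments"]])
    conn.executemany(
        "INSERT INTO RecipeLike (RecipeID, ProfileID) VALUES (?, ?)",
        [(recipe_id, profile_id) for profile_id in recipe["likes"]])
    return recipe_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
recipe_id = save_recipe(conn, {
    "photo": b"", "category": "Dessert",
    "ingredients": ["flour", "sugar"],
    "comments": ["Yum"], "likes": [2, 3],
})
```

One object in code turns into five tables and five insert statements, and reading it back needs the matching joins.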

Page 23: SMU No SQL Talk

No SQL: Document Store

• Data element is a document
• Documents grouped into collections
• Often stored as JSON
• Works great with Domain Driven Design
• Schema-less
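
As a rough illustration of these bullets, the Recipe object from the earlier slide can be held as one self-contained JSON document. The field names and values are assumptions for illustration, and the "collection" here is just a Python dict keyed by a made-up `_id`, not any particular database's API.

```python
import json

# Hypothetical recipe document: nested lists live inside the document itself.
recipe_doc = {
    "_id": "recipe-1",
    "category": "Dessert",
    "ingredients": [{"name": "flour", "amount": "2 cups"},
                    {"name": "sugar", "amount": "1 cup"}],
    "comments": [{"profile_id": 2, "text": "Yum"}],
    "likes": [2, 3],
}

# A toy in-memory "collection": serialized documents keyed by id.
recipes = {}


def save(collection: dict, doc: dict) -> None:
    """Store the whole document as a single JSON string."""
    collection[doc["_id"]] = json.dumps(doc)


def load(collection: dict, doc_id: str) -> dict:
    """Read the whole document back in one lookup, with no joins."""
    return json.loads(collection[doc_id])


save(recipes, recipe_doc)
# Schema-less: a second document may carry fields the first one lacks.
save(recipes, {"_id": "recipe-2", "category": "Soup", "spice_level": 3})
```

The document that goes in is the document that comes out, which is why this model sits comfortably next to object-oriented domain code.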

Page 24: SMU No SQL Talk

Document Store Examples

• MongoDB (PC)
• CouchDB (PA)
• RavenDB (PA)

Page 25: SMU No SQL Talk

Demo: MongoDB

Page 26: SMU No SQL Talk

Demo: CouchDB

Page 27: SMU No SQL Talk

Problem 2: Model the Social Graph

Page 28: SMU No SQL Talk

Friends in RDBMS

For a more sophisticated view of modeling graphs in an RDBMS: http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age

Page 29: SMU No SQL Talk

Get my Friends

DECLARE @ProfileID int
SET @ProfileID = 1  -- example value; without it the variable is NULL and no rows match

SELECT FirstDegreeProfile.ID, FirstDegreeProfile.FirstName, FirstDegreeProfile.LastName
FROM Profile AS FirstDegreeProfile
JOIN Friendship ON FirstDegreeProfile.ID = Friendship.FriendID
WHERE Friendship.ProfileID = @ProfileID

Page 30: SMU No SQL Talk

Friends and their friends

DECLARE @ProfileID int
SET @ProfileID = 1

SELECT FirstDegreeFriendship.FriendId AS MyFriendId,
       SecondDegreeProfile.ID AS SecondDegreeId,
       SecondDegreeProfile.FirstName AS SecondDegreeFirstName,
       SecondDegreeProfile.LastName AS SecondDegreeLastName
FROM Profile AS SecondDegreeProfile
JOIN Friendship AS SecondDegreeFriendship
  ON SecondDegreeProfile.ID = SecondDegreeFriendship.FriendID
JOIN Friendship AS FirstDegreeFriendship
  ON SecondDegreeFriendship.ProfileID = FirstDegreeFriendship.FriendID
WHERE FirstDegreeFriendship.ProfileId = @ProfileId

/* Note: A much better solution would use a recursive CTE to compute transitive closure */
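
The recursive CTE mentioned in the note could look roughly like the sketch below. It is written for SQLite and driven from Python so it can be run as-is, rather than for SQL Server, so the syntax differs a little from the T-SQL on the slides (`WITH RECURSIVE` and named `:profile_id` parameters instead of `@ProfileID`). The sample rows and the `max_degree` cap are assumptions for illustration; the Profile and Friendship column names follow the slide queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Profile (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Friendship (ProfileID INTEGER, FriendID INTEGER);
-- Made-up sample data: 1 -> 2 -> 3 -> 4
INSERT INTO Profile VALUES (1,'Ann','A'), (2,'Bob','B'), (3,'Cy','C'), (4,'Di','D');
INSERT INTO Friendship VALUES (1,2), (2,3), (3,4);
""")

FRIENDS_WITHIN = """
WITH RECURSIVE Reachable(FriendID, Degree) AS (
    -- Anchor: direct friends of the starting profile
    SELECT FriendID, 1 FROM Friendship WHERE ProfileID = :profile_id
    UNION
    -- Recursive step: friends of anyone already reached
    SELECT f.FriendID, r.Degree + 1
    FROM Friendship AS f
    JOIN Reachable AS r ON f.ProfileID = r.FriendID
    WHERE r.Degree < :max_degree
)
SELECT p.ID, p.FirstName, MIN(r.Degree) AS MinDegree
FROM Reachable AS r
JOIN Profile AS p ON p.ID = r.FriendID
WHERE p.ID <> :profile_id
GROUP BY p.ID, p.FirstName
ORDER BY MinDegree, p.ID
"""


def friends_within(conn, profile_id: int, max_degree: int):
    """Return (id, first name, degree) for everyone within max_degree hops."""
    return conn.execute(
        FRIENDS_WITHIN, {"profile_id": profile_id, "max_degree": max_degree}
    ).fetchall()
```

Calling `friends_within(conn, 1, 2)` covers the "friends and their friends" case, and raising `max_degree` extends the search without adding another join per degree. The cap also keeps the recursion from looping forever when friendships form a cycle.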

Page 31: SMU No SQL Talk

Graph Databases

• Optimized for graph data
• Check out Neo4j

Page 32: SMU No SQL Talk

Problem 3: Schemaless / Big Data

Facebook's Network: Credit Traud & Frost, UNC-Chapel Hill

Page 33: SMU No SQL Talk

How do we ask these questions?

• After we changed the “like” button icon for half of our users, did we get more or fewer likes from that sample?
• Of users who click on our ads, what pages did they spend the most time on?
• Which hidden patterns might make us competitive that we aren’t even aware of?

Want to get far ahead of the pack? Read “The Lean Startup” by Eric Ries

Page 34: SMU No SQL Talk

Is this Actionable?

Page 35: SMU No SQL Talk

How about this?

Page 36: SMU No SQL Talk

Wide Column

“A Bigtable is a sparse, distributed, persistent multidimensional sorted map”

Source: http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
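
Read literally, that definition describes a map from (row key, column, timestamp) to a value, kept sorted by row key. The toy below sketches just that shape in a single Python process. It is neither distributed nor persistent, and the class name, method names, and sample keys are assumptions for illustration rather than the HBase or Bigtable API.

```python
import bisect


class ToySortedMap:
    """In-memory sketch: row key -> column -> timestamp -> value."""

    def __init__(self):
        self._rows = {}       # row key -> {column: {timestamp: value}}
        self._row_keys = []   # row keys kept in sorted order

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        # Sparse: a row only stores the columns that were actually written.
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row: str, column: str):
        """Return the most recent value for a cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row: str, stop_row: str):
        """Return row keys in sorted order within [start_row, stop_row)."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        return self._row_keys[lo:hi]


table = ToySortedMap()
table.put("com.example/b", "likes:count", 1, "10")
table.put("com.example/b", "likes:count", 2, "12")
table.put("com.example/a", "page:title", 1, "Home")
```

Each row holds only the columns written to it, old versions stay addressable by timestamp, and sorted row keys make range scans cheap.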

Page 37: SMU No SQL Talk

MapReduce

Map(k, v) → [(k1, v1), (k2, v2), (k1, v3), (k3, v4)]
Map(k, v) → (list of intermediate key / value pairs)

Internal Step: Takes the list of intermediate key / value pairs and converts it to a key / list of values.

Reduce(k, [v1, v2, v3…]) → (k, n1), (k, n2)
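
The three steps on this slide can be sketched in a single process as follows. This only illustrates the data flow; real MapReduce frameworks run the map and reduce calls in parallel across a cluster. Word count is used as the example problem, and the function names (`map_fn`, `shuffle`, `reduce_fn`, `map_reduce`) are assumptions for illustration.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map(k, v): emit a list of intermediate (key, value) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def shuffle(pairs):
    """Internal step: group intermediate pairs into key -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(key, values):
    """Reduce(k, [v1, v2, ...]): collapse the values for one key."""
    return (key, sum(values))


def map_reduce(documents):
    """Run map over every document, group by key, then reduce each group."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_fn(doc_id, text))
    return dict(reduce_fn(k, vs) for k, vs in shuffle(intermediate).items())


counts = map_reduce({"doc1": "to be or not to be", "doc2": "be quick"})
```

Because each `map_fn` call sees one document and each `reduce_fn` call sees one key, both phases can be spread across machines without coordination between calls.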

Page 38: SMU No SQL Talk

One Down Side…

• We have to have smart people write MapReduce programs and the problems need to be expressible as MapReduce..
• General solutions are BIG money.

Page 39: SMU No SQL Talk

Final thought: Big Data is BIG

= ?

Page 40: SMU No SQL Talk

Things to Read

• Bigtable: A Distributed Storage System for Structured Data
• Dynamo: Amazon’s Highly Available Key-value Store
• MapReduce: Simplified Data Processing on Large Clusters
• The Google File System
• Towards Robust Distributed Systems
• http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable

Page 41: SMU No SQL Talk

Creative Commons Acknowledgments and Thanks!

Bobwitlox, rosipaw