Scaling the Web: Databases & NoSQL

Scaling the Web: Databases & NoSQL Richard Schneeman @schneems works for @Gowalla Wed Nov 10 2011

description

This is an introduction to relational and non-relational databases and how their performance affects scaling a web application. This is a recording of a guest lecture I gave at the University of Texas School of Information. In this talk I address the technologies and tools Gowalla (gowalla.com) uses, including Memcache, Redis, and Cassandra. Find more on my blog: http://schneems.com

Transcript of Scaling the Web: Databases & NoSQL

Page 1: Scaling the Web: Databases & NoSQL

Scaling the Web: Databases & NoSQL

Richard Schneeman @schneems works for @Gowalla

Wed Nov 10 2011

Page 2: Scaling the Web: Databases & NoSQL

whoami

• @Schneems

• BSME with Honors from Georgia Tech

• 5+ years of experience with Ruby & Rails

• Work for @Gowalla

• Rails 3.1 contributor : )

• 3+ years of technical teaching

Page 3: Scaling the Web: Databases & NoSQL

Traffic

Page 4: Scaling the Web: Databases & NoSQL

Compounding Traffic

ex. Wikipedia

Page 5: Scaling the Web: Databases & NoSQL

Compounding Traffic

ex. Wikipedia

Page 6: Scaling the Web: Databases & NoSQL

Gowalla

Page 7: Scaling the Web: Databases & NoSQL

Gowalla

• 50 best websites NYTimes 2010

• Founded 2009 @ SXSW

• 1 million+ Users

• Undisclosed Visitors

• Loves/highlights/comments/stories/guides

• Facebook/Foursquare/Twitter integration

• iPhone/Android/web apps

• public API

Page 8: Scaling the Web: Databases & NoSQL
Page 9: Scaling the Web: Databases & NoSQL

Gowalla Backend

• Ruby on Rails

• Uses the Ruby Language

• Rails is the Framework

Page 10: Scaling the Web: Databases & NoSQL

The Web is Data

• Username => String

• Birthday => Int/ Int/ Int

• Blog Post => Text

• Image => Binary-file/blob

Data needs to be stored to be useful

Page 11: Scaling the Web: Databases & NoSQL

Database

Page 12: Scaling the Web: Databases & NoSQL

Gowalla Database

• PostgreSQL

• Relational (RDBMS)

• Open Source

• Competitor to MySQL

• ACID compliant

• Running on a Dedicated Managed Server

Page 13: Scaling the Web: Databases & NoSQL

Need for Speed

• Throughput:

• The number of operations per minute that can be performed

• Pure Speed:

• How long an individual operation takes.
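The two measures are related but not the same thing. As a rough sketch, using made-up numbers (the 5 ms latency and the worker counts are assumptions for illustration, not Gowalla figures), throughput can rise while the time for a single operation stays fixed:

```ruby
# Rough relationship between pure speed (latency) and throughput.
# All numbers here are illustrative assumptions.
def ops_per_minute(latency_ms, concurrent_workers)
  ops_per_worker_per_minute = 60_000.0 / latency_ms
  ops_per_worker_per_minute * concurrent_workers
end

single = ops_per_minute(5, 1)   # one worker, 5 ms per operation
pooled = ops_per_minute(5, 10)  # ten workers, same 5 ms per operation

puts "1 worker:   #{single.round} ops/min"
puts "10 workers: #{pooled.round} ops/min"
```

Each individual operation is no faster in the second case; only the number completed per minute changes.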

Page 14: Scaling the Web: Databases & NoSQL

Potential Problems

• Hardware

• Slow Network

• Slow hard-drive

• Insufficient CPU

• Insufficient RAM

• Software

• too many Reads

• too many Writes

Page 15: Scaling the Web: Databases & NoSQL

Scaling Up versus Out

• Scale Up:

• More CPU, Bigger HD, More RAM etc.

• Scale Out:

• More machines

• More machines

• More machines

• ...

Page 16: Scaling the Web: Databases & NoSQL

Scale Up

• Bigger, faster machine

• More RAM

• More CPU

• Bigger ethernet bus

• ...

• Moore's Law

• Diminishing returns

Page 17: Scaling the Web: Databases & NoSQL

Scale Out

• Forget Moore's Law...

• Add more nodes

• Master/ Slave Database

• Sharding

Page 18: Scaling the Web: Databases & NoSQL

Master/Slave

[Diagram: writes go to the Master DB, which copies data to four Slave DBs; reads are served by the Slave DBs.]

Page 19: Scaling the Web: Databases & NoSQL

Master & Slave +/-

• Pro

• Increased read speed

• Takes read load off of master

• Allows us to Join across all tables

• Con

• Doesn’t buy increased write throughput

• Single Point of Failure in Master Node
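A minimal sketch of the routing idea, using plain Ruby Hashes as stand-ins for the databases. The class and method names are hypothetical, and the copy step here is immediate, whereas real replication is usually asynchronous:

```ruby
# In-memory stand-in for one master database with several read replicas.
class ReplicatedStore
  def initialize(replica_count)
    @master   = {}
    @replicas = Array.new(replica_count) { {} }
    @next     = 0
  end

  # Every write goes through the single master, then is copied out.
  # Adding replicas does not add write capacity.
  def write(key, value)
    @master[key] = value
    @replicas.each { |replica| replica[key] = value }
  end

  # Reads rotate across the replicas, so the master serves none of them.
  def read(key)
    replica = @replicas[@next]
    @next = (@next + 1) % @replicas.size
    replica[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3")
```

Because every replica holds the full data set, any one of them can still answer a query that joins across tables.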

Page 20: Scaling the Web: Databases & NoSQL

Sharding

[Diagram: reads and writes are split across separate shards for Users in USA, Users in Europe, Users in Asia, and Users in Africa.]

Page 21: Scaling the Web: Databases & NoSQL

Sharding +/-

• Pro

• Increased Write & Read throughput

• No Single Point of failure

• Individual features can fail

• Con

• Cannot Join queries between shards
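A small sketch of region-based sharding, with one Hash per shard. The class, region symbols, and method names are hypothetical and only illustrate the trade-off:

```ruby
# Hypothetical region-based shard router; each shard is a plain Hash.
class ShardedUsers
  REGIONS = [:usa, :europe, :asia, :africa].freeze

  def initialize
    @shards = REGIONS.map { |region| [region, {}] }.to_h
  end

  # Reads and writes for one region touch only that region's shard.
  def write(region, id, name)
    shard_for(region)[id] = name
  end

  def read(region, id)
    shard_for(region)[id]
  end

  # No single shard can answer this. The application has to visit
  # every shard and merge the results itself.
  def all_names
    @shards.values.flat_map(&:values).sort
  end

  private

  def shard_for(region)
    @shards.fetch(region)
  end
end

users = ShardedUsers.new
users.write(:usa, 3, "schneems")
users.write(:europe, 7, "anna")
puts users.read(:usa, 3)
puts users.all_names.inspect
```

The `all_names` method is the application-level substitute for a query the database can no longer run across shards.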

Page 22: Scaling the Web: Databases & NoSQL

What is a Database?

• Relational Database Management System (RDBMS)

• Stores Data Using Schema

• A.C.I.D. compliant

• Atomic

• Consistent

• Isolated

• Durable

Page 23: Scaling the Web: Databases & NoSQL

RDBMS

• Relational

• Matches data on common characteristics in data

• Enables “Join” & “Union” queries

• Makes data modular

Page 24: Scaling the Web: Databases & NoSQL

Relational +/-

• Pros

• Data is modular

• Highly flexible data layout

• Cons

• Getting desired data can be tricky

• Over-modularization leads to many join queries

• Trade off performance for searchability

Page 25: Scaling the Web: Databases & NoSQL

Schema Storage

• Blueprint for data storage

• Break data into tables/columns/rows

• Give data types to your data

• Integer

• String

• Text

• Boolean

• ...

Page 26: Scaling the Web: Databases & NoSQL

Schema +/-

• Pros

• Regularize our data

• Helps keep data consistent

• Converts to programming “types” easily

• Cons

• Must separately manage schema

• Adding columns & indexes to existing large tables can be painful & slow

Page 27: Scaling the Web: Databases & NoSQL

ACID

• Properties that guarantee database transactions are processed reliably

• Atomic

• Consistent

• Isolated

• Durable

Page 28: Scaling the Web: Databases & NoSQL

ACID

• Atomic

• Any database Transaction is all or nothing.

• If one part of the transaction fails it all fails

“An Incomplete Transaction Cannot Exist”
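A toy illustration of all-or-nothing behavior, assuming an in-memory Hash of balances. The `Accounts` class and its `transaction` method are hypothetical and are not how a real database implements atomicity:

```ruby
# Toy transaction: apply changes to a copy and swap it in only if
# nothing raised along the way.
class Accounts
  attr_reader :balances

  def initialize(balances)
    @balances = balances
  end

  def transaction
    working_copy = @balances.dup
    yield working_copy
    @balances = working_copy # commit: every change becomes visible at once
    true
  rescue StandardError
    false                    # rollback: the working copy is discarded
  end
end

accounts = Accounts.new("alice" => 100, "bob" => 0)

# The first step succeeds, the check fails, so neither change is kept.
accounts.transaction do |b|
  b["alice"] -= 150
  raise "insufficient funds" if b["alice"] < 0
  b["bob"] += 150
end

puts accounts.balances.inspect
```

After the failed transfer the balances are exactly what they were before it started.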

Page 29: Scaling the Web: Databases & NoSQL

ACID

• Consistent

• Any transaction will take the database from one consistent state to another

“Only Consistent data is allowed to be written”

Page 30: Scaling the Web: Databases & NoSQL

ACID

• Isolated

• No transaction should be able to interfere with another transaction

“the same field cannot be updated by two sources at the exact same time”

a = 0

a += 1 and a += 2 at the same time => a = ??
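The "a = ??" question can be played out step by step. This sketch first interleaves two updates by hand to show a lost update, then serializes them with a Mutex as a stand-in for the isolation a database provides (the variable names are illustrative):

```ruby
# Without isolation: both clients read a before either one writes.
a = 0
read_by_first  = a        # first client reads 0
read_by_second = a        # second client also reads 0
a = read_by_first + 1     # first client writes 1
a = read_by_second + 2    # second client overwrites it with 2
unisolated = a            # the "+ 1" has been lost

# With isolation: each update runs alone and sees the previous result.
a = 0
lock = Mutex.new
threads = [1, 2].map do |n|
  Thread.new { lock.synchronize { a += n } }
end
threads.each(&:join)
isolated = a

puts "without isolation: #{unisolated}"
puts "with isolation:    #{isolated}"
```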

Page 31: Scaling the Web: Databases & NoSQL

ACID

• Durable

• Once a transaction is committed it will stay that way

“Save it once, read it forever”

Page 32: Scaling the Web: Databases & NoSQL

What is a Database?

• RDBMS

• Relational

• Flexible

• Has a schema

• Most likely ACID compliant

• Typically fast under low load or when optimized

Page 33: Scaling the Web: Databases & NoSQL

What is SQL?

• Structured Query Language

• The language databases speak

• Based on relational algebra

• Insert

• Query

• Update

• Delete

“SELECT Company, Country FROM Customers WHERE Country = 'USA' ”

Page 34: Scaling the Web: Databases & NoSQL

Why people <3 SQL

• Relational algebra is powerful

• SQL is proven

• well understood

• well documented

Page 35: Scaling the Web: Databases & NoSQL

Why people </3 SQL

• Relational algebra is hard

• Different databases support different SQL syntax

• Yet another programming language to learn

Page 36: Scaling the Web: Databases & NoSQL

SQL != Database

• SQL is used to talk to an RDBMS (database)

• SQL is not an RDBMS

Page 37: Scaling the Web: Databases & NoSQL

What is NoSQL?

Not A Relational Database

Page 38: Scaling the Web: Databases & NoSQL

RDBMS

Page 39: Scaling the Web: Databases & NoSQL

Types of NoSQL

• Distributed Systems

• Document Store

• Graph Database

• Key-Value Store

• Eventually Consistent Systems

Mix And Match ↑

Page 40: Scaling the Web: Databases & NoSQL

Key Value Stores

• Non Relational

• Typically No Schema

• Map one Key (a string) to a Value (some object)

Example: Redis

Page 41: Scaling the Web: Databases & NoSQL

Key Value Example

redis = Redis.new

redis.set("foo", "bar")

redis.get("foo")

>> "bar"

Page 42: Scaling the Web: Databases & NoSQL

Key Value Example

redis = Redis.new

redis.set("foo", "bar")   # "foo" is the key, "bar" is the value

redis.get("foo")

>> "bar"

Page 43: Scaling the Web: Databases & NoSQL

Key Value

• Like a database that can only ever use the primary key (id)

YES
select * from users where id = '3';

NO
select * from users where name = 'schneems';
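The same restriction can be shown with a plain Ruby Hash standing in for a key-value store. The data and the `name_index` structure are made up for illustration:

```ruby
# A Hash standing in for a key-value store: lookups work by key only.
users = {
  "3" => { name: "schneems" },
  "4" => { name: "someone_else" }
}

# YES: direct lookup by key.
by_id = users["3"]

# NO built-in equivalent of "where name = ...". The application either
# scans every value...
by_scan = users.values.find { |user| user[:name] == "schneems" }

# ...or maintains a second key by hand that acts as an index.
name_index = { "schneems" => "3" }
by_index = users[name_index["schneems"]]

puts by_id.inspect
```

The hand-maintained index has to be updated by the application every time a name changes, which is work an RDBMS index would do on its own.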

Page 44: Scaling the Web: Databases & NoSQL

NoSQL @ Gowalla

• Redis (key-value store)

• Store “Likes” & Analytics

• Memcache (key-value store)

• Cache Database results

• Cassandra

• (eventually consistent, with-schema, key value store)

• Store “feeds” or “timelines”

• Solr (search index)

Page 45: Scaling the Web: Databases & NoSQL

Memcache

• Key-Value Store

• Open Source

• Distributed

• In memory (RAM) only

• fast, but volatile

• Not ACID

• Memory object caching system

Page 46: Scaling the Web: Databases & NoSQL

Memcache Example

memcache = Memcache.new

memcache.set("foo", "bar")

memcache.get("foo")

>> "bar"

Page 47: Scaling the Web: Databases & NoSQL

Memcache

• Can store whole objects

memcache = Memcache.new
user = User.where(:username => "schneems").first
memcache.set("user:3", user)

user_from_cache = memcache.get("user:3")
user_from_cache == user
>> true
user_from_cache.username
>> "schneems"

Page 48: Scaling the Web: Databases & NoSQL

Memcache @ Gowalla

• Cache Common Queries

• Decreases Load on DB (postgres)

• Enables higher throughput from DB

• Faster response than DB

• Users see quicker page load time

Page 49: Scaling the Web: Databases & NoSQL

What to Cache?

• Objects that change infrequently

• users

• spots (places)

• etc.

• Expensive(ish) SQL queries

• Friend ids for users

• User ids for people visiting spots

• etc.

Page 50: Scaling the Web: Databases & NoSQL

Memcache Distributed

[Diagram: three memcache nodes, A, B, and C.]

Page 51: Scaling the Web: Databases & NoSQL

Memcache Distributed

[Diagram: memcache nodes A, B, and C, with a new node D added.]

Easily add more nodes
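One simplified way a client can spread keys over nodes is to hash each key and pick a node from the result. This sketch uses modulo hashing with hypothetical node names A through D; it is an assumption for illustration, and real memcache clients often use consistent hashing instead so that fewer keys move when a node is added:

```ruby
require "zlib"

# Simplified client-side distribution: hash the key, pick a node.
# The same key always maps to the same node for a given node list.
def node_for(key, nodes)
  nodes[Zlib.crc32(key) % nodes.size]
end

three = %w[A B C]
four  = %w[A B C D]

keys  = (1..1000).map { |i| "user:#{i}" }
moved = keys.count { |key| node_for(key, three) != node_for(key, four) }

puts "user:3 lives on node #{node_for('user:3', three)}"
puts "#{moved} of #{keys.size} keys map to a different node after adding D"
```

Keys that move to a different node are simply cache misses the next time they are requested, which is acceptable for a cache but would not be for a primary data store.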

Page 52: Scaling the Web: Databases & NoSQL

Memcache <3’s DB

• We use them Together

• If memcache doesn’t have a value

• Fetch from the database

• Set the key from database

• Hard

• Cache Invalidation : (
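The fetch-then-set flow and the invalidation problem can be sketched with two Hashes, one standing in for memcache and one for the database. The `CachedUsers` class and its key format are hypothetical:

```ruby
# Cache-aside with plain Hashes standing in for memcache and the database.
class CachedUsers
  attr_reader :db_reads

  def initialize(db)
    @db       = db
    @cache    = {}
    @db_reads = 0
  end

  def fetch(id)
    key = "user:#{id}"
    return @cache[key] if @cache.key?(key) # cache hit: no database work

    @db_reads += 1
    value = @db[id]       # cache miss: fetch from the database
    @cache[key] = value   # set the key from the database
    value
  end

  # Invalidation: drop the cached copy whenever the source row changes.
  # Forgetting this step anywhere in the app leaves stale data in the cache.
  def update(id, value)
    @db[id] = value
    @cache.delete("user:#{id}")
  end
end

users = CachedUsers.new({ 3 => "schneems" })
users.fetch(3)
users.fetch(3)
puts "database reads after two fetches: #{users.db_reads}"
users.update(3, "richard")
puts users.fetch(3)
```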

Page 53: Scaling the Web: Databases & NoSQL

Redis

• Key Value Store

• Open Source

• Not Distributed (yet)

• Extremely Quick

• “Data structure server”

Page 54: Scaling the Web: Databases & NoSQL

Redis Example, again

redis = Redis.new

redis.set("foo", "bar")

redis.get("foo")

>> "bar"

Page 55: Scaling the Web: Databases & NoSQL

Redis - Has Data Types

• Strings

• Hashes

• Lists

• Sets

• Sorted Sets

Page 56: Scaling the Web: Databases & NoSQL

Redis Example, sets

redis = Redis.new
redis.sadd("foo", "bar")
redis.smembers("foo")
>> ["bar"]
redis.sadd("foo", "fly")
redis.smembers("foo")
>> ["bar", "fly"]

Page 57: Scaling the Web: Databases & NoSQL

Redis => Likeable

• Very Fast response

• ~ 50 queries per page view

• ~ 1 ms per query

• http://github.com/Gowalla/likeable
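To show why a set is a natural fit for "likes", here is a sketch that uses Ruby's own Set as a stand-in for a Redis set per liked item. The `Likes` class, its methods, and the key format are hypothetical and are not Likeable's actual API:

```ruby
require "set"

# Ruby's Set standing in for one Redis set per liked item.
class Likes
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def like(user_id, item_key)
    @sets[item_key].add(user_id)       # analogous to Redis SADD
  end

  def liked?(user_id, item_key)
    @sets[item_key].include?(user_id)  # analogous to Redis SISMEMBER
  end

  def count(item_key)
    @sets[item_key].size               # analogous to Redis SCARD
  end
end

likes = Likes.new
likes.like(3, "spot:42")
likes.like(3, "spot:42") # a duplicate like is ignored by the set
likes.like(9, "spot:42")
puts likes.count("spot:42")
```

Each question a page asks ("did this user like it?", "how many likes?") is a single cheap set operation, which is what makes many small queries per page view practical.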

Page 58: Scaling the Web: Databases & NoSQL

Cassandra

• Open Source

• Distributed

• Key Value Store

• Eventually Consistent

• Sort of not ACID

• Uses A Schema

• ColumnFamilies

Page 59: Scaling the Web: Databases & NoSQL

Cassandra Distributed

Eventual Consistency

[Diagram: nodes A, B, C, and D. Data comes in to one node and is copied to extra nodes ... eventually.]
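A toy model of eventual consistency, assuming four in-memory nodes and an explicit replication step. The `EventualCluster` class is hypothetical and greatly simplified; it is not how Cassandra replicates data:

```ruby
# Toy cluster: a write lands on one node and is queued for the others.
class EventualCluster
  def initialize(node_names)
    @nodes   = node_names.map { |name| [name, {}] }.to_h
    @pending = []
  end

  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    (@nodes.keys - [node]).each { |other| @pending << [other, key, value] }
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Deliver the queued copies. Until this runs, the other nodes can
  # return stale (here: missing) data.
  def replicate!
    @pending.each { |node, key, value| @nodes.fetch(node)[key] = value }
    @pending.clear
  end
end

cluster = EventualCluster.new(%w[A B C D])
cluster.write("A", "feed:3", "checked in")
before = cluster.read("C", "feed:3") # node C has not received the copy yet
cluster.replicate!
after = cluster.read("C", "feed:3")

puts "before replication: #{before.inspect}"
puts "after replication:  #{after.inspect}"
```

The write is accepted immediately on one node, which keeps writes fast, at the cost of a window in which different nodes disagree.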

Page 60: Scaling the Web: Databases & NoSQL

Cassandra @ Gowalla

Activity Feeds

Page 61: Scaling the Web: Databases & NoSQL

Cassandra @ Gowalla

• Chronologic

• http://github.com/Gowalla/chronologic

Page 62: Scaling the Web: Databases & NoSQL

Should I use NoSQL?

Page 63: Scaling the Web: Databases & NoSQL

Which One?

Page 64: Scaling the Web: Databases & NoSQL

Pick the right tool

Page 65: Scaling the Web: Databases & NoSQL

Tradeoffs

• Every Data store has them

• Know your data store

• Strengths

• Weaknesses

Page 66: Scaling the Web: Databases & NoSQL

NoSQL vs. RDBMS

• No Magic Bullet

• Use Both!!!

• Model data in a datastore you understand

• Switch when/if you need to

• Understand Your Options

Page 67: Scaling the Web: Databases & NoSQL

Questions?

Richard Schneeman @schneems works for @Gowalla