Cloudcon East Presentation

61
Scaling with HiveDB

description

Introduction to HiveDB presented at Cloudcon East.

Transcript of Cloudcon East Presentation

Page 1: Cloudcon East Presentation

Scaling with HiveDB

Page 2: Cloudcon East Presentation

The Problem• Hundreds of

millions of products

• Millions of new products every week

• Accelerating growth

Page 3: Cloudcon East Presentation

Scaling up is no longer an option.

Page 4: Cloudcon East Presentation

Requirements

➡ OLTP optimized➡ Constant response time is more

important than low-latency➡ Expand capacity online➡ Ability to move records➡ No re-sharding

Page 5: Cloudcon East Presentation

2 years agoClustered Non-relational Unstructured

Oracle RACMS SQL ServerMySQL Cluster

Teradata (OLAP)

HadoopMogileFS

S3

Page 6: Cloudcon East Presentation

TodayClustered Non-relational Unstructured

Oracle RACMS SQL ServerMySQL Cluster

DB2Teradata (OLAP)

HiveDB

HypertableHBase

CouchDBSimpleDB

HadoopMogileFS

S3

Page 7: Cloudcon East Presentation

System Storage Interface Partitioning Expansion Node Types Maturity

Oracle RAC Shared SQL TransparentNo

downtime Identical 7 years

MySQL Cluster Memory / Local SQL Transparent

Requires Restart Mixed 3 years

Hypertable Local HQL TransparentNo

downtime Mixed?

(released 2/08)

DB2 Local SQL Fixed HashDegraded Performan

ceIdentical

3 years (25 years

total)

HiveDB Local SQL Key-basedDirectory

No downtime Mixed 18 months

(+13 years!)

Page 8: Cloudcon East Presentation

Operating a distributed system is hard.

Page 9: Cloudcon East Presentation

Non-functional concerns quickly dominate functional concerns.

Page 10: Cloudcon East Presentation

ReplicationFail overMonitoring

Page 11: Cloudcon East Presentation

Our approach:minimize cleverness

Page 12: Cloudcon East Presentation

Build atop solid, easy to manage technologies.

Page 13: Cloudcon East Presentation

So we can get back to selling t-shirts.

Page 14: Cloudcon East Presentation

MySQL is fast.

Page 15: Cloudcon East Presentation

MySQL is well understood.

Page 16: Cloudcon East Presentation

MySQL replication works.

Page 17: Cloudcon East Presentation

MySQL is free.

Page 18: Cloudcon East Presentation

The same for JAVA

Page 19: Cloudcon East Presentation
Page 20: Cloudcon East Presentation

The simplest thing

➡ HiveDB is a JDBC Gatekeeper for applications.

➡ Lightest integration➡ Smallest possible implementation

Page 21: Cloudcon East Presentation

The eBay approach

➡ Pro: Internally consistent data➡ Con: Re-sharding of data required➡ Con: Re-sharding is a DBA

operation

Page 22: Cloudcon East Presentation

Partition by keyA simple bucketed partitioning scheme expressed in code.

Page 23: Cloudcon East Presentation

Partition by key

Page 24: Cloudcon East Presentation

Directory

➡ No broadcasting➡ No re-partitioning➡ Easy to relocate records➡ Easy to add capacity

Page 25: Cloudcon East Presentation
Page 26: Cloudcon East Presentation

Disadvantages

➡ Intelligence moved from the database to the application (Queries have to be planned and indexed)

➡ Can’t join across partitions➡ NO OLAP! (We consider this a

benefit.)

Page 27: Cloudcon East Presentation

Hibernate ShardsPartitioned Hibernate from Google

Page 28: Cloudcon East Presentation

Why did we build this thing again?

Page 29: Cloudcon East Presentation

Oh wait, you have to tell it where things are.

Page 30: Cloudcon East Presentation
Page 31: Cloudcon East Presentation

Benefits of Shards

➡ Unified data access layer➡ Result set aggregation across

partitions➡ Everyone in the JAVA-world knows

Hibernate.

Page 32: Cloudcon East Presentation
Page 33: Cloudcon East Presentation
Page 34: Cloudcon East Presentation
Page 35: Cloudcon East Presentation
Page 36: Cloudcon East Presentation
Page 37: Cloudcon East Presentation
Page 38: Cloudcon East Presentation

Working on simplifying it even further.

Page 39: Cloudcon East Presentation
Page 40: Cloudcon East Presentation

But does it go?

Page 41: Cloudcon East Presentation

Case Study: CafePress➡ Leader in User-generated

Commerce➡ More products than eBay

(>250,000,000)➡ 24/7/365

Page 42: Cloudcon East Presentation

Case Study: Performance Requirements

➡ Thousands of queries/sec➡ 10 : 1 read/write ratio➡ Geographically distributed

Page 43: Cloudcon East Presentation

Case Study:Test Environment

➡ Real schema ➡ Production-class hardware➡ Partial data (~40M records)

Page 44: Cloudcon East Presentation

[email protected] Modified on April 09 2007

CafePress HiveDB 2007Performance Test Environment

JMeter (100s of threads)

web s

erv

ice

(hiv

edb)

Dell 1950 / 2x2 Xeon

Tomcat 5.5

Hardware LB

Directory Partition 0 Partition 1

client.jar

data

bases

(mysql)

Dell 1950 / 2x2 Xeon

Tomcat 5.5

Dell 2950 / 2x2 Xeon

16GB, 6x72GB 15k

Dell 2950 / 2x2 Xeon

16GB, 6x72GB 15k

Dell 2950 / 2x2 Xeon

16GB, 6x72GB 15k

Dell 1950 / 2x2 Xeon

Tomcat 5.5

Dell 2950 / 2x2 Xeon

16GB, 6x72GB 15k

JMeter (100s of threads)

client.jar

Dell 2950 / 2x2 Xeon

16GB, 6x72GB 15k

load g

enera

tors

48GB backplane non-

blocking gigabit switch

100MBit switch

com

mand &

contr

ol

JMeter (1 thread)

client.jar

Measurement

Workstation

JMeter (no threads)

client.jar

Test Controller

Workstation

Page 45: Cloudcon East Presentation

Case Study:Performance Goals

➡ Large object reads: 1500/s

➡ Large object writes: 200/s

➡ Response time: 100ms

Page 46: Cloudcon East Presentation

Case Study:Performance Results

➡ Large object reads: 1500/sActual result: 2250/s

➡ Large object writes: 200/sActual result: 300/s

➡ Response time: 100msActual result: 8ms

Page 47: Cloudcon East Presentation

Case Study:Performance Results

➡ Max read throughputActual result: 4100/s(CPU limited in Java layer; MySQL <25% utilized)

Page 48: Cloudcon East Presentation

Case Study:Results➡ Billions of queries served➡ Highest DB uptime at CafePress➡ Hundreds of millions of updates

performed

Page 49: Cloudcon East Presentation

Show don’t tell!

Page 50: Cloudcon East Presentation

High Availability

➡ We don’t specify a fail over strategy.

➡ We don’t specify a backup or replication strategy.

Page 51: Cloudcon East Presentation

Fail Over Options

➡ Load balancer➡ MySQL Proxy➡ Flipper / MMM➡ HAProxy

Page 52: Cloudcon East Presentation

Replication

➡ You should use MySQL replication.➡ It really works.➡ You can add in LVM snapshots for

even more disastrous disaster recovery.

Page 53: Cloudcon East Presentation

The BottleneckThere is one problem with HiveDB. The directory is a scaling bottleneck.

Page 54: Cloudcon East Presentation

There’s no reason it has to be a database.

Page 55: Cloudcon East Presentation

MemcacheD

➡ Hive directory implemented atop memcached

Page 56: Cloudcon East Presentation

Roadmap

➡ MemcacheD directory➡ Simplified Hibernate support➡ Management web service➡ Command-line tools➡ Framework Adapters

Page 57: Cloudcon East Presentation

Call for developers!

➡ Grails (GORM)➡ ActiveRecord (JRuby on Rails)➡ Ambition (JRuby or web service)➡ iBatis➡ JPA➡ ... Postgres?

Page 58: Cloudcon East Presentation

Contributing

➡ Post to the mailing list http://groups.google.com/group/hivedb-dev

➡ Comment on our sitehttp://www.hivedb.org

➡ Submit a patch / pull requestgit clone git://github.com/britt/hivedb.git

Page 60: Cloudcon East Presentation

Blobject

➡ Gets around the problem of ALTER statements. No data set of this size can be transformed synchronously.

➡ The hive can contain multiple versions of a serialized record.

➡ Compression

Page 61: Cloudcon East Presentation

Hadoop!

➡ Query and transform blobbed data asynchronously.