Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Transcript of Cloudcon East Presentation
Scaling with HiveDB
The Problem
• Hundreds of millions of products
• Millions of new products every week
• Accelerating growth
Scaling up is no longer an option.
Requirements
➡ OLTP optimized
➡ Constant response time is more important than low latency
➡ Expand capacity online
➡ Ability to move records
➡ No re-sharding
2 years ago
Clustered: Oracle RAC, MS SQL Server, MySQL Cluster, Teradata (OLAP)
Non-relational: —
Unstructured: Hadoop, MogileFS, S3
Today
Clustered: Oracle RAC, MS SQL Server, MySQL Cluster, DB2, Teradata (OLAP), HiveDB
Non-relational: Hypertable, HBase, CouchDB, SimpleDB
Unstructured: Hadoop, MogileFS, S3
System        | Storage        | Interface | Partitioning        | Expansion            | Node Types | Maturity
Oracle RAC    | Shared         | SQL       | Transparent         | No downtime          | Identical  | 7 years
MySQL Cluster | Memory / Local | SQL       | Transparent         | Requires restart     | Mixed      | 3 years
Hypertable    | Local          | HQL       | Transparent         | No downtime          | Mixed      | ? (released 2/08)
DB2           | Local          | SQL       | Fixed hash          | Degraded performance | Identical  | 3 years (25 years total)
HiveDB        | Local          | SQL       | Key-based directory | No downtime          | Mixed      | 18 months (+13 years!)
Operating a distributed system is hard.
Non-functional concerns quickly dominate functional concerns.
Replication. Fail over. Monitoring.
Our approach: minimize cleverness.
Build atop solid, easy to manage technologies.
So we can get back to selling t-shirts.
MySQL is fast.
MySQL is well understood.
MySQL replication works.
MySQL is free.
The same goes for Java.
The simplest thing
➡ HiveDB is a JDBC gatekeeper for applications.
➡ Lightest integration
➡ Smallest possible implementation
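The deck doesn't show HiveDB's actual API, but the "JDBC gatekeeper" idea can be sketched: the application never picks a database itself; it hands a partition key to the gatekeeper, which resolves it to the JDBC URL of the node holding that record. All names below are hypothetical, not HiveDB's real interface.

```java
import java.util.Map;

// Minimal sketch of a JDBC gatekeeper (hypothetical names, not HiveDB's API).
// A stable bucketing function maps a key to a bucket, and a directory maps
// buckets to JDBC URLs; the application only ever asks "where does key X live?"
public class Gatekeeper {
    private final Map<Integer, String> directory; // bucket -> JDBC URL

    public Gatekeeper(Map<Integer, String> directory) {
        this.directory = directory;
    }

    // Stable bucketing: the same key always lands in the same bucket.
    int bucketFor(long key, int buckets) {
        return (int) Math.floorMod(key, (long) buckets);
    }

    public String resolveUrl(long key) {
        String url = directory.get(bucketFor(key, directory.size()));
        if (url == null) throw new IllegalStateException("no node for key " + key);
        return url;
    }

    // In real use the caller would then open a connection on the resolved URL:
    // Connection c = DriverManager.getConnection(gatekeeper.resolveUrl(productId));
}
```

Because the routing decision is a pure function of the key plus a small lookup table, it can sit in a thin layer just above JDBC, which is what keeps the integration light.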
The eBay approach
➡ Pro: Internally consistent data
➡ Con: Re-sharding of data required
➡ Con: Re-sharding is a DBA operation
Partition by key
A simple bucketed partitioning scheme expressed in code.
Partition by key
Directory
➡ No broadcasting➡ No re-partitioning➡ Easy to relocate records➡ Easy to add capacity
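These advantages all follow from keeping an explicit key-to-node mapping rather than a fixed hash. A stand-alone sketch (hypothetical names; in HiveDB this mapping lives in a MySQL directory database):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a key-based directory (hypothetical names). Every record's key
// maps to an explicit node, so lookups never broadcast, a record moves by
// rewriting one directory entry, and a new node starts taking load as soon
// as new keys are assigned to it -- no global re-sharding pass.
public class Directory {
    private final Map<Long, String> keyToNode = new HashMap<>();

    public void assign(long key, String node) {
        keyToNode.put(key, node);
    }

    public String nodeFor(long key) {
        return keyToNode.get(key);
    }

    // Relocating a record is a directory update plus a row copy;
    // the row copy is elided in this sketch.
    public void relocate(long key, String newNode) {
        keyToNode.put(key, newNode);
    }
}
```

Contrast this with a fixed-hash scheme (like DB2's in the table above), where moving a record or adding a node forces a re-hash of existing data.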
Disadvantages
➡ Intelligence moved from the database to the application (queries have to be planned and indexed)
➡ Can’t join across partitions
➡ No OLAP! (We consider this a benefit.)
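Because joins and aggregates cannot run across partitions inside the database, the application issues the same query against each partition and merges the partial results itself. A minimal merge sketch in plain Java (not HiveDB or Shards code):

```java
import java.util.ArrayList;
import java.util.List;

// Scatter-gather in the application: each partition answers the same query
// independently, and the caller merges the partial result sets -- here by
// re-sorting the union to recover a global ordering.
public class Scatter {
    public static List<Integer> mergeSorted(List<List<Integer>> perPartition) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> part : perPartition) {
            merged.addAll(part);
        }
        merged.sort(null); // natural ordering
        return merged;
    }
}
```

This is the work that "moved from the database to the application": the query plan, including the final merge, is now the application's responsibility.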
Hibernate Shards
Partitioned Hibernate from Google
Why did we build this thing again?
Oh wait, you have to tell it where things are.
Benefits of Shards
➡ Unified data access layer
➡ Result set aggregation across partitions
➡ Everyone in the Java world knows Hibernate.
Working on simplifying it even further.
But does it go?
Case Study: CafePress
➡ Leader in user-generated commerce
➡ More products than eBay (>250,000,000)
➡ 24/7/365
Case Study: Performance Requirements
➡ Thousands of queries/sec
➡ 10:1 read/write ratio
➡ Geographically distributed
Case Study: Test Environment
➡ Real schema
➡ Production-class hardware
➡ Partial data (~40M records)
CafePress HiveDB 2007 Performance Test Environment (diagram, April 09 2007)
[Diagram: JMeter load generators (100s of threads) running client.jar feed a
hardware load balancer in front of web-service (HiveDB) nodes (Dell 1950, 2x2
Xeon, Tomcat 5.5). Behind them sit the MySQL databases — directory, partition 0,
and partition 1 — on Dell 2950s (2x2 Xeon, 16GB, 6x72GB 15k disks), connected by
a 48GB-backplane non-blocking gigabit switch. A measurement workstation (JMeter,
1 thread, client.jar) and a test-controller workstation attach over a separate
100MBit command-and-control switch.]
Case Study: Performance Goals
➡ Large object reads: 1500/s
➡ Large object writes: 200/s
➡ Response time: 100ms
Case Study: Performance Results
➡ Large object reads: goal 1500/s; actual 2250/s
➡ Large object writes: goal 200/s; actual 300/s
➡ Response time: goal 100ms; actual 8ms
Case Study: Performance Results
➡ Max read throughput: 4100/s (CPU limited in the Java layer; MySQL <25% utilized)
Case Study: Results
➡ Billions of queries served
➡ Highest DB uptime at CafePress
➡ Hundreds of millions of updates performed
Show don’t tell!
High Availability
➡ We don’t specify a fail over strategy.
➡ We don’t specify a backup or replication strategy.
Fail Over Options
➡ Load balancer
➡ MySQL Proxy
➡ Flipper / MMM
➡ HAProxy
Replication
➡ You should use MySQL replication.
➡ It really works.
➡ You can add in LVM snapshots for even more disastrous disaster recovery.
The Bottleneck
There is one problem with HiveDB: the directory is a scaling bottleneck.
There’s no reason it has to be a database.
MemcacheD
➡ Hive directory implemented atop memcached
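The deck doesn't show the memcached integration, but the usual shape is a cache-aside lookup in front of the authoritative directory database. A sketch with a concurrent map standing in for a memcached client (all names hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Cache-aside directory lookup: consult the cache first, fall back to the
// authoritative store (the directory database) on a miss, and populate the
// cache with the result. ConcurrentHashMap stands in for memcached here.
public class CachedDirectory {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final LongFunction<String> authoritative; // e.g. a SELECT on the directory DB

    public CachedDirectory(LongFunction<String> authoritative) {
        this.authoritative = authoritative;
    }

    public String nodeFor(long key) {
        // Only hits the authoritative store when the key is not cached.
        return cache.computeIfAbsent(key, k -> authoritative.apply(k));
    }
}
```

Since directory entries change only when a record is relocated, hit rates are high and the database drops out of the hot read path, which is what removes the bottleneck.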
Roadmap
➡ MemcacheD directory
➡ Simplified Hibernate support
➡ Management web service
➡ Command-line tools
➡ Framework adapters
Call for developers!
➡ Grails (GORM)
➡ ActiveRecord (JRuby on Rails)
➡ Ambition (JRuby or web service)
➡ iBatis
➡ JPA
➡ ... Postgres?
Contributing
➡ Post to the mailing list: http://groups.google.com/group/hivedb-dev
➡ Comment on our site: http://www.hivedb.org
➡ Submit a patch / pull request: git clone git://github.com/britt/hivedb.git
Photo Credits
➡ http://www.flickr.com/photos/7362313@N07/1240245941/sizes/o
➡ http://www.flickr.com/photos/99287245@N00/2229322675/sizes/o
➡ http://www.flickr.com/photos/51035555243@N01/26006138/
➡ http://www.flickr.com/photos/vgm8383/2176897085/sizes/o/
➡ http://t-shirts.cafepress.com/item/money-organic-cotton-tee/73439294
Blobject
➡ Gets around the problem of ALTER statements. No data set of this size can be transformed synchronously.
➡ The hive can contain multiple versions of a serialized record.
➡ Compression
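A sketch of the blobbed-record idea: each stored blob carries a schema version tag and a gzip-compressed payload, so old and new record versions can coexist and no synchronous ALTER is ever needed. The byte layout and names here are illustrative, not HiveDB's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Version-tagged, compressed record blobs: the first byte is the schema
// version, the rest is the gzip-compressed serialized record. Readers
// branch on the tag, so multiple versions can live in the hive at once.
public class Blobject {
    public static byte[] pack(int version, String payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(version); // one-byte schema version header
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static String unpack(byte[] blob) throws IOException {
        int version = blob[0]; // branch on this to deserialize older formats
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(blob, 1, blob.length - 1))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A schema migration then becomes a lazy, asynchronous rewrite — records are upgraded as they are touched (or by a batch job) rather than by one blocking ALTER over hundreds of millions of rows.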
Hadoop!
➡ Query and transform blobbed data asynchronously.