Post on 18-Dec-2015
Benchmarking Cloud Serving Systems with YCSB
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears
Yahoo! Research
Presenter Duncan
Benchmarking Cloud Serving Systems with YCSB
• Benchmarking vs Testing
• Any difference?
• My opinion
– Benchmarking: performance
– Testing: usability test, security test, performance test, etc.
Motivation
• Many new cloud systems for data storage and management
– MongoDB, MySQL, Asterix, etc.
• Tradeoffs
– E.g. append updates to a sequential disk log
• Good for writes, bad for reads
– Synchronous replication
• Copies are up to date, but write latency is high
• How to choose?
– Use a benchmark to model your scenario!
Evaluate Performance =?
• Latency
– Users don’t want to wait!
• Throughput
– Want to serve more requests!
• Inherent tradeoff between latency and throughput
– More requests => more resource contention => higher latency
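One way to see why latency climbs with load is a toy queueing model. The sketch below is only an illustration and is not from the YCSB paper: it assumes a single server behaving like an M/M/1 queue, where the mean time a request spends in the system is 1 / (service rate - arrival rate). The service rate of 100 requests/sec and the load levels are made-up values.

```python
def mm1_mean_latency(arrival_rate, service_rate):
    """Mean time in system (seconds) for an M/M/1 queue: 1 / (mu - lambda).

    Assumption: one server, Poisson arrivals, exponential service times.
    Real serving systems are more complex; this only shows the trend.
    """
    if arrival_rate >= service_rate:
        raise ValueError("offered load must stay below the service rate")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 100.0  # hypothetical: the server can complete 100 requests/sec
LOADS = (10.0, 50.0, 90.0, 99.0)  # hypothetical offered throughput, requests/sec

latencies = [mm1_mean_latency(load, SERVICE_RATE) for load in LOADS]

for load, latency in zip(LOADS, latencies):
    print(f"throughput {load:5.1f} req/s -> mean latency {latency * 1000:7.1f} ms")
```

Under this model latency stays small at low load and grows sharply as throughput approaches the server's capacity, which is the shape of curve a latency-versus-throughput benchmark plot is meant to expose.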
Which system is better?
• “Typically application designers must decide on an acceptable latency,
and provision enough servers to achieve the desired throughput”
• A better system achieves the desired latency and throughput with fewer servers
– E.g. desired latency: 0.1 sec; desired throughput: 100 requests/sec
– MongoDB: 10 servers
– Asterix DB: 15 servers
What else to evaluate?
• Cloud platform
• Scalability
– Good scalability => performance proportional to # of servers
• Elasticity
– Good elasticity => performance improves with only a small disruption when servers are added to a running system
A Short Summary
• Evaluate performance = evaluate latency, throughput, scalability, elasticity
• A better system = fewer machines needed to achieve the performance goal
YCSB
• Data generator
• Workload generator
• YCSB client
– Interface to communicate with the DB
YCSB Data Generator
• A table with F fields and N records
• Each field => a random string
• E.g. 1,000 byte records, F=10, 100 bytes per field
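The table layout above can be sketched in a few lines. This is a minimal illustration, not YCSB's actual generator: the function names, the `user<n>` key format, and the `field<i>` column names are assumptions made for the example.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def generate_record(rng, num_fields=10, field_length=100):
    """Build one record: num_fields fields, each a random string."""
    return {
        f"field{i}": "".join(rng.choices(ALPHABET, k=field_length))
        for i in range(num_fields)
    }


def generate_table(num_records, num_fields=10, field_length=100, seed=0):
    """Build a table of N records keyed by a hypothetical 'user<n>' key."""
    rng = random.Random(seed)  # seeded so repeated runs load the same data
    return {
        f"user{n}": generate_record(rng, num_fields, field_length)
        for n in range(num_records)
    }


table = generate_table(num_records=5)
record_size = sum(len(value) for value in table["user0"].values())
print(f"{len(table)} records, {record_size} bytes of field data per record")
```

With F=10 and 100 bytes per field, each record carries 1,000 bytes of field data, matching the example on the slide.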
Workload Generator
• Basic operations
– Insert, update, read, scan
– No join, aggregate, etc.
• Able to control the distributions of:
– Which operation to perform
• E.g. 0.95 read, 0.05 update, 0 scan => read-heavy workload
– Which record to read or write
• Uniform
• Zipfian: some records are extremely popular
• Latest: recent records are more popular
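The two choices the workload generator makes (which operation, which record) can be sketched as weighted random draws. This is a simplified illustration under stated assumptions, not YCSB's implementation: the function names are hypothetical, the Zipfian exponent of 0.99 is an assumed value, record 0 is treated as the most popular record, and "latest" is modeled as a Zipfian distribution over recency.

```python
import random
from collections import Counter


def choose_operation(rng, proportions):
    """Pick an operation name according to its configured proportion."""
    ops = list(proportions)
    return rng.choices(ops, weights=[proportions[op] for op in ops], k=1)[0]


def zipfian_weights(num_records, exponent=0.99):
    """Unnormalized Zipfian weights: rank r gets weight 1 / r^exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_records + 1)]


def choose_key(rng, num_records, distribution, weights):
    """Pick a record index in [0, num_records) under the given distribution."""
    if distribution == "uniform":
        return rng.randrange(num_records)
    rank = rng.choices(range(num_records), weights=weights, k=1)[0]
    if distribution == "zipfian":
        return rank  # assumption: record 0 is the most popular
    if distribution == "latest":
        return num_records - 1 - rank  # newest record is the most popular
    raise ValueError(f"unknown distribution: {distribution}")


rng = random.Random(42)
NUM_RECORDS = 1000
proportions = {"read": 0.95, "update": 0.05, "scan": 0.0}  # read-heavy mix
weights = zipfian_weights(NUM_RECORDS)

ops = Counter(choose_operation(rng, proportions) for _ in range(10_000))
keys = Counter(
    choose_key(rng, NUM_RECORDS, "zipfian", weights) for _ in range(10_000)
)
print("operation mix:", dict(ops))
print("five most requested records:", keys.most_common(5))
```

Switching the `distribution` argument between "uniform", "zipfian", and "latest" changes only which records are hot, so the same operation mix can be replayed against different access patterns.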
YCSB Client
• A script
– Use the script to run the benchmark
• Workload parameter files
– You can change the parameters
• Java program
• DB interface layer
– You can implement the interface for your DB system
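The idea behind the DB interface layer can be sketched as an abstract class that each system implements. The real YCSB client is a Java program; the Python classes below are a hypothetical analogue written for illustration, and the class names, method signatures, and in-memory backend are assumptions, not YCSB's API.

```python
from abc import ABC, abstractmethod


class DB(ABC):
    """Hypothetical interface layer: one method per basic operation."""

    @abstractmethod
    def insert(self, table, key, values): ...

    @abstractmethod
    def read(self, table, key): ...

    @abstractmethod
    def update(self, table, key, values): ...

    @abstractmethod
    def scan(self, table, start_key, count): ...


class InMemoryDB(DB):
    """Toy backend that stores tables as nested dictionaries."""

    def __init__(self):
        self._tables = {}

    def insert(self, table, key, values):
        self._tables.setdefault(table, {})[key] = dict(values)

    def read(self, table, key):
        return self._tables[table][key]

    def update(self, table, key, values):
        self._tables[table][key].update(values)

    def scan(self, table, start_key, count):
        # Return up to `count` records in key order, starting at start_key.
        rows = self._tables.get(table, {})
        keys = sorted(k for k in rows if k >= start_key)[:count]
        return [(k, rows[k]) for k in keys]


db = InMemoryDB()
db.insert("usertable", "user1", {"field0": "a"})
db.insert("usertable", "user2", {"field0": "b"})
db.update("usertable", "user1", {"field0": "c"})
print(db.read("usertable", "user1"))
print(db.scan("usertable", "user1", 2))
```

Because the workload code only calls the four interface methods, benchmarking another system means writing one more subclass rather than changing the workload itself.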
Experiments
• Experiment setup:
– 6 servers
– YCSB client on another server
– Cassandra, HBase, MySQL, PNUTS
• Workloads: update heavy, read heavy, read only, read latest, short-range scan
Future Work
• Availability
– Impact of failures on system performance
• Replication
– Impact on performance as replication increases
4 criteria
• Authors’ 4 criteria for a good benchmark:
– Relevance to application
– Portability
• Not just for 1 system!
– Scalability
• Not just for small systems and small data!
– Simplicity
Reference
• Benchmarking Cloud Serving Systems with YCSB, Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears, SoCC 2010
• BG: A Benchmark to Evaluate Interactive Social Networking Actions, Sumita Barahmand, Shahram Ghandeharizadeh, CIDR 2013
• http://en.wikipedia.org/wiki/Software_testing
• http://en.wikipedia.org/wiki/Benchmark_(computing)
• Thank You!
• Questions?
Why a new benchmark?
• Most cloud systems do not have a SQL interface => hard to implement complex queries
• Existing benchmarks target only specific applications
– TPC-W for e-commerce
– TPC-C for apps that manage, sell, or distribute a product/service