Diving into the NoSQL Technical Comparison Report
@renatco Want all 50+ pages?
Download http://paas.ly/nosql2014
NoSQL Database Architecture
and Performance: How to Evaluate and Benchmark
1 scoring framework
3 templates
Open-source tools to re-use
10 benchmark examples + settings
With Renat Khasanshyn, CEO @ Altoros,
and Shane Johnson, Sr. Product Marketing Manager @
Couchbase
November 6, 2014 · 10 am PST
paas.ly/NoSQLTechGuide
Why am I here?
Problem #1: MANY TOOLS, MANY VENDORS, MANY WORKLOADS
Problem #2: VENDORS HIDE WEAKNESSES
This isn’t helpful to end users.
Problem #3: MOST BENCHMARKS SUCK
- Workloads are not meaningful
- Not easy to reproduce in your own environment
- Questionable configurations
The shadow IT storm has resulted in tens of different database products, even at medium-size companies.
Problem #4: IT teams can’t provide SLAs for dozens of database options.
IT can only support 1-2 DBs, per category, “as a service”.
For example - 1 document, 1 column, 1 graph and 2 relational.
In large organizations, devs often already use 4-6 DBs per category
Typical mid-term goal of our customers: DATABASE AS A SERVICE
IT provides 1-2 DBs per category as a service, with superb self-service & SLAs, while leaving all others to DIY.
SO HOW DO WE PICK A DATABASE?
Where I will take you today
Part 0. Why should I believe you?
❑ Who is Altoros?
❑ How real-world is this stuff?
❑ Why do companies bother benchmarking?
Part 1. Evaluation of architecture & capabilities
❑ Why & How
❑ Templates
Part 2. Evaluation of performance
❑ Why & How
❑ Templates & Benchmarking Tools
Part 3. Examples of benchmarks
❑ Examples - charts & graphs
❑ Tips for applying in your company
Part 0. Why should I believe you?
❑ Who is Altoros?
❑ How real-world is this stuff?
❑ Why do companies bother benchmarking?
FACTS ON ALTOROS
Buenos Aires
Oslo
London
Zurich
Sunnyvale, CA
Copenhagen
Minsk
Santa Fe
Boston
156+ full-time enablers: DevOps, software engineers
+5 join us in an average month
30% female, 70% male
$0 revenue from referrals, vendor “kickbacks”, commissions, product or service resale.
Vendor-independent. Not a reseller. Never pushed boxes or software licenses. Never will.
We sell our time, but we do not sell our soul. Some vendors hate us.
WHAT WE DO
We bring “software factories” and “data lakes” into organizations through
deployment and integration of solutions offered by the Cloud Foundry ecosystem
“X as a Service” Enablement: “Software factories”, “Data lakes”
Altoros Core Services: Training, Consulting, Integration, Managed Services
Question: HOW REAL-WORLD IS THIS STUFF?
* Where our experience comes from
“We highly recommend Altoros to rapidly build complex applications using cutting edge technologies. Again, great job!”
Christopher Adorna, Sony Design Center, LA
NoSQL/Hadoop benchmarking, deployment and integration
Cloud Foundry training and integration
Enablement of X-as-Service
Integration with IaaS/PaaS
Enablement of strategic workload revenue
White label Cloud Foundry PaaS
Enterprises & SaaS
Software Companies
Hosting Providers
NoSQL to Rethink Music?
Top 3 music company
4,000 employees, $3B revenue, privately owned
Challenge: rapidly changing competitive environment
Solution: Composable enterprise using Cloud Foundry and NoSQL
o over 400 servers
o 6 IaaS regions (3 on OpenStack, 3 on AWS)
NoSQL to Save Lives?
#1 liquid intravenous drug & medical device (intravenous pump) company
15,000 employees, $4B revenue, member of S&P 500
Technical challenge: 300,000 devices in 800+ hospitals across 4500+ locations
Opportunity: Save ~300 lives lost every 24 hours due to wrong drug, wrong IV pump, wrong quantity
Solution: Software defined IV drug delivery based on real-time access to HMS & EMR patient data
Daily data intake: 7 TB
NoSQL to monetize data?
Top 2 EMR company
Technical challenge: handling the data monster of enabling 100s of APIs, web services and applications developed by hundreds of software developers
Solution: Next generation API development platform powered by NoSQL databases
Monthly active medical professionals: 115,000
Patient records: 81 million
Part 1. Evaluation of Architecture & Capabilities
❑ Why & How
❑ Templates
How do we evaluate architectures & capabilities?
Reality check:
- What I will show you often takes 2 to 8 weeks with one or two engineers
- Typical budget: $25k-50k
Step 1. What are our workloads?
Step 2: What are our requirements?
Step 3: Select 2-3 finalists in each category (Template)
Category #1: Column-value
Requirement | Vendor A | Vendor B | Vendor C | Vendor D | Weight on decision
Multi-data center (regions) bi-directional replication to multi regions | | | | | 10
Support for active/active reads across regions | | | | | 10
Auto resynchronization of data between regions in the event of loss of connectivity between regions | | | | | 10
Support for encryption of data replication traffic across regions | | | | | 10
Support for active/active writes across regions | | | | | 9
Configurable replication factor (within cluster in a single region and across regions) | | | | | 9
Tunable Consistency for reads and writes | | | | | 9
Survive loss of nodes and up to an entire region without impacting clients and ability to serve requests (read and write) | | | | | 8
Ability to add nodes in a cluster and rebalance data with minimal impact | | | | | 8
Rich Query and Indexing capabilities | | | | | 8
Product Support, Training & Consulting from Vendor / 3rd party | | | | | 7
Installed Customer base in Production with high workloads in a multi-region data center topology | | | | | 7
Ease of Configuration & Setup | | | | | 7
Security: Kerberos or similar authentication models to support secure server to server communication | | | | | 7
Monitoring: CLI and GUI Tools | | | | | 7
Security: Auth / Fine Grain Access Control | | | | | 7
Compatibility to run on Public Cloud Infrastructure (AWS) | | | | | 7
Backup/Recovery: Ability to perform live snapshots and restore | | | | | 6
Security: Transparent data encryption | | | | | 6
Bulk Loading and Extract/Dump capability; adapters for data transfer to Hadoop | | | | | 3
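One way to turn the template above into a ranking is a weighted average of per-requirement ratings. The sketch below is only an illustration: the three requirements and their weights come from the template, while the 0-5 rating scale, the vendor ratings, and the names `Criterion` and `weighted_score` are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    weight: int  # the "Weight on decision" column of the template


def weighted_score(criteria, ratings):
    """Weight-normalized score for one vendor, on the same scale as the ratings."""
    total_weight = sum(c.weight for c in criteria)
    return sum(c.weight * ratings[c.name] for c in criteria) / total_weight


# Three rows taken from the template; a real evaluation would list all of them.
criteria = [
    Criterion("Active/active reads across regions", 10),
    Criterion("Tunable consistency for reads and writes", 9),
    Criterion("Live snapshots and restore", 6),
]

# Hypothetical ratings on an assumed 0-5 scale (not from the report).
ratings = {
    "Vendor A": {
        "Active/active reads across regions": 5,
        "Tunable consistency for reads and writes": 3,
        "Live snapshots and restore": 4,
    },
    "Vendor B": {
        "Active/active reads across regions": 3,
        "Tunable consistency for reads and writes": 5,
        "Live snapshots and restore": 5,
    },
}

scores = {vendor: weighted_score(criteria, r) for vendor, r in ratings.items()}
for vendor, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{vendor}: {score:.2f}")
```

Normalizing by the total weight keeps scores comparable if requirements are added or dropped between categories.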
Template
Part 2. Evaluation of Performance
❑ Why & How
❑ Templates & Benchmarking Tools
1. Set your goals. Example:
• Reproducible by anyone
– Open Source workload generator
• Focus on the use case for which NoSQL is typically selected
• Use a realistic workload
– Simulate steady state of the application running
– Meaningful data amounts & runtime
• Compare latency vs throughput
• Measure max throughput (for a given scenario)
2. Set your benchmarking scenario
• For an interactive web application
• Scalability and performance are the most common requirements
• Typically leads to users selecting NoSQL over RDBMS
• The working set of data changes with time
• End users using the application change over time
• Example: every few hours, every few days, every few weeks
• There is more data available than memory (RAM)
• Replication is used for fault tolerance
• Real world data sizes
• Deployment platform
– Commonly used
– Easy to replicate results
Decide what you want to measure.
• Latency
– Round trip time taken for a request to execute from the client to the server and back
– Average, 95th and 99th percentile measured
• Why is this important?
– You want your users to have a great experience
– Not just an “average” one
• Throughput
– Throughput was varied from 1K ops/sec to 25K ops/sec depending on the NoSQL database
– Max throughput was measured
• Why is this important?
– You want your app to support hundreds of thousands of users
Make sure workloads are not rate limited if you are focused on max throughput.
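To see why the slide asks for the 95th and 99th percentile and not just the average, here is a small sketch that summarizes a list of latencies. It uses the nearest-rank percentile definition, which is one common choice and an assumption here; the sample latencies are invented for illustration and are not results from the report.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Average, 95th and 99th percentile of round-trip latencies in milliseconds."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Hypothetical sample: 98 fast requests and 2 slow ones.
latencies_ms = [2.0] * 98 + [250.0] * 2
summary = summarize(latencies_ms)
print(summary)
```

In this made-up sample the average stays in single-digit milliseconds while the 99th percentile lands on a slow request, which is the gap between an “average” experience and what the unluckiest users see.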
3. Data sets. Example:
Data set #1: 50 million 1 KB records (doesn’t fit memory in a cluster pool)
fieldcount=10
fieldlength=100
threadcount=50
recordcount=50000000
Data set #2: 3 million 10 KB records (fits memory in a cluster pool)
fieldcount=10
fieldlength=1000
threadcount=50
recordcount=3000000
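The record sizes follow from the settings: fieldcount times fieldlength gives the bytes of field data per record, and multiplying by recordcount gives the raw data set size. A rough sketch of that arithmetic is below. It ignores keys, per-record metadata, and replication, and the cluster memory figure is a hypothetical value chosen only to show the comparison, not a number from the report.

```python
GB = 10**9


def raw_dataset_bytes(fieldcount, fieldlength, recordcount):
    """Bytes of field data only (no keys, metadata, or replicas)."""
    return fieldcount * fieldlength * recordcount


data_set_1 = raw_dataset_bytes(fieldcount=10, fieldlength=100, recordcount=50_000_000)
data_set_2 = raw_dataset_bytes(fieldcount=10, fieldlength=1000, recordcount=3_000_000)

# Hypothetical total memory available to the database across the cluster.
ASSUMED_CLUSTER_RAM = 40 * GB

for name, size in [("Data set #1", data_set_1), ("Data set #2", data_set_2)]:
    verdict = "fits in" if size <= ASSUMED_CLUSTER_RAM else "exceeds"
    print(f"{name}: {size / GB:.0f} GB raw, {verdict} the assumed cluster memory")
```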
Recommended benchmarking tools
IS THERE A WAY TO AUTOMATE IT?
MEET YCSB
Benchmark Implementation: YCSB
• Yahoo! team offered a “standard” benchmark
• Yahoo! Cloud Serving Benchmark (YCSB)
– Focus on database
– Focus on performance
• YCSB Client consists of 2 parts
– Workload generator
– Workload scenarios
Why YCSB
• Open source
• Extensible
• Rich selection of connectors
– Azure, BigTable, Cassandra, CouchDB, Dynomite, GemFire, HBase, Hypertable, Infinispan, MongoDB, PNUTS, Redis
– Connector for Sharded RDBMS (e.g. MySQL)
– Voldemort, GigaSpaces XAP
• We developed a few connectors
– Accumulo, Couchbase, Riak
– Connector for Shared Nothing RDBMS (e.g. MySQL Cluster)
How YCSB Works
* Extra nodes for masters, routers, etc
YCSB Client
Benchmarking cluster
Database nodes
YCSB Connectors
github.com/Altoros/YCSB
Part 3. Examples of Benchmarks
❑ Example of Benchmark Results
❑ Example of Benchmark Goals
❑ Example of Benchmark Design Scenario
❑ Recommendations for setting up your own benchmarking
❑ Recommended Benchmarking Tools
Cluster specification
YCSB Client, Database nodes
Data Sets
Data set #1 (did not fit the memory): 50 million 1 KB records
fieldcount=10
fieldlength=100
threadcount=50
recordcount=50000000
Data set #2 (fit the memory): 3 million 10 KB records
fieldcount=10
fieldlength=1000
threadcount=50
recordcount=3000000
Workloads
Workload B: 50% read operations, 40% update operations, 5% insert operations, 5% delete operations
YCSB Settings:
readallfields=true
readproportion=0.5
updateproportion=0.4
scanproportion=0
insertproportion=0.05
deleteproportion=0.05
requestdistribution=zipfian
Workload C: 90% read operations, 8% update operations, 1% insert operations, 1% delete operations
YCSB Settings:
readallfields=true
readproportion=0.9
updateproportion=0.08
scanproportion=0
insertproportion=0.01
deleteproportion=0.01
requestdistribution=zipfian
Workload D: 10% read operations, 72% update operations, 9% insert operations, 9% delete operations
YCSB Settings:
readallfields=true
readproportion=0.1
updateproportion=0.72
scanproportion=0
insertproportion=0.09
deleteproportion=0.09
requestdistribution=latest
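Since each workload is just a set of proportions that should add up to 100%, a small helper can generate the settings and catch a mix that does not sum to 1. This is a minimal sketch: the property names are copied from the slides, the helper `workload_properties` is hypothetical, and whether a particular YCSB build honors `deleteproportion` is an assumption to verify against the connector in use.

```python
def workload_properties(read, update, insert, delete, distribution, scan=0.0):
    """Render a YCSB-style settings block after checking the operation mix sums to 1."""
    proportions = {
        "readproportion": read,
        "updateproportion": update,
        "scanproportion": scan,
        "insertproportion": insert,
        "deleteproportion": delete,  # name taken from the slides; support is an assumption
    }
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"operation proportions sum to {total}, expected 1.0")
    lines = ["readallfields=true"]
    lines += [f"{name}={value:g}" for name, value in proportions.items()]
    lines.append(f"requestdistribution={distribution}")
    return "\n".join(lines)


workload_b = workload_properties(0.5, 0.4, 0.05, 0.05, "zipfian")
workload_d = workload_properties(0.1, 0.72, 0.09, 0.09, "latest")
print(workload_b)
```

The rendered text can be saved as a workload file and passed to the YCSB client alongside the data set settings.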
Workload B: 50% read operations, 40% update operations, 5% insert operations, and 5% delete operations.
Scaling from 3 to 6 nodes
Lessons Learned
• Run a close emulation of real-world workloads using YCSB before choosing the “horse”
• NoSQL is a case of “different horses for different courses”
• Construct your own or use existing workloads
• Benchmark it
• Tune the database!
• Benchmark it again
• Ask me for workload sample files
Amazon EC2 for NoSQL workloads?
• Scales perfectly for NoSQL
• EBS slows down the database on reads
• RAID0 it! Use 4 disks in the array (a good choice); some reported degraded performance with a higher number (6 or more)
Want all 50+ pages?
Download
http://paas.ly/nosql2014
Get your Hadoop Benchmark Report
Download http://paas.ly/hadoop2014
Diving into the NoSQL Technical Comparison Report
Shane K Johnson
Couchbase
©2014 Couchbase Inc.
Agenda
Topology
Architecture
Couchbase Server 3.0
Couchbase Server 3.X
Topology: MongoDB
[Diagram: clients, a router, three config nodes, and three primaries with two replicas each]
Topology: Couchbase Server / Cassandra (DataStax)
[Diagram: a client and nine primary nodes]
Topology
Installation
Configuration
Consistency
Fault Tolerance
Performance
Scalability
Installation: How many different types of installations are there?
Configuration: How many different types of configurations are there?
Consistency: What is sacrificed to maintain consistency?
Fault Tolerance: What happens if certain nodes fail?
Performance: Is performance maximized?
Scalability: Does it affect the scaling process?
Architecture: MongoDB
Memory Mapped Files
– Write to File via Memory
– Update in Place
Durability
– Asynchronous (fsync)
Database Locking
Architecture: Cassandra (DataStax)
Log Structured Merge Tree
– memtable (memory)
– commit log (disk)
– sstable (disk)
– Write Once, Merge
Durability
– Asynchronous (fsync)
– Synchronous Batch (fsync)
Row Locking
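The log structured merge tree terms on this slide can be sketched with a toy in-memory model: writes go to a commit log and a memtable, a full memtable is flushed to an immutable sorted table, and tables are later merged. This is a teaching sketch under simplifying assumptions, not Cassandra's implementation; the class `ToyLSM` and its `memtable_limit` parameter are invented for illustration, and plain Python lists stand in for files on disk.

```python
class ToyLSM:
    """Toy log structured merge tree: commit log + memtable + immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # stands in for the on-disk commit log
        self.memtable = {}    # in-memory writes
        self.sstables = []    # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # log first, so a crash could be replayed
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out once as a sorted, immutable table."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed writes no longer need replay

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all tables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest first, so later tables overwrite
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # memtable is full: flushed to the first sorted table
db.put("a", 3)
db.put("c", 4)  # flushed to a second sorted table
print(db.get("a"), len(db.sstables))
db.compact()
print(db.sstables)
```

Existing tables are never modified in place, which is the “Write Once, Merge” point: updates produce new tables and the merge step reconciles them.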
Architecture: Couchbase Server
Copy on Write B-Tree
– Append Only
Durability
– Asynchronous (fsync)
– Synchronous (fsync)
Integrated, Managed Object Cache
Document Locking
Couchbase Server 3.0
Database Change Protocol
Tunable Memory
Couchbase Server 3.X
N1QL (Query Language)
– select, from, where, join, group by, order by
ForestDB (Storage)
– Hierarchical B+Tree based Trie
– SSD Optimized