Diving into the NoSQL Technical Comparison Report


description

Join Renat Khasanshyn, CEO of Altoros, and Shane Johnson, Sr. Product Marketing Manager at Couchbase, in this NoSQL Technical Comparison Webinar. The variety of NoSQL databases makes it difficult to select the best database for a particular use case. Learn what information you need to consider when evaluating NoSQL databases.

Transcript of Diving into the NoSQL Technical Comparison Report

Page 1: Diving into the NoSQL Technical Comparison Report

@renatco Want all 50+ pages?

Download http://paas.ly/nosql2014

NoSQL Database Architecture and Performance: How to Evaluate and Benchmark

1 scoring framework

3 templates

Open-source tools to re-use

10 benchmark examples + settings

With Renat Khasanshyn, CEO @ Altoros, and Shane Johnson, Sr. Product Marketing Manager @ Couchbase

November 6, 2014 · 10 am PST

paas.ly/NoSQLTechGuide

Page 2

Why am I here?

Page 3

Problem #1: MANY TOOLS, MANY VENDORS, MANY WORKLOADS

Page 4

Problem #2: VENDORS HIDE WEAKNESSES

This isn’t helpful to end users.

Page 5

Problem #3: MOST BENCHMARKS SUCK

- Workloads are not meaningful
- Not easy to reproduce in your own environment
- Questionable configurations

Page 6

The shadow IT storm has resulted in tens of different database products, even at medium-size companies.

Page 7

Page 8

Problem #4: IT teams can’t provide SLAs for dozens of database options.

Page 9

IT can only support 1-2 DBs per category “as a service”.

For example: 1 document, 1 column, 1 graph, and 2 relational.

Page 10

In large organizations, developers often already use 4-6 DBs per category.

Page 11

Typical mid-term goal of our customers: DATABASE AS A SERVICE

The aim is for IT to provide 1-2 DBs per category as a service, with superb self-service & SLAs, while leaving all others to DIY.

Page 12

SO HOW DO WE PICK A DATABASE?

Page 13

Where I will take you today

Part 0. Why should I believe you?
❑ Who is Altoros?
❑ How real-world is this stuff?
❑ Why do companies bother benchmarking?

Part 1. Evaluation of architecture & capabilities
❑ Why & How
❑ Templates

Part 2. Evaluation of performance
❑ Why & How
❑ Templates & Benchmarking Tools

Part 3. Examples of benchmarks
❑ Examples: charts & graphs
❑ Tips for applying in your company

Page 14

Part 0. Why should I believe you?

❑ Who is Altoros?
❑ How real-world is this stuff?
❑ Why do companies bother benchmarking?

Page 15

Buenos Aires

Oslo

London

Zurich

Sunnyvale, CA

Copenhagen

Minsk

Santa Fe

Boston

FACTS ON ALTOROS

156+ full-time enablers: DevOps and software engineers

+5 join us in an average month

30% female, 70% male

$0 revenue from referrals, vendor “kickbacks”, commissions, or product or service resale.

Vendor-independent. Not a reseller. Never pushed boxes or software licenses. Never will.

We sell our time, but we don’t sell our soul. Some vendors hate us.

Page 16

WHAT WE DO

We bring “software factories” and “data lakes” into organizations through deployment and integration of solutions offered by the Cloud Foundry ecosystem.

Altoros Core Services: Training, Managed Services, Consulting, Integration

“Software factories”

“Data lakes”

“X as a Service” Enablement

Page 17

Question: HOW REAL-WORLD IS THIS STUFF?

Page 18

* Where our experience comes from

“We highly recommend Altoros to rapidly build complex applications using cutting-edge technologies. Again, great job!”

Christopher Adorna, Sony Design Center, LA

NoSQL/Hadoop benchmarking, deployment and integration

Cloud Foundry training and integration

Enablement of X-as-Service

Integration with IaaS/PaaS

Enablement of strategic workload revenue

White label Cloud Foundry PaaS

Enterprises & SaaS

Software Companies

Hosting Providers

Page 19

NoSQL to Rethink Music?

Top 3 music company: 4,000 employees, $3B revenue, privately owned

Challenge: rapidly changing competitive environment

Solution: composable enterprise using Cloud Foundry and NoSQL
- over 400 servers
- 6 IaaS regions (3 on OpenStack, 3 on AWS)

Page 20

NoSQL to Save Lives?

#1 liquid intravenous drug & medical device (intravenous pump) company: 15,000 employees, $4B revenue, member of the S&P 500

Technical challenge: 300,000 devices in 800+ hospitals across 4,500+ locations

Opportunity: save ~300 lives lost every 24 hours due to wrong drug, wrong IV pump, or wrong quantity

Solution: software-defined IV drug delivery based on real-time access to HMS & EMR patient data

Daily data intake: 7TB

Page 21

NoSQL to monetize data?

Top 2 EMR company: 115,000 monthly active medical professionals, 81 million patient records

Technical challenge: handling the data monster of enabling 100s of APIs, web services, and applications developed by hundreds of software developers

Solution: next-generation API development platform powered by NoSQL databases

Page 22

Part 1. Evaluation of Architecture & Capabilities

❑ Why & How
❑ Templates

Page 23

How do we evaluate architectures & capabilities?

Reality check:
- What I will show you often takes 2 to 8 weeks with one or two engineers
- Typical budget: $25k-50k

Page 24

Step 1. What are our workloads?

Page 25

Step 2. What are our requirements?

Page 26

Step 3. Select 2-3 finalists in each category (Template)

Category #1: Column-value

Template columns: Vendor A | Vendor B | Vendor C | Vendor D | Weight on decision

Requirements and their weight on decision:

Multi-data center (regions) bi-directional replication to multiple regions 10

Support for active/active reads across regions 10

Auto resynchronization of data between regions in the event of loss of connectivity between regions 10

Support for encryption of data replication traffic across regions 10

Support for active/active writes across regions 9

Configurable replication factor (within cluster in a single region and across regions) 9

Tunable Consistency for reads and writes 9

Survive loss of nodes, up to an entire region, without impacting clients or the ability to serve requests (read and write) 8

Ability to add nodes in a cluster and rebalance data with minimal impact 8

Rich Query and Indexing capabilities 8

Product Support Training & Consulting from Vendor / 3rd party 7

Installed Customer base in Production with high workloads in a multi-region data center topology 7

Ease of Configuration & Setup 7

Security: Kerberos or similar authentication models to support secure server to server communication 7

Monitoring - CLI and GUI Tools 7

Security: Auth / Fine Grain Access Control 7

Compatibility with Public Cloud Infrastructure (AWS) 7

Backup/Recovery: Ability to perform live snapshots and restore 6

Security: Transparent data encryption 6

Bulk Loading and Extract/Dump capability; adapters for data transfer to Hadoop 3
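The template works as a weighted scoring matrix: each finalist is scored per requirement, and each score is multiplied by that requirement's weight on decision. Below is a minimal Python sketch of that arithmetic, assuming a 0-5 scoring scale and a result normalized to the best possible total. Only three of the weights above are used for brevity, and the vendor scores are made-up placeholders, not results from the report.

```python
from typing import Dict

# A subset of the weights from the template above.
WEIGHTS: Dict[str, int] = {
    "Multi-data center bi-directional replication": 10,
    "Tunable consistency for reads and writes": 9,
    "Rich query and indexing capabilities": 8,
}

MAX_SCORE = 5  # assumption: each requirement is scored 0-5 per vendor


def weighted_score(scores: Dict[str, int], weights: Dict[str, int] = WEIGHTS) -> float:
    """Return a vendor's weighted score as a fraction of the best possible total."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored requirements: {sorted(missing)}")
    total = sum(weights[req] * scores[req] for req in weights)
    best = sum(weights.values()) * MAX_SCORE
    return total / best


# Hypothetical placeholder scores, not measured data.
vendor_a = {
    "Multi-data center bi-directional replication": 5,
    "Tunable consistency for reads and writes": 3,
    "Rich query and indexing capabilities": 2,
}
vendor_b = {
    "Multi-data center bi-directional replication": 3,
    "Tunable consistency for reads and writes": 4,
    "Rich query and indexing capabilities": 5,
}

ranking = sorted(
    [("Vendor A", weighted_score(vendor_a)), ("Vendor B", weighted_score(vendor_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With these placeholder scores, Vendor B ranks first with 106 of 135 possible points versus 93 for Vendor A, even though Vendor A wins the single highest-weighted requirement.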

Page 27

Template

Page 28

Part 2. Evaluation of Performance

❑ Why & How
❑ Templates & Benchmarking Tools

Page 29

1. Set your goals. Example:
• Reproducible by anyone
  – Open-source workload generator
• Focus on the use case for which NoSQL is typically selected
• Use a realistic workload
  – Simulate the steady state of a running application
  – Meaningful data amounts & runtime
• Compare latency vs. throughput
• Measure max throughput (for a given scenario)

Page 30

2. Set your benchmarking scenario
• For an interactive web application
• Scalability and performance are the most common requirements
• These typically lead users to select NoSQL over RDBMS
• The working set of data changes with time
• End users of the application change over time
• Example: every few hours, every few days, every few weeks
• There is more data available than memory (RAM)
• Replication is used for fault tolerance
• Real-world data sizes
• Deployment platform
  – Commonly used
  – Easy to replicate results

Page 31

Decide what you want to measure.
• Latency
  – Round-trip time taken for a request to execute from the client to the server and back
  – Average, 95th and 99th percentile measured
• Why is this important?
  – You want your users to have a great experience, not just an “average” one
• Throughput
  – Throughput was varied from 1K ops/sec to 25K ops/sec, depending on the NoSQL database
  – Max throughput was measured
• Why is this important?
  – You want your app to support hundreds of thousands of users

If you are focused on max throughput, make sure workloads are not rate-limited.
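The latency and throughput figures above can be derived from per-request timings collected during a run. Below is a minimal Python sketch, assuming latencies are recorded in milliseconds per operation and using the nearest-rank percentile definition. Benchmarking tools may use other definitions or histograms, so their percentile numbers can differ slightly.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], wall_clock_s: float) -> Dict[str, float]:
    """Average, 95th and 99th percentile latency plus throughput for one run."""
    return {
        "avg_ms": statistics.mean(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "ops_per_sec": len(latencies_ms) / wall_clock_s,
    }


# 98 fast requests and two slow ones, completed in 10 ms of wall-clock time.
run = summarize([1.0] * 98 + [20.0, 50.0], wall_clock_s=0.01)
print(run)
```

Here the average is 1.68 ms and p95 is 1 ms, but p99 is 20 ms: the tail that an “average” view hides is exactly what the 99th percentile exposes.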

Page 32

3. Data sets. Example:

Data set #1: 50 million 1 KB records (doesn’t fit in memory in the cluster pool)
fieldcount=10
fieldlength=100
threadcount=50
recordcount=50000000

Data set #2: 3 million 10 KB records (fits in memory in the cluster pool)
fieldcount=10
fieldlength=1000
threadcount=50
recordcount=3000000
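Whether a data set fits in memory follows from simple arithmetic on these settings: the record size is fieldcount × fieldlength bytes, while threadcount only sets the number of YCSB client threads and does not change data size. Below is a rough Python sketch, assuming raw payload only (keys, metadata and storage overhead ignored unless you pass an overhead factor). The 4-node, 16 GB-per-node cluster with 2 copies of the data is a hypothetical example, not the cluster used in the report.

```python
from dataclasses import dataclass

GB = 10**9


@dataclass(frozen=True)
class DataSet:
    recordcount: int
    fieldcount: int
    fieldlength: int  # bytes per field

    def raw_bytes(self) -> int:
        """Raw payload size; ignores keys, metadata and storage overhead."""
        return self.recordcount * self.fieldcount * self.fieldlength


def fits_in_memory(ds: DataSet, nodes: int, ram_per_node_bytes: int,
                   copies: int = 1, overhead: float = 1.0) -> bool:
    """Rough check: all copies of the data (times an overhead factor) vs. total cluster RAM."""
    needed = ds.raw_bytes() * copies * overhead
    return needed <= nodes * ram_per_node_bytes


DATA_SET_1 = DataSet(recordcount=50_000_000, fieldcount=10, fieldlength=100)
DATA_SET_2 = DataSet(recordcount=3_000_000, fieldcount=10, fieldlength=1000)

print(DATA_SET_1.raw_bytes() / GB)  # 50.0
print(DATA_SET_2.raw_bytes() / GB)  # 30.0

# Hypothetical cluster: 4 nodes x 16 GB RAM, data stored twice (1 replica).
print(fits_in_memory(DATA_SET_1, nodes=4, ram_per_node_bytes=16 * GB, copies=2))
print(fits_in_memory(DATA_SET_2, nodes=4, ram_per_node_bytes=16 * GB, copies=2))
```

On that hypothetical cluster, data set #1 needs 100 GB against 64 GB of RAM and spills to disk, while data set #2 needs 60 GB and stays in memory. With real numbers, measure the per-record overhead of the database under test rather than assuming 1.0.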

Page 33

IS THERE A WAY TO AUTOMATE IT?

Recommended benchmarking tools

Page 34

MEET YCSB

Page 35

Benchmark Implementation: YCSB
• The Yahoo! team offered a “standard” benchmark
• Yahoo! Cloud Serving Benchmark (YCSB)
  – Focus on database
  – Focus on performance
• The YCSB Client consists of 2 parts
  – Workload generator
  – Workload scenarios

Page 36

Why YCSB
• Open source
• Extensible
• Rich selection of connectors
  – Azure, BigTable, Cassandra, CouchDB, Dynomite, GemFire, HBase, Hypertable, Infinispan, MongoDB, PNUTS, Redis, Voldemort, GigaSpaces XAP
  – Connector for sharded RDBMS (e.g. MySQL)
• We developed a few connectors
  – Accumulo, Couchbase, Riak
  – Connector for shared-nothing RDBMS (e.g. MySQL Cluster)

Page 37

How YCSB Works

Page 38

* Extra nodes for masters, routers, etc

YCSB Client

Benchmarking cluster

Database nodes
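The YCSB client runs on separate benchmarking machines and drives the database nodes with many concurrent threads. The following is a hypothetical, much-simplified Python analogue of that loop, not YCSB’s code or API. The `threadcount` argument mirrors the YCSB property of the same name, and `target_ops_per_sec` loosely mirrors YCSB’s target-throughput option. Worker threads call a pluggable operation and record per-request latency. Leaving the target unset gives the unthrottled run needed to measure max throughput.

```python
import threading
import time
from typing import Callable, List, Optional


def run_client(op: Callable[[int], None], threadcount: int, ops_per_thread: int,
               target_ops_per_sec: Optional[float] = None) -> List[float]:
    """Call op(key_id) from `threadcount` threads and return per-op latencies in ms.

    target_ops_per_sec=None means unthrottled (for max-throughput runs);
    otherwise each thread paces itself to its share of the target rate.
    """
    interval: Optional[float] = None
    if target_ops_per_sec is not None:
        if target_ops_per_sec <= 0:
            raise ValueError("target_ops_per_sec must be positive")
        interval = threadcount / target_ops_per_sec  # seconds between one thread's ops

    latencies: List[float] = []
    lock = threading.Lock()

    def worker(tid: int) -> None:
        local: List[float] = []
        start = time.perf_counter()
        for n in range(ops_per_thread):
            if interval is not None:
                delay = start + n * interval - time.perf_counter()
                if delay > 0:
                    time.sleep(delay)  # wait for this op's scheduled slot
            t0 = time.perf_counter()
            op(tid * ops_per_thread + n)  # distinct key id per thread and op
            local.append((time.perf_counter() - t0) * 1000.0)
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(threadcount)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


# Hypothetical stand-in for a database binding: a dict guarded by a lock.
store = {}
store_lock = threading.Lock()


def insert(key_id: int) -> None:
    with store_lock:
        store[f"key{key_id}"] = "x" * 100


latencies = run_client(insert, threadcount=4, ops_per_thread=250)
print(len(latencies), len(store))  # 1000 1000
```

If a target rate is set below what the database can sustain, the reported throughput just echoes the target. This is why max-throughput runs must not be rate-limited.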

Page 39

YCSB Connectors

github.com/Altoros/YCSB

Page 40

Part 3. Examples of Benchmarks

❑ Example of Benchmark Results
❑ Example of Benchmark Goals
❑ Example of Benchmark Design Scenario
❑ Recommendations for setting up your own benchmarking
❑ Recommended Benchmarking Tools

Page 41

Cluster specification

YCSB Client

Database nodes

Page 42

Data Sets

Data set #1 (did not fit in memory): 50 million 1 KB records
fieldcount=10
fieldlength=100
threadcount=50
recordcount=50000000

Data set #2 (fit in memory): 3 million 10 KB records
fieldcount=10
fieldlength=1000
threadcount=50
recordcount=3000000

Page 43

Workloads

Workload B: 50% read operations, 40% update operations, 5% insert operations, 5% delete operations
YCSB settings:
readallfields=true
readproportion=0.5
updateproportion=0.4
scanproportion=0
insertproportion=0.05
deleteproportion=0.05
requestdistribution=zipfian

Workload C: 90% read operations, 8% update operations, 1% insert operations, 1% delete operations
YCSB settings:
readallfields=true
readproportion=0.9
updateproportion=0.08
scanproportion=0
insertproportion=0.01
deleteproportion=0.01
requestdistribution=zipfian

Workload D: 10% read operations, 72% update operations, 9% insert operations, 9% delete operations
YCSB settings:
readallfields=true
readproportion=0.1
updateproportion=0.72
scanproportion=0
insertproportion=0.09
deleteproportion=0.09
requestdistribution=latest
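The `*proportion` settings define the operation mix: for each request, the workload generator picks an operation type with probability equal to its proportion, so the proportions must sum to 1. Below is a minimal Python sketch of that selection step. It is an illustration, not YCSB’s implementation, and it does not model `requestdistribution`, which controls which keys are chosen (zipfian favors a small set of hot keys, latest favors recently inserted ones).

```python
import random
from collections import Counter
from typing import Dict

# Proportions from the Workload B and Workload D settings above.
WORKLOAD_B: Dict[str, float] = {
    "read": 0.5, "update": 0.4, "scan": 0.0, "insert": 0.05, "delete": 0.05,
}
WORKLOAD_D: Dict[str, float] = {
    "read": 0.1, "update": 0.72, "scan": 0.0, "insert": 0.09, "delete": 0.09,
}


def check_proportions(mix: Dict[str, float], tol: float = 1e-9) -> None:
    """Proportions must be non-negative and sum to 1 (within float tolerance)."""
    if any(p < 0 for p in mix.values()):
        raise ValueError("negative proportion")
    total = sum(mix.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"proportions sum to {total}, not 1")


def next_operation(mix: Dict[str, float], rng: random.Random) -> str:
    """Pick the next operation type with probability equal to its proportion."""
    ops = list(mix)
    return rng.choices(ops, weights=[mix[o] for o in ops], k=1)[0]


check_proportions(WORKLOAD_B)
check_proportions(WORKLOAD_D)

rng = random.Random(42)  # fixed seed so runs are reproducible
counts = Counter(next_operation(WORKLOAD_B, rng) for _ in range(100_000))
print({op: counts[op] / 100_000 for op in WORKLOAD_B})
```

Over 100,000 draws the observed fractions land close to 0.5 / 0.4 / 0.05 / 0.05, and scan is never chosen because its weight is 0.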

Page 44

Page 45

Workload B: 50% read operations, 40% update operations, 5% insert operations, and 5% delete operations.

Page 46

Page 47

Page 48

Page 49

Scaling from 3 to 6 nodes

Page 50

Lessons Learned
• Run a close emulation of your real-world workloads using YCSB before choosing the “horse”
• NoSQL is a case of “different horses for different courses”
• Construct your own workloads or use existing ones
• Benchmark it
• Tune the database!
• Benchmark it again
• Ask me for workload sample files

Amazon EC2 for NoSQL workloads?
• Scales perfectly for NoSQL
• EBS slows down the database on reads
• RAID0 it! Use 4 disks in the array (a good choice); some reported degraded performance with a higher number (6 and more)

Page 51

Want all 50+ pages?

Download http://paas.ly/nosql2014

Page 52

Download

Get your Hadoop Benchmark Report

http://paas.ly/hadoop2014

Page 53

Diving into the NoSQL Technical Comparison Report

Shane K Johnson

Couchbase

Page 54

©2014 Couchbase Inc.

Agenda

Topology
Architecture
Couchbase Server 3.0
Couchbase Server 3.X

Page 55

Topology: MongoDB

[Diagram: clients connect through a router to 3 shards, each a primary with 2 replicas; 3 config servers]

Page 56

Topology: Couchbase Server / Cassandra (DataStax)

[Diagram: a client connects directly to 9 nodes, all of them primaries]
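In the peer topology above, there is no router tier: the client itself works out which node owns a key. The sketch below is a generic illustration of hash partitioning, assuming a fixed number of partitions (64, chosen arbitrarily) hashed with CRC32 and assigned round-robin to nodes. It is not Couchbase’s vBucket mapping or Cassandra’s token ring, whose exact algorithms differ.

```python
import zlib
from typing import Dict, List

NUM_PARTITIONS = 64  # assumption for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def build_partition_map(nodes: List[str], num_partitions: int = NUM_PARTITIONS) -> Dict[int, str]:
    """Assign partitions to nodes round-robin, so every node is primary for some partitions."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def node_for(key: str, partition_map: Dict[int, str]) -> str:
    """Client-side routing: hash the key, then look up the owning node."""
    return partition_map[partition_for(key, len(partition_map))]


nodes = [f"node{i}" for i in range(1, 10)]  # 9 peers, as in the diagram
pmap = build_partition_map(nodes)
print(node_for("user:1001", pmap))
```

Because a key always hashes to the same partition, adding nodes only changes the partition-to-node map (rebalancing moves whole partitions) rather than rehashing every key.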

Page 57

Topology


Topology

Installation

Configuration

Consistency

Fault Tolerance

Performance

Scalability

Page 58

Installation

How many different types of installations are there?

Page 59

Configuration

How many different types of configurations are there?

Page 60

Consistency

What is sacrificed to maintain consistency?

Page 61

Fault Tolerance

What happens if certain nodes fail?

Page 62

Performance

Is performance maximized?

Page 63

Scalability

Does it affect the scaling process?

Page 64

Architecture: MongoDB

Memory Mapped Files
– Write to File via Memory
– Update in Place

Durability
– Asynchronous (fsync)

Database Locking
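Memory-mapped files with update in place and asynchronous durability can be illustrated with Python’s standard `mmap` module. This is a hypothetical sketch of the general technique, not MongoDB’s storage engine. Fixed-size record slots are an assumption made here so that a new value can overwrite the old one exactly where it lives.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # assumption: fixed-size slots make in-place updates possible


def write_record(mm: mmap.mmap, slot: int, value: bytes) -> None:
    """Overwrite one slot through memory; no separate write() call to the file."""
    if len(value) > RECORD_SIZE:
        raise ValueError("value larger than its slot; cannot update in place")
    start = slot * RECORD_SIZE
    mm[start:start + RECORD_SIZE] = value.ljust(RECORD_SIZE, b"\0")


path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\0" * RECORD_SIZE * 4)  # preallocate 4 slots

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        write_record(mm, 1, b"hello")  # lands in memory-mapped pages first
        write_record(mm, 1, b"world")  # same bytes overwritten: update in place
        mm.flush()  # ask the OS to write dirty pages back now, instead of later

with open(path, "rb") as f:
    data = f.read()
print(data[RECORD_SIZE:RECORD_SIZE + 5])  # b'world'
```

Until the flush (or the OS’s own background write-back) happens, acknowledged writes exist only in memory, which is the trade-off behind asynchronous durability.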


Page 65

Architecture: Cassandra (DataStax)

Log Structured Merge Tree
– memtable (memory)
– commit log (disk)
– sstable (disk)
– Write Once, Merge

Durability
– Asynchronous (fsync)
– Synchronous Batch (fsync)

Row Locking
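The LSM tree pieces listed above (memtable, commit log, sstables, write once then merge) fit together roughly as in this toy Python model. It is a teaching sketch under simplifying assumptions (everything kept in memory, no deletes or tombstones, no fsync), not Cassandra’s implementation.

```python
import bisect
from typing import Dict, List, Optional, Tuple

SSTable = List[Tuple[str, str]]  # immutable, sorted (key, value) pairs


class TinyLSM:
    """Toy log-structured merge tree: memtable + commit log + sorted sstables."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable: Dict[str, str] = {}
        self.commit_log: List[Tuple[str, str]] = []  # stands in for the on-disk log
        self.sstables: List[SSTable] = []  # oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # log first, so a crash can replay it
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Write the memtable once as a new sorted, immutable sstable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs the log

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest sstable wins
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self) -> None:
        """Merge all sstables into one, keeping the newest value per key."""
        merged: Dict[str, str] = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
for k, v in [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1"), ("d", "1")]:
    db.put(k, v)
print(db.get("a"), len(db.sstables))  # 2 2
```

Writes never modify an existing sstable, so reads may have to check several of them. Compaction is what bounds that read cost.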


Page 66

Architecture: Couchbase Server

Copy on Write B-Tree
– Append Only

Durability
– Asynchronous (fsync)
– Synchronous (fsync)

Integrated, Managed Object Cache

Document Locking
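Append-only storage means an update writes a new version instead of modifying data in place, unlike the update-in-place approach listed for MongoDB, so stale versions accumulate until compaction reclaims them. The Python sketch below illustrates that idea only. It assumes a flat key index in place of a real copy-on-write B-tree and is not Couchbase Server’s file format.

```python
import json
from typing import Dict, List, Optional


class AppendOnlyStore:
    """Toy append-only store: updates never overwrite; an index points at the latest version."""

    def __init__(self) -> None:
        self.log: List[str] = []         # stands in for the append-only data file
        self.index: Dict[str, int] = {}  # key -> position of its latest version

    def set(self, key: str, doc: dict) -> None:
        self.log.append(json.dumps({"key": key, "doc": doc}))
        self.index[key] = len(self.log) - 1  # old version stays, just unreferenced

    def get(self, key: str) -> Optional[dict]:
        pos = self.index.get(key)
        return None if pos is None else json.loads(self.log[pos])["doc"]

    def stale_fraction(self) -> float:
        """Share of the log taken up by versions no longer referenced."""
        return 1 - len(self.index) / len(self.log) if self.log else 0.0

    def compact(self) -> None:
        """Copy only live versions into a fresh log, reclaiming space."""
        new_log: List[str] = []
        for key, pos in self.index.items():
            new_log.append(self.log[pos])
            self.index[key] = len(new_log) - 1
        self.log = new_log


store = AppendOnlyStore()
store.set("a", {"v": 1})
store.set("a", {"v": 2})
store.set("b", {"v": 1})
print(store.get("a"), len(store.log))  # {'v': 2} 3
```

Appending keeps writes sequential and never leaves a half-overwritten record on disk; the cost is the periodic compaction pass, typically triggered once the stale fraction passes some threshold.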


Page 67

Couchbase Server 3.0

Database Change Protocol

Tunable Memory


Page 68

Couchbase Server 3.X

N1QL (Query Language)
– select, from, where, join, group by, order by

ForestDB (Storage)
– Hierarchical B+Tree based Trie
– SSD Optimized