Diving into the NoSQL Technical Comparison Report

Post on 14-Jun-2015


Join Renat Khasanshyn, CEO of Altoros, and Shane Johnson, Sr. Product Marketing Manager at Couchbase, in this NoSQL Technical Comparison Webinar. The variety of NoSQL databases makes it difficult to select the best database for a particular case. Learn what information you need to consider when evaluating NoSQL databases.

Transcript of Diving into the NoSQL Technical Comparison Report

@renatco Want all 50+ pages?

Download http://paas.ly/nosql2014

NoSQL Database Architecture and Performance: How to Evaluate and Benchmark

1 scoring framework

3 templates

Open-source tools to re-use

10 benchmark examples + settings

With Renat Khasanshyn, CEO @ Altoros, and Shane Johnson, Sr. Product Marketing Manager @ Couchbase

November 6, 2014 · 10 am PST

paas.ly/NoSQLTechGuide


Why am I here?


Problem #1: MANY TOOLS, MANY VENDORS, MANY WORKLOADS


Problem #2: VENDORS HIDE WEAKNESSES

This isn’t helpful to end users.


Problem #3: MOST BENCHMARKS SUCK

- Workloads are not meaningful
- Not easy to reproduce in your own environment
- Questionable configurations


The shadow IT storm has resulted in tens of different database products, even at medium-size companies.


Problem #4: IT teams can’t provide SLAs for dozens of database options.


IT can only support 1-2 DBs, per category, “as a service”.

For example - 1 document, 1 column, 1 graph and 2 relational.


In large organizations, devs often already use 4-6 DBs per category


Typical mid-term goal of our customers: DATABASE AS A SERVICE

So that IT can provide 1-2 DBs per category as a service, with superb self-service & SLAs, while leaving all others to DIY.


SO HOW DO WE PICK A DATABASE?


Where I will take you today

Part 0. Why should I believe you?
❑ Who is Altoros?
❑ How real-world is this stuff?
❑ Why do companies bother benchmarking?

Part 1. Evaluation of architecture & capabilities
❑ Why & How
❑ Templates

Part 2. Evaluation of performance
❑ Why & How
❑ Templates & Benchmarking Tools

Part 3. Examples of benchmarks
❑ Examples - charts & graphs
❑ Tips for applying in your company


Part 0. Why should I believe you?

❑ Who is Altoros?
❑ How real-world is this stuff?
❑ Why do companies bother benchmarking?


FACTS ON ALTOROS

156+ Full Time Enablers: DevOps, Software Engineers

+5 join us in an average month

30% female, 70% male

Locations: Buenos Aires; Oslo; London; Zurich; Sunnyvale, CA; Copenhagen; Minsk; Santa Fe; Boston

$0 revenue from referrals, vendor “kickbacks”, commissions, product or service resale.

Vendor-independent. Not a reseller. Never pushed boxes or software licenses. Never will.

We sell our time, but we do not sell our soul. Some vendors hate us.


WHAT WE DO

We bring “software factories” and “data lakes” into organizations through deployment and integration of solutions offered by the Cloud Foundry ecosystem.

Altoros Core Services: Training, Managed Services, Consulting, Integration

“X as a Service” Enablement: “Software factories”, “Data lakes”


Question: HOW REAL-WORLD IS THIS STUFF?


* Where our experience comes from

We highly recommend Altoros to rapidly build complex applications using cutting edge technologies. Again, great job!

Christopher Adorna, Sony Design Center, LA

NoSQL/Hadoop benchmarking, deployment and integration

Cloud Foundry training and integration

Enablement of X-as-Service

Integration with IaaS/PaaS

Enablement of strategic workload revenue

White label Cloud Foundry PaaS

Enterprises & SaaS

Software Companies

Hosting Providers


NoSQL to Rethink Music?

Top 3 music company: 4,000 employees, $3B revenue, privately owned

Challenge: rapidly changing competitive environment

Solution: Composable enterprise using Cloud Foundry and NoSQL

o over 400 servers

o 6 IaaS regions (3 on OpenStack, 3 on AWS)


NoSQL to Save Lives?

#1 liquid intravenous drug & medical device (intravenous pump) company: 15,000 employees, $4B revenue, member of S&P 500

Technical challenge: 300,000 devices in 800+ hospitals across 4500+ locations

Opportunity: Save ~300 lives lost every 24 hours due to wrong drug, wrong IV pump, wrong quantity

Solution: Software-defined IV drug delivery based on real-time access to HMS & EMR patient data

Daily data intake: 7 TB


NoSQL to monetize data?

Top 2 EMR company

Technical challenge: handling the data monster of enabling 100s of APIs, web services and applications developed by hundreds of software developers

Solution: Next generation API development platform powered by NoSQL databases

Monthly active medical professionals: 115,000

Patient records: 81 million


Part 1. Evaluation of Architecture & Capabilities

❑ Why & How
❑ Templates


How do we evaluate architectures & capabilities?

Reality check: what I will show you often takes 2 to 8 weeks with one or two engineers.

Typical budget: $25k-50k


Step 1. What are our workloads?


Step 2: What are our requirements?


Step 3 – Select 2-3 finalists in each category (Template)

Category #1 – Column-value

Requirement | Vendor A | Vendor B | Vendor C | Vendor D | Weight on decision
Multi-data center (regions) bi-directional replication to multi regions | | | | | 10
Support for active/active reads across regions | | | | | 10
Auto resynchronization of data between regions in the event of loss of connectivity between regions | | | | | 10
Support for encryption of data replication traffic across regions | | | | | 10
Support for active/active writes across regions | | | | | 9
Configurable replication factor (within cluster in a single region and across regions) | | | | | 9
Tunable Consistency for reads and writes | | | | | 9
Survive loss of nodes and up to an entire region without impacting clients and ability to serve requests (read and write) | | | | | 8
Ability to add nodes in a cluster and rebalance data with minimal impact | | | | | 8
Rich Query and Indexing capabilities | | | | | 8
Product Support, Training & Consulting from Vendor / 3rd party | | | | | 7
Installed Customer base in Production with high workloads in a multi-region data center topology | | | | | 7
Ease of Configuration & Setup | | | | | 7
Security: Kerberos or similar authentication models to support secure server to server communication | | | | | 7
Monitoring - CLI and GUI Tools | | | | | 7
Security: Auth / Fine Grain Access Control | | | | | 7
Compatibility to run on Public Cloud Infrastructure (AWS) | | | | | 7
Backup/Recovery: Ability to perform live snapshots and restore | | | | | 6
Security: Transparent data encryption | | | | | 6
Bulk Loading and Extract/Dump capability; adapters for data transfer to Hadoop | | | | | 3
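
One possible way to turn the template above into a ranking is a weighted sum of per-requirement scores. The sketch below is only an illustration: the weights are copied from four rows of the template, while the 0-5 scoring scale, the vendor scores, and the function names are assumptions that do not come from the slides.

```python
# Weights copied from four rows of the template above.
WEIGHTS = {
    "Active/active reads across regions": 10,
    "Tunable consistency for reads and writes": 9,
    "Rich query and indexing capabilities": 8,
    "Backup/recovery with live snapshots": 6,
}

MAX_SCORE = 5  # assumed scale: 0 = not supported, 5 = fully supported

# Hypothetical scores, invented purely for illustration.
SCORES = {
    "Vendor A": {
        "Active/active reads across regions": 5,
        "Tunable consistency for reads and writes": 4,
        "Rich query and indexing capabilities": 2,
        "Backup/recovery with live snapshots": 3,
    },
    "Vendor B": {
        "Active/active reads across regions": 3,
        "Tunable consistency for reads and writes": 5,
        "Rich query and indexing capabilities": 5,
        "Backup/recovery with live snapshots": 4,
    },
}


def weighted_score(scores, weights):
    """Return the weighted score normalized to the 0..1 range."""
    best_possible = MAX_SCORE * sum(weights.values())
    total = sum(weight * scores.get(name, 0) for name, weight in weights.items())
    return total / best_possible


def rank(vendors, weights):
    """Return (vendor, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(s, weights)) for name, s in vendors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for vendor, score in rank(SCORES, WEIGHTS):
    print(f"{vendor}: {score:.2f}")
```

Normalizing by the best possible total keeps scores comparable between categories that use a different number of requirements.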


Template


Part 2. Evaluation of Performance

❑ Why & How
❑ Templates & Benchmarking Tools


1. Set your goals. Example:

• Reproducible by anyone
– Open Source workload generator
• Focus on use case for which NoSQL is typically selected
• Use a realistic workload
– Simulate steady state of application running
– Meaningful data amounts & runtime
• Compare latency vs throughput
• Measure max throughput (for given scenario)


2. Set your benchmarking scenario

• For interactive web application
• Scalability and performance are the most common requirements
• Typically leads to users selecting NoSQL over RDBMS
• The working set of data changes with time
• End users using the application change over time
• Example: every few hours, every few days, every few weeks
• There is more data available than memory (RAM)
• Replication is used for fault tolerance
• Real world data sizes
• Deployment platform
– Commonly used
– Easy to replicate results


Decide what you want to measure.

• Latency
– Round trip time taken for a request to execute from the client to the server and back
– Average, 95th and 99th percentile measured
• Why is this important?
– You want your users to have a great experience
– Not just an “average” one
• Throughput
– Throughput was varied from 1K ops/sec to 25K ops/sec depending on NoSQL database
– Max throughput was measured
• Why is this important?
– You want your app to support hundreds of thousands of users

If you are focused on max throughput, make sure workloads are not rate limited.
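
To illustrate why the 95th and 99th percentiles matter alongside the average, here is a minimal sketch that summarizes a list of latency samples. The sample values and function names are hypothetical, and the nearest-rank percentile definition is just one common choice; benchmarking tools may compute percentiles differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return average, 95th and 99th percentile latency in milliseconds."""
    return {
        "avg": sum(latencies_ms) / len(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Hypothetical sample: 98 fast requests and 2 very slow ones.
latencies = [2.0] * 98 + [250.0] * 2
stats = summarize(latencies)
print(stats)
```

In this made-up sample the average stays under 7 ms while the 99th percentile is 250 ms, which is the gap between an “average” experience and what the slowest requests actually see.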


3. Data sets. Example:

Data set #1: 50 million 1 KB records (doesn’t fit memory in a cluster pool)
fieldcount=10
fieldlength=100
threadcount=50
recordcount=50000000

Data set #2: 3 million 10 KB records (fits memory in a cluster pool)
fieldcount=10
fieldlength=1000
threadcount=50
recordcount=3000000
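
The record sizes above follow from fieldcount × fieldlength. A rough sketch of that arithmetic is below; it counts field payload only and, as a simplifying assumption, ignores keys, replicas, and per-database storage overhead, so real footprints will be larger. The function name is hypothetical.

```python
def dataset_size_bytes(recordcount, fieldcount, fieldlength):
    """Raw payload size: records x fields per record x bytes per field."""
    return recordcount * fieldcount * fieldlength


# Parameters copied from the two data sets above.
DATASETS = {
    "Data set #1": {"recordcount": 50_000_000, "fieldcount": 10, "fieldlength": 100},
    "Data set #2": {"recordcount": 3_000_000, "fieldcount": 10, "fieldlength": 1000},
}

for name, params in DATASETS.items():
    record_bytes = params["fieldcount"] * params["fieldlength"]
    total_gb = dataset_size_bytes(**params) / 1e9
    print(f"{name}: {record_bytes} bytes per record, about {total_gb:.0f} GB raw")
```

Comparing these raw totals with the RAM available across the cluster is a quick first check of whether a data set is expected to fit in memory.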


IS THERE A WAY TO AUTOMATE IT?

Recommended benchmarking tools


MEET YCSB


Benchmark Implementation: YCSB

• Yahoo! team offered a “standard” benchmark
• Yahoo! Cloud Serving Benchmark (YCSB)
– Focus on database
– Focus on performance
• YCSB Client consists of 2 parts
– Workload generator
– Workload scenarios


Why YCSB

• Open source
• Extensible
• Rich selection of connectors
– Azure, BigTable, Cassandra, CouchDB, Dynomite, GemFire, HBase, Hypertable, Infinispan, MongoDB, PNUTS, Redis, Voldemort, GigaSpaces XAP
– Connector for Sharded RDBMS (e.g. MySQL)
• We developed a few connectors
– Accumulo, Couchbase, Riak
– Connector for Shared Nothing RDBMS (e.g. MySQL Cluster)


How YCSB Works


[Diagram: YCSB Client and a benchmarking cluster of database nodes*]

* Extra nodes for masters, routers, etc.


YCSB Connectors

github.com/Altoros/YCSB


Part 3. Examples of Benchmarks

❑ Example of Benchmark Results
❑ Example of Benchmark Goals
❑ Example of Benchmark Design Scenario
❑ Recommendations for setting up your own benchmarking
❑ Recommended Benchmarking Tools


Cluster specification

YCSB Client, Database nodes


Data Sets

Data set #1 (did not fit in memory): 50 million 1 KB records
fieldcount=10
fieldlength=100
threadcount=50
recordcount=50000000

Data set #2 (fit in memory): 3 million 10 KB records
fieldcount=10
fieldlength=1000
threadcount=50
recordcount=3000000


Workloads

Workload B: 50% read operations, 40% update operations, 5% insert operations, 5% delete operations

YCSB Settings:
readallfields=true
readproportion=0.5
updateproportion=0.4
scanproportion=0
insertproportion=0.05
deleteproportion=0.05
requestdistribution=zipfian

Workload C: 90% read operations, 8% update operations, 1% insert operations, 1% delete operations

YCSB Settings:
readallfields=true
readproportion=0.9
updateproportion=0.08
scanproportion=0
insertproportion=0.01
deleteproportion=0.01
requestdistribution=zipfian

Workload D: 10% read operations, 72% update operations, 9% insert operations, 9% delete operations

YCSB Settings:
readallfields=true
readproportion=0.1
updateproportion=0.72
scanproportion=0
insertproportion=0.09
deleteproportion=0.09
requestdistribution=latest
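
To show what the proportion settings mean in practice, the following sketch draws a stream of operations according to the Workload B mix. It is not YCSB code: the function name is hypothetical, and it models only the operation mix, not the request distribution (zipfian or latest) that decides which keys are touched.

```python
import random
from collections import Counter

# Operation mix copied from the Workload B settings above.
WORKLOAD_B = {"read": 0.50, "update": 0.40, "insert": 0.05, "delete": 0.05}


def operation_stream(proportions, count, seed=None):
    """Return `count` operation names drawn according to `proportions`."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    names = list(proportions)
    weights = list(proportions.values())
    return rng.choices(names, weights=weights, k=count)


operations = operation_stream(WORKLOAD_B, 100_000, seed=42)
mix = Counter(operations)
for name in WORKLOAD_B:
    print(f"{name}: {mix[name] / len(operations):.3f}")
```

Over a long enough run the observed shares converge on the configured proportions, which is a cheap sanity check before pointing a custom workload at a real cluster.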


Workload B: 50% read operations, 40% update operations, 5% insert operations, and 5% delete operations.


Scaling from 3 to 6 nodes


Lessons Learned

• Run a close emulation of real-world workloads using YCSB before choosing the “horse”
• NoSQL is a case of “different horses for different courses”
• Construct your own or use existing workloads
• Benchmark it
• Tune database!
• Benchmark it again
• Ask me for workload sample files

Amazon EC2 for NoSQL workloads?

• Scales perfectly for NoSQL
• EBS slows down database on reads
• RAID0 it! Use 4 disks in the array (a good choice); some have reported degraded performance with a higher number (6 or more)


Want all 50+ pages?

Download

http://paas.ly/nosql2014


Get your Hadoop Benchmark Report

Download http://paas.ly/hadoop2014

Diving into the NoSQL Technical Comparison Report

Shane K Johnson

Couchbase

©2014 Couchbase Inc.

Agenda

• Topology
• Architecture
• Couchbase Server 3.0
• Couchbase Server 3.X

Topology: MongoDB

[Diagram: three clients, a router, three config servers, and three replica groups, each with one primary and two replicas]

Topology: Couchbase Server / Cassandra (DataStax)

[Diagram: a client and nine nodes, all labeled Primary]

Topology

• Installation: How many different types of installations are there?
• Configuration: How many different types of configurations are there?
• Consistency: What is sacrificed to maintain consistency?
• Fault Tolerance: What happens if certain nodes fail?
• Performance: Is performance maximized?
• Scalability: Does it affect the scaling process?

Architecture: MongoDB

• Memory Mapped Files
– Write to File via Memory
– Update in Place
• Durability
– Asynchronous (fsync)
• Database Locking

Architecture: Cassandra (DataStax)

• Log Structured Merge Tree
– memtable (memory)
– commit log (disk)
– sstable (disk)
– Write Once, Merge
• Durability
– Asynchronous (fsync)
– Synchronous Batch (fsync)
• Row Locking
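
As a rough illustration of the log structured merge idea listed above (memtable, commit log, sstable, write once then merge), here is a toy in-memory sketch. It is not how Cassandra is implemented: the class name, the flush threshold, and the use of Python lists to stand in for on-disk files are all assumptions made for the example.

```python
class TinyLSM:
    """Toy log-structured merge store: commit log + memtable + immutable sstables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the on-disk commit log
        self.memtable = {}     # recent writes held in memory
        self.sstables = []     # sorted (key, value) lists, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # log the write first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out once as a new sorted table, never updated in place."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # flushed writes no longer need the log

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all sstables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # reaches the limit and triggers a flush
db.put("a", 3)   # newer value lives in the memtable
db.put("c", 4)   # second flush
db.compact()
print(db.get("a"), db.get("b"), db.get("c"))
```

The point of the sketch is the write path: writes are appended and flushed as immutable sorted tables, and older versions are only reconciled later during the merge step.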


Architecture: Couchbase Server

• Copy on Write B-Tree
– Append Only
• Durability
– Asynchronous (fsync)
– Synchronous (fsync)
• Integrated, Managed Object Cache
• Document Locking

Couchbase Server 3.0

Database Change Protocol

Tunable Memory


Couchbase Server 3.X

• N1QL (Query Language)
– select, from, where, join, group by, order by
• ForestDB (Storage)
– Hierarchical B+Tree based Trie
– SSD Optimized