Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Transcript of Benchmarking XDCR rings at PayPal – Couchbase Connect 2016

Page 1: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016

Benchmarking XDCR Ring Topology at PayPal


Page 2: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


The Couchbase Connect16 mobile app: Take our in-app survey!

Page 3: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Dong Wang, Senior MTS, Core Data Platform, PayPal

Page 4: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Anil Kumar, Senior Product Manager, Couchbase

Email: [email protected]

Twitter: @anilkumar1129

Page 5: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Cross Datacenter Replication (XDCR)

Page 6: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016

©2016 Couchbase Inc.

Power of XDCR


• Simple, Powerful Administration

• Consistent High Performance

• Elastic Scalability

• Multi-Data Center, Active-Active Deployment

Page 7: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Continuous Innovation

Page 8: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmarking XDCR Ring Topology at PayPal

Page 9: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016

AGENDA

©2016 PayPal Inc. Confidential and proprietary. 9

1. Speaker/PayPal Intro

2. Background on Multi-DC Bi-Directional Ring Replication Benchmark

3. Benchmark Methodology

4. Benchmark Results

5. Tuning efforts

Page 10: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


About PayPal

• A leading technology platform company that enables digital and mobile payments on behalf of consumers and merchants worldwide.

• NASDAQ: PYPL

• 192 million active customer accounts as of Q3 2016

• 1.5 billion transactions processed in Q3 2016

• $87 billion in total payment volume in Q3 2016

Page 11: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


About Speaker

• Ph.D. in Biochemistry

• Worked on database technologies for the past 18 years

• 10 years at PayPal

• Leading PayPal’s NoSQL Engineering Efforts

Page 12: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


A Multi-Data Center Deployment Challenge

• Requirement:

• Multiple DCs (at least 4), each with its own Couchbase cluster

• Data replication among multiple clusters

• Active traffic to all clusters

• Deployment Choices (Ring vs Mesh):

Page 13: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


A Comparison of Ring and Mesh Replication Topologies

Topology   Data Centers   XDCR Streams   N=3   N=4   N=5
Ring       N              2 x N          6     8     10
Mesh       N              N x (N-1)      6     12    20

[Figure: XDCR Streams vs Clusters, Ring vs Mesh, for 3 to 6 clusters]
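The stream counts in the table follow from counting directed source-to-target replications. A minimal sketch, assuming each XDCR stream is one unidirectional replication from one cluster to another; the DC names are hypothetical:

```python
from itertools import permutations


def ring_streams(clusters):
    """Directed (source, target) XDCR streams for a bi-directional ring.

    Each cluster replicates to both of its neighbours: 2 x N streams for N clusters.
    """
    n = len(clusters)
    if n < 3:
        # With fewer than 3 clusters both "neighbours" are the same cluster,
        # so the 2 x N count no longer holds.
        raise ValueError("a ring needs at least 3 clusters")
    streams = []
    for i, src in enumerate(clusters):
        streams.append((src, clusters[(i + 1) % n]))  # clockwise neighbour
        streams.append((src, clusters[(i - 1) % n]))  # counter-clockwise neighbour
    return streams


def mesh_streams(clusters):
    """Directed streams for a full mesh: every ordered pair, N x (N-1) streams."""
    return list(permutations(clusters, 2))


if __name__ == "__main__":
    for n in (3, 4, 5):
        dcs = [f"dc{i}" for i in range(1, n + 1)]  # hypothetical DC names
        print(n, len(ring_streams(dcs)), len(mesh_streams(dcs)))
```

Running it prints `3 6 6`, `4 8 12` and `5 10 20`, matching the table. The ring keeps the stream count linear in N, at the cost that a mutation has to pass through an intermediate cluster to reach a non-adjacent one (in a 4-DC ring, the DC opposite the writer).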

Page 14: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmarking a Bi-Directional Ring Deployment in 4 DCs

• Benchmark Environment

• Benchmark Procedure

• Benchmark Data Collection

• Benchmark Data Processing

• Benchmark Results

Page 15: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Environment

• Hardware: Bare Metal, Dell PowerEdge FC630

• 20 cores, 40 logical processors at 2.6 GHz

• 384 GB RAM

• 2 TB RAID1 SSD

• 10 Gb network

• OS: RHEL 6.6

• Couchbase: 4.1.1, XDCR without SSL

• Couchbase Clusters

• 8 nodes per cluster, 4 clusters in 4 DCs

• Benchmark Tools:

• YCSB 0.9.0

• Custom XDCR monitoring program in Python using the Couchbase SDK

Page 16: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Environment/Components

[Diagram: benchmark environment components, labeled DCG11, CCG01, DCG12, CCG12]

Page 17: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Procedure

Cluster Configuration

YCSB Workload Generation

Data Collection

Data Processing

Replication Latency Monitor

Result Analysis

Page 18: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Procedure – Couchbase Configuration

• Cluster Configuration

• Standard configuration following OS best practices (THP, swappiness, etc.)

• XDCR without SSL, easily configured through the REST API

• Bucket Configuration

• Bucket Type: Couchbase

• Memory Allocation: 240 GB per cluster (has a large impact on the benchmark outcome; PayPal-specific)

• Replica: 1

• Value Ejection

• Disk IO Priority: Default (low)

• Auto Compaction: Default

• Flush Enabled: True
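Because the configuration is done through the REST API, the bucket and XDCR setup can be scripted. A minimal sketch that only builds the requests (nothing is sent), assuming the Couchbase 4.x endpoints `/pools/default/buckets`, `/pools/default/remoteClusters` and `/controller/createReplication` on port 8091. Host names, credentials and the bucket name are hypothetical, using the same credentials on every cluster is a simplifying assumption, and `authType=sasl` being required on 4.x is an assumption. Since `ramQuotaMB` is a per-node quota, 240 GB per 8-node cluster works out to 30 GB (30720 MB) per node.

```python
import base64
import urllib.parse
import urllib.request

ADMIN_PORT = 8091  # Couchbase administrative REST port


def _post(host, path, params, user, password):
    """Build (but do not send) a form-encoded POST with HTTP basic auth."""
    req = urllib.request.Request(
        f"http://{host}:{ADMIN_PORT}{path}",
        data=urllib.parse.urlencode(params).encode(),
        method="POST",
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def bucket_request(host, bucket, cluster_ram_gb, nodes, user, password):
    """Create a bucket matching the slide's settings."""
    params = {
        "name": bucket,
        "bucketType": "couchbase",
        "authType": "sasl",                            # assumed to be required on 4.x
        "ramQuotaMB": cluster_ram_gb * 1024 // nodes,  # quota is per node
        "replicaNumber": 1,
        "evictionPolicy": "valueOnly",                 # value ejection
        "threadsNumber": 3,                            # disk I/O priority: 3 = low (default)
        "flushEnabled": 1,
    }
    return _post(host, "/pools/default/buckets", params, user, password)


def flush_request(host, bucket, user, password):
    """Flush a bucket (needs flushEnabled=1), as in the Workload A reset."""
    return _post(host, f"/pools/default/buckets/{bucket}/controller/doFlush", {}, user, password)


def remote_cluster_request(host, name, remote_host, user, password):
    """Register a remote cluster reference (same credentials assumed on both sides)."""
    params = {"name": name, "hostname": f"{remote_host}:{ADMIN_PORT}",
              "username": user, "password": password}
    return _post(host, "/pools/default/remoteClusters", params, user, password)


def replication_request(host, bucket, remote_name, user, password):
    """Start one continuous, unidirectional XDCR stream to the same-named bucket."""
    params = {"fromBucket": bucket, "toCluster": remote_name,
              "toBucket": bucket, "replicationType": "continuous"}
    return _post(host, "/controller/createReplication", params, user, password)
```

Sending is a matter of `urllib.request.urlopen(req)`. For a bi-directional ring, each cluster would get a remote cluster reference and a replication request for each of its two neighbours.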

Page 19: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Procedure - Workloads

• Workload A: 100% write

• Drop the XDCR stream

• Flush the bucket

• Create the XDCR stream

• Run YCSB workload A with 1 KB documents from 20 client machines

• Run XDCR monitoring traffic in parallel and collect latency to remote DCs

• Workload B: 95% Read + 5% Write (Update)

• Preload 200 million documents, resulting in 464 GB of data on disk and 184 GB in memory

• Run YCSB workload B with 1 KB documents from 40 to 80 client machines

• Run XDCR monitoring traffic in parallel and collect latency to remote DCs
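The two workloads can be expressed as YCSB CoreWorkload property files. A sketch, assuming YCSB's 10 fields x 100 bytes layout to get roughly 1 KB documents; the talk does not publish its workload files, so everything here except the 200 million preload count is illustrative:

```python
# Illustrative YCSB CoreWorkload settings; only the 200 million preload count is from the talk.
COMMON = {
    "workload": "com.yahoo.ycsb.workloads.CoreWorkload",  # package name in YCSB 0.9.x
    "fieldcount": 10,     # 10 fields x 100 bytes, roughly 1 KB per document
    "fieldlength": 100,
}


def to_properties(props):
    """Render a dict as the body of a Java .properties file."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def workload_a(client_index, clients, total_records):
    """100% insert: each client loads its own disjoint key range (run with `ycsb load`)."""
    per_client = total_records // clients
    return to_properties(dict(
        COMMON,
        recordcount=total_records,
        insertstart=client_index * per_client,
        insertcount=per_client,
    ))


def workload_b(operation_count):
    """95% read / 5% update against the preloaded documents (run with `ycsb run`)."""
    return to_properties(dict(
        COMMON,
        recordcount=200_000_000,
        operationcount=operation_count,
        readproportion=0.95,
        updateproportion=0.05,
        insertproportion=0.0,
        scanproportion=0.0,
    ))
```

For example, `workload_a(2, 20, 80_000_000)` (a made-up total) gives client 2 an insert range starting at offset 8,000,000 with 4,000,000 inserts. Each rendered string would be saved to a file and passed to YCSB with `-P`.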

Page 20: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Data Collection

• YCSB summary data. Sample from one client:

[OVERALL], Throughput(ops/sec), 9294.932867347366
[INSERT], Operations, 4000000.0
[INSERT], AverageLatency(us), 203.5180525
[INSERT], MinLatency(us), 92.0
[INSERT], MaxLatency(us), 80959.0
[INSERT], 95thPercentileLatency(us), 429.0
[INSERT], 99thPercentileLatency(us), 959.0

• XDCR latency data

• Insert one doc into the local DC, then query the same doc from the remote DCs in parallel; record the insert ack time, query success time, and network round-trip time
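The XDCR latency probe (insert one doc locally, then poll each remote DC until it appears) can be sketched without tying it to a particular SDK. In this sketch `local_upsert` and the per-DC getter callables are hypothetical stand-ins for whatever client calls the monitoring program actually used; the polling interval and timeout are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_visible(get, key, start, timeout_s, poll_s=0.001):
    """Poll get(key) until it returns a value; seconds since `start`, or None on timeout."""
    deadline = start + timeout_s
    while time.monotonic() < deadline:
        if get(key) is not None:
            return time.monotonic() - start
        time.sleep(poll_s)
    return None


def probe(local_upsert, remote_gets, key, value, timeout_s=5.0):
    """Write one document locally, then poll every remote DC for it in parallel.

    local_upsert(key, value) and remote_gets[dc](key) are hypothetical stand-ins
    for the SDK calls; a getter must return None while the document is missing.
    Returns (insert ack seconds, {dc: seconds until visible, or None}).
    """
    start = time.monotonic()
    local_upsert(key, value)
    ack_s = time.monotonic() - start
    with ThreadPoolExecutor(max_workers=len(remote_gets)) as pool:
        futures = {dc: pool.submit(wait_visible, get, key, start, timeout_s)
                   for dc, get in remote_gets.items()}
        visible = {dc: fut.result() for dc, fut in futures.items()}
    return ack_s, visible
```

Each visibility time includes at least one query round trip to the remote DC, which is why the network round-trip time is recorded separately: subtracting it approximates the share attributable to XDCR itself.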

Page 21: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Data Collection

• Couchbase Performance Data

* cpu_utilization_rate
* curr_connections
* ops
* cmd_get
* cmd_set
* ep_cache_miss_rate
* vb_active_resident_items_ratio
* ep_bg_fetched
* replication_changes_left
* xdc_ops
* ep_dcp_replica_items_remaining
* ep_dcp_replica_items_sent
* ep_dcp_xdcr_items_remaining
* ep_dcp_xdcr_items_sent
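Metrics like these can be read from the bucket statistics REST endpoint (`/pools/default/buckets/<bucket>/stats`, or the per-node variant under `/pools/default/buckets/<bucket>/nodes/<node>/stats`). A sketch that extracts the most recent sample of each listed metric, assuming the response keeps recent samples under `op.samples` as lists; that response shape is an assumption to check against your server version.

```python
import json
import urllib.request

METRICS = [
    "cpu_utilization_rate", "curr_connections", "ops", "cmd_get", "cmd_set",
    "ep_cache_miss_rate", "vb_active_resident_items_ratio", "ep_bg_fetched",
    "replication_changes_left", "xdc_ops",
    "ep_dcp_replica_items_remaining", "ep_dcp_replica_items_sent",
    "ep_dcp_xdcr_items_remaining", "ep_dcp_xdcr_items_sent",
]


def latest_samples(stats_json, metrics=METRICS):
    """Return {metric: most recent sample} for metrics present in a stats response.

    Assumes the (assumed) layout {"op": {"samples": {metric: [v0, v1, ...]}}}.
    """
    samples = stats_json.get("op", {}).get("samples", {})
    return {m: samples[m][-1] for m in metrics if samples.get(m)}


def fetch_stats(url, auth_header):
    """Fetch and decode one stats response over HTTP."""
    req = urllib.request.Request(url, headers={"Authorization": auth_header})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Polling `fetch_stats` once per node on a fixed interval and storing `latest_samples` with a timestamp yields the per-node time series that the processing step aggregates.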

Page 22: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Data Processing

• Sample Data Window

• A 2-minute sample window from the middle of a test period of at least 5 minutes is chosen to represent the steady state of the test

• Data Aggregation

• Aggregatable metrics: use the sum across all nodes

• Non-aggregatable metrics: use the average across all nodes

• Data Graphing

• Use the standard pandas/matplotlib Python libraries
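The windowing and aggregation rules map directly onto pandas. A sketch, assuming the collected stats sit in a long-format DataFrame with `timestamp`, `node`, `metric` and `value` columns (a hypothetical layout), and that the caller decides which metrics are aggregatable:

```python
import pandas as pd


def steady_state(df, summable, window=pd.Timedelta(minutes=2),
                 min_run=pd.Timedelta(minutes=5)):
    """Cluster-level value of each metric over a window centred in the run.

    df: long format with columns timestamp, node, metric, value (assumed layout).
    summable: metrics to sum across nodes (e.g. ops); all others are averaged.
    """
    start, end = df["timestamp"].min(), df["timestamp"].max()
    if end - start < min_run:
        raise ValueError("test run shorter than the minimum period")
    mid = start + (end - start) / 2
    win = df[(df["timestamp"] >= mid - window / 2) & (df["timestamp"] <= mid + window / 2)]

    # Collapse nodes at each timestamp: sum for aggregatable metrics, mean otherwise.
    per_ts = win.groupby(["metric", "timestamp"])["value"]
    summed, averaged = per_ts.sum(), per_ts.mean()
    is_summable = summed.index.get_level_values("metric").isin(summable)
    cluster = summed.where(is_summable, averaged)

    # Average the resulting cluster-wide series over the sample window.
    return cluster.groupby(level="metric").mean()
```

Keeping the intermediate cluster-wide series (before the final mean) is also what one would hand to matplotlib for the time-series graphs.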

Page 23: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Results - 100% Write To 4 Active DCs

Page 24: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


100% Write Workload Throughput

• Client-side errors occur well before the absolute maximum throughput is reached

• Max throughput without errors: 340K/sec

Page 25: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


100% Write Workload P99/Max Latency

• Sub-millisecond P99 insert latency

Page 26: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


100% Write Workload XDCR Latency (20 Client Threads)

• At light throughput (100K/sec), XDCR latency is driven by network latency:

• Same region: 4 ms

• Distant region: 32 ms

Page 27: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


100% Write Workload XDCR Latency (40 Client Threads)

• As throughput demand increases, the distant DC is impacted first.

Page 28: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


100% Write Workload XDCR Latency (80 Client Threads)

• At max throughput, ALL DCs are impacted.

Page 29: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


100% Write Workload XDCR Backlog (ep_dcp_xdcr_items_remaining)

• XDCR backlog builds up before max throughput is reached and before client-side errors appear

• Memory and nozzle tuning can affect XDCR performance
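Since the backlog builds before throughput maxes out, whether XDCR is keeping up is better judged from the trend of `ep_dcp_xdcr_items_remaining` than from a single sample. A sketch, assuming evenly spaced samples; the least-squares slope test and the `max_growth` threshold are illustrative choices, not the method used in the talk:

```python
def backlog_slope(samples):
    """Least-squares slope of evenly spaced samples (items per sample interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def xdcr_lagging(samples, max_growth=0.0):
    """True if the XDCR backlog grows faster than max_growth items per interval.

    The slope test and threshold are illustrative, not from the benchmark.
    """
    return backlog_slope(samples) > max_growth
```

A flat but non-zero backlog (for example `[5, 5, 5, 5]`) is not flagged, while a steadily rising one is; running this per throughput level shows where XDCR starts falling behind independently of client-side errors.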

Page 30: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Benchmark Results - 95% Read + 5% Write To 4 Active DCs

Page 31: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


95% Read Workload Throughput

• Much higher throughput than the 100% write use case

Page 32: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


95% Read Workload P99 Latency

• Good read/write latency at the millisecond level

Page 33: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


95% Read Workload Max Latency

• Max latency is better below 2 million ops/sec throughput

Page 34: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


95% Read Workload XDCR Latency (80 Client Threads)

• At light throughput (100K/sec), XDCR latency is driven by network latency:

• Same region: 4 ms

• Distant region: 32 ms

Page 35: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


95% Read Workload XDCR Latency (800 Client Threads)

• At high throughput (4 million ops/sec), XDCR latency is driven by network latency:

• Same region: < 10 ms

• Distant region: 30-50 ms

Page 36: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


95% Read Workload XDCR Latency (1120 Client Threads)

• At max throughput (5 million ops/sec), XDCR latency is driven by network latency:

• Same region: < 20 ms

• Distant region: 400-600 ms

Page 37: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


95% Read Workload XDCR Backlog (ep_dcp_xdcr_items_remaining)

• Backlog occurs at a much higher overall read/write throughput than in the 100% write use case

• The more remote the target DC, the larger the XDCR backlog

Page 38: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Comparison Among Different Traffic Patterns (4DC-4A vs 4DC-2A vs 4DC-1A)

Page 39: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


YCSB Throughput vs Traffic Pattern

• Active-Active gives limited throughput scaling over Active-Passive (4 clusters in 4 DCs; 4DC-1A: 1 active, 4DC-2A: 2 active, 4DC-4A: 4 active):

Workload              4DC-1A      4DC-2A      4DC-4A      4DC-4A/4DC-1A
100% write            200 K/sec   200 K/sec   200 K/sec   1x
95% read + 5% write   2.5 M/sec   3 M/sec     4.5 M/sec   1.8x

200K/sec client-facing traffic → 200K x (4 + 1) = 1M/sec total KV traffic in a 4-cluster setup
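The scaling ratios in the table and the traffic estimate are simple arithmetic; a sketch reproducing them. The `(clusters + 1)` multiplier is taken verbatim from the slide, which does not break it down further.

```python
def scaling_ratio(four_active, one_active):
    """4DC-4A throughput relative to 4DC-1A."""
    return four_active / one_active


def total_kv_traffic(client_ops_per_sec, clusters):
    """The slide's estimate: client-facing rate x (clusters + 1); factor taken as given."""
    return client_ops_per_sec * (clusters + 1)


print(round(scaling_ratio(4.5e6, 2.5e6), 1))  # 95% read + 5% write: 1.8
print(scaling_ratio(200e3, 200e3))            # 100% write: 1.0
print(total_kv_traffic(200_000, 4))           # 1000000 ops/sec
```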

Page 40: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Application Latency vs Traffic Pattern

• Consistent latency across traffic patterns (4 clusters in 4 DCs; 4DC-1A: 1 active, 4DC-2A: 2 active, 4DC-4A: 4 active):

Workload                   4DC-1A     4DC-2A     4DC-4A     4DC-4A/4DC-1A
100% write, Avg            0.25 ms    0.2 ms     0.25 ms    1x
100% write, P99            1 ms       1 ms       1 ms       1x
95% read + 5% write, Avg   0.3 ms     0.2 ms     0.2 ms     0.7x
95% read + 5% write, P99   1.2 ms     1.1 ms     1.3 ms     1x

• Millisecond (P99) or sub-millisecond (Avg) latency

• Across all traffic patterns

• For both read-intensive and write-intensive use cases

Page 41: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Data Replication Latency vs Traffic Pattern

(4 clusters in 4 DCs; 4DC-1A: 1 active, 4DC-2A: 2 active, 4DC-4A: 4 active)

Workload              DC      4DC-1A         4DC-2A         4DC-4A
100% write            Close   10 ms (avg)    10 ms (avg)    10 ms (avg)
100% write            Far     100 ms (avg)   200 ms (avg)   250 ms (avg)
95% read + 5% write   Close   5 ms (avg)     5 ms (avg)     10 ms (avg)
95% read + 5% write   Far     20 ms (avg)    20 ms (avg)    100 ms (avg)

• Amplification by network latency (geographic distance effect)

• Write-intensive (higher latency) vs read-intensive (lower latency)

• Active-Active (higher latency) vs Active-Passive (lower latency)

Page 42: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Tuning Throughput Findings

• Couchbase 2.0 binding in YCSB 0.10.0

• Nozzle increase (effect subject to memory allocation):
  sourceNozzlePerNode=4 (default 2)
  targetNozzlePerNode=4 (default 2)

• Batch size:
  workerBatchSize=2000 (default 500)
  docBatchSizeKb=4096 (default 2048)

• Optimization threshold:
  optimisticReplicationThreshold=10240 (default 256)
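These XDCR settings can be changed through the REST API, either globally (`/settings/replications`) or for one replication (`/settings/replications/<replication id>`, with the ID URL-encoded because it contains slashes). A sketch that builds, but does not send, such a request with the tuned values from the slide; host and credentials are caller-supplied placeholders, and choosing global vs per-stream scope is left open.

```python
import base64
import urllib.parse
import urllib.request

# Tuned values from the slide; defaults noted for reference.
TUNED = {
    "sourceNozzlePerNode": 4,                 # default 2
    "targetNozzlePerNode": 4,                 # default 2
    "workerBatchSize": 2000,                  # default 500
    "docBatchSizeKb": 4096,                   # default 2048
    "optimisticReplicationThreshold": 10240,  # default 256
}


def xdcr_settings_request(host, user, password, settings=TUNED, replication_id=None):
    """Build (but do not send) a POST that updates XDCR settings.

    host/user/password are placeholders supplied by the caller.
    """
    path = "/settings/replications"
    if replication_id is not None:
        # Per-replication settings: the ID must be fully URL-encoded.
        path += "/" + urllib.parse.quote(replication_id, safe="")
    req = urllib.request.Request(
        f"http://{host}:8091{path}",
        data=urllib.parse.urlencode(settings).encode(),
        method="POST",
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req
```

The benchmark's own numbers show the nozzle increase alone did not raise throughput (325K vs 330K/sec), so these values are candidates to test against a given memory allocation rather than defaults to copy.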

Page 43: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Nozzle Increase

100% Write Workload                  4 Nozzles    2 Nozzles
Max Thread # without Insert Errors   80           80
Avg Throughput                       325 K/sec    330 K/sec
Avg Latency                          0.2 ms       0.2 ms

XDCR Latency

[Figure: XDCR latency, 4 nozzles vs 2 nozzles]

Page 44: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Other Considerations for Higher XDCR Throughput

• Increasing the bucket RAM allocation to relieve memory pressure when high water marks are reached.

• Using a faster disk subsystem.

• Upgrading to Couchbase 4.5.x for its DCP cursor enhancements.

Page 45: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Summary

• A multi-data center Active-Active Couchbase deployment provides higher availability and scalability. This deployment pattern is used in production at PayPal.

• Scalability depends on the specific use case. Read-intensive use cases scale better (1.8x in 4 DCs) than write-intensive use cases.

• XDCR latency is largely driven by network latency, but it can be much higher than the actual network latency.

• Geographically close DCs tend to have more consistent data than remote DCs.

• XDCR and overall throughput max out independently. XDCR is likely to start lagging before a cluster reaches its maximum throughput.

Page 46: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016

Thank You!


Page 47: Benchmarking XDCR rings at PayPal – Couchbase Connect 2016


Share your opinion on Couchbase:

1. Go here: http://gtnr.it/2eRxYWn

2. Create a profile

3. Provide feedback (~15 minutes)
