Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks

Andrey Zaychikov, Solutions Architect, EMEA, 21.02.2017

A typical algorithm for choosing the right options for NoSQL DB deployments

What will we cover today?

How do these databases differ?

• Cloud-based: DynamoDB
• Self-managed (EC2): key-value, document-oriented, graph

Cassandra

What is it?
• Dynamo-model database + CQL
• Horizontally scalable
• No single point of failure
• Data is immutable and stored in collections
• JVM-based
• A lot of management work is done in the background
• Relies on the gossip protocol

Main customer concerns
• Schema & usage pattern
• Geo distribution
• Background routines & specific optimizations

How does it work?

Choosing instance & storage capacity: 80% Writes

• For most workloads (especially with a 50/50 read/write ratio), M4 instances with EBS are the best option

• For write-heavy workloads with high RPS requirements, consider C4 instances with EBS

• When performance requirements are high and the dataset is relatively small, you can use I2 instances with ephemeral storage

Choosing instance & storage capacity: 80% Reads

• For most workloads, M4 instances with EBS are a good choice

• When performance requirements are high and the dataset is relatively small, you can use I2 instances with ephemeral storage

• When performance requirements are high and the dataset is large, the best option is R4 instances with different EBS volume types
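The rules of thumb from the two instance-selection slides (80% writes and 80% reads) can be condensed into a small decision helper. This is a minimal sketch: the function name `suggest_cassandra_instance`, its parameters, and the 0.8 / 0.2 write-share thresholds (taken from the slide titles) are assumptions for illustration, not an AWS or Cassandra API.

```python
def suggest_cassandra_instance(write_share: float,
                               high_performance: bool,
                               small_dataset: bool) -> str:
    """Hypothetical helper encoding the slide guidance for Cassandra on EC2.

    write_share: fraction of requests that are writes (0.0 to 1.0).
    high_performance: True when RPS / latency requirements are high.
    small_dataset: True when the dataset fits on local instance storage.
    """
    if not 0.0 <= write_share <= 1.0:
        raise ValueError("write_share must be between 0 and 1")
    # High performance + small dataset: local (ephemeral) disks, on both slides.
    if high_performance and small_dataset:
        return "I2 + ephemeral storage"
    # Write-heavy (assumed >= 80% writes) with high RPS: compute-optimized C4.
    if high_performance and write_share >= 0.8:
        return "C4 + EBS"
    # Read-heavy (assumed >= 80% reads) with a large dataset: memory-optimized R4.
    if high_performance and write_share <= 0.2:
        return "R4 + EBS"
    # Default for most workloads, including a 50/50 read/write ratio.
    return "M4 + EBS"


print(suggest_cassandra_instance(0.5, False, False))  # M4 + EBS
print(suggest_cassandra_instance(0.9, True, False))   # C4 + EBS
```

The order of the checks matters: the "small dataset, high performance" rule appears on both slides, so it is tested first and wins regardless of the read/write mix.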

FAQ: 2AZ cluster architecture

Hint: RetryPolicy for Cassandra Driver
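Why a retry policy matters in a two-AZ layout comes down to quorum arithmetic. With replication factor 3 spread 2 + 1 across two AZs (for example, when Ec2Snitch treats each AZ as a rack), losing the AZ that holds two replicas leaves one, so QUORUM (2 of 3) requests fail while ONE still succeeds. The sketch below models that decision in plain Python. It is not the driver's RetryPolicy interface: the function names and the "downgrade to ONE, retry once" rule are assumptions, similar in spirit to the downgrading-consistency retry policies found in some DataStax drivers, and should be used with care because they silently weaken consistency.

```python
from typing import Dict, Optional, Tuple


def required_replicas(consistency: str, rf: int) -> int:
    """Replicas that must respond (simplified single-datacenter view)."""
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    if consistency not in levels:
        raise ValueError(f"unsupported consistency level: {consistency}")
    return levels[consistency]


def alive_after_az_loss(placement: Dict[str, int], failed_az: str) -> int:
    """Replicas of a partition still reachable when one AZ goes down."""
    return sum(n for az, n in placement.items() if az != failed_az)


def on_unavailable(consistency: str, rf: int, alive: int,
                   retry_count: int) -> Tuple[str, Optional[str]]:
    """Hypothetical retry decision; returns (action, consistency level)."""
    if alive >= required_replicas(consistency, rf):
        # Enough replicas: treat the failure as transient and retry once unchanged.
        return ("RETRY", consistency) if retry_count == 0 else ("RETHROW", None)
    if retry_count == 0 and alive >= 1:
        # Downgrade once to ONE: keeps the app available, weakens consistency.
        return ("RETRY", "ONE")
    return ("RETHROW", None)


# RF=3 placed 2 + 1 across two AZs (hypothetical AZ names).
placement = {"az-a": 2, "az-b": 1}
alive = alive_after_az_loss(placement, "az-a")
print(alive, required_replicas("QUORUM", 3))    # 1 2
print(on_unavailable("QUORUM", 3, alive, 0))    # ('RETRY', 'ONE')
```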

FAQ

Cassandra backup / restore
- The restore procedure for the whole cluster can be complicated
- A single node can be restored with EBS snapshots

Auto Scaling of Cassandra clusters
- Auto Scaling puts unpredictable pressure on the cluster
- Scaling up is simple, but scaling down is extremely complicated

Cassandra in containers
- Makes sense only for test / dev environments

FAQ: Troubleshooting
• JVM
• Caching
• Compaction
• Disk I/O
• CPU
• Memory

MongoDB

What is it?
• Document-oriented database
• Horizontally scalable
• HA is based on master / slave replication
• Geo-distributed
• A lot of management work is done in the background

Main customer concerns
• Schema & usage pattern
• Geo distribution and performance
• Data consistency & partition tolerance

How does it work?

Choosing instance & storage

• MongoDB needs a lot of memory and really fast disks, so unless your dataset is quite big, the best option is either R3 or I2 (depending on the size of the dataset)

• If the dataset is big, consider R4 with different EBS volume types

• For hidden nodes, use M4 with EBS, since EBS snapshots make it easy to back up the data

FAQ: 2AZ cluster architecture

Best option: the replica set in one AZ and a hidden member in the other.
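A replica set configuration for this layout might look like the document below, built here as a Python dict of the shape passed to `rs.initiate()`. MongoDB requires hidden members to have `priority: 0`, so the member in the second AZ can never become primary. The hostnames, the set name `rs0`, and the choice of `votes: 0` (to keep the number of voting members odd, as MongoDB recommends) are assumptions; the helper only validates the config locally and does not talk to a server.

```python
import json


def replica_set_config(name, primary_az_hosts, hidden_host):
    """Hypothetical config: electable members in one AZ, a hidden member in the other."""
    members = [{"_id": i, "host": h} for i, h in enumerate(primary_az_hosts)]
    members.append({
        "_id": len(members),
        "host": hidden_host,
        "priority": 0,   # hidden members must have priority 0
        "hidden": True,  # invisible to clients; a good source for EBS snapshots
        "votes": 0,      # assumption: keep the number of voting members odd
    })
    return {"_id": name, "members": members}


def validate(config):
    """Check the rules this layout relies on; returns the number of voting members."""
    for m in config["members"]:
        if m.get("hidden") and m.get("priority", 1) != 0:
            raise ValueError(f"hidden member {m['host']} must have priority 0")
        if m.get("votes", 1) == 0 and m.get("priority", 1) != 0:
            raise ValueError(f"non-voting member {m['host']} must have priority 0")
    voters = sum(1 for m in config["members"] if m.get("votes", 1) > 0)
    if voters % 2 == 0:
        raise ValueError("use an odd number of voting members")
    return voters


config = replica_set_config(
    "rs0",
    ["mongo-a1.example.internal:27017", "mongo-a2.example.internal:27017",
     "mongo-a3.example.internal:27017"],
    "mongo-b1.example.internal:27017",
)
print(validate(config))  # 3
print(json.dumps(config, indent=2))
```

Whichever way the hidden member's vote is set, it cannot take over automatically if the whole primary AZ fails, so recovering from the second AZ is a manual step (for example, reconfiguring the set or restoring from its EBS snapshots).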

FAQ

MongoDB backup / restore
- Use hidden nodes with EBS, backed up with EBS snapshots

Querying large amounts of data
- Design the schema properly
- Avoid running MapReduce on the master

MongoDB consistency
- Lots of improvements were made, but there are still some edge cases

FAQ: Troubleshooting
• Mongos performance
• Long-running queries
• Fragmentation
• Disk I/O
• CPU
• Memory

CouchDB

What is it?
• Document-oriented database built on the Dynamo model
• Supports a RESTful API
• Eventual consistency
• Lockless, optimistic concurrency with conflict resolution
• Horizontally scalable (with constraints)
• Offline-first database
• MapReduce to prepare views
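The lockless, optimistic model means every document carries a revision (`_rev` in CouchDB), and an update is accepted only if it names the current revision; otherwise CouchDB answers `409 Conflict` and the client must re-read and retry. The in-memory class below imitates that behaviour to show the control flow. It is a simplified model, not a CouchDB client: the class `RevStore`, the helper `update_with_retry`, and the SHA-256-based revision hash are hypothetical stand-ins (revision strings of the form `N-hash` only loosely mimic CouchDB's).

```python
import hashlib
import json


class Conflict(Exception):
    """Stale revision; CouchDB reports this as HTTP 409 Conflict."""


class RevStore:
    """Hypothetical in-memory stand-in for a CouchDB database."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # a copy, like a fresh GET

    def put(self, doc_id, doc):
        current = self._docs.get(doc_id)
        # Optimistic check: no lock is held; a stale _rev is simply rejected.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise Conflict(doc_id)
        generation = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        # Stand-in content hash (CouchDB computes its own revision hash).
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:32]
        self._docs[doc_id] = dict(body, _rev=f"{generation}-{digest}")
        return self._docs[doc_id]["_rev"]


def update_with_retry(store, doc_id, change, attempts=3):
    """Read-modify-write loop: on conflict, re-read the latest revision and retry."""
    for _ in range(attempts):
        doc = store.get(doc_id)
        change(doc)
        try:
            return store.put(doc_id, doc)
        except Conflict:
            continue
    raise Conflict(doc_id)


store = RevStore()
store.put("order-1", {"qty": 1})
stale = store.get("order-1")
store.put("order-1", dict(stale, qty=2))         # accepted: _rev matched
try:
    store.put("order-1", dict(stale, qty=5))     # rejected: _rev is now outdated
except Conflict:
    print("conflict detected")
rev = update_with_retry(store, "order-1", lambda d: d.update(qty=d["qty"] + 1))
print(store.get("order-1")["qty"], rev.split("-")[0])  # 3 3
```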

How does it work?

Choosing instance & storage

FAQ: 2AZ cluster architecture

• You have to plan the replication schema yourself, so it is your responsibility to check how it behaves in case of a DR event
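One way to plan the schema for two AZs is continuous, bidirectional replication between a node in each AZ. In CouchDB this is expressed as replication documents with `source`, `target` and `continuous` fields, stored in the `_replicator` database (or POSTed to `/_replicate`). The sketch only builds those JSON bodies: the URLs, the database name `orders`, and the `_id` naming scheme are hypothetical, authentication is omitted, and it does not test how the pair behaves during a DR event.

```python
import json
from itertools import permutations


def replication_docs(db, nodes):
    """Build continuous replication documents between every pair of nodes.

    nodes: mapping of a label (e.g. the AZ) to a CouchDB base URL.
    With two nodes this yields A -> B and B -> A (bidirectional).
    """
    docs = []
    for (src_name, src_url), (dst_name, dst_url) in permutations(nodes.items(), 2):
        docs.append({
            "_id": f"{db}-{src_name}-to-{dst_name}",
            "source": f"{src_url}/{db}",   # credentials omitted in this sketch
            "target": f"{dst_url}/{db}",
            "continuous": True,            # keep following the source's changes
        })
    return docs


docs = replication_docs("orders", {
    "az-a": "http://couch-a.example.internal:5984",   # hypothetical hosts
    "az-b": "http://couch-b.example.internal:5984",
})
print(json.dumps(docs, indent=2))
```

Because both directions are active, the same document can be edited in both AZs during a partition; CouchDB keeps the conflicting revisions, and the application still has to resolve them.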

FAQ
• Proper replication schema
• Indexed views & their performance
• Proxy for requests

Aerospike

What is it?
• In-memory key-value database
• High and constant performance
• Shared-nothing architecture
• Geo-distributed (hash partitions)
• Master-slave replication
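Hash partitioning is what lets a shared-nothing design route requests without a coordinator: each record key is hashed to a digest, the digest picks one of Aerospike's 4096 partitions, and each partition has one master node plus, for replication factor 2, one replica on a different node. The sketch below models that routing. SHA-256 stands in for Aerospike's own digest function (RIPEMD-160), and the round-robin partition map is a simplification: real clusters compute the map themselves. All names are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike's fixed partition count


def partition_id(set_name: str, key: str) -> int:
    """Stand-in digest (SHA-256 of set + key); 12 bits of it select the partition."""
    digest = hashlib.sha256(f"{set_name}:{key}".encode()).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def partition_map(nodes, replication_factor=2):
    """Simplified round-robin map: partition -> [master, replica, ...]."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed node count")
    return {
        pid: [nodes[(pid + r) % len(nodes)] for r in range(replication_factor)]
        for pid in range(N_PARTITIONS)
    }


def route(pmap, set_name, key):
    """Writes go to the master; the master replicates to the replica(s)."""
    owners = pmap[partition_id(set_name, key)]
    return owners[0], owners[1:]


nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names
pmap = partition_map(nodes)
master, replicas = route(pmap, "users", "user:42")
print(partition_id("users", "user:42"), master, replicas)
```

Because every partition has exactly one master, nodes never need to agree on a write with each other before accepting it; the partition map alone decides ownership.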

How does it work?

Choosing instance & storage

• Aerospike is used when performance requirements are extreme. It needs a lot of memory and very fast disks, which is why EC2 instances with ephemeral storage are the first choice for Aerospike deployments.

FAQ: 2AZ cluster architecture

• If one AZ goes down, depending on your replication factor you will still have a copy of the data

• Aerospike can add more nodes and replicate data to them without putting much pressure on the existing nodes

• Replicating the data takes time
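Whether a copy survives an AZ outage depends on the replication factor and on the copies of each partition landing in different AZs (Aerospike's rack-awareness configuration is one way to get that placement). The sketch below counts, for an AZ-aware round-robin placement, how many partitions keep at least one copy after losing an AZ and how many must be re-replicated afterwards, which is the work that "takes time". The placement rule and AZ names are assumptions.

```python
N_PARTITIONS = 4096


def replica_azs(pid, azs, rf):
    """AZ of each copy of partition `pid` (copy 0 is the master)."""
    return [azs[(pid + r) % len(azs)] for r in range(rf)]


def surviving_fraction(azs, rf, failed_az):
    """Share of partitions that still have at least one copy."""
    kept = sum(
        1 for pid in range(N_PARTITIONS)
        if any(az != failed_az for az in replica_azs(pid, azs, rf))
    )
    return kept / N_PARTITIONS


def partitions_to_rereplicate(azs, rf, failed_az):
    """Partitions that lost a copy and need it rebuilt on the remaining nodes."""
    return sum(1 for pid in range(N_PARTITIONS)
               if failed_az in replica_azs(pid, azs, rf))


azs = ["az-a", "az-b"]
for rf in (1, 2):
    print(rf, surviving_fraction(azs, rf, "az-a"),
          partitions_to_rereplicate(azs, rf, "az-a"))
# 1 0.5 2048
# 2 1.0 4096
```

With replication factor 2 no data is lost, but every partition lost one copy, so the whole dataset's worth of replicas must be rebuilt in the surviving AZ.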

FAQ

Aerospike backup / restore
- The restore procedure for the whole cluster can be complicated
- A single node can be restored with EBS snapshots

Auto Scaling of Aerospike clusters
- Auto Scaling puts unpredictable pressure on the cluster
- Scaling up is simple, but scaling down is complicated

Aerospike in containers
- Does not make any sense

FAQ: Troubleshooting
• Disk I/O
• CPU
• Memory

Neo4j

What is it?
• Graph database
• JVM-based
• Provides a REST API
• Two clustering modes: HA cluster & Causal Cluster
• Two types of nodes: Core nodes & Read replicas (Raft protocol)
• Uses the Cypher query language

(Diagram: Neo4j Causal Clustering)

How does it work?

Choosing instance & storage

FAQ: 2AZ cluster architecture

• If an AZ fails and the master node was in it, a new master election is initiated

• In Causal Cluster mode, Core nodes vote by simple majority

• If a majority is unavailable, the cluster becomes read-only
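The majority rule makes the AZ split of Core members decisive. With three Core members placed 2 + 1, losing the AZ that holds two leaves one of three, which is not a majority, so the cluster becomes read-only; losing the other AZ leaves two of three and writes continue. The function below does that arithmetic; the AZ names and member counts are illustrative, and it assumes that surviving members still serve reads.

```python
def mode_after_az_loss(cores_by_az, failed_az):
    """Return 'read-write' if a simple majority of Core members survives."""
    total = sum(cores_by_az.values())
    remaining = total - cores_by_az.get(failed_az, 0)
    majority = total // 2 + 1  # simple majority of all Core members
    return "read-write" if remaining >= majority else "read-only"


print(mode_after_az_loss({"az-a": 2, "az-b": 1}, "az-a"))  # read-only
print(mode_after_az_loss({"az-a": 2, "az-b": 1}, "az-b"))  # read-write
print(mode_after_az_loss({"az-a": 2, "az-b": 2}, "az-b"))  # read-only
```

The last case shows that an even split across two AZs does not help: four Core members need three for a majority, so losing either AZ leaves the cluster read-only.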

FAQ: Troubleshooting
• JVM
• Page caching
• Disk I/O
• CPU
• Memory

NoSQL on EC2: Cost considerations

General cost considerations
• Usage pattern (R/W)
• RPS
• Size of the dataset
• Traffic costs
• Object size
• Number of nodes
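Several of these factors (number of nodes, dataset size, traffic) combine into a rough monthly estimate, while the usage pattern, RPS and object size mostly show up indirectly through the instance type and node count. The sketch below is arithmetic only: every price is a parameter you must fill in from current AWS pricing for your region, the `ClusterCostInputs` fields are hypothetical names, and it ignores snapshots, provisioned IOPS and Reserved Instance discounts.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12


@dataclass
class ClusterCostInputs:
    nodes: int
    instance_price_per_hour: float   # on-demand price of one instance
    ebs_gb_per_node: float           # 0 for instance-store-only nodes
    ebs_price_per_gb_month: float
    transfer_gb_per_month: float     # e.g. replication traffic between AZs
    transfer_price_per_gb: float


def monthly_cost(c: ClusterCostInputs) -> float:
    compute = c.nodes * c.instance_price_per_hour * HOURS_PER_MONTH
    storage = c.nodes * c.ebs_gb_per_node * c.ebs_price_per_gb_month
    traffic = c.transfer_gb_per_month * c.transfer_price_per_gb
    return round(compute + storage + traffic, 2)


# Made-up prices, for illustration only.
example = ClusterCostInputs(nodes=6, instance_price_per_hour=0.25,
                            ebs_gb_per_node=500, ebs_price_per_gb_month=0.125,
                            transfer_gb_per_month=1000, transfer_price_per_gb=0.02)
print(monthly_cost(example))  # 1490.0
```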

Cost: Performance / Size

• If you want to remain cost-effective and efficient, treat deployment as an ongoing journey

• Consider EBS as the main option for most workloads

• If your performance requirements are really high and the dataset is relatively small, consider EC2 with ephemeral storage; otherwise, go for EC2 with EBS

Sum up

• There is no general solution for all cases

• Context matters, and the solution should follow the changing context

• Apps and code should be adapted to the way NoSQL DBs work

• The initial choice of deployment options can be changed

• The best way to make the initial deployment choice is a PoC

Thank you!