Management and Automation of MongoDB Clusters - Slides

Confidential

MongoDB/TokuMX Automation & Management

Jon Tobin

Tokutek

[email protected]

April 24, 2014

Vinay Joosery

Severalnines

[email protected]





Webinar Housekeeping

This webinar is being recorded

A link to the recording and to a copy of the slides will be posted on tokutek.com

We welcome questions: enter questions into the chat box and we will respond at the end of the presentation

Think of something later? Email Tokutek at [email protected] or Severalnines at [email protected]




Agenda

MongoDB & Automation

What is operational management?

MongoDB Management caveats

Automation & Management by ClusterControl

Demo

Copyright Severalnines AB


Database TCO


Source: IDC, Maximizing the Business Value of Enterprise Database Applications


Developer’s view of the world

(Diagram showing the App and the DB)


DBA’s view of the world


(Diagram showing the DB and the App)


MongoDB and Automation

MongoDB is great for developers

MongoDB is not as great for ops folks

Lack of operational tools: MMS Management is mainly a monitoring tool; MMS Automation is in alpha

Perhaps not surprising for a 5-year-old product

General-purpose tools can help somewhat, e.g. Puppet, Chef. However…



Drawback with Puppet or Chef

Puppet/Chef are appropriate for a group of single-node components, e.g. webservers can be clones of each other: deploy 10 webservers, and they all look the same

Distributed databases are more complex: different node types, different roles and responsibilities, a specific order for procedures

Using e.g. Chef for deploying a distributed database: yes, it is possible. But how much Chef functionality is actually leveraged vs how much code is written by the user?



What do Ops folks do? Deployment

Optimal hardware (CPU/RAM/Disk)

What topology to start with?

Virtualized or bare metal? Cloud?

Multi-region or multi-AZ?

Good initial configuration settings for DB

OS tuning (high dependency)

Monitoring the DB + underlying OS

Logging



What do Ops folks do? Performance monitoring

What do you do when the application is slow?

Is it Disk? CPU? RAM? Badly written queries?

What are the symptoms? (replication lag, page faults, locks, number of connections, …)

Do you need to scale?

How do you scale?

Capacity planning
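Replication lag, one of the symptoms listed above, can be estimated from the `rs.status()` output (the `replSetGetStatus` command): each entry in its `members` array carries `name`, `stateStr` and `optimeDate`. The sketch below is a minimal illustration, assuming you have already fetched that array into plain Python dicts; it does not connect to a server, and the host names and timestamps are made up.

```python
from datetime import datetime, timedelta

def replication_lag(members):
    """Return {member name: lag in seconds} for every SECONDARY.

    `members` mimics the `members` array of rs.status(); only
    `name`, `stateStr` and `optimeDate` are used.
    """
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one PRIMARY, found %d" % len(primaries))
    primary_optime = primaries[0]["optimeDate"]
    # Lag = how far each secondary's last applied op trails the primary's.
    return {
        m["name"]: (primary_optime - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }

# Hypothetical sample: a 3-member set where one secondary is 45 s behind.
now = datetime(2014, 4, 24, 12, 0, 0)
members = [
    {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
    {"name": "db2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=2)},
    {"name": "db3:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=45)},
]
print(replication_lag(members))  # {'db2:27017': 2.0, 'db3:27017': 45.0}
```

A single lagging secondary, as in the sample, points towards that host (its disk, CPU or network) rather than the application.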



Vertical vs Horizontal scaling



What do Ops folks do? Availability

Keep the service running

How do you detect that something has failed?

Drilling down to the root cause

Manual vs automatic failover

How do you avoid failures?
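For the detection question, one simple approach is to scan the `rs.status()` member documents for members whose `health` field is not 1 or whose `lastHeartbeat` timestamp is stale. This is a hedged sketch: the 10-second staleness threshold is an assumption to tune for your environment, and the sample data is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed threshold: a member not heard from for this long is treated as suspect.
HEARTBEAT_STALE_AFTER = timedelta(seconds=10)

def suspect_members(members, now):
    """Return the names of members that look failed.

    `members` mimics the `members` array of rs.status(). The member the
    status was read from is marked `self: True` and has no `lastHeartbeat`.
    """
    suspects = []
    for m in members:
        if m.get("health") != 1:
            suspects.append(m["name"])  # reported down by the set itself
            continue
        if m.get("self"):
            continue  # we are talking to it, so it is up
        last = m.get("lastHeartbeat")
        if last is None or now - last > HEARTBEAT_STALE_AFTER:
            suspects.append(m["name"])  # up, but heartbeats have gone quiet
    return suspects

now = datetime(2014, 4, 24, 12, 0, 0)
members = [
    {"name": "db1:27017", "health": 1, "self": True},
    {"name": "db2:27017", "health": 1, "lastHeartbeat": now - timedelta(seconds=2)},
    {"name": "db3:27017", "health": 0, "lastHeartbeat": now - timedelta(seconds=60)},
]
print(suspect_members(members, now))  # ['db3:27017']
```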


What do Ops folks do? Management

Backup and Restore

Software upgrades and rolling restarts

Configuration changes

Adding nodes or shards

Rebalancing of shards

Compaction



Monitoring is not Management



Management caveats (1/2)

1 config server instead of 3

Starting only 2 config servers is not good enough: the config becomes read-only, so there can be no changes in cluster state (no new shards can be added, no new users with userid/password, …)

> 2 routing servers: 1 router only is a SPOF

ReplicaSet: odd number of replicas, at least 3, to handle voting / network partitioning. To build a ReplicaSet, start with a first node and initiate it (rs.initiate()), then add the other nodes in the ReplicaSet to it.
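The odd-number rule follows from majority voting: a replica set can only elect a primary while a strict majority of its voting members can see each other. The sketch below computes that majority and, as an illustration, checks a planned topology against the caveats on this slide. The `check_topology` helper and its inputs are hypothetical; its router check simply flags a single mongos, since one router alone is a SPOF.

```python
def majority(voters):
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voters // 2 + 1

def tolerated_failures(voters):
    """Voting members that can be lost while a primary can still be elected."""
    return voters - majority(voters)

def check_topology(config_servers, routers, replica_set_sizes):
    """Return warnings for a planned sharded cluster (hypothetical helper)."""
    warnings = []
    if config_servers != 3:
        warnings.append("expected 3 config servers, found %d" % config_servers)
    if routers < 2:
        warnings.append("a single mongos router is a SPOF")
    for name, size in replica_set_sizes.items():
        if size < 3 or size % 2 == 0:
            warnings.append("%s: use an odd number of members, at least 3 (found %d)" % (name, size))
    return warnings

for n in (2, 3, 4, 5):
    print(n, "members: majority", majority(n), "tolerates", tolerated_failures(n))
# 2 members: majority 2 tolerates 0
# 3 members: majority 2 tolerates 1
# 4 members: majority 3 tolerates 1
# 5 members: majority 3 tolerates 2
```

The output shows why a 4th member adds no fault tolerance over 3, and why a 2-member set cannot survive losing either node.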

Sharding: pre-defined order for procedures. Start the config servers (start with 1 node, then add the rest to it), start the mongos routers (start with 1 router, then add more routers), then build a ReplicaSet and add it as a shard.
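The required order can be written down as an explicit plan. The sketch below only builds the list of steps and executes nothing. The `--configdb host1,host2,host3` form matches the three-config-server deployments of the MongoDB 2.x era; `rs.initiate()`, `rs.add()` and `sh.addShard()` are mongo shell helpers; all host names are made up.

```python
def sharded_cluster_plan(config_servers, routers, shards):
    """Return bring-up steps in the order described on the slide.

    `shards` maps a replica set name to its "host:port" members.
    The step strings are illustrative, not a complete runbook.
    """
    steps = []
    # 1. Config servers first: one node, then the rest.
    for host in config_servers:
        steps.append("start mongod --configsvr on " + host)
    # 2. Routers next; every mongos points at the same config servers.
    configdb = ",".join(config_servers)
    for host in routers:
        steps.append("start mongos --configdb %s on %s" % (configdb, host))
    # 3. Per shard: initiate the replica set on one node, add the others,
    #    then register the replica set as a shard through a mongos.
    for rs_name, members in shards.items():
        first = members[0]
        steps.append("on %s: rs.initiate()" % first)
        for host in members[1:]:
            steps.append('on %s: rs.add("%s")' % (first, host))
        steps.append('on a mongos: sh.addShard("%s/%s")' % (rs_name, first))
    return steps

plan = sharded_cluster_plan(
    ["cfg1:27019", "cfg2:27019", "cfg3:27019"],
    ["app1:27017", "app2:27017"],
    {"rs0": ["db1:27018", "db2:27018", "db3:27018"]},
)
for step in plan:
    print(step)
```

Encoding the order once, instead of in every recipe, is the kind of code a user ends up writing around a generic tool such as Chef.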



Management caveats (2/2)

Backups: lock a node, flush, then take a snapshot. For a sharded cluster, it is a bit more complicated: config server data needs to be saved, and all shards need to be backed up at the same time for a cluster-wide snapshot.
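The lock, flush, snapshot sequence is easy to get subtly wrong: if the snapshot step fails, the node must still be unlocked. A minimal sketch, assuming a `run_command(name, **options)` callable that sends an admin command (with pymongo this could be `client.admin.command`). `fsync` with `lock=True` flushes and blocks writes; `fsyncUnlock` is the unlock command in MongoDB 3.2 and later (older shells used the `db.fsyncUnlock()` helper). `take_snapshot` is a hypothetical placeholder for your volume snapshot tool.

```python
from contextlib import contextmanager

@contextmanager
def fsync_locked(run_command):
    """Flush and lock a mongod for the duration of the block; always unlock."""
    run_command("fsync", lock=True)  # flush to disk and block writes
    try:
        yield
    finally:
        run_command("fsyncUnlock")   # runs even if the snapshot fails

# Stand-ins so the sketch runs without a server.
calls = []

def fake_run_command(name, **options):
    calls.append((name, options))

def take_snapshot():
    # Hypothetical placeholder, e.g. an LVM or cloud volume snapshot.
    calls.append(("snapshot", {}))

with fsync_locked(fake_run_command):
    take_snapshot()

print(calls)  # [('fsync', {'lock': True}), ('snapshot', {}), ('fsyncUnlock', {})]
```

For a sharded cluster the same pattern would have to be applied to every shard and to the config server data at the same moment, which is what makes a cluster-wide snapshot harder.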

Rolling upgrades: configuration changes (e.g. moving a node to a more powerful server), version upgrades/patches, … E.g. in a 3-node replica set, do not shut down 2 nodes: the 3rd node will become secondary/read-only.

Defragmentation, resharding, index rebuilds, etc.
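For the rolling-upgrade caveat, a safe order in a replica set is to take secondaries down one at a time, waiting for each to rejoin, and to handle the primary last after asking it to step down with `rs.stepDown()`. The sketch below builds such a plan and simulates it to confirm that no more than one member is ever down, so a 3-node set keeps its majority. The step names are illustrative, and the input mapping of host to state is an assumption.

```python
def rolling_upgrade_order(members):
    """Return (host, action) steps that keep at most one member down.

    `members` maps "host:port" -> "PRIMARY" or "SECONDARY".
    Secondaries go first, one by one; the primary steps down and goes last.
    """
    primaries = [h for h, s in members.items() if s == "PRIMARY"]
    secondaries = [h for h, s in members.items() if s == "SECONDARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one PRIMARY")
    steps = []
    for host in secondaries:
        steps += [(host, "stop"), (host, "upgrade/reconfigure"),
                  (host, "start"), (host, "wait until SECONDARY and caught up")]
    primary = primaries[0]
    steps += [(primary, "rs.stepDown()"), (primary, "stop"),
              (primary, "upgrade/reconfigure"), (primary, "start")]
    return steps

def max_members_down(steps):
    """Simulate the plan and return the largest number of members down at once."""
    down, worst = set(), 0
    for host, action in steps:
        if action == "stop":
            down.add(host)
        elif action == "start":
            down.discard(host)
        worst = max(worst, len(down))
    return worst

upgrade_plan = rolling_upgrade_order(
    {"db1:27017": "PRIMARY", "db2:27017": "SECONDARY", "db3:27017": "SECONDARY"}
)
print(max_members_down(upgrade_plan))  # 1
```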



ClusterControl Automation & Management

Provisioning: auto-deploy a sharded cluster in minutes, on-premise or in the cloud

Monitoring: 1-second resolution, both DB and OS stats, real-time and historical

Management: manage multiple clusters, multi-data-center; automate failover, upgrades, backups, …; one-click scaling



Demo: manage multiple clusters through one pane of glass


(Diagram: DB clusters in an internal data center on the US East Coast, an internal data center on the US West Coast and a public cloud, alongside an on-premise app)


Severalnines Customers


Agenda

Common User Issues

What’s TokuMX™

What are the advantages

What should I monitor

How does Severalnines help


Common Problems

I can’t ingest sources fast enough

My data is getting too big

I’m spending too much money on infrastructure

DB-level locking is slowing me down


What is TokuMX?

An open-source fork of MongoDB

Uses proprietary Fractal Trees

Keeps MongoDB APIs (no code change)

Replaces storage code

Builds on 8+ years of MySQL development


What are the Advantages?

Performance: concurrency (document-level vs DB-level locking); cache management (defined cache size vs memory-mapped files); efficient index maintenance (no IO required [Fractal Tree])

Compression: large blocks (4MB); 3 libraries (quicklz, zlib, lzma); flash friendly (fewer reads/writes)

Transactions: MVCC consistent reads (consistent snapshot of data); multi-statement commit/rollback
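To get a feel for the ratio and CPU trade-off between compression libraries, you can compress a sample of your own data. The sketch below uses Python's standard `zlib` and `lzma` modules; quicklz is not in the standard library, so it is left out, and the sample documents are synthetic. It says nothing about TokuMX's own 4MB block format, only about how the algorithms compare on the same bytes.

```python
import lzma
import time
import zlib

# Synthetic, repetitive JSON-like documents (hypothetical sample data).
docs = b"".join(
    b'{"_id": %d, "user": "user%d", "status": "active", "tags": ["a", "b"]}\n' % (i, i % 100)
    for i in range(10000)
)

for name, compress in (("zlib", lambda b: zlib.compress(b, 6)), ("lzma", lzma.compress)):
    start = time.perf_counter()
    packed = compress(docs)
    elapsed = time.perf_counter() - start
    # Ratio = raw size / compressed size; higher means less disk IO.
    print("%s: ratio %.1fx, %.1f ms" % (name, len(docs) / len(packed), elapsed * 1000))
```

Typically lzma compresses more tightly than zlib but takes more CPU time; measuring on your own data is the reliable way to choose.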


What Should I Monitor?

MongoDB performance: opcounters

Cache use: effectiveness of memory

Space: % full, compression

Disk utilization: what’s utilizing my disk(s)


Performance

Opcounters let you know how your app is using the database

Establish a baseline for normal behavior
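Opcounters are cumulative since the mongod started, so a rate needs two `db.serverStatus()` samples. The sketch below takes the `opcounters` sub-document (fields `insert`, `query`, `update`, `delete`, `getmore`, `command`) from two snapshots and converts the difference to operations per second; the sample numbers are made up. Collecting these rates over time under normal load gives you the baseline.

```python
OPS = ("insert", "query", "update", "delete", "getmore", "command")

def opcounter_rates(before, after, elapsed_seconds):
    """Ops/second for each opcounter between two serverStatus() snapshots."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for op in OPS:
        delta = after["opcounters"][op] - before["opcounters"][op]
        if delta < 0:
            # Counters only grow while mongod runs; a drop means a restart or reset.
            raise ValueError("counter %r went backwards; was mongod restarted?" % op)
        rates[op] = delta / elapsed_seconds
    return rates

before = {"opcounters": {"insert": 1000, "query": 5000, "update": 200,
                         "delete": 0, "getmore": 50, "command": 800}}
after = {"opcounters": {"insert": 1600, "query": 8000, "update": 260,
                        "delete": 0, "getmore": 50, "command": 860}}
rates = opcounter_rates(before, after, 60)
print(rates)
# {'insert': 10.0, 'query': 50.0, 'update': 1.0, 'delete': 0.0, 'getmore': 0.0, 'command': 1.0}
```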


Severalnines Performance


Cache Use

Know how effective your cache is

Know when you need to expand the cache
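Cache effectiveness is usually expressed as a hit ratio over an interval. TokuMX exposes cache statistics (the slides point to `db.serverStatus()`), but the exact field names are not covered here, so this sketch takes plain hit and miss counts as input. The 95% ratio and 90% full thresholds in `needs_more_cache` are assumptions to tune, not recommendations from TokuMX.

```python
GB = 1024 ** 3

def cache_hit_ratio(hits, misses):
    """Fraction of cache lookups served from memory during an interval."""
    lookups = hits + misses
    if lookups == 0:
        return None  # no traffic, ratio undefined
    return hits / lookups

def needs_more_cache(hit_ratio, used_bytes, limit_bytes, min_ratio=0.95, full_fraction=0.9):
    """Suggest expanding the cache when it is nearly full AND misses are frequent.

    Both thresholds are assumed defaults, not vendor guidance.
    """
    if hit_ratio is None:
        return False
    return hit_ratio < min_ratio and used_bytes / limit_bytes >= full_fraction

ratio = cache_hit_ratio(hits=900, misses=100)
print(ratio)                                      # 0.9
print(needs_more_cache(ratio, 7.5 * GB, 8 * GB))  # True
```

Requiring both conditions avoids growing a cache that is missing only because it has not warmed up yet.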


Severalnines Cache Use

How Much?

Are they clean?

Is it Effective?


Space

Don’t run out of space

Compression ratio will help predict future needs
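Knowing the compression ratio lets you translate raw data growth into on-disk growth. A back-of-the-envelope sketch, assuming linear growth and that new data compresses at the ratio observed so far (raw size divided by on-disk size); all figures are made up.

```python
GB = 1024 ** 3

def percent_full(used_bytes, capacity_bytes):
    return 100.0 * used_bytes / capacity_bytes

def days_until_full(capacity_bytes, used_bytes, raw_growth_per_day, compression_ratio):
    """Days until the volume fills, under linear growth at a fixed compression ratio."""
    on_disk_growth_per_day = raw_growth_per_day / compression_ratio
    if on_disk_growth_per_day <= 0:
        return float("inf")
    return (capacity_bytes - used_bytes) / on_disk_growth_per_day

print(percent_full(350 * GB, 500 * GB))                   # 70.0
print(days_until_full(500 * GB, 350 * GB, 20 * GB, 4.0))  # 30.0
```

Halving the compression ratio (e.g. from 4x to 2x) halves the remaining runway, which is why the ratio matters as much as the raw growth rate.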


Disk Utilization

Can be tricky

No one thing causes IO

Helps to troubleshoot if you can narrow it to reads or writes

Baselines can help decrease time to resolve
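A baseline turns "the disk looks busy" into a concrete question: is this reading unusual for this system? A minimal sketch, assuming you already collect per-second read and write rates (for example from iostat) and keep samples taken under normal load; the k = 3 standard deviation threshold and the sample values are assumptions. Checking reads and writes separately helps narrow the problem down.

```python
from statistics import mean, stdev

def is_anomalous(history, current, k=3.0):
    """Flag `current` if it lies more than k standard deviations from the baseline mean.

    `history` holds earlier samples of the same metric under normal load.
    """
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > k * sigma

baseline = {
    "reads_per_s": [120, 130, 125, 118, 127],
    "writes_per_s": [300, 310, 295, 305, 290],
}
current = {"reads_per_s": 410, "writes_per_s": 302}
flagged = [m for m in baseline if is_anomalous(baseline[m], current[m])]
print(flagged)  # ['reads_per_s']
```

Here only reads stand out, which would point towards cache misses or a new read-heavy query rather than write load.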


Severalnines Measuring Utilization

miss = read

If fsync rises…

Checkpoints: coming in TokuMX 1.4.2!


Everything Else

TokuMX tends to trade IO utilization for CPU: compression and decompression, Fractal Tree (FT) maintenance

Just monitor your CPUs like any other resource. Severalnines is exceptional at this… try it for yourself

db.serverStatus() is your friend: we’re moving interesting stats there to make them easier to monitor

*This is abnormal behavior, shown specifically for illustration!


Tokutek Customers


Thank You!

TokuMX: http://www.tokutek.com/products/tokumx-for-mongodb/

ClusterControl for MongoDB: www.severalnines.com/clustercontrol

Severalnines Blog (severalnines.com/blog)

Tokutek Blog (tokutek.com/tokuview)

More questions? Contact us at [email protected] or [email protected]



