Management and Automation of MongoDB Clusters - Slides
Confidential
MongoDB/TokuMX Automation & Management
Jon Tobin
Tokutek
April 24, 2014
Vinay Joosery
Severalnines
Webinar Housekeeping
This webinar is being recorded
A link to the recording and to a copy of the slides will be posted on tokutek.com
We welcome questions: enter them into the chat box and we will respond at the end of the presentation
Think of something later? Email Tokutek at [email protected] or Severalnines at [email protected]
Agenda
MongoDB & Automation
What is operational management?
MongoDB Management caveats
Automation & Management by ClusterControl
Demo
Copyright Severalnines AB
Database TCO
Source: IDC, Maximizing the Business Value of Enterprise Database Applications
Developer’s view of the world
[Diagram: App, DB]
DBA’s view of the world
[Diagram: DB, App]
MongoDB and Automation
MongoDB is great for developers
MongoDB is not as great for ops folks
  Lack of operational tools
  MMS Management: mainly a monitoring tool
  MMS Automation: in alpha
  Perhaps not surprising for a 5-year-old product
General-purpose tools such as Puppet and Chef can help somewhat
  However…
Drawback with Puppet or Chef
Puppet/Chef are appropriate for a group of single-node components
  E.g. webservers can be clones of each other: deploy 10 webservers and they all look the same
Distributed databases are more complex
  Different node types
  Different roles and responsibilities
  A specific order for procedures
Using Chef, for example, to deploy a distributed database
  Yes, it is possible
  But how much Chef functionality is actually leveraged vs. how much code is written by the user?
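The "specific order for procedures" can be modeled as a dependency graph and solved with a topological sort, which is exactly what cloned webservers never need. This is a minimal Python sketch under stated assumptions, not how Chef, Puppet, or ClusterControl work internally; the step names (`start_config_servers`, `add_shard_rs0`, and so on) are hypothetical labels for a small sharded cluster with two replica sets.

```python
from graphlib import TopologicalSorter

# Hypothetical deployment steps for a small sharded cluster.
# Each step maps to the set of steps that must complete before it starts.
DEPENDENCIES: dict[str, set[str]] = {
    "start_config_servers": set(),
    "start_first_mongos": {"start_config_servers"},
    "start_more_mongos": {"start_first_mongos"},
    "init_replica_set_rs0": set(),
    "init_replica_set_rs1": set(),
    "add_shard_rs0": {"start_first_mongos", "init_replica_set_rs0"},
    "add_shard_rs1": {"start_first_mongos", "init_replica_set_rs1"},
}


def deployment_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step in deployment_order(DEPENDENCIES):
        print(step)
```

For cloned webservers the dependency map would be empty and any order works; for the database, every produced order respects each "must run before" edge.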
What do Ops folks do? Deployment
Optimal hardware (CPU/RAM/Disk)
What topology to start with?
Virtualized or bare metal? Cloud?
Multi-region or multi-AZ?
Good initial configuration settings for DB
OS tuning (high dependency)
Monitoring the DB + underlying OS
Logging
What do Ops folks do? Performance monitoring
What do you do when the application is slow?
Is it Disk? CPU? RAM? Badly written queries?
What are the symptoms? (replication lag, page faults, locks, number of connections, …)
Do you need to scale?
How do you scale?
Capacity planning
Vertical vs Horizontal scaling
What do Ops folks do? Availability
Keep the service running
How do you detect something has failed?
Drilling down to the root cause
Manual vs automatic failover
How do you avoid failures?
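One common answer to "how do you detect something has failed?" is a heartbeat timeout: a node that has not answered for longer than some threshold is treated as down. A minimal sketch, assuming a monitor already records each node's last successful heartbeat; the `NodeState` type and the 10-second default are illustrative assumptions, not MongoDB or ClusterControl settings.

```python
from dataclasses import dataclass

# Illustrative threshold only; real systems tune this to their network.
DEFAULT_TIMEOUT_S = 10.0


@dataclass
class NodeState:
    """Hypothetical record kept by a monitoring process."""
    name: str
    last_heartbeat: float  # timestamp (seconds) of the last successful heartbeat


def failed_nodes(nodes: list[NodeState], now: float,
                 timeout: float = DEFAULT_TIMEOUT_S) -> list[str]:
    """Names of nodes whose last heartbeat is older than `timeout` seconds."""
    return [n.name for n in nodes if now - n.last_heartbeat > timeout]


if __name__ == "__main__":
    cluster = [NodeState("db1", 100.0), NodeState("db2", 95.0), NodeState("db3", 80.0)]
    print(failed_nodes(cluster, now=100.0))  # ['db3']
```

A detector like this only flags the symptom; choosing manual or automatic failover and drilling down to the root cause remain separate steps.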
What do Ops folks do? Management
Backup and Restore
Software upgrades and rolling restarts
Configuration changes
Adding nodes or shards
Rebalancing of shards
Compaction
Monitoring is not Management
Management caveats (1/2)
1 config server instead of 3
  Starting only 2 config servers is not good enough either
  Read-only config: no changes in cluster state
  No new shards can be added, no new users with userid/password, …
2 or more routing servers (mongos)
  Only 1 router is a single point of failure (SPOF)
Replica set: odd number of replicas
  At least 3, to handle voting and network partitioning
  To build a replica set, start with a first node and initiate it (rs.initiate()), then add the other nodes of the replica set to it
Sharding: pre-defined order for procedures
  Start the config servers (start with 1 node, then add the rest to it)
  Start the mongos routers (start with 1 router, then add more routers)
  Build a replica set and add it as a shard
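The "odd number of replicas" rule follows from majority voting: a primary can only be elected by a strict majority of the voting members. This minimal sketch computes the majority and the number of member failures each replica-set size can survive; it assumes every member votes.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    if voting_members < 1:
        raise ValueError("need at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: majority {majority(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Going from 3 to 4 members raises the majority to 3 without tolerating any extra failure, and a 2/2 network partition leaves neither side with a majority; that is why odd sizes are preferred.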
Management caveats (2/2)
Backups
  Lock a node, flush, then take a snapshot
  For a sharded cluster, it is a bit more complicated
    Config server data needs to be saved
    All shards must be backed up at the same time for a cluster-wide snapshot
Rolling upgrades
  Configuration changes (e.g. moving a node to a more powerful server), version upgrades/patches, …
  E.g. in a 3-node replica set, do not shut down 2 nodes: the 3rd node will become secondary/read-only
Defragmentation, resharding, index rebuilds, etc.
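A rolling upgrade keeps a voting majority online by taking down one member at a time: secondaries first, then the primary after it steps down. This is a minimal sketch of such a plan for a single replica set; the plan entries are descriptive strings rather than executed commands (only `rs.stepDown()` is a real shell helper), and `safe_to_stop` is a hypothetical helper that assumes all members vote.

```python
def safe_to_stop(voting_members: int, already_down: int) -> bool:
    """True if stopping one more member still leaves a strict majority running."""
    still_up = voting_members - already_down - 1
    return still_up >= voting_members // 2 + 1


def rolling_upgrade_plan(members: list[str], primary: str) -> list[str]:
    """Upgrade secondaries one at a time, then step down and upgrade the primary."""
    if primary not in members:
        raise ValueError("primary must be one of the members")
    # The plan never has more than one member offline at a time.
    if not safe_to_stop(len(members), already_down=0):
        raise ValueError("stopping even one member would lose the majority")
    steps: list[str] = []
    for member in members:
        if member != primary:
            steps += [f"stop {member}", f"upgrade {member}", f"start {member}",
                      f"wait until {member} is SECONDARY"]
    steps += [f"rs.stepDown() on {primary}", f"stop {primary}",
              f"upgrade {primary}", f"start {primary}"]
    return steps


if __name__ == "__main__":
    print(safe_to_stop(3, already_down=0))  # True
    print(safe_to_stop(3, already_down=1))  # False: the remaining node would be read-only
    for step in rolling_upgrade_plan(["db1", "db2", "db3"], primary="db1"):
        print(step)
```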
ClusterControl Automation & Management
Provisioning
  Auto-deploy a sharded cluster in minutes
  On-premise or in the cloud
Monitoring
  1-second resolution
  Both DB and OS stats
  Real-time and historical
Management
  Manage multiple clusters
  Multi-data-center
  Automate failover, upgrades, backups, …
  One-click scaling
Demo: Manage multiple clusters through a single pane of glass
[Diagram: ON-PREMISE APP; DB in Internal Data Center, East Coast US; DB in Internal Data Center, West Coast US; DB in Public Cloud; each site labeled "Oracle"]
Severalnines Customers
Agenda
Common User Issues
What is TokuMX™?
What are the advantages?
What should I monitor?
How does Severalnines help?
Common Problems
I can’t ingest sources fast enough
My data is getting too big
I’m spending too much money on infrastructure
Database-level locking is slowing me down
What is TokuMX?
An open-source fork of MongoDB
Uses proprietary Fractal Tree indexes
Keeps the MongoDB APIs (no code changes)
Replaces the storage code
Builds on 8+ years of MySQL development
What are the Advantages?
Performance
  Concurrency (document-level vs. database-level locking)
  Cache management (defined cache vs. memory-mapped files)
  Efficient index maintenance (no I/O required [Fractal Tree])
Compression
  Large blocks (4MB)
  3 libraries (quicklz, zlib, lzma)
  Flash-friendly (fewer reads/writes)
Transactions
  MVCC consistent reads (consistent snapshot of data)
  Multi-statement commit/rollback
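The compression trade-off can be tried with two of the three libraries named above, since Python's standard library ships `zlib` and `lzma` (quicklz is not included). This is a minimal sketch on synthetic, repetitive JSON-like data; it measures the codecs only and does not reproduce TokuMX's on-disk format or block layout.

```python
import lzma
import zlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, matching the block size on the slide


def sample_block(size: int = BLOCK_SIZE) -> bytes:
    """Synthetic, highly repetitive document data (real data compresses less)."""
    doc = b'{"user_id": 12345, "status": "active", "tags": ["a", "b"]}\n'
    return (doc * (size // len(doc) + 1))[:size]


def compression_ratios(block: bytes) -> dict[str, float]:
    """Uncompressed size divided by compressed size, per codec."""
    zlib_out = zlib.compress(block, 6)
    lzma_out = lzma.compress(block)
    # Compression must be lossless: verify the round trip.
    if zlib.decompress(zlib_out) != block or lzma.decompress(lzma_out) != block:
        raise RuntimeError("round trip failed")
    return {
        "zlib": len(block) / len(zlib_out),
        "lzma": len(block) / len(lzma_out),
    }


if __name__ == "__main__":
    for codec, ratio in compression_ratios(sample_block()).items():
        print(f"{codec}: {ratio:.1f}x")
```

Heavier codecs such as lzma generally trade more CPU for tighter output, which matches the later point that TokuMX trades I/O for CPU; actual ratios depend entirely on the data.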
What Should I Monitor?
MongoDB performance: opcounters
Cache use: effectiveness of memory
Space: % full, compression
Disk utilization: what’s using my disk(s)
Performance
Opcounters let you know how your app is using the database
Establish a baseline for normal behavior
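Because the opcounters in `db.serverStatus()` are cumulative since the server started, rates come from the difference between two samples. This is a minimal sketch, assuming the two dicts were taken from the `opcounters` section of serverStatus (with pymongo, e.g. `client.admin.command("serverStatus")["opcounters"]`) a known number of seconds apart; treating a negative delta as a restart is an assumption of this sketch.

```python
OPCOUNTER_FIELDS = ("insert", "query", "update", "delete", "getmore", "command")


def opcounter_rates(before: dict[str, int], after: dict[str, int],
                    interval_s: float) -> dict[str, float]:
    """Per-second rate of each opcounter between two serverStatus samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates: dict[str, float] = {}
    for field in OPCOUNTER_FIELDS:
        delta = after.get(field, 0) - before.get(field, 0)
        if delta < 0:
            # Counter went backwards: assume a restart reset it to zero.
            delta = after.get(field, 0)
        rates[field] = delta / interval_s
    return rates


if __name__ == "__main__":
    t0 = {"insert": 1000, "query": 5000, "update": 200, "delete": 0, "getmore": 10, "command": 300}
    t1 = {"insert": 1600, "query": 5900, "update": 260, "delete": 0, "getmore": 10, "command": 330}
    print(opcounter_rates(t0, t1, interval_s=60.0))
```

Recording these rates over a period of normal traffic gives the baseline the slide recommends.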
Severalnines Performance
Cache Use
Know how effective your cache is
Know when you need to expand the cache
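Cache effectiveness is usually summarized as a hit ratio: the share of lookups served from memory, where every miss becomes a disk read. This is a minimal sketch; the hit and miss counts are assumed to come from whatever cache counters your server exposes (the names used here are hypothetical), and the 95% threshold is an illustrative assumption rather than a TokuMX recommendation.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups served from memory (1.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 1.0


def should_expand_cache(hits: int, misses: int, min_ratio: float = 0.95) -> bool:
    """Flag the cache for expansion when the hit ratio drops below `min_ratio`."""
    return cache_hit_ratio(hits, misses) < min_ratio


if __name__ == "__main__":
    # Hypothetical counter deltas over one sampling interval.
    print(cache_hit_ratio(hits=9_000, misses=1_000))      # 0.9
    print(should_expand_cache(hits=9_000, misses=1_000))  # True
```

Compare the ratio against its own baseline over time; a low reading right after a restart, while the cache is still cold, is expected.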
Severalnines Cache Use
How much cache is used?
Are the cached pages clean?
Is it effective?
Space
Don’t run out of space
The compression ratio will help predict future needs
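The compression ratio turns raw data growth into on-disk growth, which is what actually fills the disk. This is a minimal sketch, assuming linear growth and a stable compression ratio; all sizes only need to share one unit (GB in the example).

```python
import math


def percent_full(capacity: float, used: float) -> float:
    """Share of the disk already used, in percent."""
    return 100 * used / capacity


def days_until_full(capacity: float, used: float,
                    raw_growth_per_day: float, compression_ratio: float) -> float:
    """Days until the disk fills, given uncompressed growth and a compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    on_disk_growth = raw_growth_per_day / compression_ratio
    if on_disk_growth <= 0:
        return math.inf  # data is not growing
    return (capacity - used) / on_disk_growth


if __name__ == "__main__":
    # 1,000 GB disk, 400 GB used, 30 GB/day of raw data compressed 5x.
    print(percent_full(1000, 400))            # 40.0
    print(days_until_full(1000, 400, 30, 5))  # 100.0
```

If the ratio drifts (for example, because new data compresses worse), re-run the estimate with the current ratio rather than the historical one.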
Disk Utilization
Can be tricky
No single thing causes I/O
It helps to troubleshoot if you can narrow it down to reads or writes
Baselines can help decrease time to resolution
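Baselines and the read/write split can be combined into a simple check: keep recent samples of read and write throughput under normal load, then flag whichever one the current sample departs from. This is a minimal sketch using a z-score; the 3-standard-deviation threshold and the sample values are illustrative assumptions.

```python
from statistics import mean, stdev


def deviates(baseline: list[float], current: float, threshold: float = 3.0) -> bool:
    """True if `current` is more than `threshold` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def io_anomalies(base_reads: list[float], base_writes: list[float],
                 cur_reads: float, cur_writes: float) -> list[str]:
    """Report whether reads, writes, or both look abnormal against their baselines."""
    result = []
    if deviates(base_reads, cur_reads):
        result.append("reads")
    if deviates(base_writes, cur_writes):
        result.append("writes")
    return result


if __name__ == "__main__":
    reads = [100, 110, 90, 105, 95]  # MB/s under normal load (made-up samples)
    writes = [50, 55, 45, 52, 48]
    print(io_anomalies(reads, writes, cur_reads=300, cur_writes=52))  # ['reads']
```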
Severalnines Measuring Utilization
A cache miss = a disk read
If fsync rises…
Checkpoints: coming in TokuMX 1.4.2!
Everything Else
TokuMX tends to trade I/O utilization for CPU
  Compression and decompression
  Fractal Tree (FT) maintenance
Just monitor your CPUs like any other resource
  Severalnines is exceptional at this… try it for yourself
db.serverStatus() is your friend
  We’re moving interesting stats there to make them easier to monitor
***This is abnormal behavior, shown specifically for illustration!
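Since `db.serverStatus()` returns one nested document, a monitoring script often flattens it into dotted metric names before storing or graphing them. This is a minimal sketch, assuming the document has already been fetched (for example with pymongo's `client.admin.command("serverStatus")`); it keeps numeric leaves only, so booleans, strings, lists, and dates are skipped.

```python
def flatten_metrics(status: dict, prefix: str = "") -> dict[str, float]:
    """Flatten a nested serverStatus-style document into {"a.b.c": number}."""
    metrics: dict[str, float] = {}
    for key, value in status.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dotted name.
            metrics.update(flatten_metrics(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            metrics[name] = value
    return metrics


if __name__ == "__main__":
    sample = {  # trimmed, made-up sample in serverStatus shape
        "host": "db1",
        "opcounters": {"insert": 1600, "query": 5900},
        "connections": {"current": 42, "available": 777},
        "ok": 1.0,
    }
    print(flatten_metrics(sample))
```

As new stats are added to serverStatus, they show up automatically under their dotted names without changing the collector.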
Tokutek Customers
Thank You!
TokuMX
http://www.tokutek.com/products/tokumx-for-mongodb/
ClusterControl for MongoDB www.severalnines.com/clustercontrol
Severalnines Blog (severalnines.com/blog)
Tokutek Blog (tokutek.com/tokuview)
More questions? Contact us at [email protected] or [email protected]