Deployment Best Practices
Transcript of Deployment Best Practices
![Page 1: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/1.jpg)
Solutions Architect, 10gen
Sandeep Parikh
#mongodbdays
Deployment Best Practices
![Page 2: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/2.jpg)
The Cycle of Deployment Prep
Prototype > Test > Monitor > Scale > Script
![Page 3: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/3.jpg)
Prototype Your Deployment
• You have to start somewhere
• Development is complete, deployment is next
• Sketch out some initial deployment parameters
– Hardware sizing
– Operating system
– Disk setup
– Storage layout, data vs. journal vs. log
![Page 4: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/4.jpg)
Prototyping Considerations
• Additional considerations
– Horizontal vs. vertical scale options
– Multiple datacenters
• Start thinking about data growth
– Do you know how your data will evolve?
– Does your data live in multiple collections/databases?
– Read-centric, write-centric or both?
• The more you start thinking about it, the better
![Page 5: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/5.jpg)
Test, Test, Test
• Generate a lot of data
– Write tests to measure bulk loading throughput
– Scaffolding can be used for staging, validation
• Build your indexes
– All in the beginning
– On the fly
• Script your app
– Can you simulate “expected” usage?
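A bulk-loading throughput test can be sketched in a few lines. The following is only an illustration, not the speaker's tooling: the document fields, the `measure_bulk_load` helper, and the in-memory list standing in for a collection are all assumptions. In a real test, `insert_batch` would wrap whatever bulk insert call your driver provides.

```python
import random
import string
import time


def make_document(i, rng):
    """Build one synthetic document; the field names are illustrative only."""
    return {
        "_id": i,
        "user": "user%06d" % rng.randrange(100_000),
        "payload": "".join(rng.choice(string.ascii_lowercase) for _ in range(64)),
        "tags": [rng.choice(["a", "b", "c", "d"]) for _ in range(3)],
    }


def measure_bulk_load(insert_batch, total_docs, batch_size, seed=42):
    """Feed total_docs synthetic documents to insert_batch, one batch at a time.

    insert_batch is any callable that accepts a list of documents.
    Returns (documents loaded, documents per second). The timing includes
    document generation, so treat the rate as a lower bound.
    """
    rng = random.Random(seed)
    loaded = 0
    start = time.perf_counter()
    for first in range(0, total_docs, batch_size):
        last = min(first + batch_size, total_docs)
        batch = [make_document(i, rng) for i in range(first, last)]
        insert_batch(batch)
        loaded += len(batch)
    elapsed = time.perf_counter() - start
    rate = loaded / elapsed if elapsed > 0 else float("inf")
    return loaded, rate


if __name__ == "__main__":
    sink = []  # in-memory stand-in for a collection (assumption)
    loaded, rate = measure_bulk_load(sink.extend, total_docs=10_000, batch_size=1_000)
    print("loaded %d docs at %.0f docs/sec" % (loaded, rate))
```

Running the same script with indexes built before the load and again with indexes built afterwards gives a direct comparison of the two indexing options listed on the slide.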
![Page 6: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/6.jpg)
Monitor Your Resources
• Watch everything
• The goal is to understand the numbers before deploying
• Monitor using
– SNMP, munin, nagios
– mongostat, mongotop, iostat, cpustat
– MongoDB Monitoring Service (MMS)
• Other stats
– Database, Collection level
![Page 7: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/7.jpg)
Monitoring Key Metrics
• Op Counters
– Inserts, updates, deletes, reads (more is generally better)
– Some differences in primary vs. secondary ops
• Resident memory
– Want this lower than available physical memory
– Correlated with page faults and index misses
• Queues
– Readers and writers
![Page 8: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/8.jpg)
Monitoring Key Metrics
• Page faults and B-Tree
– How often are you having to hit the disk
– Persistently non-zero? Working set might not fit.
• Lock Percentage
– If high and queues are filled, hitting write capacity
• IO and CPU Stats
– IO Sustained or fluctuating => IO bound
– CPU hitting IOWAITs
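The rules of thumb on the two "Monitoring Key Metrics" slides can be turned into a small review function. This is a sketch under assumptions: the `Sample` fields, the `review` helper, and both numeric limits are hypothetical and are not taken from any MongoDB tool. The samples would come from whatever monitoring you already collect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One monitoring sample; the field names are hypothetical."""
    resident_mb: float
    page_faults: int      # faults observed during the sample interval
    lock_pct: float       # percent of time the write lock was held
    queued_readers: int
    queued_writers: int
    iowait_pct: float


def review(samples: List[Sample], physical_mb: float,
           lock_limit: float = 50.0, iowait_limit: float = 20.0) -> List[str]:
    """Return warnings that follow the slides' rules of thumb.

    lock_limit and iowait_limit are placeholder assumptions; derive real
    limits from your own test runs.
    """
    warnings: List[str] = []
    if not samples:
        return warnings
    if any(s.resident_mb >= physical_mb for s in samples):
        warnings.append("resident memory has reached physical memory")
    if all(s.page_faults > 0 for s in samples):
        warnings.append("page faults persistently non-zero: working set might not fit")
    if any(s.lock_pct >= lock_limit and (s.queued_readers + s.queued_writers) > 0
           for s in samples):
        warnings.append("high lock percentage with queued operations: hitting write capacity")
    if all(s.iowait_pct >= iowait_limit for s in samples):
        warnings.append("CPU persistently in IOWAIT: likely IO bound")
    return warnings
```

The checks look at a series of samples on purpose: a single non-zero page fault count says little, while a count that never returns to zero is the signal the slide describes.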
![Page 9: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/9.jpg)
Scale Your Setup
• Monitor those metrics while testing
• Should tell you where to add capacity
– CPU, RAM, Disks
• Storage configuration
– RAID levels (10 preferred)
– Filesystem selection
– Block sizing
– Readahead setting
![Page 10: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/10.jpg)
Script Your Plays
• Backups
• Restores (backups are not enough)
• Maintenance and Upgrades
• Replica Set operations
– Stepping primaries down, adding new secondaries
• Sharding operations
– Consistent backups, balancer operations
• Check out the Backup talk later today
![Page 11: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/11.jpg)
Lather, Rinse, Repeat
Prototype > Test > Monitor > Scale > Script
![Page 12: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/12.jpg)
Perfect. I know what to do.
How Do I Do It?
![Page 13: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/13.jpg)
Balancing Priorities
[Figure labels: Product Development, Infrastructure Development, Integration, QA, Code, Operations, Monitoring]
![Page 14: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/14.jpg)
The Scale Tips To One Side
• Product development is the priority
– As it should be, but…
• Infrastructure development can’t be overlooked
• Know the downsides of not being prepared
– Downtime
– Data safety
• Disaster will strike
![Page 15: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/15.jpg)
Integrate With The Dev Cycle
• Why are ops typically skipped over until it’s too late?
– Planning
• Make operations development a part of the dev cycle
– Put it into the schedule
– Make it a development milestone
• Use it to your advantage
– Script deployment of development and test systems
![Page 16: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/16.jpg)
That’s all well and good but we are already deployed
![Page 17: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/17.jpg)
Let’s Avoid This Situation
![Page 18: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/18.jpg)
Start The Cycle Again
Prototype > Test > Monitor > Scale > Script
![Page 19: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/19.jpg)
Start With Monitoring
• Monitor your deployment
– Munin, nagios
– MMS
• Instrument your app
– Know your queries
– Read/write/update/delete behaviors
– Index utilization
• Database and collection stats
![Page 20: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/20.jpg)
Scaling Deployment
• The numbers don’t lie
– But individual measurements don’t always tell the whole story
• Are you hardware bound?
– Memory, Disks, CPU
• Is your app the problem?
• What about system settings?
– Low Resident Memory > Readahead > Page Faults
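The link between readahead, resident memory, and page faults can be made concrete with some arithmetic. The sketch below assumes a deliberately simple model that is not stated on the slide: every random read pulls one full readahead window into memory, and documents are accessed with no locality. Readahead is expressed in 512-byte sectors, which is the unit `blockdev --getra` reports on Linux. The function names are hypothetical.

```python
SECTOR_BYTES = 512  # blockdev --getra reports readahead in 512-byte sectors


def readahead_bytes(sectors):
    """Size of one readahead window in bytes."""
    return sectors * SECTOR_BYTES


def useful_fraction(doc_bytes, sectors):
    """Fraction of each random read that is the requested document (capped at 1.0)."""
    return min(1.0, doc_bytes / readahead_bytes(sectors))


def random_docs_cached(ram_bytes, doc_bytes, sectors):
    """Roughly how many randomly accessed documents fit in RAM when each
    access occupies max(document size, readahead window) bytes."""
    per_access = max(doc_bytes, readahead_bytes(sectors))
    return ram_bytes // per_access


if __name__ == "__main__":
    ram = 8 * 1024**3  # 8 GiB, an arbitrary example size
    for sectors in (256, 32):
        print(sectors, readahead_bytes(sectors),
              useful_fraction(1024, sectors),
              random_docs_cached(ram, 1024, sectors))
```

Under this model, a 1 KiB document read with a 256-sector (128 KiB) readahead uses less than 1% of the data brought into memory, so memory fills with pages nobody asked for and the useful working set that stays resident shrinks. Real workloads have some locality, so treat the numbers as a worst case.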
![Page 21: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/21.jpg)
Basic Solutions
• Low opcounters + high page faults
– More memory
• High paddingFactor and fragmentation
– Data model changes
• Balancer running a lot, chunks always migrating
– Better shard key
• Persistent b-tree misses, high page faults
– Queries aren’t hitting the indexes or aren’t using them
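The symptom-to-fix pairs on this slide read naturally as a lookup table. A minimal sketch follows; the symptom identifiers and the `suggest` helper are made up for illustration, and the fixes are the slide's own wording, slightly shortened.

```python
RULES = [
    # (symptoms that must all be present, suggested direction from the slide)
    ({"low_opcounters", "high_page_faults"}, "more memory"),
    ({"high_padding_factor", "fragmentation"}, "data model changes"),
    ({"balancer_busy", "chunks_always_migrating"}, "better shard key"),
    ({"btree_misses", "high_page_faults"},
     "check whether queries hit and use the indexes"),
]


def suggest(observed):
    """Return every suggestion whose symptoms are all in the observed set."""
    observed = set(observed)
    return [fix for symptoms, fix in RULES if symptoms <= observed]


if __name__ == "__main__":
    print(suggest({"low_opcounters", "high_page_faults"}))
```

A rule fires only when all of its symptoms are observed together, which reflects the earlier point that individual measurements do not always tell the whole story.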
![Page 22: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/22.jpg)
Continue Through the Cycle
• Script your setup
– This will save time as you iterate
• Prototype the fixes
– Evaluate queries, how documents change, expected usage
• Test the new setup
– Scripts to build the deployment and model usage
![Page 23: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/23.jpg)
Deployment is about not being surprised
![Page 24: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/24.jpg)
Questions?
![Page 25: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/25.jpg)
How To Get Help
• Ask the Experts sessions
• We are here to help, come find us
• Refer to our docs: docs.mongodb.org (hint: they’re great!)
• Other things we monitor
– mongodb-user Google group
– Stack Overflow
• Submit a ticket
![Page 26: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/26.jpg)
Backup
Problem > Diagnosis > Solution
![Page 27: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/27.jpg)
Problem 1: Social Networking
• Suboptimal write throughput
• Where is the bottleneck?
– Check the metrics
![Page 28: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/28.jpg)
Diagnosis 1
• Are opcounters reasonably accurate?
• Check the queues
• Examine lock percentages
• How does resident memory look?
• How large are your indexes?
![Page 29: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/29.jpg)
Solution 1
• Opcounters aren’t as high as you’d expect but memory is saturated
• Correlated with high page faults
• You might need more memory
• MongoDB wants to fit your working set into memory
![Page 30: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/30.jpg)
Problem 2: Tracking FB Friends
• Update-heavy workload is slow
• Document paddingFactor is increasing
![Page 31: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/31.jpg)
Diagnosis 2
• High paddingFactor
– Fragmentation!
• More memory/disk is taken up by new documents
– Inefficient space usage
• Documents are having to be relocated regularly
![Page 32: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/32.jpg)
Solution 2
• Check your queries
– Are your documents growing because of arrays or added fields?
• Pre-create required document structure or…
• Move growing elements out into individual objects in a separate collection
– Data model changes, app changes
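Both options from this slide can be sketched with plain dictionaries. Everything named here is hypothetical: the `friends` field, the fixed-width 24-character IDs, and the helper functions. The sketch assumes IDs are fixed-width strings so that a placeholder occupies the same space as the real value that later overwrites it, which is what lets updates happen in place instead of growing the document.

```python
ID_WIDTH = 24                  # assumed fixed width of an ID string
PLACEHOLDER = "0" * ID_WIDTH   # same length as a real ID, so overwriting does not grow the document


def preallocated_friend_doc(user_id, slots):
    """Option 1: create the document at its expected final shape up front."""
    return {"_id": user_id, "friend_count": 0, "friends": [PLACEHOLDER] * slots}


def add_friend_in_place(doc, friend_id):
    """Overwrite the next placeholder slot; return False when no slot is free."""
    if len(friend_id) != ID_WIDTH:
        raise ValueError("friend_id must be %d characters" % ID_WIDTH)
    used = doc["friend_count"]
    if used >= len(doc["friends"]):
        return False
    doc["friends"][used] = friend_id
    doc["friend_count"] = used + 1
    return True


def friend_link_doc(user_id, friend_id):
    """Option 2: one small document per relationship, stored in a separate collection."""
    return {"user": user_id, "friend": friend_id}
```

Option 1 trades wasted space for stable document sizes and needs a sensible upper bound on slots. Option 2 removes the bound but changes how the application reads a user's friends, which is the "data model changes, app changes" cost the slide mentions.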
![Page 33: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/33.jpg)
Problem 3: Status Updates
• Write-heavy sharded deployment
– Is one shard getting burned?
– Balancer locked all the time
• Balancer is constantly migrating chunks
![Page 34: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/34.jpg)
Diagnosis 3
• Check the mongos logs
– How often is migration occurring?
– Are chunks constantly moving from one shard to the next?
• Shard key distribution
– Sequential keys?
– One shard always getting new writes?
![Page 35: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/35.jpg)
Solution 3
• Consider using hash, byte swapping, etc. if no “natural” key that distributes well
– Avoids the “hot” shard problem
• High writes and high balancer lock
– Manage balancer window
– Run it during low utilization
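A small simulation shows why a sequential key produces a hot shard and how hashing the key spreads the writes. This is a toy model, not MongoDB's own chunk or hashing logic: shards are modeled as equal slices of the key space, SHA-256 is an arbitrary choice of hash, and all function names are hypothetical.

```python
import hashlib
from collections import Counter


def hashed_key(value):
    """Derive a well-spread 64-bit integer from a sequential key."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key, num_shards, key_space):
    """Map a key to a shard by splitting [0, key_space) into equal ranges."""
    return min(num_shards - 1, key * num_shards // key_space)


def distribution(keys, num_shards, key_space):
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards, key_space) for k in keys)


if __name__ == "__main__":
    newest = range(9000, 10000)  # the most recent sequential IDs
    print("sequential:", dict(distribution(newest, 4, 10_000)))
    print("hashed:    ", dict(distribution((hashed_key(k) for k in newest), 4, 2**64)))
```

With sequential keys, all of the newest writes fall into the last range and therefore onto one shard. With hashed keys, the same writes are spread across all four shards. The cost is that range queries on the original key can no longer be served by a single shard.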
![Page 36: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/36.jpg)
Problem 4: File Sharing
• Storing files in GridFS
• Uploads are taking too long
![Page 37: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/37.jpg)
Diagnosis 4
• Check CPU and IO stats
• Is the CPU stuck in IOWAITS?
• High sustained IO operations
• Lots of queued operations
• IO bound workload
![Page 38: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/38.jpg)
Solution 4
• Ensure storage is in good health
– RAID status
– SAN or NAS devices functioning properly
– Virtualized disks
• Consider separating data and journal
– --directoryperdb
– Symlink journal to another location
• Ensure other processes aren’t hitting storage
![Page 39: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/39.jpg)
Problem 5: Reading Logs
• Indexes are underperforming
• Queries are using indexes but yielding quite a bit
![Page 40: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/40.jpg)
Diagnosis 5
• Use .explain() and .hint() with your queries
• Check out the b-tree metrics
– Persistent non-zero misses?
– Correlated with memory, page faults, IO stats
• B-trees best for range queries over single dimension
– Range queries on {A} if index is {A,B} could be suboptimal
![Page 41: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/41.jpg)
Solution 5
• Revisit your indexing strategy
• Consider data model changes to optimize queries and indexes
• Some functionality doesn’t hit the index
– $where javascript clauses
– $mod, $not, $ne
– Complex regular expressions
![Page 42: Deployment Best Practices](https://reader038.fdocuments.us/reader038/viewer/2022102823/54994573b47959e1298b45ee/html5/thumbnails/42.jpg)
Miscellaneous Deployment Notes
• Warm the cache
– Use touch via db.runCommand()
• Dynamically change log levels
• Synchronize all clocks to the same NTP server
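The first two notes map onto database commands that can be scripted. The sketch below builds the command documents in Python and sends them through any object that exposes a pymongo-style `command(doc)` method. The helper names and the collection names are assumptions. The `touch` command belongs to the MongoDB releases of this talk's era and was removed in later releases, so check your server version before relying on it.

```python
def touch_command(collection, data=True, index=True):
    """Command document for `touch`; the command name must be the first key."""
    return {"touch": collection, "data": data, "index": index}


def log_level_command(level):
    """Command document that changes log verbosity at runtime via setParameter."""
    return {"setParameter": 1, "logLevel": level}


def warm_and_set_logging(db, admin_db, collections, level):
    """Touch each collection, then set the log level on the admin database.

    db and admin_db are assumed to expose command(doc), as a pymongo
    Database object does. Returns the list of command replies.
    """
    results = [db.command(touch_command(name)) for name in collections]
    results.append(admin_db.command(log_level_command(level)))
    return results
```

Keeping the command documents in small functions makes them easy to reuse in the deployment scripts discussed earlier, for example to warm a freshly restored secondary before it takes reads.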