Getting more 9s from your Cloud operations
Uploaded by: chamith-kumarage
Category: Technology
![Page 1: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/1.jpg)
![Page 2: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/2.jpg)
● Fail-proof architecture
● DevOps tools and utilities
● Monitoring - next level
● Backups and Disaster recovery
● Communication
● Best practices
![Page 3: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/3.jpg)
● Group similar components
● Load distribution is important
● Network level isolation for each group or cluster
● Failover plan for every component
● Someone has to take care of failures
● Design for failures
● Unleash the chaos monkey
“Everything fails all the time” -- Werner Vogels (CTO, Amazon)
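The "failover plan for every component" and "unleash the chaos monkey" points can be illustrated with a small sketch. It is not taken from the talk: `call_with_failover`, `chaotic`, and the replica names are hypothetical, and the failure injection is only a toy stand-in for a real chaos tool.

```python
import random
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every replica in the group has failed."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful result."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replicas failed")


def chaotic(func: Callable[[], str], failure_rate: float,
            rng: random.Random) -> Callable[[], str]:
    """Wrap func so that it fails at random with the given probability."""
    def wrapper() -> str:
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return func()
    return wrapper


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical components: the primary is forced down, the standby is healthy.
    primary = chaotic(lambda: "served by primary", failure_rate=1.0, rng=rng)
    standby = chaotic(lambda: "served by standby", failure_rate=0.0, rng=rng)
    print(call_with_failover([primary, standby]))
```

Running failure injection like this against a staging copy is one way to check that the failover path works before a real outage exercises it.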
![Page 4: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/4.jpg)
Source: http://dev.mysql.com/doc/refman/5.0/en/ha-overview.html
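To make the "9s" in the title concrete, the arithmetic below converts a number of nines into a yearly downtime budget. This is a sketch under simple assumptions (a 365-day year, availability measured over the whole period); the function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(nines: int, period_days: float = 365.0) -> float:
    """Downtime budget in minutes for an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # e.g. 3 nines -> 0.001
    return period_days * 24 * 60 * unavailability


if __name__ == "__main__":
    for n in range(1, 6):
        availability = 100 * (1 - 10.0 ** (-n))
        minutes = allowed_downtime_minutes(n)
        print(f"{availability:.3f}% -> {minutes:,.1f} minutes of downtime per year")
```

Under these assumptions, three nines (99.9%) allow roughly 525.6 minutes of downtime per year, and each additional nine divides that budget by ten.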
![Page 5: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/5.jpg)
● Every operation must be scripted and tested
● One-click operations
● Verification tools are a must!
● Data collecting and reporting tools
● Tools to shorten the pipeline from Dev -> Prod
● Enforce standards
● Documentation has to be a part of tooling
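One way to read "every operation must be scripted and tested" together with "verification tools are a must" is to pair each scripted action with a check and record the outcome. The sketch below assumes that reading; `Operation`, `Runner`, and the in-memory `state` are hypothetical names used only for illustration.

```python
import datetime
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Operation:
    name: str
    action: Callable[[], None]  # performs the change
    verify: Callable[[], bool]  # returns True if the change took effect


@dataclass
class Runner:
    change_log: List[str] = field(default_factory=list)

    def run(self, op: Operation) -> bool:
        """Run the action, verify it, and append the result to the change log."""
        op.action()
        ok = op.verify()
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
        self.change_log.append(f"{stamp} {op.name} {'OK' if ok else 'FAILED'}")
        return ok


if __name__ == "__main__":
    state = {"version": 1}  # stand-in for a real system being changed
    upgrade = Operation(
        name="upgrade-to-v2",
        action=lambda: state.update(version=2),
        verify=lambda: state["version"] == 2,
    )
    runner = Runner()
    runner.run(upgrade)
    print("\n".join(runner.change_log))
```

Because the log is written by the tool itself, it also doubles as a minimal form of the documentation the last bullet asks for.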
![Page 6: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/6.jpg)
● Are you happy with conventional tools?
● "Alert if 1m_load_avg > 5" is not enough
● Analytics is a part of monitoring
● Usage predictions and trend analysis
● Correlating incidents with logs is very useful
● Simulate user activities
● Be your own Xavier!
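As an example of "usage predictions and trend analysis" going beyond a fixed threshold, the sketch below fits a straight line to recent usage samples and estimates when a resource would fill up. The linear model and the names `linear_fit` and `days_until_full` are assumptions made for illustration; real usage is often not linear.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days: Sequence[float], used_pct: Sequence[float],
                    capacity_pct: float = 100.0) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_fit(days, used_pct)
    if slope <= 0:
        return None
    day_full = (capacity_pct - intercept) / slope
    return max(0.0, day_full - days[-1])


if __name__ == "__main__":
    # Hypothetical disk usage samples: day number and percent used.
    days = [0, 1, 2, 3, 4]
    used = [50, 52, 54, 56, 58]
    print(days_until_full(days, used))
```

An alert on "fewer than N days until full" fires long before a static "disk > 90%" rule would, which leaves time to act.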
![Page 7: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/7.jpg)
● How frequently do you back up?
● Alerts for backups
● Verification is a MUST
● Practice DR plan frequently
● Align the DR plan with the deployment plan
● Documentation!
Source: http://blogger.srvnetwork.com/wp-content/uploads/2010/10/disaster_recovery_plan1.jpg
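The "alerts for backups" and "verification is a MUST" points could be automated along these lines. This is a minimal sketch: `backup_problems` and its parameters are hypothetical, and a checksum plus an age check only proves the file is intact and recent. A fuller verification would also perform a test restore.

```python
import hashlib
import os
import tempfile
import time
from typing import List, Optional


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_problems(path: str, expected_sha256: str, max_age_seconds: float,
                    now: Optional[float] = None) -> List[str]:
    """Return a list of problems worth alerting on; empty means the checks passed."""
    if not os.path.exists(path):
        return [f"missing backup: {path}"]
    problems = []
    now = time.time() if now is None else now
    age = now - os.path.getmtime(path)
    if age > max_age_seconds:
        problems.append(f"stale backup: {age:.0f}s old")
    if sha256_of(path) != expected_sha256:
        problems.append("checksum mismatch")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "db.dump")  # stand-in for a real backup file
        with open(path, "wb") as fh:
            fh.write(b"example backup contents")
        expected = hashlib.sha256(b"example backup contents").hexdigest()
        print(backup_problems(path, expected, max_age_seconds=24 * 3600))
```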
![Page 8: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/8.jpg)
Source: http://www.accountanttown.com/site/wp-content/uploads/2010/08/sticky_note_backup_small.gif
Source: http://jenniferbrogee.files.wordpress.com/2011/03/backupyourcomputer1.jpg
![Page 9: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/9.jpg)
● Always sound human
● “Our web-monkeys can’t find the page you are looking for”
● Downtimes or failures can be turned into opportunities
● Be honest
● Users are always curious about what’s going on
● Separate communication channel
![Page 10: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/10.jpg)
Source: http://www.transparentuptime.com/2010/06/video-of-my-talk-upside-of-downtime-at.html
![Page 11: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/11.jpg)
Source: http://www.transparentuptime.com/2010/06/video-of-my-talk-upside-of-downtime-at.html
![Page 12: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/12.jpg)
● Staging setup to run in parallel
● Verification process after every operation
● Change log and maintenance log
● Use of configuration management
● Manage the complete application lifecycle (ALM)
● Knowledge sharing
● Culture
Source: http://www.cartoonstock.com/newscartoons/cartoonists/rmo/lowres/business-commerce-best_practice-business_venture-business_model-business_practice-bankrupt-rmon2464l.jpg
![Page 13: Getting more 9s from your Cloud operations](https://reader034.fdocuments.us/reader034/viewer/2022052123/558e69591a28ab0f668b45c0/html5/thumbnails/13.jpg)
@gnuchami