Managing RightScale on RightScale
1
Managing RightScale on RightScale
Rafael H. Saavedra
VP of Engineering
2
Topics
• RightScale managed by RightScale
• Meta, production, staging & development
• An overview of the production system
• Quis Custodiet Ipsos Custodes
• Deploying RightScale – best practices
• What we love about using RightScale
• Features that are difficult to use
3
RightScale Production
RightScale: Cloud Management Platform
Customer A Customer B Customer C Customer D
4
RightScale Production
RightScale: Cloud Management Platform
RightScale Meta Production
RightScale Staging
Customer A Customer D
RightScale Development
RightScale Development
5
A multitude of RightScale systems
• Meta Production currently lives outside the cloud
  • Used only to manage the production system
  • Only RightScale ops accounts
• Production: my.rightscale.com
  • Reaching 200 servers, a large fraction in EC2 us-east
  • Servers in every cloud to achieve high availability
  • Servers allocated in well-defined availability zones
• A few staging systems used for integration and QA
  • Ad hoc systems for performance testing, demos, betas
• Many development systems with simplified configurations
  • A development system at the click of a button
6
Significant increase in cloud usage
[Chart: EC2 usage by month, November 2008 to October 2010]
7
Some interesting RightScale numbers
• 1.65M servers launched by RightScale
• RightScale continuously monitors more than 60k servers
• Every day at RightScale:
  • 2,000 array resize actions are executed
  • 35,000 alert escalations are triggered
  • 20,000 escalation emails are sent to users
  • 9.0TB of monitoring data is exchanged with our servers
  • 1.6TB of logging data is sent to our servers
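To put the daily totals on this slide in perspective, they can be converted to average rates. The following is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and an even spread of activity over 24 hours; the slide states neither, so the results are rough averages only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440

# Daily figures quoted on the slide.
daily_counts = {
    "array resize actions": 2_000,
    "alert escalations": 35_000,
    "escalation emails": 20_000,
}
daily_terabytes = {
    "monitoring data": 9.0,
    "logging data": 1.6,
}


def per_minute(count_per_day):
    """Average number of events per minute for a daily count."""
    return count_per_day / MINUTES_PER_DAY


def megabytes_per_second(tb_per_day):
    """Average MB/s for a daily volume (assumes 1 TB = 10**12 bytes)."""
    return tb_per_day * 1e12 / 1e6 / SECONDS_PER_DAY


for name, count in daily_counts.items():
    print(f"{name}: {per_minute(count):.1f} per minute")
for name, tb in daily_terabytes.items():
    print(f"{name}: {megabytes_per_second(tb):.1f} MB/s")
```

Under those assumptions, monitoring data averages roughly 104 MB/s and logging data roughly 19 MB/s, with about 24 alert escalations per minute.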
8
RightScale production – simplified
[Architecture diagram; component labels: Front Ends, Main App (dashboard, API, others), daemons, databases (DB Master, DB Slave), mirrors, logging, monitoring]
9
What is it that our users do?
• Dashboard, API, monitoring graphs & event notifications
• Most requests are monitoring updates: 85% of requests (70% of bandwidth)
• Dashboard and API represent 7% of requests but 26% of traffic

Distribution by requests and by bandwidth
                 Requests   Bandwidth
Monitoring          85%        70%
Notifications        8%         4%
API                  6%        15%
Dashboard            1%        11%
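The "7% of requests but 26% of traffic" figure follows directly from the two distributions on this slide. A small sketch of that arithmetic, using only the percentages shown (the dictionary and function names are illustrative):

```python
# Percentages as read from the two charts on the slide.
by_requests = {"Monitoring": 85, "Notifications": 8, "API": 6, "Dashboard": 1}
by_bandwidth = {"Monitoring": 70, "Notifications": 4, "API": 15, "Dashboard": 11}


def share(distribution, categories):
    """Sum the percentage share of the given categories."""
    return sum(distribution[c] for c in categories)


interactive = ("Dashboard", "API")
requests_share = share(by_requests, interactive)    # 1 + 6
bandwidth_share = share(by_bandwidth, interactive)  # 11 + 15

print(f"Dashboard + API: {requests_share}% of requests, "
      f"{bandwidth_share}% of bandwidth")

# Ratio of bandwidth share to request share: how heavy an average request
# in this group is relative to the average request overall.
print(f"Relative weight: {bandwidth_share / requests_share:.1f}x")
```

The ratio suggests that an average dashboard or API request carries about 3.7 times the bandwidth of the average request overall, while monitoring updates are lighter than average (70/85, about 0.8 times).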
10
We eat our own dog food
• Production servers organized into independent deployments
• Core servers: frontends, core/api servers, databases, daemons
11
We eat our own dog food
• Extensive use of security groups to isolate servers
• ServerTemplates are maintained for each major release
  • Ability to launch exact configurations of past versions
12
Monitoring, alerts & escalations
• Monitor as much of what is relevant as possible and display it in insightful ways
• The need to quickly detect patterns and abnormalities
• Proactively eliminate the conditions that raise critical alerts
  • No broken windows policy
[Graph panels: APIs, Cores]
13
Quis Custodiet Ipsos Custodes?*
• The need to monitor the monitoring and alerting systems
• Extensive use of alerts to monitor the responsiveness of all the RightScale servers
• Instance and EBS failures give us headaches
• Decoupling the meta & production monitoring and alerting systems
* Who watches the watchmen?
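One common way to "watch the watchmen" is an independent check that flags any monitoring or alerting server that has gone silent. The sketch below illustrates that idea only; the server names, the timestamps, and the 120-second threshold are hypothetical, and the slides do not describe how RightScale actually implements this.

```python
def stale_servers(last_seen, now, max_silence_seconds):
    """Return the names of servers whose last report is older than the threshold.

    last_seen maps a server name to the Unix timestamp of its latest report.
    """
    return sorted(
        name
        for name, timestamp in last_seen.items()
        if now - timestamp > max_silence_seconds
    )


# Hypothetical data: seconds since each server last reported.
now = 1_000_000.0
last_seen = {
    "monitor-1": now - 20,
    "monitor-2": now - 400,  # silent for more than six minutes
    "alerter-1": now - 45,
}

print(stale_servers(last_seen, now, max_silence_seconds=120))
```

Running such a check from a separate system (for example, the meta system watching production) keeps it useful even when the production monitoring servers themselves are the ones failing.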
14
How to monitor hundreds of servers?
• Starting to use stacked graphs & heat maps
• The need to quickly detect patterns and abnormalities
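A heat map summarizes many servers by counting, for each time slot, how many servers fall into each value range, so one outlier shows up without drawing hundreds of lines. A minimal sketch of that binning step, with made-up CPU percentages and bucket edges:

```python
from bisect import bisect_left


def heatmap_column(values, edges):
    """Count how many values fall into each bucket for one time slot.

    edges are ascending upper bounds; a value v lands in the first bucket
    whose edge is >= v, and values above the last edge land in a final
    overflow bucket.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    return counts


# Hypothetical CPU utilization (%) of five servers over two time slots.
cpu_by_slot = [
    [12, 15, 11, 14, 13],
    [12, 16, 10, 95, 14],  # one server spikes in the second slot
]
edges = [25, 50, 75]  # buckets: <=25, <=50, <=75, >75

grid = [heatmap_column(slot, edges) for slot in cpu_by_slot]
for column in grid:
    print(column)
```

Each printed row is one column of the heat map; the spike appears as a single count in the top bucket of the second slot.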
15
Our favorite RightScale features
• RightImages: never again the need to build custom images
• Input inheritance: makes it easy to keep the configurations of dozens of servers in sync
• ServerTemplates: very easy to reproduce configurations in production, staging and development
• The Library: there is always an example of something new that can be adapted to our needs
• Monitoring: easy to write collectd plugins to monitor just about anything
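As an illustration of how small a custom collectd plugin can be, the sketch below uses collectd's Exec plugin, which runs a script and reads `PUTVAL` lines from its standard output. The slides do not say which plugin mechanism RightScale uses; the `job_queue` plugin name, the `pending` instance, and `read_queue_depth` are hypothetical.

```python
import os
import sys


def format_putval(host, plugin, type_instance, interval, value):
    """Build one PUTVAL line in collectd's plain-text protocol.

    Uses the built-in "gauge" type; "N" tells collectd to timestamp the
    value with the current time.
    """
    identifier = f"{host}/{plugin}/gauge-{type_instance}"
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def read_queue_depth():
    """Hypothetical metric source; replace with a real measurement."""
    return 42


# The Exec plugin exports these variables to the script it runs;
# the defaults are only for running the sketch by hand.
host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
interval = int(float(os.environ.get("COLLECTD_INTERVAL", "10")))

print(format_putval(host, "job_queue", "pending", interval, read_queue_depth()))
sys.stdout.flush()  # collectd reads from a pipe, so flush after each value
```

A script meant to run under collectd would normally repeat the last two statements in a loop, sleeping `interval` seconds between values.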
16
Our not so favorite features
• ServerTemplate inputs: powerful, but too many of them make templates difficult to use
• Revision management: there is still a way to go in making users aware of new revisions and versions and how to update
• The Library: checking out new resources from the Library is not easy
• Alerts: they work pretty well but are not easy to configure, custom ones in particular
17
Best practices: upgrading RightScale
• Avoid upgrading existing servers; instead launch fresh ones with new software (fail forward)
  • Not possible on some components, e.g. monitoring servers, which are in the hundreds
• The cost of duplicating servers is minimal
• Old servers can take over in case something goes wrong
• Launch additional slaves to capture recovery points
  • One slave continues to replicate in case of master failure
  • Another slave is frozen at upgrade point – can rollback by failing over
  • Don’t forget to take snapshots in case of major failure
18
Upgrading RightScale: Step by Step
[Diagram: Front Ends; two Main App groups (legend: servers with new code, servers with old code); Databases: DB Master and two DB Slaves]
• Cut access to site
• Stop all access to DB
• Take snapshot at cutoff
• Stop replication
19
Upgrading RightScale: Step by Step
[Diagram: Front Ends; two Main App groups (legend: servers with new code, servers with old code); Databases: DB Master and two DB Slaves]
• Snapshot at cutoff
• Stop replication
• Reconnect all servers
20
Upgrading RightScale: Step by Step
[Diagram: Front Ends; two Main App groups (legend: servers with new code, servers with old code); Databases: DB Master and two DB Slaves]
• Reconnect all servers
• Open access to site
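The labels on the three step-by-step slides can be read as one ordered procedure. The sketch below models that sequence on a toy state object; the ordering is inferred from the slide labels, and the `Site` class and its field names are hypothetical rather than part of any RightScale tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy model of the production site during an upgrade."""
    open_to_users: bool = True
    db_access_allowed: bool = True
    active_code: str = "old"
    frozen_slave_replicating: bool = True
    snapshots: list = field(default_factory=list)
    log: list = field(default_factory=list)


def upgrade(site):
    """Apply the upgrade steps in the order suggested by the slides."""
    site.open_to_users = False
    site.log.append("cut access to site")

    site.db_access_allowed = False
    site.log.append("stop all access to DB")

    site.snapshots.append("cutoff")
    site.log.append("take snapshot at cutoff")

    # One slave is frozen at the upgrade point as a rollback target;
    # the other slave (not modeled here) keeps replicating.
    site.frozen_slave_replicating = False
    site.log.append("stop replication")

    site.active_code = "new"
    site.db_access_allowed = True
    site.log.append("reconnect all servers")

    site.open_to_users = True
    site.log.append("open access to site")
    return site


site = upgrade(Site())
for step in site.log:
    print(step)
```

Because the old-code servers and the frozen slave are left untouched, a rollback in this model would amount to pointing traffic back at them rather than undoing changes in place.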
21