Post on 12-Jun-2015
Protect your app from Outages
Ron Zavner, Applications Architect at GigaSpaces
February 2013
AGENDA
• AWS and outages
• Outage impact
• Disaster Recovery - it's all about redundancy!
• Cloudify as a solution for redundancy
• Demo with Cloudify on EC2
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved
AWS USAGE
Managing Big Data on the Cloud
• AWS – around 0.5M servers
• Facebook – less than 0.1M servers
• Google – around 1M servers
THE OUTAGE PROBLEM
OUTAGE – APRIL 21, 2011
OUTAGE - JUNE 29, 2012
OUTAGE - OCTOBER 22, 2012
OUTAGE - CHRISTMAS EVE 2012
THAT’S WHAT YOU EXPECT?
Availability and the downtime it allows per year:
• 99% - 3.65 days
• 99.9% - 8.76 hours
• 99.99% - 53 minutes
• 99.999% - 5.26 minutes
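The downtime figures above follow from simple arithmetic on the availability percentage. A minimal sketch, assuming a 365-day year; the function name is illustrative and not from the slides:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: leap years are ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year allowed by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours / 24:.2f} days = {hours:.2f} hours = {hours * 60:.2f} minutes")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why the jump from 99.9% to 99.99% moves the budget from hours to minutes.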
OUTAGE IMPACT – DESIGN FOR FAILURES
Outage could cost…
• $89K per hour for Amadeus
• $225K per hour for PayPal!
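To put the hourly figures in perspective, they can be multiplied by the downtime that a given availability level allows. This is a rough sketch only: it assumes a constant hourly cost and that a service uses its full downtime budget, neither of which is claimed in the slides, and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: leap years are ignored

# Hourly outage costs quoted on the slide, in USD.
COST_PER_HOUR = {"Amadeus": 89_000, "PayPal": 225_000}


def annual_outage_cost(cost_per_hour: float, availability_percent: float) -> float:
    """Yearly cost if the whole downtime budget for this availability is used."""
    downtime_hours = HOURS_PER_YEAR * (1 - availability_percent / 100)
    return cost_per_hour * downtime_hours


for company, cost in COST_PER_HOUR.items():
    print(f"{company}: ${annual_outage_cost(cost, 99.9):,.0f} per year at 99.9%")
```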
DISASTER RECOVERY
MULTI CLOUD
PREPARE FOR DISASTER RECOVERY
• Dedicated expert for DR architecture
• Define target recovery time & point
• Assume every tier can fail
• Use monitoring and alerts
• Document your operational processes
CHAOS MONKEY
It’s all about REDUNDANCY!
CLONE YOUR ENVIRONMENT
CLONE YOUR DATA
• RDS Read Replica
• More to come…
You must use an AUTOMATION layer
CLOUDIFY POSITIONING IN THE CLOUD STACK
[Diagram: cloud stack plotted by Productivity and Control]
PaaS: CloudFoundry, Heroku, GAE, OpenShift
DevOps (Automation): Chef, Puppet
IaaS: Public clouds (AWS, Rackspace, ..), Private clouds (VMware, OpenStack, ..)
Also shown: Rightscale, Enstratus
Cloudify: High productivity with full control
CLONE YOUR ENV - HOW DOES IT WORK?
EXTENSIVE PLATFORM SUPPORT
USE ANY CLOUD
GETTING COMPUTE RESOURCES IN A PORTABLE WAY
// Service recipe: refers to the template by name only
compute {
    template "SMALL_LINUX"
}

// Cloud driver for EC2
SMALL_LINUX : template {
    imageId "us-east-1/ami-76f0061f"
    remoteDirectory "/home/ec2-user/gs-files"
    machineMemoryMB 1600
    hardwareId "m1.small"
    locationId "us-east-1"
    localDirectory "upload"
    keyFile "myKeyFile.pem"
    options ([
        "securityGroups" : ["default"] as String[],
        "keyPair" : "myKeyFile"
    ])
    overrides ([
        "jclouds.ec2.ami-query" : "",
        "jclouds.ec2.cc-ami-query" : ""
    ])
    privileged true
}

// Cloud driver for OpenStack
SMALL_LINUX : template {
    imageId "1234"
    machineMemoryMB 3200
    hardwareId "103"
    remoteDirectory "/root/gs-files"
    localDirectory "upload"
    keyFile "gigaPGHP.pem"
    options ([
        "openstack.securityGroup" : "default",
        "openstack.keyPair" : "gigaPGHP"
    ])
    privileged true
}
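The portability idea in these snippets is that the service names only a logical template, and each cloud supplies its own definition under that name. The sketch below illustrates that lookup in Python. It is not Cloudify code: the `TEMPLATES` table and `resolve_template` function are hypothetical, and only the image, hardware and memory values are copied from the slide.

```python
# Hypothetical per-cloud template tables; values copied from the two
# SMALL_LINUX definitions shown on the slide.
TEMPLATES = {
    "ec2": {
        "SMALL_LINUX": {
            "imageId": "us-east-1/ami-76f0061f",
            "hardwareId": "m1.small",
            "machineMemoryMB": 1600,
        }
    },
    "openstack": {
        "SMALL_LINUX": {
            "imageId": "1234",
            "hardwareId": "103",
            "machineMemoryMB": 3200,
        }
    },
}

# The service asks for a logical template name and nothing cloud-specific.
SERVICE = {"compute": {"template": "SMALL_LINUX"}}


def resolve_template(cloud: str, name: str) -> dict:
    """Look up the cloud-specific definition behind a logical template name."""
    try:
        return TEMPLATES[cloud][name]
    except KeyError:
        raise KeyError(f"no template {name!r} defined for cloud {cloud!r}") from None


for cloud in TEMPLATES:
    template = resolve_template(cloud, SERVICE["compute"]["template"])
    print(cloud, template["imageId"], template["hardwareId"])
```

Because the service definition never mentions an image or hardware ID, moving it to another cloud means adding one more entry to the table rather than editing the service.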
DATA REPLICATION
• Cloudify Replicated MySQL Recipe
• Generic replication service using WAN Gateway
GENERIC REPLICATION SERVICE OVER WAN
[Diagram: sites replicated over WAN: Hong Kong, London, New York]
• In-Memory Speed
• High Availability and Self-Healing
• Scalable and Efficient
Real Life Scenario
VERIFI (CURRENT) DEPLOYMENT ARCHITECTURE
[Diagram: single-region deployment in Availability region (US-West: Oregon), built from 4 recipes]
Components: Internet, mod_cluster, JBoss, PostgreSQL, Cassandra
Each component runs on its own EC2 instance; two data volumes are attached.
TARGET ARCHITECTURE
[Diagram: two-region target architecture]
Availability Region (US-West Oregon): Internet, mod_cluster, JBoss, Postgres Master, Cassandra, each on an EC2 instance, with data volumes
Availability Region (US-East Virginia): mod_cluster, JBoss, Postgres Slave, Cassandra, each on an EC2 instance, with data volumes
Replication runs between the two regions.
Bootstrap two EC2 clouds in different regions and install the “verifi” application on each. The second cloud uses a slightly modified (extended) Postgres recipe so that it acts as a slave, and it runs no app servers. When the primary region fails, the second cloud spins up app server instances and turns its data instance into the master, then bootstraps another “slave” cloud in a different region.
FAILOVER SCENARIO
[Diagram: failover across three clouds]
Before failure:
Cloud #1, Region (US-West Oregon): App Servers, PostgreSQL
Cloud #2, Region (US-East Virginia): PostgreSQL, liveness poll
After failure:
Cloud #1: failed (X)
Cloud #2, Region (US-East Virginia): App Servers, PostgreSQL
Cloud #3, Region (US-West California): PostgreSQL, liveness poll

0. Upon initial deployment, the primary deployment of the application is bootstrapped onto cloud #1. Another, slightly modified application recipe is bootstrapped as cloud #2, which polls cloud #1 for failure and acts as a PostgreSQL slave.
1. Region failure occurs.
2. Turn the Postgres slave into master and start app server instances.*
3. Bootstrap another cloud in a different region using the same application recipe used to bootstrap cloud #2 above.*
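The failover sequence can be sketched as a small simulation. This is a sketch under stated assumptions, not the recipe used in the talk: the `StandbyCloud` class, the `poll_until_failure` function and the rule of three consecutive missed polls are all hypothetical, and a real poller would sleep between attempts and call an actual health endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StandbyCloud:
    """Hypothetical model of cloud #2: a Postgres slave with no app servers."""
    region: str
    postgres_role: str = "slave"
    app_servers_running: bool = False
    log: List[str] = field(default_factory=list)

    def fail_over(self, next_region: str) -> "StandbyCloud":
        # Step 2: promote the local database and start the app tier.
        self.postgres_role = "master"
        self.app_servers_running = True
        self.log.append(f"promoted Postgres and started app servers in {self.region}")
        # Step 3: bootstrap a fresh standby in a different region.
        self.log.append(f"bootstrapped new standby in {next_region}")
        return StandbyCloud(region=next_region)


def poll_until_failure(is_alive: Callable[[], bool],
                       max_misses: int = 3,
                       max_polls: int = 100) -> bool:
    """Return True once max_misses consecutive liveness polls have failed."""
    misses = 0
    for _ in range(max_polls):
        misses = 0 if is_alive() else misses + 1
        if misses >= max_misses:
            return True
    return False


# Simulated primary: answers twice, then stops responding (step 1).
responses = iter([True, True, False, False, False])
standby = StandbyCloud(region="US-East Virginia")

if poll_until_failure(lambda: next(responses, False)):
    new_standby = standby.fail_over(next_region="US-West California")
    print(standby.log)
    print(new_standby.region, new_standby.postgres_role)
```

Requiring several consecutive misses before failing over is one way to avoid promoting the slave on a single dropped poll; the right threshold depends on the target recovery time.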
DEMO ON EC2 - 5 MINUTES SETUP
/*
 * Credentials - You must enter your
 * cloud provider account credentials
 */
user="ENTER_USER_HERE"
apiKey="ENTER_API_KEY_HERE"
keyFile="ENTER_KEY_FILE_HERE"
keyPair="ENTER_KEY_PAIR_HERE"

// Advanced usage
hardwareId="m1.small"
locationId="us-east-1"
linuxImageId="us-east-1/ami-1624987f"
ubuntuImageId="us-east-1/ami-82fa58eb"
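Before bootstrapping, it can help to confirm that none of the ENTER_..._HERE placeholders are left in the properties file. The helper below is a hypothetical convenience, not part of Cloudify; it assumes the simple name="value" layout shown above, and the sample values are made up.

```python
import re

# Matches lines such as: user="ENTER_USER_HERE"
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def unfilled_placeholders(properties_text: str) -> list:
    """Return the names whose value still looks like ENTER_..._HERE."""
    missing = []
    for line in properties_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and re.fullmatch(r"ENTER_\w+_HERE", match.group(2)):
            missing.append(match.group(1))
    return missing


# Made-up sample: two credentials filled in, two still placeholders.
sample = '''
user="alice"
apiKey="ENTER_API_KEY_HERE"
keyFile="myKeyFile.pem"
keyPair="ENTER_KEY_PAIR_HERE"
hardwareId="m1.small"
'''

print(unfilled_placeholders(sample))
```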
SUMMARY
• AWS and outages
• Outage impact
• Disaster Recovery - it's all about redundancy!
  - Cloning your environment - app stack
  - Cloning your DB - Replication
• Cloudify as a solution for Redundancy
  - Use recipes to work on any cloud
  - Fast and customized data replication
• Demo with Cloudify on EC2
Thank You!
RonZ@gigaspaces.com
QUESTIONS & ANSWERS