Constructing a Fault-Tolerant, Highly Available Cloud Infrastructure for your Drupal Site

Housekeeping

• Slides and recording will be posted in the next 48 hours

• Submit questions via the Q&A tab in WebEx; we’ll answer as many as we can

• Try it now: tell us where you are joining from

• Hashtags: #acquia #drupal

http://acquia.com/resources/recorded_webinars

Upcoming Webinars

• Building a Common Drupal Platform for Your Organization Using Drupal 7

• Accessible Theming in Drupal

• Integrating a CDN with Acquia Cloud

• Ensuring Success When Migrating Your Content to Drupal

• OpenPublic & Drupal: Taking the Guesswork Out of Open Source For Government

• Community Box 2.0: Multilingual Communities with Commons

http://acquia.com/resources/webinars

https://www.acquia.com/resources/acquia-tv/conference/building-common-drupal-platform-your-organization-using-drupal-7-1

https://www.acquia.com/resources/acquia-tv/conference/accessible-theming-drupal

https://www.acquia.com/resources/acquia-tv/conference/integrating-cdn-acquia-cloud

Acquia is Hiring

• Do you love working with Drupal?

• Acquia is hiring in North America, Europe, and Australia!

• Engineering / DevOps

• Design

• Support

• Operations

• Client Advisors

• Sales and Marketing

http://acquia.com/careers

Constructing a Fault-Tolerant, Highly Available Cloud Infrastructure for your Drupal Site

Andrew Kenney, VP of Platform Engineering

December 12, 2012

Jess Iandiorio, Sr. Director, Cloud Product Marketing

Creating killer websites is hard …

Hosting them shouldn’t be.

For business-critical sites, how do you avoid a crisis?

Agenda

• Drupal Hosting Challenges

• Cloud Failure Scenarios

• HA & Resiliency

• Resource Challenges

• Designing for Failure

• Architecting & Automating failover

• Testing Failure

Drupal Hosting Challenges

• Drupal expects a POSIX filesystem

• Drupal is not optimized for high-latency MySQL operations

• Drupal is not built with partition tolerance in mind

• Shortage of talent or expertise for operating Drupal in the Cloud or at scale

Cloud Failure Scenarios

• Machine loss

• Service outage

• Network disruption

• Inaccessible/unreliable storage system

• Traffic spike

• Control Plane failure

• Corrupt/Partial Backups

High Availability & Resiliency

• Plan for Failure

• Automate deployment & configurations

• Eliminate SPOFs

• Two (at least) of everything

• Monitor everything

• Monitor the monitors

• Back up all data

• Periodically test all backups

• Test emergency procedures

• Never assume any procedure works unless it’s periodically tested
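One way to make "periodically test all backups" concrete is a scheduled check that refuses to trust a backup file until it has been inspected. The sketch below is only an illustration, not Acquia's tooling: the function names, the size floor, and the idea of storing a SHA-256 checksum at backup time are all assumptions. It catches missing, truncated, or altered files; a fuller test would also restore the backup into a scratch database.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup: Path, expected_sha256: str, min_bytes: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the backup passed.

    expected_sha256 is assumed to have been recorded when the backup was taken.
    """
    if not backup.exists():
        return [f"{backup} is missing"]
    problems = []
    size = backup.stat().st_size
    if size < min_bytes:
        # A suspiciously small file often means a partial backup.
        problems.append(f"{backup} is only {size} bytes (expected at least {min_bytes})")
    if sha256_of(backup) != expected_sha256:
        problems.append(f"{backup} checksum mismatch")
    return problems
```

Run from a scheduler, a non-empty result would be raised as an alert rather than silently logged, so a bad backup is found before it is needed.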

Resource Challenges

• Cloud Hype – the cloud frees developers from needing operations staff to do their job

• Cloud Reality – the cloud introduces even more instability unless you plan for failure

Designing for Failure

1. Multiple AZ hosting

2. Multiple region hosting

3. Shared security model

4. Monitoring

[Diagram labels: Infrastructure & Application Health (Acquia Operations Team); Security Scanning (Acquia Security Team); Monitoring: web servers, mon servers, US-West, US-East; External Monitoring: Rackspace, Pingdom]

5. Recovering from failure
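Monitoring from several independent vantage points only helps if the results are combined sensibly. The sketch below is a hypothetical illustration, not a description of Acquia's monitoring: the `Report` type, the majority rule, and the 120-second staleness limit are all assumptions. It declares the site down only when most fresh monitors agree, and it lists monitors that have stopped reporting, which is one simple way to "monitor the monitors".

```python
from dataclasses import dataclass


@dataclass
class Report:
    monitor: str        # vantage point name (hypothetical labels)
    healthy: bool       # did the site respond correctly from there?
    age_seconds: float  # time since this monitor last reported


def assess(reports, max_age_seconds=120.0):
    """Return (state, stale_monitors) from several independent monitors.

    state is "up", "down", or "unknown" (no monitor has reported recently).
    """
    fresh = [r for r in reports if r.age_seconds <= max_age_seconds]
    stale = [r.monitor for r in reports if r.age_seconds > max_age_seconds]
    if not fresh:
        return "unknown", stale
    failures = sum(1 for r in fresh if not r.healthy)
    # Require a strict majority of fresh monitors before declaring an outage,
    # so one vantage point with a local network problem does not page anyone.
    state = "down" if failures * 2 > len(fresh) else "up"
    return state, stale
```

A stale monitor is treated as a problem in its own right: it is excluded from the vote and reported separately so that it can be repaired.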

Failover in the Cloud

• Amazon Elastic Load Balancers (ELBs) allow for failover from one Availability Zone (AZ) to another

• Acquia load balancers allow for unhealthy web nodes in any given AZ to be removed from service

• DNS switch allows for failover or promotion of database servers

• Manual DNS switch allows for (one way) failover of a site from one region to another
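The idea of removing unhealthy web nodes from service can be sketched as a toy pool that tracks consecutive failed health checks. This is an assumed model for illustration only; it does not reproduce the behavior or API of Amazon ELB or of Acquia's load balancers, and the node names, zone names, and threshold of three are made up.

```python
from collections import defaultdict


class WebPool:
    """Toy load-balancer pool: nodes leave rotation after repeated failed checks."""

    def __init__(self, nodes, unhealthy_threshold=3):
        # nodes maps a node name to its availability zone, e.g. {"web-1": "az-a"}
        self.nodes = dict(nodes)
        self.unhealthy_threshold = unhealthy_threshold
        self.failures = defaultdict(int)  # consecutive failures per node

    def record_check(self, node, ok):
        """Record one health-check result; a success resets the failure count."""
        if ok:
            self.failures[node] = 0
        else:
            self.failures[node] += 1

    def in_service(self):
        """Names of nodes still eligible for traffic, sorted."""
        return sorted(
            name for name in self.nodes
            if self.failures[name] < self.unhealthy_threshold
        )

    def zones_in_service(self):
        """Availability zones that still have at least one eligible node."""
        return sorted({self.nodes[name] for name in self.in_service()})
```

If every node in one zone fails its checks, `zones_in_service()` shrinks to the surviving zone, which is the cross-AZ failover case; a single passing check returns a node to rotation in this simplified model.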

Testing failover

• Failover and failback should be a scriptable process that can be routinely handled by automated systems or by operations personnel

• Failover scenarios may be useful in events such as application deployments or database schema changes
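A scripted drill might look like the following sketch, which models a database pair behind a single DNS name and exercises failover followed by failback. Everything here is hypothetical: `DatabasePair`, `run_drill`, and the `is_serving` callback stand in for whatever promotion, DNS, and verification steps a real environment uses.

```python
class DatabasePair:
    """Toy model of a primary/standby database pair behind one DNS name."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.dns_target = primary  # host the application's DB name resolves to

    def switch(self):
        """Promote the standby and repoint DNS at it."""
        self.primary, self.standby = self.standby, self.primary
        self.dns_target = self.primary


def run_drill(pair, is_serving):
    """Fail over, verify, fail back, verify.

    is_serving(host) is a caller-supplied check (an assumption here) that
    returns True when the host answers queries. Returns (log, passed), where
    log holds one (step, target, ok) tuple per step.
    """
    log = []
    original = pair.primary
    for step in ("failover", "failback"):
        pair.switch()
        log.append((step, pair.dns_target, is_serving(pair.dns_target)))
    restored = pair.dns_target == original
    return log, restored and all(ok for _, _, ok in log)
```

Because the drill is an ordinary function, the same code path can be run on a schedule as a test and reused during a planned deployment or schema change.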

Why not DIY?

• Your core competency is not HA

  • Let your precious engineering/IT ops staff focus on what’s key to your organization’s success

• Most organizations are not 24x7x365

  • The Internet doesn’t sleep and failure can strike at any time

• Don’t get stuck in the blame game

  • If your site goes down and you are called upon at an inconvenient time, you’ll be caught between the hosting provider or team and the Drupal application team

Why Acquia?

• White glove service

• 24x7 operations

• Drupal expertise

  • Operations

  • Scalability

  • Performance

• HA Offerings

  • Multi-zone

  • Multi-region

Dev Cloud

Acquia’s Continuous Integration Platform for Developers.

• Intuitive development workflow

• Power tools for power users

• Drupal-tuned hosting infrastructure

Managed Cloud

Never let your best day become your worst.

• White-glove managed service for mission-critical Drupal websites

• Drupal-tuned hosting infrastructure

• HA, elastic resources with multi-region failover

• For more information visit: http://www.acquia.com

• Contact us: [email protected] or 888.9.ACQUIA

• Follow us: @acquia

• Comments welcome:

• [email protected]

• [email protected]

Today’s webinar recording will be posted to: http://acquia.com/resources/recorded_webinars

Questions?
