High-performance high-availability Plone

High Availability High Performance Plone. Guido Stevens, www.cosent.nl, Social Knowledge Technology

description

Presentation at the Plone Conference Brazil 2013. How to create a Plone deployment that performs like crazy and survives not only a datacenter failure, but even keeps on running when all Plone heads are down.

Transcript of High-performance high-availability Plone

Page 1: High-performance high-availability Plone

High Availability High Performance

Plone

Guido Stevens

www.cosent.nl

Social Knowledge Technology

Page 2: High-performance high-availability Plone

Plone Worldwide

Page 3: High-performance high-availability Plone

Resilience

Page 4: High-performance high-availability Plone

Please wave, to improve my speech

Page 5: High-performance high-availability Plone
Page 6: High-performance high-availability Plone

Plone as usual

● Aspeli: über-buildout for a production Plone server

● Regebro: Plone-Buildout-Example

– nginx frontend

– varnish cache

– haproxy balancer

– 4x plone instance

– zeo backend

Page 7: High-performance high-availability Plone

Plone as usual

Page 8: High-performance high-availability Plone

Plone as usual

webserver :80

Page 9: High-performance high-availability Plone

Plone as usual

caching

Page 10: High-performance high-availability Plone

Plone as usual

balancing across Plone instances

Page 11: High-performance high-availability Plone

Plone as usual

Plone instances

Page 12: High-performance high-availability Plone

Plone as usual

ZEO backend

Page 13: High-performance high-availability Plone

Meet the client

● High-profile internet technology NGO

● Slashdot traffic levels

– 0.4 million page views / peak day

– 4 million page views / month

– 40 million hits / month

● Mission-critical web presence

● 100% uptime previous 5 years

● Non-Plone sysadmins

● High security
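To put the traffic figures above in perspective, here is a rough conversion to average requests per second. The 30-day month is an assumption, and real peaks within a day will be well above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: the slides do not say how a month is counted


def per_second(count: int, days: int) -> float:
    """Average events per second when `count` events are spread over `days` days."""
    return count / (days * SECONDS_PER_DAY)


peak_day_views = per_second(400_000, 1)                  # 0.4 million page views / peak day
monthly_views = per_second(4_000_000, DAYS_PER_MONTH)    # 4 million page views / month
monthly_hits = per_second(40_000_000, DAYS_PER_MONTH)    # 40 million hits / month
hits_per_view = 40_000_000 / 4_000_000

print(f"peak day:  {peak_day_views:.2f} page views/s on average")
print(f"month:     {monthly_views:.2f} page views/s on average")
print(f"month:     {monthly_hits:.2f} hits/s on average")
print(f"hits per page view: {hits_per_view:.0f}")
```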

Page 14: High-performance high-availability Plone

No can do

SPOF

SPOF

WTF?

Page 15: High-performance high-availability Plone

Architecture Goals

● Must convince “file-based 100% uptime” sysadmins

● No SPOF

– eliminate all Single Points Of Failure

● Automated failover

– no manual intervention

● Extreme performance

● Extreme resilience

– killall -9 Plone

Page 16: High-performance high-availability Plone

Meet Paul Stevens

● My brother

● mod_wodan + DBmail

● Plone developer

● pjstevns on irc/github/etc

NFG Net Facilities Group

● premium hosting

● 24/7 MySQL HA

– since stone age

● www.nfg.nl

Page 17: High-performance high-availability Plone

Plone as usual

Page 18: High-performance high-availability Plone

3-tier

Page 19: High-performance high-availability Plone

Plone as usual

Page 20: High-performance high-availability Plone

Duplicate setup

Page 21: High-performance high-availability Plone

Load Balancer

Page 22: High-performance high-availability Plone

Load Balancer

● Client provided hardware load balancer

● Alternative: Linux Virtual Server + HAproxy

– 2x HAproxy in active/passive config

● this would be an EXTRA layer of HAproxy not shown in diagram

– use highly available “virtual” IP address

– monitor with Heartbeat or comparable

– fail over the virtual IP address with arping broadcasts

● Alternative: AWS
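The active/passive idea can be sketched as a small state machine: a monitor counts missed heartbeats from the active balancer and hands the virtual IP to the standby once a threshold is reached. This is only an illustration of the decision logic, not how Heartbeat or Linux Virtual Server are implemented. The class name, node names and the `max_misses` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Decide which of two balancer nodes should hold the virtual IP."""

    active: str
    standby: str
    max_misses: int = 3  # hypothetical threshold, not taken from the slides
    misses: int = 0

    def heartbeat(self, active_alive: bool) -> str:
        """Record one heartbeat result and return the node that should own the VIP."""
        if active_alive:
            self.misses = 0
            return self.active
        self.misses += 1
        if self.misses >= self.max_misses:
            # Promote the standby. A real setup would now move the virtual IP
            # and announce it on the network (the slides mention arping).
            self.active, self.standby = self.standby, self.active
            self.misses = 0
        return self.active


monitor = FailoverMonitor(active="lb1", standby="lb2")
for alive in (True, False, False, False):
    owner = monitor.heartbeat(alive)
print(owner)  # node holding the virtual IP after three missed heartbeats
```

Requiring several consecutive misses avoids failing over on a single dropped packet. After the switch the former standby stays active until it in turn misses heartbeats.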

Page 23: High-performance high-availability Plone

Load Balancer

Page 24: High-performance high-availability Plone

Ensure physical separation

● Ensure redundancy across physical servers

– no use to fail over on same machine

– separate machines in separate data centers

● Gotcha: moving virtuals around

– Disable HA facilities of virtualization platform

– We'll do our own HA

Page 25: High-performance high-availability Plone

Full cluster

Page 26: High-performance high-availability Plone

Replacing ZEO

Page 27: High-performance high-availability Plone

ZEO versus Relstorage

● ZEO

– ZEO protocol

– filestorage

– object pickles

● ZRS Replication

– $$$ at the time

– later opensourced

● No hot-failover

– slave → master reconfig

● Relstorage

– ZEO protocol

– MySQL or PostgreSQL

– object pickles: no alchemy!

● MySQL replication

– done that 24/7 since 2001

– widely used

● Hot failover

– multi-master

Page 28: High-performance high-availability Plone

Relstorage on MySQL

Page 29: High-performance high-availability Plone

Blobstorage

● Not shown in diagram

● Client provided Netapp Metrocluster NFS disks

– no need to care about replication and HA for those

● Alternatives:

– DRBD + NFS

– AWS Elastic Block Store (EBS)

– F-sniper + rsync + NFS

● Why not run database on that?

– disk replication + NFS + ZEO

– what can possibly go wrong?

Page 30: High-performance high-availability Plone

Full cluster

Page 31: High-performance high-availability Plone

Apache + Wodan

Page 32: High-performance high-availability Plone

mod_wodan

● Caching module for Apache

– C

– Originally by ICS for nu.nl

– Now maintained by NFG

● Store response body + headers on disk

● BOFH attitude to caching policies

● Used in anger

● Alternative: stxnext.staticdeployment

Page 33: High-performance high-availability Plone

Varnish ↔ Wodan

Varnish

● Proxy process

● RAM memory cache

– restart → empty cache

– expired → gone

● Plays nice

– request + response headers

– etag split-view

● purge API

– plone.app.caching

Wodan

● Apache module

● Persistent disk cache

– restart → full cache

– expired → keep fallback

● BOFH

– my way or the highway

– single cache file per page

● Cronjobs maintenance

– crawl sitemap

– delete removed pages
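The difference in behavior on expiry can be illustrated with a small persistent cache that keeps one file per page and falls back to the expired copy when the backend cannot be reached. This is a sketch of the idea in Python, not mod_wodan itself (which is an Apache module written in C). The class name, the on-disk JSON layout and the `fetch` callback are all hypothetical.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[Dict[str, str], bytes]  # (headers, body)


class StaleFallbackCache:
    """Disk cache: one file per URL, serving expired entries if the backend fails."""

    def __init__(self, root: Path, ttl: float,
                 fetch: Callable[[str], Response],
                 clock: Callable[[], float] = time.time) -> None:
        self.root = Path(root)
        self.ttl = ttl
        self.fetch = fetch  # hypothetical backend call, raises on failure
        self.clock = clock
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def _load(self, url: str) -> Optional[Tuple[float, Dict[str, str], bytes]]:
        path = self._path(url)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        return entry["stored"], entry["headers"], bytes.fromhex(entry["body"])

    def _store(self, url: str, headers: Dict[str, str], body: bytes) -> None:
        entry = {"stored": self.clock(), "headers": headers, "body": body.hex()}
        self._path(url).write_text(json.dumps(entry), encoding="utf-8")

    def get(self, url: str) -> Response:
        cached = self._load(url)
        if cached is not None and self.clock() - cached[0] < self.ttl:
            return cached[1], cached[2]  # still fresh: no backend call
        try:
            headers, body = self.fetch(url)
        except Exception:
            if cached is not None:
                return cached[1], cached[2]  # expired, but keep as fallback
            raise  # never cached and backend down: nothing to serve
        self._store(url, headers, body)
        return headers, body
```

Because entries live on disk, a new `StaleFallbackCache` pointed at the same directory starts with a full cache, which mirrors the "restart → full cache" point above. A RAM cache that drops expired entries has nothing left to serve once the backend is gone.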

Page 34: High-performance high-availability Plone

Varnish plus Wodan

Varnish

● unload Plone

● plone.app.caching policies

– pages 1 hour

– resources longer

– purge on edit

● etag split-view

– per-user page versions

– cache authenticated

Wodan

● failsafe content delivery

● hard policy config

– pages 1 minute

– resources longer

– edit → 1-minute refresh

● Gotcha: anonymous only

– editors bypass Wodan

Page 35: High-performance high-availability Plone

Failure Modes

Page 36: High-performance high-availability Plone

Full cluster

Page 37: High-performance high-availability Plone

MySQL failover

Page 38: High-performance high-availability Plone

Multi Master MySQL

● multi-master

– cross replication

● each slaves the other

– any can be master

● hot failover and failback

● Gotcha: use only 1 master at a time

– Relstorage is not multi-master

– avoid replication errors

● mmm_agent server (not shown in diagram)

– monitors mysql health and replication

– manages virtual MySQL HA IP address

● think: Heartbeat for MySQL
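The "only 1 master at a time" rule can be sketched as a function that always returns at most one node to hold the writer IP. This is an illustration of the policy, not the actual logic of the MMM tooling. The function name, node names and the "replication in sync" check are assumptions.

```python
from typing import Mapping, Optional


def choose_writer(current: str,
                  alive: Mapping[str, bool],
                  in_sync: Mapping[str, bool]) -> Optional[str]:
    """Return the single node that should hold the writer IP, or None.

    The current writer keeps its role while it is healthy, so the writer IP
    does not flap between nodes. Otherwise the role moves to a healthy peer
    whose replication is caught up (an assumed precondition for promotion).
    """
    if alive.get(current, False):
        return current
    for node in sorted(alive):
        if node != current and alive[node] and in_sync.get(node, False):
            return node
    return None  # no safe candidate: better no writer than two writers


healthy = {"db1": True, "db2": True}
print(choose_writer("db1", healthy, healthy))                      # db1
print(choose_writer("db1", {"db1": False, "db2": True}, healthy))  # db2
```

Since the function returns one name or `None`, Relstorage only ever sees a single writable master, even though both MySQL servers are technically able to accept writes.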

Page 39: High-performance high-availability Plone

Blade failure

Page 40: High-performance high-availability Plone

Wodan only

Page 41: High-performance high-availability Plone

Plone as usual

file-based content delivery

Page 42: High-performance high-availability Plone

Readonly Rescue Mode

● File-based content delivery

– mod_wodan

– full cache of all pages + resources

– cached search results (Subject / tag cloud)

● AJAX-driven graceful degradation

– detect backend down via non-cached lightweight view

● @@ipaddress not a full page: minimal rendering overhead

– disable interactive elements via CSS

● search bar, personal tools → display:none

● Gotcha: anonymous only

– down for authenticated until manual reconfig

● Gotcha: ErrorDocument

– pre-cache nice page but preserve http error status code
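The degradation logic on this slide runs in the browser via AJAX. The same decision can be sketched in Python for illustration: probe the lightweight, non-cached view, and if it does not answer, emit a CSS rule that hides the interactive elements. The function names, the timeout and the CSS selectors (`#portal-searchbox`, `#portal-personaltools`) are assumptions. Only the `@@ipaddress` view name comes from the slides.

```python
import urllib.error
import urllib.request


def backend_is_up(probe_url: str, timeout: float = 2.0) -> bool:
    """Return True if the lightweight probe view answers with a 2xx status."""
    try:
        with urllib.request.urlopen(probe_url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or HTTP error status.
        return False


def rescue_css(backend_up: bool) -> str:
    """CSS that hides elements which cannot work without a live backend."""
    if backend_up:
        return ""
    # Assumed selectors for the search bar and personal tools.
    return "#portal-searchbox, #portal-personaltools { display: none; }"


if __name__ == "__main__":
    # Hypothetical site URL; the probe must bypass the page cache to be meaningful.
    up = backend_is_up("http://localhost:8080/Plone/@@ipaddress")
    print(rescue_css(up))
```

The probe only tells anything about the backend if it is excluded from caching. A cached probe response would keep reporting "up" while all Plone heads are down.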

Page 43: High-performance high-availability Plone

No-downtime maintenance

Page 44: High-performance high-availability Plone

Full cluster

Page 45: High-performance high-availability Plone

cosent.nl/blog