Living room sessions: war stories | Altitude NYC

I’m a Failure! (and so can you) Kenton Jacobsen Director of Engineering, Vogue.com, Glamour.com at Condé Nast

Transcript of Living room sessions: war stories | Altitude NYC


Everyone will make mistakes

Information is imperfect

Failure is rarely a single thing

Complex systems have complex failures

Sometimes there is a single point of failure

There shouldn’t be a single button / command / event that requires manual intervention

Fundamental surprise

Failures happen in ways that cannot be predicted

Some things can only be observed in production

Keep fire escapes clear

Keep the cost of failure low

Limited blast radius

http://www.kitchensoap.com/2013/09/30/learning-from-failure-at-etsy/

https://codeascraft.com/speakers/john-allspaw-outages-post-mortems-and-human-error/

https://www.slideshare.net/kellan/continuous-deployment-7300057

[Figure: "Scientific Graph" with axes Risk and Velocity]

Fail fast, fix faster

Recovering from Failure with Fastly

@sethvargo

Open Source: fastly.com/open-source

✓ 14,243,908 downloads

✓ 549.9 TB bandwidth

✓ 62.6 GB/hour

✓ 80 requests/minute

✓ 96% cache coverage


The dashboard not changing color is related to S3 issue. See the banner at the top of the dashboard for updates.

Amazon Web Services (@awscloud)

2:17 PM - 28 Feb 2017

www.terraform.io

HTML HTML HTML

PURGE-KEY spider-all

Spider Failed


We PURGED ALL'd

💩 Well

But Fastly still had the data

CACHE KEY = HOSTNAME + URL + ###GENERATION###

HOSTNAME: terraform.io

URL: /index.html

###GENERATION###: 139042

PURGE ALL => ###GENERATION###: 139043

After PURGE ALL, two cache keys exist:

terraform.io + /index.html + 139043

terraform.io + /index.html + 139042

They did that
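The cache key slides can be read as a small model: the key is built from hostname, URL, and a generation number, and PURGE ALL changes the generation instead of deleting anything. The sketch below is a toy illustration of that reading, not Fastly's implementation. The class name, the method names, and the `roll_back` step are hypothetical; `roll_back` is only a guess at the kind of recovery "They did that" may refer to.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenerationCache:
    """Toy cache whose keys embed a service-wide generation number."""

    generation: int = 139042  # value shown on the slide
    store: dict = field(default_factory=dict)

    def key(self, hostname: str, url: str, generation: Optional[int] = None) -> tuple:
        # CACHE KEY = HOSTNAME + URL + ###GENERATION###
        gen = self.generation if generation is None else generation
        return (hostname, url, gen)

    def put(self, hostname: str, url: str, body: str) -> None:
        self.store[self.key(hostname, url)] = body

    def get(self, hostname: str, url: str) -> Optional[str]:
        return self.store.get(self.key(hostname, url))

    def purge_all(self) -> None:
        # Nothing is deleted; bumping the generation makes old keys unreachable.
        self.generation += 1

    def roll_back(self) -> None:
        # Hypothetical recovery: look up the previous generation again.
        self.generation -= 1


cache = GenerationCache()
cache.put("terraform.io", "/index.html", "<html>home</html>")

cache.purge_all()
after_purge = cache.get("terraform.io", "/index.html")  # lookup now uses 139043
still_stored = ("terraform.io", "/index.html", 139042) in cache.store

cache.roll_back()
recovered = cache.get("terraform.io", "/index.html")
```

In this model a purge is cheap because it is a single counter change, and the old objects stay in storage until something evicts them, which is why the data could still be there after the purge.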

Mitigating Future Failures


Cache-Control

Surrogate-Control

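One way to read the Cache-Control and Surrogate-Control slides: the origin can give the CDN a long lifetime and browsers a short one. The sketch below assumes a surrogate cache prefers `Surrogate-Control` when it is present and otherwise falls back to `Cache-Control`. The function names and header values are illustrative, and directives such as `s-maxage` are deliberately ignored.

```python
def parse_max_age(header_value):
    """Return max-age in seconds from a Cache-Control style value, or None."""
    if not header_value:
        return None
    for directive in header_value.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def ttl_seconds(headers, is_surrogate):
    """TTL a cache would apply; a surrogate checks Surrogate-Control first."""
    if is_surrogate:
        ttl = parse_max_age(headers.get("Surrogate-Control"))
        if ttl is not None:
            return ttl
    return parse_max_age(headers.get("Cache-Control"))


# Illustrative origin response: one day at the CDN, one minute in browsers.
origin_headers = {
    "Surrogate-Control": "max-age=86400",
    "Cache-Control": "max-age=60",
}

cdn_ttl = ttl_seconds(origin_headers, is_surrogate=True)
browser_ttl = ttl_seconds(origin_headers, is_surrogate=False)
fallback_ttl = ttl_seconds({"Cache-Control": "max-age=60"}, is_surrogate=True)
```

Splitting the two lifetimes means a purge at the CDN takes effect for browsers within the short `Cache-Control` window.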

Stale-If-Error

Stale-While-Revalidate
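The two stale directives (defined in RFC 5861) let a cache keep answering after an object expires. The sketch below is a simplified decision function, not any CDN's actual logic: `serve_decision` and its return labels are hypothetical, and all times are in seconds.

```python
def serve_decision(age, max_age, stale_while_revalidate=0,
                   stale_if_error=0, origin_ok=True):
    """Decide how a cache might answer a request for an object of a given age."""
    if age < max_age:
        return "fresh"
    staleness = age - max_age
    if staleness <= stale_while_revalidate:
        # Serve the stale copy now and refresh it in the background.
        return "stale-while-revalidate"
    if not origin_ok and staleness <= stale_if_error:
        # Origin is failing: keep serving the stale copy instead of an error.
        return "stale-if-error"
    return "fetch" if origin_ok else "error"


# Illustrative policy: max-age=60, stale-while-revalidate=30, stale-if-error=86400
policy = dict(max_age=60, stale_while_revalidate=30, stale_if_error=86400)

young = serve_decision(age=30, **policy)
just_expired = serve_decision(age=75, **policy)
origin_down = serve_decision(age=3600, origin_ok=False, **policy)
origin_up = serve_decision(age=3600, origin_ok=True, **policy)
too_old = serve_decision(age=100000, origin_ok=False, **policy)
```

With a generous `stale-if-error` window, an origin outage like the S3 incident in the earlier slides would degrade to slightly old pages rather than errors, as long as the objects are still in cache.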


$ curl \
    --request PURGE \
    --header "Fastly-Soft-Purge:1" \
    https://api.fastly.com/purge/service/ab12/purge/site-tf
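The same request can be built from a script. The sketch below constructs it with Python's standard library and does not send it. The URL is copied verbatim from the slide and may not match Fastly's current purge endpoints; `ab12` and `site-tf` look like placeholder values, and `build_soft_purge` is a hypothetical helper. A real call would presumably also need an API token, which the slide omits. The point of a soft purge is that content is marked stale rather than removed, so the stale directives can still apply.

```python
import urllib.request

# Copied from the slide as-is; treat the service ID and key as placeholders.
PURGE_URL = "https://api.fastly.com/purge/service/ab12/purge/site-tf"


def build_soft_purge(url: str) -> urllib.request.Request:
    """Build (but do not send) a PURGE request flagged as a soft purge."""
    return urllib.request.Request(
        url,
        method="PURGE",
        headers={"Fastly-Soft-Purge": "1"},
    )


request = build_soft_purge(PURGE_URL)
# urllib.request.urlopen(request) would send it; omitted to avoid side effects.
```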


Thank you!

Poisoning caches

Niklas [email protected] @protocol7

Production storage

[Diagram: Client, accesspoint, Production storage (nginx), Proxy, Master storage]

Google Cloud Storage

Accept-Ranges: bytes

[Diagram: Client, accesspoint, Production storage (nginx), Proxy, Master storage (GCS)]

🔥🔥🔥🔥 ✅ ✅ ✅☑