Avoiding (noticeable) Failure, Building Resilience


Page 1

SHOPCADE

Getting Ridiculously Good Uptime

Avoiding (noticeable) Failure, Building Resilience

Page 2

Evan Adelman, CTO

16 years growing tech teams developing social media platforms, financial applications and brand campaigns, as co-founder of Mutant Media in NY, Tech Director at M&C Saatchi London, and co-founder of Shopcade

Page 3

To be the discovery engine for social shoppers (through content & personalisation) and the last point of sale in a purchase decision.

Page 4

SHOPCADE CAPABILITIES

- In-house photo- and video-based content, native ads
- Technology provider
- Direct & affiliate sale integration
- Widgets for media & bloggers
- Mobile APIs, dev operations, social integration, trigger-based CRM, data analysis

Page 5

WHAT WE’VE ACHIEVED SO FAR

- 91% organic web growth
- 30% organic mobile growth
- 45% like or create content on mobile
- 900,000 registered to newsletters
- 5m 19s dwell time on mobile apps
- 20% spend over 2 hours daily
- 1.2 million+ registered members (360K mobile app downloads)
- 10.6% click to purchase
- 82% female users, 55% UK users

Page 6

Let’s talk about

1. What’s driving cloud-based infrastructure
2. A use case of agile development from Shopcade
3. Resilience / failure acceptance (or lack thereof)
4. What we’re doing about it

Page 7

Moving fast enables us to build more things and learn faster. However, as most companies grow, they slow down too much because they’re more afraid of making mistakes than they are of losing opportunities by moving too slowly. We have a saying: “Move fast and break things.” The idea is that if you never break anything, you’re probably not moving fast enough.

(Mark Zuckerberg, founder’s letter in Facebook’s 2012 IPO filing)

Page 8

Planning is guessing

Unless you’re a fortune-teller, long-term business planning is a fantasy. There are just too many factors that are out of your hands: market conditions, competitors, customers, the economy, etc. Writing a plan makes you feel in control of things you can’t actually control.

(Jason Fried & David Heinemeier Hansson, Rework)

Page 9

What happens when you mix gamification, eCommerce, and social?

Shopcade circa 2011

Page 10

Following these principles, we were able to iterate quickly from this:

Page 11

Circa 2011

Page 12

Circa 2012, 10 million products

Page 13

Circa 2013, 100 million products

Page 14

Successes by 2014

Page 15

To this:

Page 16

2015: 54 million products, content targeted at fashionable girls in the UK, US & India

Page 17

Our infrastructure supports:

- 1.6 million product deltas/day
- 200k products added per day
- 78 million likes on users, products and brands
- 500k daily alerts (push notifications and eCRM)
- 200 million events driving eCRM
- a baseline of 15k concurrently registering users, autoscaling beyond that
- 500,000 deal lists automatically generated daily for users

8 people (including me) in the tech team, covering database, native apps, API, commerce connectors and design

Page 18

Page 19

Page 20

From this:

Page 21

Page 22

To this

Page 23

Here be dragons

Page 24

Building resilience

1) In database

2) In caching

3) In autoscaling

Page 25

Shared resources (EBS volumes, for example) can be a drag on database operations.

NoSQL/in-memory DBs hitting disks? *Really* bad idea.

Page 26

Solution: vertical, then horizontal scale on MongoDB
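The slides don’t say how the vertical-then-horizontal decision was made. As a minimal sketch, assuming the rule of thumb from the previous slide that a MongoDB working set should stay in RAM rather than hit disk, the policy could be: pick the smallest instance whose RAM holds the working set, and only shard once even the largest size is not enough. The instance sizes, the `headroom` fraction and `plan_mongo_capacity` are hypothetical, not Shopcade’s tooling.

```python
from dataclasses import dataclass
from math import ceil
from typing import Tuple


@dataclass(frozen=True)
class InstanceSize:
    name: str     # hypothetical label, not a real EC2 instance type
    ram_gb: float


# Hypothetical menu of instance sizes, smallest first.
SIZES = [
    InstanceSize("small", 16),
    InstanceSize("medium", 32),
    InstanceSize("large", 64),
    InstanceSize("xlarge", 128),
]


def plan_mongo_capacity(working_set_gb: float, headroom: float = 0.7) -> Tuple[str, str, int]:
    """Scale up first; shard only once the largest size can't hold the working set.

    headroom is the (assumed) share of RAM the working set may occupy, leaving
    the rest for connections, the OS and growth.
    Returns (strategy, instance size name, number of shards).
    """
    for size in SIZES:  # vertical: try each size, smallest first
        if working_set_gb <= size.ram_gb * headroom:
            return ("vertical", size.name, 1)
    largest = SIZES[-1]  # horizontal: split the working set across shards
    shards = ceil(working_set_gb / (largest.ram_gb * headroom))
    return ("horizontal", largest.name, shards)


print(plan_mongo_capacity(40))   # ('vertical', 'large', 1)
print(plan_mongo_capacity(200))  # ('horizontal', 'xlarge', 3)
```

In a real cluster each shard would itself be a replica set, so the machine count is higher; the sketch only captures the order of the decision.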

Page 27

But let’s be clear about what it takes to achieve database performance and reliability:

- autoscaling MongoDB is not a huge consideration for our data profile
- reserved instances help a little with cost
- many custom profiling tools & high awareness of monitoring
- volumes have been written on choosing a shard key
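To illustrate why shard-key choice matters, here is a toy model (not MongoDB’s real chunk balancer) comparing range routing and hashed routing for a monotonically increasing key such as a timestamp or counter. With range routing every new insert lands on the shard that owns the top of the key range; hashing the key spreads the same inserts across all shards. The shard count, boundaries and function names are illustrative assumptions.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # illustrative cluster size


def range_shard(key: int, boundaries: list[int]) -> int:
    """Range routing: shard i holds keys below boundaries[i]; the last shard holds the rest."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)


def hashed_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Hashed routing: route on a 64-bit digest of the key instead of the key itself."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(router, keys) -> Counter:
    """Count how many keys each shard would receive."""
    return Counter(router(k) for k in keys)


# Existing data covers keys 0..999,999, split evenly across four shards.
BOUNDARIES = [250_000, 500_000, 750_000]
# New writes keep increasing, like timestamps or auto-increment IDs.
new_keys = range(1_000_000, 1_010_000)

print(distribution(lambda k: range_shard(k, BOUNDARIES), new_keys))  # Counter({3: 10000})
print(distribution(hashed_shard, new_keys))  # spread roughly evenly over shards 0-3
```

Hashing removes the hot shard for inserts, at the cost of range queries on that key having to visit every shard; that trade-off is much of why so much has been written on the subject.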

Page 28

Building resilience

1) In database

2) In caching

3) In autoscaling

Page 29

Problem: shared resources (retailer images) can be a drag*

red = fail
green = user takes a nap
blue = watching an image load

*By no means their fault; we’re probably asking for very old images here.

Page 30

Solution: stateless cluster of image processors, cache to CDN, backup to S3
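A minimal sketch of the idea on this slide, assuming a read-through design: because each processor derives the cache key purely from the request, any node in the cluster can serve any image, and the S3 copy lets the CDN cache be rebuilt without going back to the retailer. The stores are plain dicts standing in for the CDN origin and an S3 bucket; `ImageProcessor`, `fetch_origin` and `resize` are hypothetical names, not the production code.

```python
import hashlib
from typing import Callable, Dict, Optional


def cache_key(source_url: str, width: int) -> str:
    """Deterministic key, so every node computes the same name for the same request."""
    raw = f"{source_url}|w={width}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest() + ".jpg"


class ImageProcessor:
    """Keeps no per-request state; all storage is injected and shared."""

    def __init__(
        self,
        cdn_store: Dict[str, bytes],     # stand-in for the CDN origin cache
        backup_store: Dict[str, bytes],  # stand-in for an S3 backup bucket
        fetch_origin: Callable[[str], Optional[bytes]],  # slow retailer fetch
        resize: Callable[[bytes, int], bytes],
    ) -> None:
        self.cdn_store = cdn_store
        self.backup_store = backup_store
        self.fetch_origin = fetch_origin
        self.resize = resize

    def get(self, source_url: str, width: int) -> Optional[bytes]:
        key = cache_key(source_url, width)
        # 1. Fast path: already processed and cached.
        if key in self.cdn_store:
            return self.cdn_store[key]
        # 2. Cache lost or evicted: restore from the backup, not the retailer.
        if key in self.backup_store:
            self.cdn_store[key] = self.backup_store[key]
            return self.cdn_store[key]
        # 3. Slow path: fetch from the retailer once, then cache and back up.
        original = self.fetch_origin(source_url)
        if original is None:
            return None  # caller can serve a placeholder instead of hanging
        processed = self.resize(original, width)
        self.cdn_store[key] = processed
        self.backup_store[key] = processed
        return processed


# Usage: two "nodes" sharing the same stores.
origin_calls = []


def fake_fetch(url: str) -> Optional[bytes]:
    origin_calls.append(url)
    return b"original-image-bytes"


cdn, s3 = {}, {}
node_a = ImageProcessor(cdn, s3, fake_fetch, lambda data, w: data[:w])
node_b = ImageProcessor(cdn, s3, fake_fetch, lambda data, w: data[:w])
node_a.get("https://retailer.example/1.jpg", 8)  # fetches from the retailer
node_b.get("https://retailer.example/1.jpg", 8)  # served from the shared cache
```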

Page 31

Route53 domains + CDN’d images = concurrent, fast image downloads

red = fail
green = user takes a nap
blue = watching an image load
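One common way to get the concurrent downloads this slide describes is hostname sharding: browsers cap the number of parallel HTTP/1.1 connections per hostname, so serving images from several hostnames that all point at the same CDN lets more of them load at once. The slide doesn’t show Shopcade’s actual scheme; this is a sketch assuming four hypothetical hostnames under example.com.

```python
import zlib

# Hypothetical hostnames, e.g. Route53 records that all point at the same CDN.
IMAGE_HOSTS = [f"img{i}.example.com" for i in range(4)]


def image_url(path: str, hosts: list[str] = IMAGE_HOSTS) -> str:
    """Deterministically map an image path onto one of several hostnames.

    CRC32 is stable across processes and machines (unlike Python's built-in
    hash() for str), so the same image always gets the same URL and stays cacheable.
    """
    clean = path.lstrip("/")  # normalise first so "/a.jpg" and "a.jpg" map alike
    index = zlib.crc32(clean.encode("utf-8")) % len(hosts)
    return f"https://{hosts[index]}/{clean}"


for p in ["products/123.jpg", "products/124.jpg", "brands/42/logo.png"]:
    print(image_url(p))
```

Under HTTP/2 a single connection multiplexes many requests, so splitting across hostnames mostly stops helping there and can cost extra DNS lookups and TLS handshakes.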

Page 32

Building resilience

1) In database

2) In caching

3) In autoscaling

Page 33

Pretty simple: to save £, know thy metrics
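One way to read “know thy metrics”: the shape of your traffic tells you how much capacity to run all the time and how much to leave to autoscaling. As a sketch, assuming a day of hourly concurrent-user samples and a measured users-per-instance figure (both hypothetical inputs here), a nearest-rank percentile gives a baseline that is busy most of the day; that steady part is what reserved instances suit, while peaks are left to autoscaling. The default `pct` of 50 is illustrative, not a recommendation.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def baseline_instances(samples: Sequence[float], users_per_instance: float,
                       pct: float = 50) -> int:
    """Instances needed to cover the given percentile of observed load (at least 1)."""
    return max(1, math.ceil(percentile(samples, pct) / users_per_instance))


# Hypothetical hourly concurrent-user counts for one day.
hourly_users = [4_000, 3_000, 2_500, 2_000, 2_000, 2_500, 4_000, 7_000,
                9_000, 10_000, 11_000, 12_000, 13_000, 12_000, 11_000, 11_000,
                12_000, 14_000, 16_000, 18_000, 17_000, 14_000, 9_000, 6_000]

print(baseline_instances(hourly_users, 2_500))           # 4: covers the median hour
print(baseline_instances(hourly_users, 2_500, pct=100))  # 8: covers the busiest hour
```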

Page 34

Pretty simple: to save £, react to thy metrics (and sanity-check them often with services like Pingdom)
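Reacting to metrics usually means a proportional rule: scale the instance count by how far the observed metric is from its target, ignore small deviations so the group doesn’t flap, and clamp to a minimum and maximum. The talk doesn’t say which policy Shopcade used; this sketch borrows the formula Kubernetes documents for its Horizontal Pod Autoscaler, desired = ceil(current × metric / target), and the default limits are hypothetical.

```python
import math


def desired_instances(current: int, metric: float, target: float,
                      min_n: int = 2, max_n: int = 20,
                      tolerance: float = 0.1) -> int:
    """Proportional scaling: ceil(current * metric / target), clamped to [min_n, max_n].

    metric/target could be average CPU %, requests per instance, etc.
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target > 0")
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # within tolerance: leave the group alone
    else:
        desired = math.ceil(current * ratio)
    return max(min_n, min(max_n, desired))


print(desired_instances(4, metric=90, target=60))   # 6: scale out
print(desired_instances(4, metric=63, target=60))   # 4: within tolerance
print(desired_instances(10, metric=20, target=60))  # 4: scale in
print(desired_instances(2, metric=10, target=60))   # 2: never below min_n
```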

Page 35

Offload non-differentiators to 3rd parties

Page 36

In summary

- The shift from waterfall to agile is pushing developers to be as responsive as ideas
- Agile developers need agile infrastructure
- The ease with which you can deploy & destroy resources facilitates rapid development
- Designing infrastructure functionally is a great way to support the core ethos: build fast, and don’t be *too* scared to break things

Page 37

Evan Adelman, CTO

ww.shopcade.com/evan

@evanadelman

Thank You!