Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?

Melissa Benua, Siva Katir
PlayFab, Inc
Mobile Dev + Test 2016

Don’t be your own worst enemy!

The Simpsons: Tapped Out launched by EA in 2012
• The backend was so unprepared for massive traffic loads that the game was pulled for FIVE months for a total redesign
• It went on to become a huge, long-lasting hit for many years afterwards

Can your company afford to add an extra 5 months to the development cycle, including lost marketing and promotional spend, lost mindshare, and bad press?

Be your own guardian angel!

Loadout launched on Steam by Edge of Reality
• 500x increase in players overnight after being featured in the Steam store
• EC2 auto-scaling instantly brought in atomic, replaceable servers to handle the load
• No downtime, no panic, no fires

DO YOU EVEN NEED A BACKEND?

Maybe! Maybe not!

What can my backend do for me?

Push updates without going through the full certification process:
• New artwork? No problem!
• Message of the day!
• In-app purchase promotions!
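One way to ship new artwork, a message of the day, or a promotion without a new certification pass is to have the client fetch a small config document and fall back to values bundled in the app. This is a sketch under assumptions: `fetch` is a hypothetical callable returning JSON text from your backend, and `BUNDLED_DEFAULTS` and its keys are illustrative, not part of any SDK.

```python
import json

# Hypothetical defaults shipped inside the app binary; used whenever the backend is unreachable.
BUNDLED_DEFAULTS = {
    "message_of_the_day": "Welcome back!",
    "featured_artwork": "banner_v1.png",
    "iap_promotion": None,
}


def load_remote_config(fetch):
    """Merge server-provided overrides on top of the bundled defaults.

    `fetch` is a hypothetical callable that returns raw JSON text from your
    backend; any network or parse failure falls back to the defaults.
    """
    config = dict(BUNDLED_DEFAULTS)  # copy so the defaults are never mutated
    try:
        overrides = json.loads(fetch())
    except (OSError, ValueError):  # network errors, or malformed JSON
        return config
    if isinstance(overrides, dict):
        # Only accept keys this client version already knows how to handle.
        config.update({k: v for k, v in overrides.items() if k in BUNDLED_DEFAULTS})
    return config
```

Ignoring unknown keys keeps an older client from choking on settings added for a newer one.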

Improve customer service:
• Have an authoritative source for what a client ‘has’
• Directly grant entitlements to remediate issues
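A minimal sketch of the "authoritative source" idea: ownership lives on the server, clients only read it, and support staff grant entitlements through an audited call. `InventoryService`, `grant`, and the in-memory collections are hypothetical; a real service would back them with your database.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryService:
    """Hypothetical server-side store: the server, not the client, decides what a player owns."""

    items: dict = field(default_factory=dict)      # player_id -> set of item ids
    audit_log: list = field(default_factory=list)  # (player_id, item_id, reason)

    def get_inventory(self, player_id):
        # Clients receive a read-only copy; they never write ownership directly.
        return frozenset(self.items.get(player_id, set()))

    def grant(self, player_id, item_id, reason):
        """Customer-service remediation: grant an entitlement and record why."""
        owned = self.items.setdefault(player_id, set())
        if item_id in owned:
            return False  # already owned, so granting again is a no-op
        owned.add(item_id)
        self.audit_log.append((player_id, item_id, reason))
        return True
```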

What can my backend do for me?

Support a single user across multiple devices:
• Recover a user’s session even if they lose or replace their device
• Continue the same session across multiple devices (phone to tablet)

Perform ‘trusted’ transactions (especially around receipt verification):
• Clients are untrustworthy!
• A client-to-provider transaction can only say whether a receipt is valid, NOT whether it is valid for your app
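The receipt point is easy to get wrong, so here is a hedged sketch of the checks a server might run after the store provider has already said a receipt is genuine. The shape of `provider_result` (`valid`, `bundle_id`, `product_id`, `transaction_id`) is an assumption for illustration, not any particular store's response format; `EXPECTED_BUNDLE_ID` and `KNOWN_PRODUCTS` are placeholders.

```python
EXPECTED_BUNDLE_ID = "com.example.mygame"  # hypothetical: your app's identifier
KNOWN_PRODUCTS = {"in_app_1": 0.99}        # hypothetical product catalog

# In production this set would live in a durable, shared store.
redeemed_transactions = set()


def accept_purchase(provider_result):
    """Decide whether to grant a purchase after the provider has verified the receipt.

    `provider_result` is an assumed, already-parsed verification response.
    Returns (granted, reason).
    """
    if not provider_result.get("valid"):
        return False, "provider rejected receipt"
    # A genuine receipt for someone else's app is still "valid" to the provider.
    if provider_result.get("bundle_id") != EXPECTED_BUNDLE_ID:
        return False, "receipt is for a different app"
    if provider_result.get("product_id") not in KNOWN_PRODUCTS:
        return False, "unknown product"
    tx = provider_result.get("transaction_id")
    if not tx:
        return False, "missing transaction id"
    if tx in redeemed_transactions:
        return False, "receipt already redeemed"
    redeemed_transactions.add(tx)
    return True, "ok"
```

Recording the transaction ID also stops the same valid receipt from being replayed for repeated grants.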

Know Your Project

What is your budget?
• What does it cost to host?
• What does it cost to run?

Who are your engineers?
• Do you have the in-house expertise to manage all services?
• DevOps? Backend? Whole-Stack? Front-End?
• Are they willing to be on-call 24x7?

What do you need to put in the cloud? Why?

Know Your Data

What data are you storing?
• User data
• Group data
• Application data

How does each piece of data need to be queried?
• Can all data be looked up by a key?
• Do you need to do arbitrary field queries?

Is the data read-heavy, write-heavy, or both?
How much data do you expect to store per user?
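Answering the last two questions with rough arithmetic early helps with database choice and cost. The helpers below are a back-of-the-envelope sketch; the figures in the usage lines (10 KiB per user, three stored copies, six reads and one write per user per minute) are made-up inputs, not measurements.

```python
def estimate_storage_gib(users, bytes_per_user, copies=1):
    """Total stored bytes across all users, in GiB; `copies` covers replicas and backups."""
    return users * bytes_per_user * copies / 2**30


def estimate_ops_per_second(concurrent_users, reads_per_minute, writes_per_minute):
    """Average read and write rates generated by concurrently active users."""
    return (concurrent_users * reads_per_minute / 60,
            concurrent_users * writes_per_minute / 60)


# Hypothetical inputs: 1M users at 10 KiB each, stored 3 times: about 28.6 GiB.
print(round(estimate_storage_gib(1_000_000, 10 * 1024, copies=3), 1))
# Hypothetical inputs: about 100,000 reads/s vs. about 16,700 writes/s, i.e. read-heavy.
print(estimate_ops_per_second(1_000_000, 6, 1))
```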

BUILDING A BACKEND 101

Not taught in schools!

Pick a Cloud Provider

Is your language well supported by your provider?
How much self-management is required for each service?
How well is scalability built in?
Do you have region requirements?
• European data protection laws
• Russia and China have special data laws

Large Needs or Small Needs?

Database + basic CRUD APIs?
• AWS Lambda!

Complex data + user management?
• AWS Mobile or Azure Mobile Services!

Highly custom requirements?
• Roll your own on a public cloud (PROCEED WITH CAUTION!)
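For the "database + basic CRUD" case, a Python Lambda function can be very small. This sketch assumes it sits behind API Gateway with the REST proxy integration, so the event carries `httpMethod`, `pathParameters`, and `body`, and the return value needs `statusCode` and a string `body`. The module-level `_TABLE` dict is a hypothetical stand-in for a real table such as DynamoDB.

```python
import json

# Stand-in for a real table (e.g. DynamoDB). A module-level dict only survives
# while a single warm Lambda container is reused, so never rely on it in production.
_TABLE = {}


def handler(event, context):
    """Minimal CRUD over one resource, assuming API Gateway's REST proxy event shape."""
    method = event.get("httpMethod")
    item_id = (event.get("pathParameters") or {}).get("id")
    if not item_id:
        return _response(400, {"error": "missing id"})

    if method == "GET":
        if item_id not in _TABLE:
            return _response(404, {"error": "not found"})
        return _response(200, _TABLE[item_id])
    if method == "PUT":  # create or replace
        _TABLE[item_id] = json.loads(event.get("body") or "{}")
        return _response(200, _TABLE[item_id])
    if method == "DELETE":
        _TABLE.pop(item_id, None)
        return _response(204, None)
    return _response(405, {"error": "method not allowed"})


def _response(status, payload):
    # API Gateway proxy integrations expect the body as a string.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": "" if payload is None else json.dumps(payload),
    }
```

PUT doubles as create and update here to keep the sketch short; the warm-container caveat on `_TABLE` is exactly why real state belongs in a database.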

Storing and Retrieving Data

Know your databases' strengths:
• MySQL – very easy to get started with and widely supported
• MS-SQL – powerful query engine and incredibly performant
• MongoDB – can query against arbitrary fields
• DynamoDB – very easy scaling and fast random access

Know their weaknesses too:
• MySQL – very hard to scale
• MS-SQL – still pretty hard to scale
• MongoDB – very hard to scale correctly while maintaining data integrity
• DynamoDB – can only query cost-effectively against predefined indexes

Storing and Retrieving Data

Novel solutions to database shortcomings:
• Use multiple databases to take advantage of their individual strengths
• Example: store “index” data in SQL, while using DynamoDB for the actual data storage that clients use
• This lets you store all data without needing to scale a difficult-to-scale database

Keys:
• Have a way to reliably update the SQL database outside the user’s flow
• Don’t treat the SQL store as authoritative
• Some tools can make this entirely seamless, such as using DynamoDB Streams and Lambda to update SQL

SQL:
{
  "playerId": "00001",
  "purchaseId": 1002092,
  "purchaseValue": 0.99,
  "purchaseDate": "2016-03-01 09:01:05"
}

DynamoDB:
{
  "playerId": "00001",
  "purchaseId": 1002092,
  "purchasedItems": [{"itemName": "in_app_1", "purchasePrice": 0.99}]
}

SELECT purchaseId, purchaseValue FROM sqlPurchaseTable WHERE purchaseDate > '2016-03-01'
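The "DynamoDB Streams + Lambda keep SQL updated" idea might look roughly like this sketch, with `sqlite3` as a local stand-in for the SQL index. It assumes the stream view type includes new images, that the table's key includes `purchaseId`, and that each item carries a `purchaseDate` string (the DynamoDB example above does not show one). Stream records use DynamoDB's typed attribute format (`{"S": ...}`, `{"N": ...}`, `{"L": ...}`, `{"M": ...}`); in a deployed Lambda, `apply_stream_records` would be called from the handler with a connection to the real SQL database.

```python
import sqlite3
from decimal import Decimal


def create_index_table(conn):
    # Non-authoritative SQL "index"; it can always be rebuilt from DynamoDB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sqlPurchaseTable ("
        "purchaseId INTEGER PRIMARY KEY, playerId TEXT, "
        "purchaseValue REAL, purchaseDate TEXT)"
    )


def _num(attr):
    # Stream images encode numbers as strings under the "N" type descriptor.
    return Decimal(attr["N"])


def apply_stream_records(conn, event):
    """Project DynamoDB Streams records (INSERT/MODIFY/REMOVE) into the SQL index."""
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        purchase_id = int(_num(keys["purchaseId"]))  # assumes purchaseId is part of the key
        if record["eventName"] == "REMOVE":
            conn.execute("DELETE FROM sqlPurchaseTable WHERE purchaseId = ?", (purchase_id,))
            continue
        image = record["dynamodb"]["NewImage"]  # requires a NEW_IMAGE-style stream view type
        total = sum(_num(i["M"]["purchasePrice"]) for i in image["purchasedItems"]["L"])
        # INSERT OR REPLACE keeps the projection idempotent if a record is redelivered.
        conn.execute(
            "INSERT OR REPLACE INTO sqlPurchaseTable "
            "(purchaseId, playerId, purchaseValue, purchaseDate) VALUES (?, ?, ?, ?)",
            (purchase_id, image["playerId"]["S"], float(total), image["purchaseDate"]["S"]),
        )
    conn.commit()
```

Because each SQL row is rebuilt from the DynamoDB image and written with INSERT OR REPLACE, a redelivered or replayed record does no harm, which is what lets the SQL store stay non-authoritative.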

Plan For Failure

Design for the worst, hope for the best:
• Any machine can go down at any time
• No machine should be ‘special’

If any machine can go down, then any machine can also be brought up.

Build failure behavior in, both up and down the stack:
• DB times out?
• Web server disk fails?
• Third-party provider goes down?

http://gunshowcomic.com/648
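One common way to build in failure behavior at a dependency boundary is bounded retries with exponential backoff and jitter, then a degraded fallback. This is a generic sketch, not a prescribed design: `operation` and `fallback` are placeholders for, say, a database query and a read from cache.

```python
import random
import time


def call_with_fallback(operation, fallback, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Try `operation` a few times with exponential backoff, then degrade gracefully.

    `operation` stands in for any dependency call (DB query, third-party API);
    `fallback` returns a safe default such as cached or default data.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                break
            # "Full jitter": wait a random time up to base_delay * 2**attempt,
            # so many clients retrying at once do not hammer the dependency in sync.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return fallback()
```

Only timeout and connection errors are retried; anything else (a bad query, a bug) propagates, since retrying it would only add load.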

COMMON PITFALLS

It’s a trap!

Saving Data

Remote != Local

Do:
• Save only changed data
• Save data in batches
• Prepare for connection failures
• Prepare for client failures
• Prepare for server failures

Don’t:
• Save on a timer (unless it’s retrying)
• Save duplicated data
• Expect it to work
• Make assumptions about whether it worked

http://cloudtweaks.com/
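The "Do" list can be sketched as a small client-side buffer: it tracks only changed fields, sends them as one batch, and keeps them pending until the server confirms. `SaveBuffer` and `send(batch_id, changes)` are hypothetical; the batch ID is reused on retry, on the assumption that the server deduplicates by ID, so a retry of a write that actually landed is harmless.

```python
import uuid

_MISSING = object()  # sentinel for "key not pending"


class SaveBuffer:
    """Hypothetical client-side save buffer: only changed fields, sent in batches."""

    def __init__(self):
        self._last_saved = {}   # values the server has confirmed
        self._pending = {}      # changed fields not yet confirmed
        self._batch_id = None   # idempotency key for the current batch

    def set(self, key, value):
        old = self._pending.get(key, _MISSING)
        if key in self._last_saved and self._last_saved[key] == value:
            self._pending.pop(key, None)  # matches the saved value: nothing to send
        else:
            self._pending[key] = value
        if self._pending.get(key, _MISSING) != old:
            self._batch_id = None  # batch contents changed, so use a new ID next flush

    def flush(self, send):
        """Send pending changes once; call again later (e.g. on a retry timer) if it fails."""
        if not self._pending:
            return True
        if self._batch_id is None:
            self._batch_id = str(uuid.uuid4())
        changes = dict(self._pending)
        try:
            confirmed = bool(send(self._batch_id, changes))
        except OSError:  # connection errors and timeouts: never assume it worked
            confirmed = False
        if confirmed:
            self._last_saved.update(changes)
            self._pending.clear()
            self._batch_id = None
        return confirmed
```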

Loading Data

Easy Wins
• Client:
  • Pre-load data during idle times
  • Cache locally
  • Assume data can fail to load
  • Assume data can arrive corrupted or out of order
  • Assume it will load slowly
  • If security matters, connect via SSL/TLS
  • Don’t connect directly to the data store
• Server:
  • Cache data that is OK to serve stale
  • Design data schemas so that each request performs as few queries as possible
  • Design authorization to avoid extra queries, or at least limit them

Easy Fails
• Trying to implement a custom SSL service
• Trying to be clever with caching
• Assuming anything will work on the first try
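For the server-side "cache data that is OK to serve stale" win, a deliberately simple time-to-live cache is often enough, in line with the warning against being clever. This sketch assumes a hypothetical `loader` function that hits the real data store; it has no size limit or eviction, so it only suits a small, bounded set of keys.

```python
import time


class TTLCache:
    """Serve possibly-stale values for up to `ttl` seconds before reloading."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader  # e.g. a function that queries the database
        self._ttl = ttl
        self._clock = clock    # injectable so tests can control time
        self._entries = {}     # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]    # still fresh enough: no database hit
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```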

Scalability

Don’t optimize early
• Actually know what your bottlenecks are; most likely it is NOT string handling!
• Run a realistic load test with a profiler to get actually useful data
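A profiler shows where time actually goes. This sketch uses Python's built-in `cProfile` and `pstats` on a toy request handler; `parse_request`, `query_database`, and `handle_request` are hypothetical stand-ins, and a loop of calls is no substitute for a realistic load test.

```python
import cProfile
import io
import pstats


def parse_request(raw):
    return raw.split(",")  # cheap string handling


def query_database(fields):
    # Hypothetical stand-in for a slow dependency.
    return sum(i * i for i in range(20_000)) + len(fields)


def handle_request(raw):
    return query_database(parse_request(raw))


def profile_requests(n):
    """Profile n simulated requests and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n):
        handle_request("player,inventory,stats")
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_requests(50))
```

In a report like this, `query_database` dominates cumulative time while `parse_request` barely registers, which is the slide's point about string handling.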

Don’t run blind
• Know your KPIs before launch
• Track your KPIs in real time via counters with Datadog or CloudWatch
• Set up alerting to your DRI (directly responsible individual)
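In production the counters and alerts would live in Datadog or CloudWatch, but the logic behind a typical error-rate alert is simple enough to sketch: count requests and errors over a sliding window, and fire only above a threshold when there are enough samples. `ErrorRateAlert` and its default values are illustrative assumptions.

```python
from collections import deque


class ErrorRateAlert:
    """Track one KPI (request error rate) over a sliding time window and flag breaches."""

    def __init__(self, window_seconds, threshold, min_requests=20):
        self.window = window_seconds
        self.threshold = threshold        # e.g. 0.05 for a 5% error rate
        self.min_requests = min_requests  # avoid paging the DRI on tiny samples
        self._events = deque()            # (timestamp, is_error), oldest first

    def record(self, timestamp, is_error):
        self._events.append((timestamp, is_error))
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= timestamp - self.window:
            self._events.popleft()

    def should_alert(self):
        total = len(self._events)
        if total < self.min_requests:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / total > self.threshold
```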

Scalability

Know what infrastructure to scale and when:
• Data
• API servers
• Load balancers

Design to scale horizontally, not vertically:
• All services should be stateless unless they don’t need to scale with the number of users
• Don’t assume a server will exist from minute to minute

Keep a safe capacity margin in your infrastructure:
• 50% is reasonable
• Know how long it will take to increase capacity
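The margin and lead-time advice turns into simple arithmetic. This sketch reads "50% margin" as provisioning 1.5x the expected peak (set `margin=1.0` if you mean running at 50% utilization instead); `requests_per_server`, the growth rate, and the lead time are inputs you would measure, not values from this talk.

```python
import math


def servers_needed(peak_requests_per_sec, requests_per_server, margin=0.5):
    """Servers to provision so the expected peak plus `margin` headroom fits."""
    if requests_per_server <= 0:
        raise ValueError("requests_per_server must be positive")
    return math.ceil(peak_requests_per_sec * (1 + margin) / requests_per_server)


def load_when_capacity_arrives(current_rps, growth_per_minute, lead_time_minutes):
    """Load to plan for, given how long new capacity takes to come online."""
    return current_rps * (1 + growth_per_minute) ** lead_time_minutes


# Hypothetical: 10,000 req/s peak, 500 req/s per server, 50% margin.
print(servers_needed(10_000, 500))  # → 30
```

If `load_when_capacity_arrives` exceeds what the current fleet plus margin can absorb, the scale-up has to start earlier than the alert that would normally trigger it.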

Managing Connections

Use connection pooling.
Don’t try to outsmart your language’s connection management.
Making a connection has a cost!
Don’t re-invent a protocol if an existing one will do:
• HTTP is way easier to debug than WebSockets
• WebSockets stream data way more efficiently than HTTP
• Both are safer than using raw TCP
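Connection reuse is easy to demonstrate with the standard library: one HTTP/1.1 keep-alive connection serves several requests, so the TCP handshake is paid only once. This is the simplest form of what a pooling client (for example `requests.Session`) does for you; the local test server, `fetch_many`, and `run_demo` are scaffolding for the sketch.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_client_ports = set()  # one entry per distinct TCP connection the server sees


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

    def do_GET(self):
        seen_client_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_many(host, port, n):
    """Send n requests over a single persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        bodies = []
        for _ in range(n):
            conn.request("GET", "/")
            # The response must be read fully before the connection can be reused.
            bodies.append(conn.getresponse().read())
        return bodies
    finally:
        conn.close()


def run_demo(n=3):
    seen_client_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        bodies = fetch_many(host, port, n)
    finally:
        server.shutdown()
        server.server_close()
    return bodies, len(seen_client_ports)
```

The server records each client port it sees; a count of 1 means every request rode the same connection.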

QUESTIONS?

Melissa Benua
@queenofcode
http://www.slideshare.net/MelissaBenua

Siva Katir
@sivakatir