Outgrowing an internet startup: database administration in a fast growing company.

Spil Games: outgrowing an internet startup
Art van Scheppingen, Head of Database Engineering

1.  Who is Spil Games?
2.  How to professionalize?
3.  Spil Storage Platform
4.  Questions?

Overview

Who are we? Who is Spil Games?

•  Company founded in 2001
•  350+ employees worldwide
•  170M unique visitors per month
•  100K unique visitors per month on spilgames.com

Facts

Geographic Reach: 170 Million Monthly Active Users (*)

Source: (*) Google Analytics, December 2011

•  Over 40 localized portals in 19 languages
•  Focus on casual and social games
•  170M MAU (30M YoY growth)
•  Over 40M registered users

Girls, Teens and Family Brands

Games

•  In-house game studios
•  Partnerships with social gaming studios
•  Over 1500 licensed games

[Chart: DB servers, MAU and employees, 2006 to 2011]

Spil Games is growing fast!

Database Engineering: How to professionalize your department

•  Databases maintained by Systems Engineering
•  No focus on performance, structure or backups
•  Looking only one or two weeks into the future

Startup

•  Multiple migrations to new hardware
•  Ping-pong on Master-Master setups
•  Lack of insight into performance issues

Lessons

•  Plan ahead up to three months
•  Improve database platform
•  Reduce number of repetitive tasks
    •  Write them down step by step (wiki)
    •  Automate where possible

•  Improve monitoring
    •  A single monitoring system is not enough!

•  Forecast growth
    •  Week / Month / Year
    •  Look back and evaluate!

•  Extend department

Professionalize
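
The "forecast growth, then look back and evaluate" loop can be sketched in a few lines. This is a minimal illustration, assuming simple linear extrapolation; the slides do not say which forecasting method was used, and the function names and sample numbers are hypothetical.

```python
def linear_forecast(history, periods_ahead):
    """Extrapolate from the average growth per period seen in the history."""
    if len(history) < 2:
        raise ValueError("need at least two data points")
    growth_per_period = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + growth_per_period * periods_ahead


def forecast_error(forecast, actual):
    """Relative error of a past forecast; positive means it overshot."""
    return (forecast - actual) / actual


# Hypothetical weekly database sizes in GB.
weekly_size_gb = [100, 110, 120]
predicted = linear_forecast(weekly_size_gb, periods_ahead=2)

# Two weeks later: compare the prediction with what actually happened.
actual = 125
print(f"predicted {predicted:.0f} GB, actual {actual} GB, "
      f"error {forecast_error(predicted, actual):+.0%}")
```

Running the same comparison per week, month and year shows at which horizon the forecasts stop being trustworthy.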

•  Scaling the LDAP platform
•  LDAP replaced by MySQL-based solution (with help from Percona)

LDAP isn’t suitable for the web

•  LVM snapshot method
    •  Took 4 hours on average with manual intervention

•  Innobackupex + netcat + tar + script = quick cloning
    •  Takes about 1 hour per 100GB
    •  Foolproof
    •  Can be run on active masters (if necessary)

Cloning
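
The cloning recipe can be sketched as two shell pipelines plus a duration estimate. This is a hedged sketch, not the script from the talk: the helper names, host name, port and paths are hypothetical, the estimate assumes clone time grows linearly with data size, and the netcat listen syntax (`nc -l PORT`) differs between netcat variants.

```python
import shlex

HOURS_PER_100GB = 1.0  # rough rate quoted on the slide


def estimate_clone_hours(data_size_gb, hours_per_100gb=HOURS_PER_100GB):
    """Estimate clone duration, assuming time scales linearly with size."""
    if data_size_gb < 0:
        raise ValueError("data size must be non-negative")
    return data_size_gb / 100.0 * hours_per_100gb


def sender_command(target_host, port, tmp_dir="/tmp"):
    """Command for the source server: stream a hot backup into netcat."""
    return (f"innobackupex --stream=tar {shlex.quote(tmp_dir)} "
            f"| nc {shlex.quote(target_host)} {int(port)}")


def receiver_command(port, data_dir):
    """Command for the new server: listen and unpack the stream.

    tar needs -i to read the archives that innobackupex streams.
    """
    return f"nc -l {int(port)} | tar -xif - -C {shlex.quote(data_dir)}"


# Start the receiver first, then the sender.
print(receiver_command(9999, "/var/lib/mysql"))
print(sender_command("db-new.example", 9999))
print(f"250 GB -> about {estimate_clone_hours(250):.1f} h")
```

The functions only build the command strings; nothing is executed. A copy streamed this way still has to be prepared (`innobackupex --apply-log`) before MySQL can start on it.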

•  Different monitoring systems give different insights
    •  Different angles/metrics/purposes
•  Early problem detection
    •  Signal abnormal use that could cause an outage

Improve monitoring

•  Uneven growth:
    •  Active master handling all write requests
    •  Vertical scaling

•  Write only
    •  More writes than reads

•  SOA problems
    •  Connection spawning
    •  Open file descriptors

Growing pains

•  Anticipate more than one year in advance
•  Acknowledge shortcomings/problems, look for solutions or alternatives
•  Don't commit to one single solution!
•  Be flexible!

•  Plan for capacity per instance, not for growth alone!
•  Start thinking globally!

Outgrowing our startup phase

Spil Storage Platform: Sharding is inevitable

What is this exciting project about?

•  Natural growth
    •  Grown out of necessity for more functionality
    •  Adding functionality means more interaction
•  Separation of database function
    •  Profiles
    •  Highscores
    •  Comments
    •  User Generated Content
    •  etc.

Functional sharding

•  KISS
•  Problem isolation

Advantages

Disadvantages
•  Uneven growth
•  Difference in query patterns
•  No data consistency
•  No clear ownership of data
•  Capacity planning on total number of reads/writes
•  Horizontal scaling is difficult

Spil Storage Platform

•  What is the bucket model?
    •  It is an abstraction layer between the database and the data model

•  Each record has one unique owner attribute (GID)
    •  The GID (Global IDentifier) identifies different data types

•  Different buckets per function
    •  Attributes contain record data
    •  Attributes do not have to correspond to schema

Bucket model
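
As a data structure, the bucket model can be pictured roughly as follows. This is a minimal in-memory sketch, not the SSP implementation: the class and method names are hypothetical, and it assumes one record per GID within a bucket, which the slides do not state.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Record:
    gid: int  # owner attribute: the Global IDentifier
    # Free-form attributes; they need not match a fixed schema.
    attributes: Dict[str, Any] = field(default_factory=dict)


class Bucket:
    """One bucket per function, e.g. profiles or highscores."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: Dict[int, Record] = {}

    def put(self, gid: int, **attributes: Any) -> Record:
        """Create the record for this GID if needed, then merge attributes."""
        record = self._records.setdefault(gid, Record(gid))
        record.attributes.update(attributes)
        return record

    def get(self, gid: int) -> Optional[Record]:
        return self._records.get(gid)


profiles = Bucket("profiles")
profiles.put(42, nickname="alice")
profiles.put(42, country="NL")  # new attribute, no schema change needed
print(profiles.get(42))
```

Because callers only see buckets, GIDs and attributes, the storage behind a bucket can change without touching the application, which is the point of the abstraction layer.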

•  Flexibility
•  Database backend independent
•  Seamless schema changes and upgrades
•  Sharded on both functional and GID level

•  Even distribution of queries possible
•  Capacity planning on number and type of entities

•  Asynchronous writes possible
•  Transparent data migration

Advantages

Disadvantages
•  Harder to find data
•  At least two lookups needed!
•  Data warehousing needs a different approach

•  Globally sharded on GID
•  (local) GID lookup

How do GIDs work?

[Diagram: GID lookup pointing to Shard 1 and Shard 2, backed by persistent storage]
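
The "at least two lookups" cost and the transparent migration both follow from putting a directory in front of the shards. A minimal sketch, assuming an in-memory directory and dict-backed shards; the modulo placement rule and all names are hypothetical and not taken from the talk.

```python
from typing import Any, Dict, List


class GidLookup:
    """Directory mapping each GID to the shard that holds its data."""

    def __init__(self, shard_names: List[str]) -> None:
        if not shard_names:
            raise ValueError("at least one shard is required")
        self._shard_names = list(shard_names)
        self._location: Dict[int, str] = {}

    def assign(self, gid: int) -> str:
        # Hypothetical placement rule: spread new GIDs by modulo, then pin them.
        default = self._shard_names[gid % len(self._shard_names)]
        return self._location.setdefault(gid, default)

    def locate(self, gid: int) -> str:
        return self._location[gid]

    def move(self, gid: int, shard_name: str) -> None:
        if shard_name not in self._shard_names:
            raise ValueError(f"unknown shard: {shard_name}")
        self._location[gid] = shard_name


def write(lookup: GidLookup, shards: Dict[str, dict], gid: int, data: Any) -> None:
    shards[lookup.assign(gid)][gid] = data


def read(lookup: GidLookup, shards: Dict[str, dict], gid: int) -> Any:
    shard_name = lookup.locate(gid)  # lookup 1: where does this GID live?
    return shards[shard_name][gid]   # lookup 2: fetch it from that shard


def migrate(lookup: GidLookup, shards: Dict[str, dict], gid: int, target: str) -> None:
    source = lookup.locate(gid)
    if source == target:
        return
    shards[target][gid] = shards[source][gid]  # copy first
    lookup.move(gid, target)                   # then repoint the directory
    del shards[source][gid]                    # finally clean up the old copy
```

Readers never name a shard themselves, so moving a GID only changes the directory entry.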

Pipeline flow

[Diagram: request pipeline connecting New Application, SSP, SPAPI, LEGACY adapter and Legacy API to the current functional shards and the new GID based shards; paths are marked read only or read + write]

Bucket mapping and migration

•  Each cluster of two masters will contain two shards
•  Data is written interleaved
•  HA for both shards
•  No warmup needed

•  Both masters active and "warmed up"
•  Slave added for backups and data warehouse

Master-Master Sharding

[Diagram: SSP in front of Shard 1 and Shard 2]
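
One way to read this layout: each shard prefers one of the two masters for its writes, and the other master takes over if the preferred one fails. The sketch below encodes that reading. It is an assumption about how the pair is used, not something the slides spell out, and the class, host and shard names are hypothetical.

```python
class MasterPair:
    """Two masters replicating to each other, each preferred for one shard."""

    def __init__(self, master_a, master_b, shard_a, shard_b):
        self._masters = (master_a, master_b)
        self._healthy = {master_a: True, master_b: True}
        self._preferred = {shard_a: master_a, shard_b: master_b}

    def set_healthy(self, master, healthy):
        if master not in self._healthy:
            raise KeyError(master)
        self._healthy[master] = healthy

    def write_target(self, shard):
        """Return the preferred master for the shard, or its peer on failure."""
        preferred = self._preferred[shard]
        first, second = self._masters
        fallback = second if preferred == first else first
        for candidate in (preferred, fallback):
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy master in this pair")


pair = MasterPair("db1", "db2", "shard1", "shard2")
print(pair.write_target("shard1"), pair.write_target("shard2"))
pair.set_healthy("db1", False)
print(pair.write_target("shard1"))  # the peer takes over shard1's writes
```

Under this reading both masters serve live traffic all the time, which would explain why no warmup is needed after a failover.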

•  Erlang cluster with many workers
•  Every GID has its own worker process
•  (Inter)cluster communication
•  (Near) linear scalability

How are we implementing this SSP?
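
The "one worker per GID" idea can be illustrated outside Erlang as well. The sketch below uses Python only to show the shape: a registry creates a worker the first time a GID is seen, and each worker applies its own messages in order. It is single-threaded and all names are hypothetical; in the SSP each worker is a real Erlang process with its own mailbox.

```python
from collections import deque


class GidWorker:
    """Owns the state of one GID and applies its messages in order."""

    def __init__(self, gid):
        self.gid = gid
        self.state = {}
        self.mailbox = deque()

    def send(self, key, value):
        """Queue an update; nothing is applied until drain() runs."""
        self.mailbox.append((key, value))

    def drain(self):
        """Apply all queued updates in arrival order."""
        while self.mailbox:
            key, value = self.mailbox.popleft()
            self.state[key] = value


class WorkerRegistry:
    """Hands out exactly one worker per GID, creating it on first use."""

    def __init__(self):
        self._workers = {}

    def worker_for(self, gid):
        if gid not in self._workers:
            self._workers[gid] = GidWorker(gid)
        return self._workers[gid]


registry = WorkerRegistry()
registry.worker_for(42).send("nickname", "alice")
registry.worker_for(42).send("nickname", "bob")
registry.worker_for(42).drain()
print(registry.worker_for(42).state)
```

Since all updates for a GID pass through a single worker, they are applied one at a time without explicit locking, and work for different GIDs can be spread across nodes.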

•  Erlang node caching
•  Multiple backend connectors
    •  MySQL library
    •  HandlerSocket
    •  Any other connectors if needed

•  Connection pooling

Advantages

Disadvantages
•  NOT SEXY? ( http://spil.com/notsexy )

Do YOU want to be sexy?

Questions?

Thank you!
