Transcript
Page 1: AWS Game Analytics - GDC 2014

AWS Gaming Solutions | GDC 2014

Game Analytics with AWS
Or, How to learn what your players love so they will love your game

Nate Wiger @nateware | Principal Gaming Solutions Architect

Page 2: AWS Game Analytics - GDC 2014

Mobile Game Landscape

•  Free To Play
•  In-App Purchases
•  Long-Tail
•  Cross-Platform
•  Go Global
•  User Retention = Revenue

Page 3: AWS Game Analytics - GDC 2014

Projected Mobile App Revenue

[Chart: projected mobile app revenue per year, 2011 through 2017, broken down into Ads, IAP, and Paid; vertical axis from 0 to 90,000. Source: Gartner]

Page 4: AWS Game Analytics - GDC 2014

Winning at Free to Play

•  Phase 1: Collect Data
•  Phase 2: Analyze
•  Phase 3: Profit

Page 5: AWS Game Analytics - GDC 2014

Analyze What?

Emotions
•  Enjoying game
•  Engaged
•  Like/dislike new content
•  Stuck on a level
•  Bored
•  Abandonment

Behaviors
•  Hours played day/week
•  Number of sessions/day
•  Level progression
•  Friend invites/referrals
•  Response to mobile push
•  Money spent/week

Page 6: AWS Game Analytics - GDC 2014

Example: Level Progression (One Metric)

[Chart: Tries / Level, plotting # of Tries (axis 0 to 10) for levels L1 through L10]

Page 7: AWS Game Analytics - GDC 2014

Example: Level Progression (Two Metrics)

[Chart: Tries / Level for levels L1 through L10, plotting # of Tries (axis 0 to 10) and % Highest Level (axis 0 to 60)]

Page 8: AWS Game Analytics - GDC 2014

Key Takeaways

•  Multiple data sources
•  Correlate variables
•  Deltas vs absolutes
•  Settle on terminology (game vs level)
•  Time matters

Page 9: AWS Game Analytics - GDC 2014

Page 10: AWS Game Analytics - GDC 2014

Events & Metrics

•  Event = Moment in Time
   –  Login/quit
   –  Game start/end
   –  Level up
   –  In-app purchase

•  Metrics = What to Measure
   –  KISS
   –  Numbers
   –  Booleans
   –  Strings (Enums)

•  Always Include (ALWAYS)
   –  User
   –  Action
   –  Session (context-dependent)
   –  Timestamp in ISO8601
      2014-03-16T16:28:26
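
As a minimal sketch of the "always include" fields above, the following Python builds one event line as comma-separated values with an ISO 8601 timestamp. The function name `format_event` and the field order (timestamp, user, session, action) are assumptions chosen to resemble the sample rows shown later in the deck; the slides do not prescribe either.

```python
from datetime import datetime, timezone


def format_event(user, session, action, when=None):
    """Return one CSV event line: timestamp,user,session,action."""
    # Default to the current time in UTC so events from different
    # devices are comparable; callers may pass their own datetime.
    when = when or datetime.now(timezone.utc)
    # isoformat() on a timezone-aware datetime emits ISO 8601 with an offset.
    stamp = when.replace(microsecond=0).isoformat()
    return ",".join([stamp, user, session, action])


if __name__ == "__main__":
    fixed = datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc)
    print(format_event("nateware", "e4df", "login", fixed))
```

Including the UTC offset in the timestamp is a design choice of this sketch: it keeps events sortable and unambiguous when players are spread across time zones.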

Page 11: AWS Game Analytics - GDC 2014

Off The Shelf Analytics

•  Easy To Integrate
•  Pre-Baked Reports
•  Rate Limits
•  Retention Windows
•  Data Lock-In

Page 12: AWS Game Analytics - GDC 2014

Ok, A Real Business Plan

Ingest → Store → Process → Analyze

Page 13: AWS Game Analytics - GDC 2014

Ok, A Real Business Plan

Ingest
•  HTTP PUT
•  Kafka
•  Kinesis
•  Scribe

Store
•  S3
•  DynamoDB
•  HDFS
•  Redshift

Process
•  EMR (Hadoop)
•  Spark
•  Storm

Analyze
•  Tableau
•  Pentaho
•  Jaspersoft

Page 14: AWS Game Analytics - GDC 2014

Start Simple

•  Write Events File on Device
•  Periodically Upload to S3
•  Process into Redshift
•  Point GUI Tool to Redshift

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Profit!
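
The first step above ("Write Events File on Device") could look roughly like the sketch below: append one CSV line per event, then periodically rename the file so a finished batch can be uploaded while new events keep arriving. The class name `EventLog`, its methods, and the batch naming scheme are all hypothetical, and the actual upload to S3 is deliberately left out.

```python
import os
import tempfile
import time


class EventLog:
    """Append events to a local file and hand off finished batches."""

    def __init__(self, path):
        self.path = path

    def record(self, line):
        # Append mode creates the file on first use and never truncates it.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    def rotate(self):
        """Rename the current file to a batch file and return its path.

        Returns None when no events have been recorded since the last
        rotation. The nanosecond suffix is a hypothetical naming scheme
        that keeps successive batches from overwriting each other.
        """
        if not os.path.exists(self.path):
            return None
        batch = f"{self.path}.{time.time_ns()}"
        os.replace(self.path, batch)
        return batch


if __name__ == "__main__":
    log = EventLog(os.path.join(tempfile.mkdtemp(), "events.csv"))
    log.record("2014-01-24,nateware,e4df,login")
    print(log.rotate())  # path of the batch that is ready to upload
```

Renaming before uploading means a slow or failed upload never blocks the game from recording new events.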

Page 15: AWS Game Analytics - GDC 2014

Redshift at a Glance

[Diagram: SQL Clients/BI Tools connect over JDBC/ODBC to the Leader Node, which is linked over 10 GigE (HPC) to the Compute Nodes; each node is shown with 16 cores, 128GB RAM, and 16TB disk; Ingestion, Backup, and Restore go through Amazon S3/DynamoDB]

•  Leader Node
   –  SQL endpoint
   –  Stores metadata
   –  Coordinates query execution

•  Compute Nodes
   –  Columnar table storage
   –  Load, backup, restore via Amazon S3
   –  Parallel load from Amazon DynamoDB

•  Single node version available

Page 16: AWS Game Analytics - GDC 2014

Tableau + Redshift

Page 17: AWS Game Analytics - GDC 2014

Plumbing

①  Create S3 bucket ("mygame-analytics-events")
②  Request a security token for your mobile app:
    http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
③  Upload data from your users' devices
④  Run a scheduled copy to Redshift
⑤  Set up Tableau to access Redshift
⑥  Go to the Beach

Page 18: AWS Game Analytics - GDC 2014

Loading Redshift from S3

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';

Scheduled Redshift Load using Data Pipeline: http://aws.amazon.com/articles/1143507459230804

Page 19: AWS Game Analytics - GDC 2014

More Data Sources

•  Also Collect Server Logs
•  Periodically Upload to S3
•  Stuff into Redshift
•  External Analytics Data Too

[Diagram labels: EC2, External Analytics]

Page 20: AWS Game Analytics - GDC 2014

Logrotate to S3

/var/log/apache2/*.log {
    sharedscripts
    postrotate
        sudo /usr/sbin/apache2ctl graceful
        s3cmd sync /var/log/*.gz s3://mygame-logs/
    endscript
}

Blog Entry on Log Rotation: http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, Use ELB Access Logs: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html

Page 21: AWS Game Analytics - GDC 2014

Dealing With Messy Data

•  Different File Formats
•  Device vs Apache vs CDN
•  Cleanup with EMR Job
•  Output to Clean Bucket
•  Load into Redshift

[Diagram label: EC2]
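
To make the cleanup step concrete, here is a plain-Python stand-in for the per-line logic such a job might apply: it reads either an Apache Common Log Format line or a device CSV line and emits the same (timestamp, user, action) tuple. The function names, the choice of output fields, and the rule that the first URL path segment becomes the action are assumptions for illustration; the deck does not specify the cleanup logic.

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident user [time] "request" status bytes
APACHE_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)$'
)


def from_apache(line):
    """Normalize one Apache access-log line, or return None if malformed."""
    m = APACHE_RE.match(line.strip())
    if m is None:
        return None
    user, raw_time, path = m.group(3), m.group(4), m.group(6)
    when = datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S %z")
    # Assumption: the first path segment names the action (/login -> login).
    action = path.split("?")[0].strip("/").split("/")[0] or "index"
    return (when.isoformat(), user, action)


def from_device(line):
    """Normalize one device CSV line: date,user,session,action."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None
    stamp, user, _session, action = parts
    return (stamp, user, action)


if __name__ == "__main__":
    print(from_apache('10.0.0.1 - nateware [24/Jan/2014:10:00:00 -0800] '
                      '"GET /login HTTP/1.1" 200 2326'))
    print(from_device("2014-01-24,nateware,e4df,login"))
```

Returning None for malformed lines lets the caller count and discard bad input instead of failing the whole batch.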

Page 22: AWS Game Analytics - GDC 2014

Redshift vs Elastic MapReduce

Redshift
•  Columnar DB
•  Familiar SQL
•  Structured Data
•  Batch Load
•  Faster to Query
•  Long-term Storage

Elastic MapReduce
•  Hadoop
•  Hive/Pig are SQL-like
•  Unstructured Data
•  Streaming Loop
•  Scales > PBs
•  Transient

Page 23: AWS Game Analytics - GDC 2014

Direct From DynamoDB

•  Integrate Game DB
•  Load Directly into Redshift
•  Redshift does Intelligent Merge
•  Tracks Hash Keys, Columns

[Diagram label: EC2]

Page 24: AWS Game Analytics - GDC 2014

Direct From DynamoDB

•  Integrate Game DB
•  Load Directly into Redshift
•  Redshift does Intelligent Merge
•  Tracks Hash Keys, Columns
•  Or Stream into EMR

[Diagram label: EC2]

Page 25: AWS Game Analytics - GDC 2014

Loading Redshift from DynamoDB

-- readratio is required when copying from DynamoDB; 50 is an illustrative value
copy games
from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
readratio 50;

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';

Page 26: AWS Game Analytics - GDC 2014

Page 27: AWS Game Analytics - GDC 2014

Funnel Cake

Page 28: AWS Game Analytics - GDC 2014

Back To Basics

2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Page 29: AWS Game Analytics - GDC 2014

Measure Retention: Repeated Plays

create view events_by_user_by_month as
select user_id,
       date_trunc('month', event_date) as month_active,
       count(*) as total_events
from events
group by user_id, month_active;
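
For readers who want to check the view's logic on a handful of rows without a Redshift cluster, the same grouping can be mirrored in plain Python. This is only an illustrative equivalent: the function name and the (event_date, user_id) input shape are assumptions, and truncating the date string to 'YYYY-MM' stands in for date_trunc('month', ...).

```python
from collections import Counter


def events_by_user_by_month(events):
    """Count events per (user_id, month) from (event_date, user_id) pairs.

    event_date is expected as an ISO 8601 string such as '2014-01-24'.
    """
    counts = Counter()
    for event_date, user_id in events:
        month_active = event_date[:7]  # 'YYYY-MM', like date_trunc('month', ...)
        counts[(user_id, month_active)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("2014-01-24", "nateware"),
        ("2014-01-25", "nateware"),
        ("2014-02-01", "nateware"),
    ]
    for (user, month), total in sorted(events_by_user_by_month(sample).items()):
        print(user, month, total)
```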

Page 30: AWS Game Analytics - GDC 2014

First-Pass Retention – Too Noisy

[Chart: # Play Sessions / Month (axis 0 to 40), one series per user: nateware, Lazyd0g, AK187, 3strikes]

Page 31: AWS Game Analytics - GDC 2014

Cohorts & Cambria

•  Enables calculating relative metrics
•  Group users by a common attribute
   –  Month game installed
   –  Demographics
•  Run analysis by cohort
   –  Join with metrics
•  Use Redshift as it's SQL
   –  Example of where SQL is a good fit

Page 32: AWS Game Analytics - GDC 2014

Creating Cohorts with Redshift

create view cohort_by_first_event_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
group by user_id;

http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html
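
The cohort view assigns each user to the month of their earliest event. A small Python equivalent, useful for sanity-checking on sample data, is sketched below; the function name and input shape are assumptions, and it relies on dates being ISO 8601 strings in one consistent format so that string comparison matches chronological order.

```python
def cohort_by_first_event_date(events):
    """Map each user_id to the 'YYYY-MM' month of that user's earliest event.

    events is an iterable of (event_date, user_id) pairs with ISO 8601 dates.
    """
    first = {}
    for event_date, user_id in events:
        # Keep the smallest date seen so far for each user (the SQL min()).
        if user_id not in first or event_date < first[user_id]:
            first[user_id] = event_date
    return {user: day[:7] for user, day in first.items()}


if __name__ == "__main__":
    sample = [
        ("2014-01-24", "nateware"),
        ("2014-02-03", "lazyd0g"),
        ("2013-12-30", "lazyd0g"),
    ]
    print(cohort_by_first_event_date(sample))
```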

Page 33: AWS Game Analytics - GDC 2014

Retention by Cohort – Join Events with Cohort

[Chart: # Sessions / Week (axis 0 to 25) across Week 1 through Week 7, one series per cohort month: 2013-11, 2013-12, 2014-01, 2014-02, 2014-03, 2014-04]
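
One way to produce numbers of the kind charted here is to join each event with its user's cohort and bucket it by weeks since that user's first event. The sketch below does this in plain Python; the function name, the (date, user, session) input shape, and the decision to count distinct sessions with week 1 starting on the user's first event date are all assumptions, since the deck does not show the join itself.

```python
from collections import defaultdict
from datetime import date


def sessions_per_week_by_cohort(events):
    """Return {(cohort_month, week_number): distinct session count}.

    events is an iterable of (iso_date, user_id, session_id) tuples.
    Week 1 starts on each user's first event date.
    """
    rows = [(date.fromisoformat(d), user, session) for d, user, session in events]

    # Pass 1: find each user's first event date (this defines the cohort).
    first_seen = {}
    for day, user, _session in rows:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    # Pass 2: join every event with its cohort and bucket it by week.
    sessions = defaultdict(set)
    for day, user, session in rows:
        start = first_seen[user]
        cohort = start.strftime("%Y-%m")
        week = (day - start).days // 7 + 1
        sessions[(cohort, week)].add((user, session))

    return {key: len(found) for key, found in sessions.items()}


if __name__ == "__main__":
    sample = [
        ("2014-01-24", "nateware", "e4df"),
        ("2014-01-25", "nateware", "a88c"),
        ("2014-02-02", "nateware", "b111"),
    ]
    print(sessions_per_week_by_cohort(sample))
```

Measuring weeks relative to each user's start date, instead of calendar weeks, is what makes cohorts from different months comparable on one chart.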

Page 34: AWS Game Analytics - GDC 2014

Moar Cohorts

•  Define multiple cohorts
   –  By activity, time, demographics
   –  As many as you like
•  Change cohort depending on analysis
•  Join same metrics with different cohorts
   –  Retention by date
   –  Retention by demographic
   –  Retention by average plays/month quartile

Page 35: AWS Game Analytics - GDC 2014

Example Event Stream

2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend

Page 37: AWS Game Analytics - GDC 2014

Cohorts by Type of Activity

create view cohort_by_first_play_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
where action = 'gamestart'
group by user_id;

Page 38: AWS Game Analytics - GDC 2014

Page 39: AWS Game Analytics - GDC 2014

Post-Match Heatmaps

Page 40: AWS Game Analytics - GDC 2014

Real-Time Analytics

Batch
•  What game modes do people like best?
•  How many people have downloaded DLC pack 2?
•  Where do most people die on map 4?
•  How many daily players are there on average?

Real-Time
•  What game modes are people playing now?
•  Are more or fewer people downloading DLC today?
•  Are people dying in the same places? Different?
•  How many people are playing today? Variance?

Page 41: AWS Game Analytics - GDC 2014

Why Real-Time Analytics?

30x in 24 hours
What if you ran a promo?

Page 42: AWS Game Analytics - GDC 2014

Real-Time Tools

Spark
•  High-Performance Hadoop Alternative
•  Berkeley.edu
•  Compatible with HiveQL
•  Up to 100x faster than Hadoop (in-memory workloads)
•  Runs on EMR

Kinesis
•  Amazon fully-managed streaming data layer
•  Similar to Kafka
•  Streams contain Shards
•  Each Shard ingests data up to 1MB/sec, 1000 TPS
•  Data stored for 24 hours
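
The per-shard limits above translate directly into a sizing estimate: a stream needs enough shards to satisfy both the bytes-per-second and the records-per-second limit. The helper below is a rough sketch of that arithmetic; its name is hypothetical, "1MB" is treated as 10**6 bytes (an assumption), and it assumes partition keys spread the load evenly across shards.

```python
import math

# Per-shard ingest limits quoted on the slide.
SHARD_BYTES_PER_SEC = 1_000_000  # "1MB/sec", taken here as 10**6 bytes
SHARD_RECORDS_PER_SEC = 1000     # "1000 TPS"


def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate the minimum shard count for a given ingest rate."""
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    # Whichever limit is hit first decides; a stream has at least one shard.
    return max(1, by_count, by_bytes)


if __name__ == "__main__":
    # Many small events: the record-count limit dominates.
    print(shards_needed(5000, 100))
    # Fewer, larger events: the byte limit dominates.
    print(shards_needed(500, 10_000))
```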

Page 43: AWS Game Analytics - GDC 2014

Back To Basics [Dubstep Remix]

•  Always Batch Due to S3

[Diagram label: EC2]

Page 44: AWS Game Analytics - GDC 2014

Need Data Faster!

•  Stream Data With Kinesis
•  Multiple Writers and Readers
•  Still Output to Redshift

[Diagram label: EC2]

Page 45: AWS Game Analytics - GDC 2014

Lots of Ins and Outs

•  Stream Data With Kinesis
•  Multiple Writers and Readers
•  Still Output to Redshift
•  Stream to Spark on EMR
•  Storm via Kinesis Spout
•  Custom EC2 Workers

[Diagram labels: EC2, EC2]

Page 46: AWS Game Analytics - GDC 2014

Introducing Amazon Kinesis Service for Real-Time Big Data Ingestion

[Diagram: multiple Data Sources send records to an AWS Endpoint; the stream is divided into Shard 1, Shard 2, ... Shard N across three Availability Zones; consuming applications are App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], and App.4 [Machine Learning]; outputs shown are S3, DynamoDB, and Redshift]

Page 47: AWS Game Analytics - GDC 2014

Putting Data into Kinesis

•  Producers use PUT to send data to a Stream
•  PutRecord {Data, PartitionKey, StreamName}
•  Partition Key distributes PUTs across Shards
•  Unique Sequence # returned on PUT call
•  Documentation: http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html

[Diagram: many Producers writing into Kinesis, which contains Shard 1, Shard 2, Shard 3, Shard 4, ... Shard n]
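
To illustrate how a partition key spreads PUTs across shards: Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch below reproduces that idea locally. The function name is hypothetical, and it assumes the shards divide the hash space into equal contiguous ranges, which need not hold after shards have been split or merged.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_for_key(partition_key, shard_count):
    """Return the 0-based index of the shard that would own this key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Assumption: shard i owns the i-th equal slice of the hash space.
    return hash_key * shard_count // HASH_SPACE


if __name__ == "__main__":
    for user in ("nateware", "Lazyd0g", "AK187", "3strikes"):
        print(user, "->", shard_for_key(user, 4))
```

Using the user or game ID as the partition key keeps all of one player's or one match's records on the same shard, so they arrive in order for a single consumer.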

Page 48: AWS Game Analytics - GDC 2014

Writing to a Kinesis Stream

POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
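
The "Data" field in the request body is the record payload encoded as base64. The short sketch below builds that JSON body in Python; the function name is hypothetical, and request signing (the Authorization header) is not shown because an AWS SDK would normally handle it.

```python
import base64
import json


def put_record_body(stream_name, data, partition_key):
    """Build the JSON body of a PutRecord request from raw payload bytes."""
    return json.dumps({
        "StreamName": stream_name,
        # The payload travels as base64 text inside the JSON document.
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })


if __name__ == "__main__":
    print(put_record_body("exampleStreamName", b"_<data>_1", "partitionKey"))
```

Encoding the bytes `_<data>_1` this way yields the same "XzxkYXRhPl8x" string that appears in the request above.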

Page 49: AWS Game Analytics - GDC 2014

Kinesis + Spark

http://aws.amazon.com/articles/4926593393724923

Page 50: AWS Game Analytics - GDC 2014

Death in Real-Time

PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
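
A consumer could turn records like these into heatmap data by binning kill coordinates into grid cells per map. The sketch below shows that aggregation in plain Python over a list of JSON strings. Reading "coord" as "x,y,z", the function name, and the 100-unit default cell size are assumptions, and a real-time version would apply the same counting to records as they stream in.

```python
import json
from collections import Counter


def kill_heatmap(records, map_name, cell_size=100):
    """Count kills per (x, y) grid cell for one map.

    records is an iterable of JSON strings shaped like the "kills" payloads.
    """
    grid = Counter()
    for raw in records:
        kill = json.loads(raw)
        if kill["map"] != map_name:
            continue
        # Assumption: coord is "x,y,z"; height is ignored for a 2D heatmap.
        x, y, _z = (int(v) for v in kill["coord"].split(","))
        grid[(x // cell_size, y // cell_size)] += 1
    return grid


if __name__ == "__main__":
    sample = [
        '{"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}',
        '{"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}',
        '{"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}',
    ]
    for cell, count in sorted(kill_heatmap(sample, "Boston").items()):
        print(cell, count)
```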

Page 51: AWS Game Analytics - GDC 2014

Real-Time Heatmaps

Page 52: AWS Game Analytics - GDC 2014

Put A Bow On It

•  Collect data from the start
•  Store it even if you can't process it (yet)
•  Start simple – S3 + Redshift
•  Add data sources – process with EMR
•  Real-time – Kinesis + Spark
•  Tons of untapped potential for gaming

Page 53: AWS Game Analytics - GDC 2014

Fallback Plan

Cheers – Nate Wiger @nateware