When Tools Attack: IT Infrastructure at Playground Games

Posted on 05-Dec-2014



An overview of how Playground Games used Perforce to solve problems caused by its in-house tools

Transcript of When Tools Attack: IT Infrastructure at Playground Games

1

Chris Makin
IT Infrastructure at Playground Games

2

Chris Makin is a battle-scarred veteran of Perforce server planning, implementation and administration. With 11 years of IT service to the video games industry, he proudly serves at Playground Games as IT Infrastructure Administrator. If not elbow-deep in servers, he can be found trying to sample every craft beer under the sun.

3

•  Sharing real-world examples of custom tools with hearts of gold and hands of destruction!
•  The impacts & issues caused
•  Fixes, resolutions & preventative actions
•  Tips for server setup & tools testing

4

•  Founded in 2009 with 16 staff
•  20+ years' experience in game development
•  Turn 10 & Microsoft Studios
–  October 2012
–  September 30th 2014

5

•  Over 150 developers at peak
•  Outsource teams across the globe
•  17 Perforce server instances
–  3 core & 4 supporting P4Ds
–  A further 10 replicas for workload balancing, DR & HA
–  Linux & Windows, VMware & bare metal
–  Nimble, EqualLogic & DAS storage

6

•  Single P4D server
–  4TB total size
•  100 developers
•  12 build servers
•  13 solid hours of lock time per day for automated systems

•  3 core P4D servers
–  10TB total size
–  1TB at peak change
•  24 build servers
•  Builds over 150GB
•  485,000 ops completed
•  340,000 automated
•  Lock contention – gone

7

•  Deadliest Catch –  Trawling the depths

•  Tor's Hammer –  An internal cyber attack!

•  Skynet –  Ignore it for too long and it’s taken over the world.

8

Trawling the depths

9

•  Tool built to create a depot heat map
•  “p4 files @=clnumber” run for each changelist
•  Started at CL 1
–  Worked its way upwards
–  9 streams
•  High-latency connection

10

•  Unresponsive server
•  Commands queue up
•  Human element
–  F5, F5, F5, F5…
•  Database lock contention

11

•  Lower tool polling frequency
•  Lower thread count
•  Forward commands to a replica, on- or offsite
•  Trigger or P4Broker limiting the number of concurrent operations per user
•  Upgrade to 2013.3 or higher for lockless reads
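The first two mitigations can be applied inside the tool itself. The sketch below is a minimal Python illustration, not the heat-map tool's actual code. A hypothetical `ThrottledRunner` caps the number of worker threads and spaces out query starts. The `run_query` callable stands in for whatever issues one Perforce command, for instance one `p4 files @=<change>` per changelist. The thread count and interval are illustrative values. Server-side limits via a trigger or P4Broker are not shown.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ThrottledRunner:
    """Run one query per item with a capped thread count and a minimum gap
    between query starts (hypothetical helper, not a Perforce API)."""

    def __init__(self, max_workers: int = 2, min_interval: float = 0.5) -> None:
        self.max_workers = max_workers      # "lower thread count"
        self.min_interval = min_interval    # "lower polling frequency"
        self._lock = threading.Lock()
        self._next_start = 0.0

    def _wait_for_slot(self) -> None:
        # Reserve the next start time under the lock, then sleep outside it,
        # so query starts are spaced at least min_interval seconds apart.
        with self._lock:
            start = max(time.monotonic(), self._next_start)
            self._next_start = start + self.min_interval
        delay = start - time.monotonic()
        if delay > 0:
            time.sleep(delay)

    def map(self, run_query: Callable[[T], R], items: Iterable[T]) -> List[R]:
        """Apply run_query to every item; results keep the input order."""
        def throttled(item: T) -> R:
            self._wait_for_slot()
            return run_query(item)

        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(throttled, items))


if __name__ == "__main__":
    # Stand-in for running "p4 files @=<cl>"; here it only builds a string.
    runner = ThrottledRunner(max_workers=2, min_interval=0.1)
    print(runner.map(lambda cl: f"files @={cl}", range(1, 4)))
```

Start times are reserved under a lock and the sleep happens outside it, so the spacing holds across all worker threads. The pool size bounds how many commands are in flight at once, however slow the link is.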

12

Internal cyber attack!

13

•  Level editor with Perforce integration
–  Have my files been updated?
–  Queries workspace #have against #head
•  Query was being carried out on each file individually
–  25,000 files
–  150 developers
–  3,750,000 queries
–  Every second
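The numbers above come from a simple multiplication: 25,000 files × 150 developers gives 3,750,000 requests per polling cycle. The alternative is to batch the check into one request per workspace. This minimal sketch assumes the tool can fetch status for the whole workspace at once and gets back `fstat`-style records with `depotFile`, `haveRev` and `headRev` fields, such as the dictionaries P4Python returns from `run_fstat`. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def queries_per_cycle(files: int, developers: int, batched: bool) -> int:
    """Requests hitting the server in one polling cycle.

    Per-file polling sends one request per file per developer; a batched
    poll sends one request per developer's workspace.
    """
    return developers if batched else files * developers


def stale_files(records: Iterable[Dict[str, str]]) -> List[str]:
    """Return depot paths whose have revision is behind the head revision.

    Expects fstat-style records with depotFile, headRev and (optionally)
    haveRev; a missing haveRev is treated as "not synced" (revision 0).
    """
    stale = []
    for rec in records:
        have = int(rec.get("haveRev", "0"))
        head = int(rec.get("headRev", "0"))
        if have < head:
            stale.append(rec["depotFile"])
    return stale


if __name__ == "__main__":
    print(queries_per_cycle(25_000, 150, batched=False))  # 3750000
    print(queries_per_cycle(25_000, 150, batched=True))   # 150
    sample = [
        {"depotFile": "//game/level1/a.png", "haveRev": "3", "headRev": "3"},
        {"depotFile": "//game/level1/b.png", "haveRev": "2", "headRev": "5"},
        {"depotFile": "//game/level1/c.png", "headRev": "1"},
    ]
    print(stale_files(sample))  # ['//game/level1/b.png', '//game/level1/c.png']
```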

14

•  You are now under DDoS attack!
•  Perforce server stops responding to all requests
–  P4Auth stops responding
•  TCP flood
•  OS-level/network stack issue
–  TCP/IP port exhaustion

15

•  Hang, draw and quarter the tools programmer
•  Local firewall
–  Flood protection rules
•  Tune the network stack
–  Max available ports
–  Min keep-alive time
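Port exhaustion arrives quickly, and a back-of-envelope model shows why. The model assumes that each closed connection keeps its local port in TIME_WAIT on the side that opened it. The range 49152 to 65535 is the IANA dynamic port range, which is also the default on modern Windows. The 240-second and 30-second TIME_WAIT values are assumptions for illustration. The model covers only the "max available ports" point; keep-alive tuning is related but not modelled.

```python
def ephemeral_ports(first_port: int, last_port: int) -> int:
    """Number of local ports in an inclusive ephemeral range."""
    return last_port - first_port + 1


def max_new_connections_per_second(first_port: int, last_port: int,
                                   time_wait_s: float) -> float:
    """Rough ceiling on sustained new connections from one client IP to one
    server IP:port.

    Each closed connection parks its local port in TIME_WAIT for
    time_wait_s seconds, so the pool can only be recycled at
    ports / time_wait_s connections per second.
    """
    return ephemeral_ports(first_port, last_port) / time_wait_s


if __name__ == "__main__":
    # Assumed values for illustration; check your OS for the real ones.
    default = max_new_connections_per_second(49152, 65535, 240)
    widened = max_new_connections_per_second(1025, 65535, 240)
    tuned = max_new_connections_per_second(1025, 65535, 30)
    # prints: 68 / 269 / 2150 connections per second
    print(f"{default:.0f} / {widened:.0f} / {tuned:.0f} connections per second")
```

Widening the port range and shortening the hold time each raise the ceiling. Even the tuned figure is orders of magnitude below a per-file polling loop, so network tuning buys headroom rather than replacing a fix to the tool.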

16

Ignore it for too long and it’s taken over the world

17

•  World editor
•  Designed so every file is self-descriptive, with a UID
–  \\game\level1\walls\wall1_walls_level1_123456789.png
•  In the backend this turns into
–  D:\p4depot\game\level1\walls\wall1_walls_level1_123456789.png,d\1.12345.gz

18

•  In practice
–  \\game1\mainline\data\level_data\level1\objects\textures\walls\wall1_walls_textures_objects_level1_level_data_data_mainline_1234567890.png
–  \\game1\mainline\data\level_data\level1\objects\textures\walls\wall1_walls_textures_objects_level1_level_data_data_mainline_mainline_data_level_data_level1_objects_textures_walls_wall1_1234567890.png
•  199 chars, compared with the original 54
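The growth follows from the naming rule itself. Below is a hypothetical reconstruction in Python, inferred only from the example paths above. The file name is the stem, then every folder below the depot root in reverse order, then the UID. It reproduces the designed example and the first "in practice" path; it does not attempt the second.

```python
def self_descriptive_name(folder: str, stem: str, uid: str, ext: str) -> str:
    """Hypothetical reconstruction of the world editor's naming rule.

    Name = stem, then each folder below the depot root in reverse order,
    then the UID. A "walls" folder inside "level1" with stem "wall1"
    gives wall1_walls_level1_<uid><ext>.
    """
    parts = [p for p in folder.split("\\") if p]   # split on backslashes, drop empties
    below_root = parts[1:]                          # skip the depot name, e.g. "game"
    return "_".join([stem, *reversed(below_root), uid]) + ext


def full_path(folder: str, stem: str, uid: str, ext: str) -> str:
    """Folder plus generated name, joined with a backslash."""
    return folder + "\\" + self_descriptive_name(folder, stem, uid, ext)


if __name__ == "__main__":
    designed = r"\\game\level1\walls"
    in_practice = r"\\game1\mainline\data\level_data\level1\objects\textures\walls"
    print(full_path(designed, "wall1", "123456789", ".png"))
    print(full_path(in_practice, "wall1", "1234567890", ".png"))
```

Under this rule every extra folder level costs its name twice, once in the path and once in the file name. That is how a deep branch and folder hierarchy pushes paths toward Windows path-length limits.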

19

•  Project was heavily branched & continuously integrated, with more than 100,000 files

•  Integrations took longer
•  Exponential metadata growth – GB per day
•  Higher RAM, swap & CPU utilization
•  Windows OS & proxy path length issues

20

•  Send a cyborg back in time to stop the tool change from being submitted

•  Trigger/P4Broker rule inspecting file name & path length for irregularities and the maximum allowed length
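A minimal Python sketch of the check such a rule might make follows. The limits are illustrative assumptions, not values from the talk. They are chosen to leave headroom under the classic Windows 260-character MAX_PATH once a workspace root is prepended. Two pieces are not shown: wiring the check up as a `change-submit` trigger, and fetching the changelist's file list. Perforce runs a `change-submit` trigger before the submit commits. A nonzero exit rejects the submit, and the script's output is shown to the user.

```python
from typing import List, Sequence

# Illustrative limits (assumptions, not from the talk). Classic Windows
# MAX_PATH is 260 characters and the workspace root is prepended on disk,
# so these leave headroom.
MAX_PATH_CHARS = 180
MAX_NAME_CHARS = 100


def path_problems(path: str, max_path: int = MAX_PATH_CHARS,
                  max_name: int = MAX_NAME_CHARS) -> List[str]:
    """Return human-readable problems with one file path (empty if fine)."""
    problems = []
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    if len(path) > max_path:
        problems.append(f"path is {len(path)} chars (limit {max_path})")
    if len(name) > max_name:
        problems.append(f"file name is {len(name)} chars (limit {max_name})")
    return problems


def check_submit(paths: Sequence[str]) -> int:
    """Report every offending path; return 0 to allow the submit, 1 to reject."""
    rejected = False
    for path in paths:
        problems = path_problems(path)
        if problems:
            rejected = True
            print(f"{path}: {'; '.join(problems)}")
    return 1 if rejected else 0


if __name__ == "__main__":
    folder = r"\\game1\mainline\data\level_data\level1\objects\textures\walls"
    ok = folder + "\\" + "wall1_walls_textures_objects_level1_level_data_data_mainline_1234567890.png"
    print(check_submit([ok]))
```

Reporting every offending file before rejecting lets the submitter fix the whole changelist in one pass.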

21

22

•  Perforce Support
•  P4D 2013.3 or later
–  Lockless reads!
•  Replica & Edge servers
–  Offload locks and CPU- & I/O-intensive tools and workloads
•  P4Broker & triggers
–  Don’t like a command? Block or redirect it!
•  “Side-track” server instance

23

24

•  Metadata replica
–  Offline checkpoints, additional replicas, no interruption to the live server
•  Enable process monitoring
•  Monitor the server
•  Pay attention to your typemap
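Process monitoring is enabled with the `monitor` configurable (for example `p4 configure set monitor=1`), after which `p4 monitor show` lists the commands the server is running. The small sketch below helps spot a tool or user flooding the server. It assumes the default one-line-per-process output: ID, status, user, elapsed time, command. Check the exact format against your server version. `busiest_users` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def busiest_users(monitor_lines: Iterable[str], top: int = 3) -> List[Tuple[str, int]]:
    """Count running commands per user in `p4 monitor show`-style output.

    Assumes each line reads "<id> <status> <user> <hh:mm:ss> <command>",
    where status "R" means running; other lines are ignored.
    """
    counts: Counter = Counter()
    for line in monitor_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[1] == "R":
            counts[fields[2]] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        "4623 R buildbot 00:00:12 files",
        "4624 R buildbot 00:00:11 files",
        "4625 R alice 00:00:01 sync",
    ]
    for user, running in busiest_users(sample):
        print(f"{user}: {running} running")
```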

25

•  Every tool has an impact – TEST!
•  Test against real data
–  Metadata & full replicas
•  Set a high level of logging
–  Use the Perforce Server Log Analyzer
•  Monitor system utilization
–  CPU, RAM, disk I/O…

26

Chris Makin chris.makin@playground-games.com

27

•  http://answers.perforce.com/articles/KB_Article/Setting-Up-a-Side-track-Server

•  http://answers.perforce.com/articles/KB_Article/How-to-Monitor-a-Swamped-Perforce-Server

•  http://answers.perforce.com/articles/KB_Article/Installing-P4Broker-on-Windows-and-Unix-systems

•  http://answers.perforce.com/articles/KB_Article/Using-P4Broker-With-Replica-Servers

•  https://kb.perforce.com/psla/

•  http://www.perforce.com/blog

28