Designing a Highly Available Environment Using Methods of Modern IT Infrastructure

# Jouko Markkanen, IT Manager

Hear about the multi-server Perforce architecture used at Remedy Entertainment, a developer of state-of-the-art action games, game franchises and cutting-edge technology. Get tips on how various virtualization and storage technologies, and the new distributed Perforce server features, can be used to achieve high availability and quick recoverability in different disaster scenarios. Handling large game content files, and the dependencies between game code and content assets, will also be covered.

Transcript of Designing a Highly Available Environment Using Methods of Modern IT Infrastructure

1. Jouko Markkanen, IT Manager

3. Privately held game developer based in Finland. Released games: Death Rally, Max Payne, Max Payne 2: The Fall of Max Payne, Alan Wake, Alan Wake's American Nightmare, Death Rally Mobile. Franchises made into a movie, a TV series and a novel. Announced titles: Agents of Storm for iOS and the Xbox One exclusive title Quantum Break.

4. Founded in 1995, currently 120+ employees. Over 100 Game of the Year awards. Franchises have generated over $500M in revenue. Max Payne IP sold for $43M. AAA games have sold over 11M units. First mobile experiment: over 16M downloads and reached #1 in 70 countries.

6. Large content files

7. Content created by Remedy since 2004:

| Scope | # of files | Total size | # of files > 100 MB |
|---|---|---|---|
| All projects, all revisions | 10.5 million | 12 terabytes | |
| All projects, #head revisions | 5 million | 5.5 terabytes | |
| Alan Wake (Xbox 360), #head | 1.1 million | 920 gigabytes | 1,300 |
| Quantum Break (Xbox One, to date), #head | 3 million | 4.3 terabytes | 7,000 |
| Perforce database | | 30 gigabytes | |

8. Large content files. Dependencies of game engine, internal tools and game content (in proprietary formats).

9. [Diagram: tools source code, tool binaries, 3rd party tools, content source, game source code, export util source code, export util, runtime game binary, runtime content]

10. Large content files. Dependencies of game engine, internal tools and game content (in proprietary formats). Everything that comes out comes from the Perforce depot. Availability of the system is business critical.

12. System design approach. Service implementation. Principles of HA engineering: (1) elimination of single points of failure, (2) reliable crossover, (3) detection of failures as they occur. Source: http://en.wikipedia.org/wiki/High_availability

13. Client and access network don't have HA; opting for fast manual response. LAN core with active/active redundancy. Servers with failover. SAN with active/active redundancy. Storage with redundant components.
14. HA design principles do not cover the concept of backups. Even when HA is taken care of, data and availability can be lost through user actions and software failures. The data still needs to be copied to offline storage for disaster recovery purposes.

15. Client and access network don't have HA; opting for fast manual response. LAN core with active/active redundancy. Servers with failover. SAN with active/active redundancy. Storage with redundant components.

17. Used for offloading backups and integrity verification. Covers application-level failures. Activation requires manual intervention. Instances: perforce2:1666, perforce3:1666, perforce1:1666, perforce1:1667.

18. Snapshot of Perforce every 4 hours. Runs a storage-provided snapshot with p4d -c. Ensures database integrity. Locks the database for 30-50 seconds. Near-instant recovery. The snapshot can be mounted and exported to other hosts, to run checkpoint and verify, or to run a test environment with production data.

20. "A user may never see a failure. But the maintenance activity must." Infrastructure monitored with vendor tools. Central monitoring with Nagios: P4D process; TCP connectivity to perforce:1666; check of p4 info output; replication: check the changelist counter on both partners; P4review.py.

21. Define what HA means for your service. Build it one step at a time. Ensure redundancy of each component. Make sure each component is monitored. Backups are still needed.

22. Jouko Markkanen

23. Introduction to Remedy; Perforce at Remedy; High Availability; Perforce Application Availability; Monitoring; Conclusions.

24. Jouko Markkanen is an IT Manager at Remedy Entertainment with broad experience in different areas of information and communications technology, including help desk responsibilities, programming, application design, security systems, information management, and infrastructure planning and design.
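Slide 18 names p4d -c as the mechanism behind the 4-hourly storage snapshot. p4d's -c option locks the database tables, runs the given command, unlocks the tables and exits, so a storage snapshot taken inside it sees a consistent database. What follows is a minimal Python sketch of such a wrapper, assuming it is started by a scheduler. The server root path and the storage-snapshot command are hypothetical placeholders for the site's own path and the storage vendor's snapshot CLI; they are not details from the talk.

```python
"""Sketch: take a consistent storage snapshot of the Perforce database.

Hypothetical placeholders (not from the talk): P4ROOT and SNAPSHOT_CMD
stand in for the real server root and the storage array's snapshot CLI.
"""
import shlex
import subprocess

P4ROOT = "/p4/root"                            # hypothetical server root
SNAPSHOT_CMD = "storage-snapshot create p4db"  # hypothetical vendor command


def build_locked_snapshot_argv(p4root: str, snapshot_cmd: str) -> list[str]:
    """Return the argv for `p4d -r ROOT -c CMD`.

    p4d -c locks the database tables, runs CMD, unlocks the tables and
    exits, so the snapshot is taken while the database is consistent.
    """
    return ["p4d", "-r", p4root, "-c", snapshot_cmd]


def take_snapshot() -> None:
    """Run the locked snapshot; raises CalledProcessError on failure."""
    argv = build_locked_snapshot_argv(P4ROOT, SNAPSHOT_CMD)
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: show the command that take_snapshot() would execute.
    print(shlex.join(build_locked_snapshot_argv(P4ROOT, SNAPSHOT_CMD)))
```

Scheduled every 4 hours (for example from cron), this would follow the cadence on slide 18. The database stays locked only while the command passed to -c runs, which is where a short lock window such as the 30-50 seconds quoted on the slide comes from.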
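Slide 20 lists what the central Nagios monitoring checks. Below is a hypothetical Nagios-style plugin in Python that combines two of those checks: TCP connectivity to the server port, and a comparison of the change counter (the most recently assigned changelist number, read with p4 counter change) on both replication partners. The lag thresholds, argument order and exit-code mapping follow common Nagios plugin conventions (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and are assumptions, not Remedy's actual configuration.

```python
"""Sketch of a Nagios-style check for a Perforce replication pair.

Assumptions: thresholds and argument order are hypothetical; ports are
passed as host:port, master first, replica second.
"""
import socket
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def change_counter(p4port: str) -> int:
    """Read the 'change' counter from a server via the p4 command line."""
    out = subprocess.run(
        ["p4", "-p", p4port, "counter", "change"],
        capture_output=True, text=True, check=True, timeout=30,
    ).stdout
    return int(out.strip())


def evaluate_replication(master: int, replica: int,
                         warn_lag: int = 5, crit_lag: int = 20) -> tuple[int, str]:
    """Map the changelist gap between partners to a Nagios status."""
    lag = master - replica
    if lag >= crit_lag:
        return CRITICAL, f"CRITICAL - replica {lag} changelists behind"
    if lag >= warn_lag:
        return WARNING, f"WARNING - replica {lag} changelists behind"
    return OK, f"OK - replica at change {replica} (lag {lag})"


def main(master_port: str, replica_port: str) -> int:
    """Run the checks, print one status line and return the exit code."""
    host, _, port = master_port.rpartition(":")
    if not host or not port.isdigit():
        print(f"UNKNOWN - expected host:port, got {master_port!r}")
        return UNKNOWN
    if not tcp_reachable(host, int(port)):
        print(f"CRITICAL - cannot connect to {master_port}")
        return CRITICAL
    try:
        status, message = evaluate_replication(change_counter(master_port),
                                               change_counter(replica_port))
    except (subprocess.SubprocessError, ValueError, OSError) as exc:
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    print(message)
    return status


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

A Nagios command definition would call it with the two partner ports, for example perforce1:1666 and perforce2:1666 (which instance is master and which is replica is an assumption here). A p4 info check could be added the same way, by running p4 -p PORT info and treating a non-zero exit code as CRITICAL.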