Unity Internals: Memory and Performance

Unity Internals: Memory and Performance Moscow, 16/05/2014 Marco Trivellato – Field Engineer

description

In this presentation we will provide in-depth knowledge about the Unity runtime. The first part focuses on memory and how to deal with fragmentation and garbage collection. The second part covers performance profiling and optimizations. Finally, there is an overview of the debugging and profiling improvements in the newly announced Unity 5.0.

Transcript of Unity Internals: Memory and Performance

Page 1: Unity Internals: Memory and Performance

Unity Internals: Memory and Performance Moscow, 16/05/2014 Marco Trivellato – Field Engineer

Page 2: Unity Internals: Memory and Performance

Page 6/9/14 2

This Talk: Goals and Benefits

Page 3: Unity Internals: Memory and Performance

Who Am I?
•  Now Field Engineer @ Unity
•  Previously, Software Engineer
•  Mainly worked on game engines
•  Shipped several video games:
    •  Captain America: Super Soldier
    •  FIFA ’07 – FIFA ’10
    •  Fight Night: Round 3

Page 4: Unity Internals: Memory and Performance

Topics
•  Memory Overview
•  Garbage Collection
•  Mesh Internals
•  Scripting
•  Job System
•  How to use the Profiler

Page 5: Unity Internals: Memory and Performance


Memory Overview

Page 6: Unity Internals: Memory and Performance

Memory Domains
•  Native (internal)
    •  Asset Data: Textures, AudioClips, Meshes
    •  Game Objects & Components: Transform, etc.
    •  Engine Internals: Managers, Rendering, Physics, etc.
•  Managed (Mono)
    •  Script objects (managed DLLs)
    •  Wrappers for Unity objects: GameObjects, assets, components
•  Native DLLs
    •  User’s DLLs and external DLLs (for example: DirectX)

Page 7: Unity Internals: Memory and Performance

Native Memory: Internal Allocators
•  Default
•  GameObject
•  Gfx
•  Profiler

5.x: We are considering exposing an API for using a native allocator in DLLs

Page 8: Unity Internals: Memory and Performance

Managed Memory
•  Value types (bool, int, float, struct, ...)
    •  Live on the stack when used as local variables. De-allocated when popped off the stack. No garbage.
•  Reference types (classes)
    •  Live on the heap and are handled by the Mono/.NET GC. Removed when no longer referenced.
•  Wrappers for Unity objects:
    •  GameObject
    •  Assets: Texture2D, AudioClip, Mesh, …
    •  Components: MeshRenderer, Transform, MonoBehaviour

Page 9: Unity Internals: Memory and Performance

Mono Memory Internals
•  Allocates system heap blocks for its internal allocator
•  Will allocate new heap blocks when needed
•  Heap blocks are kept in Mono for later use
    •  Memory can be given back to the system after a while
    •  …but it depends on the platform → don’t count on it
•  Garbage collector cleans up
•  Fragmentation can cause new heap blocks to be allocated even though memory is not exhausted

Page 10: Unity Internals: Memory and Performance


Garbage Collection

Page 11: Unity Internals: Memory and Performance

Unity Object wrapper
•  Some objects used in scripts have large native backing memory in Unity
•  Memory is not freed until finalizers have run

[Diagram labels: WWW, Decompression buffer, Compressed file, Decompressed file; Managed vs. Native]

Page 12: Unity Internals: Memory and Performance

Mono Garbage Collection
•  GC.Collect
    •  Runs on the main thread when:
        •  Mono exhausts the heap space
        •  Or the user calls System.GC.Collect()
•  Finalizers
    •  Run on a separate thread
        •  Controlled by Mono
        •  Can be delayed by several seconds
•  Unity native memory
    •  Dispose() cleans up internal memory
    •  Eventually called from the finalizer
    •  Manually call Dispose() to clean up

[Diagram: main thread and finalizer thread timelines, with the following code]
www = null; new(someclass); // no more heap -> GC.Collect();
www.Dispose();
.....
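The relationship between Dispose() and the finalizer can be sketched in plain C#. NativeBackedResource is a hypothetical stand-in for a wrapper such as WWW, not Unity’s implementation, and the byte counter only simulates native memory.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a managed wrapper that owns native memory.
public sealed class NativeBackedResource : IDisposable
{
    // Simulated native allocation total, for illustration only.
    public static long SimulatedNativeBytes;

    private readonly long size;
    private bool disposed;

    public NativeBackedResource(long size)
    {
        this.size = size;
        Interlocked.Add(ref SimulatedNativeBytes, size);
    }

    // Deterministic cleanup: releases the simulated native memory right away.
    public void Dispose()
    {
        Release();
        // Nothing is left for the finalizer to do, so skip it.
        GC.SuppressFinalize(this);
    }

    // Fallback: runs on the finalizer thread at some later, unspecified time.
    ~NativeBackedResource()
    {
        Release();
    }

    private void Release()
    {
        if (disposed) return;
        disposed = true;
        Interlocked.Add(ref SimulatedNativeBytes, -size);
    }
}
```

Calling Dispose() as soon as the object is no longer needed (or wrapping it in a using block) frees the backing memory at a known point instead of waiting for the finalizer thread.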

Page 13: Unity Internals: Memory and Performance

Garbage Collection
•  Roots are not collected in a GC.Collect
    •  Thread stacks
    •  CPU registers
    •  GC handles (used by Unity to hold onto managed objects)
    •  Static variables!!
•  Collection time scales with managed heap size
    •  The more you allocate, the slower it gets

Page 14: Unity Internals: Memory and Performance

GC: does data layout matter?
struct Stuff {
    int a;
    float b;
    bool c;
    string leString;
}
Stuff[] arrayOfStuff; << Everything is scanned. GC takes more time.
VS
int[] As;
float[] Bs;
bool[] Cs;
string[] leStrings; << Only this is scanned. GC takes less time.
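A minimal sketch of the second layout, assuming a fixed capacity. StuffTable, Set and Get are hypothetical names introduced for illustration; how much collection time this saves depends on the collector and should be measured.

```csharp
// Array-of-structs element: each one embeds a string reference.
public struct Stuff
{
    public int a;
    public float b;
    public bool c;
    public string leString;
}

// Struct-of-arrays: only leStrings holds references. The three
// primitive arrays contain no references for the GC to follow.
public sealed class StuffTable
{
    public readonly int[] As;
    public readonly float[] Bs;
    public readonly bool[] Cs;
    public readonly string[] leStrings;

    public StuffTable(int capacity)
    {
        As = new int[capacity];
        Bs = new float[capacity];
        Cs = new bool[capacity];
        leStrings = new string[capacity];
    }

    // Scatter one logical element across the parallel arrays.
    public void Set(int index, Stuff value)
    {
        As[index] = value.a;
        Bs[index] = value.b;
        Cs[index] = value.c;
        leStrings[index] = value.leString;
    }

    // Gather one logical element back into a struct (a stack copy, no heap allocation).
    public Stuff Get(int index)
    {
        Stuff s = new Stuff();
        s.a = As[index];
        s.b = Bs[index];
        s.c = Cs[index];
        s.leString = leStrings[index];
        return s;
    }
}
```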

Page 15: Unity Internals: Memory and Performance

GC: Best Practices
•  Reuse objects → use object pools
•  Prefer stack-based allocations → use struct instead of class
•  System.GC.Collect can be used to trigger collection
    •  Calling it 6 times returns the unused memory to the OS
•  Manually call Dispose to clean up immediately
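Object pooling can be sketched as follows. ObjectPool<T> is a hypothetical, minimal, non-thread-safe helper, not a Unity API; callers are assumed to reset an item’s state before reusing it.

```csharp
using System.Collections.Generic;

// Minimal pool: hands out previously released instances instead of
// allocating new ones, so steady-state use produces no garbage.
public sealed class ObjectPool<T> where T : class, new()
{
    private readonly Stack<T> free = new Stack<T>();

    // Pre-allocate up front (e.g. during loading) to avoid allocating mid-game.
    public ObjectPool(int prewarm)
    {
        for (int i = 0; i < prewarm; i++)
            free.Push(new T());
    }

    public int FreeCount
    {
        get { return free.Count; }
    }

    // Reuse a pooled instance when available; allocate only if the pool is empty.
    public T Get()
    {
        return free.Count > 0 ? free.Pop() : new T();
    }

    // Return an instance so a later Get() can reuse it.
    public void Release(T item)
    {
        free.Push(item);
    }
}
```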

Page 16: Unity Internals: Memory and Performance

Avoid temp allocations
•  Don’t use FindObjects or LINQ
•  Use StringBuilder for string concatenation
•  Reuse large temporary work buffers
•  ToString() typically allocates a new string
•  .tag → use CompareTag() instead
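The StringBuilder advice might look like this in practice. ScoreLabel and Format are hypothetical names; the point is that one builder is kept and reused rather than concatenating strings on every call.

```csharp
using System.Text;

public sealed class ScoreLabel
{
    // One builder, allocated once and reused for every update.
    private readonly StringBuilder builder = new StringBuilder(32);

    public string Format(string playerName, int score)
    {
        builder.Length = 0;          // reset without allocating a new builder
        builder.Append(playerName);
        builder.Append(": ");
        builder.Append(score);
        return builder.ToString();   // the result string is still allocated here
    }
}
```

Compared with playerName + ": " + score, this avoids the intermediate strings. The final ToString() still allocates, so call Format only when the displayed value actually changes.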

Page 17: Unity Internals: Memory and Performance

Unity API Temporary Allocations
Some examples:
•  GetComponents<T>
•  Vector3[] Mesh.vertices
•  Camera[] Camera.allCameras
•  foreach
    •  does not allocate by definition
    •  However, there can be a small allocation, depending on the implementation of .GetEnumerator()

5.x: We are working on new non-allocating versions

Page 18: Unity Internals: Memory and Performance

Memory fragmentation
•  Memory fragmentation is hard to account for
    •  Fully unload dynamically allocated content
    •  Switch to a blank scene before proceeding to the next level
•  This scene could have a hook where you may pause the game long enough to sample if there is anything significant in memory
•  Ensure you clear out variables so GC.Collect will remove as much as possible
•  Avoid allocations where possible
•  Reuse objects where possible within a scene play
•  Clear them out for map load to clean the memory

Page 19: Unity Internals: Memory and Performance

Unloading Unused Assets
•  Resources.UnloadUnusedAssets will trigger asset garbage collection
    •  It looks for all unreferenced assets and unloads them
    •  It’s an async operation
    •  It’s called internally after loading a level
•  Resources.UnloadAsset is preferable
    •  You need to know exactly what you need to unload
    •  Unity does not have to scan everything
•  Unity 5.0: Multi-threaded asset garbage collection
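A sketch of both calls side by side, assuming it runs inside the Unity runtime. LevelCleanup and loadingScreenTexture are hypothetical names; Resources.UnloadAsset and Resources.UnloadUnusedAssets are the Unity APIs named above.

```csharp
using System.Collections;
using UnityEngine;

public class LevelCleanup : MonoBehaviour
{
    // Hypothetical asset this script knows it no longer needs.
    public Texture2D loadingScreenTexture;

    // Targeted unload: Unity does not have to scan everything.
    public void UnloadLoadingScreen()
    {
        if (loadingScreenTexture != null)
        {
            Resources.UnloadAsset(loadingScreenTexture);
            loadingScreenTexture = null; // drop the managed reference too
        }
    }

    // Full sweep: asynchronous, so wait for it inside a coroutine.
    public IEnumerator UnloadEverythingUnused()
    {
        yield return Resources.UnloadUnusedAssets();
    }
}
```

The sweep would be started with StartCoroutine(UnloadEverythingUnused()), for example from the blank scene between levels.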

Page 20: Unity Internals: Memory and Performance


Mesh Internals Memory vs. Cycles

Page 21: Unity Internals: Memory and Performance

Mesh Read/Write Option
•  It allows you to modify the mesh at run-time
•  If enabled, a system-memory copy of the Mesh will remain in memory
•  It is enabled by default
•  In some cases, disabling this option will not reduce the memory usage:
    •  Skinned meshes
    •  iOS

Unity 5.0: disabled by default (under consideration)

Page 22: Unity Internals: Memory and Performance

Non-Uniform Scaled Meshes
We need to correctly transform vertex normals
•  Unity 4.x:
    •  Transform the mesh on the CPU
    •  Create an extra copy of the data
•  Unity 5.0:
    •  Scaled on the GPU
    •  Extra memory no longer needed

Page 23: Unity Internals: Memory and Performance

Static Batching
What is it?
•  It’s an optimization that reduces the number of draw calls and state changes

How do I enable it?
•  In the player settings + tag the object as static

Page 24: Unity Internals: Memory and Performance

Static Batching
How does it work internally?
•  Build-time: vertices are transformed to world-space
•  Run-time: an index buffer is created with the indices of visible objects

Unity 5.0:
•  Re-implemented static batching without copying of index buffers

Page 25: Unity Internals: Memory and Performance

Dynamic Batching
What is it?
•  Similar to Static Batching, but it batches non-static objects at run-time

How do I enable it?
•  In the player settings
•  No need to tag. It auto-magically works…

Page 26: Unity Internals: Memory and Performance

Dynamic Batching
How does it work internally?
•  Objects are transformed to world space on the CPU
•  Temporary VB & IB are created
•  Rendered in one draw call

Unity 5.x: we are considering exposing per-platform parameters

Page 27: Unity Internals: Memory and Performance

Mesh Skinning
Different implementations depending on platform:
•  x86: SSE
•  iOS/Android/WP8: Neon optimizations
•  D3D11/Xbox One/GLES3.0: GPU
•  Xbox 360, WiiU: GPU (memexport)
•  PS3: SPU
•  WiiU: GPU w/ stream out

Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances

Page 28: Unity Internals: Memory and Performance


Scripting

Page 29: Unity Internals: Memory and Performance

Unity 5.0: Mono
•  No upgrade
•  Mainly bug fixes
•  New tech in WebGL: IL2CPP
    •  http://blogs.unity3d.com/2014/04/29/on-the-future-of-web-publishing-in-unity/
•  Stay tuned: there will be a blog post about it

Page 30: Unity Internals: Memory and Performance

GetComponent<T>
It asks the GameObject for a component of the specified type:
•  The GO contains a list of Components
•  Each Component type is compared to T
•  The first Component of type T (or that derives from T) will be returned to the caller
•  Not too much overhead, but it still needs to call into native code
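Because each call crosses into native code, a common approach is to look the component up once and keep the reference. A minimal sketch, assuming it runs inside Unity; Mover and body are hypothetical names.

```csharp
using UnityEngine;

// RequireComponent guarantees the lookup in Awake cannot return null.
[RequireComponent(typeof(Rigidbody))]
public class Mover : MonoBehaviour
{
    private Rigidbody body; // looked up once, reused every physics step

    void Awake()
    {
        body = GetComponent<Rigidbody>();
    }

    void FixedUpdate()
    {
        // No GetComponent call (and no managed-to-native lookup) per step.
        body.AddForce(Vector3.forward);
    }
}
```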

Page 31: Unity Internals: Memory and Performance

Unity 5.0: Property Accessors
•  Most accessors will be removed in Unity 5.0
•  The objective is to reduce dependencies and therefore improve modularization
•  Transform will remain
•  Existing scripts will be converted. Example:

in 5.0:

Page 32: Unity Internals: Memory and Performance

Transform Component
•  this.transform is the same as GetComponent<Transform>()
•  transform.position/rotation needs to:
    •  Find the Transform component
    •  Traverse the hierarchy to calculate the absolute position
    •  Apply translation/rotation
•  transform internally stores the position relative to the parent
    •  transform.localPosition = new Vector3(…) → simple assignment
    •  transform.position = new Vector3(…) → costs the same if there is no parent; otherwise it needs to traverse up the hierarchy to transform the absolute position into a local one
•  Finally, other components (collider, rigid body, light, camera, etc.) will be notified via messages
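The cost difference between local and absolute position can be illustrated with a toy model. This is not Unity’s implementation: Node and Vec3 are hypothetical types, and the model handles translation only (no rotation, scale, or change notifications).

```csharp
// Minimal 3D vector for the toy model.
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    public static Vec3 operator -(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X - b.X, a.Y - b.Y, a.Z - b.Z);
    }
}

public sealed class Node
{
    public Node Parent;

    // Stored relative to Parent: reading or writing it is a plain field access.
    public Vec3 LocalPosition;

    public Vec3 WorldPosition
    {
        get
        {
            // Walk up the hierarchy, accumulating each ancestor's offset.
            Vec3 result = LocalPosition;
            for (Node n = Parent; n != null; n = n.Parent)
                result = result + n.LocalPosition;
            return result;
        }
        set
        {
            // No parent: world equals local. Otherwise the parent's world
            // position has to be computed first (another walk up the tree).
            LocalPosition = Parent == null ? value : value - Parent.WorldPosition;
        }
    }
}
```

In this model the cost of WorldPosition grows with hierarchy depth, while LocalPosition stays constant.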

Page 33: Unity Internals: Memory and Performance

Instantiate
API:
•  Object Instantiate(Object, Vector3, Quaternion);
•  Object Instantiate(Object);

Implementation:
•  Clone GameObject hierarchy and Components
•  Copy properties
•  Awake
•  Apply new Transform (if provided)

Page 34: Unity Internals: Memory and Performance

Instantiate (cont’d)
•  Awake can be expensive
•  AwakeFromLoad (main thread)
    •  Clear states
    •  Internal state caching
    •  Pre-compute

Unity 5.0:
•  Allocations have been reduced
•  Some inner loops for copying the data have been optimized

Page 35: Unity Internals: Memory and Performance

JIT Compilation
What is it?
•  The process in which machine code is generated from CIL code during the application’s run-time

Pros:
•  It generates optimized code for the current platform

Cons:
•  Each time a method is called for the first time, the application will suffer a certain performance penalty because of the compilation

Page 36: Unity Internals: Memory and Performance

JIT compilation spikes
What about pre-JITting?
•  RuntimeHelpers.PrepareMethod does not work:

…better to use MethodHandle.GetFunctionPointer()
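A sketch of pre-JITting via GetFunctionPointer(), using reflection to visit every method declared on a type. PreJit, WarmUp and Enemy are hypothetical names. Whether requesting the function pointer actually forces compilation depends on the runtime, so verify the effect with the profiler.

```csharp
using System;
using System.Reflection;

public static class PreJit
{
    // Requests a function pointer for every concrete, non-generic method
    // declared on the type. Returns how many methods were visited.
    public static int WarmUp(Type type)
    {
        const BindingFlags flags =
            BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static;

        int prepared = 0;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            // Abstract methods have no body; open generics cannot be compiled as-is.
            if (method.IsAbstract || method.ContainsGenericParameters)
                continue;

            method.MethodHandle.GetFunctionPointer();
            prepared++;
        }
        return prepared;
    }
}

// Hypothetical gameplay class used to demonstrate the helper.
public class Enemy
{
    public void Update() { }
    private static int Score() { return 0; }
}
```

Running PreJit.WarmUp(typeof(Enemy)) during a loading screen moves the first-call cost away from gameplay.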

Page 37: Unity Internals: Memory and Performance


Job System

Page 38: Unity Internals: Memory and Performance

Unity 5.0: Job System (internal)
The goals of the job system:
•  Make it easy to write very efficient job-based multithreaded code
•  The jobs should be able to run safely in parallel to script code

Page 39: Unity Internals: Memory and Performance

Job System: Why?
Modern architectures are multi-core:
•  Xbox 360: 3 cores
•  PS4/Xbox One: 8 cores

…which includes mobile devices:
•  iPhone 4S: 2 cores
•  Galaxy S3: 4 cores

Page 40: Unity Internals: Memory and Performance

Job System: What is it?
•  It’s a framework that we are going to use in existing and new sub-systems
•  We want to have Animation, NavMesh, Occlusion, Rendering, etc. run as much as possible in parallel
•  This will ultimately lead to better performance

Page 41: Unity Internals: Memory and Performance

Unity 5.0: Profiler Timeline View
It’s a tool that allows you to analyse the execution of internal (native) threads during a specific frame

Page 42: Unity Internals: Memory and Performance

Unity 5.0: Frame Debugger

Page 43: Unity Internals: Memory and Performance


Conclusions

Page 44: Unity Internals: Memory and Performance

Budgeting Memory
How much memory is available?
•  It depends…
•  For example, on 512 MB devices running iOS 6.0: ~250 MB. A bit less with iOS 7.0

What’s the baseline?
•  Create an empty scene and measure memory
•  Don’t forget that the profiler requires some memory
•  For example, on Android: 15.5 MB (+ 12 MB for the profiler)

Page 45: Unity Internals: Memory and Performance

Profiling
•  Don’t make assumptions
•  Profile on target device
•  Editor != Player
•  Platform X != Platform Y
•  Managed memory is not returned to native land!

For best results:
•  Profile early and regularly

Page 46: Unity Internals: Memory and Performance


Questions ? [email protected] - Twitter: @m_trive