Unity Internals: Memory and Performance
-
Upload
devgamm-conference -
Category
Presentations & Public Speaking
-
view
5.919 -
download
3
description
Transcript of Unity Internals: Memory and Performance
Unity Internals: Memory and Performance Moscow, 16/05/2014 Marco Trivellato – Field Engineer
Page 6/9/14 2
This Talk Goals and Benefits
Page
Who Am I ? • Now Field Engineer @ Unity
• Previously, Software Engineer • Mainly worked on game engines • Shipped several video games:
• Captain America: Super Soldier • FIFA ‘07 – FIFA ’10 • Fight Night: Round 3
6/9/14 3
Page
Topics • Memory Overview • Garbage Collection • Mesh Internals • Scripting • Job System • How to use the Profiler
6/9/14 4
Page 6/9/14 5
Memory Overview
Page
Memory Domains • Native (internal)
• Asset Data: Textures, AudioClips, Meshes • Game Objects & Components: Transform, etc.. • Engine Internals: Managers, Rendering, Physics, etc..
• Managed - Mono • Script objects (Managed dlls) • Wrappers for Unity objects: Game objects, assets,
components • Native Dlls
• User’s dlls and external dlls (for example: DirectX)
6/9/14 6
Page
Native Memory: Internal Allocators • Default • GameObject • Gfx • Profiler 5.x: We are considering to expose an API for using a native allocator in Dlls
6/9/14 7
Page
Managed Memory • Value types (bool, int, float, struct, ...)
• Exist in stack memory. De-allocated when removed from the stack. No Garbage.
• Reference types (classes) • Exist on the heap and are handled by the mono/.net
GC. Removed when no longer being referenced. • Wrappers for Unity Objects :
• GameObject • Assets : Texture2D, AudioClip, Mesh, … • Components : MeshRenderer, Transform, MonoBehaviour
6/9/14 8
Page
Mono Memory Internals • Allocates system heap blocks for internal allocator • Will allocate new heap blocks when needed • Heap blocks are kept in Mono for later use
• Memory can be given back to the system after a while • …but it depends on the platform è don’t count on it
• Garbage collector cleans up • Fragmentation can cause new heap blocks even
though memory is not exhausted
6/9/14 9
Page 6/9/14 10
Garbage Collection
Page
Unity Object wrapper
• Some Objects used in scripts have large native backing memory in unity • Memory not freed until Finalizers have run
6/9/14 11
WWW Decompression buffer
Compressed file
Decompressed file
Managed Native
Page
Mono Garbage Collection • GC.Collect
• Runs on the main thread when • Mono exhausts the heap space • Or user calls System.GC.Collect()
• Finalizers • Run on a separate thread
• Controlled by mono • Can have several seconds delay
• Unity native memory • Dispose() cleans up internal
memory • Eventually called from finalizer • Manually call Dispose() to cleanup
6/9/14 12
Main thread Finalizer thread
www = null; new(someclass); //no more heap -> GC.Collect();
www.Dispose();
.....
Page
Garbage Collection • Roots are not collected in a GC.Collect
• Thread stacks • CPU Registers • GC Handles (used by Unity to hold onto managed
objects) • Static variables!!
• Collection time scales with managed heap size • The more you allocate, the slower it gets
6/9/14 13
Page
GC: does lata layout matter ? struct Stuff {
int a; float b; bool c; string leString;
} Stuff[] arrayOfStuff; << Everything is scanned. GC takes more time VS int[] As; float[] Bs; bool[] Cs; string[] leStrings; << Only this is scanned. GC takes less time.
6/9/14 14
Page
GC: Best Practices • Reuse objects è Use object pools • Prefer stack-based allocations è Use struct
instead of class • System.GC.Collect can be used to trigger
collection • Calling it 6 times returns the unused memory to
the OS • Manually call Dispose to cleanup immediately
6/9/14 15
Page
Avoid temp allocations • Don’t use FindObjects or LINQ • Use StringBuilder for string concatenation • Reuse large temporary work buffers • ToString() • .tag è use CompareTag() instead
6/9/14 16
Page
Unity API Temporary Allocations Some Examples: • GetComponents<T> • Vector3[] Mesh.vertices • Camera[] Camera.allCameras • foreach
• does not allocate by definition • However, there can be a small allocation, depending on the
implementation of .GetEnumerator()
5.x: We are working on new non-allocating versions
6/9/14 17
Page
Memory fragmentation • Memory fragmentation is hard to account for
• Fully unload dynamically allocated content • Switch to a blank scene before proceeding to next level
• This scene could have a hook where you may pause the game long enough to sample if there is anything significant in memory
• Ensure you clear out variables so GC.Collect will remove as much as possible
• Avoid allocations where possible • Reuse objects where possible within a scene play • Clear them out for map load to clean the memory
6/9/14 18
Page
Unloading Unused Assets • Resources.UnloadUnusedAssets will trigger asset
garbage collection • It looks for all unreferenced assets and unloads them • It’s an async operation • It’s called internally after loading a level
• Resources.UnloadAsset is preferable • you need to know exactly what you need to Unload • Unity does not have to scan everything
• Unity 5.0: Multi-threaded asset garbage collection
6/9/14 19
Page 6/9/14 20
Mesh Internals Memory vs. Cycles
Page
Mesh Read/Write Option • It allows you to modify the mesh at run-time • If enabled, a system-copy of the Mesh will remain in
memory • It is enabled by default • In some cases, disabling this option will not reduce the
memory usage • Skinned meshes • iOS
Unity 5.0: disable by default – under consideration
6/9/14 21
Page
Non-Uniform scaled Meshes We need to correctly transform vertex normals • Unity 4.x:
• transform the mesh on the CPU • create an extra copy of the data
• Unity 5.0 • Scaled on GPU • Extra memory no longer needed
6/9/14 22
Page
Static Batching What is it ? • It’s an optimization that reduces number of draw
calls and state changes How do I enable it ? • In the player settings + Tag the object as static
6/9/14 23
Page
Static Batching How does it work internally ? • Build-time: Vertices are transformed to world-
space • Run-time: Index buffer is created with indices of
visible objects
Unity 5.0: • Re-implemented static batching without copying of
index buffers
6/9/14 24
Page
Dynamic Batching What is it ? • Similar to Static Batching but it batches non-static
objects at run-time How do I enable it ? • In the player settings • no need to tag. it auto-magically works…
6/9/14 25
Page
Dynamic Batching How does it work internally ? • objects are transformed to world space on the
CPU • Temporary VB & IB are created • Rendered in one draw call
Unity 5.x: we are considering to expose per-platform parameters
6/9/14 26
Page
Mesh Skinning Different Implementations depending on platform: • x86: SSE • iOS/Android/WP8: Neon optimizations • D3D11/XBoxOne/GLES3.0: GPU • XBox360, WiiU: GPU (memexport) • PS3: SPU • WiiU: GPU w/ stream out
Unity 5.0: Skinned meshes use less memory by sharing index buffers between instances
6/9/14 27
Page 6/9/14 28
Scripting
Page
Unity 5.0: Mono • No upgrade • Mainly bug fixes • New tech in WebGL: IL2CPP
• http://blogs.unity3d.com/2014/04/29/on-the-future-of-web-publishing-in-unity/
• Stay tuned: there will be a blog post about it
6/9/14 29
Page
GetComponent<T> It asks the GameObject, for a component of the specified type: • The GO contains a list of Components • Each Component type is compared to T • The first Component of type T (or that derives from
T), will be returned to the caller • Not too much overhead but it still needs to call into
native code
6/9/14 30
Page
Unity 5.0: Property Accessors • Most accessors will be removed in Unity 5.0 • The objective is to reduce dependencies,
therefore improve modularization • Transform will remain • Existing scripts will be converted. Example:
in 5.0:
6/9/14 31
Page
Transform Component • this.transform is the same as GetComponent<Transform>() • transform.position/rotation needs to:
• find Transform component • Traverse hierarchy to calculate absolute position • Apply translation/rotation
• transform internally stores the position relative to the parent • transform.localPosition = new Vector(…) è simple
assignment • transform.position = new Vector(…) è costs the same if
no father, otherwise it will need to traverse the hierarchy up to transform the abs position into local
• finally, other components (collider, rigid body, light, camera, etc..) will be notified via messages
6/9/14 32
Page
Instantiate API: • Object Instantiate(Object, Vector3, Quaternion); • Object Instantiate(Object);
Implementation: • Clone GameObject Hierarchy and Components • Copy Properties • Awake • Apply new Transform (if provided)
6/9/14 33
Page
Instantiate cont..ed • Awake can be expensive • AwakeFromLoad (main thread)
• clear states • internal state caching • pre-compute
Unity 5.0: • Allocations have been reduced • Some inner loops for copying the data have been
optimized
6/9/14 34
Page
JIT Compilation What is it ? • The process in which machine code is generated from CIL
code during the application's run-time
Pros: • It generates optimized code for the current platform
Cons: • Each time a method is called for the first time, the
application will suffer a certain performance penalty because of the compilation
6/9/14 35
Page
JIT compilation spikes What about pre-JITting ? • RuntimeHelpers.PrepareMethod does not work:
…better to use MethodHandle.GetFunctionPointer()
6/9/14 36
Page 6/9/14 37
Job System
Page
Unity 5.0: Job System (internal) The goals of the job system:
• make it easy to write very efficient job based multithreaded code
• The jobs should be able to run safely in parallel to script code
6/9/14 38
Page
Job System: Why ? Modern architectures are multi-core: • XBox 360: 3 cores • PS4/Xbox One: 8 cores
…which includes mobile devices: • iPhone 4S: 2 cores • Galaxy S3: 4 cores
6/9/14 39
Page
Job System: What is it ? • It’s a Framework that we are going to use in
existing and new sub-systems • We want to have Animation, NavMesh, Occlusion,
Rendering, etc… run as much as possible in parallel
• This will ultimately lead to better performance
6/9/14 40
Page
Unity 5.0: Profiler Timeline View It’s a tool that allows you to analyse internal (native) threads execution of a specific frame
6/9/14 41
Page
Unity 5.0: Frame Debugger
6/9/14 42
Page 6/9/14 43
Conclusions
Page
Budgeting Memory How much memory is available ? • It depends… • For example, on 512mb devices running iOS 6.0:
~250mb. A bit less with iOS 7.0
What’s the baseline ? • Create an empty scene and measure memory • Don’t forget that the profiler requires some
memory • For example: on Android 15.5mb (+ 12mb profiler)
6/9/14 44
Page
Profiling • Don’t make assumptions • Profile on target device • Editor != Player • Platform X != Platform Y • Managed Memory is not returned to Native Land!
For best results…: • Profile early and regularly
6/9/14 45
Page 6/9/14 46
Questions ? [email protected] - Twitter: @m_trive