Optimization Deep Dive: Unreal Engine 4 on Intel

Jeff Rous – Graphics Software Engineer, Intel. Twitter: @jeff_rous

Transcript of Optimization Deep Dive: Unreal Engine 4 on Intel

Page 1: Optimization Deep Dive: Unreal Engine 4 on Intel

Jeff Rous – Graphics Software Engineer, Intel
Twitter: @jeff_rous

Optimization Deep Dive: Unreal Engine 4 on Intel

Page 2: Optimization Deep Dive: Unreal Engine 4 on Intel

Intel Software – Developer Relations Division Intel Confidential

Overview

• Rationale
• Intel Graphics Roadmap/Details
• How We Measured
• Common Pain Points
• Shader Optimizations
• Optimizing for DX12
• VR Tips and Tricks
• Android x86/x64 and ASTC Support

Page 3: Optimization Deep Dive: Unreal Engine 4 on Intel

Why Work Together?

• Benefits all games that use the engine
• UE4 runs on more hardware
  • Intel has an 18% GPU share; 4 of the top 10 most popular GPUs are Intel (Steam)
• Optimizations help everyone – high end to phone
• Common goals
  • APIs like DX12 and Vulkan are going to power tomorrow’s games
• Virtual reality is an important new segment
• Android is a large market and key for Epic and Intel

Page 4: Optimization Deep Dive: Unreal Engine 4 on Intel

Intel® HD Graphics: Roadmap

Sandy Bridge (2011)
Intel® 2nd Gen Core™ Processor
• 32nm
• Feature Level 10.1
• Up to 12 EUs

Ivy Bridge (2012)
Intel® 3rd Gen Core™ Processor
• 22nm
• Feature Level 11.0
• Up to 16 EUs

Haswell (2013)
Intel® 4th Gen Core™ Processor
• Feature Level 11.1
• DX Extensions
• GT3 (40 EUs)
• EDRAM
• Iris Pro™, Iris™ brands

Broadwell (2014)
Intel® 5th Gen Core™ Processor
• 14nm
• Feature Level 11.2
• Up to 48 EUs

Skylake (2015-16)
Intel® 6th Gen Core™ Processor
• Feature Level 12.0
• GT4 (72 EUs)
• GT3e 15/28W
• DX12 HW

Up to 30X faster graphics over last 5 years

Page 5: Optimization Deep Dive: Unreal Engine 4 on Intel

Intel® HD Graphics: EDRAM

Basic facts
• Located on the same package as the CPU
• 64-128MB
• Bandwidth – 50 GB/sec each way (100 GB/sec total BW)
• Acts as a 4th level cache
• Just works: no API required to use and take advantage

Bandwidth saving
• Increasing compute requires more bandwidth
• EDRAM helps to reduce BW consumption and improve EU efficiency

Just works, but efficiency can be improved by re-using frame data

[Diagram: CPU package of an Intel 6th Gen Core™ chip. Labels: CPU Core (x4), Ring-bus, LL$, Gfx Core, EDRAM, System Memory]

Page 6: Optimization Deep Dive: Unreal Engine 4 on Intel

How We Measured – Intel GPA

• Use the ToggleDrawEvents command
• Frame debugging and live mode
• Experiment!

Page 7: Optimization Deep Dive: Unreal Engine 4 on Intel

How We Measured

• ProfileGPU command
• Stat commands
• Windows Performance Analyzer
• Intel Extreme Tuning Utility

Page 8: Optimization Deep Dive: Unreal Engine 4 on Intel

Intel Pain Points – Memory Bandwidth

• Memory bandwidth is at a premium with integrated graphics.
• G-buffers are memory hungry. UE4 is configurable: you can change the format, eliminate channels, or even combine them.
• Scaling the resolution of the G-buffers helps up to a point.
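To get a feel for why G-buffer format and resolution matter on integrated graphics, the arithmetic can be sketched as below. The two layouts and their per-pixel byte counts are illustrative assumptions, not UE4's actual G-buffer definition, and `gbuffer_bytes` and `write_bandwidth_gb_per_s` are hypothetical helper names. Only the write traffic of filling the G-buffer is counted; the reads in the lighting passes come on top.

```python
import math

# Hypothetical G-buffer layouts: bytes per pixel for each render target.
# Illustrative numbers only, not UE4's actual G-buffer definition.
FAT_LAYOUT = [8, 8, 4, 4, 4]   # two 64-bit targets plus three 32-bit targets
SLIM_LAYOUT = [4, 4, 4]        # three 32-bit targets after combining channels


def gbuffer_bytes(width, height, layout, resolution_scale=1.0):
    """Bytes written to fill every G-buffer target once at the scaled resolution."""
    scaled_w = math.ceil(width * resolution_scale)
    scaled_h = math.ceil(height * resolution_scale)
    return scaled_w * scaled_h * sum(layout)


def write_bandwidth_gb_per_s(bytes_per_frame, fps):
    """Write traffic in GB/s (10^9 bytes) when the G-buffer is filled every frame."""
    return bytes_per_frame * fps / 1e9


for name, layout in (("fat", FAT_LAYOUT), ("slim", SLIM_LAYOUT)):
    for scale in (1.0, 0.75):
        size = gbuffer_bytes(1920, 1080, layout, scale)
        print(f"{name} layout at scale {scale}: {size / 2**20:.1f} MiB per frame, "
              f"{write_bandwidth_gb_per_s(size, 60):.2f} GB/s at 60 fps")
```

Under these assumptions, trimming the layout and lowering the resolution scale both cut the bytes written per frame, which is the lever the slide describes.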

Page 9: Optimization Deep Dive: Unreal Engine 4 on Intel

Intel Pain Points – Dense Geometry

• With sub-pixel triangles or very dense meshes, vertex shader execution can’t be covered by pixel shader execution, which starves the hardware. Use LODs where possible.
• The clipper can become a bottleneck in the worst cases. Use frustum culling on bounding boxes at the very least, and occlusion culling for hidden objects.
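Frustum culling on bounding boxes can be sketched with the common plane test below. This is a minimal illustration, not UE4's culling code: the `Plane`, `AABB`, `fully_outside` and `is_potentially_visible` names are hypothetical, and the example uses a box-shaped view volume rather than a real perspective frustum to keep the numbers easy to check.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Plane:
    """Plane n.p + d = 0 with the normal pointing into the visible volume."""
    normal: Vec3
    d: float


@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box given by its minimum and maximum corners."""
    lo: Vec3
    hi: Vec3


def fully_outside(box: AABB, plane: Plane) -> bool:
    """True if every corner of the box is on the outside of the plane."""
    # Only the corner farthest along the normal needs testing: if even that
    # corner is behind the plane, the whole box is.
    farthest = tuple(
        box.hi[i] if plane.normal[i] >= 0 else box.lo[i] for i in range(3)
    )
    distance = sum(plane.normal[i] * farthest[i] for i in range(3)) + plane.d
    return distance < 0


def is_potentially_visible(box: AABB, planes: Sequence[Plane]) -> bool:
    """Conservative cull: reject only boxes fully outside at least one plane."""
    return not any(fully_outside(box, p) for p in planes)


# Toy view volume for the example: x and y in [-10, 10], z in [1, 100].
VIEW_VOLUME = [
    Plane((1, 0, 0), 10), Plane((-1, 0, 0), 10),
    Plane((0, 1, 0), 10), Plane((0, -1, 0), 10),
    Plane((0, 0, 1), -1), Plane((0, 0, -1), 100),
]

print(is_potentially_visible(AABB((0, 0, 5), (1, 1, 6)), VIEW_VOLUME))    # inside
print(is_potentially_visible(AABB((20, 0, 5), (21, 1, 6)), VIEW_VOLUME))  # off to the side
```

The test is conservative: a box that straddles a plane is kept, so nothing visible is dropped, while objects fully outside the view never reach the vertex shader or the clipper.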

Page 10: Optimization Deep Dive: Unreal Engine 4 on Intel

A Word About Power

• Intel graphics are typically found in low-power systems.
• The CPU and GPU share a power budget, so less CPU usage means more graphics.

Page 11: Optimization Deep Dive: Unreal Engine 4 on Intel

Shaders – Local Memory

• 64-byte cache lines benefit a great deal from loop unrolling.
• Avoid small loads in tight loops.

Page 12: Optimization Deep Dive: Unreal Engine 4 on Intel

Shaders – Unused Attributes

• Shaders are often bound with large structures full of constants that go unused. This is not cache friendly.
• Depth passes are especially bad, outputting values not used by a null pixel shader.
• In UE4, make use of r.ShaderPipelines for depth passes.
• In DX12, make liberal use of the DENY_*_ROOT_ACCESS root signature flags (for example D3D12_ROOT_SIGNATURE_FLAG_DENY_PIXEL_SHADER_ROOT_ACCESS) to limit resource-shader visibility.

Page 13: Optimization Deep Dive: Unreal Engine 4 on Intel

Shaders – Branching and Sampling

• Using lots of temporaries can starve the hardware.
• Branching is expensive if loads are inside the conditional blocks.
• Group loads as early in the shader as possible to help cover latency.

Page 14: Optimization Deep Dive: Unreal Engine 4 on Intel

Demo – DX12 Driver Metrics

Page 15: Optimization Deep Dive: Unreal Engine 4 on Intel

DX12 Performance – Fast Clear

• Specify the optional D3D12_CLEAR_VALUE when calling CreateCommittedResource.
• Intel hardware has a fast clear path for 1-bit-per-channel clear values (every channel 0 or 1), e.g. (1,0,1,0).
• When clearing, use the color specified up front for maximum performance.
• ~9% performance gain on the Elemental demo on DX12!
• In the engine today.
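The two rules on this slide can be expressed as a small check that a renderer might run over its render-target descriptions. This is a sketch under the assumption that the fast path wants every channel of the clear color to be exactly 0 or 1; `fast_clear_issues` is a hypothetical helper and does not call any D3D12 API.

```python
def fast_clear_issues(creation_clear, frame_clear):
    """List reasons why a clear may miss the fast path (empty list means none).

    creation_clear: RGBA color given as the optimized clear value at creation.
    frame_clear:    RGBA color actually passed when clearing each frame.
    """
    issues = []
    # Rule 1 (assumed reading of the slide): every channel is 0 or 1.
    if any(channel not in (0.0, 1.0) for channel in creation_clear):
        issues.append("creation clear value has a channel that is not 0 or 1")
    # Rule 2: clear with the same color that was specified up front.
    if tuple(frame_clear) != tuple(creation_clear):
        issues.append("per-frame clear color differs from the creation clear value")
    return issues


print(fast_clear_issues((1, 0, 1, 0), (1, 0, 1, 0)))          # no issues
print(fast_clear_issues((0.5, 0.5, 0.5, 1), (0, 0, 0, 1)))    # both rules violated
```

A check like this is cheap to run in a debug build, where a mismatched clear color is easy to introduce and hard to notice.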

Page 16: Optimization Deep Dive: Unreal Engine 4 on Intel

DX12 Performance – Root Signature

• Blueprint of resources available:
  • Root constants
  • Root descriptors
  • Descriptor tables
• Constants that sit directly in the root are copied to each invocation of the shader (pushed) rather than read from memory when used (pulled).
• Can significantly speed up shader execution.
• Automatically handled by the driver in DX11.

Page 17: Optimization Deep Dive: Unreal Engine 4 on Intel

VR Tips and Tricks

• Simple techniques to take advantage of an under-utilized resource, the CPU!
• Easily adds realism to your VR scenes without much incremental GPU work.
• Min spec defined for high end VR.
• Effects can be scaled up easily through Blueprints.

Page 18: Optimization Deep Dive: Unreal Engine 4 on Intel

VR Tips and Tricks – Destruction

• Simulates dynamic fracturing of meshes into smaller pieces.
• Typical destruction workloads consist of a few seconds of heavy simulation followed by a return to the baseline.
• Better CPUs can keep pieces around longer and fracture further for more realism.

Page 19: Optimization Deep Dive: Unreal Engine 4 on Intel

VR Tips and Tricks – Cloth

• Dynamic mesh simulation that responds to the player, wind or other environmental factors.
• Typical cloth workloads include player capes or flags.
• Simulated every frame.
• Easy to scale: more cloth systems means more CPU usage.
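One way to scale CPU-side effects such as destruction and cloth with the machine is to pick settings from the logical core count, sketched below. The tiers, thresholds and setting names are all hypothetical placeholders, not values from the talk or from UE4, and would need tuning per title.

```python
import os

# Hypothetical tiers, highest first: (minimum logical cores, effect settings).
# Every number here is a placeholder to be tuned per title.
EFFECT_TIERS = [
    (8, {"debris_lifetime_s": 10.0, "fracture_depth": 3, "cloth_systems": 8}),
    (4, {"debris_lifetime_s": 5.0, "fracture_depth": 2, "cloth_systems": 4}),
    (1, {"debris_lifetime_s": 2.0, "fracture_depth": 1, "cloth_systems": 1}),
]


def pick_effect_settings(logical_cores):
    """Return a copy of the settings of the first tier whose core requirement is met."""
    for min_cores, settings in EFFECT_TIERS:
        if logical_cores >= min_cores:
            return dict(settings)
    # Fewer cores than the lowest tier asks for: fall back to the lowest tier.
    return dict(EFFECT_TIERS[-1][1])


# os.cpu_count() may return None, so fall back to a single core.
print(pick_effect_settings(os.cpu_count() or 1))
```

Core count is only a rough proxy for CPU headroom; measuring simulation time per frame and adjusting at run time would be a more direct signal.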

Page 20: Optimization Deep Dive: Unreal Engine 4 on Intel

Android x86/x64 Support

• Native apps reduce CPU load, startup times and power consumption
• Supported in UE4 today through editor menu
  • Requires source build
  • Package as fat or separated APKs
• OpenGL ES 3.1 + AEP for best quality
• ASTC textures
• Deferred renderer
• Supported on latest Intel tablets

Page 21: Optimization Deep Dive: Unreal Engine 4 on Intel

Fast ASTC compression

• Next gen format (OpenGL ES, Vulkan)
• Very good compression on RGB/RGBA for a variety of block sizes
• UE4 now has support for Intel’s fast texture compressor for ASTC
  • 44x speed improvement
  • Quality comparable to ARM compressor
  • UE4 uses Intel’s BC6H/BC7 compressors already
• Released with 4.13

Page 22: Optimization Deep Dive: Unreal Engine 4 on Intel

ASTC Quality Comparison

Zoomed in portion of a 2048x2048 normal map

• Original: 12 MB
• ETC1: 2 MB
• ASTC 6x6: 1.8 MB
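The sizes on this slide can be reproduced from the block layouts of the formats. The sketch below assumes the original is uncompressed 8-bit RGB, that only the top mip level is counted, and that "MB" means 2^20 bytes; the function names are hypothetical. ASTC stores every block in 128 bits regardless of its footprint, and ETC1 stores each 4x4 block in 64 bits.

```python
import math

ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint
ETC1_BLOCK_BYTES = 8   # every ETC1 block covers 4x4 pixels in 64 bits


def astc_size_bytes(width, height, block_w, block_h):
    """Size of one mip level compressed with ASTC at the given block footprint."""
    blocks_x = math.ceil(width / block_w)
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


def etc1_size_bytes(width, height):
    """Size of one mip level compressed with ETC1 (4 bits per pixel)."""
    return math.ceil(width / 4) * math.ceil(height / 4) * ETC1_BLOCK_BYTES


def rgb8_size_bytes(width, height):
    """Size of one uncompressed mip level at 3 bytes per pixel (assumed format)."""
    return width * height * 3


MIB = 2**20
print(f"Original: {rgb8_size_bytes(2048, 2048) / MIB:.1f} MB")
print(f"ETC1:     {etc1_size_bytes(2048, 2048) / MIB:.1f} MB")
print(f"ASTC 6x6: {astc_size_bytes(2048, 2048, 6, 6) / MIB:.1f} MB")
```

Because the block size in bytes is fixed, larger ASTC footprints such as 8x8 trade quality for a smaller file, which is the "variety of block sizes" the previous slide refers to.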

Page 23: Optimization Deep Dive: Unreal Engine 4 on Intel

What’s Next?

• Intel Compiler support – 4.14
• VTune Amplifier support – event-based CPU sampling using the itt_notify framework. Gives deep insight into what the engine is doing at all times. Future release.
• VR sample showing off techniques to take advantage of extra CPU cycles.

Page 24: Optimization Deep Dive: Unreal Engine 4 on Intel

Wrap up

• Intel and Epic have worked together on key technologies that let developers make their best games.
• Take advantage of scaling features in UE4 – Epic has done a lot of work to support lower end hardware.
• Test on Intel hardware early. UE4 is powerful, but it can easily bring down a high end system.
• With proper optimization, UE4 games run really well on Intel hardware.

Page 25: Optimization Deep Dive: Unreal Engine 4 on Intel
Page 26: Optimization Deep Dive: Unreal Engine 4 on Intel

Links

• Intel Developer Zone (software.intel.com)
• Unreal Engine 4 (unrealengine.com)
• Intel GPA (software.intel.com/en-us/gpa)
• ISPC Texture Compressor sample (software.intel.com/en-us/articles/fast-ispc-texture-compressor-update)
• Using Android x86 on UE4 (software.intel.com/en-us/articles/Unreal-Engine-4-with-x86-Support)
• UE4 Code Sharing Hub (Intel Hardware Metrics) (wiki.unrealengine.com/GitHub_Sharing_Hub)