D3D12 A NEW MEANING FOR EFFICIENCY AND PERFORMANCE DAVE OLDCORN, AMD STEPHAN HODES, AMD MAX MCMULLEN, MICROSOFT DAN BAKER, OXIDE 5 TH MARCH 2015


D3D12 A NEW MEANING FOR EFFICIENCY AND PERFORMANCE

DAVE OLDCORN, AMD
STEPHAN HODES, AMD
MAX MCMULLEN, MICROSOFT
DAN BAKER, OXIDE

5TH MARCH 2015


D3D11 to D3D12


| D3D12 A NEW MEANING FOR EFFICIENCY AND PERFORMANCE | GDC | MARCH 5TH 2015

WHAT HASN’T CHANGED

D3D12 is primarily a software change
Hardware programming model is still the same
‒ Few new rendering features


WHAT HAS CHANGED

The software model has changed a lot
Not just in the API, but also in the underlying philosophy
‒ Closer to the hardware
‒ Give more control to the application


APPLICATION IS ARBITER OF CORRECT RENDERING

Trades off safety for power
‒ If D3D11 is JavaScript, D3D12 is C++
Large areas of undefined behaviour
‒ ... where behaviour will change with future GPUs
Use the debug layer
Stay away from the corners, don’t take risks
‒ Expect “morality guides”
‒ ... once we know what people keep doing wrong


BROAD STROKE CHANGES D3D11 -> 12

Sequential API -> Queues, Command Lists
Small state blocks -> State object for pipeline
Resource binding: individual objects -> Resource binding: tables
Automatic synchronisation, driver tracks resource state -> Manual synchronisation, app must avoid overwrites
Implicit memory management by OS & driver -> Explicit memory management by application


New in D3D12


COMMAND LISTS

Each command list is executed strictly sequentially
Command lists can call out to second-level command lists (“bundles”)
‒ Some restrictions on bundles
‒ Replaying bundles is OK
Top-level command lists can be replayed too
‒ But not until the previous submit has retired
Size them right
‒ 100s of draws for direct lists; 10+ draws for a bundle


COMMAND LISTS ENABLE CPU SIDE THREADING

Command lists can be built on arbitrary threads
‒ And very quickly too
Submit is thread-safe
‒ Submit in batches
Consider task oriented engines
‒ Divide rendering into tasks
‒ Run CPU tasks to build command lists
‒ Use dependencies to order GPU submission
‒ Also helps with resource barriers
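
The threading pattern on this slide can be sketched without any D3D12 headers. The block below is a minimal model, not the real API: `CommandList`, `Queue` and `record_in_parallel` are hypothetical stand-ins, and the draw names are made up. Each worker thread records into its own list, and the finished lists are submitted together in one batch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list (not a D3D12 type).
struct CommandList {
    std::vector<std::string> draws;
};

// Hypothetical stand-in for a queue; it remembers the order of execution.
struct Queue {
    std::vector<std::string> executed;

    // One batched submit: lists run in the order given, each strictly sequentially.
    void execute(const std::vector<CommandList>& lists) {
        for (const CommandList& list : lists) {
            executed.insert(executed.end(), list.draws.begin(), list.draws.end());
        }
    }
};

// Records one command list per task, each on its own thread.
// Every thread writes only to its own list, so no locking is needed.
std::vector<CommandList> record_in_parallel(std::size_t tasks, std::size_t draws_per_task) {
    std::vector<CommandList> lists(tasks);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < tasks; ++t) {
        workers.emplace_back([&lists, t, draws_per_task] {
            for (std::size_t d = 0; d < draws_per_task; ++d) {
                lists[t].draws.push_back("task" + std::to_string(t) +
                                         "/draw" + std::to_string(d));
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return lists;
}
```

Recording four tasks of 100 draws each and passing the result to a single `execute` call gives 400 draws in task order, which mirrors the advice to build in parallel and submit in batches.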


ALLOCATOR AND LIST MEMORY MANAGEMENT

Lists / Allocators manage memory
‒ Hang on to their resources when reset
‒ Must be destroyed to fully release memory
‒ Reuse lists / allocators on ‘similar’ data
‒ Destroy if data is very dissimilar
‒ Don’t use pool of lists / allocators for all possible uses

[Figure: list / allocator memory usage over successive recordings. Labels shown: Initial, 100 draws; Reset, same 100 draws (guaranteed no new allocations); 200 draws; different 100 draws; 5 draws.]
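
The memory behaviour described on this slide can be illustrated with a deliberately simplified model. `AllocatorModel` is a hypothetical class, and the fixed bytes-per-draw cost is an assumption for illustration only; real drivers will differ. The point it shows is that reset keeps the high-water mark, so a small recording on an allocator that once held a large one still pins the large amount of memory.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of an allocator that keeps its memory across resets.
class AllocatorModel {
public:
    // Records `draws` draws of `bytes_per_draw` bytes each.
    // Returns true if the backing memory had to grow.
    bool record(std::size_t draws, std::size_t bytes_per_draw) {
        used_ += draws * bytes_per_draw;
        if (used_ <= capacity_) {
            return false;  // fits in memory already held
        }
        capacity_ = used_;  // grow to the new high-water mark
        return true;
    }

    // Reset makes the memory reusable but does not release it.
    void reset() { used_ = 0; }

    std::size_t capacity() const { return capacity_; }

private:
    std::size_t capacity_ = 0;  // memory held until the allocator is destroyed
    std::size_t used_ = 0;      // memory consumed since the last reset
};
```

In this model, recording 100 draws, resetting and recording the same 100 draws causes no growth, while recording 5 draws after a 200-draw recording leaves the 200-draw capacity held. That is the case where destroying the allocator is the only way to get the memory back.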


PIPELINE STATE OBJECT (PSO)

Collates most D3D11 render states
Compiled into hardware registers at Create time
‒ Can easily be tens of ms, so use asynchronous threads
All state set onto command buffer in one go
Keep adjacent PSOs similar
Use sensible defaults for don’t care fields

Example: Rasterizer state

    INT DepthBias;
    FLOAT DepthBiasClamp;
    FLOAT SlopeScaledDepthBias;
    BOOL DepthClipEnable;

None of this matters if depth test is off


D3D12 RESOURCE BINDING 1

Table driven
Shared across all shader stages
Two-level table
‒ Root Signature describes a top-level layout
  ‒ Pointers to descriptor tables
  ‒ Direct pointers to constant buffers
  ‒ Inline constants
Changing which table is pointed to is cheap
‒ It’s just writing a pointer; no synchronisation cost
Changing contents of table is harder
‒ Can’t change table in flight on the hardware; no automatic renaming

[Figure: a Root Signature holding table pointers, a root constant buffer view and a 32-bit constant; the table pointers reference Descriptor Tables containing CB views, SR views and UA views.]


D3D12 RESOURCE BINDING 2

Tables should be grouped by frequency of change
‒ Per-draw, per-material, per-light, per-frame
‒ Hint update frequency to driver by placing most frequent changes early in root signature
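
As a small illustration of the ordering advice, the sketch below sorts a list of root parameters so that the most frequently changed ones come first. `RootParam` and `order_by_frequency` are hypothetical names, and the change counts are invented estimates rather than measured values; a real engine would build its actual root signature description from the sorted order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one root signature slot.
struct RootParam {
    std::string name;
    unsigned changes_per_frame;  // estimated update frequency (assumed figure)
};

// Returns the parameters with the most frequently changed ones first.
// stable_sort keeps the original order among equally frequent parameters.
std::vector<RootParam> order_by_frequency(std::vector<RootParam> params) {
    std::stable_sort(params.begin(), params.end(),
                     [](const RootParam& a, const RootParam& b) {
                         return a.changes_per_frame > b.changes_per_frame;
                     });
    return params;
}
```

With estimates of 5000 changes per frame for per-draw data, 200 for per-material, 16 for per-light and 1 for per-frame, the resulting order is per-draw, per-material, per-light, per-frame.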


D3D12 RESOURCE BINDING TIPS

Don’t overload root signature size
‒ CBVs and constants in root signature should probably be changing every draw call
‒ Bulk constant data should be in CBs not root constants
Use static tables where possible
‒ Associate with object and prebuild


D3D12 RESOURCE SYNCHRONISATION

No automatic synchronisation
Must insert barriers between usage
Three functions of barrier
‒ Format conversion
  ‒ e.g. antialiasing resolve or depth decompression
‒ Synchronisation
  ‒ Ensuring correct order of execution; e.g. compute use of a render output could start before colour buffer is finished working on the data, due to pipelining
‒ Visibility
  ‒ Typically cache flushes, if unit A and unit B do not share the same visibility of the data
Barrier specifies previous and next usage and driver inserts appropriate work


BARRIER TIPS

Group barriers into same Barrier call
‒ Will take the worst case of all, rather than potentially incurring multiple sequential barriers
Set minimal barriers
Barriers must be correct
‒ Will be a gigantic headache for IHVs if not
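
Grouping barriers can be sketched as a small batching helper. The types below (`State`, `Transition`, `BarrierBatch`) are hypothetical models rather than D3D12 types. In real code the flush step would correspond to a single `ID3D12GraphicsCommandList::ResourceBarrier` call that takes an array of barriers. The helper also drops no-op transitions, in the spirit of setting minimal barriers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical resource usage states (a small subset, for illustration).
enum class State { RenderTarget, ShaderResource, DepthWrite, CopyDest };

// One pending transition: previous and next usage of a resource.
struct Transition {
    int resource;  // hypothetical resource id
    State before;
    State after;
};

// Collects transitions and issues them together in one barrier call.
class BarrierBatch {
public:
    // Queues a transition; a transition to the same state is skipped.
    void add(const Transition& t) {
        if (t.before != t.after) {
            pending_.push_back(t);
        }
    }

    // Issues all pending transitions as one call.
    // Returns how many transitions were issued; an empty batch issues nothing.
    std::size_t flush() {
        if (pending_.empty()) {
            return 0;
        }
        ++calls_;  // stands in for one barrier call on the command list
        const std::size_t count = pending_.size();
        pending_.clear();
        return count;
    }

    std::size_t calls() const { return calls_; }

private:
    std::vector<Transition> pending_;
    std::size_t calls_ = 0;
};
```

Adding a render target to shader resource transition, a depth write to shader resource transition and one redundant transition, then flushing once, results in a single call carrying two barriers.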


PROFILING

D3D11 was reasonably predictable in profiling
‒ Limited set of accessible bottlenecks
‒ Usually fairly obvious which one you’re hitting
D3D12 environment adds new factors
‒ API features: flexible resource binding, concurrency
‒ Hardware limits that were pretty much impossible to bump against in D3D11
  ‒ Even PCIe® and system memory bus
Different hardware much more likely to have divergent behaviour
‒ Test on a wide range of hardware


Concurrency in D3D12


QUEUES

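Graphics, compute and copy queues
Each is a superset of the next: graphics includes compute, compute includes copy
Must specify executing queue type at record time

[Figure: nested queue types, Graphics containing Compute containing Copy.]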


MULTIPLE QUEUES

Multiple queues of the same type supported
‒ Within queue: work is ordered
‒ Between separate queues work can be arbitrarily reordered
Use Fences to define work order

[Figure: Graphics Queue 1 submits Shadowmap L0 then Lighting L0; Graphics Queue 2 submits Shadowmap L1 then Lighting L1; the graphics engine executes them as Shadowmap L0, Shadowmap L1, Lighting L1, Lighting L0.]
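
How a fence imposes order between otherwise independent queues can be shown with a toy scheduler. This is a single-threaded simulation under stated assumptions, not the D3D12 fence API: `Command` and `run_queues` are hypothetical, there is one shared fence, and the scheduler simply round-robins over the queues. A queue whose next command is a wait is skipped until the fence reaches the requested value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One entry in a simulated queue.
struct Command {
    enum Kind { Work, Signal, Wait } kind;
    std::string name;     // label, used by Work commands
    std::uint64_t value;  // fence value, used by Signal and Wait commands
};

// Round-robins over the queues, one command per queue per pass.
// Within a queue the order is fixed; across queues only fences constrain it.
// Returns the names of the Work commands in the order they executed.
std::vector<std::string> run_queues(const std::vector<std::vector<Command>>& queues) {
    std::uint64_t fence = 0;                        // single shared fence
    std::vector<std::size_t> next(queues.size(), 0);
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed) {                            // stops when done or deadlocked
        progressed = false;
        for (std::size_t q = 0; q < queues.size(); ++q) {
            if (next[q] >= queues[q].size()) {
                continue;                           // this queue has finished
            }
            const Command& c = queues[q][next[q]];
            if (c.kind == Command::Wait && fence < c.value) {
                continue;                           // blocked until the fence is signalled
            }
            if (c.kind == Command::Signal && c.value > fence) {
                fence = c.value;                    // fence values only increase
            }
            if (c.kind == Command::Work) {
                order.push_back(c.name);
            }
            ++next[q];
            progressed = true;
        }
    }
    return order;
}
```

If the first queue holds a wait on fence value 1 followed by a lighting pass, and the second queue holds a shadow map pass followed by a signal of value 1, the shadow map always executes before the lighting pass, even though the scheduler visits the first queue first.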


GAME ENGINE WORKFLOW

[Figure: game engine workflow. Stages shown: Physics; Shadowmap Rendering (multiple cascades, point/spot lights); G-buffer Rendering; Prepare (e.g. generate min/max mips); Lighting & Shading; Solid Post Processing; Transparent Object Rendering (e.g. particle rendering; TressFX, particles); Post Processing; UI Rendering; Present. Also shown: Heap Defragmentation, Streaming, Dynamic Data Update.]


CONCURRENCY

Graphics, compute and / or copy may run in parallel
‒ Profile to verify
‒ Very familiar to console programmers

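[Figure: parallel timelines for the graphics engine, compute engine and copy engine. The copy engine runs Defragmentation, Streaming and Dynamic Data Update; the other tasks shown are Physics, Shadowmaps, G-buffer, Prepare SM, Tile Deferred, AA/AO, Transparent, Tonemap and UI.]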


DEMO TIME!

Example of gains from async compute:
‒ Interleaving 2 frames
Sample code will be available
Sample based on DX11 work by Jason Stewart & Gareth Thomas

[Figure: two interleaved frames. G-buffer Rendering for frames 1, 2 and 3 is shown alongside Lighting & Shading for frames 0, 1 and 2.]


PARALLELISE UNALIKE WORKLOADS

Engines may compete for resources
‒ Bus bandwidth
‒ Shader core, texture fetch for compute / graphics
‒ GPRs, Caches…
The less similar the workload, the faster each runs

Bus dominated
‒ Shadow mapping
‒ ROP heavy workloads
‒ Many G buffer operations
‒ DMA operations
  ‒ Texture upload
  ‒ Heap defrag

Shader throughput
‒ Deferred lighting (usually)
‒ Many postprocessing effects
‒ Most compute tasks
  ‒ Texture compression
  ‒ Physics
  ‒ Simulations

Geometry dominated
‒ Rendering highly detailed models


EXPLOITING CONCURRENCY

Profile!
Can align execution across queues with fences
‒ Fences have a significant cost
‒ Don’t overdo this; “a few” per frame at most

[Figure: three alternative schedules of Shadow map, Animate Particles, Stream Texture and Deferred Lighting across queues; two of the schedules are labelled “Win!” and “Big Win!”.]


BARRIERS AND MULTIPLE QUEUES

Barrier must be inserted on last queue to write resource
‒ Primarily this is for any required format conversion
Fences contain implicit acquire / release barriers
‒ One of the reasons they have a high cost


Resource Management in D3D12
Max McMullen
Microsoft


DIRECT3D 12 RESOURCE CREATION OVERVIEW

Direct3D 11 has a simple model, create and use
Works great given the simplicity of the abstraction
A few problems for today’s titles
‒ Unpredictable performance differences due to driver workarounds
‒ No high performance reuse of memory in a given frame
‒ Tiled Resources added on to the original abstraction


DIRECT3D 11

[Figure: Direct3D 11 resource model. At the API level: Buffer, Texture3D, Texture2D resources. At the DDI level: each resource has its own GPU VA range and its own physical pages.]


DIRECT3D 12 RESOURCE HEAPS

Direct3D 12 separates allocation of GPU physical pages and GPU virtual addresses from resources

Applications can better amortize the cost of physical page allocation
‒ Reuse memory for temporaries
‒ Repurpose memory when the scene no longer requires it


DIRECT3D 12 RESOURCE HEAPS

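[Figure: Direct3D 12 resource heaps. At the API level: Buffer, Texture3D and Texture2D resources placed in a Resource Heap. At the DDI level: the heap is backed by a GPU VA range and physical pages.]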


RESOURCE HEAP PROPERTIES

Memory Pool
‒ L0 – Closest to CPU
‒ L1 – Closest to GPU (Discrete GPU only)

CPU Page Properties
‒ Not Accessible (L0 & L1)
‒ Write Combine (L0 Only)
‒ Write Back (L0 Only)

Alignment
‒ 64 KB (Default)
‒ 1 MB (Enable MSAA)


SIMPLIFIED HEAP TYPES

DEFAULT
‒ Memory Pool: L1 (Discrete), L0 (Integrated)
‒ CPU Properties: No CPU access
‒ Usage: Frequent GPU Read/Write; Max GPU Bandwidth

UPLOAD
‒ Memory Pool: L0
‒ CPU Properties: Write Combine, Write Back*
‒ Usage: CPU Write Once, GPU Read Once; Max CPU Write Bandwidth

READBACK
‒ Memory Pool: L0
‒ CPU Properties: Write Back
‒ Usage: GPU Write Once, CPU Read; Max CPU Read Bandwidth


DIRECT3D 12 RESOURCE CREATION APIS

Three types of resource create
‒ Committed
‒ Placed
‒ Reserved

Each has a different pattern of GPU VA and Physical Page usage to enable different scenarios


DIRECT3D 12 RESOURCE CREATION APIS

[Figure: GPU VA and physical page usage for Committed, Placed and Reserved resources, shown with Texture3D, Buffer and Texture2D examples and their Resource Heaps.]


EFFICIENT HEAP USAGE

Prefer default heaps populated by upload heaps
‒ Build a ring buffer out of one or more committed upload buffer resources, and leave each buffer perpetually mapped for CPU access
‒ Sequentially write data into each buffer with the CPU, aligning offsets as needed
‒ Instruct the GPU to signal an increasing fence value at the end of each frame
‒ Do not overwrite the data in the upload heap until the fence value indicates the GPU has finished reading the data
Reuse upload heaps for dynamic data sent to GPU throughout rendering
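
The ring buffer recipe above can be sketched as offset bookkeeping only, with no D3D12 calls. `UploadRing`, `kNoSpace` and the method names are hypothetical. The sketch assumes a single mapped buffer of fixed capacity, power-of-two alignments, and that the caller supplies the fence value it asked the GPU to signal at the end of each frame, plus the last value the GPU has completed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Returned by allocate() when the GPU may still be reading the needed space.
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Offset bookkeeping for a perpetually mapped upload buffer used as a ring.
class UploadRing {
public:
    explicit UploadRing(std::size_t capacity) : capacity_(capacity) {}

    // Reserves `size` bytes at an offset aligned to `alignment` (a power of two).
    // Returns the offset to write at, or kNoSpace if that would overwrite
    // data belonging to frames the GPU has not finished with.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        std::size_t offset = (head_ + alignment - 1) & ~(alignment - 1);
        std::size_t padding = offset - head_;
        if (offset + size > capacity_) {
            padding = capacity_ - head_;  // skip the tail and wrap to the start
            offset = 0;
        }
        if (in_flight_ + padding + size > capacity_) {
            return kNoSpace;
        }
        in_flight_ += padding + size;
        frame_bytes_ += padding + size;
        head_ = offset + size;
        return offset;
    }

    // Call once per frame with the fence value the GPU will signal for it.
    void end_frame(std::uint64_t fence_value) {
        frames_.push_back(Frame{fence_value, frame_bytes_});
        frame_bytes_ = 0;
    }

    // Call with the fence value the GPU has reached; frees finished frames.
    void retire(std::uint64_t completed_fence) {
        while (!frames_.empty() && frames_.front().fence <= completed_fence) {
            in_flight_ -= frames_.front().bytes;
            frames_.pop_front();
        }
    }

private:
    struct Frame {
        std::uint64_t fence;
        std::size_t bytes;
    };

    std::size_t capacity_;
    std::size_t head_ = 0;         // next write offset
    std::size_t in_flight_ = 0;    // bytes not yet released, including padding
    std::size_t frame_bytes_ = 0;  // bytes reserved in the current frame
    std::deque<Frame> frames_;     // frames awaiting their fence, oldest first
};
```

With a 256-byte ring and 16-byte alignment, two 100-byte allocations land at offsets 0 and 112. A third is refused until `retire` is called with the frame's fence value, after which it wraps around to offset 0.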


PHYSICAL MEMORY REUSE

Both reserved and placed resources must follow the same rules as Direct3D 11 tiled resources:
‒ An aliasing barrier must be queued when physical memory is reused with a new resource
‒ The application must initialize the resource memory with either a Clear or Copy operation when first using or re-using physical memory with a render target or depth stencil resource


Efficient Memory Use in D3D12
Dan Baker
Co-Founder of Oxide Games


D3D12 MEMORY CONTROL

D3D11 – much guesswork in driver/API on where data went, how it was referenced

ConstantBuffer dynamic map difficult to stream huge quantities of data efficiently

D3D12 provides explicit control over memory mapping
‒ Can create one large buffer per frame and stage all data
‒ No specific need for a constant buffer – becomes application construct if desired


HIGH THROUGHPUT RENDERING

To take advantage of the draw call throughput, rendering must be hooked into game logic

For each unit, turret, missile trail, the CPU calculates information like position or color

This data must be uploaded to the GPU – as quickly as possible


FAST DATA STREAMING TO GPU

[Figure: data path for streaming. Elements shown: CPU, L1 Data Cache, L2/L3 Cache, CPU Memory, GPU Memory, GPU.]


STREAMING THE DATA

GPU memory is not write-cached, do not read
Should always write whole cache-lines out
_mm_stream_si128
‒ Writes cache-line at a time
‒ Will bypass L2 and L3 Cache
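
A streaming copy along these lines might look like the sketch below. `stream_copy` is a hypothetical helper, and the 64-byte cache line size is an assumption. `_mm_stream_si128` itself stores 16 bytes per call, so the loop issues four consecutive stores to cover one full line. The destination here is ordinary memory for demonstration; in an engine it would be the mapped upload buffer. On targets without SSE2 the sketch falls back to `memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2_STREAM 1
#endif

// Copies `bytes` from src to dst using non-temporal stores where available.
// Assumptions: `bytes` is a multiple of 64 (whole cache lines only) and
// `dst` is 16-byte aligned, as _mm_stream_si128 requires.
void stream_copy(void* dst, const void* src, std::size_t bytes) {
    assert(bytes % 64 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
#ifdef SKETCH_HAVE_SSE2_STREAM
    __m128i* d = static_cast<__m128i*>(dst);
    const __m128i* s = static_cast<const __m128i*>(src);
    const std::size_t vectors = bytes / 16;
    for (std::size_t i = 0; i < vectors; i += 4) {
        // Load one 64-byte line from the source (unaligned loads are allowed).
        const __m128i a = _mm_loadu_si128(s + i);
        const __m128i b = _mm_loadu_si128(s + i + 1);
        const __m128i c = _mm_loadu_si128(s + i + 2);
        const __m128i e = _mm_loadu_si128(s + i + 3);
        // Write the whole line with consecutive non-temporal stores.
        _mm_stream_si128(d + i, a);
        _mm_stream_si128(d + i + 1, b);
        _mm_stream_si128(d + i + 2, c);
        _mm_stream_si128(d + i + 3, e);
    }
    _mm_sfence();  // make the streamed writes visible before anything consumes them
#else
    std::memcpy(dst, src, bytes);  // portable fallback without streaming stores
#endif
}
```

The function only writes to the destination and never reads it back, which matches the advice not to read from this kind of memory.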


REAL-WORLD D3D12 EXAMPLE

Ashes of the Singularity – new mega RTS from Oxide and Stardock

Player may have thousands of units
Every turret, bullet and missile simulated by engine
On a heavy frame, Ashes uploads 40-50 MB of data to the GPU; at 60 fps, 50 MB per frame = 3 GB/s
‒ ~20% of system bandwidth on DDR3
‒ If stored in CPU memory with GPU fetch, would be doubled


WHAT A FRAME LOOKS LIKE IN ASHES

[Figure: job timeline across CPU cores 1 to 5 for the current frame and the next frame. Each core runs Sim jobs, plus AI and Game jobs on some cores, followed by D3D12 CMD jobs; a D3D12 Present job is shown, with data going to GPU Memory.]


D3D12 DEMO

Demo of Ashes of the Singularity


Questions

We are hiring!
Contact: [email protected]


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2015 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.