Windows Display Driver Model (WDDM) v2 And Beyond


Steve Pronovost, Microsoft
Henry Moreton, NVIDIA
Tim Kelley, ATI

Outline
Introduction
  Trends in use of GPU(s)
WDDM v1.0 overview
WDDM v2.x overview
Scenarios that benefit

Trends In Use Of GPU
Windows XP: Single client at a time
  GDI desktop
  Video decoding
  Full screen game
  CAD/Workstation applications
GPUs getting more flexible
Direct3D pushing increased programmability, precision and performance
Massive processing power, not fully utilized today

Trends In Use Of GPU
Windows Vista: Multiple clients together
  Desktop window manager
  WinFX APIs based on Direct3D 9
  Picture, video playback, capture, encode, transcode, edit leverage GPUs
  In-box games
Emerging general-purpose GPU (GPGPU) trend
  Physics, image processing, etc.

WDDM v1.0
Designed to work on existing GPUs
Increase stability, robustness and security
GPU scheduling
Virtualized video memory
Resource virtualization seamless across legacy APIs
  DDraw, DX3, DX5, DX6, DX7, DX8, DX9, OGL
Use new API to take full advantage of resource virtualization
  Direct3D 9Ex, Direct3D 10

WDDM v2.0
New generation of GPUs designed for multi-tasking
Mid command buffer preemption
Demand faulting of resources
  Surface fault (preferred mode for v2.0)
  Page fault (stall the GPU)
Per process page tables
Better multi-tasking than WDDM v1.0, still some client cooperation required

WDDM v2.1
Everything WDDM v2.0 GPU can do
Fine grained context switching
  Can preempt mid pixel
Doesn’t stall GPU on page fault
True preemptive multi-tasking
Ultimate flexibility for the GPU
GPU can be used for any scenario without impact on the desktop

WDDM Cheat Sheet

                    WDDM v1.0             WDDM v2.0              WDDM v2.1
Scheduling          Packet                RunList                RunList
Preemption          Packet                Mid Packet             Mid Pixel
Demand faulting     Not supported         Surface/Page (stall)   Page
Memory management   Physical/Contiguous   Virtual/Page table     Virtual/Page table
Multi-tasking       Cooperative           Mostly Preemptive      Truly Preemptive

WDDM 2.x Scheduling, Performance And Multi-GPU Support
Henry Moreton
NVIDIA

GPUs On The Desktop
The power of the GPU is finally tapped
  Graphics
  Video
  Bandwidth and floating point (GPGPU)
Applications are vying for this powerful resource
  The Vista Desktop Window Manager (DWM)
  Photo editing
  Video feeds
  Personal Video Recorder

GPU Management Is Crucial
Applications naturally see the processor as their own
Great GPU tasks really exploit the power
But...
  Some GPU operations are so massive they take non-trivial time
  Some GPU operations are time sensitive
Management of the GPU is crucial to success (a happy user)

A Typical Situation (For Me)
Watching The Daily Show©
Doodling with photos
  I find a great program for creating panoramas...
Today
  I set it up with twelve 6-megapixel images
  Press go and wait... a long time (minutes)
Soon, with GPU acceleration, I press go and wait a second or two

But A Second Or Two Is A Long Time
Managed as a shared resource the GPU
  Renders my video unaffected
  Builds my panorama in no time...
Unmanaged
  The Daily Show risks being a slide show...

So Scheduling Is Important
How does scheduling vary across
  WDDM v1.0
  WDDM v2.0
  WDDM v2.1
What are the mechanics?
What is the context switch behavior?
What is expected performance?
  With varying numbers of active contexts...

WDDM v2.x: The Care And Feeding Of The GPU
User Mode Driver (UMD)
  Creates DMA buffer of commands
Kernel Mode Driver (KMD)
  Appends DMA buffer to GPU context’s queue
The GPU Scheduler schedules contexts
  A Run List of contexts each with its own ring buffer of DMA buffers

Run Lists
List of contexts (box)
GPU processes a context until
  Context is completed (get new run list)
  Scheduler pre-empts
  Page fault (WDDM v2.1)
  Protection fault
  Synchronization event
Multiple contexts per Run List
  Hide latency
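
The run list behaviour on this slide can be sketched as a toy simulation. This is only an illustrative model, not a WDDM interface: `Context`, `run_list_pass`, `run_until_idle`, and the `time_slice` budget are hypothetical names, and only two of the listed switch reasons (completion and scheduler pre-emption) are modelled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical GPU context: a name plus a queue of DMA buffer costs."""
    name: str
    dma_buffers: deque = field(default_factory=deque)  # remaining work units per buffer


def run_list_pass(run_list, time_slice):
    """Walk the run list once and return the (context, work) steps executed.

    A context runs until its queue is empty (completed) or its time slice
    is used up (scheduler pre-empts); then the GPU moves to the next entry.
    """
    trace = []
    for ctx in run_list:
        budget = time_slice
        while ctx.dma_buffers and budget > 0:
            work = min(ctx.dma_buffers[0], budget)
            ctx.dma_buffers[0] -= work
            budget -= work
            trace.append((ctx.name, work))
            if ctx.dma_buffers[0] == 0:
                ctx.dma_buffers.popleft()  # this DMA buffer is finished
    return trace


def run_until_idle(run_list, time_slice):
    """Repeat passes, dropping completed contexts, until no work remains."""
    trace = []
    active = list(run_list)
    while active:
        trace.extend(run_list_pass(active, time_slice))
        active = [c for c in active if c.dma_buffers]
    return trace
```

With a short video context and a long panorama context on the same run list, the video work finishes within the first pass instead of waiting behind the long job.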

How Nimble Is Context Switching?
XP
  All queued DP2 buffers must complete (very coarse)
WDDM v1.0: Basic scheduling
  Current DMA buffer must complete (coarse)
WDDM v2.0
  Switch on command/triangle (fine)
WDDM v2.1
  Switch “immediately” (very fine)

Context Switch Guarantees
Pre WDDM v2.1 (XP, v1.0, v2.0)
  No guarantee
    VERY long shader, VERY large triangle slow to switch
  Expected performance
    Relatively coarse switching for XP and v1.0
    v2.0: Good average/typical switch time
WDDM v2.1
  Guaranteed to context switch
  Same average/typical switch time as v2.0
  Much better switch time on applications with long shaders

Context Switch Challenge
Because GPUs are heavily threaded there is much more state than on a CPU
Consider rendering @ 60 fps
  17 millisecond frame time
With a context switch time of 100µs
Three concurrent applications see a ~2% context switch overhead
Fast GPU context switching is important and challenging!
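
The overhead figure on this slide can be reproduced with a few lines of arithmetic. The sketch assumes one context switch per application per frame, which the slide does not state explicitly; `context_switch_overhead` is a hypothetical helper name.

```python
def context_switch_overhead(fps, switch_time_us, switches_per_frame):
    """Fraction of each frame spent switching contexts."""
    frame_time_us = 1_000_000 / fps
    return switches_per_frame * switch_time_us / frame_time_us


frame_ms = 1000 / 60  # about 16.7 ms, rounded to 17 ms on the slide
overhead = context_switch_overhead(fps=60, switch_time_us=100, switches_per_frame=3)

print(f"frame time: {frame_ms:.1f} ms")   # frame time: 16.7 ms
print(f"overhead:   {overhead:.1%}")      # overhead:   1.8%
```

Three switches of 100µs each cost 300µs out of a 16.7 ms frame, which is the roughly 2% quoted above.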

WDDM v2.x Efficiencies
WDDM v1.0
  User Mode Driver (UMD) creates GPU-specific command buffer
  KMD patches addresses
  Copies to GPU visible DMA buffer
WDDM v2.0 and 2.1
  UMD creates DMA buffer directly in GPU memory
  No copy, no patch, fast and efficient

Performance: Memory Footprint
WDDM v1.0
  No demand fault (page or surface)
  Entire surfaces resident (coarse grained)
  OS must guarantee residence (CPU overhead)
WDDM v2.0
  Surface fault: supports load on bind
    GPU switches to new context, no stalling
  Fault and stall: permits partial eviction
    GPU stalls waiting for missing page
WDDM v2.1
  Page fault: permits partial eviction/residence
    GPU switches to new context, no stalling

Multi-Engine, Multi-GPU Support
GPUs are composed of nodes of engines
Homogeneous nodes
  3D nodes
  Video nodes
  Copy, etc.
RunList per engine
GPU Device-common address space
  Multiple GPU Contexts (per engine)
Synchronization
  Fence, Trap, Wait, Signal

[Diagram: GPU with 3D and video nodes]

Multi-GPU
Linked Adapter
  Single logical adapter
  Multiple physical adapters
Memory
  Mirrored or instanced
Broadcast: multiple DMA buffer references
Split Frame Rendering

WDDM v2.x Memory Management And Robustness
Tim Kelley
ATI

WDDM v1.0 Surface Management
All allocations (surfaces) referenced in DMA buffer must be resident at GPU submit
  Driver tracks every allocation reference in the DMA buffer
  Contiguous memory for each allocation
DMA buffers patched with physical addresses once surfaces are resident
Driver defines DMA split points to identify minimal working set
Significant risk of graphics memory thrashing
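
The v1.0 patching step can be illustrated with a small model. Everything here is hypothetical: the command-word encoding, the `(word_index, allocation_handle, byte_offset)` patch tuples, and the `patch_dma_buffer` name are assumptions made for the sketch, not the actual driver structures.

```python
def patch_dma_buffer(commands, patch_locations, physical_addresses):
    """Return a copy of `commands` with allocation references replaced by addresses.

    commands           -- list of integer command words (hypothetical encoding)
    patch_locations    -- list of (word_index, allocation_handle, byte_offset)
    physical_addresses -- dict: allocation_handle -> base address of a resident surface
    """
    # Every referenced allocation must be resident before the buffer is submitted.
    missing = {h for _, h, _ in patch_locations if h not in physical_addresses}
    if missing:
        raise LookupError(f"allocations not resident: {sorted(missing)}")

    patched = list(commands)
    for index, handle, offset in patch_locations:
        patched[index] = physical_addresses[handle] + offset
    return patched
```

The residency check mirrors the first bullet: a single non-resident surface blocks the whole DMA buffer, which is why v1.0 needs split points and is exposed to thrashing.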

WDDM v2.0 Surface Faulting
A step in the right direction
GPU supports per process virtual memory
Two faulting behaviors
  Surface fault and context switch
  Page fault and stall
In surface faulting, GPU probes first page of surface
On probe of non-resident surface
  GPU faults
  GPU context switches to next run list entry
    Context switch is coarse grained; graphics pipeline drains
  OS VidMm issues paging requests

WDDM v2.0 Page Fault And Stall
Even if surface probe succeeds, entire surface may not be resident
GPU must still support page faulting
On access to a non-resident page
  GPU faults and stalls
  Driver informs OS of missing pages
  OS VidMm issues paging requests
  Driver restarts GPU once pages are resident
Entire working set doesn’t have to be resident simultaneously

WDDM v2.1 Page Faulting
Finally, full fledged page faulting with context switching!
GPUs support general page faulting and virtual memory per process
On a page fault, GPU context switches to next run list entry
  Context switch is “immediate”
OS can partially populate allocations to reduce an app’s working set
GPU faults on non-resident page access
GPU context switches to next run list entry
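
The difference between stalling on a page fault (v2.0) and switching contexts on a page fault (v2.1) can be sketched with a toy model. The `simulate` function and its parameters are hypothetical, and a paging request is assumed to complete after a single step, which is a simplification.

```python
from collections import deque


def simulate(contexts, resident, switch_on_fault):
    """Run page-access streams and return (access_log, stalled_steps).

    contexts        -- dict: context name -> list of page numbers touched in order
    resident        -- pages initially present in graphics memory
    switch_on_fault -- True models v2.1 (switch context), False models v2.0 (stall)
    """
    resident = set(resident)
    run_list = deque((name, deque(pages)) for name, pages in contexts.items())
    log, stalled = [], 0
    while run_list:
        name, pages = run_list[0]
        if not pages:
            run_list.popleft()           # context completed
            continue
        page = pages[0]
        if page in resident:
            pages.popleft()
            log.append((name, page))     # access succeeds
            continue
        # Page fault: the paging request is modelled as done by the next step.
        resident.add(page)
        if switch_on_fault and len(run_list) > 1:
            run_list.rotate(-1)          # GPU moves to the next run list entry
        else:
            stalled += 1                 # GPU idles until the page arrives
    return log, stalled
```

In the switching mode the second context's work overlaps the paging of the first context's missing page, so no step is lost to a stall as long as another context is runnable.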

Dedicated Paging Engine
Addition of high bandwidth copy engine for paging
Operates in parallel to 3D engine
GPU can perform paging operations for one context in parallel with 3D rendering for another context

Paging Determination
GPU reports faulting address
GPU/Driver determine set of pages needed to make further progress
GPU maintains a set of page access bits
OS VidMm uses the above to determine appropriate paging operations (including evictions)
Additionally, OS uses heuristics to preload pages

Efficient Memory Management
Steady state residency of surface data for applications
No texture thrashing for apps whose working set fits into graphics memory
No need for entire surface to be resident
Apps with large surfaces run fast in smaller local memory if working set fits
Page access info guides VidMm eviction and promotion
Reduced minimum physical memory requirements
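
One way page access bits could guide eviction is a simple not-recently-used policy, sketched below. The slides do not say which policy VidMm uses, so this is a generic textbook technique swapped in for illustration; `choose_evictions` and its arguments are hypothetical.

```python
def choose_evictions(resident_pages, access_bits, needed):
    """Pick up to `needed` pages to evict, preferring pages not accessed recently.

    resident_pages -- list of page numbers currently in graphics memory
    access_bits    -- dict: page -> True if the GPU touched it since the last scan
    """
    cold = [p for p in resident_pages if not access_bits.get(p, False)]
    hot = [p for p in resident_pages if access_bits.get(p, False)]
    victims = (cold + hot)[:needed]

    # Clear the bits so the next scan reflects only new accesses.
    for p in resident_pages:
        access_bits[p] = False
    return victims
```

Pages the GPU has not touched since the last scan are evicted first, so an application whose working set fits in graphics memory keeps its hot pages resident.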

WDDM v2.x Robustness
WDDM v2.x increases OS robustness
GPU uses virtual addressing instead of physical
  Kernel mode driver (KMD) no longer patches DMA buffers with physical addresses
User Mode Driver (UMD) builds DMA buffer
  KMD no longer validates command buffer
  KMD no longer copies command buffer to DMA buffer
No DMA buffer splitting
  UMD no longer identifies split points
  OS no longer splits DMA buffers to fit resources

WDDM v2.1 Robustness
Guaranteed sub-triangle context switching
Driver processing on fault essentially eliminated
No application can hog GPU
Better application responsiveness
Applications with arbitrarily complex GPU processing do not hinder other applications
  E.g., complex GPGPU number crunching alongside glitch free video

Security
Per-process virtual memory
  Protection moved to GPU
  Patching eliminated from driver
Privileged Operations
Privileged memory
More secure platform for future premium content protection

Privileged Operations
DMA buffers created in user mode cannot compromise the system
  Can’t access memory belonging to other processes
  Can’t interfere with correct and robust operation
Certain GPU operations are privileged and only available to KMD-built DMA buffers; examples include
  Display settings
  GPU configuration
  Context switching controls
UMD-created DMA buffers cannot perform privileged operations

Privileged Memory
Provides secure location for page tables, ring buffers, and other allocations that should be protected
Malicious apps cannot compromise system security
GPU maintains per-page privilege setting (in page table)
Fault occurs on GPU access to privileged memory from limited DMA buffers constructed by UMD
GPU access allowed for privileged DMA buffers constructed by KMD

[Diagram labels: Page Table, Bad DMA Buffer, V2.1 GPU, Process Ring Buffer]
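
The per-page privilege check described on the Privileged Memory slide can be modelled in a few lines. This is a sketch under stated assumptions: `PageTableEntry`, `ProtectionFault`, and `gpu_access` are hypothetical names, and the page table is reduced to a dictionary.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when a limited (UMD-built) DMA buffer touches a privileged page."""


@dataclass(frozen=True)
class PageTableEntry:
    physical_page: int
    privileged: bool = False  # per-page privilege setting kept in the page table


def gpu_access(page_table, virtual_page, buffer_is_privileged):
    """Translate a virtual page, enforcing the per-page privilege bit."""
    entry = page_table[virtual_page]
    if entry.privileged and not buffer_is_privileged:
        raise ProtectionFault(f"virtual page {virtual_page} is privileged")
    return entry.physical_page
```

A DMA buffer built by the UMD can still reach ordinary pages, but an access to a page holding, say, a ring buffer or page table faults unless the buffer was built by the KMD.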

WDDM Future And Conclusion
Steve Pronovost
Microsoft

Future: WDDM 3.x
All the features of WDDM v2.1
Better support for content streaming
Virtual machine support

Call To Action
Invest in WDDM v2.x GPU
Find new interesting ways to use the GPU

Questions Or Feedback?
Send e-mail to DirectX @ microsoft.com

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
