Windows Display Driver Model (WDDM) v2 And Beyond
Steve Pronovost, Microsoft
Henry Moreton, NVIDIA
Tim Kelley, ATI
Outline
- Introduction
- Trends in use of GPU(s)
- WDDM v1.0 overview
- WDDM v2.x overview
- Scenarios that benefit
Trends In Use Of GPU
- Windows XP: single client at a time
  - GDI desktop
  - Video decoding
  - Full-screen game
  - CAD/workstation applications
- GPUs getting more flexible
  - Direct3D pushing increased programmability, precision and performance
  - Massive processing power, not fully utilized today
Trends In Use Of GPU
- Windows Vista: multiple clients together
  - Desktop Window Manager
  - WinFX APIs based on Direct3D 9
  - Picture and video playback, capture, encode, transcode and editing leverage GPUs
  - In-box games
- Emerging general-purpose GPU (GPGPU) trend
  - Physics, image processing, etc.
WDDM v1.0
- Designed to work on existing GPUs
- Increase stability, robustness and security
- GPU scheduling
- Virtualized video memory
- Resource virtualization seamless across legacy APIs
  - DirectDraw, DX3, DX5, DX6, DX7, DX8, DX9, OpenGL
- Use new APIs to take full advantage of resource virtualization
  - Direct3D 9Ex, Direct3D 10
WDDM v2.0
- New generation of GPUs designed for multi-tasking
- Mid-command-buffer preemption
- Demand faulting of resources
  - Surface fault (preferred mode for v2.0)
  - Page fault (stalls the GPU)
- Per-process page tables
- Better multi-tasking than WDDM v1.0; still some client cooperation required
WDDM v2.1
- Everything a WDDM v2.0 GPU can do
- Fine-grained context switching
  - Can preempt mid-pixel
- Doesn't stall the GPU on a page fault
- True preemptive multi-tasking
- Ultimate flexibility for the GPU
  - GPU can be used for any scenario without impact on the desktop
WDDM Cheat Sheet
                    WDDM v1.0             WDDM v2.0               WDDM v2.1
Scheduling          Packet                RunList                 RunList
Preemption          Packet                Mid-packet              Mid-pixel
Demand faulting     Not supported         Surface/page (stall)    Page
Memory management   Physical/contiguous   Virtual/page table      Virtual/page table
Multi-tasking       Cooperative           Mostly preemptive       Truly preemptive
WDDM 2.x Scheduling, Performance And Multi-GPU Support
Henry Moreton, NVIDIA
GPUs On The Desktop
- The power of the GPU is finally tapped
  - Graphics
  - Video
  - Bandwidth and floating point (GPGPU)
- Applications are vying for this powerful resource
  - The Vista Desktop Window Manager (DWM)
  - Photo editing
  - Video feeds
  - Personal video recorder
GPU Management Is Crucial
- Applications naturally see the processor as their own
- Great GPU tasks really exploit the power
- But...
  - Some GPU operations are so massive they take non-trivial time
  - Some GPU operations are time sensitive
- Management of the GPU is crucial to success (a happy user)
A Typical Situation (For Me)
- Watching The Daily Show©
- Doodling with photos: I find a great program for creating panoramas...
  - Today: I set it up with twelve 6-megapixel images, press "go" and wait... a long time (minutes)
  - Soon, with GPU acceleration, I press "go" and wait a second or two
But A Second Or Two Is A Long Time
- Managed as a shared resource, the GPU
  - Renders my video unaffected
  - Builds my panorama in no time...
- Unmanaged
  - The Daily Show risks being a slide show...
So Scheduling Is Important
- How does scheduling vary across WDDM v1.0, WDDM v2.0 and WDDM v2.1?
  - What are the mechanics?
  - What is the context switch behavior?
  - What is the expected performance?
  - With varying numbers of active contexts...
WDDM v2.x: The Care And Feeding Of The GPU
- User Mode Driver (UMD)
  - Creates a DMA buffer of commands
- Kernel Mode Driver (KMD)
  - Appends the DMA buffer to the GPU context's queue
- The GPU scheduler schedules contexts
  - A run list of contexts, each with its own ring buffer of DMA buffers
Run Lists
- List of contexts
- GPU processes a context until
  - Context is completed (get new run list)
  - Scheduler preempts
  - Page fault (WDDM v2.1)
  - Protection fault
  - Synchronization event
- Multiple contexts per run list
  - Hide latency
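To make the run-list mechanics concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not the WDDM DDI: `Context` and `schedule` are hypothetical names, each DMA buffer is reduced to a count of work units, the scheduler preempts on a fixed time slice, and of the stop reasons listed above only completion and scheduler preemption are modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Context:
    name: str
    # Ring buffer of queued DMA buffers; each entry is an amount of work units.
    ring: deque = field(default_factory=deque)


def schedule(run_list, quantum):
    """Run contexts in run-list order with a fixed time slice (hypothetical policy).

    A context runs until its ring is drained ("completed") or its slice
    expires ("preempted"); a preempted context moves to the back of the list.
    Returns a list of (context name, work units run, reason) events.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    events = []
    pending = deque(ctx for ctx in run_list if ctx.ring)
    while pending:
        ctx = pending.popleft()
        budget, ran = quantum, 0
        while ctx.ring and budget > 0:
            step = min(ctx.ring[0], budget)
            ctx.ring[0] -= step
            if ctx.ring[0] == 0:
                ctx.ring.popleft()  # this DMA buffer has been fully consumed
            budget -= step
            ran += step
        if ctx.ring:
            events.append((ctx.name, ran, "preempted"))
            pending.append(ctx)
        else:
            events.append((ctx.name, ran, "completed"))
    return events


video = Context("video", deque([2, 2]))
panorama = Context("panorama", deque([10]))
for event in schedule([video, panorama], quantum=3):
    print(event)
```

With a short video job and a long panorama job on the same run list, the video work completes on its second turn instead of waiting behind the entire panorama job.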
How Nimble Is Context Switching?
- XP: all queued DP2 buffers must complete (very coarse)
- WDDM v1.0 (basic scheduling): the current DMA buffer must complete (coarse)
- WDDM v2.0: switch on a command/triangle boundary (fine)
- WDDM v2.1: switch "immediately" (very fine)
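The four granularities can be compared with a small Python model of how long a preemption request waits for the next switch boundary. This is an illustrative sketch: `preemption_delay` is a hypothetical helper, work is expressed as per-command durations in microseconds, and the cost of saving and restoring context state is ignored, so v2.1 is modeled as zero delay.

```python
def preemption_delay(buffers, request_time, model):
    """How long (µs) after `request_time` the GPU reaches a switch boundary.

    buffers: queued DMA buffers, each a list of command durations (µs),
    executed back to back starting at t = 0.
    model: "xp", "v1.0", "v2.0" or "v2.1".
    """
    if model not in ("xp", "v1.0", "v2.0", "v2.1"):
        raise ValueError(f"unknown model: {model}")
    total = sum(sum(buf) for buf in buffers)
    if request_time >= total:
        return 0  # the context has already run out of work
    if model == "xp":
        return total - request_time  # all queued buffers must complete
    if model == "v2.1":
        return 0  # switch "immediately" (state save time not modeled)
    t = 0
    for buf in buffers:
        if model == "v1.0":
            t += sum(buf)  # boundary: end of the current DMA buffer
            if request_time < t:
                return t - request_time
        else:
            for cmd in buf:
                t += cmd  # boundary: end of the current command/triangle
                if request_time < t:
                    return t - request_time
    return 0


# One DMA buffer holds a 5000 µs command (think: a VERY large triangle).
queued = [[100, 5000, 50], [200, 200]]
for model in ("xp", "v1.0", "v2.0", "v2.1"):
    print(model, preemption_delay(queued, request_time=150, model=model))
```

A request arriving at 150 µs waits 5400, 5000, 4950 and 0 µs respectively: one very long command dominates every model except the one that can switch mid-command.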
Context Switch Guarantees
- Pre-WDDM v2.1 (XP, v1.0, v2.0)
  - No guarantee: a VERY long shader or a VERY large triangle is slow to switch
  - Expected performance: relatively coarse switching for XP and v1.0; good average/typical switch time for v2.0
- WDDM v2.1
  - Guaranteed to context switch
  - Same average/typical switch time as v2.0
  - Much better switch time for applications with long shaders
Context Switch Challenge
- Because GPUs are heavily threaded, there is much more state than on a CPU
- Consider rendering at 60 fps
  - ~17 millisecond frame time (1/60 s ≈ 16.7 ms)
- With a context switch time of 100 µs, three concurrent applications see a ~2% context switch overhead
- Fast GPU context switching is important and challenging!
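The overhead figure follows from simple arithmetic. A minimal Python check, assuming (as the slide's numbers imply) that each active application is switched in once per frame; `switch_overhead` is a hypothetical helper name.

```python
def switch_overhead(fps, switch_us, contexts):
    """Fraction of each frame spent context switching, assuming every
    active context is switched in once per frame."""
    frame_us = 1_000_000 / fps  # 60 fps -> ~16,667 µs
    return contexts * switch_us / frame_us


print(f"{switch_overhead(60, 100, 3):.1%}")   # prints 1.8%
print(f"{switch_overhead(60, 100, 10):.1%}")  # prints 6.0%
```

Three switches of 100 µs are 0.3 ms out of a 16.7 ms frame, 1.8%, which rounds to the ~2% above; the overhead grows linearly with the number of active contexts.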
WDDM v2.x Efficiencies
- WDDM v1.0
  - User Mode Driver (UMD) creates a GPU-specific command buffer
  - KMD patches addresses
  - Copies it to a GPU-visible DMA buffer
- WDDM v2.0 and v2.1
  - UMD creates the DMA buffer directly in GPU memory
  - No copy, no patch: fast and efficient
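A minimal Python sketch of the difference between the two submission paths. The command format is invented for illustration (a 64-bit opcode followed by a 64-bit address), and `umd_v1`, `kmd_v1_submit` and `umd_v2` are hypothetical names; real drivers use hardware-specific encodings.

```python
import struct

# Hypothetical command encoding: 64-bit opcode followed by a 64-bit address.
WORD = struct.Struct("<Q")


def umd_v1(ops):
    """WDDM v1.0 style: the UMD does not know physical addresses, so it writes
    placeholders and records where each allocation's address must go."""
    cmd_buf, patch_list = bytearray(), []
    for opcode, allocation in ops:
        cmd_buf += WORD.pack(opcode)
        patch_list.append((len(cmd_buf), allocation))  # offset of the address word
        cmd_buf += WORD.pack(0)                          # placeholder
    return cmd_buf, patch_list


def kmd_v1_submit(cmd_buf, patch_list, physical_addr):
    """KMD copies the command buffer into a DMA buffer and patches in the
    physical address of each (now resident) allocation."""
    dma = bytearray(cmd_buf)  # the copy
    for offset, allocation in patch_list:
        WORD.pack_into(dma, offset, physical_addr[allocation])  # the patch
    return dma


def umd_v2(ops, gpu_va):
    """WDDM v2.x style: per-process GPU virtual addresses are known up front,
    so the UMD writes the final DMA buffer directly: no patch list, no copy."""
    dma = bytearray()
    for opcode, allocation in ops:
        dma += WORD.pack(opcode) + WORD.pack(gpu_va[allocation])
    return dma


ops = [(0x10, "texture"), (0x20, "render_target")]
cmd_buf, patches = umd_v1(ops)
v1_dma = kmd_v1_submit(cmd_buf, patches, {"texture": 0x1000, "render_target": 0x8000})
v2_dma = umd_v2(ops, {"texture": 0x7F000000, "render_target": 0x7F100000})
```

Both paths produce a buffer of the same shape; the v2.x path simply never needs the patch list or the kernel-side copy.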
Performance: Memory Footprint
- WDDM v1.0
  - No demand fault (page or surface)
  - Entire surfaces resident: coarse grained
  - OS must guarantee residence: CPU overhead
- WDDM v2.0
  - Surface fault: supports load on bind
    - GPU switches to a new context, no stalling
  - Fault and stall: permits partial eviction
    - GPU stalls waiting for the missing page
- WDDM v2.1
  - Page fault: permits partial eviction/residence
    - GPU switches to a new context, no stalling
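The footprint difference can be estimated with a small Python sketch that counts resident bytes when whole surfaces must be resident versus when only touched pages are. The 4 KiB page size, the `resident_bytes` helper and the example surfaces are assumptions for illustration.

```python
PAGE = 4096  # assumed page size in bytes


def resident_bytes(surface_sizes, accessed_offsets, mode):
    """Bytes that must be resident for the GPU to touch `accessed_offsets`.

    surface_sizes: {surface: size in bytes}
    accessed_offsets: {surface: set of byte offsets the GPU reads/writes}
    mode "surface": every referenced surface is resident in full
                    (WDDM v1.0, or v2.0 surface fault with load on bind).
    mode "page": only the touched pages are resident (page faulting).
    """
    if mode == "surface":
        return sum(surface_sizes[s] for s in accessed_offsets)
    if mode == "page":
        return sum(len({off // PAGE for off in offsets}) * PAGE
                   for offsets in accessed_offsets.values())
    raise ValueError(f"unknown mode: {mode}")


MiB = 1 << 20
sizes = {"terrain": 64 * MiB, "render_target": 8 * MiB}
touched = {
    "terrain": {0, 100, 5000},                    # a few texels sampled
    "render_target": set(range(0, 8 * MiB, PAGE)),  # every page written
}
print(resident_bytes(sizes, touched, "surface") // MiB, "MiB")  # 72 MiB
print(resident_bytes(sizes, touched, "page") // 1024, "KiB")    # 8200 KiB
```

The fully written render target costs the same either way; the sparsely sampled 64 MiB texture shrinks to two pages once residency is tracked per page.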
Multi-Engine, Multi-GPU Support
- GPUs are composed of nodes of engines
  - Homogeneous nodes
    - 3D nodes
    - Video nodes
    - Copy, etc.
- RunList per engine
- GPU device-common address space
- Multiple GPU contexts (per engine)
- Synchronization
  - Fence, trap, wait, signal
(Figure: a GPU composed of 3D, 3D and video engines)
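Cross-engine synchronization with a fence can be sketched in Python, with threads standing in for engines. This is a toy model that only illustrates ordering: `Fence`, `signal` and `wait` are hypothetical names loosely following the slide's terms, and nothing here describes how a GPU actually implements fences.

```python
import threading


class Fence:
    """Toy fence: signal(v) raises the completed value, wait(v) blocks until
    the completed value reaches v."""

    def __init__(self):
        self._completed = 0
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            self._completed = max(self._completed, value)
            self._cond.notify_all()

    def wait(self, value, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._completed >= value, timeout)


def demo():
    log, upload_done = [], Fence()

    def copy_engine():
        log.append("copy: upload texture")
        upload_done.signal(1)

    def engine_3d():
        upload_done.wait(1)  # 3D work must not read the texture early
        log.append("3d: draw with texture")

    # Start the 3D "engine" first to show that it waits on the fence.
    engines = [threading.Thread(target=engine_3d), threading.Thread(target=copy_engine)]
    for e in engines:
        e.start()
    for e in engines:
        e.join()
    return log


print(demo())
```

Regardless of thread start order, the draw is logged only after the upload has signaled the fence.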
Multi-GPU
- Linked adapter
  - Single logical adapter
  - Multiple physical adapters
- Memory
  - Mirrored or instanced
- Broadcast: multiple DMA buffer references
- Split frame rendering
WDDM v2.x Memory Management And Robustness
Tim Kelley, ATI
WDDM v1.0 Surface Mgmt
- All allocations (surfaces) referenced in a DMA buffer must be resident at GPU submit
  - Driver tracks every allocation reference in the DMA buffer
  - Contiguous memory for each allocation
  - DMA buffers patched with physical addresses once surfaces are resident
- Driver defines DMA split points to identify the minimal working set
- Significant risk of graphics memory thrashing
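The slide does not say how split points are chosen. The Python sketch below assumes a simple greedy rule: cut whenever the next command would push the current segment's working set over the memory budget. `split_dma_buffer` and the allocation sizes are hypothetical.

```python
def split_dma_buffer(commands, sizes, budget):
    """Greedily cut a DMA buffer so each segment's working set fits in memory.

    commands: one set of allocation names per command, in submission order.
    sizes: {allocation: bytes}; budget: bytes of video memory available.
    Returns a list of segments, each a list of the commands' allocation sets.
    """
    def footprint(allocs):
        return sum(sizes[a] for a in allocs)

    segments, current, working_set = [], [], set()
    for refs in commands:
        needed = working_set | refs
        if current and footprint(needed) > budget:
            segments.append(current)  # split point: start a new segment
            current, working_set = [], set()
            needed = set(refs)
        if footprint(needed) > budget:
            raise MemoryError("one command alone exceeds the memory budget")
        current.append(refs)
        working_set = needed
    if current:
        segments.append(current)
    return segments


sizes = {"A": 40, "B": 40, "C": 40}  # e.g. MiB
cmds = [{"A"}, {"B"}, {"C"}, {"A"}]
for segment in split_dma_buffer(cmds, sizes, budget=80):
    print(segment)
```

Allocation A lands in both segments, so it may have to be paged in twice within a single DMA buffer: a small-scale version of the thrashing risk noted above.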
WDDM v2.0 Surface Faulting
- A step in the right direction
- GPU supports per-process virtual memory
- Two faulting behaviors
  - Surface fault and context switch
  - Page fault and stall
- In surface faulting, the GPU probes the first page of a surface
- On a probe of a non-resident surface
  - GPU faults
  - GPU context switches to the next run list entry
    - Context switch is coarse grained; the graphics pipeline drains
  - OS VidMm issues paging requests
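A toy Python model of surface faulting, assuming 4 KiB pages and a probe of only the first page of each bound surface. `probe` and `run_list_step` are hypothetical helpers; the pipeline drain and VidMm paging are reduced to appending a request to a list.

```python
PAGE = 4096  # assumed page size in bytes


def probe(bound_surfaces, resident_pages):
    """Surface-fault probe: check only the first page of each bound surface.
    Returns the name of the first non-resident surface, or None."""
    for name, base_va in bound_surfaces.items():
        if base_va // PAGE not in resident_pages:
            return name
    return None


def run_list_step(run_list, resident_pages, paging_requests):
    """Pick the first context whose surfaces pass the probe. A context that
    faults is skipped (the GPU moves to the next run-list entry) and the
    missing surface is queued for the OS memory manager to page in."""
    for ctx_name, bound_surfaces in run_list:
        missing = probe(bound_surfaces, resident_pages)
        if missing is None:
            return ctx_name
        paging_requests.append((ctx_name, missing))
    return None  # every context faulted: nothing runnable until paging completes


resident = {0, 1}  # resident virtual page numbers
run_list = [("panorama", {"photo": 0x10000}), ("video", {"frame": 0x1000})]
requests = []
print(run_list_step(run_list, resident, requests), requests)
```

Because only the first page is probed, a surface that passes the probe can still be partly non-resident, which is why a v2.0 GPU also needs page fault and stall.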
WDDM v2.0 Page Fault And Stall
- Even if the surface probe succeeds, the entire surface may not be resident
- GPU must still support page faulting
- On access to a non-resident page
  - GPU faults and stalls
  - Driver informs the OS of missing pages
  - OS VidMm issues paging requests
  - Driver restarts the GPU once the pages are resident
- The entire working set doesn't have to be resident simultaneously
WDDM v2.1 Page Faulting
- Finally, full-fledged page faulting with context switching!
- GPUs support general page faulting and virtual memory per process
- On a page fault, the GPU context switches to the next run list entry
  - Context switch is "immediate"
- OS can partially populate allocations to reduce an app's working set
  - GPU faults on non-resident page access
  - GPU context switches to the next run list entry
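The benefit of switching instead of stalling shows up in a simple timing model. This Python sketch assumes two contexts, zero context-switch cost, paging that proceeds in parallel with rendering (for example on a separate copy engine), and a scheduler that lets the other context run to completion before the faulting one resumes; `makespan` is a hypothetical helper.

```python
def makespan(a_before, a_after, b_work, paging, mode):
    """Total time (µs) to finish two contexts when context A page-faults.

    A runs `a_before`, faults, needs `paging` µs for its pages to become
    resident, then runs `a_after`. Context B has `b_work` queued.
    "stall"  (v2.0 fault and stall): the GPU idles until A's pages arrive,
              finishes A, then runs B.
    "switch" (v2.1): the GPU runs B while paging proceeds in parallel, then
              resumes A once B is done and the pages are resident.
    Context-switch cost is assumed to be zero.
    """
    if mode == "stall":
        return a_before + paging + a_after + b_work
    if mode == "switch":
        return a_before + max(b_work, paging) + a_after
    raise ValueError(f"unknown mode: {mode}")


for mode in ("stall", "switch"):
    print(mode, makespan(1000, 1000, b_work=3000, paging=2000, mode=mode))
```

With 2000 µs of paging and 3000 µs of work queued for the other context, stalling takes 7000 µs in total and switching 5000 µs, because the other context's work hides the paging latency.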
Dedicated Paging Engine
- Addition of a high-bandwidth copy engine for paging
- Operates in parallel with the 3D engine
- GPU can perform paging operations for one context in parallel with 3D rendering for another context
Paging Determination
- GPU reports the faulting address
- GPU/driver determine the set of pages needed to make further progress
- GPU maintains a set of page access bits
- OS VidMm uses the above to determine appropriate paging operations (including evictions)
- Additionally, the OS uses heuristics to preload pages
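The slide does not name an eviction policy. One classic way to use per-page access bits is the second-chance (clock) algorithm, sketched below in Python purely as an illustration; `clock_evict` is a hypothetical helper and the algorithm is a stand-in, not VidMm's actual policy.

```python
def clock_evict(frames, accessed, hand, n):
    """Choose n eviction victims with the second-chance (clock) policy.

    frames: resident page ids in clock order.
    accessed: {page: access bit}; bits are set by the GPU and cleared here.
    hand: current clock position. Returns (victims, new hand position).
    """
    if not 0 < n <= len(frames):
        raise ValueError("n must be between 1 and len(frames)")
    victims = []
    while len(victims) < n:
        page = frames[hand]
        if page in victims:
            pass  # already chosen on an earlier lap
        elif accessed.get(page, False):
            accessed[page] = False  # recently used: spare it once
        else:
            victims.append(page)
        hand = (hand + 1) % len(frames)
    return victims, hand


frames = [10, 11, 12, 13]
bits = {10: True, 11: False, 12: True, 13: False}
print(clock_evict(frames, bits, hand=0, n=2))  # ([11, 13], 0)
```

Pages the GPU touched since the last sweep survive one more pass, so the access bits steer evictions toward data the application is no longer using.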
Efficient Memory Management
- Steady-state residency of surface data for applications
  - No texture thrashing for apps whose working set fits into graphics memory
- No need for the entire surface to be resident
  - Apps with large surfaces run fast in smaller local memory if the working set fits
- Page access info guides VidMm eviction and promotion
- Reduced minimum physical memory requirements
WDDM v2.x Robustness
- WDDM v2.x increases OS robustness
- GPU uses virtual addressing instead of physical
  - Kernel Mode Driver (KMD) no longer patches DMA buffers with physical addresses
- User Mode Driver (UMD) builds the DMA buffer
  - KMD no longer validates the command buffer
  - KMD no longer copies the command buffer to a DMA buffer
- No DMA buffer splitting
  - UMD no longer identifies split points
  - OS no longer splits DMA buffers to fit resources
WDDM v2.1 Robustness
- Guaranteed sub-triangle context switching
- Driver processing on fault essentially eliminated
- No application can hog the GPU
- Better application responsiveness
- Applications with arbitrarily complex GPU processing do not hinder other applications
  - E.g., complex GPGPU number crunching alongside glitch-free video
Security
- Per-process virtual memory
  - Protection moved to the GPU
  - Patching eliminated from the driver
- Privileged operations
- Privileged memory
- More secure platform for future premium content protection
Privileged Operations
- DMA buffers created in user mode cannot compromise the system
  - Can't access memory belonging to other processes
  - Can't interfere with correct and robust operation
- Certain GPU operations are privileged and only available to KMD-built DMA buffers; examples include
  - Display settings
  - GPU configuration
  - Context switching controls
- UMD-created DMA buffers cannot perform privileged operations
Privileged Memory
- Provides a secure location for page tables, ring buffers, and other allocations that should be protected
- Malicious apps cannot compromise system security
- GPU maintains a per-page privilege setting (in the page table)
- A fault occurs on GPU access to privileged memory from limited DMA buffers constructed by the UMD
- GPU access is allowed for privileged DMA buffers constructed by the KMD
(Figure: page table, bad DMA buffer, v2.1 GPU, process ring buffer)
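The two protection checks can be modeled in Python: the GPU consults a per-page privilege bit on every access and refuses privileged operations from UMD-built buffers. This is a sketch under stated assumptions: `PTE`, `GpuFault`, `gpu_access`, `gpu_execute` and the operation names are hypothetical, and `kmd_built` stands for whether the DMA buffer was constructed by the KMD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PTE:
    physical_page: int
    privileged: bool = False  # per-page privilege bit kept in the page table


class GpuFault(Exception):
    pass


# Hypothetical names for the privileged operations listed on the slides.
PRIVILEGED_OPS = {"set_display_mode", "configure_gpu", "control_context_switch"}


def gpu_access(page_table, virtual_page, kmd_built):
    """Translate a page for a DMA buffer; kmd_built marks privileged buffers."""
    pte = page_table.get(virtual_page)
    if pte is None:
        raise GpuFault("page fault: page not mapped")
    if pte.privileged and not kmd_built:
        raise GpuFault("privilege fault: privileged page from UMD-built buffer")
    return pte.physical_page


def gpu_execute(op, kmd_built):
    """Reject privileged operations issued from UMD-built DMA buffers."""
    if op in PRIVILEGED_OPS and not kmd_built:
        raise GpuFault(f"privilege fault: {op} from UMD-built buffer")
    return op


page_table = {0: PTE(100), 1: PTE(200, privileged=True)}  # page 1: e.g. ring buffer
print(gpu_access(page_table, 0, kmd_built=False))  # 100
print(gpu_access(page_table, 1, kmd_built=True))   # 200
try:
    gpu_access(page_table, 1, kmd_built=False)
except GpuFault as fault:
    print(fault)
```

Because the check lives in the GPU's translation path rather than in a driver validation pass, a bad DMA buffer faults instead of reaching the page table or ring buffer.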
WDDM Future And Conclusion
Steve Pronovost, Microsoft
Future: WDDM 3.x
- All the features of WDDM v2.1
- Better support for content streaming
- Virtual machine support
Call To Action
- Invest in WDDM v2.x GPUs
- Find new, interesting ways to use the GPU
Questions Or Feedback?
Send e-mail to DirectX @ microsoft.com
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.