Operating Systems must support GPU abstractions
![Page 1: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/1.jpg)
Operating Systems must support GPU abstractions
Chris Rossbach, Microsoft Research
Jon Currey, Microsoft Research
Emmett Witchel, University of Texas at Austin
HotOS 2011
![Page 3: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/3.jpg)
GPU Haiku (apropos 10 min talks)
Lots of GPUs
Must they be so hard to use?
We need dataflow…
…support in the OS
![Page 5: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/5.jpg)
Motivation and Agenda
There are lots of GPUs!
◦ ~ more powerful than CPUs
◦ Great for Halo <X> and HPC, but little else
◦ Underutilized
GPUs are difficult to program
◦ SIMD execution model
◦ Cannot access main memory
◦ Treated as I/O device by OS
A. These two things are related
B. We need OS abstractions (dataflow)
![Page 6: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/6.jpg)
Traditional OS-Level abstractions
programmer-visible interface
OS-level abstractions
Hardware interface
![Page 7: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/7.jpg)
GPU Abstractions
programmer-visible interface
1 OS-level abstraction!
The programmer gets to work with great abstractions… Why is this a problem?
![Page 8: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/8.jpg)
Why isn’t ioctl() enough?
We expect traditional OS guarantees:
◦ Fairness
◦ Isolation
No user-space runtime can provide these!
No kernel-facing interface
◦ The OS cannot use the GPU
◦ OS cannot manage the GPU
Lost optimization opportunities
◦ Suboptimal data movement
◦ Poor composability
![Page 9: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/9.jpg)
CPU-bound processes hurt GPUs
• Windows 7 x64 8GB RAM
• Intel Core 2 Quad 2.66GHz
• nVidia GeForce GT230
[Chart: CUDA benchmark throughput in invocations per second (higher is better), no CPU load vs. high CPU load; y-axis from 0 to 1200]
CPU scheduler and GPU scheduler not integrated!
![Page 10: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/10.jpg)
GPU-bound processes hurt CPUs
• Windows 7 x64 8GB RAM
• Intel Core 2 Quad 2.66GHz
• nVidia GeForce GT230
Flatter lines are better
![Page 11: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/11.jpg)
Composability: Gestural Interface
[Figure: pipeline capture -> filter -> detect -> HIDInput -> OS, passing raw images, a point cloud, and “hand” events between stages]
#> capture | filter | detect | hidinput &
• Data crossing u/k boundary
• Double-buffering between camera drivers and GPU drivers
Pipes between filter and detect move data to and from GPU even when it’s already there
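The cost of composing GPU stages with pipes can be illustrated with a small counting model. This is only a sketch: the stage names come from the shell pipeline above, while the CPU/GPU placement of each stage and the counting rules are assumptions made for illustration, not measurements from the talk.

```python
# Hypothetical model: stage names come from the slide; the device placement
# and the counting rules below are assumptions made for illustration.
STAGES = [
    ("capture", "cpu"),
    ("filter", "gpu"),
    ("detect", "gpu"),
    ("hidinput", "cpu"),
]


def transfers_with_pipes(stages):
    """Pipes hand data over in host memory, so every GPU stage pays
    one upload before it runs and one download after it finishes."""
    return sum(2 for _, device in stages if device == "gpu")


def transfers_needed(stages):
    """Lower bound: copy only where consecutive stages sit on different devices."""
    return sum(1 for (_, a), (_, b) in zip(stages, stages[1:]) if a != b)


if __name__ == "__main__":
    print("host<->GPU copies with pipes:", transfers_with_pipes(STAGES))
    print("host<->GPU copies needed:    ", transfers_needed(STAGES))
```

Under these assumptions, the difference between the two counts is exactly the round trip between filter and detect that the slide calls out.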
![Page 12: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/12.jpg)
Meaningful GPGPU implies GPUs should be managed like CPUs
Process API analogues
IPC API analogues
Scheduler hint analogues
Abstractions that enable:
◦ Composition
◦ Data movement optimization
◦ Easier programming
![Page 13: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/13.jpg)
OS abstractions: dataflow!
ptask (parallel task)
◦ Have priority for fairness
◦ Analogous to a process for GPU execution
◦ List of input/output resources (e.g. stdin, stdout…)
ports
◦ Can be mapped to ptask input/outputs
◦ A data source or sink (e.g. buffer in GPU memory)
channels
◦ Similar to pipes
◦ Connect arbitrary ports
◦ Specialize to eliminate double-buffering
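One way to picture the three abstractions is as plain data structures. The sketch below is a user-space illustration only, not the interface proposed in the talk: the class names mirror the slide, but every field (`memory_space`, `priority`, `queue`) and the `needs_copy` helper are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Port:
    """A data source or sink, e.g. a buffer in GPU memory."""
    name: str
    memory_space: str = "host"  # hypothetical tag: "host" or "gpu"


@dataclass
class Channel:
    """Pipe-like queue that connects two arbitrary ports."""
    src: Port
    dst: Port
    queue: deque = field(default_factory=deque)

    def needs_copy(self) -> bool:
        # Specialization: ports in the same memory space need no staging copy.
        return self.src.memory_space != self.dst.memory_space


@dataclass
class PTask:
    """Analogous to a process, but for GPU execution."""
    name: str
    priority: int  # assumed to be consumed by a scheduler for fairness
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


# Usage: two ptasks whose ports both live in GPU memory.
cloud = Port("cloud", "gpu")
det_inp = Port("det_inp", "gpu")
filter_task = PTask("filter", priority=1, outputs=[cloud])
detect_task = PTask("detect", priority=1, inputs=[det_inp])
link = Channel(cloud, det_inp)
print(link.needs_copy())
```

Tagging each port with a memory space is what would let a channel decide, when it is created, whether data has to move at all.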
![Page 14: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/14.jpg)
Gestural interface revisited
[Figure: dataflow graph with nodes process:capture, ptask:filter, ptask:detect, process:hidinput; port labels usbsrc, rawimg, cloud, det_inp, hands, hid_in; legend: process, ptask, port, channel]
Computation expressed as a graph
• Synthesis [Massalin 89] (streams, pumps)
• Dryad [Isard 07]
• StreamIt [Thies 02]
• Offcodes [Weinsberg 08]
• others…
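Expressing the computation as a graph means the system, rather than the shell, knows the dependencies between stages. A minimal sketch using Python's standard `graphlib` module follows; the node names are taken from the slide, the edges follow the order of the earlier shell pipeline, and the dictionary encoding is a hypothetical choice made for this example.

```python
from graphlib import TopologicalSorter

# Each key maps a node to the nodes it depends on. Edges follow the order
# of the shell pipeline; the encoding itself is hypothetical.
GRAPH = {
    "process:capture": [],
    "ptask:filter": ["process:capture"],
    "ptask:detect": ["ptask:filter"],
    "process:hidinput": ["ptask:detect"],
}


def run_order(graph):
    """Return an execution order in which every node follows its predecessors."""
    return list(TopologicalSorter(graph).static_order())


print(" -> ".join(run_order(GRAPH)))
```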
![Page 15: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/15.jpg)
Gestural interface revisited
[Figure: same dataflow graph as on the previous slide, with memory-space annotations: USB, GPU mem, GPU mem, GPU mem]
• Eliminate unnecessary communication…
![Page 16: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/16.jpg)
Gestural interface revisited
[Figure: same dataflow graph as on the previous slide]
• Eliminates unnecessary communication
• Eliminates u/k crossings, computation
New data triggers new computation
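The idea that new data triggers new computation can be sketched as a node that fires as soon as all of its inputs hold data. This is a toy, single-threaded model: the `Node` class and the string-building stage functions are hypothetical stand-ins, not part of the system described in the talk.

```python
from collections import deque


class Node:
    """A graph node that fires whenever every input queue holds data."""

    def __init__(self, name, fn, n_inputs=1):
        if n_inputs < 1:
            raise ValueError("a node needs at least one input")
        self.name = name
        self.fn = fn
        self.inputs = [deque() for _ in range(n_inputs)]
        self.downstream = []  # list of (node, input_index) pairs

    def connect(self, other, index=0):
        """Route this node's output to one input of another node."""
        self.downstream.append((other, index))

    def push(self, value, index=0):
        """Deliver data to an input and run while all inputs are ready."""
        self.inputs[index].append(value)
        while all(self.inputs):  # an empty deque is falsy
            args = [q.popleft() for q in self.inputs]
            result = self.fn(*args)
            for node, i in self.downstream:
                node.push(result, i)


# Usage: stand-in stage functions just wrap their input in a label.
events = []
filter_node = Node("filter", lambda img: f"cloud({img})")
detect_node = Node("detect", lambda cloud: f"hands({cloud})")
sink = Node("hidinput", events.append)
filter_node.connect(detect_node)
detect_node.connect(sink)

filter_node.push("frame0")  # arrival of one frame drives the whole chain
print(events)
```

No stage polls or blocks on a read here; the arrival of a frame at the first port is the only event that drives the chain.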
![Page 17: Operating Systems must support GPU abstractions](https://reader036.fdocuments.us/reader036/viewer/2022062323/56816374550346895dd45004/html5/thumbnails/17.jpg)
Conclusions
OS must get involved in GPU support
Current approaches:
◦ Require wasteful data movement
◦ Inhibit modularity/reuse
◦ Cannot guarantee fairness, isolation
OS-level abstractions are required
Questions?