Operating Systems Should Manage Accelerators
Sankaralingam Panneerselvam, Michael M. Swift
Computer Sciences Department, University of Wisconsin, Madison, WI
HotPar 2012
Accelerators
- Specialized execution units for common tasks: data-parallel tasks, encryption, video encoding, XML parsing, network processing, etc.
- Tasks are offloaded to accelerators for high performance, energy efficiency, or both
- Examples: GPU, wire-speed processor, APU, DySER, crypto accelerator
Why accelerators?
- Moore's law and Dennard scaling: single core to multi-core
- Dark silicon: more transistors, but a limited power budget [Esmaeilzadeh et al. - ISCA '11]
- Accelerators trade area for specialized logic: performance up by 250x and energy efficiency by 500x [Hameed et al. - ISCA '10]
- Heterogeneous architectures
Heterogeneity everywhere
- Focus on diverse heterogeneous systems: processing elements with different properties in the same system
Our work: how can the OS support this architecture?
1. Abstract heterogeneity
2. Enable flexible task execution
3. Multiplex shared accelerators among applications
Outline
- Motivation
- Classification of accelerators
- Challenges
- Our model: Rinnegan
- Conclusion
[Figure: IBM wire-speed processor - accelerator complex, CPU cores, and memory controller on one chip]
Classification of accelerators
Accelerators are classified based on their accessibility:
- Acceleration devices
- Co-processors
- Asymmetric cores
| Properties | Acceleration devices | Co-processor | Asymmetric cores |
|---|---|---|---|
| Systems | GPU, APU, crypto accelerators in Sun Niagara chips, etc. | C-cores, DySER, IBM wire-speed processor | NVIDIA's Kal-El processor, scalable cores like WiDGET |
| Location | Off-/on-chip device | On-chip device | Regular core that can execute threads |
| Accessibility: control | Device drivers | ISA instructions | Thread migration, RPC |
| Accessibility: data | DMA or zero copy | System memory | Cache coherency |
| Resource contention | Multiple applications might want to use the device | Sharing of resources across multiple cores could result in contention | Multiple threads might want to access the special core |
1) Applications must deal with different classes of accelerators, each with different access mechanisms
2) Resources must be shared among multiple applications
Challenges
- Task invocation: execution on different processing elements
- Virtualization: sharing across multiple applications
- Scheduling: time-multiplexing the accelerators
Task invocation
- Current systems decide statically where to execute a task
- Programmer problems where the system can help: data granularity, power budget limitations, presence/availability of the accelerator
[Figure: an AES task can run on a CPU core or be offloaded to the crypto accelerator]
Virtualization
- Data addressing for accelerators: accelerators need to understand virtual addresses, so the OS must provide virtual-address translations
- Process preemption after launching a computation: the system must be aware of computation completion
[Figure: a process's virtual memory regions mapped for access by accelerator units (crypto, XML) and CPU cores]
Scheduling
- Scheduling is needed to prioritize access to shared accelerators
- User-mode access to accelerators complicates scheduling decisions: the OS cannot interpose on every request
Outline
- Motivation
- Classification of accelerators
- Challenges
- Our model: Rinnegan
- Conclusion
Rinnegan
Goals:
- Applications leverage the heterogeneity
- The OS enforces system-wide policies
Rinnegan supports the different classes of accelerators through three components:
- Flexible task execution via the accelerator stub
- Safe multiplexing of accelerators via the accelerator agent
- Enforcement of system policy via the accelerator monitor
Accelerator stub
- Entry point for accelerator invocation, and a dispatcher
- Abstracts the different processing elements
- Binding: selecting the best implementation for the task
  - Static: during compilation
  - Early: at process start
  - Late: at call
[Figure: a process invokes the stub, which either encrypts through AES instructions on the CPU or offloads to the crypto accelerator]
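The late-binding case above can be sketched as a dispatcher that picks an implementation at each call. This is only an illustration of the idea, not Rinnegan's actual interface: the implementation functions, the availability flag, and the size threshold (input size as a crude proxy for data granularity) are all hypothetical.

```c
#include <stddef.h>

/* Two stand-in implementations of the same task. */
typedef int (*crypt_impl_t)(const unsigned char *in, unsigned char *out, size_t len);

static int encrypt_aes_insn(const unsigned char *in, unsigned char *out, size_t len) {
    /* Stand-in for encryption via CPU AES instructions. */
    for (size_t i = 0; i < len; i++) out[i] = in[i] ^ 0xAA;
    return 0;
}

static int encrypt_crypto_accel(const unsigned char *in, unsigned char *out, size_t len) {
    /* Stand-in for offloading to a crypto accelerator. */
    for (size_t i = 0; i < len; i++) out[i] = in[i] ^ 0xAA;
    return 0;
}

/* Late binding: the stub selects an implementation at call time,
 * using input size and accelerator availability. */
static crypt_impl_t crypt_stub_select(size_t len, int accel_available) {
    if (accel_available && len >= 4096)   /* large task: offload pays off */
        return encrypt_crypto_accel;
    return encrypt_aes_insn;              /* small task: stay on the CPU */
}
```

Static and early binding fit the same shape: the selection simply runs at compile time or at process start instead of at every call.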
Accelerator agent
- Manages an accelerator
- Roles of an agent:
  - Virtualize the accelerator
  - Forward requests to the accelerator
  - Implement scheduling decisions
  - Expose accounting information to the OS
[Figure: processes offloading parallel tasks invoke stubs in user space; per-accelerator agents in kernel space (a resource allocator doing thread migration for an asymmetric core, a device driver for a device) forward requests to the hardware; a process can also communicate directly with the accelerator]
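The agent's roles suggest a small per-accelerator interface: a submit path where scheduling is applied, plus counters the OS can read for accounting. The sketch below is a hypothetical shape for such an interface, not Rinnegan's real kernel API; the struct names and the cost model are illustrative.

```c
#include <stddef.h>

/* A request handed to an agent by a stub. */
struct accel_request { const void *in; void *out; size_t len; int prio; };

/* Illustrative agent interface: one per accelerator. The agent
 * virtualizes the device, forwards requests (applying its scheduling
 * decision in submit), and exposes accounting to the OS. */
struct accel_agent {
    const char *name;
    int  (*submit)(struct accel_agent *self, struct accel_request *req);
    /* Accounting information exposed to the OS. */
    unsigned long busy_cycles;
    unsigned long requests_served;
};

/* A trivial agent that serves requests immediately and accounts for them. */
static int demo_submit(struct accel_agent *self, struct accel_request *req) {
    self->requests_served++;
    self->busy_cycles += req->len;  /* pretend cost scales with request size */
    return 0;
}
```

A real agent for an acceleration device would queue the request behind a device driver; one for an asymmetric core would instead trigger a thread migration, but the accounting side stays the same.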
Accelerator monitor
- Monitors information such as resource utilization
- Responsible for system-wide energy and performance goals
- Acts as an online modeling tool: helps decide where to schedule a task
[Figure: the stub consults the accelerator monitor before dispatching a task through the kernel-space accelerator agent to the hardware]
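One way to picture the monitor's "online modeling" role is a placement decision that compares estimated completion times on each target, given current utilization. The model below is a toy of my own construction, not the one in the paper: the throughput and queue-occupancy inputs stand in for whatever utilization data the monitor actually collects.

```c
/* Toy online model: pick the target with the lower estimated finish
 * time, given per-target throughput and currently queued work.
 * All quantities are in arbitrary, consistent work/time units. */
enum target { TARGET_CPU, TARGET_ACCEL };

static double est_finish(double queued_work, double task_work, double throughput) {
    return (queued_work + task_work) / throughput;
}

static enum target monitor_place(double task_work,
                                 double cpu_queued, double cpu_tput,
                                 double acc_queued, double acc_tput) {
    double t_cpu = est_finish(cpu_queued, task_work, cpu_tput);
    double t_acc = est_finish(acc_queued, task_work, acc_tput);
    return (t_acc < t_cpu) ? TARGET_ACCEL : TARGET_CPU;
}
```

Even this crude model captures the key behavior: a fast accelerator wins when idle, but a long queue in front of it can make the plain CPU the better choice.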
Programming model
- Based on task-level parallelism (e.g., Intel TBB, Cilk)
- Task creation and execution:
  - Write: different implementations of the task
  - Launch: similar to function invocation
  - Fetching the result: supporting synchronous and asynchronous invocation
application() {
    task = cryptStub(inputData, outputData);
    ...
    waitFor(task);
}

TaskHandle cryptStub(input, output) {
    if (factor1)
        output = implmn_AES(input);
    else if (factor2)
        output = implmn_CACC(input);
}
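Fleshed out into compilable C, the slide's pseudocode might look like the sketch below. The TaskHandle here completes synchronously for simplicity (a real runtime would track an in-flight asynchronous task), and the implementation bodies, the availability flag standing in for factor1/factor2, and the return codes are all placeholders.

```c
/* Stand-ins for the two implementations named on the slide. */
static int implmn_AES (const char *in, char *out) { out[0] = in[0]; return 1; }
static int implmn_CACC(const char *in, char *out) { out[0] = in[0]; return 2; }

/* A task handle returned by the stub; in a real runtime this would
 * track an in-flight asynchronous task rather than a finished one. */
typedef struct { int done; int result; } TaskHandle;

static TaskHandle cryptStub(const char *input, char *output, int accel_present) {
    TaskHandle t = {0, 0};
    if (!accel_present)                 /* no accelerator: use AES instructions */
        t.result = implmn_AES(input, output);
    else                                /* accelerator present: offload */
        t.result = implmn_CACC(input, output);
    t.done = 1;
    return t;
}

static int waitFor(TaskHandle *t) {
    /* Would block until completion for a truly asynchronous launch. */
    return t->result;
}
```

The application-side shape then matches the slide: launch looks like a function call, and waitFor fetches the result.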
Summary
[Figure: processes in user space invoke stubs; the accelerator monitor, accelerator agents, and other kernel components mediate access to CPU cores and accelerators; direct communication with the accelerator remains possible]
Related work
- Task invocation based on data granularity: Merge, Harmony
- Generating device-specific implementations of the same task: GPU Ocelot
- Resource scheduling (sharing between multiple applications): PTask, Pegasus
Conclusions
- Accelerators will be common in future systems
- Rinnegan embraces accelerators through its stub mechanism, agent, and monitor components
- Other challenges remain: data movement between devices, power measurement
Thank You