
Operating Systems Should Manage Accelerators

Sankaralingam Panneerselvam, Michael M. Swift
Computer Sciences Department, University of Wisconsin, Madison, WI

HotPar 2012

Accelerators
- Specialized execution units for common tasks: data-parallel tasks, encryption, video encoding, XML parsing, network processing, etc.
- Tasks are offloaded to accelerators for high performance, energy efficiency, or both
- Examples: GPU, wire-speed processor, APU, DySER, crypto accelerator


Why accelerators?
- Moore's law and Dennard scaling: single core to multi-core
- Dark silicon: more transistors, but a limited power budget [Esmaeilzadeh et al., ISCA '11]
- Accelerators trade area for specialized logic: performance up by 250x and energy efficiency by 500x [Hameed et al., ISCA '10]
- Result: heterogeneous architectures


Heterogeneity everywhere
- Focus on diverse heterogeneous systems: processing elements with different properties in the same system
- Our work: how can the OS support this architecture?
  1. Abstract heterogeneity
  2. Enable flexible task execution
  3. Multiplex shared accelerators among applications


Outline
- Motivation
- Classification of accelerators
- Challenges
- Our model: Rinnegan
- Conclusion


[Figure: IBM wire-speed processor, with an accelerator complex, CPU cores, and a memory controller on one chip]

Classification of accelerators
- Accelerators classified based on their accessibility:
  - Acceleration devices
  - Co-processors
  - Asymmetric cores


Classification:

Properties          | Acceleration devices                                     | Co-processor                                            | Asymmetric cores
--------------------|----------------------------------------------------------|---------------------------------------------------------|------------------------------------------------------
Systems             | GPU, APU, crypto accelerators in Sun Niagara chips, etc. | C-cores, DySER, IBM wire-speed processor                | NVIDIA's Kal-El processor, scalable cores like WiDGET
Location            | Off-/on-chip device                                      | On-chip device                                          | Regular core that can execute threads
Access: control     | Device drivers                                           | ISA instructions                                        | Thread migration, RPC
Access: data        | DMA or zero copy                                         | System memory                                           | Cache coherency
Resource contention | Multiple applications may want to use the device         | Sharing of resources across cores can cause contention  | Multiple threads may want to access the special core

Two consequences:
1) Applications must deal with different classes of accelerators, each with different access mechanisms.
2) Resources must be shared among multiple applications.


Challenges
- Task invocation: execution on different processing elements
- Virtualization: sharing across multiple applications
- Scheduling: time-multiplexing the accelerators


Task invocation
- Current systems decide statically where to execute a task
- Programmer problems where the system can help:
  - Data granularity
  - Power budget limitations
  - Presence/availability of an accelerator
- Example: running AES on a CPU core vs. the crypto accelerator


Virtualization
- Data addressing for accelerators: accelerators need to understand virtual addresses, so the OS must provide virtual-address translations
- Process preemption after launching a computation: the system must be aware of computation completion

[Figure: a process's virtual memory is shared between CPU cores and accelerator units (crypto, XML), which must translate the process's virtual addresses]


Scheduling
- Scheduling is needed to prioritize access to shared accelerators
- User-mode access to accelerators complicates scheduling decisions: the OS cannot interpose on every request


Outline
- Motivation
- Classification of accelerators
- Challenges
- Our model: Rinnegan
- Conclusion


Rinnegan
- Goals:
  - Applications leverage the heterogeneity
  - The OS enforces system-wide policies
- Rinnegan provides support for different classes of accelerators:
  - Flexible task execution through the accelerator stub
  - Safe multiplexing through the accelerator agent
  - Enforcing system policy through the accelerator monitor


Accelerator stub
- Entry point for accelerator invocation, and a dispatcher
- Abstracts different processing elements
- Binding: selecting the best implementation for the task
  - Static: during compilation
  - Early: at process start
  - Late: at call time
[Figure: a process calls the stub, which either encrypts through AES instructions or offloads to the crypto accelerator]
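The late-binding case in the figure can be sketched as a stub that chooses between the two paths at each call. This is a minimal sketch, not the talk's implementation; the backend functions and the availability flag are hypothetical stand-ins:

```cpp
#include <string>

// Hypothetical backends, standing in for the two paths in the figure.
std::string encrypt_with_aes_insns(const std::string& data) {
    return "cpu:" + data;   // placeholder: AES instructions on a CPU core
}
std::string encrypt_on_crypto_acc(const std::string& data) {
    return "acc:" + data;   // placeholder: offload to the crypto accelerator
}

// Late binding: the stub chooses an implementation at each call,
// e.g. based on whether the crypto accelerator is currently available.
std::string crypt_stub(const std::string& data, bool accelerator_available) {
    return accelerator_available ? encrypt_on_crypto_acc(data)
                                 : encrypt_with_aes_insns(data);
}
```

Static and early binding would move the same decision to compile time or process start, fixing the branch once instead of re-evaluating it per call.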


Accelerator agent
- Manages an accelerator
- Roles of an agent:
  - Virtualize the accelerator
  - Forward requests to the accelerator
  - Implement scheduling decisions
  - Expose accounting information to the OS
[Figure: a process offloading a parallel task calls a user-space stub; kernel-space accelerator agents (a resource allocator, thread migration for an asymmetric core, a device driver) sit between the stub and the hardware; a process can also communicate directly with the accelerator]
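The agent's multiplexing and scheduling roles can be sketched as a per-accelerator request queue that forwards work in policy order. The `Request` type and priority policy here are illustrative assumptions, not the paper's interface:

```cpp
#include <functional>
#include <queue>

// A request an application submits through its stub.
struct Request {
    int priority;               // higher value runs first (illustrative policy)
    std::function<void()> run;  // the work forwarded to the accelerator
    bool operator<(const Request& other) const {
        return priority < other.priority;
    }
};

// The agent serializes access to one accelerator and applies a
// scheduling decision (here, simple priority order) before forwarding.
class AcceleratorAgent {
    std::priority_queue<Request> pending_;
public:
    void submit(Request r) { pending_.push(std::move(r)); }
    // Drain the queue in priority order as the device becomes free.
    void dispatch_all() {
        while (!pending_.empty()) {
            pending_.top().run();
            pending_.pop();
        }
    }
};
```

Because all requests pass through the agent, it is also the natural place to record the accounting information the OS needs.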


Accelerator monitor
- Monitors information such as resource utilization
- Responsible for system-wide energy and performance goals
- Acts as an online modeling tool: helps decide where to schedule a task
[Figure: the user-space stub consults the kernel-space accelerator monitor and accelerator agent before work reaches the accelerator hardware]
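The monitor's placement hint can be sketched as a utilization check. The threshold and the fallback rule are illustrative assumptions; in a real system the utilization figures would come from the agents' accounting data:

```cpp
enum class Target { Accelerator, Cpu };

// Hypothetical placement hint: prefer the accelerator unless it is
// already saturated, in which case fall back to the CPU cores.
Target choose_target(double accel_utilization, double cpu_utilization) {
    const double kBusy = 0.9;  // illustrative saturation threshold
    if (accel_utilization < kBusy) return Target::Accelerator;
    if (cpu_utilization < kBusy)   return Target::Cpu;
    // Both saturated: queue on the accelerator anyway, assuming it
    // is still the faster option for this task.
    return Target::Accelerator;
}
```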


Programming model
- Based on task-level parallelism (cf. Intel TBB/Cilk)
- Task creation and execution:
  - Write: different implementations of the task
  - Launch: similar to a function invocation
  - Fetch result: supporting synchronous/asynchronous invocation

    application() {
        task = cryptStub(inputData, outputData);
        ...
        waitFor(task);
    }

    <TaskHandle> cryptStub(input, output) {
        if (factor1 == true)
            output = implmn_AES(input);
        else if (factor2 == true)
            output = implmn_CACC(input);
    }
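The launch/wait pattern in the slide's pseudocode can be made concrete with `std::future`, which plays the role of the task handle. This is a minimal sketch; the two `implmn_*` functions are stand-ins for real implementations, and the availability flag replaces the slide's unspecified `factor1`/`factor2` tests:

```cpp
#include <future>
#include <string>

// Stand-ins for the two implementations named in the slide.
std::string implmn_AES(const std::string& in)  { return "aes:" + in; }
std::string implmn_CACC(const std::string& in) { return "cacc:" + in; }

// The stub binds to one implementation, launches it asynchronously,
// and returns a handle; waitFor(task) corresponds to future::get().
std::future<std::string> cryptStub(const std::string& input,
                                   bool accelerator_available) {
    if (accelerator_available)
        return std::async(std::launch::async, implmn_CACC, input);
    return std::async(std::launch::async, implmn_AES, input);
}
```

A caller then mirrors the slide's `application()`: launch, do other work, and block on the handle only when the result is needed.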


Summary
[Figure: a process's stub in user space talks to the accelerator agent, the accelerator monitor, and other kernel components in kernel space; CPU cores and the accelerator sit in hardware, with direct communication between the process and the accelerator also possible]


Related work
- Task invocation based on data granularity: Merge, Harmony
- Generating device-specific implementations of the same task: GPUOcelot
- Resource scheduling, sharing between multiple applications: PTask, Pegasus


Conclusions
- Accelerators will be common in future systems
- Rinnegan embraces accelerators through its stub mechanism, agent, and monitoring components
- Other challenges: data movement between devices, power measurement


Thank You