Microkernel Construction - Case Study:...

Microkernel Construction Case Study: M3 Nils Asmussen July 4th 2019 1 / 58

Transcript of Microkernel Construction - Case Study:...

Page 1: Microkernel Construction - Case Study: M3 (os.inf.tu-dresden.de/Studium/MkK/SS2019/07_m3.pdf)

Microkernel Construction

Case Study: M3

Nils Asmussen

July 4th 2019

1 / 58

Heterogeneous Systems

2 / 58

Why?

memcached: FPGA-based implementation is 16 times better in performance per watt than Atom CPU [1]

machine learning: custom accelerator is 20% faster than GPU and requires 128 times less energy [2]

[1] Thin servers with smart pipes: Designing SoC accelerators for memcached, ISCA’13

[2] PuDianNao: A polyvalent machine learning accelerator, ASPLOS’15

3 / 58

Future Platforms: Problems for Operating Systems

[Figure: a platform with two ARM cores, two x86 cores, and FFT, DSP, GPU, and TPU units; four boxes are labeled Kernel.]

4 / 58

Related Work

Isolation of components

DPU, NoC-MPU

IOMMUs

First-class handling of one specific accelerator

GPUfs, GPUnet, PTask

ReconOS, BORPH

OSes for heterogeneous systems

Barrelfish

Popcorn Linux, K2

Helios

5 / 58

What If We Could Change Hardware?

Can we design a system that integrates all types of untrusted compute units as first-class citizens?

6 / 58

Goals for First-class Citizens

Prevent harm by untrusted compute units (CUs)

Access operating-system services by all CUs

Direct communication between all CUs

Context switching support for all CUs

7 / 58

Outline

1 Overall System Architecture

2 Prototype Platforms

3 Isolation and Communication

4 Capabilities

5 OS Services and Accelerators

6 Virtual Memory

7 Context Switching

8 Evaluation

8 / 58

1 Overall System Architecture

9 / 58

Hardware/Operating System Co-Design

[Figure: the compute units (ARM, x86, FFT, DSP, FPGA, TPU) are each paired with a DTU to form a PE (CU + DTU). One PE is the kernel PE and runs the kernel; the others are user PEs that run apps.]

Key Ideas:

Minimize changes to existing components

Add uniform interface

Kernel controls user PEs remotely

Direct communication

10 / 58

2 Prototype Platforms

11 / 58

Tomahawk

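[Figure: Tomahawk chip with several PEs attached to a grid of nodes labeled R (presumably NoC routers) and a memory controller connected to DRAM. Each PE contains an Xtensa LX4 core, an instruction SPM, a data SPM, and a DTU.]

PEs have no OS support:

No privileged mode

No MMU, no caches, but SPM

T2: simple DTU; T4: most features

12 / 58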

Linux

M3 runs on Linux using it as a virtual machine

A process simulates a PE, having two threads (CPU + DTU)

DTUs communicate over UNIX domain sockets

No accuracy because:
- Programs are directly executed on host
- Data transfers have huge overhead compared to HW

Very useful for debugging and early prototyping

13 / 58

gem5

Modular platform for computer architecture research

Supports various ISAs (x86, ARM, Alpha, RISC-V, . . . )

Provides detailed CPU and memory models

Cycle-accurate simulation

Added DTU model to gem5

Added hardware accelerators

14 / 58

gem5 – Example Configuration

[Figure: example gem5 configuration with x86 PEs and accelerator PEs. Every PE has a DTU and some combination of L1 cache, L2 cache, I/O cache, or SPM; DRAM is attached through a DTU as well.]

15 / 58

3 Isolation and Communication

16 / 58

Isolation

[Figure: one kernel PE running the kernel and five user PEs running apps; each PE consists of a CU and a DTU.]

DTU-based isolation:

Additional protection layer

Only kernel PE can establish communication channels

User PEs can only use established channels

17 / 58

Communication

[Figure: a kernel PE, four user PEs running apps, and DRAM; endpoints are marked M (memory), S (send), and R (receive).]

DTU provides endpoints to:

Access memory (contiguous range, byte granular)

Receive messages into a receive buffer

Send messages to a receiving endpoint

Replies for RPC
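
The endpoint types listed above can be sketched as a small permission model. This is a hypothetical illustration, not the DTU's register layout: `Endpoint`, `Dtu`, `configure`, `may_send`, `may_access`, and the count of 8 endpoints are all assumed names and values. It only shows the idea that a user PE can use what has been configured for it and nothing else.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical endpoint model; field names are illustrative only.
enum class EpKind { Invalid, Send, Receive, Memory };

struct Endpoint {
    EpKind kind;
    int target_pe;          // Send: PE that hosts the receiving endpoint
    int target_ep;          // Send: index of the receiving endpoint
    std::uint64_t base;     // Memory: start of the contiguous range
    std::uint64_t size;     // Memory: length of the range in bytes
    bool writable;          // Memory: whether writes are permitted
};

inline Endpoint invalid_ep() { return Endpoint{EpKind::Invalid, -1, -1, 0, 0, false}; }
inline Endpoint send_ep(int pe, int ep) { return Endpoint{EpKind::Send, pe, ep, 0, 0, false}; }
inline Endpoint recv_ep() { return Endpoint{EpKind::Receive, -1, -1, 0, 0, false}; }
inline Endpoint mem_ep(std::uint64_t base, std::uint64_t size, bool writable) {
    return Endpoint{EpKind::Memory, -1, -1, base, size, writable};
}

class Dtu {
public:
    static constexpr std::size_t kNumEps = 8;   // arbitrary for this sketch

    Dtu() { eps_.fill(invalid_ep()); }

    // Stands in for the kernel PE configuring an endpoint remotely.
    // Returns false for an out-of-range endpoint index.
    bool configure(std::size_t ep, const Endpoint &cfg) {
        if (ep >= kNumEps) return false;
        eps_[ep] = cfg;
        return true;
    }

    // A user PE may send only through an endpoint configured for sending.
    bool may_send(std::size_t ep) const {
        return ep < kNumEps && eps_[ep].kind == EpKind::Send;
    }

    // A memory access must stay inside the configured range (byte granular).
    bool may_access(std::size_t ep, std::uint64_t off, std::uint64_t len, bool write) const {
        if (ep >= kNumEps || eps_[ep].kind != EpKind::Memory) return false;
        const Endpoint &m = eps_[ep];
        if (write && !m.writable) return false;
        return off <= m.size && len <= m.size - off;   // written to avoid overflow
    }

private:
    std::array<Endpoint, kNumEps> eps_;
};
```

In this sketch an unconfigured endpoint denies everything, which mirrors the statement that user PEs can only use established channels.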

18 / 58

OS Design

M3: Microkernel-based system for heterogeneous manycores (or L4 ± 1)

Implemented from scratch

Drivers, filesystems, etc. implemented on user PEs

Kernel manages permissions, using capabilities

DTU enforces permissions (communication, memory access)

Kernel is independent of other PEs

[Figure: the kernel, M3FS, pipes, and three apps shown as separate components.]

19 / 58

M3 System Call

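[Figure: a user PE running an app and a kernel PE running the kernel, each with a CU and a DTU; a send endpoint (S) and a receive endpoint (R) are marked.]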

20 / 58

4 Capabilities

21 / 58

Overview

[Figure: VPE 1 and VPE 2 with numbered capability selectors; the kernel holds the corresponding VPE, SGate, and RGate capabilities.]

22 / 58

Capabilities

M3 has the following capabilities:

Send: send messages to a receive EP

Receive: receive messages from send EPs

Memory: access remote memory via DTU

Mapping: access remote memory via load/store

Service: create sessions

Session: exchange caps with service

Endpoint: configure EPs of own or foreign DTU

VPE: use a PE

23 / 58

Capability Exchange

Kernel provides syscalls to create, exchange, and revoke caps

There are two ways to exchange caps:

1. Directly with another VPE (typically, a child VPE)
2. Over a session with a service

The kernel offers two operations:

1. Delegate: send capability to somebody else
2. Obtain: receive capability from somebody else

Difference to L4:
- Applications communicate directly, without involving the kernel
- → Capability exchange cannot be done during IPC
- Special communication channel between kernel and servers
- Kernel uses this channel to send exchange requests to server
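
The create, delegate, obtain, and revoke operations can be illustrated with a toy capability table. This is a minimal sketch under stated assumptions, not the M3 kernel's data structures: the `Kernel` class, `CapRef`, and all method names are hypothetical, and the recursive revocation of derived copies is an assumption borrowed from common capability-system designs rather than something the slide states.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// Capability types as listed on the "Capabilities" slide.
enum class CapType { Send, Receive, Memory, Mapping, Service, Session, Endpoint, Vpe };

using CapRef = std::pair<int, int>;   // (VPE id, selector in that VPE's table)

struct Cap {
    CapType type;
    bool derived;                     // true if this is a copy of another capability
    CapRef parent;                    // meaningful only if derived
    std::vector<CapRef> children;     // copies handed out from this capability
};

// Hypothetical kernel-side table; all names are illustrative.
class Kernel {
public:
    // Creates a fresh capability; fails if the selector is already in use.
    bool create(CapRef ref, CapType type) {
        return caps_.emplace(ref, Cap{type, false, CapRef{-1, -1}, {}}).second;
    }

    // Delegate: the holder of `own` sends a copy to slot `to`.
    bool delegate(CapRef own, CapRef to) { return copy(own, to); }

    // Obtain: slot `into` receives a copy of `from`; same table effect.
    bool obtain(CapRef into, CapRef from) { return copy(from, into); }

    // Revokes a capability and (assumption) every copy derived from it.
    void revoke(CapRef ref) {
        auto it = caps_.find(ref);
        if (it == caps_.end()) return;
        Cap cap = it->second;
        caps_.erase(it);
        if (cap.derived) {
            // Unlink from the parent so no stale reference remains.
            auto parent = caps_.find(cap.parent);
            if (parent != caps_.end()) {
                auto &sib = parent->second.children;
                sib.erase(std::remove(sib.begin(), sib.end(), ref), sib.end());
            }
        }
        for (const CapRef &child : cap.children) revoke(child);
    }

    bool has(CapRef ref) const { return caps_.count(ref) != 0; }

private:
    bool copy(CapRef src, CapRef dst) {
        auto it = caps_.find(src);
        if (it == caps_.end() || caps_.count(dst) != 0) return false;
        caps_.emplace(dst, Cap{it->second.type, true, src, {}});
        it->second.children.push_back(dst);   // std::map iterators stay valid
        return true;
    }

    std::map<CapRef, Cap> caps_;
};
```

Delegate and obtain differ only in who initiates the exchange; in this sketch both end in the same table update.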

24 / 58

Communication

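[Figure: configuration of endpoints to establish a channel. The kernel (PE0) holds a receive capability for VPE1 (PE1, the receiver) and a send capability for VPE2 (PE2, the sender); the VPEs use them as RecvGate and SendGate. The receive EP in PE1's DTU refers to a buffer in memory with occupied and unread state; the send EP in PE2's DTU holds target, label, and credits. Each DTU has command registers, and the DTU adds a header to the message data.]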
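
The endpoint fields named in the figure (credits, label, target on the send side; buffer, occupied, unread on the receive side) suggest a credit-based channel. The following is a minimal sketch of that idea under assumptions: the struct layouts, the `send`, `fetch`, and `ack` functions, and the rule that a credit returns on acknowledgement are all illustrative and not taken from the DTU specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Message {
    std::uint64_t label;   // header field filled in from the send endpoint
    std::string data;      // payload chosen by the sender
};

struct RecvEp {
    std::vector<Message> buffer;   // one slot per message
    std::vector<bool> occupied;    // slot is in use until it is acknowledged
    std::vector<bool> unread;      // slot holds a message not fetched yet

    explicit RecvEp(std::size_t slots)
        : buffer(slots), occupied(slots, false), unread(slots, false) {}
};

struct SendEp {
    RecvEp *target;        // stands in for (target PE, target EP)
    std::uint64_t label;   // lets the receiver tell senders apart
    unsigned credits;      // messages that may still be in flight
};

// Sends one message if a credit and a free slot are available.
bool send(SendEp &ep, const std::string &data) {
    if (ep.credits == 0) return false;
    RecvEp &rep = *ep.target;
    for (std::size_t i = 0; i < rep.buffer.size(); ++i) {
        if (!rep.occupied[i]) {
            rep.buffer[i] = Message{ep.label, data};   // header is not sender-controlled
            rep.occupied[i] = true;
            rep.unread[i] = true;
            --ep.credits;
            return true;
        }
    }
    return false;
}

// Returns the index of an unread message and marks it read, or -1 if none.
int fetch(RecvEp &rep) {
    for (std::size_t i = 0; i < rep.buffer.size(); ++i) {
        if (rep.unread[i]) {
            rep.unread[i] = false;
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Frees the slot and (assumption) hands the credit back to the sender.
void ack(RecvEp &rep, int slot, SendEp &sender) {
    rep.occupied[static_cast<std::size_t>(slot)] = false;
    ++sender.credits;
}
```

With one credit, a second `send` fails until the receiver has acknowledged the first message, so a sender cannot flood the receive buffer.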

25 / 58

Virtual PEs

M3 kernel manages user PEs in terms of VPEs

VPE is combination of a process and a thread

VPE creation yields a VPE capability and memory capability

Library provides primitives like fork and exec

VPEs are used for all PEs:
- Accelerators are not handled differently by the kernel
- All VPEs can perform system calls
- All VPEs can have time slices and priorities
- . . .

26 / 58

VPEs – Examples

Executing ELF-Binaries

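VPE vpe("test");
const char *args[] = {"/bin/hello", "foo", "bar"};
vpe.exec(3, args);   // argc = 3, matching the array above

Asynchronous Lambdas

VPE vpe("test");
MemGate mem = MemGate::create_global(0x1000, RW);
vpe.delegate(CapRngDesc(mem.sel(), 1));   // hand the memory capability to the new VPE
vpe.run_async([&mem]() {
    char buf[64];   // assumed: buf and its size are not declared on the slide
    mem.read(buf, sizeof(buf));
});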

27 / 58

5 OS Services and Accelerators

28 / 58

OS Service Access for all CUs

sh$ decode in.png | fft | mul | ifft > out.raw

[Figure annotations: sh$ is the shell; decode is a user program; in.png is the input file; fft, mul, and ifft are hardware accelerators for image processing; out.raw is the output file; | and > are pipes and output redirect.]

Challenges:

OS must provide generic protocols

Accelerators need support for protocols

29 / 58

Generic Protocols

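[Figure: a client and a server, each with a DTU, and DRAM. The client's send endpoint (S) sends req(in/out) to the server's receive endpoint (R); the server answers with resp(pos,len); memory endpoints (M) refer to DRAM.]

File protocol:

Data in memory

RPC between client and server
- req(in/out) requests next piece, implicitly commits previous piece
- commit(nbytes) commits nbytes of previous piece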

Server configures client’s memory EP

Client accesses data via DTU
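
The request/commit cycle of the file protocol can be sketched as follows. This is a simplified model under assumptions: `FileServer`, `Extent`, `req_in`, and `read_all` are hypothetical names, the "DRAM" is a `std::string`, and copying from it stands in for the client's DTU access through its memory EP. Only the input direction is shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// resp(pos, len): where the next piece lives; len == 0 means no more data.
struct Extent { std::size_t pos; std::size_t len; };

// Hypothetical server side of the protocol for one open file.
class FileServer {
public:
    FileServer(std::string dram, std::size_t piece)
        : dram_(std::move(dram)), piece_(piece) {}

    // req(in): hands out the next piece and implicitly commits
    // the previous piece in full.
    Extent req_in() {
        pos_ += pending_;
        pending_ = std::min(piece_, dram_.size() - pos_);
        return Extent{pos_, pending_};
    }

    // commit(nbytes): the client consumed only nbytes of the previous piece.
    void commit(std::size_t nbytes) {
        pos_ += std::min(nbytes, pending_);
        pending_ = 0;
    }

    const std::string &dram() const { return dram_; }

private:
    std::string dram_;         // stands in for the file's data in DRAM
    std::size_t piece_;        // largest piece handed out per request
    std::size_t pos_ = 0;      // committed file position
    std::size_t pending_ = 0;  // length of the piece handed out last
};

// Client loop: request a piece, then read it directly from memory.
std::string read_all(FileServer &srv) {
    std::string out;
    for (Extent e = srv.req_in(); e.len != 0; e = srv.req_in())
        out.append(srv.dram(), e.pos, e.len);   // stands in for a DTU read
    return out;
}
```

The server is contacted once per piece, while the data itself never passes through it. That split between a small RPC and direct memory access is the point of the protocol.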

30 / 58

Implementation: M3FS – Overview

M3FS organizes the file’s data in extents

M3FS can be used with a memory and disk backend
- With memory backend, FS image is a contiguous region in DRAM
- Clients get access to parts of the image
- With disk backend, M3FS uses a buffer cache in DRAM
- Clients get access to parts of buffer cache

Two types of sessions: metadata session, file session

Metadata session is created first, allows stat, open, . . .

open creates a new file session

Both sessions can be cloned to provide other VPEs access

31 / 58

Implementation: M3FS – File Protocol

The file session implements the file protocol (plus seeking)

File session holds file position and advances it on read/write

req(in/out) requests the next extent

M3FS configures client’s EP for this extent

Appending reserves new space, invisible to other clients

commit(nbytes) commits a previous append
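
A minimal sketch of how a file session might walk extents and handle appends, assuming a file is simply a list of (start, length) extents in the FS image. `ToyFile`, `FileSession`, `req_in`, and `req_out_append` are hypothetical names chosen for illustration; the real M3FS data structures are not shown in these slides.

```python
class ToyFile:
    """A file as a list of extents: (start offset in FS image, length)."""

    def __init__(self, extents):
        self.extents = list(extents)

    @property
    def size(self):
        # Only committed extents count towards the visible file size.
        return sum(length for _, length in self.extents)


class FileSession:
    """Holds the file position and advances it on every request."""

    def __init__(self, file):
        self.file = file
        self.pos = 0
        self.reserved = None  # space reserved by a pending append

    def req_in(self):
        """Return (image offset, length) for the rest of the current extent."""
        off = 0
        for start, length in self.file.extents:
            if self.pos < off + length:
                skip = self.pos - off
                self.pos = off + length  # advance to the end of this extent
                return start + skip, length - skip
            off += length
        return None  # end of file

    def req_out_append(self, start, length):
        """Reserve new space; other sessions do not see it yet."""
        self.reserved = (start, length)
        return self.reserved

    def commit(self, nbytes):
        """Make nbytes of the reserved space part of the file."""
        assert self.reserved is not None
        start, length = self.reserved
        assert 0 <= nbytes <= length
        if nbytes:
            self.file.extents.append((start, nbytes))
        self.reserved = None
```

In this model the reserved space only becomes visible through `ToyFile.size` and `req_in` once `commit` has run, which mirrors the "invisible to other clients" property stated above.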

Implementation: Pipe – Overview

[Figure: writer, reader, shared memory, and pipeserv; labels: "Shared Memory", "msg passing"]

Implementation: Pipe

Two types of sessions: pipe session, channel session

Pipe session represents the whole pipe and allows creating channels

Channel session implements file protocol

Channel session can be cloned

Server configures client’s EP just once at the beginning

req(in/out) requests access to the next data

commit(nbytes) commits previous request
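
The pipe can be pictured as a ring buffer in shared memory whose read and write positions are tracked by the server. The sketch below is an illustrative model only: `PipeServer` and its methods are hypothetical names, and commits are kept explicit for readability even though a request may also commit implicitly in the file protocol.

```python
class PipeServer:
    """Tracks positions in a shared ring buffer; never touches the data."""

    def __init__(self, size):
        self.size = size
        self.rd = 0     # read position in the ring
        self.avail = 0  # committed bytes that the reader may consume

    def req_out(self):
        """Writer asks for space: returns (pos, len) of contiguous free bytes."""
        wr = (self.rd + self.avail) % self.size
        free = self.size - self.avail
        return wr, min(free, self.size - wr)

    def commit_out(self, nbytes):
        """Writer commits nbytes it has written into the shared memory."""
        assert 0 <= nbytes <= self.size - self.avail
        self.avail += nbytes

    def req_in(self):
        """Reader asks for data: returns (pos, len) of contiguous data."""
        return self.rd, min(self.avail, self.size - self.rd)

    def commit_in(self, nbytes):
        """Reader commits nbytes it has consumed, freeing the space."""
        assert 0 <= nbytes <= self.avail
        self.rd = (self.rd + nbytes) % self.size
        self.avail -= nbytes
```

Because the endpoint covers the whole shared buffer, the server only has to answer with positions; this is consistent with the EP being configured just once at the beginning.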

File Multiplexing

File protocol maps directly to EPs (limited resource)

Number of open files shouldn’t be limited (that much)

libm3 dedicates at most 4 EPs to files and multiplexes them

Multiplexing requires:

1. commit(nbytes) to commit read/written data
2. revocation of EP capability (old server)
3. delegation of EP capability (new server)
4. next read/write will contact server again

Fortunately, file multiplexing almost never happens
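
The multiplexing steps can be sketched as follows. This is a hypothetical model: `ToyFile`, `EPMultiplexer`, and `acquire` are illustrative names, and the least-recently-used victim selection is an assumption, since the slide does not state which file loses its EP.

```python
from collections import OrderedDict

MAX_FILE_EPS = 4  # "at most 4 EPs" from the slide


class ToyFile:
    """Records the protocol steps applied to it (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.has_ep = False
        self.log = []

    def commit(self):
        self.log.append("commit")           # step 1: commit read/written data

    def revoke_ep(self):
        self.has_ep = False
        self.log.append("revoke")           # step 2: old server revokes EP cap

    def delegate_ep(self, ep):
        self.has_ep = True
        self.log.append(f"delegate ep{ep}")  # step 3: new server delegates EP cap


class EPMultiplexer:
    def __init__(self, max_eps=MAX_FILE_EPS):
        self.free = list(range(max_eps))
        self.owners = OrderedDict()  # file -> EP, least recently used first

    def acquire(self, f):
        """Return an EP for file f, evicting another file if necessary."""
        if f in self.owners:
            self.owners.move_to_end(f)  # mark as most recently used
            return self.owners[f]
        if self.free:
            ep = self.free.pop(0)
        else:
            victim, ep = self.owners.popitem(last=False)  # assumed LRU policy
            victim.commit()
            victim.revoke_ep()
        f.delegate_ep(ep)
        self.owners[f] = ep
        return ep
```

Step 4 corresponds to the evicted file calling `acquire` again on its next read or write, which triggers a fresh delegation.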

Additions to Accelerator

[Figure: accelerator with CU and scratchpad memory (SPM), ASM, and DTU; endpoints S and M for each of IN and OUT]

Off-the-shelf accelerators

Accelerator Support Module (ASM):

Interacts with DTU and accelerator

Implements file protocol for input and output channel

ASM assumes that endpoints are set up externally by software

Demo

Accelerator Chains: Assisted by OS

[Figure: accelerator chain from Input through FFT, MUL, and IFFT to Output; each accelerator has an SPM and a DMA unit and is handled by an OS driver]

OS-assisted accelerator chains:

OS drives copy-in/copy-out of accelerator SPMs

Only simple DMA needed

Like in traditional systems, high CPU overhead for OS

Accelerator Chains: Fully Autonomous

[Figure: accelerator chain from Input through FFT, MUL, and IFFT to Output; each accelerator has an SPM, an ASM, and a DTU; the Shell sits outside the data path]

Autonomous accelerator chains:

Shell configures all endpoints

ASMs of accelerators drive DTUs to transfer data autonomously

Fully offloaded, almost no CPU overhead for OS
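
The difference between the two chain styles can be made concrete with a toy model that counts how often the CPU has to step in. Everything here is an assumption for illustration: `assisted` and `autonomous` are hypothetical functions, the stages are stand-in Python callables rather than FFT/MUL/IFFT hardware, and the counters are model quantities, not measurements.

```python
def assisted(stages, blocks):
    """OS-assisted: the CPU copies every block into and out of each stage."""
    cpu_ops = 0
    out = []
    for block in blocks:
        for stage in stages:
            cpu_ops += 1          # copy-in to the accelerator's SPM
            block = stage(block)
            cpu_ops += 1          # copy-out from the accelerator's SPM
        out.append(block)
    return out, cpu_ops


def autonomous(stages, blocks):
    """Autonomous: endpoints are configured once, then stages pull data."""
    cpu_ops = 2 * len(stages)     # assumed: one input and one output EP per stage
    stream = iter(blocks)
    for stage in stages:
        stream = map(stage, stream)  # each stage pulls from its predecessor
    return list(stream), cpu_ops
```

In this model the CPU work of the assisted variant grows with the number of blocks, whereas the autonomous variant pays only a fixed setup cost; the sketch says nothing about absolute times.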

Accelerator Chains: Results

[Figure: two plots comparing Assisted and Autonomous chains over the number of parallel chains (1 to 4): time (ms, axis 0 to 20) and CPU load (axis 0.0 to 1.0)]

Accelerator Chains: Results (PCIe-like Latency)

[Figure: two plots comparing Assisted and Autonomous chains over the number of parallel chains (1 to 4): time (ms, axis 0 to 80) and CPU load (axis 0.0 to 1.0)]

Outline

1 Overall System Architecture

2 Prototype Platforms

3 Isolation and Communication

4 Capabilities

5 OS Services and Accelerators

6 Virtual Memory

7 Context Switching

8 Evaluation

Virtual Memory – Overview

[Figure: three PEs, each with a DTU: an accelerator with SPM; an accelerator with cache and an MMU at the DTU; an x86 core with its own cache and MMU plus a VM Helper]

Different PE types:

No MMU, SPM instead of caches

MMU+caches provided by DTU

Reuse existing MMU+caches of CU

Page Fault Handling

[Figure: page fault handling for CUs of PE-type B and PE-type C; components: App, VMA, CU, DTU, Pager, Kernel; labeled interactions: PF, create_map, kernel requests, update PTEs, IRQ]

Context Switching – Overview

[Figure: PEs (x86, ARM, Accelerator), each with a DTU; labels: Kernel, VPE, Ctx Helper, Switcher]

Kernel handles complex part

- Schedules and migrates VPEs
- Initiates context switches

Helper on user PEs implements save/restore

- General purpose PEs: Software helper
- Accelerator PEs: Helper implemented in hardware as part of ASM

Context Switching with Direct Communication

How to determine whether recipient is running?

- DTU knows running VPE and recipient of communication
- DTU reports error if recipient is not running

How to deliver the message if recipient is not running?

- Message is forwarded via the kernel
- Kernel schedules recipient and delivers message

How does the kernel know what VPEs are doing?

- Activities send idle notification
- Only if compatible VPE is ready
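
The fast path and the kernel-forwarded slow path can be sketched in a few lines. This is a simplified model under assumptions: `DTU`, `Kernel`, `send`, and `RecipientNotRunning` are hypothetical names, a single PE is modeled, and "scheduling" is reduced to setting the running VPE.

```python
class RecipientNotRunning(Exception):
    """Error the modeled DTU reports when the target VPE is not running."""


class DTU:
    def __init__(self, running=None):
        self.running = running  # the DTU knows which VPE is currently running
        self.inbox = []

    def deliver(self, target_vpe, msg):
        if self.running != target_vpe:
            raise RecipientNotRunning(target_vpe)
        self.inbox.append((target_vpe, msg))


class Kernel:
    def __init__(self, dtu):
        self.dtu = dtu
        self.forwarded = 0

    def forward(self, target_vpe, msg):
        """Slow path: schedule the recipient, then deliver the message."""
        self.forwarded += 1
        self.dtu.running = target_vpe  # stands in for a context switch
        self.dtu.deliver(target_vpe, msg)


def send(dtu, kernel, target_vpe, msg):
    try:
        dtu.deliver(target_vpe, msg)     # direct communication, no kernel
    except RecipientNotRunning:
        kernel.forward(target_vpe, msg)  # message is forwarded via the kernel
```

The kernel is involved only when the DTU reports the error, so communication between running VPEs stays on the direct path.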

Experimental Setup

Evaluation platform is gem5

Each general-purpose PE has an out-of-order x86-64 core @ 3 GHz, 32+32 KiB L1 cache, 256 KiB L2 cache

Accelerator PEs are clocked at 1 GHz

DRAM is clocked at 1 GHz

Short running, but representative benchmarks

Linux Application Workloads

[Figure: time (ms, axis 0 to 10) for tar, untar, shasum, sort, find, SQLite, and LevelDB on M3 and Linux (Lx), broken down into App, Xfers, and OS]

M3 vs. Linux 4.10

Traced on Linux, replayed on M3

M3FS vs. Linux tmpfs

M3: 1+3 cores (Kernel; App, Pager, M3FS)

Linux: 1 core

PE Sharing

[Figure: relative time (axis 0 to 4) for tar, untar, shasum, sort, find, SQLite, and LevelDB; configurations: Kernel, App, Pager, M3FS on separate PEs; Kernel, App, PG+FS; Kernel, A+P+F; Linux]

M3 vs. Linux 4.10

M3 shares user PEs in different ways

Baseline is 1+3 PEs

Accelerator Sharing

[Figure: FFT, MUL, and IFFT accelerators shared by 1..4 chains; each chain consists of three VPEs between Input and Output]

Accelerator Sharing

[Figure: relative time (axis 0.98 to 1.02) over the number of accelerator chains (1 to 4), with series for 1ms, 2ms, and 4ms]

Evaluation Summary

Comparable application performance

Superior performance for data-intensive applications

Accelerators can run autonomously, causing almost no CPU load

Accelerators can be shared with minimal overhead

Future Work

Scaling to larger systems pursued by Matthias Hille (runs 512 applications with a parallel efficiency of 75%, using 11% for the OS [1])

Core-local context switching and IPC

Other accelerators: FPGAs, GPUs, . . .

[1] SemperOS: Distributed Capability System, USENIX ATC’19

Conclusion

M3 uses a hardware/operating-system co-design

DTU introduces common interface for all CUs

Allows integrating all (untrusted) CUs as first-class citizens

Access to OS services for all CUs

M3 uses the same concepts for all CUs

Allows simple management of complex systems

More Information

M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores
Nils Asmussen, Marcus Völp, Benedikt Nöthen, Hermann Härtig, and Gerhard Fettweis
ASPLOS 2016

M3X: Autonomous Accelerators via Context-Enabled Fast-Path Communication
Nils Asmussen, Michael Roitzsch, and Hermann Härtig
USENIX ATC 2019

SemperOS: Distributed Capability System
Matthias Hille, Nils Asmussen, Pramod Bhatotia, and Hermann Härtig
USENIX ATC 2019

Backup Slides

Accelerator Sharing (PCIe)

[Figure: relative time (axis 0.98 to 1.08), x-axis 1 to 4, with series for 1ms, 2ms, and 4ms]

DTU Power Consumption

[Figure: average power (mW, axis 0 to 14) of Core, SPM, and DTU over compute time (0.5, 1, 2, 4, and 10 K cycles)]

DTU Size

Comparison:

Single Xtensa core has ∼50,000 gates

Single x86 core (Haswell) has ∼100 million gates

Software Complexity

Context Switching Microbenchmark

[Figure: time (µs, axis 0 to 11), broken down into Wake, CtxSw, Fwd, and Comm, for M³-C (local), M³-C (rem-sh), M³-C (rem-ex), M³-B (rem-sh), M³-B (rem-ex), M³-A (rem-sh), M³-A (rem-ex), NOVA (remote), and NOVA (local)]

Scalability with Dedicated OS Service PEs

[Figure: six plots of parallel efficiency (%, axis 0 to 100) over the number of applications (0 to 32) for tar, untar, shasum & sort, find, SQLite, and LevelDB, with series for 1, 2, 4, and 8 srv]

Scalability with PE Sharing

[Figure: parallel efficiency (%, axis 0 to 100) over the number of applications (1 to 32) for tar, untar, find, sqlite, leveldb, shasum, and sort]

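Stream Processing ASM

[Figure: ASM between the DTU and the accelerator logic (CU) with SPM; endpoint labels S, M, and R for "in" and "out"; state machine with labels C, RD, OU, W, E, IN, WR and transitions labeled input, no input, output, no output, in reply, out reply, EOF, and ctxsw]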
