Accelerating The Cloud - University of Toronto

Sahil Suneja, Elliott Baron, Eyal de Lara, Ryan Johnson. Accelerating The Cloud with Heterogeneous Computing

Transcript of Accelerating The Cloud - University of Toronto

Page 1: Accelerating The Cloud - University of Toronto

Sahil Suneja, Elliott Baron, Eyal de Lara, Ryan Johnson

Accelerating The Cloud with Heterogeneous Computing

Page 2: GPGPU Computing


Data Parallel Tasks: apply a fixed operation in parallel to each element of a data array

Examples: Bioinformatics, Data Mining, Computational Finance

NOT Systems Tasks (high-latency memory copying)
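The "fixed operation per element" idea above can be sketched in plain C (not actual GPU code); the squaring operation is purely illustrative. Because no iteration depends on another, a GPU could run each one as its own thread:

```c
/* Minimal sketch of a data-parallel task: the same fixed operation is
 * applied independently to every element of an array. On a GPU, each
 * loop iteration would become one thread of a kernel launch. */
#include <stddef.h>

/* Illustrative per-element operation (not from the slides). */
static float square(float x) { return x * x; }

void map_square(float *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        data[i] = square(data[i]);  /* no dependence between iterations */
}
```

This independence between iterations is exactly what makes bioinformatics, data mining, and finance workloads a good GPGPU fit.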

Page 3: Game Changer – On-Chip GPUs

Processors combining CPU and GPU on one die (AMD Fusion APU, Intel Sandy/Ivy Bridge):
- Share Main Memory
- Very Low Latency
- Energy Efficient


Page 4: Accelerating The Cloud

Use GPUs to accelerate Data Parallel Systems Tasks:
- Better Performance
- Offload CPU for other tasks
- No Cache Pollution
- Better Energy Efficiency (Silberstein et al., SYSTOR 2011)

Cloud Environment particularly attractive:
- Hybrid CPU/GPU will make it to the data center
- GPU cores likely underutilized
- Useful for Common Hypervisor Tasks


Page 5: Data Parallel Cloud Operations

- Memory Scrubbing
- Batch Page Table Updates
- Memory Compression
- Virus Scanning
- Memory Hashing
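Memory scrubbing, the first operation listed, shows why these hypervisor tasks are data parallel: every word of a freed page is overwritten with zero, and no word depends on another, so a GPU could assign one thread per word. A minimal sketch (the 4 KiB page size is an assumed typical value, not from the slides):

```c
/* Sketch of memory scrubbing as a data-parallel operation.
 * Every store is independent, so this loop parallelizes trivially. */
#include <stddef.h>
#include <stdint.h>

enum { PAGE_BYTES = 4096 };                       /* assumed page size */
#define PAGE_WORDS (PAGE_BYTES / sizeof(uint64_t))

void scrub_page(uint64_t *page) {
    for (size_t i = 0; i < PAGE_WORDS; i++)
        page[i] = 0;  /* one GPU thread could handle each word */
}
```

The other listed operations (compression, virus scanning, hashing) share the same per-page or per-block independence.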


Page 6: Hardware Management

Complications:
- Different Privilege Levels
- Multiple Users

Requirements:
- Performance Isolation
- Memory Protection

Page 7: Hardware Management

Management Policies:
- VMM Only
- Time Multiplexing
- Space Multiplexing


Page 8: Memory Access

• All tasks mentioned assume the GPU can directly access main (CPU) memory
• Many require write access
• Currently, CPU <-> GPU copying is required, even though both share main memory
• This makes some tasks infeasible on the GPU, and others less efficient
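The copy problem above can be sketched in plain C. Even when CPU and GPU share physical memory, current APIs force data into a separate device buffer and back; `gpu_increment` below is a stand-in for a real kernel, and the two `memcpy` calls model the host-to-device and device-to-host transfers that direct shared-memory access would eliminate:

```c
/* Hedged sketch of the required staging copies; gpu_increment() and
 * run_with_copies() are illustrative names, not a real GPU API. */
#include <stdlib.h>
#include <string.h>

static void gpu_increment(int *buf, size_t n) {  /* stand-in for a GPU kernel */
    for (size_t i = 0; i < n; i++)
        buf[i] += 1;
}

int run_with_copies(const int *src, int *dst, size_t n) {
    int *staging = malloc(n * sizeof *staging);  /* models a device buffer */
    if (!staging)
        return -1;
    memcpy(staging, src, n * sizeof *src);       /* host -> device copy */
    gpu_increment(staging, n);
    memcpy(dst, staging, n * sizeof *dst);       /* device -> host copy */
    free(staging);
    return 0;
}
```

For systems tasks that touch large amounts of guest memory, those two copies can dominate the runtime, which is exactly why they make some tasks infeasible on the GPU.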


Page 9: Case Study – Page Sharing

“De-duplicate” memory:
- Hashing identifies sharing candidates
- Remove all but one physical copy
- Heavy on the CPU
- Scanning Frequency ∝ Sharing Opportunities
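The page-sharing flow on this slide can be sketched as: hash each page, treat pages with equal hashes as sharing candidates, then confirm with a full byte compare before collapsing them to one physical copy. A hedged sketch follows; the FNV-1a hash and the tiny 64-byte "page" are illustrative choices, not the authors' design:

```c
/* Sketch of hash-based page de-duplication. Hashing each page is itself
 * data parallel across pages, which is the step a GPU would accelerate. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { PAGE_SIZE = 64 };  /* toy size for illustration; real pages are 4 KiB */

static uint64_t hash_page(const uint8_t *page) {
    uint64_t h = 1469598103934665603ULL;  /* FNV-1a offset basis */
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 1099511628211ULL;            /* FNV-1a prime */
    }
    return h;
}

/* Returns 1 iff the two pages may be merged into one physical copy. */
int can_share(const uint8_t *a, const uint8_t *b) {
    if (hash_page(a) != hash_page(b))
        return 0;                         /* cheap filter: hashes differ */
    return memcmp(a, b, PAGE_SIZE) == 0;  /* rule out hash collisions */
}
```

Offloading the hashing pass to the GPU is what relieves the "heavy on the CPU" cost, letting the hypervisor scan more frequently and find more sharing opportunities.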


Page 10: Memory Hashing Evaluation

[Figure: bar chart of running time in seconds (0–16 s), comparing CPU vs. GPU hashing on both Fusion and Discrete hardware. Caption: Running Time (CPU vs. GPU)]

Page 11: Conclusion/Summary

- Hybrid CPU/GPU processors are here; get their full benefit in data centres
- Accelerate and offload administrative tasks
- Need to consider effective management and remedy memory access issues
- Memory hashing example shows promise: over an order of magnitude faster


Page 12: Extra Slides

Page 13: Memory Hashing Evaluation

[Figure: bar chart of running time in milliseconds (0–500 ms), comparing memory-transfer time vs. kernel time on Fusion and Discrete hardware. Caption: Running Time (Memory vs. Kernel)]

Page 14: CPU Overhead

Measure performance degradation of a CPU-heavy program:
- Hashing via CPU = 50% overhead
- Hashing via GPU = 25% overhead
- Without memory transfers = 11% overhead