Warp-Aware Trace Scheduling for GPUs

James Jablin (Brown), Thomas Jablin (UIUC), Onur Mutlu (CMU), Maurice Herlihy (Brown)

Page 2: Warp-Aware Trace Scheduling for GPUs

Historical Trends in GFLOPS: CPUs vs. GPUs

Reproduced from NVIDIA C Programming Guide (Version 5.0)

Page 3: Warp-Aware Trace Scheduling for GPUs

Performance Pitfalls

Control flow can negatively affect performance.

Page 4: Warp-Aware Trace Scheduling for GPUs

Performance Pitfalls

Pipeline Stall – execution delay in an instruction pipeline to resolve a dependency
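
The definition above can be illustrated with a small model. This is a minimal sketch, assuming a hypothetical single-issue, in-order pipeline in which every operation has a fixed result latency; the function name `count_stalls`, the register names, and the latencies are invented for illustration and do not come from the slides.

```python
def count_stalls(instrs, latency):
    """Return (issue_cycles, stall_cycles) for a single-issue, in-order pipeline.

    instrs:  list of (op, dest, sources) tuples, in program order.
    latency: dict mapping op -> cycles until its result can be read.
    """
    ready_at = {}  # register -> first cycle in which its value is available
    cycle = 0      # next free issue slot
    stalls = 0
    for op, dest, sources in instrs:
        # Registers never written here are treated as live-in (ready at cycle 0).
        earliest = max((ready_at.get(r, 0) for r in sources), default=0)
        if earliest > cycle:
            # The pipeline waits until the dependency is resolved.
            stalls += earliest - cycle
            cycle = earliest
        ready_at[dest] = cycle + latency[op]
        cycle += 1
    return cycle, stalls


# Hypothetical latencies, chosen only to make the stall visible.
LATENCY = {"load": 4, "add": 1, "mul": 2}

# Each instruction consumes the result of the previous one.
dependent = [
    ("load", "r1", ["r0"]),
    ("add", "r2", ["r1", "r1"]),
    ("mul", "r3", ["r2", "r2"]),
]

# The same operations with no dependencies between them.
independent = [
    ("load", "r1", ["r0"]),
    ("add", "r4", ["r5", "r6"]),
    ("mul", "r7", ["r5", "r5"]),
]

dependent_result = count_stalls(dependent, LATENCY)
independent_result = count_stalls(independent, LATENCY)
```

In this model the dependent chain waits on the load result before the add can issue, while the independent sequence issues one instruction per cycle.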

Page 5: Warp-Aware Trace Scheduling for GPUs

Hardware: CPU versus GPU

Reproduced from NVIDIA C Programming Guide (Version 5.0)

Page 19: Warp-Aware Trace Scheduling for GPUs

Performance Pitfalls

Pipeline Stall – execution delay in an instruction pipeline to resolve a dependency

Warp Divergence – threads within a warp take different paths, and the different execution paths are serialized
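
The serialization cost of warp divergence can be sketched with a toy cost model. This is an illustrative assumption, not the authors' model: a warp of 32 threads executes every path that at least one of its threads takes, one path after the other, with the non-participating threads masked off. The function name `warp_cost` and the per-path cycle counts are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_cost(takes_branch, then_cost, else_cost):
    """Cycles one warp spends on an if/else under a simple serialization model.

    takes_branch: one boolean per thread in the warp (True = 'then' path).
    A path is executed if at least one thread takes it; when both paths
    are taken, they run back to back.
    """
    cost = 0
    if any(takes_branch):
        cost += then_cost
    if not all(takes_branch):
        cost += else_cost
    return cost


# All threads agree: only the 'then' path is executed.
uniform = [True] * WARP_SIZE

# Even and odd threads disagree: both paths are executed, one after the other.
divergent = [tid % 2 == 0 for tid in range(WARP_SIZE)]

uniform_cost = warp_cost(uniform, then_cost=10, else_cost=12)
divergent_cost = warp_cost(divergent, then_cost=10, else_cost=12)
```

Under these assumptions the divergent warp pays for both paths even though each individual thread executes only one of them.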

Page 20: Warp-Aware Trace Scheduling for GPUs

Warp Divergence Example

Page 29: Warp-Aware Trace Scheduling for GPUs

Warp-Aware Trace Scheduling

Schedule instructions across basic block boundaries to expose additional ILP…

while managing and optimizing warp divergence.
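
Why scheduling across basic block boundaries can expose additional ILP is easiest to see on a toy example. The sketch below is not the authors' scheduler: it assumes a hypothetical single-issue machine, registers that are written only once (so only read-after-write dependencies matter), and a second block whose instructions are safe to hoist without compensation code. The block contents, latencies, and function names are invented for illustration.

```python
def operand_ready(reg, cycle, ready_at, defined_here):
    """True if register `reg` can be read in `cycle`."""
    if reg in defined_here and reg not in ready_at:
        return False  # its producer has not issued yet
    return ready_at.get(reg, 0) <= cycle  # live-in registers are ready at cycle 0


def list_schedule(instrs, latency):
    """Greedy list scheduling; returns the number of issue cycles used.

    Each cycle, the first pending instruction whose operands are ready is
    issued; if none is ready, the cycle is a stall.
    Assumes acyclic dependencies and one write per register.
    """
    defined_here = {dest for _, dest, _ in instrs}
    ready_at = {}
    pending = list(instrs)
    cycle = 0
    while pending:
        for instr in pending:
            op, dest, sources = instr
            if all(operand_ready(r, cycle, ready_at, defined_here) for r in sources):
                ready_at[dest] = cycle + latency[op]
                pending.remove(instr)
                break
        cycle += 1  # one instruction issued, or one stall cycle
    return cycle


LATENCY = {"load": 4, "add": 1, "mul": 2}  # hypothetical latencies

# Block A stalls while the add waits for the load.
block_a = [("load", "r1", ["r0"]), ("add", "r2", ["r1", "r1"])]

# Block B holds independent work that does not depend on block A.
block_b = [
    ("mul", "r5", ["r3", "r3"]),
    ("mul", "r6", ["r3", "r4"]),
    ("mul", "r7", ["r4", "r4"]),
]

# Scheduling each basic block on its own leaves the stall in block A unfilled.
separate = list_schedule(block_a, LATENCY) + list_schedule(block_b, LATENCY)

# Scheduling both blocks as one region lets block B's work fill the stall.
combined = list_schedule(block_a + block_b, LATENCY)
```

The sketch deliberately ignores the second half of the slide's statement: it models no branches and no warps, so it says nothing about how divergence is managed.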

Page 34: Warp-Aware Trace Scheduling for GPUs

Origins: Microcode Trace Scheduling

…generalizing local and disparate vertical-to-horizontal microcode compaction

Step                 Description
1. Trace Selection   Partition basic blocks into regions
2. Trace Formation   Facilitate local scheduling, potentially adding nodes and edges
3. Local Scheduling  Schedule instructions within each region
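
The first step above, trace selection, can be sketched as a greedy partition of a control-flow graph. This is one common textbook heuristic and an assumption on our part, not necessarily the selection rule used in the talk: seed each trace with the most frequently executed unplaced block, then keep following the most probable successor that is still unplaced. The function name, the example graph, and the frequencies and probabilities are all hypothetical. Steps 2 and 3 are not modeled.

```python
def select_traces(block_freq, succ_prob):
    """Partition basic blocks into traces (lists of block names).

    block_freq: dict block -> estimated execution frequency.
    succ_prob:  dict block -> dict successor -> branch probability.
    """
    order = list(block_freq)  # fixed order keeps tie-breaking deterministic
    unvisited = set(order)
    traces = []
    while unvisited:
        # Seed the trace with the hottest block that is not yet placed.
        block = max((b for b in order if b in unvisited), key=block_freq.get)
        trace = []
        while block is not None:
            trace.append(block)
            unvisited.discard(block)
            # Extend along the most probable successor that is still unplaced.
            candidates = {
                succ: prob
                for succ, prob in succ_prob.get(block, {}).items()
                if succ in unvisited
            }
            block = max(candidates, key=candidates.get) if candidates else None
        traces.append(trace)
    return traces


# A hypothetical diamond-shaped CFG: A branches to B (likely) or C (unlikely),
# and both paths rejoin at D.
block_freq = {"A": 100, "B": 90, "C": 10, "D": 100}
succ_prob = {
    "A": {"B": 0.9, "C": 0.1},
    "B": {"D": 1.0},
    "C": {"D": 1.0},
}

traces = select_traces(block_freq, succ_prob)
```

Every block ends up in exactly one trace, which matches the table's description of partitioning basic blocks into regions; here the likely path forms one trace and the unlikely block forms its own.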

Backup Slides

Page 53: Warp-Aware Trace Scheduling for GPUs

GPU Programming Model

Page 59: Warp-Aware Trace Scheduling for GPUs

Characterizing the Grid, Block, and Thread

Page 61: Warp-Aware Trace Scheduling for GPUs

Warp Divergence Examples