Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU)...

65
Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown)

Transcript of Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU)...

  • Slide 1
  • Warp-Aware Trace Scheduling for GPUS James Jablin (Brown) Thomas Jablin (UIUC) Onur Mutlu (CMU) Maurice Herlihy (Brown)
  • Slide 2
  • Historical Trends in GFLOPS: CPUs vs. GPUs Reproduced from NVIDIA C Programming Guide (Version 5.0)
  • Slide 3
  • Performance Pitfalls Control flow can negatively affect performance.
  • Slide 4
  • Performance Pitfalls Pipeline Stall execution delay in an instruction pipeline to resolve a dependency
  • Slide 5
  • Hardware: CPU versus GPU Reproduced from NVIDIA C Programming Guide (Version 5.0)
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Performance Pitfalls Pipeline Stall execution delay in an instruction pipeline to resolve a dependency
  • Slide 19
  • Performance Pitfalls Pipeline Stall execution delay in an instruction pipeline to resolve a dependency Warp Divergence threads within a warp take different paths and the different execution paths are serialized
  • Slide 20
  • Warp Divergence Example
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Warp-Aware Trace Scheduling Schedule instructions across basic block boundaries to expose additional ILP
  • Slide 29
  • Warp-Aware Trace Scheduling Schedule instructions across basic block boundaries to expose additional ILP while managing and optimizing warp divergence.
  • Slide 30
  • Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription
  • Slide 31
  • Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace Selection
  • Slide 32
  • Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace Selection 2. Trace Formation
  • Slide 33
  • Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace Selection 2. Trace Formation 3. Local Scheduling
  • Slide 34
  • Origins: Microcode Trace Scheduling generalizing local and disparate vertical-to- horizontal microcode compaction StepDescription 1. Trace SelectionPartition basic blocks into regions 2. Trace Formation Facilitate local scheduling, potentially adding nodes and edges 3. Local SchedulingSchedule instructions within each region
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Backup Slides
  • Slide 52
  • Slide 53
  • GPU Programming Model
  • Slide 54
  • Slide 55
  • Slide 56
  • Characterizing the Grid
  • Slide 57
  • Characterizing the Grid, Block
  • Slide 58
  • Slide 59
  • Characterizing the Grid, Block, and Thread
  • Slide 60
  • Slide 61
  • Warp Divergence Examples
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65