Many-Core Programming with GRAMPS
Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan
Date posted: 21-Dec-2015


2

Problem Statement
• Facilitate efficient development and execution in many-/multi-core commodity systems.
  – Homogeneous or heterogeneous cores.
• Status Quo:
  – GPUs: Easy to write GL/D3D and run it fast, hard to express anything else
  – CPUs: Possible (not easy) to write anything, possible (hard) to run it fast

3

GRAMPS Background
• Resembles a GPU with a software-constructed pipeline.
• Not (too) radical even in a pure graphics context
  – Shading saw a similar story: fixed -> programmable
  – Now the pipeline topology is under analogous pressures: proliferation of stages and options
• And graphics is more than a GL/D3D pipeline…
• And throughput / many-core is more than graphics…

4

GRAMPS Programming Model
• Software constructs the pipeline (actually graph)
• Exposes threads, shaders, fixed function stages
  – Coprocessors exposed via ISA
• Exposes FIFOs / Queues connecting stages
  – Also enables software push / re-sorting
• Exposes Buffers for memory access
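To make "software constructs the graph" concrete, here is a minimal Python sketch. Every name in it (`Graph`, `add_stage`, `add_queue`, `add_buffer`) is hypothetical and is not the GRAMPS API. The topology is inferred from the stage and queue names on the rasterization example slide, and the stage kinds and buffer size are chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

STAGE_KINDS = ("thread", "shader", "fixed-function")


@dataclass
class Queue:
    name: str
    items: deque = field(default_factory=deque)


@dataclass
class Stage:
    name: str
    kind: str                                   # one of STAGE_KINDS
    inputs: list = field(default_factory=list)  # queues this stage reads
    outputs: list = field(default_factory=list) # queues this stage writes


class Graph:
    """Hypothetical container: stages joined by queues, plus named buffers."""

    def __init__(self):
        self.stages = {}
        self.queues = {}
        self.buffers = {}

    def add_stage(self, name, kind):
        if kind not in STAGE_KINDS:
            raise ValueError(f"unknown stage kind: {kind}")
        self.stages[name] = Stage(name, kind)
        return self.stages[name]

    def add_queue(self, name, producer, consumer):
        # A queue connects exactly one producer to one consumer in this sketch.
        q = Queue(name)
        self.queues[name] = q
        self.stages[producer].outputs.append(q)
        self.stages[consumer].inputs.append(q)
        return q

    def add_buffer(self, name, size_bytes):
        self.buffers[name] = bytearray(size_bytes)
        return self.buffers[name]


def build_raster_graph():
    g = Graph()
    g.add_stage("Rast", "fixed-function")   # kind assumed for illustration
    g.add_stage("Shade", "shader")
    g.add_stage("FB Blend", "thread")       # kind assumed for illustration
    g.add_queue("Input Fragment Queue", "Rast", "Shade")
    g.add_queue("Output Fragment Queue", "Shade", "FB Blend")
    g.add_buffer("Frame Buffer", 16)        # size is arbitrary
    return g
```

The point of the sketch is only that the application, not the hardware, decides which stages exist and how queues connect them.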

5

GRAMPS’ Place
• Compared to GPU Pipeline:
  – More things possible (and moderately easy), still (mostly) runs fast, less hardware independent
• Compared to CPU:
  – Easier to write things, easier to run them well, some loss of expressivity and flexibility
• Still a role for a ‘graphics pipeline’. It’s an app!
• GRAMPS is a layer, model for state machines.

6

GRAMPS and Streaming
• From some angles, GRAMPS sounds a lot like Stream Processing / Computing
• Distinctions are most visible in the target traits.
• Streaming expects predictable data creation, flow, and consumption.
  – Intensive offline / compile-time optimization and pre-scheduling.
• GRAMPS expects dynamic data-dependent execution, (and thus) run-time scheduling
  – Also, GRAMPS assumes commodity and heterogeneity.
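The link between data-dependent execution and run-time scheduling can be illustrated with a small Python sketch. It is not the GRAMPS scheduler: the `run` loop, the stage functions, and the "bounces" field are all hypothetical. The `shade` stage either emits a pixel or pushes a new ray depending on its input, so the amount of work per queue cannot be fixed ahead of time, and the loop decides at each step which stage has input available.

```python
from collections import deque


def run(queues, stages):
    """Repeatedly pick, at run time, a stage whose input queue has work.

    queues: dict mapping queue name -> deque
    stages: list of (input_queue_name, fn); fn(item) returns a list of
            (output_queue_name, item) pairs whose length depends on the data.
    Later stages are tried first so downstream work drains before more
    upstream work starts. Returns the number of stage invocations.
    """
    steps = 0
    while True:
        for in_name, fn in reversed(stages):
            if queues[in_name]:
                item = queues[in_name].popleft()
                for out_name, out_item in fn(item):
                    queues[out_name].append(out_item)
                steps += 1
                break
        else:
            return steps  # no stage had input: the graph has drained


def trace(ray):
    ray_id, bounces = ray
    return [("hits", (ray_id, bounces))]


def shade(hit):
    ray_id, bounces = hit
    if bounces > 0:
        # Data-dependent push: spawn a secondary ray instead of finishing.
        return [("rays", (ray_id, bounces - 1))]
    return [("pixels", ray_id)]


def demo():
    queues = {
        "rays": deque([(0, 2), (1, 0)]),
        "hits": deque(),
        "pixels": deque(),
    }
    steps = run(queues, [("rays", trace), ("hits", shade)])
    return list(queues["pixels"]), steps
```

In `demo()`, ray 1 finishes before ray 0 because ray 0 bounces twice. A pre-computed static schedule would need to know those bounce counts in advance.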

7

GRAMPS Examples
[Figure: two example graphs.]
• Rasterization Pipeline: stages Rast, Shade, FB Blend; queues Input Fragment Queue, Output Fragment Queue
• Ray Tracing Pipeline: stages Camera, Intersect, Shade, FB Blend; queues Ray Queue, Sample Queue, Pixel Queue

8

GRAMPS Overview
• Concepts:
  – Graphs
  – Stages: thread, shader, fixed-function
  – Queues: ordered, unordered, sets (exclusion)
  – Buffers
• Components
  – APIs: setup/driver, thread, shader
  – Scheduler: fat core, shader core, top-level
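The slide names the queue kinds without defining them. As one plausible reading, an ordered queue might make results visible in the order producers reserved their slots, even when they finish out of order, while an unordered queue exposes results as they arrive. The Python sketch below illustrates that interpretation only. The `reserve` / `commit` / `drain` names are assumptions, and queue sets (exclusion) are not modeled.

```python
class OrderedQueue:
    """Items become visible to the consumer in reservation order."""

    def __init__(self):
        self._next_ticket = 0    # next ticket handed to a producer
        self._next_visible = 0   # lowest ticket not yet released
        self._pending = {}       # committed items waiting on earlier tickets
        self._ready = []         # items the consumer may take

    def reserve(self):
        ticket = self._next_ticket
        self._next_ticket += 1
        return ticket

    def commit(self, ticket, item):
        self._pending[ticket] = item
        # Release the longest run of consecutive committed tickets.
        while self._next_visible in self._pending:
            self._ready.append(self._pending.pop(self._next_visible))
            self._next_visible += 1

    def drain(self):
        out, self._ready = self._ready, []
        return out


class UnorderedQueue:
    """Items become visible as soon as they are committed."""

    def __init__(self):
        self._ready = []

    def reserve(self):
        return None  # no ordering to track

    def commit(self, ticket, item):
        self._ready.append(item)

    def drain(self):
        out, self._ready = self._ready, []
        return out


def demo(queue):
    # Two producers reserve in order a, b but finish in order b, a.
    ticket_a = queue.reserve()
    ticket_b = queue.reserve()
    queue.commit(ticket_b, "b")
    early = queue.drain()
    queue.commit(ticket_a, "a")
    late = queue.drain()
    return early, late
```

With `OrderedQueue`, nothing is visible until the first reservation commits. With `UnorderedQueue`, "b" is visible immediately. The ordered form costs some buffering in exchange for deterministic output order.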

9

What We’ve Built
• Three rendering pipelines:
  – Direct3D, Packet Tracer, D3D + Push (Hybrid)
• Simulator and Runtime for two machines:
  – GPU-like: Many threads per core, hardware scheduling
  – CPU-like: Few threads per core, software scheduling

10

Rendering Pipelines
[Figure: two GRAMPS graphs with a legend.]
• Direct3D Pipeline (with Ray-tracing Extension): stages IA 1..N, VS 1..N, RO, Rast, PS, OM, Trace, PS2; queues Input Vertex Queue 1..N, Primitive Queue 1..N, Primitive Queue, Fragment Queue, Sample Queue Set, Ray Queue, Ray Hit Queue
• Ray-tracing Pipeline: stages Tiler, Sampler, Camera, Intersect, Shade, FB Blend; queues Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue
• Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output, Output via Push

11

Initial Results
• Measured thread occupancy, worst case total queue memory.
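The slide does not define the two metrics. The Python sketch below shows one reasonable way they could be computed from a per-tick simulator trace. The function names, the trace layout, and the numbers in the usage note are all assumptions made for illustration, not results from the talk.

```python
def thread_occupancy(busy_slots_per_tick, total_slots):
    """Fraction of hardware thread slots doing work, averaged over all ticks.

    busy_slots_per_tick: list with the number of busy slots at each tick
    total_slots: number of thread slots the machine provides
    """
    if not busy_slots_per_tick or total_slots <= 0:
        raise ValueError("need at least one tick and a positive slot count")
    return sum(busy_slots_per_tick) / (len(busy_slots_per_tick) * total_slots)


def worst_case_queue_memory(queue_bytes_per_tick):
    """Peak, over time, of the summed footprint of all queues.

    queue_bytes_per_tick: list of dicts mapping queue name -> bytes in use
    """
    return max(sum(sample.values()) for sample in queue_bytes_per_tick)


def sum_of_per_queue_peaks(queue_bytes_per_tick):
    """Upper bound obtained by sizing every queue for its own peak."""
    peaks = {}
    for sample in queue_bytes_per_tick:
        for name, used in sample.items():
            peaks[name] = max(peaks.get(name, 0), used)
    return sum(peaks.values())
```

For a made-up trace `[{"rays": 64, "hits": 0}, {"rays": 32, "hits": 96}, {"rays": 0, "hits": 16}]`, the worst-case total is 128 bytes while the sum of per-queue peaks is 160 bytes. The gap shows why a total measured across all queues at once can be tighter than sizing each queue independently.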

12

GRAMPS Vis

13

High-level Challenges
• Is GRAMPS a suitable GPU evolution?
  – Enable pipeline competitive with bare metal?
  – Enable innovation: advanced / alternative methods?
  – Is there a ‘best’ graphics pipeline on top?
• Is GRAMPS a good parallel compute model?
  – Map well to hardware, hardware trends?
  – Support important apps?
  – Concepts influence developers?

14

What’s Next?
• Low level implementation: scheduling, more accurate simulation.
• More apps: REYES, physics, likely more.
• Audit and refine model: graph modification / state change, fork-join / blocking calls, locks / barriers / synchronization primitives intra- or inter-stage
• Prototype, explore next generation graphics pipelines.