Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian...
![Page 1: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/1.jpg)
Many-Core Programming with GRAMPS & “Real Time REYES”
Jeremy Sugerman, Kayvon Fatahalian
Stanford University
June 12, 2008
![Page 2: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/2.jpg)
Background, Outline
Stanford Graphics / Architecture Research
CPU, GPU trends
And collision?
Two research areas:
– HW/SW Interface, Programming Model
– Future Graphics API
![Page 3: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/3.jpg)
Problem Statement
Drive efficient development and execution in many-/multi-core systems.
Support homogeneous, heterogeneous cores.
Inform future hardware
Status Quo:
– GPU Pipeline (Good for GL, otherwise hard)
– CPU (No guidance, fast is hard)
![Page 4: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/4.jpg)
GRAMPS
Software defined graphs
Producer-consumer, data-parallelism
Initial focus on rendering
[Figure: two example GRAMPS graphs.
Rasterization Pipeline: stages Rasterize, Shade, FB Blend; queues Input Fragment Queue, Output Fragment Queue.
Ray Tracing Pipeline: stages Camera, Intersect, Shade, FB Blend; queues Ray Queue, Ray Hit Queue, Fragment Queue.
Legend: Thread Stage, Shader Stage, Fixed-func Stage, Queue, Stage Output.]
![Page 5: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/5.jpg)
As a GPU Evolution
Not (too) radical for ‘graphics’
Like fixed → programmable shading
– Pipeline undergoing massive shake up
– Diversity of new parameters and use cases
Bigger picture than ‘graphics’
– Rendering is more than GL/D3D
– Compute is more than rendering
– Larrabee has no innate pipeline
![Page 6: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/6.jpg)
As a Compute Evolution
Sounds like streaming:
Execution graphs, kernels, data-parallelism
Streaming: “squeeze out every FLOP”
– Goals: bulk transfer, arithmetic intensity
– Intensive static analysis, custom chips (mostly)
– Bounded space, data access, execution time
GRAMPS: “interesting apps are irregular”
– Goals: Dynamic, data-dependent code
– Aggregate work at run-time
– Heterogeneous commodity platforms
– Naturally supports streaming when applicable
![Page 7: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/7.jpg)
GRAMPS’ Role
A ‘graphics pipeline’ is now an app!
GRAMPS models parallel state machines.
Compared to status quo:
– More flexible than a GPU pipeline
– More guidance than bare metal
– Portability in between
– Not domain specific
![Page 8: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/8.jpg)
GRAMPS Interfaces
Host/Setup: Create execution graph
Thread: Stateful, singleton
Shader: Data-parallel, auto-instanced
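The three interfaces above can be illustrated with a toy, single-threaded sketch. This is not the GRAMPS API: every name here (`ThreadStage`, `ShaderStage`, `run_graph`, the queue names, and the choice of which stage is a thread or a shader) is hypothetical, and a real runtime would run shader instances in parallel instead of draining queues serially.

```python
from collections import deque

class ThreadStage:
    """Stateful singleton: one instance that keeps state across inputs."""
    def __init__(self, name, fn, state):
        self.name, self.fn, self.state = name, fn, state

    def run(self, item, emit):
        self.fn(self.state, item, emit)

class ShaderStage:
    """Data-parallel: a stateless kernel, instanced once per input item."""
    def __init__(self, name, kernel):
        self.name, self.kernel = name, kernel

    def run(self, item, emit):
        self.kernel(item, emit)

def run_graph(stages, queues, edges):
    """Serially drain the graph; edges maps stage name -> (input queue, output queue or None)."""
    progress = True
    while progress:
        progress = False
        for stage in stages:
            in_q, out_q = edges[stage.name]
            emit = queues[out_q].append if out_q else (lambda _item: None)
            while queues[in_q]:
                stage.run(queues[in_q].popleft(), emit)
                progress = True

# Host/Setup: describe the execution graph (stages, queues, connections).
def camera(_state, pixel, emit):
    emit({"pixel": pixel, "dir": pixel * 0.5})

def intersect(ray, emit):
    # Data-dependent output: only some rays produce a hit.
    if ray["dir"] >= 1.0:
        emit({"pixel": ray["pixel"], "color": ray["dir"]})

def fb_blend(framebuffer_state, hit, _emit):
    framebuffer_state[hit["pixel"]] = hit["color"]

framebuffer = {}
queues = {"pixels": deque(range(4)), "rays": deque(), "hits": deque()}
stages = [
    ThreadStage("camera", camera, None),
    ShaderStage("intersect", intersect),
    ThreadStage("fb_blend", fb_blend, framebuffer),
]
edges = {
    "camera": ("pixels", "rays"),
    "intersect": ("rays", "hits"),
    "fb_blend": ("hits", None),
}
run_graph(stages, queues, edges)
```

The point of the sketch is the split of responsibilities: the host only declares the graph, the thread stage owns mutable state (the framebuffer), and the shader stage sees one item at a time, so a scheduler is free to instance it as widely as the hardware allows.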
![Page 9: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/9.jpg)
What We’ve Built (System)
![Page 10: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/10.jpg)
GRAMPS Scheduler
Tiered Scheduler
‘Fat’ cores: per-thread, per-core
‘Micro’ cores: shared hw scheduler
Top level: tier N
![Page 11: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/11.jpg)
What We’ve Built (Apps)
[Figure: two GRAMPS execution graphs.
Direct3D Pipeline (with Ray-tracing Extension): stages IA 1 … IA N, VS 1 … VS N, RO, Rast, PS, OM, plus Trace and PS2 in the ray-tracing extension; queues Input Vertex Queue 1 … N, Primitive Queue 1 … N, Primitive Queue, Sample Queue Set, Fragment Queue, Ray Queue, Ray Hit Queue.
Ray-tracing Pipeline: stages Camera, Sampler, Tiler, Intersect, Shade, FB Blend; queues Tile Queue, Sample Queue, Ray Queue, Ray Hit Queue, Fragment Queue.
Legend: Thread Stage, Shader Stage, Fixed-func, Queue, Stage Output, Push Output.]
![Page 12: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/12.jpg)
Initial Results
Queues are small, utilization is good
![Page 13: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/13.jpg)
GRAMPS Visualization
![Page 14: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/14.jpg)
GRAMPS Visualization
![Page 15: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/15.jpg)
GRAMPS Portability
Portability really means performance.
Less portable than GL/D3D
– GRAMPS graph is hardware sensitive
More portable than bare metal
– Enforces modularity
– Best case, just works
– Worst case, saves boilerplate
![Page 16: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/16.jpg)
High-level Challenges
Is GRAMPS a suitable GPU evolution?
– Enable pipeline competitive with bare metal?
– Enable innovation: advanced / alternative methods?
Is GRAMPS a good parallel compute model?
– Map well to hardware, hardware trends?
– Support important apps?
– Concepts influence developers?
![Page 17: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/17.jpg)
What’s Next for GRAMPS?
Implementation: scheduling, simulation details
Model:
– Graph modification (state change)
– Blocking calls (join)
– Intra/inter-stage synchronization primitives
– Data sharing / ref-counting
Workloads: REYES, physics, others?
Develop new graphics pipelines…
![Page 18: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/18.jpg)
“Real-Time REYES”
![Page 19: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/19.jpg)
Just Build It
Build a real-time REYES pipeline...
… that is tightly integrated with ray tracing for global effects.
![Page 20: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/20.jpg)
What does real-time REYES mean? (to us)
Smooth surfaces via adaptive tessellation
– Everything is a displaced subdivision surface
Shade on surface, prior to rasterization
Stochastic rasterization for motion blur and DOF
Order-independent transparency
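The slides do not say how transparency is resolved. As one common illustration of the idea, the sketch below buffers every fragment that lands on a pixel and composites them by depth at resolve time, so the result does not depend on submission order. The fragment layout `(depth, (r, g, b), alpha)` and the function name `resolve_pixel` are assumptions made for this example.

```python
def resolve_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Composite (depth, (r, g, b), alpha) fragments front-to-back.

    Fragments may arrive in any order; sorting by depth at resolve time
    is what makes the result order-independent.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for _depth, rgb, alpha in sorted(fragments, key=lambda frag: frag[0]):
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha
    # Whatever light is left over comes from the background.
    return tuple(color[c] + transmittance * background[c] for c in range(3))

# Same two fragments, submitted in two different orders.
red_near = (1.0, (1.0, 0.0, 0.0), 0.5)
blue_far = (2.0, (0.0, 0.0, 1.0), 1.0)
first = resolve_pixel([blue_far, red_near])
second = resolve_pixel([red_near, blue_far])
```

Both calls produce the same pixel color, which is the property the slide asks for; the cost is per-pixel storage for an unbounded fragment list.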
![Page 21: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/21.jpg)
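[Figure: side-by-side pipeline comparison (stage labels as extracted; ordering within each column is not fully recoverable).
REYES: Split, Dice, Displace, Shade, Rasterize, Early Z, Z Test, Blend/Resolve.
OpenGL/Direct3D: Tessellate (xbox), Vertex Shade, Rasterize, Early Z, Frag Shade, Z Test, Blend/Resolve.]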
![Page 22: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/22.jpg)
REYES Tessellation
Split primitive into smaller primitives until a “GOOD” grid can be created.
![Page 23: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/23.jpg)
![Page 24: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/24.jpg)
![Page 25: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/25.jpg)
![Page 26: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/26.jpg)
Grids
GOOD GRID =
- Max polygon area < 1 pixel
- All polys about the same size
- Bounded # polys per grid
Regular parametric sampling of primitive surface (like XBox360).
Compact representation for many adjacent polygons.
Grids provide SIMD efficiency and bulk processing benefits.
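The split-until-good-grid loop can be sketched on a deliberately simplified primitive. Assumptions, none of which come from the slides: a primitive is stood in for by an axis-aligned screen-space rectangle, the per-grid polygon bound `MAX_GRID_POLYS` is an arbitrary 256, and splitting always halves the longer axis. Only the 1-pixel area limit is taken from the "GOOD GRID" criteria.

```python
import math
from collections import deque

MAX_GRID_POLYS = 256  # hypothetical bound on polygons per grid
MAX_POLY_AREA = 1.0   # pixels, from the "GOOD GRID" criteria

def dice_rate(width, height):
    """Pick (nu, nv) so every quad of the regular grid covers < MAX_POLY_AREA."""
    side = math.sqrt(MAX_POLY_AREA)
    nu = math.floor(width / side) + 1   # guarantees width / nu < side
    nv = math.floor(height / side) + 1  # guarantees height / nv < side
    return nu, nv

def split_and_dice(x0, y0, x1, y1):
    """Return grids (x0, y0, x1, y1, nu, nv) that together cover the rectangle."""
    grids = []
    work = deque([(x0, y0, x1, y1)])
    while work:
        ax, ay, bx, by = work.popleft()
        w, h = bx - ax, by - ay
        nu, nv = dice_rate(w, h)
        if nu * nv <= MAX_GRID_POLYS:
            # Good grid: small polygons, uniform size, bounded count.
            grids.append((ax, ay, bx, by, nu, nv))
        elif w >= h:
            mx = (ax + bx) / 2  # too many polygons: split the longer axis
            work.extend([(ax, ay, mx, by), (mx, ay, bx, by)])
        else:
            my = (ay + by) / 2
            work.extend([(ax, ay, bx, my), (ax, my, bx, by)])
    return grids

grids = split_and_dice(0.0, 0.0, 64.0, 32.0)
```

In this toy the "all polys about the same size" criterion holds trivially because each grid is a uniform subdivision of a rectangle; on a real curved, displaced surface that check is where most of the difficulty lies.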
![Page 27: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/27.jpg)
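[Figure: side-by-side pipeline comparison, revised (stage labels as extracted; ordering within each column is not fully recoverable).
REYES: Split, Dice, Displace, Shade, Rast/Crack Fix, Early Z, Z Test, Blend/Resolve.
OpenGL/Direct3D: Tessellate (xbox), Vertex Shade, Rast, Early Z, Frag Shade, Z Test, Blend/Resolve.]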
![Page 28: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/28.jpg)
What does real-time REYES mean? (to us)
Smooth surfaces via adaptive tessellation
– Splitting is irregular (and serial)
– Crack fixing
Shade on surface, prior to rasterization
– We feel confident about this
– But most “work” done before moving to raster space… hmm
Stochastic rasterization for motion blur and DOF
– Many tiny polygons parallel rasterization
– SIMD tricky
Order-independent transparency
– Not unique to REYES
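A minimal sketch of what stochastic rasterization for motion blur looks like, under strong simplifying assumptions that are not from the slides: the micropolygon is an axis-aligned quad moving linearly over the shutter interval, each sample draws its own random position and time, and coverage is the fraction of samples that hit. Depth of field would add a random lens position per sample in the same way. The function name `coverage` and its parameters are hypothetical.

```python
import random

def coverage(pixel_x, pixel_y, quad_open, quad_close, samples=64, seed=0):
    """Estimate motion-blurred coverage of one pixel by a moving quad.

    quad_open / quad_close are (x0, y0, x1, y1) at shutter open / close.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Each sample has its own sub-pixel position and its own time.
        sx = pixel_x + rng.random()
        sy = pixel_y + rng.random()
        t = rng.random()
        # Interpolate the quad to the sample's time, then do a 2D inside test.
        x0, y0, x1, y1 = (a + t * (b - a) for a, b in zip(quad_open, quad_close))
        if x0 <= sx < x1 and y0 <= sy < y1:
            hits += 1
    return hits / samples

static_quad = (0.0, 0.0, 1.0, 1.0)
moved_quad = (4.0, 0.0, 5.0, 1.0)
covered = coverage(0, 0, static_quad, static_quad)
blurred = coverage(2, 0, static_quad, moved_quad, samples=1024)
```

The per-sample time is what makes SIMD mapping awkward: neighbouring samples test the same quad at different positions, so the inside test cannot be shared across a pixel the way it is in a static rasterizer.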
![Page 29: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/29.jpg)
Shading in a Hybrid System
Evaluate displacement (due to REYES or on demand for ray tracing)
Shade grids
Shade ray hits
Looking forward… shade quads too?
One shading system or two or three?
![Page 30: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/30.jpg)
This Project is Really About
Re-architecting REYES pipeline for real-time performance (for throughput architectures like LRB)
Hybrid rendering: study interoperability of advanced techniques (REYES + ray tracing + maybe Direct3D)
– Hybrid shading system
– Understand workload balance
Hybrid pipeline interface: real-time, retained mode
Pursuit of more flexible, advanced graphics pipelines
![Page 31: Many-Core Programming with GRAMPS & “Real Time REYES” Jeremy Sugerman, Kayvon Fatahalian Stanford University June 12, 2008.](https://reader030.fdocuments.us/reader030/viewer/2022020716/56649d695503460f94a47775/html5/thumbnails/31.jpg)
Questions?