Commodity Components 1 Greg Humphreys October 10, 2002.
![Page 1: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/1.jpg)
Commodity Components 1
Greg Humphreys
October 10, 2002
![Page 2: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/2.jpg)
Life Is But a (Graphics) Stream
[Diagram: a network of nodes labeled C, T, and S]
“There’s a bug somewhere…”
![Page 3: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/3.jpg)
You Spent How Much Money On What Exactly?
![Page 4: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/4.jpg)
Several Years of Failed Experiments
1975-1980
1982-1986
1980-1982
1986-Present
![Page 5: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/5.jpg)
The Problem
Scalable graphics solutions are rare and expensive
Commodity technology is getting faster
But it tends not to scale
Cluster graphics solutions have been inflexible
![Page 6: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/6.jpg)
Big Models
Scans of Saint Matthew (386 MPolys) and the David (2 GPolys), Stanford Digital Michelangelo Project
![Page 7: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/7.jpg)
![Page 8: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/8.jpg)
Large Displays
Window system and large-screen interaction metaphors
François Guimbretière, Stanford University HCI group
![Page 9: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/9.jpg)
Modern Graphics Architecture
NVIDIA GeForce4 Ti 4600
Amazing technology:
• 4.8 Gpix/sec (antialiased)
• 136 Mtri/sec
• Programmable pipeline stages
Capabilities increasing at roughly 225% per year
But it doesn’t scale:
• Triangle rate is bus-limited – 136 Mtri/sec mostly unachievable
• Display resolution growing very slowly
Result: There is a serious gap between dataset complexity and processing power
GeForce4 Die Plot Courtesy NVIDIA
![Page 10: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/10.jpg)
Why Clusters?
Commodity parts
• Complete graphics pipeline on a single chip
• Extremely fast product cycle
Flexibility
• Configurable building blocks
Cost
• Driven by consumer demand
• Economies of scale
Availability
• Insufficient demand for “big iron” solutions
• Little or no ongoing innovation in graphics “supercomputers”
![Page 11: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/11.jpg)
Stanford/DOE Visualization Cluster
32 nodes, each with graphics
Compaq SP750
• Dual 800 MHz PIII Xeon
• i840 logic
• 256 MB memory
• 18 GB disk
• 64-bit 66 MHz PCI
• AGP-4x
Graphics
• 16 NVIDIA Quadro2 Pro
• 16 NVIDIA GeForce 3
Network
• Myrinet (LANai 7 ~ 100 MB/sec)
![Page 12: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/12.jpg)
Virginia Cluster?
![Page 13: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/13.jpg)
Ideas
Technology for driving tiled displays
• Unmodified applications
• Efficient network usage
Scalable rendering rates on clusters
• 161 Mtri/sec at interactive rates to a display wall
• 1.6 Gvox/sec at interactive rates to a single display
Cluster graphics as stream processing
• Virtual graphics interface
• Flexible mechanism for non-invasive transformations
![Page 14: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/14.jpg)
The Name Game
One idea, two systems:
WireGL
• Sort-first parallel rendering for tiled displays
• Released to the public in 2000
Chromium
• General stream processing framework
• Multiple parallel rendering architectures
• Open-source project started in June 2001
• Alpha release September 2001
• Beta release April 2002
• 1.0 release September 2002
![Page 15: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/15.jpg)
General Approach
Replace system’s OpenGL driver
• Industry standard API
• Support existing unmodified applications
Manipulate streams of API commands
• Route commands over a network
• Track state!
• Render commands using graphics hardware
Allow parallel applications to issue OpenGL
• Constrain ordering between multiple streams
![Page 16: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/16.jpg)
Cluster Graphics
• Raw scalability is easy (just add more pipelines)
• One of our goals is to expose that scalability to an application
[Diagram: four independent pipelines, each App → Graphics Hardware → Display]
![Page 17: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/17.jpg)
Cluster Graphics
• Graphics hardware is indivisible
• Each graphics pipeline managed by a network server
[Diagram: four pipelines, each App → Server → Display]
![Page 18: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/18.jpg)
Cluster Graphics
• Flexible number of clients, servers and displays
• Compute limited = more clients
• Graphics limited = more servers
• Interface/network limited = more of both
[Diagram: three Apps, four Servers, and two Displays]
![Page 19: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/19.jpg)
Output Scalability
Larger displays with unmodified applications
Other possibilities: broadcast, ring network
[Diagram: one App feeding several Servers, each Server driving its own Display]
![Page 20: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/20.jpg)
Protocol Design
1 byte overhead per function call

    glColor3f( 1.0, 0.5, 0.5 );
    glVertex3f( 1.0, 2.0, 3.0 );
    glColor3f( 0.5, 1.0, 0.5 );
    glVertex3f( 2.0, 3.0, 1.0 );

| Opcodes (1 byte each) | Data (12 bytes each) |
|---|---|
| COLOR3F | 1.0 0.5 0.5 |
| VERTEX3F | 1.0 2.0 3.0 |
| COLOR3F | 0.5 1.0 0.5 |
| VERTEX3F | 2.0 3.0 1.0 |
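The 1-byte-opcode, 12-byte-payload encoding can be sketched with Python's `struct` module. This is only an illustration: the opcode numbers are made up, and the sketch interleaves each opcode with its data for simplicity, which may not match how WireGL lays out its buffers.

```python
import struct

# Hypothetical opcode numbers, chosen only for this sketch.
COLOR3F_OP = 1
VERTEX3F_OP = 2


def pack_call(opcode, x, y, z):
    """Pack one call as a 1-byte opcode plus three 32-bit floats (12 bytes)."""
    # "<" disables padding, so the record is exactly 1 + 12 = 13 bytes.
    return struct.pack("<B3f", opcode, x, y, z)


# The four calls from the slide, packed back to back.
stream = b"".join([
    pack_call(COLOR3F_OP, 1.0, 0.5, 0.5),
    pack_call(VERTEX3F_OP, 1.0, 2.0, 3.0),
    pack_call(COLOR3F_OP, 0.5, 1.0, 0.5),
    pack_call(VERTEX3F_OP, 2.0, 3.0, 1.0),
])

# Decode the second record to check the round trip.
second_call = struct.unpack_from("<B3f", stream, 13)
```

Each call costs 13 bytes on the wire here, of which only one byte is overhead.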
![Page 22: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/22.jpg)
Efficient Remote Rendering

| | Network | Mvert/sec | Efficiency |
|---|---|---|---|
| Direct | None | 21.50 | |
| GLX | 100 Mbit | 0.73 | 70% |
| WireGL | 100 Mbit | 0.90 | 86% |
| WireGL | Myrinet | 9.18 | 88% |
| WireGL | None* | 20.90 | 97% |

Application draws 60,000 vertices/frame
Measurements using 800 MHz PIII + GeForce2
Efficiency assumes 12 bytes per triangle
*None: discard packets, measuring pack rate
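The efficiency column can be checked with a little arithmetic. The sketch below assumes that efficiency means payload bandwidth divided by link bandwidth, with 12 bytes sent per vertex, and that the "None*" row is the pack rate relative to direct rendering. These readings are assumptions, but they reproduce the 70%, 86%, and 97% figures. The Myrinet row is left out because the slide does not state the link bandwidth it was measured against.

```python
BYTES_PER_VERTEX = 12  # the slide's stated payload size (three 32-bit floats)


def link_efficiency(mvert_per_sec, link_mbit_per_sec):
    """Fraction of the link consumed by vertex payload alone."""
    payload_mbit_per_sec = mvert_per_sec * BYTES_PER_VERTEX * 8
    return payload_mbit_per_sec / link_mbit_per_sec


glx_eff = link_efficiency(0.73, 100)      # GLX over 100 Mbit
wiregl_eff = link_efficiency(0.90, 100)   # WireGL over 100 Mbit
pack_eff = 20.90 / 21.50                  # pack rate vs. direct rendering

for name, eff in [("GLX", glx_eff), ("WireGL", wiregl_eff), ("pack", pack_eff)]:
    print(f"{name}: {eff:.1%}")
```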
![Page 27: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/27.jpg)
Sort-first Stream Specialization
Update bounding box per-vertex
Transform bounds to screen-space
Assign primitives to servers (with overlap)
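A minimal sketch of the bucketing steps, assuming vertices are already in screen space (the transform step is omitted) and the display is a regular grid of equal-sized tiles. `ScreenBounds` and `tiles_for_bounds` are names invented for this example.

```python
class ScreenBounds:
    """Axis-aligned bounding box, grown one vertex at a time."""

    def __init__(self):
        self.xmin = self.ymin = float("inf")
        self.xmax = self.ymax = float("-inf")

    def add_vertex(self, x, y):
        self.xmin = min(self.xmin, x)
        self.ymin = min(self.ymin, y)
        self.xmax = max(self.xmax, x)
        self.ymax = max(self.ymax, y)


def tiles_for_bounds(b, tile_w, tile_h, cols, rows):
    """Return (col, row) for every tile the box overlaps.

    Expects a box that has received at least one vertex.
    """
    c0 = max(0, int(b.xmin // tile_w))
    c1 = min(cols - 1, int(b.xmax // tile_w))
    r0 = max(0, int(b.ymin // tile_h))
    r1 = min(rows - 1, int(b.ymax // tile_h))
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# A small triangle straddling the corner shared by four 256x256 tiles.
bounds = ScreenBounds()
for x, y in [(250, 250), (260, 250), (255, 260)]:
    bounds.add_vertex(x, y)
overlapped = tiles_for_bounds(bounds, 256, 256, cols=8, rows=4)
```

The triangle is sent to all four servers whose tiles it touches, which is the overlap cost of sort-first.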
![Page 28: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/28.jpg)
Graphics State
OpenGL is a big state machine
State encapsulates control for geometric operations
• Lighting/shading parameters
• Texture maps and texture mapping parameters
• Boolean enables/disables
• Rendering modes
Example: glColor3f( 1.0, 1.0, 1.0 )
• Sets the current color to white
• Any subsequent primitives will appear white
![Page 29: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/29.jpg)
Lazy State Update
Track entire OpenGL state
Precede a tile’s geometry with state deltas
glTexImage2D(…)
glBlendFunc(…)
glEnable(…)
glLightfv(…)
glMaterialf(…)
glEnable(…)
Ian Buck, Greg Humphreys and Pat Hanrahan, Graphics Hardware Workshop 2000
![Page 30: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/30.jpg)
How Does State Tracking Work?
Tracking state is a no-brainer, it’s the frequent context differences that complicate things
Need to quickly find the elements that are different
Represent state as a hierarchy of dirty bits
18 top-level categories: buffer, transformation, lighting, texture, stencil, etc.
Actually, use dirty bit-vectors. Each bit corresponds to a rendering server
![Page 31: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/31.jpg)
Inside State Tracking
glLightf( GL_LIGHT1, GL_SPOT_CUTOFF, 45)
[Diagram: hierarchy of dirty bit-vectors, 8 bits each; 1 = dirty]
Top level: Transformation, Pixel, Lighting, Texture, Buffer, Fog, Line, Polygon, Viewport, Scissor
• Lighting: 1 1 1 1 1 1 1 1 (all others 0 0 0 0 0 0 0 0)
Lighting: Light 0, Light 1, Light 2, Light 3, Light 4, Light 5
• Light 1: 1 1 1 1 1 1 1 1 (all others 0 0 0 0 0 0 0 0)
Light 1: Ambient color (0,0,0,1), Diffuse color (1,1,1,1), Spot cutoff 180 → 45
• Spot cutoff: 1 1 1 1 1 1 1 1 (all others 0 0 0 0 0 0 0 0)
![Page 32: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/32.jpg)
Context Comparison
Client State: (0,0,0,1), (1,1,1,1), 45
Server 2’s State: (0,0,0,1), (1,1,1,1), 180
At each level of the hierarchy: Bit 2 set?
• No, skip it
• Yes, drill down
At the leaves: Equal, Equal, Not Equal! Pack Command!
Packed command: LIGHTF_OP GL_LIGHT1 GL_SPOT_CUTOFF 45
Server 2’s State after the update: 45
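The dirty bit-vector walk can be sketched as follows. This is a two-level toy (category, then element) rather than the full hierarchy, and `StateTracker`, `set`, and `flush` are names invented here, not Chromium's API.

```python
class StateTracker:
    """Client-side state with one dirty bit per rendering server."""

    def __init__(self, num_servers, defaults):
        self.all_servers = (1 << num_servers) - 1
        self.client = {c: dict(e) for c, e in defaults.items()}
        # What each server is believed to hold.
        self.servers = [{c: dict(e) for c, e in defaults.items()}
                        for _ in range(num_servers)]
        self.cat_dirty = {c: 0 for c in defaults}
        self.elem_dirty = {c: {n: 0 for n in e} for c, e in defaults.items()}

    def set(self, cat, name, value):
        """Record a state change and mark it dirty for every server."""
        self.client[cat][name] = value
        self.cat_dirty[cat] = self.all_servers
        self.elem_dirty[cat][name] = self.all_servers

    def flush(self, server):
        """Return the commands needed to bring one server up to date."""
        bit = 1 << server
        commands = []
        for cat in self.cat_dirty:
            if not self.cat_dirty[cat] & bit:
                continue  # bit clear: skip the whole category
            for name in self.elem_dirty[cat]:
                if not self.elem_dirty[cat][name] & bit:
                    continue
                value = self.client[cat][name]
                if self.servers[server][cat][name] != value:
                    commands.append((cat, name, value))  # pack command
                    self.servers[server][cat][name] = value
                self.elem_dirty[cat][name] &= ~bit
            self.cat_dirty[cat] &= ~bit
        return commands


tracker = StateTracker(8, {
    "lighting": {"light1.ambient": (0, 0, 0, 1), "light1.spot_cutoff": 180},
    "fog": {"density": 1.0},
})
tracker.set("lighting", "light1.spot_cutoff", 45)
delta_for_server_2 = tracker.flush(2)
```

Only servers that actually receive geometry need to be flushed, which is what makes the update lazy.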
![Page 33: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/33.jpg)
Output Scalability Results
Marching Cubes
Point-to-point “broadcast” doesn’t scale at all
However, it’s still the commercial solution [SGI Cluster]
![Page 34: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/34.jpg)
Output Scalability Results
Quake III: Arena
Larger polygons overlap more tiles
![Page 35: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/35.jpg)
WireGL on the SIGGRAPH show floor
![Page 36: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/36.jpg)
Input Scalability
Parallel geometry extraction
Parallel data submission
Ordering?
[Diagram: several Apps feeding one Server, which drives one Display]
![Page 37: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/37.jpg)
Parallel OpenGL API
Express ordering constraints between multiple independent graphics contexts
Don’t block the application, just encode them like any other graphics command
Ordering is resolved by the rendering server
Introduce new OpenGL commands:
• glBarrierExec
• glSemaphoreP
• glSemaphoreV
Homan Igehy, Gordon Stoll and Pat Hanrahan, SIGGRAPH 98
![Page 38: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/38.jpg)
Serial OpenGL Example

    def Display:
        glClear(…)
        DrawOpaqueGeometry()
        DrawTransparentGeometry()
        SwapBuffers()
![Page 39: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/39.jpg)
Parallel OpenGL Example

    def Init:
        glBarrierInit(barrier, 2)
        glSemaphoreInit(sema, 0)

    def Display:
        if my_client_id == 1:
            glClear(…)
        glBarrierExec(barrier)
        DrawOpaqueGeometry(my_client_id)
        glBarrierExec(barrier)
        if my_client_id == 1:
            glSemaphoreP(sema)
            DrawTransparentGeometry1()
        else:
            DrawTransparentGeometry2()
            glSemaphoreV(sema)
        glBarrierExec(barrier)
        if my_client_id == 1:
            SwapBuffers()

[Two items on this slide are annotated “Optional in Chromium”]
![Page 40: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/40.jpg)
Inside a Rendering Server
[Animation, pages 40-58: the server holds one command queue per client and executes commands from the two queues until both are empty]
Initial queues (next command first):
Client 1: Clear, Barrier, OpaqueGeom 1, Barrier, SemaP, TransparentGeom 1, Barrier, Swap
Client 2: Barrier, OpaqueGeom 1, Barrier, TransparentGeom 2, SemaV, Barrier
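How a server might resolve these ordering commands can be sketched with a small simulation. The round-robin policy, the function name `run_server`, and the command labels are assumptions made for illustration; this is not Chromium's actual scheduler.

```python
from collections import deque


def run_server(queues, barrier_size, sema_count=0):
    """Drain per-client queues, honoring one barrier and one semaphore.

    Returns the (client, command) pairs in the order they were rendered.
    """
    queues = {cid: deque(cmds) for cid, cmds in queues.items()}
    waiting = set()  # clients parked at the barrier
    executed = []
    while any(queues.values()):
        progressed = False
        for cid, q in queues.items():
            # Run this client's stream until it blocks or runs dry.
            while q and cid not in waiting:
                cmd = q[0]
                if cmd == "SemaP" and sema_count == 0:
                    break  # blocked: context-switch to another client
                q.popleft()
                progressed = True
                if cmd == "Barrier":
                    waiting.add(cid)
                    if len(waiting) == barrier_size:
                        waiting.clear()  # everyone arrived: release all
                elif cmd == "SemaP":
                    sema_count -= 1
                elif cmd == "SemaV":
                    sema_count += 1
                else:
                    executed.append((cid, cmd))
        if not progressed:
            raise RuntimeError("deadlock: no client can make progress")
    return executed


order = run_server({
    1: ["Clear", "Barrier", "OpaqueGeom 1", "Barrier",
        "SemaP", "TransparentGeom 1", "Barrier", "Swap"],
    2: ["Barrier", "OpaqueGeom 2", "Barrier",
        "TransparentGeom 2", "SemaV", "Barrier"],
}, barrier_size=2)
```

Whatever the interleaving, the barriers keep the clear ahead of all geometry and the semaphore keeps client 2's transparent geometry ahead of client 1's.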
![Page 59: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/59.jpg)
Input Scalability Results
Multiple clients, one server
Compute limited application
![Page 60: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/60.jpg)
A Fully Parallel Configuration
[Diagram: several Apps connected to several Servers, each Server driving its own Display]
![Page 61: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/61.jpg)
Fully Parallel Results
1-1 rate: 472 KTri/sec
16-16 rate: 6.2 MTri/sec
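As a quick worked check of what these two rates imply, the speedup and parallel efficiency can be computed directly. The sketch assumes the usual definition of efficiency as speedup divided by node count; the slide does not state these derived figures.

```python
rate_1_to_1 = 472e3      # triangles/sec with 1 client and 1 server
rate_16_to_16 = 6.2e6    # triangles/sec with 16 clients and 16 servers
nodes = 16

speedup = rate_16_to_16 / rate_1_to_1
efficiency = speedup / nodes

print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.0%}")
```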
![Page 62: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/62.jpg)
Peak Sort-First Rendering Performance
Immediate Mode
• Unstructured triangle strips
• 200 triangles/strip
Data changes every frame
Peak observed: 161,000,000 tris/second
Total scene size = 16,000,000 triangles
Total display size at 32 nodes: 2048x1024 (256x256 tiles)
![Page 63: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/63.jpg)
WireGL Image Reassembly
[Diagram: Apps send geometry over the network to Servers; Servers send imagery over the network to a Composite node, which drives the Display]
![Page 64: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/64.jpg)
WireGL Image Reassembly
[Diagram: Apps → (Parallel OpenGL) → Servers → (Parallel OpenGL) → Composite → Display]
![Page 65: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/65.jpg)
WireGL Image Reassembly
Composite == Server!
[Diagram: Apps → (Parallel OpenGL) → Servers → (Parallel OpenGL) → Server → Display]
![Page 66: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/66.jpg)
Image Reassembly in Hardware
Lightning-2 = Digital Video Switch from Intel Research
Route partial scanlines from n inputs to m outputs
Tile reassembly or depth compositing at full refresh rate
Lightning-2 and WireGL demonstrated at SIGGRAPH 2001
Gordon Stoll et al., SIGGRAPH 2001
![Page 67: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/67.jpg)
Example: 16-way Tiling of One Monitor
Reconstructed Image (4x4 tiles):
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Framebuffer 1: tiles 1, 3, 6, 8, 9, 11, 14, 16
Framebuffer 2: tiles 2, 4, 5, 7, 10, 12, 13, 15
Strip Headers
![Page 68: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/68.jpg)
WireGL Shortcomings
Sort-first
• Can be difficult to load-balance
• Screen-space parallelism limited
• Heavily dependent on spatial locality
Resource utilization
• Geometry must move over network every frame
• Server’s graphics hardware remains underutilized
We need something more flexible
[Diagram: nodes labeled A connected to nodes labeled S]
![Page 69: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/69.jpg)
Stream Processing
Streams:
• Ordered sequences of records
• Potentially infinite
Transformations:
• Process only the head element
• Finite local storage
• Can be partitioned across processors to expose parallelism
[Diagram: Stream Sources feeding Transform 1 nodes, then Transform 2, ..., ending in Stream Output]
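The stream model maps naturally onto Python generators. The sketch below is a generic illustration, not graphics code: `source`, `scale`, and `running_sum` are invented names, and each transformation sees only the head record and keeps a fixed amount of local state.

```python
from itertools import count, islice


def source():
    """A potentially infinite, ordered stream of records."""
    for i in count():
        yield i


def scale(stream, factor):
    """Stateless transformation: handles one head element at a time."""
    for record in stream:
        yield record * factor


def running_sum(stream):
    """Transformation with finite local storage (a single accumulator)."""
    total = 0
    for record in stream:
        total += record
        yield total


# Chain the transformations; nothing runs until records are pulled.
pipeline = running_sum(scale(source(), 2))
first_five = list(islice(pipeline, 5))
```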
![Page 70: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/70.jpg)
Why Stream Processing?
Elegant mechanism for dealing with huge data
• Explicitly expose and exploit parallelism
• Hide latency
State of the art in many fields:
• Databases [Terry92, Babu01]
• Telephony [Cortes00]
• Online Algorithms [Borodin98,O’Callaghan02]
• Sensor Fusion [Madden01]
• Media Processing [Halfhill00,Khailany01]
• Computer Architecture [Rixner98]
• Graphics Hardware [Owens00, NVIDIA, ATI]
![Page 71: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/71.jpg)
Cluster Graphics As Stream Processing
Treat OpenGL calls as a stream of commands
Form a DAG of stream transformation nodes
• Nodes are computers in a cluster
• Edges are OpenGL API communication
Each node has a serialization stage and a transformation stage
![Page 72: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/72.jpg)
Stream Serialization
Convert multiple streams into a single stream
Efficiently context-switch between streams
Constrain ordering using Parallel OpenGL API
Two kinds of serializers:
• Network server:
• Application:
• Unmodified serial application
• Custom parallel application
[Icons: S for the network server; A for the application, emitting OpenGL]
![Page 73: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/73.jpg)
Stream Transformation
Serialized stream is dispatched to “Stream Processing Units” (SPUs)
Each SPU is a shared library
• Exports a (partial) OpenGL interface
Each node loads a chain of SPUs at run time
SPUs are generic and interchangeable
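As a minimal sketch of the chaining idea, and not Chromium's actual loader, the C fragment below models a SPU as a table of function pointers plus a pointer to its child. Every name here (`SPU`, `build_chain`, `forward_clear`, `forward_swap`, `calls_seen`) is hypothetical, the interface is cut down to two entry points, and statically linked functions stand in for the shared libraries a real node would load at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced "partial OpenGL interface": two calls only. */
typedef struct SPU SPU;
struct SPU {
    const char *name;                         /* label, e.g. "render"        */
    SPU *child;                               /* next SPU in chain, or NULL  */
    void (*Clear)(SPU *self, unsigned mask);
    void (*SwapBuffers)(SPU *self);
    int calls_seen;                           /* illustration only           */
};

/* Generic pass-through behaviour: record the call, then hand it downstream. */
static void forward_clear(SPU *self, unsigned mask) {
    self->calls_seen++;
    if (self->child != NULL)
        self->child->Clear(self->child, mask);
}

static void forward_swap(SPU *self) {
    self->calls_seen++;
    if (self->child != NULL)
        self->child->SwapBuffers(self->child);
}

/* Link n SPUs into a chain in the order given by names[] and return the head.
 * Assumption: a real system would resolve each name to a shared library here;
 * this sketch installs the same pass-through functions in every SPU. */
SPU *build_chain(SPU *spus, const char *const *names, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        spus[i].name = names[i];
        spus[i].child = (i + 1 < n) ? &spus[i + 1] : NULL;
        spus[i].Clear = forward_clear;
        spus[i].SwapBuffers = forward_swap;
        spus[i].calls_seen = 0;
    }
    return &spus[0];
}
```

Because every SPU exposes the same table layout, the order in `names[]` is the only thing that determines the chain, which is one way to read “generic and interchangeable”.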
![Page 74: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/74.jpg)
Example: WireGL Revealed
[Diagram: several App nodes, each with a Tilesort SPU; several Server nodes, each with Readback and Send SPUs; one final Server node with a Render SPU]
![Page 75: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/75.jpg)
SPU Inheritance
The Readback and Render SPUs are related
• Readback renders everything except SwapBuffers
Readback inherits from the Render SPU
• Override parent’s implementation of SwapBuffers
• All OpenGL calls considered “virtual”
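One way to picture this kind of inheritance in plain C, offered as a sketch under assumptions rather than Chromium's real mechanism: the derived SPU starts from a copy of its parent's dispatch table and replaces a single slot. The names `DispatchTable`, `spu_inherit`, `make_render_table`, and `make_readback_table` are hypothetical, and each function only reports which implementation ran instead of touching OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Tag returned by each stub so a caller can see which SPU handled the call. */
enum Impl { IMPL_RENDER = 1, IMPL_READBACK = 2 };

/* Hypothetical three-entry slice of an OpenGL dispatch table. */
typedef struct {
    enum Impl (*Clear)(void);
    enum Impl (*DrawPixels)(void);
    enum Impl (*SwapBuffers)(void);
} DispatchTable;

static enum Impl render_clear(void)       { return IMPL_RENDER; }
static enum Impl render_draw_pixels(void) { return IMPL_RENDER; }
static enum Impl render_swap(void)        { return IMPL_RENDER; }
static enum Impl readback_swap(void)      { return IMPL_READBACK; }

/* The parent SPU fills every slot with its own implementation. */
DispatchTable make_render_table(void) {
    DispatchTable t = { render_clear, render_draw_pixels, render_swap };
    return t;
}

/* "Inheritance": the derived table begins as a copy of the parent's, so every
 * call behaves as if it were virtual and resolves to the parent by default. */
void spu_inherit(DispatchTable *derived, const DispatchTable *parent) {
    assert(derived != NULL && parent != NULL);
    *derived = *parent;
}

/* Readback overrides exactly one entry and inherits all the others. */
DispatchTable make_readback_table(void) {
    DispatchTable parent = make_render_table();
    DispatchTable t;
    spu_inherit(&t, &parent);
    t.SwapBuffers = readback_swap;
    return t;
}
```

Calling `make_readback_table().Clear()` reaches the Render stub, while `SwapBuffers()` reaches the Readback override.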
![Page 76: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/76.jpg)
Example: Readback’s SwapBuffers
Easily extended to include depth composite
All other functions inherited from Render SPU

void RB_SwapBuffers(void)
{
    self.ReadPixels( 0, 0, w, h, ... );
    if (self.id == 0)
        child.Clear( GL_COLOR_BUFFER_BIT );
    child.BarrierExec( READBACK_BARRIER );
    child.RasterPos2i( tileX, tileY );
    child.DrawPixels( w, h, ... );
    child.BarrierExec( READBACK_BARRIER );
    if (self.id == 0)
        child.SwapBuffers( );
}

[Slide callouts: two parts of the code are marked “Optional”]
![Page 77: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/77.jpg)
Virtual Graphics
Separate interface from implementation
Underlying architecture can change without application’s knowledge
Douglas Voorhies, David Kirk and Olin Lathrop, SIGGRAPH 88
![Page 78: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/78.jpg)
Example: Sort-Last
Application runs directly on graphics hardware
Same application can use sort-last or sort-first
[Diagram: several Application nodes, each with Readback and Send SPUs, feeding one Server node with a Render SPU]
![Page 79: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/79.jpg)
Example: Sort-Last Binary Swap
[Diagram: four Application nodes, each with Readback, BSwap, and Send SPUs, feeding one Server node with a Render SPU]
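The slide does not show how the BSwap SPU works internally. As a rough illustration only, the C sketch below follows the textbook binary-swap compositing schedule: in round k a node pairs with the node whose rank differs in bit k, the pair splits its current image region in half, and each keeps one half. The names `Region`, `binary_swap_partner`, and `binary_swap_final_region` are hypothetical, regions are split by scanline, and the rule that a node with bit k clear keeps the lower half is an assumed convention.

```c
#include <assert.h>

/* A horizontal band of the image: rows [first_row, first_row + row_count). */
typedef struct {
    int first_row;
    int row_count;
} Region;

/* Partner of `rank` in a given round: flip bit `round` of the rank. */
int binary_swap_partner(int rank, int round) {
    return rank ^ (1 << round);
}

/* Band of the image that `rank` owns once all rounds are complete.
 * num_nodes must be a power of two. Both partners in a round hold the same
 * band (their lower rank bits agree), so they keep complementary halves. */
Region binary_swap_final_region(int rank, int num_nodes, int height) {
    assert(num_nodes > 0 && (num_nodes & (num_nodes - 1)) == 0);
    assert(rank >= 0 && rank < num_nodes);
    assert(height >= 0);

    Region r = { 0, height };
    for (int round = 0; (1 << round) < num_nodes; round++) {
        int lower = r.row_count / 2;          /* rows in the lower half */
        int upper = r.row_count - lower;      /* rows in the upper half */
        if ((rank >> round) & 1) {
            r.first_row += lower;             /* keep upper, send lower */
            r.row_count = upper;
        } else {
            r.row_count = lower;              /* keep lower, send upper */
        }
    }
    return r;
}
```

With N nodes there are log2(N) rounds, and each node ends up owning roughly 1/N of the image, which it can then send on for final display.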
![Page 80: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/80.jpg)
Binary Swap Volume Rendering Results
[Figure panels: One node, Two nodes, Four nodes, Eight nodes, Sixteen nodes]
![Page 81: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/81.jpg)
Example: User Interface Reintegration
[Diagram: App with a Tilesort SPU; several Servers, each with an Integrate SPU; IBM Scalable Graphics Engine; IBM T221 Display (3840x2400). Link legend: Chromium Protocol, UDP/Gigabit Ethernet, Digital Video Cables]
Serial applications can drive the T221 with their original user interface
Parallel applications can have a user interface
![Page 82: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/82.jpg)
CATIA Driving IBM’s T221 Display
Jet engine nacelle model courtesy Goodrich Aerostructures
X-Windows Integration SPU by Peter Kirchner and Jim Klosowski, IBM T.J. Watson
Chromium is the only practical way to drive the T221 with an existing application
Demonstrated at Supercomputing 2001
[Image dimension labels: 3840 x 2400]
![Page 83: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/83.jpg)
A Hidden-line Style SPU
![Page 84: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/84.jpg)
A Hidden-line Style SPU
![Page 85: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/85.jpg)
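Demo
[Diagram: three Quake III configurations. (1) Quake III with a Render SPU. (2) Quake III with StateQuery, VertexArray, HiddenLine, and Render SPUs. (3) Quake III with StateQuery and Send SPUs, feeding a Server with HiddenLine and Render SPUs]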
![Page 86: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/86.jpg)
Is “HiddenLine” Really a SPU?
Technically, no!
Requires potentially unbounded resources
Alternate design:
[Diagram: Application with SQ, VA, and HiddenLine SPUs; two Servers, each with Readback and Send SPUs; one final Server with a Render SPU. Stream labels: Lines, Polygons, Depth Composite]
![Page 87: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/87.jpg)
Current Shortcomings
General display lists on tiled displays
• Display lists that affect the graphics state
Distributed texture management
• Each node must provide its own texture
• Potential N² texture explosion
• Virtualize distributed texture access?
Ease of creating parallel applications
• Input event management
![Page 88: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/88.jpg)
Future Directions
End-to-end visualization system for 4D data
• Data management and load balancing
• Volume compression
Remote/Ubiquitous Visualization
• Scalable graphics as a shared resource
• Transparent remote interaction with (parallel) apps
• Shift away from desktop-attached graphics
Taxonomy of non-invasive techniques
• Classify SPUs and algorithms
• Identify tradeoffs in design
![Page 89: Commodity Components 1 Greg Humphreys October 10, 2002.](https://reader035.fdocuments.us/reader035/viewer/2022062322/56649c7f5503460f94936141/html5/thumbnails/89.jpg)
Observations and Predictions
Manipulation of graphics streams is a powerful abstraction for cluster graphics
• Achieves both input and output scalability
Providing mechanisms instead of algorithms allows greater flexibility
• Data management algorithms can be built into a parallel application or embedded in a SPU
Flexible remote graphics will lead to a revolution in ubiquitous computing