ADVANCES IN OPTIX - GTC On-Demand Featured...
Transcript of ADVANCES IN OPTIX - GTC On-Demand Featured...
SAMPLE DEVICE CODERT_PROGRAM void dome_camera(){
size_t2 screen = output_buffer.size();
float2 d = make_float2(launch_index) / make_float2(screen)* make_float2(2.0f, 2.0f) - make_float2(1.0f, 1.0f);
float3 angle = make_float3(d.x, d.y, sqrtf(1.0f - (d.x*d.x + d.y*d.y)));float3 ray_origin = eye;float3 ray_direction = normalize(angle.x*normalize(U) +
angle.y*normalize(V) +angle.z*normalize(W));
optix::Ray ray(ray_origin, ray_direction, radiance_ray_type, scene_epsilon);
PerRayData_radiance prd;prd.importance = 1.f;prd.depth = 0;
rtTrace(top_object, ray, prd);
output_buffer[launch_index] = make_color(prd.result);}
OPTIX EXECUTION MODEL
rtContextLaunchException
Program
Selector Visit
Program
Miss
ProgramNode Graph
Traversal
Acceleration
Traversal
Launch
Traverse Shade
rtTrace
Closest Hit
Program
Any Hit
Program
Intersection
Program
Callable
Program
Ray Generation
Program
OPTIX ENCAPSULATES THE ALGORITHM
OptiX is a to-the-algorithm API
Processor
Algorithm
SoftwareTo-the-metal
To-the-algorithm
MAJOR ARCHITECTURAL RENOVATIONLLVM-based OptiX compiler
Better GPU ray tracing performance
More fluid interactive rendering
Better multi-GPU scaling
More efficient complex node graphs
Additional input languages
CPU backend
UNIFIED VIRTUAL MEMORYMerges CPU and GPU memory spaces
Full read/write access from both processors
Eliminates GPU memory footprint barrier
Coming in Pascal architecture (2016)
OPTIX PRIMESpecialized for ray tracing
Latest algorithms from NVIDIA Research
Ray tracing kernels
Treelet Reordering BVH (TRBVH)
Support for asynchronous computation
CPU support
No programing model support for shading
No support for Quadro VCA
No support for dynamic materials
Triangles only
No ability to target different architectures
INSTANCING IN PRIMEA model is a set of instances:
RTP_BUFFER_FORMAT_INSTANCE_MODEL
RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3
New API call
rtpModelSetInstances
Hit result formats
RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID
RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID_U_V
Context
ModelBufferDesc
transforms
instances
ModelModel
BufferDesc
INSTANCING IN PRIME
std::vector<instInfo_t> instanceData;std::vector<RTPmodel> instanceList;std::vector<SimpleMatrix4x3> transformList;createInstances(numInstances, models, instanceList, transformList, instanceData);
RTPbufferdesc instances, transforms;rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_INSTANCE_MODEL, RTP_BUFFER_TYPE_HOST, &instanceList[0], &instances);rtpBufferDescSetRange(instances, 0, instanceList.size());
rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3, RTP_BUFFER_TYPE_HOST, &transformList[0], &transforms);rtpBufferDescSetRange(transforms, 0, transformList.size());
RTPmodel scene;rtpModelCreate(context, &scene);rtpModelSetInstances(scene, instances, transforms);
STREAM BUFFERS
RTbuffer output_buffer, stream_buffer;rtBufferCreate(context, RT_BUFFER_OUTPUT, &output_buffer);rtBufferCreate(context, RT_BUFFER_PROGRESSIVE_STREAM, &stream_buffer);
rtBufferSetSize2D(output_buffer, width, height);rtBufferSetSize2D(stream_buffer, width, height);rtBufferSetFormat(output_buffer, RT_FORMAT_FLOAT4);rtBufferSetFormat(stream_buffer, RT_FORMAT_UNSIGNED_BYTE4);
rtBufferBindProgressiveStream(stream_buffer, output_buffer);
PROGRESSIVE APIrtContextLaunchProgressive2D(context, width, height, num_subframes);
while(!finished) {int ready;rtBufferGetProgressiveUpdateReady(stream_buffer, &ready, 0, 0);
if(ready) {rtBufferMap(stream_buffer, &data);display(data);rtBufferUnmap(stream_buffer);
}
if(scene_changed()) {// Update OptiX statertVariableSet(...);
}
rtContextLaunchProgressive2D(context, width, height, num_subframes);}
PROGRESSIVE API (DEVICE)
rtDeclareVariable(unsigned int, subframe_idx, rtSubframeIndex, );
unsigned int seed = rand_seed(launch_index, frame, subframe_idx);
Quadro VCA Under the Hood
GPUs 8 x M6000-VCA GPUs
GPU Memory 12 GB per GPU
CUDA Cores 23,040
CPU Cores 20 Physical
System Memory 256 GB
Storage 4 x 512GB SSD
Network
2 x 1GigE
2 x 10GigE (SFP+)
1 x InfiniBand
Installed SoftwareIray IQ + Cent OS Linux
+ VCA Cluster Manager
U.S. MSRP $50,000
Interactive
Image
Stream
Incremental
Updates
OptiX App
Ethernet or
InternetCustom OptiX Applications
All Processing on VCA
OptiX Leveraging
Same Infrastructure as Iray
(using DiCE)
Minimal Work
within the OptiX App
CONNECTION APIRTremotedevice rdev;rtRemoteDeviceCreate("url", "user", "password", &rdev));
unsigned int num_configs;rtRemoteDeviceGetAttribute(rdev, RT_REMOTEDEVICE_ATTRIBUTE_NUM_CONFIGURATIONS,
sizeof(unsigned int), &num_configs);
int vca_config_index = chooseConfig(num_configs);
rtRemoteDeviceReserve(rdev, vca_num_nodes, vca_config_index);
int ready;do {
rtRemoteDeviceGetAttribute(*rdev, RT_REMOTEDEVICE_ATTRIBUTE_STATUS, sizeof(int), &ready);if(ready != RT_REMOTEDEVICE_STATUS_READY) sleep(10);
} while(ready != RT_REMOTEDEVICE_STATUS_READY);
rtContextCreate(context);rtContextSetRemoteDevice(*context, rdev));
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
S5246—Innovations in OptiXGuest Presentation: Integrating OptiX in VMD
John E. Stone
Theoretical and Computational Biophysics Group
Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
http://www.ks.uiuc.edu/
S5246, GPU Technology Conference
15:00-15:50, Room LL21E, San Jose Convention Center,
San Jose, CA, Wednesday March 18, 2015
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
VMD – “Visual Molecular Dynamics”Goal: A Computational Microscope
Study the molecular machines in living cells
Ribosome: target for antibiotics Poliovirus
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
Lighting ComparisonTwo lights, no
shadows
Two lights,
hard shadows, 1
shadow ray per light
Ambient occlusion
+ two lights,
144 AO rays/hit
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
VMD Chromatophore Rendering on Blue
Waters• New representatinos, GPU-accelerated
molecular surface calculations, memory-
efficient algorithms for huge complexes
• VMD GPU-accelerated ray tracing engine
w/ CUDA+OptiX+MPI+Pthreads
• Each revision: 7,500 frames render on
~96 Cray XK7 nodes in 290 node-hours,
45GB of images prior to editing
GPU-Accelerated Molecular Visualization on Petascale Supercomputing Platforms.
J. E. Stone, K. L. Vandivort, and K. Schulten. UltraVis’13, 2013.
Visualization of Energy Conversion Processes in a Light Harvesting Organelle at Atomic Detail. M. Sener, et al. SC'14 Visualization and Data Analytics Showcase, 2014. ***Winner of the SC'14 Visualization and Data Analytics Showcase
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
VMD 1.9.2 Interactive GPU Ray Tracing
• Ray tracing heavily used for VMD
publication-quality images/movies
• High quality lighting, shadows,
transparency, depth-of-field focal
blur, etc.
• VMD now provides –interactive–
ray tracing on laptops, desktops,
and remote visual supercomputers
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
Scene Graph
VMD TachyonL-OptiX Interactive RT w/
Progressive Rendering
RT Rendering Pass
Seed RNGs
TrBvh
RT Acceleration
Structure
Accumulate RT samples
Normalize+copy accum. buf
Compute ave. FPS,
adjust RT samples per pass Output Framebuffer
Accum. Buf
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
VMD Scene
VMD TachyonL-OptiX:
Multi-GPU on a Desktop or Single Node
Scene Data Replicated,
Image Space Parallel Decomposition
onto GPUs
GPU 0
TrBvh
RT Acceleration
Structure
GPU 3
GPU 2
GPU 1
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
Scene Graph
VMD TachyonL-OptiX Interactive RT w/
OptiX 3.8 Progressive API
RT Rendering Pass
Seed RNGs
TrBvh
RT Acceleration
Structure
Accumulate RT samples
Normalize+copy accum. buf
Compute ave. FPS,
adjust RT samples per pass Output Framebuffer
Accum. Buf
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
Scene Graph
VMD TachyonL-OptiX Interactive RT w/
OptiX 3.8 Progressive API
RT Progressive Subframe
rtContextLaunchProgressive2D()
TrBvh
RT Acceleration
Structure
rtBufferGetProgressiveUpdateReady()
Draw Output Framebuffer
Check for User Interface Inputs,
Update OptiX Variables
rtContextStopProgressive()
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
VMD Scene
VMD TachyonL-OptiX:
Multi-GPU on NVIDIA VCA Cluster
Scene Data Replicated,
Image Space / Sample Space Parallel
Decomposition onto GPUs
VCA 0:
8 K6000 GPUs
VCA N:
8 K6000 GPUs
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
Future Work• Improved performance / quality trade-offs in
interactive RT stochastic sampling strategies
• Optimize GPU scene DMA and BVH regen speed for
time-varying geometry, e.g. MD trajectories
• Continue tuning of GPU-specific RT intersection
routines, memory layout
• GPU-accelerated movie encoder back-end
• Interactive RT combined with remote viz on HPC
systems, much larger data sizes
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
Acknowledgements• Theoretical and Computational Biophysics Group, University of Illinois at
Urbana-Champaign
• NVIDIA CUDA Center of Excellence, University of Illinois at Urbana-Champaign
• NVIDIA CUDA team
• NVIDIA OptiX team
• NCSA Blue Waters Team
• Funding:
– DOE INCITE, ORNL Titan: DE-AC05-00OR22725
– NSF Blue Waters: NSF OCI 07-25070, PRAC “The Computational Microscope”, ACI-1238993, ACI-1440026
– NIH support: 9P41GM104601, 5R01GM098243-02
NIH BTRC for Macromolecular Modeling and Bioinformatics
http://www.ks.uiuc.edu/Beckman Institute,
U. Illinois at Urbana-Champaign
REGISTERED DEVELOPER PROGRAMAccess latest OptiX version
Access private beta releases
Tighter communication with OptiX developers
https://developer.nvidia.com/optix
MORE OPTIX TALKSSessionTitle Day Start End Room Speaker
S5659 Accelerating Mountain Bike Development with Optimized Design Visualization Tuesday 13:30 13:55 LL21A Geoff Casey
S5188 FurryBall RT: New OptiX Core and 30x Speed Up Tuesday 15:00 15:25 LL21D Jan Tománek
S5643 Advanced Rendering Solutions from NVIDIA Tuesday 15:30 16:20 LL21E Phillip Miller
S5622 Dekko: A Framework for Real-Time Preview for VFX Wednesday 9:30 9:55 LL21D Damien Fagnou
S5644 Flexible Cluster Rendering with NVIDIA VCA Wednesday 10:00 10:50 LL21E Phillip Miller
S5541 CATIA Live Rendering Iray and NVIDIA VCA Wednesday 10:00 10:50 LL21A Pierre Maheut
S5409 Custom Iray Applications and MDL for Consistent Visual Appearance Wednesday 14:00 14:50 Ll21E Dave Hutchinson
S5246 Innovations in OptiX Wednesday 15:00 15:50 LL21E David McAllister
S5628 Simulation-Based CGI for Automotive Applications Wednesday 16:00 16:25 LL21A Benoit Deschamps
S5386 VMD: Publication-Quality Ray Tracing of Molecular Graphics with OptiX Thursday 9:00 9:25 LL21E John Stone
S5416 Accelerad: Daylight Simulation for Architectural Spaces Using GPU Ray Tracing Thursday 14:00 14:25 LL21E Nathaniel Jones
S5210 GPU-Accelerated Spectral Caustic Rendering of Homogeneous Caustic Objects Thursday 14:30 14:55 LL21E Budianto Tandianus