Ben Simons Data Arena Lead Developer University of Technology, … · 2013-09-09 · Isilon (now...
Visualisation of Large Datasets with Houdini
Ben Simons
Data Arena Lead Developer
University of Technology, Sydney
[email protected]@acm.org
Today's Outline - Big Data
1. Some strategies used in Film Visual FX
2. Visualisation Techniques in Houdini
3. VFX Data Formats & Disk Systems
Happy Feet 2
● 2 Petabytes (2,000,000 GB)
● 3D Stereo HD images
● Render: 18,000 cpu cores
● Parallel access to data
● HDF5 data on Bluearc & Isilon NAS Disk Systems
● Linux software: Maya, Houdini, Naiad, Nuke, 3Delight
● Entirely made at Carriageworks in Sydney at Dr D Studios
Resident Evil 3 Extinction
● The Desert Undead: 18-layer images (Rman AOV's)
● Each single image frame was split into 96 tiles
● Rendered on 96 machines, then each frame tile-joined
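The tile split and tile-join steps above can be sketched in a few lines. This is only an illustration, not the production pipeline: the 12 x 8 grid is an assumption (the slide says only "96 tiles"), `split_tiles` and `join_tiles` are hypothetical helper names, and a frame is modelled as a plain list of pixel rows.

```python
def split_tiles(frame, cols, rows):
    """Split a frame (list of pixel rows) into cols x rows rectangular tiles.

    Returns a dict mapping (tile_row, tile_col) -> tile (list of rows).
    """
    height, width = len(frame), len(frame[0])
    tiles = {}
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tiles[(r, c)] = [row[x0:x1] for row in frame[y0:y1]]
    return tiles


def join_tiles(tiles, cols, rows):
    """Reassemble the full frame from its tiles (the tile-join step)."""
    frame = []
    for r in range(rows):
        band = [tiles[(r, c)] for c in range(cols)]
        for line in range(len(band[0])):
            joined = []
            for tile in band:
                joined.extend(tile[line])
            frame.append(joined)
    return frame


# Toy 48 x 32 "image" whose pixel value encodes its position.
frame = [[y * 48 + x for x in range(48)] for y in range(32)]
tiles = split_tiles(frame, cols=12, rows=8)   # 96 tiles, one per machine
rebuilt = join_tiles(tiles, cols=12, rows=8)  # identical to the original frame
```

Each tile is independent of the others, so every machine can render its own tile without talking to the rest; only the final join needs all of them.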
Houdini Chops
● Channel is a column of data
● Plain textfiles ok – separate columns with tabs
● Interactive Channel graph (zoom in)
● Visual programming
● Filtering, Sampling, shading, instancing, and rendering
● Hands-on tomorrow will be Chops & Vops
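To make "a channel is a column of data" concrete, here is a small sketch that reads tab-separated text into named channels. It runs outside Houdini and does not use any Houdini API. The assumption that the first row holds the channel names, the sample values, and the `load_channels` helper are all made up for illustration.

```python
import csv
import io


def load_channels(text):
    """Read tab-separated text and return {channel_name: [float samples]}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    names = next(reader)                 # assumed: first row holds channel names
    channels = {name: [] for name in names}
    for row in reader:
        if not row:                      # skip blank lines
            continue
        for name, value in zip(names, row):
            channels[name].append(float(value))
    return channels


# Three channels (columns), three samples (rows).
sample = "time\ttemp\tdepth\n0\t21.5\t3.0\n1\t21.7\t3.2\n2\t22.1\t3.1\n"
channels = load_channels(sample)
```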
Spitzer Glimpse Dataset
http://data.spitzer.caltech.edu/popular/glimpse/20070416_enhanced_v2/source_lists/south/
Spitzer Space Telescope GLIMPSE Dataset
● South: ~300 files, 78 different Channels, 145K rows
● gzipped .tbl data loaded into Houdini
● Houdini Chops used to filter & calc 'colours'
– Show difference of infra-red magnitude bands
● Point colours and scales calculated by VOPs SIMD Shaders
● Houdini Movie Rendered (Mantra PBR)
– 36M points, filtered <12M
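The filter-then-colour step can be sketched as below: a "colour" is the difference between a source's magnitudes in two infra-red bands. This is plain Python, not Chops or VOPs. The column names `mag3_6` and `mag4_5`, the sample values, and the rule of dropping sources with a missing band are assumptions for illustration, not taken from the GLIMPSE tables.

```python
def colour_index(mag_a, mag_b):
    """Colour = magnitude in band A minus magnitude in band B."""
    return mag_a - mag_b


# Hypothetical rows: one dict per catalogue source.
sources = [
    {"mag3_6": 10.2, "mag4_5": 9.9},
    {"mag3_6": 12.0, "mag4_5": 12.1},
    {"mag3_6": None, "mag4_5": 11.0},   # missing measurement in one band
]

# Filter: keep only sources measured in both bands.
complete = [
    s for s in sources
    if s["mag3_6"] is not None and s["mag4_5"] is not None
]

# One colour value per remaining source, rounded for display.
colours = [round(colour_index(s["mag3_6"], s["mag4_5"]), 3) for s in complete]
```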
Shading & VOP's
● A shader is a mini-program which makes data
● It can be better to generate data than load it.
● Shaders allow an additional level of management
● Geom shaders on HF2 generated 1 billion snow particles per image frame (impossible to load).
● Houdini VOP's are SIMD
Instancing
● Saves Memory & I/O by re-using geometry
● Copies generated at render time
● Each Instance can be varied based on point attributes
● Referencing one “instance object” provides a massive data reduction
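A minimal sketch of the instancing idea, assuming a toy triangle as the shared "instance object" and only two point attributes (a position `P` and a uniform `scale`). The names and values are illustrative and this is not how Houdini or Mantra implement it. The copies are produced lazily by a generator, standing in for "generated at render time".

```python
# One shared "instance object": a unit triangle, stored once.
PROTOTYPE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Per-point attributes: a position and a uniform scale for each instance.
points = [
    {"P": (0.0, 0.0, 0.0), "scale": 1.0},
    {"P": (5.0, 0.0, 0.0), "scale": 2.0},
    {"P": (0.0, 3.0, 1.0), "scale": 0.5},
]


def instances(prototype, points):
    """Yield one transformed copy of the prototype per point, on demand."""
    for pt in points:
        px, py, pz = pt["P"]
        s = pt["scale"]
        yield [(px + s * x, py + s * y, pz + s * z) for x, y, z in prototype]


# Expanding everything at once is done here only to inspect the result;
# a renderer would consume the copies one at a time.
expanded = list(instances(PROTOTYPE, points))
```

What is stored and loaded is one prototype plus a small attribute record per point, rather than a full copy of the geometry for every instance.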
Adaptive Meshes, LOD, Caching & Filtering
● Data reduction techniques
● Level of Detail (distance from camera)
● Adaptive Meshes
● Cache common files locally
● Filter texture (images) - Mipmapping
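Mipmapping and distance-based level of detail can be illustrated together. The sketch below builds a mipmap chain with a simple 2x2 box filter and picks a level from the camera distance. The "one level coarser each time the distance doubles" rule, the `base_distance` parameter, and the square power-of-two texture are assumptions; real renderers choose levels from the texture's screen-space footprint.

```python
def downsample(image):
    """Halve resolution by averaging each 2x2 block of pixels (box filter)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_mipmaps(image):
    """Return [full-res, half-res, ...] down to 1x1 (square power-of-two input)."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


def pick_level(distance, num_levels, base_distance=1.0):
    """Pick a coarser level each time the distance doubles (assumed rule)."""
    level = 0
    while distance >= 2 * base_distance and level < num_levels - 1:
        distance /= 2
        level += 1
    return level


texture = [[float(x + y) for x in range(4)] for y in range(4)]
mips = build_mipmaps(texture)            # 4x4, 2x2 and 1x1 levels
far_level = pick_level(100.0, len(mips))  # distant object: coarsest level
```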
Other tricks - Baked Lighting & Shadows
● Pre-calculate lighting & shadows
● “bake” new textures & reapply onto geom
● Sydney Harbour Multi-Beam Sonar Survey, 30cm data.
● Interactive 3D Fly-through
Know ur Limits: Memory & I/O
● I/O will Bottleneck - Partition the problem & then scale it up
– Split job across many independent machines (eg. render)
– Segment data access for each machine (eg. HDF5)
● Alternate memory hardware
● Vector (array) processor - SIMD
– Formerly Cray; now Intel SSE/MMX and Nvidia GPUs
– IBM Cell Processor has Vector Processor
● Content-Addressable Memory
– “associative arrays” are used by Network Routers
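"Partition the problem" and "segment data access for each machine" come down to giving every machine its own non-overlapping slice of the data. A minimal sketch follows, with a hypothetical `partition` helper and toy sizes. Each machine would then read only its own row range (for example, a slice of one HDF5 dataset), which is not shown here.

```python
def partition(total_rows, num_machines):
    """Split [0, total_rows) into contiguous, non-overlapping (start, stop) ranges."""
    base, extra = divmod(total_rows, num_machines)
    ranges, start = [], 0
    for i in range(num_machines):
        # The first `extra` machines take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


# Toy numbers: 10 rows shared between 3 machines.
slices = partition(total_rows=10, num_machines=3)
```

Because the ranges never overlap, machines do not contend for the same data, and adding machines only changes the slice boundaries.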
Types of System Memory
● Virtual Memory
● Swapping is good, thrashing is bad
● SMP vs MPI
● SMP Symmetric Multiprocessing: Multiple CPU's with common/shared memory. Multi-threaded apps.
– eg. Intel Xeon, Core 2 Duo are SMP.
– Cache coherency, snooping bus (on distributed SM) ccNUMA
● MPI (Message Passing) PVM Clusters, Beowulf, etc (Memory not shared)
Data Formats
● HDF5 “Hierarchical Data Format”
● www.hdfgroup.org
● Browsable container of data (HDFView)
● Has “groups & datasets” like “dirs & files”
● Data stored in B-Trees
● Can also store Binary Data
● HDF5 for Python www.h5py.org
● Operate on HDF5 data via Python dictionaries & NumPy arrays - www.numpy.org
Disk Systems
● Network Attached Storage (NAS)
● Bluearc (now Hitachi) implemented via FPGA
● Isilon (now EMC) clustered filesystem, 100GB/s
– Multiple SSD nodes & maintains global file coherency
● Lustre Filesystem
● Experimental Parallel distributed filesystem – can have multiple copies of a file, one master.
● Venti (Bell Labs Plan-9 & Inferno)
– WORM Archive. Shares Blocks by secure SHA-1 Hash.
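The Venti idea of sharing blocks by their SHA-1 hash can be sketched with an in-memory dictionary. This is a toy model, not Venti's protocol or on-disk format; the `BlockStore` class and the sample data are hypothetical. A block's address is the hash of its contents, so identical blocks are stored once and an existing block is never overwritten (write once, read many).

```python
import hashlib


class BlockStore:
    """Write-once store: each block is keyed by the SHA-1 of its contents."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once;
        # setdefault never replaces a block that is already present.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)


store = BlockStore()
k1 = store.put(b"frame 0001 pixels")
k2 = store.put(b"frame 0001 pixels")   # duplicate content, same address
k3 = store.put(b"frame 0002 pixels")
```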
Data Formats 2
● OpenVDB www.openvdb.org
● Hierarchical structure for volumetric data (“clouds”)
● Good for sparse volumetric time-varying data
● Fast access (constant-time) to voxels
● Large set of operators (Level Set tools, filters, transforms & morphological operators)
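To show why a hierarchical layout suits sparse volumes, here is a two-level sketch: a dictionary of small dense blocks that are allocated only where data is written. It is not the OpenVDB API, and OpenVDB's real tree is deeper and more refined. The `SparseGrid` class and the 8x8x8 block size are assumptions for illustration. A lookup is one dictionary access plus one index computation, regardless of how large the volume's extent is.

```python
class SparseGrid:
    """Two-level sparse voxel grid: a dict of 8x8x8 blocks, allocated on demand."""

    BLOCK = 8

    def __init__(self, background=0.0):
        self.background = background
        self.blocks = {}          # (bx, by, bz) -> flat list of BLOCK**3 values

    def _locate(self, x, y, z):
        """Map a voxel coordinate to (block key, offset inside that block)."""
        b = self.BLOCK
        key = (x // b, y // b, z // b)
        offset = ((x % b) * b + (y % b)) * b + (z % b)
        return key, offset

    def set(self, x, y, z, value):
        key, offset = self._locate(x, y, z)
        block = self.blocks.get(key)
        if block is None:
            # First write into this region: allocate one dense block.
            block = self.blocks[key] = [self.background] * self.BLOCK ** 3
        block[offset] = value

    def get(self, x, y, z):
        key, offset = self._locate(x, y, z)
        block = self.blocks.get(key)
        # Unallocated regions cost no memory and read as the background value.
        return self.background if block is None else block[offset]


grid = SparseGrid()
grid.set(1000, 2000, 3000, 0.75)
grid.set(1001, 2000, 3000, 0.5)   # neighbouring voxel lands in the same block
```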
Data Formats 3
● Disney Ptex eliminates uv texture assignment
● http://ptex.us/
● no (u,v)'s required! no seams visible
● works on sub-d/poly faces
● Stores face adjacency data & filters
● Efficiently stores 106 mipmapped texture files
● Multi-channels, compressed separately
● Used in Disney's “Bolt”
“D3” Data-Driven Documents
● D3 – An amazing Data visualisation web framework (javascript)
● http://d3js.org
● See: https://github.com/mbostock/d3/wiki/Gallery
● Offers Parallel Coordinates
● Demo: Nutrient Contents - An interactive visualization of the USDA Nutrient Database.
http://exposedata.com/parallel/