A Survey of Compressed GPU-based Direct Volume Rendering


Transcript of A Survey of Compressed GPU-based Direct Volume Rendering

  1. 1. www.crs4.it/vic/ Visual Computing Group A Survey of Compressed GPU-based Direct Volume Rendering Enrico Gobbetti Jose Díaz Fabio Marton June 2015
  2. 2. www.crs4.it/vic/ Visual Computing Group A Survey of Compressed GPU-based Direct Volume Rendering Enrico Gobbetti Jose Díaz Fabio Marton June 2015
  3. 3. E. Gobbetti, F. Marton and J. Diaz Goal and motivation Massive volumetric models anywhere! Deal with time and memory limits Compute compact representation of volumetric models suitable for fast GPU rendering Focus on rectilinear scalar grids 3
  4. 4. E. Gobbetti, F. Marton and J. Diaz Visualization The use of computer-supported, interactive, visual representations of (abstract) data to amplify cognition. Volume visualization Discrete data samples in the 3D space Represented as a 3D rectilinear grid Applications Medical science Engineering Earth science 3D grid of samples Evidence to study Output image Visualization Pipeline Introduction
  5. 5. E. Gobbetti, F. Marton and J. Diaz Introduction http://gallery.ensight.com http://www.vizworld.com Visualization The use of computer-supported, interactive, visual representations of (abstract) data to amplify cognition. Volume visualization Discrete data samples in the 3D space Represented as a 3D rectilinear grid Applications Medical science Engineering Earth science
  6. 6. E. Gobbetti, F. Marton and J. Diaz Introduction Rendering methods Indirect Volume Rendering Isosurface extraction Data mapped to geometric primitives Rendering of geometry Direct Volume Rendering (DVR) No need for isosurface extraction Data mapped to optical properties Direct visualization by compositing optical properties (color, opacity)
  7. 7. E. Gobbetti, F. Marton and J. Diaz Introduction Direct Volume Rendering Optical properties assigned by transfer functions Color and opacity Illumination may be added Typically Phong shading model Most used method: ray casting Sampling along viewing rays Compositing optical properties of the samples for each ray
  8. 8. E. Gobbetti, F. Marton and J. Diaz Introduction Direct Volume Rendering Optical properties assigned by transfer functions Color and opacity Illumination may be added Typically Phong shading model Most used method: ray casting Sampling along viewing rays Compositing optical properties of the samples for each ray
  9. 9. E. Gobbetti, F. Marton and J. Diaz Introduction Direct Volume Rendering Optical properties assigned by transfer functions Color and opacity Illumination may be added Typically Phong shading model Most used method: ray casting Sampling along viewing rays Compositing optical properties of the samples for each ray
  10. 10. E. Gobbetti, F. Marton and J. Diaz Introduction Visualization challenges Present the information in a proper way Transfer function definition Highlight features of interest Deal with occlusions of outer elements Convey the spatial arrangement Deal with the ever-increasing size of the data Overcome hardware limitations Bandwidth Memory consumption Provide a real-time exploration of the data
  11. 11. E. Gobbetti, F. Marton and J. Diaz Big Data In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis and visualization. Our interest Very large 3D volume data 11
  12. 12. E. Gobbetti, F. Marton and J. Diaz Big Data Volume data growth 12 64 x 64 x 64 [Sabella 1988] 256 x 256 x 256 [Krüger 2003] 21494 x 25790 x 1850 [Hadwiger 2012]
  13. 13. E. Gobbetti, F. Marton and J. Diaz Data size examples 13
Year | Paper | Data size | Comments
2002 | Guthe et al. | 512 x 512 x 999 (500 MB); 2,048 x 1,216 x 1,877 (4.4 GB) | multi-pass, wavelet compression, streaming from disk
2003 | Krüger & Westermann | 256 x 256 x 256 (32 MB) | single-pass ray-casting
2005 | Hadwiger et al. | 576 x 352 x 1,536 (594 MB) | single-pass ray-casting (bricked)
2006 | Ljung | 512 x 512 x 628 (314 MB); 512 x 512 x 3,396 (1.7 GB) | single-pass ray-casting, multi-resolution
2008 | Gobbetti et al. | 2,048 x 1,024 x 1,080 (4.2 GB) | ray-guided ray-casting with occlusion queries
2009 | Crassin et al. | 8,192 x 8,192 x 8,192 (512 GB) | ray-guided ray-casting
2011 | Engel | 8,192 x 8,192 x 16,384 (1 TB) | ray-guided ray-casting
2012 | Hadwiger et al. | 18,000 x 18,000 x 304 (92 GB); 21,494 x 25,790 x 1,850 (955 GB) | ray-guided ray-casting, visualization-driven system
2013 | Fogal et al. | 1,728 x 1,008 x 1,878 (12.2 GB); 8,192 x 8,192 x 8,192 (512 GB) | ray-guided ray-casting
  14. 14. E. Gobbetti, F. Marton and J. Diaz Scalability Traditional HPC, parallel rendering definitions Strong scaling (more nodes are faster for same data) Weak scaling (more nodes allow larger data) Our interest/definition: output sensitivity Running time/storage proportional to size of output instead of input Computational effort scales with visible data and screen resolution Working set independent of original data size 14
  15. 15. E. Gobbetti, F. Marton and J. Diaz Large-Scale Visualization Pipeline 15
  16. 16. E. Gobbetti, F. Marton and J. Diaz Scalability Issues 16 Scalability issues Scalable method Data representation and storage Multi-resolution data structures Data layout, compression Work/data partitioning In-core/out-of-core Parallel, distributed Work/data reduction Pre-processing On-demand processing Streaming In-situ visualization Query-based visualization
  17. 17. E. Gobbetti, F. Marton and J. Diaz Compressed DVR Example 17 Chameleon: 1024^3 @ 16 bit (2.1 GB); Supernova: 432^3 x 60 timesteps @ float (18 GB) Compression-domain Rendering from Sparse-Coded Voxel Blocks (EUROVIS 2012)
  18. 18. E. Gobbetti, F. Marton and J. Diaz Many use cases despite the current growth in graphics memory Still many datasets are too large: hi-res models, multi-scalar data, time-varying data, multi-volume visualization, ... Mobile graphics: streaming, local rendering Need to load entire datasets into graphics memory for fast animation Cloud renderers with many clients 18
  19. 19. E. Gobbetti, F. Marton and J. Diaz Overview
  20. 20. E. Gobbetti, F. Marton and J. Diaz Compact Data Representation Models 20
  21. 21. E. Gobbetti, F. Marton and J. Diaz Processing Methods and Architectures 21
  22. 22. E. Gobbetti, F. Marton and J. Diaz Rendering Methods and Architectures 22
  23. 23. E. Gobbetti, F. Marton and J. Diaz COMPACT DATA REPRESENTATION MODELS AND PROCESSING A Survey of Compressed GPU-based Direct Volume Rendering Next Session: 23
  24. 24. www.crs4.it/vic/ Visual Computing Group Enrico Gobbetti Jose Díaz Fabio Marton June 2015 A Survey of Compressed GPU-based Direct Volume Rendering Compact Data Representation Models
  25. 25. E. Gobbetti, F. Marton and J. Diaz Compression-domain DVR Trade-off: achieve high compression ratio high image quality Reconstruct in real-time fast data access fast computations 25
  26. 26. E. Gobbetti, F. Marton and J. Diaz Bases and Coefficients 26 the coefficients show the relationship between the bases and the original data
  27. 27. E. Gobbetti, F. Marton and J. Diaz Types of Bases 27 pre-defined bases learned bases DFT, DCT, DWT eigenbases (KLT, TA), dictionaries
  28. 28. E. Gobbetti, F. Marton and J. Diaz Modeling Stages 28 stages that can be applied
  29. 29. E. Gobbetti, F. Marton and J. Diaz Compact Models in DVR Pre-defined bases discrete Fourier transform (1990;1993) discrete Hartley transform (1993) discrete cosine transform (1995) discrete wavelet transform (1993) laplacian pyramid/transform (1995;2003) Burrows-Wheeler transform (2007) Learned bases Karhunen-Loeve transform (2007) tensor approximation (2010;2011) dictionaries for vector quantization (1993;2003) dictionaries for sparse coding (2012) fractal compression (1996) 29 most models are at best near-lossless, and in practice are typically used in a lossy configuration
  30. 30. E. Gobbetti, F. Marton and J. Diaz Compact Models in DVR Pre-defined bases discrete Fourier transform discrete Hartley transform discrete cosine transform discrete wavelet transform laplacian pyramid/transform Burrows-Wheeler transform Learned bases Karhunen-Loeve transform tensor approximation dictionaries for vector quantization dictionaries for sparse coding fractal compression 30
  31. 31. E. Gobbetti, F. Marton and J. Diaz Wavelet Transform Use wavelets to represent data, similar in spirit to the Fourier transform (which represents data with sines/cosines) A wavelet transform uses a set of basis functions obtained by high-pass and low-pass filtering Data is transformed into the frequency domain while retaining spatial localization, an advantage over the Fourier transform (frequency domain only) 31
  32. 32. E. Gobbetti, F. Marton and J. Diaz 32 2D Wavelet Example multilevel DWT low frequency coefficients of level l high frequency coefficients of level l
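To make the multilevel DWT concrete, here is a minimal 2D Haar decomposition sketch in NumPy (illustrative code, not taken from the surveyed systems; the helper names are assumptions). Each level splits the current low-frequency band into one low-frequency and three high-frequency sub-bands and recurses on the low-frequency band:

```python
import numpy as np

def haar2d_level(a):
    """One 2D Haar analysis step: returns (LL, (LH, HL, HH)) sub-bands."""
    s = np.sqrt(2.0)
    lo_r = (a[0::2, :] + a[1::2, :]) / s   # low-pass along rows
    hi_r = (a[0::2, :] - a[1::2, :]) / s   # high-pass along rows
    LL = (lo_r[:, 0::2] + lo_r[:, 1::2]) / s
    LH = (lo_r[:, 0::2] - lo_r[:, 1::2]) / s
    HL = (hi_r[:, 0::2] + hi_r[:, 1::2]) / s
    HH = (hi_r[:, 0::2] - hi_r[:, 1::2]) / s
    return LL, (LH, HL, HH)

def haar2d_multilevel(a, levels):
    """Multilevel DWT: keep the high-frequency bands of each level,
    recurse on the low-frequency (LL) band."""
    bands = []
    ll = a.astype(np.float64)
    for _ in range(levels):
        ll, highs = haar2d_level(ll)
        bands.append(highs)
    return ll, bands   # coarsest LL + high-frequency coefficients per level

# Example: decompose a 256x256 slice over 3 levels.
img = np.random.rand(256, 256)
ll3, highs = haar2d_multilevel(img, levels=3)
```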
  33. 33. E. Gobbetti, F. Marton and J. Diaz Wavelet full reconstruction 34
  34. 34. E. Gobbetti, F. Marton and J. Diaz Quantized wavelet reconstruction 1 35
  35. 35. E. Gobbetti, F. Marton and J. Diaz Quantized wavelet reconstruction 2 36
  36. 36. E. Gobbetti, F. Marton and J. Diaz Quantized wavelet reconstruction 3 37
  37. 37. E. Gobbetti, F. Marton and J. Diaz Quantized wavelet reconstruction 4 38
  38. 38. E. Gobbetti, F. Marton and J. Diaz 39 Dictionaries
  39. 39. E. Gobbetti, F. Marton and J. Diaz 40 Dictionaries ... dictionary words (codewords) used to represent an approximation of the volume
  40. 40. E. Gobbetti, F. Marton and J. Diaz 41 Dictionaries Basic idea: many subblocks have a similar pattern; search for each subblock the codeword that best represents it; the volume is represented with an index list; fast decompression (table lookup); codewords can be pre-defined or learned; lossless or lossy Main challenge: find good codewords; find a fast algorithm for dictionary generation [Figure: volume divided into subblocks, M codewords, store indices to codewords]
  41. 41. E. Gobbetti, F. Marton and J. Diaz 42 Vector Quantization (VQ) Vector represents subblock Limit number of codewords M
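A minimal sketch of VQ encoding/decoding with a given codebook (illustrative NumPy code; block size and codebook size are assumptions): each subblock is replaced by the index of its nearest codeword, and decompression is a plain table lookup.

```python
import numpy as np

def vq_encode(blocks, codebook):
    """blocks: (N, k) flattened subblocks; codebook: (M, k) codewords.
    Returns one codeword index per block (nearest neighbour in L2)."""
    d2 = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1).astype(np.uint16)

def vq_decode(indices, codebook):
    """Decompression is just a table lookup into the codebook."""
    return codebook[indices]

# Example: 4x4x4 subblocks, a 256-entry codebook.
blocks = np.random.rand(200, 4 * 4 * 4)
codebook = np.random.rand(256, 4 * 4 * 4)
idx = vq_encode(blocks, codebook)       # one small index per block
approx = vq_decode(idx, codebook)       # (200, 64) reconstructed blocks
```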
  42. 42. E. Gobbetti, F. Marton and J. Diaz DICTIONARY Vector Quantization 43
  43. 43. E. Gobbetti, F. Marton and J. Diaz Hierarchical VQ For each block store: Average Value Delta to first level Delta to second level 45 Laplacian filter
  44. 44. E. Gobbetti, F. Marton and J. Diaz Store differences in a multiresolution data structure block-based two-level laplacian pyramid Produce blocks of different frequency bands High frequency bands are compressed using VQ Covariance approach for dictionary generation splitting by PCA Similar: VQ and Karhunen-Loeve transform (KLT) 46 Hierarchical VQ
  45. 45. E. Gobbetti, F. Marton and J. Diaz Sparse Coding Data-specific representation in which each block is represented by a sparse linear combination of few dictionary elements State-of-the-art in sparse coding [Rubinstein et al. 10] Compression is achieved by storing just indices and their magnitudes 47
  46. 46. E. Gobbetti, F. Marton and J. Diaz DICTIONARY Sparse Coding 48 Generalization of VQ: data is represented with a linear combination of codewords B = c0 * D[a0] + c1 * D[a1] + c2 * D[a2]
  47. 47. E. Gobbetti, F. Marton and J. Diaz Sparse Coding The problem Generalization of vector quantization Combine vectors instead of choosing single ones Overcomes limitations due to dictionary sizes Generalization of data-specific bases Dictionary is an overcomplete basis Sparse projection 49
  48. 48. E. Gobbetti, F. Marton and J. Diaz Sparse Coding Represent each block as linear combination of s blocks from an overcomplete dictionary The dictionary can be learned or predefined (e.g., wavelet) Learned dictionaries typically perform better (but optimal ones are hard to compute) Basic idea is that the dictionary is sparsifying Few dictionary elements can be combined to get good approximations Combines the benefits of Dictionaries Projection onto bases 50
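As a sketch of how such a sparse representation can be computed per block, the code below runs a simple orthogonal matching pursuit (OMP) against an overcomplete dictionary (illustrative NumPy code; function names and sizes are assumptions). Only the s selected indices and their coefficients need to be stored.

```python
import numpy as np

def omp_encode(b, D, s):
    """Greedy OMP: approximate block b with s columns of dictionary D.
    b: (k,) flattened block; D: (k, M) dictionary with unit-norm columns."""
    residual = b.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(s):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Least-squares fit of b on the selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        residual = b - D[:, support] @ coeffs
    return support, coeffs           # store only indices + magnitudes

def omp_decode(support, coeffs, D):
    return D[:, support] @ coeffs    # sparse linear combination of codewords

# Example: encode a 6x6x6 block with s = 4 atoms of a 1024-word dictionary.
D = np.random.randn(216, 1024)
D /= np.linalg.norm(D, axis=0)
b = np.random.randn(216)
support, coeffs = omp_encode(b, D, s=4)
b_approx = omp_decode(support, coeffs, D)
```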
  49. 49. E. Gobbetti, F. Marton and J. Diaz 53 Tensor Approximation (TA) Higher-order singular value decomposition; here the so-called Tucker model is used for TA (eigenbases) [Figure: approximation of the data as coefficients (core tensor) times bases (factor matrices), annotated with spatial dimensions and multilinear ranks]
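As a concrete illustration of the Tucker model, the sketch below reconstructs an approximation from a core tensor and three factor matrices via mode products (illustrative NumPy code; R and I denote multilinear ranks and spatial dimensions).

```python
import numpy as np

def tucker_reconstruct(core, U1, U2, U3):
    """Approximation A ~ core x_1 U1 x_2 U2 x_3 U3.
    core: (R1, R2, R3); U_i: (I_i, R_i) factor matrices (bases)."""
    return np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)

# Example: rank-(16,16,16) approximation of a 64^3 brick.
R, I = 16, 64
core = np.random.randn(R, R, R)                 # coefficients (core tensor)
U1, U2, U3 = (np.random.randn(I, R) for _ in range(3))
approx = tucker_reconstruct(core, U1, U2, U3)   # (64, 64, 64)
```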
  50. 50. E. Gobbetti, F. Marton and J. Diaz 55 Feature Extraction with WT and TA [Figure: reconstructions with 2,200,000 / 310,000 / 57,500 / 16,500 coefficients; original size: 256^3 = 16,777,216; multiscale dental growth structures]
  51. 51. E. Gobbetti, F. Marton and J. Diaz Truncation and Quantization Typically lossy: most of the information is lost during this stage Truncation: threshold insignificant coefficients; threshold insignificant bases Quantization: floating point to integer conversion; vector quantization (fewer codewords) 56
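A minimal sketch of these two steps on a float coefficient array (illustrative NumPy code; the threshold and bit depth are assumptions): coefficients below a threshold are zeroed, the rest are uniformly quantized to 8-bit integers.

```python
import numpy as np

def truncate_and_quantize(coeffs, threshold, bits=8):
    """Zero insignificant coefficients, then map floats to integers."""
    c = np.where(np.abs(coeffs) < threshold, 0.0, coeffs)    # truncation
    cmin, cmax = c.min(), c.max()
    scale = (2 ** bits - 1) / (cmax - cmin + 1e-12)
    q = np.round((c - cmin) * scale).astype(np.uint8)        # quantization
    return q, cmin, scale                                    # keep cmin/scale for dequantization

def dequantize(q, cmin, scale):
    return q.astype(np.float32) / scale + cmin

coeffs = np.random.randn(64, 64, 64).astype(np.float32)
q, cmin, scale = truncate_and_quantize(coeffs, threshold=0.1)
approx = dequantize(q, cmin, scale)
```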
  52. 52. E. Gobbetti, F. Marton and J. Diaz 57 Example: Truncate TA Bases truncate high ranks truncate low ranks
  53. 53. E. Gobbetti, F. Marton and J. Diaz K-SVD vs HVQ vs Tensor Comparison of state-of-the-art GPU-based decompression methods
  54. 54. E. Gobbetti, F. Marton and J. Diaz Sparse Coding vs HVQ vs Tensor Comparison of state-of-the-art GPU-based decompression methods SPARSE CODING HVQ TA
  55. 55. E. Gobbetti, F. Marton and J. Diaz Sparse Coding Compression Results 60
  56. 56. E. Gobbetti, F. Marton and J. Diaz 61 Encoding Models Lossless RLE: count the number of equal values Entropy encoding: short codes for frequent words, long codes for infrequent words (Huffman coding, arithmetic coding, ...) To avoid inconvenient data access: fixed-length Huffman coding combined with RLE, significance maps (wavelets), or decoding before rendering Many hybrids
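As a lossless example, here is a minimal run-length encoder/decoder sketch (illustrative code, operating on a 1D array such as a brick's linearized voxels or quantized coefficients).

```python
import numpy as np

def rle_encode(values):
    """Return (value, run_length) pairs for consecutive equal values."""
    runs = []
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        runs.append((values[i], j - i + 1))   # value, count
        i = j + 1
    return runs

def rle_decode(runs):
    return np.concatenate([np.full(count, value) for value, count in runs])

data = np.array([0, 0, 0, 0, 7, 7, 3, 0, 0, 0])
runs = rle_encode(data)        # [(0, 4), (7, 2), (3, 1), (0, 3)]
assert np.array_equal(rle_decode(runs), data)
```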
  57. 57. E. Gobbetti, F. Marton and J. Diaz 62 Hardware Encoding Models Block truncation coding (texture compression) volume texture compression (VTC), a 3D extension of S3TC ETC2 adaptive scalable texture compression (ASTC)
  58. 58. E. Gobbetti, F. Marton and J. Diaz 63 Time-varying Data Compression is needed even more Three categories: extend the compression approach from 3D to 4D; use another approach for the 4th dimension; encode each time step separately Plus hybrids of the above
  59. 59. E. Gobbetti, F. Marton and J. Diaz 64 Model Summary Wavelet transform ... is good for smoothed averaging (multiresolution) Vector quantization ... makes a very fast reconstruction possible fast data access can be combined with various methods to generate the dictionary Sparse coding sparse linear combination of dictionary words fast decompression, high compression rates Tensor approximation ... is good for multiresolution and multiscale DVR feature extraction at multiple scales
  60. 60. E. Gobbetti, F. Marton and J. Diaz 65 Conclusions Recently mostly learned bases variants of eigenbases are popular hybrid forms of dictionaries (VQ, multiresolution, sparse coding) Variable compression ratio from near lossless to extremely lossy Hardware accelerated models block truncation coding (texture compression models) Various stages of compact modeling CAN be applied transform or decomposition truncation quantization encoding
  61. 61. www.crs4.it/vic/ Visual Computing Group A Survey of Compressed GPU-based Direct Volume Rendering Processing and Encoding Enrico Gobbetti Jose Díaz Fabio Marton June 2015
  62. 62. E. Gobbetti, F. Marton and J. Diaz Processing and Encoding 67
  63. 63. E. Gobbetti, F. Marton and J. Diaz A scalable algorithm! 68 Able to handle huge datasets: O(10^10) samples and more Scalable approach in terms of memory and time Required features: out-of-core processing, asymmetric codecs, parallel settings, high-quality coding (depends on representation) [Figure: Input Volume to Bricked Compressed Volume]
  64. 64. E. Gobbetti, F. Marton and J. Diaz Static and time-varying data Static datasets Handle a 3D volume, typically using a 3D bricked representation Mostly octree or flat multiresolution blocking Dynamic datasets Use techniques similar to static datasets Three approaches: separate encoding of each timestep; static compression method extended from 3D to 4D data; treat the 3D data and the time dimension differently, as in video encoding 69
  65. 65. E. Gobbetti, F. Marton and J. Diaz Output volume Bricked Fast local and transient decompression Visibility Empty space skipping Early ray termination Compressed Bandwidth GPU memory LOD adaptive rendering 70 Compressed block
  66. 66. E. Gobbetti, F. Marton and J. Diaz Output volume Bricked Fast local and transient decompression Visibility Empty space skipping Early ray termination Compressed Bandwidth GPU memory LOD adaptive rendering 71 Compressed block
  67. 67. E. Gobbetti, F. Marton and J. Diaz Processing: two phases Processing depends on data representation Pre-defined bases Learned bases Global processing (some methods) Access all the dataset (works on billions of samples) Learn some representation Local Processing (all methods) Access independent blocks Encode [possibly using global phase representation] 72
  68. 68. E. Gobbetti, F. Marton and J. Diaz Processing: two phases Processing depends on data representation Pre-defined bases Learned bases Global processing (some methods) Access all the dataset (works on billions of samples) Learn some representation Local Processing (all methods) Access independent blocks Encode [possibly using global phase representation] 73
  69. 69. E. Gobbetti, F. Marton and J. Diaz Global Processing Find a representation by globally analyzing the whole dataset Early transform coding methods [Levoy92, Malzbender93], but modern methods apply transforms to small regions All dictionary-based methods Also combined with other transformations Main problems: analyze all data within available resources; huge datasets introduce numerical instability problems 74
  70. 70. E. Gobbetti, F. Marton and J. Diaz Finding a dictionary for huge datasets: solutions Online learning Streaming all blocks Splitting into manageable tiles Separate dictionary per tile Coreset / importance sampling Work on representative weighted subsets 75
  71. 71. E. Gobbetti, F. Marton and J. Diaz Global phase: Online Learning 76 [Figure: N iterations streaming all input blocks through a processor that keeps state and updates the dictionary] How it works Does not require concurrent access to all elements Stream over all elements Keep only local statistics => bounded memory! Iterate the streaming pass multiple times
  72. 72. E. Gobbetti, F. Marton and J. Diaz Global phase: Online Learning Main example: Vector Quantization Find blocks representative of all the elements Classic approach: generalized Lloyd algorithm [Linde 80, Lloyd 82] Based on 2 conditions: given a codebook, the optimal partition assigns each vector to its nearest codevector; given a partition, the optimal codevector of each cell is its centroid Starting from an initial partition, iterate until convergence: associate all the vectors to the corresponding cells; update the codevectors to the cell centroids 77
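A minimal in-core sketch of the generalized Lloyd iteration for VQ codebook training (illustrative NumPy code; a streaming/online variant would keep only per-cell sums and counts between passes).

```python
import numpy as np

def lloyd_vq(vectors, M, iters=20, seed=0):
    """Generalized Lloyd algorithm. vectors: (N, k) training blocks; M: codebook size."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), M, replace=False)].copy()
    for _ in range(iters):
        # Condition 1: assign each vector to its nearest codevector.
        d2 = ((vectors ** 2).sum(1)[:, None]
              + (codebook ** 2).sum(1)[None, :]
              - 2.0 * vectors @ codebook.T)
        assign = np.argmin(d2, axis=1)
        # Condition 2: each codevector becomes the centroid of its cell.
        for m in range(M):
            cell = vectors[assign == m]
            if len(cell) > 0:
                codebook[m] = cell.mean(axis=0)
    return codebook

training = np.random.rand(5000, 4 * 4 * 4)
codebook = lloyd_vq(training, M=256)
```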
  73. 73. E. Gobbetti, F. Marton and J. Diaz Global phase: Online Learning Vector quantization: convergence depends on initial partition Good starting point: initial codevectors based on PCA analysis (also in streaming!) [Schneider 03,Knittel et al. 09] Each new seed is inserted in cell with max residual distortion; split plane selected with PCA of the cell producing two cells with similar residual distortion Also applied hierarchically [Schneider 03] Similar concepts applied to online learning for sparse coding [Mairal 2010, Skretting 2010] 78
  74. 74. E. Gobbetti, F. Marton and J. Diaz Global phase: Online Learning Pros Easy to implement Does not run into memory limitations Cons Slow due to the number of iterations over many data values (slow convergence) Slow because of out-of-core operations Used by Vector quantization Sparse coding (but better results when working on the full dataset) 79
  75. 75. E. Gobbetti, F. Marton and J. Diaz Global phase: Splitting Split the dataset into manageable tiles Process each tile independently Produce a representation for each tile 80 [Figure: one Global Processor and Dictionary per tile]
  76. 76. E. Gobbetti, F. Marton and J. Diaz Global phase: Splitting Example: Transform Coding [Fout 07] Whole dataset subdivided into tiles; for each tile: transform with Karhunen-Loeve (low frequencies contain the significant part of the representation); encode with VQ (then decode); encode the residual; iterate to store 3 representations: low-pass, band-pass, high-pass Low-pass is enough to encode most of the elements Inverse transform is computationally intensive: data is reprojected during preprocessing, and reprojections are stored in the codebook instead of coefficients 81
  77. 77. E. Gobbetti, F. Marton and J. Diaz Global phase: Splitting Pros Reasonably simple to implement Permits solving with a global algorithm High quality per tile Cons Discontinuities at tile boundaries Need to handle multiple contexts at rendering time Many active dictionaries could lead to cache overflow at rendering time Scalability limits Used by Most dictionary-based methods 82
  78. 78. E. Gobbetti, F. Marton and J. Diaz Global phase: Coreset/Importance Sampling Build the dictionary on a smartly subsampled and reweighted subset of the input data Coreset concept [Agarwal05] Take a few representative subblocks from the original model Higher probability for more representative samples Sample reweighting to remove bias Use the global method on the coreset 83 [Figure: Importance sampler feeding a Global Processor that outputs the Dictionary]
  79. 79. E. Gobbetti, F. Marton and J. Diaz Global phase: Coreset/Importance Sampling Example: learning a dictionary for sparse coding of volume blocks The K-SVD algorithm is extremely efficient on moderate-size data [Aharon et al. 2006] A K-means generalization alternating between sparse coding of the signals (producing X given the current D) and updates of D given the current sparse representations The update phase needs to solve linear systems of size proportional to the number of training signals! Use the coreset concept! [Feldman & Langberg 11, Feigin et al. 11, Gobbetti 12] 84
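A minimal sketch of one K-SVD dictionary-update sweep, assuming the sparse codes X come from a sparse-coding step such as the OMP sketch earlier (illustrative in-core NumPy code, not the exact implementation of the cited papers).

```python
import numpy as np

def ksvd_dictionary_update(Y, D, X):
    """One K-SVD sweep. Y: (k, N) signals; D: (k, M) dictionary; X: (M, N) sparse codes."""
    for m in range(D.shape[1]):
        omega = np.nonzero(X[m, :])[0]           # signals that use atom m
        if omega.size == 0:
            continue
        # Residual without atom m's contribution, restricted to those signals.
        E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, m], X[m, omega])
        U, S, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, m] = U[:, 0]                        # rank-1 update of the atom
        X[m, omega] = S[0] * Vt[0, :]            # and of its coefficients
    return D, X
```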
  80. 80. E. Gobbetti, F. Marton and J. Diaz Global phase: Coreset/Importance Sampling We associate an importance to each of the original blocks, given by the standard deviation of the entries in the block Pick C elements with probability proportional to this importance (see [Gobbetti 12] for methods for coreset building and training using few streaming passes) Non-uniform sampling introduces a severe bias: scale each selected block by a weight derived from its associated selection probability Applying K-SVD to the scaled coefficients will converge to a dictionary associated with the original problem 85
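A sketch of the coreset construction described above (illustrative NumPy code): importance is the per-block standard deviation, blocks are drawn with probability proportional to it, and selected blocks are rescaled to compensate for the non-uniform sampling. The 1/sqrt(C p_i) weighting used here is a common unbiasing choice and is an assumption, not necessarily the exact formula of [Gobbetti 12].

```python
import numpy as np

def build_coreset(blocks, C, seed=0):
    """blocks: (N, k) flattened volume blocks; returns C reweighted blocks."""
    rng = np.random.default_rng(seed)
    importance = blocks.std(axis=1) + 1e-12      # std of each block's entries
    p = importance / importance.sum()            # selection probabilities
    picked = rng.choice(len(blocks), size=C, replace=True, p=p)
    # Reweight to remove the sampling bias (assumed 1/sqrt(C * p_i) scaling).
    weights = 1.0 / np.sqrt(C * p[picked])
    return blocks[picked] * weights[:, None]

coreset = build_coreset(np.random.rand(50000, 216), C=4000)
# The dictionary learner (e.g. K-SVD) is then trained on `coreset`
# instead of the full set of blocks.
```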
  81. 81. E. Gobbetti, F. Marton and J. Diaz Global phase: Coreset/Importance Sampling Coreset scalability 86
  82. 82. E. Gobbetti, F. Marton and J. Diaz Global phase: Coreset/Importance Sampling Pros Proved to work extremely well for KSVD Extremely scalable No extra run-time overhead Cons Importance concept needs refinement especially for structured data Tested so far only on limited datasets and few representations. Needs more testing in multiple contexts Used by KSVD, VQ, HVQ 87
  83. 83. E. Gobbetti, F. Marton and J. Diaz Processing: two phases Processing depends on data representation Pre-defined bases Learned bases Global processing (some methods) Access all the dataset (works on billions of samples) Learn some representation Local Processing (all methods) Access independent blocks Encode [possibly using global phase representation] 88
  84. 84. E. Gobbetti, F. Marton and J. Diaz Local phase: Independent Block Encoding Input model M is split into blocks B0 ... Bi For each block Bi: possibly use information gathered during the global phase (e.g. dictionaries); find the transformation/combination/representation that best fits the data; store it in the DB 89 [Figure: Local Processing of M = B0 B1 B2 B3 ... into the DB]
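A schematic sketch of the local phase, assuming a cubic brick size and a pluggable per-block encoder such as one of the sketches above (illustrative code; the in-memory dict stands in for the on-disk DB).

```python
import numpy as np

def encode_volume(volume, brick_size, encode_block):
    """Split the volume into bricks and encode each one independently."""
    db = {}
    nx, ny, nz = volume.shape
    b = brick_size
    for x in range(0, nx, b):
        for y in range(0, ny, b):
            for z in range(0, nz, b):
                brick = volume[x:x + b, y:y + b, z:z + b]
                db[(x, y, z)] = encode_block(brick.ravel())
    return db

# Example: VQ-encode 8^3 bricks of a 128^3 volume with a given codebook.
volume = np.random.rand(128, 128, 128).astype(np.float32)
codebook = np.random.rand(256, 8 * 8 * 8)
db = encode_volume(volume, 8,
                   lambda blk: int(np.argmin(((codebook - blk) ** 2).sum(1))))
```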
  85. 85. E. Gobbetti, F. Marton and J. Diaz Local phase: Compression models Pre-defined bases discrete Fourier transform (1990;1993) discrete Hartley transform (1993) discrete cosine transform (1995) discrete wavelet transform (1993) laplacian pyramid/transform (1995;2003) Burrows-Wheeler transform (2007) Learned bases Karhunen-Loeve transform (2007) tensor approximation (2010;2011) dictionaries for vector quantization (1993;2003) dictionaries for sparse coding (2012) fractal compression (1996) 90
  86. 86. E. Gobbetti, F. Marton and J. Diaz Local phase: Compression models Pre-defined bases discrete Fourier transform (1990;1993) discrete Hartley transform (1993) discrete cosine transform (1995) discrete wavelet transform (1993) laplacian pyramid/transform (1995;2003) Burrows-Wheeler transform (2007) Learned bases Karhunen-Loeve transform (2007) tensor approximation (2010;2011) dictionaries for vector quantization (1993;2003) dictionaries for sparse coding (2012) fractal compression (1996) 91
  87. 87. E. Gobbetti, F. Marton and J. Diaz Local phase: Discrete Wavelet Transform Projecting into a set of pre-defined bases Concentrates most of the signal in a few low-frequency components Quantizing the transformed data can significantly reduce data size Error threshold Entropy coding techniques are used for further reducing size Exploiting data correlation 92
  88. 88. E. Gobbetti, F. Marton and J. Diaz Local phase: Discrete Wavelet Transform B-Spline wavelets [Lippert 97] Wavelet coefficients and positions (splatting) Delta encoding, RLE and Huffman coding Hierarchical partition (blocks/sub-blocks) 3D Haar wavelet, delta encoding, quantization, RLE and Huffman [Ihm 99, Kim 99] Biorthogonal 9/7-tap Daubechies and biorthogonal spline wavelets [Nguyen 01, Guthe 02] FPGA hardware decoder [Wetekam 05] 3D dyadic wavelet transform [Wu 05] Separated correlated slices Temporal/spatial coherence [Rodler 99] 93 Wavelets are not trivial to decode on the GPU
  89. 89. E. Gobbetti, F. Marton and J. Diaz Local phase: Tensor approximation Tensor decomposition is applied to the N-dimensional input dataset Tucker model [Suter 11] Product of N basis matrices and an N-dimensional core tensor HOSVD (higher-order singular value decomposition) Applied along every direction of the input data Iterative process for finding the core tensor Represents a projection of the input dataset Core tensor coefficients show the relationship between the original data and the bases 94
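A minimal HOSVD sketch (illustrative NumPy code): the factor matrices are the leading left singular vectors of each mode unfolding, and the core tensor is obtained by projecting the data onto them; HOOI would refine these bases iteratively.

```python
import numpy as np

def unfold(A, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def hosvd(A, ranks):
    """Truncated HOSVD of a 3D array A with multilinear ranks (R1, R2, R3)."""
    U = [np.linalg.svd(unfold(A, n), full_matrices=False)[0][:, :ranks[n]]
         for n in range(3)]
    # Core tensor: project A onto the factor matrices.
    core = np.einsum('ijk,ia,jb,kc->abc', A, U[0], U[1], U[2])
    return core, U

A = np.random.rand(64, 64, 64)
core, (U1, U2, U3) = hosvd(A, ranks=(16, 16, 16))
A_approx = np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)
```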
  90. 90. E. Gobbetti, F. Marton and J. Diaz Local phase: Tensor approximation Tensor decomposition (Tucker model): Ã = B x_1 U^(1) x_2 U^(2) x_3 U^(3) (mode products) Decomposition obtained through HOSVD or HOOI (higher-order orthogonal iteration) HOOI also produces rank reduction The target is the core tensor B (B ∈ R^(R1 x R2 x R3) with Ri < Ii) After each iteration Ranki