Tutorial: Tensor Approximation in Visualization and Graphics
Implementation Examples in Scientific Visualization
Renato Pajarola, Susanne K. Suter, and Roland Ruiters
Tutorial Continued...
• Tuesday, May 7, from 9:00 to 10:40
• Location: Room B.1
‣ Implementation Examples in Scientific Visualization (Suter, 25 min)
‣ Graphics Applications (Ruiters, 30 min)
‣ Clustering and Sparsity (Ruiters, 25 min)
‣ Summary/Outlook (Pajarola, 10 min)
Outline
• Part 1: Typical decomposition algorithms/operations
• Part 2: GPU-based tensor reconstruction
Typical TA Operations
[Figure: tensor approximation = tensor decomposition of the data tensor A (I1 × I2 × I3) into factor matrices U(1), U(2), U(3) and a core tensor B (R1 × R2 × R3), followed by real-time tensor reconstruction of the approximation Ã.]
Tensor Decomposition
• Create factor matrices
‣ higher-order SVD (HOSVD)
– tensor unfolding
‣ alternating least-squares (ALS) algorithms
– higher-order orthogonal iteration (HOOI)
– higher-order power method (HOPM)
• Generate core tensor
‣ tensor times matrix (TTM) multiplications
[Figure: decomposition of the data tensor A (I1 × I2 × I3) into factor matrices U(1), U(2), U(3) and core tensor B (R1 × R2 × R3).]
Tensor Reconstruction
• Real-time (!) reconstruction
‣ tensor times matrix (TTM) multiplications
[Figure: reconstruction of the approximation Ã (I1 × I2 × I3) from the factor matrices U(1), U(2), U(3) and the core tensor B (R1 × R2 × R3).]
Tensor: A Multidimensional Array
• 0th-order tensor: a scalar a
• 1st-order tensor: a vector a of size I1 (i1 = 1, ..., I1)
• 2nd-order tensor: a matrix A of size I1 × I2 (i2 = 1, ..., I2)
• 3rd-order tensor: a volume A of size I1 × I2 × I3 (i3 = 1, ..., I3)
• ...
SVD Extension to Higher Orders
• Input: data tensor A of size I1 × I2 × I3

CP (rank-R decomposition preserved)
• PARAFAC (parallel factors) [Harshman, 1970]
• CANDECOMP (CAND) (canonical decomposition) [Carroll & Chang, 1970]
‣ R coefficients and factor matrices U(1) (I1 × R), U(2) (I2 × R), U(3) (I3 × R)

Tucker (orthonormal matrices preserved)
• Three-mode factor analysis (3MFA/Tucker3) [Tucker, 1964+1966]
• Higher-order SVD (HOSVD) [De Lathauwer et al., 2000a]
‣ core tensor B (R1 × R2 × R3) and basis matrices U(n) (In × Rn)
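Written out for a 3rd-order tensor in standard notation (with $\lambda_r$ denoting the R CP coefficients), the two models are:

$\mathcal{A} \approx \sum_{r=1}^{R} \lambda_r \; u^{(1)}_{r} \circ u^{(2)}_{r} \circ u^{(3)}_{r}$  (CP)

$\mathcal{A} \approx \mathcal{B} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}$  (Tucker)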
Part 1: Typical Decomposition Algorithms and Operations
Downloads
• MATLAB tensor toolbox
‣ http://www.sandia.gov/~tgkolda/TensorToolbox
• vmmlib: C++ library for vectors, matrices, and tensor approximation
‣ http://vmml.github.io/vmmlib/
• Tensor tutorial notes
‣ http://vmml.ifi.uzh.ch/links/TutorTensorAprox.html
Test Dataset: Hazelnut
• A microCT scan of dried hazelnuts
• I1 = I2 = I3 = 512
• Values: unsigned char (8 bit)
• http://vmml.ifi.uzh.ch/research/datasets.html
[Figure: the hazelnut volume as a 3rd-order data tensor A of size I1 × I2 × I3.]
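A minimal plain-C++ sketch (not vmmlib; file name and layout as listed above) for reading the raw 8-bit volume into memory:

#include <cstddef>
#include <fstream>
#include <vector>

// Read a raw 8-bit volume of n = I1*I2*I3 voxels into memory.
std::vector<unsigned char> load_raw( const char* path, std::size_t n )
{
    std::vector<unsigned char> data( n );
    std::ifstream in( path, std::ios::binary );
    in.read( reinterpret_cast<char*>( data.data() ), static_cast<std::streamsize>( n ) );
    return in ? data : std::vector<unsigned char>();
}

// usage (hypothetical): load_raw( "hnut512_uint.raw", 512ull * 512 * 512 );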
Higher-order SVD (HOSVD)
• HOSVD for mode n
‣ input: data tensor A
‣ unfold A along mode n → A(n)
‣ compute the matrix SVD of A(n)
‣ set the Rn leading left singular vectors as the mode-n factor matrix U(n)
De Lathauwer, de Moor, Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
Slices of a Tensor3
• frontal, horizontal, and lateral slices of a tensor3

[vmmlib]
matrix< 512, 512, values_t > slice;        // values_t: the tensor's value type; t3: a 512^3 tensor3 instance
t3.get_frontal_slice_fwd( 256, slice );    // middle frontal slice
t3.get_horizontal_slice_fwd( 256, slice ); // middle horizontal slice
t3.get_lateral_slice_fwd( 256, slice );    // middle lateral slice
Tensor Unfolding (Matricization)
• unfold (matricize) A along mode n: arrange the mode-n fibers of A as the columns of a matrix A(n) of size In × (product of the remaining dimensions)
• two common conventions, differing in the column ordering of the remaining modes:
‣ forward cyclic unfolding [Kiers, 2000]
‣ backward cyclic unfolding [De Lathauwer et al., 2000]
[Figure: forward and backward cyclic unfoldings A(1), A(2), A(3) of a 3rd-order tensor A.]

Kiers. Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14(3):105–122, 2000.
De Lathauwer, de Moor, Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
Tensor Unfolding Example
• hazelnut data tensor (512 × 512 × 512), unfolded:
‣ mode-1 unfolding A(1): 512 × 262'144
‣ mode-2 unfolding A(2): 512 × 262'144
‣ mode-3 unfolding A(3): 512 × 262'144
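As a minimal sketch (plain C++, not vmmlib's implementation; the column ordering is one possible convention and an assumption here), mode-1 unfolding of a tensor stored as a flat array looks like this:

#include <cstddef>
#include <vector>

// Mode-1 unfolding: the mode-1 fibers (length I1) become the columns of an
// I1 x (I2*I3) matrix. Input element (i1,i2,i3) is stored at i1 + I1*(i2 + I2*i3);
// the output matrix is stored row-major.
std::vector<float> unfold_mode1( const std::vector<float>& a,
                                 std::size_t I1, std::size_t I2, std::size_t I3 )
{
    std::vector<float> a1( I1 * I2 * I3 );
    for( std::size_t i3 = 0; i3 < I3; ++i3 )
        for( std::size_t i2 = 0; i2 < I2; ++i2 )
            for( std::size_t i1 = 0; i1 < I1; ++i1 )
                a1[ i1 * ( I2 * I3 ) + ( i3 * I2 + i2 ) ] = a[ i1 + I1 * ( i2 + I2 * i3 ) ];
    return a1;
}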
Higher-order SVD (HOSVD)
• HOSVD for mode n (as before), with the unfolding step in place:
‣ unfold A along mode n → A(n)
‣ compute the matrix SVD of A(n)
‣ set the Rn leading left singular vectors as the mode-n factor matrix U(n)
De Lathauwer, de Moor, Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
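A compact sketch of one such mode step, assuming Eigen (not used in the tutorial) and that the mode-n unfolding is already available as a dense matrix:

#include <Eigen/Dense>

// One HOSVD step for mode n: the factor matrix U^(n) consists of the Rn leading
// left singular vectors of the mode-n unfolding A_(n). (For 512 x 262'144
// unfoldings a thin/partial SVD would be used in practice.)
Eigen::MatrixXd hosvd_mode( const Eigen::MatrixXd& A_n, int Rn )
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd( A_n, Eigen::ComputeThinU );
    return svd.matrixU().leftCols( Rn );   // In x Rn factor matrix
}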
Large Data Tensors (in vmmlib)
• memory-map the large input volume A (I1 × I2 × I3) instead of loading it entirely into memory

[vmmlib]
const size_t d = 512;
typedef tensor3< d, d, d, unsigned char >       t3_512u_t;
typedef t3_converter< d, d, d, unsigned char >  t3_conv_t;
typedef tensor_mmapper< t3_512u_t, t3_conv_t >  t3map_t;

std::string in_dir    = "./dataset";
std::string file_name = "hnut512_uint.raw";
t3_512u_t t3_hazelnut;
t3_conv_t t3_conv;

t3map_t t3_mmap( in_dir, file_name, true, t3_conv ); // true -> read-only
t3_mmap.get_tensor( t3_hazelnut );
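Conceptually, read-only memory mapping of a raw volume looks roughly like this with POSIX calls (an illustration only, not vmmlib's internals):

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map a raw volume file read-only; pages are loaded on demand by the OS
// instead of reading the whole dataset up front.
const unsigned char* map_volume( const char* path, std::size_t num_bytes )
{
    int fd = open( path, O_RDONLY );
    if( fd < 0 ) return 0;
    void* ptr = mmap( 0, num_bytes, PROT_READ, MAP_PRIVATE, fd, 0 );
    close( fd ); // the mapping remains valid after closing the file descriptor
    return ptr == MAP_FAILED ? 0 : static_cast<const unsigned char*>( ptr );
}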
Optimize Factor Matrices
• Higher-order orthogonal iteration (HOOI)
‣ optimize the factor matrix of mode n
‣ keep the factor matrices of all other modes fixed
‣ generate an optimized data tensor
– project the original data tensor onto the inverted factor matrices of all other modes
‣ obtain the optimized mode-n factor matrix
– apply the HOSVD to the optimized tensor
De Lathauwer, de Moor, Vandewalle. On the best rank-1 and rank-(R1,R2,...,RN) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications, 21(4):1324–1342, 2000.
Optimize Mode-n Factor Matrix
Flow of the mode-n optimization:
• input: tensor A and the current factor matrices U(1), ..., U(N)
• invert all factor matrices except the mode-n matrix (for orthogonal matrices, the transpose)
• multiply the tensor with all inverted matrices (TTMs)
• output: the mode-n optimized tensor A′ (denoted P_n below)
[Figure (mode-1 example): $\mathcal{P}_1 = \mathcal{A} \times_2 U^{(2)T} \times_3 U^{(3)T}$, a tensor of size $I_1 \times R_2 \times R_3$.]
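In formula form (for orthonormal factor matrices, where the pseudo-inverse is the transpose), the mode-n projection is

$\mathcal{P}_n = \mathcal{A} \times_1 U^{(1)T} \cdots \times_{n-1} U^{(n-1)T} \times_{n+1} U^{(n+1)T} \cdots \times_N U^{(N)T},$

which for the mode-1 example in the figure reduces to $\mathcal{P}_1 = \mathcal{A} \times_2 U^{(2)T} \times_3 U^{(3)T}$.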
Higher-order Orthogonal Iteration (HOOI)
Flow of the HOOI alternating least-squares (ALS) iteration:
• input: data tensor A
• initialize the factor matrices U(n) (randomly or via HOSVD)
• set the convergence criteria; compute the Frobenius norm of A (used for the fit)
• repeat until convergence, for each mode n:
‣ mode-n optimization → mode-n optimized tensor P_n
‣ compute the new mode-n matrix U(n) (HOSVD on P_n)
• per iteration: compute the core tensor B and the fit; stop the iterations on convergence
• output: factor matrices U(n) and core tensor B
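One common choice for the fit (as used in the MATLAB tensor toolbox's ALS routines; stated here as an assumption about the "compute fit" step) is

$\mathrm{fit} = 1 - \frac{\|\mathcal{A} - \tilde{\mathcal{A}}\|_F}{\|\mathcal{A}\|_F}, \qquad \|\mathcal{A} - \tilde{\mathcal{A}}\|_F^2 = \|\mathcal{A}\|_F^2 - \|\mathcal{B}\|_F^2 \;\text{(orthonormal } U^{(n)}\text{)},$

with the iteration stopping once the change in fit between two iterations falls below a tolerance.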
Tensor Times Matrix Multiplication
• n-mode product [De Lathauwer et al., 2000a]:

$\mathcal{A} = \mathcal{B} \times_n C, \qquad (\mathcal{B} \times_n C)_{i_1 \dots i_{n-1}\, j_n\, i_{n+1} \dots i_N} = \sum_{i_n=1}^{I_n} b_{i_1 i_2 \dots i_N} \cdot c_{j_n i_n}$

• on the unfolded (matricized) tensors, the n-mode product is an ordinary matrix product:

$A_{(n)} = C\, B_{(n)}$

[Figure: the TTM $\mathcal{B} \times_n C$ shown on the tensors (C of size $J_n \times I_n$ applied along mode n of $\mathcal{B}$) and on the unfoldings ($A_{(n)} = C B_{(n)}$).]
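A minimal sketch of a mode-1 TTM on a flat-array tensor (plain C++, not vmmlib's implementation; layout conventions as in the unfolding sketch above):

#include <cstddef>
#include <vector>

// A = B x_1 C for a 3rd-order tensor B (R1 x R2 x R3, element (r1,r2,r3) at
// r1 + R1*(r2 + R2*r3)) and a matrix C (I1 x R1, row-major).
// The result A is I1 x R2 x R3 in the same layout:
//   a(i1,r2,r3) = sum_r1 b(r1,r2,r3) * c(i1,r1)
std::vector<float> ttm1( const std::vector<float>& B,
                         std::size_t R1, std::size_t R2, std::size_t R3,
                         const std::vector<float>& C, std::size_t I1 )
{
    std::vector<float> A( I1 * R2 * R3, 0.0f );
    for( std::size_t r3 = 0; r3 < R3; ++r3 )
        for( std::size_t r2 = 0; r2 < R2; ++r2 )
            for( std::size_t i1 = 0; i1 < I1; ++i1 )
            {
                float sum = 0.0f;
                for( std::size_t r1 = 0; r1 < R1; ++r1 )
                    sum += C[ i1 * R1 + r1 ] * B[ r1 + R1 * ( r2 + R2 * r3 ) ];
                A[ i1 + I1 * ( r2 + R2 * r3 ) ] = sum;
            }
    return A;
}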
Example TTMs: Core Computation
• Three consecutive TTM multiplications
• For orthogonal factor matrices, use the transposes of the three factor matrices (otherwise the (pseudo-)inverses):

$\mathcal{B} = \mathcal{A} \times_1 U^{(1)T} \times_2 U^{(2)T} \times_3 \cdots \times_N U^{(N)T}$  (orthogonal factor matrices)

$\mathcal{B} = \mathcal{A} \times_1 U^{(1)^{-1}} \times_2 U^{(2)^{-1}} \times_3 \cdots \times_N U^{(N)^{-1}}$  (general factor matrices)

[Figure: $\mathcal{B}' = \mathcal{A} \times_1 U^{(1)T}$ ($R_1 \times I_2 \times I_3$), $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)T}$ ($R_1 \times R_2 \times I_3$), $\mathcal{B} = \mathcal{B}'' \times_3 U^{(3)T}$ ($R_1 \times R_2 \times R_3$).]
Part 2: GPU Reconstruction
Suter et al.. Interactive multiscale tensor reconstruction for multiresolution volume visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2135–2143, December 2011.
Parallel Tensor Reconstruction
Each core coefficient $b_{r_1 r_2 r_3}$ serves as the weight for the outer product between the corresponding column vectors in the factor matrices:

$\tilde{\mathcal{A}} = \sum_{r_1}\sum_{r_2}\sum_{r_3} b_{r_1 r_2 r_3} \cdot u^{(1)}_{r_1} \circ u^{(2)}_{r_2} \circ u^{(3)}_{r_3}. \qquad (A.4)$

The sum of all these weighted "subtensors" forms the approximation $\tilde{\mathcal{A}}$ of the original data $\mathcal{A}$ (see Fig. 13).
[Fig. 13. Tensor reconstruction from Eq. A.4 visualized: a sum of rank-one subtensors $b_{r_1 r_2 r_3} \cdot u^{(1)}_{r_1} \circ u^{(2)}_{r_2} \circ u^{(3)}_{r_3}$.]
Another approach is to reconstruct each element of the approximated dataset individually, which we call the voxel-wise reconstruction approach. Each element $\tilde{a}_{i_1 i_2 i_3}$ is reconstructed as shown in Eq. A.5, i.e., by summing up all core coefficients multiplied with the corresponding coefficients in the factor matrices (weighted product):

$\tilde{a}_{i_1 i_2 i_3} = \sum_{r_1}\sum_{r_2}\sum_{r_3} b_{r_1 r_2 r_3} \cdot u^{(1)}_{i_1 r_1} \cdot u^{(2)}_{i_2 r_2} \cdot u^{(3)}_{i_3 r_3} \qquad (A.5)$
A third reconstruction approach is to build the n-mode products along every mode [15] (notation: $\mathcal{B} \times_n U^{(n)}$), which leads to a tensor times matrix (TTM) multiplication for each mode, i.e., dimension. This is analogous to the Tucker model given by Eq. A.3. The n-mode product between a tensor and a matrix is equivalent to a matrix product, as can be seen from Eq. A.6. In Fig. 14 we visualize the TTM approach using n-mode products.

$\mathcal{Y} = \mathcal{X} \times_n U \;\Leftrightarrow\; Y_{(n)} = U X_{(n)}, \qquad (A.6)$

where $X_{(n)}$ represents the mode-n unfolded tensor, i.e., a matrix.
[Fig. 14. TTM: tensor times matrix along the 3 modes (n-mode products): (a) TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$, (b) TTM 2: $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)}$, (c) TTM 3: $\tilde{\mathcal{A}} = \mathcal{B}'' \times_3 U^{(3)}$. Backward cyclic reconstruction after De Lathauwer et al. [6].]
Given the fixed cost of generating an $I_1 \times I_2 \times I_3$ grid, the computational overhead factor varies from cubic rank complexity $R_1 \cdot R_2 \cdot R_3$ in the case of the progressive reconstruction (Eq. A.4) to a linear rank complexity $R_1$ for the TTM or n-mode product reconstruction (Eq. A.6). (For R = 16, the improvement over $R^3 = 4'096$ is 256-fold.)
ACKNOWLEDGMENTS
This work was supported in part by a grant from the Forschungskredit of the University of Zurich, Switzerland, and by internal research funds at CRS4, Italy. The authors wish to thank Paul Tafforeau for his help in acquiring the tooth dataset, which was scanned at the SLS in Switzerland (experiment number: 20080205).
REFERENCES
[1] vmmlib: A vector and matrix math library. http://vmmlib.sf.net.
[2] B. W. Bader and T. G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software, 32(4):635–653, 2006.
[3] J. Beyer, M. Hadwiger, T. Moller, and L. Fritz. Smooth mixed-resolution GPU volume rendering. In Proc. IEEE/EG Symposium on Volume and Point-Based Graphics, pages 163–170, 2008.
[4] T.-c. Chiueh, C.-k. Yang, T. He, H. Pfister, and A. E. Kaufman. Integrated volume compression and visualization. In Proc. IEEE Visualization, pages 329–336. Computer Society Press, 1997.
[5] C. Crassin, F. Neyret, S. Lefebvre, and E. Eisemann. GigaVoxels: Ray-guided streaming for efficient and detailed voxel rendering. In Proc. Symposium on Interactive 3D Graphics and Games, pages 15–22. ACM SIGGRAPH, 2009.
[6] L. de Lathauwer, B. de Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
[7] M. Do and M. Vetterli. Pyramidal directional filter banks and curvelets. In Proc. IEEE Image Processing, volume 3, pages 158–161, 2001.
[8] M. Do and M. Vetterli. The contourlet transform: An efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12):2091–2106, 2005.
[9] K. Engel, M. Hadwiger, J. M. Kniss, C. Rezk-Salama, and D. Weiskopf. Real-Time Volume Graphics. AK Peters, 2006.
[10] N. Fout, H. Akiba, K.-L. Ma, A. Lefohn, and J. Kniss. High quality rendering of compressed volume data formats. In Proceedings Eurographics, pages 77–84, Jun 2005.
[11] N. Fout and K.-L. Ma. Transform coding for hardware-accelerated volume rendering. IEEE Transactions on Visualization and Computer Graphics, 13(6):1600–1607, 2007.
[12] E. Gobbetti, F. Marton, and J. A. Iglesias Guitian. A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets. The Visual Computer, 24(7-9):797–806, Jul 2008.
[13] S. Guthe, M. Wand, J. Gonser, and W. Strasser. Interactive rendering of large volume data sets. In Proc. IEEE Visualization, pages 53–60, 2002.
[14] J. A. Iglesias Guitian, E. Gobbetti, and F. Marton. View-dependent exploration of massive volumetric models on large scale light field displays. The Visual Computer, 26(6–8):1037–1047, 2010.
[15] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, Sep 2009.
[16] P. Ljung, C. Lundstrom, and A. Ynnerman. Multiresolution interblock interpolation in direct volume rendering. In Proc. Eurographics/IEEE TCVG Symposium on Visualization, pages 259–266, 2006.
[17] P. Ljung, C. Winskog, A. Persson, C. Lundstrom, and A. Ynnerman. Full body virtual autopsies using a state-of-the-art volume rendering pipeline. IEEE Transactions on Visualization and Computer Graphics, 12(5):869–876, Oct 2006.
[18] NVIDIA. CUDA C best practices guide, version 3.2 edition, Aug 2010.
[19] NVIDIA. CUDA C programming guide, version 3.2 edition, Nov 2010.
[20] R. Parys and G. Knittel. Giga-voxel rendering from compressed data on a display wall. In WSCG, 2009.
[21] F. Rodler. Wavelet based 3D compression with fast random access for very large volume data. In Proc. Pacific Graphics, pages 108–117, 1999.
[22] J. Schneider and R. Westermann. Compression domain volume rendering. In Proc. IEEE Visualization, pages 293–300, 2003.
[23] S. K. Suter, C. P. Zollikofer, and R. Pajarola. Application of tensor approximation to multiscale volume feature representations. In Proc. Vision, Modeling and Visualization, pages 203–210, 2010.
[24] Y.-T. Tsai and Z.-C. Shih. All-frequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation. ACM Transactions on Graphics, 25(3):967–976, 2006.
[25] J. E. Vollrath, T. Schafhitzel, and T. Ertl. Employing complex GPU data structures for the interactive visualization of adaptive mesh refinement data. In Proc. Volume Graphics, pages 55–58, 2006.
[26] H. Wang, Q. Wu, L. Shi, Y. Yu, and N. Ahuja. Out-of-core tensor approximation of multi-dimensional matrices of visual data. ACM Transactions on Graphics, 24(3):527–535, Aug 2005.
[27] Q. Wu, T. Xia, C. Chen, H.-Y. S. Lin, H. Wang, and Y. Yu. Hierarchical tensor approximation of multidimensional visual data. IEEE Transactions on Visualization and Computer Graphics, 14(1):186–199, Feb 2008.
[28] B.-L. Yeo and B. Liu. Volume rendering of DCT-based compressed 3D scalar data. IEEE Transactions on Visualization and Computer Graphics, 1(1):29–43, 1995.
[Figure: rank-one reconstruction $\tilde{\mathcal{A}} = \sum b_{r_1 r_2 r_3} \cdot u^{(1)}_{r_1} \circ u^{(2)}_{r_2} \circ u^{(3)}_{r_3}$.]
• per output voxel ã: a triple for-loop over (r1, r2, r3) (sketch below)
• computational cost per voxel is cubic: O(R³)
• parallel computing grid per brick (one grid cell per output voxel ã)
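A direct, sequential transcription of the voxel-wise formula (Eq. A.5) for one output voxel (a sketch only, not the CUDA kernel from the paper); in the GPU version, one thread would evaluate this per voxel of its brick:

#include <cstddef>
#include <vector>

// Voxel-wise reconstruction of a~(i1,i2,i3): sum over all core coefficients,
// weighted by the matching factor-matrix entries. Cost per voxel: O(R1*R2*R3).
// Core B: element (r1,r2,r3) at r1 + R1*(r2 + R2*r3); U1/U2/U3: In x Rn, row-major.
float reconstruct_voxel( std::size_t i1, std::size_t i2, std::size_t i3,
                         const std::vector<float>& B,
                         std::size_t R1, std::size_t R2, std::size_t R3,
                         const std::vector<float>& U1,
                         const std::vector<float>& U2,
                         const std::vector<float>& U3 )
{
    float a = 0.0f;
    for( std::size_t r3 = 0; r3 < R3; ++r3 )
        for( std::size_t r2 = 0; r2 < R2; ++r2 )
            for( std::size_t r1 = 0; r1 < R1; ++r1 )
                a += B[ r1 + R1 * ( r2 + R2 * r3 ) ]
                   * U1[ i1 * R1 + r1 ] * U2[ i2 * R2 + r2 ] * U3[ i3 * R3 + r3 ];
    return a;
}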
Faster Parallel Tensor Reconstruction
• tensor times matrix (TTM) multiplication or n-mode product
[Figure: TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$ (size $I_1 \times R_2 \times R_3$); parallel computing grid per brick.]
Faster Parallel Tensor Reconstruction
• three TTM passes: TTM 1, TTM 2, TTM 3
• store intermediate results (B′ and B″)
• computational cost per voxel is linear: O(R) (see the sketch below)
[Figure: TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$; TTM 2: $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)}$; TTM 3: $\tilde{\mathcal{A}} = \mathcal{B}'' \times_3 U^{(3)}$; parallel computing grid per brick.]
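After TTM 1 and TTM 2 have produced the intermediate tensor B″ (I1 × I2 × R3), each output voxel only needs a single loop over r3, which is where the linear per-voxel cost comes from (a sketch under the same flat-array layout assumptions as above):

#include <cstddef>
#include <vector>

// Final mode-3 pass: a~(i1,i2,i3) = sum_r3 b''(i1,i2,r3) * u3(i3,r3).
// B2 is B'' (I1 x I2 x R3, element (i1,i2,r3) at i1 + I1*(i2 + I2*r3));
// U3 is I3 x R3, row-major. Cost per voxel: O(R3).
float reconstruct_voxel_ttm3( std::size_t i1, std::size_t i2, std::size_t i3,
                              const std::vector<float>& B2,
                              std::size_t I1, std::size_t I2, std::size_t R3,
                              const std::vector<float>& U3 )
{
    float a = 0.0f;
    for( std::size_t r3 = 0; r3 < R3; ++r3 )
        a += B2[ i1 + I1 * ( i2 + I2 * r3 ) ] * U3[ i3 * R3 + r3 ];
    return a;
}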
Compute Intermediate Tensor B’
• TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$ (size $I_1 \times R_2 \times R_3$)
[Figure: parallel computing grid per brick; each cell computes one element b′ of the intermediate tensor B′.]
Compute Intermediate Tensor B’’
• TTM 2: $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)}$ (size $I_1 \times I_2 \times R_3$), after TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$
[Figure: parallel computing grid per brick; each cell computes one element b″ of the intermediate tensor B″.]
Compute Approximated Tensor Ã
• TTM 3: $\tilde{\mathcal{A}} = \mathcal{B}'' \times_3 U^{(3)}$ (size $I_1 \times I_2 \times I_3$), after TTM 1 and TTM 2
[Figure: parallel computing grid per brick; each cell computes one output voxel ã of the approximation Ã.]
Reconstruction Performance
• Intel Core 2 E8500 3.2 GHz Linux PC, 4 GB memory
• NVIDIA GeForce GTX 480, 1.5 GB memory
• datasets: great ape molar (2048³, 17 GB -> 5.5 GB), chameleon (2 GB -> 230 MB)
[Plot: per-frame timings in ms (0–100) over ~4000 frames, split into cpu, upload (cpu -> gpu), decode bricks, deliver bricks, and rendering.]
• demo videos: http://www.youtube.com/user/VMMLuzh
[Suter et al., 2011]
Conclusion
• Critical implementation steps
• Tensor decomposition
‣ initial decomposition of a large input tensor
– memory mapping
‣ tensor times matrix (TTM) multiplications
– parallel matrix-matrix multiplications
‣ higher-order SVD
• Tensor reconstruction
‣ GPU implementation of TTM