Tutorial: Tensor Approximation in Visualization and Graphics
Implementation Examples in Scientific Visualization
Renato Pajarola, Susanne K. Suter, and Roland Ruiters
Tutorial Continued...
• Tuesday, May 7, from 9:00 to 10:40
• Location: Room B.1
‣ Implementation Examples in Scientific Visualization (Suter, 25 min)
‣ Graphics Applications (Ruiters, 30 min)
‣ Clustering and Sparsity (Ruiters, 25 min)
‣ Summary/Outlook (Pajarola, 10 min)
Outline
• Part 1: Typical decomposition algorithms/operations
• Part 2: GPU-based tensor reconstruction
Typical TA Operations
[Figure: tensor approximation = tensor decomposition of the data tensor A (I1 × I2 × I3) into factor matrices U(1), U(2), U(3) and a core tensor B (R1 × R2 × R3), followed by real-time tensor reconstruction of the approximation Ã.]
Tensor Decomposition
• Create factor matrices
‣ higher-order SVD (HOSVD)
– tensor unfolding
‣ alternating least-squares (ALS) algorithms
– higher-order orthogonal iteration (HOOI)
– higher-order power method (HOPM)
• Generate core tensor
‣ tensor times matrix (TTM) multiplications
[Figure: decomposition of the data tensor A (I1 × I2 × I3) into factor matrices U(1), U(2), U(3) and core tensor B (R1 × R2 × R3).]
Tensor Reconstruction
• Real-time (!) reconstruction
‣ tensor times matrix (TTM) multiplications
[Figure: reconstruction of the approximation Ã (I1 × I2 × I3) from the factor matrices U(1), U(2), U(3) and the core tensor B (R1 × R2 × R3).]
Tensor: A Multidimensional Array
• 0th-order tensor: a scalar a
• 1st-order tensor: a vector a of size I1 (i1 = 1, ..., I1)
• 2nd-order tensor: a matrix A of size I1 × I2 (i2 = 1, ..., I2)
• 3rd-order tensor: a volume A of size I1 × I2 × I3 (i3 = 1, ..., I3)
• ...
SVD Extension to Higher Orders
• Input: data tensor A of size I1 × I2 × I3

CP (rank-R decomposition preserved)
• PARAFAC (parallel factors) [Harshman, 1970]
• CANDECOMP (CAND) (canonical decomposition) [Carroll & Chang, 1970]
‣ R coefficients and factor matrices U(1) (I1 × R), U(2) (I2 × R), U(3) (I3 × R)

Tucker (orthonormal matrices preserved)
• Three-mode factor analysis (3MFA/Tucker3) [Tucker, 1964+1966]
• Higher-order SVD (HOSVD) [De Lathauwer et al., 2000a]
‣ core tensor B (R1 × R2 × R3) and basis matrices U(n) (In × Rn)
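Written out for a 3rd-order tensor in standard notation (with $\lambda_r$ denoting the R CP coefficients), the two models are:

$\mathcal{A} \approx \sum_{r=1}^{R} \lambda_r \; u^{(1)}_{r} \circ u^{(2)}_{r} \circ u^{(3)}_{r}$  (CP)

$\mathcal{A} \approx \mathcal{B} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}$  (Tucker)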
Part 1: Typical Decomposition Algorithms and Operations
Downloads
• MATLAB tensor toolbox
‣ http://www.sandia.gov/~tgkolda/TensorToolbox
• vmmlib: C++ library for vectors, matrices, and tensor approximation
‣ http://vmml.github.io/vmmlib/
• Tensor tutorial notes
‣ http://vmml.ifi.uzh.ch/links/TutorTensorAprox.html
Test Dataset: Hazelnut
• A microCT scan of dried hazelnuts
• I1 = I2 = I3 = 512
• Values: unsigned char (8 bit)
• http://vmml.ifi.uzh.ch/research/datasets.html
[Figure: the hazelnut volume as a 3rd-order data tensor A of size I1 × I2 × I3.]
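A minimal plain-C++ sketch (not vmmlib; file name and layout as listed above) for reading the raw 8-bit volume into memory:

#include <cstddef>
#include <fstream>
#include <vector>

// Read a raw 8-bit volume of n = I1*I2*I3 voxels into memory.
std::vector<unsigned char> load_raw( const char* path, std::size_t n )
{
    std::vector<unsigned char> data( n );
    std::ifstream in( path, std::ios::binary );
    in.read( reinterpret_cast<char*>( data.data() ), static_cast<std::streamsize>( n ) );
    return in ? data : std::vector<unsigned char>();
}

// usage (hypothetical): load_raw( "hnut512_uint.raw", 512ull * 512 * 512 );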
Higher-order SVD (HOSVD)
• HOSVD for mode n
‣ input: data tensor A
‣ unfold A along mode n → A(n)
‣ compute the matrix SVD of A(n)
‣ set the Rn leading left singular vectors as the mode-n factor matrix U(n)
De Lathauwer, de Moor, Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
Slices of a Tensor3
• frontal, horizontal, and lateral slices of a tensor3

[vmmlib]
matrix< 512, 512, values_t > slice;        // values_t: the tensor's value type; t3: a 512^3 tensor3 instance
t3.get_frontal_slice_fwd( 256, slice );    // middle frontal slice
t3.get_horizontal_slice_fwd( 256, slice ); // middle horizontal slice
t3.get_lateral_slice_fwd( 256, slice );    // middle lateral slice
Tensor Unfolding (Matricization)
• unfold (matricize) A along mode n: arrange the mode-n fibers of A as the columns of a matrix A(n) of size In × (product of the remaining dimensions)
• two common conventions, differing in the column ordering of the remaining modes:
‣ forward cyclic unfolding [Kiers, 2000]
‣ backward cyclic unfolding [De Lathauwer et al., 2000]
[Figure: forward and backward cyclic unfoldings A(1), A(2), A(3) of a 3rd-order tensor A.]

Kiers. Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14(3):105–122, 2000.
De Lathauwer, de Moor, Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
Tensor Unfolding Example
• hazelnut data tensor (512 × 512 × 512), unfolded:
‣ mode-1 unfolding A(1): 512 × 262'144
‣ mode-2 unfolding A(2): 512 × 262'144
‣ mode-3 unfolding A(3): 512 × 262'144
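As a minimal sketch (plain C++, not vmmlib's implementation; the column ordering is one possible convention and an assumption here), mode-1 unfolding of a tensor stored as a flat array looks like this:

#include <cstddef>
#include <vector>

// Mode-1 unfolding: the mode-1 fibers (length I1) become the columns of an
// I1 x (I2*I3) matrix. Input element (i1,i2,i3) is stored at i1 + I1*(i2 + I2*i3);
// the output matrix is stored row-major.
std::vector<float> unfold_mode1( const std::vector<float>& a,
                                 std::size_t I1, std::size_t I2, std::size_t I3 )
{
    std::vector<float> a1( I1 * I2 * I3 );
    for( std::size_t i3 = 0; i3 < I3; ++i3 )
        for( std::size_t i2 = 0; i2 < I2; ++i2 )
            for( std::size_t i1 = 0; i1 < I1; ++i1 )
                a1[ i1 * ( I2 * I3 ) + ( i3 * I2 + i2 ) ] = a[ i1 + I1 * ( i2 + I2 * i3 ) ];
    return a1;
}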
Higher-order SVD (HOSVD)
• HOSVD for mode n (as before), with the unfolding step in place:
‣ unfold A along mode n → A(n)
‣ compute the matrix SVD of A(n)
‣ set the Rn leading left singular vectors as the mode-n factor matrix U(n)
De Lathauwer, de Moor, Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
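A compact sketch of one such mode step, assuming Eigen (not used in the tutorial) and that the mode-n unfolding is already available as a dense matrix:

#include <Eigen/Dense>

// One HOSVD step for mode n: the factor matrix U^(n) consists of the Rn leading
// left singular vectors of the mode-n unfolding A_(n). (For 512 x 262'144
// unfoldings a thin/partial SVD would be used in practice.)
Eigen::MatrixXd hosvd_mode( const Eigen::MatrixXd& A_n, int Rn )
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd( A_n, Eigen::ComputeThinU );
    return svd.matrixU().leftCols( Rn );   // In x Rn factor matrix
}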
Large Data Tensors (in vmmlib)
• memory-map the large input volume A (I1 × I2 × I3) instead of loading it entirely into memory

[vmmlib]
const size_t d = 512;
typedef tensor3< d, d, d, unsigned char >       t3_512u_t;
typedef t3_converter< d, d, d, unsigned char >  t3_conv_t;
typedef tensor_mmapper< t3_512u_t, t3_conv_t >  t3map_t;

std::string in_dir    = "./dataset";
std::string file_name = "hnut512_uint.raw";
t3_512u_t t3_hazelnut;
t3_conv_t t3_conv;

t3map_t t3_mmap( in_dir, file_name, true, t3_conv ); // true -> read-only
t3_mmap.get_tensor( t3_hazelnut );
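Conceptually, read-only memory mapping of a raw volume looks roughly like this with POSIX calls (an illustration only, not vmmlib's internals):

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map a raw volume file read-only; pages are loaded on demand by the OS
// instead of reading the whole dataset up front.
const unsigned char* map_volume( const char* path, std::size_t num_bytes )
{
    int fd = open( path, O_RDONLY );
    if( fd < 0 ) return 0;
    void* ptr = mmap( 0, num_bytes, PROT_READ, MAP_PRIVATE, fd, 0 );
    close( fd ); // the mapping remains valid after closing the file descriptor
    return ptr == MAP_FAILED ? 0 : static_cast<const unsigned char*>( ptr );
}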
Optimize Factor Matrices
• Higher-order orthogonal iteration (HOOI)
‣ optimize the factor matrix of mode n
‣ keep the factor matrices of all other modes fixed
‣ generate an optimized data tensor
– project the original data tensor onto the inverted factor matrices of all other modes
‣ obtain the optimized mode-n factor matrix
– apply the HOSVD to the optimized tensor
De Lathauwer, de Moor, Vandewalle. On the best rank-1 and rank-(R1,R2,...,RN) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications, 21(4):1324–1342, 2000.
Optimize Mode-n Factor Matrix
Flow of the mode-n optimization:
• input: tensor A and the current factor matrices U(1), ..., U(N)
• invert all factor matrices except the mode-n matrix (for orthogonal matrices, the transpose)
• multiply the tensor with all inverted matrices (TTMs)
• output: the mode-n optimized tensor A′ (denoted P_n below)
[Figure (mode-1 example): $\mathcal{P}_1 = \mathcal{A} \times_2 U^{(2)T} \times_3 U^{(3)T}$, a tensor of size $I_1 \times R_2 \times R_3$.]
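In formula form (for orthonormal factor matrices, where the pseudo-inverse is the transpose), the mode-n projection is

$\mathcal{P}_n = \mathcal{A} \times_1 U^{(1)T} \cdots \times_{n-1} U^{(n-1)T} \times_{n+1} U^{(n+1)T} \cdots \times_N U^{(N)T},$

which for the mode-1 example in the figure reduces to $\mathcal{P}_1 = \mathcal{A} \times_2 U^{(2)T} \times_3 U^{(3)T}$.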
Higher-order Orthogonal Iteration (HOOI)
Flow of the HOOI alternating least-squares (ALS) iteration:
• input: data tensor A
• initialize the factor matrices U(n) (randomly or via HOSVD)
• set the convergence criteria; compute the Frobenius norm of A (used for the fit)
• repeat until convergence, for each mode n:
‣ mode-n optimization → mode-n optimized tensor P_n
‣ compute the new mode-n matrix U(n) (HOSVD on P_n)
• per iteration: compute the core tensor B and the fit; stop the iterations on convergence
• output: factor matrices U(n) and core tensor B
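One common choice for the fit (as used in the MATLAB tensor toolbox's ALS routines; stated here as an assumption about the "compute fit" step) is

$\mathrm{fit} = 1 - \frac{\|\mathcal{A} - \tilde{\mathcal{A}}\|_F}{\|\mathcal{A}\|_F}, \qquad \|\mathcal{A} - \tilde{\mathcal{A}}\|_F^2 = \|\mathcal{A}\|_F^2 - \|\mathcal{B}\|_F^2 \;\text{(orthonormal } U^{(n)}\text{)},$

with the iteration stopping once the change in fit between two iterations falls below a tolerance.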
Tensor Times Matrix Multiplication
• n-mode product [De Lathauwer et al., 2000a]:

$\mathcal{A} = \mathcal{B} \times_n C, \qquad (\mathcal{B} \times_n C)_{i_1 \dots i_{n-1}\, j_n\, i_{n+1} \dots i_N} = \sum_{i_n=1}^{I_n} b_{i_1 i_2 \dots i_N} \cdot c_{j_n i_n}$

• on the unfolded (matricized) tensors, the n-mode product is an ordinary matrix product:

$A_{(n)} = C\, B_{(n)}$

[Figure: the TTM $\mathcal{B} \times_n C$ shown on the tensors (C of size $J_n \times I_n$ applied along mode n of $\mathcal{B}$) and on the unfoldings ($A_{(n)} = C B_{(n)}$).]
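A minimal sketch of a mode-1 TTM on a flat-array tensor (plain C++, not vmmlib's implementation; layout conventions as in the unfolding sketch above):

#include <cstddef>
#include <vector>

// A = B x_1 C for a 3rd-order tensor B (R1 x R2 x R3, element (r1,r2,r3) at
// r1 + R1*(r2 + R2*r3)) and a matrix C (I1 x R1, row-major).
// The result A is I1 x R2 x R3 in the same layout:
//   a(i1,r2,r3) = sum_r1 b(r1,r2,r3) * c(i1,r1)
std::vector<float> ttm1( const std::vector<float>& B,
                         std::size_t R1, std::size_t R2, std::size_t R3,
                         const std::vector<float>& C, std::size_t I1 )
{
    std::vector<float> A( I1 * R2 * R3, 0.0f );
    for( std::size_t r3 = 0; r3 < R3; ++r3 )
        for( std::size_t r2 = 0; r2 < R2; ++r2 )
            for( std::size_t i1 = 0; i1 < I1; ++i1 )
            {
                float sum = 0.0f;
                for( std::size_t r1 = 0; r1 < R1; ++r1 )
                    sum += C[ i1 * R1 + r1 ] * B[ r1 + R1 * ( r2 + R2 * r3 ) ];
                A[ i1 + I1 * ( r2 + R2 * r3 ) ] = sum;
            }
    return A;
}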
Example TTMs: Core Computation
• Three consecutive TTM multiplications
• For orthogonal factor matrices, use the transposes of the three factor matrices (otherwise the (pseudo-)inverses):

$\mathcal{B} = \mathcal{A} \times_1 U^{(1)T} \times_2 U^{(2)T} \times_3 \cdots \times_N U^{(N)T}$  (orthogonal factor matrices)

$\mathcal{B} = \mathcal{A} \times_1 U^{(1)^{-1}} \times_2 U^{(2)^{-1}} \times_3 \cdots \times_N U^{(N)^{-1}}$  (general factor matrices)

[Figure: $\mathcal{B}' = \mathcal{A} \times_1 U^{(1)T}$ ($R_1 \times I_2 \times I_3$), $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)T}$ ($R_1 \times R_2 \times I_3$), $\mathcal{B} = \mathcal{B}'' \times_3 U^{(3)T}$ ($R_1 \times R_2 \times R_3$).]
Part 2: GPU Reconstruction
Suter et al.. Interactive multiscale tensor reconstruction for multiresolution volume visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2135–2143, December 2011.
Parallel Tensor Reconstruction
Each core coefficient $b_{r_1 r_2 r_3}$ serves as the weight for the outer product between the corresponding column vectors in the factor matrices:

$\tilde{\mathcal{A}} = \sum_{r_1}\sum_{r_2}\sum_{r_3} b_{r_1 r_2 r_3} \cdot u^{(1)}_{r_1} \circ u^{(2)}_{r_2} \circ u^{(3)}_{r_3}. \qquad (A.4)$

The sum of all these weighted "subtensors" forms the approximation $\tilde{\mathcal{A}}$ of the original data $\mathcal{A}$ (see Fig. 13).
[Fig. 13. Tensor reconstruction from Eq. A.4 visualized: a sum of rank-one subtensors $b_{r_1 r_2 r_3} \cdot u^{(1)}_{r_1} \circ u^{(2)}_{r_2} \circ u^{(3)}_{r_3}$.]
Another approach is to reconstruct each element of the approximated dataset individually, which we call the voxel-wise reconstruction approach. Each element $\tilde{a}_{i_1 i_2 i_3}$ is reconstructed as shown in Eq. A.5, i.e., by summing up all core coefficients multiplied with the corresponding coefficients in the factor matrices (weighted product):

$\tilde{a}_{i_1 i_2 i_3} = \sum_{r_1}\sum_{r_2}\sum_{r_3} b_{r_1 r_2 r_3} \cdot u^{(1)}_{i_1 r_1} \cdot u^{(2)}_{i_2 r_2} \cdot u^{(3)}_{i_3 r_3} \qquad (A.5)$
A third reconstruction approach is to build the n-mode products along every mode [15] (notation: $\mathcal{B} \times_n U^{(n)}$), which leads to a tensor times matrix (TTM) multiplication for each mode, i.e., dimension. This is analogous to the Tucker model given by Eq. A.3. The n-mode product between a tensor and a matrix is equivalent to a matrix product, as can be seen from Eq. A.6. In Fig. 14 we visualize the TTM approach using n-mode products.

$\mathcal{Y} = \mathcal{X} \times_n U \;\Leftrightarrow\; Y_{(n)} = U X_{(n)}, \qquad (A.6)$

where $X_{(n)}$ represents the mode-n unfolded tensor, i.e., a matrix.
[Fig. 14. TTM: tensor times matrix along the 3 modes (n-mode products): (a) TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$, (b) TTM 2: $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)}$, (c) TTM 3: $\tilde{\mathcal{A}} = \mathcal{B}'' \times_3 U^{(3)}$. Backward cyclic reconstruction after De Lathauwer et al. [6].]
Given the fixed cost of generating an $I_1 \times I_2 \times I_3$ grid, the computational overhead factor varies from cubic rank complexity $R_1 \cdot R_2 \cdot R_3$ in the case of the progressive reconstruction (Eq. A.4) to a linear rank complexity $R_1$ for the TTM or n-mode product reconstruction (Eq. A.6). (For R = 16, the improvement over $R^3 = 4'096$ is 256-fold.)
ACKNOWLEDGMENTS
This work was supported in part by a grant from the Forschungskredit of the University of Zurich, Switzerland, and by internal research funds at CRS4, Italy. The authors wish to thank Paul Tafforeau for his help in acquiring the tooth dataset, which was scanned at the SLS in Switzerland (experiment number: 20080205).
REFERENCES
[1] vmmlib: A vector and matrix math library. http://vmmlib.sf.net.
[2] B. W. Bader and T. G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software, 32(4):635–653, 2006.
[3] J. Beyer, M. Hadwiger, T. Moller, and L. Fritz. Smooth mixed-resolution GPU volume rendering. In Proc. IEEE/EG Symposium on Volume and Point-Based Graphics, pages 163–170, 2008.
[4] T.-c. Chiueh, C.-k. Yang, T. He, H. Pfister, and A. E. Kaufman. Integrated volume compression and visualization. In Proc. IEEE Visualization, pages 329–336. Computer Society Press, 1997.
[5] C. Crassin, F. Neyret, S. Lefebvre, and E. Eisemann. GigaVoxels: Ray-guided streaming for efficient and detailed voxel rendering. In Proc. Symposium on Interactive 3D Graphics and Games, pages 15–22. ACM SIGGRAPH, 2009.
[6] L. de Lathauwer, B. de Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
[7] M. Do and M. Vetterli. Pyramidal directional filter banks and curvelets. In Proc. IEEE Image Processing, volume 3, pages 158–161, 2001.
[8] M. Do and M. Vetterli. The contourlet transform: An efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12):2091–2106, 2005.
[9] K. Engel, M. Hadwiger, J. M. Kniss, C. Rezk-Salama, and D. Weiskopf. Real-Time Volume Graphics. AK Peters, 2006.
[10] N. Fout, H. Akiba, K.-L. Ma, A. Lefohn, and J. Kniss. High quality rendering of compressed volume data formats. In Proceedings Eurographics, pages 77–84, Jun 2005.
[11] N. Fout and K.-L. Ma. Transform coding for hardware-accelerated volume rendering. IEEE Transactions on Visualization and Computer Graphics, 13(6):1600–1607, 2007.
[12] E. Gobbetti, F. Marton, and J. A. Iglesias Guitian. A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets. The Visual Computer, 24(7-9):797–806, Jul 2008.
[13] S. Guthe, M. Wand, J. Gonser, and W. Strasser. Interactive rendering of large volume data sets. In Proc. IEEE Visualization, pages 53–60, 2002.
[14] J. A. Iglesias Guitian, E. Gobbetti, and F. Marton. View-dependent exploration of massive volumetric models on large scale light field displays. The Visual Computer, 26(6–8):1037–1047, 2010.
[15] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, Sep 2009.
[16] P. Ljung, C. Lundstrom, and A. Ynnerman. Multiresolution interblock interpolation in direct volume rendering. In Proc. Eurographics/IEEE TCVG Symposium on Visualization, pages 259–266, 2006.
[17] P. Ljung, C. Winskog, A. Persson, C. Lundstrom, and A. Ynnerman. Full body virtual autopsies using a state-of-the-art volume rendering pipeline. IEEE Transactions on Visualization and Computer Graphics, 12(5):869–876, Oct 2006.
[18] NVIDIA. CUDA C best practices guide, version 3.2 edition, Aug 2010.
[19] NVIDIA. CUDA C programming guide, version 3.2 edition, Nov 2010.
[20] R. Parys and G. Knittel. Giga-voxel rendering from compressed data on a display wall. In WSCG, 2009.
[21] F. Rodler. Wavelet based 3D compression with fast random access for very large volume data. In Proc. Pacific Graphics, pages 108–117, 1999.
[22] J. Schneider and R. Westermann. Compression domain volume rendering. In Proc. IEEE Visualization, pages 293–300, 2003.
[23] S. K. Suter, C. P. Zollikofer, and R. Pajarola. Application of tensor approximation to multiscale volume feature representations. In Proc. Vision, Modeling and Visualization, pages 203–210, 2010.
[24] Y.-T. Tsai and Z.-C. Shih. All-frequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation. ACM Transactions on Graphics, 25(3):967–976, 2006.
[25] J. E. Vollrath, T. Schafhitzel, and T. Ertl. Employing complex GPU data structures for the interactive visualization of adaptive mesh refinement data. In Proc. Volume Graphics, pages 55–58, 2006.
[26] H. Wang, Q. Wu, L. Shi, Y. Yu, and N. Ahuja. Out-of-core tensor approximation of multi-dimensional matrices of visual data. ACM Transactions on Graphics, 24(3):527–535, Aug 2005.
[27] Q. Wu, T. Xia, C. Chen, H.-Y. S. Lin, H. Wang, and Y. Yu. Hierarchical tensor approximation of multidimensional visual data. IEEE Transactions on Visualization and Computer Graphics, 14(1):186–199, Feb 2008.
[28] B.-L. Yeo and B. Liu. Volume rendering of DCT-based compressed 3D scalar data. IEEE Transactions on Visualization and Computer Graphics, 1(1):29–43, 1995.
[Figure: rank-one reconstruction $\tilde{\mathcal{A}} = \sum b_{r_1 r_2 r_3} \cdot u^{(1)}_{r_1} \circ u^{(2)}_{r_2} \circ u^{(3)}_{r_3}$.]
• per output voxel ã: a triple for-loop over (r1, r2, r3) (sketch below)
• computational cost per voxel is cubic: O(R³)
• parallel computing grid per brick (one grid cell per output voxel ã)
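A direct, sequential transcription of the voxel-wise formula (Eq. A.5) for one output voxel (a sketch only, not the CUDA kernel from the paper); in the GPU version, one thread would evaluate this per voxel of its brick:

#include <cstddef>
#include <vector>

// Voxel-wise reconstruction of a~(i1,i2,i3): sum over all core coefficients,
// weighted by the matching factor-matrix entries. Cost per voxel: O(R1*R2*R3).
// Core B: element (r1,r2,r3) at r1 + R1*(r2 + R2*r3); U1/U2/U3: In x Rn, row-major.
float reconstruct_voxel( std::size_t i1, std::size_t i2, std::size_t i3,
                         const std::vector<float>& B,
                         std::size_t R1, std::size_t R2, std::size_t R3,
                         const std::vector<float>& U1,
                         const std::vector<float>& U2,
                         const std::vector<float>& U3 )
{
    float a = 0.0f;
    for( std::size_t r3 = 0; r3 < R3; ++r3 )
        for( std::size_t r2 = 0; r2 < R2; ++r2 )
            for( std::size_t r1 = 0; r1 < R1; ++r1 )
                a += B[ r1 + R1 * ( r2 + R2 * r3 ) ]
                   * U1[ i1 * R1 + r1 ] * U2[ i2 * R2 + r2 ] * U3[ i3 * R3 + r3 ];
    return a;
}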
Faster Parallel Tensor Reconstruction
• tensor times matrix (TTM) multiplication or n-mode product
[Figure: TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$ (size $I_1 \times R_2 \times R_3$); parallel computing grid per brick.]
Faster Parallel Tensor Reconstruction
• three TTM passes: TTM 1, TTM 2, TTM 3
• store intermediate results (B′ and B″)
• computational cost per voxel is linear: O(R) (see the sketch below)
[Figure: TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$; TTM 2: $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)}$; TTM 3: $\tilde{\mathcal{A}} = \mathcal{B}'' \times_3 U^{(3)}$; parallel computing grid per brick.]
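After TTM 1 and TTM 2 have produced the intermediate tensor B″ (I1 × I2 × R3), each output voxel only needs a single loop over r3, which is where the linear per-voxel cost comes from (a sketch under the same flat-array layout assumptions as above):

#include <cstddef>
#include <vector>

// Final mode-3 pass: a~(i1,i2,i3) = sum_r3 b''(i1,i2,r3) * u3(i3,r3).
// B2 is B'' (I1 x I2 x R3, element (i1,i2,r3) at i1 + I1*(i2 + I2*r3));
// U3 is I3 x R3, row-major. Cost per voxel: O(R3).
float reconstruct_voxel_ttm3( std::size_t i1, std::size_t i2, std::size_t i3,
                              const std::vector<float>& B2,
                              std::size_t I1, std::size_t I2, std::size_t R3,
                              const std::vector<float>& U3 )
{
    float a = 0.0f;
    for( std::size_t r3 = 0; r3 < R3; ++r3 )
        a += B2[ i1 + I1 * ( i2 + I2 * r3 ) ] * U3[ i3 * R3 + r3 ];
    return a;
}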
Compute Intermediate Tensor B’
• TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$ (size $I_1 \times R_2 \times R_3$)
[Figure: parallel computing grid per brick; each cell computes one element b′ of the intermediate tensor B′.]
Compute Intermediate Tensor B’’
• TTM 2: $\mathcal{B}'' = \mathcal{B}' \times_2 U^{(2)}$ (size $I_1 \times I_2 \times R_3$), after TTM 1: $\mathcal{B}' = \mathcal{B} \times_1 U^{(1)}$
[Figure: parallel computing grid per brick; each cell computes one element b″ of the intermediate tensor B″.]
Compute Approximated Tensor Ã
• TTM 3: $\tilde{\mathcal{A}} = \mathcal{B}'' \times_3 U^{(3)}$ (size $I_1 \times I_2 \times I_3$), after TTM 1 and TTM 2
[Figure: parallel computing grid per brick; each cell computes one output voxel ã of the approximation Ã.]
Reconstruction Performance
• Intel Core 2 E8500 3.2 GHz Linux PC, 4 GB memory
• NVIDIA GeForce GTX 480, 1.5 GB memory
• datasets: great ape molar (2048³, 17 GB -> 5.5 GB), chameleon (2 GB -> 230 MB)
[Plot: per-frame timings in ms (0–100) over ~4000 frames, split into cpu, upload (cpu -> gpu), decode bricks, deliver bricks, and rendering.]
• demo videos: http://www.youtube.com/user/VMMLuzh
[Suter et al., 2011]
Conclusion
• Critical implementation steps
• Tensor decomposition
‣ initial decomposition of a large input tensor
– memory mapping
‣ tensor times matrix (TTM) multiplications
– parallel matrix-matrix multiplications
‣ higher-order SVD
• Tensor reconstruction
‣ GPU implementation of TTM