Format Abstraction for Sparse Tensor Algebra Compilers
Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe

Transcript of slides: people.csail.mit.edu/s3chou/files/oopsla18-slides.pdf

Pages 2–5 (slide builds):

Sparse tensors are a natural way of representing real-world data

[Figure: a 3rd-order tensor over dimensions Users × Products × Words, holding review data. Users include Jack, Peter, Paul, Mary, Bob, Sam, Billy, Lilly, and Hilde; products include Kindle, Dubliners, The Iliad, Monitor, Sweater, Laptop, and Candide; entries are small counts (1, 2, 3, 10) of words such as "Quality", "Durable", and "Poor" appearing in each review.]

Dense storage: 107 exabytes. Sparse storage: 13 gigabytes.

Pages 6–13 (slide builds):

Many different formats for storing tensors exist

Vectors: dense array vector, sparse vector, hash maps
Matrices: dense array matrix, coordinate matrix, CSR, CSC, DCSR, DCSC, CSB, DIA, Block DIA, ELLPACK, SELL, BELL, LIL, Skyline, BCSR, BCOO
Higher-order tensors: dense array tensor, coordinate tensor, CSF, F-COO, HiCOO, mode-generic tensor

Application domains shown alongside the formats: thermal simulation, unstructured mesh simulation [Bell and Garland 2009], CNN with block-sparse weights [Gray et al. 2017], image processing, data analytics

Pages 14–20 (slide builds):

There is no universally superior tensor format

[Figure, reproduced from Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe, p. 77:22, Fig. 18: performance of matrix-vector multiplication on various matrices with distinct sparsity patterns using taco. The left half of each subfigure depicts the sparsity pattern of the matrix; the right half shows the normalized storage costs and normalized average execution times (relative to the optimal format) for the storage formats on the horizontal axis. Subfigures: (a) Dense (formats DD, DS, SD, SS); (b) Row-slicing (worst cases 158× and 99×); (c) Thermal (6270×, 7780×, 6144×, 6145×); (d) Hypersparse (17150×, 88×, 26542×, 123×); (e) Column-slicing (formats DS, SD, SS, SDᵀ; worst cases 112× and 95×); (f) Blocked (formats DS, SS, DSDD, SSDD). The storage format labels follow the scheme described in Section 3 of that paper; for instance, DS is short for (dense_d1, sparse_d2), while SDᵀ is equivalent to (sparse_d2, dense_d1). The dense matrix input has a density of 0.95, the hypersparse matrix has a density of 2.5 × 10⁻⁵, the row-slicing and column-slicing matrices have densities of 9.5 × 10⁻³, and the thermal and blocked matrices have densities of 1.0 × 10⁻³.]

Excerpt (Proc. ACM Program. Lang., Vol. 1, No. OOPSLA, Article 77, October 2017): "... and tensor-times-vector multiplication with matrices and 3rd-order tensors of varying sparsities as inputs. The tensors are randomly generated with every component having some probability d of being non-zero, where d is the density of the tensor (i.e., the fraction of components that are non-zero, and the complement of sparsity). As Fig. 17 shows, while computing with sparse tensor storage formats incurs some performance penalty as compared to the same computation with dense formats when the inputs are highly dense, the performance penalty decreases and eventually turns into performance gain as the sparsity of the inputs increases. For the two computations we evaluate, we observe that input sparsity of as low as approximately 35% is actually sufficient to make sparse formats that compress out all zeros—including DCSR and CSF—perform better than dense formats, which further emphasizes the practicality of sparse tensor storage formats."

[Bar chart: normalized time (0.0 to 1.5) for y = Ax, comparing CSR vs. DIA on two matrices (186× slowdown in one case) and CSR vs. BCSR.]

Pages 21–23 (slide builds):

[Kjolstad et al. 2017] vs. this work

[Diagram, from residual labels: the prior system's code generator is wired directly to particular tensor data structures (e.g., pos 0 2 4 4 and crd 0 1 1 0 0 arrays over a dimension of size N = 4, with values 7, 3, 4). In this work, a format abstraction sits between the code generator and many different storage structures: dense dimensions of sizes N = 4 and M = 6, compressed pos/crd arrays, and offset-based structures such as offset -3 0 1 -1 with crd entries 0 1 4 6 -1 -1.]

Pages 24–32 (slide builds):

Storing sparse tensors efficiently requires additional metadata

[Example: a 3×4 matrix with nonzeros A and B in row 0 (columns 0 and 2), C, D, and E in row 1 (columns 1, 2, and 3), and F in row 2 (column 3).]

Stored as a dense array of 12 entries (positions 0–11), the coordinates of each entry are implicit in its position:

row(6) = 6 / 4 = 1
col(6) = 6 % 4 = 2
locate(1,2) = 1 * 4 + 2 = 6

Stored sparsely as just the 6 nonzeros A–F (positions 0–5), the coordinates can no longer be computed from the position alone:

row(3) = ???
col(3) = ???

Page 33: Format Abstraction for Sparse Tensor Algebra Compilerspeople.csail.mit.edu/s3chou/files/oopsla18-slides.pdfThe dense matrix input has a density of 0.95, the hypersparse matrix has

9

A B C D E F0 1 2 3 4 5

0 1 2 3

0

1

2 F

C D E

A B

Coordinates of tensor elements can be encoded in many ways

Page 34: Format Abstraction for Sparse Tensor Algebra Compilerspeople.csail.mit.edu/s3chou/files/oopsla18-slides.pdfThe dense matrix input has a density of 0.95, the hypersparse matrix has

9

A B C D E F

0 2 1 2 3 3

0 0 1 1 1 2rows

cols

0 1 2 3 4 5

0 1 2 3

0

1

2 F

C D E

A B

Coordinates of tensor elements can be encoded in many ways

Coordinate

Page 35: Format Abstraction for Sparse Tensor Algebra Compilerspeople.csail.mit.edu/s3chou/files/oopsla18-slides.pdfThe dense matrix input has a density of 0.95, the hypersparse matrix has

9

A B C D E F

0 2 1 2 3 3

0 0 1 1 1 2rows

cols

0 1 2 3 4 5

0 1 2 3

0

1

2 F

C D E

A B

Coordinates of tensor elements can be encoded in many ways

Coordinate

Page 36: Format Abstraction for Sparse Tensor Algebra Compilerspeople.csail.mit.edu/s3chou/files/oopsla18-slides.pdfThe dense matrix input has a density of 0.95, the hypersparse matrix has

vals: A B C D E F
pos:  0 2 5 6
cols: 0 2 1 2 3 3

Coordinates of tensor elements can be encoded in many ways

CSR
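The CSR variant compresses the per-entry row ids into row offsets; a small sketch over the same example (assumed layout, mirroring the pos/cols arrays above):

```python
# CSR: pos[i]..pos[i+1] delimits the entries of row i,
# so six per-entry row ids shrink to four offsets.
vals = ["A", "B", "C", "D", "E", "F"]
pos  = [0, 2, 5, 6]
cols = [0, 2, 1, 2, 3, 3]

def row_entries(i):
    """All (col, val) pairs stored in row i."""
    return [(cols[p], vals[p]) for p in range(pos[i], pos[i + 1])]

assert row_entries(1) == [(1, "C"), (2, "D"), (3, "E")]
```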



Page 44

Computing with different formats can require very different code: A = B ∘ C

Coordinate ✕ Dense array:

for (int pB = B1_pos[0]; pB < B1_pos[1]; pB++) {
  int i = B1_crd[pB];
  int j = B2_crd[pB];
  int pC = i * N + j;
  int pA = i * N + j;
  A[pA] = B[pB] * C[pC];
}

CSR ✕ Dense array:

for (int i = 0; i < M; i++) {
  for (int pB = B2_pos[i]; pB < B2_pos[i + 1]; pB++) {
    int j = B2_crd[pB];
    int pC = i * N + j;
    int pA = i * N + j;
    A[pA] = B[pB] * C[pC];
  }
}

CSR ✕ Coordinate:

int pC1 = C1_pos[0];
while (pC1 < C1_pos[1]) {
  int i = C1_crd[pC1];
  int C1_segend = pC1 + 1;
  while (C1_segend < C1_pos[1] && C1_crd[C1_segend] == i)
    C1_segend++;
  int pB2 = B2_pos[i];
  int pC2 = pC1;
  while (pB2 < B2_pos[i + 1] && pC2 < C1_segend) {
    int jB2 = B2_crd[pB2];
    int jC2 = C2_crd[pC2];
    int j = min(jB2, jC2);
    int pA = i * N + j;
    if (jB2 == j && jC2 == j)
      A[pA] = B[pB2] * C[pC2];
    if (jB2 == j) pB2++;
    if (jC2 == j) pC2++;
  }
  pC1 = C1_segend;
}
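As a sanity check, the CSR ✕ Dense array kernel's logic can be transcribed to Python over the 3×4 example matrix (the numeric values here are made up for illustration):

```python
M, N = 3, 4
B2_pos = [0, 2, 5, 6]                 # CSR row offsets of B
B2_crd = [0, 2, 1, 2, 3, 3]           # CSR column indices of B
B = [1, 2, 3, 4, 5, 6]                # B's nonzero values (illustrative)
C = [float(v) for v in range(M * N)]  # dense matrix C, row-major
A = [0.0] * (M * N)                   # dense output

for i in range(M):
    for pB in range(B2_pos[i], B2_pos[i + 1]):
        j = B2_crd[pB]
        A[i * N + j] = B[pB] * C[i * N + j]

assert A[1 * N + 2] == 4 * 6.0  # entry at (1, 2): B's 4th value times C[6]
```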


Page 49

Hand-coding support for a wide range of formats is infeasible.

A = B ∘ C: Coordinate ✕ Dense array, CSR ✕ Dense array, CSR ✕ Coordinate, Dense array ✕ Dense array, Coordinate ✕ Coordinate, CSR ✕ CSR, DIA ✕ DIA, DIA ✕ Dense array, DIA ✕ Coordinate, DIA ✕ CSR, ELLPACK ✕ ELLPACK, ELLPACK ✕ Dense array, ELLPACK ✕ Coordinate, ELLPACK ✕ CSR, ELLPACK ✕ DIA, BCSR ✕ BCSR, BCSR ✕ Dense array, BCSR ✕ Coordinate

A = B ∘ C ∘ D: Dense array ✕ CSR ✕ CSR, Coordinate ✕ CSR ✕ CSR, CSR ✕ CSR ✕ CSR, Dense array ✕ Coordinate ✕ CSR, Dense array ✕ Dense array ✕ CSR, Coordinate ✕ Coordinate ✕ CSR, DIA ✕ Coordinate ✕ Dense array, DIA ✕ Coordinate ✕ CSR, DIA ✕ Dense array ✕ CSR, DIA ✕ CSR ✕ CSR, DIA ✕ Coordinate ✕ Coordinate, DIA ✕ Dense array ✕ Dense array, DIA ✕ DIA ✕ CSR, DIA ✕ DIA ✕ Coordinate, DIA ✕ DIA ✕ Dense array, ELLPACK ✕ ELLPACK ✕ DIA, ELLPACK ✕ CSR ✕ DIA, ELLPACK ✕ BCSR ✕ DIA, DIA ✕ DIA ✕ DIA

y = Ax + z: Dense array ✕ Dense array ✕ Dense array, Dense array ✕ Dense array ✕ Sparse vector, Dense array ✕ Dense array ✕ Hash map, Dense array ✕ Sparse vector ✕ Sparse vector, Dense array ✕ Sparse vector ✕ Hash map, Dense array ✕ Hash map ✕ Sparse vector, Dense array ✕ Sparse vector ✕ Dense array, Coordinate ✕ Dense array ✕ Dense array, Coordinate ✕ Sparse vector ✕ Dense array, Coordinate ✕ Dense array ✕ Hash map, Coordinate ✕ Sparse vector ✕ Hash map, Coordinate ✕ Hash map ✕ Sparse vector, CSR ✕ Dense array ✕ Dense array, CSR ✕ Dense array ✕ Sparse vector, CSR ✕ Hash map ✕ Sparse vector, CSR ✕ Hash map ✕ Dense array, CSR ✕ Sparse vector ✕ Dense array, DIA ✕ Dense array ✕ Dense array, DIA ✕ Hash map ✕ Dense array, ELLPACK ✕ Dense array ✕ Sparse vector

Page 50


123:16 Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe

[Fig. 8 flowcharts: the left chart picks a merge strategy — if y supports locate, iterate over x and locate into y; if only x supports locate, iterate over y and locate into x; if neither does, co-iterate over x and y. The right chart picks the iterator conversions needed per operand — reorder an operand that is unordered but must be accessed in order, and aggregate duplicates in an operand that is not unique, yielding a converted iterator over it]

Fig. 8. The most efficient strategies for computing the intersection merge of two vectors x and y, depending on whether they support the locate capability and whether they are ordered and unique. The sparsity structure of y is assumed to not be a strict subset of the sparsity structure of x. The flowchart on the right describes, for each operand, what iterator conversions are needed at runtime to compute the merge.

If one of the input vectors, y, supports the locate capability (e.g., it is a dense array), we can instead just iterate over the nonzero components of x and, for each component, locate the component with the same coordinate in y. Lines 2–9 in Figure 1b show another example of this method, applied to merge the column dimensions of a CSR matrix and a dense matrix. This alternative method reduces the merge complexity from O(nnz(x) + nnz(y)) to O(nnz(x)), assuming locate runs in constant time. Moreover, this method does not require enumerating the coordinates of y in order. We do not even need to enumerate the coordinates of x in order, as long as there are no duplicates and we do not need to compute output components in order (e.g., if the output supports the insert capability). This method is thus ideal for computing intersection merges of unordered levels.
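The iterate-and-locate method can be sketched as follows (a minimal Python illustration with made-up values, where a dict plays the sparse operand and a list the dense one):

```python
# Intersection merge when y supports constant-time locate:
# visit only x's nonzeros -- O(nnz(x)) rather than O(nnz(x) + nnz(y)).
x = {0: 2.0, 3: 4.0, 7: 1.0}  # sparse vector: coordinate -> value
y = [1.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 3.0]  # dense array: locate is y[i]

out = {}
for i, xv in x.items():  # enumeration order does not matter
    yv = y[i]            # locate the same coordinate in y
    if yv != 0.0:
        out[i] = xv * yv

assert out == {0: 2.0, 3: 20.0, 7: 3.0}
```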

We can generalize and combine the two methods described above to compute arbitrarily complex merges involving unions and intersections of any number of tensor operands. At a high level, any merge can be computed by co-iterating over some subset of its operands and, for every enumerated coordinate, locating that same coordinate in all the remaining operands with calls to locate. Which operands need to be co-iterated can be identified recursively from the expression expr that we want to compute. In particular, for each subexpression e = e1 op e2 in expr, let Coiter(e) denote the set of operand coordinate hierarchy levels that need to be co-iterated in order to compute e. If op is an operation that requires a union merge (e.g., addition), then computing e requires co-iterating over all the levels that would have to be co-iterated in order to separately compute e1 and e2; in other words, Coiter(e) = Coiter(e1) ∪ Coiter(e2). On the other hand, if op is an operation that requires an intersection merge (e.g., multiplication), then the set of coordinates of nonzeros in the result e must be a subset of the coordinates of nonzeros in either operand e1 or e2. Thus, in order to enumerate the coordinates of all nonzeros in the result, it suffices to co-iterate over all the levels merged by just one of the operands. Without loss of generality, this lets us compute e without having to co-iterate over levels merged by e2 that can instead be accessed with locate; in other words, Coiter(e) = Coiter(e1) ∪ (Coiter(e2) \ LocateCapable(e2)), where LocateCapable(e2) denotes the set of levels merged by e2 that support the locate capability.
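The recursive definition of Coiter can be sketched directly (a hedged toy model: expressions are nested tuples, levels are named by strings, and locate_capable is the set of levels supporting locate — none of these names come from the paper's implementation):

```python
def coiter(expr, locate_capable):
    """expr is a level name (leaf) or (op, e1, e2) with op in {'+', '*'}."""
    if isinstance(expr, str):
        return {expr}
    op, e1, e2 = expr
    c1 = coiter(e1, locate_capable)
    c2 = coiter(e2, locate_capable)
    if op == "+":  # union merge: co-iterate both sides
        return c1 | c2
    # intersection merge: e2's locate-capable levels need not be co-iterated
    return c1 | (c2 - locate_capable)

# (B + C) * D, where D is a dense array supporting locate:
assert coiter(("*", ("+", "B", "C"), "D"), {"D"}) == {"B", "C"}
```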

4.5 Code Generation Algorithm

Figure 9a shows our code generation algorithm, which incorporates all of the concepts we presented in the previous subsections. Each part of the algorithm is labeled from 1 to 11; throughout the

Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 123. Publication date: November 2018.

Format Abstraction & Code Generation




Page 59

[Figure: a tensor with nine nonzeros A–J (positions 0–8), stored level by level — pos = [0 3 5 9] with per-level coordinate arrays [0 1 1 0 1 0 0 1 1] and [0 0 2 3 1 1 3 0 3] — with the levels labeled Compressed, Singleton, and Dense]

Tensor formats can be viewed as compositions of level formats


Page 64

CSR composes a Dense level over rows with a Compressed level over columns:

size: 3
pos:  0 2 5 6
cols: 0 2 1 2 3 3
vals: A B C D E F

The same level formats can be composed in many ways

CSR


Page 68

Coordinate composes a Compressed level over rows with a Singleton level over columns:

pos:  0 6
rows: 0 0 1 1 1 2
cols: 0 2 1 2 3 3
vals: A B C D E F

The same level formats can be composed in many ways

Coordinate


Page 74

Level formats: Dense, Compressed, Singleton, Hashed, Range, Offset

Tensor formats as compositions of level formats:
Dense array tensor: (Dense, Dense, Dense)
CSR: (Dense, Compressed) [Tinney and Walker, 1967]
Coordinate matrix: (Compressed, Singleton)
Coordinate tensor: (Compressed, Singleton, Singleton)
BCSR: (Dense, Compressed, Dense, Dense) [Im and Yelick 1998]
ELLPACK: (Dense, Dense, Singleton) [Kincaid et al. 1989]
CSB: (Dense, Dense, Compressed, Singleton) [Buluç et al. 2009]
Mode-generic tensor: (Compressed, Singleton, Dense, Dense) [Baskaran et al. 2012]
DIA: (Dense, Range, Offset) [Saad 2003]
Block DIA: (Dense, Range, Offset, Dense, Dense)
Hash map vector: (Hashed)
Hash map matrix: (Hashed, Hashed) [Patwary et al. 2015]

The same level formats can be composed in many ways
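The composition view can be captured in a small table (a schematic sketch in plain Python; the tuples mirror the slides, and this is not taco's actual format API):

```python
# Each tensor format is just an ordered choice of one level format per dimension.
formats = {
    "Dense array matrix": ("Dense", "Dense"),
    "CSR":                ("Dense", "Compressed"),
    "Coordinate matrix":  ("Compressed", "Singleton"),
    "DIA":                ("Dense", "Range", "Offset"),
    "Hash map matrix":    ("Hashed", "Hashed"),
}

# A handful of level formats spans all of these matrix formats:
levels_used = {lvl for comp in formats.values() for lvl in comp}
assert levels_used <= {"Dense", "Compressed", "Singleton",
                       "Hashed", "Range", "Offset"}
```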

Page 75

for (int i = 0; i < m; i++) {
  for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1 + 1]; pB2++) {
    int j = B2_idx[pB2];
    int pA2 = (i * n) + j;
    int pB3 = B3_pos[pB2];
    int pc1 = c1_pos[0];
    while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) {
      int kB = B3_idx[pB3];
      int kc = c1_idx[pc1];
      int k = min(kB, kc);
      if (kB == k && kc == k) {
        a[pA2] += b[pB3] * c[pc1];
      }
      if (kB == k) pB3++;
      if (kc == k) pc1++;
    }
  }
}

Tensor Algebra Compiler (taco)

c : (compressed)
B : (dense, compressed, compressed)

A : (dense, dense)

A_ij = Σ_k B_ijk · c_k

[Kjolstad et al. 2017]
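As a concrete illustration (not taken from the slides), a level-by-level format like the ones above can be sketched with pos/crd arrays: a dense level stores only its dimension size, while a compressed level stores a pos array of segment boundaries and a crd array of nonzero coordinates. The names mirror the slide's B2_pos / B2_idx arrays; the concrete values below are made up for illustration.

```python
# Hypothetical sketch of per-dimension level storage in the style of
# taco's coordinate hierarchies (not taco's actual data structures).

# A 3x4 sparse matrix stored as (dense, compressed), i.e. CSR:
#   row 0: (0,1)->2.0   row 1: empty   row 2: (2,0)->5.0, (2,3)->1.0
m = 3
B2_pos = [0, 1, 1, 3]   # row i's nonzeros live in crd[pos[i]:pos[i+1]]
B2_crd = [1, 0, 3]      # column coordinates of the nonzeros
B_vals = [2.0, 5.0, 1.0]

def rows(m, pos, crd, vals):
    """Iterate (i, j, value) triples one dimension at a time."""
    for i in range(m):                        # dense level: loop over the dimension
        for p in range(pos[i], pos[i + 1]):   # compressed level: loop over a segment
            yield i, crd[p], vals[p]

print(list(rows(m, B2_pos, B2_crd, B_vals)))
# -> [(0, 1, 2.0), (2, 0, 5.0), (2, 3, 1.0)]
```

Stacking a second compressed level below this one, keyed by the positions of the level above, gives the three-level storage used for B.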

Page 76: Format Abstraction for Sparse Tensor Algebra Compilers

18

for (int i = 0; i < m; i++) {
  for (int pB2 = B2_pos[i]; pB2 < B2_pos[i + 1]; pB2++) {
    int j = B2_idx[pB2];
    int pA2 = (i * n) + j;
    int pB3 = B3_pos[pB2];
    int pc1 = c1_pos[0];
    while (pB3 < B3_pos[pB2 + 1] && pc1 < c1_pos[1]) {
      int kB = B3_crd[pB3];
      int kc = c1_crd[pc1];
      int k = min(kB, kc);
      if (kB == k && kc == k) {
        A[pA2] += B[pB3] * c[pc1];
      }
      if (kB == k) pB3++;
      if (kc == k) pc1++;
    }
  }
}

A_ij = Σ_k B_ijk · c_k

taco generates code dimension by dimension
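The inner while-loop above is a two-pointer merge: it co-iterates two sorted coordinate streams and acts only where both have a nonzero, since multiplication intersects the operands' sparsity patterns. A minimal standalone sketch of that merge (function name and values are illustrative, not from the slides):

```python
# Two-pointer merge over sorted coordinate streams, mirroring the
# generated while-loop: advance whichever side holds the smaller
# coordinate, and multiply-accumulate only on a coordinate match.

def merge_product(B_crd, B_vals, c_crd, c_vals):
    """Sum of B[k] * c[k] over coordinates k present in both operands."""
    acc, pB, pc = 0.0, 0, 0
    while pB < len(B_crd) and pc < len(c_crd):
        k = min(B_crd[pB], c_crd[pc])
        if B_crd[pB] == k and c_crd[pc] == k:
            acc += B_vals[pB] * c_vals[pc]
        if B_crd[pB] == k:
            pB += 1
        if c_crd[pc] == k:
            pc += 1
    return acc

print(merge_product([0, 2, 5], [1.0, 3.0, 2.0], [2, 3, 5], [4.0, 1.0, 0.5]))
# -> 13.0  (coordinates 2 and 5 match: 3*4 + 2*0.5)
```

Because both streams are sorted, the merge runs in time linear in the number of nonzeros, with no search or densification.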

sha1_base64="N+Gz0mErqtYlWzSRJfkrAyiSckE=">AAACinicbZHfbtMwFMadMNgoAwq75MZahcQFVHbSNonQpLHtYpdDotukJooc1+1MnT+yHaTK8sPwStzxNjhdkVjHkSz99J3z+Rz7FI3gSiP02/Of7D19tn/wvPfi8OWr1/03b69V3UrKprQWtbwtiGKCV2yquRbstpGMlIVgN8XqvMvf/GBS8br6ptcNy0qyrPiCU6KdlPd/mnRzyUwui8ygIYrHKBx9RMNJmCAcO0A4CuPAfskN/24tPIGpasvcrKzZseJxOEoC5wijCQ6RgzhOUIDtWWddOW9K57WGO7YITYI46RptwkGEwygeW0NzZ7J5f/A3Bx8D3sIAbOMq7/9K5zVtS1ZpKohSM4wanRkiNaeC2V7aKtYQuiJLNnNYkZKpzGxmsvC9U+ZwUUt3Kg036r8OQ0ql1mXhKkui79RurhP/l5u1ehFnhldNq1lF7xstWgF1Dbu9wDmXjGqxdkCo5G5WSO+IJFS77fXcJ+DdJz+G62CIHX8dDU7Ptt9xAN6BY/ABYBCBU3AJrsAUUG/f++RNvMg/9AM/8T/fl/re1nMEHoR/8QeH4Lru</latexit>

taco generates code dimension by dimension

Page 81:

19

Dense × Compressed
Dense + Compressed

Compressed × Hashed

Compressed + Singleton

Dense + Range

Compressed + Offset

Dense × Compressed × Hashed

Compressed × Hashed × Singleton

Compressed × Singleton + Dense

Compressed + Singleton + Dense

Hand-coding support for a wide range of level formats is also infeasible

Page 87:

20

Code generation is performed in two stages

High-level algorithm

Runnable code

Compressed × Hashed

How to compute with different data structures

How to compute with multiple operands

Page 88:

21

Tensor algebra computations can be expressed in terms of high-level operations on tensor operands

[Figure: many tensor operands (labeled A through U) combined by high-level operations such as the element-wise product ∘]


Page 96:

22

Level formats declare whether they support various high-level operations

Capability columns: Random access, Iteration

Dense

Compressed

Hashed

Singleton

Range

Offset
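A minimal sketch of what such per-format capability declarations could look like in C. The struct and field names here are illustrative, not taco's actual interface; the capability values follow the discussion on the surrounding slides (dense and hashed levels support random access, compressed levels only support iteration):

```c
#include <stdbool.h>

/* Hypothetical capability record: each level format declares which
   high-level operations the code generator may use with it. */
typedef struct {
    const char *name;
    bool random_access;  /* supports locate (direct lookup by coordinate) */
    bool iteration;      /* supports enumerating its stored coordinates */
} LevelCapabilities;

static const LevelCapabilities DENSE      = {"dense",      true,  true};
static const LevelCapabilities COMPRESSED = {"compressed", false, true};
static const LevelCapabilities HASHED     = {"hashed",     true,  true};
```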


Page 100:

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Compressed × Hashed

Random access: supported by Dense and Hashed

B ∘ C

Iterate over B and random access C

Page 102:

23

Compiler constructs efficient algorithm by reasoning about whether operands support required high-level operations

Random access: supported by Dense and Hashed

B ∘ C (Compressed × Singleton)

Simultaneously iterate over B and C
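The reasoning above can be sketched as a small decision function (illustrative names; a simplification of the strategy selection in Fig. 8 of the paper): for an intersection merge, prefer iterating one operand and randomly accessing the other, and fall back to co-iteration only when neither side supports random access.

```c
#include <stdbool.h>

typedef enum {
    COITERATE,            /* simultaneously iterate over B and C */
    ITERATE_B_LOCATE_C,   /* iterate over B, random access into C */
    ITERATE_C_LOCATE_B    /* iterate over C, random access into B */
} MergeStrategy;

/* Choose how to compute an intersection merge of B and C, given
   whether each operand's level format supports random access. */
static MergeStrategy choose_strategy(bool b_random_access,
                                     bool c_random_access) {
    if (c_random_access) return ITERATE_B_LOCATE_C;
    if (b_random_access) return ITERATE_C_LOCATE_B;
    return COITERATE;
}
```

Compressed × Hashed selects iterate-and-locate, while Compressed × Singleton (no random access on either side) selects co-iteration, matching the two slide examples.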

Page 106:

24

Level formats also specify how they support high-level operations

Random access:

Dense:

int pB2 = pB1 * N + j;

Hashed:

int pB2 = j % W + pB1 * W;
if (crd[pB2] != j && crd[pB2] != -1) {
  int end = pB2;
  do {
    pB2 = pB1 * W + (pB2 + 1) % W;
  } while (crd[pB2] != j && crd[pB2] != -1 && pB2 != end);
}
if (crd[pB2] == j) {

Iteration:

Dense:

for (int j = 0; j < N; j++) {
  int pB2 = pB1 * N + j;

Compressed:

for (int pB2 = pos[pB1]; pB2 < pos[pB1+1]; pB2++) {
  int j = crd[pB2];

Hashed:

for (int pB2 = pB1 * W; pB2 < (pB1 + 1) * W; pB2++) {
  int j = crd[pB2];
  if (j != -1) {

...
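The hashed random-access snippet can be packaged as a self-contained function. This is a sketch assuming, as in the slide's code, open addressing with linear probing, W hash-table slots per row, and -1 marking an empty slot:

```c
/* Probe row pB1 of a hashed level for coordinate j.
   crd holds stored coordinates (W slots per row, -1 = empty).
   Returns the position of j, or -1 if j is not stored. */
static int hashed_locate(const int *crd, int pB1, int W, int j) {
    int pB2 = pB1 * W + j % W;           /* home slot within row pB1 */
    if (crd[pB2] != j && crd[pB2] != -1) {
        int end = pB2;
        do {
            /* advance to the next slot, wrapping within the row */
            pB2 = pB1 * W + (pB2 + 1) % W;
        } while (crd[pB2] != j && crd[pB2] != -1 && pB2 != end);
    }
    return (crd[pB2] == j) ? pB2 : -1;
}
```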

Page 108:

25

Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations

for every element b in B:
    find corresponding element c in C
    A[i][j] = b * c;

Compressed × Hashed

B ∘ C

Page 110:

25

Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations

for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1+1]; pB2++) {
  int j = B2_crd[pB2];
  find corresponding element c in C
  A[i][j] = B[pB2] * c;
}

Compressed × Hashed

B ∘ C

Page 112:

25

Compiler specializes constructed algorithm to operand formats by inlining code that implements required high-level operations

Compressed × Hashed

B ∘ C

for (int pB2 = B2_pos[pB1]; pB2 < B2_pos[pB1+1]; pB2++) {
  int j = B2_crd[pB2];
  int pC2 = j % W + pC1 * W;
  if (C2_crd[pC2] != j && C2_crd[pC2] != -1) {
    int end = pC2;
    do {
      pC2 = pC1 * W + (pC2 + 1) % W;
    } while (C2_crd[pC2] != j && C2_crd[pC2] != -1 && pC2 != end);
  }
  if (C2_crd[pC2] == j) {
    A[i][j] = B[pB2] * C[pC2];
  }
}

Page 114:

26

The same process can be repeated dimension by dimension

int iB = 0; int C0_pos = C0_pos_arr[0]; while (C0_pos < C0_pos_arr[1]) { int iC = C0_idx_arr[C0_pos]; int C0_end = C0_pos + 1; if (iC == iB) while ((C0_end < C0_pos_arr[1]) && (C0_idx_arr[C0_end] == iB)) { C0_end++; } if (iC == iB) { int B1_pos = B1_pos_arr[iB]; int C1_pos = C0_pos; while ((B1_pos < B1_pos_arr[iB + 1]) && (C1_pos < C0_end)) { int jB = B1_idx_arr[B1_pos]; int jC = C1_idx_arr[C1_pos]; int j = min(jB, jC); int A1_pos = (iB * A1_size) + j; int C1_end = C1_pos + 1; if (jC == j) while ((C1_end < C0_end) && (C1_idx_arr[C1_end] == j)) { C1_end++; } if ((jB == j) && (jC == j)) { int B2_pos = B2_pos_arr[B1_pos]; int C2_pos = C1_pos; while ((B2_pos < B2_pos_arr[B1_pos + 1]) && (C2_pos < C1_end)) { int kB = B2_idx_arr[B2_pos]; int kC = C2_idx_arr[C2_pos]; int k = min(kB, kC); int A2_pos = (A1_pos * A2_size) + k; if ((kB == k) && (kC == k)) { A_val_arr[A2_pos] = B_val_arr[B2_pos] + C_val_arr[C2_pos]; } else if (kB == k) { A_val_arr[A2_pos] = B_val_arr[B2_pos]; } else { A_val_arr[A2_pos] = C_val_arr[C2_pos]; } if (kB == k) B2_pos++; if (kC == k) C2_pos++; } while (B2_pos < B2_pos_arr[B1_pos + 1]) { int kB0 = B2_idx_arr[B2_pos]; int A2_pos0 = (A1_pos * A2_size) + kB0; A_val_arr[A2_pos0] = B_val_arr[B2_pos]; B2_pos++; } while (C2_pos < C1_end) { int kC0 = C2_idx_arr[C2_pos]; int A2_pos1 = (A1_pos * A2_size) + kC0; A_val_arr[A2_pos1] = C_val_arr[C2_pos]; C2_pos++; } } else if (jB == j) { for (int B2_pos0 = B2_pos_arr[B1_pos]; B2_pos0 < B2_pos_arr[B1_pos + 1]; B2_pos0++) { int kB1 = B2_idx_arr[B2_pos0]; int A2_pos2 = (A1_pos * A2_size) + kB1; A_val_arr[A2_pos2] = B_val_arr[B2_pos0]; } } else { for (int C2_pos0 = C1_pos; C2_pos0 < C1_end; C2_pos0++) { int kC1 = C2_idx_arr[C2_pos0]; int A2_pos3 = (A1_pos * A2_size) + kC1; A_val_arr[A2_pos3] = C_val_arr[C2_pos0]; } } if (jB == j) B1_pos++; if (jC == j) C1_pos = C1_end; }

while (B1_pos < B1_pos_arr[iB + 1]) { int jB0 = B1_idx_arr[B1_pos]; int A1_pos0 = (iB * A1_size) + jB0; for (int B2_pos1 = B2_pos_arr[B1_pos]; B2_pos1 < B2_pos_arr[B1_pos + 1]; B2_pos1++) { int kB2 = B2_idx_arr[B2_pos1]; int A2_pos4 = (A1_pos0 * A2_size) + kB2; A_val_arr[A2_pos4] = B_val_arr[B2_pos1]; } B1_pos++; } while (C1_pos < C0_end) { int jC0 = C1_idx_arr[C1_pos]; int A1_pos1 = (iB * A1_size) + jC0; int C1_end0 = C1_pos + 1; while ((C1_end0 < C0_end) && (C1_idx_arr[C1_end0] == jC0)) { C1_end0++; } for (int C2_pos1 = C1_pos; C2_pos1 < C1_end0; C2_pos1++) { int kC2 = C2_idx_arr[C2_pos1]; int A2_pos5 = (A1_pos1 * A2_size) + kC2; A_val_arr[A2_pos5] = C_val_arr[C2_pos1]; } C1_pos = C1_end0; } } else { for (int B1_pos0 = B1_pos_arr[iB]; B1_pos0 < B1_pos_arr[iB + 1]; B1_pos0++) { int jB1 = B1_idx_arr[B1_pos0]; int A1_pos2 = (iB * A1_size) + jB1; for (int B2_pos2 = B2_pos_arr[B1_pos0]; B2_pos2 < B2_pos_arr[B1_pos0 + 1]; B2_pos2++) { int kB3 = B2_idx_arr[B2_pos2]; int A2_pos6 = (A1_pos2 * A2_size) + kB3; A_val_arr[A2_pos6] = B_val_arr[B2_pos2]; } } } if (iC == iB) C0_pos = C0_end; iB++; } while (iB < B0_size) { for (int B1_pos1 = B1_pos_arr[iB]; B1_pos1 < B1_pos_arr[iB + 1]; B1_pos1++) { int jB2 = B1_idx_arr[B1_pos1]; int A1_pos3 = (iB * A1_size) + jB2; for (int B2_pos3 = B2_pos_arr[B1_pos1]; B2_pos3 < B2_pos_arr[B1_pos1 + 1]; B2_pos3++) { int kB4 = B2_idx_arr[B2_pos3]; int A2_pos7 = (A1_pos3 * A2_size) + kB4; A_val_arr[A2_pos7] = B_val_arr[B2_pos3]; } } iB++; }

Aijk = Bijk + Cijk

Page 116:

27

Evaluation

Mode-generic tensor: Compressed, Singleton, Dense, Dense

DIA: Dense, Range, Offset

123:16 Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe

[Fig. 8 flowcharts: if y supports locate, iterate over x and locate into y; else if x supports locate, iterate over y and locate into x; else co-iterate over x and y. A second flowchart shows what iterator conversions are needed at runtime: reorder x/y when unordered (unless accessed with locate), and aggregate duplicates in x/y when not unique.]

Fig. 8. The most efficient strategies for computing the intersection merge of two vectors x and y, depending on whether they support the locate capability and whether they are ordered and unique. The sparsity structure of y is assumed to not be a strict subset of the sparsity structure of x. The flowchart on the right describes, for each operand, what iterator conversions are needed at runtime to compute the merge.

If one of the input vectors, y, supports the locate capability (e.g., it is a dense array), we can instead just iterate over the nonzero components of x and, for each component, locate the component with the same coordinate in y. Lines 2–9 in Figure 1b show another example of this method applied to merge the column dimensions of a CSR matrix and a dense matrix. This alternative method reduces the merge complexity from O(nnz(x) + nnz(y)) to O(nnz(x)), assuming locate runs in constant time. Moreover, this method does not require enumerating the coordinates of y in order. We do not even need to enumerate the coordinates of x in order, as long as there are no duplicates and we do not need to compute output components in order (e.g., if the output supports the insert capability). This method is thus ideal for computing intersection merges of unordered levels.
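As a concrete sketch of this O(nnz(x)) method (illustrative code, not the paper's generated kernel): a dot product that iterates over a compressed vector x and locates into a dense array y, so each lookup into y is constant time:

```c
/* x in compressed form: x_crd[p] is the coordinate of value x_val[p],
   for p in [0, x_nnz). y is dense, so locate is a plain index: y[coord]. */
static double sparse_dense_dot(const int *x_crd, const double *x_val,
                               int x_nnz, const double *y) {
    double sum = 0.0;
    for (int p = 0; p < x_nnz; p++)
        sum += x_val[p] * y[x_crd[p]];  /* locate into y in O(1) */
    return sum;
}
```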

We can generalize and combine the two methods described above to compute arbitrarily complex merges involving unions and intersections of any number of tensor operands. At a high level, any merge can be computed by co-iterating over some subset of its operands and, for every enumerated coordinate, locating that same coordinate in all the remaining operands with calls to locate. Which operands need to be co-iterated can be identified recursively from the expression expr that we want to compute. In particular, for each subexpression e = e1 op e2 in expr, let Coiter(e) denote the set of operand coordinate hierarchy levels that need to be co-iterated in order to compute e. If op is an operation that requires a union merge (e.g., addition), then computing e requires co-iterating over all the levels that would have to be co-iterated in order to separately compute e1 and e2; in other words, Coiter(e) = Coiter(e1) ∪ Coiter(e2). On the other hand, if op is an operation that requires an intersection merge (e.g., multiplication), then the set of coordinates of nonzeros in the result e must be a subset of the coordinates of nonzeros in either operand e1 or e2. Thus, in order to enumerate the coordinates of all nonzeros in the result, it suffices to co-iterate over all the levels merged by just one of the operands. Without loss of generality, this lets us compute e without having to co-iterate over levels merged by e2 that can instead be accessed with locate; in other words, Coiter(e) = Coiter(e1) ∪ (Coiter(e2) \ LocateCapable(e2)), where LocateCapable(e2) denotes the set of levels merged by e2 that support the locate capability.
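The Coiter recursion can be sketched with bit sets, one bit per operand coordinate hierarchy level (an illustrative simplification of the rules above, not taco's implementation):

```c
typedef unsigned LevelSet;  /* one bit per operand coordinate level */

/* Union merge (e.g. addition): co-iterate every level that either
   subexpression needs co-iterated. */
static LevelSet coiter_union(LevelSet e1, LevelSet e2) {
    return e1 | e2;
}

/* Intersection merge (e.g. multiplication): levels merged by e2 that
   support locate can be accessed instead of co-iterated. */
static LevelSet coiter_intersection(LevelSet e1, LevelSet e2,
                                    LevelSet locate_capable) {
    return e1 | (e2 & ~locate_capable);
}
```

With bit 0 for a level of B and bit 1 for a level of C: addition always co-iterates both, while multiplication drops C's level from the co-iteration set when it supports locate.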

4.5 Code Generation Algorithm

Figure 9a shows our code generation algorithm, which incorporates all of the concepts we presentedin the previous subsections. Each part of the algorithm is labeled from 1 to 11; throughout the

Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 123. Publication date: November 2018.

Format Abstraction & Code Generation

Page 117:

28

Our technique supports a wide range of disparate tensor formats

Systems: taco, Intel MKL, SciPy, MTL4, Tensor Toolbox, TensorFlow

Formats (this work vs. [Kjolstad et al. 2017]): Sparse vector, Hash map vector, Coordinate matrix, CSR, DCSR, ELL, DIA, BCSR, CSB, DOK, LIL, Skyline, Banded, Coordinate tensor, CSF, Mode-generic

Page 125: Format Abstraction for Sparse Tensor Algebra Compilers

29

Our technique generates efficient code

[Bar charts of normalized execution time per kernel:
Coordinate SpMV (0.0 to 1.5): this work, SciPy, Intel MKL, MTL4, TensorFlow
DIA SpMV (0.0 to 1.5): this work, SciPy, Intel MKL
CSR addition (0.0 to 10.0): this work, SciPy, Intel MKL, MTL4
Coordinate MTTKRP (0.0 to 10.0): this work, Tensor Toolbox]
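DIA SpMV, one of the kernels benchmarked above, iterates over stored diagonals rather than rows. The sketch below is illustrative, not taco's generated code, and assumes one common DIA layout: `diags[d][i]` holds `A[i, i + offsets[d]]`, zero-padded where a diagonal runs off the matrix.

```python
# Illustrative DIA (diagonal-format) SpMV sketch, y = A*x.
# Assumed layout: offsets[d] is a diagonal offset, and diags[d][i]
# holds A[i, i + offsets[d]], zero-padded past the matrix edge.

def dia_spmv(offsets, diags, x, nrows):
    y = [0.0] * nrows
    for off, diag in zip(offsets, diags):
        for i in range(nrows):
            j = i + off  # column index on this diagonal
            if 0 <= j < len(x):
                y[i] += diag[i] * x[j]
    return y

# Hypothetical banded matrix:
# A = [[1, 2, 0],
#      [0, 3, 4],
#      [0, 0, 5]]
offsets = [0, 1]
diags = [[1.0, 3.0, 5.0],   # main diagonal
         [2.0, 4.0, 0.0]]   # superdiagonal, padded at the end
x = [1.0, 1.0, 1.0]
print(dia_spmv(offsets, diags, x, 3))  # [3.0, 7.0, 5.0]
```

The inner loop is regular and branch-light, which is why DIA pays off on banded matrices and why specializing generated code to the format matters for the timings above.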
Page 131: Format Abstraction for Sparse Tensor Algebra Compilers

30

In conclusion…

Supporting many disparate tensor formats is essential for performance

We can automatically generate kernels that compute with disparate tensor formats

Adding support for even more tensor formats is straightforward

This work supported by:

tensor-compiler.org