Stefan Amberger - Johannes Kepler University Linz
A Parallel, In-Place, Rectangular Matrix Transpose Algorithm
Description of Algorithm and Correctness Proof
Stefan Amberger
ICA & [email protected]
Table of Contents
1. Introduction
2. Description of Transpose Algorithm
3. Proof of Correctness
Introduction
Rectangular Matrices
Large rectangular matrices are abundant
● Discrete Fourier transforms
● Finite element method
● Raster images in earth observation
● Computer graphics (e.g. radiosity equation)
● etc.
Current Situation in Computing
Moore’s Law
● Number of transistors on a chip doubles roughly every two years
● Maximum clock frequencies reached in 2005
● Maximum power density reached
→ multiple cores on CPUs
Memory
● often the limiting factor
● medium-sized problems on mobile / embedded devices
● large problems on computers
Example:
100,000 x 100,000 matrix: roughly 75 GB (assuming 8-byte elements)
→ Need parallel, in-place algorithms
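The footprint in the example can be checked with a short calculation. The sketch below assumes 8-byte (double-precision) elements, which the slide does not state; the function name `matrix_bytes` is illustrative.

```python
# Memory footprint of a dense matrix.
# Assumption (not stated on the slide): each element occupies 8 bytes.
def matrix_bytes(rows: int, cols: int, bytes_per_element: int = 8) -> int:
    return rows * cols * bytes_per_element


size = matrix_bytes(100_000, 100_000)
print(f"in place:     {size / 2**30:.1f} GiB")
# An out-of-place transpose needs a second buffer of the same size.
print(f"out of place: {2 * size / 2**30:.1f} GiB")
```

Under this assumption an out-of-place transpose doubles the requirement, which is the motivation for working in place.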
Rectangular Matrix Transpose: Mathematical Concept vs Implementation
Concept of Transpose
two-dimensional
Implementation on Computer
one-dimensional
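To make the gap between the two-dimensional concept and the one-dimensional implementation concrete, here is a minimal sketch assuming row-major storage (the slides do not name the storage order). The function and parameter names are illustrative.

```python
# Row-major storage: element (i, j) of an M x N matrix lives at index i*N + j.
def transpose_out_of_place(a: list, m: int, n: int) -> list:
    """Return the row-major N x M transpose of the row-major M x N matrix a."""
    b = [None] * (m * n)
    for i in range(m):
        for j in range(n):
            # (i, j) in A becomes (j, i) in the transpose, which has m columns.
            b[j * m + i] = a[i * n + j]
    return b


# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored as a flat array.
a = [1, 2, 3, 4, 5, 6]
print(transpose_out_of_place(a, 2, 3))  # [1, 4, 2, 5, 3, 6]
```

This version uses a second buffer `b`; the in-place variants avoid exactly that buffer.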
Rectangular Matrix Transpose: In-Place Transpose
In-Place Transpose of Square Matrix
using one temporary variable
M(M-1)/2 permutation cycles (pairwise swaps)
In-Place Transpose of Rectangular Matrix
The transpose permutation of the one-dimensional array, like every permutation, can be decomposed into disjoint, independent cycles
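Both cases can be sketched in a few lines. The code below assumes row-major storage. The cycle-following routine is a generic textbook-style approach, not necessarily the variant the author has in mind, and all names are illustrative. To stay in place it finds each cycle by checking whether the start index is the smallest index of its cycle, rather than keeping a visited set.

```python
def transpose_square_inplace(a: list, m: int) -> None:
    """Transpose a row-major M x M matrix by swapping pairs across the diagonal."""
    for i in range(m):
        for j in range(i + 1, m):
            # Each swap is one 2-cycle; there are M(M-1)/2 of them.
            a[i * m + j], a[j * m + i] = a[j * m + i], a[i * m + j]


def transpose_cycles_inplace(a: list, m: int, n: int) -> None:
    """Transpose a row-major M x N matrix in place by following permutation cycles."""
    last = m * n - 1  # the first and the last element never move

    def source(k: int) -> int:
        # Old index of the element that ends up at index k (valid for 0 < k < last).
        return (k * n) % last

    for start in range(1, last):
        # Handle each cycle exactly once, starting from its smallest index.
        k = source(start)
        while k > start:
            k = source(k)
        if k < start:
            continue  # this cycle was already processed from a smaller index
        tmp, pos = a[start], start
        while True:
            src = source(pos)
            if src == start:
                a[pos] = tmp
                break
            a[pos] = a[src]
            pos = src


a = [0, 1, 2, 3, 4, 5]           # 2 x 3 matrix [[0, 1, 2], [3, 4, 5]]
transpose_cycles_inplace(a, 2, 3)
print(a)                          # [0, 3, 1, 4, 2, 5]
```

The number and lengths of the cycles depend on M and N, and each cycle has to be walked element by element.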
Rectangular Matrix Transpose: Parallel In-Place Transpose
Common Approach
Independence of Permutation Cycles
● Limited parallelism
● Problem-dependent parallelism
● Permutation cycles are inherently serial
Our Approach
Divide and conquer
Transpose of Rectangular matrices, In-place and in Parallel (TRIP)
● Highly parallel for all problem sizes (see presentation 2)
● In-place
● Recursive
Description of Transpose Algorithm
TRIP Algorithm
If the matrix is rectangular, TRIP transposes sub-matrices, then combines the results with merge or split
TRIP Example: Transpose of a Tall Matrix
The original matrix is tall → it is divided by TRIP, and the sub-matrices are transposed in place
The transposed sub-matrices are combined by merge
The merged result can be reinterpreted as the transpose of the original matrix
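The three steps of the tall-matrix example can be traced on a small 6 x 2 matrix. This sketch uses copies and slicing for readability, so it is not in place. It also assumes row-major storage and a division into an upper and a lower half, neither of which the transcript states. The helper names are illustrative.

```python
def transpose_copy(a, m, n):
    """Row-major N x M transpose of the row-major M x N matrix a (uses a copy)."""
    return [a[i * n + j] for j in range(n) for i in range(m)]


def interleave_rows(t1, t2, rows):
    """Concatenate row r of t1 with row r of t2, for each of the `rows` rows."""
    p, q = len(t1) // rows, len(t2) // rows
    out = []
    for r in range(rows):
        out += t1[r * p:(r + 1) * p] + t2[r * q:(r + 1) * q]
    return out


m, n = 6, 2                       # tall matrix: more rows than columns
a = list(range(1, m * n + 1))     # rows [1, 2], [3, 4], ..., [11, 12]
m1 = m // 2                       # rows in the upper sub-matrix
upper, lower = a[:m1 * n], a[m1 * n:]

# Step 1: transpose both sub-matrices; each stays within its own half of the array.
t1 = transpose_copy(upper, m1, n)
t2 = transpose_copy(lower, m - m1, n)
print(t1 + t2)   # [1, 3, 5, 2, 4, 6, 7, 9, 11, 8, 10, 12]

# Step 2: combine the halves row by row; this is the job merge performs in place.
result = interleave_rows(t1, t2, n)

# Step 3: read the flat array as an N x M matrix.
print(result)    # [1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12]
```

The final array is the 2 x 6 matrix with rows [1, 3, 5, 7, 9, 11] and [2, 4, 6, 8, 10, 12], which is the transpose of the original.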
merge Algorithm
merge combines the transposes of sub-matrices of tall matrices
merge first rotates the middle part of the array, then recursively merges the left and right parts of the array
rol(arr, k) … left rotation (circular shift) of array arr by k elements
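A minimal sketch of this description follows. The slide's pseudocode is not preserved, so several details are assumptions: the signatures take explicit index ranges (`lo`, `hi`) instead of the slide's `rol(arr, k)`, the rows are divided in half, and the rotation is done with three reversals to stay in place.

```python
def reverse(arr, lo, hi):
    """Reverse arr[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        arr[lo], arr[hi] = arr[hi], arr[lo]
        lo += 1
        hi -= 1


def rol(arr, lo, hi, k):
    """Rotate arr[lo:hi] left by k elements in place (three reversals)."""
    if hi - lo < 2:
        return
    k %= hi - lo
    reverse(arr, lo, lo + k)
    reverse(arr, lo + k, hi)
    reverse(arr, lo, hi)


def merge(arr, lo, p, q, rows):
    """Starting at arr[lo]: `rows` rows of length p, then `rows` rows of length q.
    Rearrange in place into `rows` rows of length p + q."""
    if rows < 2:
        return
    r1 = rows // 2
    r2 = rows - r1
    # Layout: [P1 | P2 | Q1 | Q2] with sizes r1*p, r2*p, r1*q, r2*q.
    mid_lo = lo + r1 * p
    mid_hi = mid_lo + r2 * p + r1 * q
    rol(arr, mid_lo, mid_hi, r2 * p)          # -> [P1 | Q1 | P2 | Q2]
    merge(arr, lo, p, q, r1)                  # left part
    merge(arr, lo + r1 * (p + q), p, q, r2)   # right part, disjoint from the left


# Two transposed 3 x 2 sub-matrices (each now 2 x 3), stored back to back.
arr = [1, 3, 5, 2, 4, 6, 7, 9, 11, 8, 10, 12]
merge(arr, 0, 3, 3, 2)
print(arr)   # [1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12]
```

The two recursive calls operate on disjoint index ranges, so they do not interfere with each other.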
split Algorithm
split combines the transposes of sub-matrices of wide matrices
split first recursively splits the left and right parts of the array, then rotates the middle part of the array
split and merge are inverse to each other
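With merge and split described, the whole recursion can be sketched end to end. This is an illustration under stated assumptions, not the author's pseudocode: storage is row-major, sub-matrices are obtained by halving, square matrices are the base case, and functions take explicit index ranges. For wide matrices the sketch runs split before the recursive transposes, because the column blocks of a row-major wide matrix are not contiguous until they have been separated.

```python
def rol(arr, lo, hi, k):
    """Rotate arr[lo:hi] left by k elements in place, using three reversals."""
    def rev(i, j):
        j -= 1
        while i < j:
            arr[i], arr[j] = arr[j], arr[i]
            i, j = i + 1, j - 1
    if hi - lo > 1:
        k %= hi - lo
        rev(lo, lo + k)
        rev(lo + k, hi)
        rev(lo, hi)


def merge(arr, lo, p, q, rows):
    """[rows x p block, rows x q block] -> rows of length p + q."""
    if rows > 1:
        r1, r2 = rows // 2, rows - rows // 2
        rol(arr, lo + r1 * p, lo + rows * p + r1 * q, r2 * p)
        merge(arr, lo, p, q, r1)
        merge(arr, lo + r1 * (p + q), p, q, r2)


def split(arr, lo, p, q, rows):
    """Inverse of merge: rows of length p + q -> [rows x p block, rows x q block]."""
    if rows > 1:
        r1, r2 = rows // 2, rows - rows // 2
        split(arr, lo, p, q, r1)
        split(arr, lo + r1 * (p + q), p, q, r2)
        rol(arr, lo + r1 * p, lo + rows * p + r1 * q, r1 * q)


def trip(arr, lo, m, n):
    """Transpose the row-major m x n matrix at arr[lo : lo + m*n] in place (m, n >= 1)."""
    if m == n:                      # square: swap across the diagonal
        for i in range(m):
            for j in range(i + 1, n):
                x, y = lo + i * n + j, lo + j * n + i
                arr[x], arr[y] = arr[y], arr[x]
    elif m > n:                     # tall: transpose row blocks, then merge
        m1, m2 = m // 2, m - m // 2
        trip(arr, lo, m1, n)
        trip(arr, lo + m1 * n, m2, n)
        merge(arr, lo, m1, m2, n)
    else:                           # wide: separate column blocks, then transpose each
        n1, n2 = n // 2, n - n // 2
        split(arr, lo, n1, n2, m)
        trip(arr, lo, m, n1)
        trip(arr, lo + m * n1, m, n2)


a = [0, 1, 2, 3, 4, 5]              # 2 x 3 matrix [[0, 1, 2], [3, 4, 5]]
trip(a, 0, 2, 3)
print(a)                            # [0, 3, 1, 4, 2, 5]
```

In every function the two recursive calls touch disjoint index ranges, which makes them natural candidates for parallel execution. This sketch runs them sequentially.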
Correctness Proof
Correctness of merge: Structure of Matrix and Transpose
The matrix is split into two parts → the transpose of the matrix is split into two parts
Correctness of merge: Structure of Matrix after In-Place Transposition of Sub-Matrices
In-place transposition of the sub-matrices results in reshaped transposes of the sub-matrices
Correctness of merge: Proof Sketch
Proof by induction: merge transforms T (the array after in-place transposition of the sub-matrices) into the transpose of A
Correctness of merge: Lemma (merge)
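The formula of the lemma is not preserved in this transcript. One plausible way to state the property that an induction on k would establish is sketched below. It assumes that k counts the rows of the two blocks; the symbols X, Y, p, q, a, a' and the index notation are introduced here for illustration and are not the author's notation.

```latex
\textbf{Lemma (merge), assumed formulation.}
Let $k, p, q \ge 1$ and let the array $a$ of length $k(p+q)$ hold a row-major
$k \times p$ matrix $X$ followed by a row-major $k \times q$ matrix $Y$, i.e.
for $0 \le i < k$:
\[
  a[ip + j] = X_{i,j} \quad (0 \le j < p), \qquad
  a[kp + iq + j] = Y_{i,j} \quad (0 \le j < q).
\]
Then after $\mathrm{merge}$ the array $a'$ holds the row-major
$k \times (p+q)$ matrix $[\,X \mid Y\,]$, i.e. for $0 \le i < k$:
\[
  a'[i(p+q) + j] = X_{i,j} \quad (0 \le j < p), \qquad
  a'[i(p+q) + p + j] = Y_{i,j} \quad (0 \le j < q).
\]
```

For k = 1 the two layouts coincide, so merge has nothing to do in that case.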
Correctness of merge: Proof of Lemma (merge)
Base Case (k=1)
Induction Hypothesis (k0 arbitrary but fixed)
Induction Step (k0 → k0+1)
merge matches recursive case
rol transforms T to
Finally: recursive merge calls on sub-arrays, which are correct by the induction hypothesis (I.H.)
Correctness of split
analogous, by induction
Correctness of TRIP: Proof by Induction
Induction on the number of elements E of the matrix
Base Case (E=1):
Induction Hypothesis (E0 arbitrary but fixed):
Induction Step (E0 → E0+1):
Matrix is divided in two sub-matrices of dimension and
with and
Induction hypothesis applies, merge combines result.
Matrix is divided in two sub-matrices of dimension and
with and
Induction hypothesis applies, split combines result.
Conclusions
The novel algorithm TRIP transposes rectangular matrices
● correctly
● in-place
● in a highly parallel manner (see next presentation)