FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt |...
Transcript of FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt |...
![Page 1: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/1.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1
Daniel Thuerck1,2 (advisors Michael Goesele1,2 and Marc Pfetsch1)
Maxim Naumov3
FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR PROGRAMMING
1 Graduate School of Computational Engineering, TU Darmstadt
2 Graphics, Capture and Massively Parallel Computing, TU Darmstadt
3 NVIDIA Research
![Page 2: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/2.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 2
INTRODUCTION TO LINEAR PROGRAMMING
![Page 3: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/3.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 3
Linear Programs
min 𝑐⊤𝑥s.t. 𝐴𝑥 ≤ 𝑏
𝑥 ≥ 0
𝑐
𝐴 =𝑎1
𝑇
𝑎𝑚𝑇
𝑏 =𝑏1
𝑏𝑚
where and
Linear objective function
Linear constraints
![Page 4: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/4.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 4
Linear Programs: Applications
[3P Logistics]
![Page 5: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/5.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 5
INTERNALS OF AN LP SOLVER
Lower-Level Parallelism in LP
![Page 6: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/6.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 6
Solving LPs
A is 𝑚 × 𝑛 matrix, with 𝑚 ≪ 𝑛
A is sparse and has full row-rank
Variables may be bounded: 𝑙 ≤
𝑥 ≤ 𝑢
min 𝑐⊤𝑥s.t. 𝐴𝑥 = 𝑏. 𝑥 ≥ 0
“Standard”
LP format
![Page 7: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/7.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 7
Solving LPs
𝑐 𝑐
Simplex Interior Point
![Page 8: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/8.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 8
Solving LPs
Simplex Interior Point (IPM)
“Basis”
(active set)
𝐴𝐵 =
𝐷 𝐴⊤
𝐴
𝐴𝐷−1𝐴⊤
“Augmented
(Newton) System”
“Normal
Equations”
![Page 9: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/9.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 9
Solving LPs
IPM / Normal Equations
𝐴𝐷−1𝐴⊤
𝑚 ×𝑚, SPD, might be dense
Squared condition number
Solution: Cholesky-factorization
or CG method
IPM / Aug. System
(𝑚 + 𝑛) × (𝑚 + 𝑛), sparse
Symmetric, indefinite
Solution: Indefinite LDLT or
MINRES method
𝐷 𝐴⊤
𝐴
![Page 10: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/10.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 10
Solving LPs
IPM / Normal Equations
𝐴𝐷−1𝐴⊤
𝑚 ×𝑚, SPD, might be dense
Squared condition number
Solution: Cholesky-factorization
or CG method
IPM / Aug. System
(𝑚 + 𝑛) × (𝑚 + 𝑛), sparse
Symmetric, indefinite
Solution: Indefinite LDLT or
MINRES method
𝐷 𝐴⊤
𝐴
![Page 11: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/11.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 11
Introducing culip-lp…
An ongoing implementation of Mehrotra’s Primal-Dual interior point
algorithm [1], featuring...
(Iterative) Linear Algebra based on the “Augmented Matrix” approach,
Full-rank guarantees,
Comprehensive preprocessing & prescaling.
Towards solving large-scale LPs on the GPU as open source for
everybody
![Page 12: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/12.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 12
IMPLEMENTING CULIP-LP
Progress report
![Page 13: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/13.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 13
Solver architecture
Preprocess Scale Standardize IPM loop
![Page 14: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/14.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 14
Solver architecture
Input data:
Constraints 𝐴𝑒𝑞𝑥 = 𝑏𝑒𝑞
Constraints 𝐴𝑙𝑒𝑥 ≤ 𝑏𝑙𝑒
Objective vector 𝑐
Bounds (on some variables) 𝑙, 𝑢
Preprocess Scale Standardize IPM loop
![Page 15: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/15.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 15
Solver architecture
Storage format: CSR
Compressed sparse row format
Provides efficient row-based access by 3 arrays:
𝑎 𝑏𝑐
𝑑𝑒
0 1 1 2 0 col_Ind
a b c d e val
023
4
5
row_ptr
𝐴𝑒𝑞𝑥 = 𝑏𝑒𝑞
𝐴𝑙𝑒𝑥 ≤ 𝑏𝑙𝑒
𝑐
𝑙, 𝑢
Preprocess Scale Standardize IPM loop
![Page 16: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/16.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 16
Solver architecture
Example: LP “pb-simp-nonunif” (see [2]) Input matrix: 1,4 Mio x 23k with 4,36 Mio nonzeros
Removed 1 singleton inequality
Removed 3629 low-forcing constraints
Removed 1 fixed variable
Removed 1,1 Mio (!) singleton inequalities
Result: approx. 3,6 Mio nonzeros removed
𝐴𝑒𝑞𝑥 = 𝑏𝑒𝑞
𝐴𝑙𝑒𝑥 ≤ 𝑏𝑙𝑒
𝑐
𝑙, 𝑢 Execute in rounds
Preprocess Scale Standardize IPM loop
![Page 17: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/17.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 17
Solver architecture
𝐴𝑒𝑞𝑥 = 𝑏𝑒𝑞
𝐴𝑙𝑒𝑥 ≤ 𝑏𝑙𝑒
𝑐
𝑙, 𝑢
Goal: Reduce element variance in matrices
Scaling [3] makes a difference
1. Geometric scaling (1x – 4x)
2. Equilibration (1x)
𝐴𝑖,⋅ =𝐴𝑖,⋅
max |𝐴𝑖,⋅| min(|𝐴𝑖,⋅|)
𝐴𝑖,⋅ =𝐴𝑖,⋅
𝐴𝑖,⋅ 2
Preprocess Scale Standardize IPM loop
![Page 18: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/18.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 18
Solver architecture
𝐴𝑒𝑞𝑥 = 𝑏𝑒𝑞
𝐴𝑙𝑒𝑥 ≤ 𝑏𝑙𝑒
𝑐
𝑙, 𝑢
Goal: Format LP in standard form
Shift variables:
Split (free) variables
Build std’ matrix: 𝐴𝑙𝑒 𝐼
𝐴𝑒𝑞=
𝑏𝐿𝑒𝑏𝑒𝑞
𝑥 → 𝑥 = 𝑥+ − 𝑥− 𝑥+, 𝑥− ≥ 0
l ≤ 𝑥 ≤ 𝑢 → 0 ≤ 𝑥′ ≤ 𝑢 + 𝑙
Preprocess Scale Standardize IPM loop
![Page 19: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/19.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 19
Solver architecture
Ensure A has full rank (symbolically)
𝑃𝐴𝑄 = 𝑚𝑢
𝑚𝑐
𝐴𝑥 = 𝑏
𝑐
𝑢
Preprocess Scale Standardize IPM loop
![Page 20: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/20.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 20
Solver architecture
𝐴𝑥 = 𝑏
𝑐
𝑢
Goal: Solve KKT conditions by Newton steps
Steps: Augmented matrix assembly
Solving the (indefinite) augmented matrix
Solve twice: predictor and corrector
Stepsize along 𝑣 = 𝑣𝑝 + 𝑣𝑐
𝑣𝑝𝑣𝑐
Preprocess Scale Standardize IPM loop
![Page 21: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/21.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 21
Solving the augmented system
Iterative strategy:
Symmetric, indefinite: use MINRES [4] (in parts)
Equilibrate system implicitly
Preconditioner: Experiments ongoing𝐷 𝐴⊤
𝐴 Direct strategy:
Symmetric, indefinite: use SPRAL SSIDS [5]
Reordering by METIS [6]
Scaling for large pivots
![Page 22: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/22.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 22
PERFORMANCE EVALUATION
Intermediate findings
![Page 23: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/23.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 23
Benchmark problems
Problem name [7] M N NNZ
ex9 40,962 10,404 517,112
ex10 696,608 17,680 1,162,000
neos-631710 169,576 167,056 834,166
bley_xl1 175,620 5831 869,391
map06 328,818 164,547 549,920
map10 328,818 164,547 549,920
nb10tb 150,495 73340 1,172,289
neos-142912 58,726 416,040 1,855,220
in 1,526,202 1,449,074 6,811,639
![Page 24: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/24.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 24
Performance
Problem name [7] NNZ CLP barr [sec] culip-lp [sec]
ex9 517,112 X (NC) 81
ex10 1,162,000 X (NS) 141
neos-631710 834,166 172 478
bley_xl1 869,391 X (NS) 1,492
map06 549,920 X (NC) 466
map10 549,920 X (NC) 615
nb10tb 1,172,289 X (NC) 2,461
neos-142912 1,855,220 356 447
in 6,811,639 X (NS) NC
X – failed, NS – did not start 1st iteration, NC – did not converged within 1 hour
![Page 25: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/25.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 25
Runtime breakdown
0
1
2
3
4
5
6
7
8
9
10
1 11 21 31 41 51 61 71 81 91
tim
e [se
c]
IPM step
Problem: map10 [7]
Corrector
Predictor
Example: map10 [7]
MINRES
SPRAL
![Page 26: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/26.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 26
Iterative vs. direct methods
0
1000
2000
3000
4000
5000
6000
1 2 3 4 5 6 7 8 9 10 11 12 13
Ite
ratio
ns
IPM step
MINRES Iterations
Example: map10 [7]
Corrector
Predictor
1.E-07
1.E-06
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
1 2 3 4 5 6 7 8 9 10 11 12 13
Rela
tive R
esid
ual
IPM step
MINRES relative residual
![Page 27: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/27.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 27
Numerical difficulty
Condition of matrix
depends mainly on
𝐷 𝐴⊤
𝐴
max(𝑥𝑖𝑠𝑖)
min 𝑥𝑖𝑠𝑖≈ 1010
where
xT=[x1,…,xn] are solution and
sT=[s1,…,sn] are slack variables
Remedies
2x2 pivoting in factorizations (e.g. 𝐿𝐷𝐿⊤
in SPRAL)
Preconditioning for MINRES or GMRES
𝐷 = 𝑑𝑖𝑎𝑔(𝑥) ⋅ 𝑑𝑖𝑎𝑔(𝑠)
with strong duality towards the end often yielding
![Page 28: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/28.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 28
Findings on the solver’s performance
We know that individual components attain speedup on the GPU
Linear algebra components, such as preconditioned MINRES, GMRES, etc.
Graph reorderings and matchings
LP problems:
Medium-sized problems: faster than open-source IPM code CLP
We can solve many large problems where CLP fails
However, more time is needed to integrate all components together on the GPU
![Page 29: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/29.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 29
What’s keeping you from optimizing your runtime?
LP Solver
(a.k.a “the black
box”)=
Equilibration
Matrix factorization
Bipartite matching
SpMV
Matrix reordering
LP scaling
Krylov-Solver
Max-flow
Component BFS
Graph partitioning
PreconditionerMPS I/O Preprocessing
Rank estimation
![Page 30: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/30.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 30
Future Work
Numerics
Preconditioning techniques
Investigate influence of 2x2 pivots in LDLT factorization
Tuning
Benchmark & optimize warp-based kernels in preprocessing
Never explicitly form the standardized matrix
![Page 31: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/31.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 31
FEASIBILITY STUDY: LP DECOMPOSITIONS
Higher-Level Parallelism in LP
![Page 32: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/32.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 32
Solving an LP: The usual setup
Large LP
Preprocess
Standardize
Solve
LP Solver
(a.k.a “the black box”)
Solution
![Page 33: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/33.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 33
Higher-level parallelism by LP decomposition
Large LP
Solution
Apply LP
decomposition
Assemble
master/slave
solutions
Master LP
Slave LP 1
Slave LP k
![Page 34: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/34.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 34
LP-decompositions: feasibility
Decomposition works on structure of the constraint matrix 𝐴:
Benders [9]Dantzig-Wolfe [10]
![Page 35: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/35.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 35
LP-decompositions: prototype
Implemented a Benders’ decomposition using hypergraph partitioning:
![Page 36: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/36.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 36
Acknowledgements
The work of Daniel Thuerck is supported by the
'Excellence Initiative' of the German Federal and State
Governments
and the
Graduate School of Computational Engineering
at Technische Universität Darmstadt.
![Page 37: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/37.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 37
References
[1] Mehrotra, Sanjay. "On the implementation of a primal-dual interior point method." SIAM Journal on
optimization 2.4 (1992): 575-601.
[2] Gondzio, Jacek. "Presolve analysis of linear programs prior to applying an interior point
method." INFORMS Journal on Computing 9.1 (1997): 73-91.
[3] Gondzio, Jacek. "Presolve analysis of linear programs prior to applying an interior point
method." INFORMS Journal on Computing 9.1 (1997): 73-91.
[4] Paige, Christopher C., and Michael A. Saunders. "Solution of sparse indefinite systems of linear
equations." SIAM journal on numerical analysis 12.4 (1975): 617-629.
[5] Paige, Christopher C., and Michael A. Saunders. "Solution of sparse indefinite systems of linear
equations." SIAM journal on numerical analysis 12.4 (1975): 617-629.
![Page 38: FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR … · 2017. 5. 11. · 10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1 Daniel Thuerck1,2 (advisors Michael](https://reader035.fdocuments.us/reader035/viewer/2022081411/60b0893b81dd71584d119763/html5/thumbnails/38.jpg)
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 38
References
[6] Karypis, George, and Vipin Kumar. "A fast and high quality multilevel scheme for partitioning
irregular graphs." SIAM Journal on scientific Computing 20.1 (1998): 359-392.
[7] Koch, Thorsten, et al. "MIPLIB 2010." Mathematical Programming Computation 3.2 (2011): 103-163.
[8] Forrest, John, David de la Nuez, and Robin Lougee-Heimer. "CLP user guide." IBM Research (2004).
[9] Benders, Jacques F. "Partitioning procedures for solving mixed-variables programming
problems." Numerische mathematik 4.1 (1962): 238-252.
[10] Dantzig, George B., and Philip Wolfe. "Decomposition principle for linear programs." Operations
research 8.1 (1960): 101-111.