PSLP: Padded SLP Automatic Vectorization
Vasileios Porpodas†, Alberto Magni‡ and Timothy M. Jones†
University of Cambridge†
University of Edinburgh‡
EuroLLVM, April 2015
slide 1 of 17 www.cl.cam.ac.uk/~vp331/
Why SIMD Vectorization?
• Scalable parallelism
• High performance
• Energy efficiency
• Supported since the mid-90s
• Frequent updates of vector ISAs
• Vector generation is not done in hardware
• Low-level programming or a capable compiler
[Figure: a. ILP: scalar register file feeding scalar functional units (FU). b. Vector parallelism: vector register file feeding a vector unit with lanes 0 to 3. Labels: SSE4, AVX2.]
slide 2 of 17
SLP Straight-Line Code Vectorizer
• Superword Level Parallelism [Larsen PLDI'00]
• State-of-the-art straight-line code vectorizer
• Implemented in most compilers (including GCC and LLVM)
• In theory it should be a superset of the loop vectorizer
  • Unroll the loop and vectorize with SLP
  • Even if the loop vectorizer fails, SLP could partly succeed
• In practice it is missing features present in the loop vectorizer (interleaved loads, predication)
slide 3 of 17
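As a rough illustration of the "unroll the loop, then vectorize with SLP" idea, the sketch below shows a loop unrolled by hand. The function names, the unroll factor of 4 and the requirement that n is a multiple of 4 are assumptions made for this example and do not come from the slides.

```c
#include <stddef.h>

/* Original loop: one scalar add and one store per iteration. */
void add_one_rolled(float *b, const float *a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        b[i] = a[i] + 1.0f;
}

/* Unrolled by 4 (assumes n is a multiple of 4). The body now holds four
 * isomorphic statements ending in consecutive stores, which is the shape
 * a straight-line vectorizer looks for. */
void add_one_unrolled(float *b, const float *a, size_t n) {
    for (size_t i = 0; i < n; i += 4) {
        b[i]     = a[i]     + 1.0f;
        b[i + 1] = a[i + 1] + 1.0f;
        b[i + 2] = a[i + 2] + 1.0f;
        b[i + 3] = a[i + 3] + 1.0f;
    }
}
```

Both functions compute the same result; whether a given compiler actually packs the unrolled body into vector instructions depends on its cost model and target.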
SLP Vectorization Algorithm
• Input is scalar IR
• Seed instructions are:
  1. Consecutive stores
  2. Reductions
• Graph contains vectorizable isomorphic instructions
• Cost: weighted instruction count
• Check vectorization profitability
• Emit vectors only if profitable
Flowchart (input: scalar code):
1. Find vectorization seed instructions
2. Generate graph of isomorphic scalar groups
3. Calculate scalar cost and calculate vector cost
4. If vector cost < scalar cost: YES, go to step 5; NO, DONE
5. Vectorize groups and emit vectors, then DONE
slide 4 of 17
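The profitability check can be sketched as a comparison of two weighted instruction counts. The struct layout, the field names and the idea of a separate insert/extract overhead per group are assumptions for illustration; they are not taken from the slides or from any compiler's actual cost model.

```c
#include <stdbool.h>
#include <stddef.h>

/* One group of isomorphic scalars in the SLP graph (hypothetical model). */
struct slp_group {
    int lanes;         /* number of scalar instructions in the group */
    int scalar_weight; /* weight of one scalar instruction */
    int vector_weight; /* weight of the single vector instruction */
    int gather_weight; /* extra insert/extract overhead, 0 if none */
};

/* Scalar cost: every lane pays for its own scalar instruction. */
int scalar_cost(const struct slp_group *g, size_t n) {
    int cost = 0;
    for (size_t i = 0; i < n; ++i)
        cost += g[i].lanes * g[i].scalar_weight;
    return cost;
}

/* Vector cost: one vector instruction per group plus any overhead. */
int vector_cost(const struct slp_group *g, size_t n) {
    int cost = 0;
    for (size_t i = 0; i < n; ++i)
        cost += g[i].vector_weight + g[i].gather_weight;
    return cost;
}

/* Vectorize only if the vector form is strictly cheaper. */
bool slp_is_profitable(const struct slp_group *g, size_t n) {
    return vector_cost(g, n) < scalar_cost(g, n);
}
```

With two clean 4-lane groups of weight 1, the scalar cost is 8 and the vector cost is 2, so vectorization pays off. A single 4-lane group that carries an overhead of 8 has a vector cost of 9 against a scalar cost of 4, so the code stays scalar.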
When SLP Fails
1. Data dependencies
2. Too many gather/scatter instructions. Costs outweigh benefits.
3. Non-isomorphism
[Figure: (1) ADD1 to ADD4 linked by data dependencies. (2) Original: scalar ADD1 to ADD4; Vectorized: Insert1 to Insert4, one vector ADD covering ADD1 to ADD4, then Extract1 to Extract4. (3) A candidate group holding ADD1, ADD2, ADD4 and a MUL.]
slide 5 of 17
SLP Fails due to Non-Isomorphism
a. Input C code
...
B[i] = A[i] * 7.0 + 1.0;
B[i+1] = A[i+1] + 5.0;
...
[Figure: b. DFG; c. SLP internal graph; d. SLP vectorized groups. Legend: X is an instruction node or constant, arrows are data flow edges. Group 0 holds the two stores (S S) and group 1 the two additions (+ +). Group 2 would pair a multiply with a load (* L), so SLP stops: non-isomorphic. Comparing scalar cost with vector cost shows no benefit.]
slide 6 of 17
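The point at which grouping stops can be sketched by walking both statements from the store upwards and comparing opcodes level by level. Modelling each statement as a linear opcode chain is a simplification made for this example (the real graph also has constant operands), and the enum and function names are hypothetical.

```c
#include <stddef.h>

enum op { OP_STORE, OP_ADD, OP_MUL, OP_LOAD };

/* Walk two use-def chains from the seed stores upwards and count how many
 * levels hold the same opcode in both lanes. Grouping stops at the first
 * level where the opcodes differ. */
size_t matching_levels(const enum op *lane0, size_t n0,
                       const enum op *lane1, size_t n1) {
    size_t depth = 0;
    while (depth < n0 && depth < n1 && lane0[depth] == lane1[depth])
        ++depth;
    return depth;
}
```

For the slide's two statements the chains are store, add, multiply, load and store, add, load. Two levels match (the stores and the additions); the third level pairs a multiply with a load, which is where the groups end.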
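PSLP fixes Non-Isomorphism
[Figure: a. PSLP graphs: left graph S, +, *, L with constants 7. and 1.; right graph S, +, L with constant 5. b. PSLP padded graphs: the right graph gains a padded * (constant 7.), and select instructions pick the Left or Right input so that both graphs have the same shape. c. PSLP groups, numbered 0 to 5, including S S, + +, * * and L L. Legend: X is an instruction or constant, arrows are data flow edges, and select instructions have their own marker.]
slide 7 of 17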
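A scalar-level sketch of the padded form may help: both statements are rewritten to run the same sequence of operations, with a select deciding whether the multiply result is used. The helper `select_f64`, the function names and the exact placement of the select are assumptions for illustration, not the authors' implementation.

```c
/* Select between two inputs; stands in for an IR select instruction. */
static double select_f64(int take_left, double left, double right) {
    return take_left ? left : right;
}

/* Original, non-isomorphic statements from the slide. */
void pslp_original(double *B, const double *A, int i) {
    B[i]     = A[i] * 7.0 + 1.0;
    B[i + 1] = A[i + 1] + 5.0;
}

/* Padded form: both lanes perform load, multiply, select, add, store. */
void pslp_padded(double *B, const double *A, int i) {
    double l0 = A[i];
    double l1 = A[i + 1];
    double m0 = l0 * 7.0;
    double m1 = l1 * 7.0;               /* redundant: padding only */
    double s0 = select_f64(1, m0, l0);  /* lane 0 keeps the product */
    double s1 = select_f64(0, m1, l1);  /* lane 1 bypasses the product */
    B[i]     = s0 + 1.0;
    B[i + 1] = s1 + 5.0;
}
```

The padded version does more scalar work, but every level now pairs identical opcodes, so each pair can become one vector instruction.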
PSLP Algorithm
• Extension to SLP
• Generate multiple graphs (unlike SLP)
• Minimal padding
• Cost estimation
• Emit redundant code to create isomorphism
• Code vectorized by original SLP
Flowchart:
1. Find vectorization seed instructions
2. Generate a graph for each seed
3. Perform minimal padding of graphs
4. Calculate scalar cost, vector cost and padded vector cost
5. If padded cost is best: YES, go to step 6; NO, go to step 7
6. Emit padded scalars, then go to step 8
7. If vector cost < scalar cost: YES, go to step 8; NO, DONE
8. Generate SLP graph containing groups of isomorphic scalars (vanilla SLP)
9. Vectorize groups and emit vectors (vanilla SLP), then DONE
slide 8 of 17
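The two cost checks in this flow can be sketched as a single decision function. The enum, the function name and the way ties are broken (the padded form must be strictly cheaper than both alternatives) are assumptions for this example; the slides only say "if padded cost is best".

```c
enum pslp_action {
    PSLP_KEEP_SCALAR,  /* leave the code as it is */
    PSLP_VANILLA_SLP,  /* run the original SLP on the unpadded code */
    PSLP_PAD_THEN_SLP  /* emit padded scalars, then run the original SLP */
};

/* Pick an action from the three cost estimates. */
enum pslp_action pslp_choose(int scalar_cost, int vector_cost,
                             int padded_vector_cost) {
    /* First check: is the padded vector form the cheapest option? */
    if (padded_vector_cost < scalar_cost && padded_vector_cost < vector_cost)
        return PSLP_PAD_THEN_SLP;
    /* Second check: the usual SLP profitability test. */
    if (vector_cost < scalar_cost)
        return PSLP_VANILLA_SLP;
    return PSLP_KEEP_SCALAR;
}
```

For example, costs of 7 (scalar), 7 (vector) and 5 (padded) select padding, while 8, 4 and 6 fall through to vanilla SLP.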
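Minimal Padding Algorithm
[Figure, built up in steps:
1. g1 (S, +, *, L with constants 7 and 1) and g2 (S, +, L with constant 5) are non-isomorphic.
2. MCS1 and MCS2 mark the common part (S, +, L) inside g1 and g2.
3. diff1 and diff2 are the parts of g1 and g2 outside the common part; here diff1 contains the * with constant 7.
4. MinCS1 and MinCS2 are g1 and g2 extended with the other graph's diff nodes, so both contain S, +, *, L: isomorphic.
5. SELECT instructions choose the Left or Right input at the padded nodes.]
slide 9 of 17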
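The relationship between the common part, the two differences and the padded result can be sketched on linear opcode chains. This example swaps in a longest-common-subsequence computation as a stand-in for the graph matching shown on the slide; it is not the authors' algorithm, and the one-character opcode encoding, `MAX_OPS` and the function names are all hypothetical.

```c
#include <stddef.h>

#define MAX_OPS 16

/* Length of the longest common subsequence of two opcode chains, each at
 * most MAX_OPS long. Plays the role of the common part (MCS) here. */
size_t common_len(const char *g1, size_t n1, const char *g2, size_t n2) {
    size_t dp[MAX_OPS + 1][MAX_OPS + 1] = {{0}};
    for (size_t i = 1; i <= n1; ++i) {
        for (size_t j = 1; j <= n2; ++j) {
            if (g1[i - 1] == g2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j]
                                                       : dp[i][j - 1];
            }
        }
    }
    return dp[n1][n2];
}

/* Size of each padded chain: the common part plus both differences. */
size_t padded_len(const char *g1, size_t n1, const char *g2, size_t n2) {
    size_t mcs = common_len(g1, n1, g2, n2);
    size_t diff1 = n1 - mcs; /* nodes only in g1 */
    size_t diff2 = n2 - mcs; /* nodes only in g2 */
    return mcs + diff1 + diff2;
}
```

With g1 encoded as "S+*L" and g2 as "S+L", the common part has 3 nodes, diff1 has 1 node (the multiply), diff2 is empty, and both padded chains end up with 4 nodes.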
We can do better: Remove redundant Selects
S
L
+
*
S
L
+*
S
L
+
S
*
S
L
+
*
S
L
+*Left
7
17
1
EXAMPLE: Instruction acting as Select
7.
1.
+
L 5.
7
5
Right
1
1
1
+
C
A
a. Instruction acting as Selectslide 10 of 17 www.cl.cam.ac.uk/ ∼vp331/
![Page 63: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/63.jpg)
We can do better: Remove redundant Selects
S
L
+
*
S
L
+*
S
L
+
S
*
S
L
+
*
S
L
+*Left
7
17
1
EXAMPLE: Instruction acting as Select
7.
1.
+
L 5.
7
5
Right
1
1
A
+
0C
1
+
C
A
a. Instruction acting as Selectslide 10 of 17 www.cl.cam.ac.uk/ ∼vp331/
![Page 64: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/64.jpg)
We can do better: Remove redundant Selects
S
L
+
*
S
L
+*
S
L
+
S
*
S
L
+
*
S
L
+*Left
7
17
1
EXAMPLE: Instruction acting as Select
7.
1.
+
L 5.
7
5
Right
1
1
A
+
0C
A
72
1
+
C
A
a. Instruction acting as Select b. Select constantsslide 10 of 17 www.cl.cam.ac.uk/ ∼vp331/
![Page 65: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/65.jpg)
We can do better: Remove redundant Selects
S
L
+
*
S
L
+*
S
L
+
S
*
S
L
+
*
S
L
+*Left
7
17
1
EXAMPLE: Instruction acting as Select
7.
1.
+
L 5.
7
5
Right
1
1
A
+
0C
A
2
A
72
1
+
C
A
a. Instruction acting as Select b. Select constantsslide 10 of 17 www.cl.cam.ac.uk/ ∼vp331/
![Page 66: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/66.jpg)
We can do better: Remove redundant Selects
S
L
+
*
S
L
+*
S
L
+
S
*
S
L
+
*
S
L
+*Left
7
17
1
EXAMPLE: Instruction acting as Select
7.
1.
+
L 5.
7
5
Right
1
1
A
+
0C
A
2
A
72
A
B1
+
C
A
a. Instruction acting as Select b. Select constants c. Select same nodeslide 10 of 17 www.cl.cam.ac.uk/ ∼vp331/
![Page 67: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/67.jpg)
We can do better: Remove redundant Selects
[Figure: Left and Right versions of the example graph (nodes S, L, +, *; constants 1, 5, 7), annotated "EXAMPLE: Instruction acting as Select". Three panels follow:]
a. Instruction acting as Select
b. Select constants
c. Select same node
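One plausible reading of the three panel titles can be sketched in a lane-wise model. Everything here is an assumption made for illustration: the example expression, the constants chosen, and the helper names `vmul`, `vadd` and `vselect` are hypothetical, and Python lists stand in for SIMD lanes. The slides do not spell out the exact rewrite rules, so treat this as a sketch of the idea rather than the authors' implementation.

```python
def vmul(a, b):
    return [p * q for p, q in zip(a, b)]

def vadd(a, b):
    return [p + q for p, q in zip(a, b)]

def vselect(mask, a, b):
    """Per lane: take a[i] where mask[i] is True, otherwise b[i]."""
    return [p if m else q for m, p, q in zip(mask, a, b)]

MASK = [True, False]  # lane 0 takes the left input, lane 1 the right

def with_select(x):
    # Padded code that still contains an explicit Select.
    t = vmul(x, [7, 7])
    return vadd(vselect(MASK, t, x), [1, 5])

def mul_acting_as_select(x):
    # (a) The multiply itself does the selecting: lane 1 multiplies by
    # the neutral element 1, so no Select is needed.
    return vadd(vmul(x, [7, 1]), [1, 5])

def fold_constant_select(mask, left, right):
    # (b) A Select between constant vectors can be folded ahead of time
    # into a single constant vector.
    return vselect(mask, left, right)

def drop_same_node_select(mask, a, b):
    # (c) A Select whose two inputs are the same node is just that node.
    return a if a is b else vselect(mask, a, b)

print(with_select([3, 4]) == mul_acting_as_select([3, 4]))
```

With integer lanes the neutral-element trick is exact; for floating point it relies on the multiply by 1 leaving the value unchanged.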
![Page 72: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/72.jpg)
Opportunities for PSLP in real-life applications
1 Non-isomorphic source code (e.g. computing conjugates in 433.milc)
b[0].real = a[0].real
b[0].imag = - a[0].imag
b[1].real = a[1].real
b[1].imag = - a[1].imag
[Figure: memory layout a[0].real, a[0].imag, a[1].real, a[1].imag, ...]
2 Isomorphic source code but non-isomorphic IR due to high-level optimizations (jdct of cjpeg)
Source:
tmp1 = quantval[0]*16384
tmp2 = quantval[1]*22725
tmp3 = quantval[2]*21407
tmp4 = quantval[3]*19266
After optimization (opt):
tmp1 = quantval[0]<<14
tmp2 = quantval[1]*22725
tmp3 = quantval[2]*21407
tmp4 = quantval[3]*19266
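Both cases can be modelled lane by lane to see why padding helps. The sketch below is illustrative only: Python lists stand in for SIMD lanes, the function names and the `vselect` helper are hypothetical, and the padding constant in lane 0 of the multiply is an arbitrary choice. It mirrors the statements on the slide but is not taken from 433.milc or cjpeg.

```python
def vselect(mask, a, b):
    """Per lane: take a[i] where mask[i] is True, otherwise b[i]."""
    return [p if m else q for m, p, q in zip(mask, a, b)]

# Case 1: conjugates. Memory layout is [re0, im0, re1, im1].
def conjugate_scalar(a):
    # Copy, negate, copy, negate: adjacent statements are not isomorphic.
    return [a[0], -a[1], a[2], -a[3]]

def conjugate_padded(a):
    neg = [-v for v in a]  # negate in every lane (redundant in the real lanes)
    return vselect([False, True, False, True], neg, a)

# Case 2: the optimizer turned one multiply into a shift.
def dequant_ir(q):
    return [q[0] << 14, q[1] * 22725, q[2] * 21407, q[3] * 19266]

def dequant_padded(q):
    shl = [v << 14 for v in q]  # shift in every lane (redundant in lanes 1 to 3)
    consts = [1, 22725, 21407, 19266]  # lane 0 constant is padding, its product is unused
    mul = [v * c for v, c in zip(q, consts)]
    return vselect([True, False, False, False], shl, mul)

print(conjugate_padded([1.0, 2.0, 3.0, 4.0]))
print(dequant_padded([1, 2, 3, 4]))
```

In the second case the shift and the multiply by 16384 compute the same value, so the isomorphism was lost only in the IR, not in the source.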
![Page 79: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/79.jpg)
Experimental Setup
• Implemented PSLP in the trunk version of the LLVM 3.6 compiler.
• Target: Intel Core i5-4570 @ 3.2GHz
• Compiler flags: -O3 -allow-partial-unroll -march=core-avx2 -mtune=core-i7 -ffast-math
• Kernels, SPEC 2006 and Mediabench II
• We evaluated the following cases:
1 All loop, SLP and PSLP vectorizers disabled (O3)
2 O3 + SLP enabled (SLP)
3 O3 + PSLP enabled (PSLP)
![Page 80: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/80.jpg)
PSLP increases performance
[Figure: "Performance of Kernels (Execution Time)", bars for O3, SLP and PSLP over conjugates, su3-adjoint, make-ahmat-slow, jdct-ifast, floyd-warshall and GMean; y-axis "Normalized Time", 0.00 to 1.40.]
[Figure: "Whole Benchmarks (Execution Time)", bars for O3, SLP and PSLP over cjpeg, mpeg2dec, 433.milc, 473.astar and GMean; y-axis 0.97 to 1.01.]
![Page 83: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/83.jpg)
PSLP enables or extends vectorization
[Figure: "Vectorization Coverage Breakdown", bars split into SLP-only, PSLP-extends-SLP and PSLP-only for conjugates, su3-adjoint, make-ahmat-slow, jdct-ifast, floyd-warshall, cjpeg, mpeg2dec, 433.milc and 473.astar; y-axis "Times Technique Succeeds", 0 to 50, with the value 163 annotated on the chart.]
• SLP only: SLP is adequate.
• PSLP extends SLP: SLP stops at non-isomorphic code. PSLP extends it.
• PSLP only: SLP fails completely. PSLP succeeds.
![Page 84: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/84.jpg)
Optimizing away redundant Selects
• Select-removal optimizations remove about 21% of the Selects
[Figure: "Percentage of Selects per region before and after Optimizations", bars for Original-Selects and Optimized-Selects over conjugates, su3-adjoint, make-ahmat-slow, jdct-ifast, floyd-warshall, cjpeg, mpeg2dec, 433.milc, 473.astar and GMean; y-axis "Percentage of Selects", 0% to 35%.]
![Page 90: PSLP: Padded SLP Automatic Vectorization€¦ · SLP Vectorization Algorithm • Input is scalar IR • Seed instructions are: 1 Consecutive Stores 2 Reductions • Graph contains](https://reader034.fdocuments.us/reader034/viewer/2022050512/5f9d07f1155ec073ee699d38/html5/thumbnails/90.jpg)
Conclusion
• PSLP improves vectorization coverage compared to the state-of-the-art
• Converts non-isomorphic code into isomorphic by:
  • Relying on the Min Common Supergraph for minimal injection of redundant code
  • Emitting Select instructions to guarantee correctness
  • Optimizing away redundant Selects
• PSLP performs better than SLP on commodity SIMD-capable hardware