Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer...
Transcript of Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer...
![Page 1: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/1.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1
Using GCC Analysis Techniques to Enable Parallel Memory Accesses in HLS
Johanna Rohde, Christian Hochberger
Date: 7.9.2017
TU Darmstadt
Computer Systems Group
![Page 2: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/2.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 2
Outline
● Background
– SpartanMC Toolchain
– GCC
– Hardware Plugin
● Approach
– Scalar Evolution Analysis
– Memory Model
● Evaluation
● Conclusion
![Page 3: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/3.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 3
BackgroundSpartanMC SoC Kit
● 18 Bit Soft-Core
– Tradeoff between 8 and 32 bit
– FPGA use 18 bit for BRAMs and DSPs
– 2 additional bit increase code density
● Graphical system configurator
● Peripheral components
● Simulator
● Supported FPGAs
– Xilinx: Spartan 3/6, 7er Series, …
– Altera: Cyclon 4
– Lattice: ECP 3/5
Interested?www.spartanmc.de
![Page 4: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/4.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 4
BackgroundCompiler Toolchain
● GCC
– Support for 2-address-machines
– Wide range of optimizations● e.g. Polyhedral framework
– Widely used
– Currently using GCC 7.1
![Page 5: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/5.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 5
GIMPLE
DFG
C - Frontend Objekt
Files
Language Specific
Pass-Manager
SSA Passes
RTL Passes
RTL Tree
DFG
Target Specific
Plugin
GIMPLE Pass
Intermediate Representation
BackgroundCompile-Flow
![Page 6: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/6.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 6
BackgroundHardware Accelerator Plugin
● A specific SpartanMC design is generated for every application
● Automatically integrate hardware accelerators
– Statical code analysis at compile time
– Transparent to the user
– No annotations or further information needed
![Page 7: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/7.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 7
BackgroundOptimizations
● Loop pipelining
– Overlapping iterations
– Problem: order of memory accesses
– Increases initiation interval
=
R
in_1
W
=
result_1
+
*value_1
2
=
R
in_1
W
=
result_1
+
*value_1
2
Iteration 1
Iteration 2
![Page 8: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/8.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 8
ApproachChain of Recurrences
● Express evolution of a variable
● Basic Recurrence (BR)
● Tree of Recurrence (TREC)
f={φ 0 ,Θ , f 1}, Θ∈{+,∗}
Φ={Φa ,+,Φb} or Φ=c
f (i)={φ 0 ,+ , f 1}(i)=φ 0+∑j=0
i−1
f 1( j )
f (i)={φ 0 ,∗ , f 1}( i)=φ 0+∏j=0
i−1
f 1( j)
for 0≤ i<N
![Page 9: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/9.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 9
ApproachScalar Evolution Analysis
● Implemented in the GCC
● Calculates a TREC for every expression
● Example:
i →{0,+ ,1}1
i∗M →{0,+ ,1}1∗M ={0,+, M }1
i∗M+ j →{{0,+, M }1,+ ,1}2
for(int i = 0; i < N; i++) { \\loop 1 for(int j = 0; j < M; j++{ \\loop 2 A[i*M+j] = ….; }}
![Page 10: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/10.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 10
ApproachMemory Access Model
● Anatomy of a (regular) memory access
– innermost_loop_behavior struct● tree base_address● tree offset → always 0● tree init● tree step
– A = {(base + offset + init), +, step}
● Pair
– Dependency between● Read + Write ● Write + Write
![Page 11: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/11.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 11
ApproachMemory Access Model
● Static Pairs
– Analyzable at compile time
– Base address equals
– Step size equals
● Dynamic Pairs
– Analyzable at runtime
– Step size equals
–
● Critical Pairs
– Not analyzable
Δ base+Δ init +n∗step=0
![Page 12: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/12.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 12
EvaluationBenchmarks
● 11 benchmarks with 23 loops
● Different areas of application
– Cryptography
– Image processing
– Signal processing
AES
Dijkstra
Binary Tree
Haar-wavelet
Base64
Fletcher
BilinearFilter
GrayscaleFilter
BitReverse
IIR JPEG
![Page 13: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/13.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 13
EvaluationCoverage
![Page 14: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/14.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 14
EvaluationClassification
![Page 15: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/15.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 15
EvaluationFirst-Last Dependencies
=
R
in_1
W
=
result_1
+
*value_1
2
=
R
in_1
W
=
result_1
+
*value_1
2
Iteration 1
Iteration 2
![Page 16: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/16.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 16
EvaluationMemory Carried Dependencies
=
R
in_1
=
result_1
W
+
value_1
=
R
in_2
=
result_2
W
+
value_1
Iteration 1
![Page 17: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/17.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 17
EvaluationStep Size
● Condition for static/dynamic classification is equal step size
– Distance is constant
● Special case: Distance is increasing
– Asml = {(basesml + offsetsml + initsml), +, stepsml}
– Alrg = {(baselrg + offsetlrg + initlrg), +, steplrg}
– Runtime condition:● Alrg(n) – Asml(0) > 0
![Page 18: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/18.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 18
EvaluationStep Size
● General
– 9% → 4% critical pairs
● First-Last Dependencies
– 32% → 9% critical pairs
● Memory Carried Dependencies
– 6% → 3% critical pairs
![Page 19: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/19.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 19
Conclusion
● Over 90% of all dependencies can be considered for decoupling
– 91% of first-last dependencies
– 97% of memory carried dependencies
● Analysis is already implemented
● Future work
– Implementation of static/dynamic decoupling
![Page 20: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/20.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 20
Thank You!
![Page 21: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/21.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 22
GIMPLE
DFG
C - Frontend Objekt
Files
Language Specific
Pass-Manager
SSA Passes
RTL Passes
RTL Tree
DFG
Target SpecificIntermediate Representation
BackgroundCompile-Flow
• Dead Code Elimination• Forward Propagation• Dead Store Elimination• Redundancy Elimination• Loop Optimization• Constant Propagation• ...
![Page 22: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/22.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 23
Benchmarks
● 17 benchmarks with 43 loops
● Different areas of application
– Cryptography
– Image processing
– Signal processing
AES Base64Bit
ReverseBilinear
filterBinary
TreeCRC
Dijkstra EuclidHaar-
waveletFletcherGrayscale
Filter IDCT
IIR JPEG RSAMandel-
brotMatrixMult
![Page 23: Using GCC Analysis Techniques to Enable Parallel Memory ...07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1 Using GCC Analysis Techniques to Enable Parallel Memory](https://reader033.fdocuments.us/reader033/viewer/2022060900/609e0be7c947f94cb0118c48/html5/thumbnails/23.jpg)
07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 24
Benchmarks
● 20 Loops excluded
– No loop pipelining
– No store operation
– Single store operation
AES Base64Bit
ReverseBilinear
filterBinary
TreeCRC
Dijkstra EuclidHaar-
waveletFletcherGrayscale
Filter IDCT
IIR JPEG RSAMandel-
brotMatrixMult