Fast static performance analysis
of parallel program schemes
Yuriy Sheynin, Boris Sedov, Alexey Syschikov, Vera Ivanova
{sheynin, boris.sedov, alexey.syschikov, vera.ivanova}@guap.ru
Presenting: Sergey Pakharev
Software for embedded systems and parallelism
20-24 April 2015 17th FRUCT Conference
For parallel software, it is very important to be able to assess, at an
early stage, the potential parallelism and the achievable speedup as a
function of the number of processors on the platform.
Parallel program
VPL: visual programming language
A VPL program is a directed graph drawn as a block scheme:
• vertices are operators
• arcs are pointers that link operators
Early performance evaluation tool
Static analysis
• Evaluation of parallelism
and performance at early
stages
• Quick and “cheap” task
Complex performance analysis: static analysis, virtual simulator, platform simulator
Parallelism and data
Part of the program is parallel, so executing it on 2 processors should
significantly reduce the total execution time. The actual speedup of program
execution is less than 1.5%.
The reason is the difference in the size of the input data received by each
parallel branch of the program.
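A rough way to see why unbalanced branches barely benefit from a second processor is the minimal sketch below. It assumes that each branch runs on its own processor and that there is no communication cost. The function name and the branch costs are hypothetical and are not taken from the slides.

```python
def branch_speedup(branch_costs):
    """Speedup of independent parallel branches, one processor per branch.

    Assumption: no communication or scheduling overhead.
    """
    sequential_time = sum(branch_costs)  # all branches on one processor
    parallel_time = max(branch_costs)    # the slowest branch dominates
    return sequential_time / parallel_time


# Hypothetical costs: one branch receives far more data than the other.
unbalanced = branch_speedup([1000, 10])
balanced = branch_speedup([500, 500])
print(f"unbalanced: {unbalanced:.2f}x, balanced: {balanced:.2f}x")
```

With these invented costs the unbalanced case gains about 1%, while the balanced case reaches the ideal factor of 2.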
Parallelism and data
Matrix addition: $O(n^2)$
Matrix multiplication: $O(n^3)$
Some operators have asymptotic complexity that depends on the size of data
being processed.
For the analysis, the user specifies:
• Minimal data amount $N_{min}$
• Base data amount $N_{base}$
• Maximal data amount $N_{max}$
• Base execution time of the program $ExecCost_{base}$

$ExecCost = \frac{ExecCost_{base}}{O(N_{base})} \cdot O(N)$
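The scaling rule can be sketched in Python as follows. This is an illustration only: `scaled_exec_cost` and the two complexity functions are hypothetical names, and the base cost of 10 time units is an invented example value.

```python
def scaled_exec_cost(exec_cost_base, n_base, n, complexity):
    """Scale a cost measured at n_base to data amount n.

    `complexity` maps a data amount to the value of the operator's
    asymptotic complexity function, e.g. n**2 for matrix addition.
    """
    return exec_cost_base / complexity(n_base) * complexity(n)


def matrix_add(n):
    return n ** 2  # O(n^2)


def matrix_mul(n):
    return n ** 3  # O(n^3)


# Hypothetical base measurement: 10 time units at N_base = 1.
add_cost = scaled_exec_cost(10, n_base=1, n=15, complexity=matrix_add)
mul_cost = scaled_exec_cost(10, n_base=1, n=15, complexity=matrix_mul)
print(add_cost, mul_cost)
```

Passing the complexity as a function keeps the rule independent of any particular operator.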
Parallelism and data
The parallelism of the scheme decreases as the matrix size increases. The
program is not suitable for parallel platforms.
• $N_{min} = 1$
• $N_{max} = 15$
• $N_{base} = 1$
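One scheme that would behave this way is sketched below. It is an assumption for illustration, not necessarily the program on the slide: two parallel branches, one adding and one multiplying n×n matrices, each on its own processor, with equal (hypothetical) base costs at $N_{base} = 1$.

```python
def two_branch_speedup(n, base_cost=1.0, n_base=1):
    """Two-processor speedup for an O(n^2) branch parallel to an O(n^3) branch.

    Assumptions: both branches cost `base_cost` at n_base, and there is no
    communication overhead.
    """
    add_cost = base_cost / n_base ** 2 * n ** 2
    mul_cost = base_cost / n_base ** 3 * n ** 3
    return (add_cost + mul_cost) / max(add_cost, mul_cost)


# Sweep the data amount from N_min = 1 to N_max = 15.
speedups = [two_branch_speedup(n) for n in range(1, 16)]
for n, s in enumerate(speedups, start=1):
    print(f"n={n:2d}  speedup={s:.3f}")
```

Under these assumptions the speedup equals 1 + 1/n, so it drops from 2 at n = 1 towards 1 as the matrices grow.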
Hierarchy
A VPL program scheme may contain terminal operators (data processing) and
composite operators (structural units).
Composite operators are designed for hierarchical structuring of the program.
They may contain terminal operators and other composite operators.
Hierarchy
[Figure: a complex node containing nodes 1, 2 and 3, placed on processors P1 and P2]
Performance models for composite structures:
Fully sequential
• all nodes in the body of the composite operator are placed on one
processor
Fully parallel
• all nodes in the body of the composite operator are placed on all
available processors according to the general rules
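The two models can be contrasted with a small sketch. The slides do not spell out the "general rules" of placement, so this example swaps in a simple greedy longest-first placement and ignores arcs between nodes. Both are assumptions, and the function names and node costs are hypothetical.

```python
import heapq


def sequential_time(node_costs):
    """Fully sequential model: every node of the body runs on one processor."""
    return sum(node_costs)


def parallel_time(node_costs, processors):
    """Fully parallel model, approximated by greedy longest-first placement.

    Assumption: nodes are independent; dependencies between them are ignored.
    """
    loads = [0] * processors
    heapq.heapify(loads)
    for cost in sorted(node_costs, reverse=True):
        lightest = heapq.heappop(loads)        # least loaded processor so far
        heapq.heappush(loads, lightest + cost)
    return max(loads)


body = [40, 30, 20, 10]  # hypothetical node costs
print(sequential_time(body), parallel_time(body, processors=2))
```

A real analyzer would also respect the arcs between nodes. The sketch only shows how the two models bound the execution time of a composite operator.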
Hierarchy
[Figure: placement of composite operators C1 and C2 with nodes F1 to F4 on processors P1 and P2. Sequential model: t=700. Parallel model: t=600.]
Iterations
[Figure: For and While loop operators with nodes F1 to F4 placed on processors P1 and P2]
Most of the computation in a program is expressed as conditional (while) or
iterative (for) loops, so loops have a significant impact on program
performance.
Factors taken into account:
• The asymptotic complexity of the loop body
• The number of iterations
• Execution model (parallel / sequential)
Loop execution time = accumulated execution time of the body × number of
iterations
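The loop rule can be sketched as follows. The function name is hypothetical, and the two body-time models are simplifying assumptions: `sum` stands for a sequential body and `max` for an idealized parallel body with one processor per independent node.

```python
def loop_execution_time(body_node_costs, iterations, body_time_model=sum):
    """Loop time = accumulated execution time of the body * iterations.

    `body_time_model` reduces the node costs of the body to one time value.
    """
    body_time = body_time_model(body_node_costs)
    return body_time * iterations


body = [10, 20, 30]  # hypothetical costs of the loop body nodes
print(loop_execution_time(body, iterations=5))                       # sequential body
print(loop_execution_time(body, iterations=5, body_time_model=max))  # idealized parallel body
```

If the body cost depends on the data amount, it could first be scaled by its asymptotic complexity and then multiplied by the iteration count.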
Conclusion
The static analyzer of parallel VPL programs provides:
• Evaluation of program speedup on different numbers of processors
• Evaluation of how parallelism varies with the data amount and the
complexity of the processing operators
• Evaluation that takes the program hierarchy and loops into account
Further areas of work:
• Accounting for the features of conditional statements (if/switch)
• Deeper analysis with the virtual and platform simulators
Thank you!