Smith Waterman Algorithm - Performance Analysis
Armin Bundle
Department of Computer Science, University of Erlangen
Seminar muCoSim SS 2016
Armin Bundle (University of Erlangen), Smith Waterman Algorithm - Performance Analysis, Seminar muCoSim SS 2016
Outline
1 The Smith Waterman Algorithm
    The concept
    The algorithm
2 Profiling and data structure
3 The code
4 Likwid performance measurement
5 Problems and Outlook
The Smith Waterman Algorithm
The concept (1)
The Smith Waterman algorithm performs local sequence alignment to find similar regions in, e.g., DNA or protein sequences.
A sequence alignment is a sequence of edit operations.
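To illustrate the statement that an alignment is a sequence of edit operations, here is a minimal Python sketch. It assumes the alignment is given as two equal-length strings in which `-` marks a gap; the function name `edit_operations` and the operation labels are hypothetical and do not come from the slides.

```python
def edit_operations(aligned_x, aligned_y):
    """Translate two gapped alignment rows into a list of edit operations."""
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned rows must have the same length")
    ops = []
    for a, b in zip(aligned_x, aligned_y):
        if a == "-":
            ops.append(f"insert {b}")           # gap in the first row
        elif b == "-":
            ops.append(f"delete {a}")           # gap in the second row
        elif a == b:
            ops.append(f"match {a}")
        else:
            ops.append(f"substitute {a}->{b}")
    return ops


print(edit_operations("AC-T", "AGGT"))
# ['match A', 'substitute C->G', 'insert G', 'match T']
```

Each column of the alignment corresponds to exactly one operation, which is why a scoring scheme can be defined per column.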
The concept (2)
The algorithm is a variation of the Needleman-Wunsch algorithm, which compares two sequences and produces a global similarity score
Application area of SW: searching for genes whose sequences are similar to well-known genes
The algorithm uses dynamic programming
The complexity is quadratic: O(m·n) for sequences of lengths m and n
The algorithm
First step: the matrix initialisation
Example input data
f = -1 (gap penalty)
MatchScore m = 2
MismatchScore mm = -1
Calculation function
\[
w(x, y) =
\begin{cases}
m, & x = y \\
mm, & \text{else}
\end{cases}
\]
Evaluate the neighbours
\[
F(i, j) = \max
\begin{cases}
0 \\
F(i-1, j-1) + w(x_i, y_j) \\
F(i-1, j) + f \\
F(i, j-1) + f
\end{cases}
\]
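The initialisation and the neighbour evaluation can be sketched in a few lines of Python. This is an illustrative sketch, not the code analysed in the talk: the function names `sw_score_matrix` and `best_score` are hypothetical, the defaults are the example values from the slide (match 2, mismatch -1, gap -1), and row 0 and column 0 are assumed to be initialised to zero, as is usual for local alignment.

```python
def sw_score_matrix(x, y, match=2, mismatch=-1, gap=-1):
    """Fill the score matrix F for sequences x and y."""
    rows, cols = len(x) + 1, len(y) + 1
    # First step: initialisation. Row 0 and column 0 stay at zero.
    F = [[0] * cols for _ in range(rows)]
    # Second step: evaluate the three neighbours of every cell.
    for i in range(1, rows):
        for j in range(1, cols):
            w = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                0,                    # start a new local alignment
                F[i - 1][j - 1] + w,  # diagonal: match or mismatch
                F[i - 1][j] + gap,    # upper neighbour: gap
                F[i][j - 1] + gap,    # left neighbour: gap
            )
    return F


def best_score(F):
    """The local alignment score is the largest entry of F."""
    return max(max(row) for row in F)


F = sw_score_matrix("ACGT", "CG")
print(best_score(F))  # 4: the common region "CG" scores 2 + 2
```

The two nested loops visit every cell once, which is where the quadratic complexity comes from.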
Second step: calculation of the local alignment score of the matrix
Third step: Traceback Matrix
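The slide only names the traceback step, so the following Python sketch shows one common way to do it, under stated assumptions: the traceback starts at the cell with the highest score and follows the neighbour that produced each value until a zero is reached. The function name `smith_waterman` and the tie-breaking order (diagonal first, then up, then left) are choices made for this illustration, not taken from the talk.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_x, aligned_y) for the best local alignment."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            w = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(0, F[i - 1][j - 1] + w,
                          F[i - 1][j] + gap, F[i][j - 1] + gap)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback: walk back from the best cell until the score drops to 0.
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        w = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + w:    # came from the diagonal
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:    # came from above: gap in y
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:                                 # came from the left: gap in x
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("ACGT", "CG"))  # (4, 'CG', 'CG')
```

Recomputing the origin of each cell during the walk avoids storing a separate direction matrix; an implementation that keeps an explicit traceback matrix trades memory for that recomputation.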
Profiling and data structure
Profiling
Data structure
Input value: scale (reference value is 41)
sequence size = 1 << (scale / 2)
Arrays
    main sequence & match sequence
        Memory: 1 MB (for scale = 41)
    goodScores & scores
        Memory: 4.8 kB
    goodEndsI, goodEndsJ, index & best
        Memory: 2.4 kB
    weights
        Memory: 0.6 kB
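The sequence size for the reference input can be checked with a short calculation. The sketch below is in Python, so the integer division that `scale / 2` performs in C is written as `//`. That each sequence element occupies one byte is an assumption made here to relate the element count to the 1 MB figure; the slide does not state the element type.

```python
scale = 41                          # reference input value from the slide
sequence_size = 1 << (scale // 2)   # 41 // 2 == 20, so this is 2**20

bytes_per_element = 1               # assumption: one byte per sequence element
memory_bytes = sequence_size * bytes_per_element

print(sequence_size)                # 1048576
print(memory_bytes / 2**20)         # 1.0 (MiB, under the one-byte assumption)
```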
The code
The code (1)
The code (2)
The code (3)
Likwid performance measurement
Metric                       Value
Branch misprediction rate    7.8e-6
Load to store ratio          5.5
CPI                          0.42
L2 bandwidth [MBytes/s]      5702
L2 data volume [GBytes]      606.2
L2 miss rate                 0.0084
L3 bandwidth [MBytes/s]      5180
L3 data volume [GBytes]      550.0
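A few derived numbers make the measurements easier to read. The Python sketch below is only a plausibility check under stated assumptions: the data volumes are taken to be totals in GBytes, 1 GByte is counted as 1000 MBytes, and each bandwidth is taken to be the corresponding volume divided by the measured runtime. None of these derived values appear on the slide.

```python
cpi = 0.42
l2_bandwidth_mb_s, l2_volume_gb = 5702.0, 606.2
l3_bandwidth_mb_s, l3_volume_gb = 5180.0, 550.0

# Instructions per cycle is the reciprocal of cycles per instruction.
ipc = 1.0 / cpi

# If bandwidth = volume / runtime, both cache levels should imply
# roughly the same runtime.
l2_runtime_s = l2_volume_gb * 1000.0 / l2_bandwidth_mb_s
l3_runtime_s = l3_volume_gb * 1000.0 / l3_bandwidth_mb_s

print(f"IPC          ~ {ipc:.2f}")
print(f"L2 runtime   ~ {l2_runtime_s:.1f} s")
print(f"L3 runtime   ~ {l3_runtime_s:.1f} s")
```

Both estimates come out at roughly 106 s, which is consistent with the volumes being totals over a single run. The small miss rates and the CPI below 0.5 are at least compatible with a kernel that is not limited by memory traffic.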
Runtime
Problems and Outlook
Problems & Outlook
Problems
Running the code with MPI
Obtaining a node for memory measurements
Applying the roofline model
Outlook
Change the data structure or the access order of the sequence arrays
Use MPI to see how the performance increases
Use the SIMD instructions of CPUs
Port the code to GPUs
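One possible reading of the access-order, SIMD and GPU items is an anti-diagonal (wavefront) traversal of the score matrix. Each cell F(i, j) depends only on its upper, left and upper-left neighbours, so all cells with the same i + j are independent of each other and could be computed in parallel. The Python sketch below only demonstrates that this order gives the same matrix as the row-wise order; it is not the talk's implementation, and the function names are hypothetical.

```python
def _cell(F, x, y, i, j, match, mismatch, gap):
    """Value of F[i][j] from its three already computed neighbours."""
    w = match if x[i - 1] == y[j - 1] else mismatch
    return max(0, F[i - 1][j - 1] + w, F[i - 1][j] + gap, F[i][j - 1] + gap)


def fill_rowwise(x, y, match=2, mismatch=-1, gap=-1):
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = _cell(F, x, y, i, j, match, mismatch, gap)
    return F


def fill_antidiagonal(x, y, match=2, mismatch=-1, gap=-1):
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    # d = i + j is constant on an anti-diagonal; cells on one diagonal
    # depend only on diagonals d - 1 and d - 2.
    for d in range(2, rows + cols - 1):
        i_min = max(1, d - (cols - 1))
        i_max = min(rows - 1, d - 1)
        for i in range(i_min, i_max + 1):  # independent iterations
            j = d - i
            F[i][j] = _cell(F, x, y, i, j, match, mismatch, gap)
    return F


same = fill_antidiagonal("GGTTGACTA", "TGTTACGG") == fill_rowwise("GGTTGACTA", "TGTTACGG")
print(same)  # True
```

In a vectorised or GPU version the inner loop over one anti-diagonal would become a single SIMD operation or one batch of threads, at the price of a less cache-friendly memory layout unless the matrix is stored diagonal by diagonal.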
Appendix
Sources I
https://pressbit.wordpress.com/2014/03/07/lokales-sequenzalignment-mit-dem-smith-waterman-algorithmus-in-c
March 7, 2014