Smith Waterman Algorithm - Performance Analysis

Armin Bundle

Department of Computer Science, University of Erlangen

Seminar muCoSim SS 2016


Outline

1 The Smith Waterman Algorithm
    The concept
    The algorithm

2 Profiling and data structure

3 The code

4 Likwid performance measurement

5 Problems and Outlook


The concept (1)

The Smith Waterman Algorithm performs local sequence alignment to find similar regions in, e.g., DNA or protein sequences.

A sequence alignment is a sequence of edit-operations.
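To make the idea of an alignment as a sequence of edit operations concrete, here is a minimal sketch. The operation letters (M, X, D, I) and the function name render_alignment are illustrative assumptions, not taken from the slides.

```python
def render_alignment(a, b, ops):
    """Render two aligned rows from a string of edit operations.

    Assumed operation alphabet (illustration only):
      M = match, X = mismatch (consume one symbol of each sequence),
      D = deletion (symbol of a against a gap),
      I = insertion (gap against a symbol of b).
    """
    row_a, row_b = [], []
    i = j = 0
    for op in ops:
        if op in "MX":
            row_a.append(a[i])
            row_b.append(b[j])
            i += 1
            j += 1
        elif op == "D":
            row_a.append(a[i])
            row_b.append("-")
            i += 1
        elif op == "I":
            row_a.append("-")
            row_b.append(b[j])
            j += 1
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return "".join(row_a), "".join(row_b)


top, bottom = render_alignment("GATTACA", "GATCA", "MMMDDMM")
print(top)     # GATTACA
print(bottom)  # GAT--CA
```

Each operation consumes a symbol from one or both sequences, so the operation string fully determines the alignment.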


The concept (2)

The algorithm is a variation of the Needleman-Wunsch algorithm, which compares two sequences and produces a global similarity score; Smith-Waterman instead scores the best local alignment.

Application area of SW: searching for genes whose sequences are similar to well-known genes

The algorithm uses dynamic programming

The complexity is quadratic: O(m·n) time for sequences of lengths m and n


The algorithm

First step: the matrix initialisation
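The figure for this step is not part of the transcript. As a minimal sketch, assuming the usual Smith-Waterman convention that the score matrix has one extra row and column and starts out filled with zeros (the function name init_matrix is hypothetical):

```python
def init_matrix(len_x, len_y):
    """Return a (len_x + 1) x (len_y + 1) score matrix filled with zeros.

    Row 0 and column 0 stay zero for the whole run; they represent
    alignments that start at the corresponding position.
    """
    return [[0] * (len_y + 1) for _ in range(len_x + 1)]


F = init_matrix(3, 4)
print(len(F), len(F[0]))  # 4 5
```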


Example input data

f = -1 (gap penalty)

MatchScore m = 2

MismatchScore mm = -1

Calculation function

$$w(x, y) = \begin{cases} m, & x = y \\ mm, & \text{else} \end{cases}$$

Evaluate the neighbours

$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j-1) + w(x_i, y_j) \\ F(i-1, j) + f \\ F(i, j-1) + f \end{cases}$$
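The recurrence can be sketched directly in code. This is an illustration using the example values from the slide (f = -1, MatchScore = 2, MismatchScore = -1), not the benchmark code that was measured; the names w and fill_matrix are chosen here for the example.

```python
GAP = -1        # f on the slide
MATCH = 2       # MatchScore (m)
MISMATCH = -1   # MismatchScore (mm)


def w(x, y):
    """Substitution score: m if the symbols are equal, mm otherwise."""
    return MATCH if x == y else MISMATCH


def fill_matrix(xs, ys):
    """Fill the score matrix F row by row using the recurrence."""
    F = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            F[i][j] = max(
                0,                                     # start a new local alignment
                F[i - 1][j - 1] + w(xs[i - 1], ys[j - 1]),  # match or mismatch
                F[i - 1][j] + GAP,                     # gap in ys
                F[i][j - 1] + GAP,                     # gap in xs
            )
    return F


for row in fill_matrix("AC", "AG"):
    print(row)
# [0, 0, 0]
# [0, 2, 1]
# [0, 1, 1]
```

The 0 in the maximum is what distinguishes the local from the global variant: a cell never carries a negative score forward.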


Second step: calculation of the local alignment score of the matrix
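Assuming the local alignment score is the largest entry of the score matrix (the standard Smith-Waterman definition; the slide's figure is not in the transcript), the score alone can be computed while keeping only two rows in memory. The function name and default parameters below are illustrative.

```python
def local_alignment_score(xs, ys, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score using two matrix rows.

    Only the previous and the current row are stored, so memory is
    O(len(ys)). This is enough for the score, but not for a traceback.
    """
    prev = [0] * (len(ys) + 1)
    best = 0
    for x in xs:
        curr = [0] * (len(ys) + 1)
        for j, y in enumerate(ys, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best


print(local_alignment_score("TTACGTT", "GGACGGG"))  # 6
```

In the usage example the only shared region is ACG, which scores three matches at 2 each.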


Third step: Traceback Matrix
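A minimal sketch of the traceback, assuming the standard procedure: start at the cell with the highest score and follow the move that produced each cell until a zero is reached. The tie-breaking order (diagonal, then up, then left) and the function name are assumptions for this example.

```python
def smith_waterman(xs, ys, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_xs, aligned_ys) for the best local alignment."""
    n, m = len(xs), len(ys)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix and remember where the maximum occurs.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if xs[i - 1] == ys[j - 1] else mismatch
            F[i][j] = max(0, F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap, F[i][j - 1] + gap)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback: walk backwards until a cell with score 0 is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if xs[i - 1] == ys[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:      # diagonal move
            top.append(xs[i - 1])
            bottom.append(ys[j - 1])
            i -= 1
            j -= 1
        elif F[i][j] == F[i - 1][j] + gap:      # gap in ys
            top.append(xs[i - 1])
            bottom.append("-")
            i -= 1
        else:                                   # gap in xs
            top.append("-")
            bottom.append(ys[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("ACGT", "ACT"))  # (5, 'ACGT', 'AC-T')
```

Unlike the score-only computation, the traceback needs the full matrix (or an equivalent record of moves), which is what makes memory footprint a concern for long sequences.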


Profiling


Data structure

Input value scale (reference value: 41)

sequence size = 1 << (scale / 2)

Arrays

main sequence & match sequence: 1 MB (at input value 41)

goodScores & scores: 4.8 kB

goodEndsI, goodEndsJ, index & best: 2.4 kB

weights: 0.6 kB
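The 1 MB figure for the sequence arrays can be reproduced from the size formula. The sketch below assumes integer division (as in C) and one byte per sequence element; neither is stated on the slide, and it is also not stated whether the 1 MB refers to one array or to both.

```python
def sequence_size(scale):
    """Number of sequence elements for a given input value.

    Uses integer division to mirror C semantics for non-negative
    integers (an assumption about the original code).
    """
    return 1 << (scale // 2)


n = sequence_size(41)
print(n)           # 1048576
print(n / 2**20)   # 1.0 (MiB, if each element occupies one byte)
```

Because of the integer division, input values 40 and 41 give the same sequence length.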


The code (1)


The code (2)


The code (3)


Likwid performance measurement

Metric                      Value
Branch misprediction rate   7.8e-6
Load to Store ratio         5.5
CPI                         0.42
L2 bandwidth [MBytes/s]     5702
L2 data volume [GBytes]     606.2
L2 miss rate                0.0084
L3 bandwidth [MBytes/s]     5180
L3 data volume [GBytes]     550.0
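As a rough consistency check on these values, the sketch below derives two numbers that are not on the slide: the instructions per cycle (the inverse of CPI) and the runtime implied by dividing data volume by bandwidth. It assumes the data volumes are totals in GBytes, the bandwidths are averages over the whole run, and 1 GB = 1000 MB; the function name is hypothetical.

```python
cpi = 0.42
l2_bandwidth_mb_s, l2_volume_gb = 5702, 606.2
l3_bandwidth_mb_s, l3_volume_gb = 5180, 550.0


def implied_runtime_s(volume_gb, bandwidth_mb_s):
    """Runtime in seconds implied by a total volume and an average bandwidth.

    Assumes decimal units (1 GB = 1000 MB).
    """
    return volume_gb * 1000 / bandwidth_mb_s


ipc = 1 / cpi  # instructions per cycle
print(f"IPC: {ipc:.2f}")
print(f"runtime implied by L2 counters: {implied_runtime_s(l2_volume_gb, l2_bandwidth_mb_s):.0f} s")
print(f"runtime implied by L3 counters: {implied_runtime_s(l3_volume_gb, l3_bandwidth_mb_s):.0f} s")
```

Under these assumptions both cache levels imply roughly the same runtime, a little over 100 seconds, which suggests the L2 and L3 rows come from the same measurement run.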


Runtime


Problems & Outlook

Problems

Run the code with MPI

Obtaining a node for memory measurements

The roofline model

Outlook

Change the data structure or the order of the sequence array accesses

Use MPI to see how the performance increases

Use the SIMD technology of CPUs

Convert the code for GPUs
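One common way to change the access order so that SIMD units or GPUs can be used is to traverse the matrix by anti-diagonals: all cells with the same i + j depend only on the two previous anti-diagonals and are therefore independent of each other. The sketch below is an illustration of that idea under the slide's example scores, not necessarily the approach the author had in mind; it only checks that the wavefront order produces the same matrix as the row-by-row order.

```python
def w(x, y, match=2, mismatch=-1):
    return match if x == y else mismatch


def cell(F, xs, ys, i, j, gap=-1):
    """Value of F[i][j] from its three neighbours."""
    return max(0, F[i - 1][j - 1] + w(xs[i - 1], ys[j - 1]),
               F[i - 1][j] + gap, F[i][j - 1] + gap)


def fill_row_major(xs, ys):
    n, m = len(xs), len(ys)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = cell(F, xs, ys, i, j)
    return F


def fill_wavefront(xs, ys):
    n, m = len(xs), len(ys)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):  # d = i + j indexes the anti-diagonal
        # Cells of one anti-diagonal are independent of each other, so
        # this inner loop is the part that could be vectorised.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            F[i][d - i] = cell(F, xs, ys, i, d - i)
    return F


print(fill_wavefront("ACGT", "ACT") == fill_row_major("ACGT", "ACT"))  # True
```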


Sources

https://pressbit.wordpress.com/2014/03/07/lokales-sequenzalignment-mit-dem-smith-waterman-algorithmus-in-c

March 7, 2014
