Dynamic Programming & Smith-Waterman algorithm



Dynamic Programming & Smith-Waterman algorithm

Seminar: Classical Papers in Bioinformatics

Yvonne Herrmann

May 3rd, 2010

Yvonne Herrmann Dynamic Programming & Smith-Waterman algorithm


Overview

1 Dynamic Programming

2 Sequence comparison

3 Smith-Waterman algorithm


Dynamic Programming: Introduction

Definition

Dynamic Programming is a method of solving problems by breaking them down into simpler steps

the problem needs to contain overlapping subproblems and should have an optimal substructure

the method is used for mathematical optimization and computer programming


Divide & Conquer

Divide & Conquer is used when all subproblems are independent.

Calculate partitions and combine the solutions to solve the entire problem.

vs.

Dynamic Programming

Dynamic Programming is used when subproblems are dependent.

There are no partitions, since the subproblems overlap.
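The effect of overlapping subproblems can be made visible with a small sketch. The Fibonacci recursion is not part of the slides; it is used here only as an assumed, minimal illustration, and the names `fib_naive` and `fib_dp` are hypothetical. The first function recomputes the same subproblems again and again, the second stores each result once.

```python
from functools import lru_cache


def fib_naive(n: int, counter: list) -> int:
    """Plain recursion: overlapping subproblems are recomputed every time."""
    counter[0] += 1  # count how often the function is entered
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


@lru_cache(maxsize=None)
def fib_dp(n: int) -> int:
    """Same recursion, but each subproblem is solved once and then cached."""
    if n < 2:
        return n
    return fib_dp(n - 1) + fib_dp(n - 2)


counter = [0]
print(fib_naive(20, counter), "calls:", counter[0])
print(fib_dp(20), "subproblems solved:", fib_dp.cache_info().misses)
```

Both calls return the same value, but the cached version solves only one subproblem per distinct argument.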


Dynamic Programming: The Principle of Optimality

"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." (a)

(a) Bellman, R.E. 1957. Dynamic Programming, Chap. III.3, Princeton University Press


Dynamic Programming: The Principle of Optimality - Example

Shortest path

Find the shortest way by car from Bielefeld to Cologne.

The route has to pass through Hamm (Westf) and Dortmund.

The shortest route from Hamm (Westf) to Cologne also needs to go through Dortmund.

⇒ The second problem is contained in the first one.
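The route example can be sketched as a small memoized recursion. The road network and all distances below are hypothetical values chosen for illustration (they are not real road data), and the names `ROADS` and `shortest` are assumptions. The sketch also assumes the network has no cycles.

```python
from functools import lru_cache

# Hypothetical distances in km, for illustration only (not real road data).
ROADS = {
    "Bielefeld": {"Hamm": 70, "Dortmund": 115},
    "Hamm": {"Dortmund": 35, "Cologne": 140},
    "Dortmund": {"Cologne": 95},
    "Cologne": {},
}


@lru_cache(maxsize=None)
def shortest(city: str, goal: str = "Cologne"):
    """Return (distance, route) of the shortest route from city to goal,
    or None if the goal cannot be reached."""
    if city == goal:
        return 0, (goal,)
    best = None
    for nxt, dist in ROADS[city].items():
        sub = shortest(nxt, goal)  # optimal solution of the remaining problem
        if sub is None:
            continue
        cand = (dist + sub[0], (city,) + sub[1])
        if best is None or cand[0] < best[0]:
            best = cand
    return best


dist, route = shortest("Bielefeld")
print(dist, route)
print(shortest("Hamm"))  # the remainder of the route above is itself optimal
```

With these assumed distances, the tail of the route from Bielefeld is exactly the shortest route from Hamm, which is what the principle of optimality states.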


Dynamic Programming: Algorithms

Dynamic Programming is used by...

Floyd-Warshall algorithm (shortest path algorithm)

Needleman-Wunsch algorithm

Smith-Waterman algorithm

Bellman-Ford algorithm, etc.


Sequence comparison: Intentions

Why compare sequences?

Quantify the similarity or dissimilarity between two or more sequences and find out where they are similar or different.


Sequence comparison: Why compare sequences?

This analysis can help to determine:

if genes from two different organisms are related

if similar nucleotide sequences lead to similar protein structures

which species are likely to be more closely related to one another

what kind of changes happened during evolution (mutations, insertions and deletions of genes or, more specifically, in the amino acid sequence itself)


Alignments: How to compare sequences?

sequence alignment

Method of arranging the sequences of DNA, RNA or amino acids of proteins to find regions of similarity which might be a consequence of functional, structural or evolutionary relationships between the sequences.


Conditions an alignment has to fulfill

all symbols have to be in the same order in which they appear in the given sequences

a symbol can be aligned with a blank (’-’)

two blanks cannot be aligned


Example

Sequences s and t are given:

s: A C T G A A C T G

t: A T G G A C C T G

A possible alignment is:

A C T - G A - A C T G

A - T G G A C - C T G
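The three conditions for an alignment can be checked mechanically. The following is a minimal sketch; the function name `is_valid_alignment` and its parameters are assumptions made for illustration. It is applied to the example alignment of s and t.

```python
def is_valid_alignment(s: str, t: str, row_s: str, row_t: str, gap: str = "-") -> bool:
    """Check the conditions an alignment of s and t has to fulfill."""
    # Both rows of the alignment must have the same number of columns.
    if len(row_s) != len(row_t):
        return False
    # Removing the blanks must give back the original sequences, in order.
    if row_s.replace(gap, "") != s or row_t.replace(gap, "") != t:
        return False
    # A symbol may be aligned with a blank, but two blanks must never be aligned.
    return all(not (x == gap and y == gap) for x, y in zip(row_s, row_t))


print(is_valid_alignment("ACTGAACTG", "ATGGACCTG", "ACT-GA-ACTG", "A-TGGAC-CTG"))
```

Appending a column with a blank in both rows would violate the last condition, so the check would reject it.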


Local vs. global alignment: What's the difference?

global alignment

The sequences must be aligned from start to end.

local alignment

Local alignments identify regions of high similarity within sequences which are often widely different overall.

The Smith-Waterman algorithm calculates the optimal local alignment!


Smith-Waterman algorithm: A little history

The algorithm was proposed in 1981 by Temple F. Smith and Michael S. Waterman.

The algorithm uses dynamic programming and is a variation of the Needleman-Wunsch algorithm.


Smith-Waterman algorithm: What's the goal of this algorithm?

The Smith-Waterman algorithm calculates the local alignment of two given sequences

it is used to identify similar DNA, RNA and protein segments

alignments of any possible length, starting and ending at any position in the two sequences, are compared to obtain the optimal local alignment


it is guaranteed to find the optimal local alignment with respect to the given scoring system

the scoring system includes a substitution matrix and a gap-scoring scheme

scores consider matches, mismatches, substitutions or insertions/deletions

the main difference to the Needleman-Wunsch algorithm is: negative scores are set to zero
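The effect of setting negative scores to zero can be sketched by computing a global and a local score with the same routine. This is a simplified sketch under assumed parameters: a linear gap penalty and the integer scores match = +1, mismatch = -1, gap = -1 are illustrative choices, not the scoring used in the slides, and `best_score` is a hypothetical name.

```python
def best_score(a: str, b: str, local: bool,
               match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global (Needleman-Wunsch style) or local
    (Smith-Waterman style) alignment under a linear gap penalty."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalized in the first row/column.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # the key difference: negative scores become 0
            H[i][j] = score
            best = max(best, score)
    # A global alignment ends in the bottom-right cell, a local one at the maximum.
    return best if local else H[n][m]


print(best_score("GGACGTGG", "TTACGTCC", local=True))
print(best_score("GGACGTGG", "TTACGTCC", local=False))
```

The two assumed sequences share only the segment ACGT. The local score rewards that segment alone, while the global score also pays for the dissimilar flanks.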


Smith-Waterman algorithm: The algorithm

Starting conditions

two molecular sequences A = a_1 a_2 ... a_n and B = b_1 b_2 ... b_m

scoring scheme

Course of events

first: set up matrix H with H_{k0} = H_{0l} = 0 (for 0 ≤ k ≤ n and 0 ≤ l ≤ m)

next: calculate the score for each cell

last: backtrace the path to obtain the optimal alignment


How to calculate the score for each cell?

Individual pairwise comparisons between the characters:

$$
H_{ij} = \max
\begin{cases}
H_{i-1,j-1} + s(a_i, b_j), \\
\max_{k \geq 1} \{ H_{i-k,j} - W_k \}, \\
\max_{l \geq 1} \{ H_{i,j-l} - W_l \}, \\
0
\end{cases}
$$

k = deletion of length k

l = deletion of length l

W_k and W_l are the values of the gap-cost function
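The recurrence can be translated almost literally into code. The following is a minimal sketch, not an optimized implementation: the names `fill_matrix`, `score` and `gap_cost` are assumptions, exact fractions are used to avoid rounding issues, and the usage lines take the sequences and scoring values (match = +1, mismatch = -1/3, W_k = 1 + k/3) from the example given in these slides.

```python
from fractions import Fraction


def fill_matrix(a, b, s, w):
    """Fill matrix H for sequences a and b with similarity s(x, y) and gap cost w(k)."""
    n, m = len(a), len(b)
    # First row and first column stay 0: H[k][0] = H[0][l] = 0.
    H = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            up = max(H[i - k][j] - w(k) for k in range(1, i + 1))
            left = max(H[i][j - l] - w(l) for l in range(1, j + 1))
            H[i][j] = max(diag, up, left, Fraction(0))
    return H


def score(x, y):
    return Fraction(1) if x == y else Fraction(-1, 3)


def gap_cost(k):
    return 1 + Fraction(k, 3)


H = fill_matrix("AGCTT", "AGACT", score, gap_cost)
for row in H:
    print([str(v) for v in row])
```

Because every cell scans all gap lengths k and l, this direct translation does more work per cell than a version restricted to a linear gap cost.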


Smith-Waterman algorithm: Definition

backtracing

While filling matrix H, you have to store backpointers to reconstruct from which cell you came. Then, once you have found the highest score in matrix H, you can backtrace the path and obtain the optimal alignment.

Legend of backpointers:

← Deletion

↑ Insertion

↖ Substitution
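Filling the matrix with backpointers and backtracing from the highest score can be sketched as follows. This is one possible implementation under stated assumptions, not the code behind the slides: the function name `smith_waterman`, the pointer labels "diag", "up" and "left" (for ↖, ↑ and ←) and the tie-breaking order are illustrative choices. The scoring values in the usage lines are those of the example in these slides.

```python
from fractions import Fraction


def smith_waterman(a, b, s, w):
    """Return (score, row_a, row_b) of one optimal local alignment of a and b."""
    n, m = len(a), len(b)
    H = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    # ptr[i][j] = (direction, length) records where the score of cell (i, j) came from.
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_pos = Fraction(0), (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score, back = Fraction(0), None  # score 0: no backpointer
            cand = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            if cand > score:
                score, back = cand, ("diag", 1)
            for k in range(1, i + 1):  # skip k symbols of a
                cand = H[i - k][j] - w(k)
                if cand > score:
                    score, back = cand, ("up", k)
            for l in range(1, j + 1):  # skip l symbols of b
                cand = H[i][j - l] - w(l)
                if cand > score:
                    score, back = cand, ("left", l)
            H[i][j], ptr[i][j] = score, back
            if score > best:
                best, best_pos = score, (i, j)
    # Backtrace from the highest score until a cell without backpointer is reached.
    i, j = best_pos
    row_a, row_b = [], []
    while ptr[i][j] is not None:
        direction, length = ptr[i][j]
        for _ in range(length):
            if direction == "diag":
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i, j = i - 1, j - 1
            elif direction == "up":
                row_a.append(a[i - 1])
                row_b.append("-")
                i -= 1
            else:
                row_a.append("-")
                row_b.append(b[j - 1])
                j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


def score(x, y):
    return Fraction(1) if x == y else Fraction(-1, 3)


def gap_cost(k):
    return 1 + Fraction(k, 3)


print(smith_waterman("AGCTT", "AGACT", score, gap_cost))
```

Storing the gap length together with the direction keeps the backtrace simple when the gap-cost function is not linear.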


Smith-Waterman algorithm: Smith-Waterman - Example

Example

Sequences A and B are given:

A: A G C T T and B: A G A C T

Scoring scheme:

match = +1

mismatch = −1/3

W_k = 1 + (1/3) * k


Example

Sequences A and B are given:

A: A G C T T and B: A G A C T

Figure: Filled matrix H


Example

Optimal local alignment:

A G A C T

A G - C T

Figure: Filled matrix H and backtracing path
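As a plausibility check, the score of this alignment can be recomputed by hand from the scoring scheme of the example (match = +1, mismatch = −1/3, W_k = 1 + k/3). The alignment contains four matches and one gap of length 1:

```latex
S = 4 \cdot (+1) - W_1 = 4 - \left(1 + \tfrac{1}{3}\right) = \tfrac{8}{3} \approx 2.67
```

For comparison, aligning the two full sequences without a gap gives three matches and two mismatches, that is 3 − 2/3 = 7/3, so the gapped alignment scores higher.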


Smith-Waterman algorithm: Smith-Waterman - Example 2

The optimal local alignment can be anywhere in the sequences

→ Find the highest score in matrix H as the backtracing start point

Figure: Example from the original paper


Optimal local alignment:

G C A U U G

G C - U C G

Figure: Example from the original paper


Smith-Waterman algorithm: Complexity of the algorithm

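running time: O(nm) with a linear gap cost; evaluating a general gap-cost function W_k for every cell, as in the recurrence, takes O(nm(n+m))

The algorithm is exact, but very time-consuming. FASTA is a heuristic approximation and is mostly used today.

space requirement: O(nm)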


Smith-Waterman algorithm: Disadvantages

time and space costs are very high

finds the alignment with maximal score, but not with maximal percent of matches

the algorithm builds 'mosaics' of well-conserved fragments connected by poorly conserved fragments

solution: length-normalized local alignment

→ obtains the region with maximum degree of similarity
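The difference between maximal score and maximal percent of matches can be illustrated with a small helper. This sketch only counts matching columns and is not an implementation of length-normalized local alignment. The name `percent_matches` and the choice to divide by the number of alignment columns are assumptions, since other definitions of percent identity exist.

```python
def percent_matches(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both rows hold the same symbol."""
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("rows must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)
    return 100.0 * matches / len(row_a)


# Optimal local alignment of the example (score 8/3) versus the short segment AG/AG (score 2).
print(percent_matches("AGACT", "AG-CT"))
print(percent_matches("AG", "AG"))
```

In the example from these slides, the optimal alignment has the higher score, while the short segment AG/AG has the higher percent of matches.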


Smith-Waterman algorithm: Applications

JAligner

SSEARCH (in FASTA package)

Live demo of the Smith-Waterman algorithm: http://baba.sourceforge.net/


Bibliography

[1] Alison Cawsey, Dynamic Programming, http://www.macs.hw.ac.uk/alison/ds98/node122.html, 1998

[2] Temple F. Smith and Michael S. Waterman, Identification of Common Molecular Subsequences, J. Mol. Biol., 147(1):195-197, March 1981

[3] Script: Sequence Analysis I+II, Lecture notes, Faculty of Technology, Bielefeld University, Winter 2008/09 and Summer 2009

[4] Norman Casagrande, Basic-Algorithms of Bioinformatics Applet, http://baba.sourceforge.net/, 2003

[5] University of Southern California, University Professor, http://www.cmb.usc.edu/people/msw/Waterman.html, 2005


Thank you!

The End

Thank you for your attention!
