Cédric Notredame (19/10/2015) Using Dynamic Programming To Align Sequences Cédric Notredame.

Cédric Notredame (21/04/23)

Using Dynamic Programming To Align Sequences

Cédric Notredame

Our Scope

Coding a Global and a Local Algorithm

Understanding the DP concept

Aligning with Affine gap penalties

Sophisticated variants…

Saving memory

Outline

-Coding Dynamic Programming with Non-affine Penalties

-Adding affine penalties

-Turning a global algorithm into a local Algorithm

-Using A Divide and conquer Strategy

-The repeated Matches Algorithm

-Double Dynamic Programming

-Tailoring DP to your needs:

Global Alignments Without Affine Gap

penalties

Dynamic Programming

How To align Two Sequences With a Gap Penalty, A Substitution

matrix and Not too Much Time

Dynamic Programming

A bit of History…

-DP invented in the 50s by Bellman

-Programming Tabulation

-Re-invented in 1970 by Needlman and Wunsch

-It took 10 year to find out…

The Foolish Assumption

The score of each column of the alignment is independent from the rest of the alignment

It is possible to model the relationship between two sequences with:

-A substitution matrix-A simple gap penalty

The Principal of DP

If you extend optimally an optimal alignment of two sub-sequences, the result remains an optimal alignment

X-XXXXXX

Deletion

Alignment

Insertion

Finding the score of i,j

-Sequence 1: [1-i]-Sequence 2: [1-j]

-The optimal alignment of [1-i] vs [1-j] can finish in three different manners:

1…i1…j-1

1…i-11…j-1

1…i-11…j

Three ways to buildthe alignment

1…i1…j

1…i-11…j-1

1…i1…j-1

1…i-11…j

In order to Compute the score of

1…i1…j

All we need are the scores of:

Formalizing the algorithm

F(i,j)= best

F(i,j-1) + Gep

F(i-1,j-1) + Mat[i,j]

F(i-1,j) + Gep X-

1…i1…j-1

1…i-11…j-1

1…i-11…j

Arranging Everything in a Table

1…I-11…J-1

1…I1…J-1

1…I-11…J

1…I 1…J

Taking Care of the Limits

In a Dynamic Programming strategy, the most delicate part is to take care of the limits:

-what happens when you start-what happens when you finish

The DP strategy relies on the idea that ALL the cells in your table have the same environment…

This is NOT true of ALL the cells!!!!

Taking Care of the Limits

- F A-FAS

T -4Match=2MisMatch=-1Gap=-1

FAT---

-2FA--

-3FAS---

Filing Up The Matrix

-2 +2-3

+1 +1-4

-3 +10

0 +4-1

+3 +30

-4 0+3

-1 +3+2

+2 +3-1

-5 -1+2

-2 +2+2

Delivering the alignment: Trace-back

Score of 1…3 Vs 1…4

Optimal Aln Score

Trace-back: possible implementation

while (!($i==0 && $j==0)) { if ($tb[$i][$j]==$sub) #SUBSTITUTION

{ $alnI[$aln_len]=$seqI[--$i]; $alnJ[$aln_len]=$seqJ[--$j]; }

elsif ($tb[$i][$j]==$del) #DELETION{ $alnI[$aln_len]='-'; $alnJ[$aln_len]=$seqJ[--$j]; }

elsif ($tb[$i][$j]==$ins) #INSERTION{ $alnI[$aln_len]=$seqI[0][--$i]; $alnJ[$aln_len]='-'; }

$aln_len++; }

Local Alignments Without Affine Gap

penalties

Smith and Waterman

Getting rid of the pieces of Junk between the

interesting bits

Smith and Waterman

The Smith and Waterman Algorithm

F(i,j)= best

F(i-1,j) + Gep

F(i-1,j-1) + Mat[i,j]

F(i,j-1) + Gep X-

1…i1…j-1

1…i-11…j-1

1…i-11…j

F(i,j)= best

F(i-1,j) + Gep

F(i-1,j-1) + Mat[i,j]

F(i,j-1) + Gep

Ignore The rest of the Matrix

Terminate a local Aln

Filing Up a SW Matrix

Filling up a SW matrix: borders

* - A N I C E C A T - 0 0 0 0 0 0 0 0 0C 0A 0T 0A 0N 0 D 0O 0G 0

Easy:Local alignments

NEVER start/end with a gap…

Filling up a SW matrix

* - A N I C E C A T - 0 0 0 0 0 0 0 0 0C 0 0 0 0 2 0 2 0 0 A 0 2 0 0 0 0 0 4 0T 0 0 0 0 0 0 0 2 6A 0 2 0 0 0 0 0 0 4N 0 0 4 2 0 0 0 0 2D 0 0 2 2 0 0 0 0 0O 0 0 0 0 0 0 0 0 0G 0 0 0 0 0 0 0 0 0

Best Local score

Beginning of the trace-back

for ($i=1; $i<=$len0; $i++) { for ($j=1; $j<=$len1; $j++)

{ if ($res0[0][$i-1] eq $res1[0][$j-1]){$s=2;}

else {$s=-1;} $sub=$mat[$i-1][$j-1]+$s; $del=$mat[$i ][$j-1]+$gep; $ins=$mat[$i-1][$j ]+$gep; if ($sub>$del && $sub>$ins && $sub>0)

{$smat[$i][$j]=$sub;$tb[$i][$j]=$subcode;} elsif($del>$ins && $del>0 )

{$smat[$i][$j]=$del;$tb[$i][$j]=$delcode;} elsif( $ins>0 )

{$smat[$i][$j]=$ins;$tb[$i][$j]=$inscode;} else {$smat[$i][$j]=$zero;$tb[$i][$j]=$stopcode;}

if ($smat[$i][$j]> $best_score) { $best_score=$smat[$i][$j]; $best_i=$i; $best_j=$j; }

PrepareTraceback

Turning

A few things to remember

SW only works if the substitution matrix has been normalized to give a Negative score to a random alignment.

Chance should not pay when it comes to local alignments !

More than One match…

-SW delivers only the best scoring Match

-If you need more than one match:-SIM (Huang and Millers)Or-Waterman and Eggert (Durbin, p91)

Waterman and Eggert

-Iterative algorithm:

-1-identify the best match-2-redo SW with used pairs forbidden

-Delivers a collection of non-overlapping local alignments

-Avoid trivial variations of the optimal.

-3-finish when the last interesting local extracted

Adding Affine Gap Penalties

The Gotoh Algorithm

Forcing a bit of Biology into your alignment

The Gotoh Formulation

Why Affine gap Penalties are Biologically better

Afine Gap Penalty

GOP GOP

Parsimony: Evolution takes the simplest path

(So We Think…)

Cost=gop+L*gep

Or Cost=gop+(L-1)*gep

But Harder To compute…

More Than 3 Ways to extend an Alignment

X-XXXXXX

Deletion

Alignment

Insertion

Opening

Extension

Opening

Extension

Cédric Notredame (19/10/2015) Using Dynamic Programming To Align Sequences Cédric Notredame.

Documents

Transcript of Cédric Notredame (19/10/2015) Using Dynamic Programming To Align Sequences Cédric Notredame.

Finding, Aligning and Analyzing Non Coding RNAs Cédric Notredame Comparative Bioinformatics Group Bioinformatics and Genomics Program.

Cédric Notredame (08/10/2015) Comparing Two Protein Sequences Cédric Notredame.

Introduction to Perl Part I By: Cédric Notredame (Adapted from BT McInnes)

bOOk Cédric Barnat // Senior Strategist & Creative

T-COFFEE, a novel method for Multiple Sequence Alignments Cédric Notredame.

Buffett Lecture at NotreDame

Cédric Dufour ( LTS-IBCM Collaboration )

Cédric Notredame (07/11/2015) Recent Progress in Multiple Sequence Alignments: A Survey Cédric Notredame.

WELCOME! Align-Connect-Engage Align-Connect- Engage Align-Connect-Engage Align- Connect-Engage Align-Connect-Engage Align-Connect-Engage Align-Connect-

An Introduction to ENSEMBL Cédric Notredame. The Top 5 Surprises in the Human Genome Map 1.The blue gene exists in 3 genotypes: Straight Leg, Loose Fit.

Cdric Notredame (22/02/2016) Comparing Two Protein Sequences Cdric Notredame.

Introduction to Perl Part II By: Cédric Notredame (Adapted from BT McInnes)

Army NotreDame 112413

Cédric Notredame and Chantal Abergel - T-Coffee · library does not need to contain a weight associated with each possible pair of residues. On the contrary, an ideal library only

Energy Policy Cédric Philibert Energy Policy 13 Cédric Philibert.

Cédric Argenton Bert Willems - GIS LARSEN

By Cédric Argenton

Cédric Notredame (30/08/2015) Chemoinformatics And Bioinformatics Cédric Notredame Molecular Biology Bioinformatics Chemoinformatics Chemistry.

Cédric Malaquin, Analyst - RF Devices & Technologies ...

Cédric Foussard International Juvenile Justice Observatory Cédric Foussard Director.