Dot Matrices Global Alignments Local Alignment


Page 1: Dot Matrices Global Alignments Local Alignment

Dot Matrices, Global Alignments, Local Alignment

How Can We Align Two Sequences?

Page 3: Dot Matrices Global Alignments Local Alignment

Dot Matrices

QUESTION

What are the elements shared by two sequences?

Page 4: Dot Matrices Global Alignments Local Alignment

Dot Matrices

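>Seq1
THEFATCAT
>Seq2
THELASTCAT

[Figure: dot matrix with T H E F A T C A T along the horizontal axis and T H E F A S T C A T along the vertical axis; the Window and Stringency parameters are labelled.]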
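The window and stringency idea can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper name `dot_matrix` is hypothetical, and a dot is placed at a position when at least `$stringency` of the `$window` residue pairs starting there along the diagonal are identical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): returns a 2D array of 0/1 dots.
# A dot is set at ($i,$j) when at least $stringency of the $window residue
# pairs starting at ($i,$j) along the diagonal are identical.
sub dot_matrix {
    my ($seq1, $seq2, $window, $stringency) = @_;
    my @s1 = split //, $seq1;    # horizontal axis
    my @s2 = split //, $seq2;    # vertical axis
    my @dots;
    for my $i (0 .. @s2 - $window) {
        for my $j (0 .. @s1 - $window) {
            my $matches = 0;
            for my $k (0 .. $window - 1) {
                $matches++ if $s2[$i + $k] eq $s1[$j + $k];
            }
            $dots[$i][$j] = $matches >= $stringency ? 1 : 0;
        }
    }
    return \@dots;
}

# Window=1, Stringency=1: every identical pair of letters gives a dot.
my $dots = dot_matrix('THEFATCAT', 'THELASTCAT', 1, 1);
my @vertical = split //, 'THELASTCAT';
print '  THEFATCAT', "\n";
for my $i (0 .. $#$dots) {
    my $row = join '', map { $_ ? '*' : '.' } @{ $dots->[$i] };
    print "$vertical[$i] $row\n";
}
```

Raising the window and stringency (for example 3 and 3) keeps only the diagonals where whole words such as THE and CAT are shared, which is what removes the background noise.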

Page 5: Dot Matrices Global Alignments Local Alignment

Dot Matrices

Parameters: Sequences, Window size, Stringency

Page 6: Dot Matrices Global Alignments Local Alignment

Dot Matrices: Stringency

Window=1, Stringency=1

Window=11, Stringency=7

Window=25, Stringency=15

Page 7: Dot Matrices Global Alignments Local Alignment

Dot Matrices

Page 12: Dot Matrices Global Alignments Local Alignment

Dot Matrices: Limits

-Visual aid

-Best Way to EXPLORE the Sequence Organisation

-Does NOT provide us with an ALIGNMENT

wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

Page 13: Dot Matrices Global Alignments Local Alignment

Using Dynamic Programming To Align Sequences

Page 14: Dot Matrices Global Alignments Local Alignment

Our Scope

Coding a Global and a Local Algorithm

Understanding the DP concept

Aligning with Affine gap penalties

Sophisticated variants…

Saving memory

Page 15: Dot Matrices Global Alignments Local Alignment

Outline

-Coding Dynamic Programming with Non-affine Penalties

-Adding affine penalties

-Turning a global algorithm into a local Algorithm

-Using A Divide and conquer Strategy

-The repeated Matches Algorithm

-Double Dynamic Programming

-Tailoring DP to your needs:

Page 16: Dot Matrices Global Alignments Local Alignment

Global Alignments Without Affine Gap penalties

Dynamic Programming

Page 17: Dot Matrices Global Alignments Local Alignment

How To Align Two Sequences With a Gap Penalty, a Substitution Matrix and Not Too Much Time

Dynamic Programming

Page 18: Dot Matrices Global Alignments Local Alignment

A bit of History…

-DP invented in the 50s by Bellman

-Programming Tabulation

-Re-invented in 1970 by Needleman and Wunsch

-It took 10 years to find out…

Page 19: Dot Matrices Global Alignments Local Alignment

The Foolish Assumption

The score of each column of the alignment is independent from the rest of the alignment

It is possible to model the relationship between two sequences with:

-A substitution matrix

-A simple gap penalty

Page 20: Dot Matrices Global Alignments Local Alignment

The Principle of DP

If you extend optimally an optimal alignment of two sub-sequences, the result remains an optimal alignment

[Figure: an optimal alignment (XXXXXX) extended by one of three possible columns: X- (Deletion), XX (Alignment), -X (Insertion).]

Page 21: Dot Matrices Global Alignments Local Alignment

Finding the score of i,j

-Sequence 1: [1-i]

-Sequence 2: [1-j]

-The optimal alignment of [1-i] vs [1-j] can end in three different ways:

X-

XX

-X

Page 22: Dot Matrices Global Alignments Local Alignment

Finding the score of i,j

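Three ways to build the alignment of 1…i vs 1…j:

-alignment of 1…i-1 vs 1…j + column i- (residue i against a gap)

-alignment of 1…i-1 vs 1…j-1 + column ij (residue i against residue j)

-alignment of 1…i vs 1…j-1 + column -j (a gap against residue j)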

Page 23: Dot Matrices Global Alignments Local Alignment

Finding the score of i,j

In order to compute the score of 1…i vs 1…j, all we need are the scores of:

-1…i-1 vs 1…j-1

-1…i vs 1…j-1

-1…i-1 vs 1…j

Page 24: Dot Matrices Global Alignments Local Alignment

Formalizing the algorithm

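$$
F(i,j) = \operatorname{best}
\begin{cases}
F(i-1,j) + \text{Gap} & \text{X-: } (1\ldots i-1 \text{ vs } 1\ldots j) \text{ + residue } i \text{ against a gap} \\
F(i-1,j-1) + \text{Mat}[i,j] & \text{XX: } (1\ldots i-1 \text{ vs } 1\ldots j-1) \text{ + residue } i \text{ against residue } j \\
F(i,j-1) + \text{Gap} & \text{-X: } (1\ldots i \text{ vs } 1\ldots j-1) \text{ + a gap against residue } j
\end{cases}
$$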
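The recurrence for a single cell can be written as one small Perl function. This is a minimal sketch, assuming that "best" means the maximum and borrowing the scoring scheme of the worked example on the later slides (Match=2, MisMatch=-1, Gap=-1); the name `cell_score` is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Assumed scoring scheme, taken from the worked example on the later slides.
my ($MATCH, $MISMATCH, $GAP) = (2, -1, -1);

# Hypothetical helper: score of cell (i,j) from its three neighbours.
#   $up   = F(i-1,j), $diag = F(i-1,j-1), $left = F(i,j-1)
sub cell_score {
    my ($up, $diag, $left, $res_i, $res_j) = @_;
    my $mat = $res_i eq $res_j ? $MATCH : $MISMATCH;    # Mat[i,j]
    return max($up   + $GAP,     # F(i-1,j)   + Gap
               $diag + $mat,     # F(i-1,j-1) + Mat[i,j]
               $left + $GAP);    # F(i,j-1)   + Gap
}

# First inner cell of a table whose borders are 0, -1, -1: F against F.
print cell_score(-1, 0, -1, 'F', 'F'), "\n";    # prints 2
```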

Page 25: Dot Matrices Global Alignments Local Alignment

Arranging Everything in a Table

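[Figure: DP table with - F A T along the top and - F A S T down the side. The cell (1…I vs 1…J) is computed from its three neighbours: (1…I-1 vs 1…J-1), (1…I vs 1…J-1) and (1…I-1 vs 1…J).]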

Page 26: Dot Matrices Global Alignments Local Alignment

Taking Care of the Limits

In a Dynamic Programming strategy, the most delicate part is to take care of the limits:

-what happens when you start

-what happens when you finish

The DP strategy relies on the idea that ALL the cells in your table have the same environment…

This is NOT true of ALL the cells!!!!

Page 27: Dot Matrices Global Alignments Local Alignment

Filling Up the Matrix

Page 28: Dot Matrices Global Alignments Local Alignment

Taking Care of the Limits

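Match=2, MisMatch=-1, Gap=-1

       -    F    A    T
  -    0   -1   -2   -3
  F   -1
  A   -2
  S   -3
  T   -4

Each border cell is the score of a prefix aligned against gaps only: F vs - (-1), FA vs -- (-2), FAT vs --- (-3), FAS vs --- (-3), FAST vs ---- (-4).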
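The border cells are the ones that do not have all three neighbours, so they are usually filled before the main loop. A minimal Perl sketch, assuming the sequences FAST (rows) and FAT (columns) and Gap=-1 as on the slide; the array name `@F` is an assumption:

```perl
use strict;
use warnings;

my $GAP  = -1;                      # gap penalty from the slide
my @seqI = split //, 'FAST';        # rows of the table
my @seqJ = split //, 'FAT';         # columns of the table

# Cell (i,0) aligns the first i residues of seqI against gaps only,
# cell (0,j) aligns the first j residues of seqJ against gaps only.
my @F;
$F[0][0]  = 0;
$F[$_][0] = $_ * $GAP for 1 .. @seqI;    # first column
$F[0][$_] = $_ * $GAP for 1 .. @seqJ;    # first row

print "first row:    @{ $F[0] }\n";
print "first column: @{[ map { $_->[0] } @F ]}\n";
```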

Page 29: Dot Matrices Global Alignments Local Alignment

Match=2, MisMatch=-1, Gap=-1

       -    F    A    T
  -    0   -1   -2   -3
  F   -1   +2   +1    0
  A   -2   +1   +4   +3
  S   -3    0   +3   +3
  T   -4   -1   +2   +5

Each cell keeps the best of its three candidate scores (from above, from the diagonal, from the left).
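The whole table can be filled with two nested loops once the borders are set. The following Perl sketch assumes the scoring scheme of the slide (Match=2, MisMatch=-1, Gap=-1) and takes "best" to mean the maximum; `nw_fill` is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: fills the global (Needleman and Wunsch) score table.
# Rows follow $seqI, columns follow $seqJ.
sub nw_fill {
    my ($seqI, $seqJ, $match, $mismatch, $gap) = @_;
    my @si = split //, $seqI;
    my @sj = split //, $seqJ;
    my @F;
    $F[$_][0] = $_ * $gap for 0 .. @si;    # first column
    $F[0][$_] = $_ * $gap for 0 .. @sj;    # first row
    for my $i (1 .. @si) {
        for my $j (1 .. @sj) {
            my $s = $si[$i - 1] eq $sj[$j - 1] ? $match : $mismatch;
            $F[$i][$j] = max($F[$i - 1][$j]     + $gap,
                             $F[$i - 1][$j - 1] + $s,
                             $F[$i][$j - 1]     + $gap);
        }
    }
    return \@F;
}

my $F = nw_fill('FAST', 'FAT', 2, -1, -1);
for my $row (@$F) {
    print join(' ', map { sprintf('%3d', $_) } @$row), "\n";
}
print "optimal score: $F->[4][3]\n";    # bottom-right cell
```

The bottom-right cell holds the score of the optimal global alignment of the two complete sequences.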

Page 30: Dot Matrices Global Alignments Local Alignment

Delivering the alignment: Trace-back

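The bottom-right cell holds the score of 1…3 vs 1…4, which is the optimal alignment score.

Tracing back from that cell yields the columns TT, S-, AA, FF (last column first), that is:

FAST
FA-T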

Page 31: Dot Matrices Global Alignments Local Alignment

Trace-back: possible implementation

#Walk back from the last cell to (0,0); the alignment is built in reverse
while (!($i==0 && $j==0))
  {
  if ($tb[$i][$j]==$sub)       #SUBSTITUTION
    {
    $alnI[$aln_len]=$seqI[--$i];
    $alnJ[$aln_len]=$seqJ[--$j];
    }
  elsif ($tb[$i][$j]==$del)    #DELETION
    {
    $alnI[$aln_len]='-';
    $alnJ[$aln_len]=$seqJ[--$j];
    }
  elsif ($tb[$i][$j]==$ins)    #INSERTION
    {
    $alnI[$aln_len]=$seqI[--$i];
    $alnJ[$aln_len]='-';
    }
  $aln_len++;
  }
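To see the trace-back run end to end, the fragment above can be embedded in a small self-contained program. This is a sketch under assumptions: the numeric codes in `$SUB`, `$DEL`, `$INS`, the tie-breaking order and the function name `nw_align` are all hypothetical choices, and the scoring scheme is the one of the FAST/FAT example.

```perl
use strict;
use warnings;

my ($SUB, $DEL, $INS) = (1, 2, 3);    # hypothetical trace-back codes

# Hypothetical helper: global alignment with a linear gap penalty.
# Returns (score, aligned seqI, aligned seqJ).
sub nw_align {
    my ($seqI, $seqJ, $match, $mismatch, $gap) = @_;
    my @si = split //, $seqI;
    my @sj = split //, $seqJ;
    my (@F, @tb);
    $F[0][0] = 0;
    for my $i (1 .. @si) { $F[$i][0] = $i * $gap; $tb[$i][0] = $INS; }
    for my $j (1 .. @sj) { $F[0][$j] = $j * $gap; $tb[0][$j] = $DEL; }
    for my $i (1 .. @si) {
        for my $j (1 .. @sj) {
            my $s   = $si[$i - 1] eq $sj[$j - 1] ? $match : $mismatch;
            my $sub = $F[$i - 1][$j - 1] + $s;
            my $del = $F[$i][$j - 1] + $gap;    # gap in sequence I
            my $ins = $F[$i - 1][$j] + $gap;    # gap in sequence J
            if    ($sub >= $del && $sub >= $ins) { $F[$i][$j] = $sub; $tb[$i][$j] = $SUB; }
            elsif ($del >= $ins)                 { $F[$i][$j] = $del; $tb[$i][$j] = $DEL; }
            else                                 { $F[$i][$j] = $ins; $tb[$i][$j] = $INS; }
        }
    }
    # Trace-back from the bottom-right cell; columns are collected in reverse.
    my ($i, $j) = (scalar @si, scalar @sj);
    my (@alnI, @alnJ);
    while ($i > 0 || $j > 0) {
        if    ($tb[$i][$j] == $SUB) { push @alnI, $si[--$i]; push @alnJ, $sj[--$j]; }
        elsif ($tb[$i][$j] == $DEL) { push @alnI, '-';       push @alnJ, $sj[--$j]; }
        else                        { push @alnI, $si[--$i]; push @alnJ, '-'; }
    }
    return ($F[-1][-1], join('', reverse @alnI), join('', reverse @alnJ));
}

my ($score, $alnI, $alnJ) = nw_align('FAST', 'FAT', 2, -1, -1);
print "$score\n$alnI\n$alnJ\n";
```

Reversing the two arrays at the end is needed because the trace-back emits the last column of the alignment first.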

Page 32: Dot Matrices Global Alignments Local Alignment

Local Alignments

Dynamic Programming

Page 33: Dot Matrices Global Alignments Local Alignment

The Smith and Waterman Algorithm

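$$
F(i,j) = \operatorname{best}
\begin{cases}
F(i-1,j) + \text{Gep} & \text{X-: } (1\ldots i-1 \text{ vs } 1\ldots j) \text{ + residue } i \text{ against a gap} \\
F(i-1,j-1) + \text{Mat}[i,j] & \text{XX: } (1\ldots i-1 \text{ vs } 1\ldots j-1) \text{ + residue } i \text{ against residue } j \\
F(i,j-1) + \text{Gep} & \text{-X: } (1\ldots i \text{ vs } 1\ldots j-1) \text{ + a gap against residue } j \\
0 & \text{start a new local alignment}
\end{cases}
$$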
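Compared with the global recurrence, the only change is the extra 0 among the candidates. A minimal Perl sketch of one cell, assuming "best" means the maximum; `sw_cell` and its argument order are hypothetical:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: local (Smith and Waterman) score of one cell.
#   $up = F(i-1,j), $diag = F(i-1,j-1), $left = F(i,j-1)
#   $mat = Mat[i,j], $gep = gap penalty (a negative number)
sub sw_cell {
    my ($up, $diag, $left, $mat, $gep) = @_;
    return max(0,                # never go below zero: restart here
               $up   + $gep,
               $diag + $mat,
               $left + $gep);
}

print sw_cell(0, 0, 0, -1, -1), "\n";    # mismatch on an empty table: 0
print sw_cell(0, 4, 0,  2, -1), "\n";    # match extending a score of 4: 6
```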

Page 34: Dot Matrices Global Alignments Local Alignment

Filling Up a SW Matrix

Page 35: Dot Matrices Global Alignments Local Alignment

Filling up a SW matrix: borders

*  -  A  N  I  C  E  C  A  T
-  0  0  0  0  0  0  0  0  0
C  0
A  0
T  0
A  0
N  0
D  0
O  0
G  0

Easy: local alignments NEVER start or end with a gap…

Page 36: Dot Matrices Global Alignments Local Alignment

Filling Up a SW Matrix

Page 37: Dot Matrices Global Alignments Local Alignment

Filling up a SW matrix

Match=2, MisMatch=-1, Gap=-1

*  -  A  N  I  C  E  C  A  T
-  0  0  0  0  0  0  0  0  0
C  0  0  0  0  2  0  2  0  0
A  0  2  0  0  0  0  0  4  0
T  0  0  0  0  0  0  0  2  6
A  0  2  0  0  0  0  0  0  4
N  0  0  4  2  0  0  0  0  2
D  0  0  2  2  0  0  0  0  0
O  0  0  0  0  0  0  0  0  0
G  0  0  0  0  0  0  0  0  0

Best local score: 6 (row T, last column T). This cell is the beginning of the trace-back.

Page 38: Dot Matrices Global Alignments Local Alignment

for ($i=1; $i<=$len0; $i++)
  {
  for ($j=1; $j<=$len1; $j++)
    {
    #Score of the pair of residues: match=2, mismatch=-1
    if ($res0[0][$i-1] eq $res1[0][$j-1]){$s=2;}
    else {$s=-1;}

    #The three candidate scores are read from the table being filled
    $sub=$smat[$i-1][$j-1]+$s;
    $del=$smat[$i  ][$j-1]+$gep;
    $ins=$smat[$i-1][$j  ]+$gep;

    #Keep the best candidate, or stop (score 0) when none is positive
    if   ($sub>$del && $sub>$ins && $sub>0)
      {$smat[$i][$j]=$sub;$tb[$i][$j]=$subcode;}
    elsif($del>$ins && $del>0 )
      {$smat[$i][$j]=$del;$tb[$i][$j]=$delcode;}
    elsif( $ins>0 )
      {$smat[$i][$j]=$ins;$tb[$i][$j]=$inscode;}
    else
      {$smat[$i][$j]=$zero;$tb[$i][$j]=$stopcode;}

    #Remember the best cell: the trace-back will start there
    if ($smat[$i][$j]> $best_score)
      {
      $best_score=$smat[$i][$j];
      $best_i=$i;
      $best_j=$j;
      }
    }
  }

Slide annotations: Prepare Traceback; Turning NW into SW
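The fill loop above stops once the best cell is known. A self-contained Perl sketch of the complete local procedure, including a trace-back that starts at the best cell and ends at the first stop code, could look as follows. The function name `sw_align`, the numeric trace-back codes and the tie-breaking order are assumptions made for the illustration; the sequences and scores are those of the CATANDOG / ANICECAT example.

```perl
use strict;
use warnings;

my ($STOP, $SUB, $DEL, $INS) = (0, 1, 2, 3);    # hypothetical trace-back codes

# Hypothetical helper: local alignment with a linear gap penalty.
# Returns (best score, aligned part of seqI, aligned part of seqJ).
sub sw_align {
    my ($seqI, $seqJ, $match, $mismatch, $gep) = @_;
    my @si = split //, $seqI;
    my @sj = split //, $seqJ;
    my (@F, @tb);
    my ($best, $best_i, $best_j) = (0, 0, 0);
    for my $i (0 .. @si) {
        for my $j (0 .. @sj) {
            $F[$i][$j]  = 0;        # borders and default: score 0, stop
            $tb[$i][$j] = $STOP;
            next if $i == 0 || $j == 0;
            my $s   = $si[$i - 1] eq $sj[$j - 1] ? $match : $mismatch;
            my $sub = $F[$i - 1][$j - 1] + $s;
            my $del = $F[$i][$j - 1] + $gep;
            my $ins = $F[$i - 1][$j] + $gep;
            if    ($sub > 0 && $sub >= $del && $sub >= $ins) { $F[$i][$j] = $sub; $tb[$i][$j] = $SUB; }
            elsif ($del > 0 && $del >= $ins)                 { $F[$i][$j] = $del; $tb[$i][$j] = $DEL; }
            elsif ($ins > 0)                                 { $F[$i][$j] = $ins; $tb[$i][$j] = $INS; }
            if ($F[$i][$j] > $best) { ($best, $best_i, $best_j) = ($F[$i][$j], $i, $j); }
        }
    }
    # Trace-back: start at the best cell, stop at the first stop code.
    my ($i, $j) = ($best_i, $best_j);
    my (@alnI, @alnJ);
    while ($tb[$i][$j] != $STOP) {
        if    ($tb[$i][$j] == $SUB) { push @alnI, $si[--$i]; push @alnJ, $sj[--$j]; }
        elsif ($tb[$i][$j] == $DEL) { push @alnI, '-';       push @alnJ, $sj[--$j]; }
        else                        { push @alnI, $si[--$i]; push @alnJ, '-'; }
    }
    return ($best, join('', reverse @alnI), join('', reverse @alnJ));
}

my ($score, $alnI, $alnJ) = sw_align('CATANDOG', 'ANICECAT', 2, -1, -1);
print "$score\n$alnI\n$alnJ\n";
```

The two differences from the global version are the 0 floor during the fill and the start and stop points of the trace-back; the rest of the code is shared.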