
Advanced Topics in Bioinformatics
Weizmann Institute of Science, spring 2003

Lecture 2, 12/3/2003:

• Introduction to sequence alignment

• The Needleman-Wunsch algorithm for global sequence alignment: description and properties

• Local alignment: the Smith-Waterman algorithm

Computational sequence analysis

The major goal of computational sequence analysis is to predict the function and structure of genes and proteins from their sequence.

This is made possible since organisms evolve by mutation, duplication and selection of their genes.

Thus, sequence similarity often indicates functional and structural similarity.


Sequence alignment

5’ ATCAGAGTC 3’
5’ TTCAGTC 3’

ATC ≠ CTA

AG ≠ GA

etc.

Sequence alignment

ATCAGAGTC
TTCAGTC

We wish to identify which regions of the two sequences are most similar to each other. The sequences are shifted relative to one another and gaps are introduced, so as to cover all possible alignments. The shifts and gaps provide the steps by which one sequence can be converted into the other.

[Figure: the two sequences shifted against each other at several offsets, with matching positions marked +.]

With gaps introduced (+ match, ^ gap):

ATCAGAGTC
TTCA--GTC
 +++^^+++
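The shifting step can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function name `matches_at_offset` and the convention that a positive offset shifts the second sequence to the right are assumptions made here.

```python
def matches_at_offset(a: str, b: str, offset: int) -> int:
    """Count identical residues when b is placed under a, shifted right by offset (no gaps)."""
    count = 0
    for j, residue in enumerate(b):
        i = j + offset  # position in a that lies above b[j]
        if 0 <= i < len(a) and a[i] == residue:
            count += 1
    return count


a, b = "ATCAGAGTC", "TTCAGTC"
# Try every shift at which the two sequences overlap by at least one residue.
for offset in range(-len(b) + 1, len(a)):
    print(f"offset {offset:+d}: {matches_at_offset(a, b, offset)} matches")
```

No single ungapped shift matches more than four residues here, which is why gaps are needed to reach the six matches of the gapped alignment.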

Sequence alignment: dot-plot

  A T C A G A G T C
T . • . . . . . • .
T . • . . . . . • .
C . . • . . . . . •
A • . . • . • . . .
G . . . . • . • . .
T . • . . . . . • .
C . . • . . . . . •

ATCAGAGTC
TTCA--GTC
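A dot-plot like the one on this slide can be produced with a short Python sketch. The function name `dot_plot` and the choice of "•" for a match and "." for an empty cell are assumptions made for this illustration.

```python
def dot_plot(cols: str, rows: str) -> list[str]:
    """Return the dot-plot as text lines: one column per residue of cols, one row per residue of rows."""
    lines = ["  " + " ".join(cols)]
    for r in rows:
        cells = ["•" if r == c else "." for c in cols]
        lines.append(r + " " + " ".join(cells))
    return lines


plot = dot_plot("ATCAGAGTC", "TTCAGTC")
print("\n".join(plot))
```

Diagonal runs of dots correspond to ungapped matching segments; a jump from one diagonal to another corresponds to a gap.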

Sequence alignment: scoring

Substitution matrix: the similarity value between each pair of residues

     A   C   G   T
A    2   0   0   0
C    0   2   0   0
G    0   0   2   0
T    0   0   0   2

Gap penalty: the cost of introducing gaps. Gap penalty -2.

ATCAGAGTC
TTCA--GTC
•+++^^+++ : 0+2+2+2-2-2+2+2+2 = 8
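The arithmetic above can be checked with a small Python sketch. The function `score_alignment` and its parameter names are hypothetical; it assumes the two-valued substitution matrix of this slide (2 for a match, 0 for a mismatch) and a penalty of -2 for every gapped residue.

```python
def score_alignment(top: str, bottom: str, match: int = 2, mismatch: int = 0, gap: int = -2) -> int:
    """Sum the column scores of a gapped alignment given as two equal-length strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # residue aligned with a gap
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


print(score_alignment("ATCAGAGTC", "TTCA--GTC"))  # 8
```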

Sequence alignment: Needleman-Wunsch global alignment

Substitution matrix: match 2, mismatch 0. Gap penalty -2.

Initialization

      A   T   C   A   G   A   G   T   C
T     0   2   0   0   0   0   0   2   0
T     0   2   0   0   0   0   0   2   0
C     0   0   2   0   0   0   0   0   2
A     2   0   0   2   0   2   0   0   0
G     0   0   0   0   2   0   2   0   0
T     0   2   0   0   0   0   0   2   0
C     0   0   2   0   0   0   0   0   2

Three ways to extend an alignment: [ab], [a-], [-b]

Position 3,2 can be reached from:

[T2T1]   ATC
         -TT

[C3T1]   ATC-
         --TT

[T2T2]   ATC
         TT-

Sequence alignment: Needleman-Wunsch global alignment

Substitution matrix: match 2, mismatch 0. Gap penalty -2.

Initialization and directionality of score calculation

          A   T   C   A   G   A   G   T   C
      0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T    -2   0   2   0   0   0   0   0   2   0
T    -4   0   2   0   0   0   0   0   2   0
C    -6   0   0   2   0   0   0   0   0   2
A    -8   2   0   0   2   0   2   0   0   0
G   -10   0   0   0   0   2   0   2   0   0
T   -12   0   2   0   0   0   0   0   2   0
C   -14   0   0   2   0   0   0   0   0   2

Three ways to extend an alignment: [ab], [a-], [-b]

Sequence alignment: Needleman-Wunsch global alignment

Substitution matrix: match 2, mismatch 0. Gap penalty -2.

          A   T   C   A   G   A   G   T   C
      0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T    -2   0   0  -2  -4  -6  -8 -10 -12 -14
T    -4  -2   2   0  -2  -4  -6  -8  -8 -10
C    -6   0   0   2   0   0   0   0   0   2
A    -8   2   0   0   2   0   2   0   0   0
G   -10   0   0   0   0   2   0   2   0   0
T   -12   0   2   0   0   0   0   0   2   0
C   -14   0   0   2   0   0   0   0   0   2

Sequence alignment: Needleman-Wunsch global alignment

Substitution matrix: match 2, mismatch 0. Gap penalty -2.

          A   T   C   A   G   A   G   T   C
      0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T    -2   0   0  -2  -4  -6  -8 -10 -12 -14
T    -4  -2   2   0  -2  -4  -6  -8  -8 -10
C    -6  -4   0   2   0   0   0   0   0   2
A    -8   2   0   0   2   0   2   0   0   0
G   -10   0   0   0   0   2   0   2   0   0
T   -12   0   2   0   0   0   0   0   2   0
C   -14   0   0   2   0   0   0   0   0   2


Sequence alignment: Needleman-Wunsch global alignment

Substitution matrix: match 2, mismatch 0. Gap penalty -2.

          A   T   C   A   G   A   G   T   C
      0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T    -2   0   0  -2  -4  -6  -8 -10 -12 -14
T    -4  -2   2   0  -2  -4  -6  -8  -8 -10
C    -6  -4   0   4   0   0   0   0   0   2
A    -8   2   0   0   2   0   2   0   0   0
G   -10   0   0   0   0   2   0   2   0   0
T   -12   0   2   0   0   0   0   0   2   0
C   -14   0   0   2   0   0   0   0   0   2

Sequence alignment: Needleman-Wunsch algorithm

σ[ab] : score of aligning a pair of residues a and b

σ[a-] : score of aligning residue a with a gap (gap penalty: -q)

S : score matrix

S(i,j) : optimal score of aligning the residues at positions 1 to i of one sequence with the residues at positions 1 to j of the other sequence

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ⇐ 0
for j ⇐ 1 to N do
    S(0,j) ⇐ S(0,j-1) + σ[-bj]
for i ⇐ 1 to M do
{   S(i,0) ⇐ S(i-1,0) + σ[ai-]
    for j ⇐ 1 to N do
        S(i,j) ⇐ max (S(i-1, j-1) + σ[aibj],
                      S(i-1, j) + σ[ai-],
                      S(i, j-1) + σ[-bj])
}

Pearson & Miller, Meth Enz 210:575, ‘92
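The pseudocode above translates almost line by line into Python. The following is a minimal sketch rather than a reference implementation: the function name `nw_matrix` is made up here, and it assumes a two-valued substitution matrix (match 2, mismatch 0) and a penalty of -2 per gapped residue, as in the worked example.

```python
def nw_matrix(a: str, b: str, match: int = 2, mismatch: int = 0, gap: int = -2) -> list[list[int]]:
    """Fill the Needleman-Wunsch score matrix S; a indexes rows (i), b indexes columns (j)."""
    M, N = len(a), len(b)
    S = [[0] * (N + 1) for _ in range(M + 1)]
    for j in range(1, N + 1):
        S[0][j] = S[0][j - 1] + gap           # leading gap opposite b
    for i in range(1, M + 1):
        S[i][0] = S[i - 1][0] + gap           # leading gap opposite a
        for j in range(1, N + 1):
            sigma = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sigma,   # [ab]
                          S[i - 1][j] + gap,         # [a-]
                          S[i][j - 1] + gap)         # [-b]
    return S


S = nw_matrix("TTCAGTC", "ATCAGAGTC")
print(S[-1][-1])  # optimal global score, 8 for the lecture's example
```

Each cell depends only on its upper, left and upper-left neighbors, so a single pass over the matrix is enough.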

Sequence alignment: Needleman-Wunsch global alignment

The algorithm finds the optimal score(s); further steps are needed to find the corresponding alignment(s). This is a time-saving property in database searches and other applications where the score alone is sufficient.

Only a single pass through the alignment matrix is needed.

Needleman-Wunsch global alignment: the traceback

Substitution matrix: match 2, mismatch 0. Gap penalty -2.

          A   T   C   A   G   A   G   T   C
      0  -2  -4  -6  -8 -10 -12 -14 -16 -18
T    -2   0   0  -2  -4  -6  -8 -10 -12 -14
T    -4  -2   2   0  -2  -4  -6  -8  -8 -10
C    -6  -4   0   4   2   0  -2  -4  -6  -6
A    -8  -4  -2   2   6   4   2   0  -2  -4
G   -10  -6  -4   0   4   8   6   4   2   0
T   -12  -8  -4  -2   2   6   8   6   6   4
C   -14 -10  -6  -2   0   4   6   8   6   8

ATCAGAGTC
 ||--||||
TTC--AGTC
Score: 2 x 6 - 2 x 2 = 8

ATCAGAGTC
 ||||--||
TTCAG--TC
Score: 2 x 6 - 2 x 2 = 8
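A traceback can be sketched in Python as follows. This is an illustration under assumptions, not the lecture's procedure: the function name `nw_align` is hypothetical, and the order in which ties are resolved (diagonal first, then a gap in the second sequence, then a gap in the first) is an arbitrary choice, so it returns only one of the equally good alignments.

```python
def nw_align(a: str, b: str, match: int = 2, mismatch: int = 0, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    M, N = len(a), len(b)
    S = [[0] * (N + 1) for _ in range(M + 1)]
    for j in range(1, N + 1):
        S[0][j] = j * gap
    for i in range(1, M + 1):
        S[i][0] = i * gap
        for j in range(1, N + 1):
            sigma = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sigma, S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Walk back from the bottom-right cell, at each step picking a move that explains S[i][j].
    top, bottom = [], []
    i, j = M, N
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sigma = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + sigma:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return S[M][N], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = nw_align("ATCAGAGTC", "TTCAGTC")
print(score)
print(top)
print(bottom)
```

Reporting every optimal alignment would require following all tied moves instead of the first one found.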

Sequence alignment: Needleman-Wunsch global alignment

The algorithm calculates the score(s) of optimal global sequence alignments, penalizes end gaps, and penalizes each residue in a gap equally.

ATCAGAGTC     has a lower score than     CAGAGTC
--TTCAGTC                                TTCAGTC

ATCACAGTC     has the same score as      ATCACAGTC
T-C--AGTC                                T---CAGTC

ATCACAGTC     has a lower score than     ACACAGTC
T---CAGTC                                T--CAGTC

Sequence alignment: Needleman-Wunsch global alignment

In order to score a gap with a penalty q that is independent of the gap length, i.e. so that

ACACAGTC     ATCACAGTC     AGCTTTCACAGTC
T--CAGTC     T---CAGTC     T-------CAGTC

all have the same score, the algorithm we presented is modified to extend alignments in more than the three ways we considered.

Sequence alignment: Needleman-Wunsch global alignment

[Figure: the pairwise score matrix for ATCAGAGTC (columns) and TTCAGTC (rows), with the extension types [ab], [a-] and [-b] marked.]

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ⇐ 0
for j ⇐ 1 to N do
    S(0,j) ⇐ -q
for i ⇐ 1 to M do
{   S(i,0) ⇐ -q
    for j ⇐ 1 to N do
        S(i,j) ⇐ max (S(i-1, j-1) + σ[aibj],
                      max {S(0, j)...S(i-1, j)} -q,
                      max {S(i, 0)...S(i, j-1)} -q)
}

Pearson & Miller, Meth Enz 210:575, ‘92
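A direct Python rendering of this length-independent variant is sketched below. The function name `nw_constant_gap` and the default values (match 2, mismatch 0, q = 2) are assumptions chosen to match the earlier worked example; the row and column maxima are recomputed naively for clarity rather than speed.

```python
def nw_constant_gap(a: str, b: str, match: int = 2, mismatch: int = 0, q: int = 2) -> int:
    """Global alignment score where a gap of any length costs q once."""
    M, N = len(a), len(b)
    S = [[0] * (N + 1) for _ in range(M + 1)]
    for j in range(1, N + 1):
        S[0][j] = -q
    for i in range(1, M + 1):
        S[i][0] = -q
        for j in range(1, N + 1):
            sigma = match if a[i - 1] == b[j - 1] else mismatch
            best_in_column = max(S[k][j] for k in range(i))  # S(0,j) ... S(i-1,j)
            best_in_row = max(S[i][k] for k in range(j))     # S(i,0) ... S(i,j-1)
            S[i][j] = max(S[i - 1][j - 1] + sigma,
                          best_in_column - q,
                          best_in_row - q)
    return S[M][N]


# The three examples with gaps of length 2, 3 and 7 all score the same.
for a in ("ACACAGTC", "ATCACAGTC", "AGCTTTCACAGTC"):
    print(a, nw_constant_gap(a, "TCAGTC"))
```

Scanning a whole row and column for every cell makes this sketch cubic in the sequence length, which is the price of letting a gap of any length be opened in one step.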

Sequence alignment: Needleman-Wunsch global alignment

Caveats

Every algorithm is limited by the model it is built upon.

For example, the NW dynamic programming algorithm guarantees us optimal global alignments with the parameters we supply (substitution matrix, gap penalty and gap scoring).

However:
• Different parameters can give different alignments.
• The correct alignment might not be the optimal one.
• The correct alignment might correspond only to part of the global alignment.

More details, sources and things to do for next class

Source: Pearson WR & Miller W, "Dynamic programming algorithms for biological sequence comparison." Methods in Enzymology, 210:575-601 (1992).

Assignment: Calculate NW alignments with a constant gap penalty, and observe the effect of different gap penalties and match/mismatch scores. In all cases use substitution matrices that have only two types of scores: a value for an exact match and a lower value for mismatches. Try the nucleotide sequences used in class and the following amino acid sequences: “ACDGSMF” & “AMDFR”.

Local sequence alignments

Local sequence alignments are necessary for cases of:

• Modular organization of genes and proteins (exons, domains, etc.)

• Repeats

• Sequences diverged so that similarity was retained, or can be detected, just in some sub-regions

Modular organization of genes

[Figure: modular organization of genes. Labels: gene A, gene B, gene C; gene W, gene X, gene Y, gene Z.]

Modular protein organization

Adapted from Henikoff et al., Science 278:609, ‘97

[Figure: domain architecture of the TLK and TEK receptor tyrosine-kinases. Domain labels: IG domain, Kringle domain, Protein-kinase domain, EGF domain, FN3 domain.]

Modular protein organization

1KAP secreted calcium-binding alkaline-protease

Calcium-binding repeats

Protease domain

Local sequence alignment

For local sequence alignment we wish to find which regions (sub-sequences) in the compared pair of sequences give the best alignment scores with the parameters we supply (substitution matrix, gap penalty and gap scoring model).

The aligned regions may be anywhere along the sequences. More than one region might be aligned with a score above the threshold.

Sequence alignment: Needleman-Wunsch algorithm

S(0,0) ⇐ 0
for j ⇐ 1 to N do
    S(0,j) ⇐ -q
for i ⇐ 1 to M do
{   S(i,0) ⇐ -q
    for j ⇐ 1 to N do
        S(i,j) ⇐ max (S(i-1, j-1) + σ[aibj],
                      max {S(0, j)...S(i-1, j)} -q,
                      max {S(i, 0)...S(i, j-1)} -q)
}

Three ways to extend an alignment: [ab], [a-], [-b]

Local sequence alignment: Smith-Waterman algorithm

σ[ab] : score of aligning a pair of residues a and b

-q : gap penalty

S’(i,j) : optimal score of an alignment ending at residues i,j

best : highest score in the score matrix (S’)

Local sequence alignment: Smith-Waterman algorithm

best ⇐ 0
for j ⇐ 1 to N do
    S’(0,j) ⇐ 0
for i ⇐ 1 to M do
{   S’(i,0) ⇐ 0
    for j ⇐ 1 to N do
    {   S’(i,j) ⇐ max (S’(i-1, j-1) + σ[aibj],
                       max {S’(0, j)...S’(i-1, j)} -q,
                       max {S’(i, 0)...S’(i, j-1)} -q,
                       0)
        best ⇐ max (S’(i, j), best)
    }
}

Pearson & Miller, Meth Enz 210:575, ‘92
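The Smith-Waterman pseudocode can be sketched in Python in the same way. The function name `sw_constant_gap` is hypothetical, and the defaults assume the unitary matrix (1 for a match, -1 for a mismatch) and a length-independent gap penalty q = 2, which are the parameters of the lecture's worked example.

```python
def sw_constant_gap(a: str, b: str, match: int = 1, mismatch: int = -1, q: int = 2):
    """Return (best, S) where S[i][j] is the best score of a local alignment ending at a[i-1], b[j-1]."""
    M, N = len(a), len(b)
    S = [[0] * (N + 1) for _ in range(M + 1)]  # row 0 and column 0 stay 0
    best = 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sigma = match if a[i - 1] == b[j - 1] else mismatch
            best_in_column = max(S[k][j] for k in range(i))  # S'(0,j) ... S'(i-1,j)
            best_in_row = max(S[i][k] for k in range(j))     # S'(i,0) ... S'(i,j-1)
            S[i][j] = max(S[i - 1][j - 1] + sigma,
                          best_in_column - q,
                          best_in_row - q,
                          0)                                 # start a new local alignment
            best = max(best, S[i][j])
    return best, S


best, S = sw_constant_gap("GTCAGTCA", "ATCAGAGTC")
print(best)  # 4
```

The only changes from the global version are the zero boundary, the extra 0 in the maximum, and tracking the highest cell instead of reading the bottom-right corner.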

Local sequence alignment: Smith-Waterman algorithm
Finding the optimal alignment

Substitution matrix:

     A   C   G   T
A    1  -1  -1  -1
C   -1   1  -1  -1
G   -1  -1   1  -1
T   -1  -1  -1   1

Gap penalty -2

         A   T   C   A   G   A   G   T   C
     0   0   0   0   0   0   0   0   0   0
G    0   0   0   0   0   1   0   1   0   0
T    0   0   1   0   0   0   0   0   2   0
C    0   0   0   2   0   0   0   0   0   3
A    0   1   0   0   3   1   1   1   1   1
G    0   0   0   0   1   4   2   2   2   2
T    0   0   1   0   1   2   3   1   3   1
C    0   0   0   2   1   2   1   2   1   4
A    0   1   0   0   3   2   3   1   1   2

The optimal local alignment is:

A TCAGAGTC
G TCAG--TC A
  ++++^^++ : 1+1+1+1-2+1+1 = 4

Local sequence alignment: Smith-Waterman algorithm
Finding the optimal alignment

Substitution matrix: match 1, mismatch -1. Gap penalty -2.

Score threshold 3

Local sequence alignment: Smith-Waterman algorithm
Finding the optimal alignment

Substitution matrix: match 1, mismatch -1. Gap penalty -2.

Remove the scores of the current optimal alignment, and then recalculate the matrix to find the next best alignment(s).

ATCAGAGTC
GTCAG--TCA

         A   T   C   A   G   A   G   T   C
     0   0   0   0   0   0   0   0   0   0
G    0  -1  -1  -1  -1   1  -1   1  -1  -1
T    0  -1   0  -1  -1  -1  -1  -1   1  -1
C    0  -1  -1   0  -1  -1  -1  -1  -1   1
A    0   1  -1  -1   0  -1   1  -1  -1  -1
G    0  -1  -1  -1  -1   0  -1   1  -1  -1
T    0  -1   1  -1  -1  -1  -1  -1   0  -1
C    0  -1  -1   1  -1  -1  -1  -1  -1   0
A    0   1  -1  -1   1  -1   1  -1  -1  -1

Local sequence alignment: Smith-Waterman algorithm
Finding the sub-optimal alignment

Substitution matrix: match 1, mismatch -1. Gap penalty -2.

Score threshold 3

         A   T   C   A   G   A   G   T   C
     0   0   0   0   0   0   0   0   0   0
G    0   0   0   0   0   1   0   1   0   0
T    0   0   0   0   0   0   0   0   2   0
C    0   0   0   0   0   0   0   0   0   3
A    0   1   0   0   0   0   1   0   0   0
G    0   0   0   0   0   0   0   2   0   0
T    0   0   1   0   0   0   0   0   0   0
C    0   0   0   2   0   0   0   0   0   0
A    0   1   0   0   3   1   1   1   1   1

    A TCA GAGTC
GTCAG TCA
      +++ : 1+1+1 = 3

Local sequence alignment: Smith-Waterman algorithm

In order for the algorithm to identify local alignments, the score for aligning unrelated sequence segments should typically be negative. Otherwise, true optimal local alignments will be extended beyond their correct ends, or will have lower scores than longer alignments between unrelated regions.

Alignment scores are determined by the substitution matrix, the gap penalties and the gap scoring model.

Alignment scoring schemes: gap models

Gap scoring in constant relation to the gap length (g is the number of gapped residues):
σ ⇐ -q g

ATCACA
T---CA     σ ⇐ -3q

Gap scoring by a constant penalty, independent of the gap length:
σ ⇐ -q

ATCACA
T---CA     σ ⇐ -q

Affine gap scoring (opening [d] and extending [e] gap penalties):
σ ⇐ -(d + e (g-1))

ATCACA
T---CA     σ ⇐ -(d + 2e)
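The three gap models can be written as small Python functions for comparison. The function names and the numeric values in the example (q = 2, d = 5, e = 1) are assumptions chosen only for illustration.

```python
def linear_gap(g: int, q: int) -> int:
    """Penalty proportional to the gap length: -q * g."""
    return -q * g


def constant_gap(g: int, q: int) -> int:
    """Penalty independent of the gap length: -q for any gap of one or more residues."""
    return -q if g > 0 else 0


def affine_gap(g: int, d: int, e: int) -> int:
    """Opening penalty d for the first gapped residue, extension penalty e for each further one."""
    return -(d + e * (g - 1)) if g > 0 else 0


g = 3  # the gap in ATCACA / T---CA
print(linear_gap(g, q=2))       # -6, i.e. -3q
print(constant_gap(g, q=2))     # -2, i.e. -q
print(affine_gap(g, d=5, e=1))  # -7, i.e. -(d + 2e)
```

Setting e = d makes the affine model behave like the length-proportional one, and e = 0 makes it behave like the constant one.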

Local sequence alignment: Smith-Waterman algorithm

If the alignment scores of unrelated sequences are mainly or solely determined by the substitution scores, then such alignments will have negative scores if the expected substitution score is negative:

Σ_{i,j} p_i p_j s_{ij} < 0

i, j : residues
p_i : frequency of residue i
s_{ij} : score of aligning residues i and j
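The condition can be evaluated numerically for the two substitution matrices used in this lecture. The sketch below assumes uniform nucleotide frequencies (0.25 each), which is an assumption made here for illustration and not a claim about real sequences; the function name `expected_score` is likewise hypothetical.

```python
def expected_score(freqs: dict[str, float], match: float, mismatch: float) -> float:
    """Sum of p_i * p_j * s_ij over all residue pairs, for a two-valued substitution matrix."""
    return sum(p_i * p_j * (match if i == j else mismatch)
               for i, p_i in freqs.items()
               for j, p_j in freqs.items())


uniform = {residue: 0.25 for residue in "ACGT"}  # assumed background frequencies

print(expected_score(uniform, match=1, mismatch=-1))  # -0.5: negative, suitable for local alignment
print(expected_score(uniform, match=2, mismatch=0))   # 0.5: positive, random alignments score above zero
```

Under this assumption the unitary matrix satisfies the condition, while the match 2, mismatch 0 matrix of the global alignment example does not.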

Local sequence alignment: Smith-Waterman algorithm

We can easily identify substitution matrices that will not give positive scores to random alignments. However, we have no analytical way of finding which gap scores will satisfy the requirement that random alignment scores be less than or equal to zero, and so produce local sequence alignments.

Nevertheless, certain sets of scoring schemes (substitution matrix and gap scores) were found to give satisfactory local alignments.

More details, sources and things to do for next lecture

Sources:

Pearson & Miller, "Dynamic programming algorithms for biological sequence comparison." Methods in Enzymology 210:575-601 (1992).

Altschul, "Amino acid substitution matrices from an information theoretic perspective." J Mol Biol 219:555-565 (1991).

Henikoff, "Scores for sequence searches and alignments." Curr Opin Struct Biol 6:353-360 (1996).

Assignment: Read the source articles for this lecture. They have more details on the material we covered and introduce topics for the next lectures. Calculate S’ for the sequences presented in class, using the unitary matrix (1 for match, -1 for mismatch) and the constant gap penalty model with gap penalties of -1, -2 or -4.

For those who are not acquainted with information theory, or who want to be certain they know its basics: An information theory primer for molecular biologists, http://www.lecb.ncifcrf.gov/~toms/paper/primer


Next lecture, 12/12/2001:

• Substitution Matrices: amino-acids features and empirical matrices

• BLAST and FASTA: algorithms and statistics; assumptions and associated artifacts