The Basic Local Alignment Search Tool (BLAST)


Page 1: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

Rapid database search tool (1990)

Idea:

(1) Search for high scoring segment pairs

Page 2: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

A Y W T Y I V A L T – Q V R Q Y E A T

S I L C I V M I Y S R A - Q Y R Y W R Y

Most local alignments contain highly conserved sections without gaps

Page 3: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

A Y W T Y I V A L T – Q V R Q Y E A T

S I L C I V M I Y S R A - Q Y R Y W R Y

-> search for high scoring segment pairs

(HSP), i.e. gap-free local alignments

Page 4: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

Page 5: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

A Y W T Y I V A L T – Q V R Q Y E A T

S I L C I V M I Y S R A - Q Y R Y W R Y

Advantages: (a) speed

(b) statistical theory about HSP exists.

Page 6: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

Rapid database search tool (1990)

Idea:

(1) Search for high scoring segment pairs

(2) Use word pairs as seeds

Page 7: The  Basic Local Alignment Search Tool (BLAST)

Pair-wise sequence alignment

T W L M H C A Q Y I C I M X H X C X T H Y

(1) Search for word pairs of length 3 with score > T and use them as seeds.

Page 8: The  Basic Local Alignment Search Tool (BLAST)

Pair-wise sequence alignment

A naïve algorithm would have a complexity of O(l1 * l2), where l1 and l2 are the lengths of the query and the database sequence.

Solution: preprocess the query sequence.

Compile a list of all words that have a score > T when aligned to a word in the query.

Page 9: The  Basic Local Alignment Search Tool (BLAST)

Pair-wise sequence alignment

A naïve algorithm would have a complexity of O(l1 * l2), where l1 and l2 are the lengths of the query and the database sequence.

Solution: preprocess the query sequence.

Compile a list of all words that have a score > T when aligned to a word in the query. Complexity: O(l1)

Organize the words in an efficient data structure (tree) for fast look-up.
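The preprocessing step can be sketched as follows. This is a minimal illustration, not the BLAST implementation: the match/mismatch scores, the threshold value and the function names are assumptions made for the example, and a Python dictionary stands in for the tree mentioned on the slide.

```python
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def score(a, b, match=5, mismatch=-1):
    """Toy substitution score (assumption; BLAST uses a substitution matrix)."""
    return match if a == b else mismatch


def word_score(u, v):
    """Score of two equally long words aligned without gaps."""
    return sum(score(a, b) for a, b in zip(u, v))


def build_index(query, w=3, threshold=8):
    """Map every word scoring > threshold against a query word to its query positions."""
    index = {}
    for pos in range(len(query) - w + 1):
        q_word = query[pos:pos + w]
        for cand in map("".join, product(ALPHABET, repeat=w)):
            if word_score(q_word, cand) > threshold:
                index.setdefault(cand, []).append(pos)
    return index


def find_seeds(index, subject, w=3):
    """Scan the database sequence once; each word costs one dictionary look-up."""
    seeds = []
    for s_pos in range(len(subject) - w + 1):
        for q_pos in index.get(subject[s_pos:s_pos + w], ()):
            seeds.append((q_pos, s_pos))
    return seeds


index = build_index("MHCAQY")
print(find_seeds(index, "TTMHCTT"))
```

With these toy scores, a word passes the threshold if it differs from a query word in at most one position, so the scan reports both the exact word MHC and the near match HCT/HCA.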

Page 10: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

Rapid database search tool (1990)

Idea:

(1) Search for high scoring segment pairs

(2) Use word pairs as seeds

(3) Extend seed alignments until score drops below threshold value

Page 11: The  Basic Local Alignment Search Tool (BLAST)

Pair-wise sequence alignment

T W L M H C A Q Y I C I M X H X C X T H Y

Extend seeds until score drops by X.
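A possible reading of the extension rule, as a sketch: the seed is extended without gaps in both directions, and each direction stops once the running score has fallen more than X below the best score seen so far. The scoring values, the default for `x_drop` and the function name are assumptions for illustration only.

```python
def extend_seed(query, subject, q_pos, s_pos, w=3, x_drop=4, match=5, mismatch=-4):
    """Extend an ungapped seed of length w in both directions.

    Returns (score, query_start, subject_start, length) of the segment pair.
    """
    def score(a, b):
        return match if a == b else mismatch

    def one_direction(qi, si, step):
        best = running = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            running += score(query[qi], subject[si])
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > x_drop:
                break  # score dropped by more than X: give up
            qi += step
            si += step
        return best, best_len

    seed_score = sum(score(query[q_pos + k], subject[s_pos + k]) for k in range(w))
    right, r_len = one_direction(q_pos + w, s_pos + w, +1)
    left, l_len = one_direction(q_pos - 1, s_pos - 1, -1)
    return seed_score + left + right, q_pos - l_len, s_pos - l_len, l_len + w + r_len


print(extend_seed("TWLMHCAQYI", "GGLMHCAQYG", 3, 3))
```

Only the part of each extension up to its best-scoring position is kept, so trailing mismatches never lower the reported score.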

Page 12: The  Basic Local Alignment Search Tool (BLAST)

Pair-wise sequence alignment

T W L M H C A Q Y I C I X M X H X C X T X H X Y

Extend seeds until score drops by X.

Page 13: The  Basic Local Alignment Search Tool (BLAST)

Pair-wise sequence alignment

Algorithm not guaranteed to find best

segment pair

(Heuristic)

But works well in practice!

Page 14: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

New BLAST version (1997)

Two-hit strategy

Page 15: The  Basic Local Alignment Search Tool (BLAST)

Pair-wise sequence alignment

W L M H C A Q Y A R V I M X H X C X T H W A X R X v X

Search for two word pairs on the same diagonal; use a lower threshold T
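The two-hit condition can be sketched like this, assuming hits are given as (query position, subject position) pairs. The function name and the `window` parameter are illustrative assumptions; the sketch only pairs neighbouring hits on a diagonal.

```python
from collections import defaultdict


def two_hit_seeds(hits, w=3, window=40):
    """Return pairs of non-overlapping hits on the same diagonal
    whose start positions are less than `window` apart."""
    by_diagonal = defaultdict(list)
    for q_pos, s_pos in hits:
        by_diagonal[s_pos - q_pos].append(q_pos)  # diagonal = subject - query offset

    pairs = []
    for diag, positions in by_diagonal.items():
        positions.sort()
        for first, second in zip(positions, positions[1:]):
            if w <= second - first < window:
                pairs.append(((first, first + diag), (second, second + diag)))
    return pairs


print(two_hit_seeds([(0, 5), (10, 15), (3, 20), (11, 16)]))
```

Because two hits are required before an extension starts, the word threshold T can be lowered without triggering an extension for every single hit.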

Page 16: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

New BLAST version (1997)

Two-hit strategy

Gapped BLAST

Position-Specific Iterative BLAST (PSI BLAST)

Page 17: The  Basic Local Alignment Search Tool (BLAST)

The Basic Local Alignment Search Tool(BLAST)

Page 18: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

1aboA 1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB 1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht 1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA 1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie 1 .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie 28 YAVESeahpgsvQIYPVAALERIN......

Page 19: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

First question: how to score multiple alignments?

Possible scoring scheme:

Sum-of-pairs score

Page 20: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Multiple alignment implies pairwise alignments:

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......

1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......

1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp

1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....

1vie 28 YAVESeahpgsvQIYPVAALERIN......

Page 22: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Multiple alignment implies pairwise alignments:

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......

1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......

Page 23: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Multiple alignment implies pairwise alignments:

1aboA 36 WCEAQtkngqGWVPSNYITPVN

1ycsB 39 WWWARlndkeGYVPRNLLGLYP

Page 24: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Multiple alignment implies pairwise alignments:

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......

1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......

1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp

1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....

1vie 28 YAVESeahpgsvQIYPVAALERIN......

Page 26: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Multiple alignment implies pairwise alignments:

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......

1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp

Page 29: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Multiple alignment implies pairwise alignments:

Use sum of scores of these p.a.

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......

1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......

1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp

1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....

1vie 28 YAVESeahpgsvQIYPVAALERIN......
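The sum-of-pairs idea can be sketched as follows: every pair of rows of the multiple alignment is scored as a pairwise alignment and the results are added up. The match, mismatch and gap values are assumptions for the example (a real scheme would use a substitution matrix), and gaps are written as "-" here rather than "." as in the alignment above.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score: add up the scores of all implied pairwise alignments."""
    def column_pair(a, b):
        if a == "-" and b == "-":
            return 0  # gap/gap columns vanish in the pairwise projection
        if a == "-" or b == "-":
            return gap
        return match if a == b else mismatch

    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")

    return sum(
        column_pair(a, b)
        for row_a, row_b in combinations(alignment, 2)
        for a, b in zip(row_a, row_b)
    )


print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```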

Page 30: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Goal:

Find multi-alignment with maximum score !

Page 31: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment

Multidimensional search space instead of two-dimensional matrix!

Page 32: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Page 33: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Complexity:

For three sequences of lengths l1, l2 and l3:

O( l1 * l2 * l3 )

For n sequences ( average length l ):

O( l^n )

Exponential complexity!
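A small worked example makes the growth concrete. The numbers are hypothetical (five sequences of length 100) and not taken from the slides; the count of predecessor cells follows from each sequence either advancing or not advancing in a step, excluding the case where none advances.

```latex
\[
  \text{pairwise: } l^{2} = 100^{2} = 10^{4} \text{ cells}, \qquad
  n = 5:\; l^{n} = 100^{5} = 10^{10} \text{ cells}, \qquad
  2^{n} - 1 = 31 \text{ predecessor cells per cell}
\]
```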

Page 34: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment

Optimal solution not feasible:

Page 35: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment

Optimal solution not feasible:

-> Heuristics necessary

Page 36: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

(A) Carrillo and Lipman (MSA)

Find a sub-space of the dynamic-programming matrix in which the optimal path can be found

Page 37: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

(B) Stoye, Dress (DCA)

Divide search space into small sub-spaces

Calculate optimal alignment for sub-spaces

Concatenate sub-alignments

Page 38: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

(B) Stoye, Dress (DCA)

Page 40: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Progressive alignment.

Carry out a series of pair-wise alignments

Page 41: The  Basic Local Alignment Search Tool (BLAST)

Most popular way of constructing multiple alignments:

Progressive alignment.

Carry out a series of pair-wise alignments

Multiple sequence alignment

Page 42: The  Basic Local Alignment Search Tool (BLAST)

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Multiple sequence alignment

Page 43: The  Basic Local Alignment Search Tool (BLAST)

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Align most similar sequences

Multiple sequence alignment

Page 44: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASFQPVAALERIN

WLNYNEERGDFPGTYVEYIGRKKISP

Page 45: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Page 46: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Align sequence to alignment

Page 47: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

WCEAQTKNGQGWVPSNYITPVN-

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Align alignment to alignment

Page 48: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

WCEAQTKNGQGWVPSNYITPVN--------
WW--RLNDKEGYVPRNLLGLYP--------
AVVIQDNSDIKVVP--KAKIIRD-------
YAVESEA---SVQ--PVAALERIN------
WLN-YNE---ERGDFPGTYVEYIGRKKISP

Page 49: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

WCEAQTKNGQGWVPSNYITPVN--------
WW--RLNDKEGYVPRNLLGLYP--------
AVVIQDNSDIKVVP--KAKIIRD-------
YAVESEA---SVQ--PVAALERIN------
WLN-YNE---ERGDFPGTYVEYIGRKKISP

Rule: “once a gap - always a gap”

Page 50: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

The order of the pair-wise profile alignments is determined by a phylogenetic tree based on pair-wise similarity values (guide tree)
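How a guide tree fixes the order of the pair-wise steps can be sketched as follows. This is a simplified illustration, not the procedure of any particular program: the k-mer based similarity measure, the greedy average-linkage clustering and all function names are assumptions chosen to keep the example short.

```python
from itertools import combinations


def kmer_similarity(a, b, k=2):
    """Crude alignment-free similarity: Jaccard index of the k-mer sets.
    Assumes both sequences are at least k residues long."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(ka & kb) / len(ka | kb)


def guide_tree_order(seqs):
    """Greedy average-linkage clustering; returns the merges in the order
    in which a progressive aligner would carry them out."""
    clusters = [(i,) for i in range(len(seqs))]
    sim = {(i, j): kmer_similarity(seqs[i], seqs[j])
           for i, j in combinations(range(len(seqs)), 2)}

    def linkage(c1, c2):
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


print(guide_tree_order(["AAAA", "AAAT", "GGGG", "GGCC"]))
```

Each merge of two single sequences corresponds to a sequence-to-sequence alignment, and each merge involving a larger cluster to a sequence-to-alignment or alignment-to-alignment step.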

Page 51: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Page 53: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Problem: a simple guide tree determines the multiple alignment, and the multiple alignment in turn determines the phylogenetic analysis

Page 54: The  Basic Local Alignment Search Tool (BLAST)

Multiple sequence alignment

Implementations:

Clustal W, PileUp, MultAlin

Page 55: The  Basic Local Alignment Search Tool (BLAST)

Local multiple alignment

M

M

Page 56: The  Basic Local Alignment Search Tool (BLAST)

Local multiple alignment

M

M

M

Page 58: The  Basic Local Alignment Search Tool (BLAST)

Local multiple alignment

Find motifs contained in all sequences in data set

Problem:

motifs often present in only sub-families

Page 59: The  Basic Local Alignment Search Tool (BLAST)

Neither local nor global methods applicable

Page 60: The  Basic Local Alignment Search Tool (BLAST)

Alignment possible if order conserved

Page 61: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Page 62: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Combination of local and global methods.

Page 63: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Combination of local and global methods.

Find local pair-wise similarities between input sequences (fragments)

Page 64: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Combination of local and global methods.

Find local pair-wise similarities between input sequences (fragments)

Compose alignments from fragments

Page 65: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Combination of local and global methods.

Find local pair-wise similarities between input sequences (fragments)

Compose alignments from fragments

Ignore non-related parts of the sequences

Page 66: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

atctaatagttaaactcccccgtgcttagagatccaaaccagtgcgtgtattactaacggttcaatcgcgcacatccgc

Page 70: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

atctaatagttaaactcccccgtgcttagagatccaaaccagtgcgtgtattactaacggttcaatcgcgcacatccgc

------atctaatagttaaaccccctcgtgcttag-------agatccaaaccagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Page 71: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

atctaatagttaaactcccccgtgcttagagatccaaaccagtgcgtgtattactaacggttcaatcgcgcacatccgc

------atctaatagttaaaccccctcgtgcttag-------agatccaaaccagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

------atcTAATAGTTAaaccccctcgtGCTTag-------AGATCCaaaccagtgcgtgTATTACTAAc----------GGTTcaatcgcgcACATCCgc--

Page 72: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Score of an alignment:

Define score of fragment f:

l(f) = length of f

s(f) = sum of matches (similarity values)

P(f) = probability to find a fragment with length l(f) and at least s(f) matches in random sequences that have the same length as the input sequences.

Score w(f) = -ln P(f)
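The weight w(f) = -ln P(f) can be illustrated with a simplified model. In this sketch P(f) is approximated by a binomial tail for DNA with a match probability of 0.25 per position; the correction for the lengths of the input sequences mentioned above is deliberately left out, so this is an assumption-laden approximation, not the DIALIGN formula itself.

```python
from math import comb, log


def fragment_weight(length, matches, p_match=0.25):
    """w(f) = -ln P(f), with P(f) approximated by the probability of seeing
    at least `matches` matches among `length` independent positions."""
    if not 0 <= matches <= length:
        raise ValueError("matches must lie between 0 and length")
    p_tail = sum(
        comb(length, k) * p_match**k * (1 - p_match)**(length - k)
        for k in range(matches, length + 1)
    )
    return -log(p_tail)


print(fragment_weight(10, 9), fragment_weight(10, 5))
```

Fragments that are unlikely to occur by chance receive a high weight, while a fragment with few matches gets a weight close to zero.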

Page 73: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Score of an alignment:

Define score of alignment as sum of scores w(f) of its fragments

No gap penalty is used!

Optimization problem for pair-wise alignment:

Find chain of fragments with maximal total score

Page 74: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

------atctaatagttaaaccccctcgtgcttag-------agatccaaaccagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Fragment-chaining algorithm finds optimal chain of fragments.
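Fragment chaining can be sketched with a simple quadratic-time dynamic program. The tuple layout (query start, subject start, length, weight) and the function name are assumptions for this example; it shows the principle only and makes no claim about how DIALIGN implements the step.

```python
def best_chain(fragments):
    """fragments: list of (q_start, s_start, length, weight).

    Returns (total_weight, chain) for the best chain in which every fragment
    starts after the previous one has ended in both sequences."""
    # Sort by end position so that every possible predecessor comes first.
    frags = sorted(fragments, key=lambda f: (f[0] + f[2], f[1] + f[2]))
    if not frags:
        return 0.0, []

    best = []  # best[i] = best total weight of a chain ending in frags[i]
    prev = []  # predecessor index used for the traceback
    for i, (qs, ss, _, w) in enumerate(frags):
        score, back = w, None
        for j in range(i):
            pq, ps, pl, _ = frags[j]
            if pq + pl <= qs and ps + pl <= ss and best[j] + w > score:
                score, back = best[j] + w, j
        best.append(score)
        prev.append(back)

    i = max(range(len(frags)), key=best.__getitem__)
    total, chain = best[i], []
    while i is not None:
        chain.append(frags[i])
        i = prev[i]
    return total, chain[::-1]


frags = [(0, 0, 5, 3.0), (6, 8, 4, 2.5), (3, 2, 6, 4.0), (12, 14, 3, 1.0)]
print(best_chain(frags))
```

No gap penalty enters the score: the regions between chained fragments are simply left unaligned.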

Page 75: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Multiple fragment alignment

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 81: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Multiple fragment alignment

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 82: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Multiple fragment alignment

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 83: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Multiple fragment alignment

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaac----------ggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 84: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Multiple fragment alignment

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 85: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Multiple fragment alignment

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Consistency: it is possible to introduce gaps such that all segment pairs are aligned.

Page 86: The  Basic Local Alignment Search Tool (BLAST)

The DIALIGN approach

Multiple fragment alignment

atc------TAATAGTTAaactccccCGTGC-TTag

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa

Page 87: The  Basic Local Alignment Search Tool (BLAST)

Program evaluation

Use biologically verified alignments

(known 3D structure of proteins)

Compare alignments produced by

computer programs to “biologically correct”

alignments.

Page 88: The  Basic Local Alignment Search Tool (BLAST)

Program evaluation

(1) First evaluation of multiple alignment programs (McClure, Vasi, Fitch, 1994)

4 protein families used:

Globin, kinase, protease, ribonuclease H,

all globally related -> global programs

performed best

Page 89: The  Basic Local Alignment Search Tool (BLAST)

Program evaluation

(2) The BAliBASE (Thompson et al., 1999)

~ 100 protein families with known 3D structure,

some with large insertions/deletions.

Page 90: The  Basic Local Alignment Search Tool (BLAST)

Program evaluation

1aboA 1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB 1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht 1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA 1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie 1 .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie 28 YAVESeahpgsvQIYPVAALERIN......

Key: alpha helix RED, beta strand GREEN, core blocks UNDERSCORE

Page 91: The  Basic Local Alignment Search Tool (BLAST)

Program evaluation

Results:

Four programs performed best, but no method was best in all test examples.

ClustalW, SAGA and PRRP best for global alignment, DIALIGN best for sequences with large insertions or deletions.

Page 92: The  Basic Local Alignment Search Tool (BLAST)

Program evaluation

(3) Lassmann and Sonnhammer (2002)

Used BAliBASE plus artificial sequences for local alignment

Results: T-COFFEE best for closely related sequences, DIALIGN best for distantly related sequences.

Page 93: The  Basic Local Alignment Search Tool (BLAST)

Program evaluation

Page 94: The  Basic Local Alignment Search Tool (BLAST)

Alignment of large genomic sequences

Important tool for identifying functional

sites (e.g. genes or regulatory elements)

Page 95: The  Basic Local Alignment Search Tool (BLAST)

Alignment of large genomic sequences

Phylogenetic Footprinting:

Functional sites more conserved during evolution

=> Sequence similarity indicates biological function

Page 96: The  Basic Local Alignment Search Tool (BLAST)

Alignment of large genomic sequences

DIALIGN performs well in identifying local homologies, but is slow

Page 97: The  Basic Local Alignment Search Tool (BLAST)

Quadratic program running time

Page 104: The  Basic Local Alignment Search Tool (BLAST)

Solution: Anchored alignments

Page 111: The  Basic Local Alignment Search Tool (BLAST)

Solution: Anchored alignments

Find anchor points to reduce search space
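The effect of anchor points on the search space can be sketched by counting the dynamic-programming cells that remain between consecutive anchors. The function name and the example values are hypothetical, and the anchors are assumed to be collinear.

```python
def search_space(len1, len2, anchors=()):
    """Number of dynamic-programming cells left between consecutive anchors.

    anchors: (pos1, pos2) pairs, assumed to increase in both sequences."""
    points = [(0, 0), *sorted(anchors), (len1, len2)]
    cells = 0
    for (a1, a2), (b1, b2) in zip(points, points[1:]):
        if b1 < a1 or b2 < a2:
            raise ValueError("anchors must increase in both sequences")
        cells += (b1 - a1) * (b2 - a2)
    return cells


print(search_space(1000, 1000))
print(search_space(1000, 1000, [(250, 250), (500, 500), (750, 750)]))
```

With k evenly spaced anchors the quadratic cost shrinks by a factor of about k + 1, which is why even a handful of reliable anchors helps on long genomic sequences.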

Page 112: The  Basic Local Alignment Search Tool (BLAST)

Solution: Anchored alignments

Use fast heuristic method to find anchor points:

CHAOS developed together with Mike Brudno

Brudno et al. (2003), BMC Bioinformatics 4:66

Page 113: The  Basic Local Alignment Search Tool (BLAST)

Solution: Anchored alignments

Page 114: The  Basic Local Alignment Search Tool (BLAST)

(3) Anchored alignments

Page 116: The  Basic Local Alignment Search Tool (BLAST)

First step to gene prediction:

Exon discovery by genomic alignment

Page 117: The  Basic Local Alignment Search Tool (BLAST)

First step to gene prediction:

Exon discovery by genomic alignment

Evaluation of different alignment programs:

Compare local sequence similarity identified by alignment programs to known exons

Morgenstern et al. (2002), Bioinformatics 18:777-787

Page 118: The  Basic Local Alignment Search Tool (BLAST)

DIALIGN alignment of human and murine genomic sequences

Page 119: The  Basic Local Alignment Search Tool (BLAST)

DIALIGN alignment of tomato and Arabidopsis thaliana genomic sequences

Page 120: The  Basic Local Alignment Search Tool (BLAST)

Evaluation of DIALIGN, PipMaker, WABA, BLASTN and TBLASTX on a set of 42 human and murine genomic sequences.

Compare similarities to annotated exons

Apply cut-off parameter to resulting alignments

Measure sensitivity and specificity
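The evaluation step can be sketched at the nucleotide level. The interval representation and the function names are assumptions for this example, and specificity is taken here as TP / (TP + FP), the convention common in gene-prediction benchmarks; the slides do not state which definition was used.

```python
def covered(intervals):
    """Set of positions covered by half-open (start, end) intervals."""
    return {p for start, end in intervals for p in range(start, end)}


def sensitivity_specificity(predicted, annotated):
    """Compare predicted similarity regions with annotated exons."""
    pred, true = covered(predicted), covered(annotated)
    tp = len(pred & true)
    sensitivity = tp / len(true) if true else 0.0  # share of exon positions found
    specificity = tp / len(pred) if pred else 0.0  # TP / (TP + FP)
    return sensitivity, specificity


print(sensitivity_specificity([(0, 10), (20, 30)], [(5, 15), (20, 25)]))
```

Raising the cut-off on the alignment scores typically removes weak predictions, trading sensitivity for specificity.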

Page 121: The  Basic Local Alignment Search Tool (BLAST)

Performance of long-range alignment programs for exon discovery (human - mouse comparison)

Page 122: The  Basic Local Alignment Search Tool (BLAST)

Performance of long-range alignment programs for exon discovery (Arabidopsis thaliana - tomato comparison)

Page 123: The  Basic Local Alignment Search Tool (BLAST)

AGenDA:

Alignment-based Gene Detection Algorithm

Bridge small gaps between DIALIGN fragments

-> cluster of fragments

Search for conserved splice sites and start/stop codons at cluster boundaries to identify candidate exons

Recursive algorithm finds biologically consistent chain of potential exons

Page 124: The  Basic Local Alignment Search Tool (BLAST)

Identification of candidate exons

Fragments in DIALIGN alignment

Page 125: The  Basic Local Alignment Search Tool (BLAST)

Identification of candidate exons

Build cluster of fragments

Page 126: The  Basic Local Alignment Search Tool (BLAST)

Identification of candidate exons

Identify conserved splice sites

Page 127: The  Basic Local Alignment Search Tool (BLAST)

Identification of candidate exons

Candidate exons bounded by conserved splice sites

Page 128: The  Basic Local Alignment Search Tool (BLAST)

Construct gene models using candidate exons

Score of candidate exon (E) based on DIALIGN scores for fragments, score of splice junctions and penalty for shortening / extending

Find biologically consistent chain of candidate exons (starting with start codon, ending with stop codon, no internal stop codons …) with maximal total score

[Formula garbled in extraction; recoverable symbols: sc(E), w(f_i), sc(SP), len(C), dis(C, E)]

Page 129: The  Basic Local Alignment Search Tool (BLAST)

Find optimal consistent chain of candidate exons

Page 133: The  Basic Local Alignment Search Tool (BLAST)

Find optimal consistent chain of candidate exons

atg gt ag gt ag tga atg tga

Page 134: The  Basic Local Alignment Search Tool (BLAST)

Find optimal consistent chain of candidate exons

atg gt ag gt ag tga atg tga

G1 G2

Page 135: The  Basic Local Alignment Search Tool (BLAST)

Find optimal consistent chain of candidate exons

Recursive algorithm calculates optimal chain of candidate exons in O(N log N) time

Page 136: The  Basic Local Alignment Search Tool (BLAST)

DIALIGN fragments

Page 137: The  Basic Local Alignment Search Tool (BLAST)

Candidate exons

Page 138: The  Basic Local Alignment Search Tool (BLAST)

Complete model

Page 139: The  Basic Local Alignment Search Tool (BLAST)

Results: 105 pairs of genomic sequences from human and mouse (Batzoglou et al., 2000)

[Bar chart: sensitivity and specificity (0% to 100%) for AGenDA and GenScan]

Page 140: The  Basic Local Alignment Search Tool (BLAST)

Results: 105 pairs of genomic sequences from human and mouse (Batzoglou et al., 2000)

[Figure: comparison of AGenDA and GenScan; values shown: 64 %, 12 %, 17 %]

Page 141: The  Basic Local Alignment Search Tool (BLAST)

Results:

Quality of AGenDA-based gene models comparable to results from GenScan

Exons identified that have not been identified by GenScan

No statistical models derived from known genes (no training data necessary!)

Method generally applicable

Page 142: The  Basic Local Alignment Search Tool (BLAST)

AGenDA:

Alignment-based Gene Detection Algorithm

WWW server:

http://bibiserv/TechFak.Uni-Bielefeld.DE/agenda

Rinner, Taher, Goel, Sczyrba, Brudno, Batzoglou, Morgenstern, submitted

Page 143: The  Basic Local Alignment Search Tool (BLAST)