Vorlesung Grundlagen der Bioinformatik (Lecture: Fundamentals of Bioinformatics)

Page 1: Vorlesung Grundlagen der Bioinformatik .

Lecture

Grundlagen der Bioinformatik (Fundamentals of Bioinformatics)

http://gobics.de/lectures/ss07/grundlagen

Page 2: Vorlesung Grundlagen der Bioinformatik .

Information from a single sequence alone

Sequence alignment in molecular data analysis:

Page 3: Vorlesung Grundlagen der Bioinformatik .

Information from a single sequence alone

Multi-organism high-quality sequences

Sequence alignment in molecular data analysis:

(M. Brudno)

Page 4: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

seq1 T Y I M R E A Q Y E

seq2 T C I V M R E A Y E

seq3 Y I M Q E V Q Q E

seq4 Y I A M R E Q Y E

Page 5: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 Y - I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Page 9: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 Y - I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Functionally important regions more conserved than non-functional regions

Page 10: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 Y - I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Functionally important regions more conserved than non-functional regions

Local sequence conservation indicates functionality!
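To make "local conservation" concrete, here is a minimal Python sketch that scores each column of the slide's four-sequence alignment by the fraction of sequences sharing the most frequent residue. This particular conservation measure is an illustrative assumption, not a score defined in the lecture.

```python
from collections import Counter

# The four aligned sequences from the slide ('-' = gap).
ALIGNMENT = [
    "TYI-MREAQYE",
    "TCIVMREA-YE",
    "Y-I-MQEVQQE",
    "Y-IAMRE-QYE",
]

def column_conservation(rows):
    """Fraction of rows carrying the most frequent residue in each column.

    Gaps are never counted as a conserved residue (illustrative choice).
    """
    scores = []
    for column in zip(*rows):
        residues = Counter(c for c in column if c != "-")
        top = max(residues.values(), default=0)
        scores.append(top / len(rows))
    return scores

if __name__ == "__main__":
    for pos, score in enumerate(column_conservation(ALIGNMENT), start=1):
        print(pos, f"{score:.2f}")
```

With this measure, columns 3, 5, 7 and 11 (I, M, E, E) are fully conserved.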

Page 11: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 - Y I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Astronomical Number of possible alignments!
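How quickly the number of alignments grows can be illustrated already for just two sequences. The Python sketch below counts all ways of placing gaps in a pairwise alignment; the counts are the Delannoy numbers, and for two sequences of length 10 they already exceed eight million. With more sequences the count grows even faster. The function name is a hypothetical choice for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of pairwise alignments of sequences of lengths m and n (Delannoy number).

    Each alignment ends with one of three columns: residue/residue,
    residue/gap or gap/residue, which gives the recursion below.
    """
    if m == 0 or n == 0:
        return 1  # only one way: align the remaining residues against gaps
    return (count_alignments(m - 1, n - 1)
            + count_alignments(m - 1, n)
            + count_alignments(m, n - 1))

if __name__ == "__main__":
    for length in (10, 100):
        print(length, count_alignments(length, length))
```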

Page 12: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V - M R E A Y E

seq3 - Y I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Astronomical Number of possible alignments!

Page 13: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

seq1 T Y I - M R E A Q Y E

seq2 T C I V M R E A - Y E

seq3 - Y I - M Q E V Q Q E

seq4 Y - I A M R E - Q Y E

Which one is the best?

Page 14: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

Questions in development of alignment programs:

(1) What is a good alignment?

→ objective function ('score')

(2) How to find a good alignment?

→ optimization algorithm

The first question is far more important!

Page 15: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

Most important scoring scheme for multiple alignment:

Sum-of-pairs score for global alignment.
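The sum-of-pairs idea can be sketched in a few lines of Python: every column contributes the score of each pair of characters in it. The scores used here (match +1, mismatch -1, residue against gap -2, gap against gap 0) are hypothetical toy values; real programs use substitution matrices such as BLOSUM and more elaborate gap penalties.

```python
from itertools import combinations

# Hypothetical toy scores (real programs use substitution matrices and gap models).
MATCH, MISMATCH, GAP = 1, -1, -2

def pair_score(a: str, b: str) -> int:
    """Score of two characters placed in the same alignment column."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def sum_of_pairs(rows) -> int:
    """Sum-of-pairs score: add pair_score over all row pairs in every column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_score(column[i], column[j])
        for column in zip(*rows)
        for i, j in combinations(range(len(rows)), 2)
    )

if __name__ == "__main__":
    slide_alignment = ["TYI-MREAQYE", "TCIVMREA-YE", "Y-I-MQEVQQE", "Y-IAMRE-QYE"]
    print(sum_of_pairs(slide_alignment))
```

Comparing such scores for different gap placements is exactly question (1) above: the objective function decides which alignment counts as "best".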

Page 16: Vorlesung Grundlagen der Bioinformatik .

Divide-and-Conquer Alignment (DCA)

J. Stoye, A. Dress (Bielefeld)

Approximate optimal global multiple alignment:

Divide sequences into small sub-sequences

Use MSA to calculate optimal alignment for sub-sequences

Concatenate sub-alignments

Page 17: Vorlesung Grundlagen der Bioinformatik .

Divide-and-Conquer Alignment (DCA)

Page 19: Vorlesung Grundlagen der Bioinformatik .

Tools for multiple sequence alignment

Problems with the traditional approach:

Results depend on the gap penalty

A heuristic guide tree determines the alignment, and the alignment is then used for phylogeny reconstruction

The algorithm produces only global alignments.

Page 20: Vorlesung Grundlagen der Bioinformatik .

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atctaatagttaatactcgtccaagtat
atctgtattactaaacaactggtgctacta

Page 21: Vorlesung Grundlagen der Bioinformatik .

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-

Page 22: Vorlesung Grundlagen der Bioinformatik .

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-

local alignment (Smith and Waterman, 1981)

atctaatagttaatactcgtccaagtat
gcgtgtattactaaacggttcaatctaacat

Page 24: Vorlesung Grundlagen der Bioinformatik .

First step in sequence comparison: alignment

global alignment (Needleman and Wunsch, 1970; Clustal W)

atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-

local alignment (Smith and Waterman, 1981)

atc--taatagttaatactcgtccaagtat
     || || | ||
gcgtgtattact-aaacggttcaatctaacat

Page 25: Vorlesung Grundlagen der Bioinformatik .

New question: sequence families with multiple local similarities

Neither local nor global methods are applicable

Page 26: Vorlesung Grundlagen der Bioinformatik .

New question: sequence families with multiple local similarities

Alignment is possible if the order of the local similarities is conserved

Page 27: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103

Combination of global and local methods

Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")

Page 28: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 34: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 35: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 36: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 37: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 38: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Consistency!

Page 39: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa

Page 40: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Score of an alignment:

Define score of fragment f:

l(f) = length of f

s(f) = sum of matches (similarity values)

P(f) = probability of finding a fragment of length l(f) with at least s(f) matches in random sequences of the same length as the input sequences.

Score w(f) = -ln P(f)
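A simplified Python sketch of w(f) = -ln P(f) for a gap-free DNA fragment. It assumes that each position matches independently with probability p = 1/4 (uniform nucleotides) and counts s(f) as the number of identical positions, so P(f) becomes a binomial tail probability for a single fragment of length l(f). It deliberately omits the correction for the lengths of the input sequences in the definition above, so its values are not DIALIGN's actual weights; the function name is hypothetical.

```python
import math

def fragment_weight(length: int, matches: int, p: float = 0.25) -> float:
    """-ln of the probability of >= `matches` identities in `length` random positions.

    Simplification: binomial model with match probability p and no correction
    for the lengths of the input sequences.
    """
    if not 0 <= matches <= length:
        raise ValueError("need 0 <= matches <= length")
    tail = sum(
        math.comb(length, k) * p**k * (1 - p) ** (length - k)
        for k in range(matches, length + 1)
    )
    return -math.log(tail)

if __name__ == "__main__":
    print(round(fragment_weight(10, 10), 2))  # ten identical nucleotides
    print(round(fragment_weight(10, 5), 2))   # only half of the positions match
```

Rare fragments (many matches relative to their length) get large weights, which is why no separate gap penalty is needed: a long but poor fragment simply contributes little.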

Page 41: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Score of an alignment:

Define score of fragment f:

Define the score of an alignment as the sum of the scores of the fragments it contains

No gap penalty!

Page 42: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Score of an alignment:

Goal of the fragment-based alignment approach: find a consistent collection of fragments with maximum sum of weight scores

Page 43: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaaccccctcgtgcttagagatccaaac

cagtgcgtgtattactaacggttcaatcgcgcacatccgc

Pair-wise alignment:

Page 44: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaaccccctcgtgcttagagatccaaac

cagtgcgtgtattactaacggttcaatcgcgcacatccgc

Pair-wise alignment:

Recursive algorithm finds the optimal chain of fragments.

Page 45: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

------atctaatagttaaaccccctcgtgcttag-------agatccaaac

cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Pair-wise alignment:

Recursive algorithm finds the optimal chain of fragments.

Page 46: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

------atctaatagttaaaccccctcgtgcttag-------agatccaaac

cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Optimal pairwise alignment: chain of fragments with maximum sum of weights found by dynamic programming:

Standard fragment-chaining algorithm

Space-efficient algorithm
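The standard chaining recursion can be sketched in Python as follows. Fragments are assumed to be gap-free diagonals given by 0-based start positions in both sequences, a length and a precomputed weight (for example w(f) = -ln P(f)); one fragment may precede another only if it ends before the other begins in both sequences. This is the simple quadratic version, not the space-efficient algorithm, and the class and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Fragment(NamedTuple):
    start1: int    # 0-based start in sequence 1
    start2: int    # 0-based start in sequence 2
    length: int    # gap-free: same length in both sequences (>= 1)
    weight: float  # e.g. w(f) = -ln P(f)

def precedes(f: Fragment, g: Fragment) -> bool:
    """f can come before g in a chain: it ends before g starts in both sequences."""
    return f.start1 + f.length <= g.start1 and f.start2 + f.length <= g.start2

def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Chain of non-overlapping, collinear fragments with maximum total weight (O(n^2))."""
    if not fragments:
        return 0.0, []
    frags = sorted(fragments, key=lambda f: (f.start1, f.start2))
    score = [f.weight for f in frags]  # best chain weight ending in frags[k]
    back = [-1] * len(frags)           # predecessor index for the traceback
    for k, g in enumerate(frags):
        for m in range(k):
            if precedes(frags[m], g) and score[m] + g.weight > score[k]:
                score[k] = score[m] + g.weight
                back[k] = m
    k = max(range(len(frags)), key=score.__getitem__)
    total, chain = score[k], []
    while k != -1:
        chain.append(frags[k])
        k = back[k]
    return total, chain[::-1]

if __name__ == "__main__":
    frags = [Fragment(0, 0, 3, 5.0), Fragment(4, 5, 2, 3.0),
             Fragment(2, 1, 4, 6.0), Fragment(6, 8, 2, 1.0)]
    print(best_chain(frags))
```

Sorting by start position guarantees that every possible predecessor of a fragment has already been processed, so one pass suffices.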

Page 47: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 49: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

(1) Calculate all optimal pair-wise alignments

Page 51: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Fragments from optimal pair-wise alignments might be inconsistent

Page 52: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 55: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 56: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 57: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 58: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Fragments from optimal pair-wise alignments might be inconsistent

1. Sort fragments according to their scores

2. Include them one by one into the growing multiple alignment, as long as they are consistent

(greedy algorithm, comparable to the knapsack ("rucksack") problem)
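A deliberately naive Python sketch of the greedy step. Fragments are given as hypothetical tuples (seq_a, start_a, seq_b, start_b, length, weight) with 0-based positions. Aligned residues are merged into columns with a union-find structure, and a set of fragments is treated as consistent if no column contains two residues of the same sequence and the columns can still be ordered without contradiction (the column order graph is acyclic). Re-checking consistency from scratch for every fragment is far slower than DIALIGN's bound-based test and is used here only for clarity; fragments are sorted by their plain weights.

```python
from collections import defaultdict, deque

def is_consistent(seq_lengths, aligned_pairs):
    """True if the aligned residue pairs can be part of one multiple alignment.

    aligned_pairs: list of ((seq, pos), (seq, pos)) residues in the same column.
    """
    offsets, total = [], 0
    for n in seq_lengths:
        offsets.append(total)
        total += n
    parent = list(range(total))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (i, p), (j, q) in aligned_pairs:
        a, b = find(offsets[i] + p), find(offsets[j] + q)
        if a != b:
            parent[a] = b

    # A column must not contain two residues of the same sequence.
    seen = set()
    for i, n in enumerate(seq_lengths):
        for p in range(n):
            key = (find(offsets[i] + p), i)
            if key in seen:
                return False
            seen.add(key)

    # Residue order within each sequence induces an order on columns;
    # it must be acyclic (checked with Kahn's topological sort).
    succ, indeg = defaultdict(set), defaultdict(int)
    for i, n in enumerate(seq_lengths):
        for p in range(n - 1):
            u, v = find(offsets[i] + p), find(offsets[i] + p + 1)
            if v not in succ[u]:
                succ[u].add(v)
                indeg[v] += 1
    columns = {find(v) for v in range(total)}
    queue = deque(c for c in columns if indeg[c] == 0)
    ordered = 0
    while queue:
        u = queue.popleft()
        ordered += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return ordered == len(columns)

def greedy_alignment(seq_lengths, fragments):
    """Add fragments by decreasing weight, skipping inconsistent ones."""
    accepted, pairs = [], []
    for frag in sorted(fragments, key=lambda f: f[5], reverse=True):
        i, p, j, q, length, _weight = frag
        new_pairs = [((i, p + k), (j, q + k)) for k in range(length)]
        if is_consistent(seq_lengths, pairs + new_pairs):
            accepted.append(frag)
            pairs.extend(new_pairs)
    # Alignment score = sum of the weights of the included fragments.
    return accepted, sum(f[5] for f in accepted)

if __name__ == "__main__":
    # Two sequences of length 5; the second fragment crosses the first.
    frags = [(0, 0, 1, 2, 2, 5.0), (0, 3, 1, 0, 2, 3.0), (0, 3, 1, 4, 1, 1.0)]
    print(greedy_alignment([5, 5], frags))
```

In the usage example, the crossing second fragment is rejected, and the accepted fragments give a total score of 6.0.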

Page 59: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 63: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Consistency problem

Page 65: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 66: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 67: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagt taaactcccccgtgcttag

cagtgcgtgtattact aacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 68: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taata-----gttaaactcccccgtgcttag

cagtgcgtgtatta-----ctaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Page 69: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

site x = [i,p] (sequence i, position p)

Page 70: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Calculate lower bound bl(x,i) and upper bound bu(x,i) for each site x and sequence i

Page 71: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

bl(x,i) and bu(x,i) updated for each new fragment in alignment
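A naive Python sketch of what the bounds express; it is not the efficient algorithm of Abdeddaim and Morgenstern (2002). From the residue pairs aligned so far, it collects every column that must lie before or after site x and derives the range of positions in sequence i that x could still be aligned with, returned as (bl, bu) with the lower bound first. Positions are 0-based, the input pairs are assumed to be consistent, and the function name and data layout are illustrative assumptions. For a site already aligned with a residue of sequence i, both bounds coincide.

```python
from collections import defaultdict, deque

def consistency_bounds(seq_lengths, aligned_pairs, x, i):
    """Return (bl, bu): positions in sequence i that site x = (seq, pos) may align with.

    aligned_pairs: consistent list of ((seq, pos), (seq, pos)) residues in one column.
    Everything is recomputed from scratch (illustration only).
    """
    offsets, total = [], 0
    for n in seq_lengths:
        offsets.append(total)
        total += n
    parent = list(range(total))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (a, p), (b, q) in aligned_pairs:
        ra, rb = find(offsets[a] + p), find(offsets[b] + q)
        if ra != rb:
            parent[ra] = rb

    # Column order graph induced by residue order within each sequence.
    succ, pred = defaultdict(set), defaultdict(set)
    for s, n in enumerate(seq_lengths):
        for p in range(n - 1):
            u, v = find(offsets[s] + p), find(offsets[s] + p + 1)
            succ[u].add(v)
            pred[v].add(u)

    def reachable(start, graph):
        """All columns reachable from `start` along `graph` edges."""
        seen, queue = set(), deque(graph[start])
        while queue:
            u = queue.popleft()
            if u not in seen:
                seen.add(u)
                queue.extend(graph[u])
        return seen

    cx = find(offsets[x[0]] + x[1])
    before, after = reachable(cx, pred), reachable(cx, succ)
    positions = range(seq_lengths[i])
    bl = 1 + max((p for p in positions if find(offsets[i] + p) in before), default=-1)
    bu = min((p for p in positions if find(offsets[i] + p) in after),
             default=seq_lengths[i]) - 1
    return bl, bu

if __name__ == "__main__":
    # Three sequences of length 5; A1~B1 and B3~C3 are already aligned.
    pairs = [((0, 1), (1, 1)), ((1, 3), (2, 3))]
    print(consistency_bounds([5, 5, 5], pairs, (2, 4), 0))
```

In the usage example, the bounds for site C4 in sequence A are (2, 4): A0 and A1 are forced to lie before C4 through the chain C3~B3, B1~A1, although no fragment connects A and C directly.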

Page 72: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Consistency bounds have to be updated for each new fragment that is included in the growing alignment

Efficient algorithm

(Abdeddaim and Morgenstern, 2002)

Page 73: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

Advantages of segment-based approach:

Program can produce global and local alignments!

Sequence families that cannot be aligned with standard methods become alignable

Page 74: Vorlesung Grundlagen der Bioinformatik .

Program input

Program usage:

> dialign2-2 [options] <input_file>

<input_file> = multi-sequence file in FASTA-format

Page 75: Vorlesung Grundlagen der Bioinformatik .

Program output

DIALIGN 2.2.1
*************
Program code written by Burkhard Morgenstern and Said Abdeddaim
Published research assisted by DIALIGN 2 should cite:
Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218.

For more information, please visit the DIALIGN home page at

http://bibiserv.techfak.uni-bielefeld.de/dialign/

program call: ./dialign2-2 -nt -anc s

Aligned sequences:   length:
==================   =======
1) dog_il4             300
2) bla                 200
3) blu                 200

Average seq. length: 233.3

Please note that only upper-case letters are considered to be aligned.
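Because aligned residues are marked in upper case, aligned columns can be read off the output rows directly. A small Python sketch, assuming the rows of an alignment block have already been extracted as plain strings (sequence names, start positions and the digit lines stripped); the function name and the `min_seqs` threshold are illustrative assumptions, not part of DIALIGN.

```python
def aligned_columns(rows, min_seqs=2):
    """0-based columns in which at least `min_seqs` rows carry an upper-case residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows must have equal length")
    return [
        col for col, chars in enumerate(zip(*rows))
        if sum(c.isupper() for c in chars) >= min_seqs
    ]

if __name__ == "__main__":
    rows = [
        "atc------TAATAGTTAaactccccCGTGC-TTag------",
        "cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg",
        "caaa--GAGTATCAcc----------CCTGaaTTGAATaa--",
    ]
    print(aligned_columns(rows, min_seqs=3))
```

Gap characters and lower-case residues are never counted, so `min_seqs=3` reports only the columns aligned across all three sequences.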

Page 76: Vorlesung Grundlagen der Bioinformatik .

Program output

Alignment (DIALIGN format):
===========================

dog_il4     1   cagg------ ----GTTTGA atctgataca ttgc------ ----------
bla         1   ctga------ ---------- ---------- --------GC CAAGTGGGAA
blu         1   ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG
                0000000000 0000000000 0000000000 0000000011 1111111111

dog_il4    25   ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC
bla        17   ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc----
blu        51   ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC
                0000000000 0000000000 0000000000 0000000000 0000000000

dog_il4    63   GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT
bla        63   ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG
blu       101   CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat- ----------
                0000000000 0000000000 0009999999 9999999888 8888888888

dog_il4   113   TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG
bla        90   ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC
blu       140   ---------- ---------- ---------- ---GGGGTGG CCTTAGGCTC
                8888888888 8888888800 0000000000 0007777777 7777777777

Page 77: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 83: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 84: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 85: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 86: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaac----------ggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 87: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 88: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag------

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa--

Page 89: Vorlesung Grundlagen der Bioinformatik .

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 90: Vorlesung Grundlagen der Bioinformatik .

Alignment of large genomic sequences

Fragment-based alignment approach useful for alignment of genomic sequences.

Possible applications:

Detection of regulatory elements

Identification of pathogenic microorganisms

Gene prediction

Page 91: Vorlesung Grundlagen der Bioinformatik .

DIALIGN alignment of human and murine genomic sequences

Page 92: Vorlesung Grundlagen der Bioinformatik .

DIALIGN alignment of tomato and Arabidopsis thaliana genomic sequences

Page 93: Vorlesung Grundlagen der Bioinformatik .

Alignment of large genomic sequences

Gene-regulatory sites identified by multiple sequence alignment (phylogenetic footprinting)

Page 94: Vorlesung Grundlagen der Bioinformatik .

Alignment of large genomic sequences

Page 95: Vorlesung Grundlagen der Bioinformatik .

Performance of long-range alignment programs for exon discovery (human - mouse comparison)

Page 96: Vorlesung Grundlagen der Bioinformatik .

Performance of long-range alignment programs for exon discovery (A. thaliana - tomato comparison)