Pairwise and Multiple Sequence Alignment Lesson 2.


MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFE…

ATGGTGAACCTGACCTCTGACGAGAAGACTGCCGTCCTTGCCCTGTGGAACAAGGTGGACGTGGAAGACTGTGGTGGTGAGGCCCTGGGCAGGTTTGTATGGAGGTTACAAGGCTGCTTAAGGAGGGAGGATGGAAGCTGGGCATGTGGAGACAGACCACCTCCTGGATTTATGACAGGAACTGATTGCTGTCTCCTGTGCTGCTTTCACCCCTCAGGCTGCTGGTCGTGTATCCCTGGACCCAGAGGTTCTTTGAAAGCTTTGGGGACTTGTCCACTCCTGCTGCTGTGTTCGCAAATGCTAAGGTAAAAGCCCATGGCAAGAAGGTGCTAACTTCCTTTGGTGAAGGTATGAATCACCTGGACAACCTCAAGGGCACCTTTGCTAAACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAATTTCAAGGTGAGTCAATATTCTTCTTCTTCCTTCTTTCTATGGTCAAGCTCATGTCATGGGAAAAGGACATAAGAGTCAGTTTCCAGTTCTCAATAGAAAAAAAAATTCTGTTTGCATCACTGTGGACTCCTTGGGACCATTCATTTCTTTCACCTGCTTTGCTTATAGTTATTGTTTCCTCTTTTTCCTTTTTCTCTTCTTCTTCATAAGTTTTTCTCTCTGTATTTTTTTAACACAATCTTTTAATTTTGTGCCTTTAAATTATTTTTAAGCTTTCTTCTTTTAATTACTACTCGTTTCCTTTCATTTCTATACTTTCTATCTAATCTTCTCCTTTCAAGAGAAGGAGTGGTTCACTACTACTTTGCTTGGGTGTAAAGAATAACAGCAATAGCTTAAATTCTGGCATAATGTGAATAGGGAGGACAATTTCTCATATAAGTTGAGGCTGATATTGGAGGATTTGCATTAGTAGTAGAGGTTACATCCAGTTACCGTCTTGCTCATAATTTGTGGGCACAACACAGGGCATATCTTGGAACAAGGCTAGAATATTCTGAATGCAAACTGGGGACCTGTGTTAACTATGTTCATGCCTGTTGTCTCTTCCTCTTCAGCTCCTGGGCAATATGCTGGTGGTTGTGCTGGCTCGCCACTTTGGCAAGGAATTCGACTGGCACATGCACGCTTGTTTTCAGAAGGTGGTGGCTGGTGTGGCTAATGCCCTGGCTCACAAGTACCATTGA

MVNLTSDEKTAVLALWNKVDVEDCGGEALGRLLVVYPWTQRFFE…

Motivation

What is sequence alignment?

Alignment: comparing two (pairwise) or more (multiple) sequences by searching for a series of identical or similar characters in the sequences.

MVNLTSDEKTAVLALWNKVDVEDCGGE
|| ||  ||||| ||| || |   |||
MVHLTPEEKTAVNALWGKVNVDAVGGE

Why perform a pairwise sequence alignment?

Finding homology between two sequences

e.g., predicting characteristics of a protein, premised on:

similar sequence (or structure) → similar function

Local vs. Global

Local alignment: finds regions of high similarity in parts of the sequences

Global alignment: finds the best alignment across the entire two sequences

ADLGAVFALCDRYFQ
||||     |||| |
ADLGRTQN-CDRYYQ

ADLGAVFALCDRYFQ
||||     |||| |
ADLGRTQN CDRYYQ

Evolutionary changes in sequences

Three types of nucleotide changes:

1. Substitution: a replacement of one (or more) sequence characters by another
2. Insertion: an insertion of one (or more) sequence characters
3. Deletion: a deletion of one (or more) sequence characters

Insertion + Deletion → Indel

[Figure: example sequences for each type of change, e.g. AAGA → AACA for a substitution]

Choosing an alignment:

Many different alignments between two sequences are possible:

AAGCTGAATTCGAA
AGGCTCATTTCTGA

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

How do we determine which is the best alignment?

Toy exercise

Compute the score of each of the following alignments using this naïve scoring scheme:

Match: +1   Mismatch: -2   Indel: -1

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
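The toy exercise can be checked with a few lines of Python. This is only a minimal sketch: the function name `score_alignment` and its parameter names are made up for illustration, and the two alignments are the ones from the exercise, split into their top and bottom rows.

```python
def score_alignment(top, bottom, match=1, mismatch=-2, indel=-1):
    """Score a given alignment column by column with the naive scheme."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":   # a gap in either row is an indel
            total += indel
        elif a == b:               # identical characters
            total += match
        else:                      # different characters
            total += mismatch
    return total


first = score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-")
second = score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-")
print(first, second)
```

Under this scheme the first alignment (10 matches, 2 mismatches, 4 indels) scores higher than the second (9 matches, 2 mismatches, 6 indels).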

Scoring scheme:

Substitution matrix

      A    C    G    T
A     1   -2   -2   -2
C    -2    1   -2   -2
G    -2   -2    1   -2
T    -2   -2   -2    1

Gap penalty (opening = extending)

Substitution matrices: accounting for biological context

Which best reflects the biological reality regarding nucleotide mismatch penalty?

1. Tr > Tv > 0

2. Tv > Tr > 0

3. 0 > Tr > Tv

4. 0 > Tv > Tr

Tr = Transition (purine ↔ purine, A ↔ G, or pyrimidine ↔ pyrimidine, C ↔ T)

Tv = Transversion (purine ↔ pyrimidine)
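A small sketch of how a nucleotide scoring function could tell transitions from transversions. The penalty values used here (-1 and -2) are hypothetical placeholders chosen only for illustration, and the names `substitution_type` and `nucleotide_score` are not from the lesson.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x, y):
    """Classify a pair of nucleotides as match, transition or transversion."""
    if x == y:
        return "match"
    same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def nucleotide_score(x, y, match=1, transition=-1, transversion=-2):
    """Look up the score for a pair; the default values are illustrative only."""
    kind = substitution_type(x, y)
    return {"match": match, "transition": transition,
            "transversion": transversion}[kind]


for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "G")]:
    print(pair, substitution_type(*pair), nucleotide_score(*pair))
```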

Scoring schemes: accounting for biological context

Which best reflects the biological reality regarding these mismatch penalties?

1. Arg->Lys > Ala->Phe

2. Arg->Lys > Thr->Asp

3. Asp->Val > Asp->Glu

PAM matrices

Family of matrices: PAM 80, PAM 120, PAM 250, …

The number with a PAM matrix (the n in PAMn) represents the evolutionary distance between the sequences on which the matrix is based

The (i-th, j-th) cell in a PAMn matrix denotes the probability that amino acid i will be replaced by amino acid j in time n: $P_{i \to j,\,n}$

Greater n numbers denote greater distances

PAM - limitations

Based on only one original dataset

Examines proteins with few differences (at least 85% identity)

Based mainly on small globular proteins, so the matrix is biased

BLOSUM matrices

Different BLOSUMn matrices are calculated independently from BLOCKS (ungapped, manually created local alignments)

BLOSUMn is derived from BLOCKS in which sequences that share at least n percent identity are clustered together and weighted as a single sequence

The (i-th, j-th) cell in a BLOSUM matrix denotes the log-odds ratio of the observed frequency to the expected frequency of amino acids i and j appearing in the same position in the data: $\log\left(\frac{P_{ij}}{q_i \, q_j}\right)$

Higher n numbers denote higher identity between the sequences on which the matrix is based
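The log-odds formula can be illustrated numerically. This is a sketch under stated assumptions: the frequencies below are invented toy numbers, the function name `log_odds_score` is hypothetical, and the scaling (base-2 logarithm multiplied by 2, then rounded, i.e. half-bit units) is the convention commonly described for BLOSUM62 rather than something stated in this lesson.

```python
import math


def log_odds_score(p_ij, q_i, q_j, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected).

    p_ij : observed frequency of the pair (i, j) in aligned columns
    q_i, q_j : background frequencies of residues i and j
    scale : 2.0 gives half-bit units (an assumption, see the lead-in)
    """
    return round(scale * math.log2(p_ij / (q_i * q_j)))


# Toy numbers, not real BLOCKS data:
print(log_odds_score(0.02, 0.1, 0.1))    # pair seen twice as often as expected
print(log_odds_score(0.005, 0.1, 0.1))   # pair seen half as often as expected
```

A pair observed more often than chance predicts gets a positive score, and a pair observed less often gets a negative one.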

PAM vs. BLOSUM

Approximate equivalences, ordered from closer to more distant sequences:

PAM100 ≈ BLOSUM90
PAM120 ≈ BLOSUM80
PAM160 ≈ BLOSUM60
PAM200 ≈ BLOSUM52
PAM250 ≈ BLOSUM45

BLOSUM62 for general use
BLOSUM80 for close relations
BLOSUM45 for distant relations

PAM120 for general use
PAM60 for close relations
PAM250 for distant relations

Substitution matrices exercise

Pick the best substitution matrix (PAM and BLOSUM) for each pairwise alignment:

Human – chimp
Human – yeast
Human – fish

PAM options: PAM60 PAM120 PAM250

BLOSUM options: BLOSUM45 BLOSUM62 BLOSUM80

Substitution matrices

Nucleic acids: transition-transversion

Amino acids:

Evolutionary (empirical data) based: PAM, BLOSUM

Physico-chemical properties based: Grantham, McLachlan

Gap penalty

AAGCGAAATTCGAAC
A-G-GAA-CTCGAAC

AAGCGAAATTCGAAC
AGG---AACTCGAAC

• Which alignment has a higher score?

• Which alignment is more likely?
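One way to explore these two questions is to score both alignments under a linear gap penalty (opening = extending) and under an affine one (opening a gap costs more than extending it). The sketch below assumes match = 1 and mismatch = -1; the affine values (open = -4, extend = -1) are hypothetical and chosen only for illustration, as is the function name.

```python
def score_with_gaps(top, bottom, match=1, mismatch=-1,
                    gap_open=-2, gap_extend=-2):
    """Score an alignment; a run of gaps in one row pays gap_open once,
    then gap_extend for every further gap column in the same run."""
    total = 0
    prev_gap_row = None            # which row had a gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            total += gap_extend if prev_gap_row == row else gap_open
            prev_gap_row = row
        else:
            total += match if a == b else mismatch
            prev_gap_row = None
    return total


scattered = ("AAGCGAAATTCGAAC", "A-G-GAA-CTCGAAC")   # three separate gaps
contiguous = ("AAGCGAAATTCGAAC", "AGG---AACTCGAAC")  # one gap of length 3

for name, (top, bottom) in [("scattered", scattered), ("contiguous", contiguous)]:
    linear = score_with_gaps(top, bottom)                              # open = extend = -2
    affine = score_with_gaps(top, bottom, gap_open=-4, gap_extend=-1)  # hypothetical values
    print(name, linear, affine)
```

With the linear penalty the alignment with three scattered gaps scores higher (4 vs. 2); with the affine penalty the single three-residue gap, which corresponds to one indel event, scores higher (2 vs. -2).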

Pairwise alignment algorithm matrix representation: formulation

2 sequences, $S_1$ and $S_2$, and a scoring scheme: match = 1, mismatch = -1, gap = -2

$V[i,j]$ = value of the optimal alignment between $S_1[1 \ldots i]$ and $S_2[1 \ldots j]$

$$
V[i+1,j+1] = \max
\begin{cases}
V[i,j] + S(S_1[i+1],\, S_2[j+1]) \\
V[i+1,j] + S(\text{gap}) \\
V[i,j+1] + S(\text{gap})
\end{cases}
$$

V[i,j]      V[i+1,j]
V[i,j+1]    V[i+1,j+1]

Pairwise alignment algorithm matrix representation: initialization

Scoring scheme: match = 1, mismatch = -1, indel (gap) = -2

     0    0   -2   -4   -6
A    1   -2
A    2   -4
A    3   -6
C    4   -8

Pairwise alignment algorithm matrix representation: filling the matrix

Scoring scheme: match = 1, mismatch = -1, indel (gap) = -2

     0    0   -2   -4   -6
A    1   -2    1   -1   -3
A    2   -4   -1    0   -2
A    3   -6   -3   -2   -1
C    4   -8   -5   -4   -1
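The initialization and filling steps can be sketched in Python as follows. The row sequence AAAC is taken from the matrix above; the column sequence is not legible in the transcript, so `"AGC"` is an assumption (the matrix values only require the pattern A, a residue that is neither A nor C, then C). The function name `nw_fill` is likewise made up for illustration.

```python
def nw_fill(s1, s2, match=1, mismatch=-1, gap=-2):
    """Fill the global-alignment matrix V, with rows for s1 and columns for s2."""
    rows, cols = len(s1) + 1, len(s2) + 1
    V = [[0] * cols for _ in range(rows)]
    # Initialization: aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        V[i][0] = i * gap
    for j in range(1, cols):
        V[0][j] = j * gap
    # Filling: each cell takes the best of diagonal, upper and left neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,   # align s1[i] with s2[j]
                          V[i - 1][j] + gap,       # s1[i] against a gap
                          V[i][j - 1] + gap)       # s2[j] against a gap
    return V


# "AGC" is an assumed column sequence (see the lead-in).
for row in nw_fill("AAAC", "AGC"):
    print(row)
```

With these inputs the printed rows reproduce the filled matrix shown above, ending with an optimal score of -1 in the bottom-right cell.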

Pairwise alignment algorithm matrix representation: trace back

     0    0   -2   -4   -6
A    1   -2    1   -1   -3
A    2   -4   -1    0   -2
A    3   -6   -3   -2   -1
C    4   -8   -5   -4   -1
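A self-contained sketch of the trace-back step: starting from the bottom-right cell, follow a neighbour that could have produced the current value until the top-left cell is reached. As before, the column sequence `"AGC"` is an assumption consistent with the matrix values, and the tie-breaking order (diagonal, then up, then left) is an arbitrary choice; other orders can return different alignments with the same score.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s1, aligned_s2) for one optimal global alignment."""
    rows, cols = len(s1) + 1, len(s2) + 1
    V = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        V[i][0] = i * gap
    for j in range(1, cols):
        V[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub, V[i - 1][j] + gap, V[i][j - 1] + gap)

    # Trace back from the bottom-right corner to the top-left corner.
    i, j = len(s1), len(s2)
    a1, a2 = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if V[i][j] == V[i - 1][j - 1] + sub:      # diagonal move
                a1.append(s1[i - 1]); a2.append(s2[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and V[i][j] == V[i - 1][j] + gap:    # move up: gap in s2
            a1.append(s1[i - 1]); a2.append("-")
            i -= 1
        else:                                         # move left: gap in s1
            a1.append("-"); a2.append(s2[j - 1])
            j -= 1
    return V[len(s1)][len(s2)], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("AAAC", "AGC")  # "AGC" is assumed
print(score)
print(top)
print(bottom)
```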


Assessing the significance of an alignment score

Original sequences:

AAGCTGAATTCGAA
AGGCTCATTTCTGA

AAGCTGAATTC-GAA
AGGCTCATTTCTGA-

Random (shuffled) sequences:

AGATCAGTAGACTA
GAGTAGCTATCTCT

CGATAGATAGCATA
GCATGTCATGATTC

[Figure: alignments of the random sequence pairs, for comparison with the alignment of the original pair]
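The comparison against random sequences can be sketched as a simple shuffle test: align the real pair, align many shuffled versions of the same sequences, and ask how often a shuffled pair scores at least as well. This is an illustrative sketch only; the scoring values are the match = 1, mismatch = -1, gap = -2 scheme used earlier, and the function names, the number of shuffles and the seed are arbitrary assumptions.

```python
import random


def nw_score(s1, s2, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score, keeping only two matrix rows in memory."""
    prev = [j * gap for j in range(len(s2) + 1)]
    for i, a in enumerate(s1, 1):
        cur = [i * gap]
        for j, b in enumerate(s2, 1):
            sub = match if a == b else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_test(s1, s2, n_shuffles=200, seed=0):
    """Return (real score, shuffled scores, empirical p-value)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    real = nw_score(s1, s2)
    null_scores = []
    for _ in range(n_shuffles):
        a, b = list(s1), list(s2)
        rng.shuffle(a)                 # shuffling keeps length and composition
        rng.shuffle(b)
        null_scores.append(nw_score("".join(a), "".join(b)))
    # Add-one correction so the estimate is never exactly zero.
    p = (sum(s >= real for s in null_scores) + 1) / (n_shuffles + 1)
    return real, null_scores, p


real, null_scores, p = shuffle_test("AAGCTGAATTCGAA", "AGGCTCATTTCTGA")
print(real, sum(null_scores) / len(null_scores), p)
```

A small p-value suggests that the real score is unlikely to arise from sequences of the same composition by chance.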

Web servers for pairwise alignment

BLAST 2 sequences (bl2seq) at NCBI

Produces the local alignment of two given sequences using the BLAST (Basic Local Alignment Search Tool) engine for local alignment

Does not use an exact algorithm but a heuristic

Back to NCBI

BLAST – bl2seq

bl2seq - query

blastn – nucleotide
blastp – protein

bl2seq results

Result legend: Match, Dissimilarity, Similarity, Gaps, Low complexity

Query type: AA or DNA?

For coding sequences, AA (protein) data are better:

Selection operates most strongly at the protein level → the homology is more evident

AA: 20-character alphabet; DNA: 4-character alphabet → lower chance of random matches for AA

BLAST – programs

Query: DNA, Protein

Database: DNA, Protein

BLAST – blastp

blastp - results

blastp – results (cont’)

BLAST scores:

Bit score: a score for the alignment according to the number of similarities, identities, etc. It is normalized to a standard set of units, so it is independent of the scoring scheme and bit scores from different searches can be compared

Expect value (E-value): the number of alignments with the same or a higher score that one can "expect" to see by chance when searching a random database of a particular size with a random sequence of a particular length. The closer the E-value is to zero, the greater the confidence that the hit is really a homolog
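The link between the two scores can be sketched with the standard Karlin-Altschul relations used in BLAST statistics: a raw score S is converted to a bit score S' = (λS − ln K) / ln 2, and the E-value is approximately m · n · 2^(−S'), where m is the query length and n the database length. The function and parameter names below are hypothetical, and real BLAST additionally uses effective (edge-corrected) lengths, which this sketch ignores.

```python
import math


def bit_score(raw_score, lam, K):
    """Normalize a raw alignment score using the statistical parameters
    lambda and K of the scoring system (reported in BLAST output)."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative numbers only: a 100-residue query against 1,000,000 residues.
for bits in (20, 40, 60):
    print(bits, e_value(bits, 100, 1_000_000))
```

Each additional bit halves the E-value, while doubling the database size doubles it.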

Multiple Sequence Alignment (MSA)

Multiple sequence alignment

Similar to pairwise alignment, BUT n sequences are aligned instead of just 2

Seq1 VTISCTGSSSNIGAG-NHVKWYQQLPG
Seq2 VTISCTGTSSNIGS--ITVNWYQQLPG
Seq3 LRLSCSSSGFIFSS--YAMYWVRQAPG
Seq4 LSLTCTVSGTSFDD--YYSTWVRQPPG
Seq5 PEVTCVVVDVSHEDPQVKFNWYVDG--
Seq6 ATLVCLISDFYPGA--VTVAWKADS--
Seq7 AALGCLVKDYFPEP--VTVSWNSG---
Seq8 VSLTCLVKGFYPSD--IAVEWWSNG--

Each row represents an individual sequence

Each column represents the ‘same’ position

Why perform an MSA?

MSAs are at the heart of comparative genomics studies, which seek to study evolutionary histories and the functional and structural aspects of sequences, and to understand phenotypic differences between species

Multiple sequence alignment

Seq1 VTISCTGSSSNIGAG-NHVKWYQQLPG
Seq2 VTISCTGTSSNIGS--ITVNWYQQLPG
Seq3 LRLSCSSSGFIFSS--YAMYWVRQAPG
Seq4 LSLTCTVSGTSFDD--YYSTWVRQPPG
Seq5 PEVTCVVVDVSHEDPQVKFNWYVDG--
Seq6 ATLVCLISDFYPGA--VTVAWKADS--
Seq7 AALGCLVKDYFPEP--VTVSWNSG---
Seq8 VSLTCLVKGFYPSD--IAVEWWSNG--

[Figure labels: variable columns, conserved columns]
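Variable and conserved columns can be told apart with a simple per-column score. The sketch below uses one possible definition, the fraction of sequences that carry the most common residue in a column, with gaps never counted as a residue; this definition and the function name `column_conservation` are assumptions for illustration, and alignment viewers typically use more refined measures.

```python
from collections import Counter

# The eight aligned sequences shown above.
msa = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]


def column_conservation(rows):
    """Fraction of sequences sharing the most common non-gap residue, per column."""
    n = len(rows)
    scores = []
    for column in zip(*rows):                  # iterate over alignment columns
        counts = Counter(c for c in column if c != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n)
    return scores


scores = column_conservation(msa)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)   # 0-based column indices
```

In this alignment only two columns are identical in all eight sequences: the cysteine (C) column and the tryptophan (W) column.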

Alignment methods

No computationally feasible exact solution is available for MSA, so all practical methods are heuristics:

Progressive/hierarchical alignment (ClustalX)

Iterative alignment (MAFFT, MUSCLE)

Progressive alignment

First step: compute pairwise distances

Compute the pairwise alignments for all against all (10 pairwise alignments for 5 sequences). The similarities are converted to distances and stored in a table

     A    B    C    D    E
C   15   17
D   16   14   10
E   32   31   31   32

Second step: build a guide tree

Cluster the sequences to create a tree (guide tree):

• represents the order in which pairs of sequences are to be aligned
• similar sequences are neighbors in the tree
• distant sequences are distant from each other in the tree

     A    B    C    D    E
C   15   17
D   16   14   10
E   32   31   31   32

The guide tree is imprecise and is NOT the tree which truly describes the evolutionary relationship between the sequences!
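A minimal sketch of how a distance table can be turned into a guide tree, using average-linkage clustering (UPGMA) as one simple choice; actual alignment programs may use other clustering methods. The distances involving C, D and E are those in the table above. The A-B distance is not legible in the transcript, so the value 8 used here is purely hypothetical, as are the function and variable names.

```python
def upgma(labels, dist):
    """Build a guide tree (nested tuples) by repeatedly merging the two
    closest clusters; cluster distance = mean distance over all leaf pairs."""
    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    def avg(c1, c2):
        return sum(d(a, b) for a in c1[1] for b in c2[1]) / (len(c1[1]) * len(c2[1]))

    # Each cluster is (subtree, list of leaves).
    clusters = [(name, [name]) for name in labels]
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


distances = {
    ("A", "B"): 8,   # HYPOTHETICAL: this value is missing from the table
    ("A", "C"): 15, ("B", "C"): 17,
    ("A", "D"): 16, ("B", "D"): 14, ("C", "D"): 10,
    ("A", "E"): 32, ("B", "E"): 31, ("C", "E"): 31, ("D", "E"): 32,
}

tree = upgma(["A", "B", "C", "D", "E"], distances)
print(tree)
```

Under these assumptions A pairs with B, C pairs with D, the two pairs are then joined, and the distant sequence E is added last, which is the order a progressive aligner would follow.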

Third step: align sequences in a bottom-up order

1. Align the most similar (neighboring) pairs

2. Align pairs of pairs

3. Align sequences clustered to pairs of pairs deeper in the tree

[Figure: guide tree with leaves Sequence A, Sequence B, Sequence C, Sequence D, Sequence E]

Main disadvantages of progressive alignments

[Figure: guide tree with leaves Sequence A, Sequence B, Sequence C, Sequence D, Sequence E]

Guide-tree topology may be considerably wrong

Globally aligning pairs of sequences may create errors that will propagate through to the final result

Iterative alignment

Guide tree

Pairwise distance table

Iterate until the MSA does not change (convergence)

blastp – acquiring sequences

MSA input: multiple sequence FASTA file

>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVANALAHKYH

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

>gi|4885393|ref|NP_005321.1| epsilon globin [Homo sapiens]
MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAIALAHKYH

>gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTGVASALSSRYH

>gi|28302131|ref|NP_000550.2| A-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLTSLGDATKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTAVASALSSRYH

>gi|4885397|ref|NP_005323.1| hemoglobin, zeta [Homo sapiens]
MSLTKTERTIIVSMWAKISTQADTIGTETLERLFLSHPQTKTYFPHFDLHPGSAQLRAHGSKVVAAVGDAVKSIDDIGGALSKLSELHAYILRVDPVNFKLLSHCLLVTLAARFPADFTAEAHAAWDKFLSVVSSVLTEKYR

MSA using ClustalX

Step 1: Load the sequences

A little unclear…

Edit FASTA headers…

The sequences stay unchanged; only the header lines are shortened:

>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]  →  > delta globin

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]  →  > beta globin

>gi|4885393|ref|NP_005321.1| epsilon globin [Homo sapiens]  →  > epsilon globin

>gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]  →  > G-gamma globin

>gi|28302131|ref|NP_000550.2| A-gamma globin [Homo sapiens]  →  > A-gamma globin

>gi|4885397|ref|NP_005323.1| hemoglobin, zeta [Homo sapiens]  →  > hemoglobin zeta
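For more than a handful of sequences, the header editing can be scripted. The sketch below assumes headers of the NCBI form shown above (fields separated by `|`, a description, then the species in square brackets); the function names `short_name` and `rename_fasta` are hypothetical, and the code writes the short name directly after `>` with no space.

```python
def short_name(header):
    """'gi|...|ref|...| delta globin [Homo sapiens]' -> 'delta globin'."""
    description = header.rsplit("|", 1)[-1]        # text after the last '|'
    description = description.split("[", 1)[0]     # drop the '[species]' suffix
    return description.replace(",", "").strip()


def rename_fasta(text):
    """Shorten every header line of a FASTA string; sequence lines are kept as-is."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(">" + short_name(line[1:]))
        else:
            out.append(line)
    return "\n".join(out)


example = (
    ">gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]\n"
    "MVHLTPEEKTAVNALWGKVNVDAVGGE\n"
    ">gi|4885397|ref|NP_005323.1| hemoglobin, zeta [Homo sapiens]\n"
    "MSLTKTERTIIVSMWAKISTQADTIGT\n"
)
print(rename_fasta(example))
```

Some alignment programs read a sequence name only up to the first whitespace, so replacing the remaining spaces with underscores (for example `delta_globin`) may be a safer variant.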

Step 2: Perform alignment

MSA and conservation view

Messing-up alignment of HIV-1 env

MSA tools

Progressive: CLUSTALX/CLUSTALW

Iterative: MUSCLE, MAFFT, PRANK
