Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona...

  • date post

    21-Dec-2015

Page 1: Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center clin@winona.edu.

Sequence Alignments

Chi-Cheng Lin, Ph.D.
Associate Professor

Department of Computer Science
Winona State University – Rochester Center

clin@winona.edu


Intro to Bioinformatics – Sequence Alignment 2

Sequence Alignments

Cornerstone of bioinformatics

What is a sequence?
• Nucleotide sequence
• Amino acid sequence

Pairwise and multiple sequence alignments

What alignments can help
• Determine function of a newly discovered gene sequence
• Determine evolutionary relationships among genes, proteins, and species
• Predict structure and function of a protein

Acknowledgement: These notes are adapted from lecture notes of Wright State University’s Bioinformatics Program.


DNA Replication

Prior to cell division, all the genetic instructions must be “copied” so that each new cell will have a complete set.


Over time, genes accumulate mutations

Environmental factors
• Radiation
• Oxidation

Mistakes in replication or repair
• Deletions, Duplications
• Insertions, Inversions
• Translocations
• Point mutations


Deletions

Codon deletion:
ACG ATA GCG TAT GTA TAG CCG…
• Effect depends on the protein, position, etc.
• Almost always deleterious
• Sometimes lethal

Frame shift mutation:
ACG ATA GCG TAT GTA TAG CCG…
ACG ATA GCG ATG TAT AGC CG?…
• Almost always lethal
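The difference between a codon deletion and a frame shift can be sketched in a few lines of Python. This is only an illustration: the helper names `codons` and `delete` are hypothetical, and the choice of which codon to remove (the fourth, TAT) is an assumption, since the slide does not say which codon is deleted.

```python
def codons(seq):
    """Split a sequence into consecutive 3-base codons (the last one may be partial)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def delete(seq, start, length):
    """Return seq with `length` bases removed, starting at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


original = "ACGATAGCGTATGTATAGCCG"

# Assumption: remove the whole 4th codon (TAT); the downstream reading frame is kept.
codon_deleted = delete(original, 9, 3)

# Remove a single base (the T at index 9); every downstream codon changes.
frame_shifted = delete(original, 9, 1)

print(" ".join(codons(original)))
print(" ".join(codons(codon_deleted)))
print(" ".join(codons(frame_shifted)))
```

With the single-base deletion, the codons after GCG regroup as ATG TAT AGC CG, which is the shifted sequence shown on the slide.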


Indels

Comparing two genes, it is generally impossible to tell if an indel is an insertion in one gene or a deletion in another, unless ancestry is known:

ACGTCTGATACGCCGTATCGTCTATCT
ACGTCTGAT---CCGTATCGTCTATCT


The Genetic Code

Substitutions

Substitutions are mutations accepted by natural selection.

Synonymous: CGC → CGA (both encode Arg)

Non-synonymous: GAU → GAA (Asp → Glu)
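As a rough sketch, a substitution can be classified by translating the codon before and after the change and comparing the amino acids. The table below is deliberately partial (only the four codons from this slide), and `classify_substitution` is a hypothetical helper name, not part of any library.

```python
# Partial codon table: only the codons used in this illustration.
CODON_TABLE = {
    "CGC": "Arg",
    "CGA": "Arg",
    "GAU": "Asp",
    "GAA": "Glu",
}


def classify_substitution(before, after):
    """Label a codon change as synonymous if the encoded amino acid is unchanged."""
    aa_before = CODON_TABLE[before]
    aa_after = CODON_TABLE[after]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


print(classify_substitution("CGC", "CGA"))
print(classify_substitution("GAU", "GAA"))
```

A real analysis would need the full 64-codon table; a codon missing from this partial table raises a `KeyError`.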


Point Mutation Example: Sickle-cell Disease

Wild-type hemoglobin

DNA

3’----CTT----5’

mRNA

5’----GAA----3’

Normal hemoglobin

------[Glu]------

Mutant hemoglobin

DNA

3’----CAT----5’

mRNA

5’----GUA----3’

Mutant hemoglobin

------[Val]------
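The DNA → mRNA → amino acid steps above can be sketched as follows, assuming the DNA strand shown is the template strand written 3’ to 5’, so that the mRNA is its base-by-base complement read 5’ to 3’. The names `transcribe_template` and `CODON_TO_AA` are hypothetical, and the codon table holds only the two codons from this slide.

```python
# DNA template base -> complementary mRNA base.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the two codons from the sickle-cell example.
CODON_TO_AA = {"GAA": "Glu", "GUA": "Val"}


def transcribe_template(template_3_to_5):
    """Given template DNA written 3'->5', return the mRNA written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


for label, dna in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe_template(dna)
    print(label, dna, mrna, CODON_TO_AA[mrna])
```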


Image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.


Comparing Two Sequences

Point mutations, easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT

Indels are difficult, must align sequences:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
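Why the point-mutation case is "easy" can be seen in a short sketch: when the two sequences have the same length, a position-by-position comparison is enough. The function name `mismatch_positions` is hypothetical; positions are reported 1-based.

```python
def mismatch_positions(a, b):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length; indels need an alignment")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq1 = "ACGTCTGATACGCCGTATAGTCTATCT"
seq2 = "ACGTCTGATTCGCCCTATCGTCTATCT"

print(mismatch_positions(seq1, seq2))
```

The same function refuses the indel pair from this slide, because the sequences differ in length and must be aligned first.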


Why Align Sequences?

The draft human genome is available

Automated gene finding is possible

Gene: AGTACGTATCGTATAGCGTAA
• What does it do?

One approach: Is there a similar gene in another species?
• Align sequences with known genes
• Find the gene with the “best” match


Scoring a Sequence Alignment

Match score: +1
Mismatch score: +0
Gap penalty: –1

ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT

Matches: 18 × (+1)
Mismatches: 2 × 0
Gaps: 7 × (–1)

Score = +11
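The scoring scheme above can be sketched as a small function that walks the two aligned rows column by column. The name `score_alignment` and its keyword parameters are hypothetical; the default values are the match, mismatch, and gap scores from this slide.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two already-aligned rows of equal length; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap          # a column with a gap in either row
        elif x == y:
            total += match        # identical bases
        else:
            total += mismatch     # different bases
    return total


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # 18 matches, 2 mismatches, 7 gaps -> 11
```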


How can we find an optimal alignment?

Finding the alignment is computationally hard:
ACGTCTGATACGCCGTATAGTCTATCT
CTGAT---TCG-CATCGTC--T-ATCT

There are ~888,000 possibilities to align the two sequences given above.

Algorithms using a technique called “dynamic programming” are used – out of the scope of this workshop.
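For readers who want a glimpse beyond the workshop's scope, here is a minimal sketch. It first counts the ways to place 7 gaps among 27 columns, which appears to be where the ~888,000 figure comes from (that reading is an assumption). It then computes the best global alignment score with a standard dynamic-programming recurrence in the style of Needleman-Wunsch, using the +1 / 0 / –1 scheme from the scoring slide. The function name `global_alignment_score` is hypothetical, and only the score is computed, not the alignment itself.

```python
from math import comb

# Ways to choose which 7 of the 27 columns hold a gap in the shorter sequence.
print(comb(27, 7))  # 888030


def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score of a and b, filled in row by row."""
    # prev[j] = best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against nothing: i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


long_seq = "ACGTCTGATACGCCGTATAGTCTATCT"
short_seq = "CTGATTCGCATCGTCTATCT"
print(global_alignment_score(long_seq, short_seq))
```

The table has 28 × 21 cells for these two sequences, so the optimum is found with a few hundred comparisons instead of enumerating every gap placement.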


Global and Local alignments

Global alignments – score the entire alignment

Local alignment – find the best matching subsequence

Why local sequence alignment?
• Subsequence comparison between a DNA sequence and a genome
• Protein function domains
• Exons matching


Example

Compare the two sequences:
TTGACACCCTCCCAATT
ACCCCAGGCTTTACACAG

Global alignment (does it look good?)
TTGACACCCTCC-CAATT
    ||  ||   ||
ACCCCAGGCTTTACACAG

Local alignment (does it look good?)
---------TTGACACCCTCCCAATT
         || ||||
ACCCCAGGCTTTACACAG--------
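One common way to score a local alignment is a Smith-Waterman style recurrence, sketched below. This is not taken from the slides: the function name `local_alignment_score` is hypothetical, and the mismatch score of –1 is an assumption made here so that poorly matching regions are penalized (the slides' mismatch score of 0 would not do that). Resetting a cell to 0 lets the alignment start anywhere in either sequence.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score: the highest-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 means "start a new local alignment here" instead of extending one.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


seq_a = "TTGACACCCTCCCAATT"
seq_b = "ACCCCAGGCTTTACACAG"
print(local_alignment_score(seq_a, seq_b))
```

The shared run ACAC visible in the local alignment above already contributes four matches, so the reported score is at least 4 under these assumed parameters.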


Dot Plots

One of the simplest and oldest methods for sequence alignment

Visualization of regions of similarity
• Assign one sequence on the horizontal axis
• Assign the other on the vertical axis
• Place dots on the space of matches
• Diagonal lines mean adjacent regions of identity


A Simple Example

Construct a simple dot plot for
TAGTCGATG
TGGTCATC

The alignment is
TAGTCGATG
TGGTC-ATC

  T A G T C G A T G
T *     *       *
G     *     *     *
G     *     *     *
T *     *       *
C         *
A   *         *
T *     *       *
C         *
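The dot plot above can be reproduced with a short sketch: one sequence runs along the horizontal axis, the other along the vertical axis, and a cell is marked wherever the two bases are identical. The function names `dot_plot` and `render` are hypothetical.

```python
def dot_plot(horizontal, vertical):
    """Return a grid of booleans: grid[r][c] is True where vertical[r] == horizontal[c]."""
    return [[h == v for h in horizontal] for v in vertical]


def render(horizontal, vertical):
    """Draw the dot plot as text, with '*' marking each matching cell."""
    lines = ["  " + " ".join(horizontal)]
    for v, row in zip(vertical, dot_plot(horizontal, vertical)):
        line = v + " " + " ".join("*" if hit else " " for hit in row)
        lines.append(line.rstrip())
    return "\n".join(lines)


print(render("TAGTCGATG", "TGGTCATC"))
```

Runs of `*` along a diagonal correspond to stretches where the two sequences are identical without a gap.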


What else can it do (and how)?

Gaps
Inverse substring
Repeat
Palindrome
Gene conservation and order study