Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.

Page 1:

Biocomputation: Comparative Genomics

Tanya Talkar, Lolly Kruse, Colleen O’Rourke

Page 2:

DNA

Page 3:
Page 4:

Junk DNA

Conserved DNA

Page 5:

What is Biocomputation?

Statistics

Computer Science

Molecular Biology

Page 6:

Four Main Parts
• Biomolecular Computation
• Biological Computation
• Computational Biology
• Bioinformatics

Page 7:

Bioinformatics:

Biology

Computer Science

Information Technology

Page 8:

Sequence Analysis
• Very Functional!
• Compare DNA between Species
• Small Fragments
• Return full sequence

Page 9:

Computational Genomics
• Needleman-Wunsch
• Not used much
• More Mapped Genomes = Computational Genomics!

Page 10:

Alignment

Page 11:

Global Alignment: Needleman-Wunsch
• O(N^3) as originally published (the standard dynamic-programming form is O(N^2))
• Fewest edit operations
• Similar strings
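A minimal sketch of global alignment, to make the idea concrete. It uses the common quadratic dynamic-programming formulation rather than the cubic original, and the scores (match +2, mismatch -1, gap -1) are assumptions chosen to agree with the worked example later in the deck. The function name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("_"); i -= 1
        else:
            out_a.append("_"); out_b.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("AXABCS", "AXBACS"))
```

Because the whole of both strings must be aligned, every leading or trailing unmatched symbol costs a gap, which is why this approach suits strings that are already similar end to end.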

Page 12:

Local Alignment: Smith-Waterman
• O(N^2)
• Dissimilar strings
• Find high similarity regions
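A minimal sketch of local alignment, under the same assumed scores (match +2, mismatch -1, gap -1). The only changes from a global alignment are that cell scores are floored at zero and the traceback starts at the highest-scoring cell, so the result is a high-similarity region rather than an end-to-end alignment. The function name `smith_waterman` is hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,                            # start a fresh alignment here
                h[i - 1][j - 1] + sub(i, j),
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("_"); i -= 1
        else:
            out_a.append("_"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("PQRAXABCSTVQ", "XYAXBACSLT"))
```

Unrelated flanking symbols cost nothing here, since the alignment is free to begin and end wherever the score is best.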

Page 13:

Comparison

Page 14:

S1 P Q R A X A B C S T V Q

S2 X Y A X B A C S L T

A X A B C S

A X B A C S

Page 15:

S1     A  X  A  B  _  C  S
S2     A  X  _  B  A  C  S
Score  2  2 -1  2 -1  2  2
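The column scores above can be reproduced with a short sketch. It assumes a match scores +2 and a gap scores -1, as the slide shows; the mismatch score of -1 is an assumption, since no mismatched column appears in this example. The function name `column_scores` is hypothetical.

```python
def column_scores(row1, row2, match=2, mismatch=-1, gap=-1, gap_char="_"):
    """Score each column of an already-aligned pair of rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same number of columns")
    scores = []
    for x, y in zip(row1, row2):
        if x == gap_char or y == gap_char:
            scores.append(gap)        # one symbol aligned against a gap
        elif x == y:
            scores.append(match)
        else:
            scores.append(mismatch)   # assumed value, not shown on the slide
    return scores


s1 = "A X A B _ C S".split()
s2 = "A X _ B A C S".split()
scores = column_scores(s1, s2)
print(scores, "total:", sum(scores))  # [2, 2, -1, 2, -1, 2, 2] total: 8
```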

Page 16:

Advantages: Global Alignment

Page 17:

Advantages: Local Alignment

Page 18:

BLAST
• Basic Local Alignment Search Tool
• FASTA

Page 19:

Improvements
• Increased Speed
• Locate initial alignment hot spots
• Statistical significance

Page 20:

Terminology
• Segment Pairs
• Locally maximal segment pairs
• Maximal segment pairs

Page 21:

How it works
• Query sequence, P
• Database
• Must have score over C!
• Multiple segment pairs combined

A B C D E F G

A G C B F D E

B E D G A F B

G F B E D C A

Page 22:

How it works
• Extends each hit
• Done efficiently
• Truncates
• Doesn’t find all pairs
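The extend-and-truncate step can be sketched as an ungapped extension that walks outward from a seed hit and gives up once the running score falls too far below the best seen so far. This is an illustrative sketch, not BLAST's actual implementation: the function name `extend_hit` and the `drop_off` parameter are hypothetical, and the +2/-3 scores are taken from the scoring slide.

```python
def extend_hit(query, subject, q_pos, s_pos, word_len,
               match=2, mismatch=-3, drop_off=5):
    """Extend a seed hit in both directions without gaps.

    Returns (score, q_from, q_to), where query[q_from:q_to] is the extended
    segment. drop_off is a hypothetical cutoff: extension in a direction stops
    once the running score is more than drop_off below the best score seen.
    """
    def pair(x, y):
        return match if x == y else mismatch

    seed = sum(pair(query[q_pos + k], subject[s_pos + k]) for k in range(word_len))

    def walk(q, s, step):
        best = running = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += pair(query[q], subject[s])
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > drop_off:
                break                  # truncate: unlikely to recover
            q += step
            s += step
        return best, best_len

    right_score, right_len = walk(q_pos + word_len, s_pos + word_len, +1)
    left_score, left_len = walk(q_pos - 1, s_pos - 1, -1)
    return (seed + left_score + right_score,
            q_pos - left_len,
            q_pos + word_len + right_len)


if __name__ == "__main__":
    print(extend_hit("TTACGTACGG", "CCACGTACAA", 3, 3, 3))
```

Stopping early is what makes the search fast, and it is also why some segment pairs are missed: an extension that dips below the cutoff is abandoned even if it would have recovered later.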

Page 23:

Proteins
• Fixed length, W
• Words above threshold
• Each hit extended
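To illustrate "words above threshold": for a query word of length W, the candidate list contains every word of the same length whose score against it reaches a threshold. The sketch below enumerates them by brute force over a toy three-letter alphabet with an assumed match/mismatch score; a real protein search would use the 20 amino acids and a substitution matrix instead. The names `neighborhood_words` and `toy_score` are hypothetical.

```python
from itertools import product


def toy_score(x, y):
    """Assumed toy scoring: +2 for identical symbols, -1 otherwise."""
    return 2 if x == y else -1


def neighborhood_words(word, alphabet, score, threshold):
    """Return all words of len(word) over alphabet scoring >= threshold vs word."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, candidate))
        if total >= threshold:
            hits.append("".join(candidate))
    return hits


if __name__ == "__main__":
    # Words within one substitution of "ABC" score 3 or more under toy_score.
    print(neighborhood_words("ABC", "ABC", toy_score, threshold=3))
```

Raising the threshold shrinks the word list and speeds up the search, at the cost of missing weaker hits.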

Page 24:

DNA
• Word List
• Exact matches
• NOT dynamic programming
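The word-list idea for DNA can be sketched with a plain lookup table: index every length-w word of the query, then scan the database sequence for exact matches. No dynamic programming is involved at this stage. The function name `word_hits` and the small word length in the example are assumptions for illustration.

```python
from collections import defaultdict


def word_hits(query, subject, w):
    """Return (query_pos, subject_pos) for every exact shared word of length w."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)      # word -> positions in the query

    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(word_hits("ACGTAC", "TTACGT", 3))  # [(3, 1), (0, 2), (1, 3)]
```

Each hit is only a starting point; it still has to be extended and scored before it counts as a reported match.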

Page 25:

Scoring
• BLOSUM62 Matrix
• Match (+2), Mismatch (-3)
• Gaps penalized

Page 26:

Substitution Matrix
• Represents Scoring Functions
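A substitution matrix is just a table that maps each pair of symbols to a score. The sketch below builds the simple DNA case from the scoring slide (match +2, mismatch -3) as a dictionary and uses it to score two equal-length sequences without gaps. A protein matrix such as BLOSUM62 would have the same shape with 20 x 20 entries; its values are not reproduced here. The names `make_dna_matrix` and `score_ungapped` are hypothetical.

```python
def make_dna_matrix(match=2, mismatch=-3, alphabet="ACGT"):
    """Build a substitution matrix as a dict keyed by (symbol, symbol)."""
    return {
        (x, y): (match if x == y else mismatch)
        for x in alphabet
        for y in alphabet
    }


def score_ungapped(seq1, seq2, matrix):
    """Sum matrix scores over aligned positions of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length for ungapped scoring")
    return sum(matrix[(x, y)] for x, y in zip(seq1, seq2))


if __name__ == "__main__":
    dna = make_dna_matrix()
    print(score_ungapped("ACGT", "ACGA", dna))  # 2 + 2 + 2 - 3 = 3
```

Keeping the scores in a table means the alignment code does not change when the scoring scheme does; only the matrix is swapped.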

Page 27:

Multiple Sequence Alignment

Page 28:

Methods of MSA
• Progressive Alignment Construction
• Iterative Methods
• Hidden Markov Models
• Genetic Algorithms and Simulated Annealing

Page 29:

Comparative Genomics
• Compare Species
• Find Evolutionary Significances!
• Low Level
• High Level
• Importance of Non-Coding DNA