What is computational biology?
1 Mona Singh
Genome
• The entire hereditary information content of an organism
DNA
• A string over the four-letter alphabet A, T, G, C
• An organism’s genome is distributed over chromosomes (e.g., 46 chromosomes in human: 22 pairs plus the sex chromosomes, XX or XY)
• Genome size: the number of base pairs in an organism’s genome
Genome Sizes
Human          3 billion bps
Mouse          3 billion bps
Fruit fly      165 million bps
Nematode worm  97 million bps
Yeast          15 million bps
E. coli        5 million bps
~400 genomes sequenced
How are genomes sequenced?
• Can only sequence a few hundred base pairs at a time
• Make many copies of the DNA and cut into smaller (overlapping) pieces
• Assemble the pieces using their overlaps: the same substring occurs in multiple fragments
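The overlap idea can be sketched with a greedy merge: repeatedly join the two fragments whose suffix/prefix overlap is longest. This is a toy illustration only, not how production assemblers work; the function names, the `min_len` threshold, and the three reads are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    # Drop exact duplicates and fragments fully contained in another fragment
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical overlapping reads cut from one short toy sequence
reads = ["ATGCCTTACGTAC", "ACGTACCCTGCGG", "CTGCGGCAGCACT"]
print(greedy_assemble(reads))
```

Greedy merging can go wrong when the genome contains repeats longer than the overlaps, which is one reason real assembly is much harder than this sketch suggests.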
Genomes to Life
Genome: ATGCCTTACGTACCCTGCGGCAGCACT
?
• Portions of DNA code for genes, which carry the information for making proteins
• Proteins play key roles in most biological processes (e.g., signaling, catalysis, immune response, etc.)
gucgcuaccauuaccaguuggucuggugucaaaaauaauaauaaccgggcaggccaugucugcccguauuucgcguaaggaaauccauuauguacuauuuaaaaaacacaaacuuuuggauguucgguuuauucuuuuucuuuuacuuuuuuaucaugggagccuacuucccguuuuucccgauuuggcuacaugacaucaaccauaucagcaaaagugauacggguauuauuuuugccgcuauuucucuguucucgcuauuauuccaaccgcuguuuggucugcuuucugacaaacucgggcugcgcaaauaccugcuguggauuauuaccggcauguuagugauguuugcgccguucuuuauuuuuaucuucgggccacuguuacaauacaacauuuuaguaggaucgauuguuggugguauuuaucuaggcuuuuguuuuaacgccggugcgccagcaguagaggcauuuauugagaaagucagccgucgcaguaauuucgaauuuggucgcgcgcggauguuuggcuguguuggcugggcgcugugugccucgauugucggcaucauguucaccaucaauaaucaguuuguuuucuggcugggcucuggcugugcacucauccucgccguuuuacucuuuuucgccaaaacggaugcgcccucuucugccacgguugccaaugcgguaggugccaaccauucggcauuuagccuuaagcuggcacuggaacuguucagacagccaaaacugugguuuuugucacuguauguuauuggcguuuccugcaccuacgauGuuuuugaccaacaguuugcuaauuucuuuacuucguucugucaggugaa...gcaaucaaugucggaugcggcgcgacgcu
Gene Finding
MYYLKNTNFWMFGLFFFFYFFIMGAYFPFFPIWLHDINHISKSDTGIIFAAISLFSLLFQPLFGLLSDKLGLRKYLLWIITGMLVMFAPFFIFIFGPLLQYNILVGSIVGGIYLGFCFNAGAPAVEAFIEKVSRRSNFEFGRARMFGCVGWALCASIVGIMFTINNQFVFWLGSGCALILAVLLFFAKTDAPSSATVANAVGANHSAFSLKLALELFRQPKLWFLSLYVIGVSCTYDVFDQQFANFFTSFFATGEQGTRVFGYVTTMGELLNASIMFFAPLIINRIGGKNALLLAGTIMSVRIIGSSFATSALEVVILKTLHMFEVPFLLVGCFKYIT
The Genetic Code
AUG = Methionine / start
UUA = Leucine
UUG = Leucine
UAA = Stop
UAG = Stop
UGA = Stop
...
Stryer, Biochemistry
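The full table has 64 codons. A compact way to write it down, sketched here, is one amino-acid letter per codon with the bases taken in the order U, C, A, G. The names `CODON_TABLE` and `translate` are illustrative, and `*` is used here to mark a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    rna = rna.upper().replace("T", "U")  # accept DNA letters as well
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(CODON_TABLE["AUG"], CODON_TABLE["UUA"], CODON_TABLE["UUG"])  # M L L
print(translate("AUGUUAUUGUAA"))  # MLL
```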
Gene Finding
Reading off from 1st start triplet:
aug ucu gcc cgu auu ucg cgu aag gaa auc cau uau gua cua uuu aaa ...
Translating (3-letter amino acid code):
Met Ser Ala Arg Ile Ser Arg Lys Glu Ile His Tyr Val Leu Phe Lys ...
(1-letter code):
M S A R I S R K E I H Y V L F K ...
Actual protein sequence:
M Y Y L K N T N F W M F G L F F ...
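The mismatch between the two translations comes from where reading starts: the actual protein begins at a later AUG, in a different reading frame. The sketch below translates from every AUG in a short fragment copied from the RNA sequence on the gene-finding slides. The helper names are hypothetical, and a real gene finder would need far more than this.

```python
from itertools import product

# Standard genetic code in UCAG order; '*' marks a stop codon
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("UCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def translate_from(rna: str, start: int) -> str:
    """Translate from `start` until a stop codon or the end of the sequence."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def candidate_proteins(rna: str) -> dict[int, str]:
    """Map each AUG position to the protein read off from that position."""
    rna = rna.upper().replace("T", "U")
    starts = [i for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]
    return {i: translate_from(rna, i) for i in starts}


# Fragment of the slide's RNA sequence, beginning at its first start triplet
fragment = (
    "augucugcccguauu" "ucgcguaaggaaauc" "cauuauguacuauuu"
    "aaaaaacacaaacuu" "uugg"
)
for start, protein in candidate_proteins(fragment).items():
    print(start, protein)
```

The first AUG (position 0) gives the MSARISRK... reading, while the AUG at position 34 gives MYYLKNTNFW, the beginning of the actual protein.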
Computational Gene Finding Methods
• Statistical bias: protein coding regions “look different”
- compare coding vs. non-coding regions (Hidden Markov Models, Neural Nets)
• Sequence similarity
- similar to known protein?
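As a rough illustration of the statistical-bias idea, the sketch below trains two first-order Markov chains, one on "coding" examples and one on "non-coding" examples, and scores a new sequence by its log-odds under the two. This is a much simpler stand-in for the Hidden Markov Models named above, and the training strings are made-up toy data, not real genomic sequence.

```python
import math

ALPHABET = "ACGT"


def train_markov(seqs: list[str]) -> dict[str, dict[str, float]]:
    """First-order Markov chain P(next base | current base), add-one smoothed."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_odds(seq: str, coding: dict, noncoding: dict) -> float:
    """Positive score: the sequence looks more like the coding examples."""
    return sum(
        math.log2(coding[a][b] / noncoding[a][b]) for a, b in zip(seq, seq[1:])
    )


# Hypothetical toy training data, chosen only so the two classes look different
coding_examples = ["GCGCCGGCGCGCCG", "CGGCGCCGCGGCGC"]
noncoding_examples = ["ATATTAATATTATA", "TTAATATAATTAAT"]

coding_model = train_markov(coding_examples)
noncoding_model = train_markov(noncoding_examples)

print(log_odds("GCGCGC", coding_model, noncoding_model))  # positive
print(log_odds("ATATAT", coding_model, noncoding_model))  # negative
```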
Gene finding is hard
• In some genomes, only a small portion of the genome codes for protein (needle in a haystack)
• Some genes are split into exons and introns; only the exons encode the protein, and exons can be short
• The exact boundaries must be found to get the correct protein
Number of genes
Human          ~30,000
Mouse          ~30,000
Fruit fly      ~13,500
Nematode worm  ~19,000
Yeast          ~6,000
E. coli        ~4,000
MYYLKNTNFWMFGLFFFFYFFIMGAYFPFFPIWLHDINHISKSDTGIIFAAISLFSLLFQPLFGLLSDKLGLRKYLLWIITGMLVMFAPFFIFIFGPLLQYNILVGSIVGGIYLGFCFNAGAPAVEAFIEKVSRRSNFEFGRARMFGCVGWALCASIVGIMFTINNQFVFWLGSGCALILAVLLFFAKTDAPSSATVANAVGANHSAFSLKLALELFRQPKLWFLSLYVIGVSCTYDVFDQQFANFFTSFFATGEQGTRVFGYVTTMGELLNASIMFFAPLIINRIGGKNALLLAGTIMSVRIIGSSFATSALEVVILKTLHMFEVPFLLVGCFKYIT
Predicting Protein Function
DNA binding protein
Functions of Human Proteins
Science, 2001
Sequence similarity
CF: EGGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRLL-----
NT: QAAQPLVHGVSLTLQRGRVLALVGGSGSGKSLTCAATLGILPAGVR
CF: NTEGEIQIDGVSWDSITL---------QQWRKAFGVIPQKVFIFSG
NT: QTAGEILADGKPVSPCALRGIKIATIMQNPRSAFNPL---------
CF: TFRKNLDPYEQWSDQEIWKVADEVGLRSVIEQFP-GKLDFVLVDGG
NT: ---HTMHTHARETCLALGKPADDATLTAAIEAVGLENAARVLKLYP
CF: CVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDPV
NT: FEMSGGMLQRMMIAMAVLCESPFIIADEPTTDLDVV
Ex: cystic fibrosis gene and bacterial nickel transport gene
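Alignments with gaps like the one above are typically computed by dynamic programming. The sketch below computes a global alignment score in the style of Needleman-Wunsch; this is one standard technique, and the slides do not say which method produced the alignment shown. The scoring values (+1 match, -1 mismatch, -1 per gap position) are placeholder assumptions; protein comparisons normally use a substitution matrix such as BLOSUM62 instead.

```python
def global_alignment_score(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> int:
    """Best global alignment score of `a` and `b` with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4: four matches
print(global_alignment_score("ACGT", "AGT"))   # 2: three matches, one gap
```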
Database Searches
http://www.ncbi.nlm.nih.gov
Database Searches
Sequences producing significant alignments: E-Value
gi|5523990|gb|AAD44047.1|AF108138_1 (AF108138) DNA helicase 4e-84
gi|7511524|pir||T37310 PIF1 protein - Caenorhabditis elegans helicase 1e-77
gi|7493349|pir||T40739 rrm3-pif1 helicase homolog - fission... 3e-59
gi|11282390|pir||T47241 RRM3/PIF1 helicase homolog - fission yeast 3e-59
gi|6321820|ref|NP_011896.1| DNA helicase; Rrm3p [Saccharomyces 4e-43
gi|6323579|ref|NP_013650.1| 5' to 3' DNA helicase; Pif1p [Saccharo 1e-41
gi|558414|emb|CAA86260.1| (Z38114) len: 750, CAI: 0.14, inc... 1e-41
gi|7687929|emb|CAB89609.1| (AL354532) possible DNA helicase... 4e-41
Protein Structure
Sequence: KETAAAKFERQHMDSSTSAASSSN…
Structure: (figure)
Proteins
Levels of structure (figure): Primary (amino acids), Secondary (α helix), Tertiary (polypeptide chain), Quaternary (assembled subunits)
Lehninger, Principles of Biochemistry
Protein Structure Prediction
• Physics-based methods
• Statistics-based methods
Statistics & Protein Structure Prediction
Given a new sequence and a library of folds, figure out which (if any) is a good fit to the sequence.
Secondary structure prediction
• Given a protein sequence, can you tell its secondary structure?
– E.g., LKVVAKRELVQNNQ
        aaaa bbbb aaaaaaa
        (a = alpha, b = beta)
• ~70% accuracy (neural nets or other learning techniques)
Genome annotation
• Many other important features of DNA
– E.g., proteins bind DNA regulatory elements: this determines which genes are “on” and when
• Statistical & comparative approaches for finding them
– Motif finding
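One building block of motif finding is scoring how well each window of a sequence matches a motif model. The sketch below builds a log-odds position weight matrix from a few aligned binding sites and scans a sequence for the best-scoring window. The sites and the scanned sequence are hypothetical toy data, and discovering an unknown motif from unaligned sequences is a harder problem that this sketch does not address.

```python
import math


def build_pwm(sites: list[str], alphabet: str = "ACGT", pseudocount: float = 1.0):
    """Log-odds position weight matrix against a uniform background."""
    total = len(sites) + pseudocount * len(alphabet)
    background = 1 / len(alphabet)
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in alphabet
        })
    return pwm


def best_match(seq: str, pwm) -> tuple[float, int]:
    """Return (score, start position) of the best-scoring window in `seq`."""
    width = len(pwm)
    scores = [
        (sum(pwm[k][seq[i + k]] for k in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scores)


# Hypothetical aligned binding sites and a sequence to scan
sites = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
pwm = build_pwm(sites)
score, position = best_match("GGCGTATAATGCCG", pwm)
print(position, round(score, 2))
```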
Universal phylogenetic tree
(Figure labels: Prokaryotes, Eukaryotes)
Woese et al.
Building phylogenetic trees
Use DNA (or protein) sequences from various organisms
e.g.,
human ATCGAGGC
mouse ATCCAGCC
yeast ATTAAGTA
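A simple way to turn these sequences into pairwise distances is to count mismatching positions (the Hamming distance). The slides do not name the distance measure, so this choice is an assumption; it only works directly because the three sequences have equal length and need no alignment.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: dict[str, str]) -> dict[str, dict[str, int]]:
    """All pairwise Hamming distances, as a dict of dicts keyed by name."""
    return {a: {b: hamming(seqs[a], seqs[b]) for b in seqs} for a in seqs}


seqs = {"human": "ATCGAGGC", "mouse": "ATCCAGCC", "yeast": "ATTAAGTA"}
dist = distance_matrix(seqs)
for name, row in dist.items():
    print(name, [row[other] for other in seqs])
```

For these three sequences the distances are 2 (human to mouse), 4 (human to yeast) and 4 (mouse to yeast).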
Building phylogenetic trees
E.g., distance matrix:
        Human  Mouse  Yeast
Human   0      2      4
Mouse   2      0      4
Yeast   4      4      0
Tree (figure): leaves Human, Mouse, Yeast; branch lengths 1, 1, 1, 2
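One classic way to go from a distance matrix to a tree is UPGMA: repeatedly merge the two closest clusters and place the new node at half their distance. The slides do not say which method was used, so treat UPGMA here as an assumption. The function below is a minimal sketch that returns the tree as a Newick string.

```python
def upgma(dist: dict[str, dict[str, float]]) -> str:
    """Build a rooted tree from pairwise distances; returns a Newick string."""
    # Each cluster: key -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in dist}
    d = {a: {b: float(v) for b, v in row.items() if b != a} for a, row in dist.items()}
    while len(clusters) > 1:
        # Pick the closest pair of clusters
        a, b = min(((x, y) for x in d for y in d[x]), key=lambda p: d[p[0]][p[1]])
        (tree_a, n_a, h_a), (tree_b, n_b, h_b) = clusters[a], clusters[b]
        height = d[a][b] / 2
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        key = f"{a}+{b}"
        # Size-weighted average distance from the merged cluster to the others
        others = [c for c in clusters if c not in (a, b)]
        new_row = {c: (n_a * d[a][c] + n_b * d[b][c]) / (n_a + n_b) for c in others}
        for c in (a, b):
            del clusters[c]
            del d[c]
        for c in others:
            d[c].pop(a)
            d[c].pop(b)
            d[c][key] = new_row[c]
        d[key] = new_row
        clusters[key] = (subtree, n_a + n_b, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


dist = {
    "Human": {"Human": 0, "Mouse": 2, "Yeast": 4},
    "Mouse": {"Human": 2, "Mouse": 0, "Yeast": 4},
    "Yeast": {"Human": 4, "Mouse": 4, "Yeast": 0},
}
tree = upgma(dist)
print(tree)
```

On this matrix Human and Mouse are joined first with branches of length 1 each, and Yeast joins at the root with a branch of length 2, so the branch lengths are 1, 1, 1 and 2.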
Intracellular networks
(Figure labels: DNA, RNA, Protein, Stimulus)
Network of cells
(Figure: six cells, each labeled DNA, RNA, Protein, fn)
(Figure labels: DNA, RNA, Protein, fn, fn)
Lecture Notes
• www.cs.princeton.edu/~mona/computational_biology_notes.html