Why Manual Genome Annotation?
Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning that most gene annotations contain at least one mis-annotated exon. (Yandell and Ence, 2012, Nature Reviews Genetics)
Automated annotation is often not good enough for genes you really care about!
Yandell and Ence, 2012, Nature Reviews Genetics
http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf
Different lines of evidence go into modern gene annotation pipelines:
1. Computational prediction (open reading frames, etc.)
2. Evidence-based prediction (ESTs, RNA-seq, etc.)
3. Homology-based prediction (BLAST, etc.)
These are synthesized into a consensus gene annotation, which may still be wrong!
Bees (Order Hymenoptera, Family Apidae)
Western Honey Bee (Apis mellifera)
Common Eastern Bumble Bee (Bombus impatiens)
Buff-Tailed Bumble Bee (Bombus terrestris)
Dwarf Asian Honey Bee (Apis florea)
NADPH + H+ + O2 + R-H → NADP+ + H2O + R-OH
cytochrome P450 monooxygenase enzymes
classification: CYP 3 A 4 *15 A-B
3 = family (>40% amino acid sequence homology)
A = sub-family (>55% amino acid sequence homology)
4 = isoenzyme
*15 A-B = allele
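The naming scheme above can be illustrated by splitting a CYP name into its parts. This is a minimal sketch, not an official parser: the function name `parse_cyp_name` and the regular expression are assumptions made for illustration, and the pattern only covers names shaped like the ones on these slides.

```python
import re

# Hypothetical pattern: CYP + family number + sub-family letters +
# isoenzyme number, with an optional "*allele" suffix.
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)(?:\*(\S+))?$")

def parse_cyp_name(name):
    """Split a name such as 'CYP3A4' or 'CYP 3 A 4 *15 A-B' into its fields."""
    match = CYP_NAME.match(name.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognizable CYP name: {name!r}")
    family, subfamily, isoenzyme, allele = match.groups()
    return {
        "family": int(family),
        "subfamily": subfamily,
        "isoenzyme": int(isoenzyme),
        "allele": allele,  # None when no allele is given
    }

print(parse_cyp_name("CYP 3 A 4 *15 A-B"))
print(parse_cyp_name("CYP6AS3"))
```

For the bee genes used later in the deck, `CYP6AS3` would read as family 6, sub-family AS, isoenzyme 3.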
Chemical signalling??? (pheromone synthesis and breakdown)
Detoxication (toxin and pesticide metabolism)
Hormone synthesis (highly conserved orthologs) + Detoxication
| Organism | P450s | Food / environment |
|---|---|---|
| Nasonia vitripennis | 92 | fly pupae |
| Apis mellifera | 46 | nectar and pollen / homeostatic nest |
| Anopheles gambiae | 106 | blood and detritus / standing water |
| Drosophila melanogaster | 85 | rotting fruit |
| Tribolium castaneum | 131 | seeds |
| Organism | P450s | Mito | CYP2 | CYP3 | CYP4 |
|---|---|---|---|---|---|
| Drosophila melanogaster | 85 | 11 | 6 | 36 | 32 |
| Apis mellifera | 46 | 6 | 8 | 28 | 4 |
| Nasonia vitripennis | 87 | 6 | 7 | 45 | 29 |
Repeats
Intron splice sites are highly conserved
P450s: ~500 amino acids (1500 nucleotides)
Highly conserved heme-binding site (cysteine)
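These two properties suggest a quick sanity check for a candidate protein. The sketch below is only illustrative: the slides say the heme-binding site contains a conserved cysteine but do not give a motif, so the commonly cited signature FxxGxRxCxG is an assumption here, as are the function name and the length tolerance.

```python
import re

# Assumption: commonly cited P450 heme-binding signature FxxGxRxCxG
# (x = any residue). The slides only mention the conserved cysteine.
HEME_MOTIF = re.compile(r"F..G.R.C.G")

def check_p450_candidate(protein, expected_len=500, tolerance=100):
    """Report length and heme-motif position for a candidate protein sequence."""
    motif = HEME_MOTIF.search(protein.upper())
    return {
        "length": len(protein),
        # tolerance is an arbitrary illustrative window around ~500 aa
        "length_ok": abs(len(protein) - expected_len) <= tolerance,
        # 0-based start of the motif, or None if it is absent
        "heme_motif_start": motif.start() if motif else None,
        # three nucleotides per amino acid, stop codon not counted
        "coding_nucleotides": 3 * len(protein),
    }

# Synthetic placeholder sequence, not a real P450.
candidate = "M" + "A" * 440 + "FGAGPRNCIG" + "A" * 49
print(check_p450_candidate(candidate))
```

A predicted protein that is far shorter than expected, or that lacks any such motif, is a hint that the gene model may be missing exons.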
Basic Annotation Rules
CDS start: amino acid M, nucleotides ATG
CDS stop: amino acid * (stop), nucleotides TAA/TAG/TGA
Translation frames: Frame 1, Frame 2, Frame 3
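The rules above can be tried out with a small three-frame translator. This is a minimal sketch that assumes the standard genetic code and a sequence given on the coding strand; the names `translate` and `three_frames` and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, frame=1):
    """Translate dna starting at frame 1, 2, or 3; '*' marks a stop codon."""
    seq = dna.upper()[frame - 1:]
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Codons containing anything other than A/C/G/T become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_frames(dna):
    """Return the translation of all three forward frames."""
    return {frame: translate(dna, frame) for frame in (1, 2, 3)}

# Hypothetical toy CDS: ATG (M) GAA (E) TAA (stop)
for frame, protein in three_frames("ATGGAATAA").items():
    print(f"Frame {frame}: {protein}")
```

Only one frame of the toy sequence starts with M and ends with a stop, which is the pattern to look for when checking a CDS start and stop.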
http://en.wikipedia.org/wiki/File:Exon_and_Intron_classes.png
http://doc.goldenhelix.com/SVS/latest/_images/splice_site_diagram.png
Intron splice sites
GT-AG
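The GT-AG rule can be checked mechanically. The following is a minimal sketch, assuming the sequence is given on the coding strand; the helper names and the example sequences are hypothetical.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

def find_dinucleotide(dna, site):
    """Return 0-based positions of every occurrence of a dinucleotide such as 'GT'."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == site]

print(has_canonical_splice_sites("GTAAGTTTTCAG"))   # canonical boundaries
print(has_canonical_splice_sites("GCAAGTTTTCAG"))   # donor is GC, not GT
print(find_dinucleotide("AGGTAAGT", "GT"))          # candidate donor positions
```

Many GT and AG dinucleotides occur by chance, so a match only marks a candidate boundary; the reading frame and supporting evidence still have to agree.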
“(\w)”
“\1 “
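The two patterns above read as a regular-expression find-and-replace: `(\w)` captures each single word character, and `\1` followed by a space writes it back with a trailing space. A minimal sketch in Python, assuming the goal is to space out the letters of a sequence (the slides do not state the purpose, and the function name is made up):

```python
import re

def space_out(text):
    """Insert one space after every word character, as in find (\\w) / replace '\\1 '."""
    return re.sub(r"(\w)", r"\1 ", text)

# Hypothetical three-residue example.
print(repr(space_out("MEK")))
```

The same search and replace strings should work in most text editors that support regular expressions, although the backreference may be written `$1` rather than `\1` in some of them.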
‘GT’ intron donor site
‘AG’ intron acceptor site
‘GT’ intron donor site
1 nucleotide “G” for next codon = Phase 1 intron
‘AG’ intron acceptor site
2 nucleotides “AA” before first full codon
Combine with “G” on exon 2
Make the codon “GAA” for glutamic acid (E)
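The split codon can be reproduced with a toy example. This is a sketch under stated assumptions: the exon and intron sequences are invented for illustration (only the leftover “G”, the “AA”, and the resulting “GAA” come from the slides), and the function names are hypothetical.

```python
def intron_phase(cds_before_intron):
    """Phase = number of coding nucleotides left over after the last full codon."""
    return len(cds_before_intron) % 3

def splice(upstream_exon, intron, downstream_exon):
    """Remove a GT...AG intron and join the flanking exon sequences."""
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron lacks canonical GT...AG boundaries")
    return upstream_exon + downstream_exon

def codons(cds):
    """Split a coding sequence into complete codons."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]

upstream = "ATGG"          # toy exon: ATG plus one leftover G
intron = "GTAAGTTTTCAG"    # toy intron with GT donor and AG acceptor
downstream = "AATAA"       # toy exon: AA completes the codon, then TAA

print(intron_phase(upstream))                       # leftover nucleotides
print(codons(splice(upstream, intron, downstream)))  # includes the joined GAA
```

The leftover “G” and the following “AA” end up adjacent after splicing, giving the codon GAA for glutamic acid (E).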
This start looks good!
Jamboree!
Search for paralogs using one of these genes from Apis mellifera in the protein database on GenBank (e.g. CYP9R1 AND Apis mellifera):
CYP9R1
CYP6AS3
CYP6BD1
CYP6AQ1
CYP4G11
Use BLASTP to find predicted paralogs in the NCBI “nr” database. Select one of the following bees for the Organism:
Apis florea
Bombus impatiens
Bombus terrestris
Megachile rotundata
Copy and paste verified amino acid sequences (FASTA formatted) into a text file:
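For anyone who would rather script this step than paste by hand, the sketch below writes sequences to a FASTA-formatted text file. The function names, the 60-character line width, the output file name, and the record shown are all placeholders for illustration, not real annotation results.

```python
def format_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for header, sequence in records:
        lines.append(f">{header}")
        # Wrap the sequence at a fixed width; 60 is a common but arbitrary choice.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"

def write_fasta(records, path):
    """Write the FASTA text to a plain-text file."""
    with open(path, "w") as handle:
        handle.write(format_fasta(records))

# Placeholder record: substitute your own verified header and sequence.
records = [("my_verified_P450 placeholder", "MEK" * 30)]
print(format_fasta(records), end="")
# write_fasta(records, "verified_p450s.fasta")  # hypothetical output file name
```

Each record starts with a `>` header line followed by the amino acid sequence, which is the layout BLAST and most alignment tools expect.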