An introduction to Web Apollo for the Biomphalaria glabrata research community.
Introduction to Apollo for i5k
![Page 1: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/1.jpg)
Introduction to Apollo: Collaborative genome annotation editing
A webinar for the i5K Research Community - Hemiptera
Monica Munoz-Torres | @monimunozto
Berkeley Bioinformatics Open-Source Projects (BBOP), Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory
i5k Pilot Project Species Calls | 9 February, 2016
http://GenomeArchitect.org
![Page 2: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/2.jpg)
Outline
• Today you will discover effective ways to extract valuable information about a genome through curation efforts.
Apollo: Collaborative Curation and Interactive Analysis of Genomes
![Page 3: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/3.jpg)
After this talk you will...
• Better understand ‘curation’ in the context of genome annotation: assembled genome → automated annotation → manual annotation.
• Become familiar with Apollo’s environment and functionality.
• Learn to identify homologs of known genes of interest in your newly sequenced genome.
• Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.
![Page 4: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/4.jpg)
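(Figure: stages of a genome project, from experimental design and sampling, through sequencing and assembly, automated annotation, manual annotation and an official/merged gene set, to comparative analyses and synthesis and dissemination. This talk focuses on the annotation stages.)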
![Page 5: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/5.jpg)
We must care about curation
Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
The gene set of an organism informs a variety of studies:
• Characterization: gene number, GC%, TEs, repeats.
• Functional assignments.
• Molecular evolution, sequence conservation.
• Gene families.
• Metabolic pathways.
• What makes an organism what it is? What makes a bee a “bee”?
![Page 6: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/6.jpg)
Genome Curation
• Identifies elements that best represent the underlying biology and eliminates elements that reflect systematic errors of automated analyses.
• Assigns function through comparative analysis of similar genome elements from closely related species, using literature, databases, and experimental data.
Apollo
Gene Ontology Resources
![Page 7: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/7.jpg)
A few things to remember when conducting manual annotation
Biological concepts to better understand manual annotation
BIO-REFRESHER
• KEEP A GLOSSARY HANDY: from contig to splice site
• WHAT IS A GENE? Defining your goal
• TRANSCRIPTION: mRNA in detail
• TRANSLATION: reading frames, etc.
• GENOME CURATION: steps involved
![Page 8: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/8.jpg)
The gene: a “moving target”
“The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”
Gerstein et al., 2007. Genome Res
![Page 9: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/9.jpg)
"Gene structure" by Daycd- Wikimedia Commons
mRNA
• Although mRNAs are short-lived, understanding them is crucial, as they will become the center of your work.
![Page 10: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/10.jpg)
Reading frames
• In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).
• ORF = Start signal + coding sequence (divisible by 3) + Stop signal
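The ORF definition above can be written as a small check. This is a minimal sketch, assuming an uppercase or lowercase DNA string read on its own strand, ATG as the only start codon and the standard stop codons TAA, TAG and TGA; the function name `is_orf` is hypothetical.

```python
# Standard stop codons; ATG is assumed to be the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_orf(seq: str) -> bool:
    """True if seq = Start + coding sequence (length divisible by 3) + Stop,
    with no premature in-frame stop codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # An ORF ends at the first in-frame stop, so none may appear earlier.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(is_orf("ATGGCCTAA"))     # True
print(is_orf("ATGTAAGCCTAA"))  # False: premature stop codon
print(is_orf("ATGGCTAA"))      # False: length not divisible by 3
```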
![Page 11: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/11.jpg)
Splice sites
• The spliceosome catalyzes the removal of introns and the ligation of flanking exons.
• Splicing signals (from the point of view of an intron):
  - one splice signal (site) at the 5’ end: usually GT (less common: GC);
  - a splice site at the 3’ end: usually AG.
• Canonical splice sites look like this: …]5’-GT/AG-3’[…
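The splicing signals listed above can be checked on an intron sequence by looking at its first and last two bases. This is a minimal sketch, assuming the intron is given 5'→3' on its own strand; `classify_intron` is a hypothetical name, and it only inspects the terminal dinucleotides, not the branch point or wider consensus sequences.

```python
def classify_intron(intron: str) -> str:
    """Classify an intron (given 5'->3' on its own strand) by its terminal dinucleotides."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to carry both splice sites")
    donor, acceptor = intron[:2], intron[-2:]  # 5' splice site, 3' splice site
    if donor == "GT" and acceptor == "AG":
        return "canonical GT-AG"
    if donor == "GC" and acceptor == "AG":
        return "less common GC-AG"
    return "non-canonical"


print(classify_intron("GTAAGTCTTTTCAG"))  # canonical GT-AG
print(classify_intron("GCAAGTCTTTTCAG"))  # less common GC-AG
print(classify_intron("ATAAGTCTTTTCAC"))  # non-canonical
```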
![Page 12: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/12.jpg)
Exons and Introns
• Introns can interrupt the reading frame of a gene by inserting a sequence:
  - between two consecutive codons (phase 0 intron),
  - between the first and second nucleotides of a codon (phase 1), or
  - between the second and third nucleotides of a codon (phase 2).
“Exon and Intron classes”. Licensed under Fair use via Wikipedia
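The three intron positions above follow from simple arithmetic: the phase is the number of coding bases upstream of the intron modulo 3. A minimal sketch, assuming the exon lengths count only coding bases (UTRs excluded) and are listed 5'→3'; the function name and the exon lengths are hypothetical.

```python
def intron_phase(coding_bases_before: int) -> int:
    """Phase of an intron from the number of coding bases upstream of it:
    0 = between codons, 1 = after the 1st base of a codon, 2 = after the 2nd."""
    if coding_bases_before < 0:
        raise ValueError("coding_bases_before must be non-negative")
    return coding_bases_before % 3


# Hypothetical coding lengths of three CDS exons, listed 5' -> 3'.
exon_cds_lengths = [120, 94, 203]
upstream = 0
for n, length in enumerate(exon_cds_lengths[:-1], start=1):
    upstream += length
    print(f"intron {n}: phase {intron_phase(upstream)}")
# intron 1: phase 0
# intron 2: phase 1
```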
![Page 13: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/13.jpg)
Prediction & Annotation
![Page 14: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/14.jpg)
GENE PREDICTION & ANNOTATION
PREDICTION & ANNOTATION
• Identification and annotation of genome features:
  - primarily focuses on protein-coding genes;
  - also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc.;
  - happens in 2 phases: 1. Computation phase; 2. Annotation phase.
![Page 15: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/15.jpg)
COMPUTATION PHASE
a. Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (also from other species).
b. Gene predictions are generated:
  - ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc.
  - evidence-driven: also identifying domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc.
Result: the single most likely coding sequence, no UTRs, no isoforms.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
![Page 16: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/16.jpg)
ANNOTATION PHASE
Experimental data (evidence) and predictions are synthesized into gene annotations.
Result: gene models that generally include UTRs, isoforms, evidence trails.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
5’ UTR 3’ UTR
![Page 17: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/17.jpg)
In some cases, the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.
CONSENSUS GENE SETS
Gene models may be organized into sets using:
• combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler; or
• tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.
![Page 18: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/18.jpg)
ANNOTATION needs some refinement
No one is perfect, least of all automated annotation.
New technologies bring new challenges:
• Assembly errors can cause fragmented annotations.
• Limited coverage makes precise identification a difficult task.
Image: www.BroadInstitute.org
![Page 19: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/19.jpg)
MANUAL ANNOTATION: improving predictions
Precise elucidation of biological features encoded in the genome requires careful examination and review.
Schiex et al. Nucleic Acids Res. 2003; 31(13): 3738-3741
(Figure: automated predictions + experimental evidence (cDNAs, HMM domain searches, RNA-seq, genes from other species) → manual annotation to the rescue.)
![Page 20: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/20.jpg)
GENOME CURATION: an inherently collaborative task
So many sequences, not enough hands.
Apis mellifera | Alexander Wild | www.alexanderwild.com
![Page 21: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/21.jpg)
We have provided continuous training and support for hundreds of geographically dispersed scientists conducting manual annotation efforts to recover coding sequences in agreement with all available biological evidence.
Lessons learned
APOLLO
• Collaborative work distills invaluable knowledge.
• A little training goes a long way! Wet lab scientists can easily learn to maximize the generation of accurate, biologically supported gene models.
![Page 22: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/22.jpg)
Apollo
![Page 23: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/23.jpg)
APOLLO: versatile genome annotation editing
• Apollo is a web-based genome annotation editor, integrated with JBrowse.
• Supports real-time collaboration & generates analysis-ready data.
(Screenshot labels: user-created annotations, evidence tracks, annotator panel.)
![Page 24: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/24.jpg)
BECOMING ACQUAINTED WITH APOLLO
General process of curation
1. Select or find a region of interest, e.g. a scaffold.
2. Select appropriate evidence tracks to review the gene model.
3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
4. If necessary, adjust the gene model.
5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.
6. Comment and finish.
![Page 25: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/25.jpg)
Apollo - version at i5K Workspace@NAL
The Sequence Selection Window
![Page 26: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/26.jpg)
Sort
“Old Track Select Page”
![Page 27: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/27.jpg)
APOLLO annotation editing environment
Color by CDS frame, toggle strands, set color scheme and highlights.
Upload evidence files (GFF3, BAM, BigWig); combination track; sequence search track.
Query the genome using BLAT.
Navigation and zoom.
Search for a gene model or a scaffold.
Get coordinates and “rubber band” selection for zooming.
Login
User-created annotations.
New annotator panel.
Evidence Tracks
Stage and cell-type specific transcription data.
http://genomearchitect.org/web_apollo_user_guide
![Page 28: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/28.jpg)
USER NAVIGATION
Annotator panel.
• Choose appropriate evidence from the list of “Tracks” on the annotator panel.
• Select & drag elements from an evidence track into the ‘User-created Annotations’ area.
• Hovering over an annotation in progress brings up an information pop-up.
• Creating a new annotation
![Page 29: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/29.jpg)
Adding a gene model
![Page 30: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/30.jpg)
Adding a gene model
![Page 31: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/31.jpg)
Adding a gene model
![Page 32: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/32.jpg)
Editing functionality
![Page 33: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/33.jpg)
Editing functionality. Example: adding an exon supported by experimental data
• RNAseq reads show evidence in support of a transcribed product that was not predicted.
• Add the exon by dragging up one of the RNAseq reads.
![Page 34: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/34.jpg)
Editing functionality. Example: adjusting exon boundaries supported by experimental data
![Page 35: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/35.jpg)
Curating with Apollo
![Page 36: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/36.jpg)
• ‘Zoom to base level’ reveals the DNA Track.
![Page 37: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/37.jpg)
• Color exons by CDS from the ‘View’ menu.
![Page 38: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/38.jpg)
Zoom in/out with keyboard: shift + arrow keys up/down
• Toggle the reference DNA sequence and translation frames on the forward strand. Toggle models in either direction.
![Page 39: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/39.jpg)
Annotating simple cases
![Page 40: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/40.jpg)
“Simple case”:
- the predicted gene model is correct or nearly correct, and
- this model is supported by evidence that completely or mostly agrees with the prediction;
- evidence that extends beyond the predicted model is assumed to be non-coding sequence.
The following are simple modifications.
ANNOTATING SIMPLE CASES
![Page 41: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/41.jpg)
• A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
• Check ‘Start’ and ‘Stop’ signals after each edit.
ADDING EXONS
![Page 42: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/42.jpg)
If transcript alignment data are available & extend beyond your original annotation, you may extend or add UTRs.
1. Right click at the exon edge and select ‘Zoom to base level’.
2. Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR.
ADDING UTRs
To add a new spliced UTR to an existing annotation, follow the procedure for adding an exon.
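Extending an exon edge past the coding sequence lengthens the UTR while leaving the CDS alone. The sketch below shows that bookkeeping; it assumes a forward-strand transcript, 1-based inclusive genomic coordinates and a known CDS start and end, and `utr_segments` is a hypothetical helper rather than part of Apollo.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on the forward strand


def utr_segments(exons: List[Interval], cds_start: int, cds_end: int) -> Tuple[List[Interval], List[Interval]]:
    """Split the exons of a forward-strand transcript into 5' and 3' UTR pieces."""
    five_utr, three_utr = [], []
    for start, end in sorted(exons):
        if start < cds_start:                  # exon begins before the CDS
            five_utr.append((start, min(end, cds_start - 1)))
        if end > cds_end:                      # exon ends after the CDS
            three_utr.append((max(start, cds_end + 1), end))
    return five_utr, three_utr


exons = [(100, 200), (300, 450)]
print(utr_segments(exons, cds_start=150, cds_end=400))
# ([(100, 149)], [(401, 450)])

# Dragging the last exon edge from 450 to 520 to match transcript evidence
# lengthens only the 3' UTR; the CDS (150-400) is unchanged.
print(utr_segments([(100, 200), (300, 520)], cds_start=150, cds_end=400))
# ([(100, 149)], [(401, 520)])
```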
![Page 43: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/43.jpg)
To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.
In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
MATCHING EXON BOUNDARY TO EVIDENCE
![Page 44: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/44.jpg)
(Screenshot callouts: non-canonical splice site flags; double click selects a feature and its sub-features; Evidence Tracks area; ‘User-created Annotations’ track; edge-matching.)
Apollo’s editing logic (brain):
• selects the longest ORF as CDS;
• flags non-canonical splice sites.
ORFs AND SPLICE SITES
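The “longest ORF as CDS” rule can be illustrated with a short scan over the three forward frames of a spliced transcript. This is a sketch of the idea only, assuming ATG starts, standard stop codons and ORFs that must end in a stop; Apollo's own implementation may treat edge cases (for example, an ORF running off the end of the transcript) differently, and `longest_orf` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str):
    """Return (start, end), 0-based and half-open, of the longest ATG..stop ORF
    on the forward strand of a spliced transcript, or None if there is none."""
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # keep the first (most upstream) ATG
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                    # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


seq = "CCATGAAATAGGATGCCCGGGTTTTAA"
start, end = longest_orf(seq)
print(start, end, seq[start:end])  # 12 27 ATGCCCGGGTTTTAA
```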
![Page 45: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/45.jpg)
Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
Canonical splice sites:
• forward strand: 5’-…exon]GT / AG[exon…-3’
• reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
SPLICE SITES
Zoom in to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. a GC donor), they should be flagged with a comment.
(Screenshot labels: exon/intron splice site error warning; curated model.)
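For a gene on the reverse strand, the bases displayed along the forward strand are the complement of the intron read backwards, which is why the reverse-strand pattern reads GA / TG. A minimal sketch, assuming the input is the forward-strand sequence under the intron read left to right; the helper names and the example bases are hypothetical.

```python
# Complement table for DNA; reversing the complemented string gives the reverse complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def minus_strand_intron_is_canonical(forward_bases: str) -> bool:
    """forward_bases: the forward-strand sequence under an intron of a
    minus-strand gene, read left to right as displayed."""
    intron = reverse_complement(forward_bases).upper()  # intron read 5'->3'
    return intron.startswith("GT") and intron.endswith("AG")


forward_view = "CTAAAAAAC"                 # hypothetical bases on the forward strand
print(forward_view.translate(COMPLEMENT))  # GATTTTTTG: reverse strand, 3'->5', not reverse-complemented
print(reverse_complement(forward_view))    # GTTTTTTAG: the intron read 5'->3'
print(minus_strand_intron_is_canonical(forward_view))  # True
```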
![Page 46: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/46.jpg)
Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.
If the ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further upstream or downstream, depending on evidence (proteins, RNAseq).
It may be present outside the predicted gene model, within a region supported by another evidence track.
In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
‘Start’ AND ‘Stop’ SITES
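Finding the in-frame alternatives mentioned above is a matter of stepping codon by codon from the current start. A minimal sketch, assuming a spliced transcript string, 0-based half-open CDS coordinates that include the stop codon, and ATG-only starts (non-canonical starts are ignored); `in_frame_starts` is hypothetical, and which candidate to choose still depends on the evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_starts(transcript: str, cds_start: int, cds_end: int) -> dict:
    """List ATG positions in frame with the current start codon."""
    seq = transcript.upper()
    upstream = []
    i = cds_start - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break  # an in-frame stop ends any possible upstream extension
        if codon == "ATG":
            upstream.append(i)
        i -= 3
    # Downstream candidates lie inside the current CDS, before its stop codon.
    downstream = [j for j in range(cds_start + 3, cds_end - 3, 3) if seq[j:j + 3] == "ATG"]
    return {"upstream": upstream, "downstream": downstream}


print(in_frame_starts("TAAATGCCCATGGGGATGTTTTAG", cds_start=9, cds_end=24))
# {'upstream': [3], 'downstream': [15]}
```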
![Page 47: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/47.jpg)
1. Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges.
2. Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon. Use curly { } brackets to scroll from annotation to annotation.
3. Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
CHECKING EXON INTEGRITY
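The edge-matching comparison can be sketched as a coordinate lookup. This assumes both tracks use the same coordinate system and that a start is compared with starts and an end with ends; it mirrors the idea behind the red bars, not Apollo's display code, and `matching_edges` is a hypothetical name.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) in the same coordinates for both tracks


def matching_edges(annotation: List[Exon], evidence: List[Exon]) -> List[Tuple[Exon, bool, bool]]:
    """For each annotated exon, report whether its start and end coincide
    with an exon edge in the evidence track."""
    starts: Set[int] = {s for s, _ in evidence}
    ends: Set[int] = {e for _, e in evidence}
    return [(exon, exon[0] in starts, exon[1] in ends) for exon in annotation]


annotated = [(100, 200), (300, 420)]
rnaseq = [(100, 200), (300, 450)]
for exon, start_ok, end_ok in matching_edges(annotated, rnaseq):
    print(exon, "start matches" if start_ok else "start differs",
          "end matches" if end_ok else "end differs")
# (100, 200) start matches end matches
# (300, 420) start matches end differs
```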
![Page 48: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/48.jpg)
Annotating complex cases
![Page 49: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/49.jpg)
Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
1. In the ‘User-created Annotations’ area, shift-click to select an intron from each gene model, then right click to select the ‘Merge’ option from the menu.
2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
3. Check the resulting translation by querying a protein database, e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge.
Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
COMPLEX CASES: merge two gene predictions on the same scaffold
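Conceptually, a merge pools the exons of both models and turns the gap between them into a new intron, which is the part to verify against evidence. The sketch below is a hypothetical illustration of that result, not Apollo's merge algorithm; it assumes 1-based inclusive coordinates and simply rejects models on different strands or with overlapping exons.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive


def merge_models(a_exons: List[Exon], a_strand: str, b_exons: List[Exon], b_strand: str) -> List[Exon]:
    """Combine the exons of two gene models into one transcript model."""
    if a_strand != b_strand:
        raise ValueError("models must be on the same strand to merge")
    merged = sorted(a_exons + b_exons)
    for (s1, e1), (s2, e2) in zip(merged, merged[1:]):
        if s2 <= e1:
            raise ValueError(f"exons overlap: {(s1, e1)} and {(s2, e2)}")
    return merged


def introns(exons: List[Exon]) -> List[Exon]:
    """Introns lie between consecutive exons."""
    return [(e1 + 1, s2 - 1) for (_, e1), (s2, _) in zip(exons, exons[1:])]


model = merge_models([(100, 200), (300, 400)], "+", [(600, 700)], "+")
print(model)           # [(100, 200), (300, 400), (600, 700)]
print(introns(model))  # [(201, 299), (401, 599)]; 401-599 is the new intron to check
```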
![Page 50: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/50.jpg)
One or more splits may be recommended when:
- different segments of the predicted protein align to two or more different gene families;
- the predicted protein doesn’t align to known proteins over its entire length;
- transcript data may support a split, but first verify whether they are alternative transcripts.
COMPLEX CASES: split a gene prediction
![Page 51: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/51.jpg)
(Screenshot labels: DNA Track; ‘User-created Annotations’ Track.)
COMPLEX CASES: annotate frameshifts and correct single-base errors
Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
![Page 52: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/52.jpg)
COMPLEX CASES correcting selenocysteine containing proteins
![Page 53: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/53.jpg)
COMPLEX CASES correcting selenocysteine containing proteins
![Page 54: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/54.jpg)
1. Apollo allows annotators to make single-base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Because the modification is reflected in all annotations that include the modified region, while the genome assembly itself stays unchanged, you should use the ‘Comments’ section to report the CDS edits and alert the curators of your organism’s database.
5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
COMPLEX CASES: annotating frameshifts and correcting single-base errors & selenocysteines
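The three alteration types and the read-through option can be modelled on a copy of the sequence, which makes it clear why the genome itself is untouched. This is a hypothetical sketch, not Apollo's implementation: alterations are assumed to be `(kind, position, value)` tuples with 0-based positions, insertions go to the right of the position, deletions take a length, and a TGA flagged as read-through is translated as selenocysteine (U) using the standard genetic code.

```python
from typing import AbstractSet, List, Tuple, Union

# Standard genetic code (NCBI table 1); codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

Alteration = Tuple[str, int, Union[str, int]]  # hypothetical (kind, position, value)


def apply_alterations(seq: str, alterations: List[Alteration]) -> str:
    """Return an altered copy of seq; the input sequence is never modified."""
    out = seq
    # Apply right to left so earlier positions keep their meaning.
    for kind, pos, value in sorted(alterations, key=lambda a: a[1], reverse=True):
        if kind == "insertion":       # insert bases to the right of pos
            out = out[:pos + 1] + str(value) + out[pos + 1:]
        elif kind == "deletion":      # delete `value` bases starting at pos
            out = out[:pos] + out[pos + int(value):]
        elif kind == "substitution":  # replace bases starting at pos
            out = out[:pos] + str(value) + out[pos + len(str(value)):]
        else:
            raise ValueError(f"unknown alteration: {kind}")
    return out


def translate(cds: str, readthrough: AbstractSet[int] = frozenset()) -> str:
    """Translate codon by codon; a TGA starting at a position in `readthrough`
    is read as selenocysteine (U) instead of a stop (*)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon == "TGA" and i in readthrough:
            protein.append("U")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


genome = "ATGAACCCTAA"  # hypothetical region missing one base
transcript = apply_alterations(genome, [("insertion", 4, "A")])
print(transcript, translate(transcript))  # ATGAAACCCTAA MKP*
print(genome)                             # ATGAACCCTAA (unchanged)
print(translate("ATGTGAGCCTAA", readthrough={3}))  # MUA*
```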
![Page 55: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/55.jpg)
• Information Editor
![Page 56: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/56.jpg)
The Annotation Information Editor
![Page 57: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/57.jpg)
The Annotation Information Editor
• Add PubMed IDs.
• Include GO terms as appropriate from any of the three ontologies.
• Write comments stating how you have validated each model.
![Page 58: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/58.jpg)
• Keeping track of each edit
![Page 59: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/59.jpg)
Annotations, annotation edits, and History: stored in a centralized database.
![Page 60: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/60.jpg)
Follow the checklist until you are happy with the annotation!
And remember to…
– comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence;
– or add a comment to inform the community of unresolved issues you think this model may have.
Always remember: Apollo curation is a community effort, so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.
COMPLETING THE ANNOTATION
![Page 61: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/61.jpg)
Checklist
![Page 62: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/62.jpg)
• Check ‘Start’ and ‘Stop’ sites.
• Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
• Check if you can annotate UTRs, for example using RNA-Seq data:
  – align it against relevant genes/gene family
  – blastp against NCBI’s RefSeq or nr
• Check for gaps in the genome.
• Additional functionality may be necessary:
  – merging 2 gene predictions - same scaffold
  – merging 2 gene predictions - different scaffolds
  – splitting a gene prediction
  – annotating frameshifts
  – annotating selenocysteines, correcting single-base and other assembly errors, etc.
• Add:
  – important project information in the form of comments
  – IDs from public databases, e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert
  – comments about the kinds of changes you made to the gene model of interest, if any
  – any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.
CHECKLIST for accuracy and integrity
MANUAL ANNOTATION CHECKLIST
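A few of the sequence-level checklist items (start, stop, frame, splice sites) lend themselves to automation; the rest (UTRs, gaps, homology, comments) needs evidence and judgment. A minimal sketch, assuming the spliced CDS and the intron sequences have already been extracted on the gene's own strand, ATG-only starts and standard stop codons; `check_model` is hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_model(cds: str, introns: List[str]) -> List[str]:
    """Return warnings for a spliced CDS and its introns (gene strand, 5'->3')."""
    cds = cds.upper()
    issues = []
    if not cds.startswith("ATG"):
        issues.append("CDS does not start with ATG")
    if len(cds) % 3 != 0:
        issues.append("CDS length is not a multiple of 3 (possible frameshift)")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        issues.append("CDS does not end with a stop codon")
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        issues.append(f"internal stop codon(s) at codon index {internal}")
    for n, intron in enumerate(introns, start=1):
        donor, acceptor = intron.upper()[:2], intron.upper()[-2:]
        if donor == "GT" and acceptor == "AG":
            continue
        if donor == "GC" and acceptor == "AG":
            issues.append(f"intron {n}: GC-AG splice sites (less common; flag with a comment)")
        else:
            issues.append(f"intron {n}: non-canonical splice sites {donor}/{acceptor}")
    return issues


print(check_model("ATGAAATAA", ["GTAAGCAG"]))  # []
for issue in check_model("ATGTAAAAATAG", ["GCAAAG", "ATAAAC"]):
    print(issue)
```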
![Page 63: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/63.jpg)
Genome curation with i5k
![Page 64: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/64.jpg)
i5K Workspace@NAL
The collaborative curation process at i5k
1. A computationally predicted consensus gene set has been generated using multiple lines of evidence, e.g. HVIT_v0.5.3-Models.
2. i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):
Achtung!
• If it’s not on either track, it won’t make the OGS!
• If it’s there and it shouldn’t be, it will still make the OGS!
![Page 65: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/65.jpg)
The ‘Replace Models’ rules
http://tinyurl.com/apollo-i5k-replace
![Page 66: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/66.jpg)
3. In some cases the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment; try choosing a different model to begin the annotation.
4. Isoforms: drag the original and the alternatively spliced form to the ‘User-created Annotations’ area.
5. If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label it as ‘Delete’ in the Information Editor.
6. Overlapping interests? Collaborate to reach agreement.
7. Follow the guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY
The collaborative curation process at i5k
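The two “Achtung!” rules and the ‘Delete’ label can be illustrated with a toy merge of the two tracks. This is a hypothetical model only: the official behaviour is defined by the ‘Replace Models’ rules linked above, and the explicit `replaces` link, the class name and the model IDs below are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class UserAnnotation:
    name: str
    replaces: Set[str] = field(default_factory=set)  # hypothetical link to consensus model IDs
    labels: Set[str] = field(default_factory=set)    # e.g. {"Delete"}


def build_ogs(consensus: List[str], manual: List[UserAnnotation]) -> List[str]:
    """Toy merge of a consensus gene set with user-created annotations."""
    replaced = set()
    for ann in manual:
        replaced |= ann.replaces
    kept_consensus = [name for name in consensus if name not in replaced]
    kept_manual = [ann.name for ann in manual if "Delete" not in ann.labels]
    return sorted(kept_consensus + kept_manual)


consensus = ["cons-1", "cons-2", "cons-3"]  # hypothetical model IDs
manual = [
    UserAnnotation("cons-1-curated", replaces={"cons-1"}),
    UserAnnotation("cons-2-copy", replaces={"cons-2"}, labels={"Delete"}),
    UserAnnotation("novel-1"),
]
print(build_ogs(consensus, manual))
# ['cons-1-curated', 'cons-3', 'novel-1']
```

In this toy version, `cons-3` was never reviewed, so it stays in the set whether or not it is correct, and a gene present on neither list can never appear.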
![Page 67: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/67.jpg)
Example
![Page 68: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/68.jpg)
What’s new?... finding inspiration in PubMed.
“Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.”
Homalodisca vitripennis | Alexander Wild | www.alexanderwild.com Halyomorpha halys | Fondazione Edmund Mach - Italy
Now for our species of interest. . .
![Page 69: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/69.jpg)
Example
Curation example using the Hyalella azteca genome (amphipod crustacean).
![Page 70: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/70.jpg)
What do we know about this genome?
• Currently publicly available data at NCBI:
  – >37,000 nucleotide sequences → scaffolds, mitochondrial genes
  – 344 amino acid sequences → mitochondrion
  – 47 ESTs
  – 0 conserved domains identified
  – 0 “gene” entries submitted
• Data at i5K Workspace@NAL (annotation hosted at USDA): 10,832 scaffolds; 23,288 transcripts; 12,906 proteins
![Page 71: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/71.jpg)
PubMed Search: what’s new?
![Page 72: Introduction to Apollo for i5k](https://reader031.fdocuments.us/reader031/viewer/2022022203/5872ee7a1a28abfa548b7bb7/html5/thumbnails/72.jpg)
PubMed Search: what’s new?
Example 72
“Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids.”
“By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”
“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
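The bed bug study quoted earlier reports its target-site mutations in one-letter substitution notation (V419L, L925I). Checking a sequence for such point mutations can be sketched in Python as below. This is a minimal sketch under stated assumptions: `Substitution`, `parse_substitution` and `classify_residue` are hypothetical helpers, and the protein is assumed to already use the reference numbering, since mapping positions between species (which needs an alignment) is not handled.

```python
import re
from typing import NamedTuple

# One-letter codes for the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"
SUB_RE = re.compile(rf"^([{AA}])(\d+)([{AA}])$")


class Substitution(NamedTuple):
    ref: str  # residue expected in the reference
    pos: int  # 1-based position in the reference numbering
    alt: str  # substituted residue


def parse_substitution(code: str) -> Substitution:
    """Parse notation such as 'V419L' into its three parts."""
    m = SUB_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return Substitution(m.group(1), int(m.group(2)), m.group(3))


def classify_residue(protein: str, sub: Substitution) -> str:
    """Say whether `protein` has the reference residue, the substituted
    residue, or something else at `sub.pos`. Assumes `protein` already
    uses the reference numbering (no alignment is performed)."""
    if not 1 <= sub.pos <= len(protein):
        return "out of range"
    residue = protein[sub.pos - 1].upper()
    if residue == sub.ref:
        return "reference"
    if residue == sub.alt:
        return "mutant"
    return "other"
```

For example, `parse_substitution("L925I")` returns `Substitution(ref='L', pos=925, alt='I')`.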
How many publicly available sequences are there for our gene of interest?
• Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
• NaCP60E (Sodium channel protein 60 E; D. melanogaster).
  – MF: voltage-gated cation channel activity (IDA, GO:0022843).
  – BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
  – CC: voltage-gated sodium channel complex (IEA, GO:0001518).
And what do we know about them?
Retrieving sequences for a sequence similarity search.
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT search: input
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT search: results
• High-scoring segment pairs (HSPs) are listed in tabular format.
• Clicking on a row of results takes you to those coordinates.
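If hits are exported as BLAST+ tabular output (`-outfmt 6`, twelve tab-separated columns by default), they can be read and ranked as in the Python sketch below. The `Hsp` class and `parse_tabular` function are hypothetical helpers, not part of Apollo, and Apollo's own results table may use a different layout. The strand rule relies on tabular output reporting minus-strand hits on a nucleotide subject with `sstart > send`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hsp:
    query: str
    subject: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float

    @property
    def strand(self) -> str:
        # For a nucleotide subject, sstart > send marks a minus-strand hit.
        return "-" if self.sstart > self.send else "+"


def parse_tabular(text: str) -> list[Hsp]:
    """Parse 12-column BLAST+ tabular lines, highest bit score first."""
    hsps = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.split("\t")
        if len(f) < 12:
            raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
        hsps.append(Hsp(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return sorted(hsps, key=lambda h: h.bitscore, reverse=True)
```

Sorting by bit score mirrors how such tables are usually read: the strongest HSP first, which is the one to click through to.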
BLAST at i5K: https://i5k.nal.usda.gov/blast
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
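Because this query is a protein and the i5K database holds genome scaffolds, the matching BLAST program is tblastn (protein query against a translated nucleotide database). The Python sketch below picks a program from the query and database types. The helper names are hypothetical, and the alphabet test is a crude heuristic assumed for illustration, not what the i5K BLAST page does internally.

```python
from __future__ import annotations

NUCLEOTIDE_LETTERS = set("ACGTUN")

# (query type, database type) -> BLAST program
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def read_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA must start with a '>' header line")
    return lines[0][1:], "".join(lines[1:])


def guess_alphabet(seq: str, threshold: float = 0.9) -> str:
    """Call a sequence nucleotide if >= threshold of its letters are
    A/C/G/T/U/N; otherwise call it protein."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    frac = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if frac >= threshold else "protein"


def choose_program(query_seq: str, database_type: str) -> str:
    return PROGRAMS[(guess_alphabet(query_seq), database_type)]
```

Applied to the vgsc segment above with `database_type="nucleotide"`, `choose_program` returns `"tblastn"`.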
BLAST at i5K: https://i5k.nal.usda.gov/blast
BLAST at i5K: HSPs in the “BLAST+ Results” track
Creating a new gene model: drag and drop
• Apollo automatically calculates the longest ORF.
• In this case, the ORF includes the high-scoring segment pairs (HSPs), marked here in blue.
• Note that the gene is transcribed from the reverse strand.
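The longest-ORF idea can be sketched in Python as a scan of all six reading frames (three per strand) for ATG…stop stretches. This is an illustrative sketch, not Apollo's implementation: it assumes the standard genetic code, ignores alternative start codons and ORFs that run off the end of the sequence, and reports 0-based, half-open coordinates on the forward strand together with the strand on which the ORF was found.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str):
    """Return (strand, start, end) of the longest ATG...stop ORF in any of
    the six frames, as 0-based half-open forward-strand coordinates,
    or None if no complete ORF exists."""
    seq = seq.upper()
    n = len(seq)
    best = None  # (length, strand, start, end)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if best is None or end - start > best[0]:
                        if strand == "+":
                            best = (end - start, "+", start, end)
                        else:  # map back onto forward-strand coordinates
                            best = (end - start, "-", n - end, n - start)
                    start = None
    return None if best is None else best[1:]
```

A reverse-strand gene like this one shows up with strand `"-"`; its coding sequence is the reverse complement of the reported forward-strand interval.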
Available Tracks
Get Sequence
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Also, compare the flanking sequences (other gene models) against NCBI nr.
In this case, there are two gene models upstream, at the 5’ end.
BLAST HSPs
Review alignments
HaztTmpM006234
HaztTmpM006233
HaztTmpM006232
Hypothesis for vgsc gene model
Editing: merge the three models
Merge by dropping an exon or gene model onto another, or merge by selecting two exons (holding down “Shift”) and using the right-click menu.
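Conceptually, merging pools the exons of the selected models into one transcript. A hypothetical Python sketch, assuming each model is a list of (start, end) exon coordinates, 0-based and half-open, on the same strand; Apollo's actual merge may treat overlapping or abutting exons differently.

```python
from __future__ import annotations


def merge_models(*models: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Pool the exons of several models and fuse any that overlap or abut.

    Exons are (start, end) pairs, 0-based and half-open, on one strand.
    """
    exons = sorted(exon for model in models for exon in model)
    merged: list[tuple[int, int]] = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous exon: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Fusing abutting exons reflects the half-open convention: `(0, 10)` and `(10, 20)` leave no intron between them.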
Result of merging the gene models:
Editing: correct offending splice site
Modify the exon/intron boundary:
- Drag the end of the exon to the nearest canonical splice site, or
- Use the right-click menu.
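Canonical introns begin with GT and end with AG. A minimal Python sketch that reports each intron's boundary dinucleotides, so non-canonical sites stand out; `check_introns` is a hypothetical helper. It assumes 0-based, half-open exon coordinates and a genomic sequence already in the transcript's orientation, so for a reverse-strand gene such as this one the sequence would be reverse-complemented first.

```python
from __future__ import annotations


def check_introns(seq: str, exons: list[tuple[int, int]]) -> list[dict]:
    """Report the boundary dinucleotides of each intron.

    `exons` are sorted (start, end) pairs, 0-based and half-open, on `seq`;
    `seq` must already be in the transcript's 5'->3' orientation.
    """
    seq = seq.upper()
    report = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = seq[left_end:right_start]
        donor, acceptor = intron[:2], intron[-2:]  # first and last 2 nt
        report.append({
            "intron": (left_end, right_start),
            "donor": donor,
            "acceptor": acceptor,
            "canonical": donor == "GT" and acceptor == "AG",
        })
    return report
```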
Editing: set translation start
Editing: delete exon not supported by evidence
Delete the first exon from HaztTmpM006233.
Editing: add an exon supported by RNAseq
• RNAseq reads show evidence in support of a transcribed product that was not predicted.
• Add an exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.
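Assuming the coordinates 97946-98012 are 1-based and inclusive at both ends, as genome browsers usually display them, the new exon is 98012 - 97946 + 1 = 67 nt long. Since 67 = 3 × 22 + 1, if the exon is coding, adding it on its own shifts the reading frame of everything downstream by one nucleotide, so the start, stop and splice sites are worth re-checking after the edit. The same arithmetic in Python, with hypothetical helper names:

```python
def exon_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive [start, end] interval."""
    return end - start + 1


def to_python_slice(start: int, end: int) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based half-open slice."""
    return slice(start - 1, end)


length = exon_length(97946, 98012)
frame_shift = length % 3  # 0 would keep downstream codons in frame
print(length, frame_shift)  # 67 1
```

`to_python_slice` is what a script would need to pull the same bases out of a scaffold string.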
Editing: adjust offending splice site using evidence
Editing: adjust other boundaries supported by evidence
Finished model
Corroborate the integrity and accuracy of the model:
- Start and stop
- Exon structure and splice sites …]5’-GT/AG-3’[…
- Check the predicted protein product vs. NCBI nr, UniProt, etc.
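The start/stop and reading-frame checks can be partly automated once the spliced coding sequence (CDS) is exported. A minimal Python sketch, assuming the standard genetic code (ATG start; TAA, TAG, TGA stops); `check_cds` is a hypothetical helper, and comparing the protein against NCBI nr or UniProt still has to be done separately.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_cds(cds: str) -> list[str]:
    """Return problems found in a spliced CDS; an empty list means it passes."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into whole codons, ignoring any incomplete trailing codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems
```

An internal stop usually points to a wrong splice site or a frame-shifting exon boundary.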
Information Editor
• DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq
• PubMed identifier: PMID: 24065824
• Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.
• Comments
• Name, Symbol
• Approve / Delete radio button
Comments (if applicable)
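The identifiers entered in the Information Editor follow fixed formats, so a quick sanity check before saving is straightforward. A hypothetical Python sketch: the `GeneAnnotation` record and its field names are illustrative, not Apollo's data model, the name and symbol values are placeholders, and the RefSeq pattern only checks the general `XX_digits.version` shape.

```python
import re
from dataclasses import dataclass, field

GO_RE = re.compile(r"^GO:\d{7}$")                    # e.g. GO:0022843
PMID_RE = re.compile(r"^(?:PMID:\s*)?\d+$")          # e.g. PMID: 24065824
REFSEQ_RE = re.compile(r"^[A-Z]{2}_\d+(?:\.\d+)?$")  # e.g. NP_001128389.1


@dataclass
class GeneAnnotation:
    """Hypothetical record mirroring a few Information Editor fields."""
    name: str
    symbol: str
    dbxrefs: list = field(default_factory=list)
    pmids: list = field(default_factory=list)
    go_ids: list = field(default_factory=list)
    comments: list = field(default_factory=list)

    def problems(self) -> list:
        """List malformed identifiers; an empty list means all look valid."""
        out = [f"bad GO ID: {g}" for g in self.go_ids if not GO_RE.match(g)]
        out += [f"bad PMID: {p}" for p in self.pmids if not PMID_RE.match(p)]
        out += [f"bad RefSeq accession: {x}" for x in self.dbxrefs
                if not REFSEQ_RE.match(x)]
        return out


record = GeneAnnotation(
    name="voltage-gated sodium channel",  # placeholder name
    symbol="vgsc",                        # placeholder symbol
    dbxrefs=["NP_001128389.1"],
    pmids=["PMID: 24065824"],
    go_ids=["GO:0022843", "GO:0042048", "GO:0035725", "GO:0001518"],
)
print(record.problems())  # []
```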
Go play!
PUBLIC DEMO
APOLLO ON THE WEB: instructions
At i5K:
1. Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration
2. Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.
Public honey bee demo available at: http://GenomeArchitect.org/WebApolloDemo
Username: [email protected]
Password: demo
APOLLO demonstration
Demonstration video is available at https://youtu.be/VgPtAP_fvxY
OUTLINE
Apollo Collaborative Curation and Interactive Analysis of Genomes
• BIO-REFRESHER: biological concepts for curation
• ANNOTATION: automatic predictions
• MANUAL ANNOTATION: necessary, collaborative
• APOLLO: advancing collaborative curation
• EXAMPLE: demos
Apollo Development
Nathan Dunn Eric Yao
Christine Elsik’s Lab, University of Missouri
Suzi Lewis Principal Investigator
BBOP
Moni Munoz-Torres Colin Diesh Deepak Unni
JBrowse: Ian Holmes’ Lab, University of California, Berkeley
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community & i5K Steering Committee.
• Stephen Ficklin, GenSAS, Washington State University
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
• For your attention, thank you!
Apollo Nathan Dunn Colin Diesh § Deepak Unni §
Gene Ontology
Chris Mungall
Seth Carbon
Heiko Dietze
BBOP
Learn more about Apollo at http://GenomeArchitect.org
Thank you!
NAL at USDA
Monica Poelchau
Mei-Ju Chen
Christopher Childers
Gary Moore
HGSC at BCM
fringy Richards
Kim Worley
JBrowse Eric Yao *