Post on 23-Aug-2014
MAKER
The Genome Annotation Pipeline
GMOD Summer Course
May 19, 2014
Barry Moore/Carson Holt
Yandell Lab
University of Utah
MAKER
• The Annotation Problem
• How MAKER Works
• Why Choose MAKER
• Working with MAKER
What are Annotations?
Functional
Structural
Function: cAMP-dependent and sulfonylurea-sensitive anion transporter. Key gatekeeper influencing intracellular cholesterol transport.
Subcellular location: Membrane; Multi-pass membrane protein (Ref.13, Ref.14).
Domain: Multifunctional polypeptide with two homologous halves, each containing a hydrophobic membrane-anchoring domain and an ATP binding cassette (ABC) domain.
Genomes Online Database
http://www.genomesonline.org/
[Figure: Genome Project Status. Number of genomes (y-axis, 0 to 9,000) by year (x-axis, 1998 to 2012), split into incomplete and complete projects.]
http://www.genome.gov/
Next Gen Genome Annotation 2013-14
• Coelacanth
• Pine
• Sacred Lotus
• Conus bullatus
• Pigeon
• King Cobra
• Hymenopterids
• Fusarium circinatum
• Cardiocondyla obscurior
• Burmese Python
• Sarcocystis neurona
• Spotted Gar
• Apple maggot fly
The ‘NextGen’ Genome Project
• Lab/Small Group Funding
• Short-read Genome Sequencing
• RNASeq Data
• Genome/Transcriptome Assembly
• Gene Annotation
• Genome Database / Blast Server
• Manual curation
• New assembly
• Reannotate/Merge annotations
The Source of Annotations
RNA and Protein Evidence
Ab Initio Computational Evidence
Accurate Gene Annotations
Annotating the Genome – Apollo View
current evidence
gene annotations
genome assembly
http://apollo.berkeleybop.org/
Identify and mask repetitive elements
current evidence
genome assembly
http://www.repeatmasker.org
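As a rough illustration of what masking means in this step, the sketch below soft-masks repeat intervals by lowercasing them, so that downstream tools can recognize those bases as repetitive. The function name, the 0-based end-exclusive coordinates, and the toy sequence are assumptions made for this example; it is not how RepeatMasker itself is implemented.

```python
def soft_mask(sequence, repeats):
    """Lowercase every base inside the given repeat intervals.

    `repeats` holds (start, end) pairs, 0-based and end-exclusive
    (a convention assumed for this sketch).
    """
    bases = list(sequence.upper())
    for start, end in repeats:
        # Clamp each interval to the sequence so bad coordinates cannot crash.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


# Hypothetical sequence and repeat coordinates, invented for illustration.
masked = soft_mask("ACGTACGTACGT", [(2, 5), (8, 10)])
print(masked)
```

Soft masking keeps the original bases recoverable, whereas hard masking (replacing them with N) discards them; which form a given tool expects is worth checking in its documentation.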
Generate ab initio gene predictions
ab initio predictions: SNAP, GeneMark, Augustus
current evidence
genome assembly
http://korflab.ucdavis.edu/
Align RNA and protein evidence
ab initio predictions
protein - BLASTX
EST - BLASTN
altEST - TBLASTX
current evidence
genome assembly
http://blast.ncbi.nlm.nih.gov
Polish BLAST alignments with Exonerate
ab initio predictions
polished protein
polished EST
current evidence
genome assembly
http://www.ebi.ac.uk/~guy/exonerate/
current evidence
Pass gene-finders evidence-based ‘hints’
ab initio predictions
Hint-based SNAP
Hint-based Augustus
genome assembly
current evidence
Identify gene model most consistent with evidence
ab initio predictions*
Hint-based SNAP
Hint-based Augustus
genome assembly
current evidence
Revise further if necessary; create new annotation
ab initio predictions
genome assembly
Compute support for each portion of gene model
Eilbeck et al BMC Bioinformatics 2009
genome assembly
Compute support for each portion of gene model
Cantarel BL et al., Genome Res 2008
genome assembly
GFF3
FASTA
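The GFF3 output named above is a tab-separated text format with nine columns per feature. The sketch below parses one such line into a dictionary. The example line, its identifiers, and the function name are hypothetical and invented for illustration; only the nine-column layout and the `key=value;` attribute syntax come from the GFF3 format itself.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments or blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # GFF3 coordinates are 1-based, inclusive
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # GFF3 escapes reserved characters with URL-style percent encoding.
            attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


# Hypothetical feature line, not taken from real MAKER output.
example = "contig1\tmaker\tmRNA\t1200\t3400\t.\t+\t.\tID=mrna1;Parent=gene1"
record = parse_gff3_line(example)
print(record["type"], record["start"], record["end"], record["attributes"]["Parent"])
```

For real work an established GFF3 parser is likely a better choice; the point here is only to show the shape of the data.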
MAKER2 Workflow
MAKER2 Distributed Workflow
Parallelization Efficiency
Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.
30 GB Pine genome annotated in 37 hrs on 6,000 CPUs at the TACC
MAKER
The Genome Annotation Pipeline
Maintenance and Management
MAKER2 Use Cases
1. De novo annotation providing quality metrics
2. Merging multiple annotation sets
3. Re-annotation with new evidence
4. Mapping annotations forward to a new assembly
5. Generating GMOD Compliant Output
   1. GBrowse/JBrowse
   2. Apollo
   3. Tripal
Sensitivity, Specificity, Accuracy
As a Measure of Annotation Quality
Gold Standard Genes
Case                                SN    SP    AC
Perfect Accuracy                    1.0   1.0   100%
Poor Specificity                    1.0   0.5   80%
Poor Sensitivity                    0.5   1.0   80%
Poor Specificity and Sensitivity    0.5   0.5   50%
Sensitivity, Specificity, Accuracy
As a Measure of Annotation Quality
Guigó R et al. Genome Biol. 2006
MAKER vs. Predictors
Holt C, Yandell M. BMC Bioinformatics. 2011
MAKER vs. Predictors (the wrong HMM...)
Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.
Annotation Edit Distance
Gold Standard Genes
Gold Standard Evidence
Protein Alignments
EST Alignments
mRNASeq
Eilbeck et al BMC Bioinformatics 2009
Annotation Edit Distance
Gold Standard Evidence
Case                                SN    SP    AED
Perfect Accuracy                    1.0   1.0   0.0
Poor Specificity                    1.0   0.5   0.2
Poor Sensitivity                    0.5   1.0   0.2
Poor Specificity and Sensitivity    0.5   0.5   0.5
Eilbeck et al BMC Bioinformatics 2009
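A minimal sketch of how an AED score can be computed at the nucleotide level, assuming the definition AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases supported by evidence. The function names and exon coordinates are hypothetical, chosen only for illustration, and the toy example is not meant to reproduce any row of the slide's table.

```python
def covered_bases(intervals):
    """Return the set of positions covered by 1-based inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def annotation_edit_distance(annotation, evidence):
    """Return (SN, SP, AED) at the nucleotide level.

    SN = overlap / evidence bases, SP = overlap / annotation bases,
    AED = 1 - (SN + SP) / 2 (assumed definition for this sketch).
    """
    annotated = covered_bases(annotation)
    supported = covered_bases(evidence)
    if not annotated or not supported:
        return 0.0, 0.0, 1.0  # nothing to compare: treat as maximally distant
    overlap = len(annotated & supported)
    sn = overlap / len(supported)
    sp = overlap / len(annotated)
    return sn, sp, 1 - (sn + sp) / 2


# Hypothetical exons: the annotation truncates the second evidence-supported exon.
evidence_exons = [(100, 199), (300, 399)]
annotation_exons = [(100, 199), (300, 349)]
sn, sp, aed = annotation_edit_distance(annotation_exons, evidence_exons)
print(f"SN={sn:.3f} SP={sp:.3f} AED={aed:.3f}")
```

Using sets of positions keeps the sketch short; for genome-scale data an interval-based overlap calculation would be far more memory-efficient.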
AED as a Measure of Genome Wide Annotation Quality
Eilbeck et al BMC Bioinformatics 2009
TAIR Star Rating System
http://www.arabidopsis.org/
AED Agrees well with the TAIR star system
Evidence: mRNA-seq (17 experiments), ESTs, full length cDNAs, Swiss-Prot (minus Arabidopsis)
[Figure: cumulative fraction of annotations (y-axis, 0 to 1) versus AED (x-axis, 0 to 1), one curve per TAIR star rating: ***** (7,880), **** (12,654), *** (2,087), ** (2,188), * (1,788), no stars (604).]
Holt C, Yandell M. BMC Bioinformatics. 2011
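A cumulative AED curve of this kind can be sketched as follows: for each AED threshold, count the fraction of annotations scoring at or below it. The function name and the scores are hypothetical and are not data from the slides.

```python
def cumulative_fraction(aed_scores, thresholds):
    """For each threshold, return the fraction of annotations with AED <= threshold."""
    if not aed_scores:
        raise ValueError("need at least one AED score")
    total = len(aed_scores)
    return [sum(1 for score in aed_scores if score <= t) / total
            for t in thresholds]


# Hypothetical AED scores for eight annotations.
scores = [0.0, 0.05, 0.1, 0.2, 0.3, 0.45, 0.6, 1.0]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
fractions = cumulative_fraction(scores, thresholds)
for t, frac in zip(thresholds, fractions):
    print(f"AED <= {t:.2f}: {frac:.3f}")
```

A curve that rises quickly at low AED indicates an annotation set that agrees closely with its evidence.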
AED as a Measure of Annotation Quality
MAKER Annotations Match the Evidence Well
[Figure, two panels, each plotting cumulative fraction of annotations (y-axis, 0 to 1) against AED (x-axis, 0 to 1).]
A. thaliana: TAIR10 rep transcripts (27,206), MAKER de novo (25,956), MAKER update of TAIR10 (26,885)
Z. mays: chr10 rep transcripts (2,688), MAKER de novo (3,056), MAKER update of v3 (2,661)
Campbell et al, 2013 submitted
Protein Domain Content
As a Measure of Annotation Quality
Holt C, Yandell M. BMC Bioinformatics. 2011
MAKER vs. Predictors
Holt C, Yandell M. BMC Bioinformatics. 2011
http://derringer.genetics.utah.edu/cgi-bin/mwas/maker.cgi
MAKER Installation
• Automated query/answer based installation script.
• Installs Perl prerequisites.
• Installs necessary executables
  – RepeatMasker (RepBase)
  – BLAST+
  – Exonerate
  – SNAP
• Even installs MWAS and MPICH2
MAKER Runtime Features
• Fill out a config file with input data and parameters
• Parallelize:
  – Running with MPI
  – Simply start multiple instances in the same directory.
• Re-run MAKER in the same directory and it won't redo completed work.
• Restart aborted jobs without losing any work.
Accessory Scripts
Over 30 accessory scripts:
• cegma2zff
• chado2gff3
• cufflinks2gff3
• gff3_2_gtf
• gff3_preds2models
• gff3_to_eval_gtf
• maker2chado
• maker2jbrowse
• maker2zff
• tophat2gff3
• compare
• evaluator
• gff3_merge
• fasta_merge
• fasta_tool
• fix_fasta
• genemark_gtf2gff3
• ipr_update_gff
• iprscan2gff3
• iprscan_batch
• iprscan_wrap
• maker_functional
• maker_functional_fasta
• maker_functional_gff
• maker_map_ids
• map2assembly
• map_data_ids
• map_fasta_ids
• map_gff_ids
• split_fasta
Acknowledgements
• Mark Yandell
  – Carson Holt
  – Mike Campbell
  – Daniel Ence
  – Steven Flygare
  – Zev Kronenberg
  – Qing Li
  – Marc Singleton
  – Bretty Kennedy
  – Brandi Cantarel
  – Hadi Islam
• Karen Eilbeck
  – Shawn Reynearson
  – Nicole Ruiz
  – Keith Simmons
  – Bret Heale
• Alejandro Alvarado
  – Eric Ross
• Jason Stajich
• Sophia Robb
• Kevin Childs
• Shin-Han Shui
• Ning Jiang
• Yanni Sun
NSF IOS-1126998