"The Opera of PhAnToMe": Phage Annotation Tools at the 20th Biennial Evergreen International Phage...

Post on 10-May-2015


Tools and methods developed under the PhAnToMe project (http://www.phantome.org) between 2009 and 2012 using the Subsystems Technology, the SEED environment (http://theseed.org), and the RAST server (http://rast.nmpdr.org). Third presentation at the Phage Genomics Workshop at the 20th Biennial Evergreen International Phage Meeting.


The Opera of PhAnToMe

Ramy K. Aziz (Twitter: @azizrk), Aug 04 2013

opus (Latin) = work (pl. opera)

The environment, the toolbox, and the community

Phage Genomics Workshop, Evergreen 2013

08/04/2013

Past,

Phage Genomics - Evergreen 2013

NSF-funded, 3-year project (2009-2012) to develop Phage Annotation Tools and Methods

Four centers:
– SDSU, San Diego, CA
– VCU, Richmond, VA
– USF, St. Pete, FL
– UA, Tucson, AZ

http://www.phantome.org

… present, ...

?TBA

… and future

Aims
• Direct
– Discuss the concepts behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online community annotation using SEED
• Indirect {hidden agenda ;)}
– PhAnToMe 2.0?
– Establish community annotation efforts/crowdsourcing
– Seek funding? Crowdfunding?

Outline
• The environment (the SEED)
– The SEED and the ‘Subsystems Technology’
• The toolbox (PhAnToMe and sequels)
– PHAST and RAST
– PHACTS
– PhiSpy
– iVireons
• The community
– Online annotation process
– Annotation jamboree(s)
– Course design

$$

Writing proposals, applying for grants

I. THE ENVIRONMENT

The Opera of PhAnToMe

I. The Environment: SEED

http://theseed.org

Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053

SEED: Main concept

One genome

All genomes


“Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”

Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053

SEED: Main concept
• Protein-based database
Jargon: PEG = protein-encoding gene
• The subsystems approach
and
• FIGfams: protein families based on
– sequence similarity
– chromosomal co-occurrence, gene order, synteny
– human curation, evidence-based expert assertions

RAST: automated annotation


What is a subsystem?
• “A subset of functional roles studied across genomes”
• A spreadsheet where:
– each row represents a genome
– each column represents a functional role/feature/protein
– different patterns = variants

            Function 1   Function 2   …   Function n
Genome a
Genome b
Genome z
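The spreadsheet view above can be sketched in a few lines of Python. This is a toy illustration, not SEED code: the role names, PEG identifiers, and helper functions are hypothetical, and a "variant" is approximated here as a presence/absence pattern of roles, which may differ from how variants are actually assigned in the SEED.

```python
from collections import defaultdict

# Hypothetical toy subsystem: rows are genomes, columns are functional roles.
# Each cell lists the PEGs (protein-encoding genes) assigned to that role;
# an empty list means the role was not found in that genome.
ROLES = ["Function 1", "Function 2", "Function 3"]
SUBSYSTEM = {
    "Genome a": {"Function 1": ["peg.1"], "Function 2": ["peg.2"], "Function 3": ["peg.7"]},
    "Genome b": {"Function 1": ["peg.4"], "Function 2": [], "Function 3": ["peg.9"]},
    "Genome z": {"Function 1": ["peg.3"], "Function 2": ["peg.5"], "Function 3": ["peg.6"]},
}


def presence_pattern(row, roles=ROLES):
    """Return a tuple of booleans: which roles have at least one PEG."""
    return tuple(bool(row.get(role)) for role in roles)


def group_by_variant(subsystem, roles=ROLES):
    """Group genomes that share the same presence/absence pattern."""
    variants = defaultdict(list)
    for genome, row in subsystem.items():
        variants[presence_pattern(row, roles)].append(genome)
    return dict(variants)


if __name__ == "__main__":
    for pattern, genomes in group_by_variant(SUBSYSTEM).items():
        print(pattern, genomes)
```

Reading down a column compares one functional role across all genomes, which is the "one subsystem at a time" idea quoted earlier; reading across a row gives one genome's pattern.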


Advantages of subsystems

Subsystems-based annotation

[Figure: annotation from a genome (complete, accurate; credit: Andrew Kropinski) versus reconstruction from a metagenome (incomplete, frameshift, faulty assembly; credit: Bas Dutilh)]

II. THE TOOLBOX

The Opera of PhAnToMe

II. PhAnToMe ToolBox

http://www.phantome.org

II. The ToolBox: RAST
• (At least) four ways to annotate a genome via RAST:
– myRAST (local)
  • uses the server, but you can edit offline
– RAST (http://rast.nmpdr.org)
  • annotates online, saves your genome on the server
– “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast)
  • optimized gene-calling
– Use your favorite gene caller, then upload the gbk file to RAST

http://rast.nmpdr.org


phiRAST complaints
• ORF/gene calling
• tRNA
– bug fixed, but still follow Andrew’s advice
• Too many hypotheticals, etc.
– manual annotation, see later
– need for expert annotations, community contribution
– funding

“PhAST”: some improvement?


PHAST: Disambiguation


Other tools
• PHACTS:
– classifies and predicts lifestyle
• PhiSpy:
– finds prophages
• iVireons
– predicts phage structural proteins, holins, more to come

II. The ToolBox: PHACTS
• PHAge Classification Tool Set
• Uses a novel similarity algorithm and a supervised Random Forest classifier to predict whether the lifestyle of a phage, described by its proteome, is virulent or temperate.
• The similarity algorithm creates a training set from phages with known lifestyles; this set, along with the lifestyle annotations, trains a Random Forest to classify the lifestyle of a phage.
• PHACTS predictions have had a 99% precision rate.

Kate McNair

PHACTS

• Of the 227 phages with a known lifestyle, PHACTS confidently and correctly predicted the lifestyle of 197.

• Only 2 phages were confidently misclassified; both were virulent phages that contained a functional integrase.
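The counts on this slide can be related to the 99% precision figure with a small calculation. This is only one plausible reading: it assumes precision is computed over confident predictions alone, which the slides do not state, and the function name is illustrative.

```python
def precision(correct, wrong):
    """Fraction of confident predictions that were correct."""
    return correct / (correct + wrong)


TOTAL_KNOWN = 227        # phages with a known lifestyle (from the slide)
CONFIDENT_CORRECT = 197  # confidently and correctly predicted (from the slide)
CONFIDENT_WRONG = 2      # confidently misclassified (from the slide)

# Remaining phages: no confident call was made (derived, not stated on the slide).
NOT_CONFIDENT = TOTAL_KNOWN - CONFIDENT_CORRECT - CONFIDENT_WRONG

if __name__ == "__main__":
    print(f"precision over confident calls: {precision(CONFIDENT_CORRECT, CONFIDENT_WRONG):.4f}")
    print(f"phages without a confident call: {NOT_CONFIDENT}")
```

Under that assumption, 197 correct out of 199 confident calls rounds to 99%, and 28 of the 227 phages received no confident prediction.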


PHACTS
• http://www.phantome.org/PHACTS/
• Other applications
– Host prediction: whether a phage infects Gram-positive or Gram-negative bacteria
– Taxonomy prediction: a phage’s family

II. The ToolBox: PhiSpy

Calculate genomic characteristics
• Transcriptional strand orientation
• Customized AT skew
• Customized GC skew
• Protein length
• Abundance of phage words

Classify prophage region
• Random Forest
• Pre-calculated training genome
• Input bacterial genome
• Produce a rank for each gene

Evaluate predicted prophages
• Phage insertion points
• Similarity of phage proteins
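Two of the genomic characteristics listed above are skew measures. The sketch below computes the textbook GC skew, (G − C)/(G + C), and AT skew, (A − T)/(A + T), over sliding windows. It is not PhiSpy's code: the slide says PhiSpy uses "customized" skews, whose exact definition is not given here, and the function names and window parameters are illustrative assumptions.

```python
def skew(seq, plus, minus):
    """Return (plus - minus) / (plus + minus) for one sequence.

    Returns 0.0 when neither base occurs, to avoid dividing by zero.
    """
    seq = seq.upper()
    p, m = seq.count(plus), seq.count(minus)
    return (p - m) / (p + m) if (p + m) else 0.0


def gc_skew(seq):
    """Textbook GC skew: (G - C) / (G + C)."""
    return skew(seq, "G", "C")


def at_skew(seq):
    """Textbook AT skew: (A - T) / (A + T)."""
    return skew(seq, "A", "T")


def windowed(seq, size, step, fn):
    """Apply fn to each full window of the given size along seq."""
    return [fn(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


if __name__ == "__main__":
    dna = "GGGCATATCCCGTTAAGGCC"  # made-up sequence for illustration
    print(windowed(dna, size=10, step=5, fn=gc_skew))
    print(windowed(dna, size=10, step=5, fn=at_skew))
```

A per-window value like this is the kind of feature that could be fed, alongside strand orientation and protein length, into a classifier such as the Random Forest named on the slide.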

Sajia Akhter

PhiSpy

• Performance comparison in 50 complete bacterial genomes

Application     % Identified   % FN   % FP
Prophinder      89%            11%    12%
Phage_finder    82%            18%    1.33%
PhiSpy          94%            6%     0.66%

PhiSpy
• Download: PhiSpy – http://sourceforge.net/projects/phispy
• PhiSpy is on KBase – http://kbase.science.energy.gov
• Web version under final development
• Ran PhiSpy on 4,335 bacterial genomes
• Predicted 12,826 prophages in 3,203 genomes
– 9,101 known prophages
– 3,723 undefined prophages

iVIREONS – http://vdm.sdsu.edu/ivireons

Victor Seguritan

Application of Artificial Neural Networks (ANNs) to Viral Dark Matter

– Input: viral hypothetical protein sequences
– Conserved Domain DB (rpsblast): e-value ≤ 0.001 → known; no hit OR e-value > 0.001 → continue
– Reference Sequence DB (tblastp): e-value ≤ 0.001 → known; no hit OR e-value > 0.001 → continue
– Keep sequences ≥ 200 aa
– Remove ≥ 80% identical sequences
– Artificial Neural Networks (ANNs)
– Synthesize ANN-predicted hypothetical protein genes
– Clone in E. coli
– Purification by cobalt affinity
– Validation by TEM or X-ray crystallography
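The database-filtering part of this flowchart can be sketched as a small triage function. This is a guess at the logic, not the iVireons implementation: the thresholds (e-value 0.001, 200 aa) come from the slide, but the `Protein` record, the field names, the result labels, and the order in which the length filter is applied are assumptions, and the 80% identity de-duplication step is not shown.

```python
from dataclasses import dataclass
from typing import Optional

E_VALUE_CUTOFF = 0.001  # from the slide
MIN_LENGTH_AA = 200     # from the slide


@dataclass
class Protein:
    """Hypothetical record for one viral hypothetical protein."""
    name: str
    sequence: str
    cdd_evalue: Optional[float] = None     # best Conserved Domain DB hit; None = no hit
    refseq_evalue: Optional[float] = None  # best Reference Sequence DB hit; None = no hit


def has_hit(evalue):
    """True when a search produced a hit at or below the e-value cutoff."""
    return evalue is not None and evalue <= E_VALUE_CUTOFF


def triage(protein):
    """Label a protein 'known', 'too_short', or 'candidate' for the ANNs."""
    if has_hit(protein.cdd_evalue) or has_hit(protein.refseq_evalue):
        return "known"
    if len(protein.sequence) < MIN_LENGTH_AA:
        return "too_short"
    return "candidate"


if __name__ == "__main__":
    examples = [
        Protein("p1", "M" * 250, cdd_evalue=1e-10),
        Protein("p2", "M" * 250, cdd_evalue=0.5),
        Protein("p3", "M" * 120),
    ]
    for p in examples:
        print(p.name, triage(p))
```

Only proteins labeled as candidates here would move on to the ANN prediction and the wet-lab validation steps in the list above.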


“FAMILIES” OF ANNs

1) General structural proteins:
• Trained with all types of proteins
• Both phages & viruses
2) Phage major capsid proteins
3) Phage tail/tail fibers/collar etc.
4) Holins
5) Portals

Victor Seguritan

1. iVIREONS – http://vdm.sdsu.edu/ivireons
2. Enter User Info (example entries on the slide: VibrioPhage, virus@microsoft.com, DHS)
3. Upload Sequences
4. View Results
5. Copy Results to a Spreadsheet

Victor Seguritan

iVIREONS – http://vdm.sdsu.edu/ivireons

Stringencies Reported
- Structural 1:1
- MCP 1:1
- MCP 2:1
- MCP 3:1
- MCP 4:1
- MCP 7:1
- MCP 22:1 (lambda)
- Tail 1:1
- Tail 2:1
- Tail 4:1
- Tail 7:1
- Tail 6.6:1 (lambda)

III. THE COMMUNITY

The Opera of PhAnToMe

SEED allows continuous annotation

[Diagram labels: SEED, RAST, Genomes, Subsystems, SEED Viewer, New Genomes, Subsystems Editor]

SEED allows community annotation


Later in the meeting,
• Who might be interested in putting together:
a) an outline for an annotation jamboree/workshop with phage experts
b) a syllabus/outline for a course to get undergraduate/graduate students to annotate specific subsystems
c) a proposal to get funding for community annotation efforts
d) all of the above

POST SCRIPTUM

The Opera of PhAnToMe

Aims
• Direct
– Discuss the concepts behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online community annotation using SEED
• Indirect {hidden agenda ;)}
– PhAnToMe 2.0?
– Establish community annotation efforts/crowdsourcing
– Seek funding? Crowdfunding?

If you use, please cite
• SEED, RAST, myRAST, phiRAST, PHAST:
– RAST: BMC Genomics 2008; SEED servers: PLoS ONE 2012
• Other tools
– PHACTS: McNair et al. PMID: 22238260; PhiSpy: Akhter et al. PMID: 22584627; iVireons: Seguritan et al. PMID: 22927809
• Letters of support

Acknowledgments

Robert A. Edwards, PhD

• PhiRAST development: Ross Overbeek, Robert Olson, Gordon Pusch, Terry Disz, Bruce Parrello
• Phage annotators (Phantomers): Bhakti Dwivedi, Mya Breitbart, et al.
• FIG and all SEED annotators: VeronikaV, SvetaG, OlgaV/Z, et al.

Sajia Akhter

$$ & NSF

Acknowledgments

• PHACTS: Katelyn McNair
• iVireons: Victor Seguritan