How We Annotated Genomes for Free: Fast and Accurate Functional Analysis Using Subsystems Technology

Rob Edwards

Depts of Computer Science and Biology, San Diego State University

Mathematics and Computer Sciences Division, Argonne National Laboratory

ASM Philadelphia, May 2009

http://rast.nmpdr.org/?page=Conference

Pigeons

If it’s good enough for Google – it’s good enough for me

Annotation Servers

• Metagenomes: http://metagenomics.theseed.org

• Complete genomes: http://rast.nmpdr.org

How much has been sequenced?

[Figure: number of known sequences (y-axis) by year (x-axis), with milestones marked for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]

How much will be sequenced?

[Figure labels: 100 people; everybody at an ASM meeting; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments]

The SEED Family

Subsystem Spreadsheet

Genome                       Chaperone  Subunit  Usher  Adhesin
S. enterica Enteritidis           2389     2388   2387     2386
E. coli HS                        3068     3067   3066     3065
B. cenocepacia J2315              2604     2603   2602     2601
S. maltophilia                    1085     1088   1087     1086
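
A minimal sketch of how a spreadsheet like this could be held in memory. It assumes the numbers are gene numbers along each genome; the names `ROLES`, `spreadsheet`, `genes_by_role`, and `is_contiguous` are illustrative and not part of the SEED.

```python
# Hypothetical in-memory form of the spreadsheet: genome -> gene numbers, one per role.
# Assumption: each number identifies a gene by its position in that genome.
ROLES = ["Chaperone", "Subunit", "Usher", "Adhesin"]

spreadsheet = {
    "S. enterica Enteritidis": [2389, 2388, 2387, 2386],
    "E. coli HS": [3068, 3067, 3066, 3065],
    "B. cenocepacia J2315": [2604, 2603, 2602, 2601],
    "S. maltophilia": [1085, 1088, 1087, 1086],
}


def genes_by_role(genome):
    """Return a role -> gene number mapping for one spreadsheet row."""
    return dict(zip(ROLES, spreadsheet[genome]))


def is_contiguous(genome):
    """True if the row's gene numbers form one unbroken run."""
    genes = sorted(spreadsheet[genome])
    return genes[-1] - genes[0] == len(genes) - 1


for genome in spreadsheet:
    print(genome, genes_by_role(genome), is_contiguous(genome))
```

Under that assumption, every row here is a run of four adjacent genes, which is the kind of clustering a subsystem spreadsheet makes easy to spot.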

Over 1,000 Subsystems

Three-level “hierarchy”

• Amino Acids and Derivatives
  – Alanine, serine, and glycine
    • Serine Biosynthesis

• Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis

Make your own subsystems!

Class                                 # SS   Class                          # SS
Amino Acids and Derivatives             56   Nucleosides and Nucleotides      14
Carbohydrates                           97   Phosphorus Metabolism             6
Cell Division and Cell Cycle            10   Photosynthesis                    9
Cell Wall and Capsule                   50   Potassium metabolism              3
Clustering-based subsystems            193   Protein Metabolism               52
Cofactors, Vitamins, Pigments           43   RNA Metabolism                   39
DNA Metabolism                          30   Regulation and Cell signaling    23
Fatty Acids, Lipids, and Isoprenoids    22   Respiration                      44
Membrane Transport                      41   Secondary Metabolism             24
Metabolism of Aromatic Compounds        30   Stress Response                  37
Motility and Chemotaxis                  8   Sulfur Metabolism                12
Nitrogen Metabolism                     11   Virulence                       116

The Annotation Process

• Find the phylogenetic neighborhood of your genome

• Look for proteins that related organisms have
  – Core proteins
  – Subset of all subsystems

• Use those calls as a training set for CRITICA/GLIMMER
  – Intrinsic training set!
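
The steps above can be sketched with toy data. This is only an illustration of the idea, not the RAST pipeline: the genome names, protein families, ORF names, and both helper functions are hypothetical, and the gene-finder training step itself is not shown.

```python
# Toy data, all hypothetical: protein families called in three neighboring genomes.
neighbors = {
    "neighbor_A": {"dnaA", "gyrB", "recA", "rpoB", "fimA"},
    "neighbor_B": {"dnaA", "gyrB", "recA", "rpoB"},
    "neighbor_C": {"dnaA", "gyrB", "recA", "rpoB", "nifH"},
}

# Hypothetical candidate ORFs in the new genome, with the family each one matches
# (None means no match to any known family).
candidate_orfs = [
    ("orf1", "dnaA"),
    ("orf2", None),
    ("orf3", "recA"),
    ("orf4", "nifH"),
    ("orf5", "rpoB"),
]


def core_families(genomes):
    """Families present in every neighboring genome."""
    return set.intersection(*genomes.values())


def training_set(orfs, core):
    """ORFs whose family is in the core: confident calls to train a gene finder on."""
    return [name for name, family in orfs if family in core]


core = core_families(neighbors)
print(sorted(core))
print(training_set(candidate_orfs, core))
```

The point of the sketch is that the training set comes from the genome being annotated, which is what makes it intrinsic.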

This one’s for Gary

Automatic Metabolic Reconstruction

• Subsystem, GO, and KEGG connections
  – KEGG EC numbers
  – KEGG reaction numbers
  – SEED reaction numbers (Chris Henry)

• Metabolic flux models
  – Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)
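
As a rough illustration of what an FBA matrix is, the sketch below builds a stoichiometric matrix from a reaction list and checks the steady-state constraint S·v = 0. The four-reaction network and all names are hypothetical; this is not the code used by the groups credited above.

```python
# Hypothetical toy network; reaction and metabolite names are illustrative only.
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced).
reactions = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "B_to_C": {"B": -1, "C": 1},
    "export_C": {"C": -1},
}


def stoichiometric_matrix(rxns):
    """Build S with one row per metabolite and one column per reaction."""
    metabolites = sorted({m for coeffs in rxns.values() for m in coeffs})
    names = list(rxns)
    matrix = [[rxns[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix


def is_steady_state(matrix, fluxes):
    """Check the FBA mass-balance constraint S . v = 0 for a flux vector v."""
    return all(sum(s * v for s, v in zip(row, fluxes)) == 0 for row in matrix)


metabolites, names, S = stoichiometric_matrix(reactions)
for metabolite, row in zip(metabolites, S):
    print(metabolite, row)
print(is_steady_state(S, [1, 1, 1, 1]))
```

A real flux-balance analysis would then optimize an objective over the flux vectors that satisfy this constraint, typically with a linear-programming solver.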

The Populated Subsystem

Automatically Compare Metabolic Reconstructions

Find And Suggest Candidate Functions

• Rapidly correct missing annotations

• Add more members to subsystems

• Improves future genome annotations! (especially with new subsystems)
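
One simple way to picture the comparison is as a set difference over the roles of a subsystem. The sketch below assumes each genome's annotations are reduced to a set of role names; the genome names and the `missing_roles` helper are hypothetical.

```python
# Hypothetical role sets for one subsystem in two genomes (names are illustrative).
subsystem_roles = ["Chaperone", "Subunit", "Usher", "Adhesin"]
annotated = {
    "genome_1": {"Chaperone", "Subunit", "Usher", "Adhesin"},
    "genome_2": {"Chaperone", "Subunit", "Adhesin"},
}


def missing_roles(genome, reference):
    """Roles the reference genome has in this subsystem that `genome` lacks."""
    gap = annotated[reference] - annotated[genome]
    return [role for role in subsystem_roles if role in gap]


print(missing_roles("genome_2", "genome_1"))
```

Each role returned is a candidate function to look for in the genome that lacks it.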

The Real Live Test

• 10 genomes submitted on Thursday at 6 pm

• First annotation complete before 8 am Friday

• Remaining annotations completed Friday before noon

• (there were others in the pipeline too!)

Subsystems Coverage

Genome                     Percent of Proteins in Subsystems
Haloferax denitrificans    20%
Haloferax mediterranei     19%
Haloferax sulfurifontis    19%
Haloferax volcanii DS2     19%
Haloarcula sp 33800        19%
Haloarcula sp 33799        18%

Prophages

PHANTOME

Mya Breitbart, Matt Sullivan, Jeff Elhai, Rob Edwards

NSF

Haloferax sulfurifontis prophage

Metagenome Comparisons

Metagenomics RAST has 300 public metagenomes

Compared using tblastx

Human Poop

High Salinity Salterns, San Diego, July 2004

Thanks Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer

[Figure: low salinity salterns and high salinity salterns, July 2004 and Nov 2005]

Free workshops on NMPDR, RAST, mg-RAST, SEED

Contact Leslie McNeil [email protected] or visit http://www.nmpdr.org/

Acknowledgements

Environmental Genomics: Forest Rohwer, Beltran Rodriguez-Mueller

Annotation Servers: Rick Stevens, Ross Overbeek, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke

FIG: Ross Overbeek, Veronika Vonstein, Annotators

Artist: Paula Morris
