The Metagenomics RAST server: Annotation, Analysis, and Comparisons Perfect for Pyrosequencing Rob Edwards Department of Computer Science, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory Roche Life Sciences Workshop, Sept 2008 www.nmpdr.org www.theseed.org

Page 1

The Metagenomics RAST server: Annotation, Analysis, and Comparisons

Perfect for Pyrosequencing

Rob Edwards

Department of Computer Science, San Diego State University

Mathematics and Computer Sciences Division, Argonne National Laboratory

Roche Life Sciences Workshop, Sept 2008

www.nmpdr.org www.theseed.org

Page 2

Outline

• Metagenomics

• Tools for analyzing sequences

• Computational Challenges

• Does it work?

Page 3

How much has been sequenced?

[Figure: number of known sequences (y-axis) by year (x-axis). Annotated milestones: first bacterial genome; 100 bacterial genomes; 1,000 bacterial genomes; environmental sequencing.]

Page 4

How much will be sequenced?

[Figure labels: 100 people; everybody in San Diego; everybody in USA; all cultured bacteria; one genome from every species; most major microbial environments.]

Page 5

Metagenomics (Just sequence it)

200 liters water, 5-500 g fresh fecal matter, 50 g soil

Sequence

Epifluorescent Microscopy

Concentrate and purify bacteria, viruses, etc

Extract nucleic acids

Publish papers

Page 6

Modern Metagenomics

Marine: near-shore water (~100 samples), off-shore water (~50 samples), near- and off-shore sediments

Metazoan-associated: corals, fish, human blood, human stool

Terrestrial/Soil: Terragenomics, Amazon rainforest, Konza prairie, Joshua Tree desert, Air

Freshwater: aquifer, glacial lake

Extreme: hot springs (84°C; 78°C), soda lake (pH 13), solar saltern (>35% salt)

Page 7

The Problem

How do you generate consistent and accurate annotations for metagenomes?

Page 8

The SEED Family

Page 9

Annotations using subsystems

FIG developed the notion of a Subsystem: a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex.

Extended subsystems into FIGfams – protein families that perform the same functions.
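
One way to picture the relationship between roles, subsystems, and protein families is as a small data model. The sketch below is purely illustrative: the class names (`FunctionalRole`, `Subsystem`, `ProteinFamily`), the one-role-per-family simplification, and the example identifiers are all assumptions made here, not the SEED's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class FunctionalRole:
    """A single function a protein can perform (hypothetical model)."""
    name: str


@dataclass
class Subsystem:
    """A named collection of functional roles that act together."""
    name: str
    roles: List[FunctionalRole] = field(default_factory=list)


@dataclass
class ProteinFamily:
    """Stand-in for a FIGfam: proteins believed to share one function."""
    family_id: str
    role: FunctionalRole
    members: Set[str] = field(default_factory=set)


def families_in_subsystem(subsystem, families):
    """Return the families whose role belongs to the given subsystem."""
    wanted = set(subsystem.roles)
    return [fam for fam in families if fam.role in wanted]


# Made-up example data, for illustration only.
role_a = FunctionalRole("example role A")
role_b = FunctionalRole("example role B")
role_c = FunctionalRole("example role C")
example_subsystem = Subsystem("example subsystem", [role_a, role_b])
example_families = [
    ProteinFamily("FAM-1", role_a, {"protein-1", "protein-2"}),
    ProteinFamily("FAM-2", role_c, {"protein-3"}),
]
matched = families_in_subsystem(example_subsystem, example_families)
```

Here only the hypothetical family `FAM-1` is matched, because its role is one of the roles listed in the example subsystem.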

Page 10

Annotation of Complete Genomes

• Automated, user-originated processing

• Takes 1-7 hours depending on size and complexity of the genome

• ~2,000 external submissions, including hundreds of genomes not yet publicly released.

• Reannotation of >500 genomes complete

• 1,000 users, 200 organizations, 25 countries.

http://rast.nmpdr.org/

Page 11

The metagenomics RAST server

Page 12

Automated Processing

Page 13

Summary View

Page 14

Metagenomics Tools: Annotation & Subsystems

Page 15

Metagenomics Tools: Annotation & KEGG maps

Page 16

Metagenomics Tools: Recruitment Plots

Page 17

Metagenomics Tools: Phylogenetic Reconstruction

Page 18

Metagenomics Tools: Comparative Tools

Page 19

Computational Requirements

~19 hours of compute per input megabyte

[Figure: hours of compute time (y-axis) against input size in MB (x-axis).]
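
The rate on this slide implies a simple linear estimate. A minimal sketch, assuming compute time scales linearly with input size at the quoted ~19 hours per megabyte; the function name and the 10 MB example are illustrative and not part of mg-RAST:

```python
HOURS_PER_MB = 19.0  # approximate rate quoted on the slide


def estimate_compute_hours(input_mb, hours_per_mb=HOURS_PER_MB):
    """Estimate single-CPU compute hours for an input of the given size.

    Assumes purely linear scaling, which is a simplification.
    """
    if input_mb < 0:
        raise ValueError("input size must be non-negative")
    return input_mb * hours_per_mb


hours_for_10_mb = estimate_compute_hours(10)  # a hypothetical 10 MB upload
```

Under that assumption, a 10 MB input would need about 190 hours on a single CPU.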

Page 20

How much so far?

986 metagenomes

79,417,238 sequences

17,306,834,870 bp (17 Gbp)

Average: ~15-20 Mbp per metagenome

Compute time (on a single CPU):

328,814 hours = 13,700 days = 38 years

~300 GS20, ~300 FLX, ~300 Sanger
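
The totals above can be checked with a few lines of arithmetic. The sketch below only converts units and takes averages of the numbers on the slide; the variable names are illustrative, and treating one megabase pair as roughly one megabyte of input is an assumption.

```python
# Totals as reported on the slide.
METAGENOMES = 986
SEQUENCES = 79_417_238
TOTAL_BP = 17_306_834_870
CPU_HOURS = 328_814

cpu_days = CPU_HOURS / 24            # hours -> days
cpu_years = cpu_days / 365           # days -> years (365-day year)
mean_bp_per_metagenome = TOTAL_BP / METAGENOMES
mean_read_length_bp = TOTAL_BP / SEQUENCES
# Implied rate, assuming 1 Mbp of sequence is about 1 MB of input.
hours_per_mbp = CPU_HOURS / (TOTAL_BP / 1_000_000)

print(f"{cpu_days:,.0f} days, {cpu_years:.1f} years on one CPU")
print(f"{mean_bp_per_metagenome / 1e6:.1f} Mbp per metagenome on average")
print(f"{mean_read_length_bp:.0f} bp mean sequence length")
print(f"{hours_per_mbp:.1f} compute hours per Mbp")
```

The result is about 13,700 days (roughly 37.5 years), an average of about 17.6 Mbp per metagenome, and an implied rate of about 19 hours per Mbp, which is consistent with the per-megabyte figure on the previous slide.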

Page 21

Lots of sequences

all pyrosequencing

Page 22

Metagenomics Tools: Functional Heat Maps

Page 23

From Sequences To Environments

[Figure: plot with axes labeled CDA 60.2% and CDA 21.7%. Functional category labels: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA. Environment labels: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater.]

Dinsdale et al., Nature 2008

Page 24

Workshops

Free workshops on NMPDR, RAST, mg-RAST, SEED

Contact Leslie McNeil [email protected]

or visit http://www.nmpdr.org/

Page 25

Acknowledgements

Environmental Genomics: Forest Rohwer; all the labs that provided sequence

Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke

Statistics & Web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat

FIG: Ross Overbeek, Veronika Vonstein, Annotators

Artist: Paula Morris

Argonne Sequencing: Marc Domanus, Areej Ammar

Page 26

Artist's impression: not all machines are known to explode

Page 27

Terragenomics

Page 28

Differences between soil samples