C A M E R A A Metagenomics Resource for Microbial Ecology
description
Transcript of C A M E R A A Metagenomics Resource for Microbial Ecology
![Page 1: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/1.jpg)
C A M E R AA Metagenomics Resource
for Microbial Ecology
Saul A. KravitzJ. Craig Venter InstituteRockville, Maryland USA
KNAW ColloquiumMay 29, 2008
![Page 2: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/2.jpg)
Goals
• Introduce you to CAMERA
• Encourage you to use CAMERA
• What can CAMERA do for you?
![Page 3: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/3.jpg)
Presentation Outline
• Introduction to Metagenomics
• Global Ocean Sampling (GOS) Expedition
• CAMERA Capabilities and Features- Compute Resources
- Data Resources
- Tools Resources
• Looking Forward
![Page 4: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/4.jpg)
• Within an environment- What biological functions are present (absent)?
- What organisms are present (absent)
• Compare data from (dis)similar environments- What are the fundamental rules of microbial ecology
• Adapting to environmental conditions?- How?
- Evidence and mechanisms for lateral transfer
• Search for novel proteins and protein families
- And diversity within known families
Metagenomic Questions
![Page 5: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/5.jpg)
• Genomics – ‘Old School’- Study of a single organism's genome - Genome sequence determined using shotgun
sequencing and assembly- >1300 microbes sequenced, first in 1995
- DNA usually obtained from pure cultures (<1%) • Metagenomics
- Application of genome sequencing methods to environmental samples (no culturing)
- Environmental shotgun sequencing is the most widely used approach
- Environmental Metadata provides key context
Genomics vs Metagenomics
![Page 6: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/6.jpg)
Complexity of Microbial Communities
• Simple (e.g., AMD, gutless worm)- Few species present (<10)
- Diverse
Variations on standard genomics techniques
• Complex (e.g., Soil or Marine)- Many species present (>10, often >1000)
- Many closely related
New techniques
![Page 7: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/7.jpg)
Global Ocean Sampling Expedition
![Page 8: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/8.jpg)
Global Ocean Sampling (GOS)• 178 Total Sampling Locations
- Phase 1: 7.7M reads, >6M proteins 3/07- Phase 2-IO: 2.2M reads 3/08- Phase 2: ~10M reads future
• Diverse Environments- Open ocean, estuary, embayment, upwelling, fringing reef, atoll…
3/08
3/07
4/04
![Page 9: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/9.jpg)
• Most sequence reads are unique- Very limited assembly- Most sequences not taxonomically anchored- Relating shotgun data to reference genomes- Annotation challenging
• New Techniques Needed- Fragment Recruitment- Extreme Assembly to find pan genomes- Sample to Sample Comparisons
GOS: Sequence Diversity in the Ocean
Rusch et al (PLoS 2007)
![Page 10: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/10.jpg)
Comparing of Dominant Ribotypes
![Page 11: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/11.jpg)
Comparison of Total Genomic Content
![Page 12: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/12.jpg)
• Novel clustering process• Sequence similarity based
• Predict proteins and group into related clusters
• Include GOS and all known proteins
• Findings• GOS proteins
• cover ~all existing prokaryotic families
• expands diversity of known protein families
• ~10% of large clusters are novel
• Many are of viral origin
• No saturation in the rate of novel protein family discovery
GOS Protein Analysis Yooseph et al (PLoS 2007)
![Page 13: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/13.jpg)
Rubisco homologs
Added Protein Family Diversity
Yooseph et al (PLoS 2007)
New Groups
GOS prokaryotes
Known eukaryotes
Known prokaryotes
![Page 14: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/14.jpg)
• Study of dsDNA viruses from shotgun data- 155k viral proteins identified from 37 GOS I sites (~2.5%)
- 59% of viral sequences were bacteriophage
• Viral acquisition and retention of host metabolic genes is common and widespread- Viruses have made these genes “their own”- Clade tightly with viral genes
• Codistribution of P-SSM4-like cyanophage and the dominant ecotype of Prochlorococcus in GOS samples.
GOS Viral Analysis(Williamson et al PLoSOne 2008)
![Page 15: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/15.jpg)
Viral acquisition of host genestalC Gene
GOS Viral
Public Viral
GOS Bacterial
Public Bacterial
Public Euk
![Page 16: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/16.jpg)
Reference Genomes
• Overview- 150+ reference marine microbes (101 released)
- Scaffold for GOS
- Sequenced, assembled, autoannotated
• Isolation Metadata- Incomplete
• Bottlenecks- Availability of DNA
- Purity of DNA
• Status and Data- https://research.venterinstitute.org/moore/
![Page 17: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/17.jpg)
• Significant investment in sequencing- Only accessible to bioinformatics elite
- Diversity of user sophistication and needs
• Bioinformatics and Computation Challenges- Assembly, annotation, comparative analysis, visualization
- Dedicated compute resources
• Importance of Metadata- Metadata required for environmental analysis
- Need to drive standards
• Compliance with Convention on Biodiversity
Motivations for CAMERA
![Page 18: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/18.jpg)
Convention on Biological Diversity
• Sample in territorial waters?- Country granted certain rights by CBD
- Sampling agreements may contain restrictions
• CAMERA users must acknowledge potential restrictions on commercial data use
• CAMERA maintains mapping of country-of-origin for all data objects
![Page 19: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/19.jpg)
CAMERA – http://camera.calit2.net
• “Convenient acronym for cumbersome name…”- Henry Nichols, PLoS Biology
• Mission- Enable Research in Marine Microbiology
• Debuted March 2007
![Page 20: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/20.jpg)
CAMERA Capabilities
• Compute Resources- 512 node compute grid + 200 Tb storage
• Data and Metadata Resources- Annotated Metagenomic and genomic data
• Tools Resources- Scalable BLAST
- Fragment Recruitment
- Metagenomic Annotation
- Text Search
![Page 21: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/21.jpg)
512 Processors ~5 Teraflops
~ 200 Terabytes Storage
CAMRA Compute and Storage Complexat UCSD/Calit2
Source: Larry Smarr, Calit2
![Page 22: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/22.jpg)
CAMERA Metagenomic Data Volume
by Project
![Page 23: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/23.jpg)
CAMERA Metagenomic Samples
![Page 24: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/24.jpg)
CAMERA Users>2000 Registered Since March
2007
![Page 25: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/25.jpg)
• Metagenomic Sequence Collection- Reads and assemblies w/associated metadata
- CAMERA-computed annotation
• Protein Clusters- Maintaining clusters from Yooseph et al (Yooseph and Li, ’08)
• Genomic Data- Viral, Fungal, pico-Eukaryotes, Microbial- Moore Marine Genomes with Metadata
• Non-redundant sequence Collection- Genbank, Refseq, Uniprot/Swissprot, PDB etc
CAMERA Data Collections
![Page 26: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/26.jpg)
• Genome Standards Consortium- Led by Dawn Field, NIEeS
- Members from EU, UK, US
• Goals are to promote- Standardization of genomic descriptions
- Exchange & Integration of genomic data
• Metadata standardization key enabler- MIMS: Min Info for Metagenomic Sample
- GCDML: Standard format
Standardizing Contextual Metadata
![Page 27: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/27.jpg)
Contextual Metadata Challenges
• Researchers Need to Collect and Submit
• Relevant metadata depends on study – MIMS- Specification of minimum metadata
• Standardize Exchange Format - GCDML
- Comprehensive and extensible
- Leverages Existing Ontologies, Validatable
And…
- Easy for a scientist to use...
• Need ongoing software support for tools
![Page 28: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/28.jpg)
CAMERA Core Metadata by Project
• Defacto Core•Lattitude and Longitude•Collection date•Habitat and Geographic Location
• Missing metadata =
![Page 29: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/29.jpg)
CAMERA Contextual Metadata
![Page 30: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/30.jpg)
CAMERA 1.3
http://camera.calit2.net
![Page 31: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/31.jpg)
Scalable BLAST with Metadata
• Large searches permitted and encouraged
• 454 FLX run vs “All Metagenomic”
• Some larger tblastx jobs have run >20 hrs
• 10kbp BLASTN vs All Metagenomic – 1 min
• BLAST XML or Tabular Export
• Searches against NRAA
• BLAST XML output feeds MEGAN
• Searches against ‘All Metagenomic’
• GUI with metdata
• Tabular with metadata
![Page 32: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/32.jpg)
Scalable BLAST with Metadata
![Page 33: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/33.jpg)
Integration of Metadata and Data
![Page 34: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/34.jpg)
Browsing Large Data Collections: Fragment Recruitment Viewer
• Microbial Communities vs Reference Genomes- Millions of sequence reads vs Thousands of genomes
• Definition: A read is recruited to a sequence if:- End-to-end blastN alignment exists
• Rapid Hypothesis Generation and Exploration- How do cultured and wildtype genomes differ?
- Insertions, deletion, translocations
- Correlation with environmental factors
• Export sequence and annotation• Credits: Doug Rusch and Michael Press
![Page 35: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/35.jpg)
Fragment Recruitment ViewerS
eque
nce
Sim
ilarit
y
Genomic Position
Doug Rusch, JCVI
![Page 36: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/36.jpg)
Seq
uenc
e S
imila
rity
Genomic Position
Annotation
Geographic Legend
![Page 37: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/37.jpg)
![Page 38: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/38.jpg)
![Page 39: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/39.jpg)
![Page 40: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/40.jpg)
Prochlorococcus marinus str. MIT 9312
• Coloring by geography • 80-95% identity cloud • = GOS Indian Ocean• Regions with no coverage
• Where?• Real?
![Page 41: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/41.jpg)
Mate Status Highlights Differences
• Paired end (mate) sequencing • Coloring by mate status• Highlights cultured vs metagenomic differences • Selective display of
- Mates by status- Reads by sample
![Page 42: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/42.jpg)
Mate Pairs Highlight Variation
![Page 43: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/43.jpg)
What Genes are Involved
![Page 44: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/44.jpg)
View
by
Sample
![Page 45: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/45.jpg)
View by Sample
Filter by mate status
![Page 46: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/46.jpg)
Annotation ofEnvironmental Shotgun Data
• Gene Finding- Using Yooseph’s Protein Clusters, and/or- Metagene
• Functional Assignment- Variation of JCVI prok annotation pipeline*- Leverages protein cluster annotation -- soon
• Quality Nearly Comparable to Prokaryotic Genomic Annotation
![Page 47: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/47.jpg)
Protein Clusters as Gene Finder
• Identification and soft mask of ncRNAs• Naïve identification of ORFs (60aa min)• Add peptides to clusters incrementally
- Yooseph and Li, 2008
• Predicted Genes based on ORFS in- Clusters of sufficient size- Clusters that satisfy additional filters
![Page 48: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/48.jpg)
Protein ClustersAdvantages and Disadvantages
• Weaknesses- Homology-based- Stateful (also a strength)- Less sensitive (for now)
• Strengths- More specific- Transitive Annotation- Learns over time- Easy to maintain
![Page 49: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/49.jpg)
Search for Dehalogenase
![Page 50: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/50.jpg)
Browse Clusters
![Page 51: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/51.jpg)
Near Future
• More extensive data collection
• Summary views of data sets by- Annotation
- Samples
- Mate Status
- Taxonomy
- Habitat and other contextual metadata
• 16S datasets?
![Page 52: C A M E R A A Metagenomics Resource for Microbial Ecology](https://reader036.fdocuments.us/reader036/viewer/2022070418/56815969550346895dc6a832/html5/thumbnails/52.jpg)
Credits• JCVI CAMERA Team
- Leonid Kagan, Michael Press, Todd Safford, Cristian Goina, Qi Yang, Sean Murphy, Jeff Hoover, Tanja Davidsen, Ramana Madupu, Sree Nampally, Nikhat Zhafar, Prateek Kumar
- Doug Rusch, Shibu Yooseph, Aaron Halpern*, Granger Sutton, Shannon Williamson
- Marv Frazier and Bob Friedman
• Calit2 CAMERA Team- Adam Brust, Michael Chiu, Brian Fox, Adam Dunne, Kayo
Arima
- Larry Smarr and Paul Gilna
http://camera.calit2.net