SGM Meeting, Warwick, April 2006



Challenges for metagenomic data analysis and lessons from viral metagenomes
[What would you do if sequencing were free?]
Rob Edwards
San Diego State University
Fellowship for Interpretation of Genomes

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

This is all 454 sequence data
21 libraries: 10 microbial, 11 phage
597,340,328 bp total
20% of the human genome
50% of all complete and partial microbial genomes
5,769,035 sequences
Average 274,716 per library
Average read length bp
Av. read length has not increased in 7 months
Cost 0.04 per bp

Sequencing is cheap and easy. Bioinformatics is neither.

The Soudan Mine, Minnesota
Red Stuff: Oxidized
Black Stuff: Reduced

Red and Black Samples Are Different

Cloned and 454 sequenced 16S are indistinguishable
(figure labels: Black stuff, Red, Cloned)

There are different amounts of metabolism in each environment
There are different amounts of substrates in each environment
(figure labels: Red Stuff, Black Stuff)

But are the differences significant?
Sample 10,000 proteins from site 1
Count frequency of each subsystem
Repeat 20,000 times
Repeat for sample 2
Combine both samples
Sample 10,000 proteins 20,000 times
Build 95% CI
Compare medians from sites 1 and 2 with 95% CI
Rodriguez-Brito (2006). BMC Bioinformatics

Subsystem differences & metabolism
Iron acquisition
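The resampling steps listed on the "But are the differences significant?" slide can be sketched in a few lines of Python. This is one reading of the slide, not the published implementation from Rodriguez-Brito (2006). It assumes that sampling is done with replacement, that each protein is represented by the name of its subsystem, and that a subsystem is flagged when either site's median count falls outside the pooled 95% interval. The function names (`resample_counts`, `percentile_interval`, `compare_sites`) and the toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(proteins, n_draw, n_rep, rng):
    """Draw n_draw proteins (with replacement) n_rep times.

    Returns {subsystem: [count in draw 1, count in draw 2, ...]}.
    """
    subsystems = sorted(set(proteins))
    counts = {s: [] for s in subsystems}
    for _ in range(n_rep):
        tally = Counter(rng.choices(proteins, k=n_draw))
        for s in subsystems:
            counts[s].append(tally.get(s, 0))
    return counts


def percentile_interval(values, level=0.95):
    """Simple percentile interval covering `level` of the resampled values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int(round((1.0 - tail) * last))]


def compare_sites(site1, site2, n_draw, n_rep, seed=0):
    """Compare per-site median subsystem counts with a pooled 95% interval."""
    rng = random.Random(seed)
    med1 = {s: median(c) for s, c in resample_counts(site1, n_draw, n_rep, rng).items()}
    med2 = {s: median(c) for s, c in resample_counts(site2, n_draw, n_rep, rng).items()}
    pooled = resample_counts(site1 + site2, n_draw, n_rep, rng)

    result = {}
    for s, c in pooled.items():
        low, high = percentile_interval(c)
        m1, m2 = med1.get(s, 0), med2.get(s, 0)
        result[s] = {
            "median_site1": m1,
            "median_site2": m2,
            "pooled_ci": (low, high),
            # Flag the subsystem if either median leaves the pooled interval.
            "differs": not (low <= m1 <= high and low <= m2 <= high),
        }
    return result


if __name__ == "__main__":
    # Toy data only: each entry is the subsystem assigned to one protein.
    red = ["iron acquisition"] * 200 + ["other"] * 800
    black = ["iron acquisition"] * 800 + ["other"] * 200
    for name, row in compare_sites(red, black, n_draw=200, n_rep=300).items():
        print(name, row)
```

The slide uses 10,000 proteins per draw and 20,000 repeats; the toy call above uses much smaller values because a pure-Python loop at the slide's scale would be slow.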
Black Stuff
Siderophore enterobactin biosynthesis
ferric enterobactin transport
ABC transporter ferrichrome
ABC transporter heme
Black stuff: ferrous iron (Fe2+, ferroan [(Mg,Fe)6(Si,Al)4O10(OH)8])
Red stuff: ferric iron (goethite [FeO(OH)])

Nitrification differentiates the samples
Edwards (2006) BMC Genomics

The challenge is explaining the differences between samples
Red Sample
Arg, Trp, His
Ubiquinone
FA oxidation
Chemotaxis, Flagella
Methylglyoxal metabolism

Black Sample
Ile, Leu, Val
Siderophores
Glycerolipids
NiFe hydrogenase
Phenylpropionate degradation

We can cheaply compare the important biochemistry happening in different environments
We don't care which organisms are doing the metabolism but we know what organisms are there

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Why Phages?
Phages are viruses that infect bacteria
10:1 ratio of phages:bacteria
10^31 phages on the planet
Specific interactions
(probably) one virus : one host
Small genome size
Higher coverage
Horizontal gene transfer
bp DNA per year in the oceans
Can't do fosmids

Phages In The World's Oceans
GOM: 41 samples, 13 sites, 5 years
SAR: 1 sample, 1 site, 1 year
BBC: 85 samples, 38 sites, 8 years
ARC: 56 samples, 16 sites
LI: 4 sites

Most Marine Phage Sequences are Novel

Phages are specific to environments
Phage Proteomic Tree v. 5 (Edwards, Rohwer)
(figure labels: ssDNA, -like, T4-like, T7-like)
Thanks: Mya Breitbart

Marine Single-Stranded DNA Viruses
6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)
40% of viral particles in SAR are ssDNA phage
Several full-genome sequences were recovered via de novo assembly of these fragments
Confirmed by PCR and sequencing

SAR Aligned Against the Chlamydia phi 4
(figure labels: Individual sequence reads, Coverage, Concatenated hits, Chlamydia phi 4 genome)
12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Phages, Reefs, and Human Disturbance
(figure labels: Kingman, Christmas)

The Northern Line Islands Expedition, 2005
(map labels: Kingman, Palmyra, Washington, Fanning, Christmas)

Christmas to Kingman Bias in No. Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman
More pathogens at Christmas.
More people at Christmas.
More photosynthesis at Kingman.
No people at Kingman.

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Phages enrich for important genes
Rios Mesquites Stromatolites: No photosynthesis genes in phages
Pozas Azules Stromatolites: 5 different photosynthesis genes in phages

RNR is the most successful reaction in evolution

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Computational Challenges
Sequence annotations and analysis
What is there? What is it doing? How is it doing it?
Gene predictions in unknowns
Lutz Krause (Bielefeld)
Sequence comparisons
BLAST
Other ways to rapidly compare short sequences
What happens when everyone is using 454 sequencing?

Sequence data from 21 libraries
600 million bp
6 million sequences
Each BLASTX search takes 1,000 CPU hours
21 libraries = 21,000 CPU hours or 2.4 CPU years
Users want repeat runs, TBLASTX, more analysis, more data, more, more, more, more

SDSU: Forest Rohwer, Beltran Rodriguez-Brito
USF: Mya Breitbart
Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
Stromatolites: Janet Seifert (Rice University), Valeria Souza (UNAM, Mexico)
ANL: Rick Stevens, Bob Olsen, CI Support
FIG: Veronika Vonstein, Ross Overbeek, Annotators
Also at SDSU: Anca Segall, Stanley Maloy
Math: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller
UBC: Curtis Suttle, Amy Chan
MIT: Ed DeLong