Big Data Field Museum


Date posted: 16-Jul-2015

Riding the big data tidal wave of modern microbiology
Adina Howe
Argonne National Laboratory / Michigan State University
Iowa State University, Ag & Biosystems Engr (January)

Thank Beckett. Journey with big data.

Understanding community dynamics
Who is there?
What are they doing?
How are they doing it?

The questions we have in understanding microbes have not changed much.

Kim Lewis, 2010

Historically, we have been asking these questions in model organisms. The challenge of model organisms: comparing them to what we know is in the environment.

Gene / Genome Sequencing
Collect samples
Extract DNA
Sequence DNA
Analyze DNA to identify its content and origin

Taxonomy (e.g., pathogenic E. coli)
Function (e.g., degrades cellulose)

The first automated DNA sequencing machines arrived in the late 80s: a new way of asking questions.

Effects of low cost sequencing

1995: First free-living bacterium sequenced for billions of dollars and years of analysis

A personal genome can be mapped in a few days for hundreds to a few thousand dollars.

Highlighted in recent news.

The experimental continuum

Single Isolate
Pure Culture
Enrichment
Mixed Cultures
Natural systems

Opportunities and changes in the systems we study. So the question is not only who is there and what are they doing, but also what are they doing together, and how?

The era of big data in biology
Stein, Genome Biology, 2010

Computational Hardware (doubling time 14 months)

Sanger Sequencing (doubling time 19 months)

NGS (Shotgun) Sequencing (doubling time 5 months)
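
To get a feel for what these doubling times imply: with a doubling time of T months, a quantity grows by a factor of 2^(t/T) over t months. The sketch below is only an illustration of that arithmetic. The doubling times are the ones on the slide; the five-year window and the helper name `growth_factor` are assumptions made for the example.

```python
# Illustrative only: doubling times are taken from the slide (Stein, Genome
# Biology, 2010); the 60-month window is an assumed comparison horizon.

def growth_factor(months: float, doubling_time_months: float) -> float:
    """Return how many times a quantity multiplies over `months`."""
    return 2.0 ** (months / doubling_time_months)


DOUBLING_TIMES = {
    "Computational hardware": 14,
    "Sanger sequencing": 19,
    "NGS (shotgun) sequencing": 5,
}

WINDOW_MONTHS = 60  # assumption: compare growth over five years

if __name__ == "__main__":
    for name, t_double in DOUBLING_TIMES.items():
        factor = growth_factor(WINDOW_MONTHS, t_double)
        print(f"{name}: about {factor:,.0f}x in {WINDOW_MONTHS} months")
```

Over the assumed five years, a 5-month doubling time compounds to about 4,096-fold, while a 14-month doubling time compounds to roughly 20-fold. That widening gap between data and compute is what the figure points at.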

[Figure: disk storage (Mb/$) and DNA sequencing (Mbp per $) by year, 1990 to 2012, on logarithmic axes]

The growth curves point out the impact of NGS, accompanied by challenges of computation, even of finding storage to put the data on.

Postdoc experience with data

2003-2008: Cumulative sequencing in PhD = 2000 bp
2008-2009: Postdoc Year 1 = 50 Gbp
2009-2010: Postdoc Year 2 = 450 Gbp
= 50 Tbp
= 500 Tbp budgeted

The data during my career really reflects this growth. In the first year of my postdoc, we went from 50 million reads to about 40x that within literally 9 months. Data increased 25 million times (2000 bp to 50 Gbp). Notice the gap from 2010 to 2014: figuring stuff out.

Soil Census to Soil Catalogs: Who is there?

Targeting conserved regions of known genes

Most popular: 16S ribosomal RNA gene, conserved in bacteria and archaea

Who is there: community profiling based on sequence similarity
Must have previous knowledge of genes
Must infer function based on phylogeny (not advised)

TARGETED SEQUENCING STRATEGY

The goal is to understand the communities in the soil. The challenge is that the community in the soil is too large to sample. With targeted approaches, you'll see many microbial soil and environmental studies report data on community membership and structure.

These investigations target the 16S ribosomal RNA gene, which is conserved in bacteria and archaea. Because this gene is conserved, sampling it lets you compare how similar these biomarkers are within a community. Basically, you take each sampled gene sequence and align it against the other sequences you've sampled. From that you can identify a community structure profile that you can then compare between samples.
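
The profiling step described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the pipeline used in the talk: the sequences are made up and assumed to be already aligned to equal length, the identity threshold and the greedy grouping are hypothetical choices, and Bray-Curtis dissimilarity is a commonly used way to compare profiles that the talk itself does not name.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_groups(seqs, representatives, threshold=0.97):
    """Greedily count each sequence under the first representative it matches.

    `representatives` is extended in place when a sequence matches none of
    them, so one list can be shared across samples to keep group ids comparable.
    The default threshold is an assumption, not a value from the talk.
    """
    profile = Counter()
    for s in seqs:
        for i, rep in enumerate(representatives):
            if identity(s, rep) >= threshold:
                profile[i] += 1
                break
        else:  # no representative was similar enough: start a new group
            representatives.append(s)
            profile[len(representatives) - 1] += 1
    return profile


def bray_curtis(p: Counter, q: Counter) -> float:
    """Bray-Curtis dissimilarity between two count profiles (0 = identical, 1 = disjoint)."""
    total = sum(p.values()) + sum(q.values())
    if total == 0:
        return 0.0
    shared = sum(min(p[k], q[k]) for k in set(p) | set(q))
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    # Made-up 20-base "marker gene" reads for two hypothetical samples.
    sample_a = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA", "TTTTGGGGCCCCAAAATTTT"]
    sample_b = ["TTTTGGGGCCCCAAAATTTT", "TTTTGGGGCCCCAAAATTTA", "GGGGGGGGGGAAAAAAAAAA"]
    reps = []
    profile_a = assign_groups(sample_a, reps, threshold=0.9)
    profile_b = assign_groups(sample_b, reps, threshold=0.9)
    print(profile_a, profile_b, bray_curtis(profile_a, profile_b))
```

Sharing one `representatives` list across samples is the design choice that makes the two profiles comparable: group 0 in one sample means the same thing as group 0 in the other.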

You can also compare sampled sequences to previously observed sequences and identify which microorganisms are in your soils and how abundant each one is.

Soil Census to Soil Catalogs: Who is there?

$15 / sample
TARGETED SEQUENCING STRATEGY

Tackling Soil Biodiversity

Source: Chuck Haney
C. Titus Brown, James Tiedje, Qingpeng Zhang, Jason Pell (MSU)
Janet Jansson, Susannah Tringe (JGI)

Soil biodiversity is amazing. The Great Prairie is the world's most fertile soil and an important reference site for the biological basis and ecosystems of soil microbial communities. It sequesters the most carbon, produces a large amount of biomass annually, and is key for biofuels and security. We know little about the who and what in these soils. There is excitement about what we could glean now with the technologies.

THE DIRT ON SOIL

Biodiversity in the dark, Wall et al., Nature Geoscience, 2010

Jeremy Burgress
MAGNIFICENT BIODIVERSITY

Most of us now recognize that microbial communities generally exhibit a high level of diversity, much higher than previously assumed from what classical microscopy and basic culturing techniques revealed.

In soil, even in one gram of soil, there are estimated to be more microbial species than there are stars in the galaxy. We have far to go for any comprehensive characterization of any single soil community. A key question, then, is: why is soil diversity so high?

THE DIRT ON SOIL
SPATIAL HETEROGENEITY

http://www.fao.org/

www.cnr.uidaho.edu

One reason may be that the soil structure provides unique niches that offer a high diversity of food resources.

Its varied structure provides stable, protective, and even ancient environments for microorganisms.

THE DIRT ON SOIL
DYNAMIC

Soil investigations are further complicated by the primarily dormant state of the large majority of the soil microbial population. The turnover rate of soil microbes is predicted to be over 30-fold, and even up to 300-fold, slower than that of microbes in the oceans.

These microbes also live with relatively unpredictable patterns of perturbation, for example rainfall or leaf litter introduction. They undergo defined temporal perturbations as well, such as diurnal energy input.

THE DIRT ON SOIL
INTERACTIONS: BIOTIC, ABIOTIC, ABOVE, BELOW, SCALES

Philippot, 2013, Nature Reviews Microbiology

This complexity in the soil has formed a dynamic microbial ecosystem which interacts with nutrients, plants, and the soil structure itself at multiple scales.

I would argue that we as a field are still trying to find tractable methods of accessing these interactions and understanding the drivers of healthy or productive soils.

Our shared challenges

Climate Change
Energy Supply

USGCRP 2009

www.alutiiq.com
http://guardianlv.com/

Human Health
An understanding of microbial ecology

There are several grand challenges that our society is currently facing which I think are of paramount importance: predicting and managing the impacts of climate change, finding sustainable sources of liquid fuels, and understanding the emerging pandemics facing human health in recent years. From carbon emissions from land use (which are magnitudes greater than car emissions), to degrading cellulosic biomass, to pathogens in our bodies, microbes are involved in complex communities that drive the health and productivity of either our natural resources or our own bodies. And it's building up the expertise to ask

SOIL MICROBIOLOGY: CARBON REGULATION

Anthropogenic CO2 production is only 10% of that of the soil.
Sustainable agriculture permits carbon sequestration in the range of 0.3 to 1 ton of C/(ha·yr), ~10% of all carbon emitted by cars.
(Denman et al., 2007; Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change)

For example, microbes in the soil help cycle important nutrients for plants to grow while also impacting global flows of important elements such as carbon and nitrogen.

In fact, when you estimate the CO2 production of soils and compare it to automotive emissions, you'd find that anthropogenic sources of CO2 make up only 10% of that of the soils, which has a lot to do with the underlying microbes. As a consequence, you could capture roughly 10% of all carbon emitted by cars just by employing sustainable agriculture practices.

Understanding these processes in the soil can help us learn both how to predict the impacts of our land management strategies on climate change and how best to manage our limited soil resources.

Tackling Soil Biodiversity

Source: Chuck Haney
C. Titus Brown, James Tiedje, Qingpeng Zhang, Jason Pell (MSU)
Janet Jansson, Susannah Tringe (JGI)

Soil biodiversity is amazing. The Great Prairie is the world's most fertile soil and an important reference site for the biological basis and ecosystems of soil microbial communities. It sequesters the most carbon, produces a large amount of biomass annually, and is key for biofuels and security. We know little about the who and what in these soils. Excitement about what we could clean no