Post on 16-Jan-2016

iPlant cyberinfrastructure to support ecological modeling

Presented at the Species Distribution Modeling Group at the American Museum of Natural History

May 30, 2014

Ramona Walls

iPlant Collaborative Vision

Enable life science researchers and educators to use and extend cyberinfrastructure to understand and ultimately predict the complexity of biological systems.

What is cyberinfrastructure?

Data storage, software, HPC, and people

iPlant CI

Hardware: storage and compute

Software: platforms, tools, datasets

People: training, support, expertise

iPlant CI supports synthetic biology

• Genotypic and phylogenetic data: comparative genomics, sequencing & assembly, annotation

• Environmental data: environmental datasets, climate model products

• Phenotypic data: image-based phenotyping, molecular phenotyping, trait data

• Tools for inference: ecological models, crop models, association studies, molecular networks

iPlant is a collaborative virtual organization

iPlant collaborates to enable access to the solutions that work best for you.

OVERVIEW OF IPLANT TOOLS AND SERVICES

iPlant Data Store

Initial 100 GB allocation – TB allocations available

Automatic data backup

Easy upload, download, and sharing

The resources you need to share and manage data with your lab, colleagues and community

Atmosphere: Cloud computing for the life sciences

Simple: One-click access to more than 100 virtual machine images

Flexible: Fully customize your software setup

Powerful: Integrated with iPlant computing and data resources

Discovery Environment: Hundreds of bioinformatics apps in an easy-to-use interface

A platform that can run almost any bioinformatics application

Seamlessly integrated with data and high performance computing

User extensible – add your own applications

Bisque: Image analysis, management, and metadata

Secure image storage, analysis, and data management

Integrate existing algorithms or create new ones

Custom visualization and image handling routines and APIs

Agave API: Fully customize iPlant resources

Science-as-a-service platform

Define your own compute and storage resources (local and iPlant)

Build your own app store of scientific codes and workflows

DNA Subway: Educational workflows for Genomes, DNA Barcoding, RNA-Seq

Commonly used bioinformatics tools in streamlined workflows

Teach important concepts in biology and bioinformatics

Inquiry-based experiments for novel discovery and publication of data

SUPPORT FOR ECOLOGICAL MODELING

Project Goals

• Provide computational support for scalable:
  – modeling of species’ geographic distributions (SDM)
  – mechanistic eco-physiological modeling

Major limitations in the field of ecological modeling

• Access to data (environmental and organismal)

• Access to high performance computing (HPC) tools that can support compute-intensive models

• Model development

iPlant can provide infrastructure to help overcome the first two challenges and partner with the community on the third challenge.


iPlant’s long-term vision for an ecological modeling infrastructure

• Modular access to climate layers
• A query interface for finding and extracting relevant occurrence and trait data for the taxa of interest from iPlant’s Data Commons
• Powerful, flexible modeling tools
• Sophisticated visualization of geospatial data

The initial plan is to provide access to:

• Environmental data
• Organismal locality (occurrence) data
• A high performance computing environment for running models

Environmental Data

• Data layers are often large and difficult to work with, even though the researcher only needs a subset of the layer.

• Web services (e.g., GeoNode.org and GeoServer.org) can be harnessed to allow researchers to work with data layers stored remotely.
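The following is a rough worked example of the size argument, paired with a sketch of a request for a remote subset. It makes several assumptions:

- The layer sits on a 30 arc-second (1/120 degree) global grid.
- The study area is 5 × 5 degrees.
- The server speaks the OGC Web Coverage Service (WCS) 2.0.1 protocol, which GeoServer supports.

The endpoint URL, the coverage ID `climate__bio1`, and the helper names are all hypothetical. The `Lat`/`Long` axis labels are also an assumption, because they depend on the coverage's coordinate reference system.

```python
from urllib.parse import urlencode

def cell_count(west, south, east, north, res_deg):
    """Number of grid cells covering a lat/long box at a given resolution in degrees."""
    cols = round((east - west) / res_deg)
    rows = round((north - south) / res_deg)
    return cols * rows

def wcs_subset_url(base_url, coverage_id, west, south, east, north, fmt="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage request for only one bounding box.

    The Lat/Long axis labels assume an EPSG:4326 coverage; check the
    server's DescribeCoverage response for the labels it actually uses.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
        ("SUBSET", f"Lat({south},{north})"),
        ("SUBSET", f"Long({west},{east})"),
        ("FORMAT", fmt),
    ]
    return f"{base_url}?{urlencode(params)}"

# A 30 arc-second (1/120 degree) global layer versus a 5 x 5 degree study area.
RES = 1 / 120
print(cell_count(-180, -90, 180, 90, RES))           # → 933120000
print(cell_count(-110.0, 30.0, -105.0, 35.0, RES))   # → 360000

# Hypothetical GeoServer endpoint and coverage ID, for illustration only.
print(wcs_subset_url("https://geoserver.example.org/geoserver/wcs",
                     "climate__bio1", -110.0, 30.0, -105.0, 35.0))
```

At 4 bytes per cell, the global grid is roughly 3.7 GB while the study area is about 1.4 MB. That difference is why asking the server for only the subset, instead of downloading the whole layer, matters.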

iPlant will make environmental data layers available through the Data Commons and GeoServer, from sources including:

• University Corporation for Atmospheric Research (UCAR)

• Oak Ridge National Laboratory’s Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC)

• NASA Earth Observing System Data and Information System (EOSDIS)

• High-resolution layers from iPlant collaborators (tentative)

Organismal locality (occurrence) data

• For many modeling efforts, users will supply their own list of species’ localities.

• Through the BIEN3 database, iPlant users will also have access to data for North American plants.
  – These include cleaned-up Global Biodiversity Information Facility (GBIF) data.
  – iPlant will provide a query interface for extracting subsets of the BIEN data for use in ecological modeling.

• Some trait data will also be available.
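A query interface of this kind typically filters records by taxon and region and screens out obviously bad coordinates. The sketch below is not BIEN's cleaning pipeline or iPlant's query service. It assumes a simple CSV of `species,longitude,latitude` rows, and the `Occurrence` type, the function names, and the sample records are all hypothetical. It drops:

- out-of-range coordinates,
- the (0, 0) placeholder that often signals a missing georeference,
- records outside a bounding box,
- exact duplicates.

```python
import csv
import io
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    """One locality record (hypothetical schema)."""
    species: str
    lon: float
    lat: float

def read_localities(text):
    """Parse CSV text with an assumed species,longitude,latitude header."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Occurrence(row["species"].strip(), float(row["longitude"]), float(row["latitude"]))
        for row in reader
    ]

def query_occurrences(records, species, bbox):
    """Cleaned records of one taxon inside bbox = (west, south, east, north)."""
    west, south, east, north = bbox
    seen = set()
    hits = []
    for rec in records:
        if rec.species != species:
            continue
        # Impossible coordinates cannot be mapped at all.
        if not (-180 <= rec.lon <= 180 and -90 <= rec.lat <= 90):
            continue
        # (0, 0) is a frequent placeholder for a missing georeference.
        if rec.lon == 0 and rec.lat == 0:
            continue
        if not (west <= rec.lon <= east and south <= rec.lat <= north):
            continue
        # Keep one record per distinct coordinate pair.
        if (rec.lon, rec.lat) in seen:
            continue
        seen.add((rec.lon, rec.lat))
        hits.append(rec)
    return hits

SAMPLE = """species,longitude,latitude
Quercus alba,-84.5,38.0
Quercus alba,-84.5,38.0
Quercus alba,0,0
Quercus alba,-120.0,45.0
Acer rubrum,-80.0,40.0
"""
hits = query_occurrences(read_localities(SAMPLE), "Quercus alba", (-100.0, 25.0, -60.0, 50.0))
print(len(hits))  # → 1
```

Real cleaning pipelines apply many more checks, such as country-boundary and herbarium-location tests. This sketch only shows the shape of a taxon-plus-region query.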

Modeling tools 1

• Initially, iPlant will make an HPC version of Maxent available to users.

• Investigating the utility of making popular R packages for modeling (biomod2, Maxlike, and IPMpack) available through rPlant and wrapR, so that they can run on HPC resources.
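One common way to use HPC for Maxent is to treat each species as an independent job. The sketch below assumes that pattern, which may differ from how iPlant's HPC version works. It splits a multi-species samples file by species and builds one batch-mode command line per species. Each command uses Maxent's `environmentallayers`, `samplesfile`, `outputdirectory`, `redoifexists`, and `autorun` options. The directory layout, heap size, and function names are hypothetical. The sketch does not write the per-species sample files or submit jobs to a scheduler.

```python
import csv
import io
import shlex
from collections import defaultdict

def split_by_species(samples_csv):
    """Group the rows of a species,longitude,latitude CSV by species name."""
    reader = csv.reader(io.StringIO(samples_csv))
    header = next(reader)
    groups = defaultdict(list)
    for row in reader:
        if row:  # skip blank lines
            groups[row[0]].append(row)
    return header, dict(groups)

def maxent_command(samples_file, layers_dir, out_dir, heap="2g", jar="maxent.jar"):
    """Argument list for one batch-mode Maxent run (paths and heap size are placeholders)."""
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        f"environmentallayers={layers_dir}",
        f"samplesfile={samples_file}",
        f"outputdirectory={out_dir}",
        "redoifexists",  # rerun even if outputs already exist
        "autorun",       # start immediately instead of waiting in the GUI
    ]

def plan_jobs(samples_csv, layers_dir, work_dir):
    """One independent (species, rows, command) job per species, for parallel execution."""
    _, groups = split_by_species(samples_csv)
    jobs = []
    for species, rows in sorted(groups.items()):
        tag = species.replace(" ", "_")
        samples_file = f"{work_dir}/{tag}/samples.csv"
        out_dir = f"{work_dir}/{tag}/output"
        jobs.append((species, rows, maxent_command(samples_file, layers_dir, out_dir)))
    return jobs

SAMPLES = """species,longitude,latitude
Quercus alba,-84.5,38.0
Quercus alba,-83.1,39.2
Acer rubrum,-80.0,40.0
"""
for species, rows, cmd in plan_jobs(SAMPLES, "layers", "runs"):
    print(species, len(rows), shlex.join(cmd))
```

Splitting by species keeps each job small and independent. A batch scheduler can then spread them across nodes without any coordination between runs.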

Modeling tools 2

• More generally, ecological modeling will be supported through an HPC version of Matlab.

• Because of licensing restrictions, users will initially be limited to running Matlab models that they build on their own licensed systems.

• Stan and OpenBUGS are being considered to support Bayesian modeling.


Links

• http://www.iplantcollaborative.org/

• contact: rwalls_at_iplantcollaborative.org

Timeline

• Q3 2014:
  – HPC version of Maxent available through iPlant (DE or Atmosphere)
  – Availability of BIEN occurrence data
  – Scope work on query and subsetting services for data layers*
  – Metadata template for environmental layers

• Q4 2014:
  – HPC version of Matlab for running models
  – Query interface for BIEN occurrence data
  – Continue development of query and subsetting services for data layers*

• Q1 2015:
  – Ability to query environmental layers through the Data Commons*
  – Ability to subset environmental layers through iPlant CI (DE, Atmosphere, or API)
  – Species distribution modeling tutorial

*May happen sooner through GeoNode