iPlant cyberinfrastructure to support ecological modeling. Presented at the Species Distribution Modeling Group at the American Museum of Natural History, May 30, 2014. Ramona Walls


Page 1:

iPlant cyberinfrastructure to support ecological modeling

Presented at the Species Distribution Modeling Group at the American Museum of Natural History

May 30, 2014

Ramona Walls

Page 2:

iPlant Collaborative Vision

Enable life science researchers and educators to use and extend cyberinfrastructure to understand and ultimately predict the complexity of biological systems.

Page 3:

What is cyberinfrastructure?

Data storage, software, HPC, people

iPlant CI

• Hardware: storage and compute

• Software: platforms, tools, datasets

• People: training, support, expertise

Page 4:

iPlant CI supports synthetic biology

Genotypic

Phylogenetic Tools for inference

Ecological Models

Crop Models

Association Studies

Molecular Networks

Environmental

Comparative Genomics

Sequencing & Assembly

Annotation

Environmental datasets

Phenotypic

Image-based Phenotyping

Molecular Phenotyping

Trait Data

Climate model products

Page 5:

iPlant is a collaborative virtual organization

Page 6:

iPlant collaborates to enable access to the solutions that work the best for you.

Page 7:

OVERVIEW OF IPLANT TOOLS AND SERVICES

Page 8:

iPlant Data Store

Initial 100 GB allocation – TB allocations available

Automatic data backup

Easy upload/download and sharing

The resources you need to share and manage data with your lab, colleagues and community

Page 9:

Atmosphere: Cloud computing for the life sciences

Simple: One-click access to more than 100 virtual machine images

Flexible: Fully customize your software setup

Powerful: Integrated with iPlant computing and data resources

Page 10:

Discovery Environment: Hundreds of bioinformatics Apps in an easy-to-use interface

A platform that can run almost any bioinformatics application

Seamlessly integrated with data and high performance computing

User extensible – add your own applications

Page 11:

Bisque: Image analysis, management, and metadata

Secure image storage, analysis, and data management

Integrate existing algorithms or create new ones

Custom visualization and image handling routines and APIs

Page 12:

Agave API: Fully customize iPlant resources

Science-as-a-service platform

Define your own compute and storage resources (local and iPlant)

Build your own app store of scientific codes and workflows

Page 13:

DNA Subway: Educational workflows for Genomes, DNA Barcoding, RNA-Seq

Commonly used bioinformatics tools in streamlined workflows

Teach important concepts in biology and bioinformatics

Inquiry-based experiments for novel discovery and publication of data

Page 14:

SUPPORT FOR ECOLOGICAL MODELING

Page 15:

Project Goals

• Provide computational support for scalable:
  – modeling of species’ geographic distribution (SDM)
  – mechanistic eco-physiological modeling

Page 16:

Major limitations in the field of ecological modeling

• Access to data
  – environmental and organismal

• Access to high performance computing (HPC) tools that can support compute-intensive models

• Model development

iPlant can provide infrastructure to help overcome the first two challenges and partner with the community on the third challenge.


Page 17:

iPlant’s long-term vision for an ecological modeling infrastructure

• Modular access to climate layers

• A query interface for finding and extracting relevant occurrence and trait data for the taxa of interest from iPlant’s Data Commons

• Powerful, flexible modeling tools

• Sophisticated visualization of geospatial data

Page 18:

Initial plan is to provide access to:

• Environmental data

• Organismal locality (occurrence) data

• High performance computing environment for running models

Page 19:

Environmental Data

• Data layers are often large and difficult to work with, even though the researcher only needs a subset of the layer.

• Web services (e.g., GeoNode.org and GeoServer.org) can be harnessed to allow researchers to work with data layers stored remotely.
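As a rough illustration of working with only a subset of a remotely stored layer, the sketch below builds a request URL for the OGC Web Coverage Service (WCS), one of the standard interfaces GeoServer exposes. It only constructs the URL and sends nothing over the network. The host name, the coverage name, and the helper `wcs_subset_url` are hypothetical placeholders, not iPlant endpoints; the talk does not say which service interface iPlant will use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wcs_subset_url(base_url, coverage, bbox, width, height, fmt="GeoTIFF"):
    """Build a WCS 1.0.0 GetCoverage URL for a bounding-box subset.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    width and height are the size of the returned grid in pixels.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("bbox must be (min_lon, min_lat, max_lon, max_lat)")
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "crs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": fmt,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, for illustration only.
url = wcs_subset_url(
    "https://geoserver.example.org/geoserver/wcs",
    "climate:annual_mean_temp",
    (-115.0, 31.0, -109.0, 37.0),
    width=600,
    height=600,
)
print(url)
```

The server would return only the requested window of the layer, so the full dataset never has to be downloaded to the researcher's machine.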

Page 20:

iPlant will make environmental data layers available through the Data Commons and GeoServer

• University Corporation for Atmospheric Research (UCAR)

• Oak Ridge National Laboratory’s Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC)

• NASA Earth Observing System Data and Information System (EOSDIS)

• High-res layers from iPlant collaborators?

Page 21:

Organismal locality (occurrence) data

• For many modeling efforts, users will supply their own list of species’ localities.

• Through the BIEN3 database, iPlant users will also have access to data for North American plants
  – includes cleaned-up Global Biodiversity Information Framework (GBIF) data.
  – iPlant will provide a query interface for extracting subsets of the BIEN data for use in ecological modeling.

• Some trait data will also be available
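To make the idea of extracting a subset of occurrence data concrete, here is a minimal sketch that filters locality records by species and bounding box. It does not reflect the planned iPlant query interface or the BIEN schema; the `Occurrence` record type, the `subset_occurrences` helper, and the coordinates are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One locality record (hypothetical, simplified schema)."""
    species: str
    latitude: float
    longitude: float


def subset_occurrences(records, species, bbox):
    """Return records for one species that fall inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        r for r in records
        if r.species == species
        and min_lat <= r.latitude <= max_lat
        and min_lon <= r.longitude <= max_lon
    ]


# Illustrative records only; these are not real BIEN or GBIF data.
records = [
    Occurrence("Carnegiea gigantea", 32.25, -110.95),
    Occurrence("Carnegiea gigantea", 28.90, -111.50),  # south of the box
    Occurrence("Larrea tridentata", 32.30, -111.00),
]
study_box = (-115.0, 31.0, -109.0, 37.0)

saguaro = subset_occurrences(records, "Carnegiea gigantea", study_box)
print(len(saguaro))  # prints 1
```

A user-supplied locality list could be passed through the same filter, so both data sources feed a model in the same shape.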

Page 22:

Modeling tools 1

• Initially, iPlant will make an HPC version of Maxent available to users.

• iPlant is investigating the utility of making popular R packages for modeling (biomod2, Maxlike, and IPMpack) available through rPlant and wrapR, so that they can run on HPC resources.
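One reason species distribution models suit HPC resources is that runs for different species are independent and can be dispatched as separate jobs. The sketch below only builds one Maxent command line per species file and executes nothing. The flag names (`environmentallayers`, `samplesfile`, `outputdirectory`, `redoifexists`, `autorun`) follow Maxent's documented command-line usage as I understand it and should be checked against the installed version. The directory layout and helper names are hypothetical, and the talk does not describe how iPlant's HPC version will actually be invoked.

```python
from pathlib import Path


def maxent_command(samples_csv, layers_dir, output_dir,
                   jar="maxent.jar", memory_mb=512):
    """Return the argument list for one non-interactive Maxent run."""
    return [
        "java", f"-Xmx{memory_mb}m", "-jar", str(jar),
        f"environmentallayers={layers_dir}",
        f"samplesfile={samples_csv}",
        f"outputdirectory={output_dir}",
        "redoifexists",  # assumed flag: overwrite earlier results
        "autorun",       # assumed flag: start without the GUI prompt
    ]


def batch_commands(sample_files, layers_dir, output_root):
    """Build one independent command per species samples file."""
    commands = []
    for csv in sorted(sample_files):
        # Each species gets its own output directory named after the file.
        out_dir = Path(output_root) / Path(csv).stem
        commands.append(maxent_command(csv, layers_dir, out_dir))
    return commands


# Hypothetical file layout, for illustration only.
commands = batch_commands(
    ["samples/species_b.csv", "samples/species_a.csv"],
    layers_dir="layers",
    output_root="runs",
)
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could then be handed to a job scheduler or to `subprocess.run`, one job per species.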

Page 23:

Modeling tools 2

• More generally, ecological modeling will be supported through an HPC version of Matlab.

• Because of licensing restrictions, users will initially be restricted to running Matlab models that they build on their own licensed system.

• Stan and OpenBUGS are being considered to support Bayesian modeling.


Page 24:

Links

• http://www.iplantcollaborative.org/

• contact: rwalls_at_iplantcollaborative.org

Page 25:

Timeline

• Q3 2014:
  – HPC version of Maxent available through iPlant (DE or Atmosphere)
  – Availability of BIEN occurrence data
  – Scope work on query and subsetting services for data layers*
  – Metadata template for environmental layers

• Q4 2014:
  – HPC version of Matlab for running models
  – Query interface for BIEN occurrence data
  – Continue development of query and subsetting services for data layers*

• Q1 2015:
  – Ability to query environmental layers through Data Commons*
  – Ability to subset environmental layers through iPlant CI (DE, Atmosphere, or API)
  – Species distribution modeling tutorial

*May happen sooner through GeoNode