The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences

Post on 05-Dec-2014


iPlant presentation given at NESCent in June 2012 for the participants of Phylotastic


The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences

Naim Matasci, BIO5 / The iPlant Collaborative

What is iPlant?

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

Problem 1: Data Volume

• Cost of analysis follows Moore's Law:
– 1 student with 1 computer to analyze 1 Mb of data produced in 2001
– 200 students and 200 computers to analyze all data produced for the same cost today (10 Gb)
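The figures on this slide can be turned into a quick back-of-the-envelope calculation. This is only a sketch of the arithmetic, assuming decimal units (1 Gb = 1000 Mb); the variable names are illustrative and not part of the original talk.

```python
# Figures taken from the slide; the unit conversion is an assumption.
MB_PER_GB = 1000  # assumed decimal conversion between Mb and Gb

data_2001_mb = 1                 # data per analysis in 2001 (1 Mb)
data_today_mb = 10 * MB_PER_GB   # data to analyze today (10 Gb)
workstations_2001 = 1            # 1 student with 1 computer
workstations_today = 200         # 200 students with 200 computers

# How much the data volume grew between the two scenarios.
data_growth = data_today_mb / data_2001_mb

# How much data each student/computer pair handles today.
per_workstation_today_mb = data_today_mb / workstations_today

# How much more each pair handles compared with 2001.
per_workstation_gain = per_workstation_today_mb / (data_2001_mb / workstations_2001)

print(f"Data volume grew {data_growth:.0f}-fold")
print(f"Each workstation now handles {per_workstation_today_mb:.0f} Mb "
      f"({per_workstation_gain:.0f}x the 2001 load)")
```

Under these assumptions the data volume grows 10,000-fold while each workstation handles 50 times more data, so the remaining factor of 200 has to come from more people and machines.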

Problem 2: Fragmented Analytical Landscape

1. Tools separated by compute platform, data format, integration issues, and programming model.

2. Mixture of desktop, command line, database, and web-based tools

3. Labor intensive, fragile solutions devised to reach scientific objectives

4. Little ability to share results, analytical methods

5. Lack of reproducibility

Scalability

ABI 3730 DNA Analyzer, Illumina Genome Analyzer, Joe Felsenstein ca. 1980, Ranger Cluster at TACC


Major Ways to Access iPlant

• Storing and sharing data large and small: iPlant Data Storage
• Integrated web-based analysis: The Discovery Environment
• Cloud computing: Atmosphere
• Applications: TNRS, TreeViewer, PhytoBisque, etc.
• Scientific networking, knowledgebase and information exchange: My-Plant.org
• Educational tools: DNASubway
• Embedding iPlant CI capabilities into software: The Foundation API
• High Performance Computing for experts: TeraGrid/XSEDE

Why is the tree of life important?

“Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”

Nothing in biology makes sense except in the light of evolution.

T. G. Dobzhansky

C3 to C4 Photosynthesis

Xin-Guang et al. 2008

"We combined geospatial and molecular sequence data from two public archives to produce a 1,230-taxon phylogeny of the grasses with accompanying climate data for all species, extracted from more than 1.1 million herbarium specimens."

Edwards and Smith, 2010

"Here we show that grasses are ancestrally a warm-adapted clade and that C4 evolution was not correlated with shifts between temperate and tropical biomes. Instead, 18 of 20 inferred C4 origins were correlated with marked reductions in mean annual precipitation."

New Possibilities

Illumina Genome Analyzer, Ranger Cluster at TACC, Acer phylogeny (Ackerly 2009), Green Plant ToL

Just Ask

Atmosphere

iPlant's APIs – The Foundation API

Service Endpoint   Role
IO                 File storage, retrieval and management. Database interoperability
DATA               File format conversion
APPS               Registration and discovery of HPC applications
JOB                Submission and management of compute jobs
SYSTEMS            Availability and info about XSEDE hosts
PROFILE            User profile discovery
AUTH               Token based secure authentication
POSTIT             URL shortener


Consumer Applications

iPlant Data Store

Dramatization: Not the actual iPlant Data Store

Overview of the iPlant Data Store

Some important items we won’t see in the demo

Source            Destination    Copy Method    Time (seconds)
CD                My Computer    cp             320
Berkeley Server   My Computer    scp            150
External Drive    My Computer    cp             36
USB2.0 Flash      My Computer    cp             30
iDS               My Computer    iget           18
My Computer       My Computer    cp             15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley

100GB: 29m15s

1 GB / 17.5 seconds
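The per-gigabyte figure follows directly from the 100 GB timing. A minimal sketch of that arithmetic, assuming 1 GB = 1000 MB for the throughput conversion; the function name is illustrative.

```python
def transfer_rate(gigabytes, seconds):
    """Return (seconds per GB, MB per second) for a transfer.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    seconds_per_gb = seconds / gigabytes
    mb_per_second = gigabytes * 1000 / seconds
    return seconds_per_gb, mb_per_second


# 100 GB in 29 minutes 15 seconds, as reported on the slide.
elapsed_seconds = 29 * 60 + 15  # 1755 seconds
seconds_per_gb, mb_per_second = transfer_rate(100, elapsed_seconds)

print(f"{seconds_per_gb:.2f} s per GB")   # 17.55 s per GB
print(f"{mb_per_second:.1f} MB/s")        # 57.0 MB/s
```

The result, about 17.5 seconds per GB, matches the slide and corresponds to roughly 57 MB/s under the stated unit assumption.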

Tree Visualization

• > 500K Taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata driven display of information

iPlant Tree Viewer

http://portnoy.iplantcollaborative.org/

LIVE TREE VIEW DEMO

Obstacles

Number of taxa

Taxa names

Taxonomic uncertainty

1. Non-existent names
• Misspellings
• Contamination
• Annotations
• Morphospecies
• Digitization issues (frame shifts, character encoding)
• Lexical variants (digitization conventions)

2. Synonymy
• Nomenclatural synonyms
• Taxonomic synonyms / concepts

3. Misidentifications, incomplete identifications

a) Centaurium curvistamineum (Wittr.) Abrams (1951)

b) Centaurium minimum (Howell) Piper (1915)

c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)

d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)

e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)

f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)

g) Erythraea curvistaminea Wittr. (1886)

h) Erythraea minima Howell (1901)

i) Erythraea muhlenbergii Griseb. (1839)

Image: Gordon Leppig & Andrea J. Pickart
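Misspellings, the first obstacle listed above, can often be caught by fuzzy matching against a reference list of names. The sketch below uses Python's standard `difflib` module with the binomials from the list above (authorities and ranks stripped) as the reference. The `suggest_names` function and the 0.85 cutoff are assumptions chosen for illustration; this is not the algorithm used by TNRS.

```python
import difflib

# Binomials taken from the synonym list above, authorities removed.
REFERENCE_NAMES = [
    "Centaurium curvistamineum",
    "Centaurium minimum",
    "Centaurium muhlenbergii",
    "Centaurodes muhlenbergii",
    "Erythraea curvistaminea",
    "Erythraea minima",
    "Erythraea muhlenbergii",
]


def suggest_names(query, reference=REFERENCE_NAMES, cutoff=0.85):
    """Return up to three reference names that closely match the query.

    The query is normalized first: extra whitespace is collapsed, the
    genus is capitalized and the rest is lowercased. The cutoff is an
    illustrative similarity threshold, not a tuned value.
    """
    normalized = " ".join(query.split()).capitalize()
    return difflib.get_close_matches(normalized, reference, n=3, cutoff=cutoff)


print(suggest_names("Centaurium muhlenbergi"))   # missing final "i"
print(suggest_names("erythraea  minma"))         # wrong case, dropped letter
print(suggest_names("Quercus alba"))             # not in the reference list
```

Fuzzy matching only addresses spelling. Deciding which of several correctly spelled synonyms is the accepted name requires taxonomic information that a string comparison cannot supply.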

Request Tool Installation

Apps -> Create -> New App

Create New -> Request Tool Installation

Fill out forms and submit.

Receive response in 2-5 days.