Galaxy

77
Galaxy (http://g2.bx.psu.edu)

description

Title: GalaxyAuthor: James Taylor

Transcript of Galaxy

Page 1: Galaxy

Galaxy

(http://g2.bx.psu.edu)

Page 2: Galaxy

What is Galaxy?

• An open-source framework for integrating various computational tools and databases into a cohesive workspace

• A web-based service we (Penn State) provide, integrating many popular tools and resources for comparative genomics

• A completely self-contained Python application for building your own Galaxy style sites

Page 3: Galaxy

Galaxy’s web user interface

Page 4: Galaxy
Page 5: Galaxy
Page 6: Galaxy
Page 7: Galaxy
Page 8: Galaxy
Page 9: Galaxy
Page 10: Galaxy
Page 11: Galaxy
Page 12: Galaxy
Page 13: Galaxy
Page 14: Galaxy
Page 15: Galaxy
Page 16: Galaxy
Page 17: Galaxy

Integrating tools into Galaxy

Page 18: Galaxy
Page 19: Galaxy
Page 20: Galaxy
Page 21: Galaxy

How Galaxy integrates existing web-based tools

Page 22: Galaxy

Proxy based tools

User makes request to Galaxy

Page 23: Galaxy

Proxy based tools

Galaxy delegates request to external site

Page 24: Galaxy

Proxy based tools

External site generates response

• If data, Galaxy determines type, processes, and adds to ‘history’

• Otherwise, return response to user

Page 25: Galaxy

External tools

User makes request to Galaxy

Page 26: Galaxy

External tools

Galaxy sends user directly to external site with extra URL data

Page 27: Galaxy

External tools

User interacts directly with external site

Page 28: Galaxy

External tools

When data is generated the user is sent back to Galaxy. Data can be fetched immediately, or wait for notification from the external site

Page 29: Galaxy
Page 30: Galaxy
Page 31: Galaxy
Page 32: Galaxy
Page 33: Galaxy
Page 34: Galaxy
Page 35: Galaxy
Page 36: Galaxy

How Galaxy integrates existing command line tools

Page 37: Galaxy
Page 38: Galaxy
Page 39: Galaxy

HTML inputs generated from abstract parameter description

Page 40: Galaxy

HTML inputs generated from abstract parameter description

Page 41: Galaxy

HTML inputs generated from abstract parameter description

Page 42: Galaxy

HTML inputs generated from abstract parameter description

Page 43: Galaxy

Tool help generated from a simple text format

Page 44: Galaxy

Automatic input validation based on type, or more...

Page 45: Galaxy
Page 46: Galaxy

}Template for generating command line from parameter values

Page 47: Galaxy

} Output datasets generated by the tool

Page 48: Galaxy

} Special actions to be run before / after execution

Page 49: Galaxy

Functional tests to be run with the “full stack” in place

Page 50: Galaxy

Running functional tests for a speci!c tool on the command line

Page 51: Galaxy

Test results, on command line and as HTML report

Page 52: Galaxy

Dealing with more complex interface needs

Page 53: Galaxy
Page 54: Galaxy
Page 55: Galaxy

Repeating sets of parameters

Page 56: Galaxy

Template language for building complex command lines

Page 57: Galaxy
Page 58: Galaxy
Page 59: Galaxy
Page 60: Galaxy

Conditional groups, grouping constructs can be nested

Page 61: Galaxy

Command line tool expects a con!guration !le

Page 62: Galaxy

Con!guration !le is generated based on user input

Page 63: Galaxy

Job execution in Galaxy

Page 64: Galaxy

Flexible execution environment

• Dependencies between jobs handled by “JobManager” within Galaxy.

• Either in-process with the web application, or a separate process managing a queue to which multiple front-ends submit

Page 65: Galaxy

Flexible execution environment

• Once jobs are ready, submitted to a “JobRunner”

• Runners are pluggable

• Can have multiple runners, and jobs to di"erent runners depending on capabilities

• Current implementations:

• Local runner executing a limited number of local processes

• PBS runner dispatches to a cluster of worker nodes

• Pluggable queueing policies in the works!

Page 66: Galaxy

Deeper customization of Galaxy

Page 67: Galaxy

Galaxy web interface is easily customized / branded

Page 68: Galaxy

Custom datatypes

• Datatypes supported by a Galaxy instance can be con!gured at runtime

• Completely reengineering “metadata”

• Easy way to de!ne custom metadata

• Automatically generated editing interfaces (similar to tool interfaces)

• Actions on datatypes (displaying at external sites, format conversion) all pluggable

• Nothing “genomics” speci!c will be hardcoded!

Page 69: Galaxy

The future

Page 70: Galaxy

Future tool development

• Tools for statistical genetics

• Collaborating closely with the “RGenetics” project (http://rgenetics.org)

• Tools for phylogenetic analysis

• Based on HyPhy (http://hyphy.org)

Page 71: Galaxy

Work#ow support

• Work#ow construction by example

• Users will continue to build analysis as they do now, and will be able to extraction portions of their histories as reusable work#ows

• Will probably work for most existing histories! (we’ve been saving the right data all along)

• Explicit work#ow construction and editing

• Support for repetitive invocation of tools and work#ows, and aggregation of results

• Saving and sharing of work#ows, reproducible!

Page 72: Galaxy

Some Technical Details

Page 73: Galaxy

Under the hood

• Python 2.4, though some dependencies use CPython speci!c extensions

• Web framework: PythonPaste, Routes, WebHelpers, Beaker, CheetahTemplate, ...

• SQLAlchemy for database abstraction

Page 74: Galaxy

Out of the box con!guration

• Just checkout from subversion and run!

• All dependencies packaged as eggs

• Pure python HTTP server included(paste.httpserver)

• Embedded database (sqlite)

• Datasets stored on local !lesystem

• Jobs run locally

Page 75: Galaxy

PSU production con!guration

• Deployed behind Apache using mod_proxy

• Python threads do not scale across CPUs, we use both forking and threading similar to Apache’s worker MPM

• PostgreSQL

• Jobs dispatched to a PBS cluster using “pbs-python”

Page 76: Galaxy

The core Galaxy development team

Page 77: Galaxy

Acknowledgements

• Galaxy collaborators:

• Ross Lazarus, Sergei Kosakovsky Pond

• UCSC Genome Browser team

• Biomart team

• National Science Foundation