SubmitR
• http://diagrid.org
• Gateway to computing resources
• Based on HUBzero platform
• Hub = Community:
– Wish list
– Forum
– Issue tracker
– Tools
Hub Tools
• GUI programs
• Solve workflow problems
• Can be made easily available to non-Purdue users
• You can create tools
• Easily developed with Rappture
Use SubmitR for...
• Long runner
– Submit and forget
• Parallel execution
– Requires parallel library
• Parameter sweeps
– Including Monte Carlo simulations
Job Types
• Single: one process
• Parallel: multiple processes communicating with each other
• Sweep: many isolated processes
– Different parameter values on the command line, within a data file, or both
SubmitR
[Diagram: "R Script & Files" (uploaded individually or as .tar.gz or .zip) go in to SubmitR; "Results / Output" (results, logs, and uploaded files within a single .zip file) come out.]
Available R Libraries
ElectroGraph GWASExactHW KernSmooth MASS Matrix PBSmapping base boot class cluster codetools compiler cubature datasets deldir foreign grDevices graphics
grid igraph lattice maptools methods mgcv mvtnorm ncf nlme nnet np parallel plotrix plyr qtl raster rgdal rgeos
rpart snow snowfall sp spatial spatstat splancs splines stats stats4 stpp stringr survival tcltk tools utils
• Process for requesting installation of new libraries
• http://diagrid.org
• Questions?
Thank you!
SubmitR
• Introduction
• Brief overview of “SubmitR”
• Disclaimers:
– New to the R language
● Don't be surprised if I'm confused by a question.
– SubmitR at version 1.0
● Open to feedback.
• SubmitR is designed to ease the task of running R scripts on a remote HPC system.
• Accessed by way of DiaGrid.
• Before we get into SubmitR, let's take a very quick look at DiaGrid.
• http://diagrid.org
• Gateway to computing resources
• Based on HUBzero platform
• Hub = Community:
– Wish list
– Forum
– Issue tracker
– Tools
• This site acts as a gateway to computing resources.
• Uses the HUBzero platform.
• Enables a community by providing things like forums and ticketing systems.
• Also provides for tools.
Hub Tools
• GUI programs
• Solve workflow problems
• Can be made easily available to non-Purdue users
• You can create tools
• Easily developed with Rappture
• Tools are applications that run embedded within the hub website itself.
• They can be published to a wide audience of users.
• SubmitR is a hub tool.
Use SubmitR for...
• Long runner
– Submit and forget
• Parallel execution
– Requires parallel library
• Parameter sweeps
– Including Monte Carlo simulations
• Here are some of the use cases we see for SubmitR.
• Scripts with extreme runtimes.
• Scripts that need access to clusters for parallel execution.
• Or, if you want to invoke a large number of runs that sweep over a range of values.
• One thing we've been looking at is Monte Carlo simulations.
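To make the Monte Carlo use case concrete, here is a minimal sketch of the kind of R script that suits many isolated runs. The pi-estimation problem, the function name `estimate_pi`, and the sample counts are illustrative assumptions, not part of SubmitR. Each seed stands in for one independent run.

```r
# Monte Carlo estimate of pi: the fraction of uniform random points in the
# unit square that fall inside the quarter circle approximates pi / 4.
# (estimate_pi is a hypothetical helper written for this illustration.)
estimate_pi <- function(n_samples, seed) {
  set.seed(seed)                  # fixed seed so a run can be reproduced
  x <- runif(n_samples)
  y <- runif(n_samples)
  4 * mean(x^2 + y^2 <= 1)
}

# Each element stands in for one isolated run with its own seed.
seeds <- 1:8
estimates <- vapply(seeds, function(s) estimate_pi(100000, s), numeric(1))

cat("per-run estimates:", round(estimates, 4), "\n")
cat("combined estimate:", round(mean(estimates), 4), "\n")
```

Because the runs share nothing except the script, they do not need to communicate, which is what distinguishes this pattern from a parallel job.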
Job Types
• Single: one process
• Parallel: multiple processes communicating with each other
• Sweep: many isolated processes
– Different parameter values on the command line, within a data file, or both
• And based on that, we split out some different job types:
– A single job for long runners and simple tests.
– A parallel run for processes that need to talk to each other...
● Of course you need to use a supporting parallel library.
– And a parameter sweep job...
● With sweep values that change either on the command line, or within a data file, or both.
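As a rough illustration of what a parameter sweep enumerates, the sketch below builds every combination of two parameters with base R's `expand.grid`. The parameter names and values are hypothetical, and the slides do not describe how SubmitR itself expects sweep ranges to be entered, so treat this only as a picture of the set of runs a sweep produces.

```r
# Hypothetical sweep parameters (not taken from the slides).
sweep <- expand.grid(
  temperature = c(280, 300, 320),
  pressure    = c(1, 2),
  KEEP.OUT.ATTRS = FALSE
)

# One row per isolated run: every temperature paired with every pressure.
print(sweep)

# The command-line arguments each run might receive (format is assumed).
for (i in seq_len(nrow(sweep))) {
  cat(sprintf("run %d: --temperature=%g --pressure=%g\n",
              i, sweep$temperature[i], sweep$pressure[i]))
}
```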
SubmitR
[Diagram: "R Script & Files" (uploaded individually or as .tar.gz or .zip) go in to SubmitR; "Results / Output" (results, logs, and uploaded files within a single .zip file) come out.]
• Here's the basic flow of SubmitR.
• Starting in the lower left...
• Use a browser to upload your R script and any input files.
• SubmitR then uses the hub's “submit” service to execute the script on available DiaGrid resources.
• When it's completed, the output is returned to SubmitR and then you can download it.
• Uploading files:
– Bring in your R script or scripts and any input files you might need.
– Upload files individually or in an archive file (zipped/tarred/gzipped). They are automatically extracted.
– Of course there's a practical limit to how much data you can upload and store.
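If you would rather upload one archive than several files, the bundle can be built from R itself. The sketch below uses `tar()` and `untar()` from the standard utils package. The file names `analysis.R`, `input.csv`, and `upload.tar.gz` are made up for the example, and it works in a temporary directory so nothing else is touched.

```r
# Work in a temporary directory so nothing else is touched.
work_dir <- tempfile("submitr_upload_")
dir.create(work_dir)
old_wd <- setwd(work_dir)

# Hypothetical file names: one R script and one input data file.
writeLines('cat("hello from the script\\n")', "analysis.R")
write.csv(data.frame(x = 1:3, y = c(2.5, 3.1, 4.8)), "input.csv",
          row.names = FALSE)

# Bundle both files into a gzip-compressed tar archive.
tar("upload.tar.gz", files = c("analysis.R", "input.csv"),
    compression = "gzip")

# List the archive contents to confirm what would be uploaded.
archived <- untar("upload.tar.gz", list = TRUE)
print(archived)

setwd(old_wd)
```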
• Setting up the job:
– Select the R script to run and specify any extra R command line options.
– This includes options for R itself and command line arguments that the script might use.
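For the script's side of this, a minimal sketch of reading command line arguments is shown below. It assumes the arguments reach the script through R's standard mechanism, `commandArgs()`; the slides do not say how SubmitR passes them internally. The argument meanings (a sample count and a seed), the defaults, and the helper `parse_args` are hypothetical.

```r
# Parse script arguments, falling back to defaults when none are given.
# The argument meanings (sample count, seed) are hypothetical.
parse_args <- function(args) {
  n_samples <- if (length(args) >= 1) as.integer(args[1]) else 1000L
  seed      <- if (length(args) >= 2) as.integer(args[2]) else 42L
  list(n_samples = n_samples, seed = seed)
}

# trailingOnly = TRUE drops R's own options and keeps only the
# arguments that follow the script name (or --args).
opts <- parse_args(commandArgs(trailingOnly = TRUE))

set.seed(opts$seed)
values <- rnorm(opts$n_samples)
cat(sprintf("n = %d, seed = %d, mean = %.4f\n",
            opts$n_samples, opts$seed, mean(values)))
```

Run locally, this would be invoked as something like `Rscript script.R 5000 7`. Keeping defaults in the script means the same file also works for a quick single-job test with no arguments.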
• Submit the job for execution.
– This sends the job out to queue up for a run.
– The log area indicates what we've done so far.
– The status line is updated periodically, every minute or so.
– It is ok to log off of diagrid.org and check back later. Just make sure to press “Keep for later” (instead of “Terminate”)!
• Review output of completed job
– Standard output is collected and displayed in the output tab.
– If the job had spawned multiple runs, the output of each one would also appear here.
• Download everything in one zip file:
– This includes the redirected standard output as well as any output files and the uploaded R scripts and input files.
– Important! After the files are collected and downloaded, they're all deleted from the server.
– This clears the app so it can be set up for another run.
Available R Libraries
ElectroGraph GWASExactHW KernSmooth MASS Matrix PBSmapping base boot class cluster codetools compiler cubature datasets deldir foreign grDevices graphics
grid igraph lattice maptools methods mgcv mvtnorm ncf nlme nnet np parallel plotrix plyr qtl raster rgdal rgeos
rpart snow snowfall sp spatial spatstat splancs splines stats stats4 stpp stringr survival tcltk tools utils
• Process for requesting installation of new libraries
• (List as of Nov. 2012. More to be added soon.)
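Since a job may wait in a queue before it runs, it can help for a script to check its package dependencies first and report anything missing in one message. The sketch below is one way to do that with base R's `requireNamespace()`. The three package names are just examples taken from the list above, and the variable names are illustrative.

```r
# Packages this (hypothetical) script depends on; all three appear in
# the list above.
needed <- c("MASS", "parallel", "survival")

# requireNamespace() returns FALSE instead of stopping when a package
# is not installed, so every missing package can be reported at once.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
absent <- needed[!installed]

if (length(absent) > 0) {
  # A real script would likely call stop() here rather than continue.
  message("missing packages: ", paste(absent, collapse = ", "))
} else {
  cat("all required packages are available\n")
}
```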