CMTS Cyberinfrastructure - University of Chicago
CMTS Cyberinfrastructure
Cyberinfrastructure Challenges
• Expressing multi-scale workflow protocols
• Leveraging and expressing parallelism
• Moving, processing and managing data
• Traversing networks and security
• Diverse environments: schedulers, storage…
• Handling system failures
• Tracking what was done: where, when, how, who?
Campus-wide clusters and archival storage
Departmental clusters, storage, and personal workstations
Cyberinfrastructure Tasks
§ Enable easy use of multiple distributed large-scale systems
§ Reduce the effort of writing parallel simulation and analysis scripts
§ Record and organize the provenance of computations
§ Annotate datasets to facilitate sharing, collaboration, and validation
§ Reduce the cost of modeling, both in terms of wall clock time and computer time.
Swift Parallel Scripting Language: A Core Component of CMTS Cyberinfrastructure
• Swift is a parallel scripting language
  – Composes applications linked by files
  – Captures protocols and processes as a CMTS library
• Easy to write: a simple, high-level language
  – Small Swift scripts can do large-scale work
• Easy to run: on clusters, clouds, supercomputers, and grids
  – Sends work to XSEDE, campus, and Amazon resources
• Automates solutions to four hard problems
  – Implicit parallelism
  – Transparent execution location
  – Automated failure recovery
  – Provenance tracking
Swift is a parallel scripting language
• Composes applications linked by files
• Easy to write: a simple, high-level language
  – Small Swift scripts can do large-scale work
• Easy to run: on clusters, clouds, and grids
  – Sends work to XSEDE, Amazon, OSG, Cray
• Fast and highly parallel
  – Runs a million tasks on thousands of cores, at hundreds of tasks per second
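A minimal sketch of a complete Swift script in this style (the echo wrapper, file names, and mapper parameters below are illustrative, not part of the CMTS library): an ordinary program is wrapped as an app() function, and a foreach loop invokes it over an array, with each iteration eligible to run in parallel:

type file;

// Wrap an ordinary command-line program as a Swift app() function
app (file out) echo (string s) {
  echo s stdout=@filename(out);
}

string messages[] = ["hello", "parallel", "world"];
file outputs[] <simple_mapper; prefix="msg.", suffix=".txt">;

// Iterations are independent, so Swift dispatches them concurrently
foreach m, i in messages {
  outputs[i] = echo(m);
}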
Swift’s runtime system supports and aggregates diverse, distributed execution environments
Using Swift in the Larger Cyberinfrastructure Landscape
[Diagram: a Swift script runs from a submit host (login node, laptop, or Linux server) and dispatches application programs to campus systems, moving data between local data and data servers.]
Expressing CMTS Algorithms in Swift
Translating from a shell script of 1300 lines into code like this:

(…)EnergyLandscape(…)
{
  loadPDB(pdbcode="3lzt");
  createSystem(LayerOfWater=7.5, IonicStrength=0.15);
  preequilibrate(waterEquilibration_ps=30, totalEquilibration_ps=1000);
  runMD(config="EL1", tinitial=0, tfinal=10, frame_write_interval=1000);
  postprocess(…);
  forcematching(…);
}

…to create a well-structured library of scripts that automate workflow-level parallelism.
CMTS Cyberinfrastructure Architecture
[Architecture diagram]
0: Develop script
1: Run script(EL1.trj)
2: Lookup file (name=EL1.trj, user=Anton, type=trajectory)
3: Transfer inputs from storage locations
4: Run app at compute facilities
5: Transfer results
6: Update catalogs
Researchers and external collaborators work against the CMTS Collaboration Catalogs, which hold provenance, files & metadata, and script libraries.
• A challenging “breakthrough” project
• Integration & evaluation underway
• Full implementation by the end of Phase II
When do you need Swift? Typical application: protein-ligand docking for drug screening
Work of M. Kubal, T. A. Binkowski, and B. Roux
[Diagram (B): O(10) proteins implicated in a disease × O(100K) drug candidates → 1M compute jobs → tens of fruitful candidates for wet lab & APS]
Numerous many-task applications
§ Simulation of super-cooled glass materials
§ Protein folding using homology-free approaches
§ Climate model analysis and decision making in energy policy
§ Simulation of RNA–protein interaction
§ Multiscale subsurface flow modeling
§ Modeling of power grid for OE applications
§ A–E have published science results obtained using Swift
Protein loop modeling (courtesy A. Adhikari): T0623, 25 res., 8.2Å to 6.3Å (excluding tail). [Panels: Initial, Native, Predicted]
Nested parallel prediction loops in Swift
Sweep()
{
  int nSim = 1000;
  int maxRounds = 3;
  Protein pSet[] <ext; exec="Protein.map">;
  float startTemp[] = [100.0, 200.0];
  float delT[] = [1.0, 1.5, 2.0, 5.0, 10.0];
  foreach p, pn in pSet {
    foreach t in startTemp {
      foreach d in delT {
        ItFix(p, nSim, maxRounds, t, d);
      }
    }
  }
}
Sweep();
10 proteins × 1000 simulations × 3 rounds × 2 temps × 5 deltas = 300K tasks
Programming model: all execution driven by parallel data flow
§ f() and g() are computed in parallel
§ myproc() returns r when they are done
§ This parallelism is automatic
§ Works recursively throughout the program’s call graph
(int r) myproc (int i)
{
  j = f(i);
  k = g(i);
  r = j + k;
}
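As a sketch of how this composes (the input array below is hypothetical, not from the slides), each call to myproc() issued from a loop is itself a future, so the parallelism extends through the whole call graph:

int input[] = [1, 2, 3, 4];
int result[];

// Every myproc(x) call starts as soon as x is available;
// inside each call, f(i) and g(i) again run in parallel
foreach x, i in input {
  result[i] = myproc(x);
}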
Programming model: Swift in a nutshell
§ All expressions are computed in parallel, throttled by various levels of the runtime system
§ Simple data type model: scalars (boolean, int, float, string, file, external) and collections (arrays and structs)
§ ALL data atoms and collections have “future” behavior
§ A few simple control constructs
  – if, switch, foreach, iterate
§ A growing library
  – tracef(), strcat(), regexp(), readData(), writeData()
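A small sketch combining these constructs (values.txt and copy.txt are placeholder file names, and the usual readData()/writeData() forms are assumed): readData() fills a Swift array from a file, and tracef() prints each element as its future is set:

file input  <"values.txt">;
file output <"copy.txt">;

// One array element per line of the input file
string lines[] = readData(input);

// Each element is a future; tracef() fires as each value arrives
foreach s in lines {
  tracef("read: %s\n", s);
}

// writeData() materializes the array back into a file
output = writeData(lines);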
Swift 2.0
§ Motivation for 2.0
  – Scalability: 1M tasks/sec vs 500; the goal has been reached for basic tests
  – Richer programming model, broader spectrum of applications
  – Extensibility and maintainability
§ Convergence issues
  – Some array closing; for loop extensions; library cleanup; mapper models
  – Data marshalling/passing for in-memory leaf functions
§ Information and downloads
  – http://exm.xstack.org
Encapsulation enables distributed parallelism
[Diagram: a Swift app() function interface definition wraps an application program; typed Swift data objects correspond to the files expected or produced by the application program.]
Encapsulation is the key to transparent distribution, parallelization, and automatic provenance capture.
app() functions specify command-line argument passing
Swift app function “predict()”

[Diagram: the PSim application takes -t, -d, -s, -c arguments, a FASTA sequence file, and the values 100.0 and 25.0, producing a pdb output (“pg”) and a log.]

To run:
  psim -s 1ubq.fas -pdb p -t 100.0 -d 25.0 > log

In Swift code:
  app (PDB pg, File log) predict (Protein pseq, Float temp, Float dt) {
    psim "-c" "-s" @pseq.fasta "-pdb" @pg "-t" temp "-d" dt;
  }

  Protein p <ext; exec="Pmap", id="1ubq">;
  PDB structure;
  File log;
  (structure, log) = predict(p, 100., 25.);
foreach sim in [1:1000] {
  (structure[sim], log[sim]) = predict(p, 100., 25.);
}
result = analyze(structure);
[Diagram: 1000 runs of the “predict” application feed into Analyze(); example targets T1af7, T1r69, T1b72.]
Large scale parallelization with simple loops
Dataset mapping example: fMRI datasets
type Study {
  Group g[];
}
type Group {
  Subject s[];
}
type Subject {
  Volume anat;
  Run run[];
}
type Run {
  Volume v[];
}
type Volume {
  Image img;
  Header hdr;
}
[Diagram: a mapping function or script, guided by metadata, connects the on-disk data layout to Swift’s in-memory data model.]
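A hedged sketch of using such a mapped structure (the mapper script fmri_mapper.sh is hypothetical, and Image/Header are assumed to be file-backed types): an external mapper populates the nested Study object from the on-disk layout, and nested foreach loops traverse it so every volume can be handled in parallel:

// External mapper script builds the nested structure from disk
Study study <ext; exec="fmri_mapper.sh">;

foreach g in study.g {
  foreach subj in g.s {
    foreach r in subj.run {
      foreach vol in r.v {
        // A per-volume app() call would go here; for illustration,
        // just report which image file was mapped
        tracef("volume: %s\n", @filename(vol.img));
      }
    }
  }
}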
tag /d1/d2/f3 owner=asinitskiy group=cmts-chem \
    create-date=2013.0415 \
    type=trajectory-namd \
    state=published \
    note="trajectory for UCG case 1" \
    molecule=1ubq domain=loop7 bead=u2

tag /d1/d2/f0 owner=jdama group=cmts-chem \
    create-date=2013.0412 \
    type=trajectory-namd \
    state=validated \
    note="trajectory for UCG case 2" \
    molecule=clathrin domain=loop1 bead=px
Metadata
Query ==> (owner=asinitskiy) and (molecule=1ubq)

dataset_id  | name        | value
------------+-------------+------------------------
/d1/d2/f4   | owner       | asinitskiy
            | group       | cmts-chem
            | create-date | 2013.0416.12:20:29.456
            | type        | trajectory-namd
            | state       | published
            | note        | trajectory for case 1
            | molecule    | 2ubq
            | domain      | loop7
            | bead        | dt
Swift information
§ Swift Web:
  – http://www.ci.uchicago.edu/swift or newer: http://swift-lang.org
§ Swift User Guide:
  – http://swift-lang.org/guides/trunk/userguide/userguide.html
Running the CMTS “cyber” tutorials

# pick up latest changes if instructed:
cd /project/cmtsworkshop
cp -rp tutorial/cyber $USER/cyber
cd $USER/cyber
source ./setup.sh

cd basic_swift
cat README
cd part01
cat README
swift p1.swift
# etc etc etc

# start with NAMD
cd cyber/namd_sweep
cat README
source setup.sh    # <== Don’t forget this step!!!
swift rmsd.swift   # etc etc etc
Summary of the CMTS Cyberinfrastructure Approach

§ Streamlining scientist / computer interaction:
  – Large datasets (e.g. MD trajectories), fast distributed storage
  – Chemically important metadata: force field, bound ions, …
  – Save time locating, accessing, moving, and sharing data
  – Scheduling, data management, authentication, site-specific dependencies
  – Automatic recovery of failed computations
  – Save time by load balancing, leveraging more parallel resources, …
  – Benefit: chemists & biologists can focus more on science and less on the mechanics of computation.

§ Facilitating scientific collaboration:
  – Understandable code, flexibility for future development
  – Find provenance and tag data and procedures for reuse and sharing
  – Facilitate collaboration by sharing data, libraries, protocols
  – Wide range of researchers able to use the environment
  – Benefit: cyberinfrastructure enables collaborations not otherwise possible
§ Swift is a parallel scripting system for grids, clouds, and clusters
  – for loosely-coupled applications: application and utility programs linked by exchanging files
§ Swift is easy to write: a simple, high-level, C-like functional language
  – Small Swift scripts can do large-scale work
§ Swift is easy to run: contains all services for running Grid workflow in one Java application
  – Untar and run – acts as a self-contained Grid client
§ Swift is fast: uses the efficient, scalable, and flexible “Karajan” execution engine
  – Scaling close to 1M tasks – 0.5M in live science work, and growing
§ Swift usage is growing:
  – applications in neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, and more
§ Try Swift! www.ci.uchicago.edu/swift and www.mcs.anl.gov/exm
Swift: A language for distributed parallel scripting
Michael Wilde (a,b,*), Mihael Hategan (a), Justin M. Wozniak (b), Ben Clifford (d), Daniel S. Katz (a), Ian Foster (a,b,c)
(a) Computation Institute, University of Chicago and Argonne National Laboratory, United States
(b) Mathematics and Computer Science Division, Argonne National Laboratory, United States
(c) Department of Computer Science, University of Chicago, United States
(d) Department of Astronomy and Astrophysics, University of Chicago, United States
Article history: Available online 12 July 2011
Keywords: Swift, Parallel programming, Scripting, Dataflow

Abstract
Scientists, engineers, and statisticians must execute domain-specific application programs many times on large collections of file-based data. This activity requires complex orchestration and data management as data is passed to, from, and among application invocations. Distributed and parallel computing resources can accelerate such processing, but their use further increases programming complexity. The Swift parallel scripting language reduces these complexities by making file system structures accessible via language constructs and by allowing ordinary application programs to be composed into powerful parallel scripts that can efficiently utilize parallel and distributed resources. We present Swift’s implicitly parallel and deterministic programming model, which applies external applications to file collections using a functional style that abstracts and simplifies distributed parallel execution.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Swift is a scripting language designed for composing application programs into parallel applications that can be executed on multicore processors, clusters, grids, clouds, and supercomputers. Unlike most other scripting languages, Swift focuses on the issues that arise from the concurrent execution, composition, and coordination of many independent (and, typically, distributed) computational tasks. Swift scripts express the execution of programs that consume and produce file-resident datasets. Swift uses a C-like syntax consisting of function definitions and expressions, with dataflow-driven semantics and implicit parallelism. To facilitate the writing of scripts that operate on files, Swift mapping constructs allow file system objects to be accessed via Swift variables.
Many parallel applications involve a single message-passing parallel program: a model supported well by the Message Passing Interface (MPI). Others, however, require the coupling or orchestration of large numbers of application invocations: either many invocations of the same program or many invocations of sequences and patterns of several programs. Scaling up requires the distribution of such workloads among cores, processors, computers, or clusters and, hence, the use of parallel or grid computing. Even if a single large parallel cluster suffices, users will not always have access to the same system (e.g., big machines may be congested or temporarily unavailable because of maintenance). Thus, it is desirable to be able to use whatever resources happen to be available or economical at the moment when the user needs to compute, without the need to continually reprogram or adjust execution scripts.
Parallel Computing 37 (2011) 633–652. doi:10.1016/j.parco.2011.05.005
Parallel Computing, Sep 2011