The UNC ADCIRC Application on the SURAGrid
Steve Thorpe, MCNC
SURAGrid Application Workshop
February 22, 2006, Washington, D.C.
You can grab a copy of these slides by following links from
http://www.mcnc.org/thorpe/presentations
Outline
• What is the UNC ADCIRC Application?
• Why ADCIRC is Important to Grid-Enable
• Current Status of Grid-Enabling
• Next Steps Towards SURAGrid Deployment
• Steps for Day 2
• Future Plans
• Conclusions
Motivation: Weather Strikes Daily!
• Hurricane Season 2005
– 26 named storms, 14 hurricanes, 3 with major impact
– billions of dollars in economic losses
• We need to …
– provide early, accurate, and frequent forecasts and dissemination of information
– provide infrastructure to solve inter-disciplinary problems
– be able to interact in real time, i.e. evaluate and adapt
– innovate new methodologies for prediction
Source: NOAA
2005 … 2006…
Part of SURA’s SCOOP Program
• The Southeastern Universities Research Association’s (SURA) Coastal Ocean Observing and Prediction Program
• SCOOP goals
– improve predictions and mitigate the impact of coastal phenomena such as extra-tropical storms and hurricanes
– implement a comprehensive observing system that will validate accurate and timely short- and long-term predictions
– open web portal access to basic and analyzed data and linked numerical models, available in real-time
– “plug and play” model for next generation
From www.openioos.org
SCOOP Participant Institutions
• Gulf of Maine Ocean Observing System (GoMOOS)
• Louisiana State University
• MCNC
• National Oceanic and Atmospheric Administration
• SURA
• Texas A&M University
• University of Alabama, Huntsville (UAH)
• University of Florida
• University of Maryland
• University of Miami
• University of North Carolina, Chapel Hill / RENCI
• Virginia Institute of Marine Science (VIMS)
ADCIRC Circulation Model
• Developed by:
– Dr. Rick Luettich, UNC Chapel Hill’s Institute of Marine Sciences
– Dr. Joannes Westerink, University of Notre Dame’s Department of Civil Engineering and Geological Sciences
• ADCIRC is a finite element method (FEM) shallow water model for computing tidal and storm surge water levels and depth-averaged currents
• Primary model used by the NC SCOOP team
NC SCOOP Team
UNC Marine Sciences: Brian Blanton, Rick Luettich, Larry Mason (ITS)
• ADCIRC development/implementation
RENaissance Computing Institute (RENCI): Lavanya Ramakrishnan, Brad Viviano, Howard Lander, Dan Reed
• Joint institute spanning UNC-CH, NCSU and Duke
• National grid efforts such as Linked Environments for Atmospheric Discovery (LEAD), TeraGrid
MCNC: Michael Garvin, Steve Thorpe, Chuck Kesler
• Non-profit organization committed to advancing education, innovation and economic development throughout NC by delivering next-generation IT services
• Subcontractor to RENCI
SCOOP Tasks*
• Data Standards
• Data Translation, Transport, & Management
• Modeling
• Configuration Tool Set
• Visualization Services
• Verification Services
• Storage and Computing Services
• Security
• Grid Management Middleware
*NC Team has focused especially in the blue areas
NC SCOOP Project Goals
• Large scale model availability via web-based portal
– users can access the model easily
– enable model runs with different input datasets (e.g. meteorological data)
– facilitate model output distribution
• Distributed data management for collecting, archiving and providing access to model output products
• Model execution in a Grid computing environment
– automatically find available computational resources
High Level View of Where ADCIRC Fits In
[Figure: data-flow diagram. Labels: UF Winds (Analytical/GFDL); NCEP Winds (NAM/POLAR); Computational Grid Executes ADCIRC Model; SCOOP Archival & Visualization; Other Distribution, Processing, Visualization; links labeled LDM and OPeNDAP.]
NC SCOOP V1 - Hindcast Data Flow
• Manually specify model run parameters
• Make tarball of needed Archived Files
• Third-party transfer between Portal host and Compute host
• Execution of requested simulation on Compute host
[Figure: data-flow diagram with numbered steps 1 to 4. Labels: RENCI/UNC Portal; UNC Production System; UNC Experimental SCOOP Machine; MCNC Grid; Globus Gatekeeper; GridFTP; LSF Queue; MyProxy; Mass Storage; NCEP Daily Model Runs; OPeNDAP Server; LDM To UAH; NFS Mounted.]
NC SCOOP Portal (1/4)
[Screenshot labels: Models (ADCIRC); Data Access via OPeNDAP; GridFTP Transfer]
NC SCOOP Portal (2/4): OPeNDAP Access
• Access to operational (daily) ADCIRC output.
• Global Elevation
• Global Velocity
• OPeNDAP
• LDM
NC SCOOP Portal (3/4): Hindcast
[Screenshot labels: Submit Compute Job; Set Run Dates (Hurricane Ivan); Current ADCIRC grid; 16 CPU Decomposition]
NC SCOOP Portal (4/4): Solution Display
• 14 day sim
• 16 CPUs
• 8 min
• Note: storm surge and tides are reflected in the water levels.
[Figure labels: Hurricane Ivan; RENCI VizWall]
NCEP and WANAF Wind Ensemble
[Figure: NCEP Ensemble Perturbation Forecast Storm Tracks for Forecast with Initial time = 2005071800]
[Figure: UFL-SCOOP Analytic Forecast Storm Tracks for Forecast with Initial time = 2005071800]
Hurricane Emily
Need lots of compute resources, as each wind forecast can drive a different ADCIRC simulation. We refer to a set of such runs as an ensemble.
2 ADCIRC Input Wind Predictions for Tropical Storm Arlene
Standard NCEP model
NAH, NCEP’s Hurricane model
The NAH model’s improved results can substantially affect the skill of ADCIRC’s storm surge forecast.
Automated Execution
ADCIRC automatically triggered upon arrival of input data sets
[Figure: architecture diagram. Labels: OpenIOOS; User Interface; Visualization; Translation; Verification / Validation; ADCIRC Model; Resource Selection; Data Management; Application Env.; Resource Access Layer; Archive Services; Catalog Services; Observations; Winds; Model Results.]
Since jobs are not manually initiated in this scenario, we need grid technologies to help find compute resources to run the jobs.
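The data-driven triggering described above can be sketched in a few lines of Python. This is a minimal illustration, not the SCOOP implementation: the class name `ArrivalTrigger`, the input names "winds" and "mesh", and the use of a forecast cycle string as the key are all assumptions made for the example.

```python
from typing import Callable, Dict, Set


class ArrivalTrigger:
    """Launch one model run per forecast cycle once all inputs have arrived."""

    def __init__(self, required: Set[str], launch: Callable[[str], None]):
        self._required = frozenset(required)   # hypothetical input names
        self._launch = launch                  # stand-in for a grid submission
        self._arrived: Dict[str, Set[str]] = {}
        self._launched: Set[str] = set()

    def on_arrival(self, cycle: str, dataset: str) -> bool:
        """Record one arrival; return True if it triggered a launch."""
        seen = self._arrived.setdefault(cycle, set())
        seen.add(dataset)
        if cycle not in self._launched and self._required <= seen:
            self._launched.add(cycle)          # never launch a cycle twice
            self._launch(cycle)
            return True
        return False


# Toy run: the "launch" just records the cycle instead of submitting a job.
launched = []
trigger = ArrivalTrigger({"winds", "mesh"}, launched.append)
first = trigger.on_arrival("2005071800", "winds")
second = trigger.on_arrival("2005071800", "mesh")
repeat = trigger.on_arrival("2005071800", "mesh")
```

In a real deployment the `launch` callable is where resource selection and job submission would happen, since nobody is on hand to pick a machine.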
Benefits of Grid Enabling ADCIRC
• Take advantage of “compute on demand” cycles provided by grids
• Prediction results are more valuable because we get them sooner
• Allows us to run more models and at higher resolutions
• Can adjust compute location and model configuration dynamically based on:
– Compute load on available resources
– Number of CPUs available
– Resolution of the model’s finite element mesh
– Number of ensemble members
– etc.
NC SCOOP Team Has Developed a Storm Surge Ensemble Prediction System
ADCIRC computes coastal water levels for each ensemble member across the SCOOP grid, with jobs migrating to available resources.
[Figure: workflow diagram. Step labels: Hurricane forecast track; Compute forecast wind ensemble; winds; Create tars of winds, mesh, ICs; Ensemble solutions; Develop and publish water level ensemble forecast. Participants shown: National Hurricane Center; University of Florida; UNC; UNC, MCNC, UAH, UFL, LSU, VIMS, ….. SURAGrid.]
Ensemble System Can Improve Forecast Results
Left: ADCIRC max water level for 72 hr Hurricane Katrina forecast starting 29 Aug 2005, driven by the “usual, always-available” ETA winds.
Right: ADCIRC max water level over ALL of UFL ensemble wind fields for 72 hr Hurricane Katrina forecast starting 29 Aug 2005, driven by “UFL always-available” ETA winds.
Images credit: Brian O. Blanton, Dept of Marine Sciences, UNC Chapel Hill
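The right-hand panel takes the maximum water level over all ensemble members. A minimal Python sketch of that reduction, assuming each member is stored as one max-water-level value per mesh node, might look like this; the function name `ensemble_max` and the toy numbers are illustrative assumptions, not SCOOP code or real ADCIRC output.

```python
from typing import List, Sequence


def ensemble_max(members: Sequence[Sequence[float]]) -> List[float]:
    """Return the node-wise maximum water level across ensemble members.

    Each member holds one max water level per mesh node; all members must
    be defined on the same mesh.
    """
    if not members:
        raise ValueError("need at least one ensemble member")
    n_nodes = len(members[0])
    if any(len(m) != n_nodes for m in members):
        raise ValueError("all members must cover the same mesh nodes")
    # zip(*members) groups the values of every member for one node at a time.
    return [max(node_values) for node_values in zip(*members)]


# Toy example: three hypothetical ensemble members on a four-node mesh.
members = [
    [1.2, 0.8, 2.5, 0.1],
    [1.0, 1.4, 2.1, 0.3],
    [0.9, 1.1, 3.0, 0.2],
]
envelope = ensemble_max(members)
```

The envelope is never lower than any single member at any node, which is why it gives a more cautious picture than a single forecast run.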
Real-Time Resource Selection API
• We developed a simple Java API
– We want to use more than just MCNC and UNC resources for the multiple runs required by the ensemble system
– The API answers the question “What is the best place for me to run this job, right now?”
– Bases choice on current availability of compute and data resources
– Allows arbitrary ranking algorithm(s) through user-supplied Java plug-ins
– Uses Java CoG kit + the usual GT 3.2.1 tools:
• MDS for information sharing
• GridFTP for file transfer
• PreWS GRAM for job submission
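The plug-in ranking idea can be illustrated with a short sketch. The real API is written in Java and its interfaces are not shown in these slides, so everything below is an assumption for illustration, written in Python for brevity: the `Resource` fields, the `choose_resource` function, the two sample rankers, and the example host names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical snapshot of one site's state (field names are assumptions)."""
    host: str
    services_up: bool   # job submission and file transfer reachable
    free_cpus: int      # e.g. as published through an information service
    queued_jobs: int


# A ranking plug-in maps a resource to a score; higher is better.
Ranker = Callable[[Resource], float]


def most_free_cpus(r: Resource) -> float:
    return float(r.free_cpus)


def shortest_queue(r: Resource) -> float:
    return -float(r.queued_jobs)


def choose_resource(resources: Iterable[Resource], needed_cpus: int,
                    ranker: Ranker = most_free_cpus) -> Optional[Resource]:
    """Return the best currently usable resource, or None if none qualifies."""
    usable = [r for r in resources
              if r.services_up and r.free_cpus >= needed_cpus]
    return max(usable, key=ranker, default=None)


sites = [
    Resource("site-a.example.org", True, 32, 5),
    Resource("site-b.example.org", True, 16, 0),
    Resource("site-c.example.org", False, 64, 0),  # services down: skipped
]
best_by_cpus = choose_resource(sites, needed_cpus=16)
best_by_queue = choose_resource(sites, needed_cpus=16, ranker=shortest_queue)
```

Keeping the filter (is the site usable at all?) separate from the ranker (which usable site is best?) is what lets a user swap in a different ranking rule without touching the rest of the chooser.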
Real-Time Resource Selection
Given a set of remote resources, what is the best one for me to run on, right now?
[Figure: a Resource Chooser (Monitoring, Meta-scheduling, Policy …) queries several sites, each with MDS, GridFTP, and a Gatekeeper.]
• Are services up?
• How many resources can I get (queue)?
– If MDS is available, ask it; else run a probe job to find the number of CPUs and a rough estimate of the time
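The "MDS if available, else probe job" fallback can be sketched as follows. This is a hedged Python illustration only: `query_mds` and `run_probe` are stand-in callables invented for the example, not calls from the Globus Toolkit or the Java CoG kit, and the host names and counts are made up.

```python
from typing import Callable, Optional, Tuple

# Both callables are stand-ins (assumptions) for real grid calls:
#   query_mds(host) -> free CPU count, or None when MDS is unavailable
#   run_probe(host) -> free CPU count learned from a small probe job
CpuQuery = Callable[[str], Optional[int]]
Probe = Callable[[str], int]


def free_cpus(host: str, query_mds: CpuQuery, run_probe: Probe) -> Tuple[int, str]:
    """Return (cpu_count, source), preferring MDS and falling back to a probe."""
    count = query_mds(host)
    if count is not None:
        return count, "mds"
    return run_probe(host), "probe"


# Fake back ends so the sketch runs without any grid installed.
mds_data = {"site-a.example.org": 24}   # site-b publishes nothing
probe_data = {"site-a.example.org": 24, "site-b.example.org": 8}

result_a = free_cpus("site-a.example.org", mds_data.get, probe_data.__getitem__)
result_b = free_cpus("site-b.example.org", mds_data.get, probe_data.__getitem__)
```

Trying the information service first keeps the common case cheap; the probe job costs a queue slot, so it is only used when nothing has been published.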
Additional Thoughts on the Resource Chooser
• Considered other meta scheduler technologies
– Too complex, proprietary components, non-Java, etc.
– The basics that come with GT 3.2.1 plus our simple API work for us
• Future possible directions for API
– Speed up resource choosing by thread enabling
– Add additional plug-ins to take into account job priority, additional resource characteristics (e.g. CPU speed, prior reliability…)
– Apply to different models other than ADCIRC
– Revisit whether to use other meta scheduler technologies?
SCOOP Partner Grid Resources
• We’ve established grid resources at MCNC and RENCI
– GT 3.2.1 based
• Connected to non-NC SCOOP partner resources also
– Not always easy!
[Figure: users reach the RENCI Portal (www.scoop.unc.edu). MCNC: scoop.ncgrid.org (GridFTP, MDS, GRAM), cluster with LSF, 16 TB storage. UNC: dante1.renci.org (GridFTP, MDS, GRAM), cluster with PBS, Mass Storage System. Other partners: … UAH … UFL … LSU … VIMS …]
NC SCOOP Forecast Status Portal Display
Grid Testbed Experiences
• Components at every site
– Globus gatekeeper, GridFTP, PBS/LSF, MDS
• Globus setup at compute sites
– firewall problems
– CA trust problems
– “old” style cert problems
– MDS not set up
– Job manager (PBS or LSF) not set up
– Tools sometimes not installed (e.g. uudecode)
• This is NOT easy, it takes a LOT of effort!
[Map labels: TAMU; LSU; UF; UNC, MCNC; UAH]
SURAGrid Partner Grid Resources
• Initial progress has been made at:
– Louisiana State University (Tevfik Kosar, Hartmut Kaiser, Gabrielle Allen)
– Texas A&M University (Steve Johnson)
– University of Alabama Huntsville (Sandi Redman)
– University of Kentucky (Vikram Gazula)
– University of Southern California (Nirmal Seenu)
– TACC (Ashok Adiga)
– Have I missed any others?
• Status is in various stages:
– The “machine ordering” stage
– The “gt3 installation” stage
– The “configure scheduler” stage
– The “gt3 connectivity/Firewall problem resolution” stage
Next Steps Toward SURAGrid Deployment (1 of 2)
• Ensure your site has met the basic requirements (tomorrow, day 2 of the workshop, and beyond if necessary)
• Howard, Steve, and Lavanya will then work to test ADCIRC on your system:
– First, “standalone” from the command line
– Next, through the Globus Gatekeeper and using the resource choosing API
– This assumes you’ve first completed the steps outlined in http://www.ccs.uky.edu/SCOOP (see upcoming slides)
Next Steps Toward SURAGrid Deployment (2 of 2)
• If/when the testing is successful, your resource(s) will be added to the pool chosen from for the regular ADCIRC runs
• We hope to complete further testing within the next couple of weeks to two months
• If all goes well, there will be many more CPUs available in time for hurricane season (June 1)!
Ensure Your Site Has Met The Basic Requirements for ADCIRC (1 of 2)
• Pre-Web Services versions of the Globus Toolkit services
– We presume you’re using GT 3.2.1, although in theory GT versions 2.4 through 4.x should work
– See www.globus.org
• You need these three Globus services:
– GridFTP server for file transfer
– GRAM server for job submission
– MDS for information sharing
Ensure Your Site Has Met The Basic Requirements for ADCIRC (2 of 2)
• Also you’ll need:
– RENCI and MCNC’s installations will need to trust your CA
– Your installation will need to trust RENCI and MCNC’s CAs
– Back end queuing system such as LSF or PBS
– Queuing system should have an mpich-based mpirun behind it
– Adapter (this allows Globus to submit jobs to the back-end queuing system, and to publish information about the queuing system through MDS)
– Linux x86-based system
– Preferably a cluster (a single node almost definitely would never be chosen for an ADCIRC run)
Follow the steps in the URL http://www.ccs.uky.edu/SCOOP
• This site has some (basic) setup instructions
– Thanks to University of Kentucky’s Vikram Gazula for this web space!
• Letting your GT installation trust the RENCI and MCNC certificate authorities
• Account setup for Lavanya, Howard, and Steve
– Also their /etc/grid-security/grid-mapfile entries
• Setting up Globus with your back end scheduler
• How to test MDS using grid-info-search
– Should get back an “Mds-Computer-Total-Free-nodeCount” entry when doing a “grid-info-search”
• testHosts.sh, a script that can do limited checks of your GRAM, MDS, and GridFTP installation
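The MDS check above comes down to looking for one attribute in the search output. As a rough Python sketch, assuming the output has been captured as text in the usual LDAP "name: value" style, the check could be automated like this. The sample text is made up for the example and was not captured from a real server; the function name `free_node_counts` is likewise an assumption, and this is not what testHosts.sh does internally.

```python
from typing import List

ATTR = "Mds-Computer-Total-Free-nodeCount"


def free_node_counts(ldif_text: str, attr: str = ATTR) -> List[int]:
    """Collect every integer value of `attr` from 'name: value' lines."""
    counts = []
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        # Attribute names are compared case-insensitively.
        if sep and name.strip().lower() == attr.lower():
            counts.append(int(value.strip()))
    return counts


# Made-up sample in the 'name: value' style; not real server output.
sample = """\
dn: hypothetical-entry
objectClass: hypothetical
Mds-Computer-Total-Free-nodeCount: 12
"""
counts = free_node_counts(sample)
mds_ok = bool(counts)   # at least one entry means MDS is publishing the value
```

An empty result is the interesting case: it suggests the scheduler adapter is not publishing queue information, even if the MDS server itself answers.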
Setup Related Questions
• Before contacting us, have you checked the web page to see if your answer is buried somewhere within?
– http://www.ccs.uky.edu/SCOOP
• Recommend email to “scoop-support at renci dot org” as a next try
– Howard Lander, Steve Thorpe, and Lavanya Ramakrishnan are on this list. We’ll try to check email frequently tomorrow.
• If that doesn’t produce a satisfactory response, you can try IM’ing us (could try to set up a group chat). Our AIM addresses are
– Steve: thorpe682
– Howard: howardlander
– Lavanya: lavanyaRamakrish
Last Resort
• If you’re still striking out, you can try calling Howard and/or Steve:
– Howard’s office: (919) 445-9651
– Steve’s home office: (610) 866-3286
• Note: I usually pick up only after I hear a friendly voice on the machine!
– Steve’s cell: (919) 724-9654
Future NC SCOOP Plans (1 of 2)
• Resource Chooser API extensions
– Threading to speed up choosing
– Additional plug-ins (CPU speed, reliability, data location, priorities based on urgency…)
– Apply to different models, not just to ADCIRC
• Improve fault tolerance of ADCIRC workflow
– e.g. if high priority, submit same job multiple times
– Or if job fails, restart ASAP
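The two fault-tolerance ideas (redundant copies for urgent jobs, immediate restart on failure) can be combined in one small loop. The Python sketch below is an assumption-laden illustration, not a planned design: `submit` is a stand-in for a real grid submission, the `copies` and `max_rounds` parameters are invented for the example, and redundant copies run one after another here although a real system would submit them concurrently.

```python
from typing import Callable, Optional, Sequence

# submit(host) is a stand-in (assumption) for a real grid submission that
# blocks until the job ends and returns True on success, False on failure.
Submit = Callable[[str], bool]


def run_job(hosts: Sequence[str], submit: Submit,
            copies: int = 1, max_rounds: int = 3) -> Optional[str]:
    """Return the first host where the job succeeded, or None.

    Each round submits `copies` redundant copies to the next hosts in
    rotation; a failed round is followed at once by another round.
    """
    if not hosts or copies < 1:
        raise ValueError("need at least one host and one copy")
    next_host = 0
    for _ in range(max_rounds):
        # Pick the next `copies` hosts, wrapping around the list.
        batch = [hosts[(next_host + i) % len(hosts)] for i in range(copies)]
        next_host += copies
        winners = [h for h in batch if submit(h)]
        if winners:
            return winners[0]
    return None


# Fake outcomes so the sketch runs without a grid: host "a" always fails.
outcomes = {"a": False, "b": True, "c": True}
normal = run_job(["a", "b", "c"], outcomes.__getitem__)             # retry path
urgent = run_job(["a", "b", "c"], outcomes.__getitem__, copies=2)   # redundancy
hopeless = run_job(["a"], outcomes.__getitem__)
```

Redundant submission trades extra CPU cycles for a better chance of finishing on time, which is why the slide reserves it for high-priority runs.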
Future NC SCOOP Plans (2 of 2)
• Better integration with other SCOOP efforts
– UFL’s Virtual Cluster setup
– Data management
– Catalog and archive access
– “Coupling” different models (e.g. ADCIRC and SWAN)
• Model verification activities
Concluding Thoughts
• Can be time consuming to establish grid connectivity among organizations…
– Configuration of firewalls, NAT, Globus Toolkit, Certificate Authorities, job scheduler adapters
– Systems administrator nervousness
– etc.
• …but it’s worth it!
– SCOOP partners are creating a distributed, scalable, modular resource that empowers scientists at multiple institutions
– This will advance the science of prediction of surge & wave impacts during storms, and other environmental hazards
Special Thanks
… to Brian Blanton and Lavanya Ramakrishnan for their help with these slides!
… to Philip Bogden, Joanne Bintz, Mary Fran Yafchak and others from SURA for their SCOOP project leadership
… to Art Vandenberg for his tireless Gridification efforts
… to you, the SURAGridsters who are generously sharing your resources!!
Questions?
Also, don’t forget about scoop-support at renci dot org