CyberShake Study 2.3 Technical Readiness Review


Transcript of CyberShake Study 2.3 Technical Readiness Review

Page 1: CyberShake Study 2.3  Technical Readiness  Review


Page 2: CyberShake Study 2.3  Technical Readiness  Review

Study re-versioning

― SCEC software uses year.month versioning
― Suggest renaming this study to 13.4

Page 3: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Scientific Goals

Compare Los Angeles-area hazard maps
― RWG V3.0.3 vs AWP-ODC-SGT (CPU)
― CVM-S4 vs CVM-H
● Different version of CVM-H than previous runs
● Adds San Bernardino, Santa Maria basins

286 sites (10 km mesh + points of interest)

Page 4: CyberShake Study 2.3  Technical Readiness  Review

Proposed Study sites

Page 5: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Data Products

2 CVM-S4 Los Angeles-area hazard maps
2 CVM-H 11.9 Los Angeles-area hazard maps
Hazard curves for 286 sites: 10s, 5s, 3s
― Calculated with OpenSHA v13.0.0
1144 sets of 2-component SGTs
Seismograms for all ruptures (about 470M)
Peak amplitudes in DB for 10s, 5s, 3s
Access via CyberShake Data Product Site (in development)

Page 6: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Notables

First AWP-ODC-SGT hazard maps
First CVM-H 11.9 hazard maps
First CyberShake use of Blue Waters (SGTs)
First CyberShake use of Stampede (post-processing)
Largest CyberShake calculation by 4x

Page 7: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Parameters

0.5 Hz, deterministic post-processing
― 200 m spacing
CVMs
― Vs min = 500 m/s
― GTLs for both velocity models
UCERF 2
Latest rupture variation generator

Page 8: CyberShake Study 2.3  Technical Readiness  Review

Verification work

4 sites (WNGC, USC, PAS, SBSM)
― RWG V3.0.3, CVM-S
― RWG V3.0.3, CVM-H
― AWP, CVM-S
― AWP, CVM-H

Plotted with previously calculated RWG V3
Expect RWG V3 slightly higher than the others

Page 9: CyberShake Study 2.3  Technical Readiness  Review

[Figure: WNGC hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 10: CyberShake Study 2.3  Technical Readiness  Review

[Figure: USC hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 11: CyberShake Study 2.3  Technical Readiness  Review

[Figure: PAS hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 12: CyberShake Study 2.3  Technical Readiness  Review

[Figure: SBSM hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 13: CyberShake Study 2.3  Technical Readiness  Review

SBSM Velocity Profile

Page 14: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 SGT Software Stack

Pre AWP
― New production code
― Converts velocity mesh into AWP format
― Generates other AWP input files
SGTs
― AWP-ODC-SGT CPU v13.4 (from verification work)
― RWG V3.0.3 (from verification work)
Post AWP
― New production code
― Creates SGT header files for post-processing with AWP

Page 15: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 PP Software Stack

SGT Extraction
― Optimized MPI version with in-memory rupture variation generation
― Support for separate header files
― Same code as Study 2.2
Seismogram Synthesis / PSA calculation
― Single executable
― Same code as Study 2.2
Hazard Curve calculation
― OpenSHA v13.0
All codes tagged in SVN before study begins

Page 16: CyberShake Study 2.3  Technical Readiness  Review

Distributed Processing (SGTs)

Runs placed in pending file on Blue Waters (as scottcal)
Cron job calls build_workflow.py with run parameters
― build_workflow.py creates PBS scripts defining jobs with dependencies
Cron job calls run_workflow.py
― run_workflow.py submits PBS scripts using qsub dependencies
― Limited restart capability
Final workflow jobs ssh to shock, call handoff.py
― Performs BW → Stampede SGT file transfer (as scottcal)
― scottcal BW proxy must be resident on shock
― Registers SGTs in RLS
― Adds runs to pending file on shock for post-processing
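The qsub dependency chaining that run_workflow.py performs can be sketched as follows. The script names and the afterok mechanism come from the slide; the helper functions, PBS script names, and the injectable submit callback are illustrative assumptions, not the production code.

```python
# Illustrative sketch (not the production run_workflow.py): submit a list of
# PBS scripts in order, each job held until the previous one exits cleanly
# via qsub's -W depend=afterok:<jobid> mechanism.
import subprocess

def qsub_command(script, after=None):
    """Build the qsub argument list; chain on a prior job ID if given."""
    cmd = ["qsub"]
    if after is not None:
        cmd += ["-W", "depend=afterok:" + after]
    cmd.append(script)
    return cmd

def submit_chain(scripts, submit=None):
    """Submit scripts in order; `submit` runs a command and returns a job ID."""
    if submit is None:
        submit = lambda cmd: subprocess.check_output(cmd).decode().strip()
    job_id = None
    for script in scripts:
        job_id = submit(qsub_command(script, after=job_id))
    return job_id
```

A limited restart capability, as the slide mentions, would then amount to recording each returned job ID and resubmitting only the scripts after the last completed one.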

Page 17: CyberShake Study 2.3  Technical Readiness  Review

Distributed Processing (PP)

Cron job on shock submits post-processing runs
― Pegasus 4.3, from Git repository
― Condor 7.6.6
― Globus 5.0.3
Jobs submitted to Stampede (as tera3d)
― 8 sub-workflows
― Extract SGT jobs as standard jobs (128 cores)
― seis_psa jobs as PMC jobs (1024 cores)
Results staged back to shock, DB populated, curves generated

Page 18: CyberShake Study 2.3  Technical Readiness  Review

SGT Computational Requirements

SGTs on Blue Waters
Computational time: 8.4 M SUs
― RWG: 16k SUs/site × 286 sites = 4.6 M SUs
― AWP: 13.5k SUs/site × 286 sites = 3.8 M SUs
― 22.35 M SU allocation, 22 M SUs remaining
Storage: 44.7 TB
― 160 GB/site × 286 sites = 44.7 TB

Page 19: CyberShake Study 2.3  Technical Readiness  Review

PP computational requirements

Post-processing on Stampede
Computational time:
― 4000 SUs/site × 286 sites = 1.1 M SUs
― 4.1 M SU allocation, 3.9 M remaining
Storage: 44.7 TB input, 13 TB output
― 44.7 TB of SGT inputs; will need to rotate out
― Seismograms: 46 GB/site × 286 sites = 12.8 TB
― PSA files: 0.8 GB/site × 286 sites = 0.2 TB

Page 20: CyberShake Study 2.3  Technical Readiness  Review

Computational Analysis

Monitord for post-processing performance
― Will run after workflows have completed
― May need Python scripts for specific CyberShake metrics
Scripts for SGT performance
― Cron job to monitor core usage on BW
― Does wrapping BW jobs in kickstart help?
Ideally, same high-level metrics as Studies 1.0 and 2.2

Page 21: CyberShake Study 2.3  Technical Readiness  Review

Long-term storage

44.7 TB SGTs
― To be archived to tape (NCSA? TACC? Somewhere else?)
13 TB seismograms, PSA data
― Have been using SCEC storage - scec-04?
5.5 TB workflow logs
― Can compress after mining for stats
CyberShake database
― 1.4 B entries, 330 GB data (scaling issues?)
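On the scaling question, a rough per-row figure can be derived from the two numbers quoted on the slide; nothing else is assumed.

```python
# Rough per-row footprint of the CyberShake database, using only the entry
# count and total size quoted on the slide.
entries = 1.4e9
size_bytes = 330 * 1024**3            # 330 GB
bytes_per_row = size_bytes / entries  # ~253 bytes/row
```

At roughly 250 bytes/row, growth is driven almost entirely by row count, so a 4x-larger study pushes the table toward ~5.6 B rows.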

Page 22: CyberShake Study 2.3  Technical Readiness  Review

Estimated Duration

Limiting factors:
― Blue Waters queue time
● Uncertain how many sites in parallel
― Blue Waters → Stampede transfer
● 100 MB/sec seems sustainable from tests, but could get much worse
● 50 sites/day; unlikely to reach

Estimated completion by end of June
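Assuming the 160 GB/site SGT volume from the requirements slide, the quoted 100 MB/s sustained rate alone caps transfer throughput at roughly the 50 sites/day figure.

```python
# Transfer-limited throughput estimate: SGT volume per site is from the
# requirements slide; the sustained rate is the 100 MB/s quoted here.
GB_PER_SITE = 160.0
rate_mb_s = 100.0
gb_per_day = rate_mb_s * 86400 / 1024.0   # ~8437 GB/day
sites_per_day = gb_per_day / GB_PER_SITE  # ~53 sites/day
```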

Page 23: CyberShake Study 2.3  Technical Readiness  Review

Personnel Support

Scientists
― Tom Jordan, Kim Olsen, Rob Graves
Technical Lead
― Scott Callaghan
Job Submission / Run Monitoring
― Scott Callaghan, David Gill, Phil Maechling
Data Management
― David Gill
Data Users
― Feng Wang, Maren Boese, Jessica Donovan

Page 24: CyberShake Study 2.3  Technical Readiness  Review

Risks

Stampede becomes busier
― Post-processing still probably shorter than SGTs
CyberShake database unable to handle data
― Would need to create other DBs, a distributed DB, or change technologies
Stampede changes software stack
― Last time, necessitated change to MPI library
― Can use Kraken as backup PP site while resolving issues
New workflow system on Blue Waters
― May be as-yet-undetected bugs

Page 25: CyberShake Study 2.3  Technical Readiness  Review

Thanks for your time!