CyberShake Study 2.3 Technical Readiness Review


Transcript of CyberShake Study 2.3 Technical Readiness Review

Page 1: CyberShake Study 2.3  Technical Readiness  Review


Page 2: CyberShake Study 2.3  Technical Readiness  Review

Study re-versioning

― SCEC software uses year.month versioning
― Suggest renaming this study to 13.4

Page 3: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Scientific Goals

Compare Los Angeles-area hazard maps
― RWG V3.0.3 vs AWP-ODC-SGT (CPU)
― CVM-S4 vs CVM-H
● Different version of CVM-H than previous runs
● Adds San Bernardino, Santa Maria basins

286 sites (10 km mesh + points of interest)

Page 4: CyberShake Study 2.3  Technical Readiness  Review

Proposed Study sites

Page 5: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Data Products

2 CVM-S4 Los Angeles-area hazard maps
2 CVM-H 11.9 Los Angeles-area hazard maps
Hazard curves for 286 sites: 10s, 5s, 3s
― Calculated with OpenSHA v13.0.0
1144 sets of 2-component SGTs
Seismograms for all ruptures (about 470M)
Peak amplitudes in DB for 10s, 5s, 3s
Access via CyberShake Data Product Site (in development)

Page 6: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Notables

First AWP-ODC-SGT hazard maps
First CVM-H 11.9 hazard maps
First CyberShake use of Blue Waters (SGTs)
First CyberShake use of Stampede (post-processing)
Largest CyberShake calculation by 4x

Page 7: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 Parameters

0.5 Hz, deterministic post-processing
― 200 m spacing
CVMs
― Vs min = 500 m/s
― GTLs for both velocity models
UCERF 2
Latest rupture variation generator

Page 8: CyberShake Study 2.3  Technical Readiness  Review

Verification work

4 sites (WNGC, USC, PAS, SBSM)
― RWG V3.0.3, CVM-S
― RWG V3.0.3, CVM-H
― AWP, CVM-S
― AWP, CVM-H

Plotted with previously calculated RWG V3
Expect RWG V3 slightly higher than the others

Page 9: CyberShake Study 2.3  Technical Readiness  Review

[Figure: WNGC hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 10: CyberShake Study 2.3  Technical Readiness  Review

[Figure: USC hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 11: CyberShake Study 2.3  Technical Readiness  Review

[Figure: PAS hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 12: CyberShake Study 2.3  Technical Readiness  Review

[Figure: SBSM hazard curves, CVM-S and CVM-H; RWG V3.0.3 in green, AWP in purple, previous RWG V3 in orange]

Page 13: CyberShake Study 2.3  Technical Readiness  Review

SBSM Velocity Profile

Page 14: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 SGT Software Stack

Pre AWP
― New production code
― Converts velocity mesh into AWP format
― Generates other AWP input files
SGTs
― AWP-ODC-SGT CPU v13.4 (from verification work)
― RWG V3.0.3 (from verification work)
Post AWP
― New production code
― Creates SGT header files for post-processing with AWP

Page 15: CyberShake Study 2.3  Technical Readiness  Review

Study 13.4 PP Software Stack

SGT Extraction
― Optimized MPI version with in-memory rupture variation generation
― Support for separate header files
― Same code as Study 2.2
Seismogram Synthesis / PSA calculation
― Single executable
― Same code as Study 2.2
Hazard Curve calculation
― OpenSHA v13.0
All codes tagged in SVN before study begins

Page 16: CyberShake Study 2.3  Technical Readiness  Review

Distributed Processing (SGTs)

Runs placed in pending file on Blue Waters (as scottcal)
Cron job calls build_workflow.py with run parameters
― build_workflow.py creates PBS scripts defining jobs with dependencies
Cron job calls run_workflow.py
― run_workflow.py submits PBS scripts using qsub dependencies
― Limited restart capability
Final workflow jobs ssh to shock, call handoff.py
― Performs BW → Stampede SGT file transfer (as scottcal)
― scottcal BW proxy must be resident on shock
― Registers SGTs in RLS
― Adds runs to pending file on shock for post-processing
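The qsub dependency chaining that run_workflow.py performs can be sketched as follows. The script names and the afterok mechanism come from the slide; the helper functions, PBS script names, and the injectable submit callback are illustrative assumptions, not the production code.

```python
# Illustrative sketch (not the production run_workflow.py): submit a list of
# PBS scripts in order, each job held until the previous one exits cleanly
# via qsub's -W depend=afterok:<jobid> mechanism.
import subprocess

def qsub_command(script, after=None):
    """Build the qsub argument list; chain on a prior job ID if given."""
    cmd = ["qsub"]
    if after is not None:
        cmd += ["-W", "depend=afterok:" + after]
    cmd.append(script)
    return cmd

def submit_chain(scripts, submit=None):
    """Submit scripts in order; `submit` runs a command and returns a job ID."""
    if submit is None:
        submit = lambda cmd: subprocess.check_output(cmd).decode().strip()
    job_id = None
    for script in scripts:
        job_id = submit(qsub_command(script, after=job_id))
    return job_id
```

A limited restart capability, as the slide mentions, would then amount to recording each returned job ID and resubmitting only the scripts after the last completed one.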

Page 17: CyberShake Study 2.3  Technical Readiness  Review

Distributed Processing (PP)

Cron job on shock submits post-processing runs
― Pegasus 4.3, from Git repository
― Condor 7.6.6
― Globus 5.0.3
Jobs submitted to Stampede (as tera3d)
― 8 sub-workflows
― Extract SGT jobs as standard jobs (128 cores)
― seis_psa jobs as PMC jobs (1024 cores)
Results staged back to shock, DB populated, curves generated

Page 18: CyberShake Study 2.3  Technical Readiness  Review

SGT Computational Requirements

SGTs on Blue Waters
Computational time: 8.4 M SUs
― RWG: 16k SUs/site × 286 sites = 4.6 M SUs
― AWP: 13.5k SUs/site × 286 sites = 3.8 M SUs
― 22.35 M SU allocation, 22 M SUs remaining
Storage: 44.7 TB
― 160 GB/site × 286 sites = 44.7 TB

Page 19: CyberShake Study 2.3  Technical Readiness  Review

PP computational requirements

Post-processing on Stampede
Computational time:
― 4000 SUs/site × 286 sites = 1.1 M SUs
― 4.1 M SU allocation, 3.9 M remaining
Storage: 44.7 TB input, 13 TB output
― 44.7 TB of SGT inputs; will need to rotate out
― Seismograms: 46 GB/site × 286 sites = 12.8 TB
― PSA files: 0.8 GB/site × 286 sites = 0.2 TB

Page 20: CyberShake Study 2.3  Technical Readiness  Review

Computational Analysis

Monitord for post-processing performance
― Will run after workflows have completed
― May need Python scripts for specific CyberShake metrics
Scripts for SGT performance
― Cron job to monitor core usage on BW
― Does wrapping BW jobs in kickstart help?
Ideally, same high-level metrics as Studies 1.0 and 2.2

Page 21: CyberShake Study 2.3  Technical Readiness  Review

Long-term storage

44.7 TB SGTs
― To be archived to tape (NCSA? TACC? Somewhere else?)
13 TB seismograms, PSA data
― Have been using SCEC storage - scec-04?
5.5 TB workflow logs
― Can compress after mining for stats
CyberShake database
― 1.4 B entries, 330 GB data (scaling issues?)
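On the scaling question, a rough per-row figure can be derived from the two numbers quoted on the slide; nothing else is assumed.

```python
# Rough per-row footprint of the CyberShake database, using only the entry
# count and total size quoted on the slide.
entries = 1.4e9
size_bytes = 330 * 1024**3            # 330 GB
bytes_per_row = size_bytes / entries  # ~253 bytes/row
```

At roughly 250 bytes/row, growth is driven almost entirely by row count, so a 4x-larger study pushes the table toward ~5.6 B rows.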

Page 22: CyberShake Study 2.3  Technical Readiness  Review

Estimated Duration

Limiting factors:
― Blue Waters queue time
● Uncertain how many sites in parallel
― Blue Waters → Stampede transfer
● 100 MB/sec seems sustainable from tests, but could get much worse
● 50 sites/day; unlikely to reach

Estimated completion by end of June
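Assuming the 160 GB/site SGT volume from the requirements slide, the quoted 100 MB/s sustained rate alone caps transfer throughput at roughly the 50 sites/day figure.

```python
# Transfer-limited throughput estimate: SGT volume per site is from the
# requirements slide; the sustained rate is the 100 MB/s quoted here.
GB_PER_SITE = 160.0
rate_mb_s = 100.0
gb_per_day = rate_mb_s * 86400 / 1024.0   # ~8437 GB/day
sites_per_day = gb_per_day / GB_PER_SITE  # ~53 sites/day
```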

Page 23: CyberShake Study 2.3  Technical Readiness  Review

Personnel Support

Scientists
― Tom Jordan, Kim Olsen, Rob Graves
Technical Lead
― Scott Callaghan
Job Submission / Run Monitoring
― Scott Callaghan, David Gill, Phil Maechling
Data Management
― David Gill
Data Users
― Feng Wang, Maren Boese, Jessica Donovan

Page 24: CyberShake Study 2.3  Technical Readiness  Review

Risks

Stampede becomes busier
― Post-processing still probably shorter than SGTs
CyberShake database unable to handle data
― Would need to create other DBs, a distributed DB, or change technologies
Stampede changes software stack
― Last time, necessitated change to MPI library
― Can use Kraken as backup PP site while resolving issues
New workflow system on Blue Waters
― May be as-yet-undetected bugs

Page 25: CyberShake Study 2.3  Technical Readiness  Review

Thanks for your time!