
Gateways Tutorial Outline: Morning Session

• Overview: Marlon Pierce
• OGCE Introduction: Marlon
• Demos for OGCE Gadget Container, GFAC Application Factory, XRegistry and XBaya: Marlon and Suresh Marru
• Getting and building the software: Marlon
• Hands-on demos: Suresh
• TeraGrid Gadget Demos: Thomas Uram

• Afternoon session: SimpleGrid and GISolve with Yan Liu

SciDAC, Chattanooga, TN, July 16, 2010


TeraGrid Science Gateways

Marlon Pierce, for Nancy Wilkins-Diehr
TeraGrid Area: Science Gateways, [email protected]


Gateway to Chattanooga's Confederate Cemetery


TeraGrid is one of the largest investments in shared cyberinfrastructure (CI) from NSF’s Office of Cyberinfrastructure


TeraGrid resources today include:
• Tightly coupled distributed-memory systems, 2 systems in the top 10 at top500.org
  – Kraken (NICS): Cray XT5, 99,072 cores, 1.03 Pflop/s
  – Ranger (TACC): Sun Constellation, 62,976 cores, 579 Tflop/s, 123 TB RAM
• Shared-memory systems
  – Cobalt (NCSA): Altix, 8 Tflop/s, 3 TB shared memory
  – Pople (PSC): Altix, 5 Tflop/s, 1.5 TB shared memory
• Clusters with InfiniBand
  – Abe (NCSA): 90 Tflop/s
  – Lonestar (TACC): 61 Tflop/s
  – QueenBee (LONI): 51 Tflop/s
• Condor pool (loosely coupled)
  – Purdue: up to 22,000 CPUs
• Gateway hosting
  – Quarry (IU): virtual machine support
• Visualization resources
  – TeraDRE (Purdue): 48 nodes, NVIDIA GPUs
  – Spur (TACC): 32 NVIDIA GPUs
• Storage resources
  – GPFS-WAN (SDSC)
  – Lustre-WAN (IU)
  – Various archival resources

Source: Dan Katz, U Chicago

But change is constant: new systems
• Data analysis and visualization systems
  – Longhorn (TACC): Dell/NVIDIA, CPU and GPU
  – Nautilus (NICS): SGI UltraViolet, 1,024 cores, 4 TB global shared memory
• Data-intensive computing
  – Dash (SDSC): Intel Nehalem, 544 processors, 4 TB flash memory
• FutureGrid
  – Experimental computing grid and cloud test-bed to tackle research challenges in computer science
• Keeneland
  – Experimental high-performance computing system with NVIDIA Tesla accelerators


What Is a Science Gateway?

• Web and desktop user interfaces and user-centric Web services for accessing Grid and Cloud resources
  – Clusters, supercomputers, mass storage
  – Applications, databases
  – Workflows
• Example science gateways from the NSF TeraGrid
  – GridChem: computational chemistry
  – UltraScan: biophysics computational analysis
  – LEAD: atmospheric science
  – BioDrugScreen: drug docking, scoring, and discovery
• Many others: see https://www.teragrid.org/web/science-gateways/gateway_list
• This tutorial is about software that powers gateways.


When is a gateway appropriate?

• Researchers using defined sets of tools in different ways
  – Same executables, different input
    • GridChem, CHARMM
  – Creating multi-scale workflows
• Datasets
  – Common data formats
    • National Virtual Observatory
    • Earth System Grid
  – Some groups have invested significant efforts here
    • caBIG: extensive discussions to develop common terminology and formats
    • BIRN: extensive data sharing agreements
• Difficult-to-access data/advanced workflows
  – Sensor/radar input
    • LEAD, GEON


3 steps to connect a gateway to TeraGrid

• Request an allocation
  – Only a one-paragraph abstract is required for up to 200k CPU hours
• Register your gateway
  – Visibility on the public TeraGrid page
• Request a community account
  – Run jobs for others via your portal
• Staff support is available!
• www.teragrid.org/gateways
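The community-account step can be pictured with a small model. This is a minimal Python sketch under stated assumptions, not a TeraGrid or OGCE API: the class `CommunityAccount`, its fields, and the per-user CPU-hour cap are hypothetical. All jobs run under one shared allocation, while the gateway keeps its own per-user ledger so it can limit each portal user's share and attribute every job to the person who requested it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class CommunityAccount:
    """Hypothetical gateway-side bookkeeping for a shared community account."""
    account_name: str
    per_user_limit_hours: float
    usage: Dict[str, float] = field(default_factory=dict)  # portal user -> CPU hours charged
    audit_log: List[dict] = field(default_factory=list)    # one record per accepted job

    def submit(self, portal_user: str, executable: str, cpu_hours: float) -> bool:
        """Accept a job for portal_user unless it would exceed that user's cap."""
        used = self.usage.get(portal_user, 0.0)
        if used + cpu_hours > self.per_user_limit_hours:
            # Refuse rather than let one user drain the shared allocation.
            return False
        self.usage[portal_user] = used + cpu_hours
        self.audit_log.append({
            "account": self.account_name,   # identity the resource provider sees
            "portal_user": portal_user,     # person the gateway attributes the job to
            "executable": executable,
            "cpu_hours": cpu_hours,
            "submitted_at": datetime.now(timezone.utc).isoformat(),
        })
        # A real gateway would now hand the job to its middleware layer.
        return True


acct = CommunityAccount(account_name="hypothetical-gateway", per_user_limit_hours=100.0)
print(acct.submit("alice", "mcell", 60.0))  # True
print(acct.submit("alice", "mcell", 50.0))  # False: would exceed alice's 100-hour cap
print(acct.submit("bob", "mcell", 50.0))    # True: caps are tracked per portal user
```

Keeping the ledger in the gateway is a design choice of this sketch: the resource provider only sees the community account, so per-user limits and attribution have to come from the gateway's own records.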


Linked Environments for Atmospheric Discovery (LEAD)

• Providing tools needed to make accurate predictions of tornadoes and hurricanes
  – Meteorological data
  – Forecast models
  – Analysis and visualization tools
  – Data exploration and Grid workflow


Analytical Ultracentrifugation
Emerging computational tool for the study of proteins

• The Center for Analytical Ultracentrifugation of Macromolecular Assemblies, UT Health Sciences
  – Major advances in the characterization of proteins and protein complexes as a result of new instrumentation and powerful software
  – Monitoring the sedimentation of macromolecules in real time in the centrifugal field allows their hydrodynamic and thermodynamic characterization in solution
  – Observations are electronically digitized and stored for further mathematical analysis
  – http://uslims.uthscsa.edu/

Source: Modern analytical ultracentrifugation in protein science: A tutorial review, Wikipedia


BioVLAB Cloud Use Case


Geographic Information Systems and HPC

• GISolve
  – Data-intensive, large- and multi-scale spatial analysis and modeling are increasingly important for scientific discovery and decision-making
    • Ecology, environmental sciences, geosciences, public health, and social sciences


Thank you for your attention! Questions?

Nancy Wilkins-Diehr, [email protected]


Gateways can further investments in other projects

• Increase access
  – To instruments, expensive data collections
• Increase capabilities
  – To analyze data
• Improve workforce development
  – Can prepare students to function in today’s cross-disciplinary world
• Increase outreach
• Increase public awareness
  – Public sees value in investments in large facilities
  – A 2006 Pew study indicates that half of all internet users have been to a site specializing in science
  – Those who seek out science information on the internet are more likely to believe that scientific pursuits have a positive impact on society


But, sustained funding is a problem

• Gateways can be used for the most challenging problems, but
  – Scientists won’t rely on something unless they are confident it will be around for the duration
  – We see this with software, but even more so with gateway infrastructure
• A sustained gateway program can
  – Reduce duplication of effort
    • Sporadic development with many small programs
  – Increase diversity of end users
  – Increase skill set diversity of developers
  – Bring together teams to address the toughest problems


So how do gateways fit into this?
Gateways are a natural result of the impact of the internet on worldwide communication and information retrieval


Only 18 years since the release of Mosaic!


Tremendous Opportunities Using the Largest Shared Resources: Challenges Too!
• What’s different when the resource doesn’t belong just to me?
  – Resource discovery
  – Accounting
  – Security
  – Proposal-based requests for resources (peer-reviewed access)
    • Code scaling and performance numbers
    • Justification of resources
    • Gateway citations
• Tremendous benefits at the high end, but even more work for the developers
• Potential impact on science is huge
  – A small number of developers can impact thousands of scientists
  – But we need a way to train and fund those developers


How to get started?

• Conduct a needs assessment
  – Should I build a gateway?
  – Can I use an existing gateway?
  – What problems am I trying to solve?
• Not all gateways need high-end computing
• Decide on a software approach
  – Recommended software at www.teragrid.org
• Targeted effort by a few can benefit many
  – Could a pool of developers design gateways for different domain areas? Yes!
• TeraGrid staff assistance


Highlights: LEAD Inspires Students
Advanced capabilities regardless of location

• A student gets excited about what he was able to do with LEAD
• “Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!
Eric” (email, March 2007)


Community Climate System Model (CCSM)

• Makes a world-leading, fully coupled climate model easier to use and available to a wide audience
• Compose, configure, and submit CCSM simulations to the TeraGrid
• Used in Purdue’s POL 520/EAS 591: Models in Climate Change Science and Policy
  – Semester-long projects, 100-year CCSM simulations, generate policy recommendations based on scientific, economic, and political models of climate change impacts


Today, there are approximately 35 gateways using the TeraGrid


Social Informatics Data Grid
Collaborative access to large, complex datasets

• SIDGrid is unique among social science data archive projects
  – Streaming data that change over time
    • Voice, video, images (e.g. fMRI), text, numerical (e.g. heart rate, eye movement)
  – Investigate multiple datasets, collected at different time scales, simultaneously
• Large data requirements
• Sophisticated analysis tools


http://www.ci.uchicago.edu/research/files/sidgrid.mov


Viewing multimodal data like a symphony conductor

• “Music-score” display and synchronized playback of video and audio files
  – Pitch tracks
  – Text
  – Head nods, pause, gesture references
• Central archive of multi-modal data, annotations, and analyses
  – Distributed annotation efforts by multiple researchers working on a common data set
    • History of updates
• Computational tools
  – Distributed acoustic analysis using Praat
  – Statistical analysis using R
  – Matrix computations using Matlab and Octave


Source: Studying Discourse and Dialog with SIDGrid, Levow, 2008


Southern California Earthquake Center (SCEC) Gateway used to produce realistic hazard map

• Probabilistic Seismic Hazard Analysis (PSHA) map for California
  – Created from Earthquake Rupture Forecasts (ERF)
    • ~7,000 ruptures can have 415,000 variations
• Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years
• Ground motion calculated using full 3-D waveform modeling for improved accuracy
  – Results in significant CPU use
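The "probability of strong ground motion in the next 50 years" on the map can be illustrated with the textbook PSHA calculation. This Python sketch assumes a time-independent (Poisson) model: each rupture i has an annual rate ν_i; the fraction of its simulated variations whose ground motion exceeds a threshold estimates P(exceed | rupture i), with all variations weighted equally (an assumption); the site's annual exceedance rate is λ = Σ ν_i · P(exceed | rupture i); and the probability of at least one exceedance in T years is 1 - exp(-λT). The function name, inputs, and numbers are hypothetical, and SCEC's production code is not claimed to look like this.

```python
import math
from typing import Sequence


def exceedance_probability(rupture_rates: Sequence[float],
                           variation_exceeds: Sequence[Sequence[bool]],
                           years: float = 50.0) -> float:
    """Probability that a site exceeds a ground-motion threshold at least once
    in `years`, assuming ruptures occur as independent Poisson processes.

    rupture_rates[i]     -- annual rate of rupture i from the rupture forecast
    variation_exceeds[i] -- one flag per simulated variation of rupture i,
                            True if that variation's ground motion exceeds the threshold
    """
    if len(rupture_rates) != len(variation_exceeds):
        raise ValueError("need one list of variation results per rupture")
    annual_rate = 0.0
    for rate, flags in zip(rupture_rates, variation_exceeds):
        if flags:
            # Equal weighting of variations: fraction exceeding = P(exceed | rupture).
            annual_rate += rate * sum(flags) / len(flags)
    return 1.0 - math.exp(-annual_rate * years)


# Two made-up ruptures: half of the first one's variations exceed the threshold,
# all of the second one's do. Rates are illustrative, not forecast values.
p = exceedance_probability([0.01, 0.002], [[True, False], [True, True, True, True]])
print(round(p, 3))  # 0.295
```

Here λ = 0.01 × 0.5 + 0.002 × 1 = 0.007 per year, so over 50 years the probability is 1 - exp(-0.35), roughly 0.295. Repeating this for every site and threshold is what turns hundreds of thousands of variations into a hazard map.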


SCEC: Why a gateway?

• Calculations need to be done for each of the hundreds of thousands of rupture variations
  – SCEC has developed the “CyberShake computational platform”
    • Hardware, software and people which combine to produce a useful scientific result
  – For each site of interest: two large-scale MPI calculations and hundreds of thousands of independent post-processing jobs with significant data generation
    » Jobs aggregated to appear as a single job to the TeraGrid
    » Workflow throughput optimizations and use of SCEC’s gateway “platform” reduced time to solution by a factor of three
  – Computationally intensive tasks, plus the priority placed on reduced time to solution, make TeraGrid a good fit


Source: S. Callaghan et al., “Reducing Time-to-Solution Using Distributed High-Throughput Mega-Workflows – Experiences from SCEC CyberShake”.
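Aggregating many short post-processing tasks so that they "appear as a single job" is a common high-throughput technique, often called job bundling or clustering. The Python sketch below only shows the idea and is not SCEC's implementation: `bundle_tasks` groups task command lines into fixed-size bundles, and `bundle_script` renders one bundle as a shell script that a single batch job could run. The `post_process` command and the bundle size are hypothetical.

```python
from typing import List, Sequence


def bundle_tasks(tasks: Sequence[str], tasks_per_bundle: int) -> List[List[str]]:
    """Split a long list of independent tasks into fixed-size bundles."""
    if tasks_per_bundle < 1:
        raise ValueError("tasks_per_bundle must be at least 1")
    return [list(tasks[i:i + tasks_per_bundle])
            for i in range(0, len(tasks), tasks_per_bundle)]


def bundle_script(bundle: Sequence[str]) -> str:
    """Render one bundle as a shell script that runs its tasks one after another."""
    return "\n".join(["#!/bin/sh", *bundle]) + "\n"


# Hypothetical post-processing command, one task per rupture variation.
tasks = [f"post_process --variation {i}" for i in range(10)]
bundles = bundle_tasks(tasks, tasks_per_bundle=4)
print([len(b) for b in bundles])  # [4, 4, 2]
print(bundle_script(bundles[-1]))
```

With this scheme, 415,000 tasks in bundles of 1,000 become 415 batch jobs. The trade-off is granularity: one slow task now delays the whole bundle it belongs to.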


Future Technical Areas
• Web technologies change fast
  – Must be able to adapt quickly
• Gateways and gadgets
  – Gateway components incorporated into any social networking page
  – 75% of 18- to 24-year-olds have social networking websites
• iPhone apps?
• Web 3.0
  – Beyond social networking and sharing content
  – Standards and querying interfaces to programmatically share data across sites
    • Resource Description Framework (RDF), SPARQL
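RDF describes data as (subject, predicate, object) triples, and a SPARQL query matches patterns containing variables against those triples. To show that idea without a library, here is a toy Python matcher for a basic graph pattern (a conjunction of triple patterns, with `?`-prefixed variables). It is not SPARQL and not a real RDF store; the `ex:` names are made-up identifiers, and the facts are taken from the resource and gateway slides earlier in this deck.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):              # variable
            if result.get(p, t) != t:      # already bound to a different value
                return None
            result[p] = t
        elif p != t:                       # constant that does not match
            return None
    return result


def query(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns (a join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in sorted(graph):        # sorted only to make output order stable
                m = match(pattern, t, b)
                if m is not None:
                    next_bindings.append(m)
        bindings = next_bindings
    return bindings


graph = {
    ("ex:Ranger", "ex:site", "TACC"),
    ("ex:Ranger", "ex:cores", "62976"),
    ("ex:Kraken", "ex:site", "NICS"),
    ("ex:Kraken", "ex:cores", "99072"),
    ("ex:LEAD", "ex:domain", "atmospheric science"),
}

# Roughly: SELECT ?r ?n WHERE { ?r ex:site "TACC" . ?r ex:cores ?n }
print(query(graph, [("?r", "ex:site", "TACC"), ("?r", "ex:cores", "?n")]))
# [{'?r': 'ex:Ranger', '?n': '62976'}]
```

The shared variable `?r` is what joins the two patterns; a standard query interface like this is what would let one site ask another site's data the same question without a custom API.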


Gateways democratize access to high end resources

• Almost anyone can investigate scientific questions using high-end resources
  – Not just those in the research groups of those who request allocations
  – Gateways allow anyone with a web browser to explore
• Opportunities can be uncovered via Google
  – My then 11-year-old son discovered nanoHUB.org when his science class was studying buckyballs
• Foster new ideas, cross-disciplinary approaches
  – Encourage students to experiment
• But used in production too
  – A significant number of papers have resulted from gateways, including GridChem and nanoHUB
  – Scientists can focus on challenging science problems rather than challenging infrastructure problems


Not just ease of use
What can scientists do that they couldn’t do previously?
• Linked Environments for Atmospheric Discovery (LEAD): radar data coupled with on-demand computing
• National Virtual Observatory (NVO): access to sky surveys
• Ocean Observatories Initiative (OOI): access to sensor data
• PolarGrid: access to polar ice sheet data
• SIDGrid: expensive datasets, analysis tools
• GridChem: coupling multiscale codes
• How would this have been done before gateways?


Why are gateways worth the effort?

• Increasing range of expertise needed to tackle the most challenging scientific problems
  – How many details do you want each individual scientist to need to know?
    • PBS, RSL, Condor
    • Coupling multi-scale codes
    • Assembling data from multiple sources
• Collaboration frameworks


#!/bin/sh
#PBS -q dque
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:02:00
#PBS -o pbs.out
#PBS -e pbs.err
#PBS -V
cd /users/wilkinsn/tutorial/exercise_3
../bin/mcell nmj_recon.main.mdl

+( &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
    (executable="/users/birnbaum/tutorial/bin/mcell")
    (arguments=nmj_recon.main.mdl)
    (count=128)
    (hostCount=10)
    (maxtime=2)
    (directory="/users/birnbaum/tutorial/exercise_3")
    (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
    (stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
)

# Full path to executable
executable=/users/wilkinsn/tutorial/bin/mcell

# Working directory, where Condor-G will write
# its output and error files on the local machine.
initialdir=/users/wilkinsn/tutorial/exercise_3

# To set the working directory of the remote job, we
# specify it in this globus RSL, which will be appended
# to the RSL that Condor-G generates
globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')

# Arguments to pass to executable.
arguments=nmj_recon.main.mdl

# Condor-G can stage the executable; staging is disabled here
transfer_executable=false

# Specify the globus resource to execute the job
globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs

# Condor has multiple universes, but Condor-G always uses globus
universe=globus

# Files to receive stdout and stderr.
output=condor.out
error=condor.err

# Specify the number of copies of the job to submit to the condor queue.
queue 1
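The PBS script and the Condor-G submit file above encode essentially the same small request (run `mcell` on `nmj_recon.main.mdl` in one working directory) in different dialects, which is exactly the detail a gateway can hide from the scientist. A minimal Python sketch, assuming a hypothetical `JobRequest` record and two renderer functions (not part of OGCE, GFAC, Globus, or Condor), shows one user-level description being turned into both files.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """What the gateway user specifies; field names are hypothetical."""
    executable: str
    arguments: str
    workdir: str
    nodes: int = 1
    procs_per_node: int = 2
    walltime_minutes: int = 2
    queue: str = "dque"


def to_pbs(job: JobRequest) -> str:
    """Render the request as a PBS batch script."""
    hours, minutes = divmod(job.walltime_minutes, 60)
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -q {job.queue}",
        f"#PBS -l nodes={job.nodes}:ppn={job.procs_per_node}",
        f"#PBS -l walltime={hours:02d}:{minutes:02d}:00",
        "#PBS -o pbs.out",
        "#PBS -e pbs.err",
        "#PBS -V",
        f"cd {job.workdir}",
        f"{job.executable} {job.arguments}",
    ]) + "\n"


def to_condor_g(job: JobRequest, gatekeeper: str) -> str:
    """Render the same request as a Condor-G (globus universe) submit description."""
    return "\n".join([
        f"executable={job.executable}",
        f"initialdir={job.workdir}",
        f"globusrsl=(directory='{job.workdir}')",
        f"arguments={job.arguments}",
        "transfer_executable=false",
        f"globusscheduler={gatekeeper}",
        "universe=globus",
        "output=condor.out",
        "error=condor.err",
        "queue 1",
    ]) + "\n"


job = JobRequest(executable="/users/wilkinsn/tutorial/bin/mcell",
                 arguments="nmj_recon.main.mdl",
                 workdir="/users/wilkinsn/tutorial/exercise_3")
print(to_pbs(job))
print(to_condor_g(job, "tg-login1.sdsc.teragrid.org/jobmanager-pbs"))
```

Keeping the scientist-facing description separate from the renderers means supporting another scheduler or middleware dialect only requires another renderer, not another form for the user.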


A VT100 in the 1980s and a login window on Ranger today


UltraScan provides a comprehensive data analysis environment

• Management of analytical ultracentrifugation data for single users or entire facilities
• Support for storage, editing, sharing and analysis of data
  – HPC facilities used for 2-D spectrum analysis and genetic algorithm analysis
    • TeraGrid (~2M CPU hours used)
    • Technical University of Munich
    • Juelich Supercomputing Center
• Portable graphical user interface
• MySQL database backend for data management
• Over 30 active institutions
• TeraGrid advanced support
  – Fault tolerance, workflows, use of multiple TG resources, community account implementation
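The fault-tolerance and multiple-resource support listed for UltraScan can be illustrated with a simple failover loop. This Python sketch is an assumption about the general pattern, not UltraScan's code: `submit` stands in for whatever middleware call actually launches a job, and the loop retries each resource a fixed number of times before moving on to the next.

```python
from typing import Callable, Sequence, Tuple


def submit_with_failover(resources: Sequence[str],
                         submit: Callable[[str], bool],
                         attempts_per_resource: int = 2) -> Tuple[str, int]:
    """Try resources in order, retrying each a few times.

    Returns (resource that accepted the job, total attempts made).
    Raises RuntimeError if every attempt on every resource fails.
    """
    attempts = 0
    for resource in resources:
        for _ in range(attempts_per_resource):
            attempts += 1
            if submit(resource):
                return resource, attempts
    raise RuntimeError(f"all {len(resources)} resources failed after {attempts} attempts")


# Fake submitter for illustration: pretend Lonestar is down.
unavailable = {"Lonestar"}
print(submit_with_failover(["Lonestar", "Ranger", "Abe"],
                           lambda r: r not in unavailable))  # ('Ranger', 3)
```

Passing `submit` in as a function keeps the retry policy independent of any particular middleware, so the same loop could wrap a test double, as here, or a real submission call.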
