Geoffrey Fox, Andrea Donnellan
May 3, 2004
Network and Grid Computing
Computational Geoinformatics Workshop
Solid Earth Science Questions
From NASA’s Solid Earth Science Working Group Report, Living on a Restless Planet, Nov. 2002
1. What is the nature of deformation at plate boundaries and what are the implications for earthquake hazards?
2. How do tectonics and climate interact to shape the Earth’s surface and create natural hazards?
3. What are the interactions among ice masses, oceans, and the solid Earth and their implications for sea level change?
4. How do magmatic systems evolve and under what conditions do volcanoes erupt?
5. What are the dynamics of the mantle and crust and how does the Earth’s surface respond?
6. What are the dynamics of the Earth’s magnetic field and its interactions with the Earth system?
The Solid Earth is: Complex, Nonlinear, and Self-Organizing
Relevant questions that computational technologies can help answer:
1. How can the study of strongly correlated solid earth systems be enabled by space-based data sets?
2. What can numerical simulations reveal about the physical processes that characterize these systems?
3. How do interactions in these systems lead to space-time correlations and patterns?
4. What are the important feedback loops that mode-lock the system behavior?
5. How do processes on a multiplicity of different scales interact to produce the emergent structures that are observed?
6. Do the strong correlations allow the capability to forecast the system behavior in any sense?
SESWG fed into the NASA ESE Computational Technology Requirements Workshop, May 2002
Characteristics of Computing for Solid Earth Science
• Widely distributed heterogeneous datasets
• Multiplicity of time and spatial scales
• Decomposable problems requiring interoperability for full models
• Distributed models and expertise
Enabled by Grids and Networks
Objectives
• IT approaches: Integrate multiple scales into computer simulations.
• Web services: Simplified access to data, simulation codes, and flow between simulations of varying types.
What are Grids Good for?
• They are “Internet Scale Distributed Computing” and support the linking of globally distributed entities in the e-Science concept
– Computers
– Data from repositories and sensors
– People
• Early Grids focused on metacomputing (linking computers together) but recently e-Science has highlighted integration of data and building communities
• Grid technology naturally builds Problem Solving Environments
Some Relevant Grid/Framework Projects
• QuakeSim and Solid Earth Research Virtual Observatory SERVOGrid (JPL …)
• GEON: Cyberinfrastructure for the Geosciences (San Diego, Missouri, USGS ..)
• CME: Community Modeling Environment from SCEC
• CIG: Computational Infrastructure for Geodynamics
• Geoframework.org Caltech/VPAC
• ESMF: Earth System Modeling Framework (NASA)
• NERCGrid: Natural Environment Research Council UK e-Science
• Earth Systems Grid in DoE Science Grid
Earth Science Computing: Capability vs. Capacity
[Figure: a metacomputing Grid linking Earth science data, large scale parallel computers, imaging instruments, computational resources, large-scale databases, data acquisition and analysis, and advanced visualization.]
• NO Capability: Spread a single large problem over multiple supercomputers
• YES Capacity: Seamless access to multiple computers
Geoscience Research and Education Grids
[Figure: SERVOGrid diagram. Labels: Sensors, Streaming Data, Field Trip Data, Data Filter Services, Repositories and Federated Databases, Databases, Large Disks, Research Simulations, Analysis and Visualization Portal, Discovery Services, Customization Services (From Research to Education), Education Grid, Computer Farm.]
More General Material on Grids
• Grids today are built in terms of Web Services, a technology designed to support Enterprise Software and e-Business
– Provides wonderful support tools
– Provides a new software engineering model supporting interoperability
• Grids do not compete with parallel computing
– They let MPI run untouched so your parallel codes run as fast as they used to
• Grids do “control/management/metadata management” where higher latency (around 10 milliseconds, a thousand times worse than MPI) is acceptable
• Global Grid Forum, W3C, OASIS set relevant standards and support community
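To make the latency trade-off concrete, here is a minimal arithmetic sketch. It uses only the two figures from the slide (about 10 ms for a Grid service call, roughly a thousand times MPI latency); the message counts are hypothetical workloads chosen for illustration.

```python
# Figures from the slide: Grid/Web-service latency ~10 ms, about 1000x MPI.
GRID_LATENCY_S = 10e-3
MPI_LATENCY_S = GRID_LATENCY_S / 1000  # implied MPI latency, ~10 microseconds


def latency_overhead(n_messages, latency_s):
    """Total time spent purely on per-message latency."""
    return n_messages * latency_s


# Hypothetical workloads (assumptions, not from the slides):
# a tightly coupled solver exchanging one million small messages,
# and a workflow making 100 control/metadata calls.
solver_over_mpi = latency_overhead(1_000_000, MPI_LATENCY_S)
solver_over_grid = latency_overhead(1_000_000, GRID_LATENCY_S)
control_over_grid = latency_overhead(100, GRID_LATENCY_S)

print(f"solver, MPI latency:   {solver_over_mpi:.0f} s")
print(f"solver, Grid latency:  {solver_over_grid:.0f} s")
print(f"control calls on Grid: {control_over_grid:.0f} s")
```

Under these assumptions the fine-grained solver traffic belongs on MPI, while the occasional control call costs about a second in total over the Grid.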
[Figure: layered Grid architecture. Labels: Raw (HPC) Resources, Middleware, Database, “Core” Grid System Services, Application Service, Application Metadata, Actual Application, Portal Services, User Services, Grid Computing Environments.]
Grids provide
• “Service Oriented Architecture” supporting distributed programs in scalable fashion with clean software engineering
• “Multi-tier” architecture supporting seamless access with brokers mediating access to diverse computers and data sources
• “Workflow” integrating different distributed services in a single application
• Event services to notify computers and people of issues (earthquake struck, job completed)
• Easy support of parameter searches and other pleasingly parallel applications with many related non-communicating jobs
• Security (Web Services), Database access (OGSA-DAI), Collaboration (Access Grid, GlobalMMCS)
• File, data and meta-data management
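The parameter-search case in the list above can be sketched as follows. This is a local stand-in, not a Grid client: `run_job`, its toy misfit function, and the parameter names are all hypothetical, and a thread pool takes the place of remote job submission.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_job(params):
    """Hypothetical stand-in for one simulation run. A real Grid would
    submit this to a remote job service instead of calling it locally."""
    friction, stress_drop = params
    misfit = (friction - 0.6) ** 2 + (stress_drop - 3.0) ** 2  # toy objective
    return params, misfit


def parameter_search(frictions, stress_drops, max_workers=4):
    """Run every parameter combination as an independent job; return the best."""
    grid = list(product(frictions, stress_drops))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The jobs never communicate with each other, only with the caller.
        results = list(pool.map(run_job, grid))
    return min(results, key=lambda r: r[1])


best_params, best_misfit = parameter_search([0.4, 0.6, 0.8], [1.0, 3.0, 5.0])
print(best_params, best_misfit)
```

Because each job is independent, the same pattern scales by adding workers or machines without changing the simulation code.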
Web Services
• Web services are the fundamental pieces of distributed Service Oriented Architectures.
• We should define lots of useful services that are remotely available
– Archival data access services supporting queries, real time sensor access, and mesh generation all seem to be popular choices.
• Web services have two important parts:
– Distributed services
– Client applications
• These two pieces are decoupled: one can build clients to remote services without caring about the programming language implementation of the remote service.
– Java, C++, Python
Web Services, Continued
• Clients can be built in any number of styles
– We build portlet clients: ubiquitous, can combine
– One can build fancier GUI client applications.
– You can even embed Web service client stubs (library routines) in your application code, so that your code can make direct calls to remote data sources, etc.
• Regardless of the client one builds, the services are the same in all cases:
– my portal and your application code may each use the same service to talk to the same database.
• So we need to concentrate on services and let clients bloom as they may:
– Client applications (portals, GUIs, etc.) will have a much shorter lifecycle than service interface definitions, if we do our job correctly.
– Client applications that are locked into particular services, use proprietary data formats and wire protocols, etc., are at risk. Use WSRF/JSR-168 Portlet standards
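A minimal sketch of the decoupling described above: two different client styles depend only on a service interface, never on its implementation. The interface `FaultDataService`, the in-memory implementation, and the fault records are hypothetical; a real deployment would put the service behind a remote Web service endpoint.

```python
from typing import Dict, List, Protocol


class FaultDataService(Protocol):
    """Hypothetical service interface: the contract that clients depend on."""

    def query_faults(self, region: str) -> List[Dict[str, str]]: ...


class InMemoryFaultService:
    """Stand-in implementation. A real one would sit behind a remote endpoint
    and could be written in Java, C++, or Python without clients noticing."""

    def __init__(self, records):
        self._records = records

    def query_faults(self, region):
        return [r for r in self._records if r["region"] == region]


def portlet_view(service: FaultDataService, region: str) -> str:
    """One client style: render results for a portal page."""
    names = [f["name"] for f in service.query_faults(region)]
    return f"{region}: " + ", ".join(names)


def application_call(service: FaultDataService, region: str) -> int:
    """Another client style: application code calling the same service."""
    return len(service.query_faults(region))


service = InMemoryFaultService([
    {"name": "Fault A", "region": "south"},
    {"name": "Fault B", "region": "south"},
    {"name": "Fault C", "region": "north"},
])
print(portlet_view(service, "south"))
print(application_call(service, "north"))
```

Either client can be rewritten or retired without touching the service, which is the lifecycle argument the slide makes.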
Data Deluged Science
• During the HPCC Initiative 1990-2000, we worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it as an enabler of new algorithms and new ways of computing
• Data assimilation was not central to HPCC
• DoE ASCI (Stockpile Stewardship) set up because didn’t want/have test data!
• Now particle physics will get 100 petabytes from CERN LHC
– Nuclear physics (Jefferson Lab) in same situation
– Use continuously ~30,000 CPUs simultaneously 24x7
• Weather, climate, solid earth (EarthScope)
• Bioinformatics curated databases
• Virtual Observatory and SkyServer in Astronomy
• Environmental Sensor nets
Data Deluged Science Computing Paradigm
[Figure: diagram relating Data, Information, Ideas, Model, and Simulation through Datamining, Reasoning, and Assimilation, spanning Informatics and Computational Science.]
Data Deluged Science Computing Architecture
[Figure: HPC Simulation fed by distributed Data Filters that massage data for simulation, connected to Analysis, Control, and Visualize components, other Grid and Web Services, OGSA-DAI Grid Services, and Grid Data Assimilation.]
Some Questions for Data Deluged Science
• A new trade-off: How to split funds between sensors and simulation engines
• No systematic study of how best to represent data deluged sciences without known equations at resolution of interest
• Data assimilation very relevant
• Relationship to “just” interpolating data and then extrapolating a little
• Role of Uncertainty Analysis: everything (equations, model, data) is uncertain!
• Relationship of data mining and simulation
• Growing interest in Data curation and provenance
• Role of Cellular Automata (CA), Potts Models, and Neural Networks, which are “fundamental equation free” approaches
Recommendations of NASA’s Computational Technologies Workshop (May 2002)
1. Create a Solid Earth Research Virtual Observatory (SERVO)
• Numerous distributed heterogeneous real-time datasets
• Seamless access to large distributed volumes of data
• Data handling and archiving part of framework
• Tools for visualization, datamining, pattern recognition, and data fusion
2. Develop a Solid Earth Science Problem Solving Environment (PSE)
• Addresses the NASA specific challenges of multiscale modeling
• Model and algorithm development and testing, visualization, and data assimilation
• Scalable to workstations or supercomputer depending on size of problem
• Numerical libraries existing within a compatible framework
3. Improve the Computational Environment
• PetaFLOP computers with Terabytes of RAM
• Distributed and cluster computers for decomposable problems
• Development of GRID technologies
SERVOGrid Requirements
• Seamless Access to Data repositories and large scale computers
• Integration of multiple data sources including sensors, databases, file systems with analysis system
– Including filtered OGSA-DAI (Grid database access)
• Rich meta-data generation and access with SERVOGrid specific Schema extending openGIS (Geography as a Web service) standards and using Semantic Grid
• Portals with component model for user interfaces and web control of all capabilities
• Collaboration to support world-wide work
• Basic Grid tools: workflow and notification
• Not metacomputing
Solid Earth Research Virtual Observatory (SERVO)
[Figure: tiered SERVO architecture. Labels: Observations, Downlink, Archive, Data cache (~TBytes/day), SERVO, Goddard, JPL, Ames, …, Tier 0 +1, Tier 1, Tier 2 (Tier2 Centers), Tier 3, Tier 4, Institutes, Workstations and other portals, 100 - 1000 Mbits/sec links.]
Fully functional problem solving environment
• Plug and play composing of parallel programs from algorithmic modules
• On-demand downloads of 100 GB in 5 minutes
• 10^6 volume elements rendering in real-time
• Program-to-program communication in milliseconds
• Approximately 100 model codes
• 1 PB per year data rate in 2010
• Distributed Heterogeneous Real-Time Datasets
• 100 TeraFLOPs sustained
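As a quick consistency check on the slide's figures, the 1 PB per year data rate can be converted to a daily volume and a mean link rate. This sketch assumes decimal units (1 PB = 10^15 bytes) and a 365-day year, neither of which the slide specifies.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

PB = 1e15  # decimal petabyte (assumption)
TB = 1e12  # decimal terabyte (assumption)

yearly_volume_bytes = 1 * PB  # "1 PB per year data rate in 2010"

# Average daily volume, to compare with the "~TBytes/day" data cache label.
tb_per_day = yearly_volume_bytes / DAYS_PER_YEAR / TB

# Average sustained rate, to compare with the 100 - 1000 Mbits/sec links.
mean_rate_mbits = yearly_volume_bytes * 8 / (DAYS_PER_YEAR * SECONDS_PER_DAY) / 1e6

print(f"{tb_per_day:.2f} TB/day, {mean_rate_mbits:.0f} Mbit/s on average")
```

Under these assumptions the yearly figure works out to a few terabytes per day and a mean rate inside the quoted link range; peak rates would be higher.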
Virtual Observatory Project
[Timeline chart: capability versus year, 2003 to 2010]
• Architecture & technology approach
• Decomposition into services with requirements
• Prototype cooperative federated database service integrating 5 datasets of 10 TB each
• Prototype data analysis service
• Prototype modeling service capable of integrating 5 modules
• Prototype 1920x1080 pixels at 120 frames per second visualization service
• Scaled to 100 sites
• Solid earth research virtual observatory (SERVO)
• On-demand downloads of 100 GB files from 40 TB datasets within 5 minutes.
• Uniform access to 1000 archive sites with volumes from 1 TB to 1 PB
NASA CT Workshop, May 2002
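Two of the targets above imply network rates that are easy to work out. The sketch below assumes decimal gigabytes and, for the visualization service, uncompressed 24-bit color; both are assumptions, since the slide gives neither units nor pixel format.

```python
def gbits_per_second(n_bytes, seconds):
    """Sustained rate in Gbit/s needed to move n_bytes in the given time."""
    return n_bytes * 8 / seconds / 1e9


GB = 1e9  # decimal gigabyte (assumption)

# "On-demand downloads of 100 GB files ... within 5 minutes"
download_gbps = gbits_per_second(100 * GB, 5 * 60)

# "1920x1080 pixels at 120 frames per second"
BYTES_PER_PIXEL = 3  # assumption: uncompressed 24-bit color
frame_bytes = 1920 * 1080 * BYTES_PER_PIXEL
video_gbps = gbits_per_second(frame_bytes * 120, 1)

print(f"download: {download_gbps:.2f} Gbit/s")
print(f"raw video stream: {video_gbps:.2f} Gbit/s")
```

Both results are in the multi-gigabit range, so compression, caching, or faster links would be needed to meet them over a 1 Gb/s network.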
Problem Solving Environment Project
[Timeline chart: capability versus year, 2003 to 2010]
• Isolated platform dependent code fragments
• Prototype PSE front end (portal) integrating 10 local and remote services
• Plug and play composing of sequential programs from algorithmic modules
• Extend PSE to include
– 20 users collaboratory with shared windows
– Seamless access to high-performance computers linking remote processes over Gb data channels.
• Integrated visualization service with volumetric rendering
• Fully functional PSE used to develop models for building blocks for simulations.
• Program-to-program communication in milliseconds using staging, streaming, and advanced cache replication
• Integrated with SERVO
• Plug and play composing of parallel programs from algorithmic modules
NASA CT Workshop, May 2002
Computational Environment
[Timeline chart: capability versus year, 2003 to 2010]
• 100’s GigaFLOPs, 40 GB RAM, 1 Gb/s network bandwidth
• ~100 model codes with parallel scaled efficiency of 50%
• ~10^4 PetaFLOPs throughput per subfield per year
• ~100 TeraFLOPs sustained capability per model
• ~10^6 volume elements rendering in real time
• Access to mixture of platforms, from low cost clusters (20-100) to supercomputers with massive memory and thousands of processors
NASA CT Workshop, May 2002
This slide appears inconsistent with slide 8
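The 50% parallel scaled efficiency target can be read as a weak-scaling measure, which is an interpretation rather than something the slide defines. Under that reading, a short sketch shows how efficiency is computed and what it implies for processor counts. The per-processor rate of 5 GigaFLOPs and the timings are hypothetical.

```python
import math


def scaled_efficiency(t_one_proc, t_p_procs):
    """Weak-scaling efficiency: the problem grows with the processor count,
    so the ideal run time on P processors equals the one-processor time."""
    return t_one_proc / t_p_procs


def processors_needed(target_flops, per_proc_flops, efficiency):
    """Processors required to sustain target_flops at a given efficiency."""
    return math.ceil(target_flops / (per_proc_flops * efficiency))


# Hypothetical timings: 100 s on one processor, 200 s for the scaled problem.
eff = scaled_efficiency(100.0, 200.0)

# 100 TeraFLOPs sustained (from the slide) at 50% efficiency, assuming each
# processor sustains 5 GigaFLOPs on its own (an illustrative figure).
n_procs = processors_needed(100e12, 5e9, 0.5)

print(eff, n_procs)
```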
Solid Earth Research Virtual Observatory (iSERVO)
Web-services (portal) based Problem Solving Environment (PSE)
Couples data with simulation, pattern recognition software, and visualization software
Enables investigators to seamlessly merge multiple data sets and models, and create new queries.
Data
• Space-based observational data
• Ground-based sensor data (GPS, seismicity)
• Simulation data
• Published/historical fault measurements
Analysis Software
• Earthquake fault
• Lithospheric modeling
• Pattern recognition software
Philosophy
• Store simulated and observed data
• Archive simulation data with original simulation code and analysis tools
• Access heterogeneous distributed data through cooperative federated databases
• Couple distributed data sources, applications, and hardware resources through an XML-based Web Services framework.
• Users access the services (and thus distributed resources) through Web browser-based Problem Solving Environment clients.
• The Web services approach defines standard, programming language-independent application programming interfaces, so non-browser client applications may also be built.
SERVOGrid Basics
• Under development in collaboration with researchers at JPL, UC-Davis, USC, and Brown University.
• Geoscientists develop simulation codes, analysis and visualization tools.
• We need a way to bind distributed codes, tools, and data sets.
• We need a way to deliver it to a larger audience
– Instead of downloading and installing the code, use it as a remote service.
SERVOGrid Application Descriptions
• Codes range from simple “rough estimate” codes to parallel, high performance applications.
– Disloc: handles multiple arbitrarily dipping dislocations (faults) in an elastic half-space.
– Simplex: inverts surface geodetic displacements for fault parameters using simulated annealing downhill residual minimization.
– GeoFEST: Three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces.
– Virtual California: Program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space.
– RDAHMM: Time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another.
– PARK: Boundary element program to calculate fault slip velocity history based on fault frictional properties; a model for unstable slip on a single earthquake fault.
• Preprocessors, mesh generators
• Visualization tools: RIVA, GMT
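To illustrate the kind of inversion the Simplex entry describes, here is a toy simulated annealing fit of a single fault-slip parameter to surface displacements. It is not the Simplex code: the linear forward model, the factors in `GREENS`, the synthetic observations, and the cooling schedule are all hypothetical.

```python
import math
import random

# Hypothetical forward model: displacement at each station = slip * factor.
GREENS = [1.0, 0.5, 0.25]
OBSERVED = [2.0, 1.0, 0.5]  # synthetic data generated with slip = 2.0


def residual(slip):
    """Sum of squared differences between modeled and observed displacement."""
    return sum((slip * g - d) ** 2 for g, d in zip(GREENS, OBSERVED))


def anneal(cost, x0, step=0.5, t0=1.0, cooling=0.95, n_iter=500, seed=0):
    """Minimize cost(x) over one parameter with a basic annealing schedule."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        fc = cost(cand)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling
    return best_x, best_f


best_slip, best_residual = anneal(residual, x0=0.0)
print(f"estimated slip {best_slip:.3f}, residual {best_residual:.2e}")
```

A real inversion would search several fault parameters at once and use a physically based forward model such as an elastic dislocation code.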
iSERVO Web Services
• Job Submission: supports remote batch and shell invocations
– Used to execute simulation codes (VC suite, GeoFEST, etc.), mesh generation (Akira/Apollo) and visualization packages (RIVA, GMT).
• File management:
– Uploading, downloading, backend crossloading (i.e. move files between remote servers)
– Remote copies, renames, etc.
• Job monitoring
• Apache Ant-based remote service orchestration
– For coupling related sequences of remote actions, such as RIVA movie generation.
• Database services: support SQL queries
• Data services: support interactions with XML-based fault and surface observation data.
– For simulation generated faults (i.e. from Simplex)
– XML data model being adopted for common formats with translation services to “legacy” formats.
– Migrating to Geography Markup Language (GML) descriptions.
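The orchestration idea above, coupling a sequence of remote actions so that each runs after the steps it depends on, can be sketched in a few lines. This is not the Ant-based service itself: the task names and the local functions standing in for remote calls are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, depends_on):
    """Run each task after the tasks it depends on; return order and outputs."""
    order = list(TopologicalSorter(depends_on).static_order())
    log = [tasks[name]() for name in order]
    return order, log


# Hypothetical stand-ins for remote service calls (job submission,
# file crossloading, visualization).
tasks = {
    "generate_mesh": lambda: "mesh ready",
    "run_simulation": lambda: "simulation finished",
    "crossload_output": lambda: "output moved to visualization server",
    "render_movie": lambda: "movie rendered",
}

# Each key runs only after every task in its set has completed.
depends_on = {
    "run_simulation": {"generate_mesh"},
    "crossload_output": {"run_simulation"},
    "render_movie": {"crossload_output"},
}

order, log = run_workflow(tasks, depends_on)
print(" -> ".join(order))
```

Declaring dependencies rather than a fixed script lets independent steps be reordered or run concurrently by a more capable scheduler.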
Some Conclusions
• Grids facilitate and support:
– International collaborations
– Integration of computing with distributed data repositories and real-time sensors
– Web services from a variety of fields (e.g. map services from openGIS)
– Seamless access to multiple networked compute resources including computational steering
– Software infrastructure for Problem Solving Environments