grids.ucs.indiana.edu

1. Cyberinfrastructure to integrate simulation, data and sensors for collaborative eScience in CReSIS. CERSER and CReSIS, http://nia.ecsu.edu/, Elizabeth City State University, October 19 2006. Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401. http://www.infomall.org

Transcript of grids.ucs.indiana.edu


2. Abstract

  • Cyberinfrastructure supports eScience or collaborative science with distributed scientists, computers, data repositories and sensors.
  • We describe the emerging Grid software for eScience and the underlying Cyberinfrastructure such as the TeraGrid.
  • We give one example in detail: iSERVO, the International Solid Earth Research Virtual Organization supporting Earthquake Science
  • This illustrates Computing Grids, Geographical Information System Grids, Sensor Grids
  • We suggest implications for CReSIS, the Center for Remote Sensing of Ice Sheets

3. Why Cyberinfrastructure Useful

  • Supports distributed science: data, people, computers
  • Exploits Internet technology (Web 2.0) adding management, security, supercomputers etc.
  • It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
  • Parallel is needed to get high performance on individual 3D simulations, data analysis etc.; one must decompose the problem
  • The distributed aspect integrates already distinct components
  • Cyberinfrastructure is in general a distributed collection of parallel systems
  • Grids are made of services that are just programs or data sources packaged for distributed access

4. e-moreorlessanything and the Grid

  • e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it (from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology)
  • e-Science is about developing tools and technologies that allow scientists to do faster, better or different research
  • Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
    • The growing use of outsourcing is one example
  • The Grid provides the information technology e-infrastructure for e-moreorlessanything.
  • A deluge of data of unprecedented and inevitable size must be managed and understood.
  • People, computers, data and instruments must be linked.
  • On demand assignment of experts, computers, networks and storage resources must be supported

5. TeraGrid: Integrating NSF Cyberinfrastructure

  • TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
  • Today 100 Teraflop; tomorrow a petaflop; Indiana 20 teraflop today.

[Map labels: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, Wisc]

6. Virtual Observatory Astronomy Grid Integrate Experiments

[Figure panels: Radio; Far-Infrared; Visible; Visible + X-ray; Dust Map; Galaxy Density Map]

7. Grid Capabilities for Science

  • Open technologies for any large scale distributed system that is adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
    • Security, Reliability, Management and state standards
  • Service and messaging specifications
  • User interfaces via portals and portlets virtualizing to desktops, email, PDAs etc.
    • ~20 TeraGrid Science Gateways (their name for portals)
    • OGCE Portal technology effort led by Indiana
  • Uniform approach to access distributed (super)computers supporting single (large) jobs and spawning lots of related jobs
  • Data and meta-data architecture supporting real-time and archives as well as federation
    • Links to Semantic web and annotation
  • Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
  • Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment
    • http://www.nsf.gov/od/oci/ci-v7.pdf

8. APEC Cooperation for Earthquake Simulation

  • ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction
    • iSERVO is Infrastructure to support work of ACES
    • SERVOGrid is a (completed) US Grid that is a prototype of iSERVO
    • http://www.quakes.uq.edu.au/ACES/
  • Chartered under APEC, the Asia Pacific Economic Cooperation of 21 economies

9. Grid of Grids: Research Grid and Education Grid

[Diagram labels: SERVOGrid; Research; Education; From Research to Education; Sensor Grid; Database Grid; Compute Grid; GIS Grid; Education Grid; Computer Farm; Database; Repositories; Federated Databases; Data Filter Services; Streaming Data; Sensors; Field Trip Data; Research Simulations; Analysis and Visualization Portal; Customization Services; Discovery Services]

10. SERVOGrid and Cyberinfrastructure

  • Grids are the technology based on Web services that implement Cyberinfrastructure, i.e. support eScience or science as a team sport
    • Internet scale managed services that link computers, data repositories, sensors, instruments and people
  • There is a portal and services in SERVOGrid for
    • Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs ..
    • Job management and monitoring web services for running the above codes.
    • File management web services for moving files between various machines.
    • Geographical Information System services
    • Quaketables earthquake specific database
    • Sensors as well as databases
    • Context (dynamic metadata) and UDDI system long term metadata services
    • Services support streaming real-time data

11. [Figure labels: Topography; 1 km; Stress Change; Earthquakes; PBO; Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Ice Sheets; Volcanoes; Long Valley, CA; Northridge, CA; Hector Mine, CA; Greenland]

12. Some Grid Concepts I

  • Services are just (distributed) programs sending and receiving messages with well defined syntax
  • Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
    • Services can be in any language: Fortran, Shell scripts, C, C#, C++, Java, Python, Perl; your choice!
    • Web Services supported by all vendors (IBM, Microsoft)
  • Service overhead will be just a few milliseconds (more now), which is less than typical network transit time
    • Any program that is distributed can be a Web service
    • Any program with an execution time of 20 ms or more can be an efficient Web service
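The overhead argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: the 2 ms overhead is an assumed figure standing in for "a few milliseconds", and `service_efficiency` is a hypothetical helper, not part of any Grid toolkit.

```python
def service_efficiency(exec_ms: float, overhead_ms: float) -> float:
    """Fraction of total time spent on useful work when a program is
    invoked as a service with a fixed per-call overhead."""
    if exec_ms < 0 or overhead_ms < 0:
        raise ValueError("times must be non-negative")
    total = exec_ms + overhead_ms
    return exec_ms / total if total > 0 else 0.0


# Assumed per-call overhead (illustrative, not a measurement).
OVERHEAD_MS = 2.0

for exec_ms in (1.0, 20.0, 200.0):
    eff = service_efficiency(exec_ms, OVERHEAD_MS)
    print(f"execution {exec_ms:6.1f} ms -> efficiency {eff:.0%}")
```

Under this simple model the overhead dominates for very short calls and becomes a small fraction once execution time is well above the overhead.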

13. Web services

  • Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
  • Web Services interact by exchanging messages in SOAP format
  • The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
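As a rough illustration of message exchange in SOAP format, the sketch below builds and parses a SOAP 1.1 envelope with Python's standard library. The application namespace `urn:example:servo`, the operation `submitJob` and its parameters are hypothetical; a real service would take these from its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:servo"  # hypothetical application namespace


def build_soap_request(operation: str, params: dict) -> bytes:
    """Serialize an operation name and its parameters as a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{APP_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_soap_request(message: bytes):
    """Recover the operation name and parameters from a SOAP envelope."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    op = body[0]  # first child of Body is the operation element
    operation = op.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in op}
    return operation, params


message = build_soap_request("submitJob", {"code": "GeoFEST", "nodes": 4})
print(message.decode("utf-8"))
print(parse_soap_request(message))
```

The point is only that both sides agree on message syntax; neither needs to know what language the other is written in.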

14. A typical Web Service

  • In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
  • The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Diagram labels: Portal Service; Security; Catalog; Payment; Credit Card; Warehouse; Shipping control; Web Services; WSDL interfaces]

15. Some Grid Concepts II

  • Systems are built from contributions from many different groups; you do not need one vendor for all components, as Web services allow interoperability between components
    • One reason DoD likes Grids (called Net-Centric computing)
  • Grids are distributed in services and data, allowing anybody to store their data and to produce their view
    • Some think that the University Library of the future will curate/store data of their faculty
  • 2 level programming model: classic programming of services, and services are composed using workflow consistent with industry standards (BPEL)
  • Grid of Grids (System of Systems): realistically, Grid-like systems will be built using multiple technologies and standards; integrate separate Grids for Sensors, GIS, Visualization, computing etc. with the OGSA (Open Grid Services Architecture from OGF) system Grid (Security, registry) into a single Grid
  • Existing codes UNCHANGED; wrap as a service with metadata
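The idea of leaving a code unchanged and wrapping it with metadata can be sketched as follows. This is a minimal illustration, not the SERVOGrid implementation: `ApplicationDescriptor` and `invoke` are hypothetical names, and a Python one-liner stands in for a legacy Fortran or C++ executable.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class ApplicationDescriptor:
    """Hypothetical metadata record: where the code lives and how to invoke it."""
    name: str
    executable: list              # command plus any fixed arguments
    working_dir: str = "."
    default_args: list = field(default_factory=list)


def invoke(descriptor: ApplicationDescriptor, args=None, timeout_s: float = 60.0) -> dict:
    """Run the wrapped program untouched and return its result as a plain message."""
    extra = descriptor.default_args if args is None else args
    command = list(descriptor.executable) + list(extra)
    completed = subprocess.run(
        command,
        cwd=descriptor.working_dir,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return {
        "application": descriptor.name,
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


# Stand-in for a legacy binary: sums its command-line arguments.
legacy = ApplicationDescriptor(
    name="legacy-sum",
    executable=[sys.executable, "-c",
                "import sys; print(sum(float(x) for x in sys.argv[1:]))"],
)
result = invoke(legacy, ["1.5", "2.5"])
print(result["exit_code"], result["stdout"].strip())
```

A service front end would expose `invoke` through a message interface; the wrapped program itself never changes.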

16. TeraGrid User Portal

17. LEAD Gateway Portal

  • NSF Large ITR and TeraGrid Gateway
  • Adaptive Response to Mesoscale weather events
  • Supports Data exploration, Grid Workflow

18. Grid Workflow Data Assimilation in Earth Science

  • Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts

Use a Portlet-based user portal to access and control services and workflow

19. SERVOGrid has a portal

  • The Portal is built from portlets, which provide user interface fragments for each service and are composed into the full interface. It uses OGCE technology, as does the planetary science VLAB portal with the University of Minnesota

20. GIS and Sensor Grids

  • OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors
  • GML (Geography Markup Language) defines specification of geo-referenced data
  • SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors
  • Services like Web Map Service, Web Feature Service, Sensor Collection Service define services interfaces to access GIS and sensor information
  • Grid workflow links services that are designed to support streaming input and output messages
  • We built Grid (Web) service implementations of these specifications for NASA's SERVOGrid
  • Use Google maps as front end to WMS and WFS
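To give a feel for what a Web Map Service interface looks like from the client side, here is a sketch that assembles a WMS 1.1.1 `GetMap` request URL. The endpoint `http://example.org/wms` and the layer names are placeholders; the bounding box is just a rough box around California chosen for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer names.
url = wms_getmap_url(
    "http://example.org/wms",
    ["faults", "gps_stations"],
    (-125.0, 32.0, -114.0, 42.0),
    800,
    600,
)
print(url)
```

A map front end only needs to issue requests of this shape and overlay the returned images, which is what makes it possible to swap one map server for another.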

21. Grid Workflow Datamining in Earth Science

  • Work with Scripps Institute
  • Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California
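A streaming workflow of this kind can be pictured as a chain of filters, each consuming and producing a stream. The sketch below uses Python generators as stand-ins for the services; the sample values are invented, and the simple threshold detector is only a placeholder for the data-mining step, not the Hidden Markov method used in the project.

```python
from typing import Iterable, Iterator, Optional, Tuple


def check(readings: Iterable[Optional[float]]) -> Iterator[float]:
    """Data checking: drop missing samples."""
    for r in readings:
        if r is not None:
            yield r


def to_displacement(readings: Iterable[float], reference: float) -> Iterator[float]:
    """Transformation: convert positions to displacement from a reference."""
    for r in readings:
        yield r - reference


def detect(displacements: Iterable[float], threshold: float) -> Iterator[Tuple[int, float]]:
    """Placeholder for data mining: flag samples exceeding a threshold."""
    for index, d in enumerate(displacements):
        if abs(d) > threshold:
            yield index, d


# Invented sample stream; None marks a dropped reading.
raw = [10.0, 10.1, None, 9.9, 12.5, 10.0]
events = list(detect(to_displacement(check(raw), reference=10.0), threshold=1.0))
print(events)
```

Because each stage only sees a stream, stages can be reordered or replaced without touching the others, which is the property the workflow layer relies on.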

[Diagram labels: NASA GPS; Earthquake; Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS)]

22. Earth/Atmosphere Grids built as Grids of (library) Grids

[Diagram labels: Ice Sheet PolarGrid; Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations; Earthquake SERVOGrid; Earthquake Data, Filters & Simulation Services; TornadoGrid; Data Access/Storage; Portals; Visualization Grid; Collaboration Grid; Sensor Grid; Compute Grid; GIS Grid; Core Grid Services; Physical Network; Registry; Metadata; Security; Workflow; Notification; Messaging]

23. CReSIS PolarGrid

  • Important CReSIS-specific Cyberinfrastructure components include
    • Managed data from sensors and satellites
    • Data analysis such as SAR processing, possibly with parallel algorithms
    • Electromagnetic simulations (currently commercial codes) to design instrument antennas
    • 3D simulations of ice-sheets (glaciers) with non-uniform meshes
    • GIS (Geographical Information Systems)
  • Also need capabilities present in many Grids
    • Portal, i.e. Science Gateway
    • Submitting multiple sequential or parallel jobs
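Submitting many related jobs at once might look like the sketch below. It is only a local illustration: `run_job` is a hypothetical placeholder for a call to a remote job-management service, and a thread pool stands in for whatever scheduler the Grid would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id: int, mesh_size: int) -> dict:
    """Placeholder job: stands in for a remote simulation submission.

    Here it only reports how many cells a cubic mesh of this size would have.
    """
    return {"job_id": job_id, "cells": mesh_size ** 3}


def submit_all(mesh_sizes, max_workers: int = 4) -> list:
    """Submit one job per mesh size and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, i, m) for i, m in enumerate(mesh_sizes)]
        return [f.result() for f in futures]


results = submit_all([10, 20, 40])
for r in results:
    print(r)
```

The same calling pattern covers a parameter sweep of sequential codes or a handful of independent parallel runs.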

24. What should we do?

  • Identify existing programs that should be wrapped as Grid services
    • One can do this even for commercial codes, as one keeps existing codes (Fortran, C++) unchanged and constructs a metadata wrapper defining where the program and its data are located and how to invoke it.
  • Identify where parallel versions are needed and if help is needed in creating these
    • Parallel codes can be Grid services
    • Electromagnetic codes are commercial; in principle parallel
    • Ice sheet models can be parallelized for high resolution simulations
  • Scope out system: computational needs (identify value of TeraGrid); data storage needs; network requirements
  • Examine data model and produce a data Grid architecture
    • Use databases? Distributed? Metadata? Files? What are key performance issues?
  • Examine integration of GIS with Grid Services
  • Design and implement Science Gateway
  • Are there important visualization requirements outside GIS?
  • Are there key issues from security?
  • Bring up core services such as registries
  • Need infrastructure to run services (Linux PC)

25. Benefits of CReSIS PolarGrid

  • Shared resources support collaboration among CReSIS scientists
  • Integration of Polar related data with appropriate compute resources enabling research on specific topics and studies across topics
  • Polar Science Gateway accessing common services (programs), data and their integration as workflow
  • Access to TeraGrid with same interface for large scale simulations
  • Can share common capabilities (SAR analysis, GIS) with related Grids such as SERVOGrid, GEON, LEAD etc.
  • Modular Grid services allow exchange of new capabilities, preserving systems
    • e.g. Change EM Simulation service
  • Management of dynamic heterogeneous data

26. SERVO/QuakeSim Services Eye Chart

  • Job Management: SERVO wraps Apache Ant as a web service and uses it to launch jobs. For a particular application, we design a build.xml template. The interface is simply a string array of build properties called for by the template. We've also built a simple generic template engine version of this.
  • Specific Applications (Virtual California, Geofest, Park, RDAHMM ..): These can be all launched by a single Job Management service or by custom instances of this with metadata preset to a particular application
  • Context Data Service: We store information gathered from users' interactions with the portal interface in a generic, recursively defined XML data structure. Typically we store input parameters and choices made by the user so that we can recover and reload these later. We also use this for monitoring remote workflows. We have devoted considerable effort into developing WS-Context to support the generalization of this initial simple service.
  • Application and Host Metadata Service: We have an Application and a Host Descriptor service based on XML schema descriptors. Portlet interfaces allow code administrators to make applications available through the browser.
  • File Services: We built a file web service that could do uploads, downloads, and crossloads between different services. Clearly this supports specific operations such as file browsing, creation, deletion and copying.
  • Portal: We use an OGCE based portal based on portlet architecture
  • Authentication and Authorization: This uses capabilities built into portal. Note that simulations are typically performed on machines where user has accounts while data services are shared for read access
  • Information Service: We have built data model extensions to UDDI to support XPath queries over Geographical Information System capability.xml files. This is designed to replace OGC (Open Geospatial Consortium) Web registry service
  • Web Map Service: We built a Web Service version of this Open Geospatial Consortium specification. The WMS constructs images out of abstract feature descriptions.
  • Web Feature Service: We've built a Web Service version of this OGC standard. We've extended it to support data streaming for increased performance.

27. Service Eye Chart Continued

  • Workflow/Monitoring/Management Services: The HPSearch project uses HPSearch Web Services to execute JavaScript workflow descriptions. It has more recently been revised to support WS-Management and to support both workflow (where there are many alternatives) and system management (where there is less work). Management functions include life cycle of services and QoS for inter-service links
  • Sensor Grid Services: We are developing infrastructure to support streaming GPS signals and their successive filtering into different formats. This is built over NaradaBrokering (see messaging service). This does not use Web Services as such at present but the filters can be controlled by HPSearch services.
  • Messaging Service: This is used to stream data in workflow fed by real-time sources. It is based on NaradaBrokering which can also be used in cases just involving archival data
  • Notification Service: This supplies alerts to users when filters (data-mining) detects features of interest
  • QuakeTables Database Services: The USC QuakeTables fault database project includes a web service that allows you to search for Earthquake faults.
  • Scientific Plotting Services: We are developing Dislin-based scientific plotting services as a variation of our Web Map Service: for a given input service, we can generate a raster image (like a contour plot) which can be integrated with other scientific and GIS map plot images.
  • Data Tables Web Service: We are developing a Web Service based on the National Virtual Observatory's VOTables XML format for tabular data. We see this as a useful general format for ASCII data produced by various application codes in SERVO and other projects.
  • Key interfaces/standards/software Used: GML, WFS, WMS; WSDL; XML Schema with pull parser XPP; SOAP with Axis 1.x; UDDI; WS-Context; JSR-168; JDBC; Servlets; WS-Management; VOTables in Research
  • Key interfaces/standards/software NOT Used (often just for historical reasons as project predated standard): WS-Security, JSDL, WSRF, BPEL, OGSA-DAI

28. Key GIS and Related Services

  • HPSearch: Support for streaming data between services; supports scriptable workflows so not limited to DAGs; implementation of WS-Distributed Management
  • WS-Context: Contexts can be used to hold arbitrary content (XML, URIs, name-value pairs); can be used to support distributed session state as well as persistent data; currently researching scalability.
  • Web Feature Service: Supports both streaming and non-streaming returns of query results.
  • Web Map Services: Supports integration of local and remote map services; treats Google maps as an OGC-compliant map server
  • Sensor Grid: Publish/subscribe system allows data streams to be reorganized using topics.
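The publish/subscribe idea behind the Sensor Grid entry, where streams are reorganized using topics, can be sketched with a tiny in-memory broker. This is an illustration only: `TopicBroker` and the topic names `gps/raw` and `gps/filtered` are hypothetical and do not reflect the NaradaBrokering API.

```python
from collections import defaultdict


class TopicBroker:
    """Minimal in-memory publish/subscribe broker keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback) -> None:
        """Register a callback to receive every message published on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message) -> int:
        """Deliver a message to all subscribers; return how many received it."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = TopicBroker()
filtered = []

# A filter subscribes to the raw topic and republishes on a derived topic.
broker.subscribe("gps/raw", lambda m: broker.publish("gps/filtered", round(m, 1)))
# A consumer (for example a display) listens only to the filtered topic.
broker.subscribe("gps/filtered", filtered.append)

broker.publish("gps/raw", 3.14159)
print(filtered)
```

Producers and consumers never reference each other directly, so a new filter or display can be attached by subscribing to an existing topic.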