www.geongrid.org
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
GEON Systems Report
Karan Bhatia, San Diego Supercomputer Center
Friday, Aug 13, 2004
Year 2 Goals & Accomplishments
• Goals:
– Procure and deploy physical resources for partners
– Provide infrastructure for management of systems
• including mechanisms for collaboration and communication
– Provide basic production services for data
– Provide basic grid services for applications
• Physical Layer
– Purchased and deployed hardware
• Systems Layer
– Developed management software and collaborations with partner sites
– Developed GEON Software Stack
• Grid Layer
– Beginning to build out services
• Portal & security done (end of Aug)
• Naming & discovery, data management & replication, and mediation
– Basic research still being done
• Applications Layer
– Some apps ready, used as templates for how to build apps in GEON
GEONgrid Development
Physical Deployment – Hardware, clusters, networks
Systems Layer – OS & software layer
Grid Layer – Grid system services
Applications – End-user apps & services
Physical Deployment
• Vendors:
– Dell (27 production systems + 9 development systems)
• PowerEdge 2650-based systems
• Dual 2.8 GHz Pentium processors
• 2 GB RAM
– ProMicro (3 systems)
• Dual Pentium
• 4 TB + RAID
– HP cluster donation (9 systems)
• rx2600-based, dual 1.4 GHz
• 15 partner sites
– 1 PoP node
– Optional small cluster (4 systems)
– Optional data node
• Misc. equipment as needed
– Switches, racks, etc.
Deployment Architecture
• Similar to BIRN architecture
– Each site runs a PoP
– Optional cluster and data nodes
• Users access resources through the PoP
– PoP provides point of entry
– PoP provides access to global services
• Developers add services & data hosted on GEON resources
– Web services / grid services
GEONgrid Current Status
Physical Resources:
– All PoPs deployed, 3 data nodes deployed, all clusters up
– HP cluster delivered
Software Stack:
– Mix of GeonRocks 0.1 (Red Hat 9-based) and Red Hat 9
Systems Layer
• Unified software stack definition
– Custom GEON Roll
• Web/grid services software stack
• Common GEON applications and services
• Focus on scalable systems management
– Modified Rocks for wide-area cluster management (see [Sacerdoti04])
• Collaborations with partner sites
– Identified appropriate contacts
GEON Software Roll
• Development
– OGSI 1.0 (from GT3.0.2) → GT3.2 (packaged by NMI)
– Web services (Jakarta, Axis, Ant, etc.)
– GridSphere 2.02 portal framework
• Database
– IBM DB2 (packaged for the Protein Data Bank)
– Postgres → PostGIS
– SRB client software
– OPeNDAP roll (UNAVCO)
• Security
– DB2 with GSI plugin (developed by TeraGrid)
– Tripwire
• System Monitoring
– Grid Monitor
– INCA testing and monitoring framework (TeraGrid)
• with GRASP benchmarks
– Network Weather Service (NWS)
GEON Software Stack version 1.0 to be deployed starting Sept 1, 2004!
Wide-Area Cluster Management
• Frederico Sacerdoti, Sandeep Chandra, and Karan Bhatia, "Grid Systems Deployment and Management using Rocks", Cluster 2004, Sept. 20-23, 2004, San Diego, California
Additional Infrastructure
• Production/development servers
– 8 development servers used for various activities
– Main production portal
– Blogs, forums, RSS
– Production application services
• CVS services
– cvs.geongrid.org
• GEON certificate authority
– ca.geongrid.org
Grid Layer
• Goals
– Evaluate core software infrastructure
• CAS, Handle.net, RLS (Replica Location Service), VOMS (Virtual Organization Management), Firefish, MCS (Metadata Catalog Service), SRB, CSF (Community Scheduling Framework)
– Integrate or build as necessary:
1. Portal Infrastructure
2. Security Infrastructure
3. Naming and Discovery Infrastructure
4. Data Management and Replication
5. Generic Mediation
1. Portal Infrastructure
• GridSphere portal framework
– Developed by GridLab (Jason Novotny and others), Albert Einstein Institute, Berlin, Germany
– Java/JSP portlet container
• JSR 168 support; WSRP and JSF coming
– Supports:
• Collaboration (standard portlet API)
• Personalization (e.g. my.yahoo.com)
• Grid services (GSI support)
• Web services
• Other frameworks
– Open Grid Computing Environments (OGCE)
• Apache Jetspeed-based → Sakai
2. Security Infrastructure
• GSI-based
– Collaboration with Telescience & BIRN
– GEON certificate authority: ca.geongrid.org
• SDSC CACL system
– Role-based access control using the Globus Community Authorization Service (CAS)
• geonAdmin, geonPI, geonUser, public
– Portal integration
• Account requests, certificate management
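The four-role model above can be sketched as a simple permission table. This is an illustrative sketch only: the role names come from the slide, but the actions and the policy mapping are invented placeholders, not the actual CAS configuration.

```python
# Sketch of role-based access control with the four GEON roles.
# The action names and the policy below are hypothetical examples.
ROLE_PERMISSIONS = {
    "geonAdmin": {"read", "publish", "manage_users", "deploy_service"},
    "geonPI":    {"read", "publish", "deploy_service"},
    "geonUser":  {"read", "publish"},
    "public":    {"read"},
}

def is_authorized(role: str, action: str) -> bool:
    """Return True if the given role may perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

A real deployment would evaluate such a policy inside CAS and attach the decision to the user's GSI credential rather than checking a local table.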
3. Naming and Discovery
• Naming
– All service instances, datasets, and applications
– Two-level naming scheme to support replication and versioning
– GeoID, similar to LSID (Life Science Identifier)
– Globally unique and resolvable
• Resolution
– GeoID → usable reference (e.g. WSDL)
– Handle System (CNRI)
• Discovery
– Discover resources in heterogeneous metadata repositories
• MCAT, MCS, Geography Network (ESRI), OPeNDAP
– Firefish (LBL)
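The two-level scheme above can be illustrated with a small resolver sketch: an abstract GeoID first resolves to concrete version/replica identifiers, each of which resolves to a usable reference such as a WSDL URL. All registry contents, identifiers, and URLs below are invented for illustration; the real system resolves GeoIDs through the CNRI Handle System.

```python
# Level 1: abstract GeoID -> concrete (version, replica) identifiers.
ABSTRACT_REGISTRY = {
    "geon/dataset/gravity-utep": [
        "geon/dataset/gravity-utep/v2@sdsc",
        "geon/dataset/gravity-utep/v2@utep",
    ],
}

# Level 2: concrete identifier -> resolvable reference (e.g. a WSDL URL).
CONCRETE_REGISTRY = {
    "geon/dataset/gravity-utep/v2@sdsc": "http://sdsc.example.org/ogsadai/gravity?wsdl",
    "geon/dataset/gravity-utep/v2@utep": "http://utep.example.org/ogsadai/gravity?wsdl",
}

def resolve(geoid: str) -> str:
    """Resolve an abstract GeoID to the first available usable reference."""
    for concrete_id in ABSTRACT_REGISTRY.get(geoid, []):
        ref = CONCRETE_REGISTRY.get(concrete_id)
        if ref is not None:
            return ref
    raise KeyError(f"GeoID not resolvable: {geoid}")
```

The indirection is what makes replication and versioning transparent: clients hold only the abstract name, and the registry decides which replica or version backs it.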
4. Data Management & Replication
• Installed services
– GridFTP
– SRB server
• GMR testing
– Grid Movement and Replication
– With IBM Research
• OGSA-DAI performance
– With GRASP (Baru, Casanova, Snavely)
[Chart: Data Access Performance in seconds (0-50), LAN vs. WAN, comparing OGSA-DAI and JDBC]
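The LAN/WAN comparison was measured with GRASP; a minimal sketch of how one might time a data-access path looks like the following, where `access_fn` stands in for an OGSA-DAI or direct JDBC call. Both access functions below are placeholder stand-ins, not real database clients.

```python
import time

def measure_latency(access_fn, trials=5):
    """Time repeated calls to a data-access function and return the
    mean latency in seconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        access_fn()
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples)

# Stand-ins for two access paths; a real test would issue the same SQL
# query once through OGSA-DAI and once through a direct JDBC connection.
direct_path  = lambda: time.sleep(0.001)  # placeholder: direct access
service_path = lambda: time.sleep(0.005)  # placeholder: service-layer hop
```

Averaging over repeated trials matters especially on the WAN, where single-shot measurements are dominated by transient network jitter.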
5. Mediation Services
• GIS map integration
– See next talk (Ludaescher)
Year 2 Summary
• Physical Layer
– Purchased and deployed hardware
• Systems Layer
– Developed management software and collaborations with partner sites
– Developed GEON Software Stack
• Grid Layer
– Beginning to build out services
• Portal & security done (end of Aug)
• Naming & discovery, data management & replication, and mediation
– Basic research still being done
• Applications Layer
– Some apps ready, used as templates for how to build apps in GEON
Looking Ahead, Year 3
• Goals:
– Provide core software infrastructure
– Integration with outside resources
– Encourage software development and integration with partners
– More data, more apps, more tools
Questions?
Additional Material
Grid Movement and Replication (with IBM)
• Data is stored in the Postgres database at UTEP on the GEON node.
• The GMR capture service running at UTEP reads and replicates data to the Postgres database running at SDSC.
• The GMR apply and monitor services run at SDSC to store the data sent by the capture service.
• The OGSA-DAI data access service provides access to the databases on both the UTEP and SDSC nodes.
• The user application grid service accepts two parameters:
– the name of the node to access, and
– an SQL query selecting the data of interest to be sent to the grav application.
• An XML query document is generated from the SQL query.
• An appropriate service handle is selected based on the node.
• The application grid service invokes the OGSA-DAI grid service handle to access data from the database.
• The application grid service receives the data and parses it to extract the relevant data values, which are submitted to the grav application.
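The steps above can be sketched end-to-end as follows. The service handles, the XML document format, the mock result rows, and the column names are all invented placeholders; the real service goes through the actual OGSA-DAI grid service interfaces.

```python
# Hypothetical registry of OGSA-DAI service handles, keyed by node name.
SERVICE_HANDLES = {
    "UTEP": "http://utep.example.org/ogsadai/GravityDataService",
    "SDSC": "http://sdsc.example.org/ogsadai/GravityDataService",
}

def build_query_document(sql: str) -> str:
    """Wrap the SQL query in a minimal XML query document (format invented)."""
    return ("<gridDataServicePerform><sqlQueryStatement>"
            f"<expression>{sql}</expression>"
            "</sqlQueryStatement></gridDataServicePerform>")

def invoke_data_service(handle: str, query_doc: str) -> list:
    """Placeholder for the remote OGSA-DAI invocation; returns mock rows."""
    return [{"lat": 31.8, "lon": -106.4, "gravity_mgal": 978.9}]

def run_grav_query(node: str, sql: str) -> list:
    """Follow the slide's steps: pick the handle for the node, build the
    XML query document, invoke the service, and extract the values to be
    submitted to the grav application."""
    handle = SERVICE_HANDLES[node]                 # select handle by node
    query_doc = build_query_document(sql)          # SQL -> XML document
    rows = invoke_data_service(handle, query_doc)  # invoke OGSA-DAI handle
    return [row["gravity_mgal"] for row in rows]   # parse relevant values
```

Keeping the node-to-handle lookup separate from the query construction is what lets the same application query either replica (UTEP or SDSC) transparently.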