GriPhyN - SDSC Research and Infrastructure
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
SAN DIEGO SUPERCOMPUTER CENTER
Particle Physics Data Grid
GriPhyN - SDSC Research and Infrastructure
Reagan Moore
San Diego Supercomputer Center
Topics
• Research activities
  • Advanced query interfaces - Amarnath Gupta
  • Knowledge bases - Bertram Ludaescher
• Infrastructure development
  • SRB replication - Michael Wan
  • MCAT information catalog - Arcot Rajasekar
  • Grid Portals - Mary Thomas
  • WSDL web services - Arun Jagatheesan
• Grids
LIGO Support Opportunities
• Pattern recognition in template and chirp-transform data using database technology
• Optimization of derived data products by tuning input parameters through controlled parameter sweeps
• Utilization of SRB/MCAT for storage of virtual data products
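A controlled parameter sweep of the kind listed above can be sketched in a few lines. The objective function, the parameter names `threshold` and `window`, and the grid values are hypothetical placeholders rather than LIGO quantities; the sketch only shows the pattern of evaluating every combination of input parameters and keeping the best derived product.

```python
import itertools


def derived_product_quality(threshold, window):
    """Hypothetical stand-in for the quality score of a derived data product."""
    return -((threshold - 0.3) ** 2) - ((window - 64) / 64.0) ** 2


def sweep(grid, objective):
    """Evaluate the objective on every combination of the grid values."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((objective(**params), params))
    return results


# Assumed grid; a real sweep would take its ranges from the analysis pipeline.
grid = {"threshold": [0.1, 0.3, 0.5], "window": [32, 64, 128]}
results = sweep(grid, derived_product_quality)
best_score, best_params = max(results, key=lambda item: item[0])
print(best_params)
```

Recording each `(score, params)` pair, rather than only the winner, is what would let the sweep results themselves be stored as metadata alongside the derived products.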
SDSS Support Opportunities
• Federation of sky survey services
  • Development of a dynamic cross-match service between SDSS and other sky surveys
  • WSDL based web interface for sky survey services
  • UDDI based service directory
• Build topic map providing relationships between “Strasbourg sky survey” attributes
  • Correlate attributes through physical laws as well as derived observations
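As an illustration of what a cross-match between two sky surveys involves, the following minimal sketch pairs each source in one catalog with its nearest neighbour in another, within a matching radius. The catalog layout `(id, ra_deg, dec_deg)`, the radius, and the brute-force search are assumptions made for the example; they are not taken from SDSS or from any service described here.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def cross_match(catalog_a, catalog_b, radius_arcsec):
    """Return (id_a, id_b) pairs: nearest source in catalog_b within the radius."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0]))
    return matches


# Hypothetical sources: (identifier, right ascension, declination) in degrees.
survey_a = [("a1", 10.0, 20.0), ("a2", 50.0, -5.0)]
survey_b = [("b1", 10.0001, 20.0001), ("b2", 120.0, 30.0)]
print(cross_match(survey_a, survey_b, radius_arcsec=2.0))
```

The double loop is quadratic in catalog size; a production service would need a spatial index, which is outside the scope of this sketch.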
Integration of XSIL and XQuery
• Quilt: an XML query language designed for heterogeneous data sources
• Authors: Don Chamberlin (IBM), Jonathan Robie (Software AG), and Daniela Florescu (INRIA)
• Quilt is built on previous XML query languages:
  -- XPath, XQL, XML-QL, XMAS, Lorel, YATL
• Became the basis for the standard query language for XML, called XQuery
“List the titles of all books published by Addison Wesley after 1991, in alphabetic order.”
FOR $b IN document("www.bn.com/bib.xml")//book
    [publisher = "Addison Wesley" AND @year > "1991"]
RETURN
    $b/@year, $b/title SORTBY (title)
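To make the semantics of the query above concrete, here is a rough Python equivalent using the standard library. The sample document is hypothetical (the slide only names www.bn.com/bib.xml), and the year is compared numerically here, whereas the query compares against the string "1991".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for bib.xml.
BIB_XML = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix Environment</title><publisher>Addison Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book>
  <book year="1990"><title>An Older Book</title><publisher>Addison Wesley</publisher></book>
</bib>
"""


def books_after(xml_text, publisher, year):
    """Return (year, title) pairs for matching books, sorted by title."""
    root = ET.fromstring(xml_text)
    hits = []
    for book in root.iter("book"):  # corresponds to //book
        if book.findtext("publisher") == publisher and int(book.get("year")) > year:
            hits.append((book.get("year"), book.findtext("title")))
    return sorted(hits, key=lambda pair: pair[1])  # corresponds to SORTBY (title)


result = books_after(BIB_XML, "Addison Wesley", 1991)
for book_year, title in result:
    print(book_year, title)
```

The filter in square brackets becomes the `if` condition, the RETURN clause becomes the tuple that is collected, and SORTBY becomes the final sort.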
Extensible Scientific Interchange Language (XSIL)
• A flexible, XML based, hierarchical, extensible, transport language for scientific data objects
<?xml version="1.0"?>
<!DOCTYPE XSIL SYSTEM "xsil.dtd">
<XSIL>
  <XSIL>
    <Array Name="hello" Type="double">
      <Dim>10</Dim>
      <Stream Encoding="Text" Type="Local" Delimiter=",">
        0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
      </Stream>
    </Array>
  </XSIL>
  <XSIL Type="Simple.Label" Name="Example">
    <Param Name="Message">Hello Auntie Joan</Param>
    <Param Name="FontSize">96</Param>
  </XSIL>
</XSIL>
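A minimal sketch of how a client might read the XSIL example above with the Python standard library. It assumes one-dimensional arrays with a text-encoded local stream, as in the example, and leaves out the XML declaration and DOCTYPE so that no DTD is needed; it is not an XSIL library.

```python
import xml.etree.ElementTree as ET

XSIL_DOC = """<XSIL>
  <XSIL>
    <Array Name="hello" Type="double">
      <Dim>10</Dim>
      <Stream Encoding="Text" Type="Local" Delimiter=",">
        0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
      </Stream>
    </Array>
  </XSIL>
  <XSIL Type="Simple.Label" Name="Example">
    <Param Name="Message">Hello Auntie Joan</Param>
    <Param Name="FontSize">96</Param>
  </XSIL>
</XSIL>"""


def read_arrays(xsil_text):
    """Return {array name: list of floats}, checking the count against <Dim>."""
    arrays = {}
    for array in ET.fromstring(xsil_text).iter("Array"):
        dim = int(array.findtext("Dim"))
        stream = array.find("Stream")
        delimiter = stream.get("Delimiter", ",")
        values = [float(tok) for tok in stream.text.split(delimiter) if tok.strip()]
        if len(values) != dim:
            raise ValueError("stream length does not match Dim")
        arrays[array.get("Name")] = values
    return arrays


def read_params(xsil_text):
    """Return {parameter name: text value} for every <Param> element."""
    return {p.get("Name"): p.text for p in ET.fromstring(xsil_text).iter("Param")}


arrays = read_arrays(XSIL_DOC)
params = read_params(XSIL_DOC)
print(arrays["hello"], params)
```

Parameter values stay as strings here, since the example carries no type attribute on `Param`.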
Quilt Extensions
• Added the concept of data types
  • Float, integer, and boolean versus string
• Added operator overloading
  • “Sum” on type string concatenates
  • “Sum” on type integer adds
• Added array operations
  • Get, set, element summation, array summation, subsequence, concatenate
Data Grids Linking Collections
[Figure: diagram of collections linked through a data grid. Recoverable labels:]
• Logical collection: elements, attributes
• Export elements & attributes
• Grid container: logical name, container metadata, element attributes (data model), elements
• Grid metadata catalog
• Mapping of logical containers to physical files
• Grid replica catalog
• Import into existing or new logical collection
• Logical collection
• Transforms on elements
• Available transforms
• Derived data products
• Derived data metadata
• Derived data process metadata
SRB Status
• SRB features
  • Demonstration of the ability to coordinate bulk metadata and bulk data loads
  • Aggregate files into a “container”, simultaneously write metadata into a file for bulk load into the MCAT information repository
  • Achieved file import rate of 250 files/second
• Development in progress
  • Improved error statement management
  • mySRB.html web interface for collection support
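The bulk-load idea above (pack many files into one container while writing their metadata to a side file for a single catalog load) can be sketched as follows. This does not use the SRB or MCAT APIs: the tar format for the container, the CSV layout, and the column names are assumptions chosen only to illustrate the pattern.

```python
import csv
import io
import tarfile


def build_container(files):
    """Pack {name: bytes} into one tar container and a CSV metadata listing.

    Returns (container_bytes, metadata_csv_text).
    """
    container = io.BytesIO()
    metadata = io.StringIO(newline="")
    writer = csv.writer(metadata)
    writer.writerow(["data_name", "size", "position_in_container"])  # assumed columns
    with tarfile.open(fileobj=container, mode="w") as tar:
        for position, (name, payload) in enumerate(sorted(files.items())):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
            # One metadata row per file, written in the same pass as the data.
            writer.writerow([name, len(payload), position])
    return container.getvalue(), metadata.getvalue()


# Hypothetical input files.
files = {"a.dat": b"abc", "b.dat": b"hello"}
container_bytes, metadata_csv = build_container(files)
print(metadata_csv)
```

Writing the data and the metadata in one pass means the catalog can be updated with a single bulk insert instead of one transaction per file, which is the kind of saving a high file import rate depends on.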
MCAT Web Interface
• Provide collection management
  • Create a collection
  • Define collection attributes
  • Ingest data / move / replicate
  • Browse
  • Query
  • Annotate
  • Comment
• https://srb.npaci.edu/mySRB.html
Grid Portal Development
• Integrate collection management of derived data products with Grid execution portal
• Based on GridPort and SRB
• Funded by GriPhyN, NPACI, NASA IPG
GridPort + SRB Architecture
• With SRB capabilities, file access is direct, uniform
• Uses same authentication as portal and other Grid services
• Single SRB account access allows for more flexible data management
Other Data Grids
• NSF - National Virtual Observatory
• DOE - Particle Physics Data Grid - Babar
• NSF - United Kingdom data grid
• NSF - Distributed Terascale Facility
Astronomy Sky Survey Data Grid
[Figure: layered architecture diagram with components numbered 1 to 7. Recoverable labels:]
• 1. Portals and Workbenches
• 2. Knowledge & Resource Management
• 4. Grid Security, Caching, Replication, Backup, Scheduling
• 7. Derived Collections
• Compute Resources, Catalogs, Data Archives
• Information Discovery, Metadata delivery, Data Discovery, Data Delivery
• Catalog Mediator, Data mediator
• Bulk Data Analysis, Catalog Analysis, Metadata View, Data View
• Standard Metadata format, Data model, Wire format
• Catalog/Image Specific Access
• Standard APIs and Protocols
• Concept space
PPDG - Babar Support
• Installed SRB at Stanford
• Added Babar specific metadata attributes to MCAT catalog
• Developed ability to support “soft links” between collections
  • Allows same file to appear in multiple collections
  • Release in SRB version 1.1.9
• UK data grid (SRB / Condor / Globus)
  • Rutherford - opportunity for international demonstration of Babar data replication
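The “soft link” idea above, in which one file appears in several collections without being copied, can be illustrated with a toy catalog. The class, its method names, and the paths are hypothetical and do not reflect how SRB or MCAT implement links.

```python
class CollectionCatalog:
    """Toy catalog: collections hold named entries that point at data objects."""

    def __init__(self):
        self.physical_path = {}  # object id -> physical path
        self.collections = {}    # collection name -> {entry name: object id}
        self._next_id = 1

    def ingest(self, collection, name, path):
        """Register a new data object and place it in a collection."""
        object_id = self._next_id
        self._next_id += 1
        self.physical_path[object_id] = path
        self.collections.setdefault(collection, {})[name] = object_id
        return object_id

    def soft_link(self, source_collection, name, target_collection):
        """Make an existing object visible in another collection without copying it."""
        object_id = self.collections[source_collection][name]
        self.collections.setdefault(target_collection, {})[name] = object_id

    def resolve(self, collection, name):
        """Return the physical path behind a collection entry."""
        return self.physical_path[self.collections[collection][name]]


catalog = CollectionCatalog()
catalog.ingest("babar/run1", "events.dat", "/archive/0001")  # hypothetical names
catalog.soft_link("babar/run1", "events.dat", "babar/selected")
print(catalog.resolve("babar/selected", "events.dat"))
```

Both collection entries resolve to the same stored object, so only one physical copy exists.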
TeraGrid Wide Area Network
[Figure: network map. Recoverable labels:]
• Sites: NCSA/UIUC, ANL, UIC, Starlight / NW Univ, Ill Inst of Tech, Univ of Chicago, Indianapolis (Abilene NOC)
• Cities: Los Angeles, San Diego, Chicago, Indianapolis, Urbana
• Multiple Carrier Hubs
• I-WIRE
• StarLight International Optical Peering Point (see www.startap.net)
• DTF Backbone
• Abilene
• Legend: OC-48 (2.5 Gb/s, Abilene); Multiple 10 GbE (Qwest); Multiple 10 GbE (I-WIRE Dark Fiber)
• Solid lines in place and/or available by October 2001
• Dashed I-WIRE lines planned for summer 2002
PACI 13.6 TF Linux TeraGrid
[Figure: hardware and network diagram of the TeraGrid sites. Recoverable labels:]
• NCSA: 500 nodes, 8 TF, 4 TB memory, 240 TB disk
• SDSC: 256 nodes, 4.1 TF, 2 TB memory, 225 TB disk
• Caltech: 32 nodes, 0.5 TF, 0.4 TB memory, 86 TB disk
• Argonne: 64 nodes, 1 TF, 0.25 TB memory, 25 TB disk
• Chicago & LA DTF core switch/routers: Cisco 65xx Catalyst Switch (256 Gb/s crossbar)
• Switches and routers: Cisco 6509 Catalyst Switch/Router, Juniper M160, Juniper M40, Extreme Black Diamond, Fibre Channel Switch, Myrinet Clos Spine
• McKinley servers: 32 quad-processor (128p @ 4GF, 8GB memory/server); 16 quad-processor (64p @ 4GF, 8GB memory/server); 32 quad-processor (128p @ 4GF, 12GB memory/server)
• Other systems: IA-32 nodes, 574p IA-32 Chiba City, 128p Origin, 1500p Origin, 1176p IBM SP Blue Horizon, Sun E10K, Sun Starcat, 1024p IA-32, 320p IA-64, 256p HP X-Class, 128p HP V2500, 92p IA-32, HR Display & VR Facilities
• Storage: HPSS, UniTree
• External networks: ESnet, HSCC, MREN/Abilene, Starlight, vBNS, Abilene, MREN, Calren, NTON
• Link types: GbE, 10 GbE, OC-3, OC-12, OC-12 ATM, OC-48
• Legend: 32x 1GbE, 64x Myrinet, 32x Myrinet, 32x FibreChannel, 8x FibreChannel
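As a quick arithmetic check, the per-site figures on this slide (NCSA, SDSC, Caltech, Argonne) add up to the 13.6 TF in the title. The short script below sums them; the dictionary layout and field names are only a convenience for the example.

```python
# Per-site figures as given on the slide.
SITES = {
    "NCSA":    {"nodes": 500, "teraflops": 8.0, "memory_tb": 4.0,  "disk_tb": 240},
    "SDSC":    {"nodes": 256, "teraflops": 4.1, "memory_tb": 2.0,  "disk_tb": 225},
    "Caltech": {"nodes": 32,  "teraflops": 0.5, "memory_tb": 0.4,  "disk_tb": 86},
    "Argonne": {"nodes": 64,  "teraflops": 1.0, "memory_tb": 0.25, "disk_tb": 25},
}


def total(field):
    """Sum one field across all four sites."""
    return sum(site[field] for site in SITES.values())


totals = {field: round(total(field), 2)
          for field in ("nodes", "teraflops", "memory_tb", "disk_tb")}
print(totals)
```

The same sums give 852 nodes, 6.65 TB of memory, and 576 TB of disk across the four sites.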
Further Information
http://www.npaci.edu/DICE
SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB architecture diagram. Recoverable labels:]
• SRB
• Archives: HPSS, ADSM, UniTree, DMF
• Databases: DB2, Oracle, Postgres
• File Systems: Unix, NT, Mac OSX
• Application
• C, C++, Linux I/O
• Unix Shell
• Dublin Core
• Resource, User
• User Defined
• Application Meta-data
• Remote Proxies
• DataCutter
• Third-party copy
• Java, NT Browsers
• Web
• Prolog Predicate
• MCAT
• HRM
Replication Attributes
• DATA_NAME
  • Global SRB data object name
• DATA_REPL_ENUM
  • Replica copy number
• SIZE
  • Size of data in bytes
• DATA_TYP_NAME
  • Data type (primarily specification of the data format)
• DATA_CLASS_NAME
  • Logical classification of the data (description of the type)
• DATA_CLASS_TYPE
  • Classification type
• ACCESS_CONSTRAINT
  • Access restrictions on data
• DATA_COMMENTS
Replication Attributes (2)
• DATA_COMMENTS_TIMESTAMP
  • Time and date stamp for when comments were made on the data object
• REPL_TIMESTAMP
  • Time and date stamp when the owner modified the data object
• PATH_NAME
  • Physical path name of the data object
• DATA_CREATE_TIMESTAMP
  • Time and date stamp for when the data was created
• DATA_IS_DELETED
  • A flag can be turned on that indicates a data object has been deleted, while retaining the data set on storage
• DATA_OWNER
  • Data object creator name
• DATA_OWNER_DOMAIN
  • Domain/group of the data object creator
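One way to picture these replication attributes is as one record per replica, with several records sharing a DATA_NAME. The sketch below models that with a Python dataclass; the field types, the defaults, the sample values, and the `live_replicas` helper are assumptions made for illustration and are not the MCAT schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ReplicaRecord:
    data_name: str                      # DATA_NAME: global SRB data object name
    data_repl_enum: int                 # DATA_REPL_ENUM: replica copy number
    size: int                           # SIZE: size of data in bytes
    data_typ_name: str                  # DATA_TYP_NAME: data format
    path_name: str                      # PATH_NAME: physical path of this replica
    data_owner: str                     # DATA_OWNER
    data_owner_domain: str              # DATA_OWNER_DOMAIN
    data_create_timestamp: datetime     # DATA_CREATE_TIMESTAMP
    repl_timestamp: Optional[datetime] = None           # REPL_TIMESTAMP
    data_class_name: str = ""           # DATA_CLASS_NAME
    data_class_type: str = ""           # DATA_CLASS_TYPE
    access_constraint: str = ""         # ACCESS_CONSTRAINT
    data_comments: str = ""             # DATA_COMMENTS
    data_comments_timestamp: Optional[datetime] = None  # DATA_COMMENTS_TIMESTAMP
    data_is_deleted: bool = False       # DATA_IS_DELETED: data kept on storage


def live_replicas(records, data_name):
    """Replicas of one data object that are not flagged deleted, by copy number."""
    hits = [r for r in records if r.data_name == data_name and not r.data_is_deleted]
    return sorted(hits, key=lambda r: r.data_repl_enum)


created = datetime(2001, 10, 1)  # hypothetical values throughout
records = [
    ReplicaRecord("run1/events.dat", 1, 1024, "generic", "/hpss/a/events.dat",
                  "moore", "sdsc", created, data_is_deleted=True),
    ReplicaRecord("run1/events.dat", 0, 1024, "generic", "/unix/b/events.dat",
                  "moore", "sdsc", created),
]
print([r.path_name for r in live_replicas(records, "run1/events.dat")])
```

Keeping DATA_IS_DELETED as a flag on the record, rather than removing the record, matches the description above: the data set stays on storage while the object is reported as deleted.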
Quilt Extension (1) – Data Type
• Original Quilt: No difference between dt1.xml and dt2.xml
dt1.xml
<bills>
  <bill name="S.10">
    <id type="string">21</id>
    … …
    <sponsor_id type="string">122 </sponsor_id>
  </bill>
  <bill name="S.100">
    <id type="string">123</id>
    … …
    <sponsor_id type="string">203 </sponsor_id>
  </bill>
  … …
</bills>
dt2.xml
<bills>
  <bill name="S.10">
    <id type="float">21</id>
    … …
    <sponsor_id type="float">122 </sponsor_id>
  </bill>
  <bill name="S.100">
    <id type="float">123</id>
    … …
    <sponsor_id type="float">203 </sponsor_id>
  </bill>
  … …
</bills>
• After we add data type …
Quilt Extension (2) – Operator Overloading
Query 1: sum of id and sponsor_id (type = string)
<results>
  FOR $bill in document("dt1.xml")//bill
  RETURN <bill $bill/@name>
    $bill//id, $bill//sponsor_id,
    <sum>$bill//id/text() + $bill//sponsor_id/text()</sum>
  </bill>
</results>
Result:
<results>
  <bill name="S.10">
    <id type="string"> 21 </id>
    <sponsor_id type="string"> 122 </sponsor_id>
    <sum> 21122 </sum>
  </bill>
  <bill name="S.100">
    <id type="string"> 123 </id>
    <sponsor_id type="string"> 203 </sponsor_id>
    <sum> 123203 </sum>
  </bill>
  … …
Quilt Extension (2) – Operator Overloading
Query 2: sum of id and sponsor_id (type = integer)
<results>
  FOR $bill in document("dt2.xml")//bill
  RETURN <bill $bill/@name>
    $bill//id, $bill//sponsor_id,
    <sum>$bill//id/text() + $bill//sponsor_id/text()</sum>
  </bill>
</results>
Result:
<results>
  <bill name="S.10">
    <id type="integer"> 21 </id>
    <sponsor_id type="integer"> 122 </sponsor_id>
    <sum> 143.0 </sum>
  </bill>
  <bill name="S.100">
    <id type="integer"> 123 </id>
    <sponsor_id type="integer"> 203 </sponsor_id>
    <sum> 326.0 </sum>
  </bill>
  … …
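The effect of the two queries can be imitated in Python: the same "+" either concatenates or adds, depending on the declared `type` attribute. This is a sketch of the idea, not of the Kweelt implementation; the function names are hypothetical, and the sample documents repeat the bills from the slides with the elided parts left out.

```python
import xml.etree.ElementTree as ET


def typed_sum(declared_type, left, right):
    """Apply '+' according to the declared type of the operands."""
    if declared_type == "string":
        return left.strip() + right.strip()  # concatenation
    if declared_type == "integer":
        return int(left) + int(right)        # integer addition
    if declared_type == "float":
        return float(left) + float(right)    # floating-point addition
    raise ValueError("unsupported type: %s" % declared_type)


def sum_bills(xml_text):
    """Return {bill name: id + sponsor_id} using the type declared on <id>."""
    sums = {}
    for bill in ET.fromstring(xml_text).iter("bill"):
        bill_id = bill.find("id")
        sponsor = bill.find("sponsor_id")
        sums[bill.get("name")] = typed_sum(bill_id.get("type"), bill_id.text, sponsor.text)
    return sums


DT2 = """<bills>
  <bill name="S.10"><id type="float">21</id><sponsor_id type="float">122 </sponsor_id></bill>
  <bill name="S.100"><id type="float">123</id><sponsor_id type="float">203 </sponsor_id></bill>
</bills>"""
DT1 = DT2.replace('"float"', '"string"')  # same data, declared as strings

print(sum_bills(DT1))
print(sum_bills(DT2))
```

With the string declaration the sums come out as "21122" and "123203"; with the float declaration they come out as 143.0 and 326.0, matching the two result listings.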
Quilt Extension (3) – Array Operation
[Figure: class hierarchy. Value is extended by ValueArray, which is extended by ValueIntegerArray, ValueFloatArray, ValueStringArray, and ValueBoolArray.]
• Value: Interface for Kweelt base type
• ValueArray: Extend Value. Implement Compare and array-specific operation
  • Accessor – getter, setter
  • Element summation
  • Array summation
  • Subsequence
  • Zip, Unscroll, concatenation, etc
Demo: http://pamina2.sdsc.edu/cgi-bin/kweelt/demo.cgi
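The class hierarchy and array operations listed for this extension can be mirrored in a small Python sketch. The class names follow the slide, but the method names and their exact semantics are assumptions: here "element summation" is taken to mean adding up the elements of one array, and "array summation" to mean adding two arrays element by element. Kweelt itself is not used.

```python
class Value:
    """Base type; stands in for the Value interface named on the slide."""

    def compare(self, other):
        raise NotImplementedError


class ValueArray(Value):
    element_type = None  # set by subclasses

    def __init__(self, items):
        self.items = [self._convert(item) for item in items]

    def _convert(self, item):
        return item if self.element_type is None else self.element_type(item)

    def compare(self, other):
        """Return -1, 0, or 1 using lexicographic list comparison."""
        return (self.items > other.items) - (self.items < other.items)

    def get(self, index):
        return self.items[index]

    def set(self, index, value):
        self.items[index] = self._convert(value)

    def element_sum(self):
        """Combine all elements of this array with '+'."""
        if not self.items:
            raise ValueError("empty array")
        total = self.items[0]
        for item in self.items[1:]:
            total = total + item
        return total

    def array_sum(self, other):
        """Add two arrays of equal length element by element."""
        if len(self.items) != len(other.items):
            raise ValueError("arrays differ in length")
        return type(self)(a + b for a, b in zip(self.items, other.items))

    def subsequence(self, start, stop):
        return type(self)(self.items[start:stop])

    def concatenate(self, other):
        return type(self)(self.items + other.items)


class ValueIntegerArray(ValueArray):
    element_type = int


class ValueFloatArray(ValueArray):
    element_type = float


class ValueStringArray(ValueArray):
    element_type = str


numbers = ValueIntegerArray([1, 2, 3])
print(numbers.element_sum(), numbers.array_sum(ValueIntegerArray([4, 5, 6])).items)
```

Because `element_sum` relies on "+", a `ValueStringArray` concatenates its elements while the numeric arrays add them, which ties the array operations back to the operator overloading extension.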