GriPhyN - SDSC Research and Infrastructure


Page 1: GriPhyN - SDSC Research and Infrastructure

NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE

SAN DIEGO SUPERCOMPUTER CENTER
Particle Physics Data Grid

GriPhyN - SDSC Research and Infrastructure

Reagan Moore
San Diego Supercomputer Center

Page 2: GriPhyN - SDSC Research and Infrastructure

Topics

• Research activities
  • Advanced query interfaces - Amarnath Gupta
  • Knowledge bases - Bertram Ludaescher
• Infrastructure development
  • SRB replication - Michael Wan
  • MCAT information catalog - Arcot Rajasekar
  • Grid Portals - Mary Thomas
  • WSDL web services - Arun Jagatheesan
• Grids

Page 3: GriPhyN - SDSC Research and Infrastructure

LIGO Support Opportunities

• Pattern recognition in template and chirp-transform data using database technology

• Optimization of derived data products by tuning input parameters through controlled parameter sweeps

• Utilization of SRB/MCAT for storage of virtual data products

Page 4: GriPhyN - SDSC Research and Infrastructure

SDSS Support Opportunities

• Federation of sky survey services
  • Development of a dynamic cross-match service between SDSS and other sky surveys
  • WSDL based web interface for sky survey services
  • UDDI based service directory
• Build topic map providing relationships between “Strasbourg sky survey” attributes
  • Correlate attributes through physical laws as well as derived observations

Page 5: GriPhyN - SDSC Research and Infrastructure

Integration of XSIL and XQuery

• Quilt: an XML query language designed for heterogeneous data sources
• Authors: Don Chamberlin (IBM), Jonathan Robie (SoftwareAG), and Daniela Florescu (INRIA)
• Quilt is built on previous XML query languages:
  -- XPath, XQL, XML-QL, XMAS, Lorel, YATL
• Quilt is becoming a standard query language for XML, under the name XQuery

Example: “List the titles of all books published by Addison Wesley after 1991, in alphabetic order.”

FOR $b IN document("www.bn.com/bib.xml")//book[publisher = "Addison Wesley" AND @year > "1991"]
RETURN $b/@year, $b/title SORTBY (title)
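For readers unfamiliar with Quilt, the selection expressed by the query above can be approximated in Python with the standard library. This is only a sketch: the sample document and the function name `books_after` are hypothetical (the real bib.xml is not reproduced in the slides), and the year is compared as an integer here, whereas the Quilt query compares against the string "1991".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for bib.xml.
SAMPLE_BIB = """
<bib>
  <book year="1994"><title>Title B</title><publisher>Addison Wesley</publisher></book>
  <book year="1992"><title>Title A</title><publisher>Addison Wesley</publisher></book>
  <book year="1990"><title>Title C</title><publisher>Addison Wesley</publisher></book>
  <book year="2000"><title>Title D</title><publisher>Other Press</publisher></book>
</bib>
"""


def books_after(xml_text, publisher, year):
    """Return (title, year) pairs for one publisher after `year`, sorted by title."""
    root = ET.fromstring(xml_text)
    hits = []
    for book in root.iter("book"):
        # Mirrors the predicate [publisher = "..." AND @year > "..."].
        if book.findtext("publisher") == publisher and int(book.get("year")) > year:
            hits.append((book.findtext("title"), book.get("year")))
    # Mirrors SORTBY (title): tuples sort on their first field, the title.
    return sorted(hits)


if __name__ == "__main__":
    for title, year in books_after(SAMPLE_BIB, "Addison Wesley", 1991):
        print(year, title)
```

The Quilt form states the same filter, projection, and ordering declaratively in a single expression, which is the property the slide highlights.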

Page 6: GriPhyN - SDSC Research and Infrastructure

Extensible Scientific Interchange Language (XSIL)

• A flexible, XML based, hierarchical, extensible, transport language for scientific data objects

<?xml version="1.0"?>
<!DOCTYPE XSIL SYSTEM "xsil.dtd">
<XSIL>
  <XSIL>
    <Array Name="hello" Type="double">
      <Dim>10</Dim>
      <Stream Encoding="Text" Type="Local" Delimiter=",">
        0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
      </Stream>
    </Array>
  </XSIL>
  <XSIL Type="Simple.Label" Name="Example">
    <Param Name="Message">Hello Auntie Joan</Param>
    <Param Name="FontSize">96</Param>
  </XSIL>
</XSIL>
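To illustrate how an XSIL document like the one above could be consumed, here is a minimal Python sketch using the standard library. It assumes only what the example shows (a text-encoded `Stream` with a `Delimiter` attribute inside an `Array`, plus `Param` elements); the function names are hypothetical, the XML prolog and DOCTYPE are omitted for brevity, and this is not the XSIL project's own reader.

```python
import xml.etree.ElementTree as ET

# The slide's example, without the XML declaration and DOCTYPE.
XSIL_TEXT = """<XSIL>
  <XSIL>
    <Array Name="hello" Type="double">
      <Dim>10</Dim>
      <Stream Encoding="Text" Type="Local" Delimiter=",">
        0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
      </Stream>
    </Array>
  </XSIL>
  <XSIL Type="Simple.Label" Name="Example">
    <Param Name="Message">Hello Auntie Joan</Param>
    <Param Name="FontSize">96</Param>
  </XSIL>
</XSIL>"""


def read_arrays(xsil_text):
    """Return {array name: list of floats} for every text-encoded Array."""
    root = ET.fromstring(xsil_text)
    arrays = {}
    for array in root.iter("Array"):
        stream = array.find("Stream")
        delimiter = stream.get("Delimiter", ",")
        values = [float(tok) for tok in stream.text.split(delimiter) if tok.strip()]
        # Check the payload against the declared dimension.
        declared = int(array.findtext("Dim"))
        if len(values) != declared:
            raise ValueError(f"expected {declared} values, found {len(values)}")
        arrays[array.get("Name")] = values
    return arrays


def read_params(xsil_text):
    """Return {param name: text} for every Param element."""
    root = ET.fromstring(xsil_text)
    return {param.get("Name"): param.text for param in root.iter("Param")}


if __name__ == "__main__":
    print(read_arrays(XSIL_TEXT))
    print(read_params(XSIL_TEXT))
```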

Page 7: GriPhyN - SDSC Research and Infrastructure

Quilt Extensions

• Added the concept of data types
  • Float, integer, and boolean versus string
• Added operator overloading
  • “Sum” on type string concatenates
  • “Sum” on type integer adds
• Added array operations
  • Get, set, element summation, array summation, subsequence, concatenate

Page 8: GriPhyN - SDSC Research and Infrastructure

Data Grids Linking Collections

[Figure: diagram of data grids linking collections. Labeled components: Logical collection (elements, attributes); Export elements & attributes; Grid Container (logical name, container metadata, element attributes, data model, elements); Grid metadata catalog; Mapping of logical containers to physical files; Grid replica catalog; Import into existing or new logical collection; Logical collection; Transforms on elements; Available transforms; Derived data products; Derived data metadata; Derived data process metadata.]

Page 9: GriPhyN - SDSC Research and Infrastructure

SRB Status

• SRB Features
  • Demonstration of the ability to coordinate bulk metadata and bulk data loads
    • Aggregate files into a “container”, simultaneously write metadata into a file for bulk load into the MCAT information repository
    • Achieved file import rate of 250 files/second
• Development in progress
  • Improved error statement management
  • mySRB.html web interface for collection support
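The container idea on this slide can be sketched in a few lines of Python: many small files are appended into one byte stream while a metadata file is written alongside for a later bulk load. This is an illustration under stated assumptions, not SRB's actual container format. The `DATA_NAME` and `SIZE` columns follow the attribute names listed later in this deck; the `OFFSET` column and the function name `build_container` are hypothetical.

```python
import csv
import io


def build_container(files):
    """Aggregate files into one container and describe them in a metadata file.

    `files` maps a data object name to its bytes. Returns a pair
    (container_bytes, metadata_csv_text).
    """
    container = io.BytesIO()
    metadata = io.StringIO()
    writer = csv.writer(metadata, lineterminator="\n")
    writer.writerow(["DATA_NAME", "SIZE", "OFFSET"])

    offset = 0
    for name, payload in files.items():
        # One sequential write per file; the metadata row records where it landed.
        container.write(payload)
        writer.writerow([name, len(payload), offset])
        offset += len(payload)

    return container.getvalue(), metadata.getvalue()


if __name__ == "__main__":
    data, meta = build_container({"a.dat": b"abc", "b.dat": b"defgh"})
    print(len(data), "bytes in container")
    print(meta)
```

Writing the metadata to a file instead of issuing one catalog insert per object is what allows the catalog to be loaded in bulk, which is the mechanism the slide credits for the reported import rate.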

Page 10: GriPhyN - SDSC Research and Infrastructure

MCAT Web Interface

• Provide collection management
  • Create a collection
  • Define collection attributes
  • Ingest data / move / replicate
  • Browse
  • Query
  • Annotate
  • Comment

• https://srb.npaci.edu/mySRB.html

Page 11: GriPhyN - SDSC Research and Infrastructure

Grid Portal Development

• Integrate collection management of derived data products with Grid execution portal

• Based on GridPort and SRB

• Funded by GriPhyN, NPACI, NASA IPG

Page 12: GriPhyN - SDSC Research and Infrastructure

GridPort + SRB Architecture

• With SRB capabilities, file access is direct and uniform
• Uses same authentication as portal and other Grid services
• Single SRB account access allows for more flexible data management

Page 13: GriPhyN - SDSC Research and Infrastructure

Other Data Grids

• NSF - National Virtual Observatory
• DOE - Particle Physics Data Grid - Babar
• NSF - United Kingdom data grid
• NSF - Distributed Terascale Facility

Page 14: GriPhyN - SDSC Research and Infrastructure

Astronomy Sky Survey Data Grid

[Figure: layered architecture diagram with numbered layers 1 to 7. Layers whose labels are recoverable: 1. Portals and Workbenches; 2. Knowledge & Resource Management; 4. Grid (Security, Caching, Replication, Backup, Scheduling); 7. Derived Collections. Other labels in the figure: Compute Resources; Catalogs; Data Archives; Information Discovery; Metadata delivery; Data Discovery; Data Delivery; Catalog Mediator; Data mediator; Bulk Data Analysis; Catalog Analysis; Metadata View; Data View; Standard Metadata format, Data model, Wire format; Catalog/Image Specific Access; Standard APIs and Protocols; Concept space.]

Page 15: GriPhyN - SDSC Research and Infrastructure

PPDG - Babar Support

• Installed SRB at Stanford
• Added Babar specific metadata attributes to MCAT catalog
• Developed ability to support “soft links” between collections
  • Allows same file to appear in multiple collections
  • Release in SRB version 1.1.9
• UK data grid (SRB / Condor / Globus)
  • Rutherford - opportunity for international demonstration of Babar data replication
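The “soft link” behaviour described on this slide, where the same file appears in more than one collection without being copied, can be sketched as a small in-memory catalog. The class, method names, and paths below are hypothetical and do not reflect the MCAT schema or the SRB API; the sketch only shows the indirection involved.

```python
class Catalog:
    """Toy logical-name catalog with soft links between collections."""

    def __init__(self):
        self.objects = {}  # logical path -> physical path
        self.links = {}    # logical link path -> target logical path

    def register(self, logical_path, physical_path):
        self.objects[logical_path] = physical_path

    def soft_link(self, link_path, target_path):
        # A link points at an existing logical name, never at storage directly.
        if target_path not in self.objects:
            raise KeyError(f"unknown data object: {target_path}")
        self.links[link_path] = target_path

    def resolve(self, logical_path):
        """Return the physical path, following a soft link if there is one."""
        target = self.links.get(logical_path, logical_path)
        return self.objects[target]

    def list_collection(self, collection):
        """List every logical name (object or link) directly under a prefix."""
        prefix = collection.rstrip("/") + "/"
        names = list(self.objects) + list(self.links)
        return sorted(name for name in names if name.startswith(prefix))


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register("/babar/run1/events.dat", "/archive/0001/events.dat")
    catalog.soft_link("/babar/analysis/events.dat", "/babar/run1/events.dat")
    print(catalog.resolve("/babar/analysis/events.dat"))
```

Both logical names resolve to the same physical file, so only one copy is stored while each collection lists it as a member.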

Page 16: GriPhyN - SDSC Research and Infrastructure

TeraGrid Wide Area Network

[Figure: network map. Sites and hubs: NCSA/UIUC (Urbana); ANL; UIC; Starlight / NW Univ; Ill Inst of Tech; Univ of Chicago; Indianapolis (Abilene NOC); Chicago; Los Angeles; San Diego; multiple carrier hubs; I-WIRE. StarLight International Optical Peering Point (see www.startap.net). Link types: DTF Backbone; Abilene, OC-48 (2.5 Gb/s); Multiple 10 GbE (Qwest); Multiple 10 GbE (I-WIRE Dark Fiber).]

• Solid lines in place and/or available by October 2001
• Dashed I-WIRE lines planned for summer 2002

Page 17: GriPhyN - SDSC Research and Infrastructure

PACI 13.6 TF Linux TeraGrid

Site      Nodes  Compute  Memory   Disk
NCSA      500    8 TF     4 TB     240 TB
SDSC      256    4.1 TF   2 TB     225 TB
Caltech   32     0.5 TF   0.4 TB   86 TB
Argonne   64     1 TF     0.25 TB  25 TB

[Figure: hardware and network diagram of the four sites. Recoverable labels: Chicago & LA DTF Core Switch/Routers, Cisco 65xx Catalyst Switch (256 Gb/s Crossbar); Cisco 6509 Catalyst Switch/Router; Juniper M160; Juniper M40; Extreme Black Diamond; Myrinet Clos Spine; Fibre Channel Switch; 32 quad-processor McKinley Servers (128p @ 4GF, 8GB memory/server); 32 quad-processor McKinley Servers (128p @ 4GF, 12GB memory/server); 16 quad-processor McKinley Servers (64p @ 4GF, 8GB memory/server); IA-32 nodes; HPSS; UniTree. Existing systems: 574p IA-32 Chiba City; 128p Origin; HR Display & VR Facilities; 1176p IBM SP Blue Horizon; Sun E10K; Sun Starcat; 1500p Origin; 1024p IA-32; 320p IA-64; 256p HP X-Class; 128p HP V2500; 92p IA-32. External networks: ESnet; HSCC; MREN/Abilene; Starlight; vBNS; Abilene; Calren; NTON. Link types: OC-3; OC-12; OC-12 ATM; OC-48; GbE; 10 GbE. Legend: 32x 1GbE; 64x Myrinet; 32x Myrinet; 32x FibreChannel; 8x FibreChannel.]
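As a quick consistency check, the per-site figures on this slide can be totalled to confirm the 13.6 TF in the title. The numbers below are the ones shown for NCSA, SDSC, Caltech, and Argonne; the dictionary layout and the helper name `total` are just an illustrative choice.

```python
# Per-site figures as given on the slide.
SITES = {
    "NCSA": {"nodes": 500, "tflops": 8.0, "memory_tb": 4.0, "disk_tb": 240},
    "SDSC": {"nodes": 256, "tflops": 4.1, "memory_tb": 2.0, "disk_tb": 225},
    "Caltech": {"nodes": 32, "tflops": 0.5, "memory_tb": 0.4, "disk_tb": 86},
    "Argonne": {"nodes": 64, "tflops": 1.0, "memory_tb": 0.25, "disk_tb": 25},
}


def total(field):
    """Sum one field over all sites, rounded to avoid float noise."""
    return round(sum(site[field] for site in SITES.values()), 2)


if __name__ == "__main__":
    for field in ("nodes", "tflops", "memory_tb", "disk_tb"):
        print(field, total(field))
```

The compute column sums to 13.6 TF, matching the slide title; the same sums give 852 nodes, 6.65 TB of memory, and 576 TB of disk across the four sites.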

Page 18: GriPhyN - SDSC Research and Infrastructure

Further Information

http://www.npaci.edu/DICE

Page 19: GriPhyN - SDSC Research and Infrastructure

SDSC Storage Resource Broker & Meta-data Catalog

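[Figure: SRB architecture diagram. Client interfaces: Application; C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Web; Prolog Predicate. Core: SRB with MCAT. Metadata categories: Dublin Core; Resource, User; User Defined; Application Meta-data. Storage systems: Archives (HPSS, ADSM, UniTree, DMF); Databases (DB2, Oracle, Postgres); File Systems (Unix, NT, Mac OSX). Other components: Remote Proxies; DataCutter; Third-party copy; HRM.]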

Page 20: GriPhyN - SDSC Research and Infrastructure

Replication Attributes

• DATA_NAME
  • Global SRB data object name
• DATA_REPL_ENUM
  • Replica copy number
• SIZE
  • Size of data in bytes
• DATA_TYP_NAME
  • Data type (primarily specification of the data format)
• DATA_CLASS_NAME
  • Logical classification of the data (description of the type).
• DATA_CLASS_TYPE
  • Classification type
• ACCESS_CONSTRAINT
  • Access restrictions on data
• DATA_COMMENTS

Page 21: GriPhyN - SDSC Research and Infrastructure

Replication Attributes (2)

• DATA_COMMENTS_TIMESTAMP
  • Time and date stamp for when comments were made on the data object
• REPL_TIMESTAMP
  • Time and date stamp when the owner modified the data object.
• PATH_NAME
  • Physical path name of the data object.
• DATA_CREATE_TIMESTAMP
  • Time and date stamp for when the data was created
• DATA_IS_DELETED
  • A flag can be turned on that indicates a data object has been deleted, while retaining the data set on storage.
• DATA_OWNER
  • Data object creator name.
• DATA_OWNER_DOMAIN
  • Domain/group of the data object creator.
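To make the attribute list concrete, the sketch below models a subset of the replication attributes as a Python record and shows how `DATA_REPL_ENUM` and `DATA_IS_DELETED` might be used to list the live replicas of one data object. The field types, the lower-cased field names, the sample paths, and the helper `live_replicas` are assumptions for illustration; the slides do not give the MCAT column types.

```python
from dataclasses import dataclass


@dataclass
class ReplicaRecord:
    data_name: str        # DATA_NAME: global SRB data object name
    data_repl_enum: int   # DATA_REPL_ENUM: replica copy number
    size: int             # SIZE: size of data in bytes
    path_name: str        # PATH_NAME: physical path of this replica
    data_owner: str       # DATA_OWNER: creator name
    data_is_deleted: bool = False  # DATA_IS_DELETED: flagged, data kept on storage


def live_replicas(records, data_name):
    """Return the non-deleted replicas of one data object, ordered by copy number."""
    matches = [
        record for record in records
        if record.data_name == data_name and not record.data_is_deleted
    ]
    return sorted(matches, key=lambda record: record.data_repl_enum)


if __name__ == "__main__":
    records = [
        ReplicaRecord("events.dat", 1, 1024, "/archive/b/events.dat", "owner1"),
        ReplicaRecord("events.dat", 0, 1024, "/disk/a/events.dat", "owner1"),
        ReplicaRecord("events.dat", 2, 1024, "/disk/c/events.dat", "owner1", True),
    ]
    for record in live_replicas(records, "events.dat"):
        print(record.data_repl_enum, record.path_name)
```

Because the deleted flag only hides a replica from listings, the bytes remain on storage, as the `DATA_IS_DELETED` description states.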

Page 22: GriPhyN - SDSC Research and Infrastructure

Quilt Extension (1) – Data Type

• Original Quilt: No difference between dt1.xml and dt2.xml

dt1.xml

<bills>
  <bill name="S.10">
    <id type="string">21</id>
    … …
    <sponsor_id type="string">122</sponsor_id>
  </bill>
  <bill name="S.100">
    <id type="string">123</id>
    … …
    <sponsor_id type="string">203</sponsor_id>
  </bill>
  … …
</bills>

dt2.xml

<bills>
  <bill name="S.10">
    <id type="float">21</id>
    … …
    <sponsor_id type="float">122</sponsor_id>
  </bill>
  <bill name="S.100">
    <id type="float">123</id>
    … …
    <sponsor_id type="float">203</sponsor_id>
  </bill>
  … …
</bills>

• After we add data type …

Page 23: GriPhyN - SDSC Research and Infrastructure

Quilt Extension (2) – Operator Overloading

<results>
  FOR $bill in document("dt1.xml")//bill
  RETURN
    <bill $bill/@name>
      $bill//id, $bill//sponsor_id,
      <sum>$bill//id/text() + $bill//sponsor_id/text()</sum>
    </bill>
</results>

Query 1 : sum of id and sponsor_id ( type = string )

<results>
  <bill name="S.10">
    <id type="string"> 21 </id>
    <sponsor_id type="string"> 122 </sponsor_id>
    <sum> 21122 </sum>
  </bill>
  <bill name="S.100">
    <id type="string"> 123 </id>
    <sponsor_id type="string"> 203 </sponsor_id>
    <sum> 123203 </sum>
  </bill>
  … …

Page 24: GriPhyN - SDSC Research and Infrastructure

Quilt Extension (2) – Operator Overloading

<results>
  FOR $bill in document("dt2.xml")//bill
  RETURN
    <bill $bill/@name>
      $bill//id, $bill//sponsor_id,
      <sum>$bill//id/text() + $bill//sponsor_id/text()</sum>
    </bill>
</results>

Query 2 : sum of id and sponsor_id ( type = integer )

<results>
  <bill name="S.10">
    <id type="integer"> 21 </id>
    <sponsor_id type="integer"> 122 </sponsor_id>
    <sum> 143.0 </sum>
  </bill>
  <bill name="S.100">
    <id type="integer"> 123 </id>
    <sponsor_id type="integer"> 203 </sponsor_id>
    <sum> 326.0 </sum>
  </bill>
  … …
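The effect shown in Query 1 and Query 2, where the same “+” concatenates strings but adds numbers, can be reproduced in a short Python sketch that reads the `type` attribute before applying the operator. The cast table and function names are hypothetical and are not taken from the Kweelt implementation; the sample documents reuse the bill values from the slides, with the elided parts left out.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the slide's type attribute to a Python type.
CASTS = {"string": str, "integer": int, "float": float}

DT1 = """<bills>
  <bill name="S.10"><id type="string">21</id><sponsor_id type="string">122</sponsor_id></bill>
  <bill name="S.100"><id type="string">123</id><sponsor_id type="string">203</sponsor_id></bill>
</bills>"""

DT2 = DT1.replace('type="string"', 'type="float"')


def typed_value(element):
    """Convert an element's text according to its type attribute (default string)."""
    cast = CASTS[element.get("type", "string")]
    return cast(element.text.strip())


def bill_sums(xml_text):
    """Return {bill name: id + sponsor_id}, where + depends on the declared type."""
    root = ET.fromstring(xml_text)
    return {
        bill.get("name"): typed_value(bill.find("id")) + typed_value(bill.find("sponsor_id"))
        for bill in root.iter("bill")
    }


if __name__ == "__main__":
    print(bill_sums(DT1))  # string operands are concatenated
    print(bill_sums(DT2))  # float operands are added
```

With string-typed elements the sums come out as "21122" and "123203"; with float-typed elements they come out as 143.0 and 326.0, the same values the two query results display.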

Page 25: GriPhyN - SDSC Research and Infrastructure

Quilt Extension (3) – Array Operation

[Figure: class hierarchy. Value at the top; ValueArray below it; ValueIntegerArray, ValueFloatArray, ValueStringArray, ValueBoolArray below ValueArray.]

• Value : Interface for Kweelt base type
• ValueArray : Extends Value. Implements Compare and array-specific operations
  • Accessor – getter, setter
  • Element summation
  • Array summation
  • Subsequence
  • Zip, Unscroll, concatenation, etc

Demo : http://pamina2.sdsc.edu/cgi-bin/kweelt/demo.cgi
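The array operations listed on this slide can be sketched as a small Python class hierarchy. The slides name the operations but do not define them, so the semantics here are assumptions: “element summation” is taken to mean combining all elements of one array into a single value, and “array summation” to mean adding two arrays position by position. Class names follow the slide; the method names are hypothetical, and Kweelt itself is a Java system, so this is not its API.

```python
class ValueArray:
    """Base array type with the accessor and summation operations from the slide."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, index):
        return self.items[index]

    def set(self, index, value):
        self.items[index] = value

    def element_sum(self):
        # Assumed meaning: fold all elements with "+"; requires a non-empty array.
        total = self.items[0]
        for item in self.items[1:]:
            total = total + item
        return total

    def array_sum(self, other):
        # Assumed meaning: position-wise "+" of two arrays of equal length.
        if len(self.items) != len(other.items):
            raise ValueError("arrays must have the same length")
        return type(self)(a + b for a, b in zip(self.items, other.items))

    def subsequence(self, start, stop):
        return type(self)(self.items[start:stop])

    def concatenate(self, other):
        return type(self)(self.items + other.items)


class ValueIntegerArray(ValueArray):
    pass


class ValueStringArray(ValueArray):
    pass


if __name__ == "__main__":
    numbers = ValueIntegerArray([1, 2, 3])
    print(numbers.element_sum())
    print(numbers.array_sum(ValueIntegerArray([4, 5, 6])).items)
    print(ValueStringArray(["a", "b"]).element_sum())
```

Because the operations are written against “+”, the string array concatenates where the integer array adds, consistent with the operator overloading extension described earlier in the deck.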