Introduction to Scientific Data Grid

Kai Nan

Computer Network Information Center, CAS

[email protected]

23/4/19 CANS '2002 Shanghai 2

What is Scientific Data Grid

• one-sentence statement
    • a grid that focuses on sharing multi-discipline scientific data and advancing cooperative research based on the use of scientific data
• more words
    • built upon the Scientific Databases of CAS
    • started in 2001
    • plans to provide service by 2004-2005
    • for academic and research use
    • built by CAS, open to the world

Scientific Databases (SDB)

• SDB is a project that CAS has funded since 1986
• SDB is a collection of scientific databases covering multiple disciplines, including chemistry, biology, geography, astronomy, ecology, …
• By 2005, SDB will comprise
    • 40+ member institutions across China
    • 300+ databases
    • a data volume of 10 TB+
• Distributed & Heterogeneous

Grid Computing

Information Power Grid

Why SDG – motivation

• resource level – sharing and development
    • make the scientific data more accessible
    • data integration
    • data – information – knowledge
• app level – emerging scientific applications
    • do what we couldn't do before
    • rely on data
    • cross multiple databases / cross-disciplinary
    • demand more resources (cycles, storage, bandwidth, instruments, sensors, …)

Requirements

• Identification
• Provenance
• Metadata
    • technical/context/content/management
• Access Control
• Universal Access Interface
• Publishing/Discovery/Retrieval
• Data Lifecycle
• …
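To make the requirements more concrete, here is a minimal sketch of what a catalog entry covering identification, provenance and the four metadata categories might look like. Every name in it (`DatasetRecord`, its fields, the identifier scheme) is a hypothetical illustration, not part of SDG, and the example values in the comments are only one reading of what each category could hold.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry; all field names are illustrative only."""
    identifier: str                                  # Identification
    provenance: list = field(default_factory=list)   # Provenance: ordered history
    technical: dict = field(default_factory=dict)    # e.g. format, size (assumed)
    context: dict = field(default_factory=dict)      # e.g. project, instrument (assumed)
    content: dict = field(default_factory=dict)      # e.g. subject, keywords (assumed)
    management: dict = field(default_factory=dict)   # e.g. owner, retention (assumed)

    def matches(self, keyword: str) -> bool:
        """Discovery helper: case-insensitive keyword match on content metadata."""
        needle = keyword.lower()
        return any(needle in str(value).lower() for value in self.content.values())


record = DatasetRecord(
    identifier="sdb:example/0001",  # made-up identifier scheme
    provenance=["collected", "curated"],
    content={"subject": "chemistry", "keywords": "crystal structures"},
)
```

A discovery service could then filter records with `record.matches("chemistry")` before any access rights are checked.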

Simplified 3 steps

1. find the data
    • and get related info. (metadata)
2. obtain proper rights to the data
3. access the data
    • maybe multiple distributed and heterogeneous databases are involved within one request
    • maybe not just data, but also processing and/or analysis

these steps seem to be easy, but …
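The three steps can be sketched as a toy program. This is only an illustration under stated assumptions: the catalog, the rights table, the node names and the functions `find`, `authorize` and `access` are all hypothetical, and in-memory dictionaries stand in for real distributed services.

```python
# Hypothetical in-memory stand-ins for the catalog, rights and data stores.
CATALOG = {  # step 1: metadata index
    "protein-structures": {"discipline": "biology", "node": "bio-center"},
    "compound-spectra": {"discipline": "chemistry", "node": "chem-center"},
}
RIGHTS = {("alice", "compound-spectra")}  # step 2: (user, dataset) grants
STORES = {  # step 3: one toy store per node
    "bio-center": {"protein-structures": ["1abc", "2xyz"]},
    "chem-center": {"compound-spectra": ["s-001", "s-002", "s-003"]},
}


def find(discipline):
    """Step 1: find the data and return its metadata."""
    return {name: meta for name, meta in CATALOG.items()
            if meta["discipline"] == discipline}


def authorize(user, dataset):
    """Step 2: check that the user holds rights to the dataset."""
    if (user, dataset) not in RIGHTS:
        raise PermissionError(f"{user} may not read {dataset}")


def access(user, dataset):
    """Step 3: fetch the data from whichever node holds it."""
    authorize(user, dataset)
    node = CATALOG[dataset]["node"]
    return STORES[node][dataset]
```

Even in this toy form, each step hides a hard problem: the catalog must span heterogeneous databases, rights must be honoured across institutions, and access may involve several nodes in one request.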

Tasks

• Testbed
    • One data center
    • Three subject centers
• Middleware
    • Information Service
    • Security System
    • Data Access Interface
• Application
    • chemistry/biology/astronomy/geoscience/…

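SDG Resources: 20 TB, 4 PC clusters, CSTNET

• Data Center (CNIC): cluster of 16 nodes, 15 TB
• Bio Center (Beijing): cluster of 8 nodes, 1-2 TB
• Chemistry Center (Shanghai): cluster of 8 nodes, 1-2 TB
• Geo Center (Beijing): cluster of 8 nodes, 1-2 TB

[Figure: the four centers connected over CSTNET; link labels 1000M, 1000M, 155M]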
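As a quick arithmetic check, the per-center figures can be added up and compared with the headline total. The variable names are illustrative; the numbers are the ones listed for the four centers.

```python
data_center_tb = 15
subject_centers_tb = [(1, 2)] * 3  # Bio, Chemistry, Geo: 1-2 TB each

# Sum the lower and upper ends of each storage range separately.
low = data_center_tb + sum(lo for lo, _ in subject_centers_tb)
high = data_center_tb + sum(hi for _, hi in subject_centers_tb)

# One 16-node cluster plus three 8-node clusters.
total_nodes = 16 + 3 * 8

print(f"storage: {low}-{high} TB, nodes: {total_nodes}")
```

The range of 18-21 TB is consistent with the rounded figure of 20 TB across 4 clusters.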

SDG Data Center

• Mass Storage
• Database
• Application Server
• MDS Server
• CA Server
• Portal Server
• Supercomputers at CNIC: ~2 TFLOPS

Grid Middleware

• Globus
    • Resource Management (GRAM)
    • Information Service (MDS)
    • Data Management (GridFTP)
    • Security (GSI)
• Storage Resource Broker (SRB)
    • SDSC’s solution for data grids

SDG Middleware

[Figure: layered stack running from applications (top) to databases (bottom), with xMDS and GSI alongside]

• Application
• GAPI: app-oriented, unified program interface
• DRB: coordinated access to multiple data resources
• UAI: universal access interface to single data resource
• Local DBMS: local data management system, could be DBMS or file system
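The layering can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the SDG implementation: the class and method names (`UAI.fetch`, `SqliteUAI`, `FileUAI`, `DRB.fetch_all`, `gapi_query`) are invented for illustration, an in-memory SQLite database stands in for a local DBMS, and a string of `key,value` lines stands in for a file system.

```python
import sqlite3
from typing import Protocol


class UAI(Protocol):
    """Uniform interface to a single data resource (method name is an assumption)."""
    def fetch(self, key: str) -> list: ...


class SqliteUAI:
    """UAI over a local DBMS, here an in-memory SQLite database."""
    def __init__(self, rows):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE obs (key TEXT, value TEXT)")
        self.db.executemany("INSERT INTO obs VALUES (?, ?)", rows)

    def fetch(self, key):
        cur = self.db.execute(
            "SELECT value FROM obs WHERE key = ? ORDER BY value", (key,))
        return [value for (value,) in cur]


class FileUAI:
    """UAI over a file-like store, modelled as lines of 'key,value' text."""
    def __init__(self, text):
        self.lines = text.strip().splitlines()

    def fetch(self, key):
        pairs = (line.split(",", 1) for line in self.lines)
        return [value for k, value in pairs if k == key]


class DRB:
    """Coordinates access across several UAI-wrapped resources."""
    def __init__(self, resources):
        self.resources = resources

    def fetch_all(self, key):
        results = []
        for uai in self.resources:
            results.extend(uai.fetch(key))
        return results


def gapi_query(drb, key):
    """App-facing entry point: one call, regardless of where the data lives."""
    return drb.fetch_all(key)


drb = DRB([
    SqliteUAI([("h2o", "db-1"), ("co2", "db-2")]),
    FileUAI("h2o,file-1\nnacl,file-2"),
])
print(gapi_query(drb, "h2o"))  # ['db-1', 'file-1']
```

The application only ever calls `gapi_query`; whether a result came from a DBMS or a file is hidden behind the UAI layer.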

Use case (1)

[Figure: Node A and Node H, each with the stack App / GAPI / DRB / UAI / DBMS; Node H is labelled "App X MDS"]

Use case (2)

[Figure: Nodes A, H, B and C, each with the stack App / GAPI / DRB / UAI / DBMS; Node H is labelled "App X MDS"]

1. Single sign-on

2. Query MDS

3. App → GAPI → DRB

4. DRB(H) → UAI(A, B, C)

5. UAI → local DBMS → DB
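The five steps can be traced in a small simulation. It is a sketch under assumptions only: the `Node` class, the token format, the `MDS` lookup table and all function names are hypothetical, and the reading that one sign-on token is reused at every node and that MDS returns node names is an interpretation of the slide, not a documented SDG behaviour.

```python
class Node:
    """One grid node; a dict of tables stands in for its local DBMS."""
    def __init__(self, name, tables):
        self.name = name
        self.tables = tables

    def uai_query(self, token, table):
        # Step 5: UAI -> local DBMS -> DB
        if not token.get("valid"):
            raise PermissionError("no valid sign-on token")
        return [(self.name, row) for row in self.tables.get(table, [])]


NODES = {
    "A": Node("A", {"spectra": ["a1"]}),
    "B": Node("B", {"spectra": ["b1", "b2"]}),
    "C": Node("C", {"spectra": []}),
}
MDS = {"spectra": ["A", "B", "C"]}  # hypothetical index: table -> nodes


def sign_on(user):
    """Step 1: single sign-on yields one token reused at every node."""
    return {"user": user, "valid": True}


def mds_lookup(table):
    """Step 2: query MDS for the nodes that hold the table."""
    return MDS.get(table, [])


def drb_query(token, table, node_names):
    """Step 4: the DRB on node H fans out to the UAI on each target node."""
    results = []
    for name in node_names:
        results.extend(NODES[name].uai_query(token, table))
    return results


def gapi_query(token, table, node_names):
    """Step 3: the app calls GAPI, which hands the request to the DRB."""
    return drb_query(token, table, node_names)


token = sign_on("alice")
targets = mds_lookup("spectra")
rows = gapi_query(token, "spectra", targets)
```

Here `rows` collects results from all three nodes in a single request, tagged with the node each row came from.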

Use case (3)

[Figure: Nodes A, H, B, C and Z, each with the stack App / GAPI / DRB / UAI / DBMS; Node H is labelled "App XMDS" and Node Z is labelled "App X"]

Projects

• CAS
    • the Tenth Five-year Program (2001-2005) – funded (37M RMB)
• 863 Program (by MOST)
    • a special program for grid – proposed

Milestones

• mid 2003: testbed built
• end 2003: middleware developed
• 2004: deployment and test run
• 2005: applications developed and production run

Collaboration

• PRAGMA
• APGrid
• SDSC
• KISTI
• ASCC
• Texas A&M Univ.
• …

Thank you!