Introduction to Scientific Data Grid
Kai Nan
Computer Network Information Center,
CAS
23/4/19 CANS '2002 Shanghai 2
What is Scientific Data Grid
  One-sentence statement
    a grid which focuses on sharing multi-discipline scientific data and advancing cooperative research based on the utilization of scientific data
  More words
    built upon the Scientific Databases of CAS
    started in 2001
    plan to provide service by 2004-2005
    for academic and research use
    built by CAS, open to the world
Scientific Databases (SDB)
  SDB is a project funded by CAS since 1986
  SDB is a collection of scientific databases covering multiple disciplines, including chemistry, biology, geography, astronomy, ecology, …
  By 2005, SDB will comprise
    40+ member institutions across China
    300+ databases
    data volume of 10 TB+
Distributed & Heterogeneous
Why SDG – motivation
  Resource level – sharing and development
    make scientific data more accessible
    data integration
    data – information – knowledge
  Application level – emerging scientific applications
    do what we could not do before
    rely on data
    span multiple databases / cross-disciplinary
    demand more resources (cycles, storage, bandwidth, instruments, sensors, …)
Requirements
  Identification
  Provenance
  Metadata
    technical/context/content/management
  Access Control
  Universal Access Interface
  Publishing/Discovery/Retrieval
  Data Lifecycle
  …
Simplified 3 steps
  1. find the data
     and get related info (metadata)
  2. obtain proper rights to the data
  3. access the data
     maybe multiple distributed and heterogeneous databases are involved in one request
     maybe not just data, but processing and/or analysis
  these steps seem to be easy, but …
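The three steps can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `Dataset` class, the in-memory `CATALOG`, the user names, and the functions `find`, `has_rights`, and `access` are all hypothetical and are not part of SDG.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A hypothetical catalog entry: metadata plus an access list."""
    name: str
    metadata: dict
    readers: set = field(default_factory=set)
    records: list = field(default_factory=list)


# Hypothetical in-memory catalog standing in for a discovery service.
CATALOG = [
    Dataset("compounds", {"discipline": "chemistry"}, {"alice"}, ["C6H6", "H2O"]),
    Dataset("genomes", {"discipline": "biology"}, {"bob"}, ["ATCG"]),
]


def find(discipline):
    """Step 1: find the data; each hit carries its metadata."""
    return [d for d in CATALOG if d.metadata.get("discipline") == discipline]


def has_rights(user, dataset):
    """Step 2: check that the user holds proper rights to the data."""
    return user in dataset.readers


def access(user, dataset):
    """Step 3: access the data, refusing users without rights."""
    if not has_rights(user, dataset):
        raise PermissionError(f"{user} may not read {dataset.name}")
    return list(dataset.records)


hits = find("chemistry")
print(hits[0].metadata)
print(access("alice", hits[0]))
```

In a real grid each step would be a call to a remote service rather than a local lookup, which is where the difficulty hinted at on the slide comes from.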
Tasks
  Testbed
    One data center
    Three subject centers
  Middleware
    Information Service
    Security System
    Data Access Interface
  Application
    chemistry/biology/astronomy/geoscience/…
SDG Resources: 20 TB, 4 PC clusters, CSTNET
  Data Center (CNIC): cluster, 16 nodes, 15 TB
  Bio Center (Beijing): cluster, 8 nodes, 1-2 TB
  Chemistry Center (Shanghai): cluster, 8 nodes, 1-2 TB
  Geo Center (Beijing): cluster, 8 nodes, 1-2 TB
  Network links shown in the diagram: 1000M, 1000M, 155M
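The resource figures on this slide can be cross-checked with a few lines of arithmetic. The numbers below are taken from the slide; the dictionary layout is just one convenient way to hold them. The per-center ranges add up to 18-21 TB, which is consistent with the quoted total of about 20 TB.

```python
# Figures taken from the slide; storage is a (low, high) range in TB.
centers = {
    "Data Center (CNIC)": {"nodes": 16, "tb": (15, 15)},
    "Bio Center": {"nodes": 8, "tb": (1, 2)},
    "Chemistry Center": {"nodes": 8, "tb": (1, 2)},
    "Geo Center": {"nodes": 8, "tb": (1, 2)},
}

total_nodes = sum(c["nodes"] for c in centers.values())
low_tb = sum(c["tb"][0] for c in centers.values())
high_tb = sum(c["tb"][1] for c in centers.values())

print(len(centers), "clusters,", total_nodes, "nodes")
print(f"storage: {low_tb}-{high_tb} TB")
```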
SDG Data Center
  Mass Storage
  Database
  Application Server
  MDS Server
  CA Server
  Portal Server
  Supercomputers at CNIC: ~2 TFLOPS
Grid Middleware
  Globus
    Resource Management (GRAM)
    Information Service (MDS)
    Data Management (GridFTP)
    Security (GSI)
  Storage Resource Broker (SRB)
    SDSC's solution for data grid
SDG Middleware
  Layers, from applications (top) to databases (bottom)
    Application
    GAPI: app-oriented, unified program interface
    DRB: coordinated access to multiple data resources
    UAI: universal access interface to single data resource
    Local DBMS: local data management system, could be DBMS or file system
  Shown alongside the layers
    xMDS
    GSI
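The layering can be sketched as follows. This is a minimal illustration, not SDG's actual interfaces: the class names `DbmsUAI`, `FileUAI`, `DRB`, and `GAPI`, the `query` and `search` methods, and the sample data are all assumptions. The point it shows is that each UAI hides one local store (a DBMS or a file-based store), the DRB coordinates several UAIs, and GAPI is the single entry point the application sees.

```python
import sqlite3


class DbmsUAI:
    """UAI over a local DBMS (here an in-memory SQLite database)."""

    def __init__(self, rows):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE items (name TEXT)")
        self.conn.executemany("INSERT INTO items VALUES (?)", [(r,) for r in rows])

    def query(self, prefix):
        cur = self.conn.execute(
            "SELECT name FROM items WHERE name LIKE ? ORDER BY name",
            (prefix + "%",),
        )
        return [row[0] for row in cur]


class FileUAI:
    """UAI over a file-based store (here a plain list of text lines)."""

    def __init__(self, lines):
        self.lines = lines

    def query(self, prefix):
        return sorted(line for line in self.lines if line.startswith(prefix))


class DRB:
    """Coordinates access to multiple data resources through their UAIs."""

    def __init__(self, uais):
        self.uais = uais

    def query(self, prefix):
        results = []
        for uai in self.uais:
            results.extend(uai.query(prefix))
        return results


class GAPI:
    """App-oriented, unified entry point that hides the layers below."""

    def __init__(self, drb):
        self.drb = drb

    def search(self, prefix):
        return self.drb.query(prefix)


gapi = GAPI(DRB([DbmsUAI(["alpha", "beta", "alps"]), FileUAI(["albedo", "gamma"])]))
print(gapi.search("al"))
```

Because both UAI classes expose the same `query` method, the DRB does not need to know whether a resource is backed by a database or by files.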
Use case (1)
[Diagram: Node A and Node H, each with the stack App / GAPI / DRB / UAI / DBMS; Node H is additionally labeled "X MDS".]
Use case (2)
[Diagram: Node A, Node B, Node C, and Node H, each with the stack App / GAPI / DRB / UAI / DBMS; Node H is additionally labeled "X MDS".]
1. Single sign-on
2. Query MDS
3. App → GAPI → DRB
4. DRB(H) → UAI(A, B, C)
5. UAI → local DBMS → DB
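The five steps can be traced with a toy simulation, reading steps 3 to 5 as App → GAPI → DRB, DRB(H) → UAI(A, B, C), and UAI → local DBMS → DB. Everything here is hypothetical: the `LOCAL_DB` and `MDS` contents, the token format, and the function names are assumptions made for illustration, and no real Globus or SDG call is used.

```python
# Hypothetical per-node local databases (what each node's DBMS holds).
LOCAL_DB = {
    "A": ["a1", "a2"],
    "B": ["b1"],
    "C": ["c1", "c2"],
}

# Hypothetical MDS content: which nodes hold which dataset.
MDS = {"survey": ["A", "B", "C"]}

trace = []


def sign_on(user):
    """Step 1: single sign-on; the token is reused for every node."""
    trace.append("1. single sign-on")
    return f"token-for-{user}"


def query_mds(dataset):
    """Step 2: ask the information service where the data lives."""
    trace.append("2. query MDS")
    return MDS[dataset]


def uai(node, token):
    """Step 5: the UAI on a node reads from that node's local DBMS."""
    if not token:
        raise PermissionError("missing sign-on token")
    trace.append(f"5. UAI({node}) -> local DBMS -> DB")
    return LOCAL_DB[node]


def drb(home, nodes, token):
    """Step 4: the DRB on the home node fans out to the remote UAIs."""
    trace.append(f"4. DRB({home}) -> UAI({', '.join(nodes)})")
    merged = []
    for node in nodes:
        merged.extend(uai(node, token))
    return merged


def gapi(home, nodes, token):
    """Step 3: the application calls GAPI, which hands over to the DRB."""
    trace.append("3. App -> GAPI -> DRB")
    return drb(home, nodes, token)


token = sign_on("alice")
nodes = query_mds("survey")
data = gapi("H", nodes, token)
print(data)
print("\n".join(trace))
```

The application on Node H signs on once and issues one request; the fan-out to Nodes A, B, and C happens below GAPI.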
Use case (3)
[Diagram: Node A, Node B, Node C, Node H, and Node Z, each with the stack App / GAPI / DRB / UAI / DBMS; Node H is additionally labeled "XMDS" and Node Z is additionally labeled "X".]
Projects
  CAS
    the Tenth Five-year Program (2001-2005) – funded (37M RMB)
  863 Program (by MOST)
    a special program for grid – proposed
Milestones
  mid 2003: testbed built
  end 2003: middleware developed
  2004: deployment and test run
  2005: applications developed and production run