Scientific Data Infrastructure in CAS
![Page 1: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/1.jpg)
Scientific Data Infrastructure in CAS
Dr. Jianhui Li
Scientific Data Center
Computer Network Information Center
Chinese Academy of Sciences
![Page 2: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/2.jpg)
Scientific Data infrastructure
Middleware (scientific data grid middleware, internet-based storage service middleware…)
Scientific databases
Massive storage system
Data-intensive computing facilities
High speed network
Application enabled environments and typical applications
Software and Toolkits
(scientific data collection, curation, and publishing; data analysis and visualization…)
![Page 3: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/3.jpg)
DRC: Data Resource Center
• A new organization responsible for data preservation, curation and access service in CAS
Data Resource Center
• Mass data backup
• Data online service
• Mass data analysis and processing
• Long-term preservation of important data
• Technology service
(Diagram labels: network, storage space, system environment, application service, mass data, management system, collaborator, staff)
![Page 4: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/4.jpg)
Infrastructure for DRC
• High Speed Network
– 2 Gbps linked with CSTNET
– 2 Gbps linked with CSTNET-CNGI
– GLORIAD
• Data Intensive Computing facilities
– ~1000 CPU core clusters + Scientific Computing Grid (~200 Tflops)
• Massive Storage System
– 1 PB online disk + 5 PB tape
– Construction of a storage network will start this year
• 1 center + 1 archive center + 10 storage nodes around China
• Over 20 PB
![Page 5: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/5.jpg)
Scientific Databases (SDB)
• A long-term mission started in 1986 and funded by CAS
– many institutes involved
– long-term, large-scale collaboration
– data from research, for research
• Collecting multi-discipline research data and promoting data sharing
– More than 350 research databases and 400 datasets from 61 institutes
– Over 60 TB of data available for open access and download
http://www.csdb.cn
![Page 6: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/6.jpg)
Scientific Databases (cont.)
• SDB Contents
– Physics & Chemistry, Geosciences, Biosciences, Atmospheric & Ocean Science, Energy Science, Material Science, Astronomy & Space Science
(Pie chart: GeoScience 43%, BioScience 18%, Chemistry 9%, ICT 6%, Physics 6%, Ocean 5%, Material 5%, Space 4%, Energy 3%, Astronomy 1%)
![Page 7: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/7.jpg)
Scientific Databases (cont.)
• Database integration
– Resource database
– Reference database
– Application-oriented database
(Diagram labels: Research database, Resource database, Reference database, Application-oriented database)
![Page 8: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/8.jpg)
Scientific Databases (cont.)
• 8 resource databases
– Geoscience
– Biodiversity
– Chemistry
– Astronomy
– Space science
– Microbiology and virus
– Material science
– Environment
• 2 reference databases
– China Species
– Compound
• 4 application-oriented databases
– High Energy (ITER)
– Western Environment Research
– Ecology research
– Qinghai Lake Research
![Page 9: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/9.jpg)
CAS Scientific Data Grid
• Based on Scientific Data Grid Middleware (SDG)
– SDG is built upon the Scientific Databases and supports finding and accessing large-scale, distributed and heterogeneous scientific data uniformly and conveniently, in a secure and proper way
• Building scientific data application grids according to domain requirements
– Integrating distributed data, analysis tools, and storage and computing facilities, and providing a uniform data service interface
– 4 pilot grids
• Bioscience grid
• Geoscience grid
• Chemistry grid
• Astronomy and space science grid
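As a rough illustration of what a uniform data service interface over heterogeneous sources could look like, here is a minimal Python sketch. It is not the SDG middleware API: `DataSource`, `CsvSource`, `DictSource` and `query_all` are hypothetical names, and the sample records are made up for the example.

```python
import csv
import io
from typing import Iterable, Protocol


class DataSource(Protocol):
    """Hypothetical uniform interface; not the real SDG API."""

    name: str

    def query(self, keyword: str) -> list[dict]:
        ...


class CsvSource:
    """Wraps CSV text (stands in for one heterogeneous backend)."""

    def __init__(self, name: str, text: str) -> None:
        self.name = name
        self._rows = list(csv.DictReader(io.StringIO(text)))

    def query(self, keyword: str) -> list[dict]:
        return [r for r in self._rows if keyword.lower() in r["title"].lower()]


class DictSource:
    """Wraps in-memory records (stands in for a second backend)."""

    def __init__(self, name: str, records: list[dict]) -> None:
        self.name = name
        self._records = records

    def query(self, keyword: str) -> list[dict]:
        return [r for r in self._records if keyword.lower() in r["title"].lower()]


def query_all(sources: Iterable[DataSource], keyword: str) -> list[dict]:
    """Send one query to every source and tag each hit with its origin."""
    hits = []
    for src in sources:
        for rec in src.query(keyword):
            hits.append({"source": src.name, **rec})
    return hits


# Made-up sample records, for illustration only.
sources = [
    CsvSource("geo", "title,year\nLake sediment cores,2005\nSoil survey,2001\n"),
    DictSource("bio", [{"title": "Lake plankton counts", "year": 2008}]),
]
results = query_all(sources, "lake")
```

The caller only sees `query_all`; each backend keeps its own storage format behind the same `query` method, which is the idea the slide describes.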
![Page 10: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/10.jpg)
Function Framework of SDG
• A scalable and integrated data sharing environment
– Providing services for grid users, grid managers and resource providers
– Operated by the operation center, science gateways and data nodes
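End user
Data resource provider
Grid manager
Grid operation service center; grid main node
Services received
Responsibilities undertaken
Responsibilities undertaken
Data navigation, data query and retrieval, user registration, single sign-on
Domain application entry, monitoring and statistics
Data query and retrieval, domain applications, single sign-on
Monitoring and statistics
Policy, standards and specification management; grid organization management
Data management, storage management, service management, user management, operations management
Monitoring and statistical analysis; grid operation service center portal
Domain standards and specification management, data management, user management, service management, operations management
Monitoring and statistical analysis; thematic database portal
Data quality assurance, data service maintenance
Grid node
Data query and retrieval, domain applications, single sign-on
Application consulting service
Hardware resource management, data service management
Data growth and maintenance, data quality management
Data-based grid applications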
User
Grid Manager
Resource Provider
Operation Center Science Gateway Data Node
![Page 11: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/11.jpg)
Access Scientific Data Grid
Software Tool
Science Gateway and access portal
Grid Middleware
Resource Databases
Reference Databases
App-Oriented Databases
Research Databases
External Data Source
![Page 12: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/12.jpg)
VisualDB - Power your database
• A toolkit to manage, publish and share scientific databases through a visual configuration interface, without writing code
• A database integration access broker
• A data quality assessment tool
• A database access and usage statistics tool
![Page 13: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/13.jpg)
Function Framework of VisualDB
![Page 14: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/14.jpg)
Catalog Builder
![Page 15: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/15.jpg)
Security Center
![Page 16: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/16.jpg)
Data Forge
![Page 17: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/17.jpg)
vReport
![Page 18: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/18.jpg)
Application enabled environments and typical applications
• Domain-specific, data-intensive application environments
– Support one specific research area
– Integrate scientific data, storage, computing, analysis models and tools
– An easy-to-use, friendly interactive interface
– Scalable, user-defined data processing workflows
• Typical pilot systems
– Remote sensing data on-demand access and processing service environment
– CFCI - China FLUX Cyber-Infrastructure
– DarwinTree - Molecular data analysis and application environment
– Atmospheric science data integration analysis platform
![Page 19: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/19.jpg)
Atmospheric science data integration analysis platform
• Status quo
Atmospheric Scientists and Researchers
Iteration
Data Preprocessing
NCL, Matlab, CDO
Scientific Data Storage
Web Service, SRB, FTP, HTTP
Data accessing
NCL, Matlab, CDO
Data Computing
NCL, Matlab, CDO
Data Analysis
NCL, Matlab, CDO
Result Output
Data Visualizing
Result Data
![Page 20: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/20.jpg)
Atmospheric science data integration analysis platform
• Problems
– Atmospheric data have reached the TB level and are distributed.
– The hard disk and memory of a personal computer limit the research work.
– Many algorithms written by researchers can't be shared easily.
![Page 21: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/21.jpg)
Scientific Data Analysis Online Platform
Architecture
(Diagram labels: Researcher; Web browser: 1) custom, 2) visualize; Data Finding; Algorithm Chosen; Define workflow, combined with data and model; Computing for Workflow; Distributed data; Distributed Algorithm Model; Result; Iterative)
![Page 22: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/22.jpg)
Workflow
Five steps, run iteratively:
Select data
Choose algorithm
Config param
Plot
Analyse result
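The five steps on this slide can be sketched as a small loop. This is only an illustration under stated assumptions, not the platform's implementation: the function names, the toy temperature-like series, the moving-average "algorithm", the text-based `plot`, and the stopping rule are all hypothetical.

```python
from statistics import mean

# Toy data standing in for a selected dataset (values are made up).
DATASETS = {"station_a": [14.0, 15.5, 13.0, 16.5, 15.0, 17.0]}


def moving_average(series, window):
    """Simple moving average; stands in for a user-chosen algorithm."""
    return [mean(series[i:i + window]) for i in range(len(series) - window + 1)]


ALGORITHMS = {"moving_average": moving_average}


def plot(series):
    """Text 'plot': one bar of '#' per value, rounded to whole units."""
    return "\n".join("#" * round(v) for v in series)


def roughness(series):
    """Mean absolute step between neighbours; used to analyse the result."""
    return mean(abs(b - a) for a, b in zip(series, series[1:]))


def run_workflow(dataset, algorithm, window, target, max_rounds=5):
    data = DATASETS[dataset]                 # 1. select data
    algo = ALGORITHMS[algorithm]             # 2. choose algorithm
    for _ in range(max_rounds):
        result = algo(data, window)          # 3. apply the configured parameter
        chart = plot(result)                 # 4. plot
        score = roughness(result)            # 5. analyse result
        if score <= target or window >= len(data) - 1:
            break
        window += 1                          # iterate with a new parameter
    return window, result, chart, score


window, result, chart, score = run_workflow(
    "station_a", "moving_average", window=2, target=0.4
)
```

Each pass repeats steps 3 to 5 with an adjusted parameter until the analysis is acceptable, which mirrors the "Iterative" arrow on the slide.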
![Page 23: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/23.jpg)
Select data
![Page 24: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/24.jpg)
Choose algorithm
![Page 25: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/25.jpg)
Config param
![Page 26: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/26.jpg)
plot and result
![Page 27: Scientific Data Infrastructure in CAS](https://reader034.fdocuments.us/reader034/viewer/2022050909/56814ee4550346895dbc7672/html5/thumbnails/27.jpg)
Thank you!