17 May 2016
NCAR Globally Accessible User Environment
Spectrum Scale User Group – UK Meeting
17 May 2016 Pamela Hill, Manager, Data Analysis Services
Data Analysis Services Group
NCAR / CISL / HSS / DASG
• Data Transfer and Storage Services
  • Pamela Hill
  • Joey Mendoza
  • Craig Ruff
  • High-Performance File Systems
  • Data Transfer Protocols
  • Science Gateway Support
  • Innovative I/O Solutions
• Visualization Services
  • John Clyne
  • Scott Pearse
  • Alan Norton
  • VAPOR development and support
  • 3D visualization
  • Visualization User Support
GLADE Mission
GLobally Accessible Data Environment
• Unified and consistent data environment for NCAR HPC
  • Supercomputers, Data Analysis and Visualization Clusters
  • Support for project work spaces
  • Support for shared data transfer interfaces
  • Support for Science Gateways and access to RDA & ESG data sets
• Data is available at high bandwidth to any server or supercomputer within the GLADE environment
• Resources outside the environment can manipulate data using common interfaces
• Choice of interfaces supports current projects; platform is flexible to support future projects
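GLADE Environment
[Diagram: GLADE at the center, connected to HPSS, data transfer gateways, science gateways, computation, analysis & visualization, and remote visualization resources]
• GLADE: Project Spaces, Data Collections, $HOME, $WORK, $SCRATCH
• HPSS
• Data Transfer Gateways: Globus Online, Globus+ Data Share, GridFTP, HSI/HTAR, scp, sftp, bbcp
• Science Gateways: RDA, ESG, CDP
• Computation: yellowstone, cheyenne
• Analysis & Visualization: geyser, caldera, pronghorn
• Remote Visualization: VirtualGL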
GLADE Today
• 90 GB/s bandwidth
• 16 PB usable capacity
• 76 IBM DCS3700
  • 6840 3TB drives
  • shared data + metadata
• 20 GPFS NSD servers
• 6 management nodes
• File Services
  • FDR
  • 10 GbE
Expansion Fall 2016
• 200 GB/s bandwidth
• ~21.5 PB usable capacity
• 4 DDN SFA14KX
  • 3360 8TB drives
    • data only
  • 48 800GB SSD
    • Metadata
• 24 GPFS NSD servers
• File Services
  • EDR
  • 40 GbE
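The drive counts and usable capacities above can be cross-checked with a little arithmetic. The sketch below computes the raw capacity implied by each drive population and the fraction that remains usable. It assumes decimal units (1 PB = 1000 TB); the function names are illustrative only. The slides do not say where the difference goes, although a gap of this size is consistent with RAID parity and spare overhead.

```python
# Raw capacity implied by the drive counts on the slide, compared with the
# stated usable capacity. Decimal units (1 PB = 1000 TB) are an assumption.


def raw_capacity_pb(drive_count: int, drive_size_tb: float) -> float:
    """Return raw capacity in PB for a set of identical drives."""
    return drive_count * drive_size_tb / 1000.0


def usable_fraction(usable_pb: float, raw_pb: float) -> float:
    """Return the share of raw capacity that is left as usable space."""
    return usable_pb / raw_pb


current_raw = raw_capacity_pb(6840, 3)      # DCS3700 drives (data + metadata)
expansion_raw = raw_capacity_pb(3360, 8)    # SFA14KX data-only drives

current_fraction = usable_fraction(16.0, current_raw)
expansion_fraction = usable_fraction(21.5, expansion_raw)

print(f"current:   {current_raw:.2f} PB raw, {current_fraction:.0%} usable")
print(f"expansion: {expansion_raw:.2f} PB raw, {expansion_fraction:.0%} usable")
```

Both systems land at roughly four fifths of raw capacity, so the "~21.5 PB" figure for the expansion is in line with the ratio of the existing system.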
GLADE File System Structure
[Diagram: GLADE GPFS Cluster, file systems and their storage pools]
• scratch
  • SSD metadata
  • SFA14KX 15 PB
  • 8M block, 4K inode
• p
  • DCS3700 10 PB
  • 4M block, 512 byte inode
• p2
  • SSD metadata
  • SFA14KX 5 PB
  • 8M block, 4K inode
  • DCS3700 5 PB
• home, apps
  • SSD metadata
  • SSD applications
  • DDN 7700 100 TB
  • 512K block, 4K inode
• ILM rules to migrate between pools
• Data collections pinned to DCS3700
GLADE I/O Network
• Network architecture providing global access to data storage from multiple HPC resources
• Flexibility provided by support of multiple connectivity options and multiple compute network topologies
  • 10GbE, 40GbE, FDR, EDR
  • Full Fat Tree, Quasi Fat Tree, Hypercube
• Scalability allows for addition of new HPC or storage resources
• Agnostic with respect to vendor and file system
  • Can support multiple solutions simultaneously
[Diagram: GLADE, HPSS and HPC resources connected through the NWSC core Ethernet switch and the Infiniband networks]
• HPSS Archive: 160 PB, 12.5 GB/s, 60 PB holdings
• HPSS (Boulder): Disaster Recovery
• GLADE: 16 PB, 90 GB/s, GPFS; 21.5 PB, 200 GB/s, GPFS
• (4) DTN
• Data Services: 20 GB/s
• FRGP Network
• Science Gateways
• Data Networks: 100 GbE
• NWSC Core Ethernet Switch: 10/40/100
• HPC / DAV: Geyser, Caldera, Pronghorn, 46 nodes, Infiniband FDR
• Yellowstone: 4,536 nodes, Infiniband FDR
• Cheyenne: 4,032 nodes, Infiniband EDR
• Link labels: 10 GbE, 10 GbE, 40 GbE, 40 GbE
• Legend: 10 Gb Ethernet, 40 Gb Ethernet, 100 Gb Ethernet, Infiniband
GLADE I/O Network Connections
• All NSD servers are connected to the 40GbE network
• Some NSD servers are connected to the FDR IB network
  • Yellowstone is a full fat tree network
  • Geyser/Caldera/Pronghorn are a quasi fat tree with up/down routing
  • NSD servers use up/down routing
• Some NSD servers are connected to the EDR IB network
  • Cheyenne is an enhanced hypercube
  • NSD servers are nodes in the hypercube
  • Each NSD server is connected to two points in the hypercube
• Data transfer gateways, RDA, ESG and CDP science gateways are connected to 40GbE and 10GbE networks
• NSD servers will route traffic over the 40GbE network to serve data to both IB networks
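HPC Network Architecture
[Diagram: network fabrics and the systems attached to them]
• Networks: FDR, EDR, 40GbE, NWSC 10GbE, ML 10GbE
• Compute: YS, CH, DAV
• Storage servers: 20 NSD Servers, 24 NSD Servers, 2 NSD Servers, 6 mgt Servers, 3 chgpfsmgt
• Data services: 4 Data Mover; RDA, ESG/CDP
• Archive: NWSC HPSS, ML HPSS
• Other labels: NWSC-1, NWSC-2, $HOME A B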
GPFS Multi-Cluster
• Clusters can serve file systems or be diskless
• Each cluster is managed separately
  • Can have different administrative domains
• Each storage cluster controls who has access to which file systems
• Any cluster mounting a file system must have TCP/IP access from all nodes to the storage cluster
• Diskless clusters only need TCP/IP access to the storage cluster hosting the file system they wish to mount
  • No need for access to other diskless clusters
• Each cluster is configured with a GPFS subnet which allows NSD servers in the storage cluster to act as routers
  • Allows blocking of a subnet when necessary
• User names and groups need to be synced across the multi-cluster domain
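The last point matters because file ownership is stored as numeric IDs, so a name that maps to different IDs in two clusters would see different permissions on the same files. The sketch below is one hypothetical way to audit this: it compares name-to-ID maps gathered from each cluster and reports names whose IDs disagree. The cluster labels echo the slides, but the users, IDs, and function name are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-cluster name -> numeric ID maps (for example parsed from
# /etc/passwd or a directory service). Users and IDs are invented.
PasswdMap = Dict[str, int]


def find_id_mismatches(
    clusters: Dict[str, PasswdMap],
) -> List[Tuple[str, Dict[str, int]]]:
    """Return (name, {cluster: id}) for names whose ID differs between clusters."""
    seen: Dict[str, Dict[str, int]] = {}
    for cluster, entries in clusters.items():
        for name, numeric_id in entries.items():
            seen.setdefault(name, {})[cluster] = numeric_id
    mismatches = []
    for name in sorted(seen):
        # More than one distinct ID for the same name means the maps drifted.
        if len(set(seen[name].values())) > 1:
            mismatches.append((name, seen[name]))
    return mismatches


clusters = {
    "storage": {"alice": 1001, "bob": 1002},
    "ys": {"alice": 1001, "bob": 1002},
    "dav": {"alice": 1001, "bob": 2002},   # bob's UID drifted on this cluster
}

mismatches = find_id_mismatches(clusters)
for name, ids in mismatches:
    print(f"{name}: {ids}")
```

The same check applies unchanged to group names and GIDs.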
GPFS Multi-Cluster
[Diagram: the GLADE storage cluster and its client clusters, attached to the EDR, FDR and 40GbE networks]
• Storage Cluster (GLADE): nsd1, nsd2 ... nsd20; nsd21, nsd22 ... nsd46
• YS Cluster: Yellowstone, 4,536 nodes (n1, n2 ... n4536)
• DAV Cluster: Geyser, Caldera, Pronghorn, 46 nodes (n1, n2 ... n46)
• CH Cluster: Cheyenne, 4,032 nodes (n1, n2 ... n4032)
• CH PBS Cluster: Cheyenne PBS, Login Nodes (n1, n2 ... n10)
• DTN Cluster: Data Transfer Services (n1, n2 ... n4)
• RDA Cluster: RDA Science Gateway (web, n1 ... n8)
• SAGE Cluster: ESG / CDP Science Gateways (web, n1 ... n8)
NCARP – Requirements
The NCAR Common Address Redundancy Protocol
• Must provide IP routing between Infiniband-only GPFS clients and Ethernet-only GPFS servers.
  • Some GPFS protocol traffic is always carried over IP, even if RDMA is used for I/O
  • All GPFS protocol traffic can be carried over IP if necessary
• Need high availability of the virtual router addresses (VRAs) to provide the clients uninterrupted file system access
  • Routine maintenance
  • Unattended automatic VRA failure recovery
  • Use gratuitous ARP to reduce fail over time
  • Actively detect network interface failures
  • Use digital signatures for valid message detection and spoofing rejection
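The slides do not describe how NCARP signs its messages. As a rough illustration of valid message detection and spoofing rejection, the sketch below appends an HMAC-SHA256 tag to an advertisement and rejects packets whose tag does not verify. HMAC with a shared key is a stand-in assumption, not necessarily the scheme NCARP uses, and the packet layout (VRA, sequence number, priority) is hypothetical.

```python
import hashlib
import hmac
import struct

# Hypothetical advertisement layout: 4-byte IPv4 VRA, 4-byte sequence number,
# 1-byte priority. NCARP's real wire format is not given in the slides.
HEADER = struct.Struct("!4sIB")
TAG_LEN = hashlib.sha256().digest_size


def sign_advert(key: bytes, vra: bytes, seq: int, priority: int) -> bytes:
    """Pack an advertisement and append an HMAC-SHA256 tag over its body."""
    body = HEADER.pack(vra, seq, priority)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag


def verify_advert(key: bytes, packet: bytes):
    """Return (vra, seq, priority) if the tag is valid, otherwise None."""
    if len(packet) != HEADER.size + TAG_LEN:
        return None
    body, tag = packet[:HEADER.size], packet[HEADER.size:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        return None
    return HEADER.unpack(body)


key = b"shared-secret-for-illustration"
packet = sign_advert(key, bytes([10, 0, 0, 1]), seq=7, priority=1)
accepted = verify_advert(key, packet)

# A spoofer who alters the priority byte without the key cannot fix the tag.
forged = packet[:8] + bytes([0]) + packet[9:]
rejected = verify_advert(key, forged)
```

A receiver would also want to track the sequence number to discard replayed packets, which this sketch leaves out.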
NCARP – Requirements
The NCAR Common Address Redundancy Protocol
• Make use of existing hardware where possible
  • Current need is for 1 year, then all NSD servers migrate to the EDR Hypercube network
• Be scalable if client cluster requirements change
  • Phase in/phase out of client clusters and cluster membership over time
• There are thousands of client nodes in multiple clusters
  • Each router node has limited bandwidth available
  • IP over IB throughput is the limiting issue in testing
• Partition nodes among the routers
  • Allows a measure of load balancing
  • Reduce the number of static rules in the Linux kernel routing table
• Usable for both servers and clients
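One way to read "partition nodes among the routers" together with "reduce the number of static rules" is to split the client address space into a few prefix blocks and point each block at one virtual router address, so that a route covers a whole block rather than a single host. The sketch below illustrates that idea only; the slides do not say how NCARP actually partitions nodes, and the subnet, VRA addresses, and function names are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple


def routes_for_partition(client_net: str, router_vras: List[str]) -> List[Tuple[str, str]]:
    """Split client_net into equal prefix blocks and assign each to a VRA.

    Returns (block, vra) pairs: one static route per block instead of one
    per client node.
    """
    if not router_vras:
        raise ValueError("at least one router VRA is required")
    net = ipaddress.ip_network(client_net)
    # Smallest number of extra prefix bits that yields >= len(router_vras) blocks.
    bits = (len(router_vras) - 1).bit_length()
    blocks = net.subnets(prefixlen_diff=bits)
    # Round-robin so that any surplus blocks are spread across the routers.
    return [(str(block), router_vras[i % len(router_vras)])
            for i, block in enumerate(blocks)]


def gateway_for(client_ip: str, routes: List[Tuple[str, str]]) -> Optional[str]:
    """Return the VRA whose block contains client_ip, or None if unrouted."""
    addr = ipaddress.ip_address(client_ip)
    for block, vra in routes:
        if addr in ipaddress.ip_network(block):
            return vra
    return None


# Invented addresses: a /16 of IB clients and four router VRAs.
vras = ["10.13.0.1", "10.13.0.2", "10.13.0.3", "10.13.0.4"]
routes = routes_for_partition("10.12.0.0/16", vras)
for block, vra in routes:
    print(f"{block} via {vra}")
```

With four routers, the table holds four entries regardless of how many client nodes sit in the /16, and each router carries roughly a quarter of the clients.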
NCARP – Requirements
The NCAR Common Address Redundancy Protocol
• Must run on subnets managed by the organization's Networking Group as well as on the cluster private I/O networks.
  • Must not collide with Virtual Router Redundancy Protocol (VRRP) on the organization's subnets.
  • Must not appear as general purpose routers for non-GPFS traffic.
  • Will not support MPI job traffic
• Use a common configuration description for all participating networks and routers.
NCARP
The NCAR Common Address Redundancy Protocol
• NCAR Common Address Redundancy Protocol
  • Based loosely on the BSD Common Address Redundancy Protocol (CARP)
  • Reduce the number of NCARP packets traversing the networks to keep overhead lower
  • Each node should handle a primary Virtual Router Address (VRA) and some number of secondary VRAs
  • Use the node's load input when deciding to assume mastership of a secondary VRA.
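The primary/secondary arrangement can be pictured with a small model. The sketch below assigns each VRA to its primary node while that node is alive, and otherwise to the least-loaded live node that lists the VRA as a secondary. It is a centralized simplification for illustration: in a CARP-style protocol each node would reach this decision independently from the advertisements it hears, and the load metric, node names, and tie-break rule here are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RouterNode:
    name: str
    primary_vra: str
    secondary_vras: List[str] = field(default_factory=list)
    load: float = 0.0      # hypothetical load metric, lower is better
    alive: bool = True


def elect_masters(nodes: List[RouterNode]) -> Dict[str, Optional[str]]:
    """Map each VRA to the name of the node that should master it.

    The primary owner wins while it is alive; otherwise the least-loaded live
    node that lists the VRA as a secondary takes over (ties broken by name).
    A VRA with no live candidate maps to None.
    """
    masters: Dict[str, Optional[str]] = {}
    for owner in nodes:
        vra = owner.primary_vra
        if owner.alive:
            masters[vra] = owner.name
            continue
        candidates = [n for n in nodes if n.alive and vra in n.secondary_vras]
        best = min(candidates, key=lambda n: (n.load, n.name), default=None)
        masters[vra] = best.name if best else None
    return masters


nodes = [
    RouterNode("nsd1", "vra-a", ["vra-b", "vra-c"], load=0.7),
    RouterNode("nsd2", "vra-b", ["vra-a", "vra-c"], load=0.2, alive=False),
    RouterNode("nsd3", "vra-c", ["vra-a", "vra-b"], load=0.3),
]
masters = elect_masters(nodes)
print(masters)
```

Here nsd2 is down, so its primary address moves to nsd3, the less loaded of the two surviving nodes.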
NCARP – Deployment Strategy
The NCAR Common Address Redundancy Protocol
• Deploy NCARP on existing NSD server nodes.
  • CPUs are usually mostly idle even under file I/O load
  • Spare memory and PCI bandwidth available
• Deploy dedicated inexpensive router nodes if bandwidth is insufficient or routing impacts GPFS too much.
• Use 40G (or 100G) Ethernet as the interchange network between the router nodes and the GPFS servers.
  • Insulates us from IB dependencies and restrictions
  • This network is needed anyway for non-IB attached GPFS clusters.