High Performance Storage System
Harry Hulen, 281-488-2473, hulen@us.ibm.com
HSM: Hierarchical storage management
• Purposes of HSM:
  – Extend disk space
  – Back up disk files to tape
  – Managed permanent archive
• User sees a virtually unlimited file system (see the migration sketch below)
  – Data migrates "down" the hierarchy
  – Migrated files may be asynchronously purged from a higher level (e.g. disk) to free up space
• Multiple classes of service in a single name space
  – Disk to tape
  – Tape only (SLAC approach)
  – Complex, e.g. striped disk to mirrored tape
[Diagram: the file system hierarchy — data migrates from disk to robotic tape to shelf tape]
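A minimal sketch of the migrate-and-purge behavior described above, assuming a simple two-level hierarchy (disk over tape). The class and method names are invented for illustration and are not HPSS interfaces.

```python
# Illustrative sketch of HSM migration and purge (not HPSS code).
# Users always write to the top level (disk); copies migrate "down" to
# tape, and migrated files may later be purged from disk to free space.

class Level:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.files = {}                      # file name -> size in bytes

    def used(self):
        return sum(self.files.values())

class Hierarchy:
    def __init__(self, disk_capacity, purge_threshold=0.9):
        self.disk = Level("disk", disk_capacity)
        self.tape = Level("robotic tape", float("inf"))
        self.purge_threshold = purge_threshold

    def write(self, name, size):
        self.disk.files[name] = size         # user sees one file system

    def migrate(self):
        # Copy disk-resident files down the hierarchy to tape.
        for name, size in self.disk.files.items():
            self.tape.files.setdefault(name, size)

    def purge(self):
        # Asynchronously free disk space: drop disk copies that already
        # have a tape copy until usage falls below the threshold.
        for name in list(self.disk.files):
            if self.disk.used() <= self.purge_threshold * self.disk.capacity:
                break
            if name in self.tape.files:
                del self.disk.files[name]

    def read(self, name):
        # Satisfied from disk if present, otherwise staged back from tape.
        return self.disk.files.get(name) or self.tape.files.get(name)
```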
Big storage, like big computing, is fundamentally an aggregation problem
• A typical commercial SAN allocates a few high-function disk arrays among many non-shared file systems and databases on many computers
• Our large shared-data SANs must aggregate many disk arrays among a few very large file systems and databases shared by many computers
[Diagram: SAN or LAN — on one side, separate volumes A, B, C plus a reserve, where the administrator manages spare capacity; on the other, a single SAN file system spanning A, B, and C, where the SAN file system manages spare capacity]
HPSS architecture
• Shared, secure global file system
• Aggregates disks, tapes, and bandwidth
• SAN and/or LAN connected
• Metadata-mediated via a database based on IBM DB2
• Highly distributed, with multiple data movers and subsystems for scalability (sketched below)
• API for maximum control and performance (e.g. "hints")
• Parallel FTP (PFTP)
• Multi-petabyte capability in a single name space (e.g. SLAC, LLNL, BNL, ECMWF, DOD)
[Diagram, based on HPSS 6: client computers on a LAN; a core server and backup core server with metadata disks; tape-disk movers connected via SAN to disk arrays and robotic tape libraries]
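The movers in the diagram are what let HPSS aggregate bandwidth: a single transfer can be striped across several of them. Below is a conceptual sketch of that idea, with threads standing in for movers; every name here is hypothetical, not an HPSS API call.

```python
# Conceptual sketch of a striped transfer: a file is split into blocks and
# the blocks are spread round-robin across several data movers, so the
# aggregate bandwidth approaches the sum of the movers' individual rates.
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1 << 20          # 1 MiB blocks
WIDTH = 4                # stripe width: movers used for one file

def mover_send(mover_id, blocks):
    # A real mover would push its blocks to disk or tape over the SAN;
    # here we just report how many bytes this mover carried.
    return mover_id, sum(len(b) for b in blocks)

def striped_write(data):
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    stripes = [blocks[m::WIDTH] for m in range(WIDTH)]     # round-robin
    with ThreadPoolExecutor(max_workers=WIDTH) as pool:
        results = pool.map(mover_send, range(WIDTH), stripes)
    return dict(results)

if __name__ == "__main__":
    print(striped_write(bytes(16 * BLOCK)))   # 16 MiB spread over 4 movers
```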
The HPSS Collaboration
• U.S. Department of Energy laboratories are co-developers
  – Lawrence Livermore National Laboratory
  – Sandia National Laboratories
  – Los Alamos National Laboratory
  – Oak Ridge National Laboratory
  – Lawrence Berkeley National Laboratory
• IBM Global Services in Houston, Texas
– Access to IBM technology (DB2, for example)
– Project management
– Quality assurance and testing (SEI CMM Level 3)
– Outreach: commercial sales and service
• Advantages of Collaborative Development
– Developers are users: focus on what is needed and what works
– Keeps focus on the high end: the largest data stores
– A limited “open source” model for collaboration members and users
• “Since 1993”
HPSS performance trivia
• Capacity
  – Largest HPSS installation (BNL) has 2 petabytes in a single address space, with no indication of an upper bound
  – Calculations show the ability to handle hundreds of millions of files in a name space
• File access rate (recent data with DB2, not tuned)
  – 50 create-writes per second with a 6-processor Power4 under AIX (ECMWF)
  – 20 create-writes per second with a 4-processor Xeon under Linux (test lab)
  – Hope to achieve 100 create-writes per second with optimization and newer hardware
• Data bandwidth (see the quick arithmetic below)
  – Benchmarked at 1 gigabyte per second to 16 movers with 16 disks each (4-year-old data)
  – 2-way and 4-way striping of disk arrays and tapes for faster single-file transfers
  – Concurrent transfers among many clients, disk arrays, and tape libraries for very high aggregate transfer capability
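Some quick arithmetic, derived only from the figures quoted on this slide, to put the numbers in perspective:

```python
# Back-of-the-envelope arithmetic using only the figures quoted above.
total_rate_gb_s = 1.0                 # benchmarked aggregate rate
movers = 16
disks_per_mover = 16

per_mover_mb_s = total_rate_gb_s * 1024 / movers        # 64 MB/s per mover
per_disk_mb_s = per_mover_mb_s / disks_per_mover        # 4 MB/s per disk

creates_per_sec = 50                  # ECMWF create-write rate
files = 100_000_000                   # "100s of millions of files"
days_to_create = files / creates_per_sec / 86400        # ~23 days sustained

print(f"{per_mover_mb_s:.0f} MB/s per mover, {per_disk_mb_s:.0f} MB/s per disk")
print(f"{days_to_create:.0f} days to create 100 million files at 50/s")
```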
Disaster Recovery: Difficulty grows with size
• For most cluster file systems, loss of a disk corrupts the entire file system
  – The entire file system must be rebuilt or restored from backup
  – Disk array availability is about 0.9998 to 0.9999
• HPSS keeps metadata separate from data (illustrated below)
  – Metadata is kept in a DB2 database
  – HPSS disk files and tape files use the same metadata
  – Loss of an entire disk array causes loss only of data not yet migrated to tape (or to another disk); HPSS continues to run
  – Restoration of the system = reloading metadata
• Recovery performance
  – Capable of recovery in minutes from loss of any or all disk data (hours to days in other large systems)
  – Capable of recovery in hours from loss of all metadata (hours to days in other large systems)
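A toy illustration of keeping metadata separate from data, with sqlite3 standing in for DB2; the table layout and paths are invented for the example.

```python
# Toy illustration: metadata lives in a database, data lives on disk/tape.
# Losing a disk array loses only data not yet migrated; recovering the
# system means reloading this database, not rebuilding the file system.
import sqlite3

db = sqlite3.connect(":memory:")      # stand-in for the DB2 metadata DB
db.execute("""
    CREATE TABLE files (
        path      TEXT PRIMARY KEY,
        disk_copy TEXT,   -- segment location on a disk array (may be NULL)
        tape_copy TEXT    -- segment location on tape (may be NULL)
    )""")
db.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("/proj/run1.dat", "array1:seg42", "tape17:blk9"),   # migrated to tape
    ("/proj/run2.dat", "array1:seg43", None),            # not yet migrated
])

# Simulate loss of disk array "array1": only files without a tape copy
# lose data; everything else is still readable from tape.
db.execute("UPDATE files SET disk_copy = NULL WHERE disk_copy LIKE 'array1:%'")
lost = db.execute(
    "SELECT path FROM files WHERE disk_copy IS NULL AND tape_copy IS NULL"
).fetchall()
print("unrecoverable:", lost)         # [('/proj/run2.dat',)]
```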
HPSS Plans
• 2004
  – New HPSS infrastructure based on DB2, eliminating DCE (transparent to users)
  – HPSS for Linux and "HPSS Light"
  – LAN-less data transfers (SAN capability)
  – Support for the HTAR and HSI utility packages
  – Stand-alone PFTP offering and push protocol
• 2005
  – ASCI Parallel Local File Movers for Lustre archive
  – Globus GridFTP capability
  – True VFS interface (initially Linux)
  – Additional small-file performance improvements
  – Exploit multilevel hierarchy (e.g. MAID)
  – Better integration with application agents (e.g. Objectivity)
• 2006
  – Object-based disk technology
  – Exploit the DB2 metadata engine for content management
HPSS for Linux will make HPSS more widely available
• HPSS serves 8 of the top 20 HPC sites
• HPSS for Linux will enable HPSS to extend down from XXL and XL to L later this year
• HPSS for Linux will be offered in lower-cost pre-configured packages
[Chart: market tiers — 10s of the largest sites (HPSS, ~1.5 PB), 100s of sites (other HSMs and HPSS for Linux, ~0.5 PB), 1000s of sites (D2D and D2D2T backup)]
ASCI Purple Parallel Local File Movers
[Diagram: an application on a capability or capacity platform writes to Lustre disk; a client archive agent drives HPSS, which reads the Lustre files. Benefits noted: (1) simplicity — configuration, equipment expenditures, networking; (2) performance potential; (3) minimized disk cache]
• Lustre is a shared global file system under development by DOE, HP, and others
• A site-provided agent controls migration based on file content, not on empirical data
• HPSS Parallel Local File Movers open, read, and write Lustre files using Unix semantics (sketched below)
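A rough sketch of the mover idea: because Lustre exposes ordinary Unix semantics, a local file mover can open a file with plain POSIX calls and stream it to the archive. Here archive_write() and the agent's selection rule are placeholders, not real HPSS or site interfaces.

```python
# Sketch of a parallel local file mover: Lustre behaves like an ordinary
# POSIX file system, so the mover just opens the file and streams it out.
# archive_write() is a placeholder, not a real HPSS call.
import os

BLOCK = 8 << 20                        # 8 MiB reads

def archive_write(path, chunk, offset):
    pass                               # real code would hand the chunk to HPSS

def move_file(lustre_path):
    with open(lustre_path, "rb") as f:          # plain Unix open/read
        offset = 0
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            archive_write(lustre_path, chunk, offset)
            offset += len(chunk)
    return offset                               # bytes archived

def agent_select(directory):
    # The site-provided agent picks files to migrate by content or naming
    # convention (illustrative rule), not by age or size alone.
    for name in os.listdir(directory):
        if name.startswith("restart_") or name.endswith(".chk"):
            yield os.path.join(directory, name)
```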
HTAR: Use of containers saves metadata overhead
• A file with multiple records is a container (see the tar sketch below)
• The data is mirrored between local and global disks; the format is not
• Local disks: 10 records as 10 files need 10 metadata entries; global disks: 30 records batched into 3 container files need only 3 metadata entries
[Diagram: individual record files on local disks versus container files on global disks]
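The container idea is essentially that of tar: many small member files become one archive-resident file, so metadata cost is paid per container rather than per member. A small sketch with Python's tarfile module; HTAR itself also keeps an index so members can be retrieved individually.

```python
# Sketch of the container idea: many small files -> one container file,
# so the archive needs one metadata entry instead of one per member.
import pathlib, tarfile

def build_container(members, container="run042.tar"):
    with tarfile.open(container, "w") as tar:
        for path in members:
            tar.add(path)              # member names survive inside the tar
    return container

small_files = list(pathlib.Path("results").glob("*.dat"))
container = build_container(small_files)
# 'container' is the single file the archive stores and tracks; individual
# members can still be listed or extracted with tarfile.open(container).
```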
A multi-level hierarchy
• Example three-level hierarchy (sketched below):
  – Disk arrays
  – Massive Arrays of Idle Disks (MAID)
  – Tape libraries
• MAID will fill the "big middle" between disk and tape
• HPSS supports multilevel hierarchies today
[Diagram, based on HPSS 6: client computers on a LAN; core server and backup core server with metadata disks; tape-disk movers connected via SAN to disk arrays, a Massive Array of Idle Disks (MAID), and robotic tape libraries]
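One way to picture the three-level hierarchy is as an ordered list of storage classes, each naming the level data migrates to next; the structure below is purely illustrative, not an HPSS class-of-service definition.

```python
# Illustrative three-level hierarchy: disk -> MAID -> tape.
# Not an actual HPSS class-of-service configuration.
HIERARCHY = {
    "disk arrays":  {"migrate_to": "MAID"},
    "MAID":         {"migrate_to": "tape library"},
    "tape library": {"migrate_to": None},
}

def migration_path(start="disk arrays"):
    """Ordered list of levels a file visits as it migrates down."""
    path, level = [], start
    while level is not None:
        path.append(level)
        level = HIERARCHY[level]["migrate_to"]
    return path

print(migration_path())   # ['disk arrays', 'MAID', 'tape library']
```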
HPSS Grid support
• Short-term plans include GSI FTP
  – LBL/NERSC has prototyped a GSI-enabled HPSS PFTP client and daemon
  – Tested by the KEK lab (Japan) and the University of Tokyo
• Long-term plans include HPSS-compatible GridFTP
  – Argonne National Lab is designing and implementing it
  – Fully Globus compatible
  – Target is later this year (2004)
How to build a really large system to ingest and process data
[Diagram: ingest feeds multiple primary file systems (e.g. GPFS, SGFS) used for processing, plus a single hierarchical archive file system with secondary and tertiary levels (e.g. HPSS); a database engine holds institutional metadata]
• Writing concurrently to the archive does not interfere with processing access to primary disk (see the schematic below)
• Small files are batched into containers before moving to the tertiary level
• Institutional metadata can direct processing to secondary disk in case of loss of primary
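A schematic of that ingest flow: each incoming file is written to a primary file system for processing and queued for the archive, where small files are batched into containers before moving down the hierarchy. Directory names, the queue, and the container cutoff are invented for the example.

```python
# Schematic ingest flow (illustrative only): write each file to primary
# storage for processing and, concurrently, queue it for the archive,
# batching small files into containers before the tertiary level.
import io, os, queue, tarfile, threading, uuid

archive_queue = queue.Queue()
CONTAINER_MEMBERS = 1000           # members per container (illustrative)

def ingest(name, data, primary_dir="primary"):
    os.makedirs(primary_dir, exist_ok=True)
    with open(os.path.join(primary_dir, name), "wb") as f:  # primary copy (e.g. GPFS)
        f.write(data)
    archive_queue.put((name, data))        # archive copy handled asynchronously

def archiver():
    members = []
    while True:
        members.append(archive_queue.get())
        if len(members) >= CONTAINER_MEMBERS:
            cut_container(members)
            members = []

def cut_container(members):
    container = f"container-{uuid.uuid4().hex}.tar"
    with tarfile.open(container, "w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # hand the finished container to the secondary/tertiary archive here

threading.Thread(target=archiver, daemon=True).start()
```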