HPSS Best Practices
Erich Thanhardt
Bill Anderson
Marc Genty
Overview
● Idea is to “look under the hood” of HPSS to help you better understand best practices
○ Expose you to concepts, architecture, and tape technology
○ Cite best practices in context along the way
○ Talk ends with references to further resources
● Talk is interactive, please ask questions along the way
HPSS - What is it?
● Acronym: stands for High Performance Storage System
● “HPSS is software that manages petabytes of data on disk and robotic tape libraries.”
■ Quoted from: http://www.hpss-collaboration.org
HPSS - What makes it different?
● Hardware: Use of tape technology is a distinguishing characteristic of HPSS
● Use case: HPSS is an archive, not a (parallel) file system
○ The system is remote, not cross-mounted
○ The operation set is limited to metadata and file transfers
Best Practice: Be aware of what makes HPSS very different from GLADE - its intended use
HPSS Main Use Cases
● Archive
○ Data is stored and preserved indefinitely
■ While system components come and go
■ Model data and observational data collections
● Disaster Recovery
○ Leverage dual sites for geographic separation
○ Additional level of archival preservation
HPSS Software Architecture
[Diagram: an HPSS end user on a Linux/Unix host runs the HSI/HTAR client interface (CLI), which connects through 4x gateway servers to HPSS; control, metadata, authentication (AUTH), and data paths are shown separately.]
HPSS Software Architecture
● Best Practice: Report errors via EV ticket
○ Include: name, host, datetime, -d4 error tracing
○ Authentication problems
○ Those pesky parallel file transfer limits
■ Your guaranteed on-ramp to the system
■ “Data bandwidth” allocation
■ Will be increasing over the next few months
HPSS Software Architecture
● Best Practice: Validate that a file was written
○ “ls -l” both locally and on HPSS
○ Compare pathname and size
○ It is not sufficient to see the pathname alone (ls)
● Here is what can happen:
○ Creating the pathname in HPSS happens first
○ Then data is transferred between client and HPSS
○ That transfer can be interrupted
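The size comparison above can be sketched as a small script. This version runs entirely locally with two stand-in files; on HPSS the remote size would instead come from parsing `hsi "ls -l"` output, shown as a comment. All file names are hypothetical.

```shell
# Create a local "source" file and a stand-in for the archived copy.
echo "model output" > local_file
cp local_file archived_copy   # stand-in; the real copy would be an hsi/htar put

# Size of the local file (GNU stat -c; use "stat -f %z" on BSD/macOS).
local_size=$(stat -c %s local_file)
# On HPSS, you would parse the size column of:  hsi "ls -l /home/user/local_file"
remote_size=$(stat -c %s archived_copy)

if [ "$local_size" -eq "$remote_size" ]; then
    echo "sizes match: $local_size bytes"
else
    echo "SIZE MISMATCH: transfer may have been interrupted" >&2
fi
```

A mismatch here is exactly the interrupted-transfer case the slide describes: the pathname exists in HPSS, but the data behind it is incomplete.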
HPSS - One System/Two Sites
[Diagram: one HPSS system spanning two sites - the archive at NWSC (Cheyenne) and disaster recovery at MLCF (Boulder) - each with an Oracle SL8500 tape library, Oracle tape drives + media, and a disk cache.]
HPSS Libraries - Oracle SL8500
[Photos: frontal and top views of the SL8500 tape libraries and ACSLS server at MLCF, plus Oracle drives and media.]
Small File Problem
● Cost of a random read:
○ Robot retrieval, mount, and seek: ~70 seconds to reach the average file
○ Data transfer rate: 240 MB/sec
○ For a 184 MB file, that means 99% latency and 1% transfer
● Cost of returning the tape:
○ Double it - an indirect cost to you
○ Even for a 368 MB file, that means 99% latency and 1% transfer
● Compare these with the average file size of 166 MB
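The 99%/1% split follows directly from the numbers above; a quick arithmetic check with awk:

```shell
# Random read of a 184 MB file: ~70 s positioning latency, 240 MB/s transfer.
awk 'BEGIN {
    latency = 70; rate = 240; size = 184
    xfer = size / rate                      # ~0.77 s of actual data movement
    printf "latency fraction: %.0f%%\n", 100 * latency / (latency + xfer)
}'
# Doubling both (140 s including tape return, 368 MB) leaves the ratio unchanged.
```

With the average file size of 166 MB, nearly all of every random read is spent waiting on the robot and the drive, not moving data.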
Small File Problem
● Best Practice: Best is to avoid small files, but where needed, aggregate with htar
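htar's command-line interface mirrors tar, so the aggregation pattern can be sketched with plain tar as a local stand-in (the commented lines show what I understand to be the htar equivalents; directory and archive names are hypothetical):

```shell
# Build a directory of small files, then bundle them into one archive so a
# single tape mount serves many members instead of one mount per file.
mkdir -p run01
for i in 1 2 3; do echo "data $i" > "run01/diag_$i.nc"; done

# Local stand-in for the aggregation step:
tar -cf run01.tar run01/
# HPSS equivalent (hypothetical archive path):
#   htar -cvf /home/user/run01.tar run01/

# List members; htar also builds an index so members can be listed or
# extracted individually without retrieving the whole aggregate:
tar -tf run01.tar
#   htar -tvf /home/user/run01.tar
```

One ~500 MB aggregate pays the mount-and-seek cost once, where hundreds of kilobyte-scale members stored individually would each pay it in full.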
File Deletion
● Deleting files
○ Deleting data on tape creates unusable gaps, because tape is written linearly and continuously
○ Mischaracterizations and system data migrations
● Best Practice: Delete unneeded files, but also avoid temporary files (whether rewrites or create/deletes)
Repeated Reads and Writes
● Best Practice: Avoid both repeated reads from and repeated writes to an archive file - bring the file out and park it somewhere else
File Rescue
● Adopting orphaned files from others
○ A user/project combination goes invalid after a period of time
○ Someone needs to take ownership and pay the storage costs
● Best Practice: Never use “cp” to copy data internally in order to move it if you don’t have proper permissions - open a ticket instead
Optimizing Reads
● Best Practice: If you are reading back data at large scales, contact the Helpdesk at [email protected] for ways to order your requests - it can be done!
● The process is not perfect but usually has a positive effect
Storage Hierarchy Concept
[Diagram: storage hierarchy from CPU and memory at the top, through disk, down to tape.]
Attributes of Storage Hierarchy
● Cost & Characteristics
○ Speed & Capacity
○ Persistence & Reliability
■ Hardware, RAID/RAIT, dual copy
○ Availability
■ Online/nearline/offline
○ Location
■ Onsite/offsite
HPSS Storage Pyramid
[Diagram: pyramid with the disk cache above the tape layer - tape libraries, robotics, drives & media.]
Hierarchical Storage Manager (HSM)
[Diagram: data migrates from disk to tape, is purged from disk, and is staged from tape back to disk.]
User Interaction with HPSS
[Diagram: the same stage/migrate/purge flow between disk and tape, seen from the user's perspective.]
Basic Stats Jun-Aug 2014
● Writes/Reads ratio: ~4-5 to 1
● User response times
○ ~116 sec/read vs. ~9-10 sec/write
○ Ratio of read to write response times: ~13 to 1
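The ~13:1 figure is consistent with the response times above, taking the faster end of the write range:

```shell
# ~116 s per read vs. ~9 s per write.
awk 'BEGIN { printf "read/write ratio: %.0f to 1\n", 116 / 9 }'
```

The asymmetry is the tape latency from the small-file discussion: writes land in the disk cache, while many reads must wait for a tape mount.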
Tape Technology Upgrades
[Diagram: in addition to the usual stage/migrate/purge flow between disk and tape, data is migrated from old tape media to new during a technology upgrade.]
Data Services Pyramid - Workflow
[Diagram: the GLADE (GPFS) parallel file system at 90 GB/sec sits above HPSS, with its archive and DR copies, at 9 GB/sec.]
Workflow - Optimal
➔ Create data on GLADE/GPFS
➔ Post-process (new data plus deletes)
➔ Commit data selectively to HPSS
➔ Best Practice!
Workflow - Realistic
➔ Create data on GLADE/GPFS
➔ Commit to HPSS (back it up)
➔ Post-process (new data)
◆ Commit post-processed data (selectively?) to HPSS
Workflow - To Avoid
➔ Create data on GLADE/GPFS
➔ Commit to HPSS (back it up)
➔ Delete from GLADE/GPFS
➔ …. time passes
➔ Stage from HPSS back to GLADE/GPFS
➔ …. process staged data
➔ BEST PRACTICE: contact [email protected]
Additional Resources
● CISL Support & Allocations
○ Helpdesk & CISL Consulting
■ Send email to [email protected]
● HPSS Documentation
○ http://www2.cisl.ucar.edu/docs/hpss
● Best Practices doc
○ http://www2.cisl.ucar.edu/docs/best_practices
The End