HPSS Best Practices
Erich Thanhardt
Bill Anderson
Marc Genty
Overview
● Idea is to “look under the hood” of HPSS to help you better understand best practices
○ Expose you to concepts, architecture, and tape technology
○ Cite best practices in context along the way
○ Talk ends with references to further resources
● Talk is interactive, please ask questions along the way
HPSS - What is it?
● Acronym: stands for High Performance Storage System
● “HPSS is software that manages petabytes of data on disk and robotic tape libraries.”
■ Quoted from: http://www.hpss-collaboration.org
HPSS - What makes it different?
● Hardware: Use of tape technology is a distinguishing characteristic of HPSS
● Use case: HPSS is an archive, not a (parallel) file system
○ The system is remote, not cross-mounted
○ The operation set is limited to metadata and file transfers
Best Practice: Be aware of what makes HPSS very different from GLADE - its intended use
HPSS Main Use Cases
● Archive
○ Data is stored and preserved indefinitely
■ While system components come and go
■ Model data and observational data collections
● Disaster Recovery
○ Leverage dual sites for geographic separation
○ Additional level of archival preservation
HPSS Software Architecture
[Diagram: an HPSS end user on a Linux/Unix host runs the HSI/HTAR client interface (CLI), which connects through 4x gateway servers to HPSS; control, metadata, authentication (AUTH), and data paths are shown separately.]
HPSS Software Architecture
● Best Practice: Report errors via EV ticket
○ Include: name, host, datetime, -d4 error tracing
○ Authentication problems
○ Those pesky parallel file transfer limits
■ Your guaranteed on-ramp to the system
■ “Data bandwidth” allocation
■ Will be increasing over the next few months
HPSS Software Architecture
● Best Practice: Validate that a file was written
○ “ls -l” both locally and on HPSS
○ Compare pathname and size
○ It is not sufficient to see the pathname alone (ls)
● Here is what can happen:
○ Creating the pathname in HPSS happens first
○ Then data is transferred between client and HPSS
○ That transfer can be interrupted
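The size comparison above can be sketched as a small script. This version runs entirely locally with two stand-in files; on HPSS the remote size would instead come from parsing `hsi "ls -l"` output, shown as a comment. All file names are hypothetical.

```shell
# Create a local "source" file and a stand-in for the archived copy.
echo "model output" > local_file
cp local_file archived_copy   # stand-in; the real copy would be an hsi/htar put

# Size of the local file (GNU stat -c; use "stat -f %z" on BSD/macOS).
local_size=$(stat -c %s local_file)
# On HPSS, you would parse the size column of:  hsi "ls -l /home/user/local_file"
remote_size=$(stat -c %s archived_copy)

if [ "$local_size" -eq "$remote_size" ]; then
    echo "sizes match: $local_size bytes"
else
    echo "SIZE MISMATCH: transfer may have been interrupted" >&2
fi
```

A mismatch here is exactly the interrupted-transfer case the slide describes: the pathname exists in HPSS, but the data behind it is incomplete.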
HPSS - One System/Two Sites
[Diagram: one HPSS system spanning two sites - the archive at NWSC (Cheyenne) and disaster recovery at MLCF (Boulder) - each with an Oracle SL8500 tape library, Oracle tape drives + media, and a disk cache.]
HPSS Libraries - Oracle SL8500
[Photos: frontal and top views of the SL8500 tape libraries and ACSLS server at MLCF, plus Oracle drives and media.]
Small File Problem
● Cost of a random read:
○ Robot retrieval, mount, and seek: ~70 seconds to reach the average file
○ Data transfer rate: 240 MB/sec
○ For a 184 MB file, that means 99% latency and 1% transfer
● Cost of returning the tape:
○ Double it - an indirect cost to you
○ Even for a 368 MB file, that means 99% latency and 1% transfer
● Compare these with the average file size of 166 MB
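The 99%/1% split follows directly from the numbers above; a quick arithmetic check with awk:

```shell
# Random read of a 184 MB file: ~70 s positioning latency, 240 MB/s transfer.
awk 'BEGIN {
    latency = 70; rate = 240; size = 184
    xfer = size / rate                      # ~0.77 s of actual data movement
    printf "latency fraction: %.0f%%\n", 100 * latency / (latency + xfer)
}'
# Doubling both (140 s including tape return, 368 MB) leaves the ratio unchanged.
```

With the average file size of 166 MB, nearly all of every random read is spent waiting on the robot and the drive, not moving data.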
Small File Problem
● Best Practice: Best is to avoid small files, but where needed, aggregate with htar
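htar's command-line interface mirrors tar, so the aggregation pattern can be sketched with plain tar as a local stand-in (the commented lines show what I understand to be the htar equivalents; directory and archive names are hypothetical):

```shell
# Build a directory of small files, then bundle them into one archive so a
# single tape mount serves many members instead of one mount per file.
mkdir -p run01
for i in 1 2 3; do echo "data $i" > "run01/diag_$i.nc"; done

# Local stand-in for the aggregation step:
tar -cf run01.tar run01/
# HPSS equivalent (hypothetical archive path):
#   htar -cvf /home/user/run01.tar run01/

# List members; htar also builds an index so members can be listed or
# extracted individually without retrieving the whole aggregate:
tar -tf run01.tar
#   htar -tvf /home/user/run01.tar
```

One ~500 MB aggregate pays the mount-and-seek cost once, where hundreds of kilobyte-scale members stored individually would each pay it in full.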
File Deletion
● Deleting files
○ Deleting data on tape creates unusable gaps, because tape is written linearly and continuously
○ Mischaracterizations and system data migrations
● Best Practice: Delete unneeded files, but also avoid temporary files (whether rewrites or create/deletes)
Repeated Reads and Writes
● Best Practice: Avoid both repeated reads from and repeated writes to an archive file - bring the file out and park it somewhere else
File Rescue
● Adopting orphaned files from others
○ A user/project combination goes invalid after a period of time
○ Someone needs to take ownership and pay the storage costs
● Best Practice: Never use “cp” to copy data internally in order to move it if you don’t have proper permissions - open a ticket instead
Optimizing Reads
● Best Practice: If you are reading back data at large scales, contact the Helpdesk at [email protected] for ways to order your requests - it can be done!
● The process is not perfect but usually has a positive effect
Storage Hierarchy Concept
[Diagram: storage hierarchy from CPU and memory at the top, through disk, down to tape.]
Attributes of Storage Hierarchy
● Cost & Characteristics
○ Speed & Capacity
○ Persistence & Reliability
■ Hardware, RAID/RAIT, dual copy
○ Availability
■ Online/nearline/offline
○ Location
■ Onsite/offsite
HPSS Storage Pyramid
[Diagram: pyramid with the disk cache above the tape layer - tape libraries, robotics, drives & media.]
Hierarchical Storage Manager (HSM)
[Diagram: data migrates from disk to tape, is purged from disk, and is staged from tape back to disk.]
User Interaction with HPSS
[Diagram: the same stage/migrate/purge flow between disk and tape, seen from the user's perspective.]
Basic Stats Jun-Aug 2014
● Writes/Reads ratio: ~4-5 to 1
● User response times
○ ~116 sec/read vs. ~9-10 sec/write
○ Ratio of read to write response times: ~13 to 1
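The ~13:1 figure is consistent with the response times above, taking the faster end of the write range:

```shell
# ~116 s per read vs. ~9 s per write.
awk 'BEGIN { printf "read/write ratio: %.0f to 1\n", 116 / 9 }'
```

The asymmetry is the tape latency from the small-file discussion: writes land in the disk cache, while many reads must wait for a tape mount.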
Tape Technology Upgrades
[Diagram: in addition to the usual stage/migrate/purge flow between disk and tape, data is migrated from old tape media to new during a technology upgrade.]
Data Services Pyramid - Workflow
[Diagram: the GLADE (GPFS) parallel file system at 90 GB/sec sits above HPSS, with its archive and DR copies, at 9 GB/sec.]
Workflow - Optimal
➔ Create data on GLADE/GPFS
➔ Post-process (new data plus deletes)
➔ Commit data selectively to HPSS
➔ Best Practice!
Workflow - Realistic
➔ Create data on GLADE/GPFS
➔ Commit to HPSS (back it up)
➔ Post-process (new data)
◆ Commit post-processed data (selectively?) to HPSS
Workflow - To Avoid
➔ Create data on GLADE/GPFS
➔ Commit to HPSS (back it up)
➔ Delete from GLADE/GPFS
➔ …. time passes
➔ Stage from HPSS back to GLADE/GPFS
➔ …. process staged data
➔ BEST PRACTICE: contact [email protected]
Additional Resources
● CISL Support & Allocations
○ Helpdesk & CISL Consulting
■ Send email to [email protected]
● HPSS Documentation
○ http://www2.cisl.ucar.edu/docs/hpss
● Best Practices doc
○ http://www2.cisl.ucar.edu/docs/best_practices
The End