Post on 18-Jul-2015
ARCHIVE
LTO FAILURE AND DATA LOSS
Who We Are: WGBH MLA
Who We Are: AAPB
...and more than 120 public radio and television stations and archives nationwide
Digitization recently completed
WGBH’s 7,010 tapes that were sent for digitization
Returned on 17 LTO-6 tapes
• 5,000 hours of digitized and born digital media
• Up to 59,000 files
• Not to exceed 5.24 terabytes after transcoding has occurred
The Born Digital Deliverable
• Lack of staff resources at stations
• Absence of existing metadata
• Unique identifiers ≠ actual names of files
• Limitations of our metadata management system
• Bicycling hard drives
• Access quality vs. preservation quality
• 5.24 terabytes became 300+ terabytes
We had some challenges
• Send multiple batches totaling 13,500 video and audio files
• Pull 300TB of files over our network and place on 76 3TB hard drives
– Stored on LTO-4 robotic machine in IT
– Checksums for most files did not exist
– Many files up to 100GB each
The Plan at WGBH
THE PROBLEM
Out of a set of 2069 files pulled for Batch 3 part 1, 1195 proved to have failed on reaching Crawford
693 failed initial analysis
394 failed QC
108 failed transcode
= 58% failure rate
The next batch had 1310 failures out of 2826 files
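The failure rates for these two batches can be checked with a few lines of arithmetic. This is a minimal sketch: the function name is ours, and the figures are the ones quoted on this slide.

```python
def failure_rate(failed: int, total: int) -> float:
    """Return the share of failed files as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total

# Batch 3 part 1: the three failure categories listed on the slide.
batch_3_part_1 = {"initial analysis": 693, "QC": 394, "transcode": 108}
failed = sum(batch_3_part_1.values())

print(failed)                                # 1195
print(f"{failure_rate(failed, 2069):.1f}%")  # 57.8%
print(f"{failure_rate(1310, 2826):.1f}%")    # 46.4%
```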
THE PROCESS
start with a CSV file containing: final name of file at receiving end, full path to file on source end, ID value of offline storage tape
shell script:
- sorts files by # of storage tape
- logs into DAM using ssh
- transfers file using scp through Artesia from LTO-4 tape (stored as tarball) onto 3 TB hard drive
later versions used tar rather than scp
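The sort-by-tape step could look roughly like the sketch below, written in Python rather than the original shell. The column names (`dest_name`, `source_path`, `tape_id`) and the sample rows are assumptions for illustration; the slides do not give the real CSV layout. The transfer itself (ssh, then scp or tar through the DAM) is left out.

```python
import csv
import io
from collections import defaultdict

def group_by_tape(csv_text: str) -> dict:
    """Group (source_path, dest_name) pairs by storage tape ID.

    Working through one tape at a time presumably keeps each tape
    from being mounted more often than necessary.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        groups[row["tape_id"]].append((row["source_path"], row["dest_name"]))
    # Return tapes in sorted order so repeated runs behave the same way.
    return {tape: groups[tape] for tape in sorted(groups)}

# Hypothetical manifest; identifiers and paths are made up.
manifest = """dest_name,source_path,tape_id
item-0002.mov,/archive/b/clip2.mov,T0042
item-0001.mov,/archive/a/clip1.mov,T0007
item-0003.wav,/archive/b/clip3.wav,T0042
"""

for tape, files in group_by_tape(manifest).items():
    print(tape, [dest for _, dest in files])
```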
THE PROCESS (REVISED)
post-transfer, compare the megabyte block counts of source and destination products
(no checksum – took too much time to perform on such large files while under time pressure)
failed items automatically removed from drive
transfer script re-run until all files download successfully
if files fail repeatedly, assume they have failed on LTO; the backup tape is recalled from Iron Mountain and staging is attempted from there
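The post-transfer check described above might be sketched as follows. The 1 MiB block size and all function names are assumptions; the slides say only "megabyte block counts" and do not show the script itself.

```python
import os
import tempfile

MB = 1024 * 1024  # assumed block size (1 MiB)

def mb_blocks(size_bytes: int) -> int:
    """Number of whole-or-partial 1 MiB blocks needed to hold size_bytes."""
    return (size_bytes + MB - 1) // MB

def transfer_ok(source_size: int, dest_path: str) -> bool:
    """True if the destination exists and matches the source's block count."""
    if not os.path.exists(dest_path):
        return False
    return mb_blocks(os.path.getsize(dest_path)) == mb_blocks(source_size)

def remove_if_failed(source_size: int, dest_path: str) -> bool:
    """Delete a destination file that fails the check; return True if kept."""
    if transfer_ok(source_size, dest_path):
        return True
    if os.path.exists(dest_path):
        os.remove(dest_path)
    return False

# Demonstration: only 2 MiB of a 5 MiB source arrived, so the file is removed.
with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "partial.mov")
    with open(dest, "wb") as fh:
        fh.write(b"\0" * (2 * MB))
    print(remove_if_failed(5 * MB, dest))  # False
    print(os.path.exists(dest))            # False
```

A matching block count only shows that roughly the same amount of data arrived. It says nothing about whether the content is identical, which is the trade-off accepted by skipping checksums.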
THE PROGRESS
Many files that initially failed eventually transferred successfully, either from the initial tape or from a backup, after multiple attempts
Others were never successfully transferred
Out of a planned 10,648 files in the batch, 2173 were never successfully downloaded – a 20% failure rate
BREAKING DOWN THE FAILURES
ffmpeg -i "${filename}"
mediainfo -f "${filename}"
“moov atom not found”
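One way such a check might be automated is to run `ffmpeg -i` on each file and look for the "moov atom not found" message in its diagnostic output. In QuickTime/MP4 files the moov atom holds the index needed to decode the media, so its absence usually points to a truncated or incomplete file. The function names below are ours, the sketch assumes `ffmpeg` is on the PATH, and the slides do not show how the output was actually reviewed.

```python
import subprocess

MOOV_ERROR = "moov atom not found"

def has_missing_moov(stderr_text: str) -> bool:
    """True if the diagnostic text reports a missing moov atom."""
    return MOOV_ERROR in stderr_text.lower()

def probe(filename: str) -> str:
    """Run `ffmpeg -i filename` and return its diagnostic output as text.

    ffmpeg writes file information and errors to stderr, and exits
    non-zero when no output file is given, so the exit code is ignored.
    """
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", filename],
        capture_output=True, text=True, errors="replace", check=False,
    )
    return result.stderr

# Illustrative message text only; not captured from the project's files.
sample = "[mov,mp4,m4a,3gp,3g2,mj2 @ 0x0] moov atom not found"
print(has_missing_moov(sample))  # True
```

In use, `has_missing_moov(probe(path))` would flag a file for the initial-analysis failure list.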
QC FAILURE
Playable files with evidence of corruption defined by Crawford as “issues that would make the file unusable,” for example:
a green screen with no audio
a video that plays for two seconds before the screen goes black or grey
pixels shift out of place in zigzag pattern
audio is digital noise only
THE PROGNOSIS
Sample data: 5000 files with checksums generated at creation
1012 of those files could not be transferred from LTO, after multiple attempts
However, MD5s on LTO show the files are unchanged
So the files are good – but can’t be reached?
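Comparing a file against the MD5 recorded at creation can be done without loading the whole file into memory, which matters for files of up to 100GB. This is a minimal sketch with our own function names and an arbitrary chunk size; the slides do not say which tool produced the MD5s on LTO.

```python
import hashlib
import os
import tempfile

def md5_of_file(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute a file's MD5 by reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_recorded(path: str, recorded_md5: str) -> bool:
    """True if the file's current MD5 equals the value recorded at creation."""
    return md5_of_file(path) == recorded_md5.strip().lower()

# Demonstration on a small temporary file containing b"hello".
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as fh:
        fh.write(b"hello")
    print(matches_recorded(sample, "5D41402ABC4B2A76B9719D911017C592"))  # True
```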
THE POSSIBILITIES
Files were bad before they went onto LTO – production environment provides little opportunity for QC
Files are good, but inaccessible on LTO because of problems with the way the data is stored on the tape or the interaction of the different technologies used to get it out
THE PROBLEMS NOW
Administrative distance between institutional IT and archival needs makes it difficult to get clear answers about the technology we’re using
Staff turnover means information about original systems/data transfer processes is lost
Local LTO systems incompatible with older tapes, making direct testing currently impossible
NEXT STEPS
Acquire Linux machine for direct testing of LTO-4 tapes
Test different transfer protocols
More investigation into the SL8500 SAMFS/QFS
Look for patterns in inaccessible files (file size, date uploaded, system architecture on storage tape)
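The pattern search could start from something as simple as a failure rate per group. The sketch below buckets files by size. The record fields (`size_bytes`, `transferred`), the bucket boundaries, and the sample records are all invented for illustration and are not project data.

```python
from collections import Counter

def failure_rate_by(records, key):
    """Return {bucket: share of records in that bucket that failed}."""
    totals, failures = Counter(), Counter()
    for rec in records:
        bucket = key(rec)
        totals[bucket] += 1
        if not rec["transferred"]:
            failures[bucket] += 1
    return {bucket: failures[bucket] / totals[bucket] for bucket in totals}

def size_bucket(rec) -> str:
    """Assign a record to a coarse file-size bucket (boundaries are arbitrary)."""
    gb = rec["size_bytes"] / 1e9
    if gb < 1:
        return "<1 GB"
    if gb < 10:
        return "1-10 GB"
    return ">=10 GB"

# Made-up records, only to show the shape of the input.
records = [
    {"size_bytes": 5e8, "transferred": True},
    {"size_bytes": 5e9, "transferred": False},
    {"size_bytes": 5e9, "transferred": True},
    {"size_bytes": 5e10, "transferred": False},
]
print(failure_rate_by(records, size_bucket))
```

The same function accepts any other key, such as upload date or storage tape ID, by passing a different `key` callable.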
Rebecca Fraimow & Casey Davis
@rhfraim
@CaseyEDavis1
rebecca_fraimow@wgbh.org
casey_davis@wgbh.org