LOC 13 June 2003 1 NSSDC Role and OAIS Implementation Brief Overview Don Sawyer.

NSSDC Role and OAIS Implementation

Brief Overview

Don Sawyer

NSSDC Roles

NSSDC is the NASA Office of Space Science (OSS) permanent archive

— Astronomy, Solar & Space Plasma Physics, Planetary & Lunar data

— Digital and film data spanning 1958-2002 from >1300 instruments flown on >375 spacecraft

— Distinguished from OSS Active Archives (AA)

Interacts in a timely manner with all distributed OSS active archives in space physics, solar physics, astrophysics, and planetary science disciplines to acquire the OSS data and supporting metadata needed for long term preservation and understanding;

— interact directly with projects when mediated by an active archive;

— interact with PIs and related individuals when they have data needing long-term preservation.

OSS Archive Relationships

[Diagram. Labels: Various OSS S/C Projects; Planetary AAs, Solar AAs, SEC AAs, Astrophysics AAs; NSSDC Permanent Archive (DLTs, Tapes, CD/DVDs, Film, Paper); PDS and SEC data on media; Anonymous FTP; OSS Researchers, Non-OSS Researchers, Education Community, General Public]

NSSDC Roles (concl’d)

NASA's lead for Consultative Committee for Space Data Systems (CCSDS) Archiving and Data Packaging/Registry Working Groups (on-ground data management)

— Led development of CCSDS/ISO Open Archival Information System reference model standard

Comprehensive information base about all launched spacecraft (~6000)

Host of World Data System for Satellite Information

— Part of worldwide World Data Center infrastructure established ~1958

NSSDC’s Permanent Archive Environment - Legacy View

~20 TB in ~2,300 digital data sets on ~40,000 offline media

— Most on tape

— Most newly arriving media are CDs or DVDs

"Data set" is all data from a given source (e.g., instrument on a spacecraft) at a given "processing level."

Wide range of data characteristics (e.g., documented binaries specific to now-obsolete computers)

Also, ~2,000 data sets on large number of film media of various form factors.

— Gradually being digitized into TIFF via scanning.

Initial Drivers for OAIS Re-engineering

Needed to solve a migration problem

— Remove dependencies of VAX VMS files on the operating system

— Include record defining attributes in a standard form to accompany the data file content

— Result was package of data/metadata

Had software, based on CCSDS/ISO packaging standard, that could be augmented

OAIS reference model provided an architectural view

Created Archival Information Package

Single File (binary/ascii content)

Uses CCSDS/ISO packaging (SFDU) to hold multiple data objects

— NSSDC defined attribute object expressed in CCSDS/ISO Parameter Value Language (PVL)

— NSSDC data file content in one of four canonical forms

• Two flavors each of binary and ascii

— 20-byte SFDU ascii labels to separate data objects
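
As a rough illustration of the 20-byte ascii labels mentioned above, the following Python sketch splits one label into its fields. The field layout (4-byte control authority ID, 1-byte version, 1-byte class, 2 spare bytes, 4-byte data description ID, 8-byte ASCII decimal length) is an assumption based on the version 1 SFDU label and should be checked against the CCSDS specification. The sample label and the names `SfduLabel` and `parse_label` are made up for this example and are not NSSDC's.

```python
from dataclasses import dataclass

LABEL_SIZE = 20  # bytes, as stated on the slide


@dataclass
class SfduLabel:
    control_authority_id: str  # bytes 0-3
    version_id: str            # byte 4
    class_id: str              # byte 5
    spare: str                 # bytes 6-7 (assumed version 1 layout)
    data_description_id: str   # bytes 8-11
    length: int                # bytes 12-19, ASCII decimal (assumed version 1)

    @property
    def adid(self) -> str:
        """Format identifier: control authority ID plus data description ID."""
        return self.control_authority_id + self.data_description_id


def parse_label(raw: bytes) -> SfduLabel:
    """Split a 20-byte ASCII label into its fields."""
    if len(raw) != LABEL_SIZE:
        raise ValueError(f"expected {LABEL_SIZE} bytes, got {len(raw)}")
    text = raw.decode("ascii")
    return SfduLabel(
        control_authority_id=text[0:4],
        version_id=text[4],
        class_id=text[5],
        spare=text[6:8],
        data_description_id=text[8:12],
        length=int(text[12:20]),
    )


# Made-up label: authority "TEST", version 1, class I, description "AB01", 512 bytes.
label = parse_label(b"TEST1I00AB0100000512")
print(label.adid, label.length)
```

The length field tells a reader how many bytes of the data object follow the label, which is what lets one file hold several objects back to back.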

NSSDC Attribute Object

— Object identification and version

— Archival Storage Id (unique)

— Collection Id

— Checksum over rest of attribute object

— Attributes for original data stream

• Date/time created, operating system, size in bytes, record format, binary/ascii flag, file name, checksum, etc.

— Attributes for canonical form of data stream

• Date/time created, operating system, size in bytes, record format, binary/ascii flag, file name, checksum, processing report, format identifier (ADID), etc.

— Order applied encodings (e.g., tar,gzip)

— Start date/time of data observations
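
To make the attribute list above more concrete, here is a minimal Python sketch that writes a few such attributes as PVL-like `keyword = value;` statements and prepends a checksum computed over the rest of the object. Everything specific in it is an assumption for illustration: the keyword names, the sample values, the grouping with `BEGIN_OBJECT`/`END_OBJECT`, and the use of CRC-32 (the slide does not say which checksum NSSDC uses).

```python
import zlib


def pvl_value(value) -> str:
    """Quote strings; leave numbers bare."""
    if isinstance(value, str):
        return '"' + value.replace('"', "'") + '"'
    return str(value)


def pvl_group(name: str, attrs: dict) -> list:
    """Render one named group of attributes as PVL-like lines."""
    lines = [f"BEGIN_OBJECT = {name};"]
    lines += [f"  {key} = {pvl_value(val)};" for key, val in attrs.items()]
    lines.append(f"END_OBJECT = {name};")
    return lines


def build_attribute_object(ident: dict, original: dict, canonical: dict) -> str:
    """Build the attribute text, then prepend a checksum over the rest."""
    body = [f"{key} = {pvl_value(val)};" for key, val in ident.items()]
    body += pvl_group("ORIGINAL_DATA_STREAM", original)
    body += pvl_group("CANONICAL_DATA_STREAM", canonical)
    body_text = "\n".join(body) + "\n"
    checksum = zlib.crc32(body_text.encode("ascii"))  # CRC-32 is an assumption
    return f"AO_CHECKSUM = {checksum};\n" + body_text


# All keywords and values below are hypothetical.
ao = build_attribute_object(
    {"AO_VERSION": 1, "ARCHIVAL_STORAGE_ID": "AS-000001", "COLLECTION_ID": "EX-01A"},
    {"OPERATING_SYSTEM": "VMS", "SIZE_BYTES": 2048, "RECORD_FORMAT": "VARIABLE"},
    {"OPERATING_SYSTEM": "UNIX", "SIZE_BYTES": 2112, "RECORD_FORMAT": "STREAM_CRLF"},
)
print(ao)
```

Keeping the checksum as the first statement means a reader can verify the remainder of the object without parsing it first.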

NSSDC Permanent Archive - New Direction

Bundle data files (objects) with data_file-descriptive attribute file (object) and pointers to further documentation into OAIS "Archival Information Package (AIP)"

— Write to Digital Linear Tape (DLT)-based jukebox in unix environment

— Write data files and attribute files to RAID disk for ftp-based access by external customer

AIP Structure

[Diagram. Elements: Label (CCSDS/ISO Label for Packaging); Label (CCSDS/ISO Label for Attribute Object); Attribute Object (AO); Label (CCSDS/ISO Label for Sensor Data Object); Sensor Data Object (SDO). Annotations: Globally Unique Registry Identifiers; Expressed using CCSDS/ISO language]
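
The package layout described on this slide (a packaging label, then a labelled attribute object, then a labelled sensor data object) can be sketched in Python as below. This is only an illustration under stated assumptions: each label is 20 ASCII bytes with the value length in the last 8 bytes, and the identifiers `TESTPK01`, `TESTAO01`, `TESTSD01` and the class letters `Z`, `K`, `I` are placeholders, not the registered identifiers NSSDC uses.

```python
def make_label(adid: str, class_id: str, length: int) -> bytes:
    """Build a 20-byte ASCII label (assumed version 1 style layout)."""
    if len(adid) != 8 or len(class_id) != 1:
        raise ValueError("adid must be 8 characters, class_id 1 character")
    if not 0 <= length <= 99_999_999:
        raise ValueError("length does not fit in 8 decimal digits")
    return f"{adid[:4]}1{class_id}00{adid[4:]}{length:08d}".encode("ascii")


def build_aip(attribute_object: bytes, sensor_data: bytes) -> bytes:
    """Wrap an attribute object and a data object in one package."""
    ao = make_label("TESTAO01", "K", len(attribute_object)) + attribute_object
    sdo = make_label("TESTSD01", "I", len(sensor_data)) + sensor_data
    body = ao + sdo
    return make_label("TESTPK01", "Z", len(body)) + body


def split_aip(aip: bytes) -> list:
    """Walk the package and return (label, value) pairs for the inner objects."""
    outer_length = int(aip[12:20])
    body = aip[20:20 + outer_length]
    objects, pos = [], 0
    while pos < len(body):
        label = body[pos:pos + 20]
        length = int(label[12:20])
        objects.append((label.decode("ascii"), body[pos + 20:pos + 20 + length]))
        pos += 20 + length
    return objects


aip = build_aip(b"A = 1;\n", b"\x00\x01\x02")
for label, value in split_aip(aip):
    print(label, len(value))
```

Because every object is preceded by its own length, the data object can be arbitrary binary content without any escaping.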

“New Direction”

Migrating Data into AIPs

Have created AIPs for data previously on NSSDC's newly retired 12" WORM data dissemination jukebox

— VMS-based, so some attributes placed in attribute objects compensate for loss of VMS/Files-11 support

— Modified data files in cases of variable-length records, and introduced "CR/LF" for appropriate ASCII data

Now creating multi-data-file AIP and upgrading software to accommodate data migrating from legacy offline tapes

— Will start ingest from tape imminently
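
The variable-length record change noted above can be illustrated with a short Python sketch. It assumes the common VMS on-disk form in which each record is preceded by a 2-byte little-endian byte count and padded to an even length, and it ignores block-boundary filler; the function name `vms_variable_to_crlf` is hypothetical, and this is not NSSDC's migration code.

```python
import struct


def vms_variable_to_crlf(raw: bytes) -> bytes:
    """Rewrite count-prefixed records as CR/LF-terminated lines."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(raw):
        (count,) = struct.unpack_from("<H", raw, pos)  # 2-byte record length
        pos += 2
        record = raw[pos:pos + count]
        if len(record) < count:
            raise ValueError("truncated record")
        out += record + b"\r\n"
        pos += count + (count & 1)  # skip the pad byte after odd-length records
    return bytes(out)


# Two made-up records: "abc" (odd length, so one pad byte) and "hi".
sample = struct.pack("<H", 3) + b"abc\x00" + struct.pack("<H", 2) + b"hi"
print(vms_variable_to_crlf(sample))
```

Once the record boundaries are carried in the data itself, the file no longer depends on VMS record attributes to be read correctly.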

Facilitating Archiving via Data Supplier Support

NSSDC has provided software to the IMAGE spacecraft project

— Generates attribute objects and bundles these with data files into Archival Information Packages (AIPs)

— IMAGE script transmits these to NSSDC

Looking for other opportunities to support NASA spacecraft projects equivalently

— Cost-effective data ingest

[Diagram. Labels: Data files; Configuration information; NSSDC Package Generator; AIPs; IMAGE Script; ftp; IMAGE Science Operations Centre; National Space Science Data Center]

NSSDC Architecture Summary

For the system architecture:

— Compliant with the OAIS functional model: separates different functions (ingest, archival storage, data management, access)

— Compliant with the OAIS information model: defines an Archival Information Package (AIP) for preservation in Archival Storage

Data are being migrated into Archival Information Packages for long-term storage on DLTs

New data received arrive as AIPs (e.g., the IMAGE project) or are put into AIPs during the Ingest process

Current Activities

Developing a better integration of our metadata databases

— Many have grown up over the years

— Taking advantage of Java and web capabilities

Developing an Archival Information Package type that allows multiple ‘canonical data files’ in a single package file.

— Needed for the migration of legacy data on magnetic tape

— Needed to put small files together for ease of management

Planning a better overall integration of our architecture

— E.g., tighter coupling between AIPs and other information bases

Backups

NSSDC AIP Schematic

NSSDC Archive - Logical Architecture

Archive Challenges

Making the most cost-effective judgements on modernizing older data sets with low access potential.

— Convert vendor-specific binaries to IEEE-binary? Via EAST? Convert to ASCII?

Implement efficient production process for migrating data from ~10,000 tapes through AIP-creation software to nearline DLT-based permanent archive

Define post-DLT permanent archive environment

Ensuring existence of all material needed to make data correctly and independently usable

— Couple such material to the data being supported
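
One of the conversion options raised above, turning vendor-specific binaries into IEEE binary, can be sketched for a single case: the 32-bit VAX F_floating format. The Python below decodes the VAX layout (two little-endian 16-bit words; sign, 8-bit exponent with bias 128, and a 23-bit fraction with a hidden leading 0.1 binary) and re-encodes the value as a big-endian IEEE single. It is an illustration of the kind of conversion involved, not a tool NSSDC is known to use, and the function names are hypothetical.

```python
import struct


def vax_f_to_float(raw: bytes) -> float:
    """Decode one 4-byte VAX F_floating value."""
    word1, word2 = struct.unpack("<HH", raw)
    sign = word1 >> 15
    exponent = (word1 >> 7) & 0xFF
    fraction = ((word1 & 0x7F) << 16) | word2  # 23 stored fraction bits
    if exponent == 0:
        if sign:
            raise ValueError("VAX reserved operand")
        return 0.0
    mantissa = 0.5 + fraction / 2**24  # hidden bit gives 0.1f in binary
    value = mantissa * 2.0 ** (exponent - 128)
    return -value if sign else value


def vax_f_to_ieee_bytes(raw: bytes) -> bytes:
    """Re-encode a VAX F_floating value as a big-endian IEEE 754 single."""
    return struct.pack(">f", vax_f_to_float(raw))


print(vax_f_to_float(bytes.fromhex("80400000")))   # 1.0
print(vax_f_to_float(bytes.fromhex("20c10000")))   # -2.5
print(vax_f_to_ieee_bytes(bytes.fromhex("80400000")).hex())
```

A production conversion would also need a policy for values that have no exact IEEE equivalent and for reserved operands, which is part of the cost-benefit judgement the slide mentions.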

NSSDC Metadata Environment

Information base (JEDS) about

— All launched spacecraft,

— Instruments on space science spacecraft,

— NSSDC-held data sets therefrom.

— Underlies "NSSDC Master Catalog" interface.

Information base (DIOnAS) about data files

— Written to new nearline permanent archive

— Written to anonymous nssdcftp/spacecraft_data/

Attribute objects with technical information about data files

Information base (JIN) about data media

NSSDC Metadata Environment (concl’d)

Information base (CAOIS) of CCSDS-registered data set-descriptive information (e.g., formats)

— Assigns globally-unique registry identifiers

— Relevant to growing fraction of NSSDC data plus other data

Array of "data set catalogs" with detailed information on NSSDC-held legacy data sets

— Presently on CDs as TIFF and PDF images

Other special purpose information bases and metadata collections

NSSDC data set IDs are primary mechanism currently linking these "metadata modules"
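
The idea of data set IDs as the link between metadata modules can be pictured with a small Python sketch. The dictionaries below are made-up stand-ins named after the JEDS, DIOnAS, and JIN information bases; the ID `EX-01A`, the record contents, and the function `describe` are all hypothetical and say nothing about how those systems are actually stored or queried.

```python
# Made-up records standing in for three of the metadata modules.
jeds = {"EX-01A": {"spacecraft": "Example-1", "instrument": "Magnetometer"}}
dionas = {"EX-01A": ["ex01a_0001.aip", "ex01a_0002.aip"]}
jin = {"EX-01A": ["DLT-000123"]}


def describe(data_set_id: str) -> dict:
    """Gather what each module knows about one data set ID."""
    return {
        "data_set_id": data_set_id,
        "catalog": jeds.get(data_set_id),        # spacecraft and instrument
        "files": dionas.get(data_set_id, []),    # archived data files
        "media": jin.get(data_set_id, []),       # physical media
    }


print(describe("EX-01A"))
```

With the ID as the only shared key, any module that lacks an entry simply contributes nothing, which is one reason tighter integration is listed among the current activities.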

NSSDC’s Metadata Challenges

To ensure flow to NSSDC of material needed for the correct and independent use of data along with the flow of data to NSSDC

To optimally integrate metadata modules to support:

— Users' finding, retrieval and use of data,

— NSSDC staffers' archive management activities

To ensure that all relevant supporting material is visible to and readily retrievable by NSSDC's data-accessing customers.

Software

NSSDC has growing amount of low-processing-level (lpl) data

— Started archiving such data only in past decade

NSSDC has very little data set-specific READ/PROCESS software

— This greatly limits usability of lpl data

Lpl data handled by systems/formats like SDDAS/IDFS and IMAGE_Archive/UDF

Major need for software standards/approaches to accompany lpl data into archives

— Ensure long-term usability of such data

Archiving of relevant software source code a minimal requirement