Data Archives: Migration and Maintenance Douglas J. Mink Telescope Data Center Smithsonian Astrophysical Observatory http://tdc-www.harvard.edu NSF 2004-03-23

Data Archives: Migration and Maintenance

Douglas J. Mink
Telescope Data Center
Smithsonian Astrophysical Observatory
http://tdc-www.harvard.edu

NSF 2004-03-23

Archiving Issues

What do we save?
● Reduced Data?
● Raw Data?
● Calibration data?
● Data Products?
● Publications?

How do we access the data?
● Google Search? (ADS)
● Through discipline Portal(s)? (VO registries)
● Through Data Center?

Where does the data live?
● One or few Data Center(s)?
● Many Data Centers reachable through few Portals
● Number of Centers is limited by long-term funding

Migration Issues

Why do we migrate archives?
● Better access
● Cheaper storage
● More compact storage

What do we migrate from an archive?
● Everything?
● Reduced Data only?
● Data Products?

What do we do with the old media?
● Paper and glass are more stable than digital media!
● Magnetic tapes may be more stable than optical disks!

Who pays?
● Is migration maintenance?
● Is a new, more useful archive being created?

Maintenance Issues

What are the costs?
● Space
● Staffing
● Equipment maintenance and repair

Backup protection?
● What is the safest way to back up a multi-Terabyte archive?
● Cloning to other sites improves access as well as providing backup

How do we maintain old media?
● Old media may be more stable over time than new media!
● Can we maintain older, less compact data?

Some US Astronomical Archives

Online (All NASA Funded)
● Hubble Space Telescope: 17.6 Terabytes
● Two-Micron All-Sky Survey: 5 Terabytes
● Sloan Digital Sky Survey: 1 Terabyte online (50 more offline)
● Palomar-QUEST: 6 Terabytes (1/month since 9/2003)

Off-Line
● NOAO Save-the-Bits: 44.4 Terabytes (7.6 Terabytes/year)
● HPSSP (Harvard Plate Stack Scanning Project): 200 Terabytes

Future
● LSST (Large Synoptic Survey Telescope): 7 Terabytes per night!
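
To put the sizes and rates on this slide side by side, here is a small arithmetic sketch. It uses only the numbers quoted above and assumes the quoted rates stay constant, which is a simplification made for illustration and not a claim from the talk.

```python
# Sizes and rates copied from the slide; all values are in terabytes.
NOAO_SIZE_TB = 44.4
NOAO_RATE_TB_PER_YEAR = 7.6
LSST_RATE_TB_PER_NIGHT = 7.0
HST_SIZE_TB = 17.6
HPSSP_SIZE_TB = 200.0


def years_at_rate(size_tb: float, rate_tb_per_year: float) -> float:
    """Years needed to accumulate size_tb at a constant yearly rate."""
    return size_tb / rate_tb_per_year


def nights_to_match(size_tb: float, rate_tb_per_night: float) -> float:
    """Observing nights needed to produce size_tb at a constant nightly rate."""
    return size_tb / rate_tb_per_night


# Assumption: constant rates, so these are rough comparisons only.
noao_years = years_at_rate(NOAO_SIZE_TB, NOAO_RATE_TB_PER_YEAR)
lsst_nights_hst = nights_to_match(HST_SIZE_TB, LSST_RATE_TB_PER_NIGHT)
lsst_nights_plates = nights_to_match(HPSSP_SIZE_TB, LSST_RATE_TB_PER_NIGHT)

print(f"Save-the-Bits holds about {noao_years:.1f} years of data at its quoted rate")
print(f"LSST would match the HST archive in about {lsst_nights_hst:.1f} nights")
print(f"LSST would match the scanned plate archive in about {lsst_nights_plates:.1f} nights")
```

The point of the comparison is the one the slide makes with its exclamation mark: a planned survey would produce in weeks what existing archives took years to collect.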

Growing Astronomical Catalogs

● 1989 HST Guide Star Catalog 25,541,952 sources
● 1996 USNO-A1.0 Catalog 488,006,860 sources
● 1998 USNO-A2.0 Catalog 526,280,881 sources
● 2001 GSC II Catalog (2.2.01) 998,402,801 sources
● 2002 USNO-B1.0 Catalog 1,036,366,767 sources
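
The growth in the list above can be expressed as a ratio to the 1989 catalog. The sketch below uses only the source counts from the slide; the function name is made up for this illustration.

```python
# (year, catalog name, number of sources) as listed on the slide.
CATALOGS = [
    (1989, "HST Guide Star Catalog", 25_541_952),
    (1996, "USNO-A1.0", 488_006_860),
    (1998, "USNO-A2.0", 526_280_881),
    (2001, "GSC II (2.2.01)", 998_402_801),
    (2002, "USNO-B1.0", 1_036_366_767),
]


def growth_factors(catalogs):
    """Return (year, name, size relative to the first catalog) tuples."""
    base = catalogs[0][2]
    return [(year, name, count / base) for year, name, count in catalogs]


factors = growth_factors(CATALOGS)
for year, name, factor in factors:
    print(f"{year}  {name:<24} {factor:6.1f}x")
```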

Virtual Observatory Portals

US:
● ADS (links from publications)
● Goddard (SkyView visualization)
● JHU, NCSA, Caltech (VO Registry modelling)
● IPAC (IRSA, etc.)
● SAO WCSTools (desktop catalog access)

England: The Grid

France: CDS (Aladin/Vizier/Simbad)

International Virtual Observatory Alliance (IVOA)

Registries: Searchable databases containing descriptions of data available in the Virtual Observatory

Data Model: Standards for data format and content

VOTable: XML transfer format for metadata

UCD: Unified Content Descriptors (so everyone doesn't make up their own names for the same things)

Data Access Layer (DAL): User interface

Protocols: Open interfaces to large archives ease multi-level links

(NSF funds US participation in IVOA)
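
To make the VOTable and UCD items above concrete, here is a minimal sketch that reads a tiny hand-written VOTable with Python's standard library. The sample document, its values, the plate names, and the function name are all made up for illustration, and the UCD strings are illustrative UCD-style labels. The sketch also assumes the file declares no XML namespace, which real VOTable files often do.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; the data values are invented.
SAMPLE_VOTABLE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="plates">
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      <FIELD name="plate" datatype="char" arraysize="*" ucd="meta.id"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>83.82</TD><TD>-5.39</TD><TD>example-1</TD></TR>
          <TR><TD>10.68</TD><TD>41.27</TD><TD>example-2</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_votable(xml_text):
    """Return (column name -> UCD, list of row dicts) for the first table."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = table.findall("FIELD")
    names = [field.get("name") for field in fields]
    ucds = {field.get("name"): field.get("ucd") for field in fields}
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append(dict(zip(names, cells)))
    return ucds, rows


ucds, rows = read_votable(SAMPLE_VOTABLE)
for name, ucd in ucds.items():
    print(f"column {name!r} is described by UCD {ucd!r}")
print(rows[0])
```

Two archives may call their columns "ra" and "RAJ2000", but if both tag the column with the same UCD, a client can tell they mean the same quantity. That is the problem the UCD item on the slide describes.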

IVOA Registries

[Diagram of registry relationships. Labels: Full Searchable Registry (twice), Replicate (twice), Local Publisher (harvestable registry) (twice), Local Searchable Registry, Client, Data (three times), DAL (twice).]

Migrating Harvard's Astronomical Plate Collection from Paper and Glass to Bits

● 500,000 glass plates covering the entire sky from 1885-1989

● Basis for fundamental discoveries in astronomy, such as using Cepheid variable stars as cosmic yardsticks

● A legacy of long-term commitment to astronomical photography and research

● Astronomy will not have an equivalent time frame from digital observations until 2080.

CfA/PSSG, 2002-11-18

International Astronomical Union Resolution B3, 2000

Safeguarding the Information in Photographic Observations

The International Astronomical Union,

Recognising

that unless urgent action is taken, this unique historical record of astronomical phenomena will be lost to future generations of astronomers,

Recommends

the transfer of the historic observations onto modern media by digital techniques, which will provide worldwide access to the data so as to benefit astronomical research in a way that is well matched to the tools of the researcher in the future.

Step 0: List what is in the archive (on the web)

Typical large glass plate

First: Digitize Metadata

From hand-written cards and logbooks

Digital access to plate metadata

(interactive web page)

Results of metadata search

Next: Digital access to image data

Move the plates out of the 20th century

Proposed access to digital images

[Diagram of the proposed data flow. Components: User; Stack Catalog search; FITS Header Archive (WCS information); FITS or TIFF Image Archive (100 Terabyte); FITS extractor. Data passed between them: object or coordinates and time; plate names and object (x,y); FITS images of plate portions.]
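
One possible reading of the labels on this slide is a two-step lookup: a catalog search turns a sky position and time range into plate names with pixel positions, and an extractor then cuts out the matching portion of each plate. The sketch below illustrates that reading with in-memory stand-ins. Every class, field, function, and plate name here is hypothetical, and the coordinate conversion is a flat linear scale chosen for simplicity, not a real WCS projection.

```python
from dataclasses import dataclass


@dataclass
class PlateHeader:
    """Stand-in for one entry of a header archive (hypothetical fields)."""
    name: str
    year: int
    ra0: float    # RA at pixel (0, 0), degrees
    dec0: float   # Dec at pixel (0, 0), degrees
    scale: float  # degrees per pixel on both axes (simplifying assumption)
    width: int
    height: int


def search_catalog(headers, ra, dec, year_min, year_max):
    """Return (plate name, x, y) for plates covering the position and dates."""
    hits = []
    for h in headers:
        if not (year_min <= h.year <= year_max):
            continue
        # Assumption: linear mapping from sky offset to pixel offset.
        x = round((ra - h.ra0) / h.scale)
        y = round((dec - h.dec0) / h.scale)
        if 0 <= x < h.width and 0 <= y < h.height:
            hits.append((h.name, x, y))
    return hits


def extract_portion(image, x, y, half):
    """Cut a square of up to (2 * half + 1) pixels per side around (x, y)."""
    rows = image[max(0, y - half): y + half + 1]
    return [row[max(0, x - half): x + half + 1] for row in rows]


# Toy archive: two 8x8 plates whose pixel value encodes row * 8 + column.
headers = [
    PlateHeader("plate-a", 1900, 10.0, 20.0, 0.5, 8, 8),
    PlateHeader("plate-b", 1950, 100.0, -30.0, 0.5, 8, 8),
]
images = {h.name: [[r * 8 + c for c in range(8)] for r in range(8)] for h in headers}

hits = search_catalog(headers, ra=12.0, dec=21.5, year_min=1885, year_max=1989)
name, x, y = hits[0]
cutout = extract_portion(images[name], x, y, half=1)
print(hits)
print(cutout)
```

Keeping the header lookup separate from the pixel data means the search step never has to touch the large image archive. Only the extractor reads image data, and only for plates that matched.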
