Digital Immortality Dr David Holdsworth Keeping Digital Data for Ever


Page 1:

Digital Immortality

Dr David Holdsworth

<[email protected]>

http://www.leeds.ac.uk/cedars/

Keeping Digital Data for Ever


Page 2:

Obsolete(?) Data

• 1 Things that must be kept by law

• 2 Things that must be destroyed by law

• 3 Things that we choose to keep

• 4 Things that we are certain can be thrown away

Page 3:

Obsolete(?) Data

• 5 Things that we would like to keep if we have room

• 6 Things that we would like to throw away, but are not sure about

• 7 Things that we think we have kept but cannot find

• 8 Things that we have kept but now cannot decipher

• 9 Things that we have not kept but now wish that we had

Page 4:

What to Keep

• All of 1 and 3
  – 1 Things that must be kept by law
  – 3 Things that we choose to keep

• As much of 5 and 6 as is cost-effective
  – 5 Things that we would like to keep if we have room
  – 6 Things that we would like to throw away, but are not sure about

• Data discarded from 5 and 6 has the potential to be in 9 in the future
  – 9 Things that we have not kept but now wish that we had

• Minimise cost per item

Page 5:

Some Pitfalls

• Errors are usually not correctable

• Failure to index adequately puts data into category 7
  – 7 Things that we think we have kept but cannot find

• Failure to know the format puts data into category 8
  – 8 Things that we have kept but now cannot decipher

Page 6:

Personal Involvement

CEDARS

• CURL Exemplars in Digital ARchiveS

• Collaborative project for libraries

• Funded by HEFCE/JISC

• Oxford, Cambridge and Leeds

Page 7:

Personal Involvement - contd.

CAMiLEON

• Creative Archiving at Michigan and Leeds: Emulating the Old on the New

• Collaborative project on emulation

• Funded by NSF/JISC

Page 8:

Challenges to digital preservation

• Deteriorating media
  – Magnetic dropout
  – Obsolete equipment

• Obsolete data formats
  – EBCDIC
  – UNICODE has established itself
  – Machine code software is an extreme example

Page 9:

Philips LaserVision

Page 10:

Challenges to digital preservation - contd.

• Needles in haystacks
  – ISBN
  – Meta-data

• Deteriorating Institutions
  – Where are the digital legal deposits?
  – .. Or even Digital Equipment Corporation

• Proprietary systems become obsolete
  – leaving data inaccessible

Page 11:

Compatibility - Friend or Foe

• e.g. z/OS evolves from OS/360

• Windows Vista evolves from 16-bit Windows 3.1

• Modern machines run old software … but faster

• Who keeps old versions?
  – Computer Museum in California
  – Microsoft -- ?

Page 12:

Times Change

• People don’t always want to process their old data using the tools of yesteryear

Page 13:

THIS IS GEORGE 3 MARK 8.67 ON 31DEC9910.19.03_

TIMED OUT 10.19.33

THE SYSTEM HAS TEMPORARILY CLOSED DOWN

Page 14:

Times Change

• People don’t always want to process their old data using the tools of yesteryear

• Need to bridge the gap between data’s origins and the time of access

Page 15:

Use the Past to Illuminate the Future

• In 1987 EBCDIC was king

• In 2007 UNICODE is heir apparent

• In 2027 …….

• In 2038 UNIX time_t overflows 31 bits

• What has survived the decades?
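The 2038 bullet can be checked with a few lines of arithmetic. The sketch below assumes the common case of a signed 32-bit time_t counting seconds from 1 January 1970 UTC; the function names are illustrative only and do not come from the slides.

```python
from datetime import datetime, timedelta, timezone

# A signed 32-bit time_t has 31 value bits, so this is its largest value.
MAX_TIME_T_32 = 2**31 - 1
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable_moment() -> datetime:
    """Return the last UTC instant a signed 32-bit time_t can hold."""
    return UNIX_EPOCH + timedelta(seconds=MAX_TIME_T_32)


def wrap_to_int32(value: int) -> int:
    """Reduce an integer to the signed 32-bit two's-complement range."""
    return (value + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    # Last good second, then the value one second later as a 32-bit counter sees it.
    print(last_representable_moment().isoformat())
    print(wrap_to_int32(MAX_TIME_T_32 + 1))
```

One second after the last representable moment, the counter wraps to a large negative number, which such a system would interpret as a date in 1901.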

Page 16:

Survival of the Abstract

• Character sets

• Bytes

• Unstructured Files (stream of bytes)

• Hierarchical file tree

• Associative mappings

• Programming languages

Page 17:

All is not lost

• We can keep a byte-stream for ever
  – The abstract data separated from the medium is technology-neutral

• i.e. files can be kept for ever

• Copies are perfect

• File formats do not last for ever

• ….. Remember WORDSTAR
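"Copies are perfect" is a claim that can be checked mechanically. The following is a minimal sketch, assuming SHA-256 as the fixity check (the slides do not name an algorithm); `sha256_of` and `copy_and_verify` are hypothetical helper names.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large byte-streams need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target: Path) -> bool:
    """Copy the byte-stream to a new location and confirm the copy is identical."""
    shutil.copyfile(source, target)
    return sha256_of(source) == sha256_of(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "original.bin"
        original.write_bytes(b"Keeping digital data for ever")
        print(copy_and_verify(original, Path(tmp) / "copy.bin"))
```

Storing the digest alongside the byte-stream lets each later copy, onto whatever medium is then current, be checked against the original.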

Page 18:

Non-File Objects

• e.g. CDs, DVDs, magnetic tapes, web sites

• Map each digital object into a byte-stream and then preserve

• Multiple files (e.g. websites) can go in a ZIP or tar archive
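As an illustration of the last bullet, a directory tree such as a saved website can be turned into one byte-stream with Python's standard tarfile module. This is only a sketch: the function names and the sample file layout are assumptions made for the example.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def directory_to_byte_stream(root: Path) -> bytes:
    """Pack every file and directory under root into one uncompressed tar stream."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path in sorted(root.rglob("*")):
            # Store names relative to root so the archive is location-independent.
            name = path.relative_to(root).as_posix()
            archive.add(path, arcname=name, recursive=False)
    return buffer.getvalue()


def list_members(stream: bytes) -> list[str]:
    """Read the member names back from a tar byte-stream."""
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as archive:
        return archive.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp)
        (site / "images").mkdir()
        (site / "index.html").write_text("<html></html>")
        (site / "images" / "logo.txt").write_text("placeholder")
        print(list_members(directory_to_byte_stream(site)))
```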

Page 19:

Abstraction

• Identify significant properties of the object

• represent them in a byte stream

Page 20:

Example -- magnetic tape

• Significant properties
  – blocks of data
  – tape marks
  – start and end of tape

• Representation
  – block -- raw bytes, preceded by 32-bit byte count
  – tape mark -- 4 bytes all ones
  – start & end -- ends of stream
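The tape representation above can be sketched as a pair of encode and decode functions. The slide does not state the byte order of the 32-bit count, so big-endian is assumed here; the names `encode_tape` and `decode_tape`, and the use of `None` to stand for a tape mark, are choices made for this example only.

```python
import struct
from typing import Iterator, List, Optional

TAPE_MARK = b"\xff\xff\xff\xff"  # 4 bytes, all ones
TapeItem = Optional[bytes]       # bytes = one data block, None = one tape mark


def encode_tape(items: List[TapeItem]) -> bytes:
    """Serialise blocks and tape marks into a single byte-stream."""
    out = bytearray()
    for item in items:
        if item is None:
            out += TAPE_MARK
        else:
            if len(item) >= 0xFFFFFFFF:
                raise ValueError("block length would collide with the tape mark")
            # ASSUMPTION: big-endian byte count; the slide does not specify.
            out += struct.pack(">I", len(item)) + item
    return bytes(out)


def decode_tape(stream: bytes) -> Iterator[TapeItem]:
    """Yield blocks and tape marks in order; the ends of the stream are the ends of the tape."""
    pos = 0
    while pos < len(stream):
        header = stream[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated header")
        pos += 4
        if header == TAPE_MARK:
            yield None
            continue
        (count,) = struct.unpack(">I", header)
        block = stream[pos:pos + count]
        if len(block) < count:
            raise ValueError("truncated block")
        pos += count
        yield block


if __name__ == "__main__":
    tape = [b"HDR1", None, b"payload", None, None]
    print(list(decode_tape(encode_tape(tape))) == tape)
```

Because an all-ones count is reserved for the tape mark, the sketch rejects a block of that exact length, which keeps the stream unambiguous.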

Page 21:

When to convert

• Conversion is inevitable

• a) as soon as the format becomes obsolete

• b) only when we want to read the data

• c) never - emulate the original system

Page 22:

Convert as soon as Obsolete

• Copying to new technology is no longer trivial

• Any errors are cast in stone

• Digital signatures are lost

• Only viable when the number of different formats is small

Page 23:

Convert when we want to read

• Preserve the original by simply copying onto current technology

• Record the format of each stored object

• Keep an index of all the formats held

• Maintain access to conversion software from the old to the current

• Treasure open-source conversion software
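The bullets above amount to a small data model: each stored object carries a format label, the archive can list the formats it holds, and conversion happens only on access. The sketch below is one possible shape for that, not a description of any real archive system; the class names, format labels, and the choice of Python's cp037 codec as the EBCDIC converter are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class StoredObject:
    identifier: str
    format_name: str  # recorded when the object is stored
    data: bytes       # the original byte-stream, never modified


Converter = Callable[[bytes], str]


class Archive:
    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}
        self._converters: Dict[str, Converter] = {}

    def store(self, obj: StoredObject) -> None:
        self._objects[obj.identifier] = obj

    def register_converter(self, format_name: str, converter: Converter) -> None:
        self._converters[format_name] = converter

    def formats_held(self) -> List[str]:
        """Index of all formats present in the archive."""
        return sorted({obj.format_name for obj in self._objects.values()})

    def formats_without_converter(self) -> List[str]:
        """Formats at risk: held in the archive but no longer readable."""
        return [f for f in self.formats_held() if f not in self._converters]

    def read(self, identifier: str) -> str:
        """Convert at access time; the stored bytes stay untouched."""
        obj = self._objects[identifier]
        return self._converters[obj.format_name](obj.data)


if __name__ == "__main__":
    archive = Archive()
    archive.store(StoredObject("memo-1", "ebcdic-cp037", bytes.fromhex("c8c5d3d3d6")))
    archive.store(StoredObject("memo-2", "wordstar", b"\x1d placeholder bytes"))
    archive.register_converter("ebcdic-cp037", lambda data: data.decode("cp037"))
    print(archive.read("memo-1"))
    print(archive.formats_without_converter())
```

Listing the formats that have no converter is the useful part in practice: it shows which holdings are drifting towards "kept but cannot decipher".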

Page 24:

Format Registries

• National Archives PRONOM

• Harvard Global Digital Format Registry

• OAIS (ISO 14721:2003) Representation Information

Page 25:

Emulation of Yesteryear

• Today’s desktop machine far exceeds the mainframe of the 1970s or even 80s

• George3
  – Emulate the George3 executive
  – i.e. order code + system calls + peripherals

• BBC micro
  – Publicly available emulation on WWW

Page 26:

Abstraction for Emulation of 1900 system

• George3 sits on 1900 instruction set plus executive calls

• Executive sits on 1900 instruction set plus Fancy I/O stuff

• George3 provides lots of embellishment of 1900 instruction set

• Emulate executive + 1900 instruction set
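The layering above can be illustrated with a toy machine. The sketch below does not model the 1900 order code or the real executive: the three-instruction set, the call numbers, and all names are invented for the example. It shows only the idea that the emulator supplies the instruction set plus the executive calls in host code, while the guest program runs unchanged on top.

```python
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[str, int]  # (opcode, operand) in a made-up instruction set


class ToyMachine:
    def __init__(self, executive: Dict[int, Callable[["ToyMachine"], None]]) -> None:
        self.accumulator = 0
        self.program_counter = 0
        self.halted = False
        self.output: List[int] = []  # stands in for a peripheral
        self.executive = executive   # executive calls implemented by the host

    def step(self, program: List[Instruction]) -> None:
        """Fetch, decode and execute one guest instruction."""
        opcode, operand = program[self.program_counter]
        self.program_counter += 1
        if opcode == "LOAD":
            self.accumulator = operand
        elif opcode == "ADD":
            self.accumulator += operand
        elif opcode == "EXEC":
            # The guest asks the executive for a service; the emulator provides it.
            self.executive[operand](self)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

    def run(self, program: List[Instruction]) -> None:
        while not self.halted and self.program_counter < len(program):
            self.step(program)


def exec_halt(machine: ToyMachine) -> None:
    machine.halted = True


def exec_print(machine: ToyMachine) -> None:
    machine.output.append(machine.accumulator)


EXECUTIVE = {0: exec_halt, 1: exec_print}
GUEST_PROGRAM = [("LOAD", 40), ("ADD", 2), ("EXEC", 1), ("EXEC", 0), ("LOAD", 99)]

if __name__ == "__main__":
    machine = ToyMachine(EXECUTIVE)
    machine.run(GUEST_PROGRAM)
    print(machine.output)
```

The guest program never sees the host: replacing the host hardware means rewriting only `ToyMachine` and the executive functions, which is the property that makes emulation attractive for preservation.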

Page 27:

George3 demo

Page 28:

Malawi Census Data

• Data stored on ICL magnetic tapes

• Rescued by using emulated ICL 1900

Page 29:

Standards

• Open Archival Information System
  – OAIS ISO 14721:2003
  – Originated by Space Data Community

• Proprietary “standards”
  – Big enough to be reverse engineered, e.g. MS Word
  – XYZ Software Ltd

• Open standards, e.g. RFCs

Page 30:

Really Long-Term

• Look back 20 years to see how things have changed

• Today’s Vista is not the final scene

• Ensure that systems can accommodate new formats

• Even the standards are likely to change

Page 31:

Domesday 1986

• 900th anniversary of William the Conqueror’s version

• BBC collects data (inc pictures)

• Data written on 12" LaserVision discs

• Discs last 100 years, but not the drives

• Access is via BBC Master computer

• That won’t last 100 years either

• Can we preserve it until the 1000th anniversary?

Page 32:

Stewardship

• Copies of the discs are lodged with:

• BBC

• British Library

• National Archives (ex PRO)

• Abstract data held by:

• DH / Leeds University

• Longlife Data Ltd

Page 33:

Stewardship

• Current archival activity stresses retention of media

• Retention of digital media is useless

• Need digital safe deposits
