Digital Immortality: Keeping Digital Data for Ever (Dr David Holdsworth)
Digital Immortality
OR
Keeping Digital Data for Ever
Dr David Holdsworth
http://www.leeds.ac.uk/cedars/
Digital Immortality
Obsolete(?) Data
• 1 Things that must be kept by law
• 2 Things that must be destroyed by law
• 3 Things that we choose to keep
• 4 Things that we are certain can be thrown away
Digital Immortality
Obsolete(?) Data
• 5 Things that we would like to keep if we have room
• 6 Things that we would like to throw away, but are not sure about
• 7 Things that we think we have kept but cannot find
• 8 Things that we have kept but now cannot decipher
• 9 Things that we have not kept but now wish that we had
Digital Immortality
What to Keep
• All of 1 and 3
– 1 Things that must be kept by law
– 3 Things that we choose to keep
• As much of 5 and 6 as is cost-effective
– 5 Things that we would like to keep if we have room
– 6 Things that we would like to throw away, but are not sure about
• Data discarded from 5 and 6 has the potential to be in 9 in the future
– 9 Things that we have not kept but now wish that we had
• Minimise cost per item
Digital Immortality
Some Pitfalls
• Errors are usually not correctable
• Failure to index adequately puts data into category 7
– 7 Things that we think we have kept but cannot find
• Failure to know the format puts data into category 8
– 8 Things that we have kept but now cannot decipher
Digital Immortality
Personal Involvement
CEDARS
• CURL Exemplars in Digital ARchiveS
• Collaborative project for libraries
• Funded by HEFCE/JISC
• Oxford, Cambridge and Leeds
Digital Immortality
Personal Involvement - contd.
CAMiLEON
• Creative Archiving at Michigan and Leeds: Emulating the Old on the New
• Collaborative project on emulation
• Funded by NSF/JISC
Digital Immortality
Challenges to digital preservation
• Deteriorating media
– Magnetic dropout
– Obsolete equipment
• Obsolete data formats
– EBCDIC
– UNICODE has established itself
– Machine code software is an extreme example
Digital Immortality
Philips LaserVision
Digital Immortality
Challenges to digital preservation contd
• Needles in haystacks
– ISBN
– Meta-data
• Deteriorating Institutions
– Where are the digital legal deposits?
– .. Or even Digital Equipment Corporation
• Proprietary systems become obsolete
– leaving data inaccessible
Digital Immortality
Compatibility - Friend or Foe
• e.g. z/OS evolves from OS/360
• Windows Vista evolves from 16-bit Windows 3.1
• Modern machines run old software … but faster
• Who keeps old versions?
– Computer Museum in California
– Microsoft -- ?
Digital Immortality
Times Change
• People don’t always want to process their old data using the tools of yesteryear
Digital Immortality
THIS IS GEORGE 3 MARK 8.67 ON 31DEC99 10.19.03_
TIMED OUT 10.19.33
THE SYSTEM HAS TEMPORARILY CLOSED DOWN
Digital Immortality
Times Change
• People don’t always want to process their old data using the tools of yesteryear
• Need to bridge the gap between data’s origins and the time of access
Digital Immortality
Use the Past to Illuminate the Future
• In 1987 EBCDIC was king
• In 2007 UNICODE is heir apparent
• In 2027 …….
• In 2038 UNIX time_t overflows 31 bits
• What has survived the decades?
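The 2038 date follows directly from the arithmetic: a signed 32-bit time_t has 31 value bits and counts seconds from 1 January 1970 UTC. A minimal sketch of that calculation, assuming the usual UNIX epoch (the names MAX_TIME_T, EPOCH and last_moment are illustrative only):

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit time_t can hold: 31 value bits, all ones.
MAX_TIME_T = 2**31 - 1

# UNIX time counts seconds from the epoch, 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The last second that fits before the counter overflows.
last_moment = EPOCH + timedelta(seconds=MAX_TIME_T)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
```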
Digital Immortality
Survival of the Abstract
• Character sets
• Bytes
• Unstructured Files (stream of bytes)
• Hierarchical file tree
• Associative mappings
• Programming languages
Digital Immortality
All is not lost
• We can keep a byte-stream for ever
– The abstract data separated from the medium is technology-neutral
• i.e. files can be kept for ever
• Copies are perfect
• File formats do not last for ever
• ….. Remember WORDSTAR
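"Copies are perfect" is something an archive can check rather than assume. One common way, sketched here as an illustration and not as the method used by the projects in this talk, is to compare a cryptographic digest of the original with a digest of the copy. The function names sha256_of and copies_match are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the byte-stream stored at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that very large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """True when both files hold the same byte-stream (same digest)."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

Storing the digest alongside the object lets later copies, on whatever medium is then current, be checked against the same value.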
Digital Immortality
Non-File Objects
• e.g. CDs, DVDs, magnetic tapes, web sites
• Map each digital object into a byte-stream and then preserve
• Multiple files (e.g. websites) can go in a ZIP or tar archive
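As an illustration of turning a multi-file object into one byte-stream, the sketch below packs a directory tree (for example a saved website) into a ZIP archive using Python's standard zipfile module. The function name pack_directory and its arguments are assumptions made for this example; the archive should be written outside the directory being packed.

```python
import zipfile
from pathlib import Path


def pack_directory(source_dir, archive_path):
    """Write every file under source_dir into one ZIP file at archive_path."""
    source = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Sort so that the order of entries is reproducible between runs.
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the root, with forward slashes.
                archive.write(path, arcname=path.relative_to(source).as_posix())
    return archive_path
```

The resulting ZIP file is itself a single byte-stream and can be preserved like any other file.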
Digital Immortality
Abstraction
• Identify significant properties of the object
• represent them in a byte stream
Digital Immortality
Example -- magnetic tape
• Significant properties
– blocks of data
– tape marks
– start and end of tape
• Representation
– block -- raw bytes, preceded by 32-bit byte count
– tape mark -- 4 bytes all ones
– start & end -- ends of stream
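The tape representation above can be sketched in a few lines. The slide does not state the byte order of the 32-bit count, so big-endian is assumed here; using None to stand for a tape mark, and the names encode_tape and decode_tape, are also choices made only for this illustration.

```python
import struct

# A tape mark is represented as 4 bytes, all ones.
TAPE_MARK = b"\xff\xff\xff\xff"


def encode_tape(items):
    """Encode a tape as a byte-stream; each item is bytes (block) or None (tape mark)."""
    out = bytearray()
    for item in items:
        if item is None:
            out += TAPE_MARK
        else:
            # Assumption: the 32-bit byte count is stored big-endian.
            out += struct.pack(">I", len(item)) + item
    return bytes(out)


def decode_tape(stream):
    """Recover the list of blocks and tape marks from the byte-stream."""
    items = []
    pos = 0
    while pos < len(stream):  # end of stream stands for end of tape
        header = stream[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated header")
        pos += 4
        if header == TAPE_MARK:
            items.append(None)
            continue
        (count,) = struct.unpack(">I", header)
        block = stream[pos:pos + count]
        if len(block) != count:
            raise ValueError("truncated block")
        items.append(block)
        pos += count
    return items
```

With this encoding a block of exactly 2^32 - 1 bytes would be indistinguishable from a tape mark; real tape blocks are far smaller, so the sketch ignores that case.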
Digital Immortality
When to convert
• Conversion is inevitable
• a) as soon as the format becomes obsolete
• b) only when we want to read the data
• c) never - emulate the original system
Digital Immortality
Convert as soon as Obsolete
• Copying to new technology is no longer trivial
• Any errors are cast in stone
• Digital signatures are lost
• Only viable when the number of different formats is small
Digital Immortality
Convert when we want to read
• Preserve the original by simply copying onto current technology
• Record the format of each stored object
• Keep an index of all the formats held
• Maintain access to conversion software from the old to the current
• Treasure open-source conversion software
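The bookkeeping in this list (record each object's format, index the formats held, keep converters reachable) can be pictured as a small in-memory index. This is a hypothetical sketch, not a description of CEDARS or of any real registry; the class FormatIndex, its methods and the format identifiers are invented for illustration. The EBCDIC converter uses Python's built-in cp037 codec.

```python
from dataclasses import dataclass, field


@dataclass
class FormatIndex:
    formats: dict = field(default_factory=dict)     # format id -> description
    converters: dict = field(default_factory=dict)  # format id -> callable(bytes)
    objects: dict = field(default_factory=dict)     # object id -> format id

    def register_format(self, format_id, description, converter=None):
        self.formats[format_id] = description
        if converter is not None:
            self.converters[format_id] = converter

    def store(self, object_id, format_id):
        # Refuse objects whose format is unknown: they would become undecipherable.
        if format_id not in self.formats:
            raise KeyError(f"unknown format: {format_id}")
        self.objects[object_id] = format_id

    def read(self, object_id, raw_bytes):
        # Convert only at the moment of access; the stored bytes stay untouched.
        return self.converters[self.objects[object_id]](raw_bytes)

    def formats_without_converter(self):
        # Formats that are held but can no longer be read with current tools.
        return sorted(set(self.objects.values()) - set(self.converters))


index = FormatIndex()
index.register_format("ebcdic-text", "EBCDIC text", lambda raw: raw.decode("cp037"))
index.register_format("wordstar", "WordStar document")  # no converter yet
index.store("doc1", "ebcdic-text")
index.store("doc2", "wordstar")
print(index.read("doc1", b"\xc8\xc9"))    # HI
print(index.formats_without_converter())  # ['wordstar']
```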
Digital Immortality
Format Registries
• National Archives PRONOM
• Harvard Global Digital Format Registry
• OAIS (ISO 14721:2003) Representation Information
Digital Immortality
Emulation of Yesteryear
• Today’s desktop machine far exceeds the mainframe of the 1970s or even 80s
• George3
– Emulate the George3 executive
– i.e. order code + system calls + peripherals
• BBC micro
– Publicly available emulation on WWW
Digital Immortality
Abstraction for Emulation of 1900 system
• George3 sits on 1900 instruction set plus executive calls
• Executive sits on 1900 instruction set plus Fancy I/O stuff
• George3 provides lots of embellishment of 1900 instruction set
• Emulate executive + 1900 instruction set
George3 demo
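The layering on this slide (an instruction set plus calls into an executive) can be shown with a toy interpreter. The instruction set below is invented for illustration and has nothing to do with the real ICL 1900 order code; run, executive and the opcode names are all hypothetical.

```python
def run(program, syscall):
    """Interpret a toy program; syscall(number, acc) plays the executive's role."""
    acc = 0  # single accumulator register
    pc = 0   # program counter
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "CALL":
            # Hand control to the emulated executive (system call layer).
            acc = syscall(arg, acc)
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown instruction: {op}")


output = []


def executive(number, acc):
    """Toy executive: call 1 'prints' the accumulator to an emulated peripheral."""
    if number == 1:
        output.append(acc)
    return acc


result = run([("LOAD", 2), ("ADD", 3), ("CALL", 1), ("HALT", 0)], executive)
print(result, output)  # 5 [5]
```

Software written for the old machine sees only the instruction set and the executive calls, so emulating those two layers is enough to run it unchanged.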
Digital Immortality
Malawi Census Data
• Data stored on ICL magnetic tapes
• Rescued by using emulated ICL 1900
Digital Immortality
Standards
• Open Archival Information System
– OAIS ISO 14721:2003
– Originated by Space Data Community
• Proprietary “standards”
– Big enough to be reverse engineered e.g. MS Word
– XYZ Software Ltd
• Open standards, e.g. RFCs
Digital Immortality
Really Long-Term
• Look back 20 years to see how things have changed
• Today’s Vista is not the final scene
• Ensure that systems can accommodate new formats
• Even the standards are likely to change
Digital Immortality
Domesday 1986
• 900th anniversary of William the Conqueror’s version
• BBC collects data (inc pictures)
• Data written on 12" LaserVision discs
• Discs last 100 years, but not the drives
• Access is via BBC Master computer
• That won’t last 100 years either
• Can we preserve it until the 1000th anniversary?
Digital Immortality
Stewardship
• Copies of the discs are lodged with:
– BBC
– British Library
– National Archives (ex PRO)
• Abstract data held by:
– DH / Leeds University
– Longlife Data Ltd
Digital Immortality
Stewardship
• Current archival activity stresses retention of media
• Retention of digital media is useless
• Need digital safe deposits
Digital Immortality
OR
Keeping Digital Data for Ever
Dr David Holdsworth
http://www.leeds.ac.uk/cedars/