Transcript of Digital Preservation Andrea Goethals Wendy Gogel From Harvard University Library NELA 18 October...
- Slide 1
- Digital Preservation Andrea Goethals Wendy Gogel From Harvard
University Library NELA 18 October 2010
- Slide 2
- Digital Preservation 1. Why digital preservation? 2. What's the
problem? 3. What's being done? 4. What can you do? 5.
Questions?
- Slide 3
- 1. Why digital preservation?
- Slide 4
- Slide 5
- Everything is digital 1957: first digital image; 1969: ARPAnet;
1971: first email sent; 1972: first video game; 1998: first digital
theatrical release
- Slide 6
- Digital content may be historically significant. after 12:00 noon
January 20, 2001, the National Archives and Records Administration
("NARA") shall have sole legal custody of all Clinton-Gore
Administration electronic mail records that are governed by the
Presidential Records Act ("PRA"), 44 U.S.C. 2201, Memorandum of
Understanding between NARA and The Executive Office of the President,
dated January 11, 2001, accessed Oct. 2010 at:
http://www.archives.gov/presidential-libraries/laws/access/email-records-memo.html
- Slide 7
- Digital content may be your favorite song, or your favorite
movie.
- Slide 8
- Digital content may be the only version. Harvard Magazine
May/June 2009
- Slide 9
- Digital content may be a work of art. Doug Aitken. (American,
born 1968). sleepwalkers. 2007. Six-channel video (color, sound),
seven monitors, 12:57 min. The Dunn Bequest. 2008 Doug Aitken.
Photo: Fred Charles.
- Slide 10
- Digital content may be important to scholarship.
- Slide 11
- Who cares? Cultural Resource Institutions Museums, historical
societies MOMA's Matters in Media Arts Libraries, archives, special
collections Academic institutions Governments National Library of
New Zealand's NDHA NARA's ERA The Entertainment Industry AFI Digital
Preservation Project
- Slide 12
- Who cares? You and me, personally!
- Slide 13
- 2. What's the problem?
- Slide 14
- Digital content is Transient Fragile Hidden 2400 B.C.E. 1450
C.E.
- Slide 15
- Digital content is transient The average lifespan of a web site
is between 44 and 100 days Captured April 8, 2009 Visited October
13, 2010
- Slide 16
- Digital content is fragile Digital things are amazingly easy to
destroy Bad people Software or hardware failure Human mistakes The
slip of a finger or an unnoticed consequence of change can happen
easily - and are potentially catastrophic Help! Accidental
deletion. I accidentally deleted 62 images can you please recover
them from backups?
- Slide 17
- Digital content is hidden Loss is not always apparent Are
either of these corrupt?
- Slide 18
- Digital content is hidden Loss is not always apparent Both are
corrupt! Use helps but it's not enough
- Slide 19
- Even if it's safe, is it usable? It's not enough to preserve the
bits if the format of the bits is obsolete! WordStar? AppleWorks?
Excel 1.0? To use digital content we are dependent on software that
can understand the format
- Slide 20
- The importance of format Understanding formats is fundamental
to preservation ffd8ffe000104a46494600010201
008300830000ffed0fb050686f74 6f73686f7020332e30003842494d
03e90a5072696e7420496e666f00 0000007800000000004800480000
000002f40240ffeeffee03060252 0347052803fc0002000000480048
0000000002d80228000100000064 000000010003030300000001270f
0001000100000000000000000000 0000600800190190000000000000
0000000000000000000000000000 0000000000000000000000003842
494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200...
- Slide 21
- The importance of format Understanding formats is fundamental
to preservation ffd8ffe000104a46494600010201
008300830000ffed0fb050686f74 6f73686f7020332e30003842494d
03e90a5072696e7420496e666f00 0000007800000000004800480000
000002f40240ffeeffee03060252 0347052803fc0002000000480048
0000000002d80228000100000064 000000010003030300000001270f
0001000100000000000000000000 0000600800190190000000000000
0000000000000000000000000000 0000000000000000000000003842
494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200... SOI APP0
JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0
ECS1 RST1 ECS2...
- Slide 22
- The importance of format Understanding formats is fundamental
to preservation ffd8ffe000104a46494600010201
008300830000ffed0fb050686f74 6f73686f7020332e30003842494d
03e90a5072696e7420496e666f00 0000007800000000004800480000
000002f40240ffeeffee03060252 0347052803fc0002000000480048
0000000002d80228000100000064 000000010003030300000001270f
0001000100000000000000000000 0000600800190190000000000000
0000000000000000000000000000 0000000000000000000000003842
494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200... SOI APP0
JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0
ECS1 RST1 ECS2...
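The decoding shown on these slides can be sketched in a few lines. This is a minimal illustration, not the tool used in the presentation: it reads only the first 14 bytes of the hex dump above and recognises the SOI marker, the APP0 segment and the JFIF version. The function name `describe_jpeg_header` is a hypothetical helper introduced here for illustration.

```python
import struct

# First 14 bytes of the hex dump shown on the slide.
HEADER_HEX = "ffd8ffe000104a46494600010201"


def describe_jpeg_header(data: bytes) -> dict:
    """Identify a JFIF stream from its leading bytes (hypothetical helper)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG stream")
    if data[2:4] != b"\xff\xe0":
        raise ValueError("first segment is not APP0")
    # The APP0 segment length is a big-endian 16-bit integer.
    (length,) = struct.unpack(">H", data[4:6])
    if data[6:11] != b"JFIF\x00":
        raise ValueError("APP0 segment does not carry a JFIF identifier")
    major, minor = data[11], data[12]
    return {
        "format": "JFIF",
        "version": f"{major}.{minor}",
        "app0_length": length,
    }


info = describe_jpeg_header(bytes.fromhex(HEADER_HEX))
print(info)
```

Without knowledge of the format, the same bytes are only an opaque string of hexadecimal digits, which is the point the slides make.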
- Slide 23
- Using information content information content bits formats SW
HW HW (paper) information content HW (paper) symbols language
Analog book Unmediated use Digital book Technology-mediated
use
- Slide 24
- Formats are key to determining usability information content
bits formats SW HW supporting technologies digital content Formats
are the bridge between the content we want to preserve and
supporting technologies
- Slide 25
- Dependence on fleeting technology We are dependent on
technology to interpret digital content... Technologies must
understand the format of the content Technologies age and
disappear!
- Slide 26
- 3. What's being done?
- Slide 27
- Primary goals of digital preservation 1. Keep the bits safe 2.
Keep the bits useful to people
- Slide 28
- 1. Keep the bits safe Infrastructure, processes, policies and
professional staff to counter risks High quality storage Redundancy
(multiple copies, multiple locations) Media refreshing (replacing)
Integrity monitoring (check for corruption) Security and access
management Content recovery
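Integrity monitoring is commonly done by recording a checksum for each file and recomputing it later. The sketch below assumes SHA-256 and a single sample file created in a temporary directory; the slides do not say which algorithm or tooling Harvard uses, and the names `sha256_of` and `check_fixity` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: Path, recorded: str) -> bool:
    """True when the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"preserve me")
    recorded = sha256_of(sample)          # stored at ingest time
    intact_before = check_fixity(sample, recorded)
    # Simulate silent corruption by changing a single byte.
    sample.write_bytes(b"preserve mf")
    intact_after = check_fixity(sample, recorded)

print(intact_before, intact_after)
```

A failed check only detects the damage; recovery still depends on having a redundant copy to restore from.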
- Slide 29
- 2. Keep the bits useful Provide ways for people to find it
Provide ways to manage it Keep records of history and significant
events Know what formats you have Make sure there's technology to
support the formats! Technology watch And if there's not, force
there to be technology that supports the formats (migration,
emulation, creation of viewing software)
- Slide 30
- Degrees of preservation Passive preservation (aka bit-level
preservation): better understood & less costly; will not ensure
long-term usability, ensures current and near-term usability. Active
preservation (aka full preservation, aka logical preservation): more
complex, challenging & costly; requires more expertise but better
ensures very long-term usability; requires passive preservation.
- Slide 31
- Degrees of preservation passive preservation aka bit-level
preservation active preservation aka full preservation aka logical
preservation Store Secure Maintain Prevent Migrate Re-engineer
software Emulate Digital archaeology Monitor Restore Add value
- Slide 32
- Strategic thinking The least expensive, and most effective
preservation measure is to think about the future when digital
content is created! The content production matters! It makes good
sense to try to influence the content creation process
- Slide 33
- Preservation lifecycle Create or acquire digital content Ingest
into a preservation repository Continuous cycle of: Monitoring
Planning Intervention Subject to collection management decisions
Transfer to next generation of the repository or to a different
repository A series of hand-offs over time
- Slide 34
- Ongoing commitment Requires continual pro-active program You
can't just stop and start Time frames are MUCH shorter than for
preservation of physical collections Requires ongoing investment in
both technology and staffing
- Slide 35
- Can't do it alone More than any other library activity,
preservation responsibility must be shared across institutions Even
collectively we do not have adequate resources or
understanding
- Slide 36
- Preservation community efforts Collaborative organizations
(NDSA, IIPC, OPF) Collaborative projects (AIHT, TIPR) Standards and
metadata Technical metadata for still images, audio, documents METS
(package for metadata and digital objects) PREMIS (preservation
metadata) Preservable formats (PDF/A) Repository certification
Infrastructure Formats registry (UDFR, PRONOM) Repository software
(Fedora, DAITSS, LOCKSS, etc.) Tools (JHOVE, FITS, etc.)
- Slide 37
- 4. What can you do?
- Slide 38
- First steps Inventory your content Identify where it is all
kept web locations computer hard drive Removable media (CDs, etc.)
Select Decide what is worth keeping Given a choice keep the highest
quality version Is someone else already preserving it? Consider
deleting content that's not needed
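The inventory step above can be started with a short script. This is a rough sketch under stated assumptions: it walks one folder tree, records each file's relative path, size and extension, and counts files per extension as a first hint of which formats are present. The function names and the sample files are made up for illustration.

```python
import tempfile
from collections import Counter
from pathlib import Path


def build_inventory(root: Path) -> list[dict]:
    """List every file under root with its size and extension."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                "extension": path.suffix.lower() or "(none)",
            })
    return rows


def count_by_extension(rows: list[dict]) -> Counter:
    """Count files per extension."""
    return Counter(row["extension"] for row in rows)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical sample content standing in for a personal collection.
    (root / "photos").mkdir()
    (root / "photos" / "a.JPG").write_bytes(b"\xff\xd8\xff")
    (root / "photos" / "b.jpg").write_bytes(b"\xff\xd8\xff")
    (root / "notes.txt").write_text("hello")
    inventory = build_inventory(root)
    counts = count_by_extension(inventory)

for row in inventory:
    print(row)
print(counts)
```

An extension is only a hint about the format; tools such as JHOVE or FITS, mentioned earlier in the talk, inspect the content itself.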
- Slide 39
- Second steps Organize your digital content Create a logical
directory/folder structure for the content Give descriptive names
to the files If possible tag or embed with descriptions Catalog
your content Draft a summary description Keep your inventory and a
summary description of the content and how you have it organized in
a secure location
- Slide 40
- Third steps Make multiple copies of your content Use formats
that are amenable to long-term survival Use open formats when
possible Store on durable media Store in multiple locations
Preferably in different disaster zones. Use it! Periodically check
that you can access the content Migrate to new media over
time.
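Making multiple copies is only useful if each copy is checked. The sketch below copies one file into several folders and compares every copy byte for byte with the original. The folders stand in for separate storage locations and are an assumption of this example, as is the hypothetical helper name `replicate`.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def replicate(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify each copy."""
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        # shallow=False forces a comparison of the file contents.
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"copy at {target} differs from the original")
        copies.append(target)
    return copies


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    original = base / "letter.txt"
    original.write_text("Dear future reader")
    # Stand-ins for, say, an external drive and an offsite location.
    copies = replicate(original, [base / "drive_a", base / "offsite_b"])
    all_match = all(
        filecmp.cmp(original, copy, shallow=False) for copy in copies
    )

print(len(copies), all_match)
```

Running the comparison again at intervals is one simple way to act on the advice to check periodically that the content is still accessible.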
- Slide 41
- Fourth steps Keep informed. LC's website
http://www.digitalpreservation.gov/you/
Research, training and outreach (DCC, DPC, JISC, IIPC, NEDCC)
http://www.nedcc.org/curriculum/lesson.introduction.php
Professional organizations (ALA, SAA) Conference proceedings
(iPRES, IS&T Archiving, DLF) How to preserve your own digital
materials (LC): http://www.digitalpreservation.gov/you/
10 basic characteristics of
digital preservation repositories (CRL website):
http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re
- Slide 42
- Image Credits First digital image
http://www.worldalmanac.com/blog/2007/05/the_first_digital_image.html
Pong:
http://www.simondelliott.com/blog/2009/01/pong-is-more-than-just-a-game-its-a-way-of-life
1998: First theatrically released:
http://en.wikipedia.org/wiki/The_Last_Broadcast iPod ad:
http://www.ipodhistory.com/ipod-advertising Avatar:
http://www.imdb.com/title/tt0499549 Cuneiform 2400 BC:
http://en.wikipedia.org/wiki/Cuneiform_script 1450 Book of Hours in
French and Latin:
http://www.griffons.com/index.cfm?frm=details&piid=2811&cid=1&scid1=2&CFID=2459509&CFTOKEN=33670424
Server: http://regmedia.co.uk/2007/11/06/hp_mediasmart_server.jpg
Sleepwalkers at MOMA:
http://www.moma.org/explore/collection/conservation/media_art PRS
data sets:
http://www.prsgroup.com.ezp-prod1.hul.harvard.edu/prsgroup_shoppingcart/cdSub4.aspx
Corrupt images:
http://old.hki.uni-koeln.de/people/herrmann/forschung/heydegger_archiving2008.ppt
New Yorker Cover, June 8 and 15, 2009 and October 18, 2010
- Slide 43
- Slide 44
- 5. Questions?