Besser--VALA 2/8/02 1
Moving from Isolated Digital Collections to Interoperable Digital Libraries
VALA 2002 Conference
Howard Besser
UCLA School of Education & Information
http://www.gseis.ucla.edu/~howard

Moving from Isolated Digital Collections to Interoperable Digital Libraries
- Digital Collections vs Digital Libraries -- What’s missing?
- Importance of Standards and New Metadata Models
- Best Practices & Standards for Managing Digital Projects
- Longevity
- Other issues remaining in order to create real Digital Libraries

Components
- Service to a clientele
- Stewardship over a collection
- Sustainability
- Ability to find material outside that collection

Ethics & Traditions
- Free speech
- Privacy
- Equal access to info
- Diversity of info
- Serving the underserved

Brief Digital Library Funding History
Stage  Date        Sponsor        What
I      1994        NSF/ARPA/NASA  Experiments
IIa    1998/99     NSF/ARPA/NASA  Begin to consider custodialship, sustainability, user communities
IIb    Late 1990s  CLIR           Further work on IIa issues
III    ?           ?              Real digital libraries

Moving from Digital Collections to Digital Libraries
- What’s the difference?
  – not experiments
  – real users
  – service
  – longevity

Traditional Digital Collection Model
[Diagram: four digital collections (DL), each with its own search & presentation layer, and two users]

Developmental Stages
- Experiment with methods
- Build real operational systems
- Build interoperable operational systems
- Make the system useful for users
– For DL Initiatives
– For OPACs
– For I & A Services
– For Image Retrieval

To move from Collections to Libraries, we need
- Standards & Metadata
- Sustainability
- Other issues involving components and ethics/traditions

For Interoperability, Digital Collections Need Standards
- Descriptive Metadata for consistent description
- Discovery Metadata for finding
- Administrative Metadata for viewing and maintaining
- Structural Metadata for navigation
- ... Terms & Conditions Metadata for controlling access...

Metadata is not just indexing terms
- CBIR attributes used for retrieval on color, shape, texture, etc.
- Structural attributes used for page-turning
- Administrative attributes used for managing a digital work over time
- IPR attributes to limit unauthorized use
- Identification attributes to determine what application software is needed to view a particular digital work
- Can be located anywhere

Why are Standards and Metadata consensus important?
- Managing digital files over time
- Longevity
- Interoperability
- Veracity
- Recording in a consistent manner
- Will give vendors incentive to create applications that support this

Moving to New Metadata Models
- Containers & Packages
- Qualifiers
- Crosswalks

Containers and Packages of Metadata
Warwick, not MARC
- modular
- overlapping
- extensible
- community-based
- designed for a networked world to aid commonality btwn communities while still providing full functionality within each community

DC Qualifiers
- allow one community to express important nuances and qualifications, while still making the basic information available to communities with simple needs
- our community can reflect alternate title, transliterated title, and main title, yet they will all be found under a simple Web search under “title”
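
The qualifier idea above can be sketched in a few lines of Python. This is a minimal illustration, not the Dublin Core specification: the record layout, the qualifier names, the sample values, and the helper functions `dumb_down` and `search` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical qualified statements: (element, qualifier, value).
# The qualifier names and values are illustrative only.
QUALIFIED_RECORD = [
    ("title", "main", "Example Main Title"),
    ("title", "alternative", "Example Alternate Title"),
    ("title", "transliterated", "Primer Transliterated Title"),
    ("creator", None, "Example Creator"),
]


def dumb_down(statements):
    """Drop the qualifiers so every value is filed under its base element."""
    simple = defaultdict(list)
    for element, _qualifier, value in statements:
        simple[element].append(value)
    return dict(simple)


def search(statements, element, term):
    """Simple search: match a term against all values of a base element."""
    values = dumb_down(statements).get(element, [])
    return [v for v in values if term.lower() in v.lower()]


if __name__ == "__main__":
    # All three kinds of title are reachable through a plain "title" search.
    print(dumb_down(QUALIFIED_RECORD)["title"])
    print(search(QUALIFIED_RECORD, "title", "alternate"))
```

A community with richer needs keeps the qualifiers; a simple searcher only ever sees the base element.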

Crosswalks
- mapping btwn differing metadata structures
- eliminate the need for monolithic, universally adopted standards
- focus on flexibility and interoperability
- RDF-based metadata registries

Crosswalk Example
Columns: CDWA | Object ID | CIMI Schema | FDA | VRA Core Categories | USMARC | Dublin Core

CDWA: OBJECT/WORK (core)
  FDA: DocumentClassification-CatalogLevel (core); DocumentClassification-Group Type

CDWA: Object/Work-Type (core)
  Object ID: Type of Object
  CIMI Schema: objectNAME
  FDA: DocumentClassification-DocumentType (core); Purpose-Purpose(Broad) (core); Purpose-Purpose(Narrow)
  VRA Core Categories: W1. Work Type
  USMARC: 655 Genre-Form
  Dublin Core: Type

CDWA: Object/Work-Components quantity
  FDA: DocumentClassification-Extent
  USMARC: 300a Physical Description-Extent

CDWA: ORIENTATION/ARRANGEMENT Description

CDWA: TITLES OR NAMES (core)
  Object ID: Title
  CIMI Schema: objectTitle; bibliographicTitle
  FDA: Group/ItemIdentification-RepositoryTitle; Group/ItemIdentification-DescriptiveTitle (core); Group/ItemIdentification-InscribedTitle
  VRA Core Categories: W2. Title
  USMARC: 24Xa Title and Title-Related Information
  Dublin Core: Title
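
A crosswalk of this kind is, at its simplest, a lookup table per source scheme. The sketch below uses only the Type and Title rows from the example above; the dictionary layout, the scheme keys, and the function `to_dublin_core` are assumptions for illustration, and a production crosswalk would also have to handle repeatable fields, subfields, and lossy mappings.

```python
# Field-to-Dublin-Core mappings taken from the Type and Title rows above.
# The scheme keys ("vra_core", "usmarc", "cdwa") are hypothetical labels.
CROSSWALK = {
    "vra_core": {"W1. Work Type": "Type", "W2. Title": "Title"},
    "usmarc": {"655": "Type", "24Xa": "Title"},
    "cdwa": {"Object/Work-Type": "Type", "Titles or Names": "Title"},
}


def to_dublin_core(record, scheme):
    """Map a flat source record onto Dublin Core element names.

    Returns (dc_record, unmapped) so that fields with no crosswalk
    entry are reported rather than silently dropped.
    """
    mapping = CROSSWALK[scheme]
    dc_record = {}
    unmapped = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value
        else:
            dc_record.setdefault(target, []).append(value)
    return dc_record, unmapped


if __name__ == "__main__":
    vra = {"W1. Work Type": "painting", "W2. Title": "Example Title"}
    marc = {"655": "painting", "24Xa": "Example Title"}
    # Two differently structured records meet in the same simple form.
    print(to_dublin_core(vra, "vra_core"))
    print(to_dublin_core(marc, "usmarc"))
```

Returning the unmapped fields separately makes the information lost in translation visible instead of hiding it.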

Best Practices & Standards for Managing Digital Projects
- Who will your users be?
- Best Practices Guidelines (CDL, MOA2)
- NISO/DLF Imaging Technical Standards
- Managing Multiple Image Files

Why are you Managing this Information?
- Organizational mission & type
- Users
- Uses

Scanning Best Practices
- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the needs of the likely potential users/uses/material
- Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image look good on your particular monitor or use image processing to color correct)
- Don’t use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)

Metadata Standards (from MOA2)
- Administrative Metadata
  – for enhancing resource management
- Structural Metadata
  – for reflecting internal hierarchies and relationships btwn parts
- Raw/Seared/Cooked

The number of variant forms of a work can be enormous
- different views of the same object
- different scans of the same photo
- different resolutions
- different compression schemes
- different compression ratios
- different file storage formats
- different details of the same image
- ...

Identification/Provenance
- how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF)
- Vocabulary Standards to express this
  – VRA Surrogate Categories
  – CIMI's “Image Elements”

Digital Longevity
Serious Longevity Problems
- What we know from prior widespread digital file formats
- Images separating from their metadata
- Inaccessibility of software needed to view an image
- Inability to even decode the file format of an image

Digital Longevity
The Short Life of Digital Info: Digital Longevity Problems
- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem

Digital Longevity:
The Viewing Problem
- Digital Info requires a whole infrastructure to view it
- Each piece of that infrastructure is changing at an incredibly rapid rate
- How can we ever hope to deal with all the permutations and combinations?

Digital Longevity:
The Scrambling Problem
Dangers from:
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce

Digital Longevity:
The Inter-relation Problem
- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?

Digital Longevity:
The Custodial Problem
- How do we decide what to save?
- Who should save it?
- How should they save it?
  – methods for later access: emulation, migration, etc.
  – issues of authenticity and evidence

Digital Longevity:
The Translation Problem
- Content translated into new delivery devices changes meaning
  – A photo vs. a painting
  – If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
  – Behaviors

Digital Longevity
Pieces of the Solution (1/2)
- We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes “boundaries” of Info objects

Digital Longevity
Pieces of the Solution (2/2)
- People and organizations wishing to make information persist need guidelines on how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the “behaviors” of a digital object, not just its “contents”

Digital Longevity
Metadata can be the first line of defense
Can tell you
– where the file is (if you can’t find the file)
– where more info about the file is (if you have the file but most other metadata has become separated)
– what the file format is
– what the compression scheme is
– what application program and version is needed for the file
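
The items above map naturally onto a small record kept alongside each file. The following is a minimal sketch under stated assumptions: the field names, the `FileMetadata` class, and the helper functions are hypothetical, and the signature table covers only a few well-known leading byte patterns (TIFF, JPEG, GIF). It shows how recorded metadata and a file's own self-identifying bytes can be checked against each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMetadata:
    """Hypothetical record holding the items listed on the slide."""
    location: str              # where the file is
    more_info: str             # where more info about the file is
    file_format: str           # what the file format is
    compression: str           # what the compression scheme is
    application: str           # application program needed for the file
    application_version: str   # version of that application


# Leading bytes of a few common image formats.
SIGNATURES = [
    (b"II*\x00", "TIFF"),     # TIFF, little-endian byte order
    (b"MM\x00*", "TIFF"),     # TIFF, big-endian byte order
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file, or None if unknown."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def format_agrees(meta: FileMetadata, header: bytes) -> bool:
    """True when the recorded format matches what the bytes themselves say."""
    return sniff_format(header) == meta.file_format
```

When the header cannot be recognized at all, the recorded metadata is the only remaining clue to what the file is, which is the "first line of defense" case.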

Digital Longevity
Older Longevity Projects
http://sunsite.berkeley.edu/Longevity/
- CPA Task Force
- Getty “Time & Bits” Conference & Follow-ups
- Preservation experiments in US and Elsewhere
  – NEDLIB, CURL, Michigan, Pandora
- Internet Archive
- Long Now

Digital Longevity
Preservation Repositories: Open Archival Info System Model
- High-level reference model describing submission, organization and management, and continuing access
- Conceptual framework for different organizations to share discussions with a common language
- Producers, consumers, management, actual repository
- SIP, DIP, AIP
- AIP consists of data objects plus representation info (Content, Preservation Description, Packaging, Descriptive)
- Originally developed for Space Science community
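
The package structure named above can be sketched as plain data classes. This is a heavily simplified illustration, not the OAIS reference model itself: the class and function names, the dictionary keys, and the choice of a SHA-256 digest as the fixity value are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContentInformation:
    data_object: bytes          # the bits being preserved
    representation_info: str    # what is needed to interpret those bits


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    preservation_description: dict  # e.g. provenance and fixity
    packaging: dict                 # how the parts are bound together
    descriptive: dict               # what finding aids use


def ingest(data, representation_info, descriptive, producer):
    """Turn a submission (SIP) into an archival package (AIP)."""
    content = ContentInformation(data, representation_info)
    preservation_description = {
        "provenance": f"submitted by {producer}",
        # SHA-256 is an assumption here; any agreed fixity method would do.
        "fixity_sha256": hashlib.sha256(data).hexdigest(),
    }
    packaging = {"components": ["content", "preservation_description", "descriptive"]}
    return ArchivalInformationPackage(
        content, preservation_description, packaging, dict(descriptive)
    )


def disseminate(aip):
    """Build a dissemination package (DIP) after verifying fixity."""
    digest = hashlib.sha256(aip.content.data_object).hexdigest()
    if digest != aip.preservation_description["fixity_sha256"]:
        raise ValueError("fixity check failed: data object has changed")
    return {
        "data_object": aip.content.data_object,
        "representation_info": aip.content.representation_info,
        "descriptive": dict(aip.descriptive),
    }
```

Keeping the representation info next to the data object is the point of the model: the bits alone are not enough to provide continuing access.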

Digital Longevity
Preservation Repositories: Projects based on OAIS Model
- CEDARS
- NEDLIB
- Pandora
- CDL
- OCLC/RLG Working Group on Preservation Metadata, Attributes of a Trusted Digital Repository, August 2001

Digital Longevity
OCLC/RLG Digital Repository Attributes
- Administrative responsibility
- Organizational viability
- Financial sustainability
- Technological suitability
- System security
- Procedural accountability

Digital Longevity
OCLC/RLG Selected Recommendations
- Policies, Certification processes, Risk management, Persistent ID, Migration/Emulation experiments
- Stakeholders meet to decide how to describe what is in a dig repository
- Examine special properties of particular classes of digital objects
- Technical standards for exchange and interoperability btwn repositories
- Develop projects and case studies
- Copyright issues

Digital Longevity
Preservation Metadata
- OCLC/RLG Working Group on Preservation Metadata, Preservation Metadata for Digital Objects: A Review of the State of the Art, January 31 2001
- OCLC/RLG Working Group on Preservation Metadata, A Recommendation for Content Information, October 2001

Digital Longevity
Other Digital Preservation Activities
- LC Natl Dig Info Infrastructure & Preservation
- InterPARES
- Emulation Projects
- E-Journal Archiving
- ERPANET
- Persistent Naming

Digital Longevity
LC’s National Digital Information Infrastructure and Preservation Program
- Authorized Dec 2000
- LC, Dept of Commerce, NARA, White House Office of Sci & Tech Policy
- with help from CLIR, NLM, NAL, OCLC, RLG
- Ongoing collab process
- Commissioned papers on preserving: the Web, periodicals, digital sound, E-Books, Digital TV, Digital Video

Digital Longevity
InterPARES: International Research on Permanent Authentic Records in Electronic Systems
- Ongoing international archival world project examining how to make electronically-generated records last over time
- Developing the theoretical and methodological knowledge needed, then will formulate model policies, strategies, and standards
- Next year will be extended to include images and rich media

Digital Longevity
E-Journal Archiving
- Issues
  – License, don’t own; may not even be able to obtain right to make archival copy
  – Increasingly no paper back-up at all
  – Usually we don’t have the important redundancy factor
- Mellon funded projects (2001)
  – Yale, Harvard, Penn working w/individual publishers
  – Cornell, NYPL--specific disciplines
  – MIT exploring characteristics that change (dynamic)
  – Stanford--archiving software tools

Digital Longevity
Electronic Resource Preservation and Access NETwork (ERPANET)
- Best practices and skills development for digital preservation of cultural heritage and scientific objects
- 3 year project launched Nov 2001; 1.2 million Euros

Digital Longevity
What’s special about Cultural Heritage Materials?
- Images & rich media
- Inter-relationships btwn parts
- For Contemporary Art: What is the Work?

Digital Longevity
One Final Longevity Question: Who will collect the digital works of today that should become the Special Collections of tomorrow?
- web sites
- zines
- electronic journals
- listserv and email discussions
- drafts of works that later become famous

Other Standards Issues
- Persistent Naming
- Making your works accessible throughout the Net
- Problems with works residing outside the library’s jurisdiction

Persistent IDs--the Problem
- Need to separate work ID from work location
- URNs probably won’t be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e. licensed from vendors or managed by separate administrative structures)
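
Separating a work's ID from its location amounts to adding one level of indirection. The sketch below is a toy in-memory resolver, not any real URN or handle system: the class name, the method names, the `urn:example:` identifier, and the `example.org` URLs are all hypothetical.

```python
class PersistentIdResolver:
    """Toy resolver: citations hold the ID, only the resolver holds the location."""

    def __init__(self):
        self._locations = {}

    def register(self, work_id, location):
        """Assign a location to a new ID; an ID is never reassigned to a new work."""
        if work_id in self._locations:
            raise ValueError(f"{work_id} is already registered")
        self._locations[work_id] = location

    def relocate(self, work_id, new_location):
        """Record that the work moved; the ID itself does not change."""
        if work_id not in self._locations:
            raise KeyError(work_id)
        self._locations[work_id] = new_location

    def resolve(self, work_id):
        """Return the current location for an ID."""
        return self._locations[work_id]


if __name__ == "__main__":
    resolver = PersistentIdResolver()
    resolver.register("urn:example:work:1234", "http://old-host.example.org/img/1234.tif")
    # The maintaining organization moves the file; references elsewhere still work.
    resolver.relocate("urn:example:work:1234", "http://new-host.example.org/masters/1234.tif")
    print(resolver.resolve("urn:example:work:1234"))
```

The code is the easy part. Someone has to keep calling `relocate`, which is why the slide calls this a business process issue when the maintainer and the referrer are different organizations.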

Making your works accessible throughout the Net
- Open Archives & Metadata Harvesting
- An administrative and political issue as much as a technical one
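
On the technical side, an Open Archives harvest is a plain HTTP request carrying a `verb` and a few arguments. The sketch below only builds the request URL and makes no network call. The base URL and the set name are hypothetical, the helper `list_records_url` is an assumption for illustration, and the argument names (`verb`, `metadataPrefix`, `set`, `from`) follow the OAI metadata harvesting protocol as commonly documented, so they should be checked against the protocol version a given repository supports.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build a ListRecords harvesting request for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date is not None:
        params["from"] = from_date    # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository address; replace with a real base URL.
    base = "http://repository.example.org/oai"
    print(list_records_url(base))
    print(list_records_url(base, set_spec="images", from_date="2002-01-01"))
```

Requesting `oai_dc` asks for simple Dublin Core, the common denominator that lets a harvester combine records from repositories with very different internal metadata.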

Problems with works residing outside the library’s jurisdiction
- OpenURL
- Authentication

Digitization means New Audiences
- more access for more people
- outreach to new groups
- but new groups have different usability requirements
  – different user interfaces
  – different vocabulary
  – new methods of navigation
- we already have enough differences btwn different institution types (& even within the same type)
  – MESL results
  – Organization & indexing reflects the biases of the original intent when records were formed

Still Further Research
- Development of good tools to encourage use
- Seamless integration of Remote-source content with locally-scanned content
- Making specialized vocabulary more accessible to general audiences
- Building Adaptive delivery systems
- Understanding what really is the work

What Really is the Work?
- Artifact or informational content?
- Creator’s Intent (Gary Hill)
- With artistic works, sometimes it’s very difficult to determine what the work really is, what its boundaries are, etc. (more later if time remains)

What do we need for Real Digital Libraries?
- Components
  – Service to a clientele
  – Stewardship over a collection
  – Sustainability
  – Ability to find material outside that collection
- Ethics & Traditions
  – Free speech
  – Privacy
  – Equal access to info
  – Diversity of info
  – Serving the underserved

Moving from Isolated Digital Collections to Interoperable Digital Libraries
http://www.getty.edu/gri/standard/intrometadata/
http://www.ifla.org/II/metadata.htm
http://sunsite.Berkeley.EDU/Imaging/Databases/
http://www.ucop.edu/irc/cdl/tasw/Current/current.html
http://sunsite.Berkeley.EDU/moa2/
http://sunsite.Berkeley.EDU/Longevity/
http://www.gseis.ucla.edu/~howard/image-meta.html
http://sunsite.berkeley.edu/Metadata/sp2000.html
http://www.gseis.ucla.edu/~howard/
http://is.gseis.ucla.edu/impact/f95/special-collectns.html
http://is.gseis.ucla.edu/us-interpares/
http://www.diglib.org/preserve/ejp.htm
http://www.longnow.com/10klibrary/TimeBitsDisc/
http://www.archive.org/