1
CS 502: Computing Methods for Digital Libraries
Lecture 4
Identifiers and Reference Links
2
Desirable Properties of Identifiers
• Location independent name
• Globally unique
• Persistent across time
• Choice of human-generated or automatically generated names
• Fast resolution
• Decentralized administration
• Supported by standard user interfaces
3
Syntax of Handles
Syntax:
<naming_authority>/<locally_unique_string>
or
hdl:<naming_authority>/<locally_unique_string>
Examples:
10.1234/1995.02.12.16.42.21;9 (date-time stamp)
cornell.cs/cstr-94.45 (mnemonic name)
loc/a43v-8940cgr (random string)
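The handle syntax can be illustrated with a small parser. This is a minimal sketch. It assumes that the naming authority ends at the first "/" and that the optional "hdl:" prefix is simply stripped. The function name `parse_handle` is hypothetical and does not come from any Handle System client library.

```python
def parse_handle(text: str) -> tuple[str, str]:
    """Split a handle into (naming_authority, locally_unique_string).

    Accepts both "<naming_authority>/<local>" and "hdl:<naming_authority>/<local>".
    Assumption: the naming authority is everything before the first "/";
    the local string may itself contain further "/" characters.
    """
    if text.startswith("hdl:"):
        text = text[len("hdl:"):]
    authority, sep, local = text.partition("/")
    if not sep or not authority or not local:
        raise ValueError(f"not a valid handle: {text!r}")
    return authority, local


if __name__ == "__main__":
    for h in ["10.1234/1995.02.12.16.42.21;9",
              "hdl:cornell.cs/cstr-94.45",
              "loc/a43v-8940cgr"]:
        print(parse_handle(h))
```

All three example forms (date-time stamp, mnemonic name, random string) parse the same way. The locally unique string is opaque to the parser.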
4
Examples of DOIs
10.156 / catalog-96
• Publisher ID (10.156): assigned by the DOI Agency
• Item ID (catalog-96): assigned by the publisher
10.1048 / 872
10.1532 / PII
10.18698 / SICI
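The division of labor on this slide can be shown in a short sketch: the DOI Agency assigns the prefix and the publisher assigns the suffix. It assumes that the split happens at the first "/" and that a DOI prefix begins with "10.". Spaces around the "/", as written on the slide, are ignored. `split_doi` is a hypothetical helper.

```python
def split_doi(doi: str) -> dict[str, str]:
    """Split a DOI into the publisher ID (prefix) and the item ID (suffix).

    Assumptions: the prefix is everything before the first "/", it must
    start with "10.", and surrounding whitespace is not significant.
    """
    prefix, sep, suffix = doi.partition("/")
    prefix, suffix = prefix.strip(), suffix.strip()
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return {"publisher_id": prefix, "item_id": suffix}
```

The publisher is free to choose any suffix scheme (a catalog number, a PII, a SICI). Only the prefix is coordinated centrally.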
5
Elements of the Handle System
• Handle services:
– global handle service
– local handle services
– caching services
• Clients:
– client libraries
– browser extension
– WWW proxy servers
• Handle administration
• System utilities
6
Hierarchy of Naming Authorities
loc
– loc.cords
10
– 10.1234
cornell
– cornell.temp
– cornell.cs
  – cornell.cs.d
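The hierarchy can be walked mechanically under one assumption suggested by the slide's examples: each dot-separated component adds one level, so cornell.cs.d sits under cornell.cs, which sits under cornell. The helper below is a hypothetical illustration and is not part of any Handle System software.

```python
def naming_authority_chain(authority: str) -> list[str]:
    """Return a naming authority followed by its ancestors, most specific first.

    Assumption: each "."-separated component denotes one level of the
    hierarchy (e.g. 10.1234 is a sub-authority of 10).
    """
    parts = authority.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]
```

For example, `naming_authority_chain("cornell.cs.d")` yields the three levels from cornell.cs.d up to cornell. This is the path along which responsibility for creating sub-authorities would be delegated.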
7
Handle Servers and Handle Service
• The Global Handle Service provides central coordination for all handle services.
• Each naming authority has a home handle service (which may be Global) where its handles are maintained.
• Each handle service may be implemented as several handle servers.
• A hashing algorithm determines the server used to store a given handle.
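The slide does not say which hashing algorithm is used. The sketch below assumes MD5 over the upper-cased handle (treating handles as case-insensitive), reduced modulo the number of servers. `choose_server` and the server names are hypothetical. The key property is that every client applying the same rule reaches the same server without consulting a directory.

```python
import hashlib


def choose_server(handle: str, servers: list[str]) -> str:
    """Pick the server within one handle service that stores `handle`.

    Assumption: MD5 of the upper-cased handle, taken modulo len(servers).
    Any deterministic hash shared by all clients would serve the same role.
    """
    if not servers:
        raise ValueError("a handle service needs at least one server")
    digest = hashlib.md5(handle.upper().encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]
```

One consequence of a plain modulo is that adding a server moves most handles to a different server. The stored handle records must therefore be redistributed whenever a service grows.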
8
Handle Record for a Digital Object
Handle: cnri.dlib/arms-09
• Adm: Admin Data
• URL: http://www.cnri/xyz
• RAP: merlin.dlib.org
• NEW: orb:#cornell[]norb
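A handle record is a list of typed values attached to one handle, and a client picks out the values of the type it understands. The sketch below models only an index, a type, and the data. Real Handle System values carry further fields, such as a time-to-live and permissions, which are omitted here. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HandleValue:
    index: int        # position of the value within the record
    value_type: str   # e.g. "Adm", "URL", "RAP" as on the slide
    data: str


@dataclass
class HandleRecord:
    handle: str
    values: list[HandleValue] = field(default_factory=list)

    def data_of_type(self, value_type: str) -> list[str]:
        """Return the data of every value of one type, in index order."""
        ordered = sorted(self.values, key=lambda v: v.index)
        return [v.data for v in ordered if v.value_type == value_type]
```

A web browser extension would ask for the "URL" values. A repository client might instead use the "RAP" value to locate a repository access protocol server.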
9
Address Rules
The Global Handle Service stores:
– a record for each naming authority
– a record for each local handle service
The record for each naming authority includes:
– the home handle service for that naming authority
For each handle, the home handle service stores:
– the handle record
10
Resolving a Handle Without Caches
Handle cnri.dlib/wya is stored in the Global Handle Service (G)
• Client sends the query "cnri.dlib/wya" to Global (G)
• G returns the handle data for cnri.dlib/wya
11
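Resolving a Handle Without Caches
Handle cnri.dlib/wya is stored in home handle service abc
• Client sends the query "cnri.dlib/wya" to Global (G)
• G returns a pointer to abc, the home handle service for cnri.dlib
• Client sends the query "cnri.dlib/wya" to abc
• abc returns the handle data for cnri.dlib/wya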
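The two resolution paths, together with the address rules, can be simulated with plain dictionaries. This is a minimal sketch under three assumptions. The Global service's naming-authority records are reduced to a map from authority to home service name. Handle data is a list of strings. `resolve` is a hypothetical name, not a Handle System client call.

```python
def resolve(handle: str,
            home_of_authority: dict[str, str],
            global_handles: dict[str, list[str]],
            home_services: dict[str, dict[str, list[str]]]) -> tuple[list[str], list[str]]:
    """Resolve a handle without caches; return (handle data, services queried).

    home_of_authority: naming authority -> home handle service, as recorded
        at the Global Handle Service ("G" means Global itself).
    global_handles: handle -> handle data stored at Global.
    home_services: service name -> {handle -> handle data}.
    """
    authority = handle.split("/", 1)[0]
    queried = ["G"]                       # step 1: always ask Global first
    home = home_of_authority[authority]   # KeyError if the authority is unknown
    if home == "G":
        return global_handles[handle], queried   # Global answers directly
    queried.append(home)                  # step 2: follow the pointer to the home service
    return home_services[home][handle], queried
```

When cnri.dlib is homed at abc, the queried path is G followed by abc. This extra round trip is what the caching service on the next slide is designed to avoid.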
12
Caching Handle Service
Diagram: Client → Caching Server (cache, hash table) → Handle Servers (selected by hash)
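The caching server in this diagram can be sketched in a few lines. The sketch assumes that the cache is an in-memory hash table (a Python dict) keyed by handle and that entries never expire. A real cache would need expiry so that changes to handle records propagate. `CachingHandleServer` and `fetch` are hypothetical names.

```python
from typing import Callable


class CachingHandleServer:
    """Sketch of a caching server sitting between clients and handle servers."""

    def __init__(self, fetch: Callable[[str], list[str]]):
        # `fetch` stands in for full resolution (Global, then the home
        # service); it is hypothetical, not a real Handle System API.
        self._fetch = fetch
        self._cache: dict[str, list[str]] = {}   # hash table: handle -> data
        self.misses = 0

    def resolve(self, handle: str) -> list[str]:
        if handle not in self._cache:            # miss: ask the handle servers
            self.misses += 1
            self._cache[handle] = self._fetch(handle)
        return self._cache[handle]               # hit: answered locally
```

Only the first request for a handle reaches the handle servers. Later requests from any client of the caching server are answered from the table.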
13
Replication
All data is replicated at several sites for performance and reliability.
Diagram: replica sites in Washington, DC and Los Angeles, CA
14
Applications of Identifiers
The challenges:
• Persistent, unique identifiers
• Eliminate broken links
• Control duplicates
Applications:
• On-line publication
• Registration
• Citation (reference links)
• Collection management
• Archives
15
DOIs and URNs in Action
Diagram: User, Handle System, Publisher, DOI
16
Flexibility for Publisher
Diagram: Warehouse, Database, Repository, DOIs
Every publisher can have a different system.
17
Reorganization by Publisher
Diagram: Database, Repositories, DOIs
The publisher can create a new system.
18
Change of Publisher
Diagram: User, DOI, Handle System; publishers Halfmoon and Millenium
19
Citation
Diagram: User 1, User 2, DOI, Handle System, Publisher
20
Catalogs and Indexes
Diagram: User, Search System, DOI, Handle System, Publisher
21
Copyright Registration
Diagram: User, Copyright Registry, Handle System, Halfmoon, DOI
22
Multiple Copies
Diagram: User, DOI, Handle System; copies at Halfmoon USA and Halfmoon Europe
23
Archives
Diagram: User, DOI, Handle System, Archive
24
Reference Linking: The Problem
Generic
Given the information in a standard citation, how does one get to the thing to which the citation refers?
Specific
Given the information in a citation to a journal article, how does a user get from the citation to an appropriate copy of the article?
25
The General Model
Diagram: Publisher, Reference database, Location database, Content, Client
The publisher places information in the databases.
26
The General Model
Diagram: the client sends a citation to the reference database and receives identifiers
27
The General Model
Diagram: the client sends an identifier to the location database and receives URLs
28
The General Model
Diagram: the client uses a URL to retrieve the content
29
The General Model
Diagram: the complete path
Citation → identifiers (reference database)
Identifier → URLs (location database)
URL → content
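The general model is a chain of three lookups: citation to identifiers, identifier to URLs, and URL to content. A minimal sketch follows, assuming both databases are in-memory dicts and that citations are matched by exact key. A real reference database would have to match citation metadata rather than exact strings. `follow_citation` and `fetch` are hypothetical names.

```python
from typing import Callable, Optional


def follow_citation(
    citation: str,
    reference_db: dict[str, list[str]],        # citation -> identifiers
    location_db: dict[str, list[str]],         # identifier -> URLs
    fetch: Callable[[str], Optional[bytes]],   # URL -> content, None on failure
) -> Optional[bytes]:
    """Walk citation -> identifiers -> URLs -> content; return the first content found."""
    for identifier in reference_db.get(citation, []):
        for url in location_db.get(identifier, []):
            content = fetch(url)
            if content is not None:
                return content
    return None
```

Because the location database can list several URLs per identifier, a dead link falls through to the next copy instead of failing the whole chain.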
30
Target of Citations
• Work
• Expression
• Manifestation
• Item
IFLA model
Citations can refer to an entity at any of these levels, but citations to journal articles usually refer to the work.
31
Identifiers
• Are identifiers necessary?
– Persistence
– Flexible targets
• Examples:
– PubMed ID, BibCode, DOI, etc.
32
How are Identifiers Obtained?
Often the client knows the citation, but not the identifier.
• In the general model identifiers are obtained by searching the reference database.
• In limited domains, identifiers can be calculated from metadata.
• The identifier may be embedded in the citation.
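Calculating an identifier from metadata can be illustrated with a simplified identifier loosely modeled on the astronomy BibCode listed on the previous slide. The layout is a fixed 19 characters:
- year (4)
- journal abbreviation (5, padded with "." on the right)
- volume (4, padded on the left)
- qualifier (1)
- page (4, padded on the left)
- initial of the first author's surname (1)

Real bibcodes follow additional conventions that this sketch ignores. `bibcode_like` is a hypothetical name.

```python
def bibcode_like(year: int, journal: str, volume: str, page: str,
                 first_author: str, qualifier: str = ".") -> str:
    """Compute a 19-character identifier from citation metadata.

    Assumed layout: YYYY + journal (5, '.'-padded right) + volume (4,
    '.'-padded left) + qualifier (1) + page (4, '.'-padded left) +
    initial of the first author's surname (1).
    """
    if not first_author:
        raise ValueError("first author surname is required")
    code = (f"{year:04d}"
            + journal.ljust(5, ".")
            + volume.rjust(4, ".")
            + qualifier
            + page.rjust(4, ".")
            + first_author[0].upper())
    if len(code) != 19:
        raise ValueError(f"metadata does not fit the fixed layout: {code!r}")
    return code
```

This only works in a limited domain: every participant must agree on the journal abbreviations. Two citations to the same article then yield the same identifier without any database lookup.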
33
Reference Database Lookup
• Static: Reference links are established once and for all.
– Current model in journal publishing
– Not suitable for general user queries
• Dynamic: Reference links are established on demand.
– Provides link based on most recent information
– Success cannot be guaranteed
Quality of metadata in reference database(s) is crucial.
34
Metadata in Reference Database
• Existing schemes
– Considerable agreement on minimal elements
– Considerable differences in details and syntax
35
Minimal Metadata Elements for Journal Article
• Title of journal article
• Creator(s)
• Journal title
• Date of publication
• Enumeration (e.g., volume and issue)
• Location (e.g., page or article number)
• Type (e.g., "journal article")
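A reference database record holding the minimal elements listed above, together with a dynamic lookup that matches a citation against such records, might be sketched as follows. The matching rule is an assumption for illustration: agreement on journal title, enumeration, location and year, with only case and whitespace normalized. Real services need more tolerant matching because existing schemes differ in details and syntax. All names and the test data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleMetadata:
    """The minimal metadata elements for a journal article."""
    article_title: str
    creators: tuple[str, ...]
    journal_title: str
    date: str                          # date of publication, e.g. "1999-02"
    enumeration: str                   # e.g. volume and issue
    location: str                      # e.g. first page or article number
    item_type: str = "journal article"


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def lookup(citation: ArticleMetadata,
           reference_db: dict[str, ArticleMetadata]) -> list[str]:
    """Return identifiers whose records agree with the citation on journal
    title, enumeration, location and year of publication."""
    return [
        identifier
        for identifier, record in reference_db.items()
        if _norm(record.journal_title) == _norm(citation.journal_title)
        and _norm(record.enumeration) == _norm(citation.enumeration)
        and _norm(record.location) == _norm(citation.location)
        and record.date[:4] == citation.date[:4]
    ]
```

A citation often lacks the article title, so the sketch deliberately matches on the fields a typical reference list does contain.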
36
Resolution of Identifier
• Choice of resolver (distributed resolution)
– Simple model: identifier determines resolver
• Selection from multiple copies (selective resolution)
– Performance criteria
– Economic and related criteria
– User requirements
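Selective resolution can be sketched as a filter followed by a ranking. The slide lists the criteria but not how to combine them. The ordering used here is an assumption chosen for illustration: filter by user requirements, then prefer lower cost, then lower latency. `Copy`, `select_copy`, and the example.org URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Copy:
    url: str
    latency_ms: float   # performance criterion (measured or estimated)
    cost: float         # economic criterion, e.g. 0.0 for a licensed copy
    accessible: bool    # user requirement, e.g. the user is entitled to it


def select_copy(copies: list[Copy]) -> Optional[Copy]:
    """Drop copies the user cannot use, then prefer the cheapest,
    breaking ties by lowest latency. Returns None if nothing is usable."""
    usable = [c for c in copies if c.accessible]
    if not usable:
        return None
    return min(usable, key=lambda c: (c.cost, c.latency_ms))
```

A library whose users hold a licence for one copy would mark that copy as free. A fast but paid copy then loses to a slower licensed one.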
37
Interoperability
Several reference linking services under development:
• PubMed
• Astrophysics Data Center
• DOI reference service
• Los Alamos National Laboratory internal reference service
What levels of agreement and tools are needed for cross-linking?