Digital Library Architecture: A Service-Based Approach
Sandra Payette
Department of Computer Science
Cornell University
payette@cs.cornell.edu
Mo i Rana, Norway
November 10, 1998
http://www2.cs.cornell.edu/payette/presentations/DL-architecture.ppt
Overview
• Why talk about DL architecture?
• Digital Libraries - the architectural perspective
• Review of service-based architecture
• NCSTRL - a working example
• Dienst - existing service-oriented architecture
• Cornell next generation (component-oriented)
• Conclusion
Why Talk about Digital Library Architecture?
• Web alone is not a digital library
• Commercial packages are limited
– limited flexibility
– standards issues
– network-enabled applications are not a DL architecture
• Must position for broader DL opportunities
Web by Itself Is Not a DL Architecture
• Documents - Files, CGI, MIME-Types
• Naming - URLs
• Document Servers - HTTP servers
• Resource Discovery - web crawlers
• Collections - web pages, ad-hoc
• IP (intellectual property) - access control lists, passwords, ad-hoc
WWW Infrastructure Evolving
• Resource Description Framework (RDF)
– will allow rich metadata semantics for documents
– http://www.w3.org/RDF/
• Extensible Markup Language (XML)
– will allow highly structured documents and rich linking (relationship) capabilities
– http://www.w3.org/XML/
• Uniform Resource Names (URNs)
– will allow for persistent, globally unique identifiers
But still need Digital Library Architecture
• Richer document model - digital objects
• Persistent, unique naming - URNs
• Well-defined digital library services
• Better facilities for resource discovery
• Flexible definition of collections
• Management of distributed content & services
• Rights management for intellectual property
Digital Library Interoperability
[Figure: a Nordic Digital Library and the Cornell Digital Library interoperating]
Digital Library Architecture: Key Principles
• Open Architecture
– functionality partitioned into a set of well-defined services
– services accessible via a well-defined protocol
• Modularization
– promotes interoperability
– scalable to different clientele (research library, informal web)
• Federation
– enables aggregation into logical collections
• Distribution
– of content (collections) and services
– of administration and management of the DL
Component-Ware Digital Libraries
[Figure: Repository Services, Collection Services, Index Services, Name Service (persistent names), User Interface Gateway, and Digital Objects]
NCSTRL (Networked Computer Science Technical Reference Library): A Working Example
120+ Institutions in US, Europe, and Asia
A Globally Distributed Digital Library
NCSTRL Participants: collections federated
• 120+ institutions
– Universities/labs: research reports
– European Research Consortium for Informatics and Mathematics (ERCIM)
– Los Alamos (physics pre-prints, ACM)
– D-Lib Magazine
• 40+ independent servers
Federation of Collections
Documents in Distributed Repositories
Multi-Format Document Model
NCSTRL: Real-World Testbed for...
• a modular system based on a standard open architecture
• the study of hard, real-world problems: policy issues, quality of service, federation of publishers
• the creation of a self-sustaining international federated digital collection
Dienst: NCSTRL Technical Base
• Implements a service-based architecture for distributed digital libraries
• Protocol and reference implementation
• Network of services
• WWW browser access
• Uniform search over distributed indexes
• Access to documents in distributed repositories
• Access to multi-formatted documents
Dienst: Service-Based Architecture
• Document model
• Naming service (CNRI’s Handle System)
• Repository service
• Indexer service
• Collection service
• User Interface service
Dienst Document Model
[Figure: a document named by a Handle (URN) has representations and decompositions (physical, logical); underlying formats include ASCII, TIFF, PostScript, and metadata]
Dienst: Document Protocol
• Documents addressable through their URNs
• Document service requests
– get document metadata
– get document formats
– get document in format
– get document partition (page) in format
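The four requests can be illustrated with a minimal in-memory sketch. It assumes each document is stored as metadata plus, for each format, a list of page partitions; the class and method names (`Document`, `DocumentService`, `get_partition`, ...) are hypothetical and do not reproduce the actual Dienst wire syntax. The sample name `ncstrl.cornell/TR98-1690` is borrowed from the Further reading link.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document: metadata plus page partitions per format."""
    metadata: dict[str, str]
    # format name -> list of pages (partitions) in that format
    formats: dict[str, list[bytes]] = field(default_factory=dict)


class DocumentService:
    """Sketch of the four document requests, keyed by persistent name."""

    def __init__(self) -> None:
        self._docs: dict[str, Document] = {}

    def register(self, name: str, doc: Document) -> None:
        self._docs[name] = doc

    def _lookup(self, name: str) -> Document:
        try:
            return self._docs[name]
        except KeyError:
            raise LookupError(f"unknown document: {name}") from None

    def get_metadata(self, name: str) -> dict[str, str]:
        return dict(self._lookup(name).metadata)

    def get_formats(self, name: str) -> list[str]:
        return sorted(self._lookup(name).formats)

    def get_document(self, name: str, fmt: str) -> bytes:
        # Whole document in one format: concatenate its partitions.
        return b"".join(self._lookup(name).formats[fmt])

    def get_partition(self, name: str, fmt: str, page: int) -> bytes:
        # Pages are numbered from 1.
        pages = self._lookup(name).formats[fmt]
        if not 1 <= page <= len(pages):
            raise IndexError(f"page {page} out of range 1..{len(pages)}")
        return pages[page - 1]
```

Because every request goes through the persistent name rather than a file path, the same client calls keep working if the repository moves its storage.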
Dienst 5.0: Document Protocol
• More complex document model:
– versions
– hierarchical part specification
– binders (multi-part documents)
• “Structure” service request
– reveals, in XML, the full or collapsed structure of a document (e.g., chapters, sections, figures)
– describes multiple views of a document (e.g., bibliography, content, thumbnails)
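A "full or collapsed" structure response can be sketched with the standard library's `xml.etree.ElementTree`. This assumes a simple tree of parts and a depth limit, with collapsed parts flagged by a `collapsed="true"` attribute; the element and attribute names are hypothetical, not the Dienst 5.0 XML schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Part:
    """Hypothetical hierarchical part: chapter, section, figure, ..."""
    kind: str
    label: str
    children: list[Part] = field(default_factory=list)


def structure_xml(part: Part, depth: Optional[int] = None) -> ET.Element:
    """Return the part tree as XML.

    depth=None reveals the full structure; depth=n hides everything more
    than n levels below `part`, marking truncated parts collapsed="true".
    """
    elem = ET.Element(part.kind, label=part.label)
    if part.children:
        if depth is not None and depth <= 0:
            elem.set("collapsed", "true")
        else:
            next_depth = None if depth is None else depth - 1
            for child in part.children:
                elem.append(structure_xml(child, next_depth))
    return elem


report = Part("document", "Report", [
    Part("chapter", "1", [Part("section", "1.1"), Part("figure", "Fig. 1")]),
    Part("chapter", "2"),
])
print(ET.tostring(structure_xml(report, depth=1), encoding="unicode"))
```

A collapsed response lets a client show a table of contents first and request deeper levels only when the reader expands a part.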
Dienst: Core Services
[Figure: a WWW browser, the Dienst User Interface service, several Index services, and several Repository services]
• the browser sends a search request to the User Interface
• the User Interface sends a site-specific search request to each Index and receives a hit list from each
• the browser receives a unified hit list
• document requests go to the Repositories, which return MIME-typed documents
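The search fan-out can be sketched as follows. It assumes each index service is a callable returning `(document name, score)` pairs on a shared score scale; both that interface and the keep-the-best-score merge rule are assumptions for illustration, not Dienst's actual merge behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Hit = tuple[str, float]                    # (document name, score)
IndexService = Callable[[str], list[Hit]]  # hypothetical index interface


def unified_search(query: str, indexes: list[IndexService]) -> list[Hit]:
    """Send the query to every index in parallel and merge the hit lists.

    An index that raises is skipped, so one unreachable site yields
    partial results instead of a failed search. A document reported by
    several indexes keeps its best score.
    """
    best: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(indexes))) as pool:
        futures = [pool.submit(index, query) for index in indexes]
        for future in futures:
            try:
                hits = future.result()
            except Exception:
                continue  # site unavailable: skip it
            for name, score in hits:
                best[name] = max(score, best.get(name, score))
    # Highest score first; ties broken by name for a stable order.
    return sorted(best.items(), key=lambda hit: (-hit[1], hit[0]))
```

Querying the indexes concurrently keeps response time close to that of the slowest reachable site rather than the sum of all sites.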
Dienst Protocol: Building Gateways to Non-Conforming Sites
[Figure: Standard Servers and FTP/HTTP “Repositories” accessed through a User Interface Gateway Server]
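A gateway to a non-conforming site is essentially an adapter. The sketch below assumes a minimal `Repository` interface (two hypothetical methods, not the Dienst protocol), a site that stores one file per format as `<name>.<extension>`, and a `fetch` callable standing in for the real FTP/HTTP transfer.

```python
from __future__ import annotations

from typing import Callable, Optional, Protocol


class Repository(Protocol):
    """Hypothetical minimal repository interface."""

    def get_formats(self, name: str) -> list[str]: ...

    def get_document(self, name: str, fmt: str) -> bytes: ...


class FileSiteGateway:
    """Presents a plain FTP/HTTP file site as a Repository.

    Assumes `listing` holds the file names found on the site and that
    each format is stored as <base_url>/<name>.<extension>.
    """

    EXTENSIONS = {"ps": "postscript", "pdf": "pdf", "txt": "ascii"}

    def __init__(self, base_url: str, listing: set[str],
                 fetch: Callable[[str], bytes]) -> None:
        self.base_url = base_url.rstrip("/")
        self.listing = listing
        self.fetch = fetch  # performs the actual FTP/HTTP transfer

    def _file_for(self, name: str, fmt: str) -> Optional[str]:
        for ext, format_name in self.EXTENSIONS.items():
            if format_name == fmt and f"{name}.{ext}" in self.listing:
                return f"{name}.{ext}"
        return None

    def get_formats(self, name: str) -> list[str]:
        return sorted(fmt for ext, fmt in self.EXTENSIONS.items()
                      if f"{name}.{ext}" in self.listing)

    def get_document(self, name: str, fmt: str) -> bytes:
        path = self._file_for(name, fmt)
        if path is None:
            raise LookupError(f"{name} is not available as {fmt}")
        return self.fetch(f"{self.base_url}/{path}")
```

Clients code against `Repository` only, so a conforming server and a gateway-wrapped file site are interchangeable behind the user interface.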
Dienst: Collection Service
Naming Service
• Documents identified by globally unique names
• Names are persistent, permanent
• Registered names resolve to specific location (URL)
Persistent Identifier (e.g., URN): cnri.dlib/april97-payette
– Naming Authority: cnri.dlib
– Item Name: april97-payette
Location (URL): http://www.somewebserver.org/somedirectory/somefile
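Name resolution can be sketched as a lookup from persistent name to current URL. This toy resolver assumes names split at the first "/" into naming authority and item name, as in the slide's example; the in-memory dict and the `relocate` method are hypothetical stand-ins for a real resolution service such as the Handle System.

```python
class NamingService:
    """Toy resolver: persistent name -> current location (URL)."""

    def __init__(self) -> None:
        self._registry: dict[str, str] = {}

    @staticmethod
    def split(name: str) -> tuple[str, str]:
        # "cnri.dlib/april97-payette" -> ("cnri.dlib", "april97-payette")
        authority, sep, item = name.partition("/")
        if not (authority and sep and item):
            raise ValueError(f"not an authority/item name: {name!r}")
        return authority, item

    def register(self, name: str, url: str) -> None:
        self.split(name)  # validate the name before registering it
        self._registry[name] = url

    def relocate(self, name: str, new_url: str) -> None:
        # The document moved: only the location changes, never the name.
        if name not in self._registry:
            raise LookupError(f"unregistered name: {name}")
        self._registry[name] = new_url

    def resolve(self, name: str) -> str:
        try:
            return self._registry[name]
        except KeyError:
            raise LookupError(f"unregistered name: {name}") from None
```

Persistence comes from the indirection: citations store the name, and only the registry entry is updated when the file moves.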
Identifiers: Current Initiatives
• IETF Uniform Resource Names (URN)
– specification of URN framework
– requirements for resolution systems
– syntax definition
• Existing Systems
– CNRI’s Handle System (used by NCSTRL)
– OCLC PURLs
– DOI Initiative
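The IETF syntax definition (RFC 2141) gives URNs the form `urn:<NID>:<NSS>`. The checker below is a simplified sketch modeled on that grammar: the "urn" prefix and the namespace identifier (NID) are case-insensitive, the NID is 1 to 32 letters, digits, or hyphens and may not start with a hyphen or be "urn" itself, and "%" in the namespace-specific string (NSS) must start a two-digit hex escape. It is not a complete validator.

```python
import re
from typing import NamedTuple

_URN = re.compile(
    r"urn:(?P<nid>[a-z0-9][a-z0-9-]{0,31}):"
    r"(?P<nss>(?:[a-z0-9()+,\-.:=@;$_!*'/?#]|%[0-9a-f]{2})+)",
    re.IGNORECASE,
)


class URN(NamedTuple):
    nid: str  # namespace identifier, normalized to lower case
    nss: str  # namespace-specific string, kept as given


def parse_urn(text: str) -> URN:
    """Split a URN into NID and NSS, raising ValueError if malformed."""
    m = _URN.fullmatch(text)
    if m is None or m["nid"].lower() == "urn":
        raise ValueError(f"not a valid URN: {text!r}")
    return URN(m["nid"].lower(), m["nss"])
```

Note that a Handle such as `cnri.dlib/april97-payette` is not itself in this syntax; it uses its own authority/item form.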
Looking Ahead: Current Research at Cornell
• Digital Objects and Repository
– FEDORA
– Joint work in Interoperability with CNRI
– Access Management
• Resource Discovery
– STARTS (Cornell/Stanford collaboration)
– Intelligent Distributed Searching
• Collection Definition
Digital Object is...
recognizable by what it can do
[Figure: example behaviors: getChapter, getPage, getTrack, getLabel, getSection, getArticle, getFrame, getLength]
What the Client Sees vs. What the Object Is
[Figure layers: Content-Type Interfaces (e.g., Book); Mechanism; Structure (e.g., MARC)]
FEDORA Digital Object
• Datastreams: DS1 (application/MARC), DS2 (application/postscript)
• Generic Disseminator: ListContentTypes returns Book, DublinCore
• Book Disseminator: GetChapter, GetIndex, GetPage
• DublinCore Disseminator
• Example request: Get(Book.getPage(1))
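The disseminator idea can be sketched as an object that hides its datastreams behind named content types. Everything here is a hypothetical illustration of the diagram, not FEDORA's API: pages in the PostScript datastream are assumed to be separated by form feeds, and the MARC datastream is faked as a single `245|<title>` line (245 is the MARC title field) so that a Dublin Core `getTitle` method has something to map.

```python
from __future__ import annotations

from typing import Callable

Method = Callable[..., bytes]


class DigitalObject:
    """Toy object: clients see content types, never raw datastreams."""

    def __init__(self, datastreams: dict[str, tuple[str, bytes]]) -> None:
        # datastream id -> (MIME type, raw bytes)
        self.datastreams = datastreams
        self._disseminators: dict[str, dict[str, Method]] = {}

    def add_disseminator(self, content_type: str, methods: dict[str, Method]) -> None:
        self._disseminators[content_type] = methods

    def list_content_types(self) -> list[str]:
        # Plays the role of the generic disseminator's ListContentTypes.
        return sorted(self._disseminators)

    def get(self, content_type: str, method: str, *args: object) -> bytes:
        try:
            impl = self._disseminators[content_type][method]
        except KeyError:
            raise LookupError(f"{content_type}.{method} is not supported") from None
        return impl(self, *args)


def book_get_page(obj: DigitalObject, n: int) -> bytes:
    # Assumption: pages in DS2 are separated by form feeds.
    pages = obj.datastreams["DS2"][1].split(b"\f")
    if not 1 <= n <= len(pages):
        raise IndexError(f"page {n} out of range")
    return pages[n - 1]


def dc_get_title(obj: DigitalObject) -> bytes:
    # Stand-in for a MARC-to-Dublin Core mapping over DS1.
    tag, _, value = obj.datastreams["DS1"][1].partition(b"|")
    return value if tag == b"245" else b""


obj = DigitalObject({
    "DS1": ("application/MARC", b"245|A Sample Technical Report"),
    "DS2": ("application/postscript", b"page one\fpage two"),
})
obj.add_disseminator("Book", {"getPage": book_get_page})
obj.add_disseminator("DublinCore", {"getTitle": dc_get_title})
first_page = obj.get("Book", "getPage", 1)  # mirrors Get(Book.getPage(1))
```

New content types can be added by registering another disseminator, without changing the stored datastreams or existing clients.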
FEDORA: Extensibility for Content Types
• Simple, familiar content types
• Complex, compound, dynamic content types
Resource Discovery
• Meta-Searching for Resource Discovery
– query multiple document sources
– choose best sources to evaluate a query
– evaluate the query at these sources
– merge the query results from these sources
• Stanford Protocol Proposal for Internet Retrieval and Search (STARTS)
– www-db.stanford.edu/~gravano/starts.html
– www.cs.cornell.edu/NCSTRL/STARTS/STARTShome.htm
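The meta-searching steps can be sketched end to end. The source-selection rule (count of documents containing any query term) and the length-normalized term-count score are crude heuristics chosen for illustration; they are not the STARTS protocol, which defines how real sources describe themselves to a metasearcher. The `Source` class is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical document source holding a tiny full-text collection."""
    name: str
    docs: dict[str, str]  # document id -> text

    def summary_hits(self, terms: list[str]) -> int:
        # Crude source summary: documents containing any query term.
        return sum(any(t in text.lower().split() for t in terms)
                   for text in self.docs.values())

    def search(self, terms: list[str]) -> list[tuple[str, float]]:
        # Score = query-term occurrences divided by document length.
        hits = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            if not words:
                continue
            score = sum(words.count(t) for t in terms) / len(words)
            if score > 0:
                hits.append((doc_id, score))
        return hits


def metasearch(query: str, sources: list[Source], k: int = 2) -> list[tuple[str, str, float]]:
    terms = query.lower().split()
    # 1. Choose the k most promising sources (ties keep input order).
    ranked = sorted(sources, key=lambda s: s.summary_hits(terms), reverse=True)
    chosen = [s for s in ranked[:k] if s.summary_hits(terms) > 0]
    # 2. Evaluate the query only at the chosen sources.
    merged = [(s.name, doc_id, score)
              for s in chosen for doc_id, score in s.search(terms)]
    # 3. Merge into one list, best first (all sources share one scale here).
    merged.sort(key=lambda hit: (-hit[2], hit[0], hit[1]))
    return merged
```

In practice step 3 is the hard part, since independent sources rarely score documents on a comparable scale; this sketch sidesteps that by scoring everything the same way.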
Distributed Collection Service Definition and Access
[Figure: a Central Collection Server and several Collection Query Routers; queries from the User Interface receive intelligent routing based on regional conditions]
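"Intelligent routing based on regional conditions" could take many forms. One minimal interpretation, assumed here purely for illustration, is to prefer an available router in the user's region and otherwise fall back to the fastest available router; the `Router` fields and the latency figures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Router:
    """Hypothetical collection query router and its observed conditions."""
    name: str
    region: str
    latency_ms: float
    available: bool = True


def pick_router(routers: list[Router], user_region: str) -> Router:
    """Prefer an available router in the user's region, else the fastest.

    "Regional conditions" are modeled only as region, availability, and
    measured latency; a real service could weigh much more.
    """
    live = [r for r in routers if r.available]
    if not live:
        raise RuntimeError("no collection query router available")
    local = [r for r in live if r.region == user_region]
    return min(local or live, key=lambda r: r.latency_ms)
```

Falling back from the local region to any live router keeps the collection reachable when a regional router is down.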
Conclusions: Design with an Eye Toward the Future
• Know the limitations of ad-hoc web development and commercial packages
• Embrace a service-based approach
– modular designs increase flexibility, extensibility, and plug-in/plug-out capability
– well-defined services with protocols enable federation and interoperability
– various technologies or commercial software can be used underneath the service layers
• Watch Web developments in XML and RDF
Further reading
• Lagoze and Payette: An Infrastructure for Open-Architecture Digital Libraries. http://ncstrl.cs.cornell.edu/Dienst/UI/1.0/Display/ncstrl.cornell/TR98-1690
• Davis and Lagoze: NCSTRL: Design and Deployment of a Globally Distributed Digital Library. Draft of submission to IEEE Computer Special Issue on Digital Libraries, February 1999. http://www2.cs.cornell.edu/lagoze/papers/NCSTRL-IEEE3.doc
• Payette: Persistent Identifiers. RLG DigiNews. http://www.rlg.org/preserv/diginews/diginews22.html
• Payette and Lagoze: Flexible and Extensible Digital Object and Repository Architecture (FEDORA). http://www2.cs.cornell.edu/NCSTRL/CDLRG/FEDORA.html