The Fedora Project March 19, 2003 ISTEC Symposium, Brazil Sandy Payette Cornell Information Science.
The Fedora Project
March 19, 2003
ISTEC Symposium, Brazil
Sandy Payette
Cornell Information Science
Motivation
The Problem of Complex Content
Digital Library Content: not just documents ...
Some familiar objects
Complex, compound, dynamic objects
Research Questions
How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?
How can complex objects be designed to be both generic and genre-specific at the same time?
How can we associate services and tools with objects to provide different presentations or transformations of the object content?
How can we associate fine-grained access control policies with specific objects, or with groups of objects?
How can we facilitate the long-term management and preservation of complex objects that have dependencies on distributed content and services?
The Flexible Extensible Digital Object Repository Architecture (FEDORA)
DARPA and NSF-funded research at Cornell (1997-present)
  CORBA-based reference implementation (Payette/Lagoze)
  Extensive interoperability testing (with Arms/Blanchi/Overly)
  Policy Enforcement (Payette/Schneider)
Interpreted and re-implemented at U of Virginia (1999-)
  Simple web-oriented implementation, focused on access to collections
  Java servlet and relational db
  Testbed of 10,000,000 objects with performance metrics (1999-2001)
Mellon-Funded FEDORA Software (2002-)
  University of Virginia and Cornell - joint development
  Open source
  Web services and XML
  Mediation of distributed services
  Preservation focus
Fedora: Key Features
Open System – public APIs, exposed as web services
Flexible Digital Object Model
  XML submission and storage (METS Schema)
  Local and distributed content
  Data (any type) and metadata (any schema – DC, other)
  Supports inter-relationships among objects
  Behavior “contracts” for objects
  Associate services with objects
  Objects can provide launch-pad or tool to use object content
Repository System:
  Management Service - manage digital resources, metadata, as well as computer programs, services and tools that support them
  Access Service – repository search and object disseminations
  Mediation - interacts with other distributed web services for content transformation and presentation
  OAI Provider
  Access Control
  Preservation service (future release)
Requirements: Heterogeneous Digital Collections
Books, Rare Books, Multimedia, Music, E-texts, Maps, Photographs, Statistics, Video, Art, Manuscripts, Data, Images, 3-D Objects, Journals, Sound Effects
Shortcomings of commercial digital library products
Narrow focus on specific media formats (e.g. image databases, document management)
Fail to effectively address interrelationships among digital entities
Fail to address interoperability; no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability
Fail to provide facilities for managing programs and tools that are integral to delivering digital content.
Not extensible; do not enable easy integration of new tools and services
Do not address fine-grained access control and preservation issues.
The Fedora Architecture
Digital Object Model
The Repository
Web Services
FEDORA Basic Object Architecture
Digital Object Model
  Container to aggregate digital content of any type
    Data or metadata
    Local or distributed
  Behavior “contracts”
    Definitions of abstract operations
    Fulfillment via bindings to external services
  Enables multiple “disseminations” of content
[Figure: digital object diagram. Labels: PID, Behavior, Datastream, File, Users, Objects, Application]
Digital Object Model Functional View
[Figure labels: dynamic data, services]
Persistent ID (PID) - globally unique persistent id
Disseminators - public view: access methods for obtaining “disseminations” of digital object content
System Metadata - internal view: metadata necessary to manage the object
Datastreams - protected view: content that makes up the “basis” of the object
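
The four parts of the functional view can be pictured as a small data structure. The following is an illustrative Python sketch only, not Fedora's storage schema or API: the class names, the field names (`ds_id`, `mime_type`, `location`) and the sample PID `demo:1` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One piece of content: data or metadata, local or distributed."""
    ds_id: str
    mime_type: str
    location: str  # hypothetical field: a local path or a remote URL


@dataclass
class DigitalObject:
    """Hypothetical container mirroring the four parts of the functional view."""
    pid: str  # globally unique persistent id
    disseminators: list = field(default_factory=list)  # public view
    system_metadata: dict = field(default_factory=dict)  # internal view
    datastreams: dict = field(default_factory=dict)  # protected view

    def add_datastream(self, ds: Datastream) -> None:
        # Datastreams are keyed by their id so they can be looked up later.
        self.datastreams[ds.ds_id] = ds


obj = DigitalObject(pid="demo:1", disseminators=["Default", "Simple Image"])
obj.add_datastream(Datastream("DC", "text/xml", "dc.xml"))
obj.add_datastream(Datastream("IMG", "image/jpeg", "http://example.org/photo.jpg"))
print(obj.pid, sorted(obj.datastreams))
```

In this sketch a client would see only the disseminators; the datastreams stay behind them, which matches the public/protected split on the slide.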
Digital Object Model Architectural View
Persistent ID (PID)
Disseminators: Default, Simple Image
System Metadata
Datastreams
Digital Object Model Example Disseminators
Get Profile, List Items, Get Item
List Methods, Get DC Record
Get Thumbnail, Get Medium
Get High, Get VeryHigh
[Figure: two supporting objects. One holds Persistent ID (PID), Behavior Definition Metadata, System Metadata, Datastreams; the other holds Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams. Other labels: Data Object, Web Service]
Object Behavior Contracts
[Figure: contract diagram. Labels: behavior contract, behavior subscription, data contract, Persistent ID (PID), Disseminators, Datastreams, System Metadata, Behavior Mechanism Object, Behavior Definition Object]
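
One way to read the contract idea: a behavior definition names abstract operations, a behavior mechanism binds each operation to a service, and a dissemination runs the bound service over a datastream. The Python sketch below illustrates that reading under stated assumptions. All class and function names are hypothetical, plain callables stand in for external web services, and placing "Get Thumbnail" and "Get Medium" under a "Simple Image" definition is a guess about the slides, not something they state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BehaviorDefinition:
    """Abstract operations only: the behavior contract."""
    name: str
    methods: List[str]


@dataclass
class BehaviorMechanism:
    """Binds each abstract method to a concrete service (a plain callable here)."""
    definition: BehaviorDefinition
    bindings: Dict[str, Callable[[bytes], str]]

    def __post_init__(self) -> None:
        # The contract is only fulfilled if every defined method has a binding.
        missing = set(self.definition.methods) - set(self.bindings)
        if missing:
            raise ValueError(f"unbound methods: {sorted(missing)}")


def disseminate(mechanism: BehaviorMechanism, method: str, datastream: bytes) -> str:
    """Run one dissemination: look up the bound service and feed it the datastream."""
    if method not in mechanism.definition.methods:
        raise KeyError(f"{method} is not part of {mechanism.definition.name}")
    return mechanism.bindings[method](datastream)


# Hypothetical stand-ins for external image services.
def fake_thumbnail_service(data: bytes) -> str:
    return f"thumbnail of {len(data)} bytes"


def fake_medium_service(data: bytes) -> str:
    return f"medium rendition of {len(data)} bytes"


simple_image = BehaviorDefinition("Simple Image", ["Get Thumbnail", "Get Medium"])
mechanism = BehaviorMechanism(
    simple_image,
    {"Get Thumbnail": fake_thumbnail_service, "Get Medium": fake_medium_service},
)
result = disseminate(mechanism, "Get Thumbnail", b"\x00" * 2048)
print(result)
```

Keeping the definition separate from the mechanism means the backing service could be swapped without changing what clients ask for.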
FEDORA Basic Repository Architecture
Repository System
  Object Management
    Lifecycle (Ingest/create, Store, Delete, Approve, Purge)
    Validation
    PID Generation
    Version management
    Access Control
    Preservation support
  Object Access
    Object Dissemination
    Object Reflection
    Service Mediation
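
A few of the object-management duties listed here (validation, PID generation, and the ingest, delete and purge steps of the lifecycle) can be sketched with an in-memory stand-in. This is a hypothetical Python illustration, not Fedora's management API. The PID format, the validation rule, and the distinction that delete only marks an object while purge removes it are assumptions made for the example.

```python
import itertools


class InMemoryRepository:
    """Hypothetical object-management sketch backed by a dict."""

    def __init__(self, namespace: str = "demo") -> None:
        self._namespace = namespace
        self._counter = itertools.count(1)
        self._objects = {}  # pid -> {"state": ..., "content": ...}

    def _next_pid(self) -> str:
        # PID generation: namespace plus a serial number (format is an assumption).
        return f"{self._namespace}:{next(self._counter)}"

    def ingest(self, content: dict) -> str:
        # Validation: require a non-empty label before storing anything.
        if not content.get("label"):
            raise ValueError("object must have a label")
        pid = self._next_pid()
        self._objects[pid] = {"state": "active", "content": content}
        return pid

    def delete(self, pid: str) -> None:
        # Assumed semantics: delete only marks the object; it stays in storage.
        self._objects[pid]["state"] = "deleted"

    def purge(self, pid: str) -> None:
        # Assumed semantics: purge removes the object permanently.
        del self._objects[pid]

    def exists(self, pid: str) -> bool:
        return pid in self._objects

    def state(self, pid: str) -> str:
        return self._objects[pid]["state"]


repo = InMemoryRepository()
first = repo.ingest({"label": "Map of Ithaca"})
second = repo.ingest({"label": "Field recording"})
repo.delete(first)
repo.purge(second)
print(first, repo.state(first), repo.exists(second))
```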
Fedora Implementation
Understanding the system implementation
Web Services
Server Design
What is a Web Service?
A distributed application that runs over the internet.
A web application that publishes an open interface through which clients can send requests and receive responses
Standards
  Transport protocol: HTTP, others
  Messaging protocol: SOAP, HTTP GET/POST
  Message encoding: XML
  Service description: WSDL
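
To make the messaging and encoding rows concrete, the sketch below builds a SOAP 1.1 request envelope as XML using Python's standard library. The envelope namespace is the standard SOAP 1.1 one. The service namespace, the operation name `getItem` and its parameters are hypothetical and are not taken from any Fedora WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/repository"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> bytes:
    """Wrap an operation call in a SOAP 1.1 envelope and return it as XML bytes."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and parameters, for illustration only.
message = build_request("getItem", {"pid": "demo:1", "itemId": "DC"})
parsed = ET.fromstring(message)
print(parsed.tag)
```

A client would typically send such a message in the body of an HTTP POST; the transport step is left out here to keep the sketch self-contained.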
Fedora and Web Services
Fedora Repository system is a web service
  Access/Search (API-A) and Management (API-M)
  Service descriptions published using WSDL
  Both SOAP and HTTP bindings
Back-end services
  Digital object behaviors implemented as linkages to other distributed web services
  Service binding metadata (WSDL) stored in special Fedora Behavior Mechanism objects.
  Fedora acts as mediator to these services.
Fedora Repository System
Client and Web Service Interactions
[Figure: interaction diagram. Labels: user, web browser, client application, Frontend, Web Service Dispatch, Fedora Repository System, Backend, Web Service, Service, Content Transform Service]
Fedora Server Design
3-Tiered Architecture
Modular & Extensible
System Diagram
Server Design: 3 Layers
Interface
  Service Exposure
  API-A, API-M, pure HTTP and SOAP via HTTP.
Application Logic
  Implements requests in terms of the Fedora object model.
Storage
  Database, File system, Object serializations and cache(s).
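
The layering can be sketched as three small classes in which each layer talks only to the one beneath it. This Python example is an illustration under assumptions: the class names, the `/objects/<pid>/datastreams` route and the status codes returned are hypothetical and do not describe the actual Fedora server.

```python
class Storage:
    """Storage layer: a dict stands in for the database and file system."""

    def __init__(self) -> None:
        self._records = {}

    def put(self, pid: str, record: dict) -> None:
        self._records[pid] = record

    def get(self, pid: str) -> dict:
        return self._records[pid]  # raises KeyError for unknown PIDs


class ObjectLogic:
    """Application logic layer: answers requests in terms of the object model."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def list_datastreams(self, pid: str) -> list:
        return sorted(self._storage.get(pid)["datastreams"])


class HttpInterface:
    """Interface layer: maps a request path onto an application-logic call."""

    def __init__(self, logic: ObjectLogic) -> None:
        self._logic = logic

    def handle(self, path: str) -> tuple:
        # Hypothetical route: /objects/<pid>/datastreams
        parts = path.strip("/").split("/")
        if len(parts) == 3 and parts[0] == "objects" and parts[2] == "datastreams":
            try:
                return 200, self._logic.list_datastreams(parts[1])
            except KeyError:
                return 404, []
        return 400, []


storage = Storage()
storage.put("demo:1", {"datastreams": {"DC": "dc.xml", "IMG": "photo.jpg"}})
api = HttpInterface(ObjectLogic(storage))
print(api.handle("/objects/demo:1/datastreams"))
```

Because the interface layer never touches storage directly, a second exposure (for example a SOAP front end) could reuse `ObjectLogic` unchanged.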
Fedora System Diagram
[Figure: system diagram. Labels: Client Application, Batch Program, Server Application, Web Browser, HTTP, SOAP, FTP, Web Service Exposure Layer, Manage, Access, Search, OAI Provider, Management Subsystem, Access Subsystem, Security Subsystem, Storage Subsystem, Component Mgmt, Object Mgmt, Object Validation, PID Generation, Object Dissemination, Object Reflection, Policy Enforcement, Policy Mgmt, Session Management, User Authentication, Policies, Users/Groups, Digital Objects, Datastreams, XML Files, Relational DB, External Content Retriever, External Content Source, Remote Service, Local Service, Content]
Open Source Fedora: Implementation Technologies
Fedora Web Services Layer
  Apache Axis for SOAP over HTTP
  Apache Tomcat 4.1
Core Repository System
  Sun Java J2SDK1.4
  Xerces 2-2.0.2 for XML parsing and validation
  Saxon 6.5 for XSLT transformation
  Schematron 1.5 for validation
  MySQL-2.23.52 and Mckoi relational database
Deployment Platforms
  Windows 2000, NT, XP
  Solaris
  Linux
Release Plan
Phase 1 – Fedora 1.0 (May 1, 2003 public)
Phase 2/3 (2003-2005)
  Advanced Access Control
  Preservation Service
  R2R Repository Federation
  Reliability
    Fault tolerance
    Mirroring and replication
  Performance tuning
    Caching
    Load balancing
    Storage scalability
Deployment Partners
Los Alamos National Laboratory: Research Library
Library of Congress: Motion Picture and Recorded Sound Division
Indiana University: Digital Library group
King's College London: Humanities Computing
NYU: Humanities Computing
Northwestern University: Academic Computing
Oxford: Oxford Digital Library and The Refugee Studies Center
Tufts: Digital Collections and Archives Department
More Information
www.fedora.info