The Fedora Project March 19, 2003 ISTEC Symposium, Brazil Sandy Payette Cornell Information Science.


Page 1

The Fedora Project

March 19, 2003
ISTEC Symposium, Brazil

Sandy Payette

Cornell Information Science

Page 2

Motivation

The Problem of Complex Content

Page 3

Digital Library Content: not just documents ...

Some familiar objects

Complex, compound, dynamic objects

Page 4

Research Questions

How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?

How can complex objects be designed to be both generic and genre-specific at the same time?

How can we associate services and tools with objects to provide different presentations or transformations of the object content?

How can we associate fine-grained access control policies with specific objects, or with groups of objects?

How can we facilitate the long-term management and preservation of complex objects that have dependencies on distributed content and services?

Page 5

The Flexible Extensible Digital Object Repository Architecture (FEDORA)

DARPA and NSF-funded research at Cornell (1997-present)
  CORBA-based reference implementation (Payette/Lagoze)
  Extensive interoperability testing (with Arms/Blanchi/Overly)
  Policy Enforcement (Payette/Schneider)

Interpreted and re-implemented at U of Virginia (1999-)
  Simple web-oriented implementation, focused on access to collections
  Java servlet and relational db
  Testbed of 10,000,000 objects with performance metrics (1999-2001)

Mellon-Funded FEDORA Software (2002-)
  University of Virginia and Cornell - joint development
  Open source
  Web services and XML
  Mediation of distributed services
  Preservation focus

Page 6

Fedora: Key Features

Open System – public APIs, exposed as web services

Flexible Digital Object Model
  XML submission and storage (METS Schema)
  Local and distributed content
  Data (any type) and metadata (any schema – DC, other)
  Supports inter-relationships among objects
  Behavior “contracts” for objects
  Associate services with objects
  Objects can provide launch-pad or tool to use object content

Repository System
  Management Service - manage digital resources, metadata, as well as computer programs, services and tools that support them
  Access Service – repository search and object disseminations
  Mediation - interacts with other distributed web services for content transformation and presentation
  OAI Provider
  Access Control
  Preservation service (future release)

Page 7

Requirements: Heterogeneous Digital Collections

Books Rare Books

Multimedia Music

E-texts Maps Photographs Statistics

Video Art Manuscripts Data

Images 3-D Objects Journals

Sound Effects

Page 8

Shortcomings of commercial digital library products

Narrow focus on specific media formats (e.g. image databases, document management)

Fail to effectively address interrelationships among digital entities

Fail to address interoperability; no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability

Fail to provide facilities for managing programs and tools that are integral to delivering digital content.

Not extensible; do not enable easy integration of new tools and services

Do not address fine-grained access control and preservation issues.

Page 9

The Fedora Architecture

Digital Object Model
The Repository
Web Services

Page 10

FEDORA Basic Object Architecture

Digital Object Model
  Container to aggregate digital content of any type
    Data or metadata
    Local or distributed
  Behavior “contracts”
    Definitions of abstract operations
    Fulfillment via bindings to external services
  Enables multiple “disseminations” of content

Page 11

Digital Object Model: Functional View

[Diagram. Labels: PID, Behavior, Datastream, File, Users, Application, Behavior Objects, Dynamic data, services.]

Page 12

Digital Object Model: Architectural View

Persistent ID (PID): globally unique persistent id
Disseminators: public view; access methods for obtaining “disseminations” of digital object content
System Metadata: internal view; metadata necessary to manage the object
Datastreams: protected view; content that makes up the “basis” of the object
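The four parts of the object model named on this slide can be pictured as a small data structure. The following is only a sketch in Python, not Fedora's actual object format (which the slides describe as METS-based XML); every class, field, and method name here is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datastream:
    """Protected view: one piece of the content that forms the object's basis."""
    ds_id: str
    mime_type: str
    location: str  # local path or remote URL, since content may be distributed


@dataclass
class Disseminator:
    """Public view: a named group of access methods for the object."""
    name: str
    methods: List[str]


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent id
    system_metadata: Dict[str, str] = field(default_factory=dict)  # internal view
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    disseminators: Dict[str, Disseminator] = field(default_factory=dict)

    def add_datastream(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds

    def add_disseminator(self, d: Disseminator) -> None:
        self.disseminators[d.name] = d

    def list_methods(self) -> List[str]:
        """Every method exposed through the object's disseminators, sorted."""
        return sorted(
            f"{d.name}/{m}" for d in self.disseminators.values() for m in d.methods
        )


# Hypothetical example object; the PID and URL are made up.
obj = DigitalObject(pid="demo:1", system_metadata={"state": "active"})
obj.add_datastream(Datastream("IMG", "image/jpeg", "http://example.org/img/1.jpg"))
obj.add_disseminator(Disseminator("SimpleImage", ["getThumbnail", "getHigh"]))
print(obj.list_methods())
```

Clients in this sketch only ever see the disseminator methods; the datastreams stay behind them, which mirrors the public/protected split on the slide.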

Page 13

Digital Object Model: Example Disseminators

Persistent ID (PID)
Disseminators
  Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
  Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
System Metadata
Datastreams

Page 14

Object Behavior Contracts

[Diagram. Labels:]
Data Object: Persistent ID (PID), Disseminators, Datastreams, System Metadata
Behavior Definition Object: Persistent ID (PID), Behavior Definition Metadata, System Metadata, Datastreams
Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams
Web Service
Relationships: behavior contract, behavior subscription, data contract
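One way to read the behavior contract and data contract named on this slide is as two checks: a mechanism must bind every abstract operation of the definition it implements, and a data object must supply the datastreams the mechanism needs. The sketch below illustrates that reading only; the class names, fields, and the `check_contracts` helper are hypothetical, and a plain dictionary of endpoints stands in for the WSDL service binding metadata.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass
class BehaviorDefinition:
    pid: str
    operations: FrozenSet[str]  # abstract operation names only


@dataclass
class BehaviorMechanism:
    pid: str
    bdef_pid: str                          # the definition it claims to implement
    bindings: Dict[str, str]               # operation -> endpoint (stand-in for WSDL)
    required_datastreams: FrozenSet[str]   # the "data contract"


def check_contracts(bdef: BehaviorDefinition,
                    bmech: BehaviorMechanism,
                    object_datastreams: Iterable[str]) -> List[str]:
    """Return a list of contract violations; an empty list means all checks pass."""
    problems = []
    if bmech.bdef_pid != bdef.pid:
        problems.append(f"{bmech.pid} does not implement {bdef.pid}")
    unbound = bdef.operations - set(bmech.bindings)
    if unbound:
        problems.append(f"unbound operations: {sorted(unbound)}")
    missing = bmech.required_datastreams - set(object_datastreams)
    if missing:
        problems.append(f"missing datastreams: {sorted(missing)}")
    return problems


# Hypothetical PIDs and endpoints, for illustration only.
image_def = BehaviorDefinition("demo:bdef-image",
                               frozenset({"getThumbnail", "getHigh"}))
image_mech = BehaviorMechanism(
    pid="demo:bmech-image",
    bdef_pid="demo:bdef-image",
    bindings={"getThumbnail": "http://imgsvc.example.org/thumb",
              "getHigh": "http://imgsvc.example.org/high"},
    required_datastreams=frozenset({"IMG"}),
)
print(check_contracts(image_def, image_mech, ["IMG", "DC"]))
print(check_contracts(image_def, image_mech, ["DC"]))
```

Keeping the definition free of endpoints means a different mechanism could fulfil the same definition later without changing what clients call.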

Page 15

FEDORA Basic Repository Architecture

Repository System
  Object Management
    Lifecycle (Ingest/create, Store, Delete, Approve, Purge)
    Validation
    PID Generation
    Version management
    Access Control
    Preservation support
  Object Access
    Object Dissemination
    Object Reflection
    Service Mediation
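A few of the object management functions listed here (validation, PID generation, and part of the lifecycle) can be sketched together. This is an illustration under stated assumptions, not Fedora's implementation: the `namespace:number` PID shape, the "missing title" validation rule, the state names, and the reading of Delete as a reversible mark versus Purge as permanent removal are all assumptions, and the Approve step is left out.

```python
import itertools
from typing import Dict


class Repository:
    """Toy in-memory repository covering ingest, delete, and purge."""

    def __init__(self, namespace: str = "demo") -> None:
        self._namespace = namespace
        self._counter = itertools.count(1)
        self._objects: Dict[str, dict] = {}

    def _next_pid(self) -> str:
        # Assumed PID shape: "<namespace>:<sequence number>"
        return f"{self._namespace}:{next(self._counter)}"

    def ingest(self, content: dict) -> str:
        # Stand-in validation rule; real validation would check the object's XML.
        if not content.get("title"):
            raise ValueError("object failed validation: missing title")
        pid = self._next_pid()
        self._objects[pid] = {"state": "active", "content": content}
        return pid

    def delete(self, pid: str) -> None:
        # Assumption: delete only marks the object; it is still stored.
        self._objects[pid]["state"] = "deleted"

    def purge(self, pid: str) -> None:
        # Assumption: purge removes the object permanently.
        del self._objects[pid]

    def state(self, pid: str) -> str:
        return self._objects[pid]["state"]


repo = Repository()
first = repo.ingest({"title": "Map of Brazil"})
repo.delete(first)
print(first, repo.state(first))
repo.purge(first)
```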

Page 16

Fedora Implementation

Understanding the system implementation

Web Services
Server Design

Page 17

What is a Web Service?

A distributed application that runs over the internet.

A web application that publishes an open interface through which clients can send requests and receive responses

Standards
  Transport protocol: HTTP, others
  Messaging protocol: SOAP, HTTP GET/POST
  Message encoding: XML
  Service description: WSDL
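To make the two messaging styles in the standards list concrete, the sketch below builds the same request once as a SOAP 1.1 envelope and once as a plain HTTP GET URL, using only the Python standard library. The service namespace, base URL, and the operation and parameter names are hypothetical placeholders, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "http://example.org/repository"               # hypothetical service namespace


def soap_request(operation: str, params: dict) -> bytes:
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def http_get_request(base_url: str, operation: str, params: dict) -> str:
    """Express the same call as a URL suitable for an HTTP GET."""
    return f"{base_url}/{operation}?{urlencode(params)}"


# Illustrative operation and parameter names.
message = soap_request("getDissemination", {"pid": "demo:1"})
print(message.decode("utf-8"))
print(http_get_request("http://example.org/repo", "getDissemination", {"pid": "demo:1"}))
```

In both cases the payload is XML or URL text carried over HTTP; a WSDL document would describe which operations and parameters the service accepts.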

Page 18

Fedora and Web Services

Fedora Repository system is a web service
  Access/Search (API-A) and Management (API-M)
  Service descriptions published using WSDL
  Both SOAP and HTTP bindings

Back-end services
  Digital object behaviors implemented as linkages to other distributed web services
  Service binding metadata (WSDL) stored in special Fedora Behavior Mechanism objects
  Fedora acts as mediator to these services
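The mediator role described on this slide can be sketched as a lookup followed by a forwarded call: the client names an object and a method, and the repository finds the bound back-end service, hands it the right datastream, and returns the result. The `Mediator` class, its method names, and the stub back end below are all hypothetical; a local function stands in for a remote web service.

```python
from typing import Callable, Dict, Tuple

Backend = Callable[[str], str]  # takes a datastream location, returns a result


class Mediator:
    """Routes dissemination requests to back-end services on the client's behalf."""

    def __init__(self) -> None:
        # (pid, datastream id) -> content location
        self._datastreams: Dict[Tuple[str, str], str] = {}
        # (pid, method name) -> (back-end service, datastream id it consumes)
        self._bindings: Dict[Tuple[str, str], Tuple[Backend, str]] = {}

    def register_datastream(self, pid: str, ds_id: str, location: str) -> None:
        self._datastreams[(pid, ds_id)] = location

    def bind(self, pid: str, method: str, backend: Backend, ds_id: str) -> None:
        self._bindings[(pid, method)] = (backend, ds_id)

    def get_dissemination(self, pid: str, method: str) -> str:
        backend, ds_id = self._bindings[(pid, method)]
        return backend(self._datastreams[(pid, ds_id)])


def thumbnail_service(source_url: str) -> str:
    """Stub for a remote content-transform service."""
    return f"thumbnail-of:{source_url}"


mediator = Mediator()
mediator.register_datastream("demo:1", "IMG", "http://example.org/img/1.jpg")
mediator.bind("demo:1", "getThumbnail", thumbnail_service, "IMG")
print(mediator.get_dissemination("demo:1", "getThumbnail"))
```

The client in this sketch never learns which service produced the thumbnail or where the source image lives, so either can change without affecting callers.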

Page 19

Fedora Repository System: Client and Web Service Interactions

[Diagram. Labels: Frontend, Backend, Fedora Repository System, Web Service, Web Service Dispatch, Content Transform Service, Service, client application, web browser, user.]

Page 20

Fedora Server Design

3-Tiered Architecture
Modular & Extensible

System Diagram

Page 21

Server Design: 3 Layers

Interface
  Service Exposure: API-A, API-M, pure HTTP and SOAP via HTTP.

Application Logic
  Implements requests in terms of the Fedora object model.

Storage
  Database, File system, Object serializations and cache(s).
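The three layers can be sketched as three small classes, each talking only to the one beneath it. This is a minimal illustration of the layering idea, not the server's real design: the class names, the `/objects/<pid>` path format, and the in-memory dictionary used as storage are all assumptions.

```python
from typing import Dict, List


class Storage:
    """Storage layer: here just an in-memory dictionary keyed by PID."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, str]] = {}

    def save(self, pid: str, datastreams: Dict[str, str]) -> None:
        self._objects[pid] = dict(datastreams)

    def load(self, pid: str) -> Dict[str, str]:
        return self._objects[pid]


class ApplicationLogic:
    """Middle layer: expresses requests in terms of objects and datastreams."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def ingest(self, pid: str, datastreams: Dict[str, str]) -> None:
        self._storage.save(pid, datastreams)

    def list_datastreams(self, pid: str) -> List[str]:
        return sorted(self._storage.load(pid))


class HttpInterface:
    """Interface layer: maps URL-style paths onto application-logic calls."""

    def __init__(self, logic: ApplicationLogic) -> None:
        self._logic = logic

    def handle(self, path: str) -> List[str]:
        parts = path.strip("/").split("/")
        # Hypothetical route: /objects/<pid> lists that object's datastreams
        if len(parts) == 2 and parts[0] == "objects":
            return self._logic.list_datastreams(parts[1])
        raise ValueError(f"unsupported path: {path}")


logic = ApplicationLogic(Storage())
logic.ingest("demo:1", {"IMG": "http://example.org/img/1.jpg", "DC": "dc.xml"})
api = HttpInterface(logic)
print(api.handle("/objects/demo:1"))
```

Because the interface layer holds no object-model logic of its own, a SOAP front end could be added alongside the HTTP one by wrapping the same `ApplicationLogic` instance.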

Page 22

Fedora System Diagram

[Diagram. Labels by kind:]
Layers and subsystems: Web Service Exposure Layer, Management Subsystem, Access Subsystem, Security Subsystem, Storage Subsystem
Exposed services: Manage, Access, Search, OAI Provider
Components: Component Mgmt, Object Mgmt, Object Validation, PID Generation, Object Dissemination, Object Reflection, Policy Enforcement, Policy Mgmt, Session Management, User Authentication, External Content Retriever
Stored data: Digital Objects, Datastreams, XML Files, Relational DB, Policies, Users/Groups, Content
Clients: Client Application, Batch Program, Server Application, Web Browser
External parties: External Content Source, Remote Service, Local Service
Protocols: HTTP, SOAP, FTP

Page 23

Open Source Fedora: Implementation Technologies

Fedora Web Services Layer
  Apache Axis for SOAP over HTTP
  Apache Tomcat 4.1

Core Repository System
  Sun Java J2SDK1.4
  Xerces 2-2.0.2 for XML parsing and validation
  Saxon 6.5 for XSLT transformation
  Schematron 1.5 for validation
  MySQL-2.23.52 and Mckoi relational database

Deployment Platforms
  Windows 2000, NT, XP
  Solaris
  Linux

Page 24

DEMO: Use Cases

Connect to Repository

www.fedora.info

Page 25

Release Plan

Phase 1 – Fedora 1.0 (May 1, 2003 public)

Phase 2/3 (2003-2005)
  Advanced Access Control
  Preservation Service
  R2R Repository Federation
  Reliability
    Fault tolerance
    Mirroring and replication
  Performance tuning
    Caching
    Load balancing
    Storage scalability

Page 26

Deployment Partners

Los Alamos National Laboratory: Research Library
Library of Congress: Motion Picture and Recorded Sound Division
Indiana University: Digital Library group
King's College London: Humanities Computing
NYU: Humanities Computing
Northwestern University: Academic Computing
Oxford: Oxford Digital Library and The Refugee Studies Center
Tufts: Digital Collections and Archives Department

Page 27

More Information

www.fedora.info