1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories.
-
date post
20-Dec-2015 -
Category
Documents
-
view
220 -
download
0
Transcript of 1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories.
2
Administration
Final examination
May 19, 20001:00 - 2:30pm Phillips 219
5 or 6 questions on whole courseDo you want an open laptop examination?
There will be a make up examination near the beginning of the examination period. Please send me email if you might wish to take it.
3
Administration
Discussion class, Wednesday April 19
One class only, from 7:30 to 8:30 p.m.
Online survey
http://create.hci.cornell.edu/cssurvey.cfm
4
Repositories
Definitions
A repository is any computer system whose primary function is to store digital material for use in a library.
An archive is a repository that is organized to emphasize the long-term preservation of information.
6
Repository layers and interfaces
Interface
Persistent Store
Object Management Layer
External Interface
Shell API
Store API
Clients
7
Requirements 2
Object models
• Support for a flexible range of object models.
• Few restrictions on data, metadata, external links, and internal relationships.
• New categories of information do not require fundamental changes to other aspects of the digital library.
8
Multiple disseminations
Client can access a choice of forms of digital object:
• Format -- PDF or HTML
• Performance -- 8 bit/pixel or 24 bit/pixel color
• Content -- thumbnail, medium-resolution, high-resolution
Repository might store alternative disseminations or derive them when requested.
9
Dynamic content
Dissemination is produced by executing code at time client makes request
• Real-time sensor, e.g., traffic camera, satellite picture
• User characteristics, e.g., location, user profile
• Dissemination is intrinsically dynamic, e.g.,
simulation
virtual reality
computer program
Java applet
10
Metadata
Metadata can be linked to digital object:
• external catalog or index
• embedded in the digital object
• generated at run time
Granularity of metadata
• collection of digital objects
• digital object
• element of digital object
11
Requirements 3
Open protocols and formats • Clients use well-defined protocols, data types, and formats. • Architecture must allow incremental changes of protocols. Access management • Allow a broad set of policies• All levels of granularity• Prepared for future developments.
Reliability and performance • Very large volumes of data• Absolutely reliable in retention of data• Good performance
15
Common repository systems
Web server
• File-based object model plus hyperlinks
• Good tools for access
• Weak on long-term preservation
Relational database
• Table-based object model -- schema and data dictionary
• Good tools for data management
• Used for long-term preservation in data processing
16
Dumb and smart objects
Smart repositories objects
• behaviors provided by the repository• e.g., relational database
Smart clients
• behaviors provided by the client• e.g., web server
Smart objects
• repository is very simple• digital objects provide their own behaviors• compare with object-oriented programming (data + code)
17
Example: CNRI repository
Dumb repository for access to digital objects
• All information stored as typed data in digital objects.
• A single digital object has both data and metadata.
• Identification of digital objects is by location independent, persistent URNs.
• Access controls built into methods for accessing digital object.
18
Repository Access Protocol (RAP)
RAP is a simple protocol with two main groups of commands:
• Deposit digital object• Verify digital object• Delete digital object• Edit digital object
• Access digital object• Access metadata
19
Repository layers and interfaces
RAP Interface
Persistent Store
Object Management Layer
RAP Interface
Shell API
Store API
RAP Command
20
Client and repository architectures
RAP Interface
Object Management
Digital Object Processing
RAP Interface
Object Management
Object Persistence
ORB
StoreEnd Client
RAP Requests RAP Replies
Client Repository
21
Components
Hardware
• Repository: Sun Sparc with Solaris or IBM RS/6000 with AIX.
Software
• Communications: CORBA/IIOP distributed object system.
• Repository shell and object management layer: CORBA and Python.
• Persistent store: Unix file system, Oracle, Shore.
• Client: CGI scripts, Java applets.