www.kb.se
Depositing e-material to
The National Library of Sweden
KB - Overview
1661 – First legal deposit law
1877 – Becomes a government institution
1996 – First steps in digitization
1997 – Kulturarw3 - the first collection of the Swedish web
20?? – Deposit law expanded to include electronically published documents
KB – Aim of repository
• Be able to receive different kinds of data in different kinds of formats
• Be able to handle large amounts of incoming data (scalability)
• Have a flexible and modular design
• Be able to utilize services that can receive data from organizations with different technical capabilities
• A system for long term preservation and presentation
Overview - Architecture
Reality – Types of material
• Will receive widely different kinds of material
  – Different:
    • file formats
    • metadata formats
    • structure of data
    • naming schemes
• From many different sources
  – Local file system, FTP, database, URL on the web
  – Should still try to use the same services
• Solution:
  – Normalize received material to an internal format
  – Represent data + metadata as DIDL XML
Overview – Deposit system
Fundamentals of deposit system
• Modular design
• One internal format for representing packages
• Keep the interfaces between services as simple as possible
  – REST services (HTTP + XML)
  – Message queue on which packages are dropped for the system
  – This makes the system independent of platform and programming framework
• Each module should be highly configurable with smaller sub-components
  – Build services as chains of simple components, each concerned with just one task
  – Use Spring Framework for configuration
Internal package format
• Uses Digital Item Declaration Language (DIDL)
  – An MPEG-21 standard
  – An XML format for both data and metadata
    • Do not inline data, just metadata
    • Store datastreams centrally and reference them
• 1 DIDL file = 1 "object"
• One package has:
  – ID
  – Type
  – List of Attributes (name/value pairs)
  – List of Metadata (as XML)
  – List of Resources (as references)
Internal package format
• Represent a package as a DIDL file
  – Parser to read a DIDL file into a Java object
  – Serializer to write a Java object to a DIDL file
• Services usually work with the package as a Java object
• BUT:
  – Only plain XML is sent between services
  – This decouples services from the programming language; anything that can handle XML is fine
Internal package format - Attributes
• Attributes
  – Name/value pairs (Example: page-number = 5)
  – Flexible way of representing additional information about a package
In DIDL:
<didl:Descriptor id="attributes">
  <didl:Statement mimeType="text/plain">
    #Attributes
    foo=bar
  </didl:Statement>
</didl:Descriptor>
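The parser/serializer pair and the attributes descriptor could be sketched roughly as follows. This is only an illustration in Python rather than the Java the slides mention, which is possible precisely because only plain XML crosses service boundaries. The namespace URI, the `Package` class, the use of `Item id` for the package ID, and the function names are assumptions made for the example, not taken from the KB implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Assumption: the slides show only the "didl:" prefix, not the namespace URI.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def _tag(name: str) -> str:
    """Return the namespace-qualified tag name used by ElementTree."""
    return f"{{{DIDL_NS}}}{name}"


@dataclass
class Package:
    """Hypothetical in-memory form of a package: an ID plus name/value attributes."""
    id: str
    attributes: dict = field(default_factory=dict)


def serialize(package: Package) -> str:
    """Write the package as XML, with attributes in a text/plain Statement."""
    didl = ET.Element(_tag("DIDL"))
    # Assumption: the package ID is carried in the Item's id attribute.
    item = ET.SubElement(didl, _tag("Item"), {"id": package.id})
    descriptor = ET.SubElement(item, _tag("Descriptor"), {"id": "attributes"})
    statement = ET.SubElement(descriptor, _tag("Statement"), {"mimeType": "text/plain"})
    lines = ["#Attributes"] + [f"{k}={v}" for k, v in package.attributes.items()]
    statement.text = "\n".join(lines)
    return ET.tostring(didl, encoding="unicode")


def parse(xml_text: str) -> Package:
    """Read the XML back into a Package, skipping blank and '#' comment lines."""
    item = ET.fromstring(xml_text).find(_tag("Item"))
    attributes = {}
    for descriptor in item.findall(_tag("Descriptor")):
        if descriptor.get("id") != "attributes":
            continue
        text = descriptor.find(_tag("Statement")).text or ""
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                name, _, value = line.partition("=")
                attributes[name.strip()] = value.strip()
    return Package(id=item.get("id"), attributes=attributes)
```

A round trip such as `parse(serialize(Package("pkg-1", {"page-number": "5"})))` should give back an equal `Package`, which is the property that lets services exchange XML while working with objects internally.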
Internal package format - Metadata
• Metadata
  – Name
  – Description (optional)
  – XML that represents the metadata
In DIDL:
<didl:Descriptor id="mods">
  <didl:Statement mimeType="text/xml">
    <mods>
      ...
    </mods>
  </didl:Statement>
</didl:Descriptor>
Internal package format - Resource
• Resource
  – ID
  – Mimetype
  – List of Attributes (for this Resource only)
  – List of Metadata (for this Resource only)
  – Reference to the datastream (a URL)
In DIDL:
<didl:Component>
  <didl:Descriptor>
    <didl:Statement mimeType="text/xml">
      <dii:Identifier xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS">123456</dii:Identifier>
    </didl:Statement>
  </didl:Descriptor>
  <!-- ATTRIBUTES -->
  <!-- METADATA -->
  <didl:Resource mimeType="application/pdf" ref="http://resourcestore.kb.se:8080/store/123456.pdf"/>
</didl:Component>
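As an illustration of how a consuming service might read such a Component, here is a minimal Python sketch. The `read_component` function and the returned dictionary layout are hypothetical, and the DIDL namespace URI is an assumption, since the slide shows only the `didl:` prefix; the DII namespace is the one given in the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "didl": "urn:mpeg:mpeg21:2002:02-DIDL-NS",  # assumed; not shown on the slide
    "dii": "urn:mpeg:mpeg21:2002:01-DII-NS",    # as given on the slide
}

# The Component from the slide, with the didl namespace declared so it parses standalone.
SAMPLE = """<didl:Component xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS">
  <didl:Descriptor>
    <didl:Statement mimeType="text/xml">
      <dii:Identifier xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS">123456</dii:Identifier>
    </didl:Statement>
  </didl:Descriptor>
  <didl:Resource mimeType="application/pdf"
                 ref="http://resourcestore.kb.se:8080/store/123456.pdf"/>
</didl:Component>"""


def read_component(xml_text: str) -> dict:
    """Extract the ID, mimetype and datastream reference of one Resource."""
    component = ET.fromstring(xml_text)
    identifier = component.find("didl:Descriptor/didl:Statement/dii:Identifier", NS)
    resource = component.find("didl:Resource", NS)
    return {
        "id": identifier.text.strip(),
        "mimetype": resource.get("mimeType"),
        "ref": resource.get("ref"),  # only a reference; the data itself is not inlined
    }
```

Because the datastream is referenced rather than inlined, the package stays small no matter how large the PDF or image behind the URL is.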
Package normalizer
• Takes data in one format and creates an internal package
  – Creates the DIDL file and writes the datastreams to the Resource Store
• Places the package on a queue for further processing
• One normalizer per type of data package delivered
  – Has to know the contract for the delivered data
• Looks in an inbox at regular intervals for new packages
  – File system directory
    • Data could be delivered via FTP or file copy on local file system
  – URL
    • OAI-PMH server with metadata that has links to actual resources
    • OAI-ORE fits in nicely here
  – Database
  – Web form operated by human
  – Anything else?
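One polling pass of a file-system normalizer might look like the sketch below. It is a simplification under stated assumptions: the inbox and the Resource Store are plain directories, the queue is an in-process `queue.Queue` standing in for a real message queue, and the package is a dictionary instead of a DIDL file. The function name and the `"file-delivery"` type label are hypothetical.

```python
import queue
import shutil
import tempfile
from pathlib import Path


def poll_inbox(inbox: Path, store: Path, outgoing: queue.Queue) -> int:
    """Handle every file currently in the inbox; return how many were processed."""
    handled = 0
    for delivered in sorted(inbox.iterdir()):
        if not delivered.is_file():
            continue
        # Write the datastream to the (directory-based) resource store.
        stored = store / delivered.name
        shutil.move(str(delivered), str(stored))
        # Build a minimal internal package that references the stored datastream.
        package = {
            "id": delivered.stem,
            "type": "file-delivery",  # hypothetical package type
            "resources": [stored.as_uri()],
        }
        # Place the package on the queue for further processing.
        outgoing.put(package)
        handled += 1
    return handled
```

A scheduler would call `poll_inbox` at regular intervals; a normalizer for another delivery contract (URL, database, web form) would replace only the part that discovers and fetches the data.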
Enricher
Enriching a package
• REST service
  – POST a DIDL file and get it back enriched
• Implemented with Spring and a chain of enrichers
  – Each doing one specific task, for example adding a urn:nbn
  – Some only make sense for a specific kind of package
  – Can be a different set of enrichers for different package types
• Examples of enrichers
  – Adding urn:nbn
  – Updating MARCXML to reflect that it is an electronic copy
  – Adding extracted technical metadata from JHove or DROID
  – And so on...
• Possible to have enrichers that involve human intervention
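The chain-of-enrichers idea can be sketched as a list of small functions selected by package type. This is not the Spring configuration the slides refer to, only an illustration of the pattern; the package is modelled as a dictionary, and the enricher names, the `CHAINS` table, the type labels and the URN value are all hypothetical.

```python
from typing import Callable, Dict, List

Package = Dict[str, object]
Enricher = Callable[[Package], Package]


def add_urn_nbn(package: Package) -> Package:
    """Attach a persistent identifier if the package has none (placeholder value)."""
    enriched = dict(package)
    enriched.setdefault("urn_nbn", f"urn:nbn:example-{package['id']}")
    return enriched


def mark_electronic(package: Package) -> Package:
    """Stand-in for updating the metadata to say this is an electronic copy."""
    enriched = dict(package)
    enriched["electronic_copy"] = True
    return enriched


# Different package types get different chains; each step does one task.
CHAINS: Dict[str, List[Enricher]] = {
    "digitized-book": [add_urn_nbn, mark_electronic],
    "default": [add_urn_nbn],
}


def enrich(package: Package) -> Package:
    """Run the package through the chain configured for its type."""
    for step in CHAINS.get(str(package.get("type")), CHAINS["default"]):
        package = step(package)
    return package
```

Adding a new enrichment then means writing one function and listing it in the relevant chain, without touching the other steps.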
Validator
Validating a package
• Similar in design to Enricher
• REST service
  – POST a DIDL file and get back a status report
• Implemented with Spring and a chain of tests
  – Each test doing one specific task
  – Some only make sense for a specific kind of package
  – Can be a different set of tests for different package types
• Examples of tests
  – Verifying that a PDF is readable
  – Validating metadata
  – And so on...
• Possible to have tests that involve human intervention
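A validator built the same way could return a status report instead of a modified package. The sketch below is illustrative only: the two tests are deliberately shallow stand-ins (a PDF header check and an XML well-formedness check, not real readability or schema validation), and the `Result` class, the package keys and the report layout are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Result:
    test: str
    passed: bool
    detail: str = ""


def pdf_has_header(package: dict) -> Result:
    """Shallow check: PDF files begin with the bytes '%PDF-'."""
    ok = package.get("pdf_bytes", b"").startswith(b"%PDF-")
    return Result("pdf-header", ok, "" if ok else "missing %PDF- header")


def metadata_is_well_formed(package: dict) -> Result:
    """Check only that the metadata parses as XML."""
    try:
        ET.fromstring(package.get("metadata_xml", ""))
    except ET.ParseError as err:
        return Result("metadata-xml", False, str(err))
    return Result("metadata-xml", True)


def validate(package: dict, tests=(pdf_has_header, metadata_is_well_formed)) -> dict:
    """Run every test and collect the outcomes into one status report."""
    results = [test(package) for test in tests]
    return {"valid": all(r.passed for r in results), "results": results}
```

Running all tests rather than stopping at the first failure gives the depositor a complete report in one round trip.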
Ingest
• REST service
  – PUT a DIDL file and get back an id pointing into the repository
• In future:
  – Perhaps add the possibility to update or delete a package in the repository using POST and DELETE
• Abstraction that hides the actual repository used
  – Can change repository without affecting the rest of the system
  – Repository-dependent enrichments and tests can be done here
• We use Fedora as our repository
• The same principle is used for ingestion into the long-term preservation archive
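The abstraction that hides the repository can be sketched as an interface with interchangeable back ends. The class and method names here are hypothetical, and the in-memory back end merely stands in for Fedora or the preservation archive; the sketch shows the shape of the idea, not the actual service.

```python
import itertools
from abc import ABC, abstractmethod


class Repository(ABC):
    """What the ingest service needs from any repository back end."""

    @abstractmethod
    def store(self, didl_xml: str) -> str:
        """Persist the package and return an id pointing into the repository."""


class InMemoryRepository(Repository):
    """Toy back end used here in place of a real repository."""

    def __init__(self) -> None:
        self._objects = {}
        self._counter = itertools.count(1)

    def store(self, didl_xml: str) -> str:
        object_id = f"demo:{next(self._counter)}"
        self._objects[object_id] = didl_xml
        return object_id

    def get(self, object_id: str) -> str:
        return self._objects[object_id]


class IngestService:
    """Handles the PUT of a DIDL file without knowing which repository is behind it."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository

    def put(self, didl_xml: str) -> str:
        if not didl_xml.strip():
            raise ValueError("empty package")
        # Repository-dependent enrichments and tests would run here.
        return self._repository.store(didl_xml)
```

Swapping the repository then amounts to passing a different `Repository` implementation to `IngestService`; callers of `put` are unaffected.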
Fedora
• Fedora is used as the repository
  – Reasons why:
    • Open-source
    • Actively developed
    • Large (and growing) user base
    • Good design and nice features
  – We use version 2.2
    • obviously going to move to 3.0 in the future
• Used for storage and presentation
  – Stores both relevant datastreams and metadata
  – Has relations between datastreams (e.g. sequence number)
• Possible to search against the repository
  – By default, searches against DC fields
Fedora – Content Models
• Content Model
  – A contract of available Datastreams and Behaviour Definitions in a Fedora record
    • In Fedora 2.x just an informal agreement
• But from Fedora 3.0 a new mechanism exists for this
  – Called Content Model Architecture (CMA)
  – A Content Model could involve multiple Fedora records
    • Atomistic versus Compound model
  – Also specifies relations
    • Both between datastreams and Fedora records
    • Using RDF in the RELS-EXT datastream
Fedora - An example Content Model
• PagedObject Content Model
  – Used for digitized material where each page is an image
  – Atomistic, i.e. one page becomes one Fedora record
  – Also has one Fedora record for the object as a whole
• Record for the object
  – Datastreams
    • DC
    • MODS
    • MARCXML
  – Behaviour Definitions
    • view
    • list
    • getPreview
  – Relations
    • member of a collection
    • member of OAI-PMH set
• Record for an individual page
  – Datastreams
    • WEBIMAGE
    • THUMBNAIL
  – Behaviour Definitions
    • getImage
    • getZoom
  – Relations
    • member of the object
    • sequence-number etc.
Fedora - Ingest
• Gets a DIDL package and creates corresponding FOXML
  – Different FOXML for different Content Models
  – Which Content Model is used depends on the Type of the package
  – A Content Model can result in multiple FOXML files (and accordingly multiple Fedora records)
• Uses Fedora's Web Services to ingest the FOXML into the repository
• The datastreams are also transferred to the Fedora repository
• (Also, a urn:nbn is mapped to the object's location in Fedora)
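How one package can fan out into several Fedora records under an atomistic content model such as PagedObject can be sketched as a planning step. The sketch deliberately stops short of generating FOXML or calling Fedora; the record layout, the identifier scheme for pages and the relation names (`memberOf`, `sequenceNumber`) are hypothetical, while the datastream names follow the PagedObject example.

```python
from typing import Dict, List


def plan_paged_object(package_id: str, page_refs: List[str]) -> List[Dict[str, object]]:
    """Return one record description for the object plus one per page."""
    records: List[Dict[str, object]] = [
        {
            "id": package_id,
            "datastreams": ["DC", "MODS", "MARCXML"],
            "relations": {},
        }
    ]
    for sequence, ref in enumerate(page_refs, start=1):
        records.append(
            {
                "id": f"{package_id}-page{sequence}",  # hypothetical id scheme
                "datastreams": ["WEBIMAGE", "THUMBNAIL"],
                "source": ref,
                # Each page points back to its object and knows its position.
                "relations": {"memberOf": package_id, "sequenceNumber": sequence},
            }
        )
    return records
```

Each description would then be turned into its own FOXML file, so a book of N pages yields N + 1 ingests.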
Fedora - Access
• Built-in search system
  – Search for DC terms and some Fedora terms
• Built-in OAI-PMH provider
  – We give access to DC, MODS and MARCXML
• Built-in RDF Query Server
  – Query against the RDF in RELS-EXT
• In future: OAI-ORE provider for Fedora
• We provide our own viewer for digitized objects
  – Developed with Google Web Toolkit (GWT)
  – Has one tab with an overview of all pages
  – Another tab with an individual page with zooming functionality and the ability to navigate between pages
  – Some simple metadata displayed
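For the OAI-PMH provider mentioned above, a harvester would issue requests of the following form. The sketch only builds the request URL and does not contact any server. The base URL is hypothetical, `ListRecords`, `metadataPrefix` and `set` are standard OAI-PMH request parameters, and `oai_dc` is the prefix every OAI-PMH repository must support; the prefixes under which MODS and MARCXML are exposed are repository-specific and not stated in the slides.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://fedora.example.org/oai"  # hypothetical endpoint


def list_records_url(metadata_prefix: str,
                     set_spec: Optional[str] = None,
                     base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one OAI-PMH set
    return f"{base_url}?{urlencode(params)}"
```

The optional set parameter corresponds to the "member of OAI-PMH set" relation in the content model, which lets a harvester fetch a single collection.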
Example
A demo of viewing e-material from our Fedora repository.
Accessing SOT from LIBRIS.