Natasa Bulatovic Max Planck Digital Library Research and Development
This work is licensed under a Creative Commons Attribution 2.0 Germany License http://creativecommons.org/licenses/by/2.0/de/
eSciDoc, VIRR and Digitization Lifecycle - insights into an infrastructure for management of digitized resources Natasa Bulatovic
Max Planck Digital Library
Research and Development
Max Planck Digital Library (MPDL) is a service unit within the Max Planck Society (MPG)
MPG consists of about 80 institutes in three scientific sections:
the Chemistry, Physics and Technology Section
the Biology and Medicine Section
the Human Sciences Section
The core activities of the MPDL lie in building up service infrastructure and tools for publications and research data
MPDL develops software solutions in close cooperation with scientists, librarians and technicians
In the Human Sciences Section several institutes have digitized cultural artefacts and want to make them open access
The Max Planck Digital Library (MPDL) in a Nutshell
eSciDoc SOA Landscape
Which data are managed?
How?
PubMan – Publication Management
VIRR – Textual digitized resources management
IMEJI – Image management
PubMan: Management of publications
21.04.23
Collaboration of the MPDL with the Max Planck Institute for European Legal History
Motivation: The period of the Holy Roman Empire produced an enormous corpus of legislative sources. Until now, no complete collection of these works exists.
VIRR is about
ViRR Key features
Web-based collaborative application
Editor (bibliographic metadata, table of contents and structural metadata)
Viewer (online representation)
Browser
ViRR Editor
Combines a set of tools
Paginator
Table of Contents Editor
Metadata Editor
One complex, but flexible workspace
No default order for the usage of the tools
ViRR Editor - Paginator
Assign the logical page numbers to the physical ones
Choose between different formats (Arabic, Latin, custom)
Paginate manually or automatically
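The mapping the Paginator maintains can be sketched roughly as follows. This is a minimal illustration, not ViRR's implementation: the function names and scan file names are hypothetical, and it assumes that "Latin" refers to Roman numerals.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def paginate(scans, first_number=1, style="arabic"):
    """Automatic pagination: label consecutive scans with consecutive numbers.

    Returns a dict mapping each physical scan to its logical page label.
    """
    formatter = {"arabic": str, "roman": to_roman}[style]
    return {scan: formatter(first_number + offset)
            for offset, scan in enumerate(scans)}


# Eight physical scans (hypothetical file names).
scans = [f"scan_{i:04d}.jpg" for i in range(1, 9)]

pages = {}
pages.update(paginate(scans[:3], 1, "roman"))   # front matter: i, ii, iii
pages.update(paginate(scans[3:], 1, "arabic"))  # body: 1 .. 5
pages["scan_0001.jpg"] = "[title]"              # manual, custom label
```

Automatic pagination covers runs of consecutive pages, while the manual override handles the exceptions (title pages, unnumbered plates) that are common in historical prints.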
ViRR Editor - ToC Editor
Capture the logical structure of a work by breaking it down into structural elements
Arrange the hierarchical order of structural elements in the tree
Assign scans to structural elements
Choose from fine-grained structural element types (over sixty)
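A tree of structural elements with assigned scans could be modelled along these lines. This is only a sketch under assumptions: the class name, the field names and the element types shown ("monograph", "preface", "chapter", "section") are hypothetical stand-ins for the sixty-plus types ViRR offers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuralElement:
    kind: str                      # structural element type (hypothetical values)
    label: str                     # human-readable heading
    scans: List[str] = field(default_factory=list)
    children: List["StructuralElement"] = field(default_factory=list)

    def add(self, child: "StructuralElement") -> "StructuralElement":
        """Append a child element and return it, so calls can be chained."""
        self.children.append(child)
        return child


def outline(element: StructuralElement, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one per structural element."""
    lines = [f"{'  ' * depth}{element.kind}: {element.label} "
             f"({len(element.scans)} scans)"]
    for child in element.children:
        lines.extend(outline(child, depth + 1))
    return lines


work = StructuralElement("monograph", "Example work")
work.add(StructuralElement("preface", "Preface", ["scan_0002.jpg"]))
chapter = work.add(StructuralElement(
    "chapter", "Chapter 1", ["scan_0003.jpg", "scan_0004.jpg"]))
chapter.add(StructuralElement("section", "Section 1.1", ["scan_0004.jpg"]))

print("\n".join(outline(work)))
```

Note that a scan may belong to more than one element, since a section can start on the same page on which its parent chapter starts.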
ViRR Editor – Metadata Editor
Assign descriptive metadata to structural elements
Detailed description of every structural element
Systematic browsing
Dedicated search will be possible
ViRR Viewer
Browse by scan
Browse by ToC
Navigate to page
View metadata of structural element
Page (web resolution)
Page (full resolution) on click
ViRR: Sharing and reuse
http://virr.mpdl.mpg.de
From ViRR to Digitization Lifecycle
Project goal: support the complete Digitization Lifecycle with guidelines, standards, tools and a publishing platform
Partners: MPI for European Legal History, Frankfurt
Kunsthistorisches Institut, Florence (KHI)
Bibliotheca Hertziana, Rome
MPI for Human Development, Berlin
Related projects: ViRR (see http://colab.mpdl.mpg.de/mediawiki/ViRR:_Virtueller_Raum_Reichsrecht)
XML-Workflow (see http://colab.mpdl.mpg.de/mediawiki/MPDL_Project_XML_Workflow)
Imeji: Management of image collections
Imeji: repository of Digital Images
Organized into
Collections
Created and defined by the institution, project, working group
Albums
Created and defined by the researcher
Imeji: what is so different about it?
Imeji is not Flickr, nor Facebook...
Freely definable metadata profiles at collection level
Controlled Vocabularies may be integrated
Smart search for dates, ranges (based on the metadata type)
Helps gathering the metadata more effectively
Focusses on collaboration and metadata quality
Repository: Data can be exported at any time
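The idea of a typed metadata profile per collection, with range search driven by the metadata type, can be illustrated with a small sketch. Everything here is an assumption made for illustration (the profile layout, field names and function names); it does not reflect how Imeji stores or queries profiles.

```python
from datetime import date

# Hypothetical profile for one collection: field name -> expected type.
PROFILE = {"title": str, "created": date, "height_cm": float}


def validate(item: dict, profile: dict) -> list:
    """Return a list of problems found when checking an item against a profile."""
    problems = []
    for name, value in item.items():
        if name not in profile:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, profile[name]):
            problems.append(f"{name}: expected {profile[name].__name__}")
    return problems


def search_range(items, profile, name, low, high):
    """Range search, allowed only for fields whose type is ordered."""
    if profile[name] not in (date, int, float):
        raise TypeError(f"range search is not supported for field {name!r}")
    return [item for item in items
            if name in item and low <= item[name] <= high]


items = [
    {"title": "Fresco detail", "created": date(1950, 5, 1), "height_cm": 24.0},
    {"title": "Facade", "created": date(1975, 8, 15), "height_cm": 18.0},
]

hits = search_range(items, PROFILE, "created", date(1970, 1, 1), date(1979, 12, 31))
print([item["title"] for item in hits])
```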
eSciDoc and other services
eSciDoc SOA Landscape
eSciDoc core infrastructure
Set Handler (OAI-PMH)
Admin Handler
Aggregation Definition Handler
Statistics Data Handler
Scope Handler
Report Handler
Report Definition Handler
Item Handler
Container Handler
Context Handler
Organizational Unit Handler
Content Model Manager
User Account Handler
Role Handler
Group Handler
Resources & Data Statistics Security
Content Relation Handler
CoNE Service
● Manages named entities
○ Journals
○ Persons
○ Dewey Decimal Classification (3 public levels)
○ Creative Commons Licenses (CC licenses)
○ ISO 639-3 Languages
○ MIME Types
○ PACS classification
○ Custom classifications
● Reuse
○ Data delivered in multiple formats (JSON, HTML, RDF/XML, Options list)
● Motivation
○ Metadata quality: autosuggest components in solutions during metadata editing
○ Disambiguation: each entity is a named graph
○ Data linking: CoNE identifiers in publication metadata
○ Technical facilitation: all lists in one place
○ Persons: Researcher Portfolio
● Extensions
○ Refresh data from external sources
CoNE – Control of Named Entities
http://cone.mpdl.mpg.de/
http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450+
Content negotiation supported
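Content negotiation means that a client asks for a representation through the HTTP Accept header instead of a format-specific URL. The sketch below only builds such requests and sends nothing. The resource URL is a placeholder, and the mapping from CoNE's formats to these media types is an assumption, not something taken from the CoNE documentation.

```python
from urllib.request import Request

# Placeholder URL; a real CoNE resource URL would go here.
RESOURCE_URL = "http://example.org/cone/persons/resource/EXAMPLE_ID"

# Assumed media types for three of the formats named on the slide.
ACCEPT = {
    "json": "application/json",
    "html": "text/html",
    "rdf": "application/rdf+xml",
}


def negotiated_request(url: str, fmt: str) -> Request:
    """Build a GET request that asks for one representation of a resource."""
    return Request(url, headers={"Accept": ACCEPT[fmt]})


for fmt in ACCEPT:
    req = negotiated_request(RESOURCE_URL, fmt)
    print(req.get_method(), req.full_url, "Accept:", req.get_header("Accept"))
```

Passing such a request to `urllib.request.urlopen` would perform the actual call; the same URL then serves people (HTML) and machines (JSON, RDF/XML).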
Transformation Service
● Transforms textual data formats
○ Metadata
○ Resources
○ Standard formats
○ Specific formats (e.g. EndNote custom fields)
● Motivation
○ Migration of data from MPI
○ Exports and dissemination
○ Imports
○ Continuous interoperability enhancement
○ Implement once, use wherever needed
eDoc
BibTeX
APA
OpenURL
EndNote
arXiv
Pmc
TEI
AJP
Bmc
METS
Spires
eSciDoc-Publication
eSciDoc-TOC
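"Implement once, use wherever needed" suggests a single registry of transformations that importers, exporters and other services all call. A minimal sketch of that idea follows; the registry, the in-memory record layout and the sample record are assumptions made for illustration, not the Transformation Service's actual interface.

```python
from typing import Callable, Dict

# Registry: target format name -> function rendering a record as text.
TRANSFORMERS: Dict[str, Callable[[dict], str]] = {}


def transformer(target: str):
    """Decorator that registers a function under a target format name."""
    def decorator(func):
        TRANSFORMERS[target] = func
        return func
    return decorator


@transformer("bibtex")
def to_bibtex(rec: dict) -> str:
    authors = " and ".join(rec["authors"])
    return (f"@article{{{rec['key']},\n"
            f"  author = {{{authors}}},\n"
            f"  title = {{{rec['title']}}},\n"
            f"  year = {{{rec['year']}}}\n}}")


@transformer("endnote")
def to_endnote(rec: dict) -> str:
    # EndNote tagged format: one "%X value" line per field.
    lines = ["%0 Journal Article"]
    lines += [f"%A {author}" for author in rec["authors"]]
    lines += [f"%T {rec['title']}", f"%D {rec['year']}"]
    return "\n".join(lines)


def transform(rec: dict, target: str) -> str:
    """Single entry point used by every caller."""
    if target not in TRANSFORMERS:
        raise ValueError(f"no transformation to {target!r}")
    return TRANSFORMERS[target](rec)


# Invented sample record.
record = {"key": "doe2000", "authors": ["Doe, Jane", "Roe, Richard"],
          "title": "Example Title", "year": 2000}
print(transform(record, "bibtex"))
print(transform(record, "endnote"))
```

Adding a new export format then means registering one more function; no caller needs to change.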
Search & Export Service
Citation style manager
● Searches and exports results
● Citation styles (Citation style manager)
○ EndNote
○ BibTex
○ …
● Reuse
○ Data delivered in multiple formats (PDF, HTML, XML, ODT)
○ By external systems (content management, WordPress)
● Motivation
○ Search results should be available in various outputs
○ One service – many presentations (e.g. WordPress plug-in)
○ One interface – easy inclusion of various export formats
Syndication Service
● Provides the latest data updates
● RSS
● Atom
● Reuse
○ Subscription to feeds and data reuse
○ By any external clients
● Extensions
○ Media RSS
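An external client reusing such a feed might look like the following sketch, which reads an Atom document and keeps the entries updated after a given time. The sample feed is invented for illustration and the function name is hypothetical; only the Atom namespace and element names come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Invented sample feed, standing in for what a syndication endpoint returns.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Latest items</title>
  <entry>
    <id>urn:example:1</id>
    <title>First item</title>
    <updated>2011-01-01T10:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title>Second item</title>
    <updated>2011-01-02T12:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:3</id>
    <title>Third item</title>
    <updated>2011-01-03T09:00:00Z</updated>
  </entry>
</feed>"""


def latest_entries(feed_xml: str, since: str):
    """Return (updated, title) pairs newer than `since`, newest first.

    Timestamps are compared as strings, which is only valid because all of
    them share the same UTC format (YYYY-MM-DDTHH:MM:SSZ).
    """
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        updated = entry.findtext(f"{ATOM}updated")
        if updated > since:
            entries.append((updated, entry.findtext(f"{ATOM}title")))
    return sorted(entries, reverse=True)


for updated, title in latest_entries(SAMPLE_FEED, "2011-01-02T00:00:00Z"):
    print(updated, title)
```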
Validation service
Semantic validation
Contextual validation
Validation rule editor (upcoming)
Data acquisition service
• Fetches data from known sources via identifier (unAPI interface)
• Transforms data to other formats
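unAPI defines three request shapes on a single endpoint: no parameters (list all formats), `id` only (list the formats available for that identifier), and `id` plus `format` (return the record). The sketch below only builds those URLs; the base URL, identifier and format name are placeholders, not the addresses of the actual service.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint; the real service address is not given here.
UNAPI_BASE = "http://example.org/unapi"


def unapi_url(base: str, identifier: Optional[str] = None,
              fmt: Optional[str] = None) -> str:
    """Build one of the three unAPI request URLs."""
    if fmt is not None and identifier is None:
        raise ValueError("unAPI requires an id whenever a format is given")
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return f"{base}?{urlencode(params)}" if params else base


print(unapi_url(UNAPI_BASE))                          # all supported formats
print(unapi_url(UNAPI_BASE, "example:123"))           # formats for one id
print(unapi_url(UNAPI_BASE, "example:123", "bibtex")) # the record itself
```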
PubMan SWORD Server
• Deposit of data packages (metadata and fulltexts)
• Logic implements a PubMan-specific workflow
PID Cache manager
● Fetches Handles from the GWDG Handle System (dummy resolution)
● Assigns a pre-fetched handle to the resource
● Synchronizes the assigned handle with the resolution to a resource in the Handle system
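The three steps above (pre-fetch with a dummy resolution, assign, synchronize) can be sketched as follows. This is an illustration under assumptions: `FakeHandleSystem` is an in-memory stand-in for the remote Handle System, and all class names, method names, the prefix and the URLs are hypothetical.

```python
from collections import deque


class FakeHandleSystem:
    """In-memory stand-in for a remote handle service (hypothetical API)."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.records = {}          # handle -> URL it resolves to
        self._counter = 0

    def create(self, target_url: str) -> str:
        self._counter += 1
        handle = f"{self.prefix}/{self._counter:04d}"
        self.records[handle] = target_url
        return handle

    def update(self, handle: str, target_url: str) -> None:
        if handle not in self.records:
            raise KeyError(handle)
        self.records[handle] = target_url


class PidCache:
    DUMMY_URL = "http://example.org/dummy"

    def __init__(self, handle_system: FakeHandleSystem, size: int):
        self.handle_system = handle_system
        self.size = size
        self._cache = deque()
        self.refill()

    def refill(self) -> None:
        """Pre-fetch handles that resolve to a dummy URL until the cache is full."""
        while len(self._cache) < self.size:
            self._cache.append(self.handle_system.create(self.DUMMY_URL))

    def assign(self, resource_url: str) -> str:
        """Hand out a pre-fetched handle, then point it at the real resource."""
        handle = self._cache.popleft()
        self.handle_system.update(handle, resource_url)  # synchronize
        self.refill()
        return handle


handles = FakeHandleSystem("PREFIX")
cache = PidCache(handles, size=2)
pid = cache.assign("http://example.org/item/1")
print(pid, "->", handles.records[pid])
```

Pre-fetching keeps resource creation fast, because the application never has to wait for the remote system at the moment an identifier is needed.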
EPIC – European Persistent Identifier Consortium (GWDG Germany, SARA Netherlands, CSC Finland, http://www.pidconsortium.eu/ )
A note on the metadata profiles
● DCAP based (Dublin Core Application Profile)
● DC terms (identified by URIs)
● eSciDoc solution specific terms (identified by URIs)
● METS/MODS
● Publicly available
● Functional description http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Application_Profiles
● Schemas http://metadata.mpdl.mpg.de/escidoc/metadata/schemas/0.1/
● Interoperability levels
● Shared term definitions (done)
● Semantic interoperability (done)
● Description set syntactic interoperability (prepared)
● Description set profile interoperability (prepared)
Premises
● Applications
○ Web-based
○ Internationalized
○ Integrated Help system
○ Easy to use
○ Easy to install
● Services and infrastructure
○ Reusable, interoperable, composed, technology-independent
○ Extensible, scalable and performant
● Data
○ Persistently identified, versioned, discoverable, provenance and authenticity information, fine-grained authorization
○ Described with published metadata profiles
○ Interoperable and enabled for reuse and repurpose
Related projects and new developments
DARIAH
Digital Research Infrastructure for Arts and Humanities (see http://dariah.eu)
Imeji
AWOB
Astronomers Workbench
Resource Registries
ECHO – European Cultural Heritage Online (see http://echo.mpiwg-berlin.mpg.de/home )