Post on 26-Mar-2015
Introducing the ELAR information system architecture
Robert Munro & David Nathan
Endangered Languages Archive (ELAR), School of Oriental and African Studies, London
Outline
1. Introduction
2. The ELAR architecture
3. User Requirements
4. Ingestion
5. Archive & dissemination
6. Conclusions
Introduction – who we are
Part of the Hans Rausing Endangered Languages Project (HRELP), based at the School of Oriental and African Studies (SOAS), University of London.
Funded by the Lisbet Rausing Charitable Fund.
The other two parts are:
Academic Programme (ELAP) runs postgraduate courses, seminars and workshops
Documentation Programme (ELDP) funds endangered language documentation projects
ELAR – current state
In the process of designing and implementing key systems:
accession system (ingestion system)
archive information system
catalogue serving system
archive access system
data storage
long-term backup system
ELAR – current state
Source of materials supporting the systems analysis and design:
literature review
review of exemplar materials
interaction with associated archives
interaction with ELDP grantees
interaction with members of ELAP
departmental seminars on language documentation
seminars focused on archiving
ELAR – architecture
Strongly informed by the Open Archive Information System (OAIS) Reference Model (CCSDS, 2002)
The OAIS model
[Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
The OAIS model
Identify the nature of the materials (content, format and structures) that data producers will create
The OAIS model
Identify the intended users of the archive, and their user requirements
The OAIS model
Define dissemination formats, data structures and procedures that support the user requirements of the designated communities
The OAIS model
Design an archive information system able to store all the information and produce the required dissemination packages.
The OAIS model
Define ingestion (accession) formats and structures that minimise the conversion cost
The OAIS model
The archive needs to define three types of ‘packages’: ingestion, archive and dissemination.
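The three package types can be pictured as three small record types with a conversion step between each. In OAIS terms these correspond to the Submission, Archival and Dissemination Information Packages. The sketch below is illustrative only: the class names, field names and the `ingest` / `disseminate` helpers are assumptions made for this example, not ELAR's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class IngestionPackage:
    """What a producer (depositor) hands over. Field names are hypothetical."""
    depositor: str
    files: dict[str, bytes]                     # original path -> content
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class ArchivePackage:
    """What the archive stores: metadata kept apart from the materials."""
    archive_id: str
    materials: dict[str, bytes]
    metadata: dict[str, str]
    original_paths: list[str]


@dataclass
class DisseminationPackage:
    """What a designated community receives: metadata travels with each item."""
    items: list[dict]


def ingest(pkg: IngestionPackage, archive_id: str) -> ArchivePackage:
    # Keep the depositor's paths so the deposit structure can be shown later.
    metadata = dict(pkg.metadata, depositor=pkg.depositor)
    return ArchivePackage(archive_id, dict(pkg.files), metadata, sorted(pkg.files))


def disseminate(pkg: ArchivePackage) -> DisseminationPackage:
    # Attach a copy of the metadata to every item to aid interpretation.
    items = [
        {"path": path, "size": len(data), "metadata": dict(pkg.metadata)}
        for path, data in pkg.materials.items()
    ]
    return DisseminationPackage(items)
```

Keeping the three types distinct means each conversion is an explicit step, which is where format migration and metadata embedding would take place.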
User requirements
EL speakers and communities: continuation of ownership of language and materials
depositors: preserve deposit structure; update material; be correctly attributed
researchers: search (broad, narrow, domain specific); add materials; add relationships
publisher – repurposing: obtain high-quality data for repurposing
publisher – public heritage: archive to act as mediator
public: browse
long-term preserver: obtain clearly structured data
Ingestion
A set of formats & structures that can be converted to archive formats with minimal effort:
file formats conforming to the 7 + 1 dimensions of portability (Simons and Bird, 2003; Johnson, 2004)
support incremental assembly of the deposit
well-documented structures: XML with schema ideal
ELAR preferences:
uncompressed, non-proprietary formats
well-documented structures: (OLAC, IMDI, custom)
Ingestion
Filenames and structure of deposit:
we convert deposits to formats / structures appropriate for the archive information system
…but we record the filenames and directory structures of the deposit, allowing depositors to navigate the materials via them
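Recording the deposit's filenames and directory structure can be as simple as walking the deposit before any conversion takes place. The following is a minimal sketch under that assumption; the function name `record_deposit_structure` and the demo file names are made up for illustration.

```python
import os
import tempfile


def record_deposit_structure(root: str) -> list[str]:
    """Return every file path in the deposit, relative to its root, sorted."""
    recorded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Store "/"-separated relative paths so the record is platform neutral.
            recorded.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(recorded)


def demo() -> list[str]:
    # Build a tiny throwaway deposit; the file names are invented.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "session1"))
        for rel in ("readme.txt", os.path.join("session1", "recording.wav")):
            with open(os.path.join(root, rel), "wb") as handle:
                handle.write(b"")
        return record_deposit_structure(root)
```

The recorded paths can then be stored alongside the converted materials, so that a depositor's view can be rebuilt from them regardless of how the archive arranges the files internally.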
Ingestion
Access protocols… tomorrow
Archive and dissemination
Granularity:
archive objects can be bundles
archive objects can be a subsection of a file
the types of related materials and their relationships should play a part in the search options
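A data model that handles these granularities might treat bundles, files and file subsections as the same kind of addressable object, with typed relationships between them. This is only a sketch: `ArchiveObject`, `Relationship`, the `kind` values and the relation names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchiveObject:
    """One addressable thing in the archive; all names here are hypothetical."""
    object_id: str
    kind: str                                          # "bundle", "file" or "segment"
    members: list[str] = field(default_factory=list)   # member ids, for bundles
    source_file: Optional[str] = None                  # id of the file, for segments
    span: Optional[tuple[float, float]] = None         # e.g. start/end in seconds


@dataclass
class Relationship:
    subject: str
    relation: str      # e.g. "transcribes"
    target: str


def related(object_id: str, relation: str, links: list[Relationship]) -> list[str]:
    """Ids of objects linked to object_id by relation; usable as a search option."""
    return [link.subject for link in links
            if link.target == object_id and link.relation == relation]
```

For example, a search for "everything that transcribes this recording" reduces to `related(recording_id, "transcribes", links)`.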
Archive and dissemination
Version control:
modelling versions of materials is required
multiple types of versioning might be required (migration / dissemination / content update)
versions will be ‘invisible’ to most dissemination packages
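The three kinds of versioning can be modelled as a tag on each stored version, with a rule for which version a dissemination package exposes by default. The sketch below assumes that rule is "the latest version that is not itself a dissemination copy". That rule, and all the names used, are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VersionKind(Enum):
    MIGRATION = "migration"            # same content, new file format
    DISSEMINATION = "dissemination"    # derived copy made for delivery
    CONTENT_UPDATE = "content update"  # the material itself was changed


@dataclass(frozen=True)
class Version:
    object_id: str
    number: int
    kind: Optional[VersionKind]        # None for the version as first deposited
    location: str


def current_version(versions: list[Version], object_id: str) -> Version:
    """Latest non-dissemination version, so the history stays hidden from users."""
    candidates = [v for v in versions
                  if v.object_id == object_id
                  and v.kind is not VersionKind.DISSEMINATION]
    if not candidates:
        raise KeyError(object_id)
    return max(candidates, key=lambda v: v.number)
```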
Archive and dissemination
Adding materials and metadata:
users can add comments to data
users can add metadata values not provided by a depositor
users can make relationships between items, including mapping
users can supplement the kinds of metadata and relationships in the archive
note: all the above require moderation and supporting architecture
Archive and dissemination
Language support:
users should be able to add comments / metadata in any language
users should be able to navigate the archive access system via the language preference(s) of their choice
the archive architecture needs to support translations of metadata and comments
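Supporting translations suggests storing each metadata value or comment once per language and selecting among them by the user's ordered language preferences. A minimal sketch of that selection follows; the function name, the language codes and the fallback to `"en"` are assumptions for illustration.

```python
def pick_translation(translations: dict[str, str], preferences: list[str],
                     default: str = "en") -> tuple[str, str]:
    """Return (language, text) for the first preferred language available.

    `translations` maps a language code to the text in that language and
    must not be empty.
    """
    for lang in preferences:
        if lang in translations:
            return lang, translations[lang]
    if default in translations:
        return default, translations[default]
    # Fall back to any available language rather than hide the value.
    lang = sorted(translations)[0]
    return lang, translations[lang]
```

Returning the language together with the text lets the access system indicate when a value is being shown in a language other than the one the user asked for.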
Archive and dissemination
Archive services:
advice and conversion services to depositors
response to requests for information
supporting communications between individuals associated with the archive
Archive and dissemination
Archive information system:
separate metadata from materials
avoid redundancy
Dissemination packages:
favour embedding metadata
redundancy OK if it aids interpretation
Technical solutions:
we use MySQL to support the archive
for dissemination, we favour XML and formats allowing metadata to be embedded (PDF, BWF)
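The contrast between the two sides can be sketched as follows: metadata is held in relational tables apart from the materials, then copied into an XML record at dissemination time. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table names, column names and XML element names are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_store() -> sqlite3.Connection:
    # Archive side: one row per item, one row per metadata value, no repetition.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
        CREATE TABLE metadata (item_id INTEGER REFERENCES item(id),
                               name TEXT NOT NULL, value TEXT NOT NULL);
    """)
    db.execute("INSERT INTO item VALUES (1, 'session1/recording.wav')")
    db.executemany("INSERT INTO metadata VALUES (1, ?, ?)",
                   [("language", "example-language"),
                    ("speaker", "example-speaker")])
    return db


def dissemination_xml(db: sqlite3.Connection, item_id: int) -> str:
    # Dissemination side: copy the metadata into the package next to the item.
    (path,) = db.execute("SELECT path FROM item WHERE id = ?",
                         (item_id,)).fetchone()
    root = ET.Element("item", {"path": path})
    rows = db.execute(
        "SELECT name, value FROM metadata WHERE item_id = ? ORDER BY name",
        (item_id,))
    for name, value in rows:
        ET.SubElement(root, "meta", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")
```

The archive tables hold each fact once. The XML record repeats whatever a recipient needs to interpret the item without access to the database.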
Conclusions
ELAR is newly opened for deposits
Key systems are in the process of development
Significant features include:
modelling archive objects at different granularities
modelling relationships between objects
users can enter/define their own metadata
users can translate information into the language of their choice
users can navigate via the language(s) of choice