Introducing the ELAR information system architecture Robert Munro & David Nathan Endangered...

Post on 26-Mar-2015

Introducing the ELAR information system architecture

Robert Munro & David Nathan

Endangered Languages Archive (ELAR), School of Oriental and African Studies, London

Outline

1. Introduction

2. The ELAR architecture

3. User Requirements

4. Ingestion

5. Archive & dissemination

6. Conclusions

Introduction – who we are

Part of the Hans Rausing Endangered Languages Project (HRELP), based at the School of Oriental and African Studies (SOAS), University of London.

Funded by the Lisbet Rausing Charitable Fund

The other two parts are:

Academic Programme (ELAP) runs postgraduate courses, seminars and workshops

Documentation Programme (ELDP) funds endangered language documentation projects

ELAR – current state

In the process of designing and implementing key systems:
  accession system (ingestion system)
  archive information system
  catalogue serving system
  archive access system
  data storage
  long-term backup system

ELAR – current state

Sources of materials supporting the systems analysis and design:
  literature review
  review of exemplar materials
  interaction with associated archives
  interaction with ELDP grantees
  interaction with members of ELAP
  departmental seminars on language documentation
  seminars focused on archiving

ELAR – architecture

Strongly informed by the Open Archival Information System (OAIS) Reference Model (CCSDS, 2002)

The OAIS model

[Figure: OAIS model diagram with Producers, Ingestion, Archive, Dissemination and Designated communities]

The OAIS model

Identify the nature of the materials (content, format and structures) that data producers will create

The OAIS model

Identify the intended users of the archive, and their user requirements

The OAIS model

Define dissemination formats, data structures and procedures that support the user requirements of the designated communities

The OAIS model

Design an archive information system able to store all the information and produce the required dissemination packages.

The OAIS model

Define ingestion (accession) formats and structures that minimise the cost of converting deposits into archive formats

The OAIS model

The archive needs to define three types of ‘packages’: ingestion, archive and dissemination.
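
The three package types could be modelled roughly as follows. This is a minimal sketch, not ELAR's implementation: the class names, their fields and the `ingest` / `disseminate` helpers are all hypothetical, chosen only to show each package as a distinct structure with a conversion step between them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IngestionPackage:
    """What a producer (depositor) submits. Hypothetical fields."""
    depositor: str
    files: Dict[str, bytes]  # original path -> content


@dataclass
class ArchivePackage:
    """What the archive stores internally. Hypothetical fields."""
    object_id: str
    materials: Dict[str, bytes]  # archive key -> content
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class DisseminationPackage:
    """What a designated community receives. Hypothetical fields."""
    payload: bytes
    embedded_metadata: Dict[str, str]


def ingest(pkg: IngestionPackage, object_id: str) -> ArchivePackage:
    # Re-key the files under archive-assigned keys (sorted path order)
    # and keep the depositor as metadata.
    materials = {
        f"{n:04d}": pkg.files[path]
        for n, path in enumerate(sorted(pkg.files), start=1)
    }
    return ArchivePackage(object_id, materials, {"depositor": pkg.depositor})


def disseminate(archived: ArchivePackage, key: str) -> DisseminationPackage:
    # Copy the metadata into the outgoing package so it travels with the data.
    return DisseminationPackage(archived.materials[key], dict(archived.metadata))
```

Keeping the three types separate means the archive format can change without affecting what depositors submit or what users receive.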

User requirements

EL speakers and communities: continuation of ownership of language and materials

depositors: preserve deposit structure; update material; be correctly attributed

researchers: search (broad, narrow, domain specific); add materials; add relationships

publisher – repurposing: obtain high quality data for repurposing

publisher – public heritage: archive to act as mediator

public: browse

long-term preserver: obtain clearly structured data

Ingestion

A set of formats & structures that can be converted to archive formats with minimal effort:
  file formats conforming to the 7 + 1 dimensions of portability (Simons and Bird, 2003; Johnson, 2004)
  support incremental assembly of the deposit
  well-documented structures: XML with schema ideal

ELAR preferences:
  uncompressed, non-proprietary formats
  well-documented structures (OLAC, IMDI, custom)

Ingestion

Filenames and structure of deposit:
  we convert deposits to formats / structures appropriate for the archive information system
  …but we record the filenames and directory structures of the deposit, allowing depositors to navigate the materials via them
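
Recording the deposit's filenames and directory structure could look something like the sketch below. The function name `record_deposit_structure` and the `item-0001` identifier scheme are assumptions made for illustration; the slides do not say how ELAR names archive objects.

```python
from pathlib import Path


def record_deposit_structure(deposit_root):
    """Map each deposited file's original relative path to an archive ID.

    The ID scheme is hypothetical; only the idea of keeping the
    depositor's own paths alongside archive identifiers is from the talk.
    """
    root = Path(deposit_root)
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    return [
        {
            "archive_id": f"item-{n:04d}",
            "original_path": path.relative_to(root).as_posix(),
        }
        for n, path in enumerate(files, start=1)
    ]
```

With such a manifest, the access system can present the depositor's original folder view even though the stored files have been converted and renamed.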

Ingestion

Access protocols… tomorrow

Archive and dissemination

Granularity:
  archive objects can be bundles
  archive objects can be a subsection of a file
  the types of related materials and their relationships should play a part in the search options
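
One way to picture objects at different granularities, with typed relationships that a search can use, is sketched here. All names (`ArchiveObject`, `Relationship`, `related`, the `transcribes` relation) are hypothetical and not taken from ELAR's design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ArchiveObject:
    object_id: str
    file: Optional[str] = None                   # set for file-level objects
    span: Optional[Tuple[float, float]] = None   # e.g. seconds within a file
    members: List[str] = field(default_factory=list)  # set for bundles


@dataclass(frozen=True)
class Relationship:
    source: str
    kind: str    # relationship type, usable as a search facet
    target: str


def related(relationships, object_id, kind):
    """Return IDs of objects that `object_id` points to via `kind`."""
    return [
        r.target for r in relationships
        if r.source == object_id and r.kind == kind
    ]


# A bundle, a whole recording, and a subsection of that recording.
recording = ArchiveObject("rec-1", file="story.wav")
excerpt = ArchiveObject("rec-1a", file="story.wav", span=(12.0, 47.5))
bundle = ArchiveObject("session-1", members=["rec-1", "txt-1"])
links = [Relationship("txt-1", "transcribes", "rec-1")]
```

Here `related(links, "txt-1", "transcribes")` yields `["rec-1"]`, so a search could filter on the relationship type as well as on the objects themselves.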

Archive and dissemination

Version control:
  modelling versions of materials is required
  multiple types of versioning might be required (migration / dissemination / content update)
  versions will be ‘invisible’ to most dissemination packages
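
A minimal sketch of typed versioning follows. The three version kinds come from the slide; the extra `original` kind, the class names and the `location` field are assumptions added so the example is complete.

```python
from dataclasses import dataclass, field
from typing import List

# "original" is an assumed kind for the first deposit; the rest are from the slide.
VERSION_KINDS = {"original", "migration", "dissemination", "content_update"}


@dataclass
class Version:
    number: int
    kind: str      # one of VERSION_KINDS
    location: str  # where this version's data lives (hypothetical)


@dataclass
class VersionedItem:
    item_id: str
    versions: List[Version] = field(default_factory=list)

    def add_version(self, kind: str, location: str) -> Version:
        if kind not in VERSION_KINDS:
            raise ValueError(f"unknown version kind: {kind}")
        version = Version(len(self.versions) + 1, kind, location)
        self.versions.append(version)
        return version

    def current(self) -> Version:
        # Dissemination would normally expose only this, not the history.
        # Raises IndexError if no version has been added yet.
        return self.versions[-1]
```

The full history stays in the archive, while a dissemination package built from `current()` never needs to mention earlier versions.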

Archive and dissemination

Adding materials and metadata:
  users can add comments to data
  users can add metadata values not provided by a depositor
  users can make relationships between items, including mapping
  users can supplement the kinds of metadata and relationships in the archive
  note: all the above require moderation and supporting architecture
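
The moderation requirement could be supported by holding user contributions as pending until they are approved. The sketch below is illustrative only; `Contribution`, `ModerationQueue` and the status values are hypothetical names, and the slides do not describe ELAR's actual moderation workflow.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    item_id: str
    author: str
    kind: str   # "comment", "metadata" or "relationship"
    body: str
    status: str = "pending"


class ModerationQueue:
    def __init__(self):
        self._contributions = []

    def submit(self, contribution: Contribution) -> Contribution:
        # New contributions are stored but not yet shown to other users.
        self._contributions.append(contribution)
        return contribution

    def approve(self, contribution: Contribution) -> None:
        contribution.status = "approved"

    def visible(self, item_id: str):
        # Only approved contributions reach the access system.
        return [
            c for c in self._contributions
            if c.item_id == item_id and c.status == "approved"
        ]
```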

Archive and dissemination

Language support:
  users should be able to add comments / metadata in any language
  users should be able to navigate the archive access system via the language preference(s) of their choice
  the archive architecture needs to support translations of metadata and comments
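
Supporting translations together with an ordered list of language preferences might be sketched as a simple lookup with fallback. The function name `localized` and the language keys used below are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence


def localized(translations: Dict[str, str],
              preferences: Sequence[str],
              default: Optional[str] = None) -> Optional[str]:
    """Return the first available translation in the user's preference order."""
    for lang in preferences:
        if lang in translations:
            return translations[lang]
    return default


# A metadata value stored once per language (hypothetical keys and values).
title = {"en": "Story", "es": "Cuento"}
```

A user whose preferences are `["fr", "es", "en"]` would see `"Cuento"`, since no French value exists and Spanish is next in their list.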

Archive and dissemination

Archive services:
  advice and conversion services to depositors
  response to requests for information
  supporting communications between individuals associated with the archive

Archive and dissemination

Archive information system:
  separate metadata from materials
  avoid redundancy

Dissemination packages:
  favour embedding metadata
  redundancy is OK if it aids interpretation

Technical solutions:
  we use MySQL to support the archive
  for dissemination, we favour XML and formats allowing metadata to be embedded (PDF, BWF)
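
The contrast between the two designs can be sketched as follows: metadata is kept in its own table inside the archive, then embedded alongside the material reference when a dissemination package is built. This is an assumption-laden illustration. Python's built-in `sqlite3` stands in for MySQL so the example is self-contained, and the table names, column names and XML element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_archive():
    # In-memory SQLite as a stand-in for the MySQL archive database.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE material (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
        CREATE TABLE metadata (
            material_id INTEGER NOT NULL REFERENCES material(id),
            name TEXT NOT NULL,
            value TEXT NOT NULL
        );
    """)
    db.execute("INSERT INTO material (id, path) VALUES (1, 'store/0001.wav')")
    db.executemany(
        "INSERT INTO metadata (material_id, name, value) VALUES (?, ?, ?)",
        [(1, "title", "Story 1"), (1, "depositor", "A. Depositor")],
    )
    return db


def dissemination_xml(db, material_id):
    # Join the separately stored metadata back onto the material,
    # embedding it in a single XML element for dissemination.
    (path,) = db.execute(
        "SELECT path FROM material WHERE id = ?", (material_id,)
    ).fetchone()
    item = ET.Element("item", {"path": path})
    rows = db.execute(
        "SELECT name, value FROM metadata WHERE material_id = ? ORDER BY name",
        (material_id,),
    )
    for name, value in rows:
        ET.SubElement(item, "meta", {"name": name}).text = value
    return ET.tostring(item, encoding="unicode")
```

Inside the archive each value is stored once; the redundancy appears only in the generated package, where it helps a recipient interpret the file without access to the database.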

Conclusions

ELAR is newly opened for deposits
Key systems are in the process of development
Significant features include:
  modelling archive objects at different granularities
  modelling relationships between objects
  users can enter/define their own metadata
  users can translate information into the language of their choice
  users can navigate via the language(s) of choice