Research and Education Space: what are we going to be tangled up with?

Chiara Del Vescovo & Alex Tucker

Description

The Research & Education Space (RES) is a project being jointly delivered by Jisc, the British Universities Film & Video Council (BUFVC), and the BBC. Its aim is to bring as much as possible of the UK’s publicly-held archives, and more besides, to learners and teachers across the UK. At the heart of RES is Acropolis, a technical platform which will collect, index and organise rich structured data about those archive collections, published as Linked Open Data (LOD) on the Web. The collected data is organised around the people, places, events, concepts and things related to the items in the archive collections. If the archive assets themselves are available in digital form, the data also includes information on how to access them and the uses for which they are copyright-cleared, all in a consistent machine-readable form. Building on the Acropolis platform, applications can make use of this index, along with the source data itself, to make those collections accessible and meaningful. A project like RES is by definition full of challenges and obstacles. However, even in these early days we are discovering that not everything is as easy, or as hard, as expected. In this talk, we will give a brief overview of the Acropolis architecture, describe the lessons learnt so far, and involve the audience in a (hopefully inspiring) discussion about the hurdles we are going to face in the future.

Transcript of Research and Education Space: what are we going to be tangled up with?

Page 1: Research and Education Space: what are we going to be tangled up with?

Research and Education Space: what are we going to be tangled up with?

Chiara Del Vescovo & Alex Tucker

Page 2

Collect | Connect | Create

• Aimed at teachers, researchers, pupils

• Collect: make all the relevant resources available and discoverable

• Connect: find meaningful interrelations between these resources

• Create: knowledge and tools to make use of these resources

Page 3

• A LOD platform to collect, index, and organise metadata about publicly-held archives (and more)

• Challenges ahead! (some solved by making use of LOD)

Project RES: Technical Approach

Core Platform: “Acropolis”

1. The crawler fetches data via HTTP from published sources. Once retrieved, it is indexed by the full-text store and passed to the aggregation engine for evaluation.

2. The results of the aggregation engine's evaluation process are stored in the aggregate store, which contains minimal browse information and information about the similarity of entities.

3. The public face of the core platform is an extremely basic browsing interface (which presents the data in tabular form to aid application developers), and read-write RESTful APIs.

4. Applications may use the APIs to locate information about aggregated entities, and also to store annotations and activity data.

5. Each component employs standard protocols and formats. For example, we can make use of any capable quad-store as our aggregate store.
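Steps 1 and 2 can be illustrated with a toy, in-memory sketch. It is not how Anansi or Spindle are actually implemented. It makes three assumptions: crawled data arrives as plain (subject, predicate, object) string triples, a naive word-based inverted index stands in for the full-text store, and owl:sameAs links are the only evidence of similarity between entities. The example.org URIs are hypothetical.

```python
import re
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def is_uri(value):
    # Simplification: HTTP(S) strings are resources, everything else is a literal.
    return value.startswith(("http://", "https://"))


def build_fulltext_index(triples):
    """Toy stand-in for the full-text store: word -> subjects whose literals contain it."""
    index = defaultdict(set)
    for s, _p, o in triples:
        if not is_uri(o):
            for word in re.findall(r"\w+", o.lower()):
                index[word].add(s)
    return index


def aggregate(triples):
    """Toy stand-in for the aggregation engine: cluster entities joined by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        find(s)  # every subject becomes (at least) a singleton cluster
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_o] = root_s  # merge the two clusters

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


# Hypothetical data from two sources describing the same person.
triples = [
    ("http://example.org/a/shakespeare", RDFS_LABEL, "William Shakespeare"),
    ("http://example.org/a/shakespeare", OWL_SAME_AS, "http://example.org/b/person/42"),
    ("http://example.org/b/person/42", RDFS_LABEL, "Shakespeare, William"),
    ("http://example.org/c/hamlet", RDFS_LABEL, "Hamlet"),
]

index = build_fulltext_index(triples)
clusters = aggregate(triples)
print(sorted(index["shakespeare"]))
print(clusters)
```

Union-find is one simple way to turn pairwise sameAs statements into groups of equivalent entities. A real aggregation engine would weigh many more signals than explicit links.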

[Figure: Acropolis architecture. Components: linked data crawler (Anansi), aggregation engine (Spindle), full-text store, aggregate store, minimal browse interface & APIs (Quilt), activity store. Acropolis crawls and indexes digitised resources and their RDF metadata; the user asks Acropolis for relevant resources.]
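Point 5 says that any capable quad-store can serve as the aggregate store. A quad is a triple plus a fourth element naming the graph it belongs to. One plausible use, assumed here rather than stated on the slides, is to keep each crawled source in its own named graph so that provenance survives aggregation. The sketch writes such quads as N-Quads, a W3C standard line-based format that quad-stores commonly load. All graph and resource URIs are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def is_uri(value):
    # Simplification: HTTP(S) strings are IRIs, everything else is a plain literal.
    return value.startswith(("http://", "https://"))


def literal(value):
    # Escape the characters that must be escaped inside an N-Quads quoted string.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def term(value):
    return f"<{value}>" if is_uri(value) else literal(value)


def to_nquads(quads):
    """Serialise (graph, subject, predicate, object) tuples as N-Quads lines."""
    return "\n".join(f"<{s}> <{p}> {term(o)} <{g}> ." for g, s, p, o in quads)


# Hypothetical: everything crawled from one source document goes into one named graph.
quads = [
    ("http://example.org/graph/source-a", "http://example.org/a/hamlet", RDFS_LABEL, 'Hamlet, "the Dane"'),
    ("http://example.org/graph/source-a", "http://example.org/a/hamlet", OWL_SAME_AS, "http://example.org/b/work/7"),
]
print(to_nquads(quads))
```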

Page 4


[Figure: the Acropolis architecture diagram again, now also showing that the user exploits and “informs” Acropolis.]


Page 5

1. Which metadata?

• Currently, resource metadata is mostly oriented towards “physical proximity”, i.e., indexes reflect similarity of author’s surname, broad subject, format, media, etc.

• Heterogeneous platforms and incompatible data models, so transformations are needed

• Even when RDF is used, there is a proliferation of terms, vocabularies and formats, adopted with little (if any) validation
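The slides later describe Acropolis focusing on a selection of predicates and their equivalent terms. This is a minimal sketch of that idea. It assumes a hand-written equivalence table; the vocabulary URIs are real, but the choice of canonical terms is illustrative and not Acropolis's actual list. The sketch folds equivalent terms from different vocabularies together and reports unrecognised predicates as possible validation feedback.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical selection: accepted predicate -> canonical predicate it is treated as.
EQUIVALENT_TERMS = {
    "http://purl.org/dc/elements/1.1/title": DCTERMS_TITLE,
    DCTERMS_TITLE: DCTERMS_TITLE,
    RDFS_LABEL: RDFS_LABEL,
    "http://xmlns.com/foaf/0.1/name": RDFS_LABEL,
    "http://schema.org/name": RDFS_LABEL,
}


def normalise(triples):
    """Rewrite known predicates to their canonical form and report the rest."""
    kept, unknown = [], set()
    for s, p, o in triples:
        canonical = EQUIVALENT_TERMS.get(p)
        if canonical is None:
            unknown.add(p)  # candidate for validation feedback to the publisher
        else:
            kept.append((s, canonical, o))
    return kept, unknown


# Hypothetical input mixing three vocabularies.
triples = [
    ("http://example.org/item/1", "http://purl.org/dc/elements/1.1/title", "Hamlet"),
    ("http://example.org/person/9", "http://schema.org/name", "William Shakespeare"),
    ("http://example.org/item/1", "http://example.org/vocab/shelfmark", "shelf-42"),
]
kept, unknown = normalise(triples)
print(kept)
print(unknown)
```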

Page 6

2. Linking

• Systems that do not use RDF do not allow collection holders to express their knowledge as they wish, resulting in underspecified knowledge

• Even when RDF is used, information is often provided as literals rather than as links to URIs, leading to ad hoc solutions that are not available in a machine-readable format
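This example makes the literal-versus-link problem concrete. A creator given as the string "William Shakespeare" cannot be followed, whereas a URI can. The sketch rests on two assumptions: a hypothetical authority table keyed on normalised names, and a hand-picked set of predicates whose values should be links. Where an exact match exists, it upgrades the literal to a URI; everything else is flagged. Real reconciliation needs far more than exact string matching.

```python
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"
DCTERMS_SPATIAL = "http://purl.org/dc/terms/spatial"

# Assumption: values of these predicates should be links to resources, not strings.
LINK_PREDICATES = {DCTERMS_CREATOR, DCTERMS_SPATIAL}


def is_uri(value):
    return value.startswith(("http://", "https://"))


def link_literals(triples, authority):
    """Replace literal values with URIs from a lookup table where an exact match exists."""
    linked, unresolved = [], []
    for s, p, o in triples:
        if p in LINK_PREDICATES and not is_uri(o):
            uri = authority.get(o.strip().lower())
            if uri is not None:
                linked.append((s, p, uri))
                continue
            unresolved.append((s, p, o))  # keep it, but flag it for a human to check
        linked.append((s, p, o))
    return linked, unresolved


# Hypothetical authority file mapping normalised names to entity URIs.
authority = {
    "william shakespeare": "http://example.org/people/shakespeare",
    "stratford-upon-avon": "http://example.org/places/stratford",
}
triples = [
    ("http://example.org/item/1", DCTERMS_CREATOR, "William Shakespeare"),
    ("http://example.org/item/1", DCTERMS_SPATIAL, "Elsinore"),
    ("http://example.org/item/1", DCTERMS_CREATOR, "http://example.org/people/burbage"),
]
linked, unresolved = link_literals(triples, authority)
print(linked)
print(unresolved)
```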

Page 7

3. Usability

• Search quality: efficiency, precision, recall

• Reliability

• Lack of tools: developers have little contact with collection holders

• Licensing issues: resource licensing (not always explicit); metadata licensing; users need to be aware of what that means (note that in education things are slightly easier, e.g. blanket licensing)
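Precision and recall, two of the search-quality measures listed above, have standard definitions. Precision is the fraction of retrieved resources that are relevant, and recall is the fraction of relevant resources that are retrieved. A small sketch with hypothetical resource identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, treating results as sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention chosen here: an empty set yields 0.0 instead of raising an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 3 relevant items exist, 2 of them were found.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, r)  # 0.5 0.6666666666666666
```

The two measures pull in opposite directions: returning more results tends to raise recall and lower precision, so both need to be tracked together.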

Page 8

How does RES help?

1. “Which metadata” issue:

• data model oriented towards the content

• technical support in the generation of RDF metadata

• controlled vocabulary (Acropolis will focus on a selection of predicates and their equivalent terms)

2. “Linking”

• recommend existing relevant vocabularies or datasets to link to

• strong recommendation to link to resources whenever possible

3. “Usability”

• Resources are indexed in Acropolis to be efficiently retrieved

• Enabling users to make use of stable URIs, and to express coreference to equivalent resources

• Connected Studio

• Requirement: both individual resources and their metadata need to be explicitly licensed, and tools developed on top of Acropolis should take this information into account

• BBC - Jisc - BUFVC: Authority, Motivation
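The licensing requirement can be sketched as a filter that an application might apply before showing a resource. The sketch assumes each resource record carries two hypothetical fields, one for the licence of the item and one for the licence of its metadata, plus an application-specific set of accepted licences. A missing licence counts as not usable, which matches the requirement that both licences be explicit.

```python
from dataclasses import dataclass
from typing import Optional

CC0 = "https://creativecommons.org/publicdomain/zero/1.0/"
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"


@dataclass
class Resource:
    uri: str
    resource_licence: Optional[str]  # hypothetical field: licence of the digitised item
    metadata_licence: Optional[str]  # hypothetical field: licence of the RDF describing it


def usable(resources, accepted):
    """Keep only resources whose item and metadata licences are both explicit and accepted."""
    return [
        r for r in resources
        if r.resource_licence in accepted and r.metadata_licence in accepted
    ]


catalogue = [
    Resource("http://example.org/item/1", CC_BY_4, CC0),
    Resource("http://example.org/item/2", None, CC0),  # no explicit item licence
    Resource("http://example.org/item/3", CC_BY_4, None),  # no explicit metadata licence
]
ok = usable(catalogue, {CC0, CC_BY_4})
print([r.uri for r in ok])  # ['http://example.org/item/1']
```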

Page 9

How to get involved?

• for collection holders: get in touch!

• for developers: look here! http://acropolis.org.uk

• for both: http://bbcarchdev.github.io/inside-acropolis/

Page 10

What are we missing?

• … we welcome any experience / suggestion!

• perhaps collections? (please let us know!)
