
Semantics, Technology and Linked Data in Open Access Repositories on Agriculture and Related Sciences. IT-Enhanced Organic, Agro-Ecological and Environmental Education, September 16-17, 2010

description

This presentation provides a practical overview of current practices in creating vocabularies and linked data in agriculture and related sciences, and in authority control for bibliographic data. Finally, it presents the survey carried out by FAO between December 2009 and January 2010 on the state of the art in the use of semantics and technology in open access document repositories in agriculture and related sciences.


Page 1: Semantics, technology and linked data in open access repositories on agriculture and related sciences

Semantics, Technology and Linked Data in Open Access Repositories on Agriculture and Related Sciences

IT-Enhanced Organic, Agro-Ecological and Environmental Education, September 16-17, 2010, Budapest (Hungary)


About ourselves…

Imma Subirats & Sarah Dister: information and knowledge management specialists at FAO, actively involved in the promotion of open access in agriculture and related sciences, and in assuring the quality of repository content by implementing metadata standards, thesauri and other forms of authority control


…about FAO of the UN

It is the specialized agency of the United Nations that leads international efforts to defeat hunger

acts as a neutral forum where all nations meet as equals to negotiate agreements and debate policy

is also a source of knowledge and information.


Semantics & Technology in Open Access Document Repositories


Short introduction about…

What linked data is and its benefits

What the authority control content model means and its benefits for open access repositories in the agricultural domain

Overview of the current situation of the use of technology and semantics in open access repositories in agriculture


What can we say about Linked Data?


What is linked data?

Data that uses URIs as identifiers for the concepts described in the data, and URIs to identify the relationships between those concepts

A richer linking mechanism for the web that takes us from hypertext links (document to document) to hyperdata links (across things that documents are about)…

A term coined by Tim Berners-Lee


So?

TALIS, 2009


Linked Data Principles

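• Use URIs as names for things
• Use HTTP URIs, so that people can look up those names
• Provide useful information in RDF when someone looks up a URI
• Include RDF links to other URIs, so that more things can be discovered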
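The second and third principles mean that an HTTP URI can be looked up and should answer with RDF. A minimal Python sketch, using only the standard library, of preparing such a lookup with HTTP content negotiation; the URI http://example.org/resource/FAO and the exact Accept values are illustrative assumptions, and the request is built but not sent.

```python
import urllib.request

# Media types for two common RDF serializations (RDF/XML preferred, Turtle as fallback).
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request that asks the server for RDF."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: use HTTP URIs so that names can be looked up.
        raise ValueError(f"not an HTTP(S) URI: {uri}")
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical URI, used only for illustration.
req = build_rdf_request("http://example.org/resource/FAO")
print(req.full_url, req.get_header("Accept"))
# Dereferencing it would then be: urllib.request.urlopen(req).read()
```

Asking for RDF through the Accept header lets the same URI serve an HTML page to browsers and RDF to linked data clients.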


What is RDF?

Resource Description Framework
• RDF is the data format for linked data
• It describes relationships between things
• RDF uses URIs to name things, preferably HTTP URIs

http://www.w3.org/RDF/
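RDF describes data as triples (subject, predicate, object): the subject is a resource, the predicate is a relation, and the object is either another resource or a literal. A minimal stdlib-only Python sketch that writes such triples in the N-Triples syntax; the Dublin Core terms namespace is real, while the example.org URIs, the title and the helper names are hypothetical. In practice an RDF library (for example rdflib) would normally be used instead.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None


Term = Union[URI, Literal]


def to_ntriples_term(term: Term) -> str:
    """Serialize a resource as <uri> or a literal as "text"@lang."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    escaped = (term.text.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
    return f'"{escaped}"' + (f"@{term.lang}" if term.lang else "")


def to_ntriples(triples: list[tuple[URI, URI, Term]]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(
        f"{to_ntriples_term(s)} {to_ntriples_term(p)} {to_ntriples_term(o)} ."
        for s, p, o in triples
    )


DCTERMS = "http://purl.org/dc/terms/"
doc = URI("http://example.org/document/1")     # hypothetical resource
fao = URI("http://example.org/corporate/FAO")  # hypothetical resource

triples = [
    (doc, URI(DCTERMS + "title"), Literal("Example report", "en")),  # resource -> literal
    (doc, URI(DCTERMS + "creator"), fao),                            # resource -> resource
]
print(to_ntriples(triples))
```

The second triple is what makes this "linked": its object is a URI that can itself be described, and looked up, elsewhere.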


Graphically

TALIS, 2009

Diagram labels: Relations, Literals, Resources


RDF

TALIS, 2009


What data?

People, documents, photographs, places, journals, corporate bodies (institutions), conferences, etc.


What vocabularies?

FOAF, Dublin Core, BIBO, SKOS, etc.
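Each of these vocabularies has its own namespace URI, and linked data usually abbreviates its terms with prefixes such as foaf: or skos:. A small Python sketch expanding prefixed names into full URIs; the namespace URIs are the published ones for FOAF, DCMI Metadata Terms, BIBO and SKOS, while the pairing of data types with terms is only an illustrative assumption.

```python
# Namespace URIs of the vocabularies named on the slide.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'skos:prefLabel' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


# Which vocabulary term might describe which kind of data (illustrative pairing).
EXAMPLES = {
    "person's name": "foaf:name",
    "document title": "dcterms:title",
    "journal article": "bibo:AcademicArticle",
    "preferred label of a concept": "skos:prefLabel",
}

for what, curie in EXAMPLES.items():
    print(f"{what:30} {expand(curie)}")
```

Reusing existing vocabularies rather than inventing new terms is what lets data from different repositories be combined.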


Examples in Agriculture

AIMS

Not much yet

http://linkeddata.org/data-sets


AGROVOC

What is AGROVOC? A multilingual structured thesaurus for all subject fields in agriculture, forestry, fisheries, food and related domains

What is its purpose? To standardize the indexing process in order to make searching simpler and more efficient, and to guide the user to the most relevant sources

Who uses AGROVOC? It is downloaded on average 1,000 times per year, and individuals in about ninety countries regularly access AGROVOC online


More about AGROVOC

• It is a concept/term-based system
• Around 30,000 concepts
• 600,000 labels in around 20 languages

A knowledge base of related concepts organized in relationships (hierarchical, associative, equivalence)

One-stop shop for terminological knowledge related to agriculture in general
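A minimal Python sketch of a concept-based thesaurus of this kind, with hypothetical concepts and example.org URIs (not actual AGROVOC data or identifiers). Each concept carries preferred labels per language and alternative labels (equivalence), plus broader links (hierarchy); walking the hierarchy downward lets a search on a general concept also cover its narrower ones. The field names loosely follow SKOS (prefLabel, altLabel, broader), which is a modeling assumption here, not a description of AGROVOC's internal format.

```python
from dataclasses import dataclass, field
from typing import Optional

EX = "http://example.org/concept/"  # hypothetical namespace, not AGROVOC URIs


@dataclass
class Concept:
    uri: str
    pref_labels: dict[str, str]                                     # language -> preferred label
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # equivalence (non-preferred)
    broader: set[str] = field(default_factory=set)                  # hierarchical links upward
    related: set[str] = field(default_factory=set)                  # associative links


def narrower_closure(concepts: dict[str, Concept], uri: str) -> set[str]:
    """All concepts below `uri` in the hierarchy (transitive narrower)."""
    found: set[str] = set()
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for c in concepts.values():
            if current in c.broader and c.uri not in found:
                found.add(c.uri)
                frontier.append(c.uri)
    return found


def find_by_label(concepts: dict[str, Concept], text: str) -> Optional[str]:
    """Resolve any preferred or alternative label, in any language, to a concept URI."""
    needle = text.casefold()
    for c in concepts.values():
        labels = list(c.pref_labels.values())
        for alts in c.alt_labels.values():
            labels.extend(alts)
        if any(label.casefold() == needle for label in labels):
            return c.uri
    return None


concepts = {c.uri: c for c in [
    Concept(EX + "cereals", {"en": "cereals", "fr": "céréales", "es": "cereales"}),
    Concept(EX + "maize", {"en": "maize", "fr": "maïs", "es": "maíz"},
            alt_labels={"en": ["corn"]}, broader={EX + "cereals"}),
    Concept(EX + "wheat", {"en": "wheat", "fr": "blé", "es": "trigo"},
            broader={EX + "cereals"}),
    Concept(EX + "durum_wheat", {"en": "durum wheat", "fr": "blé dur", "es": "trigo duro"},
            broader={EX + "wheat"}),
]}

uri = find_by_label(concepts, "corn")   # non-preferred English form
print(concepts[uri].pref_labels["es"])  # preferred Spanish label of the same concept
expanded = {EX + "cereals"} | narrower_closure(concepts, EX + "cereals")
print(sorted(concepts[u].pref_labels["en"] for u in expanded))
```

Because labels hang off concepts rather than the other way round, the same index entry serves users searching in any of the supported languages.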


AGROVOC as linked data

AIMS


Concept-Based Authority Control System for Bibliographic Data


Authority Control for Bibliographic Data

Context: library information systems

Used for: access points to bibliographic records (corporate bodies, conferences, projects, journal titles…)

Definition: the technique or process of assigning a unique form of name to an entity and providing cross-references from obsolete and related forms

Scope: To bring all the works of a bibliographical entity together in one place by selecting a single form of name
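A minimal Python sketch of the definition and scope above: an authority file maps each variant form to the single authorized heading (the cross-references), and records are grouped under that heading so that all works of an entity come together. The variant forms come from the FAO example used in this presentation; the record titles, field names and function names are hypothetical.

```python
from collections import defaultdict

AUTHORIZED = "Food and Agriculture Organization of the United Nations"

# Cross-references: variant (abbreviated, incomplete or erroneous) form -> authorized form.
SEE_REFERENCES = {
    "FAO": AUTHORIZED,
    "Food and Agriculture Organization": AUTHORIZED,
    "Food and agriculture Organisations of the United Nations": AUTHORIZED,
}


def authorized_form(name: str) -> str:
    """Return the authorized heading for a name; unknown names pass through unchanged."""
    return SEE_REFERENCES.get(name.strip(), name.strip())


def group_by_author(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Bring together all works of a corporate author under one heading."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        groups[authorized_form(record["corporate_author"])].append(record["title"])
    return dict(groups)


# Hypothetical legacy records entered with different forms of the same name.
records = [
    {"title": "Report A", "corporate_author": "FAO"},
    {"title": "Report B", "corporate_author": "Food and Agriculture Organization"},
    {"title": "Report C", "corporate_author": "Food and agriculture Organisations of the United Nations"},
]
print(group_by_author(records))
```

This is the flat model: a list of strings with no notion of language and no relationships between entities, which is the limitation the concept-based system discussed later in the presentation addresses.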


Benefits

Example: the forms "FAO", "Food and Agriculture Organization" and "Food and agriculture Organisations of the United Nations" all lead to the single form "Food and Agriculture Organization of the United Nations", so a search under any of them retrieves the same FAO documents

• Efficient searching of the system
• Exhaustive search results

It improves access dramatically by providing consistency in the forms used to identify corporate authors, conferences, place names, subjects, etc.


FAO Authority Control System

Why: FAO OA Repository project → 170,000 records of legacy data managed by a flat (no cross-references) authority control system
↓
Features of the new Authority Control System:
• Concept based
• Multilingual
• URIs


Example
AUTHORIZED TERMS
English: Food and Agriculture Organization of the United Nations
French: Organisation des Nations Unies pour l'alimentation et l'agriculture
Spanish: Organización de las Naciones Unidas para la Agricultura y la Alimentación
Arabic: منظمة الأغذية والزراعة للأمم المتحدة
Russian: Продовольственная и сельскохозяйственная организация Объединенных Наций
Chinese: ....
ALTERNATIVE TERMS
Incomplete form: Food and Agriculture Organization
Acronym: FAO
Dutch form: Voedsel en landbouw Organisatie
C-C RELATIONSHIPS
Is spatially located in: Italy
Has parts: Office of Knowledge Exchange, Research and Extension


Methodology


The Authority Control Content Model

It is based on a concept-based system: a concept is represented by all the forms, preferred and non-preferred, in all languages, associated with it

A form is a word (simple term) or a multiword expression (complex term) that designates a particular concept
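A minimal Python sketch of this content model, using the forms and relationships from the FAO example above. The URI, the class and field names and the relationship keys are hypothetical, not the schema of FAO's actual system; relationship targets are plain strings here for brevity, where a real system would point to the URIs of other concepts. Because every form points to one concept, a record indexed with any form can be displayed with the preferred form in the user's language.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityConcept:
    uri: str                                                        # identifies the concept, not a string
    preferred: dict[str, str]                                       # language -> preferred (authorized) form
    alternatives: dict[str, str] = field(default_factory=dict)      # non-preferred form -> form type
    relations: dict[str, list[str]] = field(default_factory=dict)   # concept-to-concept links


def preferred_form(concepts: list[AuthorityConcept], form: str, lang: str) -> Optional[str]:
    """Map any form (preferred or not, any language) to the preferred form in `lang`."""
    for concept in concepts:
        if form in concept.preferred.values() or form in concept.alternatives:
            return concept.preferred.get(lang)
    return None


fao = AuthorityConcept(
    uri="http://example.org/authority/corporate/FAO",  # hypothetical URI
    preferred={
        "en": "Food and Agriculture Organization of the United Nations",
        "fr": "Organisation des Nations Unies pour l'alimentation et l'agriculture",
        "es": "Organización de las Naciones Unidas para la Agricultura y la Alimentación",
    },
    alternatives={
        "Food and Agriculture Organization": "incomplete form",
        "FAO": "acronym",
        "Voedsel en landbouw Organisatie": "Dutch form",
    },
    relations={
        "isSpatiallyLocatedIn": ["Italy"],
        "hasPart": ["Office of Knowledge Exchange, Research and Extension"],
    },
)

print(preferred_form([fao], "FAO", "fr"))
print(preferred_form([fao], "Voedsel en landbouw Organisatie", "es"))
```

Keying everything on the concept URI, rather than on one "main" string, is what makes the model multilingual and lets it be published as linked data.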


Content


http://202.73.13.50:54123/agrovocdevv10/


Conclusions

Drawbacks: arbitrary, politically sensitive, expensive

But properly implemented, authority control provides: sharing, standardization, simplification, consistency, reliability


Do you have any questions so far?


What can we say about the current situation of open access document repositories in the agricultural domain?


OA Document Repository

Definition: a digital archive to collect, preserve and disseminate scientific information in digital form

Benefits: immediate, universal and free access to the information available; increased visibility, usage and impact of the work of researchers and institutions

Importance: making knowledge accessible → vital to (agricultural) development


Survey

Why: to obtain a better understanding of the current situation and to identify trends and issues that need attention

How: 30 questions divided into thematic groups; a web-based survey on the CIARD Ring; mail sent to 150 institutions and 9 specialized mailing lists


General
Data collection: 82 repositories completed the survey

Type of institution: mostly universities; a minority of governmental, international and non-governmental organizations

Year of foundation: between 1993 and 2009

1993-2000: 1-2 repositories a year; from 2001: substantial increase in growth ↕ promotion of OA


OAI-PMH

Open Archives Initiative Protocol for Metadata Harvesting

Purpose: to improve the interoperability of digital repositories by exposing and harvesting metadata

45% do not use DC (Dublin Core) as the metadata set to export data → 55% are not OAI-PMH compliant
70% are not interested in improving their metadata
↓
Promotion of OAI-PMH
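To make this concrete, a minimal stdlib-only Python sketch of the harvesting side of OAI-PMH: building a ListRecords request for simple Dublin Core (metadataPrefix=oai_dc, the format every compliant repository must offer) and extracting Dublin Core fields from a response. The verb, parameter and namespace URIs follow the OAI-PMH 2.0 specification; the base URL and the trimmed sample response are hypothetical (a real response also carries responseDate and request elements, and large result sets are paged with a resumptionToken).

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urllib.parse.urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier and Dublin Core titles/creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI}}}record"):
        identifier = record.findtext(f"{{{OAI}}}header/{{{OAI}}}identifier")
        records.append({
            "identifier": identifier,
            "titles": [e.text for e in record.iter(f"{{{DC}}}title")],
            "creators": [e.text for e in record.iter(f"{{{DC}}}creator")],
        })
    return records


# Hypothetical, trimmed response from a repository exposing oai_dc.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example report</dc:title>
          <dc:creator>Food and Agriculture Organization of the United Nations</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

print(list_records_url("http://repository.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A repository that cannot export Dublin Core cannot answer this request, which is why missing DC support and non-compliance go together in the survey figures.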


Authority Control

Bibliographical concepts: 62% do not use authority control; when it is used, it is especially for journal titles
50% would be interested in applying an authority control system
↓
Promotion


Software

AIMS


Software: comparison with other repositories


Summary

OAI-PMH: interoperability
Authority control: accessibility
Software: standardization

CIARD Ring: the data collected have been added to the repository profiles on the CIARD Ring


Thank you for your attention