Linked data and language technologies

description

This set presents the concept of Linguistic Linked Licensed Data (known as 3LD) and the LIDER project (http://www.lider-project.eu/). The project's mission is to provide the basis for a Linguistic Linked Data cloud that can support content analytics tasks on unstructured, multilingual, cross-media content. By achieving this goal, LIDER aims to increase the ease and efficiency with which Linguistic Linked Data can be exploited in content analytics processes.

Transcript of Linked data and language technologies

Page 1: Linked data and language technologies

20/03/2014

Linked Data and Language Technologies: The LIDER project

A. Gómez-Pérez (UPM)

Project Coordinator

CSA

Budget: €1,482,000

Starting date: 1 Nov. 2013

Duration: 2 years

Page 2: Linked data and language technologies

• Motivation

• Linked Data for Language Technologies

• What is LIDER about

Page 3: Linked data and language technologies

Heterogeneity of Linguistic Resources

• Ecosystem of

  – Open and closed resources

  – Complementary resources

    • Lexicons

    • Corpora

    • Dictionaries

    • …

  – Heterogeneous formats

    • E.g., for lexicons: Lexinfo, LMF, LIR, Lemon, …

  – Language resources available on the web

    • Meta-share, ELDA, ELRA, Clarin, FLaReNet, MultiJEDI, …

Page 4: Linked data and language technologies

Limitations when exploiting LRs

• The process of finding and integrating LRs in third-party applications is manual and time-consuming

• LR metadata

  – cannot be queried using a common language (e.g. SPARQL)

• LR content

  – is available in heterogeneous formats

  – is not linked with other linguistic content

Language resources and the technologies supported are still far from being Free, Open and Interoperable

Page 5: Linked data and language technologies

An example: “Red” (computer network)

http://es.wiktionary.org

http://rae.es

http://www.wikilengua.org/index.php/Terminesp:red

http://es.wikipedia.org

http://www.wordreference.com/sinonimos/

Page 6: Linked data and language technologies

http://rae.es

Complex queries using data from heterogeneous sources

Page 7: Linked data and language technologies

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

http://es.wiktionary.org

http://rae.es

Page 8: Linked data and language technologies

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

http://es.wiktionary.org

http://rae.es

http://www.wikilengua.org/index.php/Terminesp:red

Page 9: Linked data and language technologies

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

http://es.wiktionary.org

http://rae.es

http://www.wikilengua.org/index.php/Terminesp:red

http://www.wordreference.com/sinonimos/

Page 10: Linked data and language technologies

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

http://es.wiktionary.org

http://rae.es

http://www.wikilengua.org/index.php/Terminesp:red

http://es.wikipedia.org

http://www.wordreference.com/sinonimos/

Page 11: Linked data and language technologies

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

“Red”

Etymology: from Latin “rete”

Gender: “f”

Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí….” (a set of computers or computing devices connected to one another)

“Red”

Synonyms: “sistema”, “malla”, “distribución”

“Red”

Norm: UNE 21302-131

English: network

German: Netzwerk

“Red”

Pronunciation: [red]

Grammatical category: sustantivo femenino (feminine noun)

Singular: “red”

Plural: “redes”

“Red_de_computadores”

Category: redes informáticas (computer networks)

Image

Complementary but not connected

Page 12: Linked data and language technologies

LD allows linguistic data integration

[Diagram: linked-data graph for the lexical entry “Red”. Labels shown: forms with number singular (written form “red”, phonetic form [RED]) and plural (phonetic form [REDES]); gender feminine; senses with written forms “red” and “malla”, marked as equivalent; an image; and an es-en translation between senses with written forms “red” and “network”.]
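As a rough illustration of the integration shown on this slide, the sketch below merges facts about “red” from several sources once they agree on one identifier. It is plain Python rather than an RDF library, and the `ex:` identifiers, the property names, and the `merge` and `describe` helpers are assumptions made up for this example; they are not taken from LIDER or from any vocabulary named in the slides.

```python
# Toy illustration: each source contributes (subject, predicate, object)
# triples about the same entry. All identifiers are hypothetical.
ENTRY = "ex:red"  # assumed shared identifier for the entry "red"

dictionary = {
    (ENTRY, "ex:gender", "feminine"),
    (ENTRY, "ex:etymology", "Latin 'rete'"),
}
thesaurus = {
    (ENTRY, "ex:synonym", "malla"),
    (ENTRY, "ex:synonym", "sistema"),
}
terminology = {
    (ENTRY, "ex:translation_en", "network"),
    (ENTRY, "ex:translation_de", "Netzwerk"),
}
morphology = {
    (ENTRY, "ex:singular", "red"),
    (ENTRY, "ex:plural", "redes"),
}


def merge(*sources):
    """Union the triple sets; the shared identifier makes the data line up."""
    graph = set()
    for source in sources:
        graph |= source
    return graph


def describe(graph, subject):
    """Collect every property of one subject into a dict of sorted value lists."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, []).append(o)
    return {p: sorted(values) for p, values in description.items()}


graph = merge(dictionary, thesaurus, terminology, morphology)
entry = describe(graph, ENTRY)
print(entry["ex:synonym"])
```

The only thing the sources have to agree on here is the identifier: once they do, the merge is a plain set union and no per-source conversion code is needed.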

Page 13: Linked data and language technologies

LD as a possible solution

• Agree on 21st century vocabularies for describing resource metadata and content

• Unified and standardized language for describing resources (RDF(S))

• Unified and standardized query language (SPARQL)

• Standardized non-proprietary APIs

• Links to other resources
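To make the point about a single query language concrete, here is a toy triple-pattern matcher in plain Python that stands in for SPARQL. It is only a sketch: the `match` and `query` helpers, the `ex:` identifiers, and the sample data (including the gloss “mesh”) are invented for illustration, and real SPARQL offers far more than this join over patterns.

```python
def match(graph, pattern):
    """Yield variable bindings for one (s, p, o) pattern; '?x' marks a variable."""
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(graph, patterns):
    """Join several patterns on their shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that are already bound before matching.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(graph, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# Hypothetical data spread over what could be two different resources.
graph = {
    ("ex:red", "ex:synonym", "ex:malla"),
    ("ex:red", "ex:translation_en", "network"),
    ("ex:malla", "ex:translation_en", "mesh"),
}

# Roughly: SELECT ?syn ?en WHERE { ex:red ex:synonym ?syn .
#                                  ?syn ex:translation_en ?en }
results = query(graph, [
    ("ex:red", "ex:synonym", "?syn"),
    ("?syn", "ex:translation_en", "?en"),
])
print(results)
```

The second pattern reuses `?syn`, so the query follows a link from one entry to another, which is the kind of cross-resource question that is hard to ask when each resource has its own format and API.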

Page 14: Linked data and language technologies

Linked Data for Language Technologies

Page 15: Linked data and language technologies

Linked Open Data and Language

1. LOD is increasingly multilingual

2. LOD interconnects resources

– In many domains

– In many languages

How many Linguistic Resources are exposed in RDF?

Page 16: Linked data and language technologies

Linked Data and Language Resources

Linguistic LOD (LLOD): subset of LOD

Linguistic domain

Open License

Resources in RDF

Interconnected with other LD resources

• Long-term experience

• Huge number of resources

• Maturity

• Curation

• Legal liability

Page 17: Linked data and language technologies

The LIDER project

Page 18: Linked data and language technologies

The LIDER consortium

Universidad Politécnica de Madrid (UPM, Spain) [COORDINATOR]

Trinity College Dublin (Ireland)

DFKI (Germany)

National University of Ireland, Galway (Ireland)

Institut für Angewandte Informatik EV (INFAI, Germany)

University of Bielefeld (Germany)

Università degli Studi di Roma La Sapienza (Italy)

GEIE ERCIM (France)

Page 19: Linked data and language technologies

What is 3LD?

3LD: Linguistic Linked Licensed Data

• Linguistic: language resources such as lexica, corpora, dictionaries, ...

• Linked: using RDF and standard data models (vocabularies) for lexica and corpora, e.g. NIF (NLP Interchange Format)

• Licensed: published along with a machine-readable license, e.g. expressed in ODRL (Open Digital Rights Language)

Page 20: Linked data and language technologies

Challenge

• Which extensions to the LOD are needed to support a new generation of large-scale content analytics applications that will overcome language barriers?

  – Expose linguistic resources in LD format with license information

    • Metadata

    • Content

  – Guidelines for Linguistic Linked Licensed Data (3LD)

  – Specification of a new generation of 3LD-aware NLP services

• Requirements:

  – Keep track of the license information

  – Keep track of the provenance of the resource

  – Keep track of the use of the resource
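The three tracking requirements above can be pictured with a small record type. This is a sketch under assumptions, not a LIDER specification: the `ResourceRecord` class, its field names, the `usable_by` helper, the resource names, and the `example.org` URLs are all hypothetical, and a real 3LD setup would express this information in RDF rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """Hypothetical metadata record for one language resource."""
    name: str
    license_uri: str  # license information, as a machine-readable reference
    source: str       # provenance: where the resource came from
    uses: list = field(default_factory=list)  # log of services that used it

    def record_use(self, service: str) -> None:
        """Append one consuming service to the usage log."""
        self.uses.append(service)


def usable_by(records, allowed_licenses):
    """Keep only the resources whose license is in the allowed set."""
    return [r for r in records if r.license_uri in allowed_licenses]


# Assumed policy: this consumer may only use CC BY 4.0 resources.
ALLOWED = {"https://creativecommons.org/licenses/by/4.0/"}

records = [
    ResourceRecord(
        name="lexicon-es",
        license_uri="https://creativecommons.org/licenses/by/4.0/",
        source="https://example.org/lexicon-es",
    ),
    ResourceRecord(
        name="corpus-private",
        license_uri="https://example.org/licenses/private",
        source="https://example.org/corpus-private",
    ),
]

open_only = usable_by(records, ALLOWED)
for record in open_only:
    record.record_use("tagger")  # hypothetical NLP service name

print([r.name for r in open_only])
```

Because the license is data rather than a sentence in a README, a service can decide automatically which resources it is allowed to load, and the usage log gives the resource owner a trace of who consumed it.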

Page 21: Linked data and language technologies

LOD as large background knowledge for NLP

[Diagram. Labels shown: Producers; Multimedia and multilingual content; Metadata generation; Metadata as LD; Consumers; Content analytics; Language resources (lexicons, corpora, ...), some of which are FOI (Free, Open and Interoperable) while others are private; Linguistic LOD generation (metadata and content); Language resources as LD; LOD-aware NLP services.]

Page 22: Linked data and language technologies

Industry use cases

1. Roadmap on 3LD for Content Analytics

2. Guidelines for 3LD

3. 3LD Reference Architecture

Community building and networking: LD4LT, BP-MLOD W3C-CG, OntoLex W3C-CG

– Surveys

– Requirements

Page 23: Linked data and language technologies

Community Building

• Industrial Board

• Open community events tailored to the different audiences

  – Roadmapping workshops 2014

    • 21 March, EDF (Athens)

    • 7-8 May, Multilingual Web WS (Madrid)

    • 26-27 May, WS on Emotions (LREC – Reykjavik)

    • 27 May, WS on LD and Linguistics (LREC – Reykjavik)

    • 4-6 June, WS at Localization World (Dublin)

    • 2 September, WS at the Semantics Conference (Leipzig)

  – Publication of best-practice material via W3C community groups

    • LD4LT

    • BP-MLOD W3C-CG

    • OntoLex W3C-CG

  – Hackathon in September at the Semantics Conference (Leipzig)

  – Surveys of the localization industry and general Web companies

Page 24: Linked data and language technologies

Expected Contributions from the Community

• Use case definition from industry will be input to the roadmap

• Linguistic resources LLOD

• Validation of guidelines and reference architecture

• Participation in surveys

• Participation in events:

– Roadmapping WS, hackathons, etc.

LIDER will help with travel grants for participants in roadmapping WS

Page 25: Linked data and language technologies

Web channels

www.lider-project.eu

twitter.com/multilingweb

Hashtag: #LiderEU

Join the community

www.w3c.org/community/ld4lt