From Linked Data to Semantic Applications

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

description

In this talk we will discuss how to build, today, semantically intelligent systems, i.e. systems with the ability to process and interpret information by its meaning. We will take a multidisciplinary perspective, showing how recent advances in other computer science areas such as Information Retrieval and Natural Language Processing can enable, together with Linked Data and Semantic Web resources, the construction of the next generation of information systems. A summary of the core principles and available resources from these areas will give a concrete understanding of how to jump-start your own semantic system.

Transcript of From Linked Data to Semantic Applications

Page 1: From Linked Data to Semantic Applications


Page 2: From Linked Data to Semantic Applications

The Semantic Web vision & Linked Data

Multi-disciplinary perspective

Linked Data, IR, NLP

Case study: Treo

Talking to the Linked Data Web

Semantic application patterns

Take-away message

Page 3: From Linked Data to Semantic Applications
Page 4: From Linked Data to Semantic Applications

2001:

Software which is able to understand meaning (intelligent, flexible)

Leveraging the Web for information scale

Page 5: From Linked Data to Semantic Applications

What was the plan to achieve it?

Build a Semantic Web Stack

Which covers both representation and reasoning

Page 6: From Linked Data to Semantic Applications

Adoption:

No significant data growth

Ontologies are not straightforward to build:

People are not familiar with the tools and principles

Difficult to keep consistency at Web scale

Scalability

Page 7: From Linked Data to Semantic Applications

Problems:

Consistency

Scalability

Logic World

Web World

Page 8: From Linked Data to Semantic Applications

The Web as a Huge Database

Fundamental step for data creation

2006:

Page 9: From Linked Data to Semantic Applications

Where is the intelligence and flexibility?

We will come back to this point in a minute

Page 10: From Linked Data to Semantic Applications

Data Model Features:

Graph-based data model

Extensible schema

Entity-centric data integration

Specific Features:

Designed over open Web standards

Based on the Web infrastructure (HTTP, URIs)

Page 11: From Linked Data to Semantic Applications

Positives:

Solid adoption in the Open Data context (eGovernment, eScience, etc.)

Existing data is relevant (you can build real applications)

Negatives:

Data consumption is a problem

Data generation beyond database mapping/triplification is also a problem

Still far from the Semantic Web vision

Page 12: From Linked Data to Semantic Applications
Page 13: From Linked Data to Semantic Applications

How to address the previous challenges?

Linked Data:

Web-scale structured data representation

Information Retrieval:

Search, approximation, ranking strategies

Scalability

Natural Language Processing (NLP):

Analysing natural language

Semantic approximation (distributional semantics)

Page 14: From Linked Data to Semantic Applications

IBM Watson approach

Page 15: From Linked Data to Semantic Applications
Page 16: From Linked Data to Semantic Applications

From which university did the wife of Barack Obama graduate?

With Linked Data we are still in the DB world

Page 17: From Linked Data to Semantic Applications

With Linked Data we are still in the DB world (but slightly worse)

Page 18: From Linked Data to Semantic Applications
Page 19: From Linked Data to Semantic Applications

From which university did the wife of Barack Obama graduate?

Page 20: From Linked Data to Semantic Applications

): Direction, path

Demonstration

Page 21: From Linked Data to Semantic Applications
Page 22: From Linked Data to Semantic Applications
Page 23: From Linked Data to Semantic Applications
Page 24: From Linked Data to Semantic Applications
Page 25: From Linked Data to Semantic Applications
Page 26: From Linked Data to Semantic Applications
Page 27: From Linked Data to Semantic Applications
Page 28: From Linked Data to Semantic Applications
Page 29: From Linked Data to Semantic Applications
Page 30: From Linked Data to Semantic Applications

Transform natural language queries into triple patterns

Steps:

Entity Recognition

Dependency parsing

Query Pattern detection

Query Planning

“From which university did the wife of Barack Obama graduate?”

prep(graduate-10, From-1)

det(university-3, which-2)

pobj(From-1, university-3)

aux(graduate-10, did-4)

det(wife-6, the-5)

nsubj(graduate-10, wife-6)

prep(wife-6, of-7)

nn(Obama-9, Barack-8)

pobj(of-7, Obama-9)

root(ROOT-0, graduate-10)

From/IN which/WDT university/NN did/VBD the/DT wife/NN of/IN Barack/NNP Obama/NNP graduate/VB ?/.

Using NLP
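The steps on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Treo's actual implementation: it reads the typed dependencies listed above as plain strings, treats the single `nn` compound as the named entity, and walks from that entity up to the root to order the remaining query terms. The function names `parse_dependencies` and `query_terms`, and the ordering rule itself, are hypothetical.

```python
import re

# Typed dependencies copied from the slide.
DEPS = """
prep(graduate-10, From-1)
det(university-3, which-2)
pobj(From-1, university-3)
aux(graduate-10, did-4)
det(wife-6, the-5)
nsubj(graduate-10, wife-6)
prep(wife-6, of-7)
nn(Obama-9, Barack-8)
pobj(of-7, Obama-9)
root(ROOT-0, graduate-10)
"""

DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(text):
    """Return (relation, (head_word, head_idx), (dep_word, dep_idx)) tuples."""
    deps = []
    for line in text.strip().splitlines():
        match = DEP_RE.fullmatch(line.strip())
        if match:
            rel, hw, hi, dw, di = match.groups()
            deps.append((rel, (hw, int(hi)), (dw, int(di))))
    return deps


def query_terms(deps):
    """Order the query terms, starting from the named entity."""
    # Each token has exactly one head, so this maps token -> (head, relation).
    parent = {dep: (head, rel) for rel, head, dep in deps}

    # Assumption: the query contains exactly one `nn` compound (the entity).
    head, modifier = [(h, d) for rel, h, d in deps if rel == "nn"][0]
    entity = " ".join(w for w, _ in sorted([head, modifier], key=lambda t: t[1]))
    terms = [entity]

    # Walk up from the entity to the root verb, skipping prepositions.
    node = head
    while parent[node][1] != "root":
        node = parent[node][0]
        if parent[node][1] != "prep":
            terms.append(node[0])
    root = node

    # The answer type is the object of a preposition attached to the root.
    for rel, h, d in deps:
        if rel == "pobj" and parent.get(h) == (root, "prep"):
            terms.append(d[0])
    return terms


print(query_terms(parse_dependencies(DEPS)))
# ['Barack Obama', 'wife', 'graduate', 'university']
```

Each adjacent pair of terms in the result can then be read as one triple pattern to resolve against the data, with the entity as the starting point.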

Page 31: From Linked Data to Semantic Applications

Using NLP

Query:

Page 32: From Linked Data to Semantic Applications

Entity Search:

Build an entity index (instances)

Extract terms from URIs and index the terms using your favourite IR framework

Search instances by keywords

Using IR
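As a rough sketch of the entity search idea above, the following builds a tiny in-memory inverted index from URI terms. It is only a stand-in for a real IR framework such as Lucene/Solr; the helper names (`uri_terms`, `build_index`, `search`), the sample URI list and the match-count scoring are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Sample instance URIs, chosen for illustration only.
URIS = [
    "http://dbpedia.org/resource/Barack_Obama",
    "http://dbpedia.org/resource/Michelle_Obama",
    "http://dbpedia.org/resource/Princeton_University",
]


def uri_terms(uri):
    """Split the last URI segment into lowercase terms."""
    local = re.split(r"[/#]", uri.rstrip("/"))[-1]
    # Insert a space at camelCase boundaries, then split on separators.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def build_index(uris):
    """Inverted index: term -> set of URIs containing that term."""
    index = defaultdict(set)
    for uri in uris:
        for term in uri_terms(uri):
            index[term].add(uri)
    return index


def search(index, keywords):
    """Rank URIs by the number of query keywords they match."""
    scores = defaultdict(int)
    for term in keywords.lower().split():
        for uri in index.get(term, ()):
            scores[uri] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))


index = build_index(URIS)
print(search(index, "Barack Obama"))
```

A production index would add proper ranking (e.g. TF-IDF) and labels from the data, but the structure is the same: terms extracted from instances, searched by keyword.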

Page 33: From Linked Data to Semantic Applications

Using IR

Query

Linked Data Web

Page 34: From Linked Data to Semantic Applications

Use distributional semantics to semantically match query terms to predicates and classes

Distributional principle: Words that co-occur tend to have related meanings

Allows the creation of a comprehensive semantic model from unstructured text

Based on statistical patterns over large amounts of text

No human annotations

Distributional semantics can be used to compute a semantic relatedness measure between two words

Using NLP and IR

Page 35: From Linked Data to Semantic Applications

Computation of a measure of “semantic proximity” between two terms

Allows approximate semantic matching between query terms and dataset terms (predicates and classes)

It supports a reasoning-like behavior based on the knowledge embedded in the corpus

Using NLP and IR
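To make the idea of a semantic relatedness measure concrete, here is a minimal sketch using sentence-level co-occurrence counts and cosine similarity. It is a simple stand-in for a real distributional model built over a large corpus; the four-sentence corpus, the stopword list and the function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration; a real model would use a large
# text collection such as Wikipedia.
CORPUS = [
    "his wife attended the ceremony with the president",
    "her spouse attended the ceremony with the president",
    "the river flows past the old campus",
    "students graduate from the university campus",
]
STOPWORDS = {"the", "his", "her", "with", "from"}


def cooccurrence_vectors(sentences):
    """Map each word to counts of the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in sentence.lower().split() if w not in STOPWORDS]
        for word in words:
            for context in words:
                if context != word:
                    vectors[word][context] += 1
    return vectors


def relatedness(vectors, a, b):
    """Cosine similarity between the context vectors of two words."""
    va, vb = vectors.get(a, Counter()), vectors.get(b, Counter())
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(vectors, term, candidates):
    """Sort candidate terms by relatedness to the query term."""
    return sorted(candidates, key=lambda c: relatedness(vectors, term, c), reverse=True)


VECTORS = cooccurrence_vectors(CORPUS)
print(rank(VECTORS, "wife", ["river", "spouse", "university"]))
```

Here "wife" and "spouse" never appear in the same sentence, yet they rank as related because they share contexts. This is the behaviour that lets a query term match a differently named predicate.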

Page 36: From Linked Data to Semantic Applications

Query

Linked Data Web

Using NLP and IR

Which properties are semantically related to ‘wife’?

Page 37: From Linked Data to Semantic Applications

Using NLP and IR

Query

Linked Data Web

Page 38: From Linked Data to Semantic Applications

Using NLP and IR

Query

Linked Data Web

Page 39: From Linked Data to Semantic Applications

Query

Linked Data Web

Using NLP and IR

Page 40: From Linked Data to Semantic Applications

Semantic approximation in databases (as in any IR system): semantic best-effort

Need some level of user disambiguation, refinement and feedback

As we move in the direction of semantic systems we should expect the need for principled dialog mechanisms (like in human communication)

Pull the user interaction back into the system

Using NLP and IR
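One simple way to picture a best-effort system with user disambiguation is sketched below: when the top-ranked candidates score too closely, the system asks instead of guessing. The `resolve` function, the `margin` threshold and the candidate scores are hypothetical and only illustrate the interaction.

```python
def resolve(candidates, margin=0.1, ask=input):
    """Pick the best candidate, asking the user when the top scores are close.

    candidates: list of (label, score) pairs from the matching step.
    margin:     minimum score gap needed to answer without asking.
    ask:        callable that shows a prompt and returns the user's reply.
    """
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_score = ranked[0][1]
    if len(ranked) == 1 or best_score - ranked[1][1] >= margin:
        return ranked[0][0]

    # Ambiguous: offer every candidate within `margin` of the best one.
    close = [label for label, score in ranked if best_score - score < margin]
    options = ", ".join(f"[{i}] {label}" for i, label in enumerate(close))
    choice = int(ask(f"Did you mean: {options}? "))
    return close[choice]


# Scores below are made up; the lambda simulates a user choosing option 0.
matches = [("spouse", 0.82), ("partner", 0.78), ("birthPlace", 0.10)]
print(resolve(matches, ask=lambda prompt: "0"))
```

The user's choice could also be stored and fed back into the ranking, which is one way to pull the interaction back into the system.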

Page 41: From Linked Data to Semantic Applications
Page 42: From Linked Data to Semantic Applications
Page 43: From Linked Data to Semantic Applications
Page 44: From Linked Data to Semantic Applications

Derived from the experience developing Treo

Not restricted to queries over Linked Data

The following list is not intended to be complete

Page 45: From Linked Data to Semantic Applications

Pattern #1: Maximize the amount of knowledge in your semantic application

Meaning interpretation depends on knowledge

Using LOD: DBpedia, Freebase, YAGO can give you a very comprehensive set of instances and their types

Wikipedia can provide you with a comprehensive distributional semantic model

Page 46: From Linked Data to Semantic Applications

Pattern #2: Allow your database to grow

Dynamic schema

Entity-centric data integration
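A minimal sketch of what a dynamic schema with entity-centric integration can look like in code: records from different sources are merged under the entity's URI, and new properties are accepted without any schema change. The `integrate` function and the sample records are assumptions for illustration.

```python
from collections import defaultdict


def integrate(*sources):
    """Merge (uri, property, value) records from several sources by entity URI."""
    store = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for uri, prop, value in source:
            # Unknown properties are simply added; there is no fixed schema.
            store[uri][prop].add(value)
    return store


OBAMA = "http://dbpedia.org/resource/Barack_Obama"

# Two hypothetical sources describing the same entity with different properties.
source_a = [(OBAMA, "birthPlace", "Honolulu")]
source_b = [(OBAMA, "spouse", "Michelle Obama")]

store = integrate(source_a, source_b)
print(sorted(store[OBAMA]))
```

Because the URI is the join key, adding a third source with yet another property requires no migration step.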

Page 47: From Linked Data to Semantic Applications

Pattern #3: Once the database grows in complexity, use semantic search instead of structured queries

Instances can be used as pivot entities to reduce the search space

They are easier to search

Higher specificity and lower vocabulary variation
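The pivot idea can be sketched as follows: once the instance is found, only the predicates attached to it need to be compared with the next query term, and the walk continues from the resulting node. The triples, the relatedness scores and the function names are toy assumptions; in practice the scores would come from a distributional model.

```python
# Toy triples for illustration (local names only).
TRIPLES = [
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Barack_Obama", "birthPlace", "Honolulu"),
    ("Michelle_Obama", "almaMater", "Princeton_University"),
    ("Michelle_Obama", "birthPlace", "Chicago"),
    ("Honolulu", "country", "United_States"),
]

# Hypothetical relatedness scores between query terms and predicates.
RELATED = {("wife", "spouse"): 0.9, ("graduate", "almaMater"): 0.8}


def best_step(triples, subject, term, related):
    """Among triples leaving `subject`, follow the predicate most related to `term`."""
    candidates = [(p, o) for s, p, o in triples if s == subject]
    return max(candidates, key=lambda po: related.get((term, po[0]), 0.0))


def answer(triples, pivot, terms, related):
    """Walk from the pivot entity, resolving one query term per step."""
    node, path = pivot, []
    for term in terms:
        predicate, node = best_step(triples, node, term, related)
        path.append(predicate)
    return node, path


print(answer(TRIPLES, "Barack_Obama", ["wife", "graduate"], RELATED))
# ('Princeton_University', ['spouse', 'almaMater'])
```

At each step only the pivot's own predicates are considered, so the matching cost depends on the entity's neighbourhood and not on the size of the whole dataset.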

Page 48: From Linked Data to Semantic Applications

Pattern #4: Use distributional semantics and semantic relatedness for a robust semantic matching

Distributional semantics allows your application to digest (and make use of) large amounts of unstructured information

Multilingual solution

Can be complemented with WordNet

Page 49: From Linked Data to Semantic Applications

Pattern #5: POS-Tags, Syntactic Parsing + Rules will go a long way to interpret natural language queries and sentences

Use them to explore the regularities in natural language

Define a scope for natural language processing in your application (restrict by domain, syntactic complexity)

These tools are easy to use and quite robust (at least for English)

Page 50: From Linked Data to Semantic Applications

Pattern #6: Provide a user dialog mechanism in the application

Improve the semantic model with user feedback

Page 51: From Linked Data to Semantic Applications

Part of the Semantic Web vision can be addressed today with a multi-disciplinary perspective

Linked Data, IR and NLP

You can build your own IBM Watson-like application

Both data and tools are available and ready to use: the barrier is the mindset

Large opportunity for new solutions

Page 52: From Linked Data to Semantic Applications

NLP

WordNet

VerbNet

Stanford parser

C&C parser/Boxer

NLTK

DBpedia Spotlight

Gate

UIMA

IR

Lucene/Solr

Terrier

Datasets

DBpedia

Freebase

YAGO

Tools that will be available soon:

Treo

Treo-ESA

Graphia

Page 53: From Linked Data to Semantic Applications


Page 54: From Linked Data to Semantic Applications

andrefreitas.org

andre (dot) freitas – at – deri (dot) org