Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

30
Delft University of Technology Linkator: enriching web pages by automatically adding dereferenceable semantic annotations Samur Araujo, Geert-Jan Houben, Daniel Schwabe Web Information Systems Delft University of Technology, the Netherlands

description

In this paper, we introduce Linkator, an application architecture that exploits semantic annotations for automatically adding links to previously generated web pages. Linkator provides a mechanism for dereferencing these semantic annotations with what it calls semantic links. Automatically adding links to web pages improves the users’ navigation. It connects the visited page with external sources of information that the user can be interested in, but that were not identified as such during the web page design phase. The process of auto-linking encompasses: finding the terms to be linked and finding the destination of the link. Linkator delegates the first stage to external semantic annotation tools and it concentrates on the process of finding a relevant resource to link to. In this paper, a use case is presented that shows how this mechanism can support knowledge workers in finding publications during their navigation on the web.

Transcript of Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

Page 1: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

DelftUniversity ofTechnology

Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

Samur Araujo, Geert-Jan Houben, Daniel SchwabeWeb Information SystemsDelft University of Technology, the Netherlands

Page 2: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

2Enriching web pages with dereferenceable semantic annotations

Summary – dereferencing semantic annotations

• What dereferencing semantic annotations is about?• Automatic linking web pages.

• Summary1. Overview of the problem and motivation.2. Our approach for solving the problem.3. One example of use.

Page 3: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

3Enriching web pages with dereferenceable semantic annotations

Motivation

• Links between HTML pages are the main mechanism to navigate on web pages.

• However, a lot of pages are unlinked or poorly linked.

• Terms on pages have meaning and are intrinsically associated to concepts or entities that the user is interested in.

• These terms can be interpreted by machines and automatically linked to relevant resources on the web.

Page 4: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

4Enriching web pages with dereferenceable semantic annotations

Page 5: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

5Enriching web pages with dereferenceable semantic annotations

Problem Statement

The problem of automatic linking can be divided in 3 sub-problems:

1. How to identify candidate terms (anchors) for adding links?• It denotes concepts in which the user is interested.

2. Which concept does a candidate term represent?• Disambiguate a candidate term.

3. How to identify a web resource to be the link target?• How to select a source of data for finding the destination of the

link?

Page 6: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

6Enriching web pages with dereferenceable semantic annotations

State-of-the-Art in Automatic Linking

• Candidate Terms:• Focused on term disambiguation using an auxiliary knowledge base

or dictionaries (e.g. wikipedia and wordnet).• Link Target:

• It is selected from a specific knowledge base [1] or from a collection

[2] of target documents.

• Limitations• Does not support well users interested in a broader range of

domains.

[1] Mihalcea, R. and Csomai, A. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the 16th

ACM Conference on Information and Knowledge management (CIKM 07), Lisbon, Portugal, pp. 233-242, 2007.

[2] Gardner JJ, Krowne A, Xiong L. NNexus: An Automatic Linker for Collaborative Web-Based Corpora. IEEE Trans.

Knowl. Data Eng. 21(6). 829-839. 2009.

Page 7: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

7Enriching web pages with dereferenceable semantic annotations

Linkator Approach

Extract Terms from Web Pages

Associate Terms to Concepts

Find Resources that Represents these Concepts

Information Extraction Engine Core LinkatorSemantic Annotator

Linkator

Page 8: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

8Enriching web pages with dereferenceable semantic annotations

Page is accessed

Term are extracted

Page is semantically annotated

Annotated page

Annotation is extracted

Endpoint is chosen

Query is formulated

Search for a resource

Page Accessed Link Clicked

If notfound

Semantic Links created

Page 9: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

9Enriching web pages with dereferenceable semantic annotations

Linkator Approach

Linkator Client - Firefox Plugin

Web Browser

HTTP

Annotator

RDFa Annotator

Information Extraction Engine

HTTP Linkator Server

Linked Data

Query FormulationSparql

Endpoint Resolution

Page 10: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

10Enriching web pages with dereferenceable semantic annotations

Semantic Link – Definition

• A semantic link is an HTML tag A that is semantically annotated with RDFa.

• It contains RDF triples associated to it.

• Semantic Link causes a query over Linked Data.

Page 11: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

11Enriching web pages with dereferenceable semantic annotations

Semantic Links

RDF Triples associated to the Semantic Link

Page 12: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

12Enriching web pages with dereferenceable semantic annotations

Dereferencing Semantic Links

• Linkator uses the Linked Data cloud for discovering a destination for the semantic link as opposed to querying search engines or a fixed knowledge base.

• Algorithm for Endpoint Resolution

• Algorithm for Query Formulation

Page 13: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

13Enriching web pages with dereferenceable semantic annotations

Endpoint Resolution

• Task: Find endpoints that contain a specific concept.

• Linkator selects available endpoints based on the vocabularies used in the semantic links. voiD (Vocabulary of Interlinked Datasets)

Page 14: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

14Enriching web pages with dereferenceable semantic annotations

Endpoint Resolution

1. Select the vocabulary of all RDF types associated with the annotation.

2. Or select the vocabularies of all predicates associated with the annotation.

Page 15: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

15Enriching web pages with dereferenceable semantic annotations

Endpoint Resolution

1. The SelectEndpoint function find the resource: http://ontoware.org/swrc/swrc_v0.3.owl#Author

2. It extracts the vocabulary associated with this resource:http://ontoware.org/swrc/swrc_v0.3.owl

3. It queries the voiD descriptor of the available SPARQL endpoints, looking for such a vocabulary.

Page 16: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

16Enriching web pages with dereferenceable semantic annotations

Query Formulation

1. Query is based on the object of the triple.

2. Try to find a human-readable representation of the resource, i.e., try to match predicates such as: foaf:homepage, akt:has-web-address, rdfs:seeAlso.

Page 17: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

17Enriching web pages with dereferenceable semantic annotations

Proof of Concept

• Semantic links for pages that contain bibliographic citations.

• Extended version of FreeCite parsing engine.

• Example of bibliographic citation:

Kees van der Sluijs, Geert-Jan Houben, Erwin Leonardi, Jan Hidders. Hera:

Engineering Web Applications Using Semantic Web-Based Models. Book

chapter: Semantic Web Information Management: A Model-Based

Perspective, De Virgilio, Roberto; Giunchiglia, Fausto; Tanca, Letizia (Eds.),

Chapter 22, 2010, Springer.

Page 18: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

18Enriching web pages with dereferenceable semantic annotations

Html Page

Plain TextEntity

Extraction

Semantic Annotation

Text Semantically Annotated

Sparql Endpoint

Discovering and Selection

Endpoint Querying

URL Generation

FreeCite Extraction Engine

HTML Page Semantically Annotated

Core Linkator

MarkupRemoved

Insert annotations on the page

Sem

anti

c lin

k c

licke

d

Extract Terms from Web Pages

Associate Terms to Concepts

Find Resources that Represents these Concepts

Information Extraction Engine

Core LinkatorSemantic Annotator

Linkator

Page 19: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

19Enriching web pages with dereferenceable semantic annotations

Example – HTML Page without Links

Page 20: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

20Enriching web pages with dereferenceable semantic annotations

Page 21: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

21Enriching web pages with dereferenceable semantic annotations

Example – Page annotated with RDFa

Page 22: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

22Enriching web pages with dereferenceable semantic annotations

Example – Page with Semantic Links

Page 23: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

23Enriching web pages with dereferenceable semantic annotations

Page 24: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

24Enriching web pages with dereferenceable semantic annotations

Page 25: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

25Enriching web pages with dereferenceable semantic annotations

Conclusion and Future Work

• For a specific scenario of linking bibliographic citations Linkator provides a reasonable solution.

• The composition of the Semantic Web technologies can provide a reasonable solution for the problem of automatic linking.

• Linkator is a concrete application that uses Semantic Web technologies.

• Future Work: • Use Linkator in a broader scenario.• Enhance the Linkator algorithms.• Evaluate the precision and recall of the linking.

Page 26: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

26Enriching web pages with dereferenceable semantic annotations

Questions?

[email protected]

You can download Linkator at:http://www.wis.ewi.tudelft.nl/

Samur Araujo

Thank you for your attention!

Page 27: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

27Enriching web pages with dereferenceable semantic annotations

HTML Page

Annotated HTML Page

Page is annotated

RDF

Link is clicked

Annotation on the page are used to find the link destination

Page 28: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

28Enriching web pages with dereferenceable semantic annotations

State-of-the-Art in Automatic Linking

• Example: • Wikify! [1] is focused on linking keywords on web pages to

Wikipedia articles

• Nnexus [2] focus on linking keywords obtained from an index

extracted from target documents.

• [1] Mihalcea, R. and Csomai, A. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the

16th ACM Conference on Information and Knowledge management (CIKM 07), Lisbon, Portugal, pp. 233-242,

2007.

• [2] Gardner JJ, Krowne A, Xiong L. NNexus: An Automatic Linker for Collaborative Web-Based Corpora. IEEE

Trans. Knowl. Data Eng. 21(6). 829-839. 2009.

Page 29: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

29Enriching web pages with dereferenceable semantic annotations

Endpoint Resolution

FUNCTION SelectEndpointE := ArrayR : = select all rdf:type objects associated to the semantic linkT := ExtractVocabulary(R)

FOR EACH vocabulary in T DO{

E.add (select endpoints that contain this vocabulary)}IF E = Empty {

R := select all predicates associated to the semantic link

T := ExtractVocabulary(R)

FOR EACH vocabulary in T DO{

E.add (select endpoints that contain this vocabulary)

}}RETURN E

FUNCTION ExtractVocabulary (R)V := ArrayFOR EACH resource in R DO{

V.add (extract the vocabulary from the resource)}RETURN V

12345678910111213141516171819202122232425262728

Page 30: Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

30Enriching web pages with dereferenceable semantic annotations

Semantic Link – Example

• Triples associated with the semantic link.