Linkator: enriching web pages by automatically adding dereferenceable semantic annotations
-
Upload
samuraraujo -
Category
Technology
-
view
1.264 -
download
3
description
Transcript of Linkator: enriching web pages by automatically adding dereferenceable semantic annotations
DelftUniversity ofTechnology
Linkator: enriching web pages by automatically adding dereferenceable semantic annotations
Samur Araujo, Geert-Jan Houben, Daniel SchwabeWeb Information SystemsDelft University of Technology, the Netherlands
2Enriching web pages with dereferenceable semantic annotations
Summary – dereferencing semantic annotations
• What dereferencing semantic annotations is about?• Automatic linking web pages.
• Summary1. Overview of the problem and motivation.2. Our approach for solving the problem.3. One example of use.
3Enriching web pages with dereferenceable semantic annotations
Motivation
• Links between HTML pages are the main mechanism to navigate on web pages.
• However, a lot of pages are unlinked or poorly linked.
• Terms on pages have meaning and are intrinsically associated to concepts or entities that the user is interested in.
• These terms can be interpreted by machines and automatically linked to relevant resources on the web.
4Enriching web pages with dereferenceable semantic annotations
5Enriching web pages with dereferenceable semantic annotations
Problem Statement
The problem of automatic linking can be divided in 3 sub-problems:
1. How to identify candidate terms (anchors) for adding links?• It denotes concepts in which the user is interested.
2. Which concept does a candidate term represent?• Disambiguate a candidate term.
3. How to identify a web resource to be the link target?• How to select a source of data for finding the destination of the
link?
6Enriching web pages with dereferenceable semantic annotations
State-of-the-Art in Automatic Linking
• Candidate Terms:• Focused on term disambiguation using an auxiliary knowledge base
or dictionaries (e.g. wikipedia and wordnet).• Link Target:
• It is selected from a specific knowledge base [1] or from a collection
[2] of target documents.
• Limitations• Does not support well users interested in a broader range of
domains.
[1] Mihalcea, R. and Csomai, A. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the 16th
ACM Conference on Information and Knowledge management (CIKM 07), Lisbon, Portugal, pp. 233-242, 2007.
[2] Gardner JJ, Krowne A, Xiong L. NNexus: An Automatic Linker for Collaborative Web-Based Corpora. IEEE Trans.
Knowl. Data Eng. 21(6). 829-839. 2009.
7Enriching web pages with dereferenceable semantic annotations
Linkator Approach
Extract Terms from Web Pages
Associate Terms to Concepts
Find Resources that Represents these Concepts
Information Extraction Engine Core LinkatorSemantic Annotator
Linkator
8Enriching web pages with dereferenceable semantic annotations
Page is accessed
Term are extracted
Page is semantically annotated
Annotated page
Annotation is extracted
Endpoint is chosen
Query is formulated
Search for a resource
Page Accessed Link Clicked
If notfound
Semantic Links created
9Enriching web pages with dereferenceable semantic annotations
Linkator Approach
Linkator Client - Firefox Plugin
Web Browser
HTTP
Annotator
RDFa Annotator
Information Extraction Engine
HTTP Linkator Server
Linked Data
Query FormulationSparql
Endpoint Resolution
10Enriching web pages with dereferenceable semantic annotations
Semantic Link – Definition
• A semantic link is an HTML tag A that is semantically annotated with RDFa.
• It contains RDF triples associated to it.
• Semantic Link causes a query over Linked Data.
11Enriching web pages with dereferenceable semantic annotations
Semantic Links
RDF Triples associated to the Semantic Link
12Enriching web pages with dereferenceable semantic annotations
Dereferencing Semantic Links
• Linkator uses the Linked Data cloud for discovering a destination for the semantic link as opposed to querying search engines or a fixed knowledge base.
• Algorithm for Endpoint Resolution
• Algorithm for Query Formulation
13Enriching web pages with dereferenceable semantic annotations
Endpoint Resolution
• Task: Find endpoints that contain a specific concept.
• Linkator selects available endpoints based on the vocabularies used in the semantic links. voiD (Vocabulary of Interlinked Datasets)
14Enriching web pages with dereferenceable semantic annotations
Endpoint Resolution
1. Select the vocabulary of all RDF types associated with the annotation.
2. Or select the vocabularies of all predicates associated with the annotation.
15Enriching web pages with dereferenceable semantic annotations
Endpoint Resolution
1. The SelectEndpoint function find the resource: http://ontoware.org/swrc/swrc_v0.3.owl#Author
2. It extracts the vocabulary associated with this resource:http://ontoware.org/swrc/swrc_v0.3.owl
3. It queries the voiD descriptor of the available SPARQL endpoints, looking for such a vocabulary.
16Enriching web pages with dereferenceable semantic annotations
Query Formulation
1. Query is based on the object of the triple.
2. Try to find a human-readable representation of the resource, i.e., try to match predicates such as: foaf:homepage, akt:has-web-address, rdfs:seeAlso.
17Enriching web pages with dereferenceable semantic annotations
Proof of Concept
• Semantic links for pages that contain bibliographic citations.
• Extended version of FreeCite parsing engine.
• Example of bibliographic citation:
Kees van der Sluijs, Geert-Jan Houben, Erwin Leonardi, Jan Hidders. Hera:
Engineering Web Applications Using Semantic Web-Based Models. Book
chapter: Semantic Web Information Management: A Model-Based
Perspective, De Virgilio, Roberto; Giunchiglia, Fausto; Tanca, Letizia (Eds.),
Chapter 22, 2010, Springer.
18Enriching web pages with dereferenceable semantic annotations
Html Page
Plain TextEntity
Extraction
Semantic Annotation
Text Semantically Annotated
Sparql Endpoint
Discovering and Selection
Endpoint Querying
URL Generation
FreeCite Extraction Engine
HTML Page Semantically Annotated
Core Linkator
MarkupRemoved
Insert annotations on the page
Sem
anti
c lin
k c
licke
d
Extract Terms from Web Pages
Associate Terms to Concepts
Find Resources that Represents these Concepts
Information Extraction Engine
Core LinkatorSemantic Annotator
Linkator
19Enriching web pages with dereferenceable semantic annotations
Example – HTML Page without Links
20Enriching web pages with dereferenceable semantic annotations
21Enriching web pages with dereferenceable semantic annotations
Example – Page annotated with RDFa
22Enriching web pages with dereferenceable semantic annotations
Example – Page with Semantic Links
23Enriching web pages with dereferenceable semantic annotations
24Enriching web pages with dereferenceable semantic annotations
25Enriching web pages with dereferenceable semantic annotations
Conclusion and Future Work
• For a specific scenario of linking bibliographic citations Linkator provides a reasonable solution.
• The composition of the Semantic Web technologies can provide a reasonable solution for the problem of automatic linking.
• Linkator is a concrete application that uses Semantic Web technologies.
• Future Work: • Use Linkator in a broader scenario.• Enhance the Linkator algorithms.• Evaluate the precision and recall of the linking.
26Enriching web pages with dereferenceable semantic annotations
Questions?
You can download Linkator at:http://www.wis.ewi.tudelft.nl/
Samur Araujo
Thank you for your attention!
27Enriching web pages with dereferenceable semantic annotations
HTML Page
Annotated HTML Page
Page is annotated
RDF
Link is clicked
Annotation on the page are used to find the link destination
28Enriching web pages with dereferenceable semantic annotations
State-of-the-Art in Automatic Linking
• Example: • Wikify! [1] is focused on linking keywords on web pages to
Wikipedia articles
• Nnexus [2] focus on linking keywords obtained from an index
extracted from target documents.
• [1] Mihalcea, R. and Csomai, A. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the
16th ACM Conference on Information and Knowledge management (CIKM 07), Lisbon, Portugal, pp. 233-242,
2007.
• [2] Gardner JJ, Krowne A, Xiong L. NNexus: An Automatic Linker for Collaborative Web-Based Corpora. IEEE
Trans. Knowl. Data Eng. 21(6). 829-839. 2009.
29Enriching web pages with dereferenceable semantic annotations
Endpoint Resolution
FUNCTION SelectEndpointE := ArrayR : = select all rdf:type objects associated to the semantic linkT := ExtractVocabulary(R)
FOR EACH vocabulary in T DO{
E.add (select endpoints that contain this vocabulary)}IF E = Empty {
R := select all predicates associated to the semantic link
T := ExtractVocabulary(R)
FOR EACH vocabulary in T DO{
E.add (select endpoints that contain this vocabulary)
}}RETURN E
FUNCTION ExtractVocabulary (R)V := ArrayFOR EACH resource in R DO{
V.add (extract the vocabulary from the resource)}RETURN V
12345678910111213141516171819202122232425262728
30Enriching web pages with dereferenceable semantic annotations
Semantic Link – Example
• Triples associated with the semantic link.