Querying Heterogeneous Datasets on the Linked Data Web


description

The growing number of datasets published on the Web as linked data brings both opportunities for high data availability and challenges inherent to querying data in a semantically heterogeneous and distributed environment. Approaches used for querying siloed databases fail at Web-scale because users don't have an a priori understanding of all the available datasets. This article investigates the main challenges in constructing a query and search solution for linked data and analyzes existing approaches and trends.

Transcript of Querying Heterogeneous Datasets on the Linked Data Web

Page 1: Querying Heterogeneous Datasets on the Linked Data Web

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Querying Heterogeneous Datasets on the Linked Data Web:

Challenges, Approaches, and Trends

André Freitas, Edward Curry, João G. Oliveira, Seán O’Riain

Page 2: Querying Heterogeneous Datasets on the Linked Data Web

IEEE Internet Computing

http://doi.ieeecomputersociety.org/10.1109/MIC.2011.141

http://andrefreitas.org

A. Freitas, E. Curry, J. G. Oliveira, and S. O’Riain, “Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends,” IEEE Internet Computing, vol. 16, no. 1, pp. 24-33, 2012.

Page 3: Querying Heterogeneous Datasets on the Linked Data Web

Motivation

Page 4: Querying Heterogeneous Datasets on the Linked Data Web

Querying Data over the Web

The figure shows (a) a natural language query over two search engines; (b) the corresponding SPARQL representation; and (c) the semantic gap between the user’s information needs and the data representation.
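The semantic gap in (c) can be sketched with a toy structured query. The triples, the predicate names (`spouse`, `almaMater`), and the helper functions below are illustrative assumptions, not the content of the figure or the schema of any particular dataset. The sketch only shows that a query phrased in the user's vocabulary can fail even though the answer is in the data.

```python
# Toy in-memory triples; entity and predicate names are illustrative assumptions.
TRIPLES = [
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Michelle_Obama", "almaMater", "Princeton_University"),
    ("Michelle_Obama", "almaMater", "Harvard_Law_School"),
]


def match(pattern, triples=TRIPLES):
    """Return triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    return [
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


def universities_of_partner(person, partner_predicate, triples=TRIPLES):
    """Two-step structured query: person -> partner -> universities attended."""
    results = []
    for _, _, partner in match((person, partner_predicate, None), triples):
        for _, _, university in match((partner, "almaMater", None), triples):
            results.append(university)
    return results


# The user's word "wife" is not a predicate in the data, so nothing matches.
print(universities_of_partner("Barack_Obama", "wife"))
# The dataset's own term "spouse" reaches the answer.
print(universities_of_partner("Barack_Obama", "spouse"))
```

The first call returns an empty list and the second returns both universities, which is the gap a query mechanism for linked data would need to bridge on the user's behalf.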

Page 5: Querying Heterogeneous Datasets on the Linked Data Web

Expressivity-Usability Trade-Off

Expressivity–usability trade-off for querying over structured data. The blue dots indicate that an ideal query mechanism for linked data must provide both high expressivity and high usability.

Page 6: Querying Heterogeneous Datasets on the Linked Data Web

Challenges

Page 7: Querying Heterogeneous Datasets on the Linked Data Web

Challenges

The analysis examines existing approaches from the perspective of the usability-expressivity trade-off.

This focus guides the categorization and analysis of existing challenges, approaches, and trends.

Page 8: Querying Heterogeneous Datasets on the Linked Data Web

Challenge Dimensions

Query Expressivity: Ability to query datasets by referencing elements in the data model structure, as well as to operate over the data (aggregate results, express conditional statements, etc.)

Usability: Easy-to-operate, intuitive, and task-efficient query interface

Vocabulary-level Semantic Matching: Ability to semantically match user query terms to dataset vocabulary-level terms
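Vocabulary-level semantic matching can be sketched as a ranking problem: given a user term, score each dataset vocabulary term and return the best candidates. The vocabulary, the hand-written synonym table, and the scoring rule below are assumptions made for illustration. A real system would likely draw on external knowledge sources rather than a fixed table, and this is not the method of any approach named in the slides.

```python
from difflib import SequenceMatcher

# Hypothetical dataset vocabulary and a tiny hand-written synonym table.
DATASET_TERMS = ["spouse", "almaMater", "birthPlace", "occupation"]
SYNONYMS = {
    "wife": {"spouse"},
    "husband": {"spouse"},
    "university": {"almaMater"},
    "job": {"occupation"},
}


def string_similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_vocabulary_terms(query_term, dataset_terms=DATASET_TERMS):
    """Rank dataset terms: synonym hits score 1.0, all others by string similarity."""
    synonyms = SYNONYMS.get(query_term.lower(), set())
    scored = []
    for term in dataset_terms:
        score = 1.0 if term in synonyms else string_similarity(query_term, term)
        scored.append((term, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# "wife" shares few characters with "spouse", so only the synonym table bridges the two.
print(rank_vocabulary_terms("wife")[0])
# "occupations" is not in the table; string similarity alone finds "occupation".
print(rank_vocabulary_terms("occupations")[0])
```

Returning a ranked list rather than a single answer leaves room for the user to pick among candidates when the top match is wrong.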

Page 9: Querying Heterogeneous Datasets on the Linked Data Web

Challenge Dimensions

Entity Reconciliation: Matches entities expressed in the query to semantically equivalent dataset entities

Semantic Tractability: Ability to answer queries not supported by explicit dataset statements

– For example, “Is Natalie Portman an Actress?” can be supported by the statement “Natalie Portman starred Star Wars,” instead of an explicit statement “Natalie Portman occupation Actress,” which might not be present in the dataset
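The Natalie Portman example can be sketched as a fallback from explicit statements to indirect evidence. The fact set, the predicate names, and the single rule ("whoever starred in a film is an actor or actress") are assumptions for illustration. A best-effort system would treat such an inference as plausible support, not as proof.

```python
# Toy dataset: there is deliberately no ("Natalie_Portman", "occupation", "Actress") statement.
FACTS = {
    ("Natalie_Portman", "starred", "Star_Wars"),
    ("Star_Wars", "type", "Film"),
}


def starred_in_a_film(person, facts):
    """True if some statement says the person starred in something typed as a Film."""
    return any(
        s == person and p == "starred" and (o, "type", "Film") in facts
        for s, p, o in facts
    )


def is_actor(person, facts=FACTS):
    """Return (answer, evidence); False means no support was found, not a proven negative."""
    if (person, "occupation", "Actress") in facts or (person, "occupation", "Actor") in facts:
        return True, "explicit"
    if starred_in_a_film(person, facts):
        return True, "inferred"
    return False, "none"


print(is_actor("Natalie_Portman"))
```

Reporting the kind of evidence alongside the answer lets an interface show the user why a result was returned.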

Page 10: Querying Heterogeneous Datasets on the Linked Data Web

Approaches

Page 11: Querying Heterogeneous Datasets on the Linked Data Web

Approaches

Information Retrieval approaches

– Entity-centric search

– Structure search

Natural Language approaches

– Question Answering

– Semantic best-effort natural language interfaces

Page 12: Querying Heterogeneous Datasets on the Linked Data Web

Entity-Centric Search

e.g. Sindice

Page 13: Querying Heterogeneous Datasets on the Linked Data Web

Structure Search

e.g. Semplore

Page 14: Querying Heterogeneous Datasets on the Linked Data Web

Question Answering

e.g. FreyA

Page 15: Querying Heterogeneous Datasets on the Linked Data Web

Semantic Best-Effort/NL

e.g. Treo

Page 16: Querying Heterogeneous Datasets on the Linked Data Web

Comparative Analysis (Approaches)

Page 17: Querying Heterogeneous Datasets on the Linked Data Web

Trends

Page 18: Querying Heterogeneous Datasets on the Linked Data Web

Addressing the Challenges

The functionality analysis of existing approaches provides insights into how the major challenges should be addressed.

This set of strategic functionalities defines the trends.

Page 19: Querying Heterogeneous Datasets on the Linked Data Web

Linked Data Web

Page 20: Querying Heterogeneous Datasets on the Linked Data Web

Trends

– Complementary Search and Query Services

– User Interaction and Feedback Mechanisms

– Semantic Best-Effort Query Model

– Natural Language Processing Techniques

– Distributional Semantic Model

– External Knowledge Sources for Semantic Enrichment

– Integrated Entity Reconciliation Techniques
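The idea behind a distributional semantic model, one of the trends listed above, is that words used in similar contexts tend to have related meanings. The sketch below is a minimal illustration under stated assumptions: the three-sentence corpus is invented, contexts are whole sentences, and relatedness is plain cosine similarity over co-occurrence counts. It does not reproduce the model used by any system in the slides.

```python
import math
from collections import Counter, defaultdict

# Invented toy corpus; a real model would be built from a large text collection.
CORPUS = [
    "his wife married him in a church",
    "her spouse married her in a chapel",
    "the city built a new bridge",
]


def build_context_vectors(sentences):
    """Map each word to a count of the other words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[word][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = build_context_vectors(CORPUS)
# "wife" and "spouse" share contexts ("married", "in", "a"); "city" shares only "a".
print(cosine(vectors["wife"], vectors["spouse"]))
print(cosine(vectors["wife"], vectors["city"]))
```

On this corpus the first score is higher than the second, so a query term such as "wife" would be ranked closer to a dataset term such as "spouse" without any hand-written synonym list.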

Page 21: Querying Heterogeneous Datasets on the Linked Data Web

IEEE Internet Computing

http://doi.ieeecomputersociety.org/10.1109/MIC.2011.141

http://andrefreitas.org

A. Freitas, E. Curry, J. G. Oliveira, and S. O’Riain, “Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends,” IEEE Internet Computing, vol. 16, no. 1, pp. 24-33, 2012.

Page 22: Querying Heterogeneous Datasets on the Linked Data Web

Further Reading

A. Freitas, E. Curry, J. G. Oliveira, and S. O’Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data, International Journal of Semantic Computing, vol. 5, no. 4, pp. 433-462, 201

S. O’Riain, E. Curry, and A. Harth, XBRL and Open Data for Global Financial Ecosystems: A Linked Data Approach, International Journal of Accounting Information Systems, vol. 13, no. 2, pp. 141-162, 2012.

A. Freitas, E. Curry, and S. O’Riain, A Distributional Approach for Terminology-Level Semantic Search on the Linked Data Web, in 27th ACM Symposium On Applied Computing (SAC 2012), 2012.

A. Freitas, J. G. Oliveira, S. O’Riain, and E. Curry, A Multidimensional Semantic Space for Data Model Independent Queries over RDF Data, in Fifth IEEE International Conference on Semantic Computing (ICSC 2011).

A. Freitas, T. Knap, S. O’Riain, and E. Curry, W3P: Building an OPM based provenance model for the Web, Future Generation Computer Systems, vol. 27, no. 6, pp. 766-774, Jun. 2011.