The state of the art in Linked Data

The state of the art in Linked Data Advanced Semantic Web, Spring 2009 Joshua Shinavier Literature Survey

A literature survey on Linked Data for a spring 2009 class at the Tetherless World Constellation.

Page 1: The state of the art in Linked Data

The state of the art in Linked Data

Advanced Semantic Web, Spring 2009

Joshua Shinavier

Literature Survey

Page 2: The state of the art in Linked Data

• Linked Data

• Linking Open Data

• describing linked datasets

• growing the data web

• keeping Linked Data connected

• indexing and searching

• applications

• navigation

• state of the data web

Outline

Page 3: The state of the art in Linked Data

• resource -- an item of interest

• URI -- global identifier for a resource

• representation -- data corresponding to the state of a resource

• information resource -- a “document” containing information

• non-information resource -- anything else

• associated description -- representation describing a Semantic Web resource

Linked Data overview
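One common way to keep information and non-information resources apart is to answer a request for a non-information resource with an HTTP 303 redirect to the document describing it. The sketch below models that idea only; the in-memory registries and the example.org URIs are hypothetical and nothing is fetched over the network.

```python
# Toy model of URI dereferencing. All URIs and data are hypothetical.
DOCUMENTS = {
    # Information resources: dereferencing yields a representation directly.
    "http://example.org/doc/alice": "<http://example.org/id/alice> a foaf:Person .",
}
DESCRIBED_BY = {
    # Non-information resources: mapped to the document that describes them.
    "http://example.org/id/alice": "http://example.org/doc/alice",
}


def dereference(uri):
    """Return (status, payload) roughly as an HTTP server might."""
    if uri in DOCUMENTS:
        return 200, DOCUMENTS[uri]
    if uri in DESCRIBED_BY:
        return 303, DESCRIBED_BY[uri]
    return 404, None


def associated_description(uri):
    """Follow at most one 303 redirect to reach a representation."""
    status, payload = dereference(uri)
    if status == 303:
        status, payload = dereference(payload)
    return payload if status == 200 else None
```

Here `associated_description("http://example.org/id/alice")` returns the representation stored for the describing document, while an unknown URI yields `None`.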

Page 4: The state of the art in Linked Data

• “bootstrap” the data web with large, interconnected data sets to reach a critical mass of semantics

• strict adherence to W3C standards

• identification and transportation (URI, HTTP) of resource descriptions

• interpretation (RDF, RDFS, OWL) of resource descriptions

• LOD grows as data providers:

• publish structured data on the Web

• set RDF links between entities in different data sources

• transition of the web from a distributed document repository into a universal, ubiquitous database [Erling 09]

The Linking Open Data initiative
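An "RDF link" in the sense used above can be read as a triple whose subject and object are URIs from different data sources. A minimal sketch, assuming that "different data source" can be approximated by "different host name"; the triples and the example.org / example.net URIs are placeholders.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical triples as (subject, predicate, object) tuples.
triples = [
    ("http://data.example.org/city/1",
     "http://www.w3.org/2000/01/rdf-schema#label", '"Berlin"'),
    ("http://data.example.org/city/1",
     OWL_SAME_AS, "http://geo.example.net/place/42"),
    ("http://data.example.org/city/1",
     "http://data.example.org/vocab/mayor", "http://data.example.org/person/7"),
]


def host(term):
    """Host name of a URI term, or None for literals."""
    return urlparse(term).netloc if term.startswith("http") else None


def rdf_links(triples):
    """Triples whose subject and object URIs live on different hosts."""
    return [(s, p, o) for s, p, o in triples
            if host(o) is not None and host(s) != host(o)]
```

Only the owl:sameAs triple crosses hosts here, so it is the only one `rdf_links(triples)` returns.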

Page 5: The state of the art in Linked Data

The LOD cloud

Page 6: The state of the art in Linked Data

LOD data sets

Page 7: The state of the art in Linked Data

Link sets in LOD

Page 8: The state of the art in Linked Data

• voiD (Vocabulary of Interlinked Datasets) [Alexander, Cyganiak, Hausenblas, Zhao 09]

• describes data sets and the link sets between them

• DING (Dataset RankING) [Toupikov, Umbrich, Delbru, Hausenblas, Tummarello 09]

• ranking of linked datasets using formal descriptions

• modeling of the Linked Data domain [Halpin, Presutti 09]

Describing linked datasets
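A voiD description of two data sets and the link set between them could be assembled roughly as follows. This is a sketch: the term names (`void:Dataset`, `void:Linkset`, `void:subjectsTarget`, `void:objectsTarget`, `void:linkPredicate`) are given as recalled from the voiD vocabulary and should be checked against the specification, and the data set URIs are hypothetical.

```python
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def describe_linkset(linkset, source, target, predicate):
    """Triples stating that `linkset` links `source` to `target` via `predicate`."""
    return [
        (source, RDF_TYPE, VOID + "Dataset"),
        (target, RDF_TYPE, VOID + "Dataset"),
        (linkset, RDF_TYPE, VOID + "Linkset"),
        (linkset, VOID + "subjectsTarget", source),
        (linkset, VOID + "objectsTarget", target),
        (linkset, VOID + "linkPredicate", predicate),
    ]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical data set and link set URIs.
description = describe_linkset(
    "http://example.org/void/films-to-people",
    "http://example.org/void/films",
    "http://example.org/void/people",
    OWL_SAME_AS,
)
```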

Page 9: The state of the art in Linked Data

• network-shaped Entity Name System to enable systematic reuse of URIs [Bouquet, Stoermer, Cordioli, Tummarello 08]

• similar to DNS for interlinking hypertext

• n2Mate framework [Peterson, Cregan, Atkinson, Brisbin 08]

• use social networking principles to facilitate vocabulary and instance reuse

• graph-based disambiguation of Semantic Web entities with idMesh [Cudré-Mauroux, Haghani, Jost, Aberer, de Meer 09]

Keeping Linked Data connected

Page 10: The state of the art in Linked Data

• many conflated resources in DBpedia [Jaffri, Glaser, Millard 08]

• representative of LOD as a whole

• Co-Reference Resolution Service [Glaser, Jaffri, Millard 09]

• when co-reference is context-specific, owl:sameAs is inappropriate

• stores co-reference information as a first-class entity

• ontology-level alignment should precede data-level alignment [Nikolov, Uren, Motta 09]

Managing co-reference
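The idea of storing co-reference as a first-class, context-specific entity, rather than asserting a global owl:sameAs, can be illustrated with a small store that keeps separate equivalence "bundles" per context. This is a sketch of the general idea only, not the Co-Reference Resolution Service's data model; the class, method and context names are hypothetical.

```python
from collections import defaultdict


class CoreferenceStore:
    """Keeps co-reference claims per context instead of one global sameAs closure."""

    def __init__(self):
        # context name -> list of disjoint sets of URIs considered equivalent
        self._bundles = defaultdict(list)

    def assert_equivalent(self, context, uri_a, uri_b):
        """Record that uri_a and uri_b co-refer within the given context."""
        merged = {uri_a, uri_b}
        untouched = []
        for bundle in self._bundles[context]:
            if bundle & merged:
                merged |= bundle  # absorb any bundle sharing a URI
            else:
                untouched.append(bundle)
        untouched.append(merged)
        self._bundles[context] = untouched

    def equivalents(self, context, uri):
        """All URIs co-referent with `uri` in this context (at least itself)."""
        for bundle in self._bundles.get(context, []):
            if uri in bundle:
                return set(bundle)
        return {uri}
```

Two URIs declared equivalent in a "bibliography" context stay distinct in any other context, which is the behaviour a blanket owl:sameAs cannot express.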

Page 11: The state of the art in Linked Data

• how to get data out there?

• challenges of the read-write Semantic Web

• user awareness of social context of data (e.g. licensing, privacy)

• view update problem

• is the wiki model applicable?

• incentives for posting data on the SW

• validating existing Linked Data with Vapour [Berrueta, Fernandez, Frade 08]

Growing the data web

Page 12: The state of the art in Linked Data

• DBpedia [Auer, Bizer, Kobilarov, Lehmann, Cyganiak, Ives 07]

• extracts structured information from Wikipedia

• linking hub for the LOD cloud

• RDF Book Mashup [Bizer, Cyganiak, Gauss 07]

• product metadata from Amazon.com

Examples of LOD data sets

Page 13: The state of the art in Linked Data

• Linked Movie Database [Hassanzadeh, Consens 09]

• combines data from IMDb, Freebase, OMDB, DBpedia, RottenTomatoes.com, Stanford Movie Database

• interlinked music datasets [Raimond, Sutton, Sandler 08]

• combines data from Jamendo on DBTune, BBC John Peel sessions, SBSimilarity, MusicBrainz, DBpedia, GeoNames

• links artists, albums, tracks, personal music collections

• generates links based on the similarity of resources and the similarity of their neighbors

Music and movies as Linked Data

Page 14: The state of the art in Linked Data

• the hypertext Web itself [Li, Zhao 08]

• extraction of semantic links from hypertext links and hierarchical relationships among Web documents

• RDF representation of the HTML DOM using SparqPlug [Coetzee, Heath, Motta 08]

• multimedia metadata

• interlinking multimedia fragments [Hausenblas, Troncy, Bürger, Raimond 09]

Other sources of data

Page 15: The state of the art in Linked Data

• eXtensible Business Reporting Language (XBRL) [Garcia, Gil 09]

• mapping data to RDF and schemas to OWL facilitates interoperability

• large thesauri [Neubert 09]

• as interlinking hubs for professional communities

• enterprise data, e.g. technical documentation [Servant 08]

• MARC21 bibliographic records [Styles, Ayers, Shabir 08]

Other sources of data (cont.)

Page 16: The state of the art in Linked Data

• D2R Server for customizable mappings from relational databases to ontologies [Bizer, Cyganiak 06]

• browser-based tools for defining RDB-to-RDF mappings [Zhou, Xu, Chen, Idehen 08]

• Triplify [Auer, Dietzold, Lehmann, Hellmann, Aumueller 09]

• from generic data silos to Linked Data using OpenLink Data Spaces [Idehen, Erling 08]

Mapping tools
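The basic move shared by relational-to-RDF mapping tools can be sketched generically: each row becomes a resource, each column a property. The code below is an illustration under that assumption and does not reproduce the mapping language or behaviour of D2R Server, Triplify or any other tool named above; the table, base URI and vocabulary URI are hypothetical.

```python
import sqlite3


def rows_to_triples(conn, table, key, base_uri, vocab_uri):
    """Map every row of `table` to triples: one subject per row, one property per column.

    `table` is interpolated into the SQL text, so it must come from trusted input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base_uri}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                triples.append((subject, f"{vocab_uri}{column}", str(value)))
    return triples


# Hypothetical in-memory database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("INSERT INTO book VALUES (1, 'Weaving the Web', 1999)")
book_triples = rows_to_triples(
    conn, "book", "id", "http://example.org/", "http://example.org/vocab#"
)
```

Real mapping tools add what this sketch leaves out: customizable mappings to existing ontologies, datatype handling, and foreign keys turned into links between resources.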

Page 17: The state of the art in Linked Data

• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

• can be made Web-accessible with OAI2LOD Server [Haslhofer, Schandl 08]

• Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) [Van de Sompel, Lagoze, Nelson, Warner, Sanderson, Johnston 09]

• adheres to Web principles

Aggregated resources

Page 18: The state of the art in Linked Data

• existing Linked Data datasets are more appropriate for machine than human consumption

• template-generated interlinks are of limited quality

• data from existing silos quickly becomes out of date

• need human involvement to grow the data web organically

User-driven Linked Data

Page 19: The state of the art in Linked Data

User-driven Linked Data (cont.)

• direct modification using SPARQL/Update

• e.g. in Tabulator [Berners-Lee, Hollenbach, Lu, Presbrey, Prud’hommeaux, Schraefel 08]

• User Contributed Interlinking [Halb, Raimond, Hausenblas]

• semantic wikis

• Loomp [Roesch, Heese 09]

• semantic annotation of content using a text editor interface
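For the direct-modification route in the first bullet, a client ultimately sends an update request to a store. The sketch below only builds an `INSERT DATA` request string and contacts no endpoint; the helper names are hypothetical, and it is not a description of how Tabulator itself issues updates.

```python
def term(value):
    """Serialise a URI as <...> and anything else as a plain string literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def insert_data(triples):
    """Build an INSERT DATA update request for the given triples."""
    body = " ".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
    return f"INSERT DATA {{ {body} }}"


# Hypothetical triple to be added to a writable store.
update = insert_data([
    ("http://example.org/id/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
])
```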

Page 20: The state of the art in Linked Data

• public data from existing social networks

• wrappers for Web 2.0 services [Passant 08]

• unifying personal identity across various networks [Rowe 09]

• Semantically Interlinked Online Communities (SIOC)

• integrating social media sites (forums, blogs, wikis, etc.) with the data web [Bojars, Passant, Cyganiak, Breslin 08]

• Meaning of a Tag (MOAT) ontology gives meaning to tags on Web 2.0 [Passant, Laublet 08]

User-driven Linked Data (cont.)

Page 21: The state of the art in Linked Data

• usability (for humans) of Linked Data [Halb, Raimond, Hausenblas 08]

• current LOD datasets are primarily for machine consumption

• low semantic strength of current LOD link sets

• provenance information for Linked Data [Hartig 09]

• Open Data Commons license [Miller, Styles, Heath 08]

Usability and licensing

Page 22: The state of the art in Linked Data

• W3C’s TAP semantic search [Guha, McCool 01]

• Swoogle [Ding, Finin, Joshi, Pan, Cost, Peng, Reddivari, Doshi, Sachs 04]

• adapts PageRank concept to ontologies

• SWSE [Hogan, Harth, Umbrich, Decker 07]

• MultiCrawler [Harth, Umbrich, Decker 06]

• RDF Gateway search

• Watson document-based search

• Falcons [Cheng, Ge, Wu, Qu 08]

• textual search using class hierarchies for query restriction

• Sindice Semantic Web index [Tummarello, Delbru, Oren 07]

Indexing and searching
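The PageRank concept that Swoogle is said to adapt can be sketched in its plain form: documents that are referenced by many well-ranked documents rank highly. This is ordinary PageRank on a toy graph, not Swoogle's actual ranking; the vocabulary names and the links between them are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    `links` maps each node to the list of nodes it references.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / count for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / count
        rank = new_rank
    return rank


# Hypothetical "imports/uses" links between ontology documents.
ontology_links = {
    "foaf": [],
    "sioc": ["foaf"],
    "doap": ["foaf"],
    "my-vocab": ["foaf", "sioc"],
}
ranks = pagerank(ontology_links)
```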

Page 23: The state of the art in Linked Data

• Silk link discovery framework [Volz, Bizer, Gaedke, Kobilarov 09]

• find relationships between entities within different data sources

• generation of owl:sameAs links

• value of Web of Data depends on the amount and quality of links between data sources

Link discovery
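Link discovery of this kind can be approximated by comparing entity labels across two sources and emitting an owl:sameAs link when the best match clears a threshold. A minimal sketch, assuming label similarity is the only evidence used; it does not reproduce Silk's link specification language, and the sources, labels and threshold are hypothetical.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Link each source entity to its best-matching target entity, if good enough.

    `source` and `target` map entity URIs to labels.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical entities from two data sources.
source = {
    "http://a.example.org/film/1": "Blade Runner",
    "http://a.example.org/film/2": "Metropolis",
}
target = {
    "http://b.example.net/movie/9": "blade runner",
    "http://b.example.net/movie/10": "Casablanca",
}
found = discover_links(source, target)
```

The threshold trades link quantity against link quality, the two factors the last bullet names as determining the value of the Web of Data.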

Page 24: The state of the art in Linked Data

Navigation

• as on the early Web, it’s easy to get “Lost in Hyperspace”

• Tabulator generic Linked Data browser [Berners-Lee, Chen, Chilton, Connolly, Dhanaraj, Hollenbach, Lerer, Sheets 06]

• encourage deployment of Linked Data

• test, refine and promote Linked Data standards

• faceted views over large-scale linked data with Virtuoso Cluster Edition [Erling 09]

• Explorator RDF browser [Araujo, Schwabe 09]

• exploratory search using direct manipulation

Page 25: The state of the art in Linked Data

• DBpedia Mobile map view and faceted Linked Data browser [Becker, Bizer 08]

• explore the geospatial Semantic Web

• uses current GPS position as a starting point

• potential for Linked Data publishing

Navigation (cont.)

Page 26: The state of the art in Linked Data

• Fenfire generic Linked Data browser [Hastrup, Cyganiak, Bojars 08]

• uses graph views rather than tables or outlines

• shows graph data as directly as possible

• related to Fentwine [Fallenstein, Lukka 04]

Navigation (cont.)

Page 27: The state of the art in Linked Data

• Humboldt [Kobilarov, Dickinson 08]

• exploratory browsing

• faceted views

• “resource at a time”

• uses a “pivot” operation to refocus the view

Navigation (cont.)
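One reading of faceted views with a "pivot" operation: a facet restricts the current set of resources, and a pivot replaces that set with the resources it points to through some property. The sketch below follows that reading and is not taken from Humboldt's implementation; the function names and the data are hypothetical.

```python
def restrict(triples, resources, prop, value):
    """Facet filter: keep resources that have `value` for `prop`."""
    return {s for s, p, o in triples if s in resources and p == prop and o == value}


def pivot(triples, resources, prop):
    """Refocus the view on whatever the current resources point to via `prop`."""
    return {o for s, p, o in triples if s in resources and p == prop}


# Hypothetical film data.
triples = [
    ("ex:film1", "ex:genre", "noir"),
    ("ex:film2", "ex:genre", "comedy"),
    ("ex:film1", "ex:director", "ex:director1"),
    ("ex:film2", "ex:director", "ex:director2"),
]
films = {"ex:film1", "ex:film2"}
noir_films = restrict(triples, films, "ex:genre", "noir")
noir_directors = pivot(triples, noir_films, "ex:director")
```

After the pivot the view shows directors rather than films, and facets can be applied again to the new set.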

Page 28: The state of the art in Linked Data

• zLinks plugin [Bergman, Giasson 08]

• WordPress plugin with supporting server

• relates hypertext links with contextually relevant Linked Data

• WOWY (WordNet, OpenCyc, Wikipedia, YAGO)

• distinguish between types of resources

• disambiguate alternate senses

Navigation (cont.)

Page 29: The state of the art in Linked Data

• mapping of Linked Data to a file system model [Schandl 09]

• enables use of this data within desktop applications

Navigation (cont.)

Page 30: The state of the art in Linked Data

• how to use the data that is out there?

• emerging applications which exploit Linked Data [Hausenblas 09]

• integrating data sources related to drug and clinical trials [Jentzsch, Andersson, Hassanzadeh, Stephens, Bizer 09]

• mashups

• MashQL [Jarrar, Dikaiakos 09]

• the Internet is a database; a mashup is a query over that database

• benefit of specialized, independent Linked Data services acting together [Bojars, Passant, Giasson, Breslin 07]

Other applications
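The "mashup is a query" idea can be illustrated by merging triples from two sources and matching a small set of triple patterns against the result. This is a generic pattern matcher written for illustration, not MashQL's language or engine; the `ex:` identifiers and the drug/trial data are hypothetical.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Variable: bind it, or check consistency with an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def query(patterns, triples):
    """All variable bindings satisfying every pattern (a conjunctive query)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# Two hypothetical sources, merged simply by concatenation.
drug_source = [
    ("ex:aspirin", "ex:treats", "ex:headache"),
    ("ex:salbutamol", "ex:treats", "ex:asthma"),
]
trial_source = [
    ("ex:trial1", "ex:tests", "ex:salbutamol"),
    ("ex:trial2", "ex:tests", "ex:aspirin"),
]
asthma_trials = query(
    [("?trial", "ex:tests", "?drug"), ("?drug", "ex:treats", "ex:asthma")],
    drug_source + trial_source,
)
```

The join on `?drug` only works because both sources use the same identifier for the drug, which is why shared URIs and co-reference links matter for mashups.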

Page 31: The state of the art in Linked Data

The gray area

• U-P2P framework for peer-to-peer linked data [Davoust, Esfandiari 09]

• data replication provides a measure of popularity

• Linked Data with Named Graphs

• e.g. interlinks with embedded provenance information [Zhao, Klyne, Shotton 08]

• Ripple scripting language [Shinavier 07]

• embeds Turing-complete programs in the Web of Data

Page 32: The state of the art in Linked Data

• where are we with the Linked Data graph?

• size

• number and type of links

• usefulness to end users

• network characteristics

• single-point-of-access (e.g. DBpedia, GeoNames) vs. distributed datasets (e.g. FOAF-o-sphere, SIOC-land)

• syntactic and semantic analysis of the LOD dataset [Hausenblas, Halb, Raimond, Heath 08]

State of the data web

Page 33: The state of the art in Linked Data

• today’s Linked Data is very different from the first-generation data web [Halpin 09]

• LOD data sets account for the vast majority of data on the data web

• power-law distributions are emerging

• data web is not growing organically

• Web standards are generally adhered to

• is Linked Data useful to ordinary users?

• sampling of Linked Data using Live.com query logs and FALCON-S semantic search engine

Statistics of the data web

Page 34: The state of the art in Linked Data

• ...

Query popularity follows a power law
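A quick, informal check for power-law behaviour is to plot frequency against rank on log-log axes and look for a straight line. The sketch below estimates that line's slope by least squares; it is a rough diagnostic rather than a rigorous fit, it is not the method used in the cited study, and the frequencies are synthetic.

```python
import math


def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), with ranks from 1.

    Expects at least two strictly positive values.
    """
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(value) for value in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic frequencies following frequency = 1000 / rank^2 exactly.
synthetic = [1000.0 / rank ** 2 for rank in range(1, 51)]
slope = loglog_slope(synthetic)
```

For the synthetic data the slope comes out at about -2, the exponent that was built in; real query logs or URI counts would only approximate a straight line, if at all.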

Page 35: The state of the art in Linked Data

• ...

URI frequency... not so much

Page 36: The state of the art in Linked Data

• ...

Data publishing lacks a “long tail”

Page 37: The state of the art in Linked Data

A few dominant ontologies are emerging

# of URIs by vocabulary

Page 38: The state of the art in Linked Data

(DBpedia bias)

# of URIs by domain name

Page 39: The state of the art in Linked Data

• common network analysis techniques can be used to investigate interoperability and structural patterns of the LOD cloud [Rodriguez 09]

• results based on March 2009 statistics of the LOD data set graph:

• LOD graph is not strongly connected

• diameter of 8 is large given the relatively small size of the cloud

• data sets have nearly identical incoming and outgoing link patterns (⇒ majority of reciprocal owl:sameAs links)

Graph analysis for the data web
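The three measures mentioned above (strong connectivity, diameter, reciprocal links) can be computed on a data set graph with breadth-first search. The sketch assumes that "diameter" means the longest finite shortest path, since the slides do not say how unreachable pairs were handled; the toy graph is hypothetical and unrelated to the March 2009 statistics.

```python
from collections import deque


def nodes_of(graph):
    """All nodes appearing as a source or a target."""
    return set(graph) | {t for targets in graph.values() for t in targets}


def distances_from(graph, start):
    """Shortest-path lengths (in hops) from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def is_strongly_connected(graph):
    """True if every node can reach every other node along directed links."""
    nodes = nodes_of(graph)
    return all(len(distances_from(graph, n)) == len(nodes) for n in nodes)


def diameter(graph):
    """Longest finite shortest path between any ordered pair of nodes."""
    return max(d for n in nodes_of(graph) for d in distances_from(graph, n).values())


def reciprocity(graph):
    """Fraction of directed links that also exist in the opposite direction."""
    edges = {(a, b) for a, targets in graph.items() for b in targets}
    if not edges:
        return 0.0
    return sum((b, a) in edges for a, b in edges) / len(edges)


# Hypothetical data set graph: A and B link to each other, B also links to C.
toy = {"A": ["B"], "B": ["A", "C"], "C": []}
```

A high reciprocity value corresponds to the observation that data sets have nearly identical incoming and outgoing link patterns.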

Page 40: The state of the art in Linked Data

Ranking and clustering of LOD data sets
