The state of the art in Linked Data

Posted on 08-May-2015


A literature survey on Linked Data for a spring 2009 class at the Tetherless World Constellation.


Advanced Semantic Web, Spring 2009

Joshua Shinavier

Literature Survey

• Linked Data

• Linking Open Data

• describing linked datasets

• growing the data web

• keeping Linked Data connected

• indexing and searching

• applications

• navigation

• state of the data web

Outline

2

• resource -- an item of interest

• URI -- global identifier for a resource

• representation -- data corresponding to the state of a resource

• information resource -- a “document” containing information

• non-information resource -- anything else

• associated description -- representation describing a Semantic Web resource

Linked Data overview

3
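These definitions imply a simple dereferencing protocol. The sketch below assumes the common "303 See Other" convention for non-information resources. It uses a made-up in-memory server at example.org, so no real HTTP request is made. It follows redirects and picks a representation by the Accept header. All URIs and helper names are hypothetical.

```python
from typing import Dict, Tuple

# (status code, headers, body) of a simulated HTTP response
Response = Tuple[int, Dict[str, str], str]

ALICE = "http://example.org/id/alice"  # hypothetical non-information resource

def toy_server(uri: str, accept: str) -> Response:
    """Hypothetical server; all example.org URIs are made up for illustration."""
    if uri == ALICE:
        # A person is not a document, so answer 303 and point to a document
        # describing her, chosen by content negotiation on the Accept header.
        target = ("http://example.org/data/alice" if "text/turtle" in accept
                  else "http://example.org/page/alice")
        return 303, {"Location": target}, ""
    if uri == "http://example.org/data/alice":
        body = f'<{ALICE}> <http://xmlns.com/foaf/0.1/name> "Alice" .'
        return 200, {"Content-Type": "text/turtle"}, body
    if uri == "http://example.org/page/alice":
        return 200, {"Content-Type": "text/html"}, "<html><body>Alice</body></html>"
    return 404, {}, ""

def dereference(uri: str, accept: str,
                max_redirects: int = 5) -> Tuple[str, bool, Response]:
    """Follow 303 redirects and return (final URI, saw_303, final response).

    saw_303 is True when the server signalled, via 303 See Other, that the
    original URI names something other than the document finally returned.
    """
    current, saw_303 = uri, False
    for _ in range(max_redirects + 1):
        status, headers, body = toy_server(current, accept)
        if status != 303:
            return current, saw_303, (status, headers, body)
        saw_303 = True
        current = headers["Location"]
    raise RuntimeError("too many redirects")
```

With `dereference(ALICE, "text/turtle")` the client ends at the RDF description (the "associated description"). Asking for `text/html` leads to a human-readable page about the same resource instead.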

• “bootstrap” the data web with large, interconnected data sets to reach a critical mass of semantics

• strict adherence to W3C standards

• identification and transportation (URI, HTTP) of resource descriptions

• interpretation (RDF, RDFS, OWL) of resource descriptions

• LOD grows as data providers:

• publish structured data on the Web

• set RDF links between entities in different data sources

• transition of the web from a distributed document repository into a universal, ubiquitous database [Erling 09]

The Linking Open Data initiative

4

The LOD cloud

5

LOD data sets

6

Link sets in LOD

7

• voiD (Vocabulary of Interlinked Datasets) [Alexander, Cyganiak, Hausenblas, Zhao 09]

• describes data sets and the link sets between them

• DING (Dataset RankING) [Toupikov, Umbrich, Delbru, Hausenblas, Tummarello 09]

• ranking of linked datasets using formal descriptions

• modeling of the Linked Data domain [Halpin, Presutti 09]

Describing linked datasets

8
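To make the idea of a voiD link-set description concrete, here is a small sketch that emits one as Turtle. The term names (`void:Dataset`, `void:Linkset`, `void:subjectsTarget`, `void:objectsTarget`, `void:linkPredicate`, `void:triples`) follow the published voiD vocabulary; the 2009 draft cited above may differ in detail. The function name and all dataset URIs are hypothetical.

```python
from typing import List, Tuple

VOID = "http://rdfs.org/ns/void#"

def describe_linkset(linkset: str, subjects_ds: str, objects_ds: str,
                     link_predicate: str, links: List[Tuple[str, str]]) -> str:
    """Return a Turtle voiD description of the link set `links`, whose
    subjects come from `subjects_ds` and objects from `objects_ds`."""
    return "\n".join([
        f"@prefix void: <{VOID}> .",
        "",
        f"<{subjects_ds}> a void:Dataset .",
        f"<{objects_ds}> a void:Dataset .",
        "",
        f"<{linkset}> a void:Linkset ;",
        f"    void:subjectsTarget <{subjects_ds}> ;",
        f"    void:objectsTarget <{objects_ds}> ;",
        f"    void:linkPredicate <{link_predicate}> ;",
        # void:triples is computed from the links actually supplied
        f"    void:triples {len(links)} .",
    ])
```

A ranking scheme such as DING could then read `void:triples` and the subjects/objects targets of every link set to build a weighted dataset graph. That use is an illustration here, not a description of DING's implementation.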

• network-shaped Entity Name System to enable systematic reuse of URIs [Bouquet, Stoermer, Cordioli, Tummarello 08]

• analogous to the role DNS plays in the hypertext Web

• n2Mate framework [Peterson, Cregan, Atkinson, Brisbin 08]

• use social networking principles to facilitate vocabulary and instance reuse

• graph-based disambiguation of Semantic Web entities with idMesh [Cudré-Mauroux, Haghani, Jost, Aberer, de Meer 09]

Keeping Linked Data connected

9

• many conflated resources in DBpedia [Jaffri, Glaser, Millard 08]

• representative of LOD as a whole

• Co-Reference Resolution Service [Glaser, Jaffri, Millard 09]

• when co-reference is context-specific, owl:sameAs is inappropriate

• stores co-reference information as a first-class entity

• ontology-level alignment should precede data-level alignment [Nikolov, Uren, Motta 09]

Managing co-reference

10
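Because owl:sameAs is symmetric and transitive, a set of co-reference assertions partitions URIs into equivalence classes ("bundles"). The sketch below keeps those bundles with union-find. It stores a separate set of bundles per context, so two URIs can be co-referent in one context without being declared owl:sameAs everywhere. Class and method names are hypothetical; this is not the Co-Reference Resolution Service's actual API.

```python
from collections import defaultdict
from typing import Dict, Set

class Bundles:
    """Union-find over URIs: each root stands for one co-reference bundle."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, uri: str) -> str:
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[uri] != root:  # path compression
            nxt = self.parent[uri]
            self.parent[uri] = root
            uri = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def members(self, uri: str) -> Set[str]:
        root = self.find(uri)
        return {u for u in list(self.parent) if self.find(u) == root}

class CoreferenceService:
    """Bundles are first-class and scoped by context (hypothetical design)."""

    def __init__(self) -> None:
        self.contexts: Dict[str, Bundles] = defaultdict(Bundles)

    def assert_coreferent(self, context: str, a: str, b: str) -> None:
        self.contexts[context].union(a, b)

    def bundle(self, context: str, uri: str) -> Set[str]:
        return self.contexts[context].members(uri)
```

Asserting A≡B and B≡C in one context yields the bundle {A, B, C} there. In any other context each URI still stands alone, which a global owl:sameAs statement could not express.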

• how to get data out there?

• challenges of the read-write Semantic Web

• user awareness of social context of data (e.g. licensing, privacy)

• view update problem

• is the wiki model applicable?

• incentives for posting data on the SW

• validating existing Linked Data with Vapour [Berrueta, Fernandez, Frade 08]

Growing the data web

11

• DBpedia [Auer, Bizer, Kobilarov, Lehmann, Cyganiak, Ives 07]

• extracts structured information from Wikipedia

• linking hub for the LOD cloud

• RDF Book Mashup [Bizer, Cyganiak, Gauss 07]

• product metadata from Amazon.com

Examples of LOD data sets

12

• Linked Movie Database [Hassanzadeh, Consens 09]

• combines data from IMDb, Freebase, OMDB, DBpedia, RottenTomatoes.com, Stanford Movie Database

• interlinked music datasets [Raimond, Sutton, Sandler 08]

• combines data from Jamendo on DBTune, BBC John Peel sessions, SBSimilarity, MusicBrainz, DBpedia, GeoNames

• links artists, albums, tracks, personal music collections

• links generated from the similarity of resources and the similarity of their neighbors

Music and movies as Linked Data

13
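The idea of linking on "similarity of resources, similarity of neighbors" can be illustrated with a deliberately simplified score. It mixes a resource's own label similarity with how well its neighbors (for example, an artist's albums) match the candidate's neighbors. This is not the published algorithm. The weight `alpha`, the use of `difflib`, and all function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]   # node -> neighboring nodes (e.g. artist -> albums)
Labels = Dict[str, str]       # node -> human-readable label

def label_sim(x: str, y: str) -> float:
    """String similarity of two labels, in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def neighbor_sim(a: str, b: str, ga: Graph, gb: Graph,
                 la: Labels, lb: Labels) -> float:
    """Average, over a's neighbors, of the best label match among b's neighbors."""
    na, nb = ga.get(a, set()), gb.get(b, set())
    if not na or not nb:
        return 0.0
    return sum(max(label_sim(la[x], lb[y]) for y in nb) for x in na) / len(na)

def combined_sim(a: str, b: str, ga: Graph, gb: Graph,
                 la: Labels, lb: Labels, alpha: float = 0.5) -> float:
    """Weighted mix of the resources' own similarity and their neighbors'."""
    return (alpha * label_sim(la[a], lb[b])
            + (1 - alpha) * neighbor_sim(a, b, ga, gb, la, lb))

def best_match(a: str, candidates: Iterable[str], ga: Graph, gb: Graph,
               la: Labels, lb: Labels) -> str:
    return max(candidates, key=lambda b: combined_sim(a, b, ga, gb, la, lb))
```

Consider two artists with the same (made-up) name in the target dataset. Label similarity alone cannot tell them apart; the neighbor term prefers the one whose albums match.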

• the hypertext Web itself [Li, Zhao 08]

• extraction of semantic links from hypertext links and hierarchical relationships among Web documents

• RDF representation of the HTML DOM using SparqPlug [Coetzee, Heath, Motta 08]

• multimedia metadata

• interlinking multimedia fragments [Hausenblas, Troncy, Bürger, Raimond 09]

Other sources of data

14
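As a minimal illustration of mining the hypertext Web for links, the sketch below turns every `<a href>` in a page into one RDF statement, with relative links resolved against the page URI. The predicate `http://example.org/vocab#linksTo` is hypothetical. Neither Li and Zhao's extraction method nor SparqPlug works exactly this way; they derive richer semantics from the page structure.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

LINKS_TO = "http://example.org/vocab#linksTo"  # hypothetical link predicate

class LinkExtractor(HTMLParser):
    """Collects the absolute targets of <a href="..."> elements in a page."""

    def __init__(self, base: str) -> None:
        super().__init__()
        self.base = base
        self.targets: List[str] = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URI.
                self.targets.append(urljoin(self.base, href))

def html_to_ntriples(page_uri: str, html: str) -> List[str]:
    """One N-Triples statement per hyperlink: <page> linksTo <target>."""
    parser = LinkExtractor(page_uri)
    parser.feed(html)
    parser.close()
    return [f"<{page_uri}> <{LINKS_TO}> <{t}> ." for t in parser.targets]
```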

• eXtensible Business Reporting Language (XBRL) [Garcia, Gil 09]

• mapping data to RDF and schemas to OWL facilitates interoperability

• large thesauri [Neubert 09]

• as interlinking hubs for professional communities

• enterprise data, e.g. technical documentation [Servant 08]

• MARC21 bibliographic records [Styles, Ayers, Shabir 08]

Other sources of data (cont.)

15

• D2R Server for customizable mappings from relational databases to ontologies [Bizer, Cyganiak 06]

• browser-based tools for defining RDB-to-RDF mappings [Zhou, Xu, Chen, Idehen 08]

• Triplify [Auer, Dietzold, Lehmann, Hellmann, Aumueller 09]

• from generic data silos to Linked Data using OpenLink Data Spaces [Idehen, Erling 08]

Mapping tools

16
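The tools above differ in how mappings are configured, but the default idea behind relational-to-RDF mapping can be shown in a few lines. Each row becomes a resource, each column a property, and each non-null value a literal. The sketch uses Python's built-in `sqlite3`. The URI scheme under `http://example.org/` is hypothetical, and this is not D2R's mapping language or Triplify's configuration format.

```python
import sqlite3
from typing import List

BASE = "http://example.org/"  # hypothetical base for minted URIs

def escape_literal(value: object) -> str:
    """Escape a value for use inside an N-Triples string literal."""
    return (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))

def table_to_ntriples(conn: sqlite3.Connection, table: str, key: str) -> List[str]:
    """Map each row to <BASE table/key> and each column to <BASE vocab/table#col>."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for col, value in record.items():
            if col == key or value is None:  # key is in the URI; skip NULLs
                continue
            triples.append(
                f'{subject} <{BASE}vocab/{table}#{col}> "{escape_literal(value)}" .')
    return triples
```

Real mapping tools add what this sketch omits: typed literals, links between tables via foreign keys, and reuse of existing vocabularies instead of per-table properties.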

• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

• can be made Web-accessible with OAI2LOD Server [Haslhofer, Schandl 08]

• Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) [Van de Sompel, Lagoze, Nelson, Warner, Sanderson, Johnston 09]

• adheres to Web principles

Aggregated resources

17

• existing Linked Data datasets are more appropriate for machine than human consumption

• template-generated interlinks are of limited quality

• data from existing silos quickly becomes out of date

• need human involvement to grow the data web organically

User-driven Linked Data

18

User-driven Linked Data (cont.)

19

• direct modification using SPARQL/Update

• e.g. in Tabulator [Berners-Lee, Hollenbach, Lu, Presbrey, Prud’hommeaux, Schraefel 08]

• User Contributed Interlinking [Halb, Raimond, Hausenblas]

• semantic wikis

• Loomp [Roesch, Heese 09]

• semantic annotation of content using a text editor interface

• public data from existing social networks

• wrappers for Web 2.0 services [Passant 08]

• unifying personal identity across various networks [Rowe 09]

• Semantically Interlinked Online Communities (SIOC)

• integrating social media sites (forums, blogs, wikis, etc.) with the data web [Bojars, Passant, Cyganiak, Breslin 08]

• Meaning of a Tag (MOAT) ontology gives meaning to tags on Web 2.0 [Passant, Laublet 08]

User-driven Linked Data (cont.)

20

• usability (for humans) of Linked Data [Halb, Raimond, Hausenblas 08]

• current LOD datasets are primarily for machine consumption

• low semantic strength of current LOD link sets

• provenance information for Linked Data [Hartig 09]

• Open Data Commons license [Miller, Styles, Heath 08]

Usability and licensing

21

• W3C’s TAP semantic search [Guha, McCool 01]

• Swoogle [Ding, Finin, Joshi, Pan, Cost, Peng, Reddivari, Doshi, Sachs 04]

• adapts PageRank concept to ontologies

• SWSE [Hogan, Harth, Umbrich, Decker 07]

• MultiCrawler [Harth, Umbrich, Decker 06]

• RDF Gateway search

• Watson document-based search

• Falcons [Cheng, Ge, Wu, Qu 08]

• textual search using class hierarchies for query restriction

• Sindice Semantic Web index [Tummarello, Delbru, Oren 07]

Indexing and searching

22
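Swoogle adapts the PageRank idea to Semantic Web documents. For reference, here is plain PageRank computed by power iteration over a small, hypothetical graph of documents linking to an ontology. Swoogle's own variant weights links according to their semantics; this sketch treats all links alike.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank by power iteration; ranks sum to 1."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling document: spread its rank uniformly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank
```

For example, an ontology that many instance documents reference (say `{"doc1": ["onto"], "doc2": ["onto", "doc1"], "onto": []}`) ends up with the highest rank.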

• Silk link discovery framework [Volz, Bizer, Gaedke, Kobilarov 09]

• find relationships between entities within different data sources

• generation of owl:sameAs links

• value of Web of Data depends on the amount and quality of links between data sources

Link discovery

23
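Silk is configured with declarative link specifications that combine several similarity metrics. The sketch below hard-codes a single metric in that spirit: a normalized Levenshtein similarity on labels, with a threshold, that emits owl:sameAs links in N-Triples. It is not Silk's API. The threshold value and all URIs are hypothetical.

```python
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """1.0 for identical labels (ignoring case), falling towards 0.0."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.8) -> List[str]:
    """Link each source URI to its most similar target URI (by label)
    when the similarity reaches the threshold."""
    links = []
    for s_uri, s_label in source.items():
        scored = [(similarity(s_label, t_label), t_uri)
                  for t_uri, t_label in target.items()]
        if not scored:
            continue
        score, t_uri = max(scored)
        if score >= threshold:
            links.append(f"<{s_uri}> <{OWL_SAME_AS}> <{t_uri}> .")
    return links
```

The threshold trades precision for recall. A label with a typo ("Pariss") still links at 0.8, while unrelated labels fall below it and produce no link.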

Navigation

24

• as on the early Web, it’s easy to get “Lost in Hyperspace”

• Tabulator generic Linked Data browser [Berners-Lee, Chen, Chilton, Connolly, Dhanaraj, Hollenbach, Lerer, Sheets 06]

• encourage deployment of Linked Data

• test, refine and promote Linked Data standards

• faceted views over large-scale linked data with Virtuoso Cluster Edition [Erling 09]

• Explorator RDF browser [Araujo, Schwabe 09]

• exploratory search using direct manipulation

• DBpedia Mobile map view and faceted Linked Data browser [Becker, Bizer 08]

• explore the geospatial Semantic Web

• uses current GPS position as a starting point

• potential for Linked Data publishing

Navigation (cont.)

25

• Fenfire generic Linked Data browser [Hastrup, Cyganiak, Bojars 08]

• uses graph views rather than tables or outlines

• shows graph data as directly as possible

• related to Fentwine [Fallenstein, Lukka 04]

Navigation (cont.)

26

• Humboldt [Kobilarov, Dickinson 08]

• exploratory browsing

• faceted views

• “resource at a time”

• uses a “pivot” operation to refocus the view

Navigation (cont.)

27

• zLinks plugin [Bergman, Giasson 08]

• WordPress plugin with supporting server

• relates hypertext links with contextually relevant Linked Data

• WOWY (WordNet, OpenCyc, Wikipedia, YAGO)

• distinguish between types of resources

• disambiguate alternate senses

Navigation (cont.)

28

• mapping of Linked Data to a file system model [Schandl 09]

• enables use of this data within desktop applications

Navigation (cont.)

29

• how to use the data that is out there?

• emerging applications which exploit Linked Data [Hausenblas 09]

• integrating data sources related to drugs and clinical trials [Jentzsch, Andersson, Hassanzadeh, Stephens, Bizer 09]

• mashups

• MashQL [Jarrar, Dikaiakos 09]

• the Internet is a database; a mashup is a query over that database

• benefit of specialized, independent Linked Data services acting together [Bojars, Passant, Giasson, Breslin 07]

Other applications

30
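The "mashup is a query" view can be made concrete. Merge triples from two sources, then evaluate a conjunctive triple-pattern query over them, much like a SPARQL basic graph pattern. MashQL itself is a visual query-formulation language; this sketch only illustrates the underlying idea. The prefixed names (`ex:film1`, `ex:director`, ...) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None.
    Terms starting with '?' are variables; other terms must match exactly."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if extended.get(p, t) != t:  # variable already bound elsewhere
                return None
            extended[p] = t
        elif p != t:
            return None
    return extended

def query(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by a simple nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                m = match(pattern, triple, b)
                if m is not None:
                    next_bindings.append(m)
        bindings = next_bindings
    return bindings
```

Joining a film source with a people source on the shared `?who` variable yields answers that neither source could give alone. That payoff depends on both sources using the same URI for the same person.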

The gray area

31

• U-P2P framework for peer-to-peer linked data [Davoust, Esfandiari 09]

• data replication provides a measure of popularity

• Linked Data with Named Graphs

• e.g. interlinks with embedded provenance information [Zhao, Klyne, Shotton 08]

• Ripple scripting language [Shinavier 07]

• embeds Turing-complete programs in the Web of Data

• where are we with the Linked Data graph?

• size

• number and type of links

• usefulness to end users

• network characteristics

• single-point-of-access (e.g. DBpedia, GeoNames) vs. distributed datasets (e.g. FOAF-o-sphere, SIOC-land)

• syntactic and semantic analysis of the LOD dataset [Hausenblas, Halb, Raimond, Heath 08]

State of the data web

32

• today’s Linked Data is very different from the first-generation data web [Halpin 09]

• LOD data accounts for the vast majority of data

• power-law distributions are emerging

• data web is not growing organically

• Web standards are generally adhered to

• is Linked Data useful to ordinary users?

• sampling of Linked Data using Live.com query logs and the FALCON-S semantic search engine

Statistics of the data web

33

• ...

Query popularity follows a power law

34
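A quick way to probe a "follows a power law" claim is to fit a line to the rank-frequency data on log-log axes. The negated slope estimates the exponent. The sketch below does exactly that. It is a crude heuristic, and maximum-likelihood estimators are generally preferred for serious analysis. The function name is hypothetical, and it needs at least two positive frequencies.

```python
import math
from typing import List

def zipf_exponent(frequencies: List[float]) -> float:
    """Least-squares slope of log(frequency) against log(rank), negated."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return -cov / var
```

A roughly straight line on the log-log plot is necessary but not sufficient evidence of a power law. The same fit applied to URI frequencies would show the departure noted on the next slide as a poor linear fit.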

• ...

URI frequency... not so much

35

• ...

Data publishing lacks a “long tail”

36

A few dominant ontologies are emerging

37

# of URIs by vocabulary

(DBpedia bias)

38

# of URIs by domain name

• common network analysis techniques can be used to investigate interoperability and structural patterns of the LOD cloud [Rodriguez 09]

• results based on March 2009 statistics of the LOD data set graph:

• LOD graph is not strongly connected

• a diameter of 8 is large given the relatively small size of the cloud

• data sets have nearly identical incoming and outgoing link patterns (⇒ most links are reciprocal owl:sameAs links)

Graph analysis for the data web

39
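The three observations above (strong connectivity, diameter, reciprocal links) can each be checked with breadth-first search over a dataset-level link graph. The sketch below does so for a tiny hypothetical graph. This slide does not say whether the original analysis treated links as directed, so `diameter` lets you choose.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # dataset -> datasets it links to

def nodes_of(g: Graph) -> Set[str]:
    return set(g) | {w for ws in g.values() for w in ws}

def reversed_graph(g: Graph) -> Graph:
    r: Graph = {v: [] for v in nodes_of(g)}
    for v, ws in g.items():
        for w in ws:
            r[w].append(v)
    return r

def bfs(g: Graph, start: str) -> Dict[str, int]:
    """Hop distance from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in g.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def is_strongly_connected(g: Graph) -> bool:
    """True iff every dataset reaches every other one along link directions."""
    nodes = nodes_of(g)
    if not nodes:
        return True
    start = next(iter(nodes))
    return (len(bfs(g, start)) == len(nodes)
            and len(bfs(reversed_graph(g), start)) == len(nodes))

def diameter(g: Graph, directed: bool = False) -> int:
    """Longest shortest path over all pairs of nodes that are connected."""
    if not directed:
        r = reversed_graph(g)
        g = {v: sorted(set(g.get(v, [])) | set(r[v])) for v in nodes_of(g)}
    return max((max(bfs(g, v).values()) for v in nodes_of(g)), default=0)

def reciprocity(g: Graph) -> float:
    """Fraction of links u -> v for which v -> u also exists."""
    edges = {(u, v) for u, ws in g.items() for v in ws if u != v}
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)
```

On a real LOD snapshot, high reciprocity together with identical in- and out-degree patterns would be consistent with the slide's reading that most links are mutual owl:sameAs statements.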

Ranking and clustering of LOD data sets

40