Aaai2012

Transcript of Aaai2012

Page 1: Aaai2012

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu

Institute of Applied Informatics and Formal Description Methods (AIFB)

Crowdsourcing tasks in open query answering

Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1)

(1) Institute AIFB, Karlsruhe Institute of Technology, Germany
(2) Ontotext AD, Bulgaria

Page 2: Aaai2012

Institute of Applied Informatics and Formal Description Methods (AIFB)

2 07.06.2012

Background: what is Linked Data?

Linked Data: set of best practices to publish and connect structured data on the Web.

• URIs to identify entities and concepts in the world
• HTTP to access and retrieve resources and descriptions of these resources
• RDF as a generic graph-based data model to structure and link data

Taken together, Linked Data is said to form a ‘cloud’ of shared references and vocabularies.

http://linkeddata.org/faq

Page 3: Aaai2012

Background: why is Linked Data important?

• Data.gov & public sector information: more transparency and accountability in governance
• BBC & media: added value of content through interlinking
• Google, Yahoo, Bing & schema.org: enhanced search

Page 4: Aaai2012

Crowdsourcing Linked Data management

Tasks requiring human contributions
• Interlinking
• Conceptual modeling
• Labeling and translation
• Classification
• Ordering

Crowdsourcing already in use

Page 5: Aaai2012

Example: open query answering

Query FOAF data using the vCard vocabulary

hp:Harry foaf:mbox <mailto:[email protected]> ;
         foaf:nick "Harry" ;
         foaf:familyName "Potter" .

SELECT ?name ?email
WHERE { ?p vcard:email ?email ;
           vcard:fn ?name }

In order to answer the query as intended:

• Vocabulary mapping and entity resolution (FOAF to vCard)
• Metadata completion (full name is “Harry Potter”)
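A minimal sketch of why the vCard query finds nothing over the FOAF data until both steps above are done. Triples are modelled as plain Python tuples rather than with an RDF library. The e-mail address is a placeholder (the original is obscured in this transcript), and `VOCAB_MAPPING`, `apply_mapping` and `complete_metadata` are hypothetical names introduced only for illustration.

```python
# Toy model: a dataset is a set of (subject, predicate, object) tuples.
FOAF_DATA = {
    ("hp:Harry", "foaf:mbox", "mailto:harry@example.org"),  # placeholder address (assumption)
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
}

# Hypothetical result of a vocabulary-mapping task: FOAF property -> vCard property.
VOCAB_MAPPING = {"foaf:mbox": "vcard:email"}


def select_name_email(triples):
    """Evaluate { ?p vcard:email ?email ; vcard:fn ?name } over the triples."""
    emails = [(s, o) for s, p, o in triples if p == "vcard:email"]
    names = [(s, o) for s, p, o in triples if p == "vcard:fn"]
    # Join both patterns on the shared subject ?p.
    return sorted((name, email)
                  for s1, email in emails
                  for s2, name in names
                  if s1 == s2)


def apply_mapping(triples, mapping):
    """Add a vCard triple for every FOAF triple whose property is mapped."""
    return triples | {(s, mapping[p], o) for s, p, o in triples if p in mapping}


def complete_metadata(triples, full_names):
    """Add full names supplied by a human (metadata completion)."""
    return triples | {(s, "vcard:fn", name) for s, name in full_names.items()}


no_answers = select_name_email(FOAF_DATA)          # no vCard terms in the data
mapped = apply_mapping(FOAF_DATA, VOCAB_MAPPING)
still_empty = select_name_email(mapped)            # e-mail matches, full name missing
completed = complete_metadata(mapped, {"hp:Harry": "Harry Potter"})
answers = select_name_email(completed)
```

After the mapping alone the query still has no result, because no triple provides `vcard:fn`; only after the full name is completed does it return one row.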

Page 6: Aaai2012

Crowdsourcing-enabled query answering

• Integral part of a query engine
• At design time, the application developer specifies which data portions workers can process and via which types of HITs (Human Intelligence Tasks)
• At run time
   - The system materializes the data
   - Workers process it
   - Data and application are updated to reflect crowdsourcing results
• Formal, declarative description of the data and tasks using SPARQL patterns as a basis for the automatic design of HITs
• Reducing the number of tasks through automatic reasoning
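The run-time steps and the last point can be sketched together, under strong simplifying assumptions: the only task type is a yes/no question "are these two entities the same?", and the only reasoning used is that owl:sameAs is symmetric and transitive. The function names and the `ex:` identifiers are hypothetical, and `ask_worker` stands in for publishing a HIT and collecting its answer.

```python
def same_as_closure(links):
    """Return a find() function mapping each entity to a representative of its
    equivalence class under the given owl:sameAs links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    return find


def run_cycle(candidates, known_links, ask_worker):
    """One run-time pass: materialize open tasks, let workers answer them,
    and update the data with the confirmed links."""
    links = set(known_links)
    asked = 0
    for a, b in candidates:
        find = same_as_closure(links)      # re-reason over the updated data
        if find(a) == find(b):
            continue                       # already implied, no HIT needed
        asked += 1
        if ask_worker(a, b):
            links.add((a, b))
    return links, asked


# Three descriptions of the same thing; a simulated worker who always agrees.
candidates = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:a", "ex:c")]
links, asked = run_cycle(candidates, set(), lambda a, b: True)
```

In this run the third pair is never shown to a worker, since it already follows from the first two answers.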

Page 7: Aaai2012

Example: Identity resolution

Identity resolution involves the creation of links, either by comparison of metadata or by investigation of links on the human Web.

Input:

{ ?station a metar:Station ;
           rdfs:label ?slabel ;
           wgs84:lat ?slat ;
           wgs84:long ?slong .
  ?airport a dbp-owl:Airport ;
           rdfs:label ?alabel ;
           wgs84:lat ?alat ;
           wgs84:long ?along }

Output:

{ OPTIONAL { ?airport owl:sameAs ?station } }
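One possible way to turn a single binding of the input pattern into a HIT and a worker's answer into the output pattern, as a sketch only. The `Place` class, the question wording, and the example values are assumptions made for illustration; they are not taken from the slides or from a real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """One binding of (?x, ?xlabel, ?xlat, ?xlong) from the input pattern."""
    uri: str
    label: str
    lat: float
    long: float


def render_hit(station, airport):
    """Build the text shown to a worker for one station/airport pair."""
    return (
        "Do these two entries describe the same place?\n"
        f"  Station: {station.label} ({station.lat}, {station.long})\n"
        f"  Airport: {airport.label} ({airport.lat}, {airport.long})"
    )


def answer_to_triples(station, airport, is_same):
    """Translate a yes/no answer into the output pattern.
    The link is optional: a 'no' answer simply adds nothing."""
    if is_same:
        return [(airport.uri, "owl:sameAs", station.uri)]
    return []


# Hypothetical bindings (made-up identifiers and coordinates).
station = Place("ex:station/1", "Example Field weather station", 10.0, 20.0)
airport = Place("ex:airport/1", "Example Field Airport", 10.01, 20.01)

hit_text = render_hit(station, airport)
confirmed = answer_to_triples(station, airport, is_same=True)
rejected = answer_to_triples(station, airport, is_same=False)
```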

Page 8: Aaai2012

Example: Classification

The classification of entities into classes cannot always be inferred automatically from the schema.

Input:

{ ?station a metar:Station ;
           rdfs:label ?label ;
           wgs84:lat ?lat ;
           wgs84:long ?long }

Output:

{ ?station a ?type .
  ?type rdfs:subClassOf metar:Station }
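A classification HIT could offer the known subclasses of metar:Station as choices and accept only an answer that satisfies the output pattern. The sketch below assumes a tiny hand-written class hierarchy; the `ex:` class names and both function names are hypothetical.

```python
# Hypothetical schema fragment: class -> direct superclass.
SUBCLASS_OF = {
    "ex:AirportStation": "metar:Station",
    "ex:MarineStation": "metar:Station",
    "ex:Terminal": "dbp-owl:Airport",   # unrelated class, must not be offered
}


def classification_choices(subclass_of, parent="metar:Station"):
    """Classes a worker may pick: the direct subclasses of the parent class."""
    return sorted(c for c, p in subclass_of.items() if p == parent)


def classify(station_uri, chosen, subclass_of, parent="metar:Station"):
    """Turn a worker's choice into triples matching the output pattern
    { ?station a ?type . ?type rdfs:subClassOf metar:Station }."""
    if chosen not in classification_choices(subclass_of, parent):
        raise ValueError(f"{chosen} is not a subclass of {parent}")
    return [
        (station_uri, "a", chosen),
        (chosen, "rdfs:subClassOf", parent),
    ]


choices = classification_choices(SUBCLASS_OF)
result = classify("ex:station/1", "ex:AirportStation", SUBCLASS_OF)
```

Rejecting answers outside the offered choices is one simple guard against malformed results; it does not by itself address spam or gaming.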

Page 9: Aaai2012

Challenges

• Decomposition of queries
   - Query optimisation obfuscates what is used and should involve costs for human tasks
• Query execution and caching
   - Naively we can materialise HIT results into datasets
   - How to deal with partial coverage and dynamic datasets
• Appropriate level of granularity for HITs design for specific SPARQL constructs and typical functionality of Linked Data management components
• Optimal user interfaces of graph-like content
   - (Contextual) Rendering of LOD entities and tasks
• Pricing and workers’ assignment
   - Can we connect the end-users of an application and their wish for specific data to be consumed with the payment of workers and prioritization of HITs?
• Dealing with spam / gaming

Page 10: Aaai2012

QUESTIONS