LarKC Tutorial at ISWC 2009 - Introduction

Post on 11-Jun-2015



description

The aim of the EU FP7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This talk is part of a tutorial for early users of the LarKC platform and introduces the platform and the project in general.

Transcript of LarKC Tutorial at ISWC 2009 - Introduction

1

Agenda for today

Time | Presentation title | Presenter

08:30 – 09:00 | Setup

09:00 – 09:30 | Introduction to LarKC | Frank van Harmelen

09:30 – 10:30 | LarKC Architecture | Michael Witbrock

10:30 – 11:00 | Coffee break

11:00 – 11:30 | Hands-on: work with an existing LarKC workflow | Florian Fischer

11:30 – 12:00 | LarKC Data Layer | Florian Fischer

12:00 – 13:00 | Building a LarKC DECIDEr & creating a workflow from existing plugins | Luka Bradesko, Blaz Fortuna

13:00 – 14:00 | Lunch

14:00 – 14:30 | Distributed Processing in LarKC | Michael Witbrock

14:30 – 15:30 | Hands-on: Building a LarKC Plugin and integrating it in a workflow | Florian Fischer

15:30 – 16:00 | Coffee break | Eyal Oren

16:00 – 17:00 | Hands-on: the Urban computing workflow | Emanuele della Valle

17:00 – 18:00 | Wrap-up: discussion and feedback | Frank van Harmelen

October 2010 @ ISWC

Welcome to the 2nd LarKC Early Adopters Workshop

Frank van Harmelen, Vrije Universiteit Amsterdam

3

Health Warning

Today is a WORK shop

• we first tell you some stuff,

• then you do stuff

(repeat)

Goal of today:

• ours: show LarKC to outsiders <who are we>,

• yours: <tell us now>

4

Goals of today

At the end of today you will

• understand the goals of LarKC

• understand the architecture of LarKC

• have hands-on experience with the platform and plugins

At the end of the day, you will be able to:
– roll your own LarKC plugin
– roll your own LarKC application

5

Goals of LarKC

LarKC = a platform for large-scale reasoning

“LarKC's value is as an experimental platform. LarKC is an environment where people can go to replicate (or extend) their results in an environment where all the infrastructural heavy lifting has already been taken care of”

Quote from EU Project Officer:


6

Goals of LarKC

LarKC = a platform for large-scale reasoning

“Semantic Web research is stifled by the complexity of writing a large-scale engine, with services for data access, storage, aggregation, inference, transport, transformation, etc.

Physics research has dealt with a similar problem by providing large-scale infrastructure into which experiments can be plugged.

The idea behind LarKC, which I found so compelling, is that people who wanted to build small-scale plugins, for example, plugins for some non-standard deduction, or transformation of text to triples, or estimating the weights for relational models, could do so, taking advantage of the EU's investment in a platform with significant capabilities.”

Quote from US high-tech CTO:


7

Goals of LarKC

LarKC = a platform for large-scale reasoning

“Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost”

Quote from EU Reviewer:


8

What do we mean by:

LarKC = a platform for large-scale reasoning

• reusable components

• reconfigurable workflows

• provide infrastructure needed by all users:
– storage & retrieval
– registration of plugins
– communication (plugin2datalayer, plugin2plugin)
– synchronisation (anytime behaviour)
– remote execution (abstracts from local/remote invocation)
– remote data access (abstracts from local/remote storage)
– (will) provide instrumentation & measuring
– (will) provide caching & data locality

• integration of very heterogeneous components:
– heterogeneous data: unstructured text, (semi-)structured data
– heterogeneous code: Java, scripts, remote services

("wrap & integrate")


9

What do we mean by:

LarKC = a platform for large-scale reasoning

not only from raw large numbers:
• from a performant data layer
• from parallel deployment of plugins
• from load-balancing strategies
• …

but also from the interaction of multiple components:
• e.g. avoid reasoning through selection: SELECT + REASON
• allowing for incompleteness and anytime behaviour
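The SELECT + REASON idea can be illustrated with a minimal Python sketch. It is not the LarKC plugin API (the platform itself is Java), and the names `select`, `reason`, `superclasses` and the `max_hops` parameter are hypothetical. The selector keeps only triples reachable from the query term within a few hops; a naive reasoner then computes the `rdfs:subClassOf` closure over that subset instead of over the whole data set.

```python
from collections import deque

# Hypothetical toy setting: triples are (subject, predicate, object) tuples.
SUBCLASS = "rdfs:subClassOf"

DATA = [
    ("Lark", SUBCLASS, "Songbird"),
    ("Songbird", SUBCLASS, "Bird"),
    ("Bird", SUBCLASS, "Animal"),
    # Unrelated part of the data that the selector never touches.
    ("Sedan", SUBCLASS, "Car"),
    ("Car", SUBCLASS, "Vehicle"),
]


def select(triples, start, max_hops=3):
    """SELECT (hypothetical): keep triples reachable from `start` by
    following subject -> object links for at most `max_hops` hops."""
    by_subject = {}
    for t in triples:
        by_subject.setdefault(t[0], []).append(t)
    selected, seen = set(), {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for t in by_subject.get(node, []):
            selected.add(t)
            if t[2] not in seen:
                seen.add(t[2])
                frontier.append((t[2], depth + 1))
    return selected


def reason(triples):
    """REASON (hypothetical): naive transitive closure of rdfs:subClassOf."""
    closure = {t for t in triples if t[1] == SUBCLASS}
    changed = True
    while changed:
        changed = False
        for a, _, b in list(closure):
            for c, _, d in list(closure):
                if b == c and (a, SUBCLASS, d) not in closure:
                    closure.add((a, SUBCLASS, d))
                    changed = True
    return closure


def superclasses(closure, cls):
    return {o for s, p, o in closure if s == cls and p == SUBCLASS}


if __name__ == "__main__":
    subset = select(DATA, "Lark")
    print(len(subset), sorted(superclasses(reason(subset), "Lark")))
    # prints: 3 ['Animal', 'Bird', 'Songbird']
```

With `max_hops=1` the selector keeps only the `Lark` to `Songbird` triple, so the answer shrinks to `{'Songbird'}`: cheaper but incomplete, which is the trade-off behind "allowing for incompleteness".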


10

What do we mean by:

LarKC = a platform for large-scale reasoning

not only deductive inference over given axioms, but also:
• where do the axioms come from? (IDENTIFY)
• which part of knowledge & data is required? (SELECTion)
• when is an answer "good enough" or "best possible"? (DECIDEr)
• non-deductive inference (inductive, statistical) (REASONer)

“ReaSearch: integrating reasoning and search”
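One way to read "ReaSearch" is as a loop in which the DECIDEr keeps widening the selection until the answer is good enough. The Python sketch below assumes a toy setting: `identify` ranks named sources by how often they mention the query class, `reason` finds instances via `rdfs:subClassOf`, and `decide` stops once it has `min_answers` results. All of these names, the data and the stopping rule are hypothetical illustrations, not LarKC components.

```python
# Hypothetical toy setting: each named source is a list of
# (subject, predicate, object) triples.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

SOURCES = {
    "ornithology": [("Songbird", SUBCLASS, "Bird"),
                    ("tweety", TYPE, "Songbird"),
                    ("polly", TYPE, "Bird")],
    "zoo": [("Penguin", SUBCLASS, "Bird"), ("pingu", TYPE, "Penguin")],
    "garage": [("herbie", TYPE, "Car")],
}


def identify(sources, query_class):
    """IDENTIFY (hypothetical): rank sources by how often they mention the class."""
    def score(name):
        return sum(query_class in (s, o) for s, _, o in sources[name])
    return sorted(sources, key=score, reverse=True)


def reason(triples, query_class):
    """REASON (hypothetical): instances of query_class, including via subclasses."""
    classes, changed = {query_class}, True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == SUBCLASS and o in classes and s not in classes:
                classes.add(s)
                changed = True
    return {s for s, p, o in triples if p == TYPE and o in classes}


def decide(sources, query_class, min_answers):
    """DECIDE (hypothetical): widen the selection one source at a time and
    stop as soon as there are at least `min_answers` answers.
    Returns (answers, number_of_sources_used)."""
    selected, answers, used = [], set(), 0
    for name in identify(sources, query_class):
        selected.extend(sources[name])  # SELECT: add the next-best source
        used += 1
        answers = reason(selected, query_class)
        if len(answers) >= min_answers:  # "good enough": stop early
            break
    return answers, used


if __name__ == "__main__":
    answers, used = decide(SOURCES, "Bird", min_answers=2)
    print(sorted(answers), used)  # prints: ['polly', 'tweety'] 1
```

Asking for more answers (e.g. `min_answers=3`) makes the loop pull in the `zoo` source as well; if the threshold is never reached, the loop returns the best answer found over all sources, so the caller always gets a result.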


11

Overall approach of LarKC

• Very lightweight platform:
– communication, synchronisation, registration
– LarKC = “SPARQL endpoint on steroids”

• The real work happens in the plugins

• LarKC gives you:
– a very scalable data layer
– standardised interfaces for combining components
– utilities & infrastructure

• Three types of LarKC users:
– people building plugins
– people configuring workflows
– people using workflows

12

How to deploy LarKC

• All local:
– platform local, plugins local
– Example: workstation

• Calling remote plugins:
– platform local, (some) plugins remote
– Example: laptop

• Fully remote:
– platform remote (e.g. as a web service)
– plugins remote
– Example: cluster

13

Why would people (like you) want to use LarKC?

• Workflow builders:
– easier to get some application scenario running

• Plugin builders:
– easier integration with components by others
– wider take-up of your own component by others

14

What does a workflow look like?


[Diagram: a workflow built from Identifier, Info Set Transformer, Query Transformer, Selector, Reasoner and Decider plugins over the Data Layer]
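Workflows like the one in this diagram rely on each plugin type having a standard interface, so one implementation can be swapped for another without changing the rest of the workflow. The Python sketch below shows that idea with `typing.Protocol`; the `Selector`/`Reasoner` protocols, their method signatures and the example classes are hypothetical and are not LarKC's actual (Java) plugin API.

```python
from typing import Iterable, Protocol

# Hypothetical plugin interfaces for this sketch only.
Triple = tuple[str, str, str]


class Selector(Protocol):
    def select(self, triples: Iterable[Triple], query: str) -> list[Triple]: ...


class Reasoner(Protocol):
    def reason(self, triples: list[Triple], query: str) -> set[str]: ...


class SubjectSelector:
    """Keeps only triples whose subject is the query term."""
    def select(self, triples: Iterable[Triple], query: str) -> list[Triple]:
        return [t for t in triples if t[0] == query]


class MentionSelector:
    """Keeps every triple that mentions the query term in any position."""
    def select(self, triples: Iterable[Triple], query: str) -> list[Triple]:
        return [t for t in triples if query in t]


class NeighbourReasoner:
    """Trivial stand-in reasoner: terms directly linked to the query term."""
    def reason(self, triples: list[Triple], query: str) -> set[str]:
        return {o if s == query else s for s, _, o in triples if query in (s, o)}


class Workflow:
    """Any Selector can be combined with any Reasoner."""
    def __init__(self, selector: Selector, reasoner: Reasoner) -> None:
        self.selector = selector
        self.reasoner = reasoner

    def run(self, triples: Iterable[Triple], query: str) -> set[str]:
        return self.reasoner.reason(self.selector.select(triples, query), query)


DATA: list[Triple] = [
    ("lark", "eats", "seeds"),
    ("hawk", "eats", "lark"),
    ("lark", "livesIn", "meadow"),
]

if __name__ == "__main__":
    for selector in (SubjectSelector(), MentionSelector()):
        result = Workflow(selector, NeighbourReasoner()).run(DATA, "lark")
        print(type(selector).__name__, sorted(result))
```

Swapping `SubjectSelector` for `MentionSelector` changes the answer (the `hawk` triple is now selected) without touching `Workflow` or the reasoner, which is the point of standardised plugin interfaces.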

15

What does a workflow look like?


[Diagram: the same workflow, drawing on several Data Layers]

16

What does a workflow look like?


[Diagram: the same workflow plugins, shown without the Data Layer]

17

What does a workflow look like?


[Diagram: a smaller workflow of Identifier, Selector, Reasoner and Decider]

18

What does a workflow look like?


[Diagram: a minimal workflow of Selector, Reasoner and Decider]

20

What does a workflow look like?


[Diagram: a longer workflow in which Identifier, Info Set Transformer, Query Transformer, Selector, Reasoner and Decider steps repeat, etcetera]

21

What does a DECIDEr look like?

• Can be a hardcoded sequence of plugins

• Can be a self-configuring selection of plugins

• Can make run-time decisions on progress and resource consumption

• Coded as:
– Java
– a Cyc knowledge base
– ...

as long as it complies with the DECIDEr API

22

Are any plugins already available?

• 5x IDENTIFY
• 3x TRANSFORM
• 10x SELECT
• 4x REASON
• 4x DECIDE

• Sometimes sophisticated, sometimes simple
• Sometimes novel, sometimes wrapped

• existing web services (e.g. Sindice, Swoogle)
• another RDF store (geo-queries in AllegroGraph)
• a very large (workflow-based) system (GATE)
• existing reasoners (Jena, Pellet, Cyc, IRIS)
• XSLT scripts (XML-2-RDF)
• spreading activation (new)
• RDF-2-weightedRDF (new)

23

Goals of LarKC, and where we are

• Scalable: > 10^9 triples, lazy pipes

• Reconfigurable: plugins with standard APIs

• Open: Apache license

• heterogeneous: TRANSFORM, wrappers

• experimentation: wrap & integrate

• allow incompleteness: IDENTIFY, SELECT

• enable distribution: plugin containers

• anytime behaviour: streaming APIs

• web-enabled: remote plugins & data
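"Anytime behaviour: streaming APIs" and "lazy pipes" suggest a design in which a plugin produces results incrementally and the consumer decides when to stop. A minimal Python sketch, using a generator as a stand-in for a streaming reasoner (the function name `stream_superclasses`, the `EDGES` mapping and the `log` parameter are hypothetical): it yields superclasses nearest-first, and because generators are lazy, work stops as soon as the consumer has enough.

```python
from collections import deque
from itertools import islice

# Hypothetical toy hierarchy: class -> list of direct superclasses.
EDGES = {
    "Lark": ["Songbird"],
    "Songbird": ["Bird"],
    "Bird": ["Animal"],
    "Animal": ["LivingThing"],
}


def stream_superclasses(edges, cls, log=None):
    """Lazily yield the superclasses of `cls`, nearest first (breadth-first).

    If `log` is a list, each class is appended to it when it is expanded,
    which makes visible how much work was actually done."""
    seen, queue = {cls}, deque([cls])
    while queue:
        node = queue.popleft()
        if log is not None:
            log.append(node)
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
                yield parent  # hand each result downstream immediately


if __name__ == "__main__":
    expanded = []
    # The consumer (e.g. a DECIDEr) only needs two answers, so it stops early.
    first_two = list(islice(stream_superclasses(EDGES, "Lark", expanded), 2))
    print(first_two, expanded)  # prints: ['Songbird', 'Bird'] ['Lark', 'Songbird']
```

The full stream for `Lark` is `Songbird`, `Bird`, `Animal`, `LivingThing`, but only two classes are ever expanded here. A time budget instead of a fixed count would be an equally plausible stopping rule.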

24

What we will not show today

Available but not demo’d:
• lots of plugins
• C-SPARQL: extension of SPARQL to enable stream querying
• cognition-based heuristics (e.g. selection rules, stopping rules)
• very cool data sets:
– Linked Life Data (1.4B explicit, 2.3B closure, 1.3M links)
– Milan traffic grid (2M explicit + 2Tb sensor data (to come))
– Interest-enhanced DBLP (615k authors + interests)
– LDSR (358M explicit + 512 inferred, 100m URIs)
• very large/fast inference engines: MarVIN, Reasoning-Hadoop

Not yet available (but will be):
– plugin farming on remote CPUs (cloud, cluster)
– instrumentation & measuring
– smart data caching
