Post on 23-Aug-2020

Ontologies Ontop Databases

Martin Rezk (PhD)
Faculty of Computer Science
Free University of Bozen-Bolzano, Italy

Semantic Web Applications and Tools for Life Sciences
9/12/MMXIV

1 / 61

Who am I?

Postdoc at: Free University of Bozen-Bolzano, Bolzano, Italy.
Research topics: Ontology-based Data Access (OBDA), Efficient Query Answering, Query rewriting, Data integration.
Leader of the -ontop- project.

2 / 61

What are we going to learn today?

How to organize and access your data using ontologies.
How to do it with our system: -ontop-.
How to use this approach for data integration and consistency checking.

3 / 61

Disclaimer

Part of the material here has been taken from a tutorial by Mariano Rodriguez (2011)

http://www.slideshare.net/marianomx/ontop-a-tutorial

4 / 61

Disclaimer: License

This work is supported by the Optique Project and is licensed under the Creative Commons Attribution-Share Alike 3.0 License
http://creativecommons.org/licenses/by-sa/3.0/

5 / 61

Overview

Introduction: Optique and Ontop

Ontology Based Data Access
    The Database
    Ontologies
    Mappings
    Virtual Graph
    Querying

Data Integration

Checking Consistency

Conclusions

6 / 61

Ontop and Optique: Why?

8 / 61

When does this Go Wrong?

I’m sure the information is there but there are so many concepts involved that I can’t find it in the application.

I can’t say what I want to the application.

The application doesn’t know about this data.

They Get Lost !!

9 / 61

How they Tackle This Problem

[Diagram: the engineer sends an information need to the IT expert, who turns it into a specialised query for the application and passes the answers back.]

10 / 61

How they Tackle This Problem

May take weeks to respond.
Takes several years to master data stores and user needs.

Siemens loses up to 50.000.000 euro per year because of this!!

11 / 61

Data Access Bottleneck

12 / 61

Optique

Provides a semantic end-to-end connection between users and data sources.
Enables users to rapidly formulate intuitive queries using familiar vocabularies and conceptualisations.

13 / 61

How Optique Tackles the Problem

[Figure: the four dimensions of Big Data: complexity, variety, velocity and volume.]

(i) Providing scalable big/fast data infrastructures.
(ii) Providing the ability to cope with variety and complexity in the data. (OBDA)
(iii) Providing a good understanding of the data. (OBDA)

14 / 61

-ontop-: Our OBDA System

-ontop- is a platform to query databases as virtual RDF Graphs using SPARQL, exploiting OWL 2 QL ontologies and mappings.
It is currently being developed in the context of the Optique project.
Development of -ontop- started 4.5 years ago.
-ontop- is already well established:
    the website has about 500 visits per month
    it got 500 new registrations in the past year
-ontop- is open-source and released under the Apache license.
We support all major relational DBs (Oracle, DB2, Postgres, MySQL, etc.)

15 / 61

Ontology-based data access (OBDA)

[Diagram: queries are posed over the ontology, which is linked through the mapping to one or more sources.]

Data source(s): external and independent (possibly multiple and heterogeneous).
Ontology: provides a unified common vocabulary and a conceptual view of the data.
Mappings: relate each term in the ontology to a set of (SQL) views over (possibly federated) data.

17 / 61

Simple Life Science Running Example

Data source(s): Hospital Databases with cancer patients (first 1, then 2).
Ontology: A common domain vocabulary defining Patient, Cancer, LungCancer, etc.
Mappings: Relating the vocabulary and the databases.

18 / 61

Pre-requisites

Before we start you need:
    Java 1.7 (check by typing: java -version)
    The material online: https://www.dropbox.com/sh/316cavgoavjtnu3/AAADoau5NuNGq3zXO4JJQ6rya?dl=0

19 / 61

DB Engines and SQL

Standard way to store LARGE volumes of data: Mature, Robust and FAST.
Domain is structured as tables, data becomes rows in these tables.
Powerful query language (SQL) to retrieve this data.
Major companies have developed SQL DBs for the last 30 years (IBM, Microsoft, Oracle).

21 / 61

Data Source

Cancer Patient Database 1
Table: tbl_patient

PatientId  Name  Type   Stage
1          Mary  false  4
2          John  true   7

Type is:
    false for Non-Small Cell Lung Cancer (NSCLC)
    true for Small Cell Lung Cancer (SCLC)

Stage is:
    1-6 for NSCLC stages: I, II, III, IIIa, IIIb, IV
    7-8 for SCLC stages: Limited, Extensive

22 / 61

Creating the DB in H2

H2 is a pure Java SQL database.
Just unzip the downloaded package.
Easy to run, just run the scripts:
    Open a terminal (on Mac Terminal.app, on Windows run cmd.exe).
    Move to the H2 folder (e.g., cd h2).
    Start H2 using the h2 scripts:
        sh h2.sh (on Mac/Linux; you might need “chmod u+x h2.sh”)
        h2w.bat (on Windows)

23 / 61

How it looks:

jdbc:h2:tcp: = protocol information
localhost = server location
helloworld = database name

24 / 61

How to access it from the web

25 / 61

Creating the table

You can use the files create.sql and insert.sql

CREATE TABLE "tbl_patient" (
    patientid INT NOT NULL PRIMARY KEY,
    name VARCHAR(40),
    type BOOLEAN,
    stage TINYINT
)

Adding Data:

INSERT INTO "tbl_patient" (patientid, name, type, stage)
VALUES (1, 'Mary', false, 4),
       (2, 'John', true, 7);

26 / 61

Example SQL Query

Patients with type false and stage IIIa or above (select.sql)

SELECT patientid
FROM "tbl_patient"
WHERE type = false AND stage >= 4
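To try the table and the query without installing H2, the same statements can be run against an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for H2 here (an assumption, not part of the tutorial material), and because SQLite stores booleans as integers the condition is written as type = 0 instead of type = false.

```python
import sqlite3

# In-memory SQLite stands in for the H2 database of the tutorial (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "tbl_patient" ('
    "patientid INT NOT NULL PRIMARY KEY, "
    "name VARCHAR(40), type BOOLEAN, stage TINYINT)"
)
conn.executemany(
    'INSERT INTO "tbl_patient" (patientid, name, type, stage) VALUES (?, ?, ?, ?)',
    [(1, "Mary", False, 4), (2, "John", True, 7)],
)

# type = 0 encodes NSCLC; stage >= 4 encodes "IIIa or above".
rows = conn.execute(
    'SELECT patientid FROM "tbl_patient" WHERE type = 0 AND stage >= 4'
).fetchall()
print(rows)
```

Note how the query only makes sense to someone who knows the encoding of the type and stage columns, which is the problem the ontology and the mappings address.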

27 / 61

The More Meaningful Question

Give me the id and the name of the patients with a tumor at stage IIIa

28 / 61

Ontologies: Conceptual view

Definition
An artifact that contains a vocabulary and relations between the terms in the vocabulary, and that is expressed in a language whose syntax and semantics (meaning of the syntax) are shared and agreed upon.

Mike type Patient
NSCLC subClassOf LungCancer
LungCancer subClassOf Cancer

30 / 61

Ontologies: Protege

Go to the protégé-ontop folder from your material. This is a Protégé 4.3 package that includes the ontop plugin.
Run Protégé from the console using the run.bat or run.sh scripts. That is, execute:
    cd Protege_4.3/; run.sh

31 / 61

The ontology: Creating Concepts and Properties

Add the concept: Patient.
Add these object properties: (See PatientOnto.owl)
Add these data properties: (See PatientOnto.owl)

32 / 61

Mappings

We have the vocabulary and the database; now we need to link the two.
Mappings define triples (subject, property, object) out of SQL queries.
These triples are accessible at query time (the on-the-fly approach) or can be imported into the OWL ontology (the ETL approach).

Definition (Intuition)
A mapping has the following form:

TripleTemplate ← SQL Query to build the triples

It represents the triples constructed from each result row returned by the SQL query in the mapping.
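The intuition can be sketched in a few lines of Python: a template with {column} placeholders is instantiated once per row returned by the SQL query. This is not how -ontop- is implemented; the function name apply_mapping and the use of SQLite are assumptions made for illustration, and the sketch materialises the triples (the ETL reading of a mapping) instead of keeping them virtual.

```python
import sqlite3


def apply_mapping(conn, template, sql):
    """Instantiate a (subject, property, object) template once per SQL result row."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        values = dict(zip(columns, row))
        # Each {column} placeholder is replaced by the value in the current row.
        triples.append(tuple(part.format(**values) for part in template))
    return triples


# Small stand-in for the patient table (SQLite instead of H2, an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_patient (patientid INT, name TEXT, type INT, stage INT)")
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, "Mary", 0, 4), (2, "John", 1, 7)],
)

name_triples = apply_mapping(
    conn,
    (":db1/{patientid}", ":hasName", "{name}"),
    "SELECT patientid, name FROM tbl_patient ORDER BY patientid",
)
stage_triples = apply_mapping(
    conn,
    (":db1/neoplasm/{patientid}", ":hasStage", ":stage-IIIa"),
    "SELECT patientid FROM tbl_patient WHERE stage = 4",
)
print(name_triples)
print(stage_triples)
```

The second mapping shows how a coded value (stage = 4) disappears from the vocabulary: the WHERE clause carries the encoding, and the triple only mentions :stage-IIIa.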

34 / 61

The Mappings

p.Id  Name  Type   Stage
1     Mary  false  4

(:db1/{p.id}, type, :Patient) ← Select p.id From tbl_patient
(:db1/{p.id}, :hasName, {name}) ← Select p.id, name From tbl_patient
(:db1/{p.id}, :hasNeoplasm, :db1/neoplasm/{p.id}) ← Select p.id From tbl_patient
(:db1/neoplasm/{p.id}, :hasStage, :stage-IIIa) ← Select p.id From tbl_patient where stage=4

35 / 61

The Mappings

Using the Ontop Mapping tab, we now need to define the connection parameters to our lung cancer database.
Steps:

1. Switch to the Ontop Mapping tab.
2. Add a new data source (give it a name, e.g., PatientDB).
3. Define the connection parameters as follows:
    Connection URL: jdbc:h2:tcp://localhost/helloworld
    Username: sa
    Password: (leave empty)
    Driver class: org.h2.Driver (choose it from the drop down menu)
4. Test the connection using the “Test Connection” button.

36 / 61

The Mappings

37 / 61

The Mappings

Switch to the “Mapping Manager” tab in the ontop mappings tab.
Select your datasource.
Click Create:

target: :db1/{patientid} a :Patient .

source: SELECT patientid FROM "tbl_patient"

target: :db1/{patientid} :hasName {name} .

source: Select patientid,name FROM "tbl_patient"

target: :db1/{patientid} :hasNeoplasm :db1/neoplasm/{patientid}.

source: SELECT patientid FROM "tbl_patient"

target: :db1/neoplasm/{patientid} :hasStage :stage-IIIa .

source: SELECT patientid FROM "tbl_patient" where stage=4

38 / 61

The Mappings

Now we classify the neoplasm individual using our knowledge of the database.
We know that “false” in the table patient indicates a “Non Small Cell Lung Cancer”, so we classify the neoplasms of these patients as :NSCLC.

39 / 61

Virtual Graph

Data:

The vocabulary is more domain oriented and independent from the DB.
No more values to encode types or stages.
Later, this will allow us to easily integrate new data or domain information (e.g., an ontology).
Our data sources are now documented!

41 / 61

Virtual Graph

Data and Inference:

There is a new individual :db1/neoplasm/1 that stands for the cancer (tumor) of Mary. This allows the user to query specific properties of the tumor independently of the patient.
We get extra information as shown above.

41 / 61

On-the-fly access to the DB

Recall our information need: Give me the id and the name of the patients with a tumor at stage IIIa.
Enable Ontop in the “Reasoner” menu.

43 / 61

On-the-fly access to the DB

In the ontop SPARQL tab add all the prefixes

44 / 61

On-the-fly access to the DB

Write the SPARQL Query:

SELECT ?p ?name WHERE {
    ?p rdf:type :Patient .
    ?p :hasName ?name .
    ?p :hasNeoplasm ?tumor .
    ?tumor :hasStage :stage-IIIa .
}

Click execute.
This is the main way to access data in ontop, and it is done by querying ontop with SPARQL.
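To see what the four triple patterns ask for, here is a minimal sketch that evaluates them over a handful of hand-written triples with a nested-loop join. It is only meant to illustrate the semantics of the query: -ontop- does not materialise triples like this but translates the SPARQL query into SQL. The triples for John's tumor are deliberately left without a stage, and the helper names match and evaluate are hypothetical.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None if it cannot."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, triples):
    """Evaluate a list of triple patterns as a conjunction (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Triples that the mappings would expose for the two patients (illustrative).
triples = [
    (":db1/1", "rdf:type", ":Patient"),
    (":db1/1", ":hasName", "Mary"),
    (":db1/1", ":hasNeoplasm", ":db1/neoplasm/1"),
    (":db1/neoplasm/1", ":hasStage", ":stage-IIIa"),
    (":db1/2", "rdf:type", ":Patient"),
    (":db1/2", ":hasName", "John"),
    (":db1/2", ":hasNeoplasm", ":db1/neoplasm/2"),
]
query = [
    ("?p", "rdf:type", ":Patient"),
    ("?p", ":hasName", "?name"),
    ("?p", ":hasNeoplasm", "?tumor"),
    ("?tumor", ":hasStage", ":stage-IIIa"),
]
answers = [(b["?p"], b["?name"]) for b in evaluate(query, triples)]
print(answers)
```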

45 / 61

How we do Inference...

We embed inference into the query.
We do not need to reason with the (Big) Data.
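A rough sketch of what "embedding inference into the query" means for class hierarchies: a query for a class is expanded into a UNION over that class and all of its subclasses, so the data itself never has to be reasoned over. The hierarchy below reuses the subClassOf statements from the ontology slide; the entry for :SCLC and the function names are assumptions for illustration, and real OWL 2 QL rewriting covers more than subclass axioms.

```python
# child -> parent; the :SCLC entry is an assumption added for illustration.
SUBCLASS_OF = {
    ":NSCLC": ":LungCancer",
    ":SCLC": ":LungCancer",
    ":LungCancer": ":Cancer",
}


def subclasses(cls, hierarchy):
    """Return cls followed by every direct or indirect subclass of it."""
    found = [cls]
    for candidate in found:  # the list grows while we walk it (breadth-first)
        for child, parent in hierarchy.items():
            if parent == candidate and child not in found:
                found.append(child)
    return found


def rewrite(variable, cls, hierarchy):
    """Rewrite 'variable rdf:type cls' into a UNION over cls and its subclasses."""
    branches = [
        "{ %s rdf:type %s . }" % (variable, c) for c in subclasses(cls, hierarchy)
    ]
    return "SELECT %s WHERE { %s }" % (variable, " UNION ".join(branches))


print(rewrite("?x", ":LungCancer", SUBCLASS_OF))
```

Because the expansion depends only on the ontology, its cost is independent of the size of the database.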

46 / 61

Standards

Ontology languages: RDF, RDFS, OWL (W3C recommendations)
Query: SQL, SPARQL
Mappings: R2RML (W3C recommendation)

47 / 61

What about Data Integration?

Cancer Patient Database 2

T_Name
PId Nombre
1 Anna
2 Mike

T_NSCLC
Id hosp Stge
1 X two
2 Y one

T_SCLC
key hosp St
1 XXX
2 YYY

DB information is distributed in multiple tables.
The IDs of the two DBs overlap.
Information is encoded differently. E.g. Stage of cancer is text (one, two...)

49 / 61

New Mappings

The URIs for the new individuals differentiate the data sources (db2 vs. db1).
Being an instance of NSCLC and SCLC depends now on the table, not a column value.
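The two points above can be sketched as follows. The actual mappings for database 2 were shown as screenshots and are not reproduced here, so the URI patterns (:db2/..., :db2/neoplasm/...) and the reading of T_NSCLC.Id as the patient id are assumptions; SQLite again stands in for the real databases.

```python
import sqlite3

# Database 1: one table, type encoded in a column.
db1 = sqlite3.connect(":memory:")
db1.execute("CREATE TABLE tbl_patient (patientid INT, name TEXT, type INT, stage INT)")
db1.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, "Mary", 0, 4), (2, "John", 1, 7)],
)

# Database 2: names and diagnoses live in separate tables.
db2 = sqlite3.connect(":memory:")
db2.execute("CREATE TABLE T_Name (PId INT, Nombre TEXT)")
db2.execute("CREATE TABLE T_NSCLC (Id INT, hosp TEXT, Stge TEXT)")
db2.executemany("INSERT INTO T_Name VALUES (?, ?)", [(1, "Anna"), (2, "Mike")])
db2.executemany("INSERT INTO T_NSCLC VALUES (?, ?, ?)", [(1, "X", "two"), (2, "Y", "one")])

triples = set()

# The db1/db2 prefix keeps the overlapping ids 1 and 2 apart.
for pid, name in db1.execute("SELECT patientid, name FROM tbl_patient"):
    triples.add((f":db1/{pid}", ":hasName", name))
for pid, name in db2.execute("SELECT PId, Nombre FROM T_Name"):
    triples.add((f":db2/{pid}", ":hasName", name))

# In database 1 the class comes from a column value ...
for (pid,) in db1.execute("SELECT patientid FROM tbl_patient WHERE type = 0"):
    triples.add((f":db1/neoplasm/{pid}", "rdf:type", ":NSCLC"))
# ... in database 2 it comes from the table the row is stored in.
for (pid,) in db2.execute("SELECT Id FROM T_NSCLC"):
    triples.add((f":db2/neoplasm/{pid}", "rdf:type", ":NSCLC"))

names = sorted(o for s, p, o in triples if p == ":hasName")
nsclc = sorted(s for s, p, o in triples if o == ":NSCLC")
print(names)
print(nsclc)
```

A single query over :hasName or :NSCLC now returns answers from both hospitals, although neither database knows about the other.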

50 / 61

Consistency

A logic-based ontology language, such as OWL, allows ontologies to be specified as logical theories. This implies that it is possible to constrain the relationships between concepts, properties, and data.
In OBDA, inconsistencies arise when your mappings violate the constraints imposed by the ontology.
In OBDA we have two types of constraints:
    Disjointness: The intersection between classes Patient and Employee should be empty. There can also be disjoint properties.
    Functional Properties: Every patient has at most one name.
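Both kinds of constraint can be illustrated with a small check over a list of triples. This is a sketch of the idea only: the function names are hypothetical, the offending triples are made up, and -ontop- checks consistency through queries over the database and does not scan materialised triples.

```python
from collections import defaultdict


def functional_violations(triples, prop):
    """Subjects with more than one distinct value for a functional property."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}


def disjointness_violations(triples, class_a, class_b):
    """Individuals asserted to be members of two disjoint classes."""
    members = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            members[o].add(s)
    return sorted(members[class_a] & members[class_b])


# Triples as a wrong mapping might produce them (made-up example data).
triples = [
    (":db1/1", "rdf:type", ":Patient"),
    (":db1/1", "rdf:type", ":Employee"),  # clashes with Patient/Employee disjointness
    (":db1/1", ":hasName", "Mary"),
    (":db1/2", "rdf:type", ":Patient"),
    (":db1/2", ":hasName", "John"),
    (":db1/2", ":hasName", "Johnny"),  # second name violates functionality
]

print(functional_violations(triples, ":hasName"))
print(disjointness_violations(triples, ":Patient", ":Employee"))
```

The offending individuals point back to the mapping that generated them, which is usually where the fix belongs.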

52 / 61

Consistency: Setting up a Constraint

53 / 61

Consistency: Building a wrong mapping

54 / 61

Consistency: Checking Inconsistency

55 / 61

Consistency: Finding out the Problem

56 / 61

Conclusions

Ontologies give you a common vocabulary to formulate the queries, and mappings to find the answers.
Ontologies and Semantic Technology can help to handle the problem of accessing Big Data:
    Diversity:
        Using ontologies that describe particular domains lets you hide the storage complexity.
        Agreement on data identifiers allows for integration of datasets.
    Understanding: agreement on vocabulary lets you define your data better and allows for easy information exchange.
There is no need for computationally expensive ETL processes.
Reasoning is scalable because we reason at the query level.
You do not need to have everything ready to use it!

58 / 61

What I left outside this talk...

Semantic Query Optimisation
SWRL and Recursion
Performance Evaluation
Aggregates and bag semantics
Give out about Database Engines
Tons of theory
etc. etc. etc...

59 / 61

Thanks!!!

http://ontop.inf.unibz.it
www.optique-project.eu

Extra: Where Reasoning takes Place

If we pose the query asking for all the instances of the class Neoplasm:

SELECT ?x WHERE { ?x rdf:type :Neoplasm . }

(Intuitively) -ontop- will first rewrite it into:

SELECT ?x WHERE {
    { ?x rdf:type :Neoplasm . }
    UNION
    { ?x rdf:type :BenignNeoplasm . }
    UNION
    { ?x rdf:type :MalignantNeoplasm . }
    UNION
    ...
    { ?x rdf:type :NSCLC . }
    UNION
    { ?x rdf:type :SCLC . }
}

and then translate it into SQL:

SELECT Concat(:db1/neoplasm/, TBL.PATIENT.id) AS ?x
FROM TBL.PATIENT

61 / 61