Transforming your Graph Analytics with GraphDB 8.3

65
Graph Analytics with GraphDB Ontotext Webinar, Oct 5, 2017

Transcript of Transforming your Graph Analytics with GraphDB 8.3

Graph Analytics with GraphDB

Ontotext Webinar Oct 5 2017

Presentation Outline

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

2

Presentation Outline

FactForge Data Integration

3

DBpedia (the English version) 496M

Geonames (all geographic features on Earth) 150M

owlsameAs links between DBpedia and Geonames 471K

Company registry data (GLEI) 3M

Panama Papers DB (LinkedLeaks) 20M

Other datasets and ontologies WordNet WorldFacts FIBO

News metadata (2000 articlesday enriched by NOW) 473M

Total size (1611M explicit + 328M inferred statements) 1 939М

The New FactForge Statistics

4

DBpedia ndash freshly generated from Wikipedia Organisations 308459 (old 287307)

Persons 1314437 (old 1518283)

Locations 862148 (old 816120)

Geonames locations 11341387 (old 11834908)

5

News Metadata Metadata from Ontotextrsquos Dynamic Semantic Publishing platform

News stream from Google

Automatically generated as part of the NOWontotextcom semantic news showcase

News stream from Google since Feb 2015 about 50k newsmonth

Over 1M news articles at present

~70 tags (annotations) per news article

Tags link text mentions of concepts to the knowledge graph

Technically these are URIs for entities (people organizations locations etc) and key phrases

6

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GDB WB Builtin Overview Class Relations

7

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Presentation Outline

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

2

Presentation Outline

FactForge Data Integration

3

DBpedia (the English version) 496M

Geonames (all geographic features on Earth) 150M

owlsameAs links between DBpedia and Geonames 471K

Company registry data (GLEI) 3M

Panama Papers DB (LinkedLeaks) 20M

Other datasets and ontologies WordNet WorldFacts FIBO

News metadata (2000 articlesday enriched by NOW) 473M

Total size (1611M explicit + 328M inferred statements) 1 939М

The New FactForge Statistics

4

DBpedia ndash freshly generated from Wikipedia Organisations 308459 (old 287307)

Persons 1314437 (old 1518283)

Locations 862148 (old 816120)

Geonames locations 11341387 (old 11834908)

5

News Metadata Metadata from Ontotextrsquos Dynamic Semantic Publishing platform

News stream from Google

Automatically generated as part of the NOWontotextcom semantic news showcase

News stream from Google since Feb 2015 about 50k newsmonth

Over 1M news articles at present

~70 tags (annotations) per news article

Tags link text mentions of concepts to the knowledge graph

Technically these are URIs for entities (people organizations locations etc) and key phrases

6

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GDB WB Builtin Overview Class Relations

7

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

FactForge Data Integration

3

DBpedia (the English version) 496M

Geonames (all geographic features on Earth) 150M

owlsameAs links between DBpedia and Geonames 471K

Company registry data (GLEI) 3M

Panama Papers DB (LinkedLeaks) 20M

Other datasets and ontologies WordNet WorldFacts FIBO

News metadata (2000 articlesday enriched by NOW) 473M

Total size (1611M explicit + 328M inferred statements) 1 939М

The New FactForge Statistics

4

DBpedia ndash freshly generated from Wikipedia Organisations 308459 (old 287307)

Persons 1314437 (old 1518283)

Locations 862148 (old 816120)

Geonames locations 11341387 (old 11834908)

5

News Metadata Metadata from Ontotextrsquos Dynamic Semantic Publishing platform

News stream from Google

Automatically generated as part of the NOWontotextcom semantic news showcase

News stream from Google since Feb 2015 about 50k newsmonth

Over 1M news articles at present

~70 tags (annotations) per news article

Tags link text mentions of concepts to the knowledge graph

Technically these are URIs for entities (people organizations locations etc) and key phrases

6

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GDB WB Builtin Overview Class Relations

7

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

The New FactForge Statistics

4

DBpedia ndash freshly generated from Wikipedia Organisations 308459 (old 287307)

Persons 1314437 (old 1518283)

Locations 862148 (old 816120)

Geonames locations 11341387 (old 11834908)

5

News Metadata Metadata from Ontotextrsquos Dynamic Semantic Publishing platform

News stream from Google

Automatically generated as part of the NOWontotextcom semantic news showcase

News stream from Google since Feb 2015 about 50k newsmonth

Over 1M news articles at present

~70 tags (annotations) per news article

Tags link text mentions of concepts to the knowledge graph

Technically these are URIs for entities (people organizations locations etc) and key phrases

6

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GDB WB Builtin Overview Class Relations

7

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

5

News Metadata Metadata from Ontotextrsquos Dynamic Semantic Publishing platform

News stream from Google

Automatically generated as part of the NOWontotextcom semantic news showcase

News stream from Google since Feb 2015 about 50k newsmonth

Over 1M news articles at present

~70 tags (annotations) per news article

Tags link text mentions of concepts to the knowledge graph

Technically these are URIs for entities (people organizations locations etc) and key phrases

6

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GDB WB Builtin Overview Class Relations

7

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

6

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GDB WB Builtin Overview Class Relations

7

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

GDB WB Builtin Overview Class Relations

7

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

GDB WB Builtin Class Instances amp Hierarchy

8

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

GDB WB Builtin DomainRange Graph

9

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Explore Your Data Graph

Navigate to Explore -gt Visual graph You see a search input to choose a resource as a starting point for graph exploration

10

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

11

Visualize the graph of Sofia

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

12

Visualize the graph of Sofia

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

GDB WB Builtin Detail Visual Graph

13

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

14

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

What is SPARQLSPARQL is a SQL-like query language forRDF graph data with the following query types

SELECT returns tabular results

CONSTRUCT creates a new RDF graph based on query results

ASK returns lsquoyesrsquo if the query has a solution otherwise lsquonorsquo

DESCRIBE returns RDF graph data about a resource useful when the query client does not know the structure of the RDF data in the data source

INSERT inserts triples into a graph

DELETE deletes triples from a graph

15

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

An Example RDF Model

hasSpouse

hasSpouse

hasSpouse

hasChild

hasChild hasChild

hasChild hasChild

hasChild hasChild hasChild hasChild

worksFor

livesIn

livesIn

worksFor

Wilma

Flintstone

Pebbles

Flintstone

Pearl

Slaghoople

Roxy

RubblePearl

Slaghoople

Bamm-Bamm

Rubble

Prehistoric

AmericaCobblestone

CountyBedrock

Rock

Quarry

partOf locatedIn

Fred

Flinstone

Barney

Rubble

Betty

Rubble

partOf

Chip

16

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Using SPARQL to Insert TriplesTo create an RDF graph perform these steps

Define prefixes to URIs with the PREFIX keyword

Use INSERT DATA to signify you want to insert statements Write the subject-predicate-object statements (triples)

Execute this query

pebbles bamm-bamm

fred wilma

roxy chip

hasSpouse

hasChild hasChild

hasChild hasChild

PREFIX br lthttpbedrockgtINSERT DATA

brfred brhasSpouse brwilma brfred brhasChild brpebbles brwilma brhasChild brpebbles brpebbles brhasSpouse brbamm-bamm

brhasChild brroxy brchip

17

lsquobrrsquo refers to the namespace lsquohttpbedrockrsquo so that lsquobrFredrsquo

expands to lthttpbedrockFredgt a Universal Resource

Identifier (URI)

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Using SPARQL to Select TriplesTo access the RDF graph you just created perform these steps Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select certain information and WHERE to signify your conditions restrictions and filters

Execute this query

PREFIX br lthttpbedrockgtSELECT subject predicate object

WHERE subject predicate object

Subject Predicate Object

brfred brhasChild brpebblesbrpebbles brhasChild brroxybrpebbles brhasChild brchipbrwilma brhasChild brpebbles

18

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Using SPARQL to Find Fredrsquos Grandchildren

To find Fredrsquos grandchildren first find out if Fred has any grandchildren

Define prefixes to URIs with the PREFIX keyword

Use ASK to discover whether Fred has a grandchild and WHERE to signify your conditions

YESPREFIX br lthttpbedrockgtASKWHERE

brfred brhasChild child child brhasChild grandChild

19

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Using SPARQL to Find Fredrsquos Grandchildren

Now that we know he has at least one grandchild perform these steps to find the grandchild(ren)

Define prefixes to URIs with the PREFIX keyword

Use SELECT to signify you want to select a grandchild and WHERE to signify your conditions

PREFIX br lthttpbedrockgtSELECT grandChild WHERE

brfred brhasChild child child brhasChild grandChild

grandChild

1 brroxy2 brchip

20

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

21

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

GraphDBtrade Workbench is a web-based tool for

Management of repositories

Loading and exporting data

RDF graph exploration and visualization

Query writing and evaluation

Monitoring query execution and resource usage

Management of users and connectors

Management of clusters (Enterprise Edition)

GraphDBtrade Workbench

22

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

On the following slide is an example of the GraphDBtrade Workbench screen

Access the GraphDBtrade Workbench from a browser

The splash page provides a summary of the installed GraphDBtrade Workbench

GraphDBtrade Workbench

23

The Workbench has a side menu bar with convenient drop down menus organized under ldquoImportrdquo ldquoExplorerdquo ldquoSPARQLrdquo ldquoMonitorrdquo ldquoSetuprdquo and ldquoHelprdquo

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Test the repository by

Selecting ldquoSPARQLrdquo

Submitting queries

GraphDBtrade Workbench Execute Queries

2 Query

24

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

SPARQL Editing

GDB WB integrates the YASGUI editor Automatic prefix addition (best practice load prefixesttl) Autocompletion of URIs (classes properties entitieshellip)

25

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

FactForge Saved Queries

FactForgequery F04

26

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

By selecting the SPARQL menu the SPARQL query editor displays and

Allows you to render your query results as Table Pivot Table or Google Analytic Charts

Execute Queries With GraphDBtrade Workbench

27

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

28

GraphDBtrade Workbench Query Editor

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

FactForge Pie chart of Industries by Org Count

29

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Query Monitoring Abort Query

30

GraphDBtrade allows you to abort long queries that are executing

Eg you create a query that is long running and you would like to halt it and perhaps modify it and resubmit it and not wait until it completes

From the side menu panel select Monitor then Queries

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

SPARQL editor switches inference and owlsameAs expansion

PREFIX dbr lthttpdbpediaorgresourcegtPREFIX dbo lthttpdbpediaorgontologygtselect where

dbrSofia p location location a dboLocation

31

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

owlsameAs optimisation sameAs is useful in semantic data integration

Often independent agencies mint different URLs for the same entity

sameAs an equivalence relation declares them the same (ldquosmushingrdquo)

All statements of URL X in equivalence cluster are ldquocopiedrdquo to all Y in the same cluster

Such inference causes combinatorial explosion of statements

If unchecked decreases memory and query time performance

sameAs Optimisation Compact representation statements are made against clusters not against individual URLs

Backward chaining finds all solutions across cluster

Query results compacted by picking one representative from cluster (option disableSameAs=true)

disableSameAs=false = ldquoExpand results over equivalent URIsrdquo

32

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

33

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

The Honey and the Sting of owlsameAs

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

34

The Honey and the Sting of owlsameAs

E11 E22

E12 E21

E23

geonames727011

dbpediaSofia

geonames732800

dbpediaBulgaria

opencyc-enBulgaria

geo-ontparentFeature

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

35

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Ontotext GraphDB Connectors

36

Provide fast full text range faceted search and aggregations

Utilize an external engine like Lucene Solr or Elasticsearch

Flexible schema mapping index only what you need

Real-time synchronization of data in GraphDB and the external engine

Connector management via SPARQL

Data querying amp update via SPARQL

Based on the GraphDB plug-in architecture

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Geo-spatial Plug-In

37

Special handling of 2D geo-spatial data

Custom index and query re-writing

Expressive query constraints

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Geo-spatial Extensions

Data must use the WGS84 ontologyhttpwwww3org200301geowgs84_pos

wgs84SpatialThingwgs84location

wgs84Point

rdfssubClassOf

wgs84long

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Geo-spatial Extensions

39

GraphDB overrides RDF LIST syntax eg link omgeonearby(lat1 long1 50mi)

Geo-spatial plug-in re-writes this tolink geo-poslat link_lat

link geo-poslong link_long

distance between (link_lat link_long) and (lat1 long1) is less than or equal to 50 miles

Query extensions includeNearby (lat long distance)

Within (rectangle)

Within (polygon)

distance function

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Geo-spatial Extensions

40

Powerful method to explore geo data

Provides query constraints otherwise not possible in SPARQL

Fast and efficient based on R-Trees

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

GeoSPARQL Support

41

GeoSPARQL is a standard for representing and querying geospatial linked data from the Open Geospatial Consortium using the Geography Markup Language

A small topological ontology in RDFSOWL for representation

Simple Features RCC8 and DE-9IM (aka Egenhofer) topological relationship vocabularies and ontologies for qualitative reasoning

A SPARQL query interface using a set of Topological SPARQL extension functions for quantitative reasoning

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

42

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

RDF Rank is a GraphDBtrade extension that

Is similar to PageRank and it identifies ldquoimportantrdquo nodes in an RDF graph based on their interconnectedness

Is accessed using the rankhasRDFRank system predicate

Incremental RDF Rank is useful for frequently changing data

For Example to select the top 100 important nodes in the RDF graph

RDF Rank

PREFIX rank lthttpwwwontotextcomowlimRDFRankgtSELECT n WHERE n rankhasRDFRank r ORDER BY DESC(r)LIMIT 100

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

RDF Rank

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

RDF Rank

Simple example - get the 100 most highly ranked resources

SELECT DISTINCT airport rrank

WHERE

SELECT dbrLondon geo-poslat lat geo-poslong long LIMIT 1

airport gdb-geonearby(lat long 50mi)

a dboAirport

rankhasRDFRank3 rrank

ORDER BY DESC(rrank)

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

46

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Semantic Media MonitoringFor each entity

popularity trends

relevant news

related entities

knowledge graph information

Sep 2017 Graph Analytics with GraphDB Webinar

47

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Graph Analytics with GraphDB Webinar

48

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Semantic Media MonitoringPress-Clipping

We can trace references to a specific company in the news This is pretty much standard however we can deal with syntactic variations in the names

because state of the art Named Entity Recognition technology is used

Whatrsquos more important we distinguish correctly in which mention ldquoParisrdquo refers to which of the following Paris (the capital of France) Paris in Texas Paris Hilton or to Paris (the Greek hero)

We can trace and consolidate references to daughter companies

We have comprehensive industry classification The one from DBPedia but refined to accommodate identifier variations and specialization

(eg company classified as dbrBank will also be considered classified as dbrFinancialServices)

Sep 2017 Graph Analytics with GraphDB Webinar

49

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Media Monitoring Queries

F6 News that mention a specific entity during specific period

F7 Most popular companies per industry

F8 Most popular companies per industry including children

Sep 2017 Graph Analytics with GraphDB Webinar

50

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

News popularity ranking of companies

Rankings can be customized by specifying a geographic region news category (eg business sport lifestyle etc) and time period

Unique features It is based on live streaming news

Tracks also mentions of subsidiaries

Rank uses the industry sectors of DBPedia with several refinements About 40 top-industry sectors

Sectors are linked in a hierarchical taxonomy (all together 251 sectors)

Industry sectors are de-duplicated (all designators used in Wikipedia are about 9 000)

Sep 2017 Graph Analytics with GraphDB Webinar

51

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Rank uses NOW FactForge and GraphDB

This ranking service is entirely based on FactForge FactForge allows public exploration and querying of a knowledge graph of more than 1 billion facts

which is loaded in GraphDB

GraphDB is a semantic graph database engine of Ontotext

Unlike FactForge this service is aimed at non-technical users as it does not require any knowledge of SPARQL or other technology

But it allows users to see the SPARQL query for each ranking and to customize it

Try rankontotextcom

Sep 2017 Graph Analytics with GraphDB Webinar

52

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

53

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

rankontotextcom demonstratorSep 2017 Graph Analytics with GraphDB Webinar

54

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

55

FactForge as a Playground

Visualizations

SPARQL Overview

GraphDBtrade Workbench

GraphDBtrade Plug-ins

RDFRank ndash Importance of node in the KG

Popularity Ranking of Companies

Node similarity in KG

Presentation Outline

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Why

I want to search in knowledge graphs Because they are too big and I donrsquot have years to write ldquoproperrdquo SPARQL

I want to detect duplicates

I want to do entity resolution For disambiguation of entity references in text

For reconciliation or for linking or data integration

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Adapt Information Retrieval (IR) ideas IRrsquos basic is the Vector space model

Documents are vectors in a geometrical space that has one dimension for each word Coordinates correspond to the number of times that the word appears in the document Imagine there are only 5 words used in all documents a ab acb ba bc Document content ldquoa bc ba a abcrdquo Document vector = lt20111gt Similarity is measure by the COS of the angle between the vectors Very popular and simple Read more at httpsenwikipediaorgwikiVector_space_model

How it can work

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

A simple model Nodes are described by the outgoing edges ie ltpredicate objectgt pairs

Concern there are too many different PO-pairs in a big KG

For FactForge it would be a space with 100M+ dimensions hellip

Experiment with node similarity in FactForgenet

The basic assumption two nodes are more similar if they share more PO-pairs

Ie they are connected to the same other nodes with the same types of relationships

Weighted sum weight PO-pairs based on their ldquoinformation valuerdquo

More ldquopopularrdquo pairs are less informative known as TFIDF

Vector Space Model for Knowledge Graphs

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Example Nodes sharing outgoing liks

idIOEW2389W idIOP235JKO

San Francisco

Aerospace

ldquoBFR LLCrdquonam

eldquoVenus Express Incrdquo

nam

e

Beta Inc

pare

nt shared edge

differentiating edge

ldquoVenus Express Incrdquo

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Demonstration

1 Shared edges for a node

2 Shared edges for a node filtered by predicate

3 Similar nodes by count of shared edges

4 Similar nodes by count of shared edges typed

5 Shared pairs for a node by popularity

6 Node similarity as weighted sum of shared pairs

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

rankontotextcom demonstratorSep 2017

61

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Teaching time is 9 + 1 hours including Pre-class assignments (recorded sessions) Live Class Sessions Individual Consultation (optional)

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Support and FAQrsquossupportontotextcom

Additional resources

OntotextCommunity Forum and Evaluation Support httpstackoverflowcomquestionstaggedgraphdbGraphDB Website and Documentation httpgraphdbontotextcomWhitepapers Fundamentals httpontotextcomknowledge-hubfundamentals

SPARQL OWL and RDFRDF httpwwww3orgTRrdf11-conceptsRDFS httpwwww3orgTRrdf-schemaSPARQL Overview httpwwww3orgTRsparql11-overviewSPARQL Query httpwwww3orgTRsparql11-querySPARQL Update httpwwww3orgTRsparql11-update

63

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

For Further Information

64

Peio Popov North America Sales and Business Development

peiopopovontotextcom

19292390659

Ilian Uzunov Europe Sales and Business Development

Ilianuzunovontotextcom

359888772248

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65

Thank you

Experience the technology with our demonstrators

NOW Semantic News Portal httpnowontotextcom

RANK News popularity ranking for companies httprankontotextcom

FactForge Hub for open data and news about People and Organizations

httpfactforgenet

65