gStore: A Graph-based SPARQL Query Engine

gStore: A Graph-based SPARQL Query Engine. M. Tamer Özsu, University of Waterloo, David R. Cheriton School of Computer Science. Joint work with Lei Zou, Peking University and Lei Chen, Hong Kong University of Science and Technology. UPenn, 2012-12-04

Transcript of gStore: A Graph-based SPARQL Query Engine

Page 1: gStore: A Graph-based SPARQL Query Engine

gStore: A Graph-based SPARQL Query Engine

M. Tamer Özsu

University of Waterloo, David R. Cheriton School of Computer Science

Joint work with Lei Zou, Peking University and Lei Chen, Hong Kong University of Science and Technology

UPenn, 2012-12-04

Page 2: gStore: A Graph-based SPARQL Query Engine

RDF and Semantic Web

RDF is a language for the conceptual modeling of information about web resources

A building block of the semantic web

  Facilitates exchange of information

  Search engines can retrieve more relevant information

  Facilitates data integration (mashups)

Machine understandable

  Understand the information on the web and the interrelationships among them

Page 3: gStore: A Graph-based SPARQL Query Engine

RDF Uses

YAGO and DBpedia extract facts from Wikipedia & represent them as RDF → structural queries

Communities build RDF data

E.g., biologists: Bio2RDF and Uniprot RDF

Web data integration

Linked Data Cloud

. . .


Page 4: gStore: A Graph-based SPARQL Query Engine

RDF Data Volumes . . .

. . . are growing – and fast

Linked data cloud currently consists of 325 datasets with >25B triples

Size almost doubling every year

Page 5: gStore: A Graph-based SPARQL Query Engine

RDF Data Volumes . . .

As of March 2009: 89 datasets

[Figure: Linking Open Data cloud diagram, March 2009, showing datasets such as DBpedia, YAGO, Freebase, UniProt, DBLP, and GeoNames as linked bubbles.]

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 6: gStore: A Graph-based SPARQL Query Engine

RDF Data Volumes . . .

As of September 2010: 203 datasets

[Figure: Linking Open Data cloud diagram, September 2010. Legend categories: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 7: gStore: A Graph-based SPARQL Query Engine

RDF Data Volumes . . .

As of September 2011: 295 datasets

[Figure: Linking Open Data cloud diagram, September 2011. Legend categories: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 8: gStore: A Graph-based SPARQL Query Engine

Outline

1 RDF Introduction

2 Answering Regular SPARQL Queries

  Answering SPARQL queries using graph pattern matching [Zou, Mo, Zhao, Chen, Özsu, PVLDB 2011]

3 Aggregation

  Answering Aggregate SPARQL Queries Over Large RDF Repositories

4 Future Research

Page 10: gStore: A Graph-based SPARQL Query Engine

RDF Introduction

Everything is a uniquely named resource

Namespaces can be used to scope the names

Properties of resources can be defined

Relationships with other resources can be defined

Resources can be contributed by different people/groups and can be located anywhere on the web

Integrated web “database”

Example:

http://en.wikipedia.org/wiki/Abraham_Lincoln

xmlns:y=http://en.wikipedia.org/wiki/

y:Abraham_Lincoln hasName “Abraham Lincoln”

y:Abraham_Lincoln bornOnDate “1809-02-12”

y:Abraham_Lincoln diedOnDate “1865-04-15”

y:Abraham_Lincoln diedIn y:Washington_DC

Page 15: gStore: A Graph-based SPARQL Query Engine

RDF Data Model

Triple: Subject, Predicate (Property), Object (s, p, o)

  Subject: the entity that is described (URI or blank node)

  Predicate: a feature of the entity (URI)

  Object: value of the feature (URI, blank node or literal)

(s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L)

  U: set of URIs

  B: set of blank nodes

  L: set of literals

A set of RDF triples is called an RDF graph

[Figure: a Subject vertex (from U ∪ B) joined to an Object vertex (from U ∪ B ∪ L) by a Predicate edge (from U).]

Subject | Predicate | Object
y:Abraham_Lincoln | hasName | “Abraham Lincoln”
y:Abraham_Lincoln | bornOnDate | “1809-02-12”
y:Abraham_Lincoln | diedOnDate | “1865-04-15”
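As a minimal sketch of the membership condition (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), the three term kinds can be modeled as small Python classes. The class names `URI`, `BlankNode`, and `Literal` and the helper `is_valid_triple` are hypothetical names chosen for illustration; they are not part of gStore or of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o) -> bool:
    """Check that (s, p, o) lies in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))
        and isinstance(p, URI)
        and isinstance(o, (URI, BlankNode, Literal))
    )


lincoln = URI("http://en.wikipedia.org/wiki/Abraham_Lincoln")
has_name = URI("hasName")  # illustrative predicate name taken from the slide

# A literal is allowed as an object but not as a subject.
ok = is_valid_triple(lincoln, has_name, Literal("Abraham Lincoln"))
bad = is_valid_triple(Literal("Abraham Lincoln"), has_name, lincoln)

# An RDF graph is simply a set of valid triples.
rdf_graph = {(lincoln, has_name, Literal("Abraham Lincoln"))}
```

The frozen dataclasses are hashable, which is what lets the triples live in a Python set.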

Page 16: gStore: A Graph-based SPARQL Query Engine

RDF Example Instance

Prefix: y=http://en.wikipedia.org/wiki/

Subject | Predicate | Object
y:Abraham_Lincoln | hasName | “Abraham Lincoln”
y:Abraham_Lincoln | bornOnDate | “1809-02-12”
y:Abraham_Lincoln | diedOnDate | “1865-04-15”
y:Abraham_Lincoln | bornIn | y:Hodgenville_KY
y:Abraham_Lincoln | diedIn | y:Washington_DC
y:Abraham_Lincoln | title | “President”
y:Abraham_Lincoln | gender | “Male”
y:Washington_DC | hasName | “Washington D.C.”
y:Washington_DC | foundingYear | “1790”
y:Hodgenville_KY | hasName | “Hodgenville”
y:United_States | hasName | “United States”
y:United_States | hasCapital | y:Washington_DC
y:United_States | foundingYear | “1776”
y:Reese_Witherspoon | bornOnDate | “1976-03-22”
y:Reese_Witherspoon | bornIn | y:New_Orleans_LA
y:Reese_Witherspoon | hasName | “Reese Witherspoon”
y:Reese_Witherspoon | gender | “Female”
y:Reese_Witherspoon | title | “Actress”
y:New_Orleans_LA | foundingYear | “1718”
y:New_Orleans_LA | locatedIn | y:United_States
y:Franklin_Roosevelt | hasName | “Franklin D. Roosevelt”
y:Franklin_Roosevelt | bornIn | y:Hyde_Park_NY
y:Franklin_Roosevelt | title | “President”
y:Franklin_Roosevelt | gender | “Male”
y:Hyde_Park_NY | foundingYear | “1810”
y:Hyde_Park_NY | locatedIn | y:United_States
y:Marilyn_Monroe | gender | “Female”
y:Marilyn_Monroe | hasName | “Marilyn Monroe”
y:Marilyn_Monroe | bornOnDate | “1926-07-01”
y:Marilyn_Monroe | diedOnDate | “1962-08-05”

Slide callouts: subjects are URIs; objects are either literals or URIs.

Page 17: gStore: A Graph-based SPARQL Query Engine

RDF Graph

[Figure: the example triples drawn as a graph. Entity vertices (y:Abraham_Lincoln, y:Washington_DC, y:Hodgenville_KY, y:United_States, y:Reese_Witherspoon, y:New_Orleans_LA, y:Franklin_Roosevelt, y:Hyde_Park_NY, y:Marilyn_Monroe) point to literal vertices through hasName, bornOnDate, diedOnDate, title, gender, and foundingYear edges, and to each other through bornIn, diedIn, hasCapital, and locatedIn edges.]

Page 18: gStore: A Graph-based SPARQL Query Engine

RDF Query Model

Query model: SPARQL (SPARQL Protocol and RDF Query Language)

Given U (set of URIs), L (set of literals), and V (set of variables), a SPARQL expression is defined recursively:

an atomic triple pattern, which is an element of (U ∪ V) × (U ∪ V) × (U ∪ V ∪ L)

  ?x hasName “Abraham Lincoln”

P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate)

  ?x price ?p FILTER(?p < 30)

P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions

Example:

SELECT ?name
WHERE {
  ?m <bornIn> ?city . ?m <hasName> ?name .
  ?m <bornOnDate> ?bd . ?city <foundingYear> "1718" .
  FILTER(regex(str(?bd), "1976"))
}
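To make the recursive definition concrete, here is a small sketch that evaluates a conjunction (AND) of atomic triple patterns by nested-loop matching and then applies the FILTER as a substring test. It is only an illustration of the semantics, not how gStore or any production engine evaluates queries; the helper names `is_var`, `match_pattern`, and `evaluate` are hypothetical, and URIs and literals are both modeled as plain strings for brevity.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, triples):
    """Evaluate P1 AND P2 AND ... by extending bindings one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                nb = match_pattern(pattern, t, b)
                if nb is not None:
                    extended.append(nb)
        bindings = extended
    return bindings


TRIPLES = [
    ("y:Reese_Witherspoon", "bornIn", "y:New_Orleans_LA"),
    ("y:Reese_Witherspoon", "hasName", "Reese Witherspoon"),
    ("y:Reese_Witherspoon", "bornOnDate", "1976-03-22"),
    ("y:New_Orleans_LA", "foundingYear", "1718"),
    ("y:Franklin_Roosevelt", "bornIn", "y:Hyde_Park_NY"),
    ("y:Franklin_Roosevelt", "hasName", "Franklin D. Roosevelt"),
    ("y:Hyde_Park_NY", "foundingYear", "1810"),
]

QUERY = [
    ("?m", "bornIn", "?city"),
    ("?m", "hasName", "?name"),
    ("?m", "bornOnDate", "?bd"),
    ("?city", "foundingYear", "1718"),
]

# FILTER(regex(str(?bd), "1976")) is approximated by a substring test.
names = [b["?name"] for b in evaluate(QUERY, TRIPLES) if "1976" in b["?bd"]]
```

On this subset of the example instance, `names` contains only "Reese Witherspoon".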

Page 19: gStore: A Graph-based SPARQL Query Engine

SPARQL Queries

SELECT ?name
WHERE {
  ?m <bornIn> ?city . ?m <hasName> ?name .
  ?m <bornOnDate> ?bd . ?city <foundingYear> "1718" .
  FILTER(regex(str(?bd), "1976"))
}

[Figure: the query drawn as a graph. ?m has edges bornIn → ?city, hasName → ?name, and bornOnDate → ?bd; ?city has edge foundingYear → “1718”; FILTER(regex(str(?bd), “1976”)) is attached to ?bd.]

Page 22: gStore: A Graph-based SPARQL Query Engine

Naïve Triple Store Design

SELECT ?name
WHERE {
  ?m <bornIn> ?city . ?m <hasName> ?name .
  ?m <bornOnDate> ?bd . ?city <foundingYear> "1718" .
  FILTER(regex(str(?bd), "1976"))
}

All triples are stored in a single three-column table T:

Subject | Property | Object
y:Abraham_Lincoln | hasName | “Abraham Lincoln”
y:Abraham_Lincoln | bornOnDate | “1809-02-12”
y:Abraham_Lincoln | diedOnDate | “1865-04-15”
y:Abraham_Lincoln | bornIn | y:Hodgenville_KY
y:Abraham_Lincoln | diedIn | y:Washington_DC
y:Abraham_Lincoln | title | “President”
y:Abraham_Lincoln | gender | “Male”
y:Washington_DC | hasName | “Washington D.C.”
y:Washington_DC | foundingYear | “1790”
y:Hodgenville_KY | hasName | “Hodgenville”
y:United_States | hasName | “United States”
y:United_States | hasCapital | y:Washington_DC
y:United_States | foundingYear | “1776”
y:Reese_Witherspoon | bornOnDate | “1976-03-22”
y:Reese_Witherspoon | bornIn | y:New_Orleans_LA
y:Reese_Witherspoon | hasName | “Reese Witherspoon”
y:Reese_Witherspoon | gender | “Female”
y:Reese_Witherspoon | title | “Actress”
y:New_Orleans_LA | foundingYear | “1718”
y:New_Orleans_LA | locatedIn | y:United_States
y:Franklin_Roosevelt | hasName | “Franklin D. Roosevelt”
y:Franklin_Roosevelt | bornIn | y:Hyde_Park_NY
y:Franklin_Roosevelt | title | “President”
y:Franklin_Roosevelt | gender | “Male”
y:Hyde_Park_NY | foundingYear | “1810”
y:Hyde_Park_NY | locatedIn | y:United_States
y:Marilyn_Monroe | gender | “Female”
y:Marilyn_Monroe | hasName | “Marilyn Monroe”
y:Marilyn_Monroe | bornOnDate | “1926-07-01”
y:Marilyn_Monroe | diedOnDate | “1962-08-05”

The SPARQL query translates to SQL over T:

SELECT T2.object
FROM T AS T1, T AS T2, T AS T3, T AS T4
WHERE T1.property = 'bornIn'
  AND T2.property = 'hasName'
  AND T3.property = 'bornOnDate'
  AND T1.subject = T2.subject
  AND T2.subject = T3.subject
  AND T4.property = 'foundingYear'
  AND T1.object = T4.subject
  AND T4.object = '1718'
  AND T3.object LIKE '%1976%'

Too many self-joins!
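The four-way self-join can be tried out directly with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the in-memory database, the lowercase column names, and the choice of a subset of the example triples are illustrative, and SQLite stands in for whatever relational store a naïve triple store would actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
        ("y:Abraham_Lincoln", "bornOnDate", "1809-02-12"),
        ("y:Abraham_Lincoln", "bornIn", "y:Hodgenville_KY"),
        ("y:Hodgenville_KY", "hasName", "Hodgenville"),
        ("y:Reese_Witherspoon", "bornOnDate", "1976-03-22"),
        ("y:Reese_Witherspoon", "bornIn", "y:New_Orleans_LA"),
        ("y:Reese_Witherspoon", "hasName", "Reese Witherspoon"),
        ("y:New_Orleans_LA", "foundingYear", "1718"),
        ("y:Marilyn_Monroe", "hasName", "Marilyn Monroe"),
        ("y:Marilyn_Monroe", "bornOnDate", "1926-07-01"),
    ],
)

# One alias of T per triple pattern in the SPARQL query: four patterns, four aliases.
SQL = """
SELECT T2.object
FROM T AS T1, T AS T2, T AS T3, T AS T4
WHERE T1.property = 'bornIn'
  AND T2.property = 'hasName'
  AND T3.property = 'bornOnDate'
  AND T1.subject = T2.subject
  AND T2.subject = T3.subject
  AND T4.property = 'foundingYear'
  AND T1.object = T4.subject
  AND T4.object = '1718'
  AND T3.object LIKE '%1976%'
"""

rows = conn.execute(SQL).fetchall()
conn.close()
```

Every additional triple pattern in the SPARQL query adds another alias of T to the FROM clause, which is the source of the "too many self-joins" complaint.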

Page 23: gStore: A Graph-based SPARQL Query Engine

Existing Solutions

1. Property table

  Each class of objects goes to a different table ⇒ similar to normalized relations

  Eliminates some of the joins

2. Vertically partitioned tables

  For each property, build a two-column table, containing both subject and object, ordered by subjects

  Can use merge join (faster)

  Good for subject-subject joins but does not help with subject-object joins

3. Exhaustive indexing

  Create indexes for each permutation of the three columns

  Query components become range queries over individual relations with merge-join to combine

  Excessive space usage

None can handle wildcard queries easily

Most cannot handle updates
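The vertically partitioned design can be sketched in a few lines: one two-column table per property, sorted by subject, combined with a merge join. The function names `vertical_partition` and `merge_join_on_subject` are hypothetical, and the in-memory lists only stand in for the on-disk column tables a real system would use.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Build one (subject, object) table per property, each sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join_on_subject(left, right):
    """Merge-join two subject-sorted tables; returns (subject, left_obj, right_obj)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            s = left[i][0]
            # Find the run of rows with the same subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == s:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == s:
                j_end += 1
            out.extend(
                (s, lo, ro) for _, lo in left[i:i_end] for _, ro in right[j:j_end]
            )
            i, j = i_end, j_end
    return out


TRIPLES = [
    ("y:Reese_Witherspoon", "bornIn", "y:New_Orleans_LA"),
    ("y:Reese_Witherspoon", "hasName", "Reese Witherspoon"),
    ("y:Abraham_Lincoln", "bornIn", "y:Hodgenville_KY"),
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:New_Orleans_LA", "foundingYear", "1718"),
]

tables = vertical_partition(TRIPLES)
joined = merge_join_on_subject(tables["bornIn"], tables["hasName"])
```

The subject-subject join works in one linear pass because both tables are already in subject order. Joining `bornIn.object` with `foundingYear.subject` would first require re-sorting the `bornIn` table by object, which is why this layout does not help with subject-object joins.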

Page 25: gStore: A Graph-based SPARQL Query Engine

Outline

1 RDF Introduction

2 Answering Regular SPARQL Queries

  Answering SPARQL queries using graph pattern matching [Zou, Mo, Zhao, Chen, Özsu, PVLDB 2011]

3 Aggregation

  Answering Aggregate SPARQL Queries Over Large RDF Repositories

4 Future Research

Page 26: gStore: A Graph-based SPARQL Query Engine

Our Approach – General Idea

We work directly on the RDF graph and the SPARQL query graph

Answering a SPARQL query ≡ subgraph matching

  Subgraph matching is computationally expensive

Use a signature-based encoding of each entity and class vertex to speed up matching

Filter-and-evaluate

  Use a filtering algorithm that may admit false positives to prune nodes and obtain a set of candidates; then do more detailed evaluation on those

We develop an index (VS∗-tree) over the data signature graph (has light maintenance load) for efficient pruning

Page 27: gStore: A Graph-based SPARQL Query Engine

Why not ordinary graph pattern matching?

1. RDF graphs are much larger than what is considered in usual graph pattern matching (by orders of magnitude)

2. Cardinality of vertices and edges in an RDF graph is much larger than in traditional graph databases

  AIDS dataset: 10,000 data graphs, each with avg 20 vertices & 25 edges, with a total size of 5MB

  Yago RDF: 500M vertices with a total size >3GB

3. SPARQL queries have special structure

  Star subqueries that use several properties of the same entity

Page 29: gStore: A Graph-based SPARQL Query Engine

0. Start with RDF Graph G

[Figure: the query graph Q (?m with edges bornIn → ?city, hasName → ?name, and bornOnDate → ?bd; ?city with edge foundingYear → “1718”; FILTER(regex(str(?bd), “1976”))) shown next to the example RDF graph G.]

Finding matches over a large graph is not a trivial task!

Page 31: gStore: A Graph-based SPARQL Query Engine

1. Represent G as Adjacency List and Encode Each Vertex

Prefix: y=http://en.wikipedia.org/wiki/

Label | adjList
... | ...
y:Reese_Witherspoon | (bornOnDate, “1976-03-22”), (gender, “Female”), (title, “Actress”), (hasName, “Reese Witherspoon”), (bornIn, y:New_Orleans_LA)
... | ...

Encode all neighbors of a vertex into a bit string (the vertex signature), e.g. 0010 1000
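One way to picture the encoding step is a Bloom-filter-style hash: each adjacent (property, value) pair sets a few bits, and the vertex signature is the OR of all of them. The sketch below is an assumption-laden illustration, not gStore's actual encoding: the 8-bit width merely mirrors the signatures shown on the slides, and the CRC32-based hash, the constant `BITS_PER_ITEM`, and the function names are hypothetical.

```python
import zlib

SIG_BITS = 8       # assumed width, mirroring the 8-bit signatures on the slides
BITS_PER_ITEM = 2  # assumed number of hash probes per (property, value) pair


def pair_signature(prop, value):
    """Hash one adjacent (property, value) pair to a bit string with a few bits set."""
    sig = 0
    for seed in range(BITS_PER_ITEM):
        h = zlib.crc32(f"{seed}:{prop}:{value}".encode("utf-8"))
        sig |= 1 << (h % SIG_BITS)
    return sig


def vertex_signature(adj_list):
    """OR together the signatures of all neighbours of a vertex."""
    sig = 0
    for prop, value in adj_list:
        sig |= pair_signature(prop, value)
    return sig


reese_adj = [
    ("bornOnDate", "1976-03-22"),
    ("gender", "Female"),
    ("title", "Actress"),
    ("hasName", "Reese Witherspoon"),
    ("bornIn", "y:New_Orleans_LA"),
]

reese_sig = vertex_signature(reese_adj)

# A vertex described by a subset of the same pairs can never set a bit that
# the full vertex lacks, so its signature is always covered by reese_sig.
partial_sig = vertex_signature(reese_adj[:2])
covered = (partial_sig & reese_sig) == partial_sig
```

The covering property is what makes signatures usable for pruning: a data vertex whose signature does not cover the query vertex's signature cannot be a match, although a vertex that does cover it may still turn out to be a false positive.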

Page 32: gStore: A Graph-based SPARQL Query Engine

2. Construct Data Signature Graph G∗

[Figure: data signature graph G∗. Each numbered vertex (001, 002, ...) carries an 8-bit vertex signature: 0000 0001, 0001 1000, 0011 1000, 0010 1000, 1000 0100, 1000 0001, 0100 0000, 0010 1000, 0010 0000. Each edge carries a 5-bit edge signature: 10000, 00001, or 01000.]

Page 33: gStore: A Graph-based SPARQL Query Engine

3. Encode Q to Get Signature Graph Q∗

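[Figure: the query graph Q (?m with edges bornIn → ?city, hasName → ?name, and bornOnDate → ?bd; ?city with edge foundingYear → “1718”; FILTER(regex(str(?bd), “1976”))) and its signature graph Q∗, which has two vertices with signatures 0010 1000 and 1000 0000 joined by an edge with signature 10000.]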

Page 34: gStore: A Graph-based SPARQL Query Engine

4. Filter-and-Evaluate

[Figure: the query signature graph Q∗ (vertex signatures 0010 1000 and 1000 0000 joined by an edge with signature 10000) shown beside the data signature graph G∗.]

Find matches of Q∗ over signature graph G∗

Verify each match in RDF graph G
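The two phases can be illustrated on a single query vertex. In this simplified sketch, the bit assignment `PAIR_BIT` is a hand-made, hypothetical hash with one deliberate collision, so that the filter step produces a false positive that the verification step then removes. Matching a whole signature graph, as on the slide, would additionally check edges.

```python
# Hypothetical 4-bit hash of (property, value) pairs; "Actress" and
# "President" deliberately collide to provoke a false positive.
PAIR_BIT = {
    ("title", "Actress"): 0b0001,
    ("title", "President"): 0b0001,
    ("gender", "Female"): 0b0010,
    ("gender", "Male"): 0b0100,
}

ADJACENCY = {
    "y:Reese_Witherspoon": {("title", "Actress"), ("gender", "Female")},
    "y:Abraham_Lincoln": {("title", "President"), ("gender", "Male")},
    "y:Marilyn_Monroe": {("gender", "Female")},
}


def signature(pairs):
    """OR the bits of every (property, value) pair."""
    sig = 0
    for pair in pairs:
        sig |= PAIR_BIT[pair]
    return sig


def filter_candidates(query_sig, data_sigs):
    """Filter step: keep vertices whose signature covers the query signature."""
    return [v for v, sig in data_sigs.items() if query_sig & sig == query_sig]


def verify(vertex, required, adjacency):
    """Evaluate step: check the real adjacency list for every required pair."""
    return all(pair in adjacency[vertex] for pair in required)


required = {("title", "Actress")}
data_sigs = {v: signature(pairs) for v, pairs in ADJACENCY.items()}

candidates = filter_candidates(signature(required), data_sigs)
answers = [v for v in candidates if verify(v, required, ADJACENCY)]
```

Here the filter keeps y:Reese_Witherspoon and, because of the collision, y:Abraham_Lincoln, while y:Marilyn_Monroe is pruned without touching her adjacency list. Verification against G then discards the false positive.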

Page 35: gStore: A Graph-based SPARQL Query Engine

How to Generate Candidate List

Two-step process:

1. Run an inclusion query for each node of Q∗ and get lists of nodes in G∗ that include that node.

  Given query signature q and a set of data signatures S, find all data signatures si ∈ S where q & si = q

2. Do a multi-way join to get the candidate list

Alternatives:

Sequential scan of G∗

  Both steps are inefficient

Use S-trees

  Height-balanced tree over signatures

  Supports inclusion queries well

  Does not support the second step (expensive)

VS-tree (and VS∗-tree)

  Multi-resolution summary graph based on S-tree

  Supports both steps efficiently
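The two steps can be sketched with the sequential-scan baseline. The query signatures below are the ones shown for Q∗ on the slides (0010 1000, 1000 0000, edge 10000). The data graph, however, is a made-up miniature: the vertex names `v1` to `v5`, their pairing with signatures, and the edges are assumptions for illustration only.

```python
def inclusion_query(q, signatures):
    """Step 1: return ids of all data signatures s with q & s == q (sequential scan)."""
    return [v for v, s in signatures.items() if q & s == q]


def join_candidates(left, right, q_edge, edges):
    """Step 2: keep (u, v) pairs joined by a data edge whose signature covers q_edge."""
    return [
        (u, v)
        for u in left
        for v in right
        if (u, v) in edges and q_edge & edges[(u, v)] == q_edge
    ]


# Hypothetical miniature of G*: vertex signatures in the style of the slides.
VERTEX_SIGS = {
    "v1": 0b0000_0001,
    "v2": 0b0011_1000,
    "v3": 0b1000_0001,
    "v4": 0b0010_1000,
    "v5": 0b1000_0100,
}

# Hypothetical directed edges with 5-bit edge signatures.
EDGE_SIGS = {
    ("v2", "v5"): 0b10000,
    ("v4", "v3"): 0b10000,
    ("v1", "v2"): 0b00001,
}

# Q* from the slides: two vertices joined by one edge.
Q_M, Q_CITY, Q_EDGE = 0b0010_1000, 0b1000_0000, 0b10000

m_list = inclusion_query(Q_M, VERTEX_SIGS)
city_list = inclusion_query(Q_CITY, VERTEX_SIGS)
candidate_pairs = join_candidates(m_list, city_list, Q_EDGE, EDGE_SIGS)
```

With a sequential scan, step 1 touches every signature in G∗ and step 2 compares every pair of list entries, which is why the slide calls both steps inefficient and motivates a tree index.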

Page 40: gStore: A Graph-based SPARQL Query Engine

S-tree Solution

[Figure: S-tree over the vertex signatures of G∗ (levels G1, G2, G3). Each node signature is the bitwise OR of its children.]

d11 1111 1101
    d21 0011 1001
        d31 0010 1001: 001 (0000 0001), 005 (0010 1000), 009 (0010 0000)
        d32 0011 1000: 002 (0011 1000), 007 (0010 1000)
    d22 1101 1101
        d33 0101 1000: 003 (0001 1000), 008 (0100 0000)
        d34 1000 0101: 004 (1000 0001), 006 (1000 0100)

Query Q∗: vertices 0010 1000 and 1000 0000, edge label 10000

Candidates for 0010 1000: 002 (0011 1000), 005 (0010 1000), 007 (0010 1000)
Candidates for 1000 0000: 004 (1000 0001), 006 (1000 0100)

{002, 005, 007} ⋈ {004, 006}

Large join space!
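
A small sketch of how an S-tree answers an inclusion query, using the signatures from the figure. The class and function names are hypothetical, and the tree is built by hand rather than by a balanced-insertion algorithm; the point is only the pruning rule: a subtree is skipped as soon as its OR-ed signature does not include the query signature.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    sig: int = 0
    children: List["Node"] = field(default_factory=list)  # empty for leaf entries
    vertex: Optional[str] = None                           # set only on leaf entries


def entry(vertex: str, sig: int) -> Node:
    return Node(sig=sig, vertex=vertex)


def internal(children: List[Node]) -> Node:
    sig = 0
    for child in children:
        sig |= child.sig          # a node's signature is the OR of its children
    return Node(sig=sig, children=children)


def search(node: Node, q: int) -> List[str]:
    """Return all vertices whose signature includes q, pruning non-matching subtrees."""
    if q & node.sig != q:
        return []
    if node.vertex is not None:
        return [node.vertex]
    found = []
    for child in node.children:
        found.extend(search(child, q))
    return found


d31 = internal([entry("001", 0b0000_0001), entry("005", 0b0010_1000), entry("009", 0b0010_0000)])
d32 = internal([entry("002", 0b0011_1000), entry("007", 0b0010_1000)])
d33 = internal([entry("003", 0b0001_1000), entry("008", 0b0100_0000)])
d34 = internal([entry("004", 0b1000_0001), entry("006", 0b1000_0100)])
root = internal([internal([d31, d32]), internal([d33, d34])])

list1 = search(root, 0b0010_1000)
list2 = search(root, 0b1000_0000)
```

The tree speeds up step 1 only. The two result lists still have to be joined pair by pair, which is the large join space the slide points to.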

Page 46: gStore: A Graph-based SPARQL Query Engine

VS-tree

[Figure: VS-tree. The S-tree over vertices 001–009 (root d11; internal nodes d21, d22; leaf nodes d31–d34; levels G1, G2, G3) with super edges added between nodes at the same level. Each super edge carries a 5-bit label such as 10000, 01000, 00100, or 11101.]

Page 47: gStore: A Graph-based SPARQL Query Engine

Pruning with VS-Tree

[Figure: the VS-tree from the previous slide, searched top-down for the query signature graph Q∗ (vertices 0010 1000 and 1000 0000, edge label 10000). At level G3 the nodes d31, d32, and d34 are highlighted together with a super-edge label 10000; the leaf-level candidates shown are 005, 002, 007 ⋈ 004, 006.]

Reduced join space!
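
The top-down pruning can be sketched for a two-vertex query as follows. This is an illustration under stated assumptions, not the gStore algorithm itself: node signatures and super-edge labels are both taken to be bitwise ORs of what lies below them, the data edges are invented, and all function names are hypothetical. The vertex signatures and the tree shape follow the figure.

```python
from itertools import product


def includes(sig, query):
    return query & sig == query


def lift_sigs(sig, parent):
    """Signature of each parent node = OR of its children's signatures."""
    out = {}
    for node, s in sig.items():
        out[parent[node]] = out.get(parent[node], 0) | s
    return out


def lift_edges(edges, parent):
    """Super edge between two parents = OR of the labels of all edges between their children."""
    out = {}
    for (u, v), label in edges.items():
        key = (parent[u], parent[v])
        out[key] = out.get(key, 0) | label
    return out


def children_of(parent):
    kids = {}
    for child, p in parent.items():
        kids.setdefault(p, []).append(child)
    return kids


def prune(levels, q1, q2, qe):
    """levels: top-down list of (signatures, edges, children). Returns surviving vertex pairs."""
    pairs = list(product(levels[0][0], repeat=2))
    for depth, (sig, edges, kids) in enumerate(levels):
        # Keep only the summary matches at this level.
        pairs = [(a, b) for a, b in pairs
                 if includes(sig[a], q1) and includes(sig[b], q2)
                 and includes(edges.get((a, b), 0), qe)]
        if depth + 1 < len(levels):
            # Descend only into the children of matching nodes.
            pairs = [(x, y) for a, b in pairs for x in kids[a] for y in kids[b]]
    return pairs


vertex_sig = {
    "001": 0b0000_0001, "005": 0b0010_1000, "009": 0b0010_0000,
    "002": 0b0011_1000, "007": 0b0010_1000,
    "003": 0b0001_1000, "008": 0b0100_0000,
    "004": 0b1000_0001, "006": 0b1000_0100,
}
# Hypothetical data edges with 5-bit labels.
edges = {("002", "004"): 0b10000, ("005", "001"): 0b01000,
         ("007", "003"): 0b00100, ("006", "008"): 0b10000}
leaf_of = {"001": "d31", "005": "d31", "009": "d31", "002": "d32", "007": "d32",
           "003": "d33", "008": "d33", "004": "d34", "006": "d34"}
mid_of = {"d31": "d21", "d32": "d21", "d33": "d22", "d34": "d22"}
root_of = {"d21": "d11", "d22": "d11"}

sig3 = lift_sigs(vertex_sig, leaf_of)
sig2 = lift_sigs(sig3, mid_of)
sig1 = lift_sigs(sig2, root_of)
e3 = lift_edges(edges, leaf_of)
e2 = lift_edges(e3, mid_of)
e1 = lift_edges(e2, root_of)

levels = [(sig1, e1, children_of(root_of)), (sig2, e2, children_of(mid_of)),
          (sig3, e3, children_of(leaf_of)), (vertex_sig, edges, {})]
matches = prune(levels, 0b0010_1000, 0b1000_0000, 0b10000)
```

With these example edges, node d31 (and with it vertex 005) is dropped at the leaf-node level because no super edge connects d31 to d34, so the final join only inspects the children of d32 and d34.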

Page 53: gStore: A Graph-based SPARQL Query Engine

Evaluation of VS-tree

[Figure: the VS-tree from the previous slides, annotated with the three observations below.]

Very dense in higher levels of VS-tree

Lots of summary matches in higher levels of VS-tree

Reduced pruning power

Page 57: gStore: A Graph-based SPARQL Query Engine

Possible Optimizations

1. Do not start the multi-way join from the root

   Find an oracle that determines the best place to start the join
   Cost-based

2. Do not materialize summary matches at higher levels of the VS-tree

   Use a depth-first search (DFS)

3. Reduce the number of super-edges when building the VS-tree

   Reduce the number of vertices in each list

Page 58: gStore: A Graph-based SPARQL Query Engine

VS*-tree

[Figure: a two-level tree (nodes d11, d21, d22; levels G1, G2 over G∗) shown before and after the new vertex signature 0000 1000 is inserted.]

Enter new vertex signature u into non-leaf node d where Hamming(u, d) is minimum

Page 59: gStore: A Graph-based SPARQL Query Engine

VS*-tree

[Figure: the same insertion example (nodes d11, d21, d22; levels G1, G2 over G∗), with the new vertex signature 0000 1000 placed using the refined criterion.]

Enter new vertex signature u into non-leaf node d considering Hamming distance & number of supernodes introduced
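
The two insertion rules can be sketched as below. The node signatures are the ones that appear in the figure residue; how the slides combine Hamming distance with the second criterion is not stated, so the weighted sum and the `penalty` argument are purely an assumption for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def choose_node(u, node_sigs, penalty=None, weight=1.0):
    """Pick the node d that minimizes Hamming(u, d) plus an optional structural penalty.

    penalty maps a node to the extra structure (e.g. new super-edges) that inserting
    u there would introduce; the weighted sum is a hypothetical way to combine the two.
    """
    penalty = penalty or {}
    return min(node_sigs,
               key=lambda d: hamming(u, node_sigs[d]) + weight * penalty.get(d, 0))


node_sigs = {"d21": 0b1100_0000, "d22": 0b1011_0001}
u = 0b0000_1000

by_hamming = choose_node(u, node_sigs)                      # first rule: distance only
with_penalty = choose_node(u, node_sigs, penalty={"d21": 3})  # second rule, toy penalty
```

Under the distance-only rule the new signature goes to d21 (distance 3 versus 5). A large enough structural penalty on d21 can flip the choice to d22, which is the trade-off the second slide describes.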

Page 60: gStore: A Graph-based SPARQL Query Engine

Pruning Effect

?x1 y:hasGivenName ?x5
?x1 y:hasFamilyName ?x6
?x1 rdf:type <wordnet_scientist_110560637>
?x1 y:bornIn ?x2
?x1 y:hasAcademicAdvisor ?x4
?x2 y:locatedIn <Switzerland>
?x3 y:locatedIn <Germany>
?x4 y:bornIn ?x3

Candidate list sizes:

        Before Pruning    After Pruning
x1      810               810
x2      424               197
x3      66                66
x4      36187             6686
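
To get a feel for the numbers in the table, one can compare the products of the candidate list sizes. Treating that product as the join space is an assumption made here (it is an upper bound on the combinations a naive multi-way join would enumerate), not a measure defined on the slide.

```python
from math import prod

# Candidate list sizes from the table above.
before = {"x1": 810, "x2": 424, "x3": 66, "x4": 36187}
after = {"x1": 810, "x2": 197, "x3": 66, "x4": 6686}


def join_space(candidates):
    """Upper bound on the number of candidate combinations: product of list sizes."""
    return prod(candidates.values())


reduction = join_space(before) / join_space(after)
per_variable = {x: before[x] / after[x] for x in before}
```

Only x2 and x4 shrink, by factors of roughly 2.2 and 5.4, so under this measure the combined space is a little under 12 times smaller.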

Page 61: gStore: A Graph-based SPARQL Query Engine

Exact Queries

[Figure: chart of query response time (in ms, axis from 0 to 6,000) for queries A1, A2, B1, B2, B3, C1, C2 on the Yago network (20 million triples, size 3.1 GB), comparing gStore, RDF-3X, SW-Store, x-RDF-3x, BigOWLIM, and GRIN.]

Page 62: gStore: A Graph-based SPARQL Query Engine

Wildcard Queries

[Figure: chart of query response time (in ms, axis from 0 to 10,000) for queries A1, A2, B1, B2, B3, C1, C2 on the Yago network (20 million triples, size 3.1 GB), comparing gStore, RDF-3X+PF, SW-Store+PF, x-RDF-3x+PF, BigOWLIM, and GRIN+PF.]

Page 63: gStore: A Graph-based SPARQL Query Engine

Outline

1 RDF Introduction

2 Answering Regular SPARQL Queries
   Answering SPARQL queries using graph pattern matching [ZouMoZhaoChenOzsu, PVLDB 2011]

3 Aggregation
   Answering Aggregate SPARQL Queries Over Large RDF Repositories

4 Future Research

Page 64: gStore: A Graph-based SPARQL Query Engine

Aggregation Queries

(Q1) Group all individuals by their genders, titles, and the founding year of their birth places, and report the number of individuals in each group.

SELECT ?t ?g ?y COUNT(?m)
WHERE { ?m <bornIn> ?c .
        ?m <title> ?t .
        ?m <gender> ?g .
        ?c <foundingYear> ?y . }
GROUP BY ?t ?g ?y

(Q2) Group all individuals by their genders, titles, and the founding year of their birth places, and report the number of individuals in each group for those groups with more than 10 members.

SELECT ?t ?g ?y COUNT(?m)
WHERE { ?m <bornIn> ?c .
        ?m <title> ?t .
        ?m <gender> ?g .
        ?c <foundingYear> ?y . }
GROUP BY ?t ?g ?y
HAVING COUNT(?m) > 10

Prefix: y=http://en.wikipedia.org/wiki

Subject               Predicate      Object
y:Abraham Lincoln     hasName        “Abraham Lincoln”
y:Abraham Lincoln     bornOnDate     “1809-02-12”
y:Abraham Lincoln     diedOnDate     “1865-04-15”
y:Abraham Lincoln     bornIn         y:Hodgenville KY
y:Abraham Lincoln     diedIn         y:Washington DC
y:Abraham Lincoln     title          “President”
y:Abraham Lincoln     gender         “Male”
y:Washington DC       hasName        “Washington D.C.”
y:Washington DC       foundingYear   “1790”
y:Hodgenville KY      hasName        “Hodgenville”
y:United States       hasName        “United States”
y:United States       hasCapital     y:Washington DC
y:United States       foundingYear   “1776”
y:Reese Witherspoon   bornOnDate     “1976-03-22”
y:Reese Witherspoon   bornIn         y:New Orleans LA
y:Reese Witherspoon   hasName        “Reese Witherspoon”
y:Reese Witherspoon   gender         “Female”
y:Reese Witherspoon   title          “Actress”
y:New Orleans LA      foundingYear   “1718”
y:New Orleans LA      locatedIn      y:United States
y:Franklin Roosevelt  hasName        “Franklin D. Roosevelt”
y:Franklin Roosevelt  bornIn         y:Hyde Park NY
y:Franklin Roosevelt  title          “President”
y:Franklin Roosevelt  gender         “Male”
y:Hyde Park NY        foundingYear   “1810”
y:Hyde Park NY        locatedIn      y:United States
y:Marilyn Monroe      gender         “Female”
y:Marilyn Monroe      hasName        “Marilyn Monroe”
y:Marilyn Monroe      bornOnDate     “1926-07-01”
y:Marilyn Monroe      diedOnDate     “1962-08-05”
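
To make the semantics of Q1 and Q2 concrete, here is a brute-force evaluation over the relevant triples from the table. It is only a reference evaluation in plain Python with hypothetical helper names, not how gStore processes the query.

```python
from collections import Counter

# The triples from the table that the query pattern can touch (prefix y: dropped).
triples = [
    ("Abraham Lincoln", "bornIn", "Hodgenville KY"),
    ("Abraham Lincoln", "title", "President"),
    ("Abraham Lincoln", "gender", "Male"),
    ("Reese Witherspoon", "bornIn", "New Orleans LA"),
    ("Reese Witherspoon", "title", "Actress"),
    ("Reese Witherspoon", "gender", "Female"),
    ("Franklin Roosevelt", "bornIn", "Hyde Park NY"),
    ("Franklin Roosevelt", "title", "President"),
    ("Franklin Roosevelt", "gender", "Male"),
    ("Marilyn Monroe", "gender", "Female"),
    ("Washington DC", "foundingYear", "1790"),
    ("United States", "foundingYear", "1776"),
    ("New Orleans LA", "foundingYear", "1718"),
    ("Hyde Park NY", "foundingYear", "1810"),
]


def objects(data, subject, predicate):
    return [o for s, p, o in data if s == subject and p == predicate]


def q1(data):
    """GROUP BY (?t, ?g, ?y) with COUNT(?m) over the four triple patterns of Q1."""
    groups = Counter()
    for m, p, c in data:
        if p != "bornIn":
            continue
        for t in objects(data, m, "title"):
            for g in objects(data, m, "gender"):
                for y in objects(data, c, "foundingYear"):
                    groups[(t, g, y)] += 1
    return groups


def having(groups, min_count):
    """Q2's HAVING clause: keep groups with more than min_count members."""
    return {key: n for key, n in groups.items() if n > min_count}


r_q1 = q1(triples)
r_q2 = having(r_q1, 10)
```

Abraham Lincoln does not contribute to any group because his birthplace, Hodgenville KY, has no foundingYear triple. On this toy dataset no group has more than 10 members, so Q2 returns nothing.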

Page 67: gStore: A Graph-based SPARQL Query Engine

Aggregate Query Results

SELECT ?t ?g ?y COUNT(?m)
WHERE { ?m <bornIn> ?c .
        ?m <title> ?t .
        ?m <gender> ?g .
        ?c <foundingYear> ?y . }
GROUP BY ?t ?g ?y

Callouts on the query: measure dimension (COUNT(?m)), group-by dimensions (?t, ?g, ?y), query pattern (the WHERE clause).

[Figure: query graph with vertex v1 (?m) carrying gender and title edges and a bornIn edge to vertex v2, which carries a foundingYear edge.]

[Figure: RDF graph of the example triples. Entity vertices 001–009 (Abraham Lincoln, Washington D.C., Hodgenville KY, United States, Reese Witherspoon, New Orleans LA, Franklin Roosevelt, Hyde Park NY, Marilyn Monroe) are connected to each other by bornIn, diedIn, hasCapital, and locatedIn edges, and to literal vertices 010–032 by hasName, bornOnDate, diedOnDate, title, gender, and foundingYear edges.]

Aggregate Result

gender    title       foundingYear    COUNT(?m)
Female    Actress     1718            1
Male      President   1810            1

Page 68: gStore: A Graph-based SPARQL Query Engine

Existing Solutions

1. Using Existing SPARQL Query Engines

   Aggregate query Q → (ignore aggregation function) → Query without aggregation (Q′) → (employ existing SPARQL engine) → Result without aggregation (R(Q′)) → (compute aggregation online) → Query result R(Q)

   Problem: missed opportunities for query optimization

2. Using a Relational DBMS

   RDF dataset → Map to table (property tables) → Build relational aggregate index

   Problems: (1) Usual problems with relational implementations; (2) RDF data is high dimensional (DBpedia has more than 1000 properties) ⇒ prohibits using relational optimization techniques

Page 72: gStore: A Graph-based SPARQL Query Engine

RDF Data are not Well Structured

In relational systems, aggregates can be computed from materialized views.

View V1:            Key  A  B  C  SUM
Aggregate table T:  Key  A  B  SUM

T can be constructed by scanning V1

Page 73: gStore: A Graph-based SPARQL Query Engine

RDF Data are not Well Structured

This can’t always be done in RDF data

(Q1) Group all individuals by their titles, gender and birthday, and report the number of individuals in each group.

SELECT ?t ?g ?y COUNT(?m)
WHERE { ?m <bornIn> ?c .
        ?m <title> ?t .
        ?m <gender> ?g .
        ?m <bornOnDate> ?y . }
GROUP BY ?t ?g ?y

(Q2) Group all individuals by their titles and gender, and report the number of individuals in each group.

SELECT ?t ?g COUNT(?m)
WHERE { ?m <bornIn> ?c .
        ?m <title> ?t .
        ?m <gender> ?g . }
GROUP BY ?t ?g

R(Q1)

gender    title       bornOnDate    COUNT
Male      President   1809-02-12    1
Female    Actress     1976-03-22    1

R(Q2)

gender    title       COUNT
Male      President   2
Female    Actress     1

R(Q2) cannot be constructed by scanning R(Q1): Franklin Roosevelt has no bornOnDate triple, so he is counted in R(Q2) but is absent from R(Q1).
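
The following sketch replays this example on the toy triples: it evaluates both queries directly and then tries the relational shortcut of rolling R(Q1) up by dropping the bornOnDate column. The helper names are hypothetical and attributes are assumed to be single-valued.

```python
from collections import Counter

triples = [
    ("Abraham Lincoln", "bornIn", "Hodgenville KY"),
    ("Abraham Lincoln", "title", "President"),
    ("Abraham Lincoln", "gender", "Male"),
    ("Abraham Lincoln", "bornOnDate", "1809-02-12"),
    ("Reese Witherspoon", "bornIn", "New Orleans LA"),
    ("Reese Witherspoon", "title", "Actress"),
    ("Reese Witherspoon", "gender", "Female"),
    ("Reese Witherspoon", "bornOnDate", "1976-03-22"),
    ("Franklin Roosevelt", "bornIn", "Hyde Park NY"),
    ("Franklin Roosevelt", "title", "President"),
    ("Franklin Roosevelt", "gender", "Male"),   # note: no bornOnDate triple
]


def value(subject, predicate):
    """The single value of an attribute, or None if the entity lacks it."""
    found = [o for s, p, o in triples if s == subject and p == predicate]
    return found[0] if found else None


def evaluate(group_by):
    """Count individuals with a bornIn edge, grouped by the given attributes."""
    groups = Counter()
    for m, p, _ in triples:
        if p != "bornIn":
            continue
        key = tuple(value(m, attr) for attr in group_by)
        if None not in key:            # every pattern must match
            groups[key] += 1
    return groups


def roll_up(groups, keep):
    """Relational-style roll-up: keep the first `keep` columns and sum the counts."""
    rolled = Counter()
    for key, n in groups.items():
        rolled[key[:keep]] += n
    return rolled


r_q1 = evaluate(["title", "gender", "bornOnDate"])
r_q2 = evaluate(["title", "gender"])
rolled = roll_up(r_q1, keep=2)
```

Here `rolled` reports one male president while `r_q2` reports two. The finer-grained result has already lost the entity that lacks one of the attributes, which is exactly why the view-scanning trick from the relational setting does not carry over.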

Page 74: gStore: A Graph-based SPARQL Query Engine

Overview of Approach

Work on RDF graphs and query graphs ⇒ graph pattern matching.

1. Decompose aggregate query Q into star queries {S1, . . . , Sn}

2. Evaluate each star query to get results R(S1), . . . , R(Sn)

   T-index: trie structure
   Evaluate without joins

3. Compute result of Q as R(Q) = ⋈i R(Si)

   Use VS*-tree to speed up join processing

[Figure: query graph Q (vertex v1 = ?m with gender and title edges and a bornIn edge to vertex v2, which has a foundingYear edge) decomposed into star query S1 (v1 with gender and title) and star query S2 (v2 with foundingYear).]

R(S1)

gender    title
Male      President
Female    Actress

R(S2)

foundingYear
1790
1718
1810
1776

R(S1) ⋈ R(S2)

gender    title       foundingYear    COUNT(?m)
Female    Actress     1718            1
Male      President   1810            1
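
The three steps can be sketched on the running example as follows. This is a simplified illustration with hypothetical function names: stars are evaluated by a scan rather than through the T-index, the join is a plain loop rather than a VS*-tree-assisted join, and the final grouping is hard-wired to this query's attributes.

```python
from collections import Counter

patterns = [("?m", "bornIn", "?c"), ("?m", "title", "?t"),
            ("?m", "gender", "?g"), ("?c", "foundingYear", "?y")]

triples = [
    ("Abraham Lincoln", "bornIn", "Hodgenville KY"),
    ("Abraham Lincoln", "title", "President"), ("Abraham Lincoln", "gender", "Male"),
    ("Reese Witherspoon", "bornIn", "New Orleans LA"),
    ("Reese Witherspoon", "title", "Actress"), ("Reese Witherspoon", "gender", "Female"),
    ("Franklin Roosevelt", "bornIn", "Hyde Park NY"),
    ("Franklin Roosevelt", "title", "President"), ("Franklin Roosevelt", "gender", "Male"),
    ("Marilyn Monroe", "gender", "Female"),
    ("Washington DC", "foundingYear", "1790"), ("United States", "foundingYear", "1776"),
    ("New Orleans LA", "foundingYear", "1718"), ("Hyde Park NY", "foundingYear", "1810"),
]


def decompose(patterns):
    """Step 1: one star per subject variable; edges between star centers become links."""
    centers = {s for s, _, _ in patterns}
    stars, links = {}, []
    for s, p, o in patterns:
        if o in centers:
            links.append((s, p, o))
        else:
            stars.setdefault(s, []).append(p)
    return stars, links


def eval_star(attrs, data):
    """Step 2: entities that have every attribute of the star (single-valued here)."""
    values = {}
    for s, p, o in data:
        if p in attrs:
            values.setdefault(s, {})[p] = o
    return {entity: v for entity, v in values.items() if len(v) == len(attrs)}


def answer(patterns, data):
    """Step 3: join the star results along the linking edge and count per group."""
    stars, links = decompose(patterns)
    results = {center: eval_star(attrs, data) for center, attrs in stars.items()}
    (m_var, link_pred, c_var), = links      # this example has exactly one link
    groups = Counter()
    for s, p, o in data:
        if p == link_pred and s in results[m_var] and o in results[c_var]:
            r1, r2 = results[m_var][s], results[c_var][o]
            groups[(r1["gender"], r1["title"], r2["foundingYear"])] += 1
    return groups


stars, links = decompose(patterns)
result = answer(patterns, triples)
```

R(S1) contains three people (Marilyn Monroe has no title) and R(S2) contains the four entities with a foundingYear; only the bornIn join removes Abraham Lincoln, whose birthplace is not in R(S2).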

Page 77: gStore: A Graph-based SPARQL Query Engine

Selective Materialization of Views: T-index

Selectively materialize views through a trie structure called T-index.

Transaction Database D

Vertex ID   Entity vertex        Adjacent Attributes
001         Abraham Lincoln      hasName, gender, bornOnDate, title, diedOnDate
002         Washington DC        hasName, foundingYear
003         Hodgenville KY       hasName
004         United States        hasName, foundingYear
005         Reese Witherspoon    hasName, gender, bornOnDate, title
006         New Orleans          foundingYear
007         Franklin Roosevelt   hasName, gender, title
008         Hyde Park NY         foundingYear
009         Marilyn Monroe       hasName, gender, bornOnDate, diedOnDate

Dimension List DL

hasName:7   foundingYear:4   gender:4   bornOnDate:3   title:3   diedOnDate:2

T-index (node, incoming edge label, vertex list)

N0 root
    N1 hasName {001,002,003,004,005,007,009}
        N2 gender {001,005,007,009}
            N3 bornOnDate {001,005,009}
                N4 title {001,005}
                    N5 diedOnDate {001}
                N6 diedOnDate {009}
            N7 title {007}
        N8 foundingYear {002,004}
    N9 foundingYear {006,008}

M(N4)

hasName             gender    bornOnDate    title       L
Abraham Lincoln     Male      1809-02-12    President   {001}
Reese Witherspoon   Female    1976-03-22    Actress     {005}

M(N7)

hasName                 gender    title       L
Franklin D. Roosevelt   Male      President   {007}

Page 78: gStore: A Graph-based SPARQL Query Engine

T-index Construction

[Figure: the transaction database D, dimension list DL, T-index (nodes N0–N9), and materialized views M(N4) and M(N7) from the previous slide, with the index built up incrementally. The vertices of D are inserted one at a time and each vertex ID is added to the vertex lists along its path; for example, the list at N1 (hasName) grows through {001,002,003}, {001,002,003,004}, and {001,002,003,004,005}, while N8 (foundingYear) holds {002,004} and the gender, bornOnDate, and title nodes hold {001,005} at that point.]
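
The construction can be sketched as a frequency-ordered trie over each vertex's attribute set. This is a reading of the slides rather than the authors' code: breaking frequency ties alphabetically is an assumption (it happens to reproduce the dimension list shown), trie nodes are represented simply by their attribute path, and the function names are hypothetical.

```python
from collections import Counter

# Transaction database D: vertex ID -> adjacent attributes.
D = {
    "001": ["hasName", "gender", "bornOnDate", "title", "diedOnDate"],
    "002": ["hasName", "foundingYear"],
    "003": ["hasName"],
    "004": ["hasName", "foundingYear"],
    "005": ["hasName", "gender", "bornOnDate", "title"],
    "006": ["foundingYear"],
    "007": ["hasName", "gender", "title"],
    "008": ["foundingYear"],
    "009": ["hasName", "gender", "bornOnDate", "diedOnDate"],
}


def dimension_list(db):
    """Attributes sorted by descending frequency (ties broken by name: an assumption)."""
    counts = Counter(attr for attrs in db.values() for attr in attrs)
    return sorted(counts, key=lambda attr: (-counts[attr], attr)), counts


def build_t_index(db):
    """Map each trie node (identified by its attribute path) to its vertex list."""
    order, _ = dimension_list(db)
    rank = {attr: i for i, attr in enumerate(order)}
    index = {}
    for vid in sorted(db):
        path = ()
        for attr in sorted(db[vid], key=rank.__getitem__):
            path += (attr,)
            index.setdefault(path, []).append(vid)   # vertex joins every node on its path
    return index


DL, counts = dimension_list(D)
t_index = build_t_index(D)
```

In this representation the node the slides call N4 is the path `("hasName", "gender", "bornOnDate", "title")`. A star query over those four attributes can read its matching vertices straight from that node's list, which is the "evaluate without joins" idea behind the T-index.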

Page 82: gStore: A Graph-based SPARQL Query Engine

T-index Construction

VertexID

Entity vertex Adjacent Attributes

001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate

002 Washington DChasName,foundingYear

003 Hodgenville KYhasName

004 United StateshasName,foundingYear

005 Reese Wither-spoon hasName, gender,

bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-

sevelthasName, gender,title

008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,

gender, bornOnDate,diedOnDate

N0

N1

N2

N3

N4

N5

root

hasName

gender

bonOnDate

title

diedOnDate

{001}

{001}

{001}

{001}

N8

foundingYear

{001,002,003}

{001,002,003,004}

{002,004}

{001,002,003,004,005}

{002,004}

{001,005}

{001,005}

{001,005}

N9

foundingYear

N6

N7

diedOnDate

title

{001,002,003,004,005,007,009}

{001,005,007,009}

{001,005,009}

{001,005}

{001}

{009}

{007}

{002,004}

{006,008}

hasName gender bornOnDate title L

AbrahamLincoln

Male 1865-04-15 President {001}

Reese With-erspoon

Female 1976-03-22 Actress {005}

M(N4)

hasName gender title L

Franklin D.Roosevelt

Male President {007}

M(N7)

hasName:7

foundingYear:4

gender:4

bornOnDate:3

title:3

diedOnDate:2

DimensionList DL

41UPenn/2012-12-04

Transaction Database D

Page 83: gStore: A Graph-based SPARQL Query Engine

T-index Construction

VertexID

Entity vertex Adjacent Attributes

001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate

002 Washington DChasName,foundingYear

003 Hodgenville KYhasName

004 United StateshasName,foundingYear

005 Reese Wither-spoon hasName, gender,

bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-

sevelthasName, gender,title

008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,

gender, bornOnDate,diedOnDate

N0

N1

N2

N3

N4

N5

root

hasName

gender

bonOnDate

title

diedOnDate{001}

N8

foundingYear

{001,002,003}{001,002,003,004}

{002,004}

{001,002,003,004,005}

{002,004}

{001,005}

{001,005}

{001,005}

N9

foundingYear

N6

N7

diedOnDate

title

{001,002,003,004,005,007,009}

{001,005,007,009}

{001,005,009}

{001,005}

{001}

{009}

{007}

{002,004}

{006,008}

hasName gender bornOnDate title L

AbrahamLincoln

Male 1865-04-15 President {001}

Reese With-erspoon

Female 1976-03-22 Actress {005}

M(N4)

hasName gender title L

Franklin D.Roosevelt

Male President {007}

M(N7)

hasName:7

foundingYear:4

gender:4

bornOnDate:3

title:3

diedOnDate:2

DimensionList DL

41UPenn/2012-12-04

Transaction Database D

Page 84: gStore: A Graph-based SPARQL Query Engine

T-index Construction

VertexID

Entity vertex Adjacent Attributes

001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate

002 Washington DChasName,foundingYear

003 Hodgenville KYhasName

004 United StateshasName,foundingYear

005 Reese Wither-spoon hasName, gender,

bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-

sevelthasName, gender,title

008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,

gender, bornOnDate,diedOnDate

N0

N1

N2

N3

N4

N5

root

hasName

gender

bonOnDate

title

diedOnDate{001}

N8

foundingYear

{001,002,003}{001,002,003,004}

{002,004}

{001,002,003,004,005}

{002,004}

{001,005}

{001,005}

{001,005}

N9

foundingYear

{006}

{001,002,003,004,005}

{002,004}

{001,005}

{001,005}

{001,005}

N6

N7

diedOnDate

title

{001,002,003,004,005,007,009}

{001,005,007,009}

{001,005,009}

{001,005}

{001}

{009}

{007}

{002,004}

{006,008}

hasName gender bornOnDate title L

AbrahamLincoln

Male 1865-04-15 President {001}

Reese With-erspoon

Female 1976-03-22 Actress {005}

M(N4)

hasName gender title L

Franklin D.Roosevelt

Male President {007}

M(N7)

hasName:7

foundingYear:4

gender:4

bornOnDate:3

title:3

diedOnDate:2

DimensionList DL

41UPenn/2012-12-04

Transaction Database D

Page 85: gStore: A Graph-based SPARQL Query Engine

T-index Construction

VertexID

Entity vertex Adjacent Attributes

001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate

002 Washington DChasName,foundingYear

003 Hodgenville KYhasName

004 United StateshasName,foundingYear

005 Reese Wither-spoon hasName, gender,

bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-

sevelthasName, gender,title

008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,

gender, bornOnDate,diedOnDate

N0

N1

N2

N3

N4

N5

root

hasName

gender

bonOnDate

title

diedOnDate{001}

N8

foundingYear

{001,002,003}{001,002,003,004}

{002,004}

{001,002,003,004,005}

{002,004}

{001,005}

{001,005}

{001,005}

N9

foundingYear

N6

N7

diedOnDate

title

{001,002,003,004,005,007,009}

{001,005,007,009}

{001,005,009}

{001,005}

{001}

{009}

{007}

{002,004}

{006,008}

hasName gender bornOnDate title L

AbrahamLincoln

Male 1865-04-15 President {001}

Reese With-erspoon

Female 1976-03-22 Actress {005}

M(N4)

hasName gender title L

Franklin D.Roosevelt

Male President {007}

M(N7)

hasName:7

foundingYear:4

gender:4

bornOnDate:3

title:3

diedOnDate:2

DimensionList DL

41UPenn/2012-12-04

Transaction Database D

Page 86: gStore: A Graph-based SPARQL Query Engine

T-index Construction

VertexID

Entity vertex Adjacent Attributes

001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate

002 Washington DChasName,foundingYear

003 Hodgenville KYhasName

004 United StateshasName,foundingYear

005 Reese Wither-spoon hasName, gender,

bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-

sevelthasName, gender,title

008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,

gender, bornOnDate,diedOnDate

N0

N1

N2

N3

N4

N5

root

hasName

gender

bonOnDate

title

diedOnDate{001}

N8

foundingYear

{001,002,003}{001,002,003,004}

{002,004}

{001,002,003,004,005}

{002,004}

{001,005}

{001,005}

{001,005}

N9

foundingYear

N6

N7

diedOnDate

title

{001,002,003,004,005,007,009}

{001,005,007,009}

{001,005,009}

{001,005}

{001}

{009}

{007}

{002,004}

{006,008}

hasName gender bornOnDate title L

AbrahamLincoln

Male 1865-04-15 President {001}

Reese With-erspoon

Female 1976-03-22 Actress {005}

M(N4)

hasName gender title L

Franklin D.Roosevelt

Male President {007}

M(N7)

hasName:7

foundingYear:4

gender:4

bornOnDate:3

title:3

diedOnDate:2

DimensionList DL

41UPenn/2012-12-04

Transaction Database D

Page 87: gStore: A Graph-based SPARQL Query Engine

T-index Construction

VertexID

Entity vertex Adjacent Attributes

001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate

002 Washington DChasName,foundingYear

003 Hodgenville KYhasName

004 United StateshasName,foundingYear

005 Reese Wither-spoon hasName, gender,

bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-

sevelthasName, gender,title

008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,

gender, bornOnDate,diedOnDate

N0

N1

N2

N3

N4

N5

root

hasName

gender

bonOnDate

title

diedOnDate{001}

N8

foundingYear

{001,002,003}{001,002,003,004}

{002,004}

{001,002,003,004,005}

{002,004}

{001,005}

{001,005}

{001,005}

N9

foundingYear

N6

N7

diedOnDate

title

{001,002,003,004,005,007,009}

{001,005,007,009}

{001,005,009}

{001,005}

{001}

{009}

{007}

{002,004}

{006,008}

hasName gender bornOnDate title L

AbrahamLincoln

Male 1865-04-15 President {001}

Reese With-erspoon

Female 1976-03-22 Actress {005}

M(N4)

hasName gender title L

Franklin D.Roosevelt

Male President {007}

M(N7)

hasName:7

foundingYear:4

gender:4

bornOnDate:3

title:3

diedOnDate:2

DimensionList DL

41UPenn/2012-12-04

Transaction Database D

Page 90: gStore: A Graph-based SPARQL Query Engine

Star Aggregate (SA) Query Processing

Given an SA query S(v, p1, ..., pn):

1. Find all matches of (p1, ..., pn) in T-index

2. For each match

2.1 Let Ni be the lowest node in the match

2.2 Let M(Ni) be Ni's aggregate set

2.3 M′(Ni) = Π(p1,...,pn) M(Ni)

3. M = ∪i M′(Ni)

Example query (figure): vertex v1 with attribute edges gender and title; variable ?m

Matches in T-index: {N1,N2,N3,N4} and {N1,N2,N7}

M(N4)

hasName            gender  bornOnDate  title      L
Abraham Lincoln    Male    1865-04-15  President  {001}
Reese Witherspoon  Female  1976-03-22  Actress    {005}

M′(N4)

gender  title      L
Male    President  {001}
Female  Actress    {005}

M(N7)

hasName                gender  title      L
Franklin D. Roosevelt  Male    President  {007}

M′(N7)

gender  title      L
Male    President  {007}

M = M′(N4) ∪ M′(N7)

gender  title      L          COUNT
Male    President  {001,007}  2
Female  Actress    {005}      1
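The three steps can be sketched over the materialized sets directly. This is a simplified illustration under stated assumptions: the T-index is flattened to a dictionary keyed by node ID, only nodes N4, N7 and N8 are included, the values in N8 are taken from other slides of this deck, and `sa_query` is a hypothetical name. A path counts as a match when it contains all query attributes and its lowest node carries one of them.

```python
from collections import defaultdict

# node ID -> (attribute path from the root, rows of (values, vertex list)).
T_INDEX = {
    "N4": (("hasName", "gender", "bornOnDate", "title"),
           [(("Abraham Lincoln", "Male", "1865-04-15", "President"), {"001"}),
            (("Reese Witherspoon", "Female", "1976-03-22", "Actress"), {"005"})]),
    "N7": (("hasName", "gender", "title"),
           [(("Franklin D. Roosevelt", "Male", "President"), {"007"})]),
    "N8": (("hasName", "foundingYear"),
           [(("Washington DC", "1790"), {"002"}),
            (("United States", "1776"), {"004"})]),
}


def sa_query(t_index, attrs):
    """Project each matching M(Ni) onto attrs, then union rows with equal values."""
    result = defaultdict(set)
    for path, rows in t_index.values():
        # Step 1: the path must cover all query attributes and end on one of them.
        if not set(attrs).issubset(path) or path[-1] not in attrs:
            continue
        cols = [path.index(a) for a in attrs]
        for values, vertices in rows:
            # Step 2.3: projection onto the query attributes.
            key = tuple(values[c] for c in cols)
            # Step 3: union of the projected sets, merging vertex lists.
            result[key] |= vertices
    return {key: (sorted(vs), len(vs)) for key, vs in result.items()}


answer = sa_query(T_INDEX, ("gender", "title"))
for key, (vertices, count) in answer.items():
    print(key, vertices, count)
```

COUNT falls out of the vertex list length, so no base triples are touched once the matching nodes are found.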

Page 94: gStore: A Graph-based SPARQL Query Engine

General Aggregate (GA) Query Processing

1. Decompose aggregate query Q into star queries {S1, ..., Sn}

2. Evaluate them to get R(S1), ..., R(Sn)

3. Generate vertex lists (L1, ..., Ln) for each R(S1), ..., R(Sn)

Example (figure): Q links v1 (attributes gender, title; variable ?m) to v2 (attribute foundingYear) through bornIn. Q decomposes into S1 (v1 with gender, title) and S2 (v2 with foundingYear).

R(S1)

gender  title      L
Male    President  {001,007}
Female  Actress    {005}

R(S2)

foundingYear  L
1790          {002}
1718          {006}
1810          {008}
1776          {004}

L1 = {001, 005, 007}
L2 = {002, 004, 006, 008}

Page 95: gStore: A Graph-based SPARQL Query Engine

GA Query Processing (cont'd)

4. Use VS*-tree to find all subgraph matches of (L1, ..., Ln); the result is a join on the link attribute

5. Compute the final aggregate result (we have optimizations here)

Link: u1 bornIn u2

L1 = {001, 005, 007}
L2 = {002, 004, 006, 008}

Matches

u1   u2
005  006
007  008

Result

gender  title      foundingYear  u1   u2   COUNT
Female  Actress    1718          005  006  1
Male    President  1810          007  008  1
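Steps 3 to 5 can be sketched as a join followed by a grouped count. The sketch below replaces the VS*-tree with a plain scan over an edge list, so it shows the logic only, not the index-based matching. `BORN_IN`, `vertex_list` and `ga_count` are hypothetical names; the edge ("001", "003") is an assumed extra edge, added to show a candidate that is filtered out because 003 is not in L2.

```python
from collections import Counter

# Star query results from the slides: group values -> vertex list.
R_S1 = {("Male", "President"): {"001", "007"}, ("Female", "Actress"): {"005"}}
R_S2 = {("1790",): {"002"}, ("1718",): {"006"}, ("1810",): {"008"}, ("1776",): {"004"}}

# bornIn edges (u1, u2). The first two are the matches on the slide;
# ("001", "003") is an assumption made for this example.
BORN_IN = [("005", "006"), ("007", "008"), ("001", "003")]


def vertex_list(result):
    """Step 3: the union of all vertex lists of one star query result."""
    return set().union(*result.values())


def ga_count(r1, r2, edges):
    l1, l2 = vertex_list(r1), vertex_list(r2)
    # Reverse lookup: vertex -> group values of the star query it answers.
    group1 = {v: key for key, vs in r1.items() for v in vs}
    group2 = {v: key for key, vs in r2.items() for v in vs}
    # Step 4 (simplified): keep link edges whose endpoints are both candidates.
    matches = [(u1, u2) for u1, u2 in edges if u1 in l1 and u2 in l2]
    # Step 5: COUNT per combined group.
    counts = Counter(group1[u1] + group2[u2] for u1, u2 in matches)
    return matches, dict(counts)


matches, counts = ga_count(R_S1, R_S2, BORN_IN)
print(matches)
print(counts)
```

With this data, 005 pairs with 006 and 007 with 008, and each combined group has a count of 1.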

Page 98: gStore: A Graph-based SPARQL Query Engine

Performance Comparison

[Figure: query response time (in ms; axis from 0 to 8,000) for queries SA1, SA2, SA3, GA1, GA2, GA3 on the Yago network (20 million triples, size 3.1 GB), comparing gStore, CAA and Virtuoso]

Page 99: gStore: A Graph-based SPARQL Query Engine

Outline

1 RDF Introduction

2 Answering Regular SPARQL Queries
Answering SPARQL queries using graph pattern matching [Zou, Mo, Zhao, Chen, Ozsu, PVLDB 2011]

3 Aggregation
Answering Aggregate SPARQL Queries Over Large RDF Repositories

4 Future Research

Page 100: gStore: A Graph-based SPARQL Query Engine

Systematic study of SPARQL query processing/optimization (Gunes Aluc)

Representation as graphs

Systematic study of dynamic RDF data management

Continuously and seamlessly self-tune their configurations

Systematic study of distributed RDF data management

Utilization of view materialization to improve performance exploiting graph structure

End

Page 101: gStore: A Graph-based SPARQL Query Engine

Efficient SPARQL Query Execution over Dynamic RDF Databases

Problem: Existing RDF management systems are designed for static datasets and workloads, whereas the RDF data on the Web require dynamic solutions.

The underlying physical database layout in existing RDF management systems is fixed a priori.
Hence, existing systems exhibit suboptimal performance when data are updated or the characteristics of the workloads change.

Objective: To design techniques (e.g., data structures, indexes, algorithms) that enable RDF management systems to continuously and seamlessly self-tune their configurations to maintain optimal performance.

Page 102: gStore: A Graph-based SPARQL Query Engine

Efficient SPARQL Query Execution over Distributed RDF Databases

Problem: RDF is inherently distributed across the Web, where query execution can be much more expensive than in a centralized setting.

Objective: To reduce the cost of query execution in a distributed RDF database through view materialization. More specifically,

to design efficient view selection algorithms (w.r.t. space/time) that potentially exploit the graph-theoretic properties of RDF.

Page 103: gStore: A Graph-based SPARQL Query Engine

Thank you!


Page 104: gStore: A Graph-based SPARQL Query Engine

Encoding Technique

vLabel: y:Abraham Lincoln
adjList: (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate, “1865-04-15”), (DiedIn, y:Washington DC)

Encoding of (hasName, “Abraham Lincoln”)

hasName            →  0010 0000 0000
“Abraham Lincoln”  →  1000 0010 0100 0000, the OR of its 3-gram codes:

“Abr”  →  0000 0010 0000 0000
“bra”  →  1000 0000 0000 0000
“rah”  →  0000 0000 0100 0000
...

Edge codes

(hasName, “Abraham Lincoln”)  →  0010 0000 0000 1000 0010 0100 0000
(BornOnDate, “1809-02-12”)    →  0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1865-04-15”)    →  0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)     →  0000 0010 0000 1000 0010 0100 0001

OR  →  0000 0010 0000 1100 0010 0100 1001
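The encoding on this slide can be sketched as follows: the edge label sets one bit in a 12-bit field, every 3-gram of the value sets one bit in a 16-bit field, and the vertex signature is the OR of its edge codes. The hash function is not given on the slide, so MD5 modulo the field width is an arbitrary stand-in and the bit patterns produced here will not match the ones drawn. `edge_signature` and `vertex_signature` are hypothetical names.

```python
import hashlib

LABEL_BITS = 12   # width of the edge-label field, as drawn on the slide
STRING_BITS = 16  # width of the n-gram field, as drawn on the slide


def _bit(text, width):
    """Map text to one set bit in a width-bit mask (MD5 is an assumed choice)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return 1 << (int.from_bytes(digest, "big") % width)


def ngrams(text, n=3):
    """All contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def edge_signature(label, value):
    """Label field followed by the OR of the value's 3-gram bits."""
    label_part = _bit(label, LABEL_BITS)
    string_part = 0
    for gram in ngrams(value):
        string_part |= _bit(gram, STRING_BITS)
    return (label_part << STRING_BITS) | string_part


def vertex_signature(adj_list):
    """OR of the signatures of all adjacent edges."""
    signature = 0
    for label, value in adj_list:
        signature |= edge_signature(label, value)
    return signature


ADJ_LIST = [
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
    ("DiedIn", "y:Washington DC"),
]
print(format(vertex_signature(ADJ_LIST), "028b"))
```

Because signatures are combined with OR, every bit of an edge code is also set in the vertex signature. A query edge whose code has a bit missing from the vertex signature cannot match that vertex, which is what makes the signature usable as a filter.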

Page 111: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates are modeled as triple deletion + insertion. Consider a triple (s, p, o).

Case: DL's order does not change, and s is not in the RDF data

Insert new transaction into DL
Insert the new transaction into one path of T-index
Update materialized sets along the path

Transaction Database D

VertexID  Entity vertex  Adjacency Attributes
001       Person1        salary, gender, title
002       Person2        salary, gender, title
003       Person3        salary, gender, title
004       Person4        salary, gender, title, nationality
005       Person5        salary, gender, nationality
006       Person6        salary, title, nationality
007       Paper1         year, inProceedings, paperTitle
008       Paper2         year, inProceedings, paperTitle
009       Paper3         year, inProceedings, paperTitle
010       School1        locatedIn, name
011       School2        locatedIn, name

DimensionList DL

salary, gender, title, nationality, year, inProceedings, paperTitle, locatedIn, name

T-index (each node: attribute and vertex list)

null
  N0 salary {001,002,003,004,005,006}
    N1 gender {001,002,003,004,005}
      N2 title {001,002,003,004}
        N3 nationality {004}
      nationality {005}
    title {006}
      nationality {006}
  year {007,008,009}
    inProceedings {007,008,009}
      paperTitle {007,008,009}
  locatedIn {010,011}
    name {010,011}

(Node IDs N4 to N11 label the remaining eight nodes.)

M(N2)

salary    gender  title                L
$100,000  Male    Professor            {002}
$50,000   Male    Assistant Professor  {001}
$40,000   Male    Assistant Professor  {003}
$50,000   Female  Assistant Professor  {004}

M(N3)

salary   gender  title                nationality  L
$50,000  Female  Assistant Professor  Chinese      {004}

Page 115: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates are modeled as triple deletion + insertion. Consider a triple (s, p, o).

Case: DL's order does not change, and s is in the RDF data; pi > p > pi+1

Find Nn with (p1, ..., pi, ..., pn) and Ni with (p1, ..., pi)
Remove s from path Ni+1 − Nn and update M(Nj), i + 1 ≤ j ≤ n
Insert (p, pi+1, ..., pn) at Ni; update M(Nj), null ≤ j ≤ i

Example: Insert (Person6, gender, “Male”)

Transaction Database D, updated row

006  Person6  salary, gender, title, nationality

T-index vertex lists after the insertion

N0 salary       {001,002,003,004,005,006}
N1 gender       {001,002,003,004,005,006}
N2 title        {001,002,003,004,006}
N3 nationality  {004,006}

M(N2) after the insertion

salary    gender  title                L
$100,000  Male    Professor            {002}
$50,000   Male    Assistant Professor  {001}
$40,000   Male    Assistant Professor  {003}
$50,000   Female  Assistant Professor  {004}
$50,000   Male    Associate Professor  {006}

The other rows of D and DL are unchanged.
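The three steps of this case can be sketched on a prefix tree that stores only vertex lists. This is a minimal sketch with several assumptions: the materialized sets M(Nj) are omitted, nodes whose vertex list becomes empty are pruned (the slide does not say what happens to them), and `Node`, `insert_path`, `remove_path` and `add_attribute` are hypothetical names.

```python
class Node:
    """T-index node reduced to a vertex list and children keyed by attribute."""

    def __init__(self):
        self.vertices = set()
        self.children = {}


def insert_path(node, attrs, vid):
    """Add vid to every node on the path attrs below node, creating nodes as needed."""
    for attr in attrs:
        node = node.children.setdefault(attr, Node())
        node.vertices.add(vid)


def remove_path(node, attrs, vid):
    """Remove vid along attrs below node; prune nodes left empty (an assumption)."""
    if not attrs:
        return
    child = node.children[attrs[0]]
    remove_path(child, attrs[1:], vid)
    child.vertices.discard(vid)
    if not child.vertices:
        del node.children[attrs[0]]


def add_attribute(root, db, dl, vid, p):
    """Insert attribute p for an existing subject vid; DL order is unchanged."""
    rank = dl.index
    old = sorted(db[vid], key=rank)
    prefix = [a for a in old if rank(p) > rank(a)]  # p1..pi, ranked before p
    suffix = old[len(prefix):]                      # pi+1..pn
    ni = root
    for attr in prefix:                             # walk down to Ni
        ni = ni.children[attr]
    remove_path(ni, suffix, vid)                    # remove s from Ni+1..Nn
    insert_path(ni, [p] + suffix, vid)              # insert (p, pi+1, ..., pn) at Ni
    db[vid] = prefix + [p] + suffix


DL = ["salary", "gender", "title", "nationality", "year",
      "inProceedings", "paperTitle", "locatedIn", "name"]
D = {
    "001": ["salary", "gender", "title"],
    "002": ["salary", "gender", "title"],
    "003": ["salary", "gender", "title"],
    "004": ["salary", "gender", "title", "nationality"],
    "005": ["salary", "gender", "nationality"],
    "006": ["salary", "title", "nationality"],
}
root = Node()
for vid, attrs in D.items():
    insert_path(root, sorted(attrs, key=DL.index), vid)

add_attribute(root, D, DL, "006", "gender")
n1 = root.children["salary"].children["gender"]
print(sorted(n1.children["title"].vertices))  # ['001', '002', '003', '004', '006']
```

Only the subtree below Ni changes, so the cost of the update is bounded by the length of the affected suffix rather than by the size of the index.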

Page 117: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates modeled as triple deletion + insertion - Consider (s, p, o)

DL’s order does change

This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets

VertexID

Entity ver-tex

Adjacency Attributes

001 Person1 salary, gender, title

002 Person2 salary, gender, title

003 Person3 salary, gender, title

004 Person4 salary, gender,title, nationality

005 Person5 salary, gender,nationality

006 Person6 salary, title,nationality

007 Paper1 year, inProceedings,paperTitle

008 Paper2 year, inProceedings,paperTitle

009 Paper3 year, inProceedings,paperTitle

010 School1 locatedIn, name

011 School2 locatedIn, name

null

N0

N1

N2

N3

N4

N5

N6

N7

N8

N9

N10

N11

salary

gender

title

nationality

title

nationality

year

inProceedings

paperTitle

locatedIn

name

nationality

{001,002,003,004,005,006}

{001,002,003,004,005}

{001,002,003,004}

{004}{005}

{006}

{006}

{007,008,009}

{007,008,009}

{007,008,009}

{010,011}

{010,011}

salary gender title L

$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}

salary gender title nationality L

$50,000 Female Assistant Professor Chinese {004}

M(N2)

M(N3)

salary

gender

title

nationality

year

inProceedings

paperTitle

locatedIn

name

53UPenn/2012-12-04

Transaction Database D

DimensionList DL

Delete (Person2, gender, “Male”)

swap gender (pi )and title (pi+1)

Page 118: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates modeled as triple deletion + insertion - Consider (s, p, o)

DL’s order does change

This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets

[Figure: Transaction Database D and materialized sets M(N2) and M(N3), unchanged from the previous slide. T-index with nodes N0 to N11; vertex lists shown: {001,002,003,004,005,006}, {001,003,004,005}, {001,003,004}, {004}, {005}, {002,006}, {006}, {007,008,009} (three nodes), {010,011} (two nodes).]

DimensionList DL: salary, gender, title, nationality, year, inProceedings, paperTitle, locatedIn, name

Delete (Person2, gender, “Male”)

Delete 002 from path null − N3
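Removing a vertex from its path can be sketched as below. The sketch assumes the T-index behaves like a prefix tree over each vertex's attributes sorted by DL, and stores it as a dictionary from attribute prefix to vertex list. That representation and the function names are illustrative assumptions, not gStore's implementation; only the Person rows of D are used to keep it short.

```python
# DimensionList DL and the Person rows of D, as shown on the slide.
DL = ["salary", "gender", "title", "nationality", "year",
      "inProceedings", "paperTitle", "locatedIn", "name"]

PERSONS = {
    "001": {"salary", "gender", "title"},
    "002": {"salary", "gender", "title"},
    "003": {"salary", "gender", "title"},
    "004": {"salary", "gender", "title", "nationality"},
    "005": {"salary", "gender", "nationality"},
    "006": {"salary", "title", "nationality"},
}


def dl_path(attrs, dl):
    """Sort a vertex's attributes by their position in DL."""
    return tuple(a for a in dl if a in attrs)


def build_index(db, dl):
    """Map every attribute prefix (one tree node) to its vertex list."""
    index = {}
    for vid, attrs in db.items():
        path = dl_path(attrs, dl)
        for k in range(1, len(path) + 1):
            index.setdefault(path[:k], set()).add(vid)
    return index


def remove_vertex(index, vid, attrs, dl):
    """Remove vid from every node on its path; drop nodes left empty."""
    path = dl_path(attrs, dl)
    for k in range(1, len(path) + 1):
        prefix = path[:k]
        index[prefix].discard(vid)
        if not index[prefix]:
            del index[prefix]


index = build_index(PERSONS, DL)
# Person2 is removed using its old attribute list (salary, gender, title).
remove_vertex(index, "002", PERSONS["002"], DL)
```

After this step the lists under (salary, gender) and (salary, gender, title) no longer contain 002, which matches the lists {001,003,004,005} and {001,003,004} in the figure.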

Page 119: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates modeled as triple deletion + insertion - Consider (s, p, o)

DL’s order does change

This causes swapping of p_i and p_{i+1} in DL

First update as if the order does not change

Then update DL, T-index and materialized sets

[Figure: Transaction Database D and materialized sets M(N2) and M(N3), unchanged from the previous slide. T-index with nodes N0 to N11; vertex lists shown: {001,002,003,004,005,006}, {001,003,004,005}, {001,003,004}, {004}, {005}, {002,006}, {006}, {007,008,009} (three nodes), {010,011} (two nodes).]

DimensionList DL: salary, gender, title, nationality, year, inProceedings, paperTitle, locatedIn, name

Delete (Person2, gender, “Male”)

Insert (Person, salary, title) into T-index
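The re-insertion step can be sketched with an explicit node structure. This is a hedged illustration, assuming a prefix-tree T-index in which a vertex is added to every node along its DL-sorted attribute path; `TNode` and `insert` are hypothetical names, and the starting state is the Person part of the index after Person2 has been removed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

DL = ["salary", "gender", "title", "nationality", "year",
      "inProceedings", "paperTitle", "locatedIn", "name"]


@dataclass
class TNode:
    """One T-index node: an attribute label, a vertex list, and children."""
    label: Optional[str] = None
    vertices: Set[str] = field(default_factory=set)
    children: Dict[str, "TNode"] = field(default_factory=dict)


def insert(root, vid, attrs, dl):
    """Insert vid along its DL-sorted path; return how many nodes were created."""
    node, created = root, 0
    for attr in dl:
        if attr not in attrs:
            continue
        child = node.children.get(attr)
        if child is None:
            child = TNode(label=attr)
            node.children[attr] = child
            created += 1
        child.vertices.add(vid)
        node = child
    return created


# Person vertices still in the index after 002 was removed from its old path.
remaining = {
    "001": {"salary", "gender", "title"},
    "003": {"salary", "gender", "title"},
    "004": {"salary", "gender", "title", "nationality"},
    "005": {"salary", "gender", "nationality"},
    "006": {"salary", "title", "nationality"},
}

root = TNode()
for vid, attrs in remaining.items():
    insert(root, vid, attrs, DL)

# Person2 now only has salary and title.
created = insert(root, "002", {"salary", "title"}, DL)
```

In this example no new node is needed: the path salary, title already exists because of Person6, so its vertex list simply grows to {002,006}, as in the figure.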

Page 120: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates modeled as triple deletion + insertion - Consider (s, p, o)

DL’s order does change

This causes swapping of p_i and p_{i+1} in DL

First update as if the order does not change

Then update DL, T-index and materialized sets

[Figure: Transaction Database D and materialized sets M(N2) and M(N3), unchanged from the previous slide. T-index with nodes N0, N′1, N″1 and N2 to N11, including a gender node with vertex list {005}; vertex lists shown: {001,002,003,004,005,006}, {001,003,004}, {001,003,004}, {004}, {005}, {002,006}, {006}, {007,008,009} (three nodes), {010,011} (two nodes).]

DimensionList DL: salary, title, gender, nationality, year, inProceedings, paperTitle, locatedIn, name

Delete (Person2, gender, “Male”)

Swap “gender” and “title”
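Which vertices have to move when two adjacent dimensions are swapped can be sketched as follows. The sketch relies on one observation: exchanging p_i and p_{i+1} changes a vertex's DL-sorted attribute list only if the vertex has both attributes. It assumes, as before, that a vertex's position in the T-index is determined by that sorted list; the function names are hypothetical.

```python
# Transaction database after Delete (Person2, gender, "Male"); Person rows only.
db = {
    "001": {"salary", "gender", "title"},
    "002": {"salary", "title"},
    "003": {"salary", "gender", "title"},
    "004": {"salary", "gender", "title", "nationality"},
    "005": {"salary", "gender", "nationality"},
    "006": {"salary", "title", "nationality"},
}

old_dl = ["salary", "gender", "title", "nationality"]
new_dl = ["salary", "title", "gender", "nationality"]  # gender and title swapped


def dl_path(attrs, dl):
    """Sort a vertex's attributes by their position in DL."""
    return tuple(a for a in dl if a in attrs)


def affected_by_swap(db, dl, i):
    """Vertices carrying both dl[i] and dl[i + 1]; only their paths change."""
    a, b = dl[i], dl[i + 1]
    return sorted(vid for vid, attrs in db.items() if a in attrs and b in attrs)


affected = affected_by_swap(db, old_dl, 1)
new_paths = {vid: dl_path(db[vid], new_dl) for vid in affected}
```

Here the affected vertices are 001, 003 and 004, the same list {001,003,004} that appears on the relocated nodes in the figure; Person5 and Person6 keep their old paths.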

Page 121: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates modeled as triple deletion + insertion - Consider (s, p, o)

DL’s order does change

This causes swapping of p_i and p_{i+1} in DL

First update as if the order does not change

Then update DL, T-index and materialized sets

[Figure: Transaction Database D and materialized sets M(N2) and M(N3), unchanged from the previous slide. T-index with nodes N0, N′1, N″1 and N2 to N11, including a gender node with vertex list {005} and a nationality node N6; vertex lists shown: {001,002,003,004,005,006}, {001,003,004,006}, {001,003,004}, {004}, {005}, {002,006}, {006}, {007,008,009} (three nodes), {010,011} (two nodes).]

DimensionList DL: salary, title, gender, nationality, year, inProceedings, paperTitle, locatedIn, name

Delete (Person2, gender, “Male”)

Swap “gender” and “title”

Page 122: gStore: A Graph-based SPARQL Query Engine

T-index Maintenance

Updates modeled as triple deletion + insertion - Consider (s, p, o)

DL’s order does change

This causes swapping of p_i and p_{i+1} in DL

First update as if the order does not change

Then update DL, T-index and materialized sets

[Figure: Transaction Database D and materialized sets M(N2) and M(N3), unchanged from the previous slide. T-index with nodes N0, N′1, N″1 and N2 to N11, including a gender node with vertex list {005} and a nationality node N6; vertex lists shown: {001,002,003,004,005,006}, {001,003,004,006}, {001,003,004}, {004}, {005}, {002,006}, {006}, {007,008,009} (three nodes), {010,011} (two nodes).]

DimensionList DL: salary, title, gender, nationality, year, inProceedings, paperTitle, locatedIn, name

Delete (Person2, gender, “Male”)

Update materialized sets
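The slide does not show the materialized sets after the update, so the following is only a sketch of one plausible reading. It assumes a materialized set such as M(N2) is a table from attribute-value rows to vertex lists L, and that a vertex which no longer reaches the node (Person2 no longer has gender) is removed from it. The dictionary layout and `drop_vertex` are hypothetical.

```python
# M(N2) as shown on the slide: (salary, gender, title) -> vertex list L.
M_N2 = {
    ("$100,000", "Male", "Professor"): {"002"},
    ("$50,000", "Male", "Assistant Professor"): {"001"},
    ("$40,000", "Male", "Assistant Professor"): {"003"},
    ("$50,000", "Female", "Assistant Professor"): {"004"},
}


def drop_vertex(materialized, vid):
    """Return a copy without vid; rows whose vertex list becomes empty are dropped."""
    result = {}
    for values, vids in materialized.items():
        remaining = vids - {vid}
        if remaining:
            result[values] = remaining
    return result


# Person2 lost its gender edge, so under the stated assumption it leaves M(N2).
updated = drop_vertex(M_N2, "002")
```

Under this reading, the row for $100,000 / Male / Professor disappears and the other three rows are untouched.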