A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A...

31
Semantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan, The University of Auckland, New Zealand Solicited review(s): Grant McKenzie, University of Maryland, College Park, USA; Benjamin Adams, University of Canterbury, New Zealand; Kristin Stock, Massey University, New Zealand Nuref¸ san Gür a,b,* , Torben Bach Pedersen a , Esteban Zimányi b and Katja Hose a a Center for Data Intensive Systems, Aalborg University, Selma Lagerlöfsvej 300, DK-9220 Aalborg Ø, Denmark E-mail: {nurefsan,tbp,khose}@cs.aau.dk b Department of Computer and Decision Engineering, Université Libre de Bruxelles, Avenue F. D. Roosevelt 50, B-1050 Brussels, Belgium E-mail: {nurefsan.gur,ezimanyi}@ulb.ac.be Abstract. Large volumes of geospatial data are being published on the Semantic Web (SW), yielding a need for advanced analysis of such data. However, existing SW technologies only support advanced analytical concepts such as multidimensional (MD) data warehouses and Online Analytical Processing (OLAP) over non-spatial SW data. To remedy this need, this paper presents the QB4SOLAP vocabulary, which supports spatially enhanced MD data cubes over RDF data. The paper also defines a number of Spatial OLAP (SOLAP) operators over QB4SOLAP cubes and provides algorithms for generating spatially extended SPARQL queries from the SOLAP operators. The proposals are validated by applying them to a realistic use case. Keywords: Spatial OLAP, Spatial data, Multidimensional data, Data modelling, RDF, SPARQL 1. Introduction The Semantic Web (SW) has evolved, from focus- ing mostly on data publishing to also support increas- ingly complex queries such as interactive analytical queries. Simultaneously, the data available on the SW has evolved from being simple, most alphanumeric data, to also include complex data such as spatial data. Indeed, geospatial data is now common on the SW, but it remains difficult to analyze it. In a non-SW context, the main tools for interac- tive data analyses have been Data Warehouses (DWs) and Online Analytical Processing (OLAP) tools and queries. DWs store large volumes of data and are de- signed with a multidimensional (MD) modeling ap- proach, which has shown itself to be intuitive for inter- active data analytics. Concretely, DWs consist of MD data cubes. The cells of the cube represent the topic * Corresponding author. E-mail: [email protected]. of analysis, and associate observation facts with nu- merical measures that can be aggregated. For example, a sales fact cube has measures such as QuantitySold and SalesPrice. Facts are linked to dimensions, which provide contextual information, e.g., sales date, prod- uct, and location. Dimensions are perspectives, which are used to analyze data, and are organized into hier- archies with levels, e.g., Store, City, and Region, that allow users to analyze and aggregate measures at dif- ferent levels of detail. Levels have a set of attributes that describe the characteristics of the level members. In traditional DWs, the location dimension is widely used, but as a conventional dimension with alphanu- meric data and thus only nominal reference to spatial concepts such as areas and places. This does not al- low manipulating through spatial location data or de- riving topological relations among the hierarchy lev- els of the location dimension. This yields a demand for truly spatial DWs for better analysis purposes. Includ- ing the geometric information of the location data, sig- 1570-0844/0-1900/$27.50 c 0 – IOS Press and the authors. All rights reserved

Transcript of A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A...

Page 1: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

Semantic Web 0 (0) 1–31 1IOS Press

A Foundation for Spatial Data Warehouses onthe Semantic WebEditor(s): Mark Gahegan, The University of Auckland, New ZealandSolicited review(s): Grant McKenzie, University of Maryland, College Park, USA; Benjamin Adams, University of Canterbury, New Zealand;Kristin Stock, Massey University, New Zealand

Nurefsan Gür a,b,∗, Torben Bach Pedersen a, Esteban Zimányi b and Katja Hose a

a Center for Data Intensive Systems, Aalborg University, Selma Lagerlöfsvej 300, DK-9220 Aalborg Ø, DenmarkE-mail: {nurefsan,tbp,khose}@cs.aau.dkb Department of Computer and Decision Engineering, Université Libre de Bruxelles, Avenue F. D. Roosevelt 50,B-1050 Brussels, BelgiumE-mail: {nurefsan.gur,ezimanyi}@ulb.ac.be

Abstract. Large volumes of geospatial data are being published on the Semantic Web (SW), yielding a need for advancedanalysis of such data. However, existing SW technologies only support advanced analytical concepts such as multidimensional(MD) data warehouses and Online Analytical Processing (OLAP) over non-spatial SW data. To remedy this need, this paperpresents the QB4SOLAP vocabulary, which supports spatially enhanced MD data cubes over RDF data. The paper also defines anumber of Spatial OLAP (SOLAP) operators over QB4SOLAP cubes and provides algorithms for generating spatially extendedSPARQL queries from the SOLAP operators. The proposals are validated by applying them to a realistic use case.

Keywords: Spatial OLAP, Spatial data, Multidimensional data, Data modelling, RDF, SPARQL

1. Introduction

The Semantic Web (SW) has evolved, from focus-ing mostly on data publishing to also support increas-ingly complex queries such as interactive analyticalqueries. Simultaneously, the data available on the SWhas evolved from being simple, most alphanumericdata, to also include complex data such as spatial data.Indeed, geospatial data is now common on the SW, butit remains difficult to analyze it.

In a non-SW context, the main tools for interac-tive data analyses have been Data Warehouses (DWs)and Online Analytical Processing (OLAP) tools andqueries. DWs store large volumes of data and are de-signed with a multidimensional (MD) modeling ap-proach, which has shown itself to be intuitive for inter-active data analytics. Concretely, DWs consist of MDdata cubes. The cells of the cube represent the topic

*Corresponding author. E-mail: [email protected].

of analysis, and associate observation facts with nu-merical measures that can be aggregated. For example,a sales fact cube has measures such as QuantitySoldand SalesPrice. Facts are linked to dimensions, whichprovide contextual information, e.g., sales date, prod-uct, and location. Dimensions are perspectives, whichare used to analyze data, and are organized into hier-archies with levels, e.g., Store, City, and Region, thatallow users to analyze and aggregate measures at dif-ferent levels of detail. Levels have a set of attributesthat describe the characteristics of the level members.In traditional DWs, the location dimension is widelyused, but as a conventional dimension with alphanu-meric data and thus only nominal reference to spatialconcepts such as areas and places. This does not al-low manipulating through spatial location data or de-riving topological relations among the hierarchy lev-els of the location dimension. This yields a demand fortruly spatial DWs for better analysis purposes. Includ-ing the geometric information of the location data, sig-

1570-0844/0-1900/$27.50 c© 0 – IOS Press and the authors. All rights reserved

Page 2: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

2 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

Spatial RDFEndpoints

SOLAP

User Spatial RDF Data Warehouses

SOLAP to SPARQL

QB4SOLAP

Fig. 1. QB4SOLAP approach to SOLAP on the SW

nificantly improves the analysis process (i.e., proxim-ity analysis of the locations) with additional perspec-tives by revealing dynamic spatial hierarchy levels andnew spatial members.

Similarly, providing deep spatial analytics supportfor spatial SW data is very valuable. Spatial data re-quires specific treatment techniques, in particular en-coding, special functions and different manipulationmethods, which should be considered in the modelingprocess and querying. The current state of the art forthe geospatial Semantic Web focuses on techniques forpublishing, linking and querying spatial data, but sup-ports only “plain” spatial SW data (without support forspatial DW concepts such as spatial hierarchies, levels,and measures) and does not consider analytical queriesover spatial RDF data (see Section 2 for details).Problem Definition. The proliferation of open geospa-tial data on the SW creates possibilities for advancedanalysis of such data. Many examples exist of spa-tial Linked Open Data (LOD) published on the SW asRDF1,2,3,4. These datasets have observations and mea-sures that are well suited for analytical queries (e.g.,water/air quality measurements, immigration rates, EUsubsidies in agriculture, crop revenue, etc.). However,such datasets are typically not modeled with spatialdimension levels and hierarchies. Thus, they cannotbe queried with interactive spatial analytical queries(a.k.a. SOLAP) on the SW. In the current state of theSW, if a (spatial) DW user would like to query theexisting spatial RDF data from the SW with SOLAPoperations, the user needs to download the RDF data,map it to a relational data model (i.e., with a snowflakeschema), and then import it into a traditional spatial

1EuroStat: http://ec.europa.eu/eurostat2UK Environmental Data: http://environment.data.

gov.uk3Danish Agricultural Data: https://datahub.io/

dataset/govagribus-denmark4Australian Climate Observations: https://datahub.io/

dataset/acorn-sat

data warehouse in order to query with SOLAP, whichis slow, labor-intensive, and stores the data in a non-open format.Our Approach. On the contrary, annotating spatialRDF datasets with QB4SOLAP [16,17] allows usersto define spatial multidimensional concepts on top ofexisting RDF data. Hence, the user can create andpublish spatial data warehouses on the Semantic Web,which can be easily queried with SOLAP operations.Fig. 1 depicts the general workflow scenario, wherethe spatial RDF datasets from endpoints can be an-notated with QB4SOLAP. This makes it possible forend users to use SOLAP queries. However, writing aSOLAP query in SPARQL can be very complicatedfor users inexperienced with SPARQL (e.g., traditionalDW users). Due to the lack of MD semantics of spatialRDF data and the lack of translation techniques fromhigh-level SOLAP expressions to SPARQL, there isa considerable entry barrier for advanced spatial dataanalysis on the SW for data warehouse users.Contributions. In order to address these issues, thispaper makes a number of contributions. First, wepropose QB4SOLAP, a generic and extensible vo-cabulary (metamodel) for spatial DWs on the SW.QB4SOLAP extends the most recent stable version ofthe QB4OLAP vocabulary with spatial concepts. Weprovide a full formalization of QB4SOLAP. The keyconcepts of spatial cube members, spatial hierarchiesand levels, spatial measures, spatial aggregate func-tions (e.g., union, buffer, and convex–hull) and topo-logical relations among spatial dimension and hierar-chy level members (e.g., within, intersects, and over-laps), are defined. Second, we define a number of an-alytical Spatial OLAP (SOLAP) operators over themodel including giving formal semantics of the opera-tors. The operators support advanced analytical queriesover MD geospatial SW data. Third, we provide al-gorithms for generating spatially extended SPARQLqueries for individual and nested SOLAP operators,which allows writing SOLAP queries without knowl-edge of RDF/SPARQL. Fourth, we validate the vocab-ulary, operators, and query generation algorithms byapplying them to a realistic use case.Paper structure. The remainder of the paper is struc-tured as follows. Section 2 discusses related work. Sec-tion 3 defines preliminary spatial and OLAP concepts.Section 4 defines the QB4SOLAP vocabulary, whileSection 5 defines the SOLAP operators. Section 6 pro-vides the SPARQL query generation algorithms. Fi-nally, Section 7 concludes the paper and points to fu-ture research.

Page 3: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 3

2. Related work

DW and OLAP technologies have been successfulfor analyzing large volumes of data [1]. CombiningDW/OLAP technologies with RDF data makes RDFdata sources more easily available for interactive anal-ysis. The following work concerns the integration ofDW/OLAP with the SW.

DW/OLAP and Semantic Web. Using OLAP to an-alyze SW data is considered in several approaches.Kämpgen et al. propose an extended model [23] ontop of the RDF Data Cube Vocabulary (QB) [5] forinteracting with statistical linked data via OLAP op-erations directly in SPARQL. However, it has theinherent limitations of QB and thus cannot supportOLAP dimensions with hierarchies and levels, andbuilt-in aggregate functions. Etcheverry et al. intro-duce QB4OLAP [11] as an extended vocabulary basedon QB, with a full MD metamodel, supporting OLAPoperations directly over RDF data with SPARQLqueries. Nath et al. considers creating an Extract–Transform–Load (ETL) framework for semantic datawarehouses [8]. Varga et al. presents a comprehensivemethodology for dimensional enrichment of statisti-cal LOD by using QB4OLAP and provide a SW-basedOLAP engine for traditional DW users [37]. However,these approaches and vocabularies support neither spa-tial DWs nor provide SOLAP operators for the SW.

Spatial DW and OLAP. The constraint representa-tion of spatial data has been the focus in many fieldsfrom databases to AI [32]. Extending OLAP with spa-tial features has also attracted the attention of the datawarehousing community. Bédard et al. first introducedthe term SOLAP [4] in 1997. SOLAP systems [9,33]since then, have significantly been improved. Respec-tively, various papers improve the spatial aggregationfunctions and techniques [6,14,28,38].

Several conceptual models are proposed for rep-resenting spatial data in data warehouses. Stefanovicet al. [19] considers constructing and materializingspatial cubes in their proposed model. The Multi-Dim conceptual model, introduced by Malinowski andZimányi [27], copes with spatial features and is ex-tended in [36], to include complex geometric features(continuous fields), with a set of operations and anMD calculus supporting spatial data types. Gómez etal. [15] propose an algebra and a general frameworkfor OLAP cube analysis on discrete and continuousspatial data. Even though spatial data warehousing isthus widely studied, those studies are limited to tradi-

tional non-semantic spatial data warehouses and SO-LAP techniques. The work above neither consideredsemantic web data nor spatial analytical querying inSPARQL.

Geospatial Semantic Web. The Open GeospatialConsortium (OGC) has proposed GeoSPARQL [3] asa vocabulary to represent and query spatial data inRDF using an extension to SPARQL. Kyzirakos et al.present a comprehensive survey of data models andquery languages for linked geospatial data in [25], andpropose a semantic geospatial data store called Stra-bon in [24]. Strabon has an extensive query languagecalled stSPARQL , which is however limited to thespecific environment. LinkedGeoData is a significantcontribution on interactive transformation of Open-StreetMap5 data to RDF data [35]. GeoKnow [26] isa more recent project with focus on linking geospa-tial data from heterogeneous sources. Andersen et al.considers publishing/converting open spatial data asLinked Open Data [2]. However, none of these worksconsider the MD aspects of geospatial data or allowquerying with SOLAP on the SW, unlike QB4SOLAP.The QB4SOLAP vocabulary is validated with boththe running example use case, the GeoNorthwind datacube, as well as a substantial real-world use case,the GeoFarmHerdState data cube [17]. GeoFarmHerd-State is a spatial data cube about livestock holdingsin Denmark, which integrates environmental and ge-ographical open data from several sources, thus en-abling a range of interesting SOLAP queries.

In summary, none of the related work, which issurveyed in the fields of “DW/OLAP and the SW",“Spatial DW and OLAP", and “Geospatial SemanticWeb" provides a substantial foundation for modelingand querying spatial data warehouses on the SemanticWeb, unlike the QB4SOLAP vocabulary, SOLAP op-erators, and SPARQL generation algorithms presentedin this paper.

3. Preliminary concepts

In this section, we describe the spatial objects andthe spatial operations that manipulate them. Then, weintroduce the data cubes and spatial enhancement onthem as spatial data cubes. Finally, we show the tradi-tional OLAP operations, which manipulate data cubes,and explain the Spatial OLAP (SOLAP) operators,which manipulate spatial data cubes.

5http://www.openstreetmap.org

Page 4: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

4 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

21 10 18 35

27 14 11 30

26 12 35 32

14 20 47 31

24 18 28 1433 25 23 2512 20 24 3321 10 18 35

35332514

30142318

32122017

31103318

Q1

Q2

Q3

Q4

Bremen

AarhusHamburg

Odense

BeveragesSeafoodCereals

CondimentsProduct (Category)

Tim

e (

Qu

art

er)

Cust

omer

(Ci

ty)

dimensions

measurevalues

Fig. 2. A three-dimensional cube for Sales data

33 30 42 68

39 26 41 44

30 22 46 44

25 29 49 41

57 43 51 3933 30 42 68

6839

4441

4437

4151T

ime (

Qu

art

er)

Q1

Q2

Q3

Q4

GermanyDenmarkCu

stom

er

(Cou

ntry

)

BeveragesCereals Seafood

CondimentsProduct (Category)

Fig. 3. Roll-up to the Country level

3.1. Spatial objects

A spatial object represents a real-world objectwhose geographic features are important for an appli-cation. These geographic features are encoded usingthe geometry data type. Point, Line, and Polygon arethe basic instantiable types of the geometry data type.Coordinates for geometry data type are generally givenin 2-dimensions with X , Y values. Geometries are as-sociated with a spatial reference system (SRS), whichdescribes the coordinate space in which the geometryis defined. There are several SRSs and each of themare identified with a spatial reference system identi-fier (SRID). The World Geodetic System (WGS) is themost well-known SRS and the latest version is calledWGS84, which is also used in our use case.

3.2. Spatial operations

There is a set of spatial operations that can be ap-plied on spatial data. We grouped these operations intoclasses, based on the common functionality of the op-erators. These classes are defined next.

Definition 1. (Spatial aggregation) The operators inthe spatial aggregation class Sagg aggregate two ormore spatial objects and return a new spatial ob-ject. Union, Intersection, ConvexHull, and Minimum-BoundingRectangle (MBR) are example operators ofthis class. Some spatial functions such as Convex-Hull or MBR can also be interpreted as unary spa-tial functions with a single parameter, but here weonly consider the aggregate versions of the functions.In order to make this clear, the aggregate versions ofthose functions are given with a prefix “Aggr" in theQB4SOLAP vocabulary (Fig. 6). For our purpose, it is

enough to group all spatial aggregate functions into asingle group, although more fine-grained classificationproposals for spatial aggregate functions exist [6].

Definition 2. (Topological relations) The operators inthe topological relation class Trel are commonly ex-pressed in the RCC86 and DE-9DIM7 models [10,31].Topological relations are Boolean predicates that spec-ify how two spatial objects are related to each other.Examples of topological relations are Intersects, Dis-joint, Equals, Overlaps, Contains, Within, Touches,Covers, CoveredBy, and Crosses.

Definition 3. (Numeric operations) The operators inthe numeric operation class Nop take one or morespatial objects and return a numeric value. Perime-ter, Area, NoOfInteriorRings, Distance, HaversineDis-tance, and NoOfGeometries are example operators ofthis class.

3.3. Data cubes

Data warehouses store large volumes of data fordecision support. They are based on the multidimen-sional model, which views data in an n-dimensionalspace, usually called a data cube. The cells of the cuberepresent the observation facts for analysis with a set ofattributes called measures (e.g., a sales fact cube withmeasures product quantity and price). Facts are linked

6RCC8 (Region Connection Calculus) describes regions in Eu-clidean space or in a topological space by their possible relations toeach other.

7DE-9DIM (Dimensionally Extended Nine-Intersection Model) isa topological model that describes spatial relations of two geome-tries in two dimensions.

Page 5: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 5

Product

Category

All

(a) Categories hierarchy inthe Product dimension

Time

Month

Quarter

Year

All

(b) Calendar hierarchyin the Time dimension

Customer

State

Country

All

Supplier

City

(c) Spatial Geography hierarchies in theCustomer and Supplier dimensions

Customer

CityClosestSupplier

State

Country

All

(d) Dynamic spatial Geography hi-erarchy in the Customer dimension

Fig. 4. Dimension hierarchies

to dimensions, which provide perspectives to analyzedata (e.g., sales date, product, and customer location).Dimensions are organized into hierarchies, which al-low users to aggregate measures at various levels ofdetail. Hierarchies are composed of levels and there isalways a unique top level All with just one member all.Levels have a set of attributes that describe the charac-teristics of the level members.

An example of a data cube with three dimen-sions (Customer, Time, and Product) and one measure(Quantity) is given in Fig. 2. Each cell in the cube is anobservation fact, which is characterized by dimensionand measure values. The hierarchies of this cube aregiven in Fig. 4a–c. Thus, in the cube shown in Fig. 2,the Product dimension is given at the Category level,the Time dimension at the Quarter level, and the Cus-tomer dimension at the City level. Measure values rep-resent the measure Quantity of the sold products.

3.4. Spatial data cubes

A spatial data cube contains both conventional andspatial dimensions. A spatial dimension is a dimen-sion, which includes at least one spatial level in whichthe application should store the spatial characteristicsof the members. Similarly, a hierarchy is a spatial hi-erarchy if it has at least one spatial level. Spatial char-acteristics of the levels are captured by their geome-tries and can be recorded in the spatial attributes of thelevel. A spatial fact is a fact that relates several dimen-sions in which, two or more are spatial.

For example, consider a Sales spatial fact, whichhas spatial dimensions Customer and Supplier, eachwith a spatial hierarchy Geography composed of spa-tial levels City → State → Country → All (Fig. 4c).These spatial levels record the spatial characteristicsof its members with spatial attributes: Customer, Sup-plier, and City using a point spatial data type, whereasState and Country with a multi-polygon spatial datatype.

Following the philosophy of spatially extended Mul-tiDim conceptual model [36], MD concepts such aslevels are considered to be spatial, only if they recordthe spatial characteristics of the concepts as geome-tries. For instance, “continent" might be consideredas a spatial object, in theory or in other vocabularies.However, if there is no information about the geome-try of the continents in the schema and in the instancedata, continent does not become a spatial level (Ext. 7),although continent might still be a traditional level(Def. 7) of the spatial hierarchy Geography with al-phanumeric attributes (i.e., continent name, code, andetc.).

Spatial data cubes typically have spatial measures,which are also represented by a spatial data type. Anexample is a SalesPoint measure that stores the lo-cation of sales. Fig. 7 shows the multidimensionalschema of the GeoNorthwind data warehouse, whichis used as running example in the paper.

Page 6: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

6 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

3.5. OLAP operators

OLAP operators are used for expressing queriesover data cubes. The traditional OLAP operators aregiven next.

The slice operator removes a dimension from a cubeby selecting one instance in a dimension level. An ex-ample is “slice on City is equal to Odense”.

The dice operator selects the cells in a cube that sat-isfy a Boolean condition. An example is “dice on thefirst and last quarter of the year”.

The roll-up operator aggregates measures along ahierarchy to obtain data at a coarser granularity. Anexample is “roll-up to the Country level” (Fig. 3).

Finally, the drill-down operator disaggregates mea-sures along a hierarchy to obtain data at a finer gran-ularity. It is the inverse operation of roll-up. Startingfrom the cube in Fig. 3, an example is “drill-down tothe City level”.

3.6. Spatial OLAP operators

Spatial OLAP (SOLAP) operates on spatial datacubes. SOLAP increases the analytical capabilities ofOLAP by taking into account the spatial informationin the cube. SOLAP operators involve spatial condi-tions or spatial functions by using the spatial operatorsdefined in Sect. 3.1. Spatial conditions specify con-straints on the geometries associated to cube membersor measures, while spatial functions derive new datafrom the cube, which can be used, e.g., to derive dy-namic spatial hierarchies or levels, as explained in thefollowing example. Spatial extensions of the commonOLAP operators are formally defined in Sect. 5.

Table 1Sample (instance) data for the Sales cube

Customer City Supplier TotalSalesCustomer s1 s2 s3

Düsseldorfc1 8 – 3 11c2 10 – – 10

Dortmundc3 7 4 – 11c4 – 20 3 23

Münster c5 – – 30 30

Example 1. Consider the summarized data for theSales cube given in Table 1, where a ‘–’ is used ifthere are no sales to customers from the correspond-ing suppliers. The data in Table 1 is shown on the mapin Fig. 5, where the arrows on the map between the

Table 2Roll-up of the Sales cube

CustomerCity Sales

Düsseldorf 21Dortmund 34Münster 30

Table 3S-Roll-up of the Sales cube

CityClosestSupplier Sales

Düsseldorf 25Dortmund 20Münster 33

Table 4Customer to Supplier distance

Supplier CitySupplier

Customer City Düsseldorf Dortmund MünsterCustomer s1 s2 s3

Düsseldorfc1 15 km 45 km 30 kmc2 15 km 60 km 60 km

Dortmundc3 15 km 30 km 45 kmc4 45 km 15 km 15 km

Münster c5 60 km 45 km 15 km

Fig. 5. Example map of Sales (instance) data

supplier and customer locations represent the distance.The quantities of sold products are shown along thesearrows.

The hierarchies in Fig. 4a–c can be used to performclassical roll-up operations, where measures are aggre-gated from a child to a parent level. An example ofsuch a roll-up operator is expressed by the query “totalsales to customers by city”, whose results is given inTable 2.

On the other hand, as shown in Table 4 and Fig. 5,some customers may be closer to suppliers from othercities. For example, customer c3 is related to its cityDortmund by using traditional Geography hierarchy,

Page 7: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 7

but the customer is closer to the city Düsseldorf of sup-plier s1. Similarly, customer c4 in city Dortmund iscloser to the city Münster of supplier s3. Fig. 4d showsa new dynamic spatial hierarchy that can be obtainedwith a spatial roll-up (s-roll-up) operator that expressesthe query “total sales to customers by city of the clos-est supplier”. Such queries are not possible to expresson conventional hierarchies with traditional OLAP.

The hierarchy in Fig. 4d is created on the fly with thehelp of a spatial function computing the distance be-tween customer and supplier locations. Therefore, us-ing the s-roll-up operator, sales to customers are aggre-gated by city of the closest suppliers, where Dortmundhas a significant drop off in the quantity of the salesfrom 34 (Table 2) to 20 (Table 3).

4. The QB4SOLAP vocabulary

In this section, we formally define how to represent(spatial) data cubes in RDF. We use as running exam-ple the GeoNorthwind data warehouse whose concep-tual schema is given in Fig. 7.

The QB4OLAP [11] vocabulary allows to definecube schemas and cube instances as RDF triples.QB4OLAP is an extension of the RDF Data CubeVocabulary (QB) [5] with multidimensional conceptsin order to be able to support OLAP operations di-rectly over RDF data with SPARQL queries. We ex-tended QB4OLAP (v1.2)8 with spatial concepts togive QB4SOLAP [16]. We based our extension onGeoSPARQL [30], a standard from the Open Geospa-tial Consortium (OGC) for representing and queryinggeospatial linked data for the Semantic Web. Since ourbase vocabulary QB4OLAP uses the MultiDim con-ceptual model to describe the multidimensional con-cepts, we base our definitions on a spatially extendedversion of MultiDim conceptual model [36] for spa-tial extension of the MD concepts. Fig. 6 shows theQB4SOLAP vocabulary for representing a spatial cubeschema and spatial cube members as RDF triples. Acube schema defines the structure of the cube in termsof dimension levels, measures, aggregation functions(e.g., SUM, AVG, COUNT) on measures, spatial ag-gregation functions (Sagg in Def. 1) on spatial mea-sures, dimensions hierarchies, and parent–child rela-tionships between levels (including their cardinalityand topological relationships for spatial levels). These

8QB4OLAP v1.2: https://github.com/lorenae/qb4olap/blob/master/rdf/qb4olap.1.2.ttl

schema level metadata are used to define multidimen-sional datasets in RDF. Cube members are the in-stances of a cube schema that represent level mem-bers, facts, and measure values. As we will show inSect. 6, we use the schema level metadata to produceSPARQL queries that implement SOLAP operators oncube members.

Terms with capitalized initials and non-italic fontin Fig. 6 represent RDF classes, terms with capital-ized initials and italic font represent RDF instances,and terms with non-capitalized initials represent RDFproperties. Classes in external vocabularies are de-picted in light gray background and font. RDF Cube(QB), QB4OLAP, and QB4SOLAP classes are shown,respectively, with white, light gray, dark gray back-grounds. Original QB terms are prefixed with qb:9.QB4OLAP and QB4SOLAP terms are prefixed, re-spectively, with qb4o:10 and qb4so:11. Spatialclasses and properties are prefixed with geo:12.

In what follows, we first define formally RDFtriples, and then discuss how to describe (spatial) mul-tidimensional data using QB4OLAP and QB4SOLAP.

Definition 4. (RDF triple) An RDF triple t = (s, p, o)consists of three components: s is the subject, p is thepredicate, and o is the object. RDF triples are definedover

T = (I ∪ B)× I × (I ∪ B ∪ L)

where I is the set of IRIs, B is the set of blank nodes,and L is the set of literals.

A set of RDF triples is referred to as a graph. Wedenote a QB4SOLAP graph by G, where G ⊂ T .Thecube schema and cube instances are subsets of thisgraph and are denoted, respectively, by GS and GI ,where GS ⊂ G and GI ⊂ G.

Given an MD element x ∈ (I ∪ B) in a schemagraph or instance graph G, we define by Gx the sub-graph of G for x, where Gx ⊂ G. We define the func-tion id(x) : G → I, that given a MD element x returnsits identifier I from the graph G. We use superscriptnotation to indicate the type of the identifier from thecube schema graph (GS) and cube instance graph (GI ),e.g., idS(x) for a cube schema identifier and idI(x) fora cube instance identifier.

9RDF cube: http://purl.org/linked-data/cube#10QB4OLAP: http://purl.org/qb4olap/cubes#11QB4SOLAP: http://w3id.org/qb4solap#12GeoSPARQL:http://www.opengis.net/ont/

geosparql#

Page 8: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

8 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

Fig. 6. The QB4SOLAP vocabulary

4.1. Defining spatial data cube schemas withQB4SOLAP

An n-dimensional cube schema CS is a tuple CS =(D,M,F ), with a set of dimensions D, a set of mea-sures M , and a fact F . A dimension d ∈ D has a setof hierarchies H(d). Each hierarchy h ∈ H(d) is or-ganized into a set of levels L(h). Each level l ∈ L(h)has a set of attributes A(l). Each attribute a ∈ A(l) isdefined over a domain. Each measure m ∈ M is alsodefined over a domain.

We define next how to represent a cube schema CSin RDF using QB4SOLAP. We denote the RDF graphof the cube schema GS . In the examples we prefix theelements of GS with gnw:. We follow a similar nam-ing convention for schema elements as in QB4OLAP.If there is a possibility of confusion for different MDconcepts with same schema name, i.e., customer di-mension and customer level, we suffix the dimensionswith Dim (e.g., gnw:customerDim for dimension,

and gnw:customer for level). The subgraph of GS

that refers to a specific schema element x is denoted byGSx and the unique identifier of x is denoted by idS(x).

Definition 5. (Dimensions) An n-dimensional cubeschema CS has a set of dimensions D = {d1, . . . , dn}and each dimension di has a set of hierarchies H(di)

(Def. 6). Each dimension di ∈ D is defined in the cubeschema graph GS with qb:DimensionProperty.Each hierarchy h ∈ H(di) is linked to its dimension diwith the qb4o:hasHierarchy property. The RDFgraph formulation of the dimensions D is representedas

GSD =n⋃

i=1

GSdi

Page 9: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 9

where

GSdi =

{(idS(di) rdf:type qb:DimensionProperty)}∪⋃h∈H(di)

{(idS(di) qb4o:hasHierarchy idS(h))}

Extension 5. (Spatial dimensions) A dimension isspatial if it has at least one spatial level. A spatial di-mension dis belongs to the set of spatial dimensionsDs, which is a subset of the set of dimensions D, suchthat dis ∈ Ds ⊆ D.

Example 2. The triples below show how some of thedimensions of the GeoNorthwind DW (Fig. 7) are rep-resented in RDF using Def. 5 and Ext. 5. As we willsee below, the Customer and Supplier dimensions arespatial as they both have a spatial hierarchy Geogra-phy.

# Dimensionsgnw:customerDim rdf:type qb:DimensionProperty ;

qb4o:hasHierarchy gnw:customerGeography .gnw:supplierDim rdf:type qb:DimensionProperty ;

qb4o:hasHierarchy gnw:supplierGeography .gnw:productDim rdf:type qb:DimensionProperty ;

qb4o:hasHierarchy gnw:categories .gnw:employeeDim rdf:type qb:DimensionProperty .

Definition 6. (Hierarchies) A dimension d has a set ofhierarchies H(d) = {h1, . . . , hm}, where each hierar-chy hi has a set of levels L(hi) (Def. 7). Each hierar-chy hi ∈ H(d) is defined in the cube schema graph GSwith the qb4o:Hierarchy predicate and is linkedwith its dimension d by the qb4o:inDimensionproperty. Each level l ∈ L(hi) that belongs to a hierar-chy hi is defined with the qb4o:hasLevel property.The RDF graph formulation of the hierarchies H(d) isrepresented as

GSH(d) =

m⋃i=1

GShi

where

GShi= {(idS(hi) rdf:type qb4o:Hierarchy)} ∪

{(idS(hi)qb4o:inDimension idS(d))}∪⋃l∈L(hi)

{(idS(hi) qb4o:hasLevel idS(l))}

Extension 6. (Spatial hierarchies) A hierarchy is spa-tial if it contains at least one spatial level. A spatialhierarchy his belongs to the set of spatial hierarchiesHs(d), which is a subset of the set of hierarchiesH(d),such that his ∈ Hs(d) ⊆ H(d).Example 3. The triples below show how some of thehierarchies of the GeoNorthwind DW (Fig. 7) are rep-resented in RDF using Def. 6 and Ext. 6. As we will seebelow, the Geography hierarchies in the Customer andSupplier dimensions are spatial since they have spatiallevels (City, State, etc.)# Hierarchiesgnw:customerGeography rdf:type qb4o:Hierarchy ;

qb4o:inDimension gnw:customerDim ;qb4o:hasLevel gnw:customer, gnw:city,

gnw:state, gnw:country .gnw:supplierGeography rdf:type qb4o:Hierarchy ;

qb4o:inDimension gnw:supplierDim ;qb4o:hasLevel gnw:supplier, gnw:city,

gnw:state, gnw:country .gnw:categories rdf:type qb4o:Hierarchy ;

qb4o:inDimension gnw:productDim ;qb4o:hasLevel gnw:product, gnw:category .

Definition 7. (Levels) A hierarchy h has a set oflevels L(h) = {l1, . . . , lk} and each level li hasa set of attributes A(li) (Def. 8). Each level li ∈L(h) is defined in the cube schema graph GS withthe qb4o:LevelProperty predicate. Each at-tribute a ∈ A(li) is linked to its level li with theqb4o:hasAttribute property. The RDF graphformulation of the levels L(h) is represented as

GSL(h) =

k⋃i=1

GSli

where

GSli = {(idS(li) rdf:type qb4o:LevelProperty)} ∪⋃

a∈A(li)

{(idS(li) qb4o:hasAttribute idS(a))}

Extension 7. (Spatial levels) A level is spatial if ithas an associated geometry. A spatial level lis belongsto the set of spatial levels Ls(h), which is a subsetof the set of levels L(h), such that lis ∈ Ls(h) ⊆L(h). The geometry of a spatial level is defined in thecube schema graph GS with the geo:hasGeometryproperty The RDF graph formulation of the spatial lev-els Ls(h) is represented as

GSLs(h) =

k⋃i=1

GSlis

Page 10: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

10 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

Product

ProductID

ProductName

QuantityPerUnit

UnitPrice

Discontinued

Supplier

SupplierID

SupplierName

Address

PostalCode

Category

CategoryID

CategoryName

Description

Customer

CustomerID

CustomerName

Address

PostalCode

Employee

EmployeeID

FirstName

LastName

Title

BirthDate

HireDate

City

CityName

Ca

teg

orie

s

Ge

og

rap

hy

Country

CountryName

CountryCode

CountryCapital

CapitalGeo

Population

Subdivision

State

StateName

EnglishStateName

StateType

StateCode

StateCapital

CapitalGeo

Time

Date

DayNoWeek

DayNameWeek

DayNoMonth

DayNoYear

WeekNoYear

Calendar

Month

MonthNo

MonthName

Quarter

QuarterNo

Year

YearNo

DueDate

OrderDate

Ge

og

rap

hy

Quantity

UnitPrice: Avg +!

Discount: Avg +!

SalesAmount

Freight

SalesPoint

Sales

Fig. 7. Conceptual multidimensional schema of the GeoNorthwind data warehouse

where

GSlis = {(idS(lis) rdf:type qb4o:LevelProperty)} ∪

{(idS(lis) geo:hasGeometry geo:Geometry)} ∪⋃a∈A(lis )

{(idS(lis) qb4o:hasAttribute idS(a))}

Example 4. The triples below show how some of thelevels of the GeoNorthwind DW (Fig. 7) are repre-sented in RDF using Def. 7 and Ext. 7. Note that theCustomer and City levels are spatial as they have a ge-ometry that is specified at the level definition.

# Levelsgnw:customer rdf:type qb4o:LevelProperty ;

qb4o:hasAttribute gnw:customerID ;qb4o:hasAttribute gnw:customerName ;qb4o:hasAttribute gnw:address ;qb4o:hasAttribute gnw:postalCode ;geo:hasGeometry gnw:customerGeometry .

gnw:city rdf:type qb4o:LevelProperty ;qb4o:hasAttribute gnw:cityName ;geo:hasGeometry gnw:cityGeometry .

Definition 8. (Attributes) A level l has a set of at-tributes A(l) = {a1, . . . , ap}, which defines the char-acteristics of the level members. One among theseattribute, denoted as aID, specifies a surrogate keyfor the level, i.e., the value of aID uniquely iden-tifies the members of the level. For simplicity, weassume that it is the first attribute in the set of at-tributes A(l), i.e., a1 = aID. Each attribute ai ∈A(l) is defined in the cube schema graph GS with theqb4o:LevelAttribute predicate and is linked toits level l with the qb4o:inLevel property. Each at-tribute ai is defined as ranging over XSD literals L us-ing the rdfs:range property. The RDF graph for-mulation of the attributes A(l) is represented as

GSA(l) =

p⋃i=1

GSai

Page 11: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 11

where

GSai=

{(idS(ai) rdf:type qb4o:LevelAttribute)}∪

{(idS(ai) qb4o:inLevel idS(l))} ∪

{(idS(ai) rdfs:range L)}

Extension 8. (Spatial attributes) An attribute is spa-tial if it is defined over a spatial domain. A spatial at-tribute ais belongs to the set of spatial attributesAs(l),which is a subset of the set of attributes A(l), suchthat ais ∈ As(l) ⊆ A(l). The RDF graph formulationof the spatial attributes is similar as in Def. 8. However,the attribute must range over spatial literals Ls i.e., awell-known text literal (WKT) from OGC schemas.Further, the domain of the attribute should be speci-fied with the rdfs:domain property, which must bea geometry. Finally, the attribute must be specified asspatial object with the rdfs:subclassOf property.The RDF graph formulation of the spatial attributesAs(l) is represented as

GSAs(l) =

p⋃i=1

GSais

where

GSais=

{(idS(ais) rdf:type qb4o:LevelAttribute)}∪

{(idS(ais) qb4o:inLevel idS(l))} ∪

{(idS(ais) rdfs:range Ls)} ∪

{(idS(ais) rdfs:subPropertyOf geo:Geometry)}∪

{(idS(ais) rdfs:subClassOf geo:SpatialObject)}

Example 5. The triples below show how some of theattributes of the GeoNorthwind DW (Fig. 7) are repre-sented in RDF using Def. 8 and Ext. 8. Note that theCustomer level has a spatial attribute (Customer geom-etry). It is represented as a WKT literal that defines aPoint type from the Geometry class, which is a sub-class of Spatial Object.

# Attributesgnw:customerID rdf:type qb4o:LevelAttribute ;

qb4o:inLevel gnw:customer;rdfs:range xsd:Integer .

gnw:customerName rdf:type qb4o:LevelAttribute ;qb4o:inLevel gnw:customer;rdfs:range xsd:String .

gnw:address rdf:type qb4o:LevelAttribute ;qb4o:inLevel gnw:customer;rdfs:range xsd:String .

gnw:postalCode rdf:type qb4o:LevelAttribute ;qb4o:inLevel gnw:customer;rdfs:range xsd:String .

gnw:customerGeometry rdf:typeqb4o:LevelAttribute ;

rdfs:subPropertyOf geo:Geometry ;qb4o:inLevel gnw:customer ;rdfs:range geo:wktLiteral;rdfs:domain geo:Point ;rdfs:subClassOf geo:SpatialObject .

Definition 9. (Hierarchy steps) A hierarchy h hasa set of hierarchy steps HS(h) = {hs1, . . . , hsq},which define the structure of the hierarchy in rela-tion with its corresponding levels. A hierarchy stephsi = (lc, lp, card) ∈ HS(h) entails a roll-up rela-tion between a lower (child) level lc to an upper (par-ent) level lp with a cardinality card. The cardinalitycard ∈ {1-1, 1-n, n-1, n-n} describes the number ofmembers in one level that can be related to a memberin the other level for both the child and the parent lev-els.

Each hierarchy step hsi is defined in the cubeschema graph GS as a blank node _:hsi ∈ Bwith the qb4o:HierarchyStep predicate. Eachhierarchy step is linked to its hierarchy with theqb4o:inHierarchy property. The child and par-ent levels are linked in a hierarchy step with theqb4o:childLevel and qb4o:parentLevelproperties, respectively. The cardinality card of a hier-archy step is defined by the qb4o:pcCardinalityproperty. The RDF graph formulation of the hierarchysteps HS(h) is represented as

GSHS(h) =

q⋃i=1

GShsi

where

GShsi =

{(_:hsi rdf:type qb4o:HierarchyStep)} ∪

{(_:hsi qb4o:inHierarchy idS(h))} ∪

{(_:hsi qb4o:parentLevel idS(lp))} ∪

{(_:hsi qb4o:childLevel idS(lc))} ∪

{(_:hsi qb4o:pcCardinality idS(card))}

Page 12: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

12 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

Extension 9. (Spatial hierarchy steps) A hierarchystep is spatial if it relates a spatial child level lcsand a spatial parent level lps , in which case it entailsa topological relationships between these spatial lev-els. A spatial hierarchy step is then a tuple hsis =(lcs , lps , card, topoRel) where the topological relationtopoRel belongs to the Trel class (Def. 2). The topo-logical relation between parent-child levels of a spatialhierarchy step is defined by the qb4so:pcTopoRelproperty. The RDF graph formulation of the spatial hi-erarchy steps HSs(h) (w.r.t. Def. 9) is represented as

GSHSs(h) =

q⋃i=1

GShsis

where

GShsis = GShsi∪

{(_:hsi qb4so:pcTopoRel idS(topoRel))}

Example 6. The triples below show how the hierarchysteps of the Geography spatial hierarchy in the Cus-tomer dimension of the GeoNorthwind DW (Fig. 7) arerepresented in RDF using Def. 9 and Ext. 9. Note thatall hierarchy steps are spatial and have an associatedtopological relation.

# Hierarchy steps_:customerGeography_hs1 a qb4o:HierarchyStep ;

qb4o:inHierarchy gnw:customerGeography ;qb4o:childLevel gnw:customer ;qb4o:parentLevel gnw:city ;qb4o:pcCardinality qb4o:ManyToOne ;qb4so:pcTopoRel qb4so:Within .

_:customerGeography_hs2 a qb4o:HierarchyStep ;qb4o:inHierarchy gnw:customerGeography ;qb4o:childLevel gnw:city ;qb4o:parentLevel gnw:state ;qb4o:pcCardinality qb4o:ManyToOne ;qb4so:pcTopoRel qb4so:Within .

_:customerGeography_hs3 a qb4o:HierarchyStep ;qb4o:inHierarchy gnw:customerGeography ;qb4o:childLevel gnw:state ;qb4o:parentLevel gnw:country ;qb4o:pcCardinality qb4o:ManyToOne ;qb4so:pcTopoRel qb4so:Within .

Definition 10. (Partial order on levels) The hierarchysteps HS(h) of a hierarchy h define a partial order onthe levels l ∈ L(h). The reflexive and transitive closureof the partial order is denoted as v, with a unique base

level (lb) and a unique top level (All), where all levelsl are such that lb v l, and l v All.

Definition 11. (Measures) An n-dimensional cubeschema has a set of measures M = {m1, . . . ,mr},which record the values of a phenomena being ob-served. Each measure mi ∈ M is defined in the cubeschema graph GS with the qb:MeasurePropertypredicate. Similarly to attributes, each measure mi

is defined as ranging over XSD literals L with therdfs:range property. The RDF graph formulationof the measures M is represented as

GSM =

r⋃i=1

GSmi

where

GSmi=

{(idS(mi) rdf:type qb:MeasureProperty)}∪

{(idS(mi) rdfs:range L)}

Extension 11. (Spatial measures) A measure is spa-tial if it is defined over a spatial domain as in spatialattributes (Ext. 8). A spatial measure mis belongs tothe set of spatial measures Ms, which is a subset ofthe set of measures M , such that mis ∈ Ms ⊆ M .The RDF formulation of the spatial measures is similaras in Def. 11. However, the domain should range overspatial literals Ls. The RDF graph formulation of thespatial measures Ms (w.r.t. Def. 11) is represented as

GSMs=

r⋃i=1

GSmis

where

GSmis=

{(idS(mis) rdf:type qb:MeasureProperty)}∪

{(idS(mis) rdfs:range Ls)} ∪ {(idS(mis)

rdfs:subClassOf geo:SpatialObject)}

Example 7. The triples below show how the measuresof the GeoNorthwind DW (Fig. 7) are represented inRDF using Def. 11 and Ext. 11. Note that SalesPointis a spatial measure, which records the location of the

Page 13: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 13

stores in which the sales occurred. It is defined overGeometry domain as a Point type with WKT literal.

# Measuresgnw:quantity rdf:type qb:MeasureProperty ;

rdfs:range xsd:integer .gnw:unitPrice rdf:type qb:MeasureProperty ;

rdfs:range xsd:decimal .gnw:discount rdf:type qb:MeasureProperty ;

rdfs:range xsd:decimal .gnw:salesAmount rdf:type qb:MeasureProperty ;

rdfs:range xsd:decimal .gnw:freight rdf:type qb:MeasureProperty ;

rdfs:range xsd:decimal .gnw:salesPoint rdf:type qb:MeasureProperty ;

rdfs:domain geo:Point;rdfs:range geo:wktLiteral ;rdfs:subClassOf geo:SpatialObject .

Definition 12. (Fact) In an n-dimensional cube schemaCS = (D,M,F ), the fact F defines the structure of acube with the qb:DataStructureDefinitionproperty. The dimensions are given as componentsof the fact and are defined with the qb4o:levelproperty. We assume that the fact F links the di-mensions at the lowest granularity level, thereforeqb4o:level links the lowest (base) level lb ofeach dimension di, which is denoted as lb(di). Thecardinality card of the relationship between a di-mension level and a fact is represented with theqb4o:cardinality property. Similarly, the mea-sures are given as components of the fact and aredefined with the qb:measure property. The aggre-gate function aggr associated to each measure is repre-sented with the qb4o:aggregateFunction prop-erty. The RDF graph formulation of the fact F is givenin the following equation.

GSF = {(idS(F )

rdf:type qb:DataStructureDefinition)}∪⋃di∈D

{(idS(F ) qb:component

[qb4o:level idS(lb(di));

qb4o:cardinality idS(card)])}∪⋃mi∈M

{(idS(F ) qb:component

[qb:measure idS(mi);

qb4o:aggregateFunction idS(aggr)])}

Extension 12. (Spatial fact) A fact is spatial if it re-lates several levels, where two or more are spatial.

A spatial fact may also have a topological relationtopoRel that must be satisfied by the related spatiallevels, which is represented with qb4so:topologi-calRelation. This object property allows to spec-ify a topological relation in fact-level relationship ofspatial facts. The RDF graph formulation of such a factis simply by adding the property of fact-level topolog-ical relation consecutively to the cardinality propertyas given in the following equation.

GSFs= {(idS(Fs)

rdf:type qb:DataStructureDefinition)}∪⋃di∈D

{(idS(Fs) qb:component

[qb4o:level idS(lb(di));

qb4o:cardinality idS(card);

qb4so:topologicalRelation idS(topoRel)]}∪⋃mi∈M

{(idS(F ) qb:component

[qb:measure idS(mi);

qb4o:aggregateFunction idS(aggr)])}

Example 8. The triples below show how the fact ofthe GeoNorthwind DW (Fig. 7) is represented in RDFusing Def. 12. Sales fact does not impose any topolog-ical relation between its spatial dimensions Supplierand Customer. SalesPoint is a spatial measure, whichhas a spatial aggregate function (AggrConvexHull).

# Cube definitiongnw:GeoNorthwind rdf:typeqb:DataStructureDefinition ;

# Lowest level for each dimensionqb:component [qb4o:level gnw:customer ;qb4o:cardinality qb4o:ManyToOne] ;qb:component [qb4o:level gnw:supplier ;qb4o:cardinality qb4o:ManyToOne] ;qb:component [qb4o:level gnw:product ;qb4o:cardinality qb4o:ManyToOne] ;...# Cube measuresqb:component [qb:measure gnw:quantity ;

qb4o:aggregateFunction qb4o:Sum] ;qb:component [qb:measure gnw:unitPrice ;

qb4o:aggregateFunction qb4o:Avg] ;qb:component [qb:measure gnw:discount ;

qb4o:aggregateFunction qb4o:Avg] ;...qb:component [qb:measure gnw:salesPoint ;

qb4o:aggregateFunction qb4so:AggrConvexHull] .

Page 14: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

14 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

4.2. Defining spatial data cube members withQB4SOLAP

We have explained in Sect. 4.1 how a data cubeschema can be represented in RDF with QB4SOLAP.We show next how to use this schema to represent theinstances of the GeoNorthwind DW (Fig. 7) in RDF.We denote by GI the RDF graph of the data cube in-stances. In the examples, we prefix the elements of GIwith gnwi:. The subgraph of GI that refers to a spe-cific cube instance x is denoted by GIx and the uniqueidentifier of x is denoted by idI(x).

Definition 13. (Level members) A level l has a setof level members LM(l) = {lm1, . . . , lmy}. Eachlevel member lmi has a unique IRI idI(lmi) ∈ I,which is linked in the cube instance graph GI withthe qb4o:LevelMember predicate. A level memberis related to its level by the qb4o:memberOf prop-erty. The RDF graph formulation of the level membersLM(l) is represented as

GILM(l) =

y⋃i=1

GIlmi

where

GIlmi=

{(idI(lmi) rdf:type qb4o:LevelMember)}∪

{(idI(lmi) qb4o:memberOf idS(l))}

Definition 14. (Attributes of level members) A levelmember lm has a set of attributesA(lm) = {a1, . . . , ap},which are used to describe the characteristics of thelevel member (Def. 8). Each attribute ai is linked to thelevel member with the identifier idS(ai). We denote bylm vai the value vai that a level member lm asso-ciates to attribute ai. This value is given as a literal Lsuch that vai ∈ L. The RDF graph formulation of theattributes A(lm) is represented as

GIA(lm) =

p⋃i=1

GIai

where

GIai= {(idI(lm) idS(ai) vai) | lm vai}

Definition 15. (Partial order on level members) A hi-erarchy step hs = (lc, lp, card) between a child levellc and a parent level lp defines a set of roll-up relationsRU(hs) = {r1, . . . , rk} where each ri = lmci vlmpi relates a child level member lmci ∈ LM(lc) toa parent level member lmpi ∈ LM(lp). These roll-up relations define a partial order between level mem-bers with regard to Def. 10 and are expressed using theproperty skos:broader. The RDF graph formula-tion of the roll-up relations RU(hs) is represented as

GIRU(hs) =

k⋃i=1

GIri

where

GIri = {(idI(lmc) skos:broader idI(lmp)) |

ri = lmci v lmpi}

Example 9. The triples below show how some levelmembers of the GeoNorthwind DW (Fig. 7) are repre-sented in RDF using Defs. 13–14.

gnwi:customer_1 rdf:type qb4o:LevelMember ;qb4o:memberOf gnw:customer ;gnw:customerID 1 ;gnw:customerName "Alfreds Futterkiste" ;gnw:address "Obere Str. 57" ;gnw:postalCode "12209" ;gnw:customerGeo"POINT(13.099 52.401)"ˆˆgeo:wktLiteral ;skos:broader gnwi:city_6 .

gnwi:city_6 rdf:type qb4o:LevelMember ;qb4o:memberOf gnw:city ;gnw:cityName "Berlin" ;gnw:cityGeo"POINT(13.4060 52.519)"ˆˆgeo:wktLiteral ;skos:broader gnwi:state_224 .

Definition 16. (Fact members) A fact F has a set offact members FM(F ) = {f1, . . . , ft}, which are theinstances of the data cube. Each fact fi ∈ FM has aunique IRI idI(fi) ∈ I, which is linked in the cubeinstance graph GI with the qb:Observation pred-icate.

A fact member fi is related to a set of dimensionlevels L(fi) = {l1, . . . , lr} and has a set of measuresM(fi) = {m1, . . . ,ms}. Each dimension level lj islinked to the level member with the identifier idS(lj)and each measure mk is linked to the level memberwith the identifier idS(mk). We denote by f vljand f vmk

, respectively, the dimension values and

Page 15: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 15

measure values associated with a fact f . The valuevlj ∈ I is the identifier of a level member in LM(lj).Further, the value vmk

for every measure mk is a lit-eral such that vmk

∈ L. The RDF graph formulationof the fact members FM(F ) is represented as

GIFM(F ) =

t⋃i=1

GIfi

where

GIfi = {(idI(fi) rdf:type qb:Observation)}∪⋃

lj∈L(fi)

{(idI(fi) idS(lj) idI(vlj ) | fi vlj )}∪

⋃mk∈M(fi)

{(idI(fi) idS(mk) idI(vmk ) | fi vmk )}

Example 10. The triples below show how a fact mem-ber of the GeoNorthwind DW (Fig. 7) is representedin RDF using Defs. 12–16. Note that the fact memberand corresponding level members relating to dimen-sions are given with the prefix gnwi:. idS(aID) is thesurrogate key (Def. 8) that links the fact member to thecorresponding dimensions’ base level members.

gnwi:sale_10613_1 rdf:type qb:Observation ;gnw:customer gnwi:customer_1 ;gnw:supplier gnwi:supplier_6 ;gnw:product gnwi:product_13 ;...gnw:quantity 8 ;gnw:unitPrice "6,00"ˆˆxsd:decimal ;gnw:discount "0,10"ˆˆxsd:decimal ;gnw:salesPoint"POINT(23.08 42.34)"ˆˆgeo:wktLiteral.

5. Semantics of SOLAP operators

This section defines a formal algebra for SOLAP op-erators. Examples of the operators are provided aftertheir definitions. The complete SPARQL query exam-ples are given at our website13 and can be tested at ourpublic endpoint14. The query runtimes for each SO-LAP operator are given in Appendix A1, Table 5 forthe use case dataset GeoNorthwind (Sect. 4.1, Fig. 7).These operators can be applied on spatially enhanced

13http://extbi.cs.aau.dk/SOLAP4SW/queries14http://extbi.lab.aau.dk/sparql

multidimensional data cubes (Sect. 3.3). The presen-tation defines the semantics of a SOLAP operator bylogically specifying the typical OLAP operators withspatial functions and conditions. Spatial functions andconditions can be selected from a range of operationclasses, which can be applied on spatial data types(Sect. 3.1). Let S be the set of any spatial operatorswhere S = (Sagg ∪ Trel ∪ Nop), used to represent aspatial predicate φS ∈ S or a spatial function fS ∈ S,which is in a SOLAP operator. The following SOLAPoperators are defined with a spatial extension to thewell-known OLAP operators, which are given in theremarks.

Remark 17. (Slice) The slice operator removes a di-mension from a cube C by selecting one instance ina dimension level. For example, the query “slice oncustomers in the city of Odense” is a slice operation.(Cube is the sales, dimension is the customer, level indimension is the city and the value is Odense, which issliced out from the cube).

Definition 17. (S-Slice) The s-slice operator removesa dimension from a cube C by choosing a single spatialattribute value vs ∈ Ls (Ext. 8) in a spatial level ls(Ext. 7).

As for the semantics, s-slice takes an n-dimensionalcube C as an argument. We assume that the cube hasthe cube schema CS = (D,M,F ), with the fact mem-bers f ∈ FM as given in Def. 16. As parameters, s-slice takes a spatial literal value vs, the base level lband the target (spatial) level ls of a dimension di. Thebase level lb specifies the dimension di (Def. 16). Thetarget spatial level ls is the level, that the spatial literalvalue vs is related.

The operator is defined as: SS(C)[lb, ls, vs] = C′,which returns a cube C′ with n− 1 dimensions and theschema CS = (D′,M ′, F ′), where D′ = D \ {di} ,M ′ = M , and F ′ = F . The measures M and the facttype F remains the same though the new cube C′ hasone dimension less.

The s-slice operator selects a subset FM ′ from theset of fact members FM (FM ′ ⊆ FM ), with respectto the given parameter vs. Assuming that the granular-ity of the fact members are at the (lowest) base levelof the dimension lb ∈ L(di) in the given cube, a par-tial order exists among the levels, from bottom level tothe target spatial level ls such that lb v ls. The givenparameter vs is related to a level member of the levells. We say that the fact members are characterized bydimension values, which is written as f vdi wherevdi ≡ vlb(di) (Def. 16). In other words, dimensions

Page 16: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

16 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

are associated to the fact members by the values of thedimensions’ base level members vlb(di) . When the di-mension di is clear in the context, we will use baselevel vlb for simplicity reasons.

To sum up, the subset FM ′ of facts is selected withregards to the partial order on levels from base levellb to the target level ls. The value vls in the targetlevel ls is specified with respect to the given spatialliteral value vs. The value of vs might be equal to aspatial attribute value in the target level ls, thus vls ischaracterized by the attribute value vs and written asvls vs (Ex. 11). Or, vs is an arbitrary spatial literalthat entails a topological relation Trel (i.e., within) ina value of the target spatial level vls , which is writtenas ∃vls : φS(vs) where φS is a spatial Boolean predi-cate that represents a topological relation (Ex. 12). Af-ter applying the s-slice operator on cube C, the new(sub)set of fact members is defined for both cases re-spectively as follows; FM ′ = {f ∈ FM | ∃ vlb ∈LM(lb), vls ∈ LM(ls) : f vlb ∧ vlb v vls ∧ vls vs}, FM ′ = {f ∈ FM | ∃ vlb ∈ LM(lb), vls ∈LM(ls) : f vlb ∧ vlb v vls : φS(vs)}.

Example 11. With regards to the traditional slicequery “slice on customers in the city of Odense”,in s-slice, the user could specify a geometry extent(e.g., polygon coordinates of the city of Odense) asspatial literal for slicing instead of giving a text lit-eral (e.g., “Odense”). So the s-slice query wouldbe; “slice on customers of the city, which has thegeometry "POLYGON((10.43951 55.47006,10.439472 55.470036, 10.439240(...))"”.More intuitively, instead of the specified spatial literalvs ∈ Ls, the user can pass a function call as parameterto s-slice, e.g., by querying “slice on customers in thelargest city of (southern) Denmark by land area”. Thefunction call should calculate the area of the cities bytheir geometries where the largest city is selected asa requirement of the s-slice operator. Both cases aregiven in the following.

1. The following SPARQL query shows an s-sliceoperator, which filters with the given spatial lit-eral by the user.

SELECT ?obs WHERE {?obs rdf:type qb:Observation ;

gnw:customerID ?cust .?cust qb4o:memberOf gnw:customer ;

skos:broader ?city .?city gnw:cityGeo ?cityGeo .

FILTER (?cityGeo = "POLYGON((10.439517 55.470064,10.4394729 55.4700361, 10.4392403 (...))") }

2. The following SPARQL query shows the s-sliceoperator, which filters with the function call(largest city) returned from inner select. Giventhe current limitations of SPARQL, there is notan area calculation function from the geometriesof the spatial objects during query run time, how-ever we give the query with a notional built-inbif:st_area function.

SELECT ?obs WHERE {?obs rdf:type qb:Observation ;

gnw:customerID ?cust .?cust qb4o:memberOf gnw:customer ;

skos:broader ?city .?city gnw:cityGeo ?cityGeo .

# Inner select for finding the largest city{SELECT ?x (MAX(?area) as ?maxArea)WHERE {

?obs rdf:type qb:Observation ;gnw:customerID ?cust .

?cust qb4o:memberOf gnw:customer ;skos:broader ?city .

?city gnw:cityGeo ?x .BIND(bif:st_area(?x) as ?area)}

FILTER (?cityGeo = ?x) }

Example 12. With regards to the traditional slicequery “slice on customers in the city of Odense”, inthis example of s-slice, the user gives a point geom-etry (i.e., X,Y coordinates of a point as spatial lit-eral) and filter at the given level (i.e., City level) thatthe given point is within. So the s-slice query wouldbe; “slice on customers of the city, in which the given"POINT(10.43951 55.47006)" is within”.

The following SPARQL query shows an s-slice op-erator, which filters at the specified level with the givenspatial literal by the user.

SELECT ?obs WHERE {?obs rdf:type qb:Observation ;

gnw:customerID ?cust .?cust qb4o:memberOf gnw:customer ;

skos:broader ?city .?city gnw:cityGeo ?cityGeo .

FILTER (bif:st_within("POINT(10.43951 55.47006)",?cityGeo)) }

Note that the s-slice can be operated in differentways based on the geometry given to the query. In bothEx.s 11 and 12, slice level is given as City, however inEx. 12 a random X,Y point is given that is falling intothe target city. Therefore we need to use within fromtopological relationships (Trel) class in order to verifyand filter that city.

Remark 18. (Dice) The traditional dice operator takesa cube and a Boolean condition φ, which returns a new

Page 17: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 17

cube containing only the cells that satisfy the Booleancondition φ. Dice operation is analogous to relationalalgebra, R selection; σφ(R), but the argument is a cubenot a relation. For example, the query “sales to cus-tomers of type LLC (Limited Liability Company)” isa dice operation. (Cube is the sales, dimension is thecustomer, and Boolean condition is the customer typeif they are LLC).

Definition 18. (S-Dice) Similarly, the s-dice operatortakes an n-dimensional cube C as an argument, whichhas the cube schema CS = (D,M,F ) with the factmembers f ∈ FM as given in Def. 16. As a parameters-dice takes a spatial Boolean predicate, which is de-noted by φS . The s-dice operator keeps the cells of thecube C that satisfies the spatial predicate over spatialdimension levels ls, attributes as, and measures m.

The semantics of the operator is defined as:SD(C)[φS ] = C′ where spatial predicate φS can beapplied on spatial level member values φS(vls), spa-tial attribute values φS(vas), measure values φS(vm)and/or a combination of these.SD operator returns a sub cube C′ ⊆ C, which has

the schema CS = (D′,M ′, F ′) whereD′ = D ,M ′ =M , and F ′ = F . Unlike the s-slice operator, s-dicekeeps all the dimensions D in the output cube C′. Theset of measuresM and the fact type F also remains thesame, though the new cube C′ is a subset of the originalcube C with filtered fact members f ∈ FM ′, which isexplained in the following.

The s-dice operator selects a subset FM ′ of the factmembers’ set FM ′ ⊆ FM with respect to the spatialpredicate φS on level members as follows;

1. Spatial predicate on level values: FM ′ = {f ∈FM | ∃ vlb ∈ LM(lb), vls ∈ LM(ls) : f vlb ∧ vlb v vls ∧ φS(vls)}.

2. Spatial predicate on level attribute values:FM ′ = {f ∈ FM | ∃ vlb ∈ LM(lb), vls ∈ LM(ls)∧vls vas : f vlb ∧ vlb v vls ∧ vls vas ∧φS(vas)}.Note that the filtering the facts through level memberscan be done by vls (level values) or attribute values vasby applying the spatial predicate φS . Finally filteringof the facts is on associated measure values is definedin the following;

3. Spatial predicate on measure values of ms: FM ′ ={f ∈ FM | ∃ vms

∈ Codomain(ms) : f vms∧

φS(vms)}.

For complex cases, i.e., combining these three types;the result set is also followed by combining the basicresult sets.

Example 13. The s-dice operator can be implementedon level and attribute values by filtering level membersin the cube or on measures by filtering the facts in thecube. In both cases the spatial predicate φS is used.

The query for the s-dice operator could be “sales tocustomers, which are located within 5 km distance fromtheir city center” where the s-dice is on level membersby filtering the customer level. The spatial predicateφS can be interpreted in two different ways (See Ap-pendix A1 for comparison of their query run times).

1. First method is assuming a buffer area of 5 kmfrom the coordinates of city center and check-ing customers’ locations by within operator fromtopological relations φS ∈ Trel if it meets thecondition. The following SPARQL query showsthe implementation of this method on level mem-bers.

SELECT ?obs WHERE {?obs rdf:type qb:Observation ;

gnw:customerID ?cust .?cust qb4o:memberOf gnw:customer ;

skos:broader ?city ;gnw:customerGeo ?custGeo .

?city gnw:cityGeo ?cityCentGeo .FILTER (bif:st_within (?custGeo, ?cityCentGeo,5))}

2. Second method is checking if the distance from acustomer location to the corresponding city cen-ter is less than 5 km, by using distance functionfrom numeric operations fS ∈ Nop. In this casethe spatial predicate φS is a combination of a spa-tial function fS and a regular Boolean predicateφ. Spatial function is distance from numeric op-erations and the predicate is less than (<). Thefollowing SPARQL query shows the implementa-tion of this method for s-dice on level members.

SELECT ?obs WHERE {?obs rdf:type qb:Observation ;

gnw:customerID ?cust .?cust qb4o:memberOf gnw:customer ;

skos:broader ?city ;gnw:customerGeo ?custGeo .

?city gnw:cityGeo ?cityCentGeo .BIND (bif:st_distance (?custGeo, ?cityCentGeo)AS ?distance) FILTER ( ?distance < 5 ) }

Remark 19. (Roll-up) The traditional roll-up opera-tor aggregates measures according to a dimension hi-erarchy (by using an aggregate function), in order toobtain measures at a coarser granularity for a given di-mension. For example, the query “total amount of salesto customers by city” is a classical roll-up operation.

Page 18: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

18 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

(Cube is the sales, dimension is the customer, level indimension to roll-up is the city such that customer vcity, measure is the sales amount and aggregate func-tion is the sum in order to calculate the total sales.)

Definition 19. (S-Roll-up) Similarly to roll-up opera-tor, s-roll-up aggregates measures m ∈ M of a givencube C, by using an aggregate function and a spatialfunction fS ∈ S (Sect. 3.1) along a spatial dimension’shierarchy hs (Ext. 6), which should have spatial levelsls (Ext. 7). However, in s-roll-up the dimension hier-archy is created dynamically on levels by the spatialfunction fS . We call this hierarchy a dynamic spatialhierarchy, conceptually from a base level lb to the dy-namically created target level l′s such that lb vd l′s. Theinstances of the target level l′s are obtained by the spa-tial function fS(ls) that is applied on spatial dimensionlevels.

As for the semantics, s-roll-up takes an n-dimensionalcube C as an argument, which has the cube schemaCS = (D,M,F ) with the fact members f ∈ FMas given in Def. 16. As a parameter s-roll-up takes aspatial function fS ∈ S to operate on levels L(di)and an aggregate function agg to calculate a measurem at the higher target level. For simplicity of expla-nation and without loss of generality, we initially as-sume that there is only one measure m. The exten-sion of the operator on several measures m ∈ M isexplained in the last paragraph. S-Roll-up operator isformulated as; SRU(C)[fS(L(di)), agg(m)] = C′,which returns a cube C′ with n-dimensions and has theschema CS = (D′,M ′, F ′) where F ′ = F , M ′ =M ,and D′ = {di ∈ D | {d1, . . . , d′i, . . . , dn} ∧ L′(d′i) =L(di) \ (lb vd . . . <d ls) ∪ {l′s}}. After the s-roll-up operation, number of dimensions in D remains thesame, although the base levels and levels below thetarget level (lb vd . . . <d ls) of the correspondingdimension di are left out and a new target level l′s isadded to the set of dimension levels L′(d′i) of d′i.

The set of level members of the level l′s is se-lected with respect to the spatial function on base levelmembers of a spatial dimension such that LM(l′s) ={fS(vlb) | vlb ∈ LM(lb)} where lb vd l′s ⇐⇒fS(vlb) = vl′s , which means that the base level lb rollsup along the spatial dynamic hierarchy (vd) to the tar-get new spatial level l′s if and only if spatial functionon base level fS(vlb) = vl′s produces the new spa-tial level members vl′s . Even though the set of mea-sures M remains the same, the s-roll-up operator ob-tains the measure values associated with fact membersf ′ at a coarser granularity l′s, which alters the set of

facts FM ′ * FM . In order to create the new set offacts FM ′ at the new granularity level l′s, the Groupoperator [29] is used to group the facts characterizedby the same level members vl′s ∈ LM(l′s) such thatGroup(vl′s) = {f ∈ FM | ∃ vlb ∈ LM(lb) :f vlb ∧ vlb vd vl′s}. The output of the Groupoperator on level members is a new fact instance f ′.In order to aggregate the measure values vm, whichare associated with the fact members f we use an ag-gregate function agg such that agg({f1, . . . , fk}) =agg(vm1

, . . . , vmk) where fi vmi

, i = 1, . . . , k.Finally, the set of the new facts f ′ ∈ FM ′ is con-structed, that is given with the associated new levelmembers and aggregated measure values as; FM ′ ={f ′ = Group(vl′s) | ∃ vl′s ∈ LM(l′s) : f ′ vl′s ∧ f

′ agg(Group(vl′s))}.The extension to multiple measures is similar, which

is done by providing and using a separate aggregatefunction for each measure m ∈M .

Example 14. The following SPARQL query showsthe s-roll-up operator, which is exemplified in Sect. 3.6.The query is “total amount of sales to customers bycity of the closest suppliers”. Note that the measuresare aggregated up to a new city from customer levelof the customer dimension, which is specified as theClosest City. The hierarchy step from customer to cityis defined dynamically by a spatial function fS (dis-tance from numeric operations Nop ⊂ S), which isthen used in a wrapper function to find the closestdistance of the suppliers and customers. The levelsand level members (of customer), which are below thenewly defined level (Closest City) are left out in theresult.

SELECT ?city (SUM(?sales) AS ?totalSales)WHERE { ?obs rdf:type qb:Observation ;

gnw:customerID ?cust ;gnw:supplierID ?sup;gnw:salesAmount ?sales .

?cust qb4o:memberOf gnw:customer ;gnw:customerGeo ?custGeo ;gnw:customerName ?custName ;skos:broader ?city .

?city qb4o:memberOf gnw:city .?sup gnw:supplierGeo ?supGeo .# Inner Select for the total sales to# the closest supplier of the customer

{ SELECT ?cust1 (MIN(?distance) AS?minDistance) WHERE{ ?obs rdf:type qb:Observation ;

gnw:customerID ?cust1 ;gnw:supplierID ?sup1 .

?sup1 gnw:supplierGeo ?sup1Geo .?cust1 gnw:customerGeo ?cust1Geo .BIND (bif:st_distance( ?cust1Geo, ?sup1Geo )

Page 19: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 19

AS ?distance)}GROUP BY ?cust1 }FILTER (?cust = ?cust1 && bif:st_distance(?custGeo, ?supGeo) = ?minDistance)}

GROUP BY ?city

Remark 20. (Drill-down) Drill-down is the inverseoperator of roll-up, which disaggregates previouslysummarized data to a child level in order to obtainmeasures at a finer granularity of a given dimension.For example, the roll-up query given in Remark 19(“total amount of sales to customers by city”) aggre-gates sales by summing up the sales amount, from cus-tomer level to city level along a hierarchy. As drill-down operator performs the operation opposite to theroll-up an example would be; “average amount of salesof each supplier, drilled down from the city level to thesupplier level”. (Cube is the same as sales, and the hier-archy is the same but the dimension is the supplier, sochild level in dimension to drill-down from city levelis the supplier such that City w Supplier). Concep-tually, a drill-down to level li on a cube C correspondsto a roll-up to the same level li on the base cube of C,that is denoted as BaseCube(C).

Definition 20. (S-Drill-down) Analogously to drill-down operator, s-drill-down disaggregates measuresm ∈M of a given cube C, by using an aggregate func-tion and a spatial function fS (Sect. 3.1) along a spatialdimension’s hierarchy hs (Ext. 6), which should havespatial levels (i.e., ls) (Ext. 7).

Conceptually, in s-drill-down, the dimension hier-archy is created dynamically on levels by the spatialfunction fS as in s-roll-up. This is similar to the dy-namic spatial hierarchy defined in Def. 19, that is froma spatial parent level lps to a dynamically created spa-tial child level l′cs such that lps wd l′cs . The target spa-tial child level l′cs is the output of the spatial functionfS on spatial levels lis ∈ L(di) of the spatial dimen-sion. Applying s-drill-down to child level l′cs from aparent level lps on a cube C corresponds to applyings-roll-up to the same level l′cs from the base level lb onthe base cube of C. Therefore, the semantics of the s-drill-down is described same as s-roll-up and the oper-ator is formulated as SDD(C)[fS(L(di)), agg(m)] =SRU(BaseCube(C))[fS(L(di)), agg(m)].

Example 15. In order to exemplify an s-drill-down,starting from the result cube graph of Ex. 14 (“totalamount of sales to customers by city of the closest sup-plier”), which is at the granularity of City level, wedrill down to child level Supplier with the query “av-erage amount of sales of furthest suppliers to their city

center, drilled down the from City level to Supplierlevel”. The following SPARQL query shows the givenexample.

SELECT ?sup (AVG(?sales) AS ?averageSales)(MAX(?distance) AS ?maxDistance)

WHERE { ?obs rdf:type qb:Observation ;gnw:supplierID ?sup ;gnw:salesAmount ?sales .

?sup qb4o:memberOf gnw:supplier ;gnw:supplierGeo ?supGeo ;gnw:supplierName ?supName ;skos:broader ?city .

?city qb4o:memberOf gnw:city ;gnw:cityGeo ?cityCentGeo .

BIND (bif:st_distance(?supGeo, ?cityCentGeo)AS ?distance)}

FILTER (?distance = ?maxDistance)}GROUP BY ?sup

In this paper, we focus on direct querying of singledata cubes with main SOLAP operators in SPARQL.The integration of several cubes through s-drill-acrossor set-oriented operations such as union, intersection,and difference [7] is out of scope and remained as fu-ture work.

6. Generating SOLAP queries in SPARQL viaQB4SOLAP

After having defined the high-level SOLAP oper-ators in Sect. 5, this section first describes how togenerate SPARQL queries for each of these operatorsby using the QB4SOLAP metamodel (Sect. 4). After-wards, this section describes how to create more com-plex SPARQL queries for nested SOLAP operations.

6.1. Generation algorithms

The generated SPARQL queries Q are of the form“Q = SELECT R WHERE GP ", where GP is a graphpattern containing triple patterns and R is the (setof) variable(s) that are returned in the result of thequery. Triple patterns are based on triples of the form(s, p, o) (Def. 4), where triple components are replacedby variables. A set of triple patterns defines a graphpattern GP . Given an RDF graph G, a graph patternGP is used to search for subgraphs G(R) ⊆ G match-ing the pattern. In our algorithm, the graph pattern isinitially empty, GP = ∅, and the triple patterns areadded incrementally to the body of the WHERE clause:GP = GP ∪ (s p o).

Page 20: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

20 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

RDF datasets published with the QB4SOLAP vo-cabulary use the skos:broader property to definethe roll-up relation from child level to parent level(Defs. 13 and 15). As this is the case for all hierar-chy levels in a dimension, every OLAP query containssuch roll-up paths that we need to consider as part ofGP in the WHERE clause.

Thus, we define a helper function RUPath (Algo-rithm 1) that we can use in the SOLAP query genera-tion algorithms.

Algorithm 1: RUPath(GS(C), lb, ls, aID, ?as, ?f):GP

Input: GS(C), lb, ls, aID,?as, ?fOutput: GP

1 begin2 GP = (?f rdf:type qb:Observation)3 GP = GP ∪ (?f idS(aID) ?lb ∧ ?lb

qb4o:memberOf idS(lb))4 foreach (idS(lc), id

S(lp)) ∈ GS(C) | lp v ls do5 GP = GP ∪ (?lb skos:broader ?lp ∧

?lp skos:broader ?ls)

6 let GP = GP ∪ (?ls idS(as) ?as)

7 return GP

Build roll-up path (RUPath). The helper func-tion RUPath returns a graph pattern that we canuse in the body of the WHERE clause. The roll-up path pattern is created as a path-shaped join oftriples with partial order (v , skos:broader)(Def.s 10 and 15). The triple pattern is of the form{(s1 p1 o1), (o1 p2 o2), (o2 p2 o3), . . . (on−1 p2 on)}where s1 is the root of the graph pattern and cor-responds to fact members f from the QB4SOLAPschema (Def. 16), p1 is the predicate idS(aID) thatassociates facts with level members vli (f vli ,Def. 16), o1 is the variable for the first level memberthat rolls up to its parent level o2 such that o1 v o2and so on, and the p2 predicate corresponds to theskos:broader property. The last variable in thepath on corresponds to the target level ls in orderto represent the level member variables at the targetlevel. The roll-up path starts at the fact instances f(Def. 16). Afterwards, the partial order on level mem-bers (Def. 15) from base level lb to target level ls isapplied. Algorithm 1 sketches the helper function forbuilding the roll-up path for dimensions; from factsto dimension levels with predicates and cube memberIRIs defined in the cube schema.

In order to represent such varying parameters at theinstance level such as fact members, level members,or parameter values given by the user, and to distin-guish these parameters from other parameters in the al-gorithm, we represent such parameters using variablenames with question marks.

We use a FILTER expression to restrict the out-put data by using a (spatial) Boolean predicate φS .A FILTER expression is part of the WHERE clausein a SPARQL query. Therefore, it is added to thebody of the WHERE clause in the graph pattern GPas GP = GP ∪ (FILTER φS). In the cases wherethere is a spatial function fS(x) in the SOLAP oper-ator, it is given in the BIND clause, which is techni-cally a part of the WHERE clause and therefore addedto the body of the WHERE clause in a graph patternGPas GP = GP ∪ (BIND fS(x)). SPARQL 1.1 definesaggregate expressions15, such as SUM, MIN, MAX,AVG, etc.

We apply them on measure values or use them aswrappers in spatial functions. In the following, we of-ten write AGG to represent them.

In the following, we present the SPARQL querygeneration algorithms for the SOLAP operators de-fined in Sect. 5. The algorithms take the input parame-ters and arguments of the SOLAP operator and returnthe a SPARQL query Q that can be executed.

S-Slice generator. To generate a SPARQL query forthe s-slice operator SS(C)[lb, ls, vs] (Def. 17), we useAlgorithm 2. Parameter vs is a spatial literal value vs ∈Ls (i.e., POINT or POLYGON) that should be related toa spatial level ls (Ext. 7). This means that vs is definedas a polygon geometry that corresponds to a spatialattribute value in the target level ls (Ex. 11) or vs isdefined as a point geometry that is spatially containedin a spatial attribute value of the target level ls (Ex. 12).Note that in Ex. 11.1, the given spatial literal has thegeometry data type polygon, which corresponds to aspatial level attribute as (Ext. 8) at a spatial level ls.Similarly, the spatial function call fS(x) in Ex. 11.2returns a polygon that corresponds to a spatial levelattribute as.

On the other hand the given spatial literal in Ex. 12has the geometry data type point, which correspondsto the spatial level ls via topological relations (Trel).We consider all these possibilities in the s-slice gen-erator algorithm. We explained these in the following,

15https://www.w3.org/TR/sparql11-query/#aggregates

Page 21: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 21

Algorithm 2: S−SliceGenerator (GI(C), vs, lb, ls): Q

Input: GI(C), vs, lb, lsOutput: Q

1 begin2 Q = ∅; GP =

RUPath(GS(C), lb, ls, aID,?as,?f)3 if vs is a POINT then4 GP = GP∪ (FILTER(st_within vs,

?as))5 else if vs = fS(x) then6 Q′=∅;

GP ′=RUPath(GS(C), lb, ls, aID,?as,?f)7 GP ′ = GP ′∪ (BIND fS(x) AS ?vx)8 Q′ = SELECT ?x AGG(?vx) WHERE GP ′

9 GP = GP ∪Q′ ∪ (FILTER ?x = ?as)

10 else11 GP = GP ∪ (FILTER vs = ?as )

12 return Q = SELECT ?f WHERE GP

where the steps are referencing the line numbers in Al-gorithm 2.

Line 2. Get the path for dimension ds (e.g., Customer)from the observation facts f to the base level lb,and build path-shaped triple pattern paths fromthe dimension’s base level lb to the target spa-tial level ls (e.g., City level). Finally, get level at-tribute IRIs and variables for the spatial attributesas (e.g., City geometry). All this is done by theRUPath function (Algorithm 1) that is used bythe s-slice generator. The following shows an ex-ample result of this step that is added to GP :

{?obs rdf:type qb:Observation ;gnw:customerID ?cust .

?cust qb4o:memberOf gnw:customer ;skos:broader ?city .

?city gnw:cityGeo ?cityGeo .

Line 3. Check if the spatial literal vs is a point geom-etry type. If true, create a FILTER statement witha spatial Boolean predicate (Line 4) and go to theresult (Line 12).

Line 4. Build the FILTER statement based on the spa-tial literal vs and the spatial attribute as (Ex. 12).As a result the following lines might be added tothe GP :

FILTER (bif:st_within("POINT(10.4395155.47006)", ?cityGeo))}

Line 5. Check if vs is a function call fS(x). If true(Ex. 11.2), construct an inner select query to com-pute the spatial function fS(x), then go to the re-sult (Line 12).

Line 6. Call the RUPath function in order to link asvariables with the fact instances (this time for in-ner select queryQ′). This step creates a graph pat-tern GP ′ for inner select query Q′, for example:

{?obs rdf:type qb:Observation ;gnw:customerID ?cust .

?cust qb4o:memberOf gnw:customer ;skos:broader ?city .

?city gnw:cityGeo ?x .

Line 7. Build a bind statement on as variables for cal-culating spatial functions (e.g., compute areas).For example, the following lines might be addedto graph pattern GP ′:

BIND (bif:st_area(?x) as ?area) }

Line 8. Generate the inner select query Q′ based onGP ′ generated in Lines 6 and 7. For example (Q′

finds the geometry of the largest city):

Q‘ = {SELECT ?x (MAX(?area) as ?maxArea)WHERE {?obs rdf:type qb:Observation ;

gnw:customerID ?cust .?cust qb4o:memberOf gnw:customer ;

skos:broader ?city .?city gnw:cityGeo ?x .BIND (bif:st_area(?x) as ?area) }

Line 9. Build the filter statement with the output ofthe spatial function fS(x), construct GP (in-cludes Q′) for the outer query, and go to the re-sult (Line 12). At this stage GP is constructed inLines 2 and 8. The following for the filter state-ment is added to the GP :

FILTER (?cityGeo = ?x) }

Line 11. If a spatial literal vs ∈ L is given as theparameter instead, build a filter statement thatchecks if vs is equal to the spatial attribute as val-ues, and go to the result (Line 12). For example,the following filter condition might be added tograph pattern GP :

FILTER (?cityGeo = "POLYGON((10.4395155.47006, 10.439472 55.470036,10.439240 (...))")}

Line 12. Finally, the algorithm generates query Q,which can be executed over the fact membersFM ′. In our running examples we obtain the fol-lowing cases for the generated s-slice query Q.

Page 22: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

22 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

S-Slice operator with a given spatial value as pointdata type: The following listing corresponds to theSPARQL output of the running example where the spa-tial value is given as POINT data type (Ex. 12) and fil-ters the level attributes with a spatial predicate withina given level. The graph pattern GP for the query iscreated in Lines 2 to 7.

1 Q = SELECT ?obs WHERE2 {?obs rdf:type qb:Observation ;3 gnw:customerID ?cust .4 ?cust qb4o:memberOf gnw:customer ;5 skos:broader ?city .6 ?city gnw:cityGeo ?cityGeo .7 FILTER (bif:st_within("POINT(10.4395155.47006)", ?cityGeo)) }

S-Slice operator with a spatial function call: The fol-lowing listing corresponds to the SPARQL output ofthe running example where the spatial value is returnedfrom a function call (Ex. 11.2). The graph pattern GP ′

for the spatial function call is created in Lines 8 to 13.The graph pattern GP for the whole query is createdin Lines 2 to 12.

1 Q = SELECT ?obs WHERE2 { ?obs rdf:type qb:Observation ;3 gnw:customerID ?cust .4 ?cust qb4o:memberOf gnw:customer ;5 skos:broader ?city .6 ?city gnw:cityGeo ?cityGeo .# When there is spatial function call we apply# inner select for finding the largest city

7 {SELECT ?x (MAX(?area) as ?maxArea) WHERE8 { ?obs rdf:type qb:Observation ;9 gnw:customerID ?cust .

10 ?cust qb4o:memberOf gnw:customer ;11 skos:broader ?city .12 ?city gnw:cityGeo ?x .13 BIND(bif:st_area(?x) as ?area)}

# Then we apply the filter on the output# coming from the spatial function

14 FILTER (?cityGeo = ?x) }

S-Slice operator with a given spatial value as poly-gon data type: The following listing corresponds to theSPARQL output of the running example where the spa-tial value is given as a POLYGON data type (Ex. 11.1)corresponding to a level attribute. The graph patternGP for the query is created in Lines 2 to 7.

1 Q = SELECT ?obs WHERE2 {?obs rdf:type qb:Observation ;3 gnw:customerID ?cust .4 ?cust qb4o:memberOf gnw:customer ;5 skos:broader ?city .6 ?city gnw:cityGeo ?cityGeo .7 FILTER (?cityGeo = "POLYGON((10.4395155.47006, 10.43947 55.47003, 10.43924 (...))")}

S-Dice generator. To generate a SPARQL query forthe s-dice operator, SD(C)[φS ] (Def. 18 – parameterφS represents a spatial predicate), we follow the stepssketched in Algorithm 3. The algorithm takes param-eter φS as input, which corresponds to a spatial pred-icate that could represent a topological relation fromthe Trel set or a combination of a spatial function (a nu-meric operation from the Nop set) and a regular predi-cate φ. For illustration, we use the example query thatwe have introduced in Sect. 5 for s-dice (Ex. 13):

“sales to customers, which are located 5 km distancefrom their city center". In the following, we discuss themain steps of Algorithm 3 with the running example,where the steps are referencing the line numbers in Al-gorithm 3.

Line 3. The algorithm runs through the levels frombase level lb to the target spatial level ls, whichare both given in the spatial Boolean predicate φS

Line 4. Build the roll-up path for those levels usingthe helper function RUPath. Note that, when weapply the roll-up path to the target level, we canalso link the level attributes for the target (spatial)level – as, for example, in the last line of the fol-lowing listing. The output of function RUPath isadded to graph pattern GP :

{?obs rdf:type qb:Observation ;gnw:customerID ?cust .

?cust qb4o:memberOf gnw:customer ;skos:broader ?city ;gnw:customerGeo ?custGeo .

?city gnw:cityGeo ?cityCentGeo .

Line 5. Check if φS is to be implemented as a spatialpredicate from topological relations Trel as inter-preted in Ex. 13.1.

Line 6. Create a filter statement with a spatial pred-icate and the spatial level attribute as, which isreferenced in the roll-up path (Line 4). For ourrunning example, the filter statement is appliedon customers that are located within a buffer areaof 5 km from their city centers. The spatial pred-icate st_within is used from the topologicalrelations. The following lines are added to graphpattern GP :

FILTER (bif:st_within(?custGeo,?cityCentGeo, 5))}

Line 7. Check if φS is to be implemented as a com-bination of a spatial function fS(x) and a regularpredicate φ as interpreted in Ex. 13.2.

Page 23: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 23

Algorithm 3: S−DiceGenerator (GI(C), φS) : Q

Input: GI(C), φS

Output: Q1 begin2 Q = ∅ ; GP = ∅3 for lb, ls ∈ φS do4 GP = GP ∪

RUPath(GS(C), lb, ls, aID,?as,?f)

5 if φS is a spatial predicate then6 GP = GP ∪ (FILTER φS (?as))

7 else if φS uses a spatial function fS(x) and aregular Boolean predicate φ then

8 GP = GP ∪ (BIND fS(x) AS ?vx)9 GP = GP ∪ (FILTER φ(?vx))

10 return Q = SELECT ?f WHERE GP

Lines 8, 9. Create a bind statement based on a spatialfunction (i.e., calculate st_distance betweencustomers and city center) and a filter statementbased on the assigned values with a regular predi-cate (i.e., less than 5 km). The following lines areadded to graph pattern GP :

BIND (bif:st_distance (?custGeo, ?cityCentGeo)AS ?distance) FILTER ( ?distance < 5 )}

Line 10. Generate query Q for selecting the facts f ∈FM ′ matching the incrementally created graphpattern GP in the previous steps. In our runningexamples we obtain the following cases for thegenerated s-dice query Q.

S-Dice operator with φS : The following listing is theSPARQL query generated for the running example(Ex. 13.1), where the spatial predicate is interpreted asa topological relation. The graph pattern GP for thequery is created in Lines 2 to 8.

1 Q = SELECT ?obs WHERE2 { ?obs rdf:type qb:Observation ;3 gnw:customerID ?cust .4 ?cust qb4o:memberOf gnw:customer ;5 skos:broader ?city ;6 gnw:customerGeo ?custGeo .7 ?city gnw:cityGeo ?cityCentGeo .8 FILTER (bif:st_within (?cityCentGeo, ?custGeo, 5))}

S-Dice operator with fS(x) and a regular Booleanpredicate φ: The following listing is the SPARQLquery generated for the running example (Ex. 13.2),where the spatial predicate is interpreted as a combina-

tion of a spatial function and a regular predicate. Thegraph pattern GP for the query is created in Lines 2 to9.

1 Q = SELECT ?obs WHERE2 { ?obs rdf:type qb:Observation ;3 gnw:customerID ?cust .4 ?cust qb4o:memberOf gnw:customer ;5 skos:broader ?city ;6 gnw:customerGeo ?custGeo .7 ?city gnw:cityGeo ?cityCentGeo .8 BIND (bif:st_distance (?custGeo, ?cityCentGeo) AS9 ?distance) FILTER ( ?distance < 5 ) }

S-Roll-up Generator. To generate a SPARQL queryfor the s-roll-up operator from a high-level SOLAPexpression, SRU(C)[fS(L(di)), agg(m)] (Def. 19),where parameter fS(L(di)) denotes a spatial functionon spatial level members and agg(m) is an aggre-gate function on measures. For illustration, we use thequery example for s-roll-up given in Sect. 5 for s-roll-up (Ex. 14): “total amount of sales to customers bycity of the closest suppliers". We follow the main stepssketched in Algorithm 4 in the following.

Lines 2, 3. Build the roll-up path using helper func-tion RUPath. In addition to the variables givenin the RUPath function, we also need to con-sider measures and measure value variables (Line3) since we aggregate the measures. A measureis specified in the following listing of the runningexample as gnw:salesAmount. The followinglines are added to the graph pattern GP :

{?obs rdf:type qb:Observation ;gnw:customerID ?cust ;gnw:supplierID ?sup ;gnw:salesAmount ?sales .

?cust qb4o:memberOf gnw:customer ;gnw:customerGeo ?custGeo ;skos:broader ?city .

?sup qb4o:memberOf gnw:supplier ;gnw:supplierGeo ?supGeo ;skos:broader ?city .

?city qb4o:memberOf gnw:city .

Line 4. Build inner select subquery to apply the spa-tial function fS on the spatial level membersL(di) (i.e., Customer, Supplier). In the example,we will use this information to create a dynamicspatial hierarchy from the Customer to the Citylevel.

Line 5. Call RUPath for the inner select subquery tolink the geometry attributes of base level mem-bers with different variables and create a graphpattern GP ′ for the inner select. The followinglines are added to the graph pattern GP ′:

Page 24: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

24 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

Algorithm 4: SRUGenerator(GI(C),fS(L(ds)),

agg(m)) :Q

Input: GI(C), fS(L(di)), agg(m)

Output: Q1 begin2 Q = ∅ ; GP =

RUPath(GS(C), lb, ls, aID,?as,?f)3 GP = GP ∪ (?f idS(m) ?m )4 for fS(L(ds)) do5 GP ′ = RUPath(GS(C), lb, ls, aID,?as,?f);

Q′ = ∅6 GP ′ = GP ′ ∪ (BIND fS(x) AS ?vx)7 Q′ = SELECT ?x (AGG(?vx) AS ?vy)

WHERE GP ′ GROUP BY ?x8 GP = GP ∪ Q′ ∪ (FILTER ?x = ?as

&& fS(x) = ?vy )9 let ls = l′s

10 return Q = SELECT ?f ?l’s AGG(?m) WHERE GPGROUP BY ?f ?l’s

{?obs rdf:type qb:Observation ;gnw:customerID ?cust1 ;gnw:supplierID ?sup1 .

?sup1 gnw:supplierGeo ?sup1Geo .?cust1 gnw:customerGeo ?cust1Geo .

Line 6. Build the bind statement in order to calculatethe spatial function fS(L(ds)) on spatial levelmembers. For the running example the spatialfunction is st_distance. The following linesare added to the graph pattern GP ′:

BIND (bif:st_distance(?cust1Geo, ?sup1Geo)AS ?distance)}

Line 7. Generate the inner select query Q′ usinggraph pattern GP ′ (Lines 5 and 6). Select thecorresponding level members (Customer level forthe running example) and group them in a groupby statement on the selected level members. Notethat this is where the spatial function fS(L(di))is called with a wrapper expression (e.g., MIN,MAX, etc.) to find the closest distance. The fol-lowing lines illustrate the inner select query Q′:

Q‘ = {SELECT ?cust1 (MIN(?distance) AS?minDistance) WHERE GP‘

GROUP BY ?cust1}

Lines 8, 9. Build the filter statement for the wholequery based on the output of the spatial function,

which is calculated in the inner select subquery.Then, add the filter and inner select subquery tothe main graph pattern GP ′ (Line 8). The filterstatement for the running example is :

FILTER (?cust = ?cust1 && bif:st_distance(?custGeo, ?supGeo) = ?minDistance)}

Note that in Line 9, the spatial target level ls(City) is altered to a dynamic spatial level l′s sinceapplying the spatial function creates a dynamichierarchy.

Line 10. Generate queryQ for computing the facts f ∈FM ′ based on graph pattern GP created in theprevious steps. The measures are also aggregatedat the spatial target level (closest City, which isdynamically selected). The group by statementis applied on the fact members and target levelmembers. In our running example we obtain thefollowing case for the generated s-roll-up queryQ.

S-Roll-up operator: The following listing shows thegenerated SPARQL query. Graph pattern GP ′ for theinner select subquery is created in Lines 15 to 22 andthe graph pattern GP for the whole query is created inLines 3 to 24.

1 Q = SELECT ?obs ?city (SUM(?sales) AS2 ?totalSales) WHERE3 { ?obs rdf:type qb:Observation ;4 gnw:customerID ?cust ;5 gnw:supplierID ?sup ;6 gnw:salesAmount ?sales .7 ?cust qb4o:memberOf gnw:customer ;8 gnw:customerGeo ?custGeo ;9 gnw:customerName ?custName ;

10 skos:broader ?city .11 ?city qb4o:memberOf gnw:city .12 ?sup gnw:supplierGeo ?supGeo .

# Inner Select for the distance function13 { SELECT ?cust1 (MIN(?distance) AS14 ?minDistance) WHERE15 { ?obs rdf:type qb:Observation ;16 gnw:customerID ?cust1 ;17 gnw:supplierID ?sup1 .18 ?sup1 gnw:supplierGeo ?sup1Geo .19 ?cust1 gnw:customerGeo ?cust1Geo .20 BIND (bif:st_distance( ?cust1Geo, ?sup1Geo )21 AS ?distance)}22 GROUP BY ?cust1 }23 FILTER (?cust = ?cust1 && bif:st_distance24 (?custGeo, ?supGeo) = ?minDistance)}25 GROUP BY ?city ?obs

S-Drill-down Generator. The semantics of the s-drill-down operator are defined in the same way as for

Page 25: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 25

the s-roll-up operator with the condition that the in-put cube C for s-roll-up is obtained using a functionBaseCube such that SDD(C)[fS(L(di)), agg(m)] =SRU(BaseCube(C))[fS(L(di)), agg(m)] (Def. 20).Therefore, no generator algorithm and steps are spec-ified since an s-drill-down operator corresponds toa rewriting of an s-roll-up operator, which is ob-tained with a Base function that calls the base cubegraph in SRUGenerator as; SDDGenerator =SRUGenerator (Base(GI(C)), f

S(L(di)), agg(m)).

6.2. Nested SOLAP operations to SPARQL

We now show how a SPARQL query can be gen-erated for a nested SOLAP expression. In general, anested set of SOLAP operators can be rewritten intoan expression with an additional s-dice, on top of aseries of s-roll-ups, on top of one or more s-slices,on top of an s-dice, i.e., (s-dice2(s-roll-up1(. . . s-roll-upk(s-slice1(. . . s-slicen(s-dice1(C))))))).

Let us begin with a simpler nested form that showsthe most typical pattern, namely (s-roll-up (s-slice (s-dice(C)))), where initially a subcube graph is selectedby s-dice. Afterwards, an s-slice is performed on ahigher level of a dimension. Then, an s-roll-up is ap-plied, which aggregates the measures in the sliced cubefrom a lower level to a higher level. Finally, we couldalso perform another s-dice for filtering the measures.There may be several s-slices and s-roll-ups in be-tween.

We formulate the nested SOLAP query as 3(s-roll-up 2(s-slice 1(s-dice(C)))) and apply our running ex-amples such that the enumeration of operators can beinterpreted as follows: 1Get the subcube graph of cus-tomers that are located within a 5 km distance fromtheir city center, 2slice on the customers of the largestcountry, (which drops the dimension and leaves outall the other countries) and 3get the total amount ofsales for customers by the city of their closest suppliers(aggregates the measure Sales amount from Customerto Closest City level). Finally, we may also performanother (s-)dice on measures, e.g., filtering the totalamount of sales greater than 10500. To perform nestedSOLAP operators, we identify a set of principles to beconsidered by the algorithm.

Principle 1: Perform s-dice in the beginning or atthe end.Principle 2: If there are several s-roll-up or s-sliceoperations call their generator algorithms repeat-edly.

Principle 3: Always separate FILTER clauseswhen a SOLAP generator algorithm is used. Enu-merate separated FILTER clauses. If a SOLAPoperator is the final function added to the graphpattern, do not separate the FILTER clause.Principle 4: Build the final graph pattern with theseparated and enumerated FILTER clauses withrespect to Principle 3.Principle 5: Drop the main SELECT clause fromeach SOLAP generator algorithms and build onlyone SELECT that is added to the query at the end.Principle 6: Separate the GROUP BY clause andAGG functions from the s-roll-up generator algo-rithms (and enumerate them), and build add themto the main (outer) SELECT clause at the end.

Algorithm 5: WriteSPARQL((SRU(C)[fS(L(di)),agg (m)] (SS(C)[lb, ls, vas ](SD(C)[φS ])))) :Q

Input: (SRU(C)[fS(L(ds)), agg(m)](SS(C)[lb, ls,vas ](SD(C)[φS ])))

Output: Q1 begin2 Q = ∅ ; GP =

RUPath(GS(C), lb, ls, aID,?as,?f)3 GP = GP ∪ (?f idS(m) ?m )4 GP 1 = S-DiceGenerator(GI(C), φ

S)

\FILTER1 \ SELECT5 GP = GP ∪GP 1

6 GP 2 = S-SliceGenerator(GI(C), vs, lb, ls)\FILTER2 \ SELECT

7 GP = GP ∪GP 2

8 GP = GP ∪ FILTER1 ∪ FILTER2 ∪9 SRUGenerator(GI(C), f

S(L(ds)), agg(m))\SELECT \ GROUP BY1 \ AGG1

10 return Q = SELECT ?l’s AGG1(?m) WHERE GPGROUP BY1 ?l’s

To separate the FILTER clauses, we call SOLAPgenerator algorithms without their FILTER clause andenumerate each FILTER clause for each SOLAP gen-erator algorithm that is used, i.e., S-SliceGenerator(GI(C), φ

S) \ FILTER1 (Algorithm 5, Line 4). Then,we build the final graph pattern with these separatedFILTER clauses i.e., GP = GP ∪ FILTER1 ∪FILTER2 (Line 8). When the last SOLAP generatoralgorithm is called, the output is directly added to thegraph pattern without separating its FILTER clause(Line 9). Throughout the algorithm, all the SELECT

Page 26: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

26 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

clauses are omitted and combined into one SELECTin the output on Line 10. According to Principle 6, ifthere are any GROUP BY clauses and AGG functions(on measures) in inner selects, we eliminate them with" \ " from the inner selects (Line 9) and finally buildthe main (outer) select query with (Line 10). Note thatin the algorithm, the general graph pattern GP is ini-tially created by the RUPath function (Line 2) andincremented with triple patterns for selected measures(Line 3).

Example 16. ((3s-roll-up (2s-slice (1s-dice(C))))):1Get the subcube graph of customer that are locatedwithin a 5 km distance from their city center, 2sliceon the customers of the largest country, (which dropsthe dimension and leave out all the other countries)and 3get the total amount of sales of customers by thecity of their closest suppliers (aggregates the measureSales amount from Customer to Closest City level).The query is written starting from the innermost oper-ator s-dice to the outermost operator s-roll-up.

1 SELECT ?city (SUM(?sales) AS ?totalSales)2 WHERE {3 ?obs rdf:type qb:Observation ;4 gnw:customerID ?cust ;5 gnw:supplierID ?sup ;6 gnw:salesAmount ?sales .7 ?cust qb4o:memberOf gnw:customer ;8 gnw:customerGeo ?custGeo ;9 skos:broader ?city .

10 ?sup qb4o:memberOf gnw:supplier ;11 gnw:supplierGeo ?supGeo ;12 skos:broader ?city .13 ?city qb4o:memberOf gnw:city ;14 gnw:cityGeo ?cityCentGeo ;15 skos:broader ?country .16 ?country qb4o:memberOf gnw:country ;17 gnw:countryGeo ?countGeo .18 ?city gnw:cityGeo ?cityGeo .

### 1.Inner select for (S-SLICE)### Find the largest country

19 {SELECT ?x (MAX(?area) as ?maxArea)20 WHERE {21 ?obs rdf:type qb:Observation ;22 gnw:customerID ?cust .23 ?cust qb4o:memberOf gnw:customer ;24 skos:broader ?city .25 ?city skos:broader ?country .26 ?country gnw:countryGeo ?x .27 BIND(bif:st_area(?x) as ?area)}}

### 2.Inner select for (S-ROLL-UP)### Find the closest suppliers to customers

28 { SELECT ?cust1 (MIN(?distance) AS29 ?minDistance) WHERE {30 ?obs rdf:type qb:Observation ;31 gnw:customerID ?cust1 ;32 gnw:supplierID ?sup1 .

33 ?sup1 gnw:supplierGeo ?sup1Geo .34 ?cust1 gnw:customerGeo ?cust1Geo .35 BIND (bif:st_distance(?cust1Geo, ?sup1Geo)

AS ?distance)}36 GROUP BY ?cust1 }

### FILTER for S-DICE, to get a subcube37 FILTER (bif:st_within (?custGeo, ?cityCentGeo, 5))

### FILTER for S-SLICE, the 1st inner SELECT38 FILTER (?countGeo = ?x)

### FILTER for S-ROLL-UP, the 2nd inner SELECT39 FILTER (?cust = ?cust1 && bif:st_distance

(?custGeo, ?supGeo) = ?minDistance)}40 GROUP BY ?city

The graph pattern GP is initially created withthe RUPath function for the corresponding levelsand level attributes (Lines 3 to 18 of the generatedSPARQL query in the listing above). The first opera-tor is called by function S-DiceGenerator, wherethe first FILTER clause of the outer selected is addedto the query (Line 37). The second operator is calledby the S-SliceGenerator function excluding itsFILTER clause (Lines 19 to 27), which is followedby the SRUGenerator function without GROUP BYand AGG statements (Lines 28 to 36). Note that inLine 36, GROUP BY is applied on the lower level Cus-tomer, and the actual GROUP BY for the target Citylevel is applied in the last line (Line 40). SeparatedFILTER clauses for the S-DiceGenerator andS-SliceGenerator functions are later added tothe graph pattern (Algorithm 5, Line 8) correspondingto Lines 37 and 38 in the above example. The mainouter select query is defined in the first line by speci-fying the target level (City) and aggregate function onmeasures (sum of the total Sales).

7. Conclusions and future work

Motivated by the need for a formal foundation forspatial data warehouses on the Semantic Web, this pa-per made a number of contributions. First, it proposedthe QB4SOLAP vocabulary (metamodel), which sup-ports spatially enhanced multidimensional (MD) datacubes over RDF data. This allows users to publish MDspatial data in RDF format. Second, the paper defines anumber of spatial OLAP (SOLAP) operators over thedefined QB4SOLAP cubes, allowing spatial analyticalqueries over RDF data, and gives their formal seman-tics. Third, the paper provides algorithms for gener-ating spatially extended SPARQL queries from indi-vidual and nested SOLAP operators, allowing users towrite their spatial analytical queries in our high-level

Page 27: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 27

SOLAP language instead of the lower-level and morecomplex SPARQL. Fourth, the vocabulary, operators,and query generation algorithms are validated by ap-plying them to a realistic use case.

RDF2SOLAP module

External Geo-

vocabularies

SOLAP

User Spatial RDF Data Warehouses

GeoSemOLAP

SOLAP to SPARQL

QB4SOLAP

Spatial RDFEndpoints

Fig. 8. Our future vision of SOLAP on the SW

Fig. 8 presents our future vision of SOLAP on theSemantic Web with regards to the current state of theart and ongoing work. In order to verify the algo-rithms, which are defined in this paper, we developedGeoSemOLAP [18], a SPARQL query generation tool.GeoSemOLAP allows users to perform SOLAP op-erations and generate SPARQL queries interactivelythrough a GUI. We refer to our screencast16 for a de-tailed demonstration of GeoSemOLAP.

Publishing spatial DWs on the SW allows usersto also exploit the existing external geo-vocabularies(e.g., GeoNames, etc.)17 by defining spatial levelsand hierarchies from external open data sources. Ourmain ongoing work thus focuses on developing anRDF2SOLAP enrichment module that performs themultidimensional annotation of existing spatial RDFdatasets with QB4SOLAP (semi-)automatically.

Additional interesting aspects of future work wouldbe, for instance, extending the formal techniques andalgorithms for generating SOLAP queries in SPARQLto work over multiple RDF cubes, i.e., to supports-drill-across, and supporting spatial aggregation (s-aggregation) with user-defined functions over spatialmeasures. It would be also interesting to increase ef-ficiency by extending our spatial data warehouse withtechniques that been developed in the context of RDF

16https://youtu.be/Pc3RBPPgBhA17GeoNames: http://www.geonames.org/

Global Administrative Areas: http://gadm.geovocab.org/NUTS – EU’s Nomenclature of Territorial Units for Statistics:http://nuts.geovocab.org/

data cubes and SPARQL analytical queries in general,e.g., materialization and optimizing the physical lay-out [13,21,22], and to enable efficient support of abroad range of external sources by considering aspectssuch as federated processing of analytical queries [20]and schema heterogeneity [34]. We will also considermore efficient representations of the data, e.g., by re-moving redundancies. Furthermore, it would be inter-esting to extend QB4SOLAP and GeoSemOLAP [18]to handle highly dynamic spatio-temporal data andqueries, as for instance, found in large-scale transportanalytics [12].

Acknowledgement. This research is partially fundedby the European Commission through the ErasmusMundus Joint Doctorate Information Technologies forBusiness Intelligence (EM IT4BI-DC) and the DanishCouncil for Independent Research (DFF) under grantagreement no. DFF-4093-00301.

References

[1] Alberto Abelló, Oscar Romero, Torben Bach Pedersen,Rafael Berlanga Llavori, Victoria Nebot, María José Aram-buru Cabo, and Alkis Simitsis. Using semantic web tech-nologies for exploratory OLAP: A survey. IEEE Transactionson Knowledge and Data Engineering (TKDE), 27(2):571–588,2015. DOI https://doi.org/10.1109/TKDE.2014.2330822.

[2] Alex B. Andersen, Nurefsan Gür, Katja Hose, Kim AhlstrømJakobsen, and Torben Bach Pedersen. Publishing Danish agri-cultural government data as semantic web data. In Thep-chai Supnithi, Takahira Yamaguchi, Jeff Z. Pan, Vilas Wu-wongse, and Marut Buranarach, editors, Semantic Technology- 4th Joint International Conference, JIST 2014, Chiang Mai,Thailand, November 9-11, 2014. Revised Selected Papers, vol-ume 8943 of Lecture Notes in Computer Science, pages 178–186. Springer, 2014. DOI https://doi.org/10.1007/978-3-319-15615-6_13.

[3] Robert Battle and Dave Kolas. Enabling the geospatial se-mantic web with parliament and geosparql. Semantic Web, 3(4):355–370, 2012. DOI https://doi.org/10.3233/SW-2012-0065.

[4] Yvan Bédard, E Bernier, S Larrivée, M Nadeau, MJ Proulx,and S Rivest. Spatial OLAP. In Forum annuel sur la RD,Géomatique VI: Un monde accessible, pages 13–14, 1997.

[5] Richard Cyganiak and Dave Reynolds. The RDF Data CubeVocabulary. W3C Recommendation, 16 January 2014. URLhttps://www.w3.org/TR/vocab-data-cube/.Additional contributor: Jeni Tennison.

[6] Joel da Silva, Valéria Cesário Times, Ana Carolina Salgado,Clenúbio Souza, Robson do Nascimento Fidalgo, and An-jolina Grisi de Oliveira. A set of aggregation functions forspatial measures. In Il-Yeol Song and Alberto Abelló, edi-tors, DOLAP 2008, ACM 11th International Workshop on DataWarehousing and OLAP, Napa Valley, California, USA, Octo-

Page 28: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

28 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

ber 30, 2008, Proceedings, pages 25–32. ACM, 2008. DOIhttps://doi.org/10.1145/1458432.1458438.

[7] Cristina Dutra de Aguiar Ciferri, Ricardo Rodrigues Ciferri,Leticia I. Gómez, Markus Schneider, Alejandro A. Vais-man, and Esteban Zimányi. Cube algebra: A generic user-centric model and query language for OLAP cubes. Inter-national Journal of Data Warehousing and Mining, 9(2):39–65, 2013. DOI https://doi.org/10.4018/jdwm.2013040103.

[8] Rudra Pratap Deb Nath, Katja Hose, and Torben B. Peder-sen. Towards a Programmable Semantic Extract-Transform-Load Framework for Semantic Data Warehouses. In Il-Yeol Song, Carlos Garcia-Alvarado, and Carlos Ordonez, ed-itors, DOLAP’15, Proceedings of the ACM Eighteenth Inter-national Workshop on Data Warehousing and OLAP, pages15–24. ACM, 2015. DOI https://doi.org/10.1145/2811222.2811229.

[9] Élodie Edoh-Alove, Sandro Bimonte, and François Pinet. AnUML profile and SOLAP datacubes multidimensional schemastransformation process for datacubes risk-aware design. Inter-national Journal of Data Warehousing and Mining, 11(4):64–83, 2015. DOI https://doi.org/10.4018/ijdwm.2015100104.

[10] Max J Egenhofer and John Herring. A mathematical frame-work for the definition of topological relationships. In Kurt E.Brassel and Haruko Kishimoto, editors, Proceedings of the 4thInternational Symposium on Spatial Data Handling: July 23-27, 1990, Zurich, Switzerland, pages 803–813. InternationalGeographical Union IGU, Commission on Geographic Infor-mation Systems, Dept. of Geography, the Ohio State Univer-sity, 1990.

[11] Lorena Etcheverry, Alejandro A. Vaisman, and EstebanZimányi. Modeling and querying data warehouses on theSemantic Web using QB4OLAP. In Ladjel Bellatreche andMukesh K. Mohania, editors, Data Warehousing and Knowl-edge Discovery - 16th International Conference, DaWaK 2014,Munich, Germany, September 2-4, 2014. Proceedings, vol-ume 8646 of Lecture Notes in Computer Science, pages 45–56. Springer, 2014. DOI https://doi.org/10.1007/978-3-319-10160-6_5.

[12] Gyözö Gidófalvi, Torben Bach Pedersen, Tore Risch, and ErikZeitler. Highly scalable trip grouping for large-scale collectivetransportation systems. In Alfons Kemper, Patrick Valduriez,Noureddine Mouaddib, Jens Teubner, Mokrane Bouzeghoub,Volker Markl, Laurent Amsaleg, and Ioana Manolescu, edi-tors, EDBT 2008, 11th International Conference on Extend-ing Database Technology, Nantes, France, March 25-29, 2008,Proceedings, volume 261 of ACM International ConferenceProceeding Series, pages 678–689. ACM, 2008. DOI https://doi.org/10.1145/1353343.1353425.

[13] François Goasdoué, Konstantinos Karanasos, Julien Leblay,and Ioana Manolescu. View selection in semantic webdatabases. Proceedings of the VLDB Endowment, 5(2):97–108, 2011. URL http://www.vldb.org/pvldb/vol5/p097_francoisgoasdoue_vldb2012.pdf.

[14] Leticia I. Gómez, Sofie Haesevoets, Bart Kuijpers, and Alejan-dro A. Vaisman. Spatial aggregation: Data model and imple-mentation. Information Systems, 34(6):551–576, 2009. DOIhttps://doi.org/10.1016/j.is.2009.03.002.

[15] Leticia I. Gómez, Silvia A. Gómez, and Alejandro A. Vaisman.A generic data model and query language for spatiotemporal

OLAP cube analysis. In Elke A. Rundensteiner, Volker Markl,Ioana Manolescu, Sihem Amer-Yahia, Felix Naumann, and Is-mail Ari, editors, 15th International Conference on Extend-ing Database Technology, EDBT ’12, Berlin, Germany, March27-30, 2012, Proceedings, pages 300–311. ACM, 2012. DOIhttps://doi.org/10.1145/2247596.2247632.

[16] Nurefsan Gür, Katja Hose, Torben Bach Pedersen, and Es-teban Zimányi. Modeling and querying spatial data ware-houses on the Semantic Web. In Guilin Qi, Kouji Kozaki,Jeff Z. Pan, and Siwei Yu, editors, Semantic Technology- 5th Joint International Conference, JIST 2015, Yichang,China, November 11-13, 2015, Revised Selected Papers, vol-ume 9544 of Lecture Notes in Computer Science, pages 3–22. Springer, 2015. DOI https://doi.org/10.1007/978-3-319-31676-5_1.

[17] Nurefsan Gür, Katja Hose, Torben Bach Pedersen, and Es-teban Zimányi. Enabling spatial OLAP over environmen-tal and farming data with QB4SOLAP. In Yuan-Fang Li,Wei Hu, Jin Song Dong, Grigoris Antoniou, Zhe Wang,Jun Sun, and Yang Liu, editors, Semantic Technology - 6thJoint International Conference, JIST 2016, Singapore, Sin-gapore, November 2-4, 2016, Revised Selected Papers, vol-ume 10055 of Lecture Notes in Computer Science, pages 287–304. Springer, 2016. DOI https://doi.org/10.1007/978-3-319-50112-3_22.

[18] Nurefsan Gür, Jacob Nielsen, Katja Hose, and Torben BachPedersen. GeoSemOLAP: Geospatial OLAP on the Seman-tic Web made easy. In Rick Barrett, Rick Cummings, Eu-gene Agichtein, and Evgeniy Gabrilovich, editors, Proceed-ings of the 26th International Conference on World Wide WebCompanion, Perth, Australia, April 3-7, 2017, pages 213–217. ACM, 2017. DOI https://doi.org/10.1145/3041021.3054731.

[19] Jiawei Han, Nebojsa Stefanovic, and Krzysztof Koperski. Se-lective materialization: An efficient method for spatial datacube construction. In Xindong Wu, Kotagiri Ramamohanarao,and Kevin B. Korb, editors, Research and Development inKnowledge Discovery and Data Mining, Second Pacific-AsiaConference, PAKDD-98, Melbourne, Australia, April 15-17,1998, Proceedings, volume 1394 of Lecture Notes in Com-puter Science, pages 144–158. Springer, 1998. DOI https://doi.org/10.1007/3-540-64383-4_13.

[20] Dilshod Ibragimov, Katja Hose, Torben Bach Pedersen, andEsteban Zimányi. Processing aggregate queries in a federa-tion of SPARQL endpoints. In Fabien Gandon, Marta Sabou,Harald Sack, Claudia d’Amato, Philippe Cudré-Mauroux, andAntoine Zimmermann, editors, The Semantic Web. Latest Ad-vances and New Domains - 12th European Semantic Web Con-ference, ESWC 2015, Portoroz, Slovenia, May 31 - June 4,2015. Proceedings, volume 9088 of Lecture Notes in Com-puter Science, pages 269–285. Springer, 2015. DOI https://doi.org/10.1007/978-3-319-18818-8_17.

[21] Dilshod Ibragimov, Katja Hose, Torben Bach Pedersen, andEsteban Zimányi. Optimizing aggregate SPARQL queries us-ing materialized RDF views. In Paul T. Groth, Elena Sim-perl, Alasdair J. G. Gray, Marta Sabou, Markus Krötzsch,Freddy Lécué, Fabian Flöck, and Yolanda Gil, editors, TheSemantic Web - ISWC 2016 - 15th International SemanticWeb Conference, Kobe, Japan, October 17-21, 2016, Proceed-ings, Part I, volume 9981 of Lecture Notes in Computer Sci-ence, pages 341–359, 2016. DOI https://doi.org/10.

Page 29: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 29

1007/978-3-319-46523-4_21.[22] Kim Ahlstrøm Jakobsen, Alex B. Andersen, Katja Hose, and

Torben Bach Pedersen. Optimizing RDF data cubes for effi-cient processing of analytical queries. In Olaf Hartig, Juan Se-queda, and Aidan Hogan, editors, Proceedings of the 6th Inter-national Workshop on Consuming Linked Data co-located with14th International Semantic Web Conference (ISWC 2105),Bethlehem, Pennsylvania, US, October 12th, 2015, volume1426 of CEUR Workshop Proceedings. CEUR-WS.org, 2015.URL http://ceur-ws.org/Vol-1426/paper-02.pdf.

[23] Benedikt Kämpgen, Seán O’Riain, and Andreas Harth. In-teracting with statistical linked data via OLAP opera-tions. In Elena Simperl, Barry Norton, Dunja Mladenic,Emanuele Della Valle, Irini Fundulaki, Alexandre Passant,and Raphaël Troncy, editors, The Semantic Web: ESWC 2012Satellite Events - ESWC 2012 Satellite Events, Heraklion,Crete, Greece, May 27-31, 2012. Revised Selected Papers, vol-ume 7540 of Lecture Notes in Computer Science, pages 87–101. Springer, 2012. DOI https://doi.org/10.1007/978-3-662-46641-4_7.

[24] Manolis Koubarakis, Manos Karpathiotakis, Kostis Kyzirakos,Charalampos Nikolaou, and Michael Sioutis. Data models andquery languages for linked geospatial data. In Thomas Eiterand Thomas Krennwallner, editors, Reasoning Web. Seman-tic Technologies for Advanced Query Answering - 8th Inter-national Summer School 2012, Vienna, Austria, September 3-8, 2012. Proceedings, volume 7487 of Lecture Notes in Com-puter Science, pages 290–328. Springer, 2012. DOI https://doi.org/10.1007/978-3-642-33158-9_8.

[25] Kostis Kyzirakos, Manos Karpathiotakis, and ManolisKoubarakis. Strabon: A semantic geospatial DBMS. InPhilippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tu-dorache, Jérôme Euzenat, Manfred Hauswirth, Josiane XavierParreira, Jim Hendler, Guus Schreiber, Abraham Bern-stein, and Eva Blomqvist, editors, The Semantic Web -ISWC 2012 - 11th International Semantic Web Confer-ence, Boston, MA, USA, November 11-15, 2012, Proceed-ings, Part I, volume 7649 of Lecture Notes in ComputerScience, pages 295–311. Springer, 2012. DOI https://doi.org/10.1007/978-3-642-35176-1_19.

[26] Jens Lehmann, Spiros Athanasiou, Andreas Both, Alejan-dra García-Rojas, Giorgos Giannopoulos, Daniel Hladky,Jon Jay Le Grange, Axel-Cyrille Ngonga Ngomo, Mo-hamed Ahmed Sherif, Claus Stadler, Matthias Wauer, PatrickWestphal, and Vadim Zaslawski. Managing geospatial linkeddata in the GeoKnow project. In Tom Narock and Peter Fox,editors, The Semantic Web in Earth and Space Science. Cur-rent Status and Future Directions, volume 20 of Studies on theSemantic Web, pages 51–78. IOS Press, 2015. DOI https://doi.org/10.3233/978-1-61499-501-2-51.

[27] Elzbieta Malinowski and Esteban Zimányi. Advanced DataWarehouse Design - From Conventional to Spatial and Tem-poral Applications. Data-Centric Systems and Applications.Springer, 2008. DOI https://doi.org/10.1007/978-3-540-74405-4.

[28] Torben Bach Pedersen and Nectaria Tryfona. Pre-aggregationin spatial data warehouses. In Christian S. Jensen, MarkusSchneider, Bernhard Seeger, and Vassilis J. Tsotras, editors,Advances in Spatial and Temporal Databases, 7th Interna-tional Symposium, SSTD 2001, Redondo Beach, CA, USA,

July 12-15, 2001, Proceedings, volume 2121 of Lecture Notesin Computer Science, pages 460–480. Springer, 2001. DOIhttps://doi.org/10.1007/3-540-47724-1_24.

[29] Torben Bach Pedersen, Christian S. Jensen, and Curtis E. Dyre-son. A foundation for capturing and querying complex multi-dimensional data. Information Systems, 26(5):383–423, 2001.DOI https://doi.org/10.1016/S0306-4379(01)00023-0.

[30] Matthew Perry and John Herring, editors. GeoSPARQL - AGeographic Query Language for RDF Data. Open Geospa-tial Consortium, 10 September 2012. URL http://www.opengeospatial.org/standards/geosparql.

[31] David A. Randell, Zhan Cui, and Anthony G. Cohn. A spa-tial logic based on regions and connection. In Bernhard Nebel,Charles Rich, and William R. Swartout, editors, Proceedings ofthe 3rd International Conference on Principles of KnowledgeRepresentation and Reasoning (KR’92). Cambridge, MA, Oc-tober 25-29, 1992, pages 165–176. Morgan Kaufmann, 1992.

[32] Peter Z. Revesz. Introduction to Databases - From Biologi-cal to Spatio-Temporal. Texts in Computer Science. Springer,2010. ISBN 978-1-84996-094-6. DOI https://doi.org/10.1007/978-1-84996-095-3.

[33] Sonia Rivest, Yvan Bédard, and Pierre Marchand. Toward bet-ter support for spatial decision making: defining the character-istics of spatial on-line analytical processing (SOLAP). GEO-MATICA, 55(4):539–555, 2001.

[34] Jacobo Rouces, Gerard de Melo, and Katja Hose. FrameBase:Representing n-ary relations using semantic frames. In Fa-bien Gandon, Marta Sabou, Harald Sack, Claudia d’Amato,Philippe Cudré-Mauroux, and Antoine Zimmermann, editors,The Semantic Web. Latest Advances and New Domains -12th European Semantic Web Conference, ESWC 2015, Por-toroz, Slovenia, May 31 - June 4, 2015. Proceedings, vol-ume 9088 of Lecture Notes in Computer Science, pages 505–521. Springer, 2015. DOI https://doi.org/10.1007/978-3-319-18818-8_31.

[35] Claus Stadler, Jens Lehmann, Konrad Höffner, and Sören Auer.Linkedgeodata: A core for a web of spatial open data. SemanticWeb, 3(4):333–354, 2012. DOI https://doi.org/10.3233/SW-2011-0052. URL http://dx.doi.org/10.3233/SW-2011-0052.

[36] Alejandro A. Vaisman and Esteban Zimányi. A multidimen-sional model representing continuous fields in spatial datawarehouses. In Divyakant Agrawal, Walid G. Aref, Chang-Tien Lu, Mohamed F. Mokbel, Peter Scheuermann, Cyrus Sha-habi, and Ouri Wolfson, editors, 17th ACM SIGSPATIAL Inter-national Symposium on Advances in Geographic InformationSystems, ACM-GIS 2009, November 4-6, 2009, Seattle, Wash-ington, USA, Proceedings, pages 168–177. ACM, 2009. DOIhttps://doi.org/10.1145/1653771.1653797.

[37] Jovan Varga, Alejandro A. Vaisman, Oscar Romero, LorenaEtcheverry, Torben Bach Pedersen, and Christian Thomsen.Dimensional enrichment of statistical linked open data. Jour-nal of Web Semantics, 40:22–51, 2016. DOI https://doi.org/10.1016/j.websem.2016.07.003.

[38] Inés Fernando Vega López, Richard T. Snodgrass, and BongkiMoon. Spatiotemporal aggregate computation: a survey. IEEETransactions on Knowledge and Data Engineering, 17(2):271–286, 2005. DOI https://doi.org/10.1109/TKDE.2005.34.

Page 30: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

30 N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web

Appendix

A1. Query Run Times

Table 5 presents the query runtimes for the SOLAPoperator examples (Ex. 12, Ex. 13.1, Ex. 13.2, Ex. 14,and Ex. 15) given in Sect. 5 and a nested SOLAP op-erator example (Ex. 16) given in Sect. 6.2. The SO-LAP queries are tested against an instance of the usecase dataset (GeoNorthwind) that we have discussedin Sect. 4. In total, 48677 triples are obtained fromthe GeoNorthwind dataset. The published RDF graphinstance is denoted as C in Table 5. We used Virtu-oso Open Source Edition (Column Store and multithreaded) Version 7.2.5 on an Ubuntu 14.04 serverwith 2.30GHz CPU and 16 GB RAM to publish thetriples at the SPARQL endpoint http://lod.cs.aau.dk:8890/sparql. The actual queries usedin the paper are available at http://extbi.cs.aau.dk/SOLAP4SW/queries.

Table 5Runtimes in seconds

SOLAP Operators Query RuntimeEx. 12 (s-slice(C)) 0.07Ex. 13.1 (s-dice(C)) 0.09Ex. 13.2 (s-dice(C)) 1.01Ex. 14 (s-roll-up(C)) 2.03

Ex. 15 (s-drill-down(C) 1.86

Ex. 16 (s-roll-up (s-slice (s-dice(C)))) 3.04

The query runtime of the s-slice query in Ex 12 ismeasured as 0.07 seconds, which is an efficient SO-LAP operator to execute, since filtering the given spa-tial literal is done only for the records of the specifiedlevel instances.

The query runtimes of the s-dice queries in Ex. 13.1and Ex. 13.2 are measured as 0.09 and 1.01 seconds re-spectively, for the two alternative ways of implement-ing s-dice in SPARQL. It is clear that the first use ofthe s-dice operator is more efficient compared to thesecond, even though both return the same results. Thefirst one performs s-dice by filtering the instances ofthe customers through their geometries that are withina buffer area of a circle with 5 km radius and the citycenter geometry as the center point of the circle. Thesecond one performs s-dice by measuring the distancesbetween each customer’s location and their city cen-ter, and then applies a filter of those measurements tofind the ones that are less than 5 km. This is a more

expensive approach due to measuring the distances be-tween each customer and city instances with a spatialdistance function.

The query runtime of the s-roll-up query in Ex. 14is measured as 2.03 seconds. The s-roll-up operatorhas a longer response time than the other operatorsas it combines a spatial distance function with aggre-gate operators. Due to its complexity, s-roll-up requiresan inner select where it binds the distances of spec-ified spatial level attributes (e.g., customer and sup-plier geometry), which is calculated with a spatial dis-tance function. Those distance measurements are thenfiltered in the outer select with respect to the aggregatefunction.

The query runtime of the s-drill-down query inEx. 15 is measured as 1.86 seconds. S-drill-down op-erates in a very similar manner as s-roll-up. Theoret-ically, s-drill-down operates on an already spatiallyrolled up data cube, whilst it is implemented in prac-tice as an s-roll-up on the base cube. The RDF datais at the lowest granularity, which is equal to the def-inition of the base cube. The SPARQL query of thes-drill-down operator is very similar the s-roll-up op-erator. The difference of the query runtime betweenEx. 14 and Ex. 15 is directly related to the number ofinstances of the specified levels and the complexity ofthe spatial functions.

The query runtime for the nested SOLAP query inEx. 16 is measured as 3.04 seconds. The nesting in thequery is the main reason why it has a longer responsetime. Moreover, the nested query has more triple pat-terns and clauses that need to be evaluated duringquery execution. Hence, it takes longer than evaluatinga single SOLAP operator.

A2. Table of Contents

Table 6 summarizes a list of all the definitions, ex-tensions, remarks, and algorithms. Definitions are enu-merated starting from "1". Extensions are enumeratedstarting from "5" to be consistent with the base def-inition of the extension, e.g., Def. 5 (Dimensions) isfollowed by Ext. 5 (Spatial dimensions). Remarks areenumerated starting from "17" for OLAP operators,which are given before the definitions of the corre-sponding SOLAP operator, e.g., Remark 17 (Slice) isfollowed by Def. 17 (S-Slice). Algorithms are enumer-ated starting from "1".

Page 31: A Foundation for Spatial Data Warehouses on the Semantic WebSemantic Web 0 (0) 1–31 1 IOS Press A Foundation for Spatial Data Warehouses on the Semantic Web Editor(s): Mark Gahegan,

N. Gür et al. / A Foundation for Spatial Data Warehouses on the Semantic Web 31

Table 6Table of Contents

Type No. Topic Sect. PageDefinition 1 Spatial aggregation 3.2 4Definition 2 Topological relations 3.2 4Definition 3 Numeric operations 3.2 4Definition 4 RDF triple 4 7Definition 5 Dimensions 4.1 8-9Extension 5 Spatial dimensions 4.1 9Definition 6 Hierarchies 4.1 9Extension 6 Spatial hierarchies 4.1 9Definition 7 Levels 4.1 9Extension 7 Spatial levels 4.1 9-10Definition 8 Attributes 4.1 10-11Extension 8 Spatial attributes 4.1 11Definition 9 Hierarchy steps 4.1 11Extension 9 Spatial hierarchy steps 4.1 12Definition 10 Partial order on levels 4.1 12Definition 11 Measures 4.1 12Extension 11 Spatial measures 4.1 12Definition 12 Fact 4.1 13Extension 12 Spatial fact 4.1 13Definition 13 Level members 4.2 14Definition 14 Attributes of level members 4.2 14Definition 15 Partial order on level members 4.2 14Definition 16 Fact members 4.2 14-15Remark 17 Slice 5 15Definition 17 S-Slice 5 15-16Remark 18 Dice 5 17Definition 18 S-Dice 5 17Remark 19 Roll-up 5 17-18Definition 19 S-Roll-up 5 18Remark 20 Drill-down 5 19Definition 20 S-Drill-down 5 19Algorithm 1 RUPath 6.1 20Algorithm 2 S-SliceGenerator 6.1 21Algorithm 3 S-DiceGenerator 6.1 23Algorithm 4 SRUGenerator 6.1 24Algorithm 5 WriteSPARQL 6.2 25