Scalable Real-Time OLAP On Cloud ArchitecturesIarc/RiskAnalyticsLab/papers/2014-38OLAP.pdf ·...

15
Scalable Real-Time OLAP On Cloud Architectures F. Dehne a,* , Q. Kong b , A. Rau-Chaplin b , H. Zaboli a , R. Zhou a a School of Computer Science, Carleton University, Ottawa, Canada b Faculty of Computer Science, Dalhousie University, Halifax, Canada Abstract In contrast to queries for on-line transaction processing (OLTP) systems that typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database which often leads to performance issues. In this paper we introduce CR-OLAP, a scalable Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree. CR- OLAP utilizes a scalable cloud infrastructure consisting of (m + 1) multi-core processors. That is, with increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as “report the total sales in all stores located in California and New York during the months February-May of all years”. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds which can be considered a real time response. Keywords: Real-time OLAP, Scalability, Cloud Architecture, TPC-DS Benchmark 1. Introduction On-line analytical processing (OLAP) systems are at the heart of many business analytics appli- cations. This paper reports on the results of a re- search project (supported by the IBM Centre For Advanced Studies Canada) to investigate the use of cloud computing for high performance, scalable real-time OLAP. 1.1. Background Decision Support Systems (DSS) are designed to empower the user with the ability to make effec- tive decisions regarding both the current and fu- ture state of an organization. DSS allow users to A preliminary (short) version appeared in the conference proceeding of IEEE Big Data 2013. * Corresponding author Email addresses: [email protected] (F. Dehne), [email protected] (Q. Kong), [email protected] (A. Rau-Chaplin), [email protected] (H. Zaboli), [email protected] (R. Zhou) study relationships in a chronological context be- tween things such as customers, vendors, products, inventory, geography, and sales. One of the most powerful and prominent technologies for knowledge discovery in DSS environments is on-line analyt- ical processing (OLAP). OLAP is the foundation for a wide range of essential business applications, including sales and marketing analysis, planning, budgeting, and performance measurement [1, 2]. By exploiting multi-dimensional views of the un- derlying data warehouse, the OLAP server allows users to “drill down” or “roll up” on dimension hi- erarchies, “slice and dice” particular attributes, or perform various statistical operations such as rank- ing and forecasting. To support this functionality, OLAP relies heavily upon a classical data model known as the data cube [3] which allows users to view organizational data from different perspectives and at a variety of summarization levels. It con- sists of the base cuboid, the finest granularity view containing the full complement of d dimensions (or Preprint submitted to Elsevier December 4, 2013

Transcript of Scalable Real-Time OLAP On Cloud ArchitecturesIarc/RiskAnalyticsLab/papers/2014-38OLAP.pdf ·...

Scalable Real-Time OLAP On Cloud ArchitecturesI

F. Dehnea,∗, Q. Kongb, A. Rau-Chaplinb, H. Zabolia, R. Zhoua

aSchool of Computer Science, Carleton University, Ottawa, Canadab Faculty of Computer Science, Dalhousie University, Halifax, Canada

Abstract

In contrast to queries for on-line transaction processing (OLTP) systems that typically access only asmall portion of a database, OLAP queries may need to aggregate large portions of a database whichoften leads to performance issues. In this paper we introduce CR-OLAP, a scalable Cloud based Real-timeOLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree. CR-OLAP utilizes a scalable cloud infrastructure consisting of (m + 1) multi-core processors. That is, withincreasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributedPDCR tree data structure supports multiple dimension hierarchies and efficient query processing on theelaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complexOLAP queries that need to aggregate large portions of the data warehouse, such as “report the total sales inall stores located in California and New York during the months February-May of all years”. We evaluatedCR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate thatCR-OLAP scales well with increasing number of processors, even for complex queries. For example, on anAmazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehousewith 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAPachieved a query latency of 0.3 seconds which can be considered a real time response.

Keywords: Real-time OLAP, Scalability, Cloud Architecture, TPC-DS Benchmark

1. Introduction

On-line analytical processing (OLAP) systemsare at the heart of many business analytics appli-cations. This paper reports on the results of a re-search project (supported by the IBM Centre ForAdvanced Studies Canada) to investigate the useof cloud computing for high performance, scalablereal-time OLAP.

1.1. Background

Decision Support Systems (DSS) are designed toempower the user with the ability to make effec-tive decisions regarding both the current and fu-ture state of an organization. DSS allow users to

IA preliminary (short) version appeared in the conferenceproceeding of IEEE Big Data 2013.

∗Corresponding authorEmail addresses: [email protected] (F. Dehne),

[email protected] (Q. Kong), [email protected](A. Rau-Chaplin), [email protected] (H. Zaboli),[email protected] (R. Zhou)

study relationships in a chronological context be-tween things such as customers, vendors, products,inventory, geography, and sales. One of the mostpowerful and prominent technologies for knowledgediscovery in DSS environments is on-line analyt-ical processing (OLAP). OLAP is the foundationfor a wide range of essential business applications,including sales and marketing analysis, planning,budgeting, and performance measurement [1, 2].By exploiting multi-dimensional views of the un-derlying data warehouse, the OLAP server allowsusers to “drill down” or “roll up” on dimension hi-erarchies, “slice and dice” particular attributes, orperform various statistical operations such as rank-ing and forecasting. To support this functionality,OLAP relies heavily upon a classical data modelknown as the data cube [3] which allows users toview organizational data from different perspectivesand at a variety of summarization levels. It con-sists of the base cuboid, the finest granularity viewcontaining the full complement of d dimensions (or

Preprint submitted to Elsevier December 4, 2013

attributes), surrounded by a collection of 2d−1 sub-cubes/cuboids that represent the aggregation of thebase cuboid along one or more dimensions.

In contrast to queries for on-line transaction pro-cessing (OLTP) systems which typically access onlya small portion of the database (e.g. update a cus-tomer record), OLAP queries may need to aggre-gate large portions of the database (e.g. calculatethe total sales of a certain type of items duringa certain time period) which may lead to perfor-mance issues. Therefore, most of the traditionalOLAP research, and most of the commercial sys-tems, follow the static data cube approach proposedby Gray et al. [3] and materialize all or a subsetof the cuboids of the data cube in order to ensureadequate query performance. Building the datacube can be a massive computational task, and sig-nificant research has been published on sequentialand parallel data cube construction methods (e.g.[4, 5, 3, 6, 7, 8]). However, the traditional staticdata cube approach has several disadvantages. TheOLAP system can only be updated periodically andin batches, e.g. once every week. Hence, latest in-formation can not be included in the decision sup-port process. The static data cube also requiresmassive amounts of memory space and leads toa duplicate data repository that is separate fromthe on-line transaction processing (OLTP) systemof the organization. Practitioners have thereforecalled for some time for an integrated OLAP/OLTPapproach with a real-time OLAP system that getsupdated instantaneously as new data arrives andalways provides an up-to-date data warehouse forthe decision support process (e.g. [9]). Some recentpublications have begun to address this problemby providing “quasi real-time” incremental main-tenance schemes and loading procedures for staticdata cubes (e.g. [9, 10, 11, 12]). However, these ap-proaches are not fully real-time. A major obstacleare significant performance issues with large scaledata warehouses.

1.2. Contributions

The aim of our research project is to help ad-dress the above mentioned performance issues forreal-time OLAP systems through the use of efficientparallel computing methods. In a recent paper [13]we presented the first parallel real-time OLAP sys-tem designed to be executed on a multi-core pro-cessor. We documented significant performance in-creases with increasing number of processor cores.Our system won the 2012 IBM Canada Research

Impact Of The Year Award and an IBM sponsoredpatent application has been submitted. In this pa-per, we report on the next phase of our project:to scale up our real-time OLAP system to utilizea collection of (m + 1) multi-core processors in ascalable cloud environment.

We introduce CR-OLAP, a scalable Cloud basedReal-time OLAP system that utilizes a new dis-tributed index structure for OLAP, refered to as adistributed PDCR tree. This data structure is notjust another distributed R-tree, but rather a multi-dimensional data structure designed specifically tosupport efficient OLAP query processing on theelaborate dimension hierarchies that are central toOLAP systems. The distributed PDCR tree, basedon the sequential DC tree introduced by Kriegelet al. [14] and our previous PDC tree [13], ex-ploits knowledge about the structure of individualdimension hierarchies both for compact data rep-resentation and accelerated query processing. Thefollowing is a brief overview of the properties of oursystem.

Consider a d-dimensional data warehouse withd dimension hierarchies. CR-OLAP supports aninput stream consisting of insert and query oper-ations. Each OLAP query can be represented asan aggregate range query that specifies for each di-mension either a single value or range of values atany level of the respective dimension hierarchy, or asymbol “*” indicating the entire range for that di-mension. CR-OLAP utilizes a cloud infrastructureconsisting of m+1 multi-core processors where eachprocessor executes up to k parallel threads. As typi-cal for current high performance databases, all datais kept in the processors’ main memories [15]. Withincreasing database size, CR-OLAP will increasem by dynamically allocating additional processorswithin the cloud environment and re-arranging thedistributed PDCR tree. This will ensure that both,the available memory and processing capability willscale with the database size. One of the m+1 multi-core processors is referred to as the master, and theremaining m processors are called workers. Themaster receives from the users the input stream ofOLAP insert and query operations, and reports theresults back to the users (in the form of referencesto memory locations where the workers have de-posited the query results). In order to ensure highthroughput and low latency even for compute inten-sive OLAP queries that may need to aggregate largeportions of the entire database, CR-OLAP utilizesseveral levels of parallelism: distributed processing

2

of multiple query and insert operations among mul-tiple workers, and parallel processing of multipleconcurrent query and insert operations within eachworker. For correct query operation, CR-OLAP en-sures that the result for each OLAP query includesall data inserted prior but no data inserted afterthe query was issued within the input stream.

CR-OLAP is supported by a new distributed in-dex structure for OLAP termed distributed PDCRtree which supports distributed OLAP query pro-cessing, including fast real-time data aggregation,real-time querying of multiple dimension hierar-chies, and real-time data insertion. Note that, sinceOLAP is about the analysis of historical data collec-tions, OLAP systems do usually not support datadeletion. Our system does however support bulkinsert operations of large groups of data items.

The distributed index structure consists of a col-lection of PDCR trees whereby the master storesone PDCR tree (called hat) and each worker storesmultiple PDCR trees (called subtrees). Each indi-vidual PDCR tree supports multi-core parallelismand executes multiple concurrent insert and queryoperations at any point in time. PDCR trees are anon-trivial modification of the authors’ previouslypresented PDC trees [13], adapted to the cloudenvironment and designed to scale. For example,PDCR trees are array based so that they can easilybe compressed and transferred between processorsvia message passing. When the database grows andnew workers are added, sub-trees are split off andsent to the new worker.

We evaluated CR-OLAP on the Amazon EC2cloud for a multitude of scenarios (different ratiosof insert and query transactions, query transac-tions with different sizes of results, different sys-tem loads, etc.), using the TPC-DS “Decision Sup-port” benchmark data set. The tests demonstratethat CR-OLAP scales well with increasing num-ber of workers. For example, for fixed data ware-house size (10,000,000 data items), when increas-ing the number of workers from 1 to 8, the aver-age query throughput and latency improves by afactor 7.5. When increasing the data warehousesize from 10,000,000 data items to 80,000,000 dataitems while, at the same time, letting CR-OLAPincrease the number of workers used from 1 to 8,respectively, we observed that query performanceremained essentially unchanged. That is, the sys-tem performed an 8-fold increase in size, includingan 8-fold increase in the average amount of data ag-gregated by each OLAP query, without noticeable

performance impact for the user.A particular strength of CR-OLAP is to effi-

ciently answer queries with large query coverage,i.e. the portion of the database that needs to beaggregated for an OLAP query. For example, onan Amazon EC2 cloud instance with eight workersand a stream of OLAP queries on a data warehousewith 80 million tuples, where each query has morethan 50% coverage, CR-OLAP achieved a query la-tency of 0.3 seconds, which can be considered areal time response. CR-OLAP also handles wellincreasing dimensionality of the data warehouse.For tree data structures this is a critical issue asit is known e.g. for R-trees that, with increasingnumber of dimensions, even simple range search(no dimension hierarchies, no aggregation) can de-generate to linear search (e.g. [16]). In our ex-periments, we observed that increasing number ofdimensions does not significantly impact the per-formance of CR-OLAP. Another possible disadvan-tage of tree data structures is that they are po-tentially less cache efficient than in-memory linearsearch which can make optimum use of streamingdata between memory and processor caches. Toestablish a comparison baseline for CR-OLAP, weimplemented STREAM-OLAP which partitions thedatabase between multiple cloud processors basedon one chosen dimension and uses parallel memoryto cache streaming on the cloud processors to an-swer OLAP queries. We observed that the perfor-mance of CR-OLAP is similar to STREAM-OLAPfor simple OLAP queries with small query coveragebut that CR-OLAP vastly outperforms STREAM-OLAP for more complex queries that utilize differ-ent dimension hierarchies and have a larger querycoverage (e.g. “report the total sales in all storeslocated in California and New York during themonths February-May of all years”).

The remainder of this paper is organized as fol-lows. In Section 2 we review related work. In Sec-tion 3 we introduce the PDCR tree data structureand in Section 4 we present our CR-OLAP systemfor real-time OLAP on cloud architectures. Section5 shows the results of an experimental evaluation ofCR-OLAP on the Amazon EC2 cloud, and Section6 concludes the paper.

2. Related Work

In addition to the related work discussed inthe introduction, there are many efforts to storeand query large data sets in cloud environments.

3

Hadoop[17] and its file system, HDFS, are popularexamples of such systems which are typically builton MapReduce [18]. Related projects most simi-lar to our work are Hive[19] and HadoopDB[20].However, these systems are not designed for real-time (OLTP style) operation. Instead, they usebatch processing similar to [9, 10, 11, 12]. Thesituation is similar for BigTable[21], BigQuery[22],and Dremel[23]. In fact, Dremel[23] uses a colum-nar data representation scheme and is designed toprovide data warehousing and querying supportfor read-only data. To overcome the batch pro-cessing in Hadoop based systems, Storm [24] in-troduced a distributed computing model that pro-cesses in-flight Twitter data. However, Storm as-sumes small, compact Twitter style data packetsthat can quickly migrate between different com-puting resources. This is not possible for largedata warehouses. For peer-to-peer networks, re-lated work includes distributed methods for query-ing concept hierarchies such as [25, 26, 27, 28].However, none of these methods provides real-timeOLAP functionality. There are various publicationson distributed B-trees for cloud platforms suchas [29]. However, these method only supports 1-dimensional indices which are insufficient for OLAPqueries. There have been efforts to build dis-tributed multi-dimensional indices on Cloud plat-forms based on R-trees or related multi-dimensionaltree structures, such as [30, 31, 32]. However, thesemethod do not support dimension hierarchies whichare essential for OLAP queries.

3. PDCR Trees

Consider a data warehouse with fact table F anda set of d dimensions {D1, D2, ..., Dd} where eachdimension Di, 1 ≤ i ≤ d has a hierarchy Hi in-cluding hierarchical attributes corresponding to thelevels of the hierarchy. The hierarchical attributesin the hierarchy of dimension i are organized asan ordered set Hi of parent-child relationships inthe hierarchy levels Hi = {Hi1, Hi2, ...,Hil} wherea parent logically summarizes and includes its chil-dren. Figure 1 shows the dimensions and hierarchylevels of each dimension for a 4-dimensional datawarehouse.

The sequential DC tree introduced by Kriegel etal. [14] exploits knowledge about the structure ofindividual dimension hierarchies both for compactdata representation and accelerated OLAP query

All Dims

ItemCustomerStore Date

Country

State

City

BYear

BMonth

BDay

Category

Class

Brand

Year

Month

Day

Figure 1: A 4-dimensional data warehouse with 3 hierarchylevels for each dimension. The first box for each dimensiondenotes the name of the dimension.

processing. In our previous work [13], we intro-duced the PDC tree, a parallel DC tree for a singlemulti-core processor. In this section, we outline amodified PDC tree, termed PDCR tree, which willbecome the building block for our CR-OLAP sys-tem. Here, we only outline the differences betweenthe PDCR tree and its predecessors, and we refer to[13, 14] for more details. We note that, our PDCRtree data structure is not just another distributedR-tree, but rather a multi-dimensional data struc-ture designed specifically to support efficient OLAPquery processing on the elaborate dimension hier-archies that are central to OLAP systems.

For a cloud architecture with multiple proces-sors, each processor will store one or more PDCRtrees. Our CR-OLAP system outlined in the follow-ing Section 4 requires that a sub-tree of a PDCRtree can be split off and transferred to another pro-cessor. This required us to (a) devise an array basedtree implementation that can be packed into a mes-sage to be sent between processors and (b) a carefulencoding of data values, using compact IDs relatedto the different dimension hierarchy levels. In thefollowing we outline some details of the encodingused for the PDCR tree.

IDs for each dimension represent available enti-ties in the dimension. Each dimension has a hierar-chy of entities with l levels. In the example of Fig-ure 1, an ID may represent an entity at the Countrylevel for the Store dimension, e.g. US or Canada.Similarly, another ID may represent an entity at theCity level, e.g. Chicago or Toronto. It is importantto note that an ID may summarize many IDs atlower hierarchy levels. To build an ID for a dimen-sion with l levels, we assign bj bits to the hierarchylevel j, 0 ≤ j ≤ l − 1. Different entities at each hi-erarchy level are assigned numerical values starting

4

with “1”. By concatenating the numerical values ofthe levels, a numerical value is created. We reservethe value zero to represent “All” or “*”. The exam-ple in Figure 2 shows an example of an entity atthe lowest hierarchy level of dimension Store. AnID for the state California will have a value of zerofor its descendant levels City and Store S key. Asa result, containment of IDs between different hi-erarchy levels can be tested via fast bit operations.Figure 3 illustrates IDs and their coverages in theStore dimension with respect to different hierarchylevels. As illustrated, each entity in level j (Coun-try) is a country specified by a numerical value andcovers cities that are represented using numericalvalues in level j + 1. Note that IDs used for citieswill have specific values at the city level, while theID of a country will have a value of zero at the citylevel and a specific value only at the country level.

The sequential DC tree introduced by Kriegelet al. [14] and our previous PDC tree [13] storeso called “minimum describing set” (MDS) entriesat each internal tree node to guide the query pro-cess; see [14] for details. The MDS concept wasdeveloped in [14] to better represent unordered di-mensions with dimension hierarchies. Experimentswith our CR-OLAP system showed that in a largercloud computing environment with multiple treedata structures, the number of MDS entries be-comes very large and unevenly distributed betweenthe different trees, leading to performance bottle-necks. On the other hand, the bit representation ofIDs outlined above gives us the opportunity to con-vert unordered dimensions into ordered dimensions,and then use traditional ranges instead of the MDSentries. An example is shown in Figure 4. Theranges lead to a much more compact tree storageand alleviated the above mentioned bottleneck. Itis important to note that, this internal ordering im-posed on dimensions is invisible to the user. OLAPqueries can still include unordered aggregate val-ues on any dimension such as “Total sales in theUS and Canada” or “Total sales in California andNew York”.

4. CR-OLAP: Cloud based Real-time OLAP

CR-OLAP utilizes a cloud infrastructure consist-ing of m + 1 multi-core processors where each pro-cessor executes up to k parallel threads. One ofthe m+1 multi-core processors is referred to as themaster, and the remaining m processors are calledworkers. The master receives from the users the

Level 0 Level 1 Level 2 . . . Level l

Store US California Los Angels Store S_key

01 01 1001 101 1011011

b0 bits b1 bits b2 bits bl bits

Figure 2: Illustration of the compact bit representation ofIDs.

1 2 3Country

City

Store S_key

1 2 3 1 2

123 4 5 6 7 8 9 10 11-13 14

HierarchyLevels

All

Figure 3: Example of relationships between different hierar-chy levels of a given dimension.

input stream of OLAP insert and query operations,and reports the results back to the users (in theform of references to memory locations where theworkers have deposited the query results). In or-der to ensure high throughput and low latency evenfor compute intensive OLAP queries that may needto aggregate large portions of the entire database,CR-OLAP utilizes several levels of parallelism: dis-tributed processing of multiple query and insertoperations among multiple workers, and parallelprocessing of multiple concurrent query and insertoperations within each worker. With increasingdatabase size, CR-OLAP will increase m by dynam-ically allocating additional processors within thecloud environment and re-arranging the distributedPDCR tree. This will ensure that both, the avail-able memory and processing capability will scalewith the database size.

Root1990-2000US - Canada

2000-2013US

2000-2013UK - Germany

. . .

C1

1995-2000California - Virginia

1993-2000Canada

C2

2000-2010New York -Texas

2000-2013California

Cn

2000-2005UK

2004-2013Germany

. . . . . . . . .

Figure 4: Example of a PDCR tree with 2 dimensions (Storeand Date).

5

We start by outlining the structure of a dis-tributed PDC tree and PDCR tree on m + 1 multi-core processors in a cloud environment. Considera single PDCR tree T storing the entire database.For a tunable depth parameter h, we refer to thetop h levels of T as the hat and we refer to the re-maining trees rooted at the leaves of the hat as thesubtrees s1, . . . , sn. Level h is referred to as the cutlevel. The hat will be stored at the master and thesubtrees s1, . . . , sn will be stored at the m workers.We assume n ≥ m and that each worker stores oneor more subtrees.

CR-OLAP starts with an empty database andone master processor (i.e. m = 0) storing an emptyhat (PDCR tree). Note that, DC trees [14], PDCtrees [13] and PDCR trees are leaf oriented. Alldata is stored in leafs called data nodes. Internalnodes are called directory nodes and contain ar-rays with routing information and aggregate val-ues. Directory nodes have a high capacity and fan-out of typically 10 - 20. As insert operations aresent to CR-OLAP, the size and height of the hat(PDCR tree) grows. When directory nodes of thehat reach height h, their children become roots atsubtrees stored at new worker nodes that are allo-cated through the cloud environment. An illustra-tion of such a distributed PDCR tree is shown inFigure 5.

Insertion

S1 S2 S3 S4

Query

Hat

Leaf Node

Figure 5: Illustration of a distributed PDCR tree.

For a typical database size, the hat will usuallycontain only directory nodes and all data will bestored in the subtrees s1, . . . , sn. After the initialset of data insertions, all leaf nodes in the hat willusually be directory nodes of height h, and the rootsof subtrees in workers will typically be directorynodes as well. As illustrated in Figure 5, both insertand query operations are executed concurrently.

4.1. Concurrent insert and query operations

Each query operation in the input stream ishanded to the master which traverses the hat. Notethat, at each directory node the query can gen-erate multiple parallel threads, depending on howmany child nodes have a non empty intersectionwith the query. Eventually, each query will accessa subset of the hat’s leaves, and then the query willbe transferred to the workers storing the subtreesrooted at those leaves. Each of those workers willthen in parallel execute the query on the respectivesubtrees, possibly generating more parallel threadswithin each subtree. For more details see Algo-rithm 3 and Algorithm 4.

For each insert operation in the input stream, themaster will search the hat, arriving at one of the leafnodes, and then forward the insert operation to theworker storing the subtree rooted at that leaf. Formore details see Algorithm 1 and Algorithm 2.

Figures 6 and 7 illustrate how new workers andnew subtrees are added as more data items get in-serted. Figures 6 illustrates insertions creating anoverflow at node A, resulting in a horizontal splitat A into A1 and A2 plus a new parent node C.Capacity overflow at C then triggers a vertical splitillustrated in 7. This creates two subtrees in two dif-ferent workers. As outlined in more details in theCR-OLAP “migration strategies” outlined below,new workers are requested from the cloud environ-ment when either new subtrees are created or whensubtree sizes exceed the memory of their host work-ers. Workers usually store multiple subtrees. How-ever, CR-OLAP randomly shuffles subtrees amongworkers. This ensures that query operations access-ing a contiguous range of leaf nodes in the hat createa distributed workload among workers.

R

A BCut

level

c = 2

R

C B

A1 A2

(a) (b)

Data Node

Directory Node

Figure 6: Insertions triggering creation of new workers andsubtrees. Part 1. (a) Current hat configuration. (b) Inser-tions create overflow at node A and horizontal split.

6

Algorithm 1: Hat Insertion

Input: D (new data item).Output: voidInitialization:Set ptr = root

Repeat:Determine the child node C of ptr that causesminimal MBR/MDS enlargement for thedistributed PDCR/PDC tree if D is insertedunder C. Resolve ties by minimal overlap, thenby minimal number of data nodes.Set ptr = C.Acquire a LOCK for C.Update MBR/MDS and TS of C.Release the LOCK for C.Until: ptr is a leaf node.if ptr is the parent of Data Nodes then

Acquire a LOCK for ptr.Insert D under ptr.Release the LOCK for C.if capacity of ptr is exceeded then

Call Horizontal Split for ptr.if capacity of the parent of ptr isexceeded then

Call Vertical Split for the parent ofptr.if depth of ptr is greater than hthen

Create a new subtree with theparent of ptr as its root, ptr andits sibling node as the childrenof the root.Choose the next availableworker and update the list ofsubtrees in the master.Send the new subtree and itsdata nodes to the chosen worker.

end

end

end

endif ptr is the parent of a subtree then

Find the worker that contains the subtreefrom the list of subtrees.Send the insertion transaction to theworker.

endEnd of Algorithm.

Algorithm 2: Subtree Insertion

Input: D (new data item).Output: voidInitialization:Set ptr = root

Repeat:Determine the child node C of ptr that causesminimal MBR/MDS enlargement for thedistributed PDCR/PDC tree if D is insertedunder C. Resolve ties by minimal overlap, thenby minimal number of data nodes.Set ptr = C.Acquire a LOCK for C.Update MBR/MDS and TS of C.Release the LOCK for C.Until: ptr is a leaf node.Acquire a LOCK for ptr.Insert D under ptr.Release the LOCK for C.if capacity of ptr is exceeded then

Call Horizontal Split for ptr.if capacity of the parent of ptr is exceededthen

Call Vertical Split for the parent of ptr.end

endEnd of Algorithm.

7

Algorithm 3: Hat Query

Input: Q (OLAP query).Output: A result set or an aggregate valueInitialization:Set ptr = rootPush ptr into a local stack S for query Q.

Repeat:Pop the top item ptr′ from stack S.if TS(time stamp) of ptr′ is smaller (earlier)than the TS of ptr then

Using the sibling links, traverse the siblingnodes of ptr until a node with TS equal tothe TS of ptr is met. Push the visitednodes including ptr into the stack (startingfrom the rightmost node) for reprocessing.

endfor each child C of ptr do

if MBR/MDS of C is fully contained inMBR/MDS of Q then

Add C and its measure value to theresult set.

endelse

if MBR/MDS of C overlapsMBR/MDS of Q then

if C is the root of a sub-tree thenSend the query Q to the workerthat contains the subtree.

endelse

Push C into the stack S.end

end

end

endUntil: stack S is empty.if the query Q is dispatched to a subtree then

Wait for the partial results of thedispatched queries from workers.Create the final result of the collectedpartial results.Send the final result back to the client.

endEnd of Algorithm.

Algorithm 4: Subtree Query

Input: Q (OLAP query).Output: A result set or an aggregate valueInitialization:Set ptr = rootPush ptr into a local stack S for query Q.

Repeat:Pop the top item ptr′ from stack S.if TS(time stamp) of ptr′ is smaller (earlier)than the TS of ptr then

Using the sibling links, traverse the siblingnodes of ptr until a node with TS equal tothe TS of ptr is met. Push the visitednodes including ptr into the stack startingfrom the rightmost node for reprocessing.

endfor each child C of ptr do

if MBR/MDS of C is fully contained inMBR/MDS of Q then

Add C and its measure value to theresult set.

endelse

if MBR/MDS of C overlapsMBR/MDS of Q then

Push C into the stack S.end

end

endUntil: stack S is empty.Send the result back to the master or clientdepending on whether Q is an aggregationquery or a data report query.End of Algorithm.

8

R

C B

A1 A2

Subtree 1

Subtree 2

R

C B

D A2

Worker 1

A1 A3

Worker 2

(a) (b)

Figure 7: Insertions triggering creation of new workers andsubtrees. Part 2. (a) Same as Figure 6b with critical subtreeshighlighted. (b) Insertions create overflow at node C andvertical split, triggering the creation of two subtrees in twodifferent workers.

For correct real time processing of an inputstream of mixed insert and query operations, CR-OLAP needs to ensure that the result for eachOLAP query includes all data inserted prior butno data inserted after the query was issued withinthe input stream. We will now discuss how this isachieved in a distributed cloud based system wherewe have a collection of subtrees in different work-ers, each of which is processing multiple concurrentinsert and query threads. In our previous work [13]we presented a method to ensure correct query pro-cessing for a single PDC tree on a mutli-core pro-cessor, where multiple insert and query operationsare processed concurrently. The PDC tree main-tains for each data or directory item a time stampindicating its most recent update, plus it maintainsfor all nodes of the same height a left-to-right linkedlist of all siblings. Furthermore, each query threadmaintains a stack of ancestors of the current nodeunder consideration, together with the time stampsof those items. We refer to [13] for more details.The PDCR tree presented in this paper inheritsthis mechanism for each of its subtrees. In fact, theabove mentioned sibling links are shown as horizon-tal links in Figures 6 and 7. With the PDCR treebeing a collection of subtrees, if we were to main-tain sibling links between subtrees to build linkedlist of siblings across all subtrees then we would en-sure correct query operation in the same way as forthe PDC tree [13]. However, since different subtreesof a PDCR tree typically reside on different work-ers, a PDCR tree only maintains sibling links inside

subtrees but it does not maintain sibling links be-tween different subtrees. The following lemma showthat correct real time processing of mixed insert andquery operations is still maintained.

Theorem 1. Consider the situation depicted inFigure 8 where the split of node B created a newnode D and subtree rooted at D that is stored sepa-rately from the subtrees rooted at A and B. Then,the sibling links labelled “a” and “c” are not re-quired for correct real time query processing (as de-fined above).

Proof. Assume a thread for a query Q that isreturning from searching the subtree below B onlyto discover that B has been modified. Let Bstack

be the old value of B that is stored in the stackstack(Q) associated with Q. If neither B nor anyancestor of B are in stack(Q) then Q does not con-tain any data covered by B. Otherwise, Q will fol-low the sibling link labelled “b” to find B′ and re-maining data from the node split off B.

P

A CD

B B’

a

b

c

Figure 8: Illustration for Theorem 1.

4.2. Load balancing

CR-OLAP is executed on a cloud platform with(m+1) processors (m workers and one master). Asdiscussed earlier, CR-OLAP uses the cloud’s elas-ticity to increase m as the number of data items in-creases. We now discuss in more detail CR-OLAP’smechanisms for worker allocation and load balanc-ing in the cloud. The insert operations discussedabove create independent subtrees for each height hleaf of the hat. Since internal (directory) nodes havea high degree (typically 10 - 20), a relatively smallheight of the hat typically leads to thousands ofheight h leaves and associated subtrees s1, . . . , sn.The master processor keeps track of the subtree lo-cations and allocation of new workers, and it makessure that a relatively high n/m ratio is maintained.

9

As indicated above, CR-OLAP shuffles thesen >> m subtrees among the m workers. This en-sures that threads of query operations are evenlydistributed over the workers. Furthermore, CR-OLAP performs load balancing among the workersto ensure both, balanced workload and memory uti-lization. The master processor keeps track of thecurrent sizes and number of active threads for allsubtrees. For each worker, its memory utilizationand workload are the total number of threads of itssubtrees and the total size if its subtrees, respec-tively.

If a worker w has a memory utilization above acertain threshold (e.g. 75% of its total memory),then the master processor determines the workerw′ with the lowest memory utilization and checkswhether it is possible to store an additional subtreefrom w while staying well below it’s memory thresh-old (e.g. 50% of its total memory). If that is notpossible, a new worker w′ is allocated within thecloud environment. Then, a subtree from w is com-pressed and sent from w to w′ via message passing.As discussed earlier, PDCR trees are implementedin array format and using only array indices aspointers. This enables fast compression and decom-pression of subtrees and greatly facilitates subtreemigration between workers. Similarly, if a worker whas a workload utilization that is a certain percent-age above the average workload of the m workersand is close to the maximum workload thresholdfor a single worker, then the master processor de-termines a worker w′ with the lowest workload andwell below its maximum workload threshold. If thatis not possible, a new worker w′ is allocated withinthe cloud environment. Then, the master processorinitiates the migration of one or more subtrees fromw (and possibly other workers) to w′.

5. Experimental Evaluation On AmazonEC2

5.1. Software

CR-OLAP was implemented in C++, using theg++ compiler, OpenMP for multi-threading, andZeroMQ [33] for message passing between proces-sors. Instead of the usual MPI message passing li-brary we chose ZeroMQ because it better supportscloud elasticity and the addition of new processorsduring runtime. CR-OLAP has various tunable pa-rameters. For our experiments we set the depth hof the hat to h = 3, the directory node capacity c

to c = 10 for the hat and c = 15 for the subtrees,and the number k of threads per worker to k = 16.

5.2. Hardware/OS

CR-OLAP was executed on the Amazon EC2cloud. For the master processor we used an AmazonEC2 m2.4xlarge instance: “High-Memory Quadru-ple Extra Large” with 8 virtual cores (64-bit ar-chitecture, 3.25 ECUs per core) rated at 26 com-pute units and with 68.4 GiB memory. For theworker processors we used Amazon EC2 m3.2xlargeinstances: “M3 Double Extra Large” with 8 vir-tual cores (64-bit architecture, 3.25 ECUs per core)rated at 26 compute units and with 30 GiB mem-ory. The OS image used was the standard AmazonCentOS (Linux) AMI.

5.3. Comparison baseline: STREAM-OLAP

As outlined in Section 2, there is no compari-son system for CR-OLAP that provides scalablecloud based OLAP with full real time capabil-ity and support for dimension hierarchies. To es-tablish a comparison baseline for CR-OLAP, wetherefore designed and implemented a STREAM-OLAP method which partitions the database be-tween multiple cloud processors based on one cho-sen dimension and uses parallel memory to cachestreaming on the cloud processors to answer OLAPqueries. More precisely, STREAM-OLAP buildsa 1-dimensional index on one ordered dimensiondstream and partitions the data into approx. 100×m arrays. The arrays are randomly shuffled be-tween the m workers. The master processor main-tains the 1-dimensional index. Each array repre-sents a segment of the dstream dimension and isaccessed via the 1-dimensional index. The arraysthemselves are unsorted, and insert operations sim-ply append the new item to the respective array.For query operations, the master determines via the1-dimensional index which arrays are relevant. Theworkers then search those arrays via linear search,using memory to cache streaming.

The comparison between CR-OLAP (usingPDCR trees) and STREAM-OLAP (using a 1-dimensional index and memory to cache streaming)is designed to examine the tradeoff between a so-phisticated data structure which needs fewer dataaccesses but is less cache efficient and a brute forcemethod which accesses much more data but opti-mizes cache performance.

10

5.4. Test data

For our experimental evaluation of CR-OLAPand STREAM-OLAP we used the standard TPC-DS “Decision Support” benchmark for OLAP sys-tems [34]. We selected “Store Sales”, the largestfact table available in TPC-DS. Figure 9 shows thefact table’s 8 dimensions, and the respective 8 di-mension hierarchies below each dimension. Thefirst box for each dimension denotes the dimensionname while the boxes below denote hierarchy lev-els from highest to lowest. Dimensions Store, Item,Address, and Promotion are unordered dimensions,while dimensions Customer, Date, Household andTime are ordered. TPC-DS provides a stream ofinsert and query operations on “Store Sales” whichwas used as input for CR-OLAP and STREAM-OLAP. For experiments where we were interestedin the impact of query coverage (the portion ofthe database that needs to be aggregated for anOLAP query), we selected sub-sequences of TPC-DS queries with the chosen coverages.

All Dims

ItemCustomerStore TimePromotionHouseholdDate Address

Country

Ordered Ordered

Ordered

Ordered

State

City

BYear

BMonth

BDay

Category

Class

Brand

Year

Month

Day

Country

State

City

Income Band

Name Hour

Minute

Figure 9: The 8 dimensions of the TPC-DS benchmark forthe fact table “Store Sales”. Boxes below each dimensionspecify between 1 and 3 hierarchy levels for the respectivedimension. Some dimensions are “ordered” and the remain-ing are not ordered.

5.5. Test results: impact of the number of workers(m) for fixed database size (N)

We tested how the time of insert and query oper-ations for CR-OLAP and STREAM-OLAP changesfor fixed database size (N) as we increase the num-ber of workers (m). Using a variable number ofworkers 1 ≤ m ≤ 8, we first inserted 40 millionitems (with d=8 dimensions) from the TPC-DSbenchmark into CR-OLAP and STREAM-OLAP,and then we executed 1,000 (insert or query) op-erations on CR-OLAP and STREAM-OLAP. Sinceworkers are virtual processors in the Amazon EC2cloud, there is always some performance fluctuationbecause of the virtualization. We found that the to-tal (or average) of 1,000 insert or query operations

is a sufficiently stable measure. The results of ourexperiments are shown in Figures 10, 11, and 12.

0

0.05

0.1

0.15

0.2

1 2 4 6 8

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Number of workers(m)

PDCR-tree1D-index

Figure 10: Time for 1000 insertions as a function of thenumber of workers. (N = 40Mil, d = 8, 1 ≤ m ≤ 8)

0

200

400

600

800

1000

1200

1400

1 2 4 6 8

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Number of workers(m)

PDCR-tree(10% coverage)1D-index(10% coverage)

PDCR-tree(60% coverage)1D-index(60% coverage)

PDCR-tree(95% coverage)1D-index(95% coverage)

Figure 11: Time for 1000 queries as a function of the numberof workers. (N = 40Mil, d = 8, 1 ≤ m ≤ 8)

0

1

2

3

4

5

6

7

8

1 2 4 6 8

Spee

dup

Number of workers(m)

Y=XPDCR-tree(10% coverage)1D-index(10% coverage)

PDCR-tree(60% coverage)1D-index(60% coverage)

PDCR-tree(95% coverage)1D-index(95% coverage)

Figure 12: Speedup for 1000 queries as a function of thenumber of workers. (N = 40Mil, d = 8, 1 ≤ m ≤ 8)

Figure 10 shows the time for 1,000 inser-tions in CR-OLAP (PDCR-tree) and STREAM-OLAP (1D-index) as a function of the number

11

of workers (m). As expected, insertion times inSTREAM-OLAP are lower than in CR-OLAP be-cause STREAM-OLAP simply appends the newitem in the respective array while CR-OLAP has toperform tree insertions with possible directory nodesplits and other overheads. However, STREAM-OLAP shows no speedup with increasing number ofworkers (because only one worker performs the ar-ray append operation) whereas CR-OLAP shows asignificant speedup (because the distributed PDCRtree makes use of the multiple workers). It is im-portant to note that insertion times are not visibleto the users because they do not create any userresponse. What is important to the user are theresponse times for OLAP queries. Figure 11 showsthe time for 1,000 OLAP queries in CR-OLAP andSTREAM-OLAP as a function of the number ofworkers (m). Figure 12 shows the speedup mea-sured for the same data. We selected OLAP querieswith 10%, 60% and 95% query coverage, whichrefers to the percentage of the entire range of val-ues for each dimension that is covered by a givenOLAP query. The selected OLAP queries there-fore aggregate a small, medium and large portionof the database, resulting in very different work-loads. We observe in Figure 11 that CR-OLAPsignificantly outperforms STREAM-OLAP with re-spect to query time. The difference in performanceis particularly pronounced for queries with smallor large coverages. For the former, the tree datastructure shows close to logarithmic performanceand for the latter, the tree can compose the re-sult by adding the aggregate values stored at a fewroots of large subtrees. The worst case scenariofor CR-OLAP are queries with medium coveragearound 60% where the tree performance is propor-tional to N1− 1

d . However, even in this worst casescenario, CR-OLAP outperforms STREAM-OLAP.Figure 12 indicates that both systems show a closeto linear speedup with increasing number of work-ers, however for CR-OLAP that speedup occurs formuch smaller absolute query times.

5.6. Test results: impact of growing system size (N& m combined)

In an elastic cloud environment, CR-OLAP andSTREAM-OLAP increase the number of workers(m) as the database size (N) increases. The impacton the performance of insert and query operationsis shown in Figures 13 and 14, respectively.

With growing system size, the time for insertoperations in CR-OLAP (PDCR-tree) approaches

0

0.05

0.1

0.15

0.2

N=10M,m=1

N=20M,m=2

N=40M,m=4

N=60M,m=6

N=80M,m=8

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Increasing system size(N,m)

PDCR-tree1D-index

Figure 13: Time for 1000 insertions as a function of systemsize: N & m combined. (10Mil ≤ N ≤ 80Mil, d = 8,1 ≤ m ≤ 8)

0

1000

2000

3000

4000

5000

N=10M,m=1

N=20M,m=2

N=40M,m=4

N=60M,m=6

N=80M,m=8

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Increasing system size(N,m)

PDCR-tree(10% coverage)1D-index(10% coverage)

PDCR-tree(60% coverage)1D-index(60% coverage)

PDCR-tree(95% coverage)1D-index(95% coverage)

Figure 14: Time for 1000 queries as a function of system size:N & m combined. (10Mil ≤ N ≤ 80Mil, d = 8, 1 ≤ m ≤ 8)

the time for STREAM-OLAP (1D-index). Moreimportantly however, the time for query opera-tions in CR-OLAP again outperforms the time forSTREAM-OLAP by a significant margin, as shownin Figure 14. Also, it is very interesting that forboth systems, the query performance remains es-sentially unchanged with increasing database sizeand number of workers. This is obvious forSTREAM-OLAP where the size of arrays to besearched simply remains constant but it is an im-portant observation for CR-OLAP. Figure 14 in-dicates that the overhead incurred by CR-OLAP’sload balancing mechanism (which grows with in-creasing m) is balanced out by the performancegained through more parallelism. CR-OLAP ap-pears to scale up without affecting the performanceof individual queries. It performed an 8-fold in-crease in database size and number of processors,including an 8-fold increase in the average amountof data aggregated by each OLAP query, withoutnoticeable performance impact for the user.

12

5.7. Test results: impact of the number of dimen-sions

It is well known that tree based search meth-ods can become problematic when the number ofdimensions in the database increases. In Figures15 and 16 we show the impact of increasing don the performance of insert and query operationsin CR-OLAP (PDCR-tree) and STREAM-OLAP(1D-index) for fixed database size N = 40 millionand m = 8 workers.

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

4 5 6 7 8

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Number of Dimensions(d)

PDCR-tree1D-index

Figure 15: Time for 1000 insertions as a function of thenumber of dimensions. (N = 40Mil, 4 ≤ d ≤ 8, m = 8)

0

100

200

300

400

500

600

700

800

4 5 6 7 8

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Number of dimensions(d)

PDCR-tree(10% coverage)1D-index(10% coverage)

PDCR-tree(60% coverage)1D-index(60% coverage)

PDCR-tree(95% coverage)1D-index(95% coverage)

Figure 16: Time for for 1000 queries as a function of the num-ber of dimensions. The values for “1D-index 95% coverage”are 828.6, 1166.4, 1238.5, 1419.7 and 1457.8, respectively.(N = 40Mil, 4 ≤ d ≤ 8, m = 8)

Figure 15 shows some increase in insert time forCR-OLAP because the PDCR tree insertion inher-its from the PDC tree a directory node split opera-tion with an optimization phase that is highly sen-sitive to the number of dimensions. However, theresult of the tree optimization is improved queryperformance in higher dimensions. As shown inFigure 16, the more important time for OLAP query

operations grows only slowly as the number of di-mensions increases. This is obvious for the arraysearch in STREAM-OLAP but for the tree searchin CR-OLAP this is an important observation.

5.8. Test results: impact of query coverages

Figures 17, 18, 19, and 20 show the impact ofquery coverage on query performance in CR-OLAP(PDCR-tree) and STREAM-OLAP (1D-index).

0

50

100

150

200

250

300

350

10 20 30 40 50 60 70 80 90

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Query coverage per dimension(%)

Store is *Customer is *

Item is *Date is *

Address is *Household is *Promotion is *

Time is *No dimension is *

Figure 17: Time for 1000 queries (PDCR tree) as a functionof query coverages: 10% − 90%. Impact of value “*” fordifferent dimensions. (N = 40Mil, m = 8, d = 8)

0

50

100

150

200

250

300

350

91 92 93 94 95 96 97 98 99

Tim

e pe

r 10

00 o

pera

tions

(sec

)

Query coverage per dimension(%)

Store is *Customer is *

Item is *Date is *

Address is *Household is *Promotion is *

Time is *No dimension is *

Figure 18: Time for 1000 queries (PDCR tree) as a functionof query coverages: 91% − 99%. Impact of value “*” fordifferent dimensions. (N = 40Mil, m = 8, d = 8)

For fixed database size N = 40Mil, number ofworkers m = 8, and number of dimensions d = 8,we vary the query coverage and observe the querytimes. In addition we observe the impact of a “*”in one of the query dimensions. Figures 17 and18 show that the “*” values do not have a signif-icant impact for CR-OLAP. As discussed earlier,CR-OLAP is most efficient for small and very largequery coverage, with maximum query time some-where in the mid range. (In this case, the maximum

13

0

5

10

15

20

10 20 30 40 50 60 70 80 90

Tim

e ra

tio o

f 1D

-inde

x ov

er P

DCR

-tre

e

Query coverage per dimension(%)

Store is *Customer is *

Item is *Date is *

Address is *Household is *Promotion is *

Time is *No dimension is *

Figure 19: Time comparison for 1000 queries (Ratio: 1D-index / PDCR tree) for query coverages 10%−90%. Impactof value “*” for different dimensions. (N = 40Mil, m = 8,d = 8)

0

5

10

15

20

91 92 93 94 95 96 97 98 99

Tim

e ra

tio o

f 1D

-inde

x ov

er P

DCR

-tre

e

Query coverage per dimension(%)

Store is *Customer is *

Item is *Date is *

Address is *Household is *Promotion is *

Time is *No dimension is *

Figure 20: Time comparison for 1000 queries (Ratio: 1D-index / PDCR tree) for query coverages 91%−99%. Impactof value “*” for different dimensions. (N = 40Mil, m = 8,d = 8)

point is shifted away from the typical 60% becauseof the “*” values.) Figures 19, and 20 show the per-formance of STREAM-OLAP as compared to CR-OLAP (ratio of query times). It shows that CR-OLAP consistently outperforms STREAM-OLAPby a factor between 5 and 20.

5.9. Test results: query time comparison for se-lected query patterns at different hierarchy lev-els

Figure 21 shows a query time comparison be-tween CR-OLAP (PDCR-tree) and STREAM-OLAP (1D-index) for selected query patterns. Forfixed database size N = 40Mil, number of work-ers m = 8 and d = 8 dimensions, we test for di-mension Date the impact of value “*” for differenthierarchy levels. CR-OLAP is designed for OLAPqueries such as “total sales in the stores located in

0

200

400

600

800

1000

1200

1400

1600

* * * y * * y m * y m d * m * * m d * * d

Tim

e pe

r 10

00 o

pera

tions

(sec

)

* in different hierarchy levels of dimension Date

PDCR-tree(10% coverage)PDCR-tree(60% coverage)PDCR-tree(95% coverage)1D-index(10% coverage)1D-index(60% coverage)1D-index(95% coverage)

Figure 21: Query time comparison for selected query pat-terns for dimension Date. Impact of value “*” for differenthierarchy levels of dimension Date. (N = 40Mil, m = 8,d = 8).

California and New York during February-May ofall years’ which act at different levels of multipledimension hierarchies. For this test, we created 7combinations of “*” and set values for hierarchylevels Year, Month, and Day: *-*-*, year-*-*, year-month-*, year-month-day, *-month-*, *-month-day,and *-*-day. We then selected for each combinationqueries with coverages 10%, 60%, and 95%. Thetest results are summarized in Figure 21. The mainobservation is that CR-OLAP consistently outper-forms STREAM-OLAP even for complex and verybroad queries that one would expect could be eas-ier solved through data streaming than through treesearch.

6. Conclusion

We introduced CR-OLAP, a Cloud based Real-time OLAP system based on a distributed PDCRtree, a new parallel and distributed index structurefor OLAP, and evaluated CR-OLAP on the Ama-zon EC2 cloud for a multitude of scenarios. Thetests demonstrate that CR-OLAP scales well withincreasing database size and increasing number ofcloud processors. In our experiments, CR-OLAPperformed an 8-fold increase in database size andnumber of processors, including an 8-fold increasein the average amount of data aggregated by eachOLAP query, without noticeable performance im-pact for the user.

Acknowledgment

The authors would like to acknowledge financialsupport from the IBM Centre for Advanced Stud-

14

ies Canada and the Natural Science and Engineer-ing Research Council of Canada. We thank the re-search staff at the IBM Centre for Advanced Stud-ies Canada, and in particular Stephan Jou, for theirsupport and helpful discussions.

References

[1] J. Han, M. Kamber, Data Mining: Concepts and Tech-niques, Morgan Kaufmann Publishers, 2000.

[2] The OLAP Report, http://www.olapreport.com.[3] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Re-

ichart, M. Venkatrao, F. Pellow, H. Pirahesh, DataCube: A Relational Aggregation Operator Generaliz-ing Group-By, Cross-Tab, and Sub-Totals, Data Min.Know. Disc. 1 (1997) 29–53.

[4] Y. Chen, F. Dehne, T. Eavis, A. Rau-Chaplin, PnP:sequential, external memory, and parallel iceberg cubecomputation, Distributed and Parallel Databases 23 (2)(2008) 99–126. doi:10.1007/s10619-007-7023-y.

[5] F. Dehne, T. Eavis, S. Hambrusch, Parallelizing thedata cube, Distributed and Parallel Databases 11 (2002)181–201.

[6] Z. Guo-Liang, C. Hong, L. Cui-Ping, W. Shan, Z. Tao,Parallel Data Cube Computation on Graphic Process-ing Units, Chinese Journal of Computers 33 (10) (2010)1788–1798.

[7] R. T. Ng, A. Wagner, Y. Yin, Iceberg-cube computationwith PC clusters, ACM SIGMOD 30 (2) (2001) 25–36.doi:10.1145/376284.375666.

[8] J. You, J. Xi, P. Zhang, H. Chen, A Parallel Algo-rithm for Closed Cube Computation, IEEE/ACIS In-ternational Conference on Computer and InformationScience (2008) 95–99doi:10.1109/ICIS.2008.63.

[9] R. Bruckner, B. List, J. Schiefer, Striving towards nearreal-time data integration for data warehouses, DaWaKLNCS 2454 (2002) 173–182.

[10] D. Jin, T. Tsuji, K. Higuchi, An Incremen-tal Maintenance Scheme of Data Cubes and ItsEvaluation, DASFAA LNCS 4947 (2008) 36–48.doi:10.2197/ipsjtrans.2.36.

[11] R. Santos, J. Bernardino, Real-time data warehouseloading methodology, IDEAS (2008) 49–58.

[12] R. J. Santos, J. Bernardino, Optimizing datawarehouse loading procedures for enabling useful-time data warehousing, IDEAS (2009) 292–299doi:10.1145/1620432.1620464.

[13] F. Dehne, H. Zaboli, Parallel real-time olap onmulti-core processors, in: Proceedings of the 201212th IEEE/ACM International Symposium on Cluster,Cloud and Grid Computing (ccgrid 2012), 2012, pp.588–594.

[14] M. Ester, J. Kohlhammer, H.-P. Kriegel, The dc-tree:A fully dynamic index structure for data warehouses,in: In Proceedings of the 16th International Conferenceon Data Engineering (ICDE, 2000, pp. 379–388.

[15] H. Plattner, A. Zeier, In-Memeory Data Management,Springer Verlag, 2011.

[16] M. Ester, J. Kohlhammer, H.-P. Kriegel, The DC-tree:a fully dynamic index structure for data warehouses,16th International Conference on Data Engineering(ICDE) (2000) 379–388doi:10.1109/ICDE.2000.839438.

[17] Hadoop.

[18] J. Dean, S. Ghemawat, Mapreduce: simplified data pro-cessing on large clusters, Commun. ACM 51 (1) (2008)107–113.

[19] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka,S. Anthony, H. Liu, P. Wyckoff, R. Murthy, Hive: awarehousing solution over a map-reduce framework,Proc. VLDB Endow. 2 (2) (2009) 1626–1629.

[20] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Sil-berschatz, A. Rasin, Hadoopdb: an architectural hy-brid of mapreduce and dbms technologies for analyticalworkloads, Proc. VLDB Endow. 2 (1) (2009) 922–933.

[21] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A.Wallach, M. Burrows, T. Chandra, A. Fikes, R. E. Gru-ber, Bigtable: A distributed storage system for struc-tured data, ACM Trans. Comput. Syst. 26 (2) (2008)4:1–4:26.

[22] Bigquery.[23] S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shiv-

akumar, M. Tolton, T. Vassilakis, Dremel: interactiveanalysis of web-scale datasets, Proc. VLDB Endow.3 (1-2) (2010) 330–339.

[24] Twitter storm.[25] K. Doka, D. Tsoumakos, N. Koziris, Online querying of

d-dimensional hierarchies, J. Parallel Distrib. Comput.71 (3) (2011) 424–437.

[26] A. Asiki, D. Tsoumakos, N. Koziris, Distributing andsearching concept hierarchies: an adaptive dht-basedsystem, Cluster Computing 13 (3) (2010) 257–276.

[27] K. Doka, D. Tsoumakos, N. Koziris, Brown dwarf: Afully-distributed, fault-tolerant data warehousing sys-tem, J. Parallel Distrib. Comput. 71 (11) (2011) 1434–1446.

[28] Y. Sismanis, A. Deligiannakis, N. Roussopoulos, Y. Ko-tidis, Dwarf: shrinking the petacube, in: Proceedingsof the 2002 ACM SIGMOD international conference onManagement of data, 2002, pp. 464–475.

[29] S. Wu, D. Jiang, B. C. Ooi, K.-L. Wu, Efficient b-treebased indexing for cloud data processing, Proc. VLDBEndow. 3 (1-2) (2010) 1207–1218.

[30] J. Wang, S. Wu, H. Gao, J. Li, B. C. Ooi, Indexingmulti-dimensional data in a cloud system, in: Proceed-ings of the 2010 ACM SIGMOD International Confer-ence on Management of data, 2010, pp. 591–602.

[31] X. Zhang, J. Ai, Z. Wang, J. Lu, X. Meng, An efficientmulti-dimensional index for cloud data management,in: Proceedings of the first international workshop onCloud data management, 2009, pp. 17–24.

[32] M. C. Kurt, G. Agrawal, A fault-tolerant environmentfor large-scale query processing, in: High PerformanceComputing (HiPC), 2012 19th International Conferenceon, 2012, pp. 1–10.

[33] Zeromq socket library as a concurrency framework.URL http://www.zeromq.org/

[34] Transaction processing performance council, tpc-ds (de-cision support) benchmark.

15