Parallel Multi-Dimensional ROLAP Indexing

Andrew Rau-Chaplin

Faculty of Computer Science

Dalhousie University

Joint work with

Frank Dehne, Carleton Univ.

Todd Eavis, Dalhousie Univ.

Data Warehousing for Decision Support

Operational data collected into DW

DW used to support multi-dimensional views

Views form the basis of OLAP processing

Our focus: the OLAP server

[Figure: data warehouse architecture in four tiers. Data cleaning and integration: external sources and operational databases feed the data warehouse through extract, clean, transform, load, and refresh steps. Data storage: data warehouse and data marts, with a metadata repository and monitoring/administration. OLAP engines: the OLAP server. Front-end tools: query/reports, analysis, and data mining.]

Multi-dimensional views

Collection of feature attributes

Aggregate along one or more measure attributes

Reduce the granularity by “collapsing” dimensions

Points generated by:

distributive functions (e.g., sum)

algebraic functions (e.g., average)

holistic functions (e.g., median)
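The three classes differ in what has to be kept in order to combine partial results from separate partitions. A minimal sketch of that difference follows; the function names and the sample values are hypothetical and only illustrate the definitions.

```python
from statistics import median

def merge_sum(partial_sums):
    """Distributive: partial sums combine directly into the total."""
    return sum(partial_sums)

def merge_avg(partial_summaries):
    """Algebraic: a fixed-size summary (sum, count) per partition is enough."""
    total = sum(s for s, _ in partial_summaries)
    count = sum(c for _, c in partial_summaries)
    return total / count

def merge_median(chunks):
    """Holistic: no fixed-size summary suffices, so all values are needed."""
    return median(v for chunk in chunks for v in chunk)

# Made-up measure values held in two partitions.
chunks = [[1, 2, 3], [10, 20]]
total = merge_sum([sum(c) for c in chunks])
average = merge_avg([(sum(c), len(c)) for c in chunks])
middle = merge_median(chunks)
```

Distributive and algebraic measures can therefore be rolled up from coarser cuboids, while a holistic measure generally has to go back to the underlying values.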

[Figure: data cube over Make (Chevy, Ford), Colour, and Year (1990, 1991), with aggregate faces By Make, By Colour, By Year, By Make & Colour, By Make & Year, and By Colour & Year.]

Data Cube Generation

Proposed by Gray et al. in 1995

Can be generated “manually” from a relational DB but this is very inefficient

Exploit the relationship between cuboids to compute all 2^d cuboids

In OLAP environments, we typically pre-compute these views to improve query response time
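As an illustration of exploiting the relationship between cuboids, the sketch below derives each of the 2^d group-bys from an already computed parent with one more dimension, rather than from the raw data each time. It assumes a sum measure and an in-memory dictionary per cuboid; the function names and sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def aggregate(rows, dims, keep):
    """Roll a cuboid (key tuple -> summed measure) up onto the dimensions in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, measure in rows.items():
        out[tuple(key[i] for i in idx)] += measure
    return dict(out)

def full_cube(base, dims):
    """Compute all 2^d cuboids, each from its smallest already-computed parent."""
    cube = {tuple(dims): base}
    for k in range(len(dims) - 1, -1, -1):
        for keep in combinations(dims, k):
            parents = [v for v in cube
                       if len(v) == k + 1 and set(keep) <= set(v)]
            parent = min(parents, key=lambda v: len(cube[v]))
            cube[keep] = aggregate(cube[parent], parent, keep)
    return cube

# Made-up base rows: (make, colour, year) -> sales.
base = {("Chevy", "red", 1990): 5, ("Chevy", "blue", 1990): 3,
        ("Ford", "red", 1991): 4, ("Ford", "red", 1990): 2}
cube = full_cube(base, ("make", "colour", "year"))
```

Choosing the smallest parent is one simple cost heuristic; it is not claimed to be the rule used in the cited work.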

[Figure: lattice fragment with cuboids AB, AC, BC.]

Existing Parallel Results

Goil & Choudhary

MOLAP solution

in-memory structures

global partition + d communication rounds

distributed views

Limitations

Memory for multi-dimensional arrays

expensive communication for larger d

J. of Data Mining & Knowledge Discovery 1(4), 1997

Our Approach

ROLAP solution

Construct and cost the data cube lattice

Find a “least cost” spanning tree

Partition the spanning tree over the processors equally, construct views and distribute

Can handle partial cubes

Limitations

What about indexing?

[Figure: data cube lattice for dimensions A, B, C, D, with levels ABC ABD ACD BCD; AB AC AD BC BD CD; A B C D.]

CCGrid’01 + J. Dist. & Parallel Databases 11(2), 2001
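The lattice, spanning tree, and partitioning steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the cardinalities, the size estimate, and the greedy subtree assignment are all hypothetical stand-ins, not the costing or partitioning method of the cited papers.

```python
from itertools import combinations
from math import prod

# Hypothetical dimension cardinalities and row count, for illustration only.
CARD = {"A": 10, "B": 5, "C": 2, "D": 50}
ROWS = 1000

def size(view):
    """Crude size estimate: product of cardinalities, capped by the row count."""
    return min(ROWS, prod(CARD[d] for d in view))

def lattice(dims):
    """All 2^d views, from the base cuboid down to the empty group-by."""
    return [v for k in range(len(dims), -1, -1) for v in combinations(dims, k)]

def spanning_tree(dims):
    """Give every non-root view its cheapest parent (one extra dimension)."""
    views = lattice(dims)
    tree = {}
    for v in views[1:]:
        parents = [p for p in views
                   if len(p) == len(v) + 1 and set(v) <= set(p)]
        tree[v] = min(parents, key=size)
    return tree

def subtree_cost(tree, root):
    """Estimated cost of `root` plus every view derived from it."""
    kids = [v for v, parent in tree.items() if parent == root]
    return size(root) + sum(subtree_cost(tree, k) for k in kids)

def partition(tree, root, p):
    """Greedy stand-in: give each top-level subtree to the least-loaded processor."""
    loads, owner = [0] * p, {}
    tops = [v for v, parent in tree.items() if parent == root]
    for v in sorted(tops, key=lambda t: -subtree_cost(tree, t)):
        i = loads.index(min(loads))
        owner[v] = i
        loads[i] += subtree_cost(tree, v)
    return owner, loads

DIMS = ("A", "B", "C", "D")
tree = spanning_tree(DIMS)
owner, loads = partition(tree, DIMS, 2)
```

A partial cube would fit the same sketch by restricting `lattice` to the selected views, provided each retained view still has a retained ancestor to be computed from.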

Parallel Multi-dimensional Indexing

Query specifies a range on multiple dimensions

Forms a hypercube in the point space

General Approach

No multidimensional index is universally successful

Exploit domain specific information and the features of a particular index

OLAP: data is provided up front; updates are batch oriented

Design Goals

A framework for distributed high-performance indexing of ROLAP cubes

Practical to implement

Low communication volume

Fully adapted to external memory (disks)

No shared disk required

Incrementally maintainable

Efficient for high-dimensional spatial searches

Scalable in terms of data size, dimensions, processors

Challenge

How to order and partition data such that

The number of records retrieved per node is as balanced as possible

The number of disk seeks required in answering a query is minimized

[Figure: view ABC partitioned across processors P1, P2, P3, P4.]

Indexing the Data Cube

Combine the strengths of a space-filling curve and an R-tree index

Use Hilbert curve to load buckets

Index buckets with R-tree

Update indexes with merge/sort
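A minimal sketch of Hilbert-ordered bucket loading with bounding-box lookup, assuming two dimensions and a power-of-two grid for brevity. Only the leaf level is shown: a full R-tree would group these bucket boxes recursively into upper levels. The function names and parameters are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def pack(points, n, capacity):
    """Sort points along the curve and cut them into buckets with bounding boxes."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, *p))
    buckets = []
    for i in range(0, len(ordered), capacity):
        chunk = ordered[i:i + capacity]
        xs, ys = zip(*chunk)
        buckets.append(((min(xs), min(ys), max(xs), max(ys)), chunk))
    return buckets

def range_query(buckets, x1, y1, x2, y2):
    """Read only buckets whose bounding box overlaps the query rectangle."""
    hits = []
    for (bx1, by1, bx2, by2), chunk in buckets:
        if bx1 <= x2 and x1 <= bx2 and by1 <= y2 and y1 <= by2:
            hits += [p for p in chunk if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]
    return hits

cells = [(x, y) for x in range(4) for y in range(4)]
buckets = pack(cells, 4, capacity=4)
result = range_query(buckets, 1, 1, 2, 2)
```

Because nearby points on the curve are nearby in space, each bucket covers a compact region, and a batch update can be merged into the sorted order rather than inserted point by point.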

Space Filling Curves & Striping

Query Retrieval

[Figure: a query on view ABC is sent to processors P1, P2, P3, P4, each holding its own partition of ABC.]

Example

[Figure: three panels: Original Space, Processor 1, Processor 2.]

8 points to be reported

Reports: 2 consecutive blocks & 4 points

The Parallel Framework

A single view is partitioned across p processors

Partial Hilbert/R-tree indexes are computed locally

Queries are answered concurrently

Queries answered individually or “piggy-backed”
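A minimal sketch of the framework, assuming the view's records are already in space-filling-curve order and are striped round-robin in fixed-size blocks. The round-robin policy, the function names, and the use of threads as stand-ins for cluster nodes are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def stripe(ordered_records, p, block_size):
    """Deal consecutive blocks of curve-ordered records to p processors round-robin."""
    parts = [[] for _ in range(p)]
    starts = range(0, len(ordered_records), block_size)
    for b, start in enumerate(starts):
        parts[b % p].extend(ordered_records[start:start + block_size])
    return parts

def answer(parts, predicate):
    """Each 'processor' filters its local partition; threads stand in for nodes."""
    def local(part):
        return [r for r in part if predicate(r)]
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return list(pool.map(local, parts))

# 32 records identified by their position along the curve.
records = list(range(32))
parts = stripe(records, p=4, block_size=2)
local_results = answer(parts, lambda r: 8 <= r < 24)
per_node = [len(r) for r in local_results]
```

In this example the query covers a contiguous run of 16 records and each of the four partitions returns 4 of them, which is the balance that striping is meant to provide.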

The Virtual Data Cube

Problem: Full cube often too large to materialize

Solution: Use surrogate views
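One way to picture a surrogate view: a query on a group-by that was not materialized is answered from a materialized view that contains all of its dimensions, aggregating on the fly. The sketch below assumes a sum measure and picks the smallest such view; that selection rule, the function names, and the sample rows are hypothetical.

```python
from collections import defaultdict

def pick_surrogate(query_dims, materialized):
    """Smallest materialized view that contains every queried dimension."""
    candidates = [v for v in materialized if set(query_dims) <= set(v)]
    return min(candidates, key=lambda v: len(materialized[v]))

def answer_from_surrogate(query_dims, materialized):
    """Aggregate the surrogate's rows down to the queried dimensions."""
    view = pick_surrogate(query_dims, materialized)
    idx = [view.index(d) for d in query_dims]
    out = defaultdict(int)
    for key, measure in materialized[view].items():
        out[tuple(key[i] for i in idx)] += measure
    return dict(out)

# Made-up partial cube: only two of the eight views are stored.
materialized = {
    ("make", "colour", "year"): {
        ("Chevy", "red", 1990): 5, ("Chevy", "blue", 1990): 3,
        ("Ford", "red", 1991): 4, ("Ford", "red", 1990): 2},
    ("make", "year"): {
        ("Chevy", 1990): 8, ("Ford", 1991): 4, ("Ford", 1990): 2},
}
surrogate = pick_surrogate(("make",), materialized)
by_make = answer_from_surrogate(("make",), materialized)
```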

Surrogate Processing

Other issues…

Dimension ordering

Query piggybacking

Batch updating

Managing hierarchies of views

Experimental Results

Machine

17-node cluster

Node = 1.8 GHz Xeon, 1 GB RAM, 2 * 40 GB IDE drives, running Linux

Interconnect = Intel Fast Ethernet switch

Test Data

10 dimensions and 1,000,000 records

RCUBE index Construction

Output: ~640 million rows, 16 Gigabytes

Distributed Query Resolution

Test: Random queries returning ~15% of points (10 experiments per point)

Disk blocks retrieved vs. Disk Seeks

Test: Random queries returning 5-15% of points (15 experiments per point)

Distributed Query Resolution in Surrogate Group-bys

Thank You

Questions?
