Generating the Data Cube (Shared Disk)
Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University
Joint work with F. Dehne, T. Eavis, and S. Hambrusch
Data Cube Generation
Proposed by Gray et al. in 1995. A data cube can be generated from a relational DB, but...
[Figure: the cuboid ABC (or CAB) shown as a three-dimensional array of aggregate values, together with the lattice of its cuboids: ABC; AB, AC, BC; A, B, C; ALL]
As a table
Model Year Colour Sales
Chevy 1990 Blue 87
Chevy 1990 Red 5
Chevy 1990 ALL 92
Chevy ALL Blue 87
Chevy ALL Red 5
Chevy ALL ALL 92
Ford 1990 Blue 99
Ford 1990 Green 64
Ford 1990 ALL 163
Ford 1991 Blue 7
Ford 1991 Red 8
Ford 1991 ALL 15
Ford ALL Blue 106
Ford ALL Green 64
Ford ALL Red 8
ALL 1990 Blue 186
ALL 1990 Green 64
ALL 1991 Blue 7
ALL 1991 Red 8
Ford ALL ALL 178
ALL 1990 ALL 255
ALL 1991 ALL 15
ALL ALL Blue 193
ALL ALL Green 64
ALL ALL Red 13
ALL ALL ALL 270
The raw input relation:
Model Year Colour Sales
Chevy 1990 Red 5
Chevy 1990 Blue 87
Ford 1990 Green 64
Ford 1990 Blue 99
Ford 1991 Red 8
Ford 1991 Blue 7
The Challenge
Input data set R: |R| is typically in the millions and usually will not fit into memory.
Number of dimensions d: 10-30, giving 2^d cuboids in the data cube.
How do we solve this highly data- and computation-intensive problem in parallel?
Existing Parallel Results
Goil & Choudhary: MOLAP approach.
Parallelize the generation of each cuboid.
Challenge: more than 2^d communication rounds.
Overview
1) Data cubes
2) Review of sequential cubing algorithms
3) Our top-down parallel algorithm
4) Conclusions and open problems
Optimizations based on computing multiple cuboids
Smallest-parent: compute a cuboid from the smallest previously computed cuboid.
Cache-results: cache in memory the results of a cuboid from which other cuboids are computed, to reduce disk I/O.
Amortize-scans: amortize disk reads by computing as many cuboids as possible per scan.
Share-sorts: share sorting cost across cuboids.
[Figure: the lattice for four dimensions: ABCD; ABC, ABD, ACD, BCD; AB, AC, AD, BC, BD, CD; A, B, C, D; ALL]
Many Algorithms
Pipesort [AADGNRS'96]
PipeHash [SAG'96]
Overlap [DANR'96]
ArrayCube [ZDN'97]
Bottom-Up-Cube [BR'99]
Partition Cube [RS'97]
Memory Cube [RS'97]
Approaches
Top-down: Pipesort [AADGNRS'96], PipeHash [SAG'96], Overlap [DANR'96]
Bottom-up: Bottom-Up-Cube [BR'99], Partition Cube [RS'97], Memory Cube [RS'97]
Array-based: ArrayCube [ZDN'97]
Our Results
A framework for parallelization of existing sequential data cube algorithms: top-down, bottom-up, and array-based.
Architecture independent.
Communication efficient: avoids irregular communication patterns, uses few large messages, and overlaps computation with communication.
Today's Focus: the top-down approach
[Figure: the four-dimensional lattice again, from ABCD down to ALL]
Top-Down Algorithms
Find a "least cost" spanning tree of the lattice.
Use estimators of cuboid size.
Exploit data shrinking, pipelining, and cuts vs. sorts.
Cut vs. Sort Ordering
Starting from ABCD:
Cutting ABCD -> ABC: linear time (ABC is a prefix of the ABCD order, so one scan suffices).
Sorting ABCD -> ABD: sort time (ABD is not a prefix, so the data must be re-sorted).
The size of ABC may be much smaller than ABCD.
[Figure: records sorted in ABCD order, showing runs of rows that share common prefixes]
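As a concrete illustration (mine, not from the slides), here is a minimal Python sketch of the two cases: once the rows are sorted in ABCD order, a prefix cuboid such as ABC falls out of one linear aggregation pass, whereas a non-prefix cuboid such as ABD first requires a re-sort. The row layout and a sum measure are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

def cut_prefix_cuboid(rows_sorted_abcd, prefix_len):
    """Aggregate a prefix cuboid (e.g. ABC from ABCD) in one linear scan.

    rows_sorted_abcd: list of (a, b, c, d, measure) tuples already sorted
    in ABCD order; prefix_len: number of leading dimensions to keep.
    """
    key = itemgetter(*range(prefix_len))          # group key = leading dims
    out = []
    for dims, group in groupby(rows_sorted_abcd, key=key):
        out.append((dims, sum(r[-1] for r in group)))   # aggregate the measure
    return out

def sorted_cuboid(rows, dim_indices):
    """Non-prefix cuboid (e.g. ABD): re-sort first, then scan. Costs a sort."""
    key = itemgetter(*dim_indices)
    rows = sorted(rows, key=key)                  # O(n log n) re-sort
    out = []
    for dims, group in groupby(rows, key=key):
        out.append((dims, sum(r[-1] for r in group)))
    return out
```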
Pipesort [AADGNRS'96]
Minimize sorting while seeking to compute each cuboid from its smallest parent.
Pipeline sorts that share common prefixes.
[Figure: the four-dimensional lattice reduced to a spanning tree of sort orders, e.g. CBAD at the root with children CBA, BAD, ACD, BCD, then BA, AC, AD, CB, DB, CD, then A, B, C, D, down to ALL]
Level-by-Level Optimization
Minimum cost matching in a bipartite graph.
Scan edges are drawn solid, sort edges dashed.
Establish the dimension ordering working up the lattice.
[Figure: (a) possible pathways from AB, AC, BC down to A, B, C; (b) the transformed search lattice with scan and sort edge costs; (c) the resulting minimum cost matching]
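The sketch below is my own simplified illustration of that level-by-level matching idea, not code from the talk: each candidate parent is duplicated into a "scan" copy (usable by at most one child whose attributes are a prefix of the parent's order) and "sort" copies for any remaining children, and SciPy's assignment solver picks the minimum-cost pairing. The cuboid names and cost values are made up, and SciPy is an assumed dependency.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical example: decide how to compute the level-1 cuboids
# A, B, C from the level-2 cuboids AB, AC, BC.
children = ["A", "B", "C"]
parents = ["AB", "AC", "BC"]
scan_cost = {"AB": 10, "AC": 12, "BC": 13}   # reuse the parent's sort order
sort_cost = {"AB": 22, "AC": 55, "BC": 20}   # re-sort the parent

# Duplicate each parent: one "scan" copy plus enough "sort" copies so every
# child could, in principle, be matched to it.
nodes = []                                    # (parent, kind)
for p in parents:
    nodes.append((p, "scan"))
    for _ in range(len(children) - 1):
        nodes.append((p, "sort"))

INF = 10**9
cost = np.full((len(nodes), len(children)), INF, dtype=float)
for i, (p, kind) in enumerate(nodes):
    for j, c in enumerate(children):
        if not set(c) <= set(p):
            continue                          # parent cannot produce this child
        if kind == "scan":
            if p.startswith(c):               # prefix: a cheap scan suffices
                cost[i, j] = scan_cost[p]
        else:
            cost[i, j] = sort_cost[p]

rows, cols = linear_sum_assignment(cost)      # minimum cost matching
for i, j in zip(rows, cols):
    if cost[i, j] < INF:
        p, kind = nodes[i]
        print(f"compute {children[j]} from {p} via {kind} (cost {cost[i, j]:.0f})")
```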
Overview
1) Data cubes
2) Review of sequential cubing algorithms
3) Our top-down parallel algorithm
4) Conclusions and open problems
Top-Down Parallel: The Idea
Cut the process tree into p "equal weight" subtrees.
Each processor generates the cuboids of its subtree independently.
Load balance / stripe the output.
[Figure: the spanning tree rooted at CBAD partitioned into subtrees]
The Basic Algorithm
(1) Construct a lattice housing all 2^d views.
(2) Estimate the size of each of the views in the lattice.
(3) To determine the cost of using a given view to directly compute its children, use its estimated size to calculate (a) the cost of scanning the view and (b) the cost of sorting it.
(4) Using the bipartite matching technique presented in the original IBM paper, reduce the lattice to a spanning tree that identifies the appropriate set of prefix-ordered sort paths.
(5) Add the I/O estimates to the spanning tree.
(6) Partition the tree into p sub-trees.
(7) Distribute the sub-tree lists to each of the p compute nodes.
(8) On each node, use the sequential Pipesort algorithm to build the set of local views.
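Below is a rough, high-level Python sketch of these eight steps as I read them from the slide. Every function it calls (estimate_size, build_min_cost_spanning_tree, min_max_partition, pipesort, scatter) is a placeholder standing in for a component described elsewhere in the talk, not the authors' actual implementation, and the sort-cost proxy is a deliberately crude assumption.

```python
from itertools import combinations

def parallel_top_down_cube(dims, R, p, estimate_size,
                           build_min_cost_spanning_tree,
                           min_max_partition, pipesort, scatter, my_rank):
    """High-level sketch of the 8-step shared-disk framework (placeholder APIs)."""
    # (1) Lattice of all 2^d views: each view is a subset of the dimensions.
    views = [frozenset(s) for k in range(len(dims) + 1)
             for s in combinations(dims, k)]

    # (2)-(3) Estimated sizes give per-view scan and sort costs.
    size = {v: estimate_size(v, R) for v in views}
    scan = {v: size[v] for v in views}                     # scan cost ~ |view|
    sort = {v: size[v] * max(1, len(v)) for v in views}    # crude sort-cost proxy

    # (4)-(5) Reduce the lattice to a spanning tree of prefix-ordered sort
    # paths, weighted with the I/O estimates.
    tree = build_min_cost_spanning_tree(views, scan, sort)

    # (6) Min-max partition the tree into p subtrees of roughly equal weight.
    subtrees = min_max_partition(tree, p)

    # (7) Distribute one subtree (a list of views) to each compute node.
    my_subtree = scatter(subtrees, my_rank)

    # (8) Each node runs sequential Pipesort on its own subtree.
    return pipesort(my_subtree, R)
```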
Tree Partitioning
What does "equal weight" mean?
We want to minimize the weight of the maximum-weight partition.
Tree Partitioning: min-max tree k-partitioning.
Given a tree T with n vertices and a positive weight assigned to each vertex, delete k edges in the tree to obtain k+1 connected components T1, T2, ..., Tk+1 such that the largest total weight of a resulting sub-tree is minimized.
O(Rk(k + log d) + n) time - Becker, Perl and Schach '82.
O(n) time - Frederickson 1990.
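For intuition only, here is a small sketch of a simpler, textbook-style way to attack min-max tree k-partitioning: binary search over the allowed component weight, with a greedy bottom-up feasibility check that cuts off the heaviest child subtrees whenever a component would overflow the bound. This is not the Becker/Perl/Schach or Frederickson algorithm cited above, and it assumes integer vertex weights and a tree given as an adjacency dict.

```python
def feasible(tree, weights, root, k, bound):
    """Greedy check: can we delete <= k edges so that every resulting
    component has total weight <= bound?"""
    if max(weights.values()) > bound:
        return False
    cuts = 0
    remaining = {}                          # weight of v's still-attached subtree

    def visit(v, parent):
        nonlocal cuts
        total = weights[v]
        kids = [u for u in tree[v] if u != parent]
        for u in kids:
            visit(u, v)
            total += remaining[u]
        kids.sort(key=lambda u: remaining[u], reverse=True)
        for u in kids:                      # cut heaviest children while too big
            if total <= bound:
                break
            total -= remaining[u]
            cuts += 1
        remaining[v] = total

    visit(root, None)
    return cuts <= k

def min_max_k_partition(tree, weights, root, k):
    """Binary search for the smallest bound achievable with <= k cuts."""
    lo, hi = max(weights.values()), sum(weights.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(tree, weights, root, k, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Example: a path a-b-c with weights 3, 1, 3; one cut gives max weight 4.
tree = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(min_max_k_partition(tree, {"a": 3, "b": 1, "c": 3}, "a", 1))  # -> 4
```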
Dynamic Min-Max
[Figure: a weighted process tree (raw data -> ABC -> AB, BC -> A) with example cuboid weights such as 125, 47, 15, and 8]
Over-sampling: instead of cutting the tree into p subtrees directly, cut it into s * p subtrees and then pack them into p subsets.
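A minimal sketch of the packing step implied by over-sampling, assuming a simple greedy longest-processing-time heuristic (my choice for illustration; the talk does not specify how the s * p subtrees are packed into p subsets): sort the subtrees by estimated weight and always assign the next one to the currently lightest processor.

```python
import heapq

def pack_subtrees(subtree_weights, p):
    """Greedy LPT packing: assign s*p weighted subtrees to p processors."""
    bins = [(0, i, []) for i in range(p)]           # (load, proc id, subtree ids)
    heapq.heapify(bins)
    for idx, w in sorted(enumerate(subtree_weights),
                         key=lambda t: t[1], reverse=True):
        load, proc, items = heapq.heappop(bins)     # lightest processor so far
        items.append(idx)
        heapq.heappush(bins, (load + w, proc, items))
    return {proc: items for _, proc, items in bins}

# Example: 8 subtrees (over-sampling factor s = 4, p = 2 processors).
print(pack_subtrees([125, 47, 15, 15, 8, 125, 125, 21], 2))
```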
Implementation Issues
1) Sort optimization
2) Minimizing data movement
3) Efficient aggregation operations
4) Disk optimizations
1) Sort Optimization
qsort is slow: it may degrade to O(n^2) when there are many duplicate keys.
When the cardinality of a dimension is small, the range of keys is small, so radix sort applies.
Dynamically select between well-optimized radix and quick sorts.
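As an assumed illustration of that selection rule (the threshold and key layout are made up, not the authors' values): pick a counting/radix-style sort when the key range is small relative to the number of records, and fall back to a comparison sort otherwise.

```python
def sort_records(records, key_of, key_range):
    """Choose a sort based on key cardinality (illustrative threshold only).

    records: list of records; key_of: maps a record to an integer key in
    [0, key_range); key_range: number of distinct key values.
    """
    if key_range <= 4 * len(records):            # small range: counting sort
        buckets = [[] for _ in range(key_range)]
        for r in records:
            buckets[key_of(r)].append(r)
        return [r for b in buckets for r in b]   # stable, linear time
    return sorted(records, key=key_of)           # otherwise comparison sort

# Example with a low-cardinality "colour" column encoded as small integers.
rows = [(2, "Ford"), (0, "Chevy"), (1, "Ford"), (0, "Ford")]
print(sort_records(rows, key_of=lambda r: r[0], key_range=3))
```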
2) Minimizing Data Movement
Sort pointers to the records, not the records themselves.
Never reorder the columns.
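A small sketch of the pointer-sort idea, assuming column-wise storage (an illustration, not the authors' data layout): sort an array of record indices by the required dimension order and leave the columns themselves untouched.

```python
# Columns stay in place; only the index array is permuted.
model  = ["Chevy", "Ford", "Ford", "Chevy"]
year   = [1990, 1991, 1990, 1990]
colour = ["Blue", "Red", "Green", "Red"]
sales  = [87, 8, 64, 5]

order = sorted(range(len(sales)),
               key=lambda i: (model[i], year[i], colour[i]))  # Model-Year-Colour order

for i in order:                          # traverse records in sorted order
    print(model[i], year[i], colour[i], sales[i])
```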
3) Efficient Aggregation Operations
One pass over the data for each pipeline.
Do lazy aggregation.
[Figure: records sorted in ABCD order feeding a pipeline that produces ABCD, ABC, AB, A, and ALL in a single pass]
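Here is a compact Python sketch (my illustration, assuming a sum measure and tuple-keyed rows) of one pipeline pass: with the input sorted in, say, ABCD order, the cuboids ABCD, ABC, AB, A, and ALL can all be emitted during a single scan by flushing each prefix's running total only when that prefix changes.

```python
def pipeline_aggregate(sorted_rows, num_dims):
    """One pass over rows sorted by all num_dims dimensions.

    sorted_rows: iterable of (dims_tuple, measure). Emits every prefix cuboid
    (full, full-1, ..., ALL) lazily: a prefix's partial sum is flushed only
    when that prefix changes, and flushing cascades to the longer prefixes.
    """
    results = {k: [] for k in range(num_dims + 1)}   # prefix length -> rows
    current = [None] * (num_dims + 1)                # current key per prefix
    totals = [0] * (num_dims + 1)                    # running sum per prefix

    def flush(level):
        # Emitting this level also forces out every longer prefix nested in it.
        for k in range(num_dims, level - 1, -1):
            if current[k] is not None:
                results[k].append((current[k], totals[k]))
            totals[k] = 0

    for dims, m in sorted_rows:
        # Find the shortest prefix that changed; flush it and everything longer.
        level = next((k for k in range(num_dims + 1)
                      if current[k] != dims[:k]), None)
        if level is not None:
            flush(level)
            for k in range(level, num_dims + 1):
                current[k] = dims[:k]
        for k in range(num_dims + 1):
            totals[k] += m
    flush(0)
    return results
```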
4) Disk Optimizations
Avoid OS buffering.
Implemented an I/O manager that manages its own buffers to avoid thrashing and does I/O in a separate process to overlap it with computation.
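The sketch below shows the overlap idea in general terms and is an assumption, not the authors' I/O manager: a background thread (the talk uses a separate process) prefetches fixed-size blocks into a small bounded queue while the main loop consumes the previous block.

```python
import threading
import queue

def read_blocks(path, block_size, out_q):
    """Producer: prefetch fixed-size blocks so reads overlap with computation."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            out_q.put(block)               # blocks when the consumer falls behind
            if not block:                  # empty bytes object signals EOF
                return

def process_file_overlapped(path, process_block, block_size=1 << 20):
    q = queue.Queue(maxsize=2)             # bounded: at most 2 blocks in flight
    reader = threading.Thread(target=read_blocks,
                              args=(path, block_size, q), daemon=True)
    reader.start()
    while True:
        block = q.get()                     # consume while the next block loads
        if not block:
            break
        process_block(block)
    reader.join()
```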
[Results figures: speedup and efficiency on a cluster; speedup and efficiency on a SunFire; scaling with increasing data size; varying the over-sampling factor; varying skew]
Conclusions
A new communication-efficient parallel cubing framework for top-down, bottom-up, and array-based approaches.
Easy to implement (sort of), and architecture independent.
Thank you!
Questions?