Generating the Data Cube (Shared Disk)
Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University
Joint work with F. Dehne, T. Eavis, and S. Hambrusch
Data Cube Generation
Proposed by Gray et al. in 1995. A data cube can be generated from a relational DB, but...
[Figure: the cuboid ABC (or CAB) shown as a three-dimensional array of aggregate values, together with the lattice of its cuboids: ABC; AB, AC, BC; A, B, C; ALL]
As a table
Model Year Colour Sales
Chevy 1990 Blue 87
Chevy 1990 Red 5
Chevy 1990 ALL 92
Chevy ALL Blue 87
Chevy ALL Red 5
Chevy ALL ALL 92
Ford 1990 Blue 99
Ford 1990 Green 64
Ford 1990 ALL 163
Ford 1991 Blue 7
Ford 1991 Red 8
Ford 1991 ALL 15
Ford ALL Blue 106
Ford ALL Green 64
Ford ALL Red 8
ALL 1990 Blue 186
ALL 1990 Green 64
ALL 1991 Blue 7
ALL 1991 Red 8
Ford ALL ALL 178
ALL 1990 ALL 255
ALL 1991 ALL 15
ALL ALL Blue 193
ALL ALL Green 64
ALL ALL Red 13
ALL ALL ALL 270
The raw input relation:
Model Year Colour Sales
Chevy 1990 Red 5
Chevy 1990 Blue 87
Ford 1990 Green 64
Ford 1990 Blue 99
Ford 1991 Red 8
Ford 1991 Blue 7
The Challenge
Input data set R: |R| is typically in the millions and usually will not fit into memory.
Number of dimensions d: 10-30, giving 2^d cuboids in the data cube.
How do we solve this highly data- and computation-intensive problem in parallel?
Existing Parallel Results
Goil & Choudhary: MOLAP approach.
Parallelize the generation of each cuboid.
Challenge: more than 2^d communication rounds.
Overview
1) Data cubes
2) Review of sequential cubing algorithms
3) Our top-down parallel algorithm
4) Conclusions and open problems
Optimizations based on computing multiple cuboids
Smallest-parent: compute a cuboid from the smallest previously computed cuboid.
Cache-results: cache in memory the results of a cuboid from which other cuboids are computed, to reduce disk I/O.
Amortize-scans: amortize disk reads by computing as many cuboids as possible per scan.
Share-sorts: share sorting cost across cuboids.
[Figure: the lattice for four dimensions: ABCD; ABC, ABD, ACD, BCD; AB, AC, AD, BC, BD, CD; A, B, C, D; ALL]
Many Algorithms
Pipesort [AADGNRS'96]
PipeHash [SAG'96]
Overlap [DANR'96]
ArrayCube [ZDN'97]
Bottom-Up-Cube [BR'99]
Partition Cube [RS'97]
Memory Cube [RS'97]
Approaches
Top-down: Pipesort [AADGNRS'96], PipeHash [SAG'96], Overlap [DANR'96]
Bottom-up: Bottom-Up-Cube [BR'99], Partition Cube [RS'97], Memory Cube [RS'97]
Array-based: ArrayCube [ZDN'97]
Our Results
A framework for parallelization of existing sequential data cube algorithms: top-down, bottom-up, and array-based.
Architecture independent.
Communication efficient: avoids irregular communication patterns, uses few large messages, and overlaps computation with communication.
Today's Focus: the top-down approach
[Figure: the four-dimensional lattice again, from ABCD down to ALL]
Top-Down Algorithms
Find a "least cost" spanning tree of the lattice.
Use estimators of cuboid size.
Exploit data shrinking, pipelining, and cuts vs. sorts.
Cut vs. Sort Ordering
Starting from ABCD:
Cutting ABCD -> ABC: linear time (ABC is a prefix of the ABCD order, so one scan suffices).
Sorting ABCD -> ABD: sort time (ABD is not a prefix, so the data must be re-sorted).
The size of ABC may be much smaller than ABCD.
[Figure: records sorted in ABCD order, showing runs of rows that share common prefixes]
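As a concrete illustration (mine, not from the slides), here is a minimal Python sketch of the two cases: once the rows are sorted in ABCD order, a prefix cuboid such as ABC falls out of one linear aggregation pass, whereas a non-prefix cuboid such as ABD first requires a re-sort. The row layout and a sum measure are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

def cut_prefix_cuboid(rows_sorted_abcd, prefix_len):
    """Aggregate a prefix cuboid (e.g. ABC from ABCD) in one linear scan.

    rows_sorted_abcd: list of (a, b, c, d, measure) tuples already sorted
    in ABCD order; prefix_len: number of leading dimensions to keep.
    """
    key = itemgetter(*range(prefix_len))          # group key = leading dims
    out = []
    for dims, group in groupby(rows_sorted_abcd, key=key):
        out.append((dims, sum(r[-1] for r in group)))   # aggregate the measure
    return out

def sorted_cuboid(rows, dim_indices):
    """Non-prefix cuboid (e.g. ABD): re-sort first, then scan. Costs a sort."""
    key = itemgetter(*dim_indices)
    rows = sorted(rows, key=key)                  # O(n log n) re-sort
    out = []
    for dims, group in groupby(rows, key=key):
        out.append((dims, sum(r[-1] for r in group)))
    return out
```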
Pipesort [AADGNRS'96]
Minimize sorting while seeking to compute each cuboid from its smallest parent.
Pipeline sorts that share common prefixes.
[Figure: the four-dimensional lattice reduced to a spanning tree of sort orders, e.g. CBAD at the root with children CBA, BAD, ACD, BCD, then BA, AC, AD, CB, DB, CD, then A, B, C, D, down to ALL]
Level-by-Level Optimization
Minimum cost matching in a bipartite graph.
Scan edges are drawn solid, sort edges dashed.
Establish the dimension ordering working up the lattice.
[Figure: (a) possible pathways from AB, AC, BC down to A, B, C; (b) the transformed search lattice with scan and sort edge costs; (c) the resulting minimum cost matching]
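The sketch below is my own simplified illustration of that level-by-level matching idea, not code from the talk: each candidate parent is duplicated into a "scan" copy (usable by at most one child whose attributes are a prefix of the parent's order) and "sort" copies for any remaining children, and SciPy's assignment solver picks the minimum-cost pairing. The cuboid names and cost values are made up, and SciPy is an assumed dependency.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical example: decide how to compute the level-1 cuboids
# A, B, C from the level-2 cuboids AB, AC, BC.
children = ["A", "B", "C"]
parents = ["AB", "AC", "BC"]
scan_cost = {"AB": 10, "AC": 12, "BC": 13}   # reuse the parent's sort order
sort_cost = {"AB": 22, "AC": 55, "BC": 20}   # re-sort the parent

# Duplicate each parent: one "scan" copy plus enough "sort" copies so every
# child could, in principle, be matched to it.
nodes = []                                    # (parent, kind)
for p in parents:
    nodes.append((p, "scan"))
    for _ in range(len(children) - 1):
        nodes.append((p, "sort"))

INF = 10**9
cost = np.full((len(nodes), len(children)), INF, dtype=float)
for i, (p, kind) in enumerate(nodes):
    for j, c in enumerate(children):
        if not set(c) <= set(p):
            continue                          # parent cannot produce this child
        if kind == "scan":
            if p.startswith(c):               # prefix: a cheap scan suffices
                cost[i, j] = scan_cost[p]
        else:
            cost[i, j] = sort_cost[p]

rows, cols = linear_sum_assignment(cost)      # minimum cost matching
for i, j in zip(rows, cols):
    if cost[i, j] < INF:
        p, kind = nodes[i]
        print(f"compute {children[j]} from {p} via {kind} (cost {cost[i, j]:.0f})")
```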
Overview
1) Data cubes
2) Review of sequential cubing algorithms
3) Our top-down parallel algorithm
4) Conclusions and open problems
Top-Down Parallel: The Idea
Cut the process tree into p "equal weight" subtrees.
Each processor generates the cuboids of its subtree independently.
Load balance / stripe the output.
[Figure: the spanning tree rooted at CBAD partitioned into subtrees]
The Basic Algorithm
(1) Construct a lattice housing all 2^d views.
(2) Estimate the size of each of the views in the lattice.
(3) To determine the cost of using a given view to directly compute its children, use its estimated size to calculate (a) the cost of scanning the view and (b) the cost of sorting it.
(4) Using the bipartite matching technique presented in the original IBM paper, reduce the lattice to a spanning tree that identifies the appropriate set of prefix-ordered sort paths.
(5) Add the I/O estimates to the spanning tree.
(6) Partition the tree into p sub-trees.
(7) Distribute the sub-tree lists to each of the p compute nodes.
(8) On each node, use the sequential Pipesort algorithm to build the set of local views.
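Below is a rough, high-level Python sketch of these eight steps as I read them from the slide. Every function it calls (estimate_size, build_min_cost_spanning_tree, min_max_partition, pipesort, scatter) is a placeholder standing in for a component described elsewhere in the talk, not the authors' actual implementation, and the sort-cost proxy is a deliberately crude assumption.

```python
from itertools import combinations

def parallel_top_down_cube(dims, R, p, estimate_size,
                           build_min_cost_spanning_tree,
                           min_max_partition, pipesort, scatter, my_rank):
    """High-level sketch of the 8-step shared-disk framework (placeholder APIs)."""
    # (1) Lattice of all 2^d views: each view is a subset of the dimensions.
    views = [frozenset(s) for k in range(len(dims) + 1)
             for s in combinations(dims, k)]

    # (2)-(3) Estimated sizes give per-view scan and sort costs.
    size = {v: estimate_size(v, R) for v in views}
    scan = {v: size[v] for v in views}                     # scan cost ~ |view|
    sort = {v: size[v] * max(1, len(v)) for v in views}    # crude sort-cost proxy

    # (4)-(5) Reduce the lattice to a spanning tree of prefix-ordered sort
    # paths, weighted with the I/O estimates.
    tree = build_min_cost_spanning_tree(views, scan, sort)

    # (6) Min-max partition the tree into p subtrees of roughly equal weight.
    subtrees = min_max_partition(tree, p)

    # (7) Distribute one subtree (a list of views) to each compute node.
    my_subtree = scatter(subtrees, my_rank)

    # (8) Each node runs sequential Pipesort on its own subtree.
    return pipesort(my_subtree, R)
```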
Tree Partitioning
What does "equal weight" mean?
We want to minimize the weight of the maximum-weight partition.
Tree Partitioning: min-max tree k-partitioning.
Given a tree T with n vertices and a positive weight assigned to each vertex, delete k edges in the tree to obtain k+1 connected components T1, T2, ..., Tk+1 such that the largest total weight of a resulting sub-tree is minimized.
O(Rk(k + log d) + n) time - Becker, Perl and Schach '82.
O(n) time - Frederickson 1990.
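For intuition only, here is a small sketch of a simpler, textbook-style way to attack min-max tree k-partitioning: binary search over the allowed component weight, with a greedy bottom-up feasibility check that cuts off the heaviest child subtrees whenever a component would overflow the bound. This is not the Becker/Perl/Schach or Frederickson algorithm cited above, and it assumes integer vertex weights and a tree given as an adjacency dict.

```python
def feasible(tree, weights, root, k, bound):
    """Greedy check: can we delete <= k edges so that every resulting
    component has total weight <= bound?"""
    if max(weights.values()) > bound:
        return False
    cuts = 0
    remaining = {}                          # weight of v's still-attached subtree

    def visit(v, parent):
        nonlocal cuts
        total = weights[v]
        kids = [u for u in tree[v] if u != parent]
        for u in kids:
            visit(u, v)
            total += remaining[u]
        kids.sort(key=lambda u: remaining[u], reverse=True)
        for u in kids:                      # cut heaviest children while too big
            if total <= bound:
                break
            total -= remaining[u]
            cuts += 1
        remaining[v] = total

    visit(root, None)
    return cuts <= k

def min_max_k_partition(tree, weights, root, k):
    """Binary search for the smallest bound achievable with <= k cuts."""
    lo, hi = max(weights.values()), sum(weights.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(tree, weights, root, k, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Example: a path a-b-c with weights 3, 1, 3; one cut gives max weight 4.
tree = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(min_max_k_partition(tree, {"a": 3, "b": 1, "c": 3}, "a", 1))  # -> 4
```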
Dynamic Min-Max
[Figure: a weighted process tree (raw data -> ABC -> AB, BC -> A) with example cuboid weights such as 125, 47, 15, and 8]
Over-sampling: instead of cutting the tree into p subtrees directly, cut it into s * p subtrees and then pack them into p subsets.
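A minimal sketch of the packing step implied by over-sampling, assuming a simple greedy longest-processing-time heuristic (my choice for illustration; the talk does not specify how the s * p subtrees are packed into p subsets): sort the subtrees by estimated weight and always assign the next one to the currently lightest processor.

```python
import heapq

def pack_subtrees(subtree_weights, p):
    """Greedy LPT packing: assign s*p weighted subtrees to p processors."""
    bins = [(0, i, []) for i in range(p)]           # (load, proc id, subtree ids)
    heapq.heapify(bins)
    for idx, w in sorted(enumerate(subtree_weights),
                         key=lambda t: t[1], reverse=True):
        load, proc, items = heapq.heappop(bins)     # lightest processor so far
        items.append(idx)
        heapq.heappush(bins, (load + w, proc, items))
    return {proc: items for _, proc, items in bins}

# Example: 8 subtrees (over-sampling factor s = 4, p = 2 processors).
print(pack_subtrees([125, 47, 15, 15, 8, 125, 125, 21], 2))
```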
Implementation Issues
1) Sort optimization
2) Minimizing data movement
3) Efficient aggregation operations
4) Disk optimizations
1) Sort Optimization
qsort is slow: it may degrade to O(n^2) when there are many duplicate keys.
When the cardinality of a dimension is small, the range of keys is small, so radix sort applies.
Dynamically select between well-optimized radix and quick sorts.
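As an assumed illustration of that selection rule (the threshold and key layout are made up, not the authors' values): pick a counting/radix-style sort when the key range is small relative to the number of records, and fall back to a comparison sort otherwise.

```python
def sort_records(records, key_of, key_range):
    """Choose a sort based on key cardinality (illustrative threshold only).

    records: list of records; key_of: maps a record to an integer key in
    [0, key_range); key_range: number of distinct key values.
    """
    if key_range <= 4 * len(records):            # small range: counting sort
        buckets = [[] for _ in range(key_range)]
        for r in records:
            buckets[key_of(r)].append(r)
        return [r for b in buckets for r in b]   # stable, linear time
    return sorted(records, key=key_of)           # otherwise comparison sort

# Example with a low-cardinality "colour" column encoded as small integers.
rows = [(2, "Ford"), (0, "Chevy"), (1, "Ford"), (0, "Ford")]
print(sort_records(rows, key_of=lambda r: r[0], key_range=3))
```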
2) Minimizing Data Movement
Sort pointers to the records, not the records themselves.
Never reorder the columns.
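A small sketch of the pointer-sort idea, assuming column-wise storage (an illustration, not the authors' data layout): sort an array of record indices by the required dimension order and leave the columns themselves untouched.

```python
# Columns stay in place; only the index array is permuted.
model  = ["Chevy", "Ford", "Ford", "Chevy"]
year   = [1990, 1991, 1990, 1990]
colour = ["Blue", "Red", "Green", "Red"]
sales  = [87, 8, 64, 5]

order = sorted(range(len(sales)),
               key=lambda i: (model[i], year[i], colour[i]))  # Model-Year-Colour order

for i in order:                          # traverse records in sorted order
    print(model[i], year[i], colour[i], sales[i])
```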
3) Efficient Aggregation Operations
One pass over the data for each pipeline.
Do lazy aggregation.
[Figure: records sorted in ABCD order feeding a pipeline that produces ABCD, ABC, AB, A, and ALL in a single pass]
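Here is a compact Python sketch (my illustration, assuming a sum measure and tuple-keyed rows) of one pipeline pass: with the input sorted in, say, ABCD order, the cuboids ABCD, ABC, AB, A, and ALL can all be emitted during a single scan by flushing each prefix's running total only when that prefix changes.

```python
def pipeline_aggregate(sorted_rows, num_dims):
    """One pass over rows sorted by all num_dims dimensions.

    sorted_rows: iterable of (dims_tuple, measure). Emits every prefix cuboid
    (full, full-1, ..., ALL) lazily: a prefix's partial sum is flushed only
    when that prefix changes, and flushing cascades to the longer prefixes.
    """
    results = {k: [] for k in range(num_dims + 1)}   # prefix length -> rows
    current = [None] * (num_dims + 1)                # current key per prefix
    totals = [0] * (num_dims + 1)                    # running sum per prefix

    def flush(level):
        # Emitting this level also forces out every longer prefix nested in it.
        for k in range(num_dims, level - 1, -1):
            if current[k] is not None:
                results[k].append((current[k], totals[k]))
            totals[k] = 0

    for dims, m in sorted_rows:
        # Find the shortest prefix that changed; flush it and everything longer.
        level = next((k for k in range(num_dims + 1)
                      if current[k] != dims[:k]), None)
        if level is not None:
            flush(level)
            for k in range(level, num_dims + 1):
                current[k] = dims[:k]
        for k in range(num_dims + 1):
            totals[k] += m
    flush(0)
    return results
```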
4) Disk Optimizations
Avoid OS buffering.
Implemented an I/O manager that manages its own buffers to avoid thrashing and does I/O in a separate process to overlap it with computation.
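The sketch below shows the overlap idea in general terms and is an assumption, not the authors' I/O manager: a background thread (the talk uses a separate process) prefetches fixed-size blocks into a small bounded queue while the main loop consumes the previous block.

```python
import threading
import queue

def read_blocks(path, block_size, out_q):
    """Producer: prefetch fixed-size blocks so reads overlap with computation."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            out_q.put(block)               # blocks when the consumer falls behind
            if not block:                  # empty bytes object signals EOF
                return

def process_file_overlapped(path, process_block, block_size=1 << 20):
    q = queue.Queue(maxsize=2)             # bounded: at most 2 blocks in flight
    reader = threading.Thread(target=read_blocks,
                              args=(path, block_size, q), daemon=True)
    reader.start()
    while True:
        block = q.get()                     # consume while the next block loads
        if not block:
            break
        process_block(block)
    reader.join()
```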
[Results figures: speedup and efficiency on a cluster; speedup and efficiency on a SunFire; scaling with increasing data size; varying the over-sampling factor; varying skew]
Conclusions
A new communication-efficient parallel cubing framework for top-down, bottom-up, and array-based approaches.
Easy to implement (sort of), and architecture independent.
Thank you!
Questions?