C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY...

34
C-Store: Self-Organ izing Tuple Reconst ruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009

Transcript of C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY...

Page 1: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

C-Store: Self-Organizing Tuple Reconstruction

Jianlin FengSchool of SoftwareSUN YAT-SEN UNIVERSITYApr. 17, 2009

Page 2: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Review of Tuple Reconstruction Stitch together separate column values of the

same logical tuple. Join on Tuple IDs/positions.

Two Strategies Early materialization Late matertialization

Page 3: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Motivation

Tuple Reconstruction is easy if columns are sorted in the same order

However the pre-requisite can not always be preserved. During query processing, many operators (joins, g

roup by, order by, etc.) are not tuple order-preserving.

Page 4: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

The Ultimate Access Pattern

For each relation R, we have one copy for each attribute in R. each copy is pre-sorted on the corresponding attribute.

All tuple reconstruction initiated by a restriction on an attribute R.a, can be done using the copy that is sorted on R.a.

The Limitations: Space constraint Idle time for the pre-sortings.

Page 5: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

The Proposed Solution

Partial Sideways Cracking Uses auxiliary self-organizing data structures to m

aterialize mappings between pairs of attributes used together for tuple reconstruction.

Background MonetDB Selection-based Cracking

Page 6: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

MonetDB: http://monetdb.cwi.nl/ Every relation table

is represented as a collection of Binary Association Tables (BATs).

Each BAT is a set of two columns For a relation R of k attributes, there exists k BATs. Each BAT stores (key, attr) pairs. In each BAT, keys are system generated tuple IDs. For base BAT, the key column is typically virtual.

Like STORAGE KEY in Read Store of C-Store.

Page 7: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

MonetDB’s Basic Operators (1) select(A, v1, v2)

Searches all (key, attr) pairs in base column A for attribute values between v1 and v2.

Output: A list of keys/positions.

In the output, the tuple order is usually preserved.

Page 8: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

MonetDB’s Basic Operators (2) join(j1, j2)

Performs a join between attr1 of j1 and attr2 of j2. Output:

A list of (key1, key2) pairs. In the output, the tuple order is mainly preserved f

or outer join.

Page 9: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Outer Join

An outer join does not require each record in the two joined tables to have a matching record.

The joined table retains each record—even if no other matching record exists. Left outer join Right outer join Full outer join

Page 10: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Left Outer Join

Page 11: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

MonetDB’s Basic Operators (3) reconstruct(A, r)

Output: All (key, attr) pairs of base column A at the position spec

ified by r.

Page 12: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Selection-Based Cracking Cracker column

The first time an attribute A is required by a query, a copy of column A is created, called the cracker column CA of A.

Each selection operator on A triggers a range-based physical reorganization of CA.

Each cracker column, has a cracker index (AVL-tree) to maintain partitioning information.

Future queries benefit from the physically clustered data and do not need to access the whole column.

Page 13: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

AVL-Tree

An AVL tree is a self-balancing binary search tree.

In an AVL tree, the heights of the two child subtrees of any node differ by at most one.

Page 14: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

An example of an unbalanced non-AVL tree

Page 15: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

The same tree after being height-balanced

Page 16: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Order for Tuple Reconstruction The order in which tuples are inserted is used

for tuple construction. Physical reorganization happens only on cracker

columns.

Page 17: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

The crackers.select Operator

crackers.select(A, v1, v2) First, it creates CA if it does not exist.

It searches the index of CA for the area where v1 and v2 fall.

If the bounds do not exist, i.e., no query used them in the past, then CA is physically reorganized to cluster all qualifying tuples into a contiguous area.

Output: A list of keys/positions.

Page 18: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Cracker Map

A cracker map MAB is defined as a two-column table over two attributes A and B of a relation R. Values of A are stored in the left column, called he

ad. Values of B are stored in the right column, called t

ail.

Values of A and B in the same position of MAB belong to the same tuple.

Page 19: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Maps Are Created on Demand Only When a query q needs access to attribute B

based on a restriction on attribute A and MAB does not exist,

then q will create MAB by performing a scan over base columns A and B.

For each cracker map MAB , there is a cracker index (AVL-tree) that maintains information about how A values are distributed over MAB.

Page 20: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Queries Trigger Cracking

Query Style Access B based on A.

Each such query triggers cracking (physical reorganization) of MAB based on the restriction applied to A.

Cracking All tuples with values of A that qualify the restrictio

n are in a contiguous area in MAB . Realized by splitting a piece of MAB into two or thre

e new pieces.

Page 21: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

The sideways.select(A, v1, v2, B) Operator Returns tuples of attribute B of relation R based on a

predicate on attribute A of R as follows:(1) If there is no cracker map MAB , then create one.

(2) Search the index of MAB to find the contiguous area w of the pieces related to the restriction σ on A.

If σdoes not match existing piece boundaries,

(3) Physically reorganize w to move false hits out of the contiguous area of qualifying tuples.

(4) Update the cracker index of MAB accordingly.

(5) Return a non-materialized view of the tail of w.

Page 22: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.
Page 23: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Multi-Projection Queries

A single-selection query q that projects n attributes requires n maps, one for each attribute to be projected.

Select B, C

From R

Where A < 4;

For this query, we need 2 maps MAB and MAc .

All maps that have been created using A as head are collected in the map set SA.

Page 24: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.
Page 25: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Adaptive Alignment

The Problem Naïve use of the sideways.select operator may le

ad to non-aligned cracker maps. The Solution

Extend the sideways.select operator with an alignment step to keep the alignment maps .

The Basic Idea Is to apply all physical reorganizations, due to sele

ctions on an attribute A, in the same order to all maps in the map set SA.

Page 26: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Cracker Tape

For each map set SA, introduce a cracker tape TA. TA logs (in order of their occurrence) all selections on attribute A t

hat trigger cracking of any map in SA.

Each map MAx is equipped with a cursor pointing to the entry in T

A that represents the last crack on MAx.

Given a tape TA , a map MAx is aligned (synchronized) by successively forwarding its cursor towards the end of MAx

and incrementally cracking MAx according to all selections it passes on its way.

All maps whose cursors point to the same position in TA , are physically aligned.

Page 27: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

The Extended sideways.select Operator

Page 28: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.
Page 29: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Map Set Choice: Self-organizing Histograms Following the “cracking philosophy”

In an unpredictable environment with no idle system time, always perform the minimum investment.

In this way, for a query q, a set SA is chosen such that the restriction on A is the most selective in q. Yielding a minimal bit vector

The most selective restriction can be found using the cracker indices.

Page 30: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Complex Queries

No other (relational) operators, rather than tuple reconstruction, depends on tuple insertion order. Joins,aggregations, groupings, etc.

Potentially many operators can exploit the clustering information in the maps. A MAX operator can consider only the last piece o

f a map. Such directions are for future work.

Page 31: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Experimental Analysis

Compare the implementation of selection and sideways cracking on top of MonetDB,

Against the latest non-cracking version of MonetDB,

And against MonetDB on presorted data. Results

Sideways cracking achieves similar performance to presorted data.

But does not have the heavy initial cost and the restrictions on updates and workload prediction.

Page 32: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Partial Sideways Cracking

Consider storage restriction Partial Maps

Maps are only partially materialized driven by the workload.

A map consists of several chunks. Each chunk is a separate two-column table. Each chunk contains a given value range of the

head attribute of this map. Each chunk is cracked separately.

Page 33: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

A Research Direction

Improving performance by compression C-Store uses compression heavily.

Can we integrate compression with cracking?

Page 34: C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

References

S. Idreos, M. L. Kersten, S. Manegold. Self-organizing Tuple Reconstruction in Column-stores. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, Accepted for publication, June 2009.

Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel R. Madden 。 Materialization Strategies in a Column-Oriented DBMS . Proceedings of ICDE, April, 2007, Istanbul, Turkey.