Compression Aware Physical Database Design


Page 1: Compression Aware Physical Database Design

Compression Aware Physical Database Design

Hideaki Kimura* (Brown University), [email protected]
Vivek Narasayya, Manoj Syamala (Microsoft Research), {viveknar,manojsy}@microsoft.com

(*) Graduates soon. On Job Market.

Page 2: Compression Aware Physical Database Design

Background: Compression in DB

Every major DBMS supports compression
◦ Saves storage consumption
◦ Saves I/O bandwidth

[Diagram: tables and indexes are stored as compressed data. The query processing engine decompresses on SELECT and compresses on INSERT.]

DBMS A: 4x! DBMS B: 10x! DBMS C: 12x!

Page 3: Compression Aware Physical Database Design

Compression Schemes in DB

Dictionary Encoding
◦ Local dict. (Oracle, SQL Server)
◦ Global dict. (DB2)
Example: City = Seattle, San Jose, Seattle, .. is stored as 1, 2, 1, .. plus the dictionary 1: Seattle, 2: San Jose.

NULL Suppression
Example: Price = 000321, 000054, 000015, .. is stored as @321, @54, @15, ..

Prefix Suppression, LZO, RLE…
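A minimal sketch of dictionary encoding and NULL suppression, using the City and Price values shown on this slide. The function names are illustrative, and the "@" marker only mirrors the slide's notation; real engines use their own on-disk formats.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = {}  # value -> code; codes start at 1, as on the slide
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1
        codes.append(dictionary[value])
    return dictionary, codes


def null_suppress(fixed_width_values):
    """Drop the leading zero padding and mark the suppressed prefix with '@'."""
    return ["@" + (v.lstrip("0") or "0") for v in fixed_width_values]


cities = ["Seattle", "San Jose", "Seattle"]
dictionary, codes = dictionary_encode(cities)

prices = ["000321", "000054", "000015"]
suppressed = null_suppress(prices)

print(dictionary, codes)  # {'Seattle': 1, 'San Jose': 2} [1, 2, 1]
print(suppressed)         # ['@321', '@54', '@15']
```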

Page 4: Compression Aware Physical Database Design

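Two Types of Compression in DB

Order Independent
◦ NULL-Supp.
◦ Global dict.
◦ …
Size of IAB = size of IBA.

Order Dependent
◦ Run Length Enc.
◦ Local dict.
◦ …
Size of IAB ≠ size of IBA (values are fragmented across pages).

[Figure: indexes IAB and IBA over the same rows. In IAB, column A reads 000AA x4, 00BBB x4 and column B reads X X Y Y X X Y Y. In IBA, column B reads X X X X Y Y Y Y and column A reads 000AA, 000AA, 00BBB, 00BBB, 000AA, 000AA, 00BBB, 00BBB. With NULL suppression every value shrinks to @AA or @BBB in both orders; with an order-dependent scheme each page stores a value once and refers back to it (↑), so the fragmented order of A in IBA compresses worse.]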
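The difference between the two types can be checked on the slide's example rows. The sketch below is only an illustration: run-length encoding stands in for the order-dependent schemes, a character count stands in for NULL suppression, and all function names are made up for this example.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def null_suppressed_size(column):
    """Characters left after dropping leading zeros, plus one marker per value."""
    return sum(len(v.lstrip("0")) + 1 for v in column)


# Eight rows over columns (A, B), matching the slide's example values.
rows = [(a, b) for a in ("000AA", "00BBB") for b in ("X", "X", "Y", "Y")]

index_ab = sorted(rows, key=lambda r: (r[0], r[1]))  # I_AB: ordered by A, then B
index_ba = sorted(rows, key=lambda r: (r[1], r[0]))  # I_BA: ordered by B, then A

a_in_ab = [a for a, _ in index_ab]
a_in_ba = [a for a, _ in index_ba]

# Order dependent: the number of runs of column A changes with the key order.
runs_ab = len(run_length_encode(a_in_ab))
runs_ba = len(run_length_encode(a_in_ba))

# Order independent: the NULL-suppressed size of column A does not change.
size_ab = null_suppressed_size(a_in_ab)
size_ba = null_suppressed_size(a_in_ba)

print(runs_ab, runs_ba)  # 2 4
print(size_ab, size_ba)  # 28 28
```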

Page 5: Compression Aware Physical Database Design

Benefits and Overheads

◦ Saves storage space and I/O
◦ CPU overhead to compress & decompress
◦ Different compression scheme = different saving ↔ overhead

How do we use it? (DBA)

Page 6: Compression Aware Physical Database Design

Issue 1: To Compress or not..

Depends on Data
◦ High compression ratio: 10GB → 1GB (-90%)
◦ Low compression ratio: 10GB → 9GB (-10%)

Depends on Workload
◦ SELECTs/INSERTs frequency
◦ CPU bottleneck? I/O bottleneck?

Page 7: Compression Aware Physical Database Design

Issue 2: What Index to Create

[Diagram: physical DB design tool. For the workload queries Q1 and Q2, the tool finds syntactically relevant indexes (I1 to I5), selects candidate configurations, prunes them, and enumerates the best configuration (I1, I5). It relies on the DBMS query optimizer's what-if analysis, which estimates runtime using hypothetical indexes.]

Page 8: Compression Aware Physical Database Design

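Naïve Solution: Staged Design

Run the design tool to select indexes, compress them, then repeat.

[Diagram: given a workload and a 100 MB budget, Stage 1 selects indexes and MVs totalling 100 MB; Stage 2 compresses them to 50 MB; the tool then runs again until the design reaches 100 MB.]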

Page 9: Compression Aware Physical Database Design

Problem in tight space budget

Staged design misses an index that makes sense only with compression.

Sales
Shipdate   State   Price   Discount
Feb 21     CA      $123    10%
Jan 9      RI      $222    0%
Jul 5      TX      $213    5%

SELECT SUM(Price*Discount) FROM Sales WHERE State='CA' and Jul 01 < Shipdate < Sep 01

I1 (State, Shipdate): 95 MB → 50 MB
I2 (State, Shipdate) Include (Price, Discount): 170 MB → 90 MB

Choice for 100 MB?
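The arithmetic behind this slide can be written out as a small sketch. It uses the I1 and I2 sizes from the slide; the dictionary layout and the `fits` helper are illustrative only. A staged tool filters candidates by uncompressed size, so it never sees I2, while a compression-aware tool filters by compressed size and can pick the covering index.

```python
# Index sizes from the slide, in MB.
candidates = {
    "I1": {"uncompressed_mb": 95, "compressed_mb": 50},
    "I2": {"uncompressed_mb": 170, "compressed_mb": 90},  # covering index
}


def fits(budget_mb, size_key):
    """Names of the candidates whose size under `size_key` fits the budget."""
    return sorted(
        name for name, sizes in candidates.items() if sizes[size_key] <= budget_mb
    )


# Stage 1 of a staged design only looks at uncompressed sizes.
staged_choices = fits(100, "uncompressed_mb")

# An integrated design considers the compressed sizes up front.
integrated_choices = fits(100, "compressed_mb")

print(staged_choices)      # ['I1']
print(integrated_choices)  # ['I1', 'I2']
```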

Page 10: Compression Aware Physical Database Design

Example: Tight Space Budget

Good design: 175MB
CREATE COMPRESSED INDEX (L_PARTKEY,L_ORDERKEY,L_SUPPKEY) INCLUDE (L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT)

Staged: 155MB
CREATE INDEX (L_ORDERKEY) INCLUDE(L_SUPPKEY,L_COMMITDATE,L_RECEIPTDATE)

[Chart: TPC-H, 2ndary Index Only. x-axis: Space Budget [MB], 0 to 1000; y-axis: Improvement [%], 0 to 70; series: Good design, Staged.]

Page 11: Compression Aware Physical Database Design

Problem in plenty space budget

Staged design results in too high CPU overheads for compression/decompression.

I1 (State, Shipdate): 95 MB → 50 MB
I2 (State, Shipdate) Include (Price, Discount): 170 MB → 90 MB

Choice for 200 MB?

CPU overheads come from:
UPDATE Sales SET Price=..
INSERT INTO Sales …

Page 12: Compression Aware Physical Database Design

Example: Plenty Space Budget

[Chart: UPDATE Intensive TPC-H, 2ndary Index Only. x-axis: Space Budget [MB], 0 to 3000; y-axis: Improvement [%], 0 to 60; series: Good design, Staged.]

Worse with More Budget!

Page 13: Compression Aware Physical Database Design

Integrated Solution Needed!

◦ How to estimate index size after compression?
◦ How to evaluate benefits/overheads of compression?
◦ How does compression affect candidate selection/enumeration?

Page 14: Compression Aware Physical Database Design

Size Estimation

Essential metric of indexes
◦ To fit space budget
◦ To estimate I/O cost

Table stats: #tuple=1M, Col-A width=8, Col-B width=4, Col-C width=10, Clust. key width=4

Size (IABC) = (8 + 4 + 10 + 4) * 1M = 26 MB

Need Compression Fraction (CF):

Comp. Size (IABC) = 26 MB * CF (IABC)
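The size formula above, written out as a short sketch. The column widths and tuple count are the slide's; the function names are illustrative, and the CF value of 0.5 is a made-up input used only to show how the compressed size follows from it.

```python
def uncompressed_index_size_bytes(column_widths, clustered_key_width, num_tuples):
    """Per-row width (index columns plus clustered key) times the row count."""
    return (sum(column_widths) + clustered_key_width) * num_tuples


def compressed_index_size_bytes(uncompressed_bytes, compression_fraction):
    """Compressed size = uncompressed size * CF, where CF is in (0, 1]."""
    return uncompressed_bytes * compression_fraction


# Col-A = 8, Col-B = 4, Col-C = 10, clustered key = 4, 1M tuples (from the slide).
size_iabc = uncompressed_index_size_bytes([8, 4, 10], 4, 1_000_000)

# Hypothetical CF; in practice it has to be estimated, e.g. from a sample.
assumed_cf = 0.5
compressed_iabc = compressed_index_size_bytes(size_iabc, assumed_cf)

print(size_iabc / 1e6)        # 26.0 (MB)
print(compressed_iabc / 1e6)  # 13.0 (MB)
```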

Page 15: Compression Aware Physical Database Design

Prior work

SampleCF [Idreos et al. ICDE'10]

[Diagram: take a 10MB sample of a 1GB table and run CREATE COMPRESSED INDEX on the sample.]

◦ Sample size: cost ↔ accuracy
◦ Still expensive for 1,000s of indexes

[Chart: y-axis: Design Tool Runtime [min], 0 to 20; bars: Naïve I..., SampleCF overheads.]
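A rough sketch of the sampling idea: draw a sample of the rows, order it by the index key, compress it, and take the ratio as the estimated compression fraction. This is not the SampleCF implementation; `zlib` stands in for the DBMS's compression scheme, and the function name, row format, and synthetic table are assumptions made for the example.

```python
import random
import zlib


def sample_cf(rows, key, sample_fraction=0.01, seed=0):
    """Estimate the compression fraction of an index from a row sample."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(rows) * sample_fraction))
    sample = rng.sample(rows, sample_size)
    sample.sort(key=key)  # "build the index" on the sample only
    raw = "\n".join(",".join(map(str, row)) for row in sample).encode()
    # zlib is a stand-in compressor, not what the DBMS actually uses.
    return len(zlib.compress(raw)) / len(raw)


# Synthetic (State, Value) table with many repeated values.
rows = [("CA" if i % 3 else "RI", i % 50) for i in range(100_000)]

cf = sample_cf(rows, key=lambda row: (row[0], row[1]))
print(f"estimated CF: {cf:.2f}")
```

A larger `sample_fraction` costs more time per index but should track the true fraction more closely, which is the cost ↔ accuracy trade-off named on the slide.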

Page 16: Compression Aware Physical Database Design

Solution Overview

[Diagram: the Database Engine Tuning Advisor (DTA) takes a workload and a storage bound and produces a physical design recommendation. Its stages are candidate selection, merging, and enumeration. It uses size estimation (SampleCF over samples kept in a temp DB) and what-if analysis from the Microsoft SQL Server query optimizer, which has a compression aware cost model.]

Page 17: Compression Aware Physical Database Design

Index Size Deduction

◦ SampleCF: measure a few indexes (Ia, Ib) directly
◦ Col-Ext Deduction: Ia, Ib → Ia,b
◦ Col-Set Deduction: Ia,b → Ib,a

NULL supp. (ORD-IND): sum up savings

Local dict. (ORD-DEP): estimate from run-length

[Figure: pages of IAB and IBA over the example values 000AA and 00BBB, showing how the local dictionary entries differ between the two key orders.]

More details in paper

Page 18: Compression Aware Physical Database Design

Optimize Accuracy-Cost Trade-off

Size-Estimation Strategy
◦ Sample size?
◦ Deduction path?
◦ Expected errors?

Formulate as a graph problem; greedy algorithm to solve (details in the paper)

Page 19: Compression Aware Physical Database Design

Issues in Design Tool

◦ Query cost model to consider (de)compression CPU cost
◦ Candidate selection/enumeration

Key Challenge: Space-Performance Trade-off

Page 20: Compression Aware Physical Database Design

Candidate Selection: Space-Performance Trade-off

[Diagram: for queries Q1 and Q2, the candidates IA, IB, IC, ID get compressed versions added; "select fastest" then keeps IA and IC.]

Compressed indexes are often slower-but-smaller. Most of them are ignored! (exception: very high compression ratio)

Page 21: Compression Aware Physical Database Design

Skyline Candidate Selection

Construct skyline of configurations: pick both fast indexes and small indexes.

[Chart: x-axis: Configuration Size; y-axis: Query Cost; the skyline runs from slow-small to fast-large configurations.]
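A minimal sketch of a skyline over (size, cost) pairs, to show why it keeps slower-but-smaller candidates that a "select fastest" rule would drop. The configuration names, sizes, and costs are invented for illustration, and the real tool works on optimizer estimates rather than fixed numbers.

```python
def skyline(configs):
    """Keep configurations not dominated in both size and query cost.

    configs: list of (name, size_mb, query_cost) tuples.
    """
    kept = []
    best_cost = float("inf")
    # Walk from smallest to largest; keep a config only if it is strictly
    # faster than every smaller (or equal-sized) config seen so far.
    for name, size_mb, cost in sorted(configs, key=lambda c: (c[1], c[2])):
        if cost < best_cost:
            kept.append((name, size_mb, cost))
            best_cost = cost
    return kept


# Hypothetical candidates: compressed versions are smaller but slower.
configs = [
    ("IA", 100, 10),
    ("IA_compressed", 50, 14),
    ("IB", 120, 12),
    ("IB_compressed", 60, 20),
]

fastest_only = min(configs, key=lambda c: c[2])
on_skyline = skyline(configs)

print(fastest_only)  # ('IA', 100, 10)
print(on_skyline)    # [('IA_compressed', 50, 14), ('IA', 100, 10)]
```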

Page 22: Compression Aware Physical Database Design

Enumeration: Problem

Greedy picks un-compressed indexes too early.

[Diagram: seed IA with 15MB room. Candidates: IB (10MB), compressed IB (5MB), IC (10MB). Greedy grows IA → IA, IB and then cannot fit IC. Optimal design: IA, IC, compressed IB.]

Page 23: Compression Aware Physical Database Design

Local Backtrack in Enumeration

Recover oversized configurations: compress indexes in the config.

[Diagram: IA → IA, IB → IA, IB, IC (oversized). Recover if oversized: IA, IC, compressed IB; IA, IB, compressed IC; …]
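A simplified sketch of greedy enumeration with a local backtrack step. It reads the slides' example as: 15MB of room after the seed IA, with IB at 10MB, compressed IB at 5MB, and IC at 10MB. The benefit numbers, the additive benefit model, and every name in the code are assumptions for illustration; the actual tool evaluates configurations with the query optimizer.

```python
from itertools import combinations

# name -> (size_mb, benefit). Sizes follow the slide; benefits are hypothetical.
INDEXES = {"IB": (10, 10.0), "IB_c": (5, 8.0), "IC": (10, 9.0)}
COMPRESSED = {"IB": "IB_c"}  # uncompressed index -> its compressed variant


def size(config):
    return sum(INDEXES[i][0] for i in config)


def benefit(config):
    return sum(INDEXES[i][1] for i in config)  # additive benefit is a simplification


def base(name):
    return name[:-2] if name.endswith("_c") else name


def recover(config, budget):
    """Compress indexes in an oversized config; return the best variant that fits."""
    best = None
    compressible = [i for i in config if i in COMPRESSED]
    for r in range(1, len(compressible) + 1):
        for subset in combinations(compressible, r):
            variant = frozenset(COMPRESSED[i] if i in subset else i for i in config)
            if size(variant) <= budget and (
                best is None or benefit(variant) > benefit(best)
            ):
                best = variant
    return best


def greedy(budget, backtrack):
    config = frozenset()
    while True:
        taken = {base(i) for i in config}
        best = None
        for cand in INDEXES:
            if base(cand) in taken:
                continue  # never keep two versions of the same index
            trial = config | {cand}
            if size(trial) > budget:
                trial = recover(trial, budget) if backtrack else None
            if trial is not None and (best is None or benefit(trial) > benefit(best)):
                best = trial
        if best is None or benefit(best) <= benefit(config):
            return config
        config = best


plain = greedy(15, backtrack=False)
with_backtrack = greedy(15, backtrack=True)
print(sorted(plain))           # ['IB']
print(sorted(with_backtrack))  # ['IB_c', 'IC']
```

Plain greedy commits to the uncompressed IB and then has no room for IC. With the backtrack step, the oversized configuration (IB, IC) is repaired by compressing IB, which matches the optimal design on the previous slide.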

Page 24: Compression Aware Physical Database Design

Experimental Results

Implemented on SQL Server 2008
◦ Modified Database Tuning Advisor (DTA): "DTAc"
◦ Modified query cost model

TPC-H Scale-1 (more results in paper)
◦ SELECT-intensive/UPDATE-intensive
◦ Compared estimated runtime

Page 25: Compression Aware Physical Database Design

Candidate Selection/Enumeration

Both Skyline & Backtrack are required, esp. for tight budget.

[Charts: Clustered/2ndary Indexes. Panels: Select Intensive (y-axis: Improvement [%], 0 to 80) and Update Intensive (y-axis: Improvement [%], 0 to 70); x-axis: Budget [MB] at 50, 300, 700, 1500; series: DTAc (Both), Skyline, Backtrack, DTAc (None), DTA.]

Page 26: Compression Aware Physical Database Design

DTAc vs. DTA

◦ Especially better in tight budget
◦ Chooses lightly compressed designs in UPDATE-intensive

[Charts: Clustered/2ndary/MV Indexes. Panels: Select Intensive (y-axis: Improvement [%], 0 to 80) and Update Intensive (y-axis: Improvement [%], 0 to 60); x-axis: Budget [MB], 0 to 1000; series: DTAc, DTA.]

Page 27: Compression Aware Physical Database Design

Overhead in DTA

◦ Reduces size estimation overheads by a factor of 3
◦ Mostly <10% estimation error

[Chart: y-axis: Design Tool Runtime [min], 0 to 20; bars: DTAc w/o Optimization, DTAc; stacked components: MV-Estimate, MV-Sample, Partial-Estimate, Partial-Sample, Table-Estimate, Table-Sample, Other.]

Page 28: Compression Aware Physical Database Design

Conclusion

Opportunities and challenges

Integrated approach to exploit compression in physical design
◦ Space-Performance Tradeoff
◦ Size Estimation

Open issues
◦ Column-Store