Turbocharge your Data Warehouse Queries with Columnstore Indexes
Len Wyatt, Program Manager, Microsoft Corporation
DBI313


Agenda

• Motivation
• How columnstores speed up queries
• Loading columnstores
• Optimizing database and index design
• Optimizing queries


demo

Columnstores speed up queries


Overview of Columnstore Index


How Columnstore Indexes Speed Up Queries

[Figure: a table with columns C1 to C6, stored column by column]

• Columnstore indexes store data column-wise: each page stores data from a single column
• Highly compressed: about 2x better than PAGE compression, so more data fits in memory
• Each column can be accessed independently: fetch only the columns needed, which can dramatically decrease I/O
• Heaps and B-trees, by contrast, store data row-wise
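The I/O argument above can be made concrete with a toy Python model. It is an illustration under simplifying assumptions, not SQL Server's storage format. The model stores the example SalesTable rows both row-wise and column-wise, then counts how many stored values a query that references only ProductKey and SalesAmount must read. The helper names are hypothetical.

```python
# Toy model: count values read from a row store vs. a column store.
# Not SQL Server's on-disk format; all names here are hypothetical.
COLUMNS = ["OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount"]
ROWS = [
    (20101107, 106, 1, 1, 6, 30.00), (20101107, 103, 4, 2, 1, 17.00),
    (20101107, 109, 4, 2, 2, 20.00), (20101107, 103, 3, 2, 1, 17.00),
    (20101107, 106, 5, 3, 4, 20.00), (20101108, 106, 2, 1, 5, 25.00),
    (20101108, 102, 2, 1, 1, 14.00), (20101108, 106, 3, 2, 5, 25.00),
    (20101108, 109, 1, 1, 1, 10.00), (20101109, 106, 4, 2, 4, 20.00),
    (20101109, 106, 4, 2, 5, 25.00), (20101109, 103, 1, 1, 1, 17.00),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (column-wise storage)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def values_read_rowwise(rows):
    # A row store brings whole rows into memory, whether a column is needed or not.
    return sum(len(row) for row in rows)


def values_read_columnwise(columns, needed):
    # A column store reads only the column lists the query references.
    return sum(len(columns[name]) for name in needed)


needed = ["ProductKey", "SalesAmount"]
print(values_read_rowwise(ROWS))                         # 72
print(values_read_columnwise(to_columns(ROWS), needed))  # 24
```

Compression shrinks both numbers further in practice. The model only counts values, so the 3x gap comes purely from skipping four of the six columns.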


Columnstore Index Structure

Column segment:
• A segment contains values from one column for a set of rows
• Segments for the same set of rows comprise a row group
• Segments are compressed
• Each segment is stored in a separate LOB
• A segment is the unit of transfer between disk and memory

[Figure: columns C1 to C6, each split into segments; the segments for the same set of rows form a row group]


Columnstore Index Example

OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00


Horizontally Partition (Row Groups)

Row group 1:
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00

Row group 2:
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00


Vertically Partition (Segments)

Row group 1 segments:
OrderDateKey: 20101107, 20101107, 20101107, 20101107, 20101107, 20101108
ProductKey: 106, 103, 109, 103, 106, 106
StoreKey: 01, 04, 04, 03, 05, 02
RegionKey: 1, 2, 2, 2, 3, 1
Quantity: 6, 1, 2, 1, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 17.00, 20.00, 25.00

Row group 2 segments:
OrderDateKey: 20101108, 20101108, 20101108, 20101109, 20101109, 20101109
ProductKey: 102, 106, 109, 106, 106, 103
StoreKey: 02, 03, 01, 04, 04, 01
RegionKey: 1, 2, 1, 2, 2, 1
Quantity: 1, 5, 1, 4, 5, 1
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00
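The two partitioning steps above (horizontal into row groups, then vertical into segments) can be sketched in Python as follows. This is a toy model. The row-group size of 6 only mirrors the slides: in SQL Server a row group holds up to about a million rows. The function names are hypothetical.

```python
# Toy model of splitting a table into row groups and then column segments.
COLUMNS = ["OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount"]
ROWS = [
    (20101107, 106, 1, 1, 6, 30.00), (20101107, 103, 4, 2, 1, 17.00),
    (20101107, 109, 4, 2, 2, 20.00), (20101107, 103, 3, 2, 1, 17.00),
    (20101107, 106, 5, 3, 4, 20.00), (20101108, 106, 2, 1, 5, 25.00),
    (20101108, 102, 2, 1, 1, 14.00), (20101108, 106, 3, 2, 5, 25.00),
    (20101108, 109, 1, 1, 1, 10.00), (20101109, 106, 4, 2, 4, 20.00),
    (20101109, 106, 4, 2, 5, 25.00), (20101109, 103, 1, 1, 1, 17.00),
]
ROW_GROUP_SIZE = 6  # slide-sized toy value, not SQL Server's real limit


def build_row_groups(rows, group_size):
    """Horizontal partition: consecutive runs of `group_size` rows."""
    return [rows[i:i + group_size] for i in range(0, len(rows), group_size)]


def build_segments(row_group):
    """Vertical partition: one segment (list of values) per column."""
    return {name: [row[i] for row in row_group] for i, name in enumerate(COLUMNS)}


segments = [build_segments(group) for group in build_row_groups(ROWS, ROW_GROUP_SIZE)]
print(len(segments))              # 2 row groups
print(segments[1]["ProductKey"])  # [102, 106, 109, 106, 106, 103]
```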


Compress Each Segment*

[Figure: the same twelve segments (six columns in each of two row groups), each compressed separately]

Some segments will compress more than others

*Encoding and reordering not shown
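The slide leaves encoding out. Run-length encoding is one illustrative technique for seeing why some segments compress more than others. It is an assumption here: SQL Server combines several encodings, and this is not its exact scheme. In the first row group, the OrderDateKey segment is almost a single run, while the Quantity segment has no adjacent repeats at all.

```python
from itertools import groupby

# Some segments from the first row group of the example table.
SEGMENTS = {
    "OrderDateKey": [20101107, 20101107, 20101107, 20101107, 20101107, 20101108],
    "ProductKey": [106, 103, 109, 103, 106, 106],
    "RegionKey": [1, 2, 2, 2, 3, 1],
    "Quantity": [6, 1, 2, 1, 4, 5],
}


def run_length_encode(values):
    """Collapse adjacent equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


for name, values in SEGMENTS.items():
    encoded = run_length_encode(values)
    print(f"{name}: {len(values)} values -> {len(encoded)} runs")
```

Reordering rows so that equal values sit next to each other makes runs longer. This is one reason the "reordering" mentioned in the footnote matters.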


Fetch Only Needed Columns

SELECT ProductKey, SUM(SalesAmount) FROM SalesTable WHERE OrderDateKey < 20101108 GROUP BY ProductKey

[Figure: only the OrderDateKey, ProductKey, and SalesAmount segments are read; the StoreKey, RegionKey, and Quantity segments are skipped]
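The query above can be evaluated against column lists in a small Python sketch that records which columns it reads. `TrackedColumns` is a hypothetical stand-in for a column store, not a SQL Server API. Three of the six columns are never touched.

```python
from collections import defaultdict


class TrackedColumns(dict):
    """Column store stand-in that records which columns a query reads."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.read = set()

    def __getitem__(self, name):
        self.read.add(name)
        return super().__getitem__(name)


def sales_by_product_before(columns, cutoff):
    """SELECT ProductKey, SUM(SalesAmount) ... WHERE OrderDateKey < cutoff GROUP BY ProductKey."""
    totals = defaultdict(float)
    rows = zip(columns["OrderDateKey"], columns["ProductKey"], columns["SalesAmount"])
    for date, product, amount in rows:
        if date < cutoff:
            totals[product] += amount
    return dict(totals)


columns = TrackedColumns({
    "OrderDateKey": [20101107] * 5 + [20101108] * 4 + [20101109] * 3,
    "ProductKey": [106, 103, 109, 103, 106, 106, 102, 106, 109, 106, 106, 103],
    "StoreKey": [1, 4, 4, 3, 5, 2, 2, 3, 1, 4, 4, 1],
    "RegionKey": [1, 2, 2, 2, 3, 1, 1, 2, 1, 2, 2, 1],
    "Quantity": [6, 1, 2, 1, 4, 5, 1, 5, 1, 4, 5, 1],
    "SalesAmount": [30.0, 17.0, 20.0, 17.0, 20.0, 25.0, 14.0, 25.0, 10.0, 20.0, 25.0, 17.0],
})
print(sales_by_product_before(columns, 20101108))  # {106: 50.0, 103: 34.0, 109: 20.0}
print(sorted(columns.read))  # ['OrderDateKey', 'ProductKey', 'SalesAmount']
```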


Fetch Only Needed Segments

SELECT ProductKey, SUM(SalesAmount) FROM SalesTable WHERE OrderDateKey < 20101108 GROUP BY ProductKey

[Figure: the second row group's OrderDateKey segment holds only 20101108 and 20101109, so none of its rows can satisfy OrderDateKey < 20101108 and its segments are skipped; only the first row group's OrderDateKey, ProductKey, and SalesAmount segments are read]
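The segment skipping above works because each segment's value range is known before it is read. The following Python sketch models it under the assumption that every segment carries min/max metadata recorded when the index is built. The function name is hypothetical.

```python
# Each row group's OrderDateKey segment, as in the example table.
ORDER_DATE_SEGMENTS = [
    [20101107, 20101107, 20101107, 20101107, 20101107, 20101108],  # row group 0
    [20101108, 20101108, 20101108, 20101109, 20101109, 20101109],  # row group 1
]

# Metadata recorded once, when the segments are built.
SEGMENT_RANGES = [(min(seg), max(seg)) for seg in ORDER_DATE_SEGMENTS]


def row_groups_for_less_than(ranges, cutoff):
    """Indexes of row groups that may hold a value < cutoff; the others are skipped unread."""
    return [i for i, (low, _high) in enumerate(ranges) if low < cutoff]


print(SEGMENT_RANGES)                                      # [(20101107, 20101108), (20101108, 20101109)]
print(row_groups_for_less_than(SEGMENT_RANGES, 20101108))  # [0]
```

Only the metadata is consulted to make the decision. A row group that is ruled out costs no I/O for any of its columns.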


Batch Mode Speeds Up Queries

Biggest advancement in SQL Server query processing in years:
• Data moves as a batch through query plan operators
• Minimizes instructions per row
• Takes advantage of cache structures
• Highly efficient algorithms
• Better parallelism


Batch mode processing

• Process ~1000 rows at a time
• Batch stored in vector form, optimized to fit in L1 cache
• Vector operators implemented: filter, hash join, hash aggregation
• Greatly reduced CPU time (7 to 40x)

[Figure: a batch object made of column vectors plus a bitmap of qualifying rows]
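The batch structure described above can be modeled in Python: rows flow through operators roughly 1000 at a time as column vectors, and a filter marks qualifying rows in a bitmap instead of copying them. This sketch models only the data flow. Python cannot reproduce the CPU and cache savings, and the function names are hypothetical.

```python
BATCH_SIZE = 1000  # "~1000 rows at a time"


def batches(columns, size=BATCH_SIZE):
    """Yield batches: for each column, a slice (vector) of up to `size` values."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, size):
        yield {name: values[start:start + size] for name, values in columns.items()}


def filter_bitmap(vector, predicate):
    """Filter operator: mark qualifying rows in a bitmap rather than copying rows out."""
    return [predicate(value) for value in vector]


def sum_where(columns, filter_column, predicate, sum_column):
    """Aggregate one column over the rows whose filter_column satisfies predicate."""
    total = 0
    for batch in batches(columns):
        qualifying = filter_bitmap(batch[filter_column], predicate)
        total += sum(value for value, ok in zip(batch[sum_column], qualifying) if ok)
    return total


columns = {"Quantity": list(range(2500)), "SalesAmount": [1] * 2500}
print(sum(1 for _ in batches(columns)))                                     # 3
print(sum_where(columns, "Quantity", lambda q: q % 2 == 0, "SalesAmount"))  # 1250
```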


#1 Takeaway!

Make sure most of the work of the query happens in batch mode


Loading Columnstores Effectively


Loading new data into a columnstore index

Tables with columnstores can be read, not updated:
• Partition switching is allowed
• INSERT, UPDATE, DELETE, and MERGE are not allowed

Recommended methods for loading data:
• Disable, update, rebuild
• Partition switching
• UNION ALL


Adding Data Using Disable, Update, Rebuild

• Disable (or drop) the columnstore index:
  ALTER INDEX my_index ON MyTable DISABLE;
• Update the table
• Rebuild the columnstore index:
  ALTER INDEX my_index ON MyTable REBUILD;


Adding Data Using Partition Switching

• Columnstores must be partition-aligned
• Partition switching is fully supported
• To add data daily, partition by day; then every day, split the last partition, load the data into a staging table, create a columnstore index on it, and switch it in
• Avoids a costly drop/rebuild


Adding Data Using UNION ALL (trickle load)

• Master table (columnstore)
• Delta table (rowstore)
• Query using UNION ALL with the local-global aggregation workaround
• Add Delta to Master nightly


Achieving Fast Columnstore Index Builds

• Index builds are memory intensive; the memory requirement is related to the number of columns, the data, and the DOP
• The index build is parallel only if the table has more than 1 million rows
• One thread per segment
• Low memory throttles parallelism
• Consider: a high min server memory setting; setting REQUEST_MAX_MEMORY_GRANT_PERCENT (a Resource Governor workload group setting) to 50; adding memory; omitting columns; reducing parallelism, for example:

create columnstore index <name> on <table>(<columns>) with (maxdop = 1);


Optimizing database and index design


Eliminating Unsupported Data Types

Currently unsupported types for columnstores:
• decimal with more than 18 digits
• Binary
• BLOB(n)
• varchar(max)
• Uniqueidentifier
• Date/time types larger than 8 bytes, and CLR types

Either omit the column from the columnstore, or modify the column type to a supported type:
• Reduce the precision of numerics to 18 digits or less
• Convert GUIDs to ints
• Reduce the precision of datetimeoffset to 2 or less
• Convert hierarchyid to int or string


Reduce Nonclustered B-trees

• Covering B-trees are no longer needed on the source table
• Extra B-trees can cause the optimizer to choose a poor plan
• Dropping them saves space and reduces ETL time


Ensuring segment elimination by date

• Use a clustered B-tree on date in the source table; the columnstore inherits its order
• Or, partition by date
• Ordering by load date, ship date, order date, etc. can all work, since dates are naturally correlated
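Why ordering by date helps segment elimination can be shown with a Python sketch that compares the date ranges of row groups built from unordered rows with those built from date-ordered rows. The data and the 4-row group size are made up for illustration, and the function names are hypothetical. When rows are unordered, every row group spans every date, so a one-day query cannot skip any of them.

```python
def segment_ranges(dates, group_size):
    """(min, max) of each row group's date segment."""
    return [
        (min(dates[i:i + group_size]), max(dates[i:i + group_size]))
        for i in range(0, len(dates), group_size)
    ]


def row_groups_scanned(ranges, day):
    """Row groups whose [min, max] range could contain `day`; the rest are eliminated."""
    return sum(1 for low, high in ranges if low <= day <= high)


unordered = [20101109, 20101107, 20101108, 20101107,
             20101108, 20101109, 20101107, 20101108,
             20101107, 20101108, 20101109, 20101109]
ordered = sorted(unordered)  # e.g. rows read in clustered B-tree order on date

print(row_groups_scanned(segment_ranges(unordered, 4), 20101107))  # 3
print(row_groups_scanned(segment_ranges(ordered, 4), 20101107))    # 1
```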


Design out strings from columnstores
• String filters don’t get pushed to the storage engine: more batches to process, and segment elimination is defeated
• Joining on string columns is slow
• Factor strings out to dimensions

Before (string in the fact table):
Date      LicenseNum  Measure
20120301  XYZ123      100
20120302  ABC777      200

After (integer key in the fact table):
Date      LicenseId  Measure
20120301  1          100
20120302  2          200

License dimension:
LicenseId  LicenseNum
1          XYZ123
2          ABC777
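The factoring step above (replace a string column with an integer key plus a dimension table) can be sketched in Python as an ETL-style transformation. This is an illustrative sketch: `factor_out_strings` is a hypothetical helper, and keys are assigned in order of first appearance, which reproduces the LicenseId values in the example.

```python
def factor_out_strings(fact_rows, column):
    """Replace the string at position `column` with an integer surrogate key.

    Returns (new_fact_rows, dimension_rows); keys are assigned 1, 2, ...
    in order of first appearance.
    """
    key_for = {}
    new_rows = []
    for row in fact_rows:
        text = row[column]
        if text not in key_for:
            key_for[text] = len(key_for) + 1
        new_rows.append(row[:column] + (key_for[text],) + row[column + 1:])
    dimension = [(key, text) for text, key in key_for.items()]
    return new_rows, dimension


facts = [(20120301, "XYZ123", 100), (20120302, "ABC777", 200)]
new_facts, license_dim = factor_out_strings(facts, 1)
print(new_facts)    # [(20120301, 1, 100), (20120302, 2, 200)]
print(license_dim)  # [(1, 'XYZ123'), (2, 'ABC777')]
```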


Optimizing queries


Best Practices

• Use a star schema
• Put columnstores on large tables only
• Include every column of the table in the columnstore index
• Use integer surrogate keys for joins


Forcing use or non-use of Columnstores

• Query hint: OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
• Index hints:
  … FROM F WITH (INDEX = MyColumnStore) …
  … FROM F WITH (INDEX = MyClusteredBtree) …


Things to Avoid

• Joining/filtering on string columns
• Joining pairs of very large tables if you don’t have to
• NOT IN <subquery> on a columnstore table
• OUTER JOIN on a columnstore table
• UNION ALL to combine columnstore tables with other tables


Common workarounds


demo

Example of the need for a workaround


The common theme

Since there are some queries that the optimizer won’t be able to run in batch mode:
• Check the execution plan to verify batch mode
• Find the subset that can run in batch mode
• Rewrite the query to run mostly in batch mode
• Join to the rest of the data


#1 Takeaway!

Make sure most of the work of the query happens in batch mode


Outer Join Example & Workaround

Outer join prevents batch processing

Rewrite the query: do the inner join in batch mode, then a left join to complete the data set

select m.Title, COUNT(p.IP) PurchaseCount
from Media m left outer join Purchase p on p.MediaId = m.MediaId
group by m.Title
order by COUNT(p.IP) desc;

with T (Title, PurchaseCount) as (
    select m.Title, COUNT(p.IP) PurchaseCount
    from Media m join Purchase p on p.MediaId = m.MediaId
    group by m.Title
)
select distinct m.Title, ISNULL(T.PurchaseCount, 0) as PurchaseCount
from Media m left outer join T on m.Title = T.Title
order by ISNULL(T.PurchaseCount, 0) desc;

Original query: 6.4 sec elapsed, 55 CPU-seconds

Rewritten query: 0.2 sec elapsed, 1.9 CPU-seconds
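One way to check that the outer-join rewrite above returns the same answer as the original is a small Python model of both queries. It assumes MediaId is a key of Media and treats rows as tuples; the data and function names are hypothetical. Titles with no purchases come back as 0 in both versions, which is exactly what the final LEFT JOIN with ISNULL restores.

```python
from collections import Counter


def purchase_counts_left_join(media, purchases):
    """Original: Media LEFT OUTER JOIN Purchase, COUNT(p.IP) per title."""
    counts = {}
    for media_id, title in media:
        matched = sum(1 for mid, ip in purchases if mid == media_id and ip is not None)
        counts[title] = counts.get(title, 0) + matched
    return counts


def purchase_counts_rewritten(media, purchases):
    """Rewrite: inner join + aggregate first, then left join every title to it (ISNULL -> 0)."""
    title_of = dict(media)  # assumes MediaId is a key of Media
    inner = Counter(title_of[mid] for mid, ip in purchases if mid in title_of and ip is not None)
    return {title: inner.get(title, 0) for _media_id, title in media}


media = [(1, "Alpha"), (2, "Beta"), (3, "Gamma"), (4, "Alpha")]
purchases = [(1, "10.0.0.1"), (1, "10.0.0.2"), (2, "10.0.0.3"), (4, "10.0.0.4"), (9, "10.0.0.5")]
print(purchase_counts_rewritten(media, purchases))  # {'Alpha': 3, 'Beta': 1, 'Gamma': 0}
```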


IN and EXISTS Example & Workaround

Using IN and EXISTS with subqueries can prevent batch mode execution

IN ( <constants list> ) typically works fine

Example: MediaId IN (23263, 29637, 27208)

select p.Date, count(*)
from Purchase p
where p.MediaId in (select MediaId from MediaStudyGroup)
group by p.Date
order by p.Date;
-- or --
select p.Date, count(*)
from Purchase p
where exists (select m.MediaId from MediaStudyGroup m where m.MediaId = p.MediaId)
group by p.Date
order by p.Date;

-- Workaround: an inner join (same result only if MediaStudyGroup.MediaId has no duplicates)
select p.Date, count(*)
from Purchase p join MediaStudyGroup m on p.MediaId = m.MediaId
group by p.Date
order by p.Date;

IN/EXISTS version: 3.0 sec elapsed, 32 CPU-seconds

JOIN version: 0.05 sec elapsed, 0.3 CPU-seconds
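The IN/EXISTS-to-JOIN rewrite above relies on one condition. IN and EXISTS are semi-joins that count each purchase at most once, whereas an inner join emits one row per matching MediaStudyGroup row. This small Python model, with hypothetical data, shows the two agreeing when MediaId is unique in the study group and diverging when it is duplicated.

```python
from collections import Counter


def counts_with_in(purchases, study_group):
    """WHERE p.MediaId IN (SELECT MediaId FROM MediaStudyGroup): a semi-join."""
    wanted = set(study_group)
    return Counter(date for date, media_id in purchases if media_id in wanted)


def counts_with_join(purchases, study_group):
    """JOIN MediaStudyGroup m ON p.MediaId = m.MediaId: one output row per match."""
    return Counter(date for date, media_id in purchases for m in study_group if m == media_id)


purchases = [(1, 10), (1, 20), (2, 10), (2, 30)]  # (Date, MediaId)
unique_group = [10, 20]
duplicated_group = [10, 10, 20]

print(counts_with_in(purchases, unique_group) == counts_with_join(purchases, unique_group))          # True
print(counts_with_in(purchases, duplicated_group) == counts_with_join(purchases, duplicated_group))  # False
```

If the study group might contain duplicates, joining to `select distinct MediaId from MediaStudyGroup` restores the IN semantics.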


Union All Example

UNION ALL can prevent batch mode execution

create view vPurchase as
select * from Purchase
union all
select * from DeltaPurchase;

select p.date, d.DayNumOfMonth, count(*)
from vPurchase as p, Date d
where p.Date = d.DateId
group by p.date, d.DayNumOfMonth;

select p.date, d.DayNumOfMonth, m.Genre, count(*)
from vPurchase p, Date d, Media m
where p.Date = d.DateId and m.MediaId = p.MediaId
group by p.date, d.DayNumOfMonth, m.Genre;

Batch mode: 0.1 sec elapsed

Row mode: 19 sec elapsed


Union All Workaround

• Push the GROUP BY and aggregation below the UNION ALL, into each of its inputs
• Do a final GROUP BY and aggregation of the results
• This is called “local-global aggregation”

with MainSummary (date, DayNumOfMonth, Genre, c) as (
    select p.date, d.DayNumOfMonth, m.Genre, count(*) c
    from Purchase p, Date d, Media m
    where p.Date = d.DateId and m.MediaId = p.MediaId
    group by p.date, d.DayNumOfMonth, m.Genre
),
DeltaSummary (date, DayNumOfMonth, Genre, c) as (
    select p.date, d.DayNumOfMonth, m.Genre, count(*) c
    from DeltaPurchase p, Date d, Media m
    where p.Date = d.DateId and m.MediaId = p.MediaId
    group by p.date, d.DayNumOfMonth, m.Genre
),
CombinedSummary (date, DayNumOfMonth, Genre, c) as (
    -- union all across the output of the two queries
    select * from MainSummary
    UNION ALL
    select * from DeltaSummary
)
-- group by to aggregate the data
select t.date, t.DayNumOfMonth, t.Genre, sum(c) as c
from CombinedSummary as t
group by t.date, t.DayNumOfMonth, t.Genre;

Batch mode: 0.3 sec elapsed
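The local-global pattern above can be reduced to a Python sketch with hypothetical names: count each input separately (the local step), then add the partial counts per key (the global step). The result equals counting the concatenated inputs directly. COUNT turns into SUM at the global step. SUM, MIN, and MAX decompose the same way. AVG does not decompose directly and would need SUM and COUNT carried through separately.

```python
from collections import Counter


def local_global_count(master_keys, delta_keys):
    """COUNT(*) GROUP BY key over (master UNION ALL delta), computed local-global style."""
    local_master = Counter(master_keys)  # local aggregate over the big (columnstore) table
    local_delta = Counter(delta_keys)    # local aggregate over the small delta table
    result = Counter()
    for partial in (local_master, local_delta):
        for key, count in partial.items():
            result[key] += count         # global step: SUM of the partial counts
    return result


master = ["2012-03-01", "2012-03-01", "2012-03-02"]
delta = ["2012-03-02", "2012-03-03"]
print(local_global_count(master, delta) == Counter(master + delta))  # True
```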


Scalar Aggregates Example & Workaround

Aggregate without group by doesn’t get batch processing

Workaround: add a GROUP BY!

select count(*) from Purchase;

with CountByDate (Date, c) as (
    select Date, count(*) from Purchase group by Date
)
select sum(c) from CountByDate;  -- returns NULL instead of 0 if Purchase is empty

Original query: 1.0 sec elapsed, 15 CPU-seconds

With GROUP BY: 0.06 sec elapsed, 0.3 CPU-seconds
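The GROUP BY trick above works because a total count equals the sum of the per-group counts. A Python sketch with a hypothetical helper name shows the two steps. Note one small difference: Python's `sum()` over nothing returns 0, whereas SQL's SUM over no rows returns NULL.

```python
from collections import Counter


def count_rows_via_group_by(dates):
    """select sum(c) from (select Date, count(*) c ... group by Date)."""
    per_date = Counter(dates)      # the GROUP BY step, which can run in batch mode
    return sum(per_date.values())  # scalar SUM over the small per-date result


dates = ["2012-03-01", "2012-03-02", "2012-03-01", "2012-03-03"]
print(count_rows_via_group_by(dates) == len(dates))  # True
```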


Multiple DISTINCT aggregates example

Generates a table spool; the spool write/read is single-threaded

SQL Server 2012 runs queries with one DISTINCT aggregate and one or more non-distinct aggregates in batch mode without any spool!

select p.Date, count(distinct p.UserId) as UserIdCount, count(distinct p.MediaId) as MediaIdCount
from Purchase p, Media m
where p.MediaId = m.MediaId and m.Category in ('Horror')
group by p.Date;

26 sec elapsed, 31 CPU-seconds


Multiple DISTINCT aggregates workaround

• Form each DISTINCT aggregate in a separate subquery
• Join the results on the grouping keys

with DistinctMediaIds (Date, MediaIdCount) as (
    select p.Date, count(distinct p.MediaId) as MediaIdCount
    from Purchase p, Media m
    where p.MediaId = m.MediaId and m.Category in ('Horror')
    group by p.Date
),
DistinctUserIds (Date, UserIdCount) as (
    select p.Date, count(distinct p.UserId) as UserIdCount
    from Purchase p, Media m
    where p.MediaId = m.MediaId and m.Category in ('Horror')
    group by p.Date
)
select m.Date, m.MediaIdCount, u.UserIdCount
from DistinctMediaIds m join DistinctUserIds u on m.Date = u.Date;

0.5 sec elapsed, 6 CPU-seconds
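The workaround above computes each DISTINCT aggregate on its own and then joins the results on the grouping key. The following Python sketch reproduces that shape under stated assumptions: rows are (Date, UserId, MediaId) tuples already filtered to the 'Horror' category, and the function names are hypothetical. Because both passes group the same filtered rows, they produce the same set of dates, so an inner join on Date loses nothing.

```python
from collections import defaultdict


def distinct_count_by_date(rows, value_index):
    """One DISTINCT aggregate: count(distinct value) ... group by Date."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[0]].add(row[value_index])
    return {date: len(values) for date, values in seen.items()}


def media_and_user_counts(rows):
    """Each DISTINCT aggregate in its own pass, then joined on Date."""
    media_counts = distinct_count_by_date(rows, 2)  # count(distinct MediaId)
    user_counts = distinct_count_by_date(rows, 1)   # count(distinct UserId)
    return {d: (media_counts[d], user_counts[d]) for d in media_counts if d in user_counts}


# (Date, UserId, MediaId) rows, assumed already filtered to the 'Horror' category.
rows = [(1, "u1", 10), (1, "u1", 11), (1, "u2", 10), (1, "u3", 10), (2, "u3", 12)]
print(media_and_user_counts(rows))  # {1: (2, 3), 2: (1, 1)}
```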


Summary

Keys to fast query processing:
• Columnstore index + batch mode = amazing performance
• Column and segment elimination greatly reduce data demand

Working with the read-only property of columnstores:
• Drop, update, rebuild
• Partition switching
• UNION ALL method

Future work will reduce the need for query tuning. For now, make sure most work happens in batch mode.


© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.