SQL Server 2012 Data Warehousing Deep Dive Dejan Sarka, SolidQ dsarka@solidq


SQL Server 2012 Data Warehousing Deep Dive

Dejan Sarka, SolidQ


Agenda

• DW Problems
• Bitmap Filtered Hash Joins
• Table Partitioning
• Filtered Indexes
• Indexed Views
• Data Compression
• Window Functions
• Columnstore Indexes


Algorithms Complexity

• Forever* = about 40 billion billion years!


SSAS Dimensional Addressing

Column headers: Nelson, White | USA, Japan | USA_North, USA_South | Seattle, Boston

1991, Qtr1
  Jan          00 10 20 30 40 50 60 70
  Feb          01 11 21 31 41 51 61 71
  Mar          02 12 22 32 42 52 62 72
1991, Qtr2     03 13 23 33 43 53 63 73
1991, Qtr3     04 14 24 34 44 54 64 74
1991, Qtr4
  Oct          05 15 25 35 45 55 65 75
  Nov          06 16 26 36 46 56 66 76
  Dec          07 17 27 37 47 57 67 77

Axis(1).Position(3)

Axis(1).Position(1).Members(2)

Axis(1)

Every cell has an address


SSAS Tabular Problems

• SSAS address space: m^n cells
– Maximum number of possible combinations

200 * 5000 * 1095 = 1,095,000,000

– SSAS address space grows exponentially!
– Can run out of address space – limited scalability
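
The address-space arithmetic can be checked with a short sketch. The three cardinalities are the ones on the slide. The function name, and the reading that each number is the member count of one dimension, are assumptions made for illustration.

```python
from math import prod

def address_space(cardinalities):
    """Number of addressable cells: the product of the member counts of all dimensions."""
    return prod(cardinalities)

# Cardinalities taken from the slide; which dimension each one stands for is not stated there.
cells = address_space([200, 5000, 1095])
print(f"{cells:,}")

# With m members in each of n dimensions the space is m ** n,
# so every additional dimension multiplies the space by m.
m = 10  # hypothetical member count
sizes = [m ** n for n in range(1, 5)]
print(sizes)
```

Only a small fraction of these cells normally holds data, which is why the address space, rather than the data volume, can become the limit.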


RDBMS Joins

• Merge: complexity ~ O(n)
– Needs sorted inputs, equijoin
• Hash: complexity ~ O(n) / ~ O(n^2)
– Needs equijoin
• Nested Loops: complexity ~ O(n) (indexed), ~ O(n^2) (not indexed)
– Works always, can become quadratic
• Non-equijoins are frequently quadratic
– E.g., running totals
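
A minimal sketch of why the join algorithm matters, counting the work done by a non-indexed nested loops join and by a hash join on the same inputs. The table contents and function names are hypothetical; the code models the algorithms and is not how SQL Server implements them.

```python
def nested_loops_join(left, right):
    """Compare every left row with every right row: len(left) * len(right) comparisons."""
    comparisons = 0
    result = []
    for l_key, l_val in left:
        for r_key, r_val in right:
            comparisons += 1
            if l_key == r_key:
                result.append((l_key, l_val, r_val))
    return result, comparisons

def hash_join(build, probe):
    """Build a hash table on one input, then probe it once per row of the other input."""
    table = {}
    for key, val in build:
        table.setdefault(key, []).append(val)
    probes = 0
    result = []
    for key, val in probe:
        probes += 1
        for build_val in table.get(key, []):
            result.append((key, build_val, val))
    return result, probes

# Hypothetical inputs: 100 dimension rows and 1,000 fact rows.
dim = [(k, f"dim{k}") for k in range(100)]
fact = [(i % 100, i) for i in range(1000)]

nl_rows, nl_cost = nested_loops_join(dim, fact)
hj_rows, hj_cost = hash_join(dim, fact)
print(nl_cost, hj_cost)
```

Both functions return the same rows; only the amount of work differs. The hash join relies on equality of keys, which is why it needs an equijoin.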


Linearize Joins

x      y1 = x   y2 = x^2   y3 = x^2 piecewise (per partes)
0      0        0          0
0.2    0.2      0.04       0.04
0.4    0.4      0.16       0.16
0.6    0.6      0.36       0.36
0.8    0.8      0.64       0.64
1      1        1          1
1.2    1.2      1.44       1.04
1.4    1.4      1.96       1.16
1.6    1.6      2.56       1.36
1.8    1.8      3.24       1.64
2      2        4          2
2.2    2.2      4.84       2.04
2.4    2.4      5.76       2.16
2.6    2.6      6.76       2.36
2.8    2.8      7.84       2.64
3      3        9          3
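
The y3 column appears to restart the quadratic curve at every whole number, so the total stays close to the linear y1. The sketch below reproduces those values. The function name and the `piece` parameter are assumptions, as is the reading that this models splitting a quadratic operation into partitions.

```python
import math

def piecewise_square(x, piece=1.0):
    """Quadratic cost paid separately for each piece of width `piece` instead of once for all of x."""
    full_pieces = math.floor(x / piece)
    remainder = x - full_pieces * piece
    return full_pieces * piece ** 2 + remainder ** 2

xs = [0.8, 1.2, 2.8, 3.0]
y2 = [x ** 2 for x in xs]                # quadratic over the whole range
y3 = [piecewise_square(x) for x in xs]   # quadratic only inside each piece
for x, whole, pieces in zip(xs, y2, y3):
    print(f"x={x:.1f}  y2={whole:.2f}  y3={pieces:.2f}")
```

With a piece width of 1, each completed piece costs 1, so y3 never drifts far from y1 = x.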


Bitmap Filtered Star Joins

• Optimized bitmap filtering for star schema joins
– Bitmap representation of a set of values from a dimension table to pre-filter rows to join from a fact table
– Enables filtering rows early in the plan, allowing subsequent operators to operate on fewer rows


Bloom Filter (1)*

• Bloom filter is a bit array of m bits
– Start with all bits set to 0
• k different hash functions defined
– Each of which maps some set element to one of the m positions with a uniform random distribution
• To add an element, feed it to each of the k hash functions to get k array positions
– Set the bits at all these positions to 1

Source: Wikipedia


Bloom Filter (2)

• To test whether an element is in the set, feed it to each of the k hash functions to get k array positions
– If any of the bits at these positions is 0, the element is not in the set
– If all are 1, then either the element is in the set, or the bits have been set to 1 during the insertion of other elements
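
The add and test steps of a Bloom filter can be sketched as follows. The class name, the default values of m and k, and the trick of deriving k hash functions by salting SHA-256 are all assumptions for illustration. The slides do not describe how SQL Server builds its bitmaps, and the sample keys are hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m = m              # number of bits
        self.k = k              # number of hash functions
        self.bits = [0] * m     # start with all bits set to 0

    def _positions(self, item):
        # Assumption: k hash functions are simulated by salting one hash with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in the set"; True means "possibly in the set".
        return all(self.bits[pos] for pos in self._positions(item))

# Hypothetical star join: keys that survive the dimension filter go into the bitmap.
dim_keys = [3, 17, 42]
bf = BloomFilter()
for key in dim_keys:
    bf.add(key)

# Fact rows are pre-filtered before the join; false positives are removed by the join itself.
fact_rows = [(k, k * 10) for k in range(100)]
candidates = [row for row in fact_rows if bf.might_contain(row[0])]
print(len(fact_rows), len(candidates))
```

The filter can pass rows that do not join (false positives) but never drops a row that does, so applying it early is safe.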


Table Partitioning

• Partition function
• Partition scheme
• Aligned indexes
• Partition elimination
• Partition switching
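
A partition function maps a column value to a partition number, and partition elimination uses the same mapping to skip partitions that cannot satisfy a predicate. The sketch below models both in Python. The function names, the integer date keys, and the boundary values are hypothetical; the RANGE RIGHT and RANGE LEFT handling follows the documented SQL Server rule for which side a boundary value belongs to.

```python
from bisect import bisect_left, bisect_right

def partition_number(value, boundaries, range_right=True):
    """1-based partition number for value, given sorted boundary values."""
    if range_right:
        # RANGE RIGHT: a boundary value belongs to the partition on its right.
        return bisect_right(boundaries, value) + 1
    # RANGE LEFT: a boundary value belongs to the partition on its left.
    return bisect_left(boundaries, value) + 1

def partitions_for_range(low, high, boundaries, range_right=True):
    """Partition elimination: only these partitions can hold values in [low, high]."""
    first = partition_number(low, boundaries, range_right)
    last = partition_number(high, boundaries, range_right)
    return list(range(first, last + 1))

# Hypothetical yearly boundaries on an integer date key (yyyymmdd): 4 partitions.
boundaries = [20100101, 20110101, 20120101]

print(partition_number(20110101, boundaries))                     # boundary value, RANGE RIGHT
print(partition_number(20110101, boundaries, range_right=False))  # boundary value, RANGE LEFT
print(partitions_for_range(20110301, 20110630, boundaries))       # a query on one half-year
```

A query restricted to one half-year touches a single partition out of four; the others are never read.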


Filtered Indexes

• WHERE clause in the CREATE INDEX statement
• Small B-trees on subset of data only
• Useful when some values are selective, while others dense
– Index on selective values only


Indexed Views

• Useful for queries that aggregate data
– Can also reduce number of joins
• Depending on edition of SQL Server can be used automatically
– No need to change reporting queries
• Many limitations


Data Compression

• Pre-SQL 2005: variable-length data types
• SQL 2005: vardecimal
• SQL 2008
– Row compression
– Page compression
• SQL 2008 R2
– Unicode compression


SQL 2008 Compression

• Row compression
– Fixed-width data type values stored in variable format
• Page compression
– Prefix compression
– Dictionary compression
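
The idea behind prefix compression can be shown with a deliberately simplified sketch: store the prefix shared by the values of one column on a page once, and keep only the remainders. This is an illustration under stated assumptions, not the SQL Server page format. The function names and sample values are hypothetical, and the real engine chooses the stored prefix differently.

```python
from os.path import commonprefix

def prefix_compress(values):
    """Store the shared prefix once, plus the remainder of every value."""
    prefix = commonprefix(values)
    return prefix, [v[len(prefix):] for v in values]

def prefix_decompress(prefix, suffixes):
    return [prefix + s for s in suffixes]

# Hypothetical values of one column on one page.
page_values = ["Smith", "Smithson", "Smithers", "Smythe"]
prefix, suffixes = prefix_compress(page_values)

stored = len(prefix) + sum(len(s) for s in suffixes)
original = sum(len(v) for v in page_values)
print(prefix, suffixes, original, stored)
```

Dictionary compression then works across the whole page, replacing values that repeat anywhere on it with references to a single stored copy.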


Unicode Compression

• Works on nchar(n) and nvarchar(n)
• Automatically with row or page compression
• Savings depend on language
– Up to 50% in English, German
– Only 15% in Japanese
• Very low performance penalty


Window Functions

• Functions operating on a window (set) of rows defined by an OVER clause
• Types of functions:
– Ranking
– Aggregate
– Distribution

SELECT empid, ordermonth, qty,
  SUM(qty) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runqty
FROM Sales.EmpOrders;
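
To make the window frame concrete, the following sketch emulates the running-total query in plain Python. The column names come from the query; the sample rows and the function name are hypothetical, and the code only models the semantics, not how SQL Server evaluates the window.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (empid, ordermonth, qty)
rows = [
    (1, "2012-01", 10),
    (2, "2012-01", 5),
    (1, "2012-02", 20),
    (1, "2012-03", 30),
    (2, "2012-02", 15),
]

def running_qty(rows):
    result = []
    # PARTITION BY empid ORDER BY ordermonth
    ordered = sorted(rows, key=itemgetter(0, 1))
    for empid, partition in groupby(ordered, key=itemgetter(0)):
        total = 0  # the frame starts over in every partition
        for _, ordermonth, qty in partition:
            total += qty  # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            result.append((empid, ordermonth, qty, total))
    return result

result = running_qty(rows)
for row in result:
    print(row)
```

Each partition is processed in a single ordered pass, which is what keeps a running total linear instead of quadratic.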


Window Functions in SQL Server

• SQL Server 2005:
– Ranking calculations
– Aggregates with only window partitioning
• SQL Server 2012:
– Aggregates with also window ordering and framing
– Offset functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE
– Distribution functions: PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DISC


SQL Server DW / OLAP Offerings

• Personal and team level
– PowerPivot for Excel (client)
– PowerPivot for SharePoint (server)
• Corporate level
– SQL Server
– SSAS Tabular
– SSAS Dimensional
– Fast Track Data Warehouse
– Parallel Data Warehouse

VertiPaq


Trans-Relational Model

• Not “beyond” relational
– Transformation between logical and physical layer
• Steve Tarin, Required Technologies Inc. (1999)
• All columns stored in sorted order
– All joins become merge joins
– Can condense storage
– Of course, updates suffer
• Logically, this is a pure relational model
• SQL Server uses own variant
– Order of columns not preserved – optimized for compression
– Leverages parallel hash joins rather than merge joins


Columnar Storage (1)

Row / Col   1       2       3
            Name    Color   City
1           Nut     Red     London
2           Bolt    Green   Paris
3           Screw   Blue    Oslo
4           Screw   Red     London
5           Cam     Blue    Paris
6           Cog     Red     London

Row / Col   1       2       3
            Name    Color   City
1           Bolt    Blue    London
2           Cam     Blue    London
3           Cog     Green   London
4           Nut     Red     Oslo
5           Screw   Red     Paris
6           Screw   Red     Paris


Columnar Storage (2)

Row / Col   1             2             3
            Name          Color         City
1           Bolt [1:1]    Blue [1:2]    London [1:3]
2           Cam [2:2]     Green [3:3]   Oslo [4:4]
3           Cog [3:3]     Red [4:6]     Paris [5:6]
4           Nut [4:4]
5           Screw [5:6]
6

Row / Col   1       2       3
            Name    Color   City
1           Bolt    Blue    London
2           Cam     Blue    London
3           Cog     Green   London
4           Nut     Red     Oslo
5           Screw   Red     Paris
6           Screw   Red     Paris


Row Reconstruction Table

Row / Col   1             2             3
            Name          Color         City
1           Bolt [1:1]    Blue [1:2]    London [1:3]
2           Cam [2:2]     Green [3:3]   Oslo [4:4]
3           Cog [3:3]     Red [4:6]     Paris [5:6]
4           Nut [4:4]
5           Screw [5:6]
6

Row / Col   1       2       3
            Name    Color   City
1           3       6       4
2           1       4       6
3           6       5       3
4           4       1       5
5           2       2       1
6           5       3       2
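
One way to read the row reconstruction table: the number in cell (row, column) is the row to visit in the next column, and the last column points back to the first. That reading is an assumption, but following the pointers reproduces the six original rows. The sketch below uses the sorted column values and the reconstruction table exactly as shown on the slides; the variable and function names are hypothetical.

```python
# Sorted column values (the slide stores them condensed as ranges).
columns = {
    "Name":  ["Bolt", "Cam", "Cog", "Nut", "Screw", "Screw"],
    "Color": ["Blue", "Blue", "Green", "Red", "Red", "Red"],
    "City":  ["London", "London", "London", "Oslo", "Paris", "Paris"],
}
# Row reconstruction table, one list per column, 1-based row numbers.
rrt = {
    "Name":  [3, 1, 6, 4, 2, 5],
    "Color": [6, 4, 5, 1, 2, 3],
    "City":  [4, 6, 3, 5, 1, 2],
}
order = ["Name", "Color", "City"]

def reconstruct(start_row):
    """Follow the pointers from the first column through all columns."""
    record = []
    row = start_row
    for col in order:
        record.append(columns[col][row - 1])
        row = rrt[col][row - 1]  # row number to visit in the next column
    return tuple(record), row    # the final pointer leads back to the start

records = []
for start in range(1, 7):
    record, back = reconstruct(start)
    records.append(record)
    print(start, record, "returns to row", back)
```

Starting at Name row 1 (Bolt), the pointers lead to Color row 3 (Green) and City row 5 (Paris), which is the original row 2.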


SQL Server Solution (1)*

• Converting rows to column segments

Source: SQL Server Column Store Indexes by Per-Åke Larson, et al., Microsoft, SIGMOD’10, June 12–16, 2011

SQL Server Solution (2)

• Storing column segments as BLOBs
– Leverages existing BLOB storage
– Additional segment metadata
– Multiple compression algorithms


Columnstore Compression

• Encoding values to 32-bit or 64-bit integer
– Dictionary-based encoding
– Value-based (prefix) encoding
• Optimal row ordering with VertiPaq™ algorithm to rearrange rows
– Optimal ordering for Run-Length Encoding (RLE) for best overall compression
• Compression
– RLE: data stored as <value, count> pairs
– Bit-Pack: use min number of bits for a value
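
The three steps (encode to integers, order rows for long runs, compress) can be sketched on a single column. The function names are hypothetical, and sorting one column stands in for the row reordering; the actual VertiPaq ordering algorithm is not described in the slides.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def run_length_encode(codes):
    """Store consecutive equal codes as <value, count> pairs."""
    return [(value, len(list(run))) for value, run in groupby(codes)]

def bits_per_value(codes):
    """Bit-packing: the minimum number of bits that can hold the largest code."""
    return max(1, max(codes).bit_length())

# The Color column from the columnar storage slides.
colors = ["Red", "Green", "Blue", "Red", "Blue", "Red"]
dictionary, codes = dictionary_encode(colors)

unsorted_runs = run_length_encode(codes)        # no long runs in the original order
sorted_runs = run_length_encode(sorted(codes))  # reordering produces longer runs
width = bits_per_value(codes)
print(dictionary, codes, sorted_runs, width)
```

In the original row order every run has length 1, so RLE saves nothing; after reordering, six values collapse into three pairs.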


Result: Reduced I/O

• Fetches only needed columns from disk
• Columns are compressed
• Less IO
• Better buffer hit rates

[Figure: columns C1–C6 and the query SELECT region, sum (sales) …]


Result: Reading Segments

• Column segment contains values from one column for a set of about 1M rows
• Column segment is unit of transfer from disk
• Storage engine can eliminate segments early in the process
– Because of additional column segment metadata

[Figure: columns C1–C6 split into column segments, each covering a set of about 1M rows]
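
Segment elimination can be sketched by keeping the minimum and maximum value of each segment as metadata and skipping segments whose range cannot contain the value searched for. That the metadata holds a minimum and a maximum is an assumption here, since the slide only mentions "additional column segment metadata". The tiny segment size, the sample keys, and the function names are hypothetical.

```python
SEGMENT_ROWS = 4  # tiny for illustration; the slide says about 1M rows per segment

def build_segments(values, segment_rows=SEGMENT_ROWS):
    """Split one column into segments and record min/max metadata for each."""
    segments = []
    for start in range(0, len(values), segment_rows):
        chunk = values[start:start + segment_rows]
        segments.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return segments

def scan_equal(segments, target):
    """Return matching values and the number of segments actually read."""
    segments_read = 0
    hits = []
    for seg in segments:
        if target < seg["min"] or target > seg["max"]:
            continue  # eliminated using metadata only, never read from disk
        segments_read += 1
        hits.extend(v for v in seg["values"] if v == target)
    return hits, segments_read

# Hypothetical date keys, loaded roughly in date order.
date_keys = [20120101, 20120101, 20120102, 20120103,
             20120104, 20120105, 20120105, 20120106,
             20120107, 20120108, 20120109, 20120109]
segments = build_segments(date_keys)
hits, segments_read = scan_equal(segments, 20120105)
print(len(segments), segments_read, hits)
```

Elimination works best when the data is loaded in an order that keeps each segment's range narrow, as with the date keys above.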


Reducing CPU Usage

• Columnstore indexes reduce disk IO
• Bitmap-filtered hash joins can be executed in parallel
• Problem: CPU becomes a bottleneck
• Solution: reduce CPU usage by processing large numbers of rows
– Iterators that do not process row-at-a-time
– Process batch-at-a-time


Batch Processing

• Orthogonal to columnstore indices
– Can support other storage
• However, best results with columnstore indices
– Sometimes can perform batch operations directly on compressed data
• Can mix batch and row operators
– Can dynamically switch from batch to row mode
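
The difference between row-at-a-time and batch-at-a-time iterators can be illustrated by counting operator invocations for the same filter. This is a rough model only: the function names, the data, and the batch size of 1,000 rows are assumptions, and the real engine gains mostly from lower per-row overhead inside each call.

```python
def filter_row_at_a_time(rows, predicate):
    calls = 0
    out = []
    for row in rows:
        calls += 1  # one operator invocation per row
        if predicate(row):
            out.append(row)
    return out, calls

def filter_batch_at_a_time(rows, predicate, batch_size=1000):
    calls = 0
    out = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        calls += 1  # one operator invocation per batch
        out.extend(r for r in batch if predicate(r))
    return out, calls

def is_even(value):
    return value % 2 == 0

rows = list(range(10_000))  # hypothetical column values
row_out, row_calls = filter_row_at_a_time(rows, is_even)
batch_out, batch_calls = filter_batch_at_a_time(rows, is_even)
print(row_calls, batch_calls)
```

Both variants return the same rows; the batch variant pays the per-call overhead once per 1,000 rows instead of once per row.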


Batch Operators

• The following operators support batch mode processing:
– Filter
– Project
– Scan
– Local hash (partial) aggregation
– Hash inner join
– Batch hash table build

Source: http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx#Batch_mode_processing


Columnstore Indexes Constraints

• Base table must be clustered B-tree or heap
• Columnstore index:
– Nonclustered
– One per table
– Must be partition-aligned
– Not allowed on indexed view
– Can’t be a filtered index


Data Type Restrictions

• Unsupported types
– Decimal > 18 digits
– Binary
– BLOB
– (n)varchar(max)
– Uniqueidentifier
– Date/time types > 8 bytes
– CLR


Query Performance Restrictions

• Outer joins
• Unions
• Consider modifying queries to hit “sweet spot”
– Inner joins
– Star joins
– Aggregation


Loading New Data

• Columnstore index makes table read-only
• Partition switching allowed
• INSERT, UPDATE, DELETE, and MERGE not allowed
• Two recommended methods for loading data
– Disable, update, rebuild
– Partition switching


Columnstore Indexes Usage

• Use when:
– Read-mostly workload
– Most updates are appending new data
– Workflow permits partitioning or index drop/rebuild
– Queries often scan & aggregate lots of data
• Use on fact (and large dimension) tables
• Do not use when:
– Frequent updates
– Partition switching or rebuilding index doesn’t fit workflow
– Frequent small look up queries
– VertiPaq cannot handle your data model


Review

• DW Problems
• Bitmap Filtered Hash Joins
• Table Partitioning
• Filtered Indexes
• Indexed Views
• Data Compression
• Window Functions
• Columnstore Indexes


Q & A

• Questions?

• Thank you for coming to this conference…• …and this presentation!


References

• Books:
– SQL Server Books Online
– Dejan Sarka, Grega Jerkič and Matija Lah: MCTS Self-Paced Training Kit (Exam 70-463): Building Data Warehouses with Microsoft SQL Server 2012
• Courses and Seminars
– SQL Server 2012 and SharePoint BI Immersion
– Advanced Transact-SQL