Sql server scalability fundamentals


Transcript of Sql server scalability fundamentals

Page 1: Sql server scalability fundamentals

SQL Server Scalability Fundamentals

Level 300

Page 2: Sql server scalability fundamentals

Scalability

From Wikipedia:

“Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth.”

Page 3: Sql server scalability fundamentals

What Are We Aiming For ?

Scalability ?

High throughput ?

Low response times ?

A combination of the above ?

Some goals can be mutually exclusive; for example, Hadoop is scalable but gives poor response times

Page 4: Sql server scalability fundamentals

What We Know


Page 5: Sql server scalability fundamentals

What We Know

DBA Tasks

Installation of OS and SQL

Basic memory configuration

Perfmon style monitoring

Monitoring via SQL Profiler

Backup/restore and HA setup

You can read an execution plan

You know what the basic SQL objects are

You know how to Google things, especially Books Online

Page 6: Sql server scalability fundamentals

A Good Way Of Thinking About Latency

[Diagram: four CPU cores, each with its own L1 and L2 cache, sharing a single L3 cache, set against a latency scale: 1ns, 10ns, 100ns, 10us, 100us, 10ms]

Page 7: Sql server scalability fundamentals

Cache Out Curves

[Chart: throughput per thread against data size, with the cache size marked]

Page 8: Sql server scalability fundamentals

There Are Several Of These Curves

[Chart: throughput against touched data size, with a separate cache out curve for each of: CPU cache, TLB, NUMA remote memory, storage]

Page 9: Sql server scalability fundamentals


Single Threaded Performance

Page 10: Sql server scalability fundamentals

Insert Throughput By Doing The Math

Response time = service time + wait time

0.924 ms = 0.053 ms + 0.871 ms

Insert throughput = 1000 / 0.924 = 1082 / s
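The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the figures are the ones quoted on the slide, the millisecond unit is an assumption implied by the "1000 /" in the throughput formula, and the function names are made up for illustration.

```python
# Figures quoted on the slide. The millisecond unit is an assumption,
# implied by dividing 1000 by the response time to get a per-second rate.
SERVICE_TIME_MS = 0.053
WAIT_TIME_MS = 0.871


def response_time_ms(service_ms, wait_ms):
    """Response time = service time + wait time."""
    return service_ms + wait_ms


def throughput_per_second(response_ms):
    """Operations per second for one thread issuing requests back to back."""
    return 1000.0 / response_ms


response = response_time_ms(SERVICE_TIME_MS, WAIT_TIME_MS)
inserts_per_second = throughput_per_second(response)
wait_share = WAIT_TIME_MS / response

print(f"response time     = {response:.3f} ms")
print(f"insert throughput = {inserts_per_second:.0f} / s")
print(f"share of response time spent waiting = {wait_share:.0%}")
```

The last line makes the point behind the numbers explicit: with these figures most of the response time is wait time rather than service time, so reducing waits is where single threaded throughput is won.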

Page 11: Sql server scalability fundamentals

“Big O” Notation

How the elapsed time or space requirements of an algorithm change in response to the size of the input data set

Page 12: Sql server scalability fundamentals

“Big O” and The Database Engine: Examples

Sort ( average and best case scenarios ): O( n log(n) )

Insert into a memory optimised table with a hash index where bucket count >= distinct values inserted: O(1)

Insert into a memory optimised table with a hash index where bucket count < distinct values inserted: O(n)
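Why the bucket count matters can be seen in a toy model of a chained hash table. The sketch below is not how the In-Memory OLTP engine is implemented; it assumes sequential integer keys and a modulo hash (both chosen for illustration) and simply reports the longest chain an operation might have to walk.

```python
from collections import Counter


def longest_chain(distinct_keys, bucket_count):
    """Longest bucket chain after inserting `distinct_keys` distinct keys.

    Toy model: keys are 0..distinct_keys-1 and the hash is key % bucket_count,
    which spreads keys as evenly as possible. Real hash functions are less tidy.
    """
    chains = Counter(key % bucket_count for key in range(distinct_keys))
    return max(chains.values())


keys = 1024
for buckets in (2048, 1024, 64, 8):
    print(f"buckets={buckets:5d}  longest chain={longest_chain(keys, buckets)}")
```

With at least as many buckets as distinct keys, every chain in this model holds a single entry, so the work per operation stays constant. Once the bucket count is too small the chains grow in proportion to the number of keys, which is the O(n) behaviour the slide warns about.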

Page 13: Sql server scalability fundamentals

What Gives Us The Biggest Bang For Our Buck ?

80 % 20 %

Schema design, indexing strategy, T-SQL code design

Synchronization primitives

Leveraging the architecture of the modern CPU

Page 14: Sql server scalability fundamentals

Loop: low memory, slow

Hash: high memory, fast

Merge: rare

Join Types

Page 15: Sql server scalability fundamentals

Loop Join

Page 16: Sql server scalability fundamentals

[Diagram: a multi-core CPU. Each core has an L0 UOP cache, a 32KB L1 instruction cache, a 32KB L1 data cache and a 256K L2 unified cache. The cores share an L3 cache over a bi-directional ring bus, and the CPU reaches main memory over the memory bus.]

Problem With Loop Joins – The Modern CPU !!!

Page 17: Sql server scalability fundamentals

[Chart: access latency by cache level and access pattern, on a scale of 0 to 180]

L1 cache, sequential access: 4
L1 cache, in page random access: 4
L1 cache, full random access: 4
L2 cache, sequential access: 11
L2 cache, in page random access: 11
L2 cache, full random access: 11
L3 cache, sequential access: 14
L3 cache, in page random access: 18
L3 cache, full random access: 38
Main memory: 167

Main Memory Is Not As Fast As We Might Think !!!

Page 18: Sql server scalability fundamentals

Crawling A Tree In Memory

Page 19: Sql server scalability fundamentals

Hash Join

Page 20: Sql server scalability fundamentals

Row Mode Hash Join Scalability

[Chart: Elapsed time (ms) / Degree of Parallelism. X axis: degree of parallelism, 2 to 16. Y axis: elapsed time (ms), 0 to 40,000. Annotations: scalability best case scenario, NUMA node 0 boundary.]

If we keep going do we hit ‘Negative’ scale ?

Page 21: Sql server scalability fundamentals

Batch Mode To The Rescue ?

[Chart: Elapsed time (ms) / Degree of Parallelism. X axis: degree of parallelism, 2 to 16. Y axis: elapsed time (ms), 0 to 4,000. Annotations: scalability best case scenario, NUMA node 0 boundary.]

Page 22: Sql server scalability fundamentals

Two CPU sockets <> twice the throughput, WHY ?

[Chart: Singleton Insert Rate / Threads. X axis: threads, 1 to 20. Y axis: inserts / s, 0 to 350,000. Series: single socket inserts / s, two sockets inserts / s.]

What About OLTP Workloads ?

Page 23: Sql server scalability fundamentals

We will come back to this later . . .

[Chart: LOGCACHE_ACCESS Spins / Threads. X axis: threads, 1 to 20. Y axis: spins, 0 to 1,600,000,000. Series: single socket LOGCACHE_ACCESS spins, two sockets LOGCACHE_ACCESS spins.]

Multiple Sockets and Spin-locking

Page 24: Sql server scalability fundamentals

The In Memory OLTP Engine

[Chart: Singleton Insert Rate / Thread Count, Hash Index Page count = 524288. X axis: threads, 1 to 20. Y axis: singleton inserts / s, 0 to 16,000,000. Annotation: scalability best case scenario.]

This looks good, but get the Page count wrong and your singleton inserts scale horribly !!!

Page 25: Sql server scalability fundamentals

What Happens When Memory Is Scarce ?

Available hash memory (MB)

Page 26: Sql server scalability fundamentals

Merge Join

Page 27: Sql server scalability fundamentals

Merge, Which Order ?

Cost relative to the batch = 87 %

Cost relative to the batch = 13 %

SELECT MAX(h.OrderDate)
      ,MAX(d.ProductID)
FROM [Sales].[SalesOrderDetail] d
INNER MERGE JOIN [Sales].[SalesOrderHeader] h
ON d.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)

SELECT MAX(h.OrderDate)
      ,MAX(d.ProductID)
FROM [Sales].[SalesOrderHeader] h
INNER MERGE JOIN [Sales].[SalesOrderDetail] d
ON d.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)

Page 28: Sql server scalability fundamentals

Digging Deeper

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 2, read-ahead reads 685, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 3, read-ahead reads 1277, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Page 29: Sql server scalability fundamentals

Turning M:N Into 1:N

Cost relative to the batch = 48 %

Cost relative to the batch = 52 %

SELECT MAX(h.OrderDate)
      ,MAX(b.MAX_P)
FROM (SELECT SalesOrderId
            ,MAX(ProductId) AS MAX_P
      FROM [Sales].[SalesOrderDetail]
      GROUP BY SalesOrderId) b
INNER MERGE JOIN [Sales].[SalesOrderHeader] h
ON b.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)

SELECT MAX(h.OrderDate)
      ,MAX(d.ProductID)
FROM [Sales].[SalesOrderHeader] h
INNER MERGE JOIN [Sales].[SalesOrderDetail] d
ON d.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)

Page 30: Sql server scalability fundamentals

Every Table Should Always Have A Clustered Index

Cluster ( O_ORDERKEY )

CREATE UNIQUE CLUSTERED INDEX CIX_Key
ON ORDERS_Cluster (O_ORDERKEY)
WITH (FILLFACTOR = 100)

SELECT *
FROM ORDERS_Cluster
WHERE O_ORDERKEY = 300000

Index ( O_ORDERKEY )

CREATE UNIQUE NONCLUSTERED INDEX CIX_Key
ON ORDERS_Heap (O_ORDERKEY)
WITH (FILLFACTOR = 100)

SELECT *
FROM ORDERS_Heap
WHERE O_ORDERKEY = 300000

Page 31: Sql server scalability fundamentals

Or Perhaps Not ?

Clustered indexes work against you when the number of non-covering secondary index seeks and scans increases

Cluster ( O_ORDERKEY )

CREATE UNIQUE CLUSTERED INDEX CIX_Key
ON ORDERS_Cluster (O_ORDERKEY)
WITH (FILLFACTOR = 100)

CREATE INDEX IX_Customer
ON ORDERS_Cluster (O_CUSTKEY)

Index ( O_ORDERKEY )

CREATE UNIQUE NONCLUSTERED INDEX CIX_Key
ON ORDERS_Heap (O_ORDERKEY)
WITH (FILLFACTOR = 100)

CREATE INDEX IX_Customer
ON ORDERS_Heap (O_CUSTKEY)

Page 32: Sql server scalability fundamentals

Tuning By Workload

Page 33: Sql server scalability fundamentals

Log write latency is critical to a point

OLTP and OLAP do not play nice together

Fine grained locking => no lock escalation

Trace flag 1211 – prevent all lock escalation

Trace flag 1224 – only escalate locks under severe memory pressure

Avoid compression for OLTP

Certain plan shapes suit OLTP applications best, next slide . . .

OLTP Checklist

Page 34: Sql server scalability fundamentals

OLTP Tuning For Dummies

1. Nested loop joins are the dominant join type

2. Seeks not scans

3. Serial iterators

4. There should be no sorts or spools except for performance spools and / or sorts associated with optimized nested loop joins

Page 35: Sql server scalability fundamentals

Seeks Not Scans

Page 36: Sql server scalability fundamentals

1. Do not normalise beyond 3rd normal form

2. Find the most CPU/IO intensive queries with sys.dm_exec_query_stats

3. Add OPTION(LOOP JOIN) to the offending query

4. Check the estimated plan of the bad query

5. If a spool is found, add indexes until remedied

6. Go to 2 until no more queries have non-index paths

7. Buy more hardware . . . wisely

OLTP Tuning For Dummies

Page 37: Sql server scalability fundamentals

Not All Spools Are Bad

Execution time by constant, optimized versus non-optimized:

10,000,000 (1% of rows): optimized 6.5 minutes, non-optimized 26 minutes, 4x
100,000,000 (10% of rows): optimized 10.4 minutes, non-optimized 4.3 hours, 25x
250,000,000 (25% of rows): optimized 11.3 minutes, non-optimized 10.6 hours, 56x

With the “optimized nested loop join”, data is pre-sorted on the outer side of the join; refer to Craig Freedman's blog post

Page 38: Sql server scalability fundamentals

Pursue high sequential scan IO throughput

Avoid CPU core starvation by understanding core consumption rates

Leverage column stores and batch mode

Kimball dimensional model ? => pursue star join optimisations

Certain plan shapes suit OLAP style applications best, next slide . . .

OLAP Checklist

Page 39: Sql server scalability fundamentals


Multi Threaded Performance

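[Chart: speedup factor (y axis, 1 to 31) against number of cores N (x axis, 0 to 64), with one curve each for P = 100%, P = 95%, P = 90% and P = 80%]

Amdahl’s law: the speedup from adding cores is limited by the fraction of the workload that must run serially. With P the fraction that can run in parallel and N the number of cores, speedup = 1 / ((1 - P) + P / N)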
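The curves on this slide follow from the standard statement of Amdahl's law, speedup = 1 / ((1 - P) + P / N). The sketch below evaluates it for the four values of P shown in the chart legend; the function name and the choice of core counts are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a workload whose
    `parallel_fraction` (0.0 to 1.0) can be spread across `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


core_counts = (8, 16, 32, 64)
for p in (1.00, 0.95, 0.90, 0.80):
    speedups = ", ".join(
        f"N={n}: {amdahl_speedup(p, n):5.2f}x" for n in core_counts
    )
    print(f"P = {p:.0%}  {speedups}")
```

Only the P = 100% case scales linearly. With P = 95% the speedup can never exceed 1 / (1 - 0.95) = 20x however many cores are added, which is why the serial constructs covered next matter so much.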

Page 40: Sql server scalability fundamentals

Constructs That Force Serial Plans

All T-SQL user defined functions

CLR UDFs with data access

System functions such as OBJECT_ID(), ERROR_NUMBER(), @@TRANCOUNT . . .

Dynamic cursors

Page 41: Sql server scalability fundamentals

Constructs That Force Serial Regions

System table scans

Backwards scans

Global scalar aggregate

Sequence functions

Recursive queries

Multi-consumer spool

TOP

TVFs

Page 42: Sql server scalability fundamentals


The Query Parallelism “Mothership of knowledge”

https://blogs.msdn.microsoft.com/craigfr/2007/04/17/parallel-query-execution-presentation/

Page 43: Sql server scalability fundamentals

What A ‘Good’ OLAP Execution Plan Should Look Like

1. Parallelism

2. Hash joins are the dominant join type

3. Bitmap filters pushed as deep into the plan as possible

4. Repartition streams iterators are not so good, we will cover these later

Page 44: Sql server scalability fundamentals

The Optimizer Cost Model

“To understand the relevance of cost you would have to jump in a time machine, go back in time and find a machine under a certain developer's desk, cost is based in the amount of time it took this machine to perform certain operations”

Adam Machanic

The machine under Lubor’s ( Kollar ) desk

Page 45: Sql server scalability fundamentals

The Optimizer Has Limitations !!!

It contains hard coding
Certain assumptions are explicitly hard coded into the optimizer

Not all scenarios it encounters are costed for
There are constructs and situations which the optimizer does not cost for, referred to by the optimizer team as “Out of model scenarios”

The optimizer has blind spots

Page 46: Sql server scalability fundamentals

Hard Coding In The Optimizer

Hash distribution is always uniform

The memory grant for a varchar column is always half its declared size

The ratio between random and sequential IO is hard coded for the IO cost model

Page 47: Sql server scalability fundamentals

“Out of model” Scenarios

It always assumes that the buffer pool is cold

No IO costing for parallel plans

Data in different columns is never correlated

Cardinality estimates for table variables are not costed, unless the statement is recompiled

Page 48: Sql server scalability fundamentals

Which Query Runs The Fastest ?

1468ms

561ms

Cost relative to the batch of 44 %

Cost relative to the batch of 56 %

Page 49: Sql server scalability fundamentals

VARCHAR Columns And Memory Grants

1468ms

561ms

Cost relative to the batch of 44 %

Cost relative to the batch of 56 %

Page 50: Sql server scalability fundamentals

CPU Core Consumption Rates

SELECT a.*
INTO MyBigTableSource
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
CROSS JOIN (SELECT TOP 500 * FROM sys.all_objects c) dt

SELECT COUNT(*)
FROM MyBigTableSource
OPTION (MAXDOP 1)

Page 51: Sql server scalability fundamentals

Few-Outer-Row Optimization

Few-Outer-Row is a specific optimization for nested loop joins. In some data warehousing queries, the outer side of a nested loop join is a parallel scan with a filter.

Hash Join Exception To The Rule

. . . and this is what a few outer rows plan looks like

Page 52: Sql server scalability fundamentals

Few Outer Rows Optimisation

CREATE TABLE TaxYear ( YearStart datetime ,YearEnd datetime )

-- Assumption: the eight dates on the slide are paired in order as ( YearStart, YearEnd ) rows
INSERT INTO TaxYear VALUES ( '20050406', '20060406' ) ,( '20070406', '20080406' ) ,( '20090406', '20100406' ) ,( '20110406', '20120406' )

DBCC SETCPUWEIGHT(100000)

SELECT [UnitPrice]
FROM FactInternetSales fis
JOIN TaxYear ty
ON fis.DueDateKey > ty.YearStart
AND fis.DueDateKey < ty.YearEnd

Page 53: Sql server scalability fundamentals

1. Add column store to the fact table

2. Add OPTION(HASH JOIN) to the query

3. Do you get a hash of the dimension and probe by the fact ? If not, check your statistics on the facts . . . and indexes on dimensions

4. Optimise the living daylights out of the fact scan

OLAP Tuning For Dummies

Page 54: Sql server scalability fundamentals

Working Out Where Columns Begin and End Is Expensive !!!

46% of total sampled CPU time !!!

Page 55: Sql server scalability fundamentals

Avoid segment trimming

Align segments in order to get the best segment elimination possible

Make sure as little of your data as possible is in delta stores

Deleted rows are only ever removed after a column store reorg/rebuild, keep an eye on these !!!

No predicate push down on strings prior to SQL Server 2016

Optimising Column Store Scans

Page 56: Sql server scalability fundamentals

A row group can have a maximum of 1,048,576 rows, but it can be closed before it reaches this for a number of reasons ( exposed in SQL 2016 in sys.dm_db_column_store_row_group_physical_stats )

trim_reason_desc: trim reason

BULKLOAD: BATCHSIZE specified for bulk insert, or end of bulk insert.

REORG_FORCED: REORG with COMPRESS_ALL_ROW_GROUPS = ON, which closes every open row group and compresses it into columnar format

DICTIONARY_SIZE: if the dictionary is full, the row group will be trimmed ( 16MB dictionary )

MEMORY_LIMITATION: memory pressure during index build caused the row group to be trimmed

RESIDUAL_ROW_GROUP_INDEXBUILD: last row group(s) have less than 1 million rows when the index is rebuilt.

Segment Trimming

Page 57: Sql server scalability fundamentals

DimBeverage

ID Value
1 Beer
2 Gin
3 Vodka
4 Whisky
5 Coca Cola
6 Wine
7 Brandy

FactBeverageConsumed – Aligned ( BeverageId column, one local dictionary per segment )

Segment 001: min value 1, max value 2
Segment 002: min value 3, max value 4
Segment 003: min value 5, max value 6

Aligned Segments

‘Alignment’ = min and max column values are in order with no overlap

Page 58: Sql server scalability fundamentals

DimBeverage

ID Value
1 Beer
2 Gin
3 Vodka
4 Whisky
5 Coca Cola
6 Wine
7 Brandy

FactBeverageConsumed – Non Aligned ( BeverageId column, one local dictionary per segment )

Segment 001: min value 1, max value 4
Segment 002: min value 2, max value 3
Segment 003: min value 1, max value 5

Non Aligned Segments

Min and max values across segments overlap !!!

Page 59: Sql server scalability fundamentals

Looking up Whisky ( DimBeverage ID 4 ) in the BeverageId column:

Non aligned segments

Segment 001: min value 1, max value 4 ( scan )
Segment 002: min value 2, max value 3 ( eliminated )
Segment 003: min value 1, max value 5 ( scan )

Aligned segments

Segment 001: min value 1, max value 2 ( eliminated )
Segment 002: min value 3, max value 4 ( scan )
Segment 003: min value 5, max value 7 ( eliminated )

What Segment Alignment Gives Us
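Segment elimination boils down to a range check against each segment's minimum and maximum value. The sketch below applies that check to the segment ranges shown on this slide; the `Segment` type and function name are hypothetical, and the real engine works on encoded values rather than plain integers.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    segment_id: int
    min_value: int
    max_value: int


def segments_to_scan(segments, wanted):
    """Return the ids of segments whose [min, max] range could contain `wanted`.

    Every other segment can be skipped without reading it.
    """
    return [s.segment_id for s in segments if s.min_value <= wanted <= s.max_value]


# Min / max values as shown on the slide.
NON_ALIGNED = [Segment(1, 1, 4), Segment(2, 2, 3), Segment(3, 1, 5)]
ALIGNED = [Segment(1, 1, 2), Segment(2, 3, 4), Segment(3, 5, 7)]

WHISKY_ID = 4
print("non aligned, segments scanned:", segments_to_scan(NON_ALIGNED, WHISKY_ID))
print("aligned, segments scanned:    ", segments_to_scan(ALIGNED, WHISKY_ID))
```

With overlapping ranges two of the three segments have to be read to find a single value, whereas the aligned layout narrows the scan to one segment. On a fact table with thousands of segments the difference grows accordingly.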

Page 60: Sql server scalability fundamentals

More Performance ? . . . Sorted Hash

[Chart: time (ms), 0 to 80,000, against degree of parallelism, 2 to 24. Series: non-sorted column store, sorted column store.]

Page 61: Sql server scalability fundamentals

SELECT p.ProductId
      ,p.ProductNumber
      ,p.ReorderPoint
      ,d.*
FROM (SELECT TOP (2147483647)
             p0.ProductId
            ,p0.ProductNumber
            ,p0.ReorderPoint
      FROM bigProduct AS p0
      WHERE p0.ProductId BETWEEN 1001 AND 20001) AS p
CROSS APPLY (SELECT th.TransactionId
                   ,RANK() OVER ( PARTITION BY p.ProductId
                                  ORDER BY th.ActualCost DESC ) AS LineTotalRank
                   ,RANK() OVER ( PARTITION BY p.ProductId
                                  ORDER BY th.Quantity DESC ) AS OrderQtyRank
             FROM bigTransactionHistory AS th
             WHERE th.ProductId = p.ProductId) AS d
OPTION (MAXDOP 10)

Column Store Scans Are Good, Seeks Are Bad !

Page 62: Sql server scalability fundamentals

With A Clustered Column Store, Elapsed Time = 262 seconds

Table 'bigProduct'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 9577, logical reads 109258564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'bigTransactionHistory'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 85220, lob physical reads 0, lob read-ahead reads 0.
Table 'bigTransactionHistory'. Segment reads 30, segment skipped 0.

SQL Server Execution Times: CPU time = 226750 ms, elapsed time = 262456 ms.

Page 63: Sql server scalability fundamentals

With A Conventional Index , Elapsed Time = 71 seconds

Table 'bigProduct'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'bigTransactionHistory'. Scan count 9577, logical reads 98005, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 36264 ms, elapsed time = 71046 ms.

Page 64: Sql server scalability fundamentals

We want as much of our data as possible to be in compressed row groups for OLAP style applications

Delta stores prevent segment elimination

Scanning delta stores has the same performance characteristic as scanning page compressed b-trees ( SQL 2012 – 14 ) and non-compressed b-trees ( SQL 2016 )

. . . Leads on to column store load strategies

Delta Stores

Page 65: Sql server scalability fundamentals

Row group closure and compression is triggered by 102,400 rows

Best compression is achieved at 1,048,576 rows ( ~ 1 million )

SSIS data flow task buffer sizes and buffer max rows can be tuned:

Row Group Compression:102,400 Is The Magic Number

Can your data sources pump data into the column store in 1 million rows batches consistently ?

Can zero data flow buffer spills be guaranteed with large batch sizes ?
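The two thresholds on this slide can be combined into a rough model of where the rows of a single bulk load batch end up. This is a simplified sketch: it assumes one batch loaded by one thread, ignores trimming caused by memory pressure or dictionary size, and uses made-up function and constant names.

```python
MAX_ROWS_PER_ROW_GROUP = 1_048_576      # largest possible row group
DIRECT_COMPRESSION_THRESHOLD = 102_400  # smallest batch compressed directly


def route_bulk_load(batch_rows):
    """Split one bulk load batch into compressed row groups and delta store rows.

    Returns (sizes of compressed row groups, rows left in a delta store).
    """
    full_groups, remainder = divmod(batch_rows, MAX_ROWS_PER_ROW_GROUP)
    compressed = [MAX_ROWS_PER_ROW_GROUP] * full_groups
    delta_rows = 0
    if remainder >= DIRECT_COMPRESSION_THRESHOLD:
        compressed.append(remainder)  # compressed, but smaller than ideal
    else:
        delta_rows = remainder        # too few rows, lands in a delta store
    return compressed, delta_rows


for rows in (50_000, 102_400, 1_048_576, 1_048_577):
    groups, delta = route_bulk_load(rows)
    print(f"batch of {rows:>9,} rows -> compressed {groups}, delta store {delta}")
```

A batch just over the row group limit leaves a handful of rows behind in a delta store, and batches a little over 102,400 rows produce many small compressed row groups. Both outcomes are reasons to aim for batches as close to 1,048,576 rows as the data flow can sustain.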

Page 66: Sql server scalability fundamentals

Trickle data into staging tables and transfer the rows to the column store when there is approximately 1 million of them

Turn the tuple mover off – TF634

Force row group closure and compression via
ALTER INDEX <index name> ON <table name> REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)

Bulk loads are explained in depth in this MSDN article

SQL Server 2016 supports parallel inserts into column stores

Loading Strategy

Page 67: Sql server scalability fundamentals

Data Type Elimination And Predicate Pushdown Support

Data Type          Min / Max   Predicate Pushdown   Segment Elimination
Numeric            yes         no                   no
DateTimeOffset     yes         no                   no
Char               yes         no                   no
Varchar            yes         no                   no
Nchar              yes         no                   no
Nvarchar           yes         no                   no
Binary             no          no                   no
Varbinary          no          no                   no
Uniqueidentifier   no          no                   no

Factor these data types out into dimension tables where possible !!!

Page 68: Sql server scalability fundamentals

New Column Store Features In SQL Server 2016

Filtered non clustered column store indexes

Ability to create a column store index on a memory optimised table

Support for row versioning

Column store can be declared in line when creating a table

Ability to specify a compression delay on the column store

Page 69: Sql server scalability fundamentals

New Column Store Features In SQL Server 2016

Support for updateable column stores on an Always On readable replica

Support for string predicate pushdown

Parallel insert now supported for clustered column stores

Simple aggregate pushdown

Support for multiple distinct counts

Page 70: Sql server scalability fundamentals

New Column Store Features In SQL Server 2016

Updateable non clustered column store indexes

Ability to create conventional b-tree indexes on clustered column store indexes

Ability to create foreign keys on clustered column stores

Batch mode windowing functions

Batch mode sort iterator support


Page 72: Sql server scalability fundamentals

Optimising Windowing Function Performance With Row Stores

SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
INTO TestData
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
CROSS JOIN (SELECT TOP 200 * FROM sys.all_objects) c
WHERE a.type = 'P'
AND b.type = 'P'
UNION ALL
SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.type = 'P'
AND b.type = 'P'
UNION ALL
SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.type = 'U'
AND b.type = 'U'
UNION ALL
SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.type = 'V'
AND b.type = 'V'

CREATE CLUSTERED INDEX csx ON TestData (object_id, code)

Page 73: Sql server scalability fundamentals

The Plan We Get For A Simple Query Using RANK

SELECT t.object_id
      ,code
      ,RANK() OVER (PARTITION BY t.object_id ORDER BY code ASC) AS rk
FROM TestData t

Page 74: Sql server scalability fundamentals

Anti Scale !!!

[Chart: Elapsed Time (ms) / Degree of Parallelism. X axis: degree of parallelism, 2 to 8. Y axis: elapsed time (ms), 1,050,000 to 1,450,000.]

Page 75: Sql server scalability fundamentals

SQL Server 2016 Has A Batch Windowing Function

CREATE TABLE [dbo].[DummyTable]( [object_id] [int] NULL)

CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON [dbo].[DummyTable]

SELECT t.object_id
      ,code
      ,RANK() OVER (PARTITION BY t.object_id ORDER BY code ASC) AS rk
FROM TestData t
LEFT JOIN DummyTable d
ON d.object_id = t.object_id
OPTION (HASH JOIN)

How does this help us with row stores ?

Page 76: Sql server scalability fundamentals

The Magic Is In What Now Appears In The Execution Plan

Page 77: Sql server scalability fundamentals

From “Anti Scale” To Scalability !!!

[Chart: Elapsed Time (ms) / Degree of Parallelism. X axis: degree of parallelism, 2 to 8. Y axis: elapsed time (ms), 0 to 2,000,000. Series: row store BATCH mode, row store ROW mode.]

Page 78: Sql server scalability fundamentals

What About Creating A Column Store On The TestData Table ?

[Chart: Elapsed Time (ms) / Degree of Parallelism. X axis: degree of parallelism, 2 to 8. Y axis: elapsed time (ms), 0 to 2,500,000. Series: row store BATCH mode, row store ROW mode, column store BATCH mode.]

Page 79: Sql server scalability fundamentals

Natural Born Performance Killers

Physical reads for OLTP

Poor sequential scan rates and CPU core starvation for OLAP

High WRITELOG latency for OLTP

Etc etc . . .

XML Processing ?

Page 80: Sql server scalability fundamentals

The Overhead Of Rendering XML

SELECT a.*
FROM sys.all_objects a
CROSS JOIN sys.all_objects b

SELECT a.*
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
FOR XML RAW

Where are our CPU cycles going ?

4 s 34 s

Page 81: Sql server scalability fundamentals

Where The Database Engine Is Spending Its CPU Time

Non-Xml version of the query Xml version of the query

The XML query is more than 5x as CPU intensive

Page 82: Sql server scalability fundamentals

Digging Into Stack: The FOR XML RAW Version Of Query

Page 83: Sql server scalability fundamentals

Challenges

Does whoever designed your infrastructure understand the resource requirements and usage patterns of the database engine ?

“Big boxes” being purchased with little understanding of how SQL Server scales on such machines

CPU unfriendly SQL engine behaviour: sorting, hashing and index seeks ( pointer chasing )

An end of the CPU Ghz free lunch

Page 84: Sql server scalability fundamentals

What Have We Learned ?

1. There are plan shapes which suit OLTP and OLAP applications, pursue these

2. Parallel query scalability suffers once the NUMA boundary is crossed

3. Clustered indexes are not a silver bullet, consider non-covering secondary indexes

4. Avoid column store segment trimming in order to leverage 8MB read-aheads for OLAP style applications, operational analytics is more nuanced

5. Align segments by pre-sorting the data on columns frequently used in predicates

6. Not all data types support segment elimination and predicate pushdown, this can be designed around

7. Be cognizant of the overheads of processing XML !