SQL Server Scalability Fundamentals
Level 300
Scalability
From Wikipedia:
“Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth.”
What Are We Aiming For ? Scalability ?
High throughput ?
Low response times ?
A combination of the above ?
Some goals can be mutually exclusive; for example, Hadoop is scalable but gives poor response times
What We Know
What We Know: DBA Tasks
Installation of OS and SQL
Basic memory configuration
Perfmon style monitoring
Monitoring via SQL Profiler
Backup/restore and HA setup
You can read an execution plan
You know what the basic SQL objects are
You know how to Google things, especially Books Online
A Good Way Of Thinking About Latency
[Diagram: four cores, each with private L1 and L2 caches, sharing an L3 cache; latencies step up through roughly 1ns, 10ns, 100ns, 10us, 100us and 10ms as you move from the caches out through main memory to storage]
Cache Out Curves
[Chart: throughput per thread vs. data size; throughput drops off once the data size exceeds the cache size]
There Are Several Of These Curves
[Chart: throughput vs. touched data size, with successive drop-offs at the CPU cache, TLB, remote NUMA and storage boundaries]
Single Threaded Performance
Insert Throughput By Doing The Math
Response time = service time + wait time = 0.053 + 0.871 = 0.924 ms
Insert throughput = 1000 / 0.924 = 1082 inserts / s
“Big O” Notation
How the elapsed time or space complexity of an algorithm changes in response to the size of the input data set
“Big O” and The Database Engine: Examples
Sort (average and best case scenarios): O(n log(n))
Insert into a memory optimised table with a hash index where bucket count >= distinct values inserted: O(1)
Insert into a memory optimised table with a hash index where bucket count < distinct values inserted: O(n)
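As a concrete illustration, a minimal sketch with a hypothetical table (the database needs a memory optimised filegroup, and BUCKET_COUNT should come from your own distinct key count):

-- Memory optimised table with a hash index. For O(1) point lookups the
-- BUCKET_COUNT should be >= the number of distinct key values; undersize
-- it and rows chain within buckets, degrading lookups towards O(n).
CREATE TABLE dbo.HotOrders
(
    OrderId   INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    OrderDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);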
What Gives Us The Biggest Bang For Our Buck ?
80 %: schema design, indexing strategy, T-SQL code design
20 %: synchronization primitives, leveraging the architecture of the modern CPU
Join Types
Loop: low memory, slow
Hash: high memory, fast
Merge: rare
Loop Join
[Diagram: a modern CPU package; each core has an L0 uop cache, 32KB L1 instruction and data caches and a 256KB unified L2 cache, all cores share the L3 cache via a bi-directional ring bus, with the memory bus leading out to main memory]
Problem With Loop Joins – The Modern CPU !!!
Approximate access latency (CPU cycles, per the original chart):
L1 cache sequential access: 4
L1 cache in-page random access: 4
L1 cache full random access: 4
L2 cache sequential access: 11
L2 cache in-page random access: 11
L2 cache full random access: 11
L3 cache sequential access: 14
L3 cache in-page random access: 18
L3 cache full random access: 38
Main memory: 167
Main Memory Is Not As Fast As We Might Think !!!
Crawling A Tree In Memory
Hash Join
Row Mode Hash Join Scalability
[Chart: elapsed time (ms, 5,000–40,000) vs. degree of parallelism (2–16), plotted against the scalability best case scenario; scaling degrades once the NUMA node 0 boundary is crossed]
If we keep going do we hit ‘Negative’ scale ?
Batch Mode To The Rescue ?
[Chart: elapsed time (ms, 500–4,000) vs. degree of parallelism (2–16) for the batch mode equivalent, again plotted against the scalability best case scenario and the NUMA node 0 boundary]
Two CPU sockets <> twice the throughput, WHY ?
[Chart: singleton insert rate (inserts/s, up to ~350,000) vs. threads (1–20), single socket vs. two sockets]
What About OLTP Workloads ?
We will come back to this later . . .
[Chart: LOGCACHE_ACCESS spins vs. threads (1–20), single socket vs. two sockets; on two sockets spins climb into the billions]
Multiple Sockets and Spin-locking
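To see whether this spinlock is hot on your own system, the (undocumented but well known) spinlock DMV can be sampled; a sketch:

-- Sample LOGCACHE_ACCESS spinlock activity; take two snapshots a known
-- interval apart and diff them to get spins per second.
SELECT name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
WHERE name = 'LOGCACHE_ACCESS';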
The In Memory OLTP Engine
[Chart: singleton insert rate (inserts/s, up to ~16,000,000) vs. thread count (1–20) for a memory optimised table with a hash index, bucket count = 524288, tracking the scalability best case scenario]
This looks good, but get the bucket count wrong and your singleton inserts scale horribly !!!
What Happens When Memory Is Scarce ?
[Chart: hash join performance vs. available hash memory (MB)]
Merge Join
Merge, Which Order ?
Cost relative to the batch = 87 %:
SELECT MAX(h.OrderDate), MAX(d.ProductID)
FROM [Sales].[SalesOrderDetail] d
INNER MERGE JOIN [Sales].[SalesOrderHeader] h
ON d.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)

Cost relative to the batch = 13 %:
SELECT MAX(h.OrderDate), MAX(d.ProductID)
FROM [Sales].[SalesOrderHeader] h
INNER MERGE JOIN [Sales].[SalesOrderDetail] d
ON d.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)
Digging Deeper
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 2, read-ahead reads 685, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 3, read-ahead reads 1277, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Turning M:N Into 1:N
Cost relative to the batch = 48 %:
SELECT MAX(h.OrderDate), MAX(b.MAX_P)
FROM (SELECT SalesOrderId, MAX(ProductId) AS MAX_P
      FROM [Sales].[SalesOrderDetail]
      GROUP BY SalesOrderId) b
INNER MERGE JOIN [Sales].[SalesOrderHeader] h
ON b.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)

Cost relative to the batch = 52 %:
SELECT MAX(h.OrderDate), MAX(d.ProductID)
FROM [Sales].[SalesOrderHeader] h
INNER MERGE JOIN [Sales].[SalesOrderDetail] d
ON d.SalesOrderID = h.SalesOrderID
OPTION (MAXDOP 1)
Every Table Should Always Have A Clustered Index
Cluster (O_ORDERKEY):
CREATE UNIQUE CLUSTERED INDEX CIX_Key
ON ORDERS_Cluster (O_ORDERKEY)
WITH (FILLFACTOR = 100)

SELECT *
FROM ORDERS_Cluster
WHERE O_ORDERKEY = 300000

Index (O_ORDERKEY):
CREATE UNIQUE NONCLUSTERED INDEX CIX_Key
ON ORDERS_Heap (O_ORDERKEY)
WITH (FILLFACTOR = 100)

SELECT *
FROM ORDERS_Heap
WHERE O_ORDERKEY = 300000
Or Perhaps Not ? Cluster (O_ORDERKEY) vs. Index (O_ORDERKEY)
Clustered indexes work against you as the number of non-covering secondary index seeks and scans increases.
CREATE UNIQUE CLUSTERED INDEX CIX_Key
ON ORDERS_Cluster (O_ORDERKEY)
WITH (FILLFACTOR = 100)

CREATE INDEX IX_Customer
ON ORDERS_Cluster (O_CUSTKEY)

CREATE UNIQUE NONCLUSTERED INDEX CIX_Key
ON ORDERS_Heap (O_ORDERKEY)
WITH (FILLFACTOR = 100)

CREATE INDEX IX_Customer
ON ORDERS_Heap (O_CUSTKEY)
Tuning By Workload
Log write latency is critical, up to a point
OLTP and OLAP do not play nicely together
Fine grained locking => no lock escalation:
Trace flag 1211 – prevent all lock escalation
Trace flag 1224 – only escalate locks under severe memory pressure
Avoid compression for OLTP
Certain plan shapes suit OLTP applications best, next slide . . .
OLTP Checklist
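As a sketch, enabling one of these trace flags instance-wide (the -1 applies the flag globally; evaluate carefully before using in production):

-- Only escalate locks when the lock manager is under memory pressure.
DBCC TRACEON (1224, -1);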
OLTP Tuning For Dummies
1. Nested loop joins are the dominant join type
2. Seeks not scans
3. Serial iterators
4. There should be no sorts or spools, except for performance spools and/or sorts associated with optimized nested loop joins
Seeks Not Scans
1. Do not normalise beyond 3rd normal form
2. Find the most CPU/IO intensive queries with sys.dm_exec_query_stats
3. Add OPTION (LOOP JOIN) to the offending query
4. Check the estimated plan of the bad query
5. If a spool is found, add indexes until remedied
6. Goto 2 until no more queries have non-index paths
7. Buy more hardware . . . wisely
OLTP Tuning For Dummies
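A sketch of step 2; total_worker_time is cumulative CPU in microseconds, and the SUBSTRING arithmetic extracts the offending statement from its batch text:

-- Top 20 statements by accumulated CPU time.
SELECT TOP (20)
    qs.total_worker_time AS total_cpu_us,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;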
Not All Spools Are Bad: Execution Time

Rows (constant)             Optimized      Non-optimized   Factor
10,000,000 (1% of rows)     6.5 minutes    26 minutes      4x
100,000,000 (10% of rows)   10.4 minutes   4.3 hours       25x
250,000,000 (25% of rows)   11.3 minutes   10.6 hours      56x
With the “optimized nested loop join”, data is pre-sorted on the outer side of the join; refer to Craig Freedman's blog post
Pursue high sequential scan IO throughput
Avoid CPU core starvation by understanding core consumption rates
Leverage column stores and batch mode
Kimball dimensional model ? => pursue star join optimisations
Certain plan shapes suit OLAP style applications best, next slide . . .
OLAP Checklist
Multi Threaded Performance
[Chart: speedup factor vs. number of cores (N, 0–64) for parallel fractions P = 100%, 95%, 90% and 80%]
Amdahl’s law: The increase in speed is proportional to the percentage of the workload that can be performed in parallel
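In formula form, with P the parallelisable fraction of the workload and N the number of cores:

Speedup(N) = 1 / ((1 - P) + P / N)

A worked example: with P = 95% and N = 64 cores, Speedup = 1 / (0.05 + 0.95 / 64) ≈ 15.4, nowhere near 64.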
Constructs That Force Serial Plans
All T-SQL user defined functions
CLR UDFs with data access
System functions such as OBJECT_ID(), ERROR_NUMBER(), @@TRANCOUNT . . .
Dynamic cursors
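As an illustration of the first item (a sketch; dbo.AddOne is a hypothetical function, and from SQL Server 2019 scalar UDF inlining can lift this restriction):

-- A trivial scalar UDF; its mere presence forces a serial plan.
CREATE FUNCTION dbo.AddOne (@i INT)
RETURNS INT
AS
BEGIN
    RETURN @i + 1;
END;
GO

SELECT dbo.AddOne(a.object_id)
FROM sys.all_objects a
CROSS JOIN sys.all_objects b; -- big enough to want parallelism, but won't get it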
Constructs That Force Serial Regions
System table scans
Backwards scans
Global scalar aggregates
Sequence functions
Recursive queries
Multi-consumer spools
TOP
TVFs
The Query Parallelism “Mothership of knowledge”
https://blogs.msdn.microsoft.com/craigfr/2007/04/17/parallel-query-execution-presentation/
What A ‘Good’ OLAP Execution Plan Should Look Like
1. Parallelism
2. Hash joins are the dominant join type
3. Bitmap filters pushed as deep into the plan as possible
4. Repartition streams iterators are not so good, we will cover these later
The Optimizer Cost Model
“To understand the relevance of cost you would have to jump in a time machine, go back in time and find a machine under a certain developer's desk; cost is based on the amount of time it took this machine to perform certain operations” – Adam Machanic
The machine under Lubor Kollar's desk
The Optimizer Has Limitations !!!
It contains hard coding: certain assumptions are explicitly hard coded into the optimizer
Not all scenarios it encounters are costed for: there are constructs and situations which the optimizer does not cost for, referred to by the optimizer team as “out of model scenarios”
The optimizer has blind spots
Hard Coding In The Optimizer
Hash distribution is always assumed to be uniform
The memory grant for a varchar column always assumes it is half its declared size
The ratio between random and sequential IO is hard coded into the IO cost model
“Out of model” Scenarios
It always assumes that the buffer pool is cold
No IO costing for parallel plans
Data in different columns is never correlated
Cardinality estimates for table variables are not costed, unless the statement is recompiled
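A sketch of the table variable point; without a recompile the optimizer sees one row, with OPTION (RECOMPILE) it sees the true cardinality:

DECLARE @ids TABLE (object_id INT PRIMARY KEY);

INSERT INTO @ids
SELECT object_id FROM sys.all_objects;

-- Estimated rows for @ids = 1 without the hint; the hint recompiles the
-- statement so the actual row count is visible to the optimizer.
SELECT o.name
FROM @ids i
JOIN sys.objects o ON o.object_id = i.object_id
OPTION (RECOMPILE);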
Which Query Runs The Fastest ?
1468ms ( cost relative to the batch = 44 % )
561ms ( cost relative to the batch = 56 % )
VARCHAR Columns And Memory Grants
CPU Core Consumption Rates

SELECT a.*
INTO MyBigTableSource
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
CROSS JOIN (SELECT TOP 500 * FROM sys.all_objects c) dt

SELECT COUNT(*)
FROM MyBigTableSource
OPTION (MAXDOP 1)
Few-Outer-Row Optimization
Few-Outer-Row is a specific optimization for nested loop joins. In some data warehousing queries, the outer side of a nested loop join is a parallel scan with a filter.
Hash Join Exception To The Rule
. . . and this is what a few outer rows plan looks like
Few Outer Rows Optimisation
DBCC SETCPUWEIGHT(100000)

SELECT [UnitPrice]
FROM FactInternetSales fis
JOIN TaxYear ty
ON fis.DueDateKey > ty.YearStart
AND fis.DueDateKey < ty.YearEnd

CREATE TABLE TaxYear
( YearStart datetime
 ,YearEnd   datetime )

INSERT INTO TaxYear VALUES
 ('20050406', '20060406'),
 ('20060406', '20070406'),
 ('20070406', '20080406'),
 ('20080406', '20090406'),
 ('20090406', '20100406'),
 ('20100406', '20110406'),
 ('20110406', '20120406')
1. Add a column store to the fact table
2. Add OPTION (HASH JOIN) to the query
3. Do you get a hash of the dimension and a probe by the fact ? If not, check your statistics on the facts . . . and indexes on dimensions
4. Optimise the living daylights out of the fact scan
OLAP Tuning For Dummies
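A sketch of step 1 (index name assumed; the table must not already have a clustered index unless DROP_EXISTING is used):

-- Turn the fact table into a clustered column store.
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_FactInternetSales
ON dbo.FactInternetSales;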
Working Out Where Columns Begin and End Is Expensive !!!
46% of total sampled CPU time !!!
Avoid segment trimming
Align segments in order to get the best segment elimination possible
Make sure as little of your data as possible is in delta stores
Deleted rows are only ever removed after a column store reorg/rebuild, keep an eye on these !!!
No predicate pushdown on strings prior to SQL Server 2016
Optimising Column Store Scans
A row group can have a maximum of 1,048,576 rows, but it can be closed before it gets to this for a number of reasons (exposed in SQL Server 2016 in sys.dm_db_column_store_row_group_physical_stats)
trim_reason_desc                  Trim reason
BULKLOAD                          BATCHSIZE specified for bulk insert, or end of bulk insert
REORG_FORCED                      REORG with COMPRESS_ALL_ROW_GROUPS = ON, which closes every open row group and compresses it into columnar format
DICTIONARY_SIZE                   The dictionary is full (16MB dictionary), so the row group is trimmed
MEMORY_LIMITATION                 Memory pressure during index build caused the row group to be trimmed
RESIDUAL_ROW_GROUP (index build)  The last row group(s) have fewer than 1 million rows when the index is rebuilt
Segment Trimming
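A sketch of how to keep an eye on trim reasons (SQL Server 2016 onwards):

-- Row group sizes and why each one was trimmed.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,
       total_rows,
       deleted_rows,
       trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
ORDER BY table_name, row_group_id;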
Aligned Segments
DimBeverage:
ID  Value
1   Beer
2   Gin
3   Vodka
4   Whisky
5   Coca Cola
6   Wine
7   Brandy
FactBeverageConsumed – aligned, local dictionary (BeverageId) per segment:
Segment 001: min value 1, max value 2
Segment 002: min value 3, max value 4
Segment 003: min value 5, max value 6
‘Alignment’ = min and max column values are in order with no overlap
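A common way to achieve alignment (a sketch using the example fact table): build a clustered b-tree on the predicate column to order the data, then convert it to a clustered column store with DROP_EXISTING; MAXDOP 1 stops parallel workers interleaving the sort order:

-- First physically order the table on the predicate column . . .
CREATE CLUSTERED INDEX cix_FactBeverageConsumed
ON dbo.FactBeverageConsumed (BeverageId);

-- . . . then convert it to a clustered column store, preserving the order.
CREATE CLUSTERED COLUMNSTORE INDEX cix_FactBeverageConsumed
ON dbo.FactBeverageConsumed
WITH (DROP_EXISTING = ON, MAXDOP = 1);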
Non Aligned Segments
DimBeverage: same dictionary as above (1 Beer . . . 7 Brandy)
FactBeverageConsumed – non aligned, local dictionary (BeverageId) per segment:
Segment 001: min value 1, max value 4
Segment 002: min value 2, max value 3
Segment 003: min value 1, max value 5
Min and max values across segments overlap !!!
[Diagram: looking up Whisky (BeverageId = 4) in DimBeverage; with non-aligned segments (min/max 1–4, 2–3, 1–5) every segment must be scanned, with aligned segments (min/max 1–2, 3–4, 5–7) only the one matching segment is scanned]
What Segment Alignment Gives Us
More Performance ? . . . Sorted Hash
[Chart: elapsed time (ms, up to 80,000) vs. degree of parallelism (2–24), non-sorted column store vs. sorted column store]
SELECT p.ProductId
      ,p.ProductNumber
      ,p.ReorderPoint
      ,d.*
FROM (SELECT TOP (2147483647)
             p0.ProductId
            ,p0.ProductNumber
            ,p0.ReorderPoint
      FROM bigProduct AS p0
      WHERE p0.ProductId BETWEEN 1001 AND 20001) AS p
CROSS APPLY (SELECT th.TransactionId
                   ,RANK() OVER (PARTITION BY p.ProductId ORDER BY th.ActualCost DESC) AS LineTotalRank
                   ,RANK() OVER (PARTITION BY p.ProductId ORDER BY th.Quantity DESC) AS OrderQtyRank
             FROM bigTransactionHistory AS th
             WHERE th.ProductId = p.ProductId) AS d
OPTION (MAXDOP 10)
Column Store Scans Are Good, Seeks Are Bad !
With A Clustered Column Store, Elapsed Time = 262 seconds
Table 'bigProduct'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 9577, logical reads 109258564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'bigTransactionHistory'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 85220, lob physical reads 0, lob read-ahead reads 0.
Table 'bigTransactionHistory'. Segment reads 30, segment skipped 0.
SQL Server Execution Times: CPU time = 226750 ms, elapsed time = 262456 ms.
With A Conventional Index , Elapsed Time = 71 seconds
Table 'bigProduct'. Scan count 1, logical reads 228, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'bigTransactionHistory'. Scan count 9577, logical reads 98005, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 36264 ms, elapsed time = 71046 ms.
We want as much of our data as possible to be in compressed row groups for OLAP style applications
Delta stores prevent segment elimination
Scanning delta stores has the same performance characteristic as scanning page compressed b-trees ( SQL 2012 – 14 ) and non-compressed b-trees ( SQL 2016 )
. . . Leads on to column store load strategies
Delta Stores
Row Group Compression: 102,400 Is The Magic Number
Row group closure and compression is triggered at 102,400 rows
Best compression is achieved at 1,048,576 rows (~1 million)
SSIS data flow task buffer sizes and buffer max rows can be tuned
Can your data sources pump data into the column store in 1 million row batches consistently ?
Can zero data flow buffer spills be guaranteed with large batch sizes ?
Trickle data into staging tables and transfer the rows to the column store when there are approximately 1 million of them
Turn the tuple mover off – TF 634
Force row group closure and compression via:
ALTER INDEX <index name> ON <table name> REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)
Bulk loads are explained in depth in this MSDN article
SQL Server 2016 supports parallel inserts into column stores
Loading Strategy
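A sketch of the trickle-then-transfer pattern (table names hypothetical); on SQL Server 2016, TABLOCK also enables the parallel insert mentioned above:

-- Move ~1 million trickled rows from the staging row store into the
-- column store in one batch, then clear the staging table.
INSERT INTO dbo.FactSales WITH (TABLOCK)
SELECT *
FROM dbo.FactSales_Staging;

TRUNCATE TABLE dbo.FactSales_Staging;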
Data Type Elimination And Predicate Pushdown Support
Data type          Min / Max   Predicate pushdown   Segment elimination
Numeric            yes         no                   no
DateTimeOffset     yes         no                   no
Char               yes         no                   no
Varchar            yes         no                   no
Nchar              yes         no                   no
Nvarchar           yes         no                   no
Binary             no          no                   no
Varbinary          no          no                   no
Uniqueidentifier   no          no                   no
Factor these data types out into dimension tables where possible !!!
New Column Store Features In SQL Server 2016
Filtered non clustered column store indexes
Ability to create a column store index on a memory optimised table
Support for row versioning
Column store can be declared in line when creating a table
Ability to specify a compression delay on the column store
New Column Store Features In SQL Server 2016
Support for updateable column stores on an Always On readable replica
Support for string predicate pushdown
Parallel insert now supported for clustered column stores
Simple aggregate pushdown
Support for multiple distinct counts
New Column Store Features In SQL Server 2016
Updateable non clustered column store indexes
Ability to create conventional b-tree indexes on clustered column store indexes
Ability to create foreign keys on clustered column stores
Batch mode windowing functions
Batch mode sort iterator support
Optimising Windowing Function Performance With Row Stores
SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
INTO TestData
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
CROSS JOIN (SELECT TOP 200 * FROM sys.all_objects c) dt
WHERE a.type = 'P'
AND b.type = 'P'
UNION ALL
SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.type = 'P'
AND b.type = 'P'
UNION ALL
SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.type = 'U'
AND b.type = 'U'
UNION ALL
SELECT a.object_id
      ,SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 3) AS code
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.type = 'V'
AND b.type = 'V'
CREATE CLUSTERED INDEX csx ON TestData (object_id, code)
The Plan We Get For A Simple Query Using RANK
SELECT t.object_id
      ,code
      ,RANK() OVER (PARTITION BY t.object_id ORDER BY code ASC) AS rk
FROM TestData t
Anti Scale !!!
[Chart: elapsed time (ms, 1,050,000–1,450,000) vs. degree of parallelism (2–8); elapsed time rises as DOP increases]
SQL Server 2016 Has A Batch Windowing Function
CREATE TABLE [dbo].[DummyTable] ( [object_id] [int] NULL )

CREATE CLUSTERED COLUMNSTORE INDEX ccsi ON [dbo].[DummyTable]

SELECT t.object_id
      ,code
      ,RANK() OVER (PARTITION BY t.object_id ORDER BY code ASC) AS rk
FROM TestData t
LEFT JOIN DummyTable d
ON d.object_id = t.object_id
OPTION (HASH JOIN)
How does this help us with row stores ?
The Magic Is In What Now Appears In The Execution Plan
From “Anti Scale” To Scalability !!!
[Chart: elapsed time (ms, 0–2,000,000) vs. degree of parallelism (2–8), row store BATCH mode vs. row store ROW mode; batch mode elapsed time falls as DOP rises]
What About Creating A Column Store On The TestData Table ?
[Chart: elapsed time (ms, 0–2,500,000) vs. degree of parallelism (2–8): row store BATCH mode vs. row store ROW mode vs. column store BATCH mode]
Natural Born Performance Killers
Physical reads for OLTP
Poor sequential scan rates and CPU core starvation for OLAP
High WRITELOG latency for OLTP
Etc etc . . .
XML Processing ?
The Overhead Of Rendering XML
SELECT a.*
FROM sys.all_objects a
CROSS JOIN sys.all_objects b

SELECT a.*
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
FOR XML RAW

Where are our CPU cycles going ?
Elapsed times: 4 s vs. 34 s
Where The Database Engine Is Spending Its CPU Time
Non-Xml version of the query Xml version of the query
The XML query is more than 5x as CPU intensive
Digging Into The Stack: The FOR XML RAW Version Of The Query
Challenges
Does whoever designed your infrastructure understand the resource requirements and usage patterns of the database engine ?
“Big boxes” being purchased with little understanding of how SQL Server scales on such machines
CPU unfriendly SQL engine behaviour: sorting, hashing and index seeks ( pointer chasing )
The end of the CPU GHz free lunch
What Have We Learned ?
1. There are plan shapes which suit OLTP and OLAP applications; pursue these
2. Parallel query scalability suffers once the NUMA boundary is crossed
3. Clustered indexes are not a silver bullet; consider non-covering secondary indexes
4. Avoid column store segment trimming in order to leverage 8MB read-aheads for OLAP style applications; operational analytics is more nuanced
5. Align segments by pre-sorting the data on columns frequently used in predicates
6. Not all data types support segment elimination and predicate pushdown; this can be designed around
7. Be cognizant of the overheads of processing XML !