Performance Tuning SSIS

Posted on 06-Feb-2016


Brian Knight, CEO, Pragmatic Works (bknight@pragmaticworks.com)

About the Ugly Guy Speaking

- SQL Server MVP
- Founder of Pragmatic Works
- Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com
- Written more than a dozen books on SQL Server

Integration Services in Action

[Diagram: sources (semi-structured geospatial data, legacy data in binary files, application databases) flow into SQL Server Integration Services, which combines standard sources, custom sources, geospatial components, data-cleansing components, merges, and data mining components to feed a warehouse, reports, mobile data, and a cube.]

- Integration is a seamless, manageable operation
- Source, prepare, and load data in a single, auditable process
- Scale to handle heavy and complex data requirements

Today’s Problems with Integration

Integration today:
- Increasing data volumes
- Increasingly diverse sources

Requirements have reached the tipping point:
- Low-impact source extraction
- Efficient transformation
- Bulk loading techniques

Tuning Decisions

- Choose the right tool for the job
- Don’t be afraid to use T-SQL
- Will parallelism work?

Source Optimization

- Flat files: when available, use Fast Parse
- OLE DB sources: change the network packet size
- Use T-SQL whenever possible in the OLE DB Source for:
  - Joining
  - NULL handling
  - WHERE clauses
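The two OLE DB source tips can be sketched together: set the packet size explicitly in the connection string, and push joins, NULL handling, and filtering into the source query. This is a minimal illustration, not the speaker's demo. The server, database, table, and column names are hypothetical; `Packet Size` is a standard SQL Server connection-string keyword (server default 4096 bytes), and the provider name `SQLNCLI10.1` (SQL Server 2008 Native Client) is an assumption you should match to your own environment. A commonly recommended value for ETL is the maximum, 32767.

```python
# Range enforced here matches SQL Server's "network packet size" option (512-32767 bytes).
MIN_PACKET_SIZE = 512
MAX_PACKET_SIZE = 32767


def oledb_connection_string(server: str, database: str,
                            packet_size: int = MAX_PACKET_SIZE,
                            provider: str = "SQLNCLI10.1") -> str:
    """Build an OLE DB connection string with an explicit network packet size."""
    if not MIN_PACKET_SIZE <= packet_size <= MAX_PACKET_SIZE:
        raise ValueError(f"packet size must be between {MIN_PACKET_SIZE} and {MAX_PACKET_SIZE}")
    parts = {
        "Provider": provider,
        "Data Source": server,
        "Initial Catalog": database,
        "Integrated Security": "SSPI",
        "Packet Size": str(packet_size),
    }
    return "".join(f"{key}={value};" for key, value in parts.items())


# Hypothetical source query: the database engine does the join, the NULL
# replacement, and the filtering, so fewer rows and fewer downstream
# transforms are needed in the data flow. "?" is an OLE DB parameter marker.
SOURCE_QUERY = """
SELECT o.OrderID,
       o.OrderDate,
       ISNULL(c.Region, 'Unknown') AS Region,
       o.Amount
FROM dbo.Orders AS o
LEFT JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= ?
"""

conn_str = oledb_connection_string("etl-sql01", "SalesStaging")
print(conn_str)
```

The string is what you would paste into the connection manager's ConnectionString property; the query belongs in the OLE DB Source's SQL command text.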

Impact of Compression on ETL

[Chart: BULK INSERT into a heap with and without data compression. For each compression type (NONE, ROW, PAGE), the chart plots the time to BULK INSERT 50M rows (minutes) and the table size after the load (GB).]

* Not official Microsoft results.

Tuning the Source

Demo

- Connection manager tuning
- Flat file tuning
- OLE DB Source tuning

SSIS Data Flow Architecture

[Diagram: rows travel through the data flow in buffers; the pipeline presents each buffer to each downstream transform component in turn.]

Synchronous vs. Asynchronous
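The difference between synchronous and asynchronous (non-synchronous) transforms can be modeled with plain Python generators. This is a conceptual sketch, not the SSIS engine or its API: a synchronous stage works row by row on the buffer it receives and hands the same buffer on, while an asynchronous, blocking stage such as a sort must read every input buffer before it can emit anything, and writes its output into new buffers. The row layout and the `amount_x2` column are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Buffer = List[Row]
Stage = Callable[[Iterator[Buffer]], Iterator[Buffer]]


def make_buffers(rows: Iterable[Row], rows_per_buffer: int) -> Iterator[Buffer]:
    """Group a row stream into fixed-size buffers (a stand-in for data flow buffers)."""
    buffer: Buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == rows_per_buffer:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


def synchronous(transform: Callable[[Row], None]) -> Stage:
    """Row-by-row stage: updates each buffer in place and passes the same buffer on."""
    def stage(buffers: Iterator[Buffer]) -> Iterator[Buffer]:
        for buffer in buffers:
            for row in buffer:
                transform(row)
            yield buffer  # no copy: downstream sees the same buffer object
    return stage


def asynchronous_sort(key: str, rows_per_buffer: int) -> Stage:
    """Blocking stage (like a sort): reads every input buffer before emitting
    anything, then writes its output into newly created buffers."""
    def stage(buffers: Iterator[Buffer]) -> Iterator[Buffer]:
        all_rows = [row for buffer in buffers for row in buffer]
        all_rows.sort(key=lambda row: row[key])
        yield from make_buffers(all_rows, rows_per_buffer)
    return stage


rows = [{"id": i, "amount": 10 - i} for i in range(5)]
pipeline = make_buffers(rows, rows_per_buffer=2)
pipeline = synchronous(lambda row: row.update(amount_x2=row["amount"] * 2))(pipeline)
pipeline = asynchronous_sort("amount", rows_per_buffer=2)(pipeline)
result = list(pipeline)
```

The synchronous stage adds almost no memory overhead, while the sort has to hold every row at once, which is why blocking transforms are usually the first place to look when a data flow is slow or spools buffers to disk.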

Case Study: Patterns

105 seconds vs. 83 seconds

Source Data Extraction

- Extracting data from the source is expensive
- Efficient extraction is key to improving ETL performance
- Involves bulk loading data into staging areas or the warehouse

Change-detection approaches are time consuming and resource intensive:
- Triggers (synchronous I/O penalty)
- Timestamp columns (schema changes)
- Complex queries (delayed I/O penalty)
- Custom (ISV, mirror, snapshot, …)

Incremental data loading is key to efficient extraction:
- You need to know what changed at the source since a point in time

Determining changed columns requires expensive lookups:
- Providing information up front about which columns changed improves efficiency
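The "what changed since a point in time" idea behind the timestamp-column approach can be sketched as a high-water-mark extraction. This is a minimal illustration under assumptions: the `modified_at` column and the rows are hypothetical, and in practice the filter would be a WHERE clause in the source query, with the new watermark persisted (for example, in a control table) for the next run.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def extract_changed(source_rows: List[Row], last_watermark: datetime,
                    column: str = "modified_at") -> Tuple[List[Row], datetime]:
    """Return the rows changed since last_watermark and the new watermark to persist."""
    changed = [row for row in source_rows if row[column] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((row[column] for row in changed), default=last_watermark)
    return changed, new_watermark


source = [
    {"id": 1, "modified_at": datetime(2016, 2, 1, 9, 0)},
    {"id": 2, "modified_at": datetime(2016, 2, 5, 17, 30)},
    {"id": 3, "modified_at": datetime(2016, 2, 6, 8, 15)},
]
changed, watermark = extract_changed(source, last_watermark=datetime(2016, 2, 3))
```

A strict `>` comparison can miss a row that commits late with a timestamp equal to or below the saved watermark, and the column itself is a schema change on the source; both drawbacks are part of why log-based capture is attractive.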

SQL Server 2008: Change Data Capture (CDC)

- Information about what changed at the source
- Changes captured from the log asynchronously
- Enabled per table
- CDC APIs provide access to change data

[Diagram: changes made in the OLTP database are captured into change tables, which feed the data warehouse.]

Change Data Capture

Demo

- Traditional CDC with SSIS
- Integrating CDC in 2008
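Once CDC is enabled (in SQL Server 2008 via `sys.sp_cdc_enable_db` and `sys.sp_cdc_enable_table`), change rows are read through the generated `cdc.fn_cdc_get_all_changes_<capture_instance>` function, which returns metadata columns such as `__$start_lsn`, `__$seqval`, and `__$operation` alongside the captured columns. The sketch below shows, in Python, how such rows could be replayed against a keyed copy of a target table. It is an illustration, not the demo's package: LSNs are plain integers here (the real ones are `binary(10)`), and the table contents are hypothetical.

```python
from typing import Any, Dict, List

# __$operation values returned by SQL Server CDC change functions.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4

Row = Dict[str, Any]


def apply_changes(target: Dict[Any, Row], changes: List[Row], key: str = "id") -> Dict[Any, Row]:
    """Replay CDC change rows, in log order, against a keyed copy of the target table."""
    ordered = sorted(changes, key=lambda c: (c["__$start_lsn"], c["__$seqval"], c["__$operation"]))
    for change in ordered:
        operation = change["__$operation"]
        # Strip the CDC metadata columns to get the row image itself.
        row = {col: val for col, val in change.items() if not col.startswith("__$")}
        if operation == DELETE:
            target.pop(row[key], None)
        elif operation in (INSERT, UPDATE_AFTER):
            target[row[key]] = row
        # UPDATE_BEFORE rows carry the old values, so there is nothing to apply.
    return target


warehouse = {1: {"id": 1, "name": "Contoso"}, 2: {"id": 2, "name": "Fabrikam"}}
changes = [
    {"__$start_lsn": 10, "__$seqval": 1, "__$operation": INSERT, "id": 3, "name": "Northwind"},
    {"__$start_lsn": 11, "__$seqval": 1, "__$operation": UPDATE_BEFORE, "id": 1, "name": "Contoso"},
    {"__$start_lsn": 11, "__$seqval": 1, "__$operation": UPDATE_AFTER, "id": 1, "name": "Contoso Ltd"},
    {"__$start_lsn": 12, "__$seqval": 1, "__$operation": DELETE, "id": 2, "name": "Fabrikam"},
]
apply_changes(warehouse, changes)
```

Only the changed rows travel from the source, which is the point of CDC; no full comparison against the warehouse is needed to find them.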

Lookup Component

Three modes of operation:
- Full Cache: for small lookup datasets
- No Cache: for volatile lookup datasets
- Partial Cache: for large lookup datasets

- Trade off memory against performance
- Use cascaded lookups
- Merge Join may be an alternative
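The three cache modes and a cascaded lookup can be modeled in a few lines of Python. This is a conceptual sketch, not the Lookup transform's implementation: `query_reference` is a hypothetical stand-in for a round trip to the reference table, and the least-recently-used eviction in `PartialCache` is an assumption of the sketch rather than a statement of SSIS's exact eviction policy. The cascade sends only the misses from a small, fully cached "hot" set to a second, uncached lookup.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, List, Optional

Query = Callable[[Hashable], Optional[int]]


class FullCache:
    """Loads the whole reference set once, before any rows are looked up."""
    def __init__(self, reference: Dict[Hashable, int]) -> None:
        self.cache = dict(reference)

    def lookup(self, key: Hashable) -> Optional[int]:
        return self.cache.get(key)


class NoCache:
    """Queries the reference source for every row, so it always sees current data."""
    def __init__(self, query: Query) -> None:
        self.query = query

    def lookup(self, key: Hashable) -> Optional[int]:
        return self.query(key)


class PartialCache:
    """Queries on a miss and keeps recent results, up to max_entries."""
    def __init__(self, query: Query, max_entries: int) -> None:
        self.query = query
        self.max_entries = max_entries
        self.cache: "OrderedDict[Hashable, Optional[int]]" = OrderedDict()

    def lookup(self, key: Hashable) -> Optional[int]:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        value = self.query(key)
        self.cache[key] = value
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


def cascaded_lookup(key: Hashable, first, second) -> Optional[int]:
    """Try the cheap lookup first; send only its misses to the second lookup."""
    value = first.lookup(key)
    return value if value is not None else second.lookup(key)


reference = {"A": 1, "B": 2, "C": 3}
queries: List[Hashable] = []


def query_reference(key: Hashable) -> Optional[int]:
    queries.append(key)  # record each simulated round trip
    return reference.get(key)


partial = PartialCache(query_reference, max_entries=2)
for key in ["A", "A", "B", "A", "C", "B"]:
    partial.lookup(key)
partial_queries = list(queries)  # ["A", "B", "C", "B"]

queries.clear()
hot = FullCache({"A": 1})
fallback = NoCache(query_reference)
cascade_results = [cascaded_lookup(k, hot, fallback) for k in ["A", "C", "A"]]
```

The memory-versus-performance trade-off is visible in the query counts: the full cache never queries but holds everything, the no-cache lookup holds nothing but queries every row, and the partial cache sits in between.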

SQL Server 2008: Lookup Transform

- Hydrate cache files for large data sets
- Can reuse the cache
- Can load the cache during the day and use it in the nightly ETL
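In SQL Server 2008, a Cache Transform writes reference data through a Cache connection manager (optionally to a `.caw` file) that later Lookup transforms can reuse. The Python sketch below only illustrates the hydrate-once, reuse-later pattern; it uses a JSON file as a stand-in for the `.caw` format, and the product data and file name are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def hydrate_cache(reference_rows: List[Dict[str, Any]], path: str, key: str) -> int:
    """Build the lookup cache once and persist it to disk; returns the entry count."""
    cache = {str(row[key]): row for row in reference_rows}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return len(cache)


def load_cache(path: str) -> Dict[str, Dict[str, Any]]:
    """Reload a previously hydrated cache without re-querying the reference table."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# During the day: hydrate the cache from (hypothetical) product reference data.
cache_path = os.path.join(tempfile.mkdtemp(), "products_cache.json")
count = hydrate_cache(
    [{"sku": "A-100", "product_key": 1}, {"sku": "B-200", "product_key": 2}],
    cache_path,
    key="sku",
)

# Nightly ETL: reuse the file for lookups instead of querying the source again.
products = load_cache(cache_path)
product_key = products["B-200"]["product_key"]
```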

Demo

- Cascading lookup optimizations
- Cache file lookup

Data Destinations

- Use “Fast Load” or the SQL Server Destination
- Table lock on insert operations
- Trace flags for improvement
- Old principles still apply
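Part of what Fast Load buys is set-based inserts committed in large batches (the OLE DB Destination exposes this as "Maximum insert commit size") rather than one statement per row. The sketch below shows only that batching idea, using Python's built-in `sqlite3` as a stand-in for SQL Server. It does not model table locks, minimal logging, or trace flags, and the `fact_sales` table is hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Tuple


def fast_load(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float]], commit_size: int) -> int:
    """Insert rows in batches of commit_size, one transaction per batch; returns the batch count."""
    if commit_size <= 0:
        raise ValueError("commit_size must be positive")
    iterator = iter(rows)
    batches = 0
    while batch := list(islice(iterator, commit_size)):
        with conn:  # commits on success, rolls back only this batch on error
            conn.executemany("INSERT INTO fact_sales (sale_id, amount) VALUES (?, ?)", batch)
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")
rows = ((n, n * 1.5) for n in range(25_000))
batch_count = fast_load(conn, rows, commit_size=10_000)
row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Larger batches mean fewer commits, at the cost of more work to redo when a batch fails.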

Destination Tuning

Demo

Building a Work Queue System

Create a work queue table.

Create a loop that iterates over the work queue, repeatedly checking out work.

Spawn x instances with a batch file.
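The work queue pattern can be sketched in Python with an in-memory stand-in for the queue table. This is an illustration under assumptions: the talk spawns separate processes from a batch file, while here threads play that role; the status values, file names, and the `process` callback are hypothetical. The essential property is that checking out an item is atomic, so two workers never claim the same item (in SQL Server that would be a single atomic UPDATE against the queue table).

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class WorkQueue:
    """In-memory stand-in for a work queue table with a status column."""

    def __init__(self, items: List[str]) -> None:
        self._lock = threading.Lock()
        self.status: Dict[str, str] = {item: "pending" for item in items}
        self.completed_by: Dict[str, str] = {}

    def check_out(self) -> Optional[str]:
        """Atomically claim the next pending item, or return None when none are left."""
        with self._lock:
            for item, state in self.status.items():
                if state == "pending":
                    self.status[item] = "in progress"
                    return item
            return None

    def complete(self, item: str, worker: str) -> None:
        with self._lock:
            self.status[item] = "done"
            self.completed_by[item] = worker


def worker_loop(queue: WorkQueue, worker: str, process: Callable[[str], None]) -> None:
    """Keep checking out work until the queue is empty."""
    while (item := queue.check_out()) is not None:
        process(item)
        queue.complete(item, worker)


def run_workers(items: List[str], worker_count: int, process: Callable[[str], None]) -> WorkQueue:
    """Start worker_count copies of the loop (the batch file's job) and wait for them."""
    queue = WorkQueue(items)
    threads = [threading.Thread(target=worker_loop, args=(queue, f"worker-{n}", process))
               for n in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return queue


files = [f"extract_{n:02d}.csv" for n in range(8)]
queue = run_workers(files, worker_count=4, process=lambda item: time.sleep(0.01))
```

Because each worker pulls the next item only when it is free, faster workers naturally take on more of the load, and adding workers requires no change to the queue.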

Demo Results

[Charts: elapsed time of the work queue demo for 1, 2, 4, and 8 parallel processes.]

- 1 process finishes in 64 seconds
- 2 processes finish in 36 seconds
- 4 processes finish in 28 seconds
- 8 processes finish in 27 seconds
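A quick way to read the demo results is to compute speedup (baseline time divided by parallel time) and parallel efficiency (speedup divided by the number of processes). The only inputs are the four elapsed times reported in the demo.

```python
# Elapsed times from the demo results, in seconds, keyed by process count.
elapsed_seconds = {1: 64, 2: 36, 4: 28, 8: 27}

baseline = elapsed_seconds[1]
results = {}
for processes, seconds in elapsed_seconds.items():
    speedup = baseline / seconds        # times faster than a single process
    efficiency = speedup / processes    # fraction of ideal linear scaling
    results[processes] = (round(speedup, 2), round(efficiency, 2))
    print(f"{processes} process(es): {seconds} s, speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# prints:
# 1 process(es): 64 s, speedup 1.00x, efficiency 100%
# 2 process(es): 36 s, speedup 1.78x, efficiency 89%
# 4 process(es): 28 s, speedup 2.29x, efficiency 57%
# 8 process(es): 27 s, speedup 2.37x, efficiency 30%
```

Going from four to eight processes saves only one second, so beyond four processes the extra workers are mostly waiting on some shared resource rather than adding throughput.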

Parallel Load

Demo

Managing Resources

Logging events to watch pipeline internals:
- PipelineExecutionPlan
- PipelineExecutionTree
- BufferSizeTuning

System Monitor to track I/O issues:
- Buffers in use: how many buffers are currently in use
- Buffers spooled: how many 10 MB buffers have been spooled to disk

Measuring Performance

Perfmon
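Buffer counts follow from two Data Flow task properties, DefaultBufferSize (default 10,485,760 bytes, the 10 MB buffers above) and DefaultBufferMaxRows (default 10,000). The Python sketch below is a simplified estimate, assuming a buffer holds DefaultBufferMaxRows rows unless that would exceed DefaultBufferSize. The engine applies further internal limits and its own row-width estimate, and the BufferSizeTuning log event reports the adjustments it actually makes. The row sizes and row count used here are hypothetical.

```python
import math

# SSIS 2008 Data Flow task defaults.
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # DefaultBufferSize: 10,485,760 bytes
DEFAULT_BUFFER_MAX_ROWS = 10_000        # DefaultBufferMaxRows


def estimated_rows_per_buffer(row_size_bytes: int,
                              buffer_size: int = DEFAULT_BUFFER_SIZE,
                              max_rows: int = DEFAULT_BUFFER_MAX_ROWS) -> int:
    """Simplified estimate: max_rows per buffer unless that would exceed buffer_size."""
    if row_size_bytes <= 0:
        raise ValueError("row size must be positive")
    return min(max_rows, buffer_size // row_size_bytes)


def buffers_for(total_rows: int, row_size_bytes: int) -> int:
    """Number of buffers needed to carry total_rows through the pipeline."""
    return math.ceil(total_rows / estimated_rows_per_buffer(row_size_bytes))


print(estimated_rows_per_buffer(500))   # 10000: limited by DefaultBufferMaxRows
print(estimated_rows_per_buffer(2000))  # 5242: limited by DefaultBufferSize
print(buffers_for(1_000_000, 2000))     # 191
```

Narrow rows hit the row limit first, so raising DefaultBufferMaxRows can help them; wide rows hit the byte limit, and trimming unused columns from the source query lets more rows fit in each buffer.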

Location

Consider the following configuration…

Where should SSIS run? (Licensing issues aside)

[Diagram: two database servers, SQL Server 1 and SQL Server 2, and a separate SSIS server.]

WSRM

Windows System Resource Manager (WSRM) can throttle CPU and memory

- Creates a soft throttle
- Can be scheduled so SSIS gets priority on weekends and nights
- Only activates its policy when resources begin to become constrained (about 70%)
- WSRM is free with Windows Server 2003 Enterprise Edition and included in Windows Server 2008
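The soft throttle's decision logic can be sketched in Python. This only models the behavior described above and is not WSRM's interface (WSRM is configured through its own console with resource allocation policies and a calendar). The 70% threshold comes from the slide; the 80%/30% shares and the night hours are hypothetical values chosen for the illustration.

```python
from datetime import datetime
from typing import Optional

CONSTRAINT_THRESHOLD = 0.70  # "about 70%" from the slide


def ssis_cpu_share(when: datetime) -> float:
    """Hypothetical calendar: SSIS gets priority on weekends and at night."""
    weekend = when.weekday() >= 5            # Saturday=5, Sunday=6
    night = when.hour >= 20 or when.hour < 6
    return 0.80 if weekend or night else 0.30


def ssis_cpu_cap(when: datetime, total_cpu_utilization: float) -> Optional[float]:
    """Soft throttle: enforce the scheduled share only when the server is constrained."""
    if total_cpu_utilization < CONSTRAINT_THRESHOLD:
        return None  # not constrained: SSIS may use whatever CPU is free
    return ssis_cpu_share(when)
```

Because no cap applies below the threshold, an SSIS job running on a quiet weekday server still gets all the CPU it can use; the schedule only decides who wins when SSIS and the database engine compete.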

WSRM: Creating a soft schedule cap

Demo

Summary

Planning
- Don’t underestimate the power of the whiteboard!

Use the right tool for the right job
- Leverage the power of the engine

Patterns and Practices
- Understand best practices
- But don’t be afraid to experiment

The End Already?

Questions

http://www.bidn.com/people/brianknight

@BrianKnight

bknight@pragmaticworks.com

http://www.youtube.com/pragmaticworks