Data Management Conference: Data Warehousing
John Plummer, TSP Architect
john.plummer@microsoft.com



Agenda

• SQL Server Data Warehousing
• Fast Track Overview
• Fast Track Case Study
• Resources


What Is Data Warehousing?

Diagram ("What we want" / "What we need"): CRM, LOB and ERP source systems feed a Data Warehouse through Data Integration, and the warehouse supports Analysis, Reporting and Performance Management.


SQL Server 2008 Enterprise Edition

Diagram: the Enterprise Data Platform, framed by Dynamic Development, Beyond Relational and Pervasive Insight. Data stores: RDBMS, OLAP, XML, FILE. Services: Query, Analysis, Reporting, Integration, Synch, Search. Deployment: Mobile and Desktop, Server, Cloud.


SQL Server 2008 Enterprise Edition Data Warehousing
• Improvements across the box
  − Integration Services, Database Engine, Analysis Services
• Improvements throughout the product
  − Focus on performance and scalability
• End-to-end testing on large-scale, customer-driven configurations
  − Database Engine: to 100 billion fact table rows
  − Analysis Services: to 25 billion fact table rows

MERGE statement
Change Data Capture
Lookup Enhancements
SSIS Pipeline threading
DML Audit Enhancements
Star-Join Optimisations
Resource Governor
Data Compression
Backup Compression
Partition Parallelism
Spatial
Sub-space computation
Design-time advice
MOLAP writeback
Report Engine scale
Improved charting
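The MERGE statement in the list above combines insert, update and, optionally, delete in a single statement. That makes it a common way to upsert rows when loading warehouse tables. The Python sketch below shows only those semantics. It assumes an in-memory table modelled as a dict keyed by business key. The function name `merge_rows` and the sample rows are hypothetical, and the sketch does not reflect how SQL Server executes MERGE.

```python
def merge_rows(target: dict, source: dict, delete_unmatched: bool = False) -> dict:
    """Return a new table after applying MERGE-style rules (in-memory sketch).

    WHEN MATCHED                 -> update the target row with source columns
    WHEN NOT MATCHED [BY TARGET] -> insert the source row
    WHEN NOT MATCHED BY SOURCE   -> delete the target row (only if delete_unmatched)
    """
    result = {}
    for key, row in target.items():
        if key in source:
            result[key] = {**row, **source[key]}  # matched: overwrite changed columns
        elif not delete_unmatched:
            result[key] = dict(row)               # no source row: keep unchanged
    for key, row in source.items():
        if key not in target:
            result[key] = dict(row)               # new key: insert
    return result


# Hypothetical product dimension keyed by product code.
target = {"P1": {"name": "Bolt", "price": 5}, "P2": {"name": "Nut", "price": 3}}
source = {"P2": {"price": 4}, "P3": {"name": "Washer", "price": 1}}

print(merge_rows(target, source))
print(merge_rows(target, source, delete_unmatched=True))
```

In a real MERGE, the ON clause decides which rows match. Here the dict key plays that role.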


Fast Track DW

Accelerate scalable Data Warehouse deployments at lower TCO
Pre-configured hardware reference architectures (4-32 TB)

• Appliance-like time to value
• Flexibility through choice of HW platforms
• Low TCO through commodity hardware and value pricing
• Reduced risk through pre-tested and pre-tuned configurations
• Available NOW for SQL Server 2008 EE

SI Solution Templates


Key Principle 1: Tight Specification

Software:
• SQL Server 2008 Enterprise
• Windows Server 2008

Hardware:
• Tight specifications for servers, storage and networking
• 'Per core' building block

Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading


Key Principle 2: Balanced Across All Components
A Holistic Approach

Diagram: SQL Server 2008 Potential Performance Bottlenecks. The I/O path runs from the server (Windows, SQL Server, CPU cores, cache) through two FC HBAs (ports A and B) and an FC switch to the storage controller (ports A and B, cache), and from there to LUNs built from disks. Each stage has a rate that must be kept in balance: CPU feed rate, HBA port rate, switch port rate, SP port rate, LUN read rate and disk feed rate, together with the SQL Server read-ahead rate.
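One way to read the balance principle: in a serial I/O path, sustained throughput is capped by the slowest stage, so extra speed bought anywhere else is wasted. The Python sketch below uses the stage names from the diagram. Its MB/s figures are purely hypothetical and are not measured Fast Track values.

```python
def bottleneck(rates: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate; this caps end-to-end throughput."""
    stage = min(rates, key=rates.get)
    return stage, rates[stage]


def imbalance(rates: dict[str, float]) -> float:
    """Ratio of fastest to slowest stage; 1.0 means perfectly balanced."""
    return max(rates.values()) / min(rates.values())


# Hypothetical per-stage rates in MB/s, for illustration only.
path = {
    "CPU feed rate": 800,
    "HBA port rate": 1600,
    "Switch port rate": 1600,
    "SP port rate": 1000,
    "LUN read rate": 960,
    "Disk feed rate": 1200,
}

stage, rate = bottleneck(path)
print(f"Bottleneck: {stage} at {rate} MB/s; imbalance {imbalance(path):.1f}x")
```

A balanced design pushes the imbalance ratio toward 1.0 rather than maximizing any single stage.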


SMP SQL Server 2008 Minimum Server Configuration - Core Balanced Using Dual Read on EMC CX4

Diagram summary:
• Server: quad core CPU, 200 MB/s per core (CPU core rates based on tested hardware)
• FC HBA 1 and FC HBA 2 at 4 Gb/s, connected through a 4 Gb/s FC switch; all ports marked A/B are rated at 4 Gb/s
• EMC CX4-240 storage processors: SP-A 500 MB/s, SP-B 500 MB/s
• DAE 1 and DAE 2: number and type of drives limited by the available throughput of 370 MB/s between DAE and SP
• LUN 1 to LUN 4: User Data, TempDB and Staging FG; each RAID 1 LUN delivers 240 MB/s
• Log drives: (2) 72GB 10k
• Vault drives: (5) 146GB 10k
• Hot spare: (1) 300GB 15k (shown twice)

Per CX4 drive details:
• Each DAE can hold 15 drives
• Each DAE has 1 LUN per SP port
• Each LUN has (2) 300GB 15k SAS drives in RAID 1
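The per-core figures on this slide allow a quick sizing check. Four cores at 200 MB/s each need 800 MB/s. At 240 MB/s per two-drive RAID 1 LUN, that calls for four LUNs, which matches the four data LUNs shown. The Python sketch below reproduces that arithmetic using only the rates quoted on the slide. It deliberately ignores the 370 MB/s DAE-to-SP limit and any HBA or switch overheads, so treat it as a back-of-envelope check rather than the Fast Track sizing method.

```python
import math

PER_CORE_MB_S = 200   # "200 MB/s per Core" from the slide
LUN_RAID1_MB_S = 240  # "LUN RAID 1 240 MB/s": two mirrored drives per LUN
SP_MB_S = 500         # per storage processor (SP-A and SP-B)


def size_storage(cores: int, storage_processors: int = 2) -> dict:
    """Back-of-envelope sizing: how many RAID 1 LUNs keep `cores` cores fed."""
    demand = cores * PER_CORE_MB_S
    luns = math.ceil(demand / LUN_RAID1_MB_S)
    return {
        "demand_mb_s": demand,
        "luns": luns,
        "data_drives": 2 * luns,                    # two drives per RAID 1 LUN
        "lun_supply_mb_s": luns * LUN_RAID1_MB_S,
        "sp_supply_mb_s": storage_processors * SP_MB_S,
    }


print(size_storage(4))
```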


Key Principle 3: Sequential I/O

Sequential I/O
• Ideal for data warehousing
• Scalable, predictable performance
• Large reads and writes
• Requires a third or fewer of the drives for the same performance

Random I/O
• Ideal for OLTP
• Less predictable and less scalable for data warehousing
• Small reads and writes
• Requires a large number of drives

Best practices focus on preserving the sequential order of data.
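The drive-count comparison comes from dividing a target scan rate by what one drive can sustain under each access pattern. The Python sketch below makes three assumptions:

• a hypothetical target of 800 MB/s
• about 120 MB/s per drive for sequential reads, which is half of the 240 MB/s two-drive RAID 1 LUN quoted on the CX4 configuration slide
• a purely hypothetical 30 MB/s per drive for random reads

```python
import math


def drives_needed(target_mb_s: float, per_drive_mb_s: float) -> int:
    """Smallest whole number of drives whose combined rate meets the target."""
    return math.ceil(target_mb_s / per_drive_mb_s)


TARGET_MB_S = 800      # hypothetical required scan rate
SEQUENTIAL_MB_S = 120  # assumed per-drive rate for large sequential reads
RANDOM_MB_S = 30       # hypothetical per-drive rate for small random reads

seq = drives_needed(TARGET_MB_S, SEQUENTIAL_MB_S)
rnd = drives_needed(TARGET_MB_S, RANDOM_MB_S)
print(f"sequential: {seq} drives, random: {rnd} drives ({seq / rnd:.2f} of the random count)")
```

The exact ratio depends entirely on the per-drive rates you assume. The general point holds regardless: lower per-drive throughput under random I/O multiplies the drive count.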


Data File Layout (per 4 CPU cores)

Diagram: nine LUNs (LUN0 to LUN8) on RAID groups GP01 to GP05, holding:
• Log files: UserDB Log, StageDB Log, TempDB Log
• User database, filegroup Primary: UserDBDBFile1_A.ndf to UserDBDBFile8_A.ndf
• Staging database, filegroup Staging: StageSGFile1_A.ndf to StageSGFile8_A.ndf
• TempDB, filegroup TempDB: TempDBTMFile_1.ndf to TempDBTMFile_8.ndf
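The layout spreads each filegroup across all data LUNs, one file per LUN. The apparent intent is to let scans draw on every LUN at once. The Python sketch below generates the file-to-LUN map. It assumes that file N of each filegroup sits on LUN N (LUN1 to LUN8) and leaves the log LUN out. That mapping rule is an assumption read from the diagram, not something the slide states.

```python
# File-name patterns copied from the slide; the LUN assignment rule is assumed.
FILE_PATTERNS = {
    "Primary (UserDB)": "UserDBDBFile{n}_A.ndf",
    "Staging (StageDB)": "StageSGFile{n}_A.ndf",
    "TempDB": "TempDBTMFile_{n}.ndf",
}


def data_file_layout(data_luns: int = 8) -> dict[str, list[str]]:
    """Map LUN1..LUNn to the data files placed on it: one file per filegroup."""
    return {
        f"LUN{n}": [pattern.format(n=n) for pattern in FILE_PATTERNS.values()]
        for n in range(1, data_luns + 1)
    }


for lun, files in data_file_layout().items():
    print(lun, ", ".join(files))
```

SQL Server fills the files of a filegroup in proportion to their free space. Equal-sized files on separate LUNs therefore tend to spread both loads and scans evenly.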


Fast Track DW Deployment
• All necessary hardware purchased from one vendor
• Dedicated SAN-based storage
• OS installed
• Customer required to:
  − Install system
  − Install SQL Server
• 2-, 4- and 8-socket Intel/AMD-based servers
• 1.6 to 36 TB of capacity


Fast Track Data Warehouse Configurations

| Server | CPU | Cores | SAN | Drive count | Capacity |
|---|---|---|---|---|---|
| HP ProLiant DL385 G5p | (2) AMD Opteron Shanghai quad core 2.7 GHz | 8 | (2) HP MSA2000 | (16) 300GB 15k SAS | 4TB (tested), 8TB (max) |
| HP ProLiant DL385 G5p | (2) AMD Opteron Shanghai quad core 2.7 GHz | 8 | (2) EMC CX4-240 | (16) 300GB 15k FC | 4TB (tested), 10TB (max) |
| HP ProLiant DL585 G5 | (4) AMD Opteron Shanghai quad core 2.7 GHz | 16 | (4) HP MSA2000 | (32) 300GB 15k SAS | 8TB (tested), 16TB (max) |
| HP ProLiant DL585 G5 | (4) AMD Opteron Shanghai quad core 2.7 GHz | 16 | (4) EMC CX4-240 | (32) 300GB 15k FC | 8TB (tested), 16TB (max) |
| HP ProLiant DL785 G5 | (8) AMD Opteron Shanghai quad core 2.7 GHz | 32 | (8) HP MSA2000 | (64) 300GB 15k SAS | 16TB (tested), 32TB (max) |
| HP ProLiant DL785 G5 | (8) AMD Opteron Shanghai quad core 2.7 GHz | 32 | (8) EMC CX4-240 | (64) 300GB 15k FC | 16TB (tested), 32TB (max) |
| Dell PowerEdge 2950 MLK | (2) Intel Xeon Harpertown quad core 2.66 GHz | 8 | (2) EMC CX4-240 | (16) 300GB 15k FC | 4TB (tested), 8TB (max) |
| Dell PowerEdge R900 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24 | (6) EMC CX4-240 | (48) 300GB 15k FC | 12TB (tested), 24TB (max) |
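Every row in the table appears to scale the same building block. The Python sketch below transcribes the rows from the table and checks the ratios. Three ratios hold for every row: 2 drives per core, 4 cores per storage array, and 0.5 TB of tested capacity per core. These are observations drawn from the figures, not rules stated on the slide.

```python
from fractions import Fraction

# (configuration, cores, storage arrays, drives, tested TB), transcribed from the table
CONFIGS = [
    ("HP DL385 G5p / HP MSA2000", 8, 2, 16, 4),
    ("HP DL385 G5p / EMC CX4-240", 8, 2, 16, 4),
    ("HP DL585 G5 / HP MSA2000", 16, 4, 32, 8),
    ("HP DL585 G5 / EMC CX4-240", 16, 4, 32, 8),
    ("HP DL785 G5 / HP MSA2000", 32, 8, 64, 16),
    ("HP DL785 G5 / EMC CX4-240", 32, 8, 64, 16),
    ("Dell PowerEdge 2950 / EMC CX4-240", 8, 2, 16, 4),
    ("Dell PowerEdge R900 / EMC CX4-240", 24, 6, 48, 12),
]


def per_core_ratios(cores: int, arrays: int, drives: int, tested_tb: int) -> tuple:
    """Return (drives per core, cores per array, tested TB per core) as exact fractions."""
    return (Fraction(drives, cores), Fraction(cores, arrays), Fraction(tested_tb, cores))


# A single distinct tuple means every configuration uses the same ratios.
ratios = {per_core_ratios(*row[1:]) for row in CONFIGS}
print(ratios)
```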


Fast Track DW Considerations
• Simple recovery model
  − Understand replication limitations
• Compression highly recommended
  − Except for highly random data
• Indexing
  − Use a clustered index for data ranges or common restrictions
  − Minimize use of non-clustered indexes, which drive random I/O
• Fragmentation (file, table, index) negates sequential I/O benefits
  − Pre-allocate files and grow them manually
  − Use large extents (-E startup option)
  − Use the multi-step loading techniques in the white paper
  − Trade-off: query performance versus load performance
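Pre-allocating files and growing them manually helps avoid the file-level fragmentation that many small autogrow increments can cause. The Python sketch below generates the corresponding T-SQL under these hypothetical assumptions:

• a database named `UserDB`
• logical file names `UserDBDBFile1` to `UserDBDBFile8`
• an 800 GB target split evenly across eight files

`ALTER DATABASE ... MODIFY FILE` with `SIZE` and `FILEGROWTH = 0` is standard T-SQL. Review the generated statements before running them, because SQL Server requires each new SIZE to be larger than the file's current size.

```python
import math


def preallocate_statements(database: str, logical_names: list[str], total_gb: int) -> list[str]:
    """Emit one ALTER DATABASE statement per file, splitting total_gb evenly.

    Sizes are rounded up so the files together hold at least total_gb.
    FILEGROWTH = 0 turns off autogrow, so later growth is a deliberate, manual step.
    """
    per_file_gb = math.ceil(total_gb / len(logical_names))
    return [
        f"ALTER DATABASE [{database}] MODIFY FILE "
        f"(NAME = N'{name}', SIZE = {per_file_gb}GB, FILEGROWTH = 0);"
        for name in logical_names
    ]


# Hypothetical logical names for the eight UserDB data files.
names = [f"UserDBDBFile{n}" for n in range(1, 9)]
for statement in preallocate_statements("UserDB", names, total_gb=800):
    print(statement)
```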


Fast Track Case Study – Environment

• Current environment
  − DW: Teradata 4-node (5450 model), 6 TB of user data
  − BI: Business Objects
  − ETL: Informatica
• Proposed Microsoft platform
  − SQL Server Fast Track Data Warehouse
  − HP DL580 server: 4 quad-core processors, 256 GB memory
  − SAN storage: MSA 2000, 8 TB of user data
  − BI: Business Objects
  − ETL: SQL Server and SSIS


Fast Track Case Study – Results

| Measure | Teradata | SQL Server Fast Track DW | Comparison |
|---|---|---|---|
| Loading Subject Area 1 | 5:10:21 total time | 51:31 total time | 6x faster |
| Loading Subject Area 2 | 4:36:08 total time | 1:50:01 total time | 2.5x faster |
| Query times Subject Area 1 | 3:03 avg query time (9 benchmark queries) | 0:15 avg query time (9 benchmark queries) | 12x faster |
| Query times Subject Area 2 | 56:44 avg query time (4 benchmark queries) | 8:09 avg query time (4 benchmark queries) | 7x faster |
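The "x faster" column is the ratio of elapsed times. The short Python check below recomputes it from the durations in the table. It reads values as h:mm:ss or m:ss and treats the Fast Track load time for Subject Area 2 as 1:50:01, the only reading consistent with the quoted 2.5x.

```python
def to_seconds(duration: str) -> int:
    """Convert 'h:mm:ss' or 'm:ss' to seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speedup(teradata: str, fast_track: str) -> float:
    """How many times faster the Fast Track run was than the Teradata run."""
    return to_seconds(teradata) / to_seconds(fast_track)


RESULTS = [
    ("Loading Subject Area 1", "5:10:21", "51:31"),
    ("Loading Subject Area 2", "4:36:08", "1:50:01"),
    ("Query times Subject Area 1", "3:03", "0:15"),
    ("Query times Subject Area 2", "56:44", "8:09"),
]

for label, teradata, fast_track in RESULTS:
    print(f"{label}: {speedup(teradata, fast_track):.1f}x faster")
```

This prints 6.0x, 2.5x, 12.2x and 7.0x, in line with the rounded figures on the slide.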


Fast Track Case Study – Pricing

• Microsoft Fast Track pricing
  − Hardware (8TB capacity): $152,500
  − SQL Server software cost: $26,119
  − Total price with CAL licenses: $178,619
• Teradata pricing considerations
  − Current annual maintenance fee: $300,000 (6 TB system)
  − Upgrade existing system to 8 TB: $280,000, plus maintenance (~$40K)
  − Total price: $620,000

A faster Microsoft solution for $178K, or $620K for Teradata maintenance and upgrade?
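The totals can be checked directly. The Python sketch below sums the line items from the slide. The ~$40K Teradata maintenance increase is the slide's own approximation. The Teradata total includes a year of maintenance, while the Fast Track total is a purchase price.

```python
fast_track = {
    "Hardware (8 TB capacity)": 152_500,
    "SQL Server software": 26_119,
}
teradata = {
    "Current annual maintenance (6 TB system)": 300_000,
    "Upgrade existing system to 8 TB": 280_000,
    "Additional maintenance (approx.)": 40_000,
}

fast_track_total = sum(fast_track.values())
teradata_total = sum(teradata.values())
print(f"Fast Track: ${fast_track_total:,}")
print(f"Teradata:   ${teradata_total:,}")
print(f"Teradata costs {teradata_total / fast_track_total:.1f}x the Fast Track total")
```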


Fast Track Benefits Summary

• Appliance-like time to value: reduces DBA effort; fewer indexes, much higher level of sequential I/O
• Choice of HW platforms: Dell, HP, Bull; more in future
• Low TCO through commodity hardware and value pricing; lower storage costs
• High scale: new reference architectures scale up to 32 TB (assuming 2.5x compression)
• Reduced risk: tested by Microsoft; better choice of hardware; application of best practice
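The 32 TB ceiling depends on the stated 2.5x compression ratio, since user data capacity is physical capacity multiplied by the ratio. The Python sketch below performs that conversion. It assumes the ratio applies uniformly across the data. In practice ratios vary, and the considerations slide notes that highly random data is an exception to the compression recommendation.

```python
def physical_tb_needed(user_data_tb: float, compression_ratio: float) -> float:
    """Disk capacity needed to hold user_data_tb at the given compression ratio."""
    return user_data_tb / compression_ratio


def user_data_tb(physical_tb: float, compression_ratio: float) -> float:
    """User data that fits in physical_tb at the given compression ratio."""
    return physical_tb * compression_ratio


print(physical_tb_needed(32, 2.5))
```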


Data Warehouse Roadmap Survey

A review of your DW environment to:
• Identify
  − Cost savings
  − Performance benefits
• Deliver
  − BI to more end users
  − Better control for IT
• Prepare
  − To take advantage of the latest innovations


Data Warehouse Roadmap Service

Requirements
• Existing DW
• Volume of end-user data: 1 TB+
• Considering a change to BI or DW infrastructure

On-site survey
• Interviews with key stakeholders in the data warehouse environment
• Performed by an IMGROUP architect
• 1-2 days' duration

Deliverables
• Presentation of key findings
• Report detailing findings
• Results delivered approximately 10 days after the survey


Call to Action
• http://www.microsoft.com/FastTrack
• Speak to partners here today about Fast Track
• Speak to partners here today about the Data Warehouse Roadmap Service


© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.