SQL Server Fast Track DW
SQL Server 2008 R2 Parallel DW (Project Madison)
Roger Moore – Data Warehouse SSP
972-955-0426
Blake Price – BI-DW
972-702-9500
Microsoft Confidential
Agenda
Microsoft Data Warehouse Strategy
SQL Fast Track DW Overview
Implementing a SQL Fast Track DW
SQL Server 2008 R2 Parallel DW
Options for Customers
Q&A - Summary
END USER TOOLS & PERFORMANCE MANAGEMENT APPS
Excel
PerformancePoint Server
BI PLATFORM
SQL Server Reporting Services
SQL Server Analysis Services
SQL Server DBMS – SQL EE, Fast Track & Parallel DW
SQL Server Integration Services
SharePoint Server
DELIVERY
Reports Dashboards Excel Workbooks
AnalyticViews Scorecards Plans
Our Integrated BI-DW Offering
Data Warehousing Vision
Microsoft Confidential - Preliminary Information Subject to Change
Massive Scalability at Low Cost
Improved Business Agility and Alignment
Democratized Business Intelligence
Hardware Choice
Make SQL Server the gold standard for data warehousing
Deliver an end-to-end solution from ETL to database to presentation layer
©2009 Microsoft Corporation
Microsoft’s on-going investments in Data Warehousing
Investment areas by release (2005, 2008, Futures):
Data Warehouse Scale
2005: Multi-TB warehouses; enterprise scalability; parallel partitioning
2008: 10s of TB warehouses; enhanced parallel partitioning; data compression; star join optimization
Futures: PB warehouses; >64-core processing; scale-out through MPP
Data Warehouse Management
2005: Unified manageability
2008: Policy-based administration; DB resource governance; performance warehouse
Futures: Performance management tools; BI resource governance; improved predictability
Heterogeneous Connectivity & Workloads
2005: Enterprise-class ETL tool
2008: CDC and Merge; high-performance connectors (Oracle, Teradata, SAP BW); persistent look-up
Futures: Mixed workload support; continuous loading
Data Integrity & Quality
2005: Data cleansing (fuzzy lookup/matching)
2008: Text mining task; data profiling; enhanced logging
Futures: Integrated DQ services (Zoomix); master data management (Stratature integration)
Compliance & Security
2005: Data protection and tracing; encryption
2008: Policy-based auditing; enhanced encryption
Futures: Rights management
Microsoft DW Solutions
SSIS
Microsoft & PartnerServices
Some Data Warehouses today
Big SAN
Big 64-core server
Connected together
What’s wrong with this picture?
Answer: system out of balance
This server can consume 16 GB/Sec of IO, but the SAN can only deliver 2 GB/Sec
Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn't
Lots of disks for random IOPS, BUT
Limited controllers
Limited IO bandwidth
System is typically IO bound
Queries are slow
Result: significant investment, not delivering performance
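The imbalance above can be put into a number. The sketch below is only an illustration: it takes the 16 GB/sec and 2 GB/sec figures from the slide and reports what share of the server's scan capacity the SAN can feed. The function name is hypothetical.

```python
def io_utilization(cpu_consume_gb_s: float, storage_deliver_gb_s: float) -> float:
    """Share of the CPUs' scan capacity that storage can feed (capped at 1.0)."""
    return min(storage_deliver_gb_s / cpu_consume_gb_s, 1.0)


# Figures quoted on the slide: the server can consume 16 GB/s, the SAN delivers 2 GB/s.
server_gb_s = 16.0
san_gb_s = 2.0

utilization = io_utilization(server_gb_s, san_gb_s)
print(f"Storage can feed {utilization:.1%} of what the CPUs can consume")
```

In a balanced system this ratio is close to 100%, which is the design goal of the next slide.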
The Alternative: A Balanced System
Design a server + storage configuration that can deliver all the IO bandwidth that the CPUs can consume when executing a SQL relational DW workload
Avoid sharing storage devices among servers
Avoid overinvesting in disk drives
Focus on scan performance, not IOPS
Layout and manage data to maximize range scan performance and minimize fragmentation
SQL Server Fast Track Data Warehouse
A method for designing a cost-effective, balanced system for Data Warehouse workloads
Reference hardware configurations developed in conjunction with hardware partners using this method
Best practices for data layout, loading and management
Relational Database Only – Not SSAS, SSIS, SSRS
SQL Server Fast Track DW - Scope
[Diagram: Reference Architecture Scope (dashed). Labels shown: Data Staging, ETL; Data Warehouse; Dedicated SAN, Storage Array; Presentation Data; Analysis Services Cubes; Presentation Layer Systems; PerformancePoint; Reporting Services; Web Analytic Tools; SharePoint Services; Microsoft Office SharePoint; Excel Services]
Fast Track Data Warehouse Components
Software:
• SQL Server 2008 Enterprise
• Windows Server 2008
Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading
Hardware:
• Tight specifications for servers, storage and networking
• 'Per core' building block
Twelve SMP Reference Architectures
Solution to help customers and partners accelerate their data warehouse deployments
Fast Track Data Warehouse v2.0
Our offering includes
1. DW Reference Architectures
2. SQL Server and Windows settings and guidance
3. Solution Partners with strong background in data warehousing and BI
Fast Track Data Warehouse Components Balanced across all components
[Diagram: I/O path from CPU cores to disks. Components: CPU cores, SQL Server, Windows Server, server cache, FC HBAs (ports A/B), FC switch, storage controller (ports A/B, cache), LUNs, disks. Rates along the path: CPU Feed Rate, SQL Server Read Ahead Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, LUN Read Rate, Disk Feed Rate]
SQL Server 2008 Potential Performance Bottlenecks
Fast Track & Sequential I/O
Sequential I/O
Ideal for data warehousing
Scalable, predictable performance
Large reads & writes
Requires 1/3 or fewer drives for same performance
Random I/O
Ideal for OLTP
Not as predictable & scalable for data warehousing
Small reads and writes
Requires large number of drives
Best practices focus on preserving the sequential order of data
Two SQL DW Infrastructure Options: SQL Classic DW or Fast Track SQL DW
SQL 2008 Data WarehouseSMP Server
Shared Network Bandwidth
Enterprise Shared SAN Storage
Dedicated Network Bandwidth
SQL Classic DW Architecture
Leverages shared SAN
Fast Track SQL DW Architecture
Architecture modeled after DW appliances: "appliance-like" solutions
Uses dedicated SAN arrays and network
SAN arrays: 1 per 4 CPU cores
8 data disks per array: 4 RAID 1 pairs
Simultaneous SQL Server reads
2 log disks and 1 hot spare
EMC AX4, HP MSA2312, IBM 3400
OLTP Applications
SQL Fast Track DW supports "scan-centric" DW workloads that are index-light
Dedicated SAN
SQL Server Fast Track Data Warehouse 2.0 for DELL
2 Processor Configuration
Server: Dell PowerEdge R710 with 2 quad-core Intel Xeon processors
8 CPU cores
32 GB memory
Storage server: EMC CLARiiON AX4
Scalability: 4 – 8 TB
4 Processor Configuration
Server: Dell PowerEdge R900 with 4 six-core Intel Xeon processors
24 CPU cores
96 GB memory
Storage server: EMC CLARiiON AX4
Scalability: 12 – 24 TB
New Fast Track Data Warehouse 2.0 for IBM
2 Processor Configuration
Server: IBM System x3650 M2 with 2 quad-core Intel Xeon CPUs
Storage server: IBM System Storage DS3400
Scalability: 4 – 8 TB
4 Processor Configuration
Server: IBM System x3850 M2 with 4 six-core Intel Xeon CPUs
Storage server: IBM System Storage DS3400
Scalability: 12 – 24 TB
8 Processor Configuration
Server: IBM System x3950 M2 with 8 quad-core Intel Xeon CPUs
Storage server: IBM System Storage DS3400
Scalability: 16 – 32 TB
SQL Server Fast Track Data Warehouse 2.0 HP Proliant AMD– now on G6 Platform
2 Processor Configuration
Server: HP ProLiant DL385 G6 with 2 six-core AMD Opteron CPUs
Storage server: MSA Storage
Scalability: 6 – 12 TB
4 Processor Configuration
Server: HP ProLiant DL585 G6 with 4 six-core AMD Opteron CPUs
Storage server: MSA Storage
Scalability: 12 – 24 TB
8 Processor Configuration
Server: HP ProLiant DL785 G6 with 8 six-core AMD Opteron CPUs
Storage server: MSA Storage
Scalability: 24 – 48 TB
SQL Server Fast Track Data Warehouse 2.0 HP – now on G6 Platform
2 Processor Configuration
Server: HP ProLiant DL380 G6 with 2 quad-core Intel Xeon® 5500 Series CPUs
Storage server: MSA Storage
Scalability: 4 – 8 TB
4 Processor Configuration
Server: HP ProLiant DL580 G5 with 4 six-core Intel Xeon® 7400 Series CPUs
Storage server: MSA Storage
Scalability: 12 – 24 TB
Fast Track DW Reference Configurations
Server | CPU | CPU Cores | SAN | Data Drive Count | Initial Capacity* | Max Capacity**
HP ProLiant DL385 G6 | (2) AMD Opteron Istanbul six-core 2.6 GHz | 12 | (3) HP MSA2312fc | (24) 300GB 15k RPM SAS | 6TB | 12TB
HP ProLiant DL585 G6 | (4) AMD Opteron Istanbul six-core 2.6 GHz | 24 | (6) HP MSA2312fc | (48) 300GB 15k SAS | 12TB | 24TB
HP ProLiant DL785 G6 | (8) AMD Opteron Istanbul six-core 2.8 GHz | 48 | (12) HP MSA2312 | (96) 300GB 15k SAS | 24TB | 48TB
Dell PowerEdge R710 | (2) Intel Xeon Nehalem quad-core 2.66 GHz | 8 | (2) EMC AX4 | (16) 300GB 15k FC | 4TB | 8TB
Dell PowerEdge R900 | (4) Intel Xeon Dunnington six-core 2.67 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB
IBM x3650 M2 | (2) Intel Xeon Nehalem quad-core 2.67 GHz | 8 | (2) IBM DS3400 | (16) 200GB 15k FC | 4TB | 8TB
IBM x3850 M2 | (4) Intel Xeon Dunnington six-core 2.67 GHz | 24 | (6) IBM DS3400 | (24) 300GB 15k FC | 12TB | 24TB
IBM x3950 M2 | (8) Intel Xeon Nehalem four-core 2.13 GHz | 32 | (8) IBM DS3400 | (32) 300GB 15k SAS | 16TB | 32TB
Bull Novascale R460 E2 | (2) Intel Xeon Nehalem quad-core 2.66 GHz | 8 | (2) EMC AX4 | (16) 300GB 15k FC | 4TB | 8TB
Bull Novascale R480 E1 | (4) Intel Xeon Dunnington six-core 2.67 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB
* Core-balanced compressed capacity based on 300GB 15k SAS, not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for TempDB.
** Represents storage array fully populated with 300GB 15k SAS and use of a 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure. 30% of this storage should be reserved for DBA operations.
Fast Track Case Study - #1
Current Environment
Teradata 4-node (5450 model) with 6TB of user data
BI: Business Objects
ETL: Informatica and BTEQ scripts
Proposed Microsoft Platform
SQL Server Fast Track Data Warehouse
HP DL580 Server - 4 quad-core processors (16 cores total)
256 GB memory
SAN storage: MSA 2000 (Qty 4) – 8TB user data capacity
BI: Business Objects
ETL: SQL Server and SSIS
Fast Track Case Study – #1 Results
Measure | Teradata | SQL Server Fast Track DW | Comparison
Loading Subject Area 1 | 5:10:21 total time | 0:51:31 total time | 6x faster
Loading Subject Area 2 | 4:36:08 total time | 1:50:01 total time | 2.5x faster
Query times Subject Area 1 | 3:03 avg query time (using 9 benchmark queries) | 0:15 avg query time (using 9 benchmark queries) | 12x faster
Query times Subject Area 2 | 56:44 avg query time (using 4 benchmark queries) | 8:09 avg query time (using 4 benchmark queries) | 7x faster
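The "x faster" figures in the results can be re-derived from the timings. The sketch below is an illustration only: it assumes the load times are h:mm:ss, the query times are m:ss, and that the second Fast Track load time reads 1:50:01. The helper name is hypothetical.

```python
def to_seconds(clock: str) -> int:
    """Convert 'h:mm:ss' or 'm:ss' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (Teradata, Fast Track) timings as quoted in the case study results.
timings = {
    "Loading Subject Area 1": ("5:10:21", "0:51:31"),
    "Loading Subject Area 2": ("4:36:08", "1:50:01"),  # assumes 1:50:01 (h:mm:ss)
    "Query times Subject Area 1": ("3:03", "0:15"),
    "Query times Subject Area 2": ("56:44", "8:09"),
}

speedups = {
    name: to_seconds(teradata) / to_seconds(fast_track)
    for name, (teradata, fast_track) in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster")
```

Under those assumptions the ratios come out at roughly 6.0, 2.5, 12.2 and 7.0, in line with the rounded figures on the slide.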
Fast Track Case Study #2 - Retailer
Situation
Large retailer with limited capabilities because of its legacy business intelligence solution. The solution has capacity for 212 users at a cost of ~1 million in annual maintenance.
Competition – Netezza & Oracle
Business Needs
1) Lower maintenance cost
2) Address the business needs (POS data, etc.)
3) Proliferate the advantages of business intelligence across the enterprise
Solution
Full MS BI stack: Fast Track, SSRS, Excel Services, PPS & Office 2007
Full deployment of the system is intended for Q1 CY10
Our solution will replace and extend the existing DB2 AS400 system
SSIS will replace existing COBOL ETL (including ODI)
Microsoft Confidential- For Internal Use Only
Fast Track Data Warehouse Timeline
[Timeline: 2008, 2009, 2010]
Enterprise ETL Services
Star Join Query Optimizations
Fast Track Data Warehouse
DW Reference Architectures
Predictable performance at low cost
Faster time to solution
Fast Track Data Warehouse 2.0
New Reference Architectures from IBM
Updated configurations from HP, Dell and Bull
EMC as a Service Partner for Fast Track
Fast Track 3.0…4.0
New Test Harness for Partners
Microsoft to create new Test Harness for validation of new Fast Track configurations
NEC to validate new Reference Architectures
Fast Track vNext
Future partners to create new validated Reference Architectures with Test Harness
Fast Track Data Warehouse Benefits
Appliance-like time to value
Reduces DBA effort; fewer indexes, much higher level of sequential I/O
Choice of HW platforms
Dell, HP, Bull, EMC and IBM, with more in future
Low TCO
Through commodity hardware and value pricing; lower storage costs
High scale
New reference architectures scale up to 48TB (assuming 2.5x compression)
Reduced risk
Validated by Microsoft; better choice of hardware; application of best practices
Summary
Fast Track Data Warehouse offers customers
Faster time to solution
High scale: up to 48TB
Low TCO with better price performance; industry standard hardware
Better performance out of the box and predictable performance
Reduced risk through balanced hardware & best practices
Integration with Madison Hub & Spoke Architecture
SQL Server Fast Track Data Warehouse has 3 components
Twelve reference architectures from HP, Dell, Bull, EMC and IBM
Pre-tested SQL Server & Windows guidance & settings
System integrators & partners with DW-BI expertise
SQL Server Fast Track Implementation
Fast Track Data Warehouse Implementation
Introduction - Artis Consulting
Identifying Fast Track Opportunities
Recommended Architecture for using Fast Track within the Microsoft BI Platform
Fast Track DW Implementation Key Principles
Who is Artis Consulting?
Dallas-based, eight-year-old consulting firm dedicated to delivering solutions based on the Microsoft Information Worker and Business Intelligence product suite
Microsoft Gold Certified Partner with competencies in Information Worker and Business Intelligence solutions
Consistent participant in Technology Adoption Programs (TAPs) with Microsoft
Founder of MS Business Intelligence Community (www.msbic.com)
Recipient of numerous awards from Microsoft for our expertise in delivering technology solutions
Our approach and differentiators
Design reflects your business
Approach toward "unstructured data"
Solution "accelerators"
Relevant experience
What does Artis do?
Our motto is… the right information to the right person at the right time
Artis has three practice areas to accomplish this
Data Warehousing/Enterprise Reporting – Includes establishing a data repository of key sources of information that are integrated and organized the way the client thinks about their business, and allowing users easy and flexible access to that information to improve decision making.
Performance Management – Performance Management applications enable you to MONITOR key metrics, ANALYZE trends and exceptions and PLAN for changing business climates. Utilizing an integrated solution can enable your organization to align actions with business strategy.
Portals & Collaboration – Managing the vast amounts of structured and unstructured corporate information to provide users with capabilities like content management, portal management, forms, workflow, and search, all in a secure and collaborative environment (e.g., intranets, extranets, and Internet sites)
How can I tell if Fast Track is right for my company?
Workload is a DW workload
Analysis of query types shows more large data scans than smaller scans or seeks
A workload may become scan-centric once the workarounds that were put in place for performance constraints, and that keep it from being scan-intensive, are removed
Data is non-volatile
Reads are 90% or more of data access
Loads can cause writes, but writes through updates are rare
Updates limited to data quality issues
Very large fact tables: 100s of GB to TB
Low to medium user concurrency requirements
10s to 100s of active users, not thousands
High use of partitioned tables with clustered indexes
Limited usage of secondary indexes
My company is willing/able to buy FTDW hardware as a package
Hardware standards and methodology can be used if some customization is desired
Will not be able to leverage Microsoft Support
Workload affinity
Attribute: Use Case Description
Data Warehouse:
Read-mostly (90%-10%)
Updates generally limited to data quality requirements
High-volume bulk inserts
Medium to low overall query concurrency; peak concurrent query request ranging from 10-30
Concurrent query throughput characterized by analysis and reporting needs
Large range scans and/or aggregations
Complex queries (filter, join, group-by, aggregation)
OLTP:
Balanced read-update ratio (60%-40%)
Concurrent query throughput characterized by operational needs
Fine-grained inserts and updates
High transaction throughput (for example, 10s K/sec)
Medium-to-high overall user concurrency; peak concurrent query request ranging from 50-100 or more
Usually very short transactions (for example, discrete minimal row lookups)
Attribute: Data Model
Data Warehouse:
Highly normalized centralized data warehouse model
Denormalization in support of reporting requirements, often serviced from BI applications such as SQL Server Analysis Services
Dimensional data structures hosted on the database with relatively low concurrency, high volume analytical requests
Large range scans are common
Ad-hoc analytical use cases
OLTP:
Highly normalized operational data model
Frequent denormalization for decision support; high concurrency, low latency discrete lookups
Historical retention of data is limited
Denormalized data models extracted from other source systems in support of operational event decision making
Attribute: Data Architecture
Data Warehouse:
Significant use of heap table structures
Large partitioned tables with clustered indexes supporting range-restricted scans
Very large fact tables (for example, hundreds of gigabytes to multiple terabytes)
Very large data sizes (for example, hundreds of terabytes to a petabyte)
OLTP:
Minimal use of heap table structures
Clustered index table structures support detailed record lookups (1 to few rows per request)
Smaller fact tables (for example, less than 100 GB)
Relatively small data sizes (for example, a few terabytes)
Attribute: Database Optimization
Data Warehouse:
Minimal use of secondary indexes (described earlier as index-light)
Partitioning is common
OLTP:
Heavy utilization of secondary index optimization
Fast Track Scan-intensive Non-volatile Index-light Partition aligned
What Fast Track is Not
Fast Track is not a substitute for EE OLTP environments
Fast Track is not a new version of SQL Server
SQL Server 2008 Enterprise
Pre-tested hardware configurations
Guidance for physical implementation
Independent of SQL Server Parallel DW
Fast Track is not a substitute for proper database administrators
Fast Track helps size and configure the equipment, not maintain it
What Fast Track is Not
Fast Track does not replace a solid DW project methodology
Iterative approach: identify high value/low cost subject areas for first implementations
High velocity of release
Continuous user feedback
Fast Track does not require a specific DW methodology; however, the logical data model is very important
Kimball/Inmon/Corporate Information Factory
DW data model needs to be user-focused
END USER TOOLS & PERFORMANCE MANAGEMENT APPS
Excel
PerformancePoint Server
BI PLATFORM
SQL Server Reporting Services
SQL Server Analysis Services
SQL Server Fast Track
SQL Server Integration Services
SharePoint Server
DELIVERY
Reports Dashboards Excel Workbooks
AnalyticViews Scorecards Plans
Our Integrated BI-DW Offering
Microsoft BI Stack Usage
MOSS/SharePoint
Presented in various formats…
Reporting Services
Excel/Excel Services
Dashboards/Scorecards
Custom Apps
Business Analysts/Power Users
ProClarity
Operational Users
Report Builder
Executives/Leadership
KPI Web Parts
Virtual Earth
SQL Server Analysis Services
SQL Server Fast Track
SQL Server Integration Services
ODBC, XLS,Text, etc.
Consistent, Accurate view of your data…
To the appropriate audience….
Security, Workflow, Search, Meta Data, Business Data Catalog, Content Mgmt
Native MOSS
Value of using Analysis Services with SQL Server Fast Track
Sales example
Value of using Analysis Services with SQL Server Fast Track (continued)
Provides ability to create auto-aggregating measures and complex calculations
Allows creation of dimensions with multiple navigation paths (hierarchies)
Provides ability to store aggregated measures across hierarchies
Recommend passing through detail queries to the Fast Track DW
Exposes aggregated, hierarchical data to Reporting Services, PerformancePoint, and Excel (and many other 3rd party tools)
Will scale from 100s to 1000s of users
Value of using Analysis Services with SQL Server Fast Track (continued)
Recommendation
Use pre-aggregation capability of Analysis Services to scale solution to higher number of users
Target high-volume, random I/O requests to Analysis Services
Loading Analysis Services is an ideal workload for Fast Track
Standard Reporting (SSRS)
Can be highly formatted – tables, charts, gauges, maps
SSRS reports usually created by developers
Reports can be developed against SQL and SSAS sources
Target Audience – Operational Users
Ad-Hoc Reporting (Excel)
Excel 2007 will natively connect to SSAS using PivotTables
Excel reports can be shared through SharePoint document libraries and Excel Services web parts
Target Audience – Analysts and some Operational Users
Ad-Hoc Reporting (Report Builder)
Flexible and intuitive ad-hoc reporting tool that requires minimal training
Report Builder can be used against SQL and SSAS sources
Target Audience – Analysts and some Operational Users
Dashboard/Scorecards
Provides at-a-glance insights and guided analytics
Dashboards usually created by developers or analysts
Dashboards typically use SSAS as data source
Can mix structured and unstructured elements
Target Audience – Executive and Operational Users
Analytics
Provides open-ended analytics
Analytic views usually created by developers or analysts
PerformancePoint/ProClarity uses SSAS as data source
Target Audience – Analysts
Fast Track DW Implementation Key Principles
Fast Track Data Warehouse Components Balanced across all components
[Diagram: I/O path from CPU cores to disks. Components: CPU cores, SQL Server, Windows Server, server cache, FC HBAs (ports A/B), FC switch, storage controller (ports A/B, cache), LUNs, disks. Rates along the path: CPU Feed Rate, SQL Server Read Ahead Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, LUN Read Rate, Disk Feed Rate]
SQL Server 2008 Potential Performance Bottlenecks
Current Fast Track Architectures are rated at 200 MB/s per CPU core
Achieving Sequential Scan
FT Goals
Maintain sequential data layout
Data logically and physically ordered
Minimize disk head movement
Leverage
RAID-1 dual-read
Storage enclosure pre-fetch
SQL Server Read-Ahead
These elements combine to create optimized sequential scan performance
Fast Track DW Implementation Key Principles
Storage Layer
Storage – Disk Configuration
Fast Track is very disk efficient
FT uses RAID-1 to enable sequential I/O
2-disk RAID-1 array per CPU core
Depending on which FT system is selected, the system will have at least 16 RAID-1 arrays
Creates virtual affinity between a RAID-1 array and a CPU core
Data is evenly split across RAID-1 arrays using partitioning
Enables ability to load data sequentially
Sequential I/O uses about 1/3 of the number of disks versus random I/O to get the same level of performance
Storage – Disk Configuration
Creating RAID Groups
HP MSA, EMC AX, IBM DS
11 disks per enclosure
10 dedicated to user data
1 hot spare
1 storage enclosure per (4) physical cores
2-socket quad-core server
2 storage enclosures – 22 total disks
RAID configuration
Primary data: (4) 2-disk RAID-1 arrays
Log: (1) 2-disk RAID-1 array
Example - 2 Socket Fast Track Storage Components
SQL Server 2008 Minimum Server Configuration: SMP Core-Balanced Architecture using Dual Read on HP MSA 2312
[Diagram labels: Switch, SP A, SP B]
Per MSA2312 Drive Details
• Each MSA can hold 12 drives; this configuration requires 11
• MSA is 2U in total (capacitor eliminates need for battery)
• Each MSA SP port controls 4 LUNs; SP-A also controls the LOG LUN
• Each pair of LUNs consists of (2) 300GB 15k FC drives in RAID 1
Each SP is rated at 500MB/s, or 1000MB/s for both SPs
Using 300GB 15k FC drives, each LUN is rated at 125MB/s; each SP controls 4 LUNs at 500MB/s, or 1000MB/s per MSA DAE
Each SP port is rated at 4Gb/s or 400MB/s, and 1600MB/s for all 4 SP ports
Each HBA port is rated at 4Gb/s or 400MB/s, and 1600MB/s for all 4 HBA ports
RAID group layout
RAID GP01: disks 01-02, LUN1 and LUN2
RAID GP02: disks 03-04, LUN3 and LUN4
RAID GP03: disks 05-06, LUN5 and LUN6
RAID GP04: disks 07-08, LUN7 and LUN8
RAID GP05: disks 09-10, LUN0 (Logs)
HS: hot spare
Quad-core CPU: 200MB/s per core* (* compressed data)
HBA FC 1: 4Gb/s or 400MB/s x 2
HBA FC 2: 4Gb/s or 400MB/s x 2
DAE = Disk Array Enclosure
HBA = Host Bus Adapter
SP = Storage Processor
FC = Fibre Channel
Ports = 4Gb/s FC
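The per-component ratings on this slide can be chained to check that the 2-socket configuration is balanced. The sketch below is an illustration, not part of the Fast Track guidance: it assumes two MSA enclosures (16 data LUNs, 4 storage processors), and that the "4 SP ports" and "4 HBA ports" figures are totals for the whole system. All variable names are hypothetical.

```python
# Ratings quoted on the slide, in MB/s.
LUN_MB_S = 125
SP_MB_S = 500
PORT_MB_S = 400          # one 4Gb/s FC port
CORE_MB_S = 200          # per-core consumption rate (compressed data)

# Assumed system shape: 2-socket quad-core server with 2 MSA enclosures.
cores = 8
data_luns = 16           # 8 data LUNs per enclosure
storage_processors = 4   # SP A and SP B in each enclosure
sp_ports = 4
hba_ports = 4

# Aggregate rate each stage of the I/O path can sustain.
stages = {
    "LUNs": data_luns * LUN_MB_S,
    "storage processors": storage_processors * SP_MB_S,
    "SP ports": sp_ports * PORT_MB_S,
    "HBA ports": hba_ports * PORT_MB_S,
}

cpu_demand = cores * CORE_MB_S
deliverable = min(stages.values())

for name, rate in stages.items():
    print(f"{name}: {rate} MB/s")
print(f"CPU demand: {cpu_demand} MB/s, storage path delivers: {deliverable} MB/s")
print("balanced" if deliverable >= cpu_demand else "IO bound")
```

Under these assumptions the narrowest stage (the FC ports at 1600 MB/s) matches what 8 cores can consume, which is the sense in which the configuration is core-balanced.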
SQL Server File Layout
[Diagram: SQL Server file layout across data LUNs (LUN 1, LUN 2, LUN 3 ... LUN 16), Log LUN 1, Log LUN 2 and Local Drive 1]
TempDB: TempDB.mdf (25GB), TempDB_02.ndf (25GB), TempDB_03.ndf (25GB) ... TempDB_16.ndf (25GB)
Permanent_DB, Permanent FG: Permanent_1.ndf, Permanent_2.ndf, Permanent_3.ndf ... Permanent_16.ndf
Stage Database, Stage FG: Stage_1.ndf, Stage_2.ndf, Stage_3.ndf ... Stage_16.ndf
Log LUN 1: Permanent DB Log, Stage DB Log
Log LUN 2: Permanent DB Log, Stage DB Log
SQL Server Files
User Databases
Create at least one filegroup containing one data file per LUN
FT targets 1:1 LUN to CPU core affinity
Make all files the same size
Effectively stripes database files across data LUNs
Multiple filegroups may be advantageous
Disable auto-grow for the database
Transaction log is allocated to a Log LUN
SQL Server Files
Transaction Log
Create a single transaction log file per database and place it on a dedicated Log LUN
Enable auto-grow for log files
The transaction log size for each database should be at least twice the size of the largest operation
SQL Server Files
Tempdb
Create one tempdb data file per LUN
Make all files the same size
Follow standard tempdb best practices
Auto-grow should be enabled for tempdb
Use a large growth increment (10% of initial size)
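As one way to picture the user database guidance, the sketch below generates a T-SQL CREATE DATABASE statement with one equal-sized data file per LUN, auto-grow disabled on the data files, and a single log file on a log LUN with auto-grow enabled. It is a hedged illustration: the mount-point paths (C:\FT\LUN01 and so on), file sizes, the 1GB log growth increment, and the function name are assumptions, not values from the Fast Track guidance.

```python
def create_database_sql(db_name, data_luns, log_lun, file_size_gb, log_size_gb):
    """Build a CREATE DATABASE statement with one equal-sized data file per LUN."""
    data_files = ",\n".join(
        f"    (NAME = {db_name}_{i}, FILENAME = '{lun}\\{db_name}_{i}.ndf', "
        f"SIZE = {file_size_gb}GB, FILEGROWTH = 0)"  # FILEGROWTH = 0 disables auto-grow
        for i, lun in enumerate(data_luns, start=1)
    )
    return (
        f"CREATE DATABASE {db_name}\n"
        f"ON PRIMARY\n"
        f"    (NAME = {db_name}_primary, FILENAME = '{data_luns[0]}\\{db_name}.mdf', "
        f"SIZE = 1GB, FILEGROWTH = 0),\n"
        f"FILEGROUP {db_name}_FG\n"
        f"{data_files}\n"
        f"LOG ON\n"
        f"    (NAME = {db_name}_log, FILENAME = '{log_lun}\\{db_name}.ldf', "
        f"SIZE = {log_size_gb}GB, FILEGROWTH = 1GB);"  # log file keeps auto-grow on
    )


# Hypothetical mount points for 16 data LUNs and one log LUN.
data_luns = [f"C:\\FT\\LUN{n:02d}" for n in range(1, 17)]
sql = create_database_sql("Permanent", data_luns, "C:\\FT\\LOG01", 100, 50)
print(sql)
```

Generating the statement keeps the file sizes identical across LUNs, which is what lets SQL Server stripe data evenly across them.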
Fast Track DW Implementation Key Principles
Data Loading
Techniques to Maximize Scan Throughput
Use clustered indexes on fact tables
Load techniques to avoid fragmentation
Load in clustered index order (e.g. date) when possible
Index creation always MAXDOP 1, SORT_IN_TEMPDB
Isolate volatile tables in a separate filegroup
Isolate staging tables in a separate filegroup or DB
Periodic maintenance
Minimizing File fragmentation
Pre-allocate database files
Size files correctly to prevent growth
Do not shrink files
Do not use NTFS file fragmentation tools
Rebuild table to ensure optimal disk block level organization
Concurrent load operations to the same file will induce fragmentation
DML change operations (Update/Delete) may induce fragmentation
Conventional data loads lead to fragmentation
Bulk inserts into a clustered index using a moderate 'batchsize' parameter
Each 'batch' is sorted independently
Overlapping batches lead to page splits
[Diagram: index pages shown against the key order of the index: 1:32, 1:31, 1:35, 1:34, 1:33, 1:36, 1:38, 1:37, 1:40, 1:39]
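A rough way to quantify what the diagram shows is to count how often the next page in key order is not the physically next page. The sketch below is an illustration only: it assumes the page numbers in the diagram are listed in index key order, and the metric is a simplified stand-in, not SQL Server's own fragmentation calculation.

```python
def out_of_order_fraction(pages_in_key_order):
    """Fraction of page-to-page steps (in key order) that are not physically contiguous."""
    steps = list(zip(pages_in_key_order, pages_in_key_order[1:]))
    jumps = sum(1 for current, following in steps if following != current + 1)
    return jumps / len(steps)


# Page numbers from the diagram (file 1), assumed to be listed in key order.
fragmented = [32, 31, 35, 34, 33, 36, 38, 37, 40, 39]
# The same pages if the data had been loaded in key order.
sequential = sorted(fragmented)

print(f"conventional load: {out_of_order_fraction(fragmented):.0%} of steps break the scan")
print(f"ordered load:      {out_of_order_fraction(sequential):.0%} of steps break the scan")
```

Every non-contiguous step forces the disk head to move during a range scan, which is why the loading recommendations aim to keep physical order aligned with key order.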
Data Loading – Recommendations for Incremental Loads
Clustered Index Table Loads
Option 1 – Direct load into table
Sorts and commit size must fit into memory
Option 2 – Empty table
Load into empty clustered index table
Serial or parallelized
Non-parallelized INSERT SELECT statement to move to final table
Incremental Loads - Heap
Minimal logging is recommended
Table locking may be required
Partitioned/Non-partitioned
Load directly into target table
Set BATCHSIZE appropriately
Parallelize bulk inserts if necessary
Fast Track DW Implementation Key Principles
High Availability
High Availability
Fans: 6 hot-plug redundant fans, 3 shown
Core I/O: 2 USB, 1 serial, 1 video port, 3 RJ-45, PS2 keyboard/mouse support
I/O slots: 11 PCIe slots std., option to upgrade to 2 HTx and 7 PCIe
Power supplies: 3+3 redundant power supplies
Clustering With Server 2008
No single points of failure in Failover Clustering!
Make Clustering Simple
Easy to create, use, and manage
Enabling the IT generalist
Reduce Total Cost of Ownership
Making clusters a smart business choice for the enterprise
Support for 16 node clusters
Fast Track DW Implementation Key Principles
Validation
System Validation
Validation is intended to confirm the proper installation and configuration of a Fast Track RA
Validation is achieved in two phases
Synthetic IO testing
Validates storage, network, and operating system
SQLIO can be used to generate IO
Perfmon can be used to monitor results
SQL Server testing
Validates performance across SQL Server stack
Final step of deployment process
Core Fast Track Metrics
These metrics are used to both validate and position Fast Track RAs
Maximum Consumption Rate – Ability of SQL Server to process data for a specific CPU and Server combination and a standard SQL query.
Benchmark Consumption Rate – Ability of SQL Server to process data for a specific CPU and Server combination and a user workload or query.
User Data Capacity – Maximum available SQL Server storage for a specific Fast Track RA assuming 2.5:1 page compression factor and 300 GB 15K SAS. 30% of this storage should be reserved for DBA operations
MCR
Similar in concept to Miles Per Gallon rating for a new car
Not necessarily what you will see when you drive the car, but a good starting point
Provides a standard reference point for
Simple evaluations
Relative comparison between different Fast Track configurations
System validation and benchmarking
Current value for published Fast Track RA’s
200MB/s per core
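Because MCR is quoted per core, a system-level figure follows from the core count. The sketch below is illustrative only: it uses the published 200MB/s per core and decimal units (1 GB = 1000 MB), and the function name is hypothetical.

```python
MCR_MB_S_PER_CORE = 200  # published value for current Fast Track RAs


def system_mcr_gb_s(cores: int, per_core_mb_s: int = MCR_MB_S_PER_CORE) -> float:
    """System-level maximum consumption rate in GB/s (1 GB taken as 1000 MB)."""
    return cores * per_core_mb_s / 1000


# Core counts that appear in the reference configurations.
for cores in (8, 24, 48):
    print(f"{cores} cores: {system_mcr_gb_s(cores):.1f} GB/s")
```

For the 2-socket, 8-core configuration this gives 1.6 GB/s.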
Fast Track Benchmark Results
Actual results from Fast Track validation
HP 2 socket, 8 core Configuration
[Diagram: server (Windows Server OS, MCR 1.6 GB/s) connected through two HBAs (each labeled Min 2 GB/s) and a fiber switch to two storage enclosures. Rates shown along the storage path: 4 x 500 MB/s and 8 x 300 MB/s]
BCR
Similar to the actual MPG you get with your current driving habits
Provides a workload-specific reference point
Defines the ideal outcome of the Full Evaluation scenario
Can be compared to MCR to choose an appropriate FT configuration
Provides a framework for validating Fast Track data warehouse configurations
Fast Track Benchmark Results
Actual results from Fast Track validation
HP 2 socket, 8 core Configuration
[Diagram: server (SQL Server OS, BCR 1.2 GB/s) connected through two HBAs (each labeled 1.2 GB/s) and a fiber switch to two storage enclosures. Rates shown along the storage path: 4 x 300 MB/s and 8 x 150 MB/s]
UDC
UDC is customer supplied and is the data capacity required
Plan for projected growth
Based on customer projections
Needs to be allocated up-front
Allocate for data management needs
Staging database requirements
Temporary objects
Allocate for TempDB
Typically 20-30% of primary data space
Tempdb is not compressed
Fast Track SMP RA for SQL Server 2008 CPU Core Calculator v2.4
Updated 10/09/2009 - uw
This spreadsheet can be used to estimate the number of cores required to support a user workload and workload mix. Enter your factors into the green fields and the results will be calculated in the pink cells. The spreadsheet uses a weighted average to determine the number of cores required based on your inputs.
User Variable Input
Anticipated total number of users expected on the system: 3,000 users
Estimated percent of actual query concurrency: 1% concurrency
Fast Track DW CPU max core consumption rate (MCR) in MB/s of page compressed data per core: 200 MB/s
Estimated compression ratio (default = 2.5:1): 2.5:1
Estimated drive serial throughput speed in compressed MB/s: 100 MB/s
Number of data drives in single storage array: 8 drives
Usable capacity per drive: 272 GB
Space Reserved for TempDB: 26%
Adjust for workload mix (Simple / Average / Complex)
Estimated % of workload: 70% / 20% / 10% (total 100%)
Estimated % data found in SQL Server cache: 10% / 0% / 0%
Estimated Query Data Scan Volume MB (Uncompressed): 8,000 / 75,000 / 450,000
Desired Query Response Time (seconds)(under load): 25 / 180 / 1,200
Estimated Disk Scan volume MB (Uncompressed): 7,200 / 75,000 / 450,000
Calculations and Results (Simple / Average / Complex)
% of core consumption rate achieved: 100% / 50% / 25%
Expected per CPU core consumption rate (MB/s): 200 / 100 / 50
Calculated Single Query Scan Volume in MB (compressed): 2,880 / 30,000 / 180,000
Calculated Target Concurrent Queries: 21 / 6 / 3 (total 30)
Estimated Target Queries per Hour: 3,024 / 120 / 9 (total 3,153)
Required IO Throughput in MB/s: 2,419 / 1,000 / 450 (total 3,869)
Estimated Number of Cores Required: 12.10 / 10.00 / 9.00 (total 32.00)
Estimated Single Query Run Time (seconds): 0.5 / 9.4 / 112.5
Arrays Required based on throughput: 5
Single Array Throughput in MB/s: 800
Throughput in MB/s for All Required Arrays: 4,000
Suggested Fast Track RA Server Requirements
No of CPU cores: 32
Number of arrays: 8
Total Compressed Data Capacity (TB): 16
Max achievable IO Throughput in MB/s: 6,400
Max achievable CPU consumption in MB/s: 6,400
Required IO Throughput in MB/s: 3,869
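The arithmetic behind the calculator can be sketched in a few lines of Python. This is a reconstruction inferred from the numbers on the slide, not the spreadsheet's actual formulas: the function and field names are hypothetical, rounding the core total up to a whole number is an assumption that happens to reproduce the 32 shown, and sizing the array count to match CPU consumption is a guess that reproduces the 8 arrays shown. The capacity figure (16 TB) is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryClass:
    name: str
    workload_share: float  # fraction of concurrent queries in this class
    cache_hit: float       # fraction of scanned data found in SQL Server cache
    scan_mb: float         # uncompressed MB scanned per query
    response_s: float      # desired response time under load, in seconds
    mcr_achieved: float    # fraction of the max core consumption rate achieved


def size_system(users, concurrency, mcr_mb_s, compression,
                drive_mb_s, drives_per_array, classes):
    """Estimate cores and storage arrays (formulas inferred from the slide)."""
    concurrent = users * concurrency
    rows = []
    for c in classes:
        disk_scan_mb = c.scan_mb * (1 - c.cache_hit)   # data that must come from disk
        compressed_mb = disk_scan_mb / compression      # what is actually read
        queries = concurrent * c.workload_share
        io_mb_s = queries * compressed_mb / c.response_s
        per_core_mb_s = mcr_mb_s * c.mcr_achieved
        rows.append({
            "name": c.name,
            "compressed_mb": compressed_mb,
            "concurrent_queries": queries,
            "queries_per_hour": queries * 3600 / c.response_s,
            "io_mb_s": io_mb_s,
            "per_core_mb_s": per_core_mb_s,
            "cores": io_mb_s / per_core_mb_s,
        })
    cores = math.ceil(sum(r["cores"] for r in rows))    # assumption: round up
    for r in rows:
        # Run time of one query with every core available to it.
        r["run_time_s"] = r["compressed_mb"] / (r["per_core_mb_s"] * cores)
    total_io = sum(r["io_mb_s"] for r in rows)
    array_mb_s = drive_mb_s * drives_per_array
    return {
        "rows": rows,
        "total_io_mb_s": total_io,
        "cores": cores,
        "array_mb_s": array_mb_s,
        "arrays_for_throughput": math.ceil(total_io / array_mb_s),
        # Assumption: enough arrays to feed all cores at the full MCR.
        "arrays_to_match_cpu": math.ceil(cores * mcr_mb_s / array_mb_s),
    }


# Inputs taken from the slide.
SLIDE_CLASSES = [
    QueryClass("Simple", 0.70, 0.10, 8_000, 25, 1.00),
    QueryClass("Average", 0.20, 0.00, 75_000, 180, 0.50),
    QueryClass("Complex", 0.10, 0.00, 450_000, 1_200, 0.25),
]
result = size_system(users=3_000, concurrency=0.01, mcr_mb_s=200,
                     compression=2.5, drive_mb_s=100, drives_per_array=8,
                     classes=SLIDE_CLASSES)
print(result["cores"], result["arrays_for_throughput"], round(result["total_io_mb_s"]))
```

Changing the workload mix or the cache percentages in `SLIDE_CLASSES` shows how strongly the Complex class drives the core count even at 10% of the workload.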
SQL Server 2008 R2 Parallel DW (Project Madison)
SQL Server Parallel Data Warehouse
Choice of hardware vendor
High scale through Massively Parallel Processing (MPP) system
Hub and Spoke architecture
Deep integration with Microsoft BI
A data warehouse appliance with massive scalability
Formerly known as Project “Madison”
Scale-Out of SQL Server: 10s TB ►100s TB ►PB
Reference Architectures from HP, Bull, EMC, Dell, IBM
Low cost of ownership
Simplified deployment and maintenance via appliance model
Integration with existing SQL Server 2008 data warehouses via Hub & Spoke Architecture
Available 1H CY2010 (first half of calendar year 2010)
Preview program running
SQL Server Parallel DW Architecture
[Architecture diagram. Labels shown:]
Control Nodes (Active / Passive)
Management Servers
Landing Zone
Backup Node
Database Servers (each running SQL), plus a Spare Database Server
Storage Nodes
Interconnects: Dual Infiniband, Dual Fiber Channel
External connections: Client Drivers, ETL Load Interface, Corporate Backup Solution, Data Center Monitoring
Networks: Corporate Network, Private Network
Parallel Data Warehouse Demonstration
TPC-DS – 150+ Terabytes
[Star schema diagram. Tables and columns shown:]
Store Sales (1 Trillion Rows): Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …
Date Dim: D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
Promotion: P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
Customer: C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
Item: I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
Store: S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
Customer Demographics: CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …
Dimension table row counts shown on the diagram: 100 Million; 73,049; 1.92 Million; 1,902; 2,500; 502,000
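To make the shape of the schema concrete, here is a minimal sketch of an inner-join star query over three of the tables on the slide. SQLite (from the Python standard library) stands in for SQL Server purely so the example is runnable; the sample rows and the query itself are invented for illustration and are not the demo's actual query. Only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_date TEXT, d_month INTEGER);
CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_item_id TEXT, i_item_desc TEXT);
CREATE TABLE store_sales (ss_sold_date_sk INTEGER, ss_item_sk INTEGER, ss_quantity INTEGER);
""")

# Tiny made-up data set: two dates, two items, four sales rows.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                 [(1, "2009-01-15", 1), (2, "2009-02-15", 2)])
conn.executemany("INSERT INTO item VALUES (?, ?, ?)",
                 [(10, "ITEM-A", "widget"), (11, "ITEM-B", "gadget")])
conn.executemany("INSERT INTO store_sales VALUES (?, ?, ?)",
                 [(1, 10, 3), (1, 11, 2), (2, 10, 5), (2, 10, 1)])

# The fact table joins to each dimension on its surrogate key (the *_SK columns).
STAR_QUERY = """
SELECT d.d_month, i.i_item_id, SUM(ss.ss_quantity) AS total_quantity
FROM store_sales AS ss
INNER JOIN date_dim AS d ON ss.ss_sold_date_sk = d.d_date_sk
INNER JOIN item AS i ON ss.ss_item_sk = i.i_item_sk
GROUP BY d.d_month, i.i_item_id
ORDER BY d.d_month, i.i_item_id
"""
rows = conn.execute(STAR_QUERY).fetchall()
for row in rows:
    print(row)
```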
Database Distributed & Replicated Tables
EX: Data Distribution with Replication
[Diagram: the same star schema (Date Dim, Item, Store Sales, Promotion, Store, Customer, Customer Demographics) mapped onto six database nodes. Every node is labeled C, I, D, CD, S, P (Customer, Item, Date Dim, Customer Demographics, Store, Promotion) and SS (Store Sales). Consistent with the slide title, the dimension tables appear on every node (replicated) while Store Sales is split across the nodes (distributed).]
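The distributed-plus-replicated layout can be illustrated with a small in-memory sketch. Everything specific here is an assumption: the slide does not say which column Store Sales is distributed on (the item key is used below), a simple modulo stands in for whatever hash function the product uses, and the sample rows are invented. The point is only that each node can join its slice of the fact table to a local copy of the dimension without moving data, and a final step merges the partial results.

```python
from collections import Counter

NUM_NODES = 6  # the diagram shows six database nodes

# Small dimension table: replicated in full to every node.
item_dim = {10: "ITEM-A", 11: "ITEM-B", 12: "ITEM-C"}

# Fact rows as (ss_item_sk, ss_quantity): distributed, one slice per node.
store_sales = [(10, 3), (11, 2), (10, 5), (12, 7), (11, 1), (12, 4), (10, 1)]


def node_for(distribution_key, num_nodes=NUM_NODES):
    """Pick the node for a fact row (modulo stands in for a real hash)."""
    return distribution_key % num_nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Build the nodes: a full dimension copy each, plus a slice of the facts."""
    nodes = [{"item_dim": dict(item_dim), "store_sales": []}
             for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[0], num_nodes)]["store_sales"].append(row)
    return nodes


def local_aggregate(node):
    """Join and aggregate using only data that lives on this node."""
    totals = Counter()
    for item_sk, quantity in node["store_sales"]:
        totals[node["item_dim"][item_sk]] += quantity
    return totals


def run_query(nodes):
    """Merge the per-node partial aggregates into the final answer."""
    merged = Counter()
    for node in nodes:
        merged.update(local_aggregate(node))
    return dict(merged)


nodes = distribute(store_sales)
totals = run_query(nodes)
print(totals)
```

Replicating a dimension costs storage on every node, which is why this layout suits small dimension tables and a very large fact table.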
Parallel Data Warehouse Demo - Results
Query
Cache flushed
Inner joins
Sample Results
625K rows returned in 11 seconds from 1 trillion row table
Final product will be even faster
Report
Retailer: day-part analysis
Sales, Time, Date, Prod type
SQL Parallel and Fast Track Hub and Spoke
Central EDW Hub
Regional Reporting
Departmental Reporting
ETL Tools
High Performance HQ Reporting
SQL Parallel DW Multi-Temperature
Auto Publish
FRESH DATA LOADING
Most Recent - 3 Months | 2 Years | 7 Years
User Queries
BI Server Queries
• User Data
• Hot -> Warm -> Cold
• Stage -> ODS -> Prod
• Back-up / Archive
• Data structure in synch
• Fast response to users
• Easy Data Movement
• High Availability
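One way to picture the Hot -> Warm -> Cold flow is a rule that assigns each record to a tier by age. The sketch below assumes the three retention windows on the slide (3 months, 2 years, 7 years) correspond to hot, warm and cold in that order, and approximates a month as 30 days and a year as 365. The slide does not state this mapping, and the function name and the "archive" fallback are hypothetical.

```python
from datetime import date

# (tier, maximum age in days). The boundaries come from the slide labels;
# their mapping to hot / warm / cold is an assumption.
TIERS = [("hot", 90), ("warm", 2 * 365), ("cold", 7 * 365)]


def tier_for(record_date, today):
    """Return the storage tier for a record based on its age."""
    age_days = (today - record_date).days
    for name, max_age_days in TIERS:
        if age_days <= max_age_days:
            return name
    return "archive"  # older than every tier: candidate for back-up / archive


today = date(2009, 11, 1)
for d in (date(2009, 10, 1), date(2008, 11, 1), date(2004, 1, 1), date(2000, 1, 1)):
    print(d, tier_for(d, today))
```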
Case study: Tier 1 Carrier - CDR Architecture including Multi Temperature Archive
UP TO 500M ROWS/DAY
HIGH-SPEED PARALLEL UPDATES
60 TB HIGH PERFORMANCE FOR MEDIATION & AUGMENTATION USING ETL TOOLS
120 TB HIGH CAPACITY ‘WARM’ CDRs
220 TB ARCHIVE DW
ROLL OFF TO ARCHIVE
COST MGT
REVENUE ASSURANCE
MARGIN ANALYSIS
FRAUD DETECTION
BILLING
SQL Server Parallel DW – An Appliance Experience
All hardware from a single vendor
Multiple vendors to choose from
  HP, Dell, IBM, Bull, EMC
Orderable at the rack or cluster
Vendor will
  Assemble appliances
  Image appliances with OS, SQL Server and Madison software
Appliance installed in less than a day
Support
  Microsoft provides first call support
  Hardware partner provides onsite break / fix support
SQL Server Parallel DW Going Forward
Parallel Data Warehouse TimeLine
2008
  Microsoft Announce Intention to Acquire DATAllegro (July)
  Acquisition Closes (Sept)
  150TB demo of DATAllegro on SQL Server run at BI Conference (Oct)
2009
  MTP Program Launched
  Circa 10 Customers Provided with early Madison Benchmark
  Madison Named as SQL Server Parallel DW
  Hardware Architectures Identified
  Early whitepapers / guidance
  Launch date estimated Summer 2010
2010
  Project “Madison” MTP 2 Program to Launch (fully functional, fully performant)
  TAP Program (on client site)
  RTM in Summer 2010
Beyond
  PDW vNext
  Focus on continually lowering the costs of high end DW, while increasing performance
  Additional Hardware Partners
  Additional functionality
  Further integration with MS stack
SQL Parallel DW Beta Programs
Two Programs
MTP – Madison Technology Preview - PoC
  15-20 participants
  Duration of 4 weeks
  MTP 2 starts Feb 2010
TAP – Beta production implementation
  4 – 6 customers
  First iteration 9 to 12 weeks
Microsoft Data Warehouse Solutions
SQL EE Classic
Supports Data Marts or EDW
• Commodity Hardware
• SMP approach
• Leverage Microsoft’s DW scale out guidance from SQL CAT Team / MSDN
SQL EE Fast Track
SQL Server Fast Track DW: Data Marts or EDW
• SMP appliance “like” approach
• Commodity Hardware
• Reference architectures 1-48TB
• Software + Hardware stack tuned for DW “Scan Centric” Use Cases
• Index Light
• Workload Affinity determines fit for Fast Track
SQL Parallel
SQL Parallel DW (Project Madison): Very Large EDW
• Massively Parallel Processing scale-out
• True DW Appliance
• Commodity Hardware
• Scale to 1 PB
• Hub-and-spoke architecture supports SMP spokes
Next Steps – Options for our Customers
Quick Start DW Roadmap Service
  Onsite ½ day meeting to review customer requirements and to present MS DW solutions
MTC – Microsoft Technology Center
  Dallas, TX
  1 Day Envisioning Session
  1-2 Day DW-BI Architectural Design Session
  1-2 week PoC - DW SQL EE – Fast Track – SSIS-SSAS
SQL PDW-Madison MTP (PoC) - TAP
DW Needs Assessment & Roadmap
Links for Microsoft Data Warehousing
Visit www.microsoft.com/fasttrack
Visit www.microsoft.com/madison
Visit the SQL Server DW Portal on TechNet
  http://technet.microsoft.com/en-gb/sqlserver/dd421879.aspx
Download 4 new white papers on EDW architecture
Summary
Project Madison / SQL PDW
  Hub-and-spoke federation can integrate SMP reference architectures with Madison
  Scale SQL Server to 1 petabyte
  Set a new bar in appliance pricing and performance
  MTPs and TAPs are this year
Fast Track Data Warehouse
  Fast Track Data Warehouse reference architectures are available today
  Maximize use of key SQL Server 2008 Enterprise data warehousing enhancements
  Scale up today with SMP, scale out tomorrow with MPP
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.