PDW – Built for High Availability & Performance
David Leibowitz

SP06

PDW Overview

High Performance Analytical Appliance
• MPP Database Server Platform for high performance
• Prebuilt appliance with HW & SW included and optimally configured
• Shared nothing architecture; In-Memory Columnstore engine

Value
• Lowest price per terabyte of high end DW appliances on the market
• Up to 100x faster than legacy SMP Database queries
• Up to 15x data compression

Scale & Support
• Scales from Terabytes to Petabytes; Built with Big Data in mind
• Architected with redundancy throughout
• Integrated, single call Support

What is SQL Server 2012 PDW?

Shared Nothing Architecture
Fault tolerance
• All h/w components have redundancy
  • CPU
  • DISK
  • NETWORK
  • POWER
  • STORAGE PROCESSORS
• Control Nodes: Failover Clustering
• Compute Nodes: Part of a single cluster per rack

SMP (Symmetric Multiprocessing)
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers
• Mostly, the solution is housed on a shared SAN
• Increase compute capacity via scale-up design

MPP (Massively Parallel Processing)
• Uses separate CPUs running in parallel to execute a single query
• Each CPU has its own allocated memory
• High-speed communications between nodes
• Increase compute capacity via scale-out design

SMP vs. MPP Architecture
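The scale-out idea in the MPP column can be sketched in a few lines of Python. This is only an illustration, not how PDW is implemented: the three lists stand in for compute nodes that each own a private slice of the data, and the names node_slices, partial_sum, and run_query are hypothetical. Threads are used purely to represent nodes working independently on one query.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "node" owns a private slice of the rows (shared nothing).
node_slices = [
    list(range(0, 1000)),
    list(range(1000, 2000)),
    list(range(2000, 3000)),
]


def partial_sum(rows):
    """Work done on one node, using only that node's own rows."""
    return sum(rows)


def run_query(slices):
    """Run the same aggregate on every node, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


total = run_query(node_slices)
print(total)
```

Adding a fourth slice adds capacity without touching the other three, which is the scale-out property the slide contrasts with SMP scale-up.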

[Diagram: rack layout with two Infiniband switches, two Ethernet switches, a Management Control Node, and a Management Failover Node. The Base Unit (3 nodes) holds three compute servers with JBOD storage; an optional 2nd Scale Unit and an optional 3rd Scale Unit each add 3 more compute servers with JBOD storage.]

Dell PDW Modular Design

• 3 – 54 Compute Nodes
• 1 – 6 Racks
• 1TB, 2TB or 3TB Drives
• 22.65 TB – 1,223TB Raw
• 113 TB – 6PB User Data

Linear Scale
Mixed Workloads
No Downtime
Start Small & Grow

113TB to 6 PB

Linear Scale Solution
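A quick arithmetic check of the capacity figures listed for the appliance. The slide does not say how user data capacity is derived from raw capacity; the sketch below only computes the ratio the two ranges imply. Treating 1 PB as 1000 TB is an assumption, and all constant names are made up for the example.

```python
# Figures quoted on the slide, in TB. 1 PB = 1000 TB is an assumption.
RAW_MIN_TB, RAW_MAX_TB = 22.65, 1223.0
USER_MIN_TB, USER_MAX_TB = 113.0, 6 * 1000.0
MIN_NODES, MAX_NODES = 3, 54
MAX_RACKS = 6

# Ratio of quoted user data capacity to quoted raw capacity at both ends of the range.
ratio_small = USER_MIN_TB / RAW_MIN_TB
ratio_large = USER_MAX_TB / RAW_MAX_TB

# How far the compute tier scales from the smallest to the largest configuration.
scale_factor = MAX_NODES / MIN_NODES
nodes_per_full_rack = MAX_NODES / MAX_RACKS

print(f"user/raw ratio, smallest configuration: {ratio_small:.2f}")
print(f"user/raw ratio, largest configuration:  {ratio_large:.2f}")
print(f"compute node scale factor: {scale_factor:.0f}x")
print(f"compute nodes per rack at full size: {nodes_per_full_rack:.0f}")
```

Both ends of the range imply roughly five times more user data than raw capacity, which would be consistent with the user data figures assuming compression, although the slide does not state that.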

MPP Architecture

Shared Nothing Architecture
I/O and CPU affinity within SMP nodes
• Eliminates contention per user query
• Utilize full resources for each user query
• Multiple physical instances of tables
  • Distribute large tables
  • Replicate small tables
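The two placement strategies in the last bullets can be sketched as follows. This is a simplified model, not PDW's internal algorithm: SHA-256 is a stand-in for whatever hash the appliance uses, and the function names (node_for, distribute, replicate) and the sample rows are hypothetical.

```python
import hashlib


def node_for(key, n_nodes):
    """Map a key to a node with a stable hash (stand-in for the real hash function)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def distribute(rows, key_column, n_nodes):
    """Large table: each row is stored on exactly one node, chosen by hashing its key."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[node_for(row[key_column], n_nodes)].append(row)
    return nodes


def replicate(rows, n_nodes):
    """Small table: every node stores a full copy."""
    return [list(rows) for _ in range(n_nodes)]


sales_facts = [{"order_id": i, "store_dim_id": i % 4, "qty_sold": 1} for i in range(12)]
store_dim = [{"store_dim_id": i, "store_name": f"Store {i}"} for i in range(4)]

fact_shards = distribute(sales_facts, "order_id", 3)
dim_copies = replicate(store_dim, 3)

for node_id, shard in enumerate(fact_shards):
    print(node_id, len(shard), "fact rows,", len(dim_copies[node_id]), "dimension rows")
```

Every fact row lands on exactly one node, while each node carries the whole dimension table.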

Sample Data Model Across Compute

Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold

[Diagram: three SQL Server compute nodes]

Sample Data Model – Distributed Tables

[Diagram: the Sales Facts table (Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold) is split into pieces SF-1, SF-2, and SF-3, spread across the three SQL Server compute nodes. Dimension tables shown: Time Dim, Store Dim, Product Dim, Mktg Campaign Dim.]

Sample Data Model – Replicated Tables

Smaller Dimension Tables are Replicated on Every Compute Node

[Diagram: each of the three SQL Server compute nodes holds a full copy of the Time (TD), Product (PD), Store (SD), and Mktg Campaign (MD) dimension tables alongside its piece of Sales Facts (SF-1, SF-2, SF-3).]

Result: Fact-Dimension Joins can be performed locally
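Why local joins work can be shown with a small simulation. It is a sketch under simple assumptions: facts are spread by sale_id modulo the node count (a stand-in for the real distribution scheme), the Store dimension is copied to every node, and all names and sample values are invented for the example.

```python
from collections import defaultdict

N_NODES = 3

# Hypothetical sample data: a small dimension table and a larger fact table.
store_dim = {1: "North", 2: "South", 3: "East"}
sales_facts = [
    {"sale_id": i, "store_dim_id": (i % 3) + 1, "dollars_sold": 10 * (i + 1)}
    for i in range(9)
]

# Distribute the fact rows; replicate the dimension to every node.
shards = [[] for _ in range(N_NODES)]
for fact in sales_facts:
    shards[fact["sale_id"] % N_NODES].append(fact)
dim_copies = [dict(store_dim) for _ in range(N_NODES)]


def local_join_and_aggregate(fact_shard, local_dim):
    """Join facts to the node's own dimension copy; no data leaves the node."""
    totals = defaultdict(int)
    for fact in fact_shard:
        store_name = local_dim[fact["store_dim_id"]]
        totals[store_name] += fact["dollars_sold"]
    return dict(totals)


def combine(partials):
    """Merge the per-node partial totals into the final answer."""
    merged = defaultdict(int)
    for partial in partials:
        for store_name, dollars in partial.items():
            merged[store_name] += dollars
    return dict(merged)


distributed_result = combine(
    local_join_and_aggregate(shard, dim) for shard, dim in zip(shards, dim_copies)
)
single_node_result = local_join_and_aggregate(sales_facts, store_dim)
print(distributed_result == single_node_result)
```

Because each node already holds every dimension row it could need, only the small partial totals have to be combined at the end.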

[Diagram: the appliance rack, with Infiniband and Ethernet switches, a Management Control Node, a Management Failover Node, and three groups of three compute servers with JBOD storage.]

PDW Query Execution

• 27x Faster: From 2.5 Hours to 5.5 Minutes
• 7x Smaller: From 3.3 TB to 486 GB
• 37x Smaller: From 3.3 TB to 90 GB
• 28x Faster: From 43 min to 90 sec
• 16.17% Impact: Minimal concurrency impact

Real World Summary Results
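The headline multipliers can be reproduced from the before and after figures. The sketch assumes 1 TB = 1000 GB; the slide does not say which convention it uses, and the tuple layout is just for this example.

```python
# (label, before, after, multiplier claimed on the slide)
# Times are in seconds; sizes are in GB, assuming 1 TB = 1000 GB.
results = [
    ("query time", 2.5 * 60 * 60, 5.5 * 60, 27),
    ("storage", 3.3 * 1000, 486, 7),
    ("storage", 3.3 * 1000, 90, 37),
    ("query time", 43 * 60, 90, 28),
]

ratios = [(label, before / after, claimed) for label, before, after, claimed in results]

for label, ratio, claimed in ratios:
    print(f"{label}: computed {ratio:.1f}x, slide says {claimed}x")
```

Each computed ratio lands within one unit of the quoted multiplier, so the slide's figures are internally consistent.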

Hub & Spoke Enterprise Business Intelligence with PDW

Distributed DW Architecture

Why Distributed DW?
• Full SQL Server functionality
• Distributes the workload
• Allows existing and new data marts to be integrated into the EDW
• Better solution than consolidation
• Enables publishing
• Expand and add spokes without impacting users

Distributed Architecture
• Parallel Database Export (PDE) technology enables rapid data movement and consistency between distributed SQL Servers
• Support different SLAs and user groups:
  • High-performance loading and queries
  • Guaranteed server resources
  • Data concurrency
  • Customized workloads

Hub & Spoke Design

[Diagram labels]
Hub: PDWv2
Attached servers: Landing Zone/SSIS Server; Backup Node; SQL Server (QSDW 2000); SSAS Processing Server; SMP SQL Server (Ex: Fast Track Reference Architecture, Quick Start Data Warehouse)
Data paths:
• DW Loader push of flat files over 56Gb/s Infiniband
• Remote Table Copy over 56Gb/s Infiniband (two links)
• Backup push over 56Gb/s Infiniband

Hub & Spoke Extended

SSRS, Power View, Excel, SharePoint

Create Remote Table (CRTAS)
• Enables the high-speed PDE feature
• Selects data from a PDW appliance and copies that data to a new table in a SQL Server SMP database
• Sample transfer rate to 4 socket 24 core server: 120GB / Hour plus compression

CRTAS Prerequisites
Must be co-located and on same Infiniband® network:
• Requires Infiniband® HCA card in remote SQL Server SMP
• Requires physical server placement within ~100 meters of PDW appliance
• Recommend externally facing network to be firewalled
  • Exception for the SQL Server admin/management ports
• PDW to SQL Server SMP is the only supported configuration
• Target table(s) must not already exist

CRTAS Example

    CREATE REMOTE TABLE OrderReporting.dbo.Orders
    AT ( 'Data Source = SQL_Sales, 1433; User ID = Madrid; Password = TechEd2013;' )
    AS SELECT * FROM [2010Q4].dbo.Orders;

CRTAS Monitoring Performance
• Performance counters on destination SMP SQL Server:
  • Databases: Bulk Copy Rows/Sec
  • Databases: Bulk Copy Throughput/Sec (KB)
• On the PDW appliance, use the following DMV-based query to view the data export status:

    SELECT * FROM sys.dm_pdw_dms_workers WHERE type = 'PARALLEL_COPY_READER';

Enhancing Enterprise BI Performance with PDW

Enhancing Performance & Scale
• Table Design
• CTAS
• Data Loading

Enhancing Fact Table Performance
• Partition tables where appropriate
• Common key is date (or integer surrogate)
• Similar guidelines to SMP SQL Server partitioning
• Use partition switch for large inserts/updates

Partition a Replicated Table

    CREATE TABLE Customers
    (
        id integer NOT NULL,
        lastName varchar(20),
        postalCode varchar(10)
    )
    WITH
    (
        PARTITION (id RANGE LEFT FOR VALUES (10, 20, 30, 40, 50))
    );

Partition Distributed Table

    CREATE TABLE Orders
    (
        id integer NOT NULL,
        lastName varchar(20),
        shipdate datetime
    )
    WITH
    (
        DISTRIBUTION = HASH(id),
        PARTITION (shipdate RANGE RIGHT FOR VALUES ('1992-01-01', '1992-02-01', '1992-03-01', ...))
    );
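The difference between RANGE LEFT and RANGE RIGHT in the two table definitions is which partition a boundary value itself belongs to. A minimal Python sketch of that rule follows; the function names are made up, and only the first three shipdate boundaries are used because the slide elides the rest.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_range_left(boundaries, value):
    """RANGE LEFT: a boundary value is the upper bound of the partition on its left."""
    return bisect_left(boundaries, value) + 1


def partition_range_right(boundaries, value):
    """RANGE RIGHT: a boundary value is the lower bound of the partition on its right."""
    return bisect_right(boundaries, value) + 1


# Boundaries from the Customers example (RANGE LEFT on id).
customer_boundaries = [10, 20, 30, 40, 50]

# First three boundaries from the Orders example (RANGE RIGHT on shipdate).
order_boundaries = [date(1992, 1, 1), date(1992, 2, 1), date(1992, 3, 1)]

for customer_id in (10, 11, 51):
    print(customer_id, partition_range_left(customer_boundaries, customer_id))

for shipdate in (date(1991, 12, 31), date(1992, 1, 1), date(1992, 3, 15)):
    print(shipdate, partition_range_right(order_boundaries, shipdate))
```

With RANGE RIGHT on dates, each boundary such as the first of a month starts a new partition, which is why it is the usual choice for date keys. Partition numbers are 1-based, and N boundaries always produce N + 1 partitions.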

Other Performance Considerations
• Consider de-normalizing tables (traditional Master / Detail)
• Use CTAS as Swiss Army Knife

CTAS
• Creates a new table based upon SELECT
• Executes in parallel, minimal logging
• Copy tables (or subsets) for querying
• Change replicated table to distributed
• Change the distribution column
• Use to periodically defrag tables
• Reduce the overhead of a DELETE
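One reason to use CTAS to change the distribution column is data skew: a column with few distinct values leaves most distributions idle. The simulation below illustrates the effect. It is not PDW code; SHA-256 stands in for the appliance's hash, and the count of 8 distributions, the table, and the function names are assumptions made for the example.

```python
import hashlib
from collections import Counter

N_DISTRIBUTIONS = 8  # hypothetical number of distributions


def distribution_for(value, n=N_DISTRIBUTIONS):
    """Stable hash of a column value to a distribution (stand-in for the real hash)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def skew(rows, column, n=N_DISTRIBUTIONS):
    """Rows in the fullest distribution divided by the ideal even share (1.0 is perfect)."""
    counts = Counter(distribution_for(row[column], n) for row in rows)
    return max(counts.values()) / (len(rows) / n)


# Hypothetical table: 1000 orders spread over only 3 stores.
orders = [{"order_id": i, "store_id": i % 3} for i in range(1000)]

skew_by_store = skew(orders, "store_id")
skew_by_order = skew(orders, "order_id")

print(f"distributed on store_id: skew {skew_by_store:.2f}")
print(f"distributed on order_id: skew {skew_by_order:.2f}")
```

Redistributing on the high-cardinality column spreads the rows far more evenly, so each distribution does a similar share of the work in a parallel query.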

DWLoader
• Achieve data load speeds of up to 1.7 TB per hour
• Accommodate multiple and concurrent incremental loads
• Provides transactional protection and configurable batch size (10,000)
• Supports direct load of compressed files

SQL Server Integration Services (SSIS)
• SQL Server Parallel Data Warehouse Connection Manager
• SQL Server Parallel Data Warehouse Destination

High Speed Data Loads
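The two numbers on this slide lend themselves to simple planning arithmetic. The sketch below treats 1.7 TB per hour as a best-case rate (the slide says "up to") and 10,000 rows as the batch size; the function names are hypothetical and this is not the loader's actual logic.

```python
import math

PEAK_TB_PER_HOUR = 1.7    # "up to" load rate quoted on the slide
BATCH_SIZE_ROWS = 10_000  # batch size quoted on the slide


def best_case_load_hours(size_tb, rate_tb_per_hour=PEAK_TB_PER_HOUR):
    """Lower bound on load time if the peak rate were sustained throughout."""
    return size_tb / rate_tb_per_hour


def batch_count(row_count, batch_size=BATCH_SIZE_ROWS):
    """Number of batches needed to load row_count rows."""
    return math.ceil(row_count / batch_size)


def batches(rows, batch_size=BATCH_SIZE_ROWS):
    """Yield consecutive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


print(f"3.4 TB at peak rate: {best_case_load_hours(3.4):.1f} hours")
print(f"25,000 rows: {batch_count(25_000)} batches")
```

Real loads will usually run below the peak rate, so the first function gives a floor rather than an estimate.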

Data Loading with SSIS
• SQL Server PDW Destination Component
• Loads occur in parallel, both within a package and among multiple packages concurrently
• SSIS can run on Loading Server or another server outside of the PDW appliance
• Leverages DMS for parallel operations

SSIS – Management vs. Performance
• Row level locking
• PDW Connections and Queries are costly
• Data type conversion in the destination adapter is expensive. Match destination data types (String, decimals)
• Consider ELT rather than ETL
• Consider using SSIS control flow to instantiate DWLoader

Related content
Find Me Later At the Dell Booth

Resources
• Sessions on Demand: http://channel9.msdn.com/Events/TechEd
• msdn, Resources for Developers: http://microsoft.com/msdn
• Learning, Microsoft Certification & Training Resources: www.microsoft.com/learning
• TechNet, Resources for IT Professionals: http://microsoft.com/technet


© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.