All up datawarewhouse – from smp to parallel
-
Upload
sriram-jayaraman -
Category
Technology
-
view
303 -
download
1
Transcript of All up datawarewhouse – from smp to parallel
All up datawarewhouse – From SMP to Parallel Data warehousing
UNSTRUCTURED
UNBALANCED
UNPREDICTABLE
Take 1 big SANAdd a little ServerAdd a bigger ServerAdd more networking
POTENTIAL PERFORMANCE BOTTLENECKS
FCHBA
AB
FCHBA
AB
FC S
WIT
CH
STORAGECONTROLLER
AB
ABCA
CHE
SERV
ER
CACH
ESQ
L SE
RVER
WIN
DO
WS
CPU
CO
RES
CPU Feed Rate HBA Port Rate Switch Port Rate SP Port Rate
A
BDISK DISK
LUN
DISK DISK
LUN
SQL Server Read Ahead Rate
LUN Read Rate Disk Feed Rate
It’s all about …. SIZING
One SHOE does not FIT ALL
Transaction processing
simplifies and accelerates data
capture for accurate business
decisions
Data warehousing
enables common data model for
single version of the truth
Analysis leads to optimized business processes and improved performance
Data Warehouse Scope
Dat
a P
ath
Data Warehouse
Analysis Services Cubes
PerformancePoint
Dedicated SAN, Storage Array
Reporting Services
Web Analytic Tools
Integration Services ETL
SharePoint ServicesMicrosoft Office SharePoint
Data Staging,Bulk Loading
Supporting Systems
BI Data Storage Systems
Presentation Layer Systems
Data Warehouse Scope (dashed)
Pre
sen
tati
on
Dat
aP
rese
nta
tio
n D
ata
Data Warehouse Scenarios
• No longer exclusive to large enterprises and specialists analysts
• Growth of affordable self-service BI tools such as PowerPivot and Reporting Services has created a DW requirement for smaller businesses and individual departments
Microsoft Data Warehousing Offerings
Scalable and reliable SMP
platform for data warehousing on any hardware
Scalable and reliable platform
for data warehousing on any hardware
Reference architectures
offering best price performance for
data warehousing
Appliance for high end MPP Data Warehousing
delivering highest scalability and performance
Ideal for data marts or small to
mid-sized enterprise data
warehouses (EDWs)
Ideal for large data marts or mid-
sized EDWs
Ideal for data marts or small to mid-sized data
warehouses with scan-centric workloads
Ideal for high scale or high performance data
marts and EDWs
Software only
Integrated Appliance
(Software and Hardware)
Reference Architectures(Software and
Hardware)
DW Appliance(Fully integrated
Software and Hardware)
Scale-Up DW Scale-Up DW Scale-Up DW Scale-Out DW with MPP
10s of terabytes <5 terabytes 5–80 terabytes 10s - 100s of TB
Software Assurance; Premier Mission Critical
Support3-Year Support Plus 24
Software Assurance; Premier Mission Critical
SupportMission Critical
Advantage Program
Enterprise Fast Track Data Warehouse RA
BDW Appliance
Parallel Data
Warehouse
Microsoft Data Warehouse Offerings
Effort to Build Very High Very Low Moderate
Moderate
Moderate Moderate
Very Low
Capacity Variable 5 TB 14 TB 20 TB 40 TB 40 TB 500 TB
Concurrency Variable Light Light Medium Medium High Very High
Query Complexity
Variable Medium Medium
Medium Medium High Very High
Business Data Warehouse Appliance
Business Data Warehouse Appliance
Agile• Deploy in hours/days, not in
months
• Easy to use through built-in dedicated tools to load and manage your data warehouse
• Designed for up to 5TB data warehouses
• Fast Track 3.0 compliant, license path to Fast-Track
Complete• Hardware +
Software + Services• Pre-tuned, pre
configured, pre-installed. Turn on and go!
• Single point of contact for support
Optimized• Specifically for small to
medium data warehouse workload
• Designed for performance, energy efficiency, and value by HP and Microsoft’s best engineers
• Security and reliability built in
Scenarios
Small/Departmental Data Warehouse
Spoke in EDW Hub and Spoke Architecture
Reference Architectures
Fast Track Data Warehouse Components
Software:• SQL Server 2008 R2
Enterprise• Windows Server 2008
R2Configuration guidelines:
• Physical table structures
• Indexes• Compression• SQL Server settings• Windows Server
settings• Loading
Hardware:•Tight specifications for servers, storage and networking•‘Per core’ building block
Processing
Networking
Server
Storage
SQL Server Parallel Data Warehouse
SQL Server Parallel Data Warehouse• Tier-1 Enterprise Data Warehouse Appliance Offering
– High scalability from tens to hundreds of terabytes– High performance through the MPP system
• Flexibility and Choice – Choice of deployment options through distributed architecture
• Most Comprehensive Solution– Complete data warehouse solution spanning desktop,
enterprise data warehouse, and data marts
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
CONTROL RACK DATA RACK
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
CONTROL RACK
Client connections always go through the control node
Contains no persistent user data
Parallel Data Warehouse advantages:
o Processes SQL requests
o Prepares execution plan
o Orchestrates distributed execution
Local SQL Server processes final query plan and aggregates results
Provided by DataDirect
o Open database connectivity (ODBC), object linking and embedding database (OLE DB), Java Database Connectivity (JDBC), and ActiveX® Data Objects (ADO.net) client drivers
o Wire protocol (SeQuel link)
o Drivers are available for 32 bits and 64 bits
CONTROL NODE
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
CONTROL RACK
Provides Support and Patching for the Appliance
Holds image for re-deployment of compute node
Holds Active Directory
MANAGEMENT NODE
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
CONTROL RACK
Provides high-capacity storage for data files from ETL processes
Is available as a sandbox for other applications and scripts that run on the internal network
Provides SQL Server Integration Services
LANDING ZONE
Source
Landing Zone Files
Data Loade
r
Compute Nodes
DWLoader or
SQL Server Integration
Services
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
CONTROL RACK
Provides Integrated Backup Solution
Integrates with 3rd party backup option
Orderable in different sizes
BACKUP NODE
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
DATA RACK
• Data Rack Servers 10 active + 1 passive
• HP ProLiant DL360 G7 compute nodes
• InfiniBand, FC and Ethernet switching, 42U rack
• Expansion Grow from 1–4 data racks, storage options, test/dev system
• Storage 10x HP StorageWorks MSA P2000 G3
• Consists of COMPUTE NODES and STORAGE NODES
SQL
DATA RACK
• Data Rack Servers 10 active + 1 passive
• HP ProLiant DL360 G7 compute nodes
• InfiniBand, FC and Ethernet switching, 42U rack
• Expansion Grow from 1–4 data racks, storage options, test/dev system
• Storage 10x HP StorageWorks MSA P2000 G3
COMPUTE NODE
Each MPP node is a highly tuned symmetric multi-processing (SMP) node with standard interfaces
Provides dedicated hardware, database, and storage
Runs SQL Server
Spare Node provides failover in case of node failure
Drives are configured as RAID 1
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Client Drivers
ETL Load Interface
Support/Patching
Corporate BackupSolution
CONTROL RACK DATA RACK
PDW – Client Connectivity
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
QUERY
CONTROL RACK DATA RACK
??????
???
?????????
???
???
???
???
???
???
PDW – Query Processing
Replicated
A table structure exists as a full copy within each discrete Parallel Data Warehouse node.
Data Layout Approaches
Distributed
A table structure is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the database management system (DBMS).
Ultra Shared-Nothing
Provides the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes. Small sets of data can be more efficiently stored in full
(replicated). Certain set operations (such as single-node operations)
are more efficient against full sets of data.
Ultra Shared-Nothing Architecture
Extends Traditional Shared-Nothing Design Pushes shared-nothing architecture into the SMP node—there is IO and CPU affinity
within SMP nodeso Eliminates contention for user querieso Uses full resources for each user query
Provides multiple physical instances of tableso Distributes large tableso Replicates small tables
Redistributes rows as needed
Provides Fault Tolerance All hardware components have redundancy (including CPUs, disks, networks, power, and
storage processors) Control and compute nodes use failover clustering Management nodes have active and standby states
Administrative Console
https://controlnodeipaddress
Dashboard Query activity Load activity Backup and restore Active locks Active sessions Alerts Appliance state
Parallel Data Warehouse Configuration Manager
Appliance topology
Services status Network
configuration Privileges
Parallel database copy technology enables rapid data movement and consistency between EDW and data marts
Create SQL Server 2008 R2, Fast Track Data Warehouse,and SQL Server Analysis Services Data Marts
Supports user groups with very different service-level agreements (SLAs):• Performance• Capacity• Loading• Concurrency
Flexible Business Alignment
A distributed architecture gives you the flexibility to add or change diverse workloads
or user groups while maintaining data consistency across the enterprise
Landing Zone
ETL Tools
Distributed Data Warehouse ArchitecturesDepartmental Reporting
RegionalReporting
High-Performance
ReportingCentral
EDW Hub
RegionalReporting
with Business Decision
Appliance
Third-Party
RDBMSThird-PartyData
Integration
Mobile Applicat
ions
Determining the Right SolutionWhat is the workload? Number of concurrent users Query complexity Query mix Load processing Performance requirements
What is the customer looking for in a solution? Simplicity in the appliance 100 percent compatibility with SQL Server 2008 R2 Enterprise scalability Economical hardware Incremental expansion and high availability by default
Parallel Datawarehouse
Enterprise-class scalability to hundreds of terabytes High performance Interoperability with leading BI products Mission critical support and maintenance Mature SQL Server platform with high security and
robust engineering process Strong data warehouse vision and roadmap that includes
industry-leading technologies
Value to Customer
Supporting Features
MPP with ultra shared-nothing architecture Distributed query optimization Balanced hardware with pre-tested and pre-tuned appliances optimized for
data warehousing Third-party product integration (for example, Microstrategy, Business Objects,
and Informatica) Mission critical support and maintenance Road map includes column store, petabyte scalability, real-time data
warehousing, MDM, and data quality
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment
on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.