Fast Track, Microsoft SQL Server 2008 Parallel Data Warehouse, and Traditional Data Warehouse Design: BI Best Practices and Tuning for Scaling SQL Server 2008
Slide 2
Slide 3
Data Warehouse
Slide 4
Fast Track
Slide 5
PDW
Slide 6
[Diagram: traditional MD design + SSAS vs. PDW + SSAS]
Slide 7
Characteristic | Typical BI (DWs & DMs) | OLTP (Operational Database)
Data activity | Large reads (disjoint sequential scans); large writes (new-data appends); large-scale hashing | Indexed reads and writes; small transactions; constant small index reads, writes, and updates
Database sweet-spot size | 100s of gigabytes to terabytes (needs medium to large storage farms) | Gigabytes (requires smaller to medium-sized storage farms)
Time period | Historical (contributes to large data volumes) | Current
Queries | Largely unpredictable | Predictable
I/O throughput requirement | Up to 20 GB/sec sustained throughput | IOPS is more important than sustained throughput
Slide 8
Microsoft/HP Fast Track reference configurations OR SQL Server Parallel Data Warehouse (PDW)
vs. SQL Server/HP traditional DW design reference configurations
Different logical and physical DB design philosophies
Lower hardware costs
"Hmm, what will my logical & physical DB design look like?"
Slide 9
It is not uncommon to have hundreds of disk drives (RAID 5) to support the I/O throughput requirements in a traditional DW environment.
Slide 10
How do Fast Track and PDW get their speed? An X-ray view at the physical disk level. First, let's look at a traditional DW...
Slide 11
Data is stored wherever it happens to land. [Diagram: sequential data vs. a fact table's blocks scattered across disk by the initial load and the 2nd-, 3rd-, 5th-, and 6th-day loads.]
Slide 12
[Diagram: columns interleaved with indexes and pre-calculated data; the indexes and pre-calculated data duplicate the underlying data.]
Slide 13
Disk throughput is slower with indexes, aggregates, and summary tables. "Index-lite" is faster because there is less disk-head movement. Eliminating indexes and storing data sequentially provides the fastest disk throughput rates (a rough micro-benchmark sketch follows). [Diagram: traditional DW design with indexes & summary tables vs. Fast Track & PDW index-lite vs. Fast Track & PDW with the fastest sequential scan rates.]
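To make the head-movement argument concrete, here is a minimal micro-benchmark sketch (not from the original deck) that reads the same number of 64 KB blocks sequentially and then at random offsets. The file path and sizes are hypothetical placeholders; on a cached file or an SSD the gap largely disappears, so the comparison is only meaningful against a cold spinning disk.

```python
import os
import random
import time

PATH = "bigfile.dat"   # hypothetical large file on the disk under test
BLOCK = 64 * 1024      # 64 KB reads, similar to large scan I/O
N = 1024               # number of blocks to read in each pass

def read_blocks(offsets):
    """Read one BLOCK at each offset and return the elapsed seconds."""
    start = time.perf_counter()
    with open(PATH, "rb", buffering=0) as f:   # unbuffered, so reads hit the OS
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

size = os.path.getsize(PATH)
seq = [i * BLOCK for i in range(N)]                          # sequential layout
rnd = [random.randrange(0, size - BLOCK) for _ in range(N)]  # scattered layout

print("sequential:", read_blocks(seq), "s")
print("random:    ", read_blocks(rnd), "s")
```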
Slide 14
Example: average disk seek time is typically about 4 ms; a full stroke is about 7.5 ms. At 15K RPM a disk makes 250 revolutions/sec, so a full revolution takes 4 ms and the average rotational latency (half a revolution) is about 2 ms. Fast Track & PDW are designed to stream large blocks of data sequentially, which beats even the average latency because the disk heads are already directly over the streaming data.
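As a quick check of that arithmetic, a sketch using the figures from the slide:

```python
# Rotational latency for a 15K RPM drive, plus the cost of one random I/O.
rpm = 15_000
revs_per_sec = rpm / 60                             # 250 revolutions/sec
full_revolution_ms = 1000 / revs_per_sec            # 4 ms per revolution
avg_rotational_latency_ms = full_revolution_ms / 2  # ~2 ms (half a revolution on average)

avg_seek_ms = 4.0    # typical average seek time from the slide
full_stroke_ms = 7.5 # worst-case head movement from the slide

# A random I/O pays seek + rotational latency before any data moves.
random_io_overhead_ms = avg_seek_ms + avg_rotational_latency_ms  # ~6 ms
print(full_revolution_ms, avg_rotational_latency_ms, random_io_overhead_ms)
```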
Slide 15
Seek time is typically 2-4x longer than average latency. By eliminating seek time you can use approximately 2-4x fewer disk drives while maintaining a given throughput level. Fast Track & PDW are designed to stream large blocks of data sequentially! Why do PDW and Fast Track want data to be stored sequentially? (A rough model of the drive-count claim follows.)
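A rough model of that claim, using the slide's seek and latency figures plus an assumed per-block transfer time (the transfer number is a placeholder, not from the deck):

```python
# If every I/O pays seek + latency + transfer, and a sequential stream pays
# only latency + transfer, each drive does 2-4x the work, so a farm needs
# roughly 2-4x fewer drives for the same throughput.
avg_seek_ms, avg_latency_ms = 4.0, 2.0
transfer_ms = 0.5   # assumed transfer time for one block; small next to seek

with_seek = avg_seek_ms + avg_latency_ms + transfer_ms  # ~6.5 ms per I/O
without_seek = avg_latency_ms + transfer_ms             # ~2.5 ms per I/O
speedup = with_seek / without_seek                      # ~2.6x

drives = 100
print(f"{speedup:.1f}x fewer drives: {drives} -> {drives / speedup:.0f}")
```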
Slide 16
Slide 17
Fast Track and PDW get their speed from FAST scan rates! In addition, HP and SQL Server PDW use massively parallel processing (MPP) to extend Fast Track concepts into a BI appliance with fast scan rates.
HP SQL Server 2008 Parallel Data Warehouse (PDW): Control Rack and Data Rack
Slide 21
Free Your IT Pressures... Get More Value. [Comparison: without HP Factory Express vs. with HP Factory Express.] Faster time to solution; free up valuable IT resources; maximize your IT investment.
Slide 22
ProLiant Servers
Slide 23
Slide 24
Miscellaneous Techniques to Improve SQL Server BI
Performance
Slide 25
Slide 26
Slide 27
Slide 28
Slide 29
Slide 30
SQL Server Analysis Services 2008
Slide 31
SQL Server Analysis Services 2008: Techniques to Improve Performance. SSAS has two major components:
- Formula Engine: does most of the analysis work and tries to keep cells in memory; fast clock speeds are best.
- Storage Engine: if cells are not in memory, the Storage Engine gets the data from disk; faster storage (SSD) or more disk drives give quicker responses.
The goal is to minimize Storage Engine use and keep data in memory for the Formula Engine to use. Manage the partitions in your AS database by the query performance required: large cubes (> 100 GB) may not fit in memory, so design the partitions to get into memory as quickly as possible. Best practice: fewer than 4 million cells per partition (see the sizing sketch below).
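A back-of-the-envelope aid for that best practice; the cell count below is a hypothetical placeholder for a large cube, not a measurement:

```python
import math

# Split a measure group so each partition stays under ~4 million cells.
MAX_CELLS_PER_PARTITION = 4_000_000
estimated_cells = 900_000_000  # assumed cell count for a ~100 GB cube

partitions_needed = math.ceil(estimated_cells / MAX_CELLS_PER_PARTITION)
print(partitions_needed)  # 225 -> e.g. slice by day, month, or region to get there
```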
Slide 32
Tune memory
Slide 33
Slide 34
Slide 35
Slide 36
Buffers are allocated via execution trees; each of the numbered steps [in the data-flow diagram] represents a new execution tree. Spawning multiple copies of the package, each over a horizontal partition of the data, creates more process space and more execution trees (a launch sketch follows).
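One way to sketch that pattern is to launch several dtexec instances, each handed its own key range via /SET. This is an illustrative assumption, not the deck's method: the package path and the RangeStart/RangeEnd variable names are hypothetical, so adjust them to your package.

```python
import subprocess

PACKAGE = r"C:\ETL\LoadFact.dtsx"  # hypothetical package path
RANGES = [(0, 25_000_000), (25_000_000, 50_000_000),
          (50_000_000, 75_000_000), (75_000_000, 100_000_000)]

procs = []
for lo, hi in RANGES:
    # Each copy runs in its own process and works one horizontal partition.
    cmd = [
        "dtexec", "/F", PACKAGE,
        "/SET", rf"\Package.Variables[User::RangeStart].Value;{lo}",
        "/SET", rf"\Package.Variables[User::RangeEnd].Value;{hi}",
    ]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()  # block until every partition's copy finishes
```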
Sign up for TechEd 2011 and save $500, June 8 - June 31st: http://northamerica.msteched.com/registration
You can also register at the North America 2011 kiosk located at registration.
Join us in Atlanta next year!