[BIA 300 HD] SecretsSSASStorage Misner

November 6-9, Seattle, WA Secrets of SSAS Storage Stacia Misner Principal Consultant Data Inspirations


  • November 6-9, Seattle, WA

    Secrets of SSAS Storage

    Stacia Misner

    Principal Consultant

    Data Inspirations

  • Stacia Misner Consultant, Educator, Mentor, Author

    SQL Server MVP, SSAS Maestro

    BIA-300-HD

  • Overview

    Storage Architecture

    Partitions

    Aggregations

    Storage Impact on Queries


  • Storage Architecture

    Query Architecture

    Dimension Storage

    Measure Group Storage


  • Query Architecture

    Diagram: a client application sends an MDX query to the Analysis Services server

    Formula Engine: Query Parser, Populate Axes, Compute Cell Data, Calculation Engine, Formula Engine Cache

    Subcube Operations between the Formula Engine and the Storage Engine

    Storage Engine: Storage Engine Cache, Dimension Data (Attribute Store, Hierarchy Store), Measure Group Data (Aggregations, Fact Data)

  • Dimension Storage

    Attribute Store

    Key Store

    DataID Attribute Key Member

    Property Store

    DataID Attribute Property

    Ordered by DataID for fast random access

    Key Hash Name Hash

    Relationship Store

    DataIDs

    Bitmap Indexes


  • Dimension Storage

    Hierarchy Store

    Set Store

    DataID

    Structure Store (Parent level): Level Index, DataID, Parent DataID, FirstChild DataID, Children Count

    Path of each member

    Applies only to natural hierarchies

    Structure Store (Child level): Level Index, DataID, Parent DataID, FirstChild DataID, Children Count

  • Measure Group Storage

    STORAGE ENGINE CACHE

    Loads as queries execute

    Clears with cleaner thread or processing of partition

    AGGREGATION DATA

    Responds to request with aggregated values in storage

    Summarizes lower level aggregated values on-the-fly as needed

    FACT DATA

    Scans MOLAP partitions and partition segments in parallel

    Uses bitmap indexes to scan pages to find requested data

    Diagram: Storage Engine with Measure Group Data (Storage Engine Cache, Aggregations, Fact Data)

  • Partitions

    Partitioning Strategy

    Partition Storage

    Partition Design

    Merging Partitions


  • Partitioning Strategy

    Storage: MOLAP, HOLAP, ROLAP

    Query: 20% Agg, 30% Agg, 15% Agg

    Processing: Current Year, Prior Year, History

  • Partition Storage: MOLAP

    Multidimensional OLAP

    Storage of data and aggregations in cache

    Highly compressed and indexed

    Diagram: RDBMS, UDM (Unified Dimensional Model), Analysis Server with MOLAP cache (Data, Aggs), MDX

  • Partition Storage: HOLAP

    Hybrid OLAP

    Storage of aggregations only in cache

    Translation of MDX to SQL for retrieval of detail data

    Diagram: RDBMS (Data), UDM, Analysis Server with MOLAP cache (Aggs), MDX, SQL

  • Partition Storage: ROLAP

    Relational OLAP

    Relational storage of data and aggregations

    Real-time analysis

    Diagram: RDBMS (Data/Aggs), UDM, Analysis Server, MDX, SQL

  • ROLAP Dimension

    Dimension data exists in relational storage only

    Analysis Services maintains a cache of requested dimension data

    Use case: Hundreds of millions of members in dimension


  • Partition Design

    Use one source fact table per partition

    Insulate partition from table changes by using a view

    Table Binding


  • Partition Design

    Use separate queries to same fact table for each partition

    Configure as last step in cube design to use most recent DSV

    Query Binding
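
    Query binding can be sketched as one source query per partition, each filtering the same fact table on a non-overlapping date range. The table name FactSales, the column OrderDate, and the yearly boundaries below are hypothetical and only illustrate the idea.

```python
def partition_queries(fact_table, date_column, years):
    """Return one source query per yearly partition (all names are hypothetical)."""
    queries = {}
    for year in years:
        # Half-open date range so adjacent partitions never overlap or leave gaps.
        queries[f"{fact_table}_{year}"] = (
            f"SELECT * FROM {fact_table} "
            f"WHERE {date_column} >= '{year}-01-01' "
            f"AND {date_column} < '{year + 1}-01-01'"
        )
    return queries


queries = partition_queries("FactSales", "OrderDate", [2006, 2007, 2008])
for name, sql in queries.items():
    print(name, "->", sql)
```

    Keeping the ranges half-open means a fact row can land in only one partition, which matters because overlapping queries would double count rows in the cube.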


  • Merging Partitions

    Use Management Studio interface for manual merge

    Execute XMLA script for automated merge in Execute DDL Task (SSIS)
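
    For the automated route, a minimal sketch that builds an XMLA MergePartitions command as text, ready to paste into an Execute DDL Task. The database, cube, measure group, and partition IDs are hypothetical placeholders; the real object IDs would come from the deployed cube.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def _partition_ref(parent, tag, database, cube, measure_group, partition):
    """Append a <Source> or <Target> element that identifies one partition."""
    ref = ET.SubElement(parent, tag)
    ET.SubElement(ref, "DatabaseID").text = database
    ET.SubElement(ref, "CubeID").text = cube
    ET.SubElement(ref, "MeasureGroupID").text = measure_group
    ET.SubElement(ref, "PartitionID").text = partition
    return ref


def merge_partitions_xmla(database, cube, measure_group, sources, target):
    """Build a MergePartitions command that merges `sources` into `target`."""
    root = ET.Element("MergePartitions", xmlns=XMLA_NS)
    sources_el = ET.SubElement(root, "Sources")
    for partition in sources:
        _partition_ref(sources_el, "Source", database, cube, measure_group, partition)
    _partition_ref(root, "Target", database, cube, measure_group, target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical IDs: merge two yearly partitions into a history partition.
xmla = merge_partitions_xmla(
    "DW", "Sales", "Fact Sales",
    sources=["Sales_2005", "Sales_2006"],
    target="Sales_History",
)
print(xmla)
```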

    Diagram: yearly partitions (2005, 2006, 2007, 2008) and a History partition

  • Aggregations

    Aggregation Concepts

    Aggregation Usage

    Aggregation Wizard Optimizations

    Aggregation Designer

    Usage Based Optimization


  • Aggregation Concepts

    Aggregation-aware queries

    Sales by country, gender, or both

    Sales by category

    Sales by year

    Sales by group

    Aggregation-less queries

    Sales by gender and year

    Sales by category and year

    Sales by country and category

    Aggregation design

    Indexed view for aggregation (ROLAP only)
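
    The difference between the two query lists can be sketched as a coverage check: a query can use an aggregation only when every attribute it asks for is stored in that aggregation or rolls up from one that is. The aggregation design and the category-to-group relationship below are assumptions made to reproduce the slide's examples, not the actual design from the session.

```python
# Hypothetical aggregation design: each aggregation is the set of attributes it groups by.
AGGREGATIONS = [
    frozenset({"country", "gender"}),
    frozenset({"category"}),
    frozenset({"year"}),
]

# Assumed attribute relationship: every category belongs to exactly one group.
ROLLS_UP_TO = {"category": {"group"}}


def covered_attributes(aggregation):
    """Attributes answerable from an aggregation, including direct roll-ups."""
    covered = set(aggregation)
    for attribute in aggregation:
        covered |= ROLLS_UP_TO.get(attribute, set())
    return covered


def find_aggregation(query_attributes):
    """Return the first aggregation that covers the query, or None."""
    wanted = set(query_attributes)
    for aggregation in AGGREGATIONS:
        if wanted <= covered_attributes(aggregation):
            return aggregation
    return None


print(find_aggregation({"country"}))          # covered by the country/gender aggregation
print(find_aggregation({"group"}))            # covered through category
print(find_aggregation({"gender", "year"}))   # None: no single aggregation holds both
```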


  • Aggregation Usage

    Default

    Full

    Include in every aggregation (or a lower level attribute)

    Use only for most commonly used attributes, sparingly

    None

    Exclude in every aggregation

    Use for infrequently used attributes

    Unrestricted

    No constraints

    Analysis Services decides whether to include or exclude

    Rule of thumb: use only 5-10 Unrestricted attributes per dimension

  • Aggregation Usage Default Rules

    ATTRIBUTE    FULL    NONE    UNRESTRICTED

    Granularity attribute for measure group

    Special dimension types*

    Natural hierarchies with attribute relationships

    Non-aggregatable attributes

    All others

    *Special dimension types: many-to-many dimension, unmaterialized referenced dimension, data mining dimension

  • Aggregation Wizard Optimizations

    Remove unneeded attributes

    Add user hierarchies for natural hierarchies where possible

    Ensure correct attribute relationships

    Set AttributeHierarchyEnabled to False for member properties

    Configure AggregationUsage in advance

    Use correct estimates for partition rows and attributes within a partition

  • Aggregation Designer

    Profiler: Aggregation hit

    Profiler: No aggregation hit


  • Usage Based Optimization

    SERVER PROPERTY                           SETTING
    Log\QueryLog\CreateQueryLogTable          true
    Log\QueryLog\QueryLogConnectionString     Data source=localhost;Initial Catalog=DW
    Log\QueryLog\QueryLogSampling             1
    Log\QueryLog\QueryLogTableName            OlapQueryLog

    Analysis Services creates the table if it doesn't exist; otherwise it logs to the existing table

    Clear property value to disable logging

    OlapQueryLog records deleted when:

    Measures added to or removed from measure group

    Dimensions added to or removed from measure group

    Attributes added to or removed from dimension

  • Usage Based Optimization OLAPQueryLog

    Filter Criteria Query Details


  • Storage Impact on Queries

    Which Engine is the Bottleneck?

    Query Analysis

    Partitioning

    Aggregation Design


  • Which Engine is the Bottleneck?

    Time to execute query (cold cache)

    Storage Engine time = sum of the elapsed time for each Query Subcube event

    Formula Engine time = Total execution time (Query End event) - Storage Engine time

    Bottleneck is the engine consuming 30% or more of total query execution time
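
    The arithmetic above can be sketched as follows. The durations are made-up sample values in milliseconds, standing in for figures read from a Profiler trace, and the function names are illustrative only.

```python
def split_engine_time(subcube_durations_ms, query_end_duration_ms):
    """Split total query time into Storage Engine and Formula Engine time."""
    # Storage Engine time: sum of the durations of all Query Subcube events.
    storage_ms = sum(subcube_durations_ms)
    # Formula Engine time: whatever remains of the Query End duration.
    formula_ms = query_end_duration_ms - storage_ms
    return storage_ms, formula_ms


def bottlenecks(storage_ms, formula_ms, threshold=0.30):
    """Name each engine that consumes at least `threshold` of total time."""
    total_ms = storage_ms + formula_ms
    engines = (("Storage Engine", storage_ms), ("Formula Engine", formula_ms))
    return [name for name, ms in engines if total_ms and ms / total_ms >= threshold]


# Sample trace: three Query Subcube events and a 1000 ms Query End duration.
storage_ms, formula_ms = split_engine_time([120, 80, 50], 1000)
print(storage_ms, formula_ms)
print(bottlenecks(storage_ms, formula_ms))
```

    With a 30% threshold both engines can qualify at once, so the function returns a list instead of a single name.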


  • Prepare for Query

    Clear Cache

    Reload MDX Script (without caching)


  • Profiler Results Begin

    Query Begin indicates successful query parsing

    Serialize Results Current counts members on each axis


  • Storage Engine Reads Partitions


  • Query Subcube Event

    Sum the duration of each Query Subcube event to compute total Storage Engine query time

    Review TextData for vectors returned to the formula engine (very cryptic)

  • Query Subcube Verbose Event

    Use Query Subcube Verbose Event to understand vectors


  • Subcube Details

    VALUE    RESULT
    0        Default member returned
    *        All members returned
    +        Selected members returned
    -        Slice below granularity returned
    4        Single member's DataID
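
    The value table can be applied mechanically when reading a subcube vector. This is a minimal sketch that assumes the vector has already been copied out of the trace as space-separated tokens; the sample vector is invented and the real TextData layout may differ.

```python
SUBCUBE_SYMBOLS = {
    "0": "default member returned",
    "*": "all members returned",
    "+": "selected members returned",
    "-": "slice below granularity returned",
}


def describe_token(token):
    """Translate one token of a subcube vector into plain language."""
    if token in SUBCUBE_SYMBOLS:
        return SUBCUBE_SYMBOLS[token]
    if token.isdigit():
        # Any other number is the DataID of a single member.
        return f"single member, DataID {token}"
    raise ValueError(f"unrecognized subcube token: {token!r}")


def describe_vector(vector):
    """Describe every attribute position in a space-separated vector."""
    return [describe_token(token) for token in vector.split()]


# Invented sample vector with one token per attribute.
for position, meaning in enumerate(describe_vector("0 * + 4 -")):
    print(position, meaning)
```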


  • Profiler Results End

    Serialize Results Current reports total number of cells in query results

    Query End reports total query duration


  • Duration Analysis


  • Partitioning

    Create multiple partitions to enable optimal scans of fact data

    Partition by one or more attributes used in many queries, such as Year

    Define slice


  • Aggregation Design

    Aggregation Design Wizard

    View aggregation candidates

    Update member and fact record counts

    Develop and apply aggregation design

    Usage-Based Optimization Wizard

    Capture query sampling in usage log

    Tune aggregation performance to actual usage

    Aggregation Utility (2005) / Aggregation Designer (2008)

    Override aggregation design algorithm

    Best Practice!

    Good Practice


  • Resources

    Analysis Services 2008 Performance Guide

    http://tinyurl.com/9vqqmuy

    Analysis Services 2008 R2 Operations Guide

    http://tinyurl.com/8wvdyg4


    Stacia Misner


    blog.datainspirations.com

    Twitter: @StaciaMisner

  • PASS Resources

    Free SQL Server and BI training Free 1-day Training Events Regional Event

    Local and Virtual User Groups Free Online Technical Training

    Learning Center

    This is Community


  • November 6-9, Seattle, WA

    Thank you for attending this session and the 2012 PASS Summit in Seattle