Dimensional Modeling for OLAP Cubes

How to Build Analysis Services Cubes 2005 / 2008 Jen Underwood MCITP, MCDBA, MCSD [email protected] April 25, 2009 2005 / 2008

Transcript of Dimensional Modeling for OLAP Cubes

Page 1: Dimensional Modeling for OLAP Cubes

How to Build

Analysis Services Cubes

2005 / 2008

Jen Underwood

MCITP, MCDBA, MCSD

[email protected]

April 25, 2009


Page 2: Dimensional Modeling for OLAP Cubes

Session Agenda

• Roadmaps and Resources

• Dimensional Modeling and Data Marts

– Basic Concepts

– Common Mistakes

• Microsoft Analysis Services

– What, Why and How

– Architecture

– Features

© 2009 by Jen Underwood. All rights reserved.

• Demo: Building a Cube Top Down and Bottom Up

– Unified Dimensional Model (UDM) and Data Source Views (DSV)

– Facts, Dimensions and Hierarchies

– Calculations and Named Sets

– Perspectives

– Not Covered: Data Mining, Partitioning, Aggregation Design and Processing

• Additional Resources

Page 3: Dimensional Modeling for OLAP Cubes

Roadmaps and Resources

Page 4: Dimensional Modeling for OLAP Cubes

Microsoft Solution Framework BI Roadmap

• Great implementation kit that can be downloaded for free from Microsoft

• Sample project requirements templates, basic project schedule and set up guides

SOURCE: Microsoft Performance Point Methodology Kit

Page 5: Dimensional Modeling for OLAP Cubes

Microsoft Solution Framework BI Roadmap

SOURCE: Revised from the Microsoft Performance Point Methodology Kit

Page 6: Dimensional Modeling for OLAP Cubes

Kimball Data Warehouse Toolkit Roadmap

• Kimball Group Microsoft Data Warehouse and ETL toolkit books are excellent resources

• Kimball toolkits also contain many useful templates and utilities

SOURCE: Kimball Data Warehouse Toolkit Books

Page 7: Dimensional Modeling for OLAP Cubes

Kimball Data Warehouse Toolkit Roadmap

SOURCE: Kimball Data Warehouse Toolkit Books

Page 8: Dimensional Modeling for OLAP Cubes

Moss Atre BI Roadmap

• Business Intelligence Roadmap by Larissa Moss

and Shaku Atre is also a great resource

• Moss/Atre’s CD includes checklists and an

extremely comprehensive, detailed project plan

Detailed Entry and Exit Criteria by Phase

SOURCE: Business Intelligence Roadmap, Larissa Moss and Shaku Atre

http://www.atre.com

Page 9: Dimensional Modeling for OLAP Cubes

Dimensional Modeling

Page 10: Dimensional Modeling for OLAP Cubes

Basic Dimensional Modeling

• A data modeling technique for data warehouses

• An Analysis Services cube is a logical structure

defined off of a Dimensional data model

• Cubes contain Dimensions and Facts

– Facts: numeric values, measures and calculations

– Dimensions: Attributes, hierarchies and properties

• Dimensional modeling is different from transactional modeling

SOURCE: SQL Server Central

• There is a learning curve!!!

– Expect to experiment

– Be ready for multiple data model revisions

– Excel Pivot tables can aid in testing design concepts

• Excellent resources for further self study

– Kimball Group books, website and numerous

Dimensional design tips

– IBM's free Dimensional Modeling Redbook guides

Page 11: Dimensional Modeling for OLAP Cubes

Relational vs. Dimensional Data Modeling

Relational

• Data is normalized

• Many tables and relationships

• Volatile data changes

• SQL is used to manipulate data

• Basic OLTP reports

• Data designed for business systems

• Data designed for concurrent handling of many small transactions by many users

• User is usually constrained by an application that understands the data design

• Users are typically operational staff

Dimensional

• Data is more de-normalized

• Several Fact tables related to Dimensions

• Non volatile data

• MDX is used to manipulate data

• Interactive, drag and drop OLAP reports

• Data designed for analysis systems

• Suited for analyzing large amounts of data by a modest number of users

• Designed for do-it-yourself data analysis

• Users are typically analysts, company strategists, or executives

Page 12: Dimensional Modeling for OLAP Cubes

Dimensional Modeling Design Process

[Figure: dimensional modeling design process, with a callout marking the lowest level of Fact detail]

SOURCE: IBM Dimensional Design Redbook

Page 13: Dimensional Modeling for OLAP Cubes

Kimball Data Warehouse Bus Architecture

• Data warehouse architecture that uses conformed / shared Dimensions across

business processes

• Matrix can assist in planning, visualizing, designing and validating

Page 14: Dimensional Modeling for OLAP Cubes

Stars and Snowflakes

Star

– A database schema for representing multidimensional data

– Simplest form of data warehouse schema with one or more Dimensions and Facts

– Hierarchies are stored “flattened” in the Dimension table

Snowflake

– A star schema further normalized through the use of referenced “outrigger” tables

– Increased number of joins can impact performance; there are design tradeoffs between redundant storage and performance

– Hierarchies are separated into referenced Dimension tables

SOURCE: LearnDataModeling.com
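A minimal sketch of a star schema may help make the "flattened" hierarchy concrete. It uses Python's built-in sqlite3 module purely for illustration; the table names (dim_exchange, fact_trade), the columns and the sample rows are assumptions, not taken from the presentation.

```python
import sqlite3

# In-memory database holding one Dimension table and one Fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_exchange (
    exchange_key INTEGER PRIMARY KEY,  -- surrogate key
    exchange TEXT,                     -- hierarchy levels are stored
    country TEXT,                      -- "flattened" as columns of
    region TEXT                        -- the same Dimension row
);
CREATE TABLE fact_trade (
    exchange_key INTEGER REFERENCES dim_exchange(exchange_key),
    volume INTEGER                     -- additive measure
);
INSERT INTO dim_exchange VALUES
    (1, 'Jasdaq', 'Japan', 'Asia Pacific'),
    (2, 'NYSE', 'United States', 'North America');
INSERT INTO fact_trade VALUES (1, 100), (1, 50), (2, 200);
""")

HIERARCHY_LEVELS = {"region", "country", "exchange"}

def volume_by(level):
    """Total volume grouped by one level of the flattened hierarchy."""
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    query = (
        f"SELECT d.{level}, SUM(f.volume) "
        "FROM fact_trade f "
        "JOIN dim_exchange d ON d.exchange_key = f.exchange_key "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    )
    return conn.execute(query).fetchall()

print(volume_by("region"))
```

Every roll-up needs only one join from the Fact table to the Dimension table. In a snowflake design, country and region would live in their own referenced tables and the same query would need additional joins.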

Page 15: Dimensional Modeling for OLAP Cubes

Types of Dimensions and Facts

Dimensions

• Regular: Directly relates to a Fact

• Referenced “Outrigger”: Relates indirectly to a Fact (snowflake)

• Parent-Child: Relates to itself

• Many-to-Many: N:1 key mapping to a Fact, overlapping in time

– Bank Account, Diagnosis

• Role Playing: One with multiple roles

– Order Date, Ship Date

• Junk: Contains low cardinality flags, codes

• Degenerate: 1:1 attribute put in the Fact

– Order Number

• Account: Special type with financial account aggregation intelligence

– Income, Expense

Facts

• Regular Fact: Measure is additive

– Sales Amount

• Fact-less Fact: Many-to-Many Dimension

– Event Tracking, Mappings and Unnatural Hierarchies

• Semi-additive: Snapshots that use an aggregation function Count, LastNonEmpty, Max, Min, etc.

– Average Daily Balance, Inventory

• Non-additive: Measure is not aggregated

– Ratio, Average, Computation
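The difference between additive and semi-additive measures can be sketched with a small balance snapshot. The account names and figures below are made up, and the "closing balance" rule is a simplified stand-in for a last-value aggregation, not a reproduction of how Analysis Services evaluates LastNonEmpty.

```python
# (account, day) -> end-of-day balance snapshot (illustrative data only)
balances = {
    ("A", 1): 100, ("A", 2): 120, ("A", 3): 90,
    ("B", 1): 50, ("B", 2): 40, ("B", 3): 70,
}

def total_on(day):
    """Balances are additive across accounts for a single day."""
    return sum(value for (_, d), value in balances.items() if d == day)

def closing_balance(days):
    """Semi-additive roll-up over time: use the last day, do not sum days."""
    return total_on(max(days))

def naive_sum(days):
    """Treating the snapshot as fully additive double counts money."""
    return sum(total_on(day) for day in days)

print(closing_balance([1, 2, 3]), naive_sum([1, 2, 3]))
```

Summing across accounts is meaningful, but summing the same balances across days is not, which is why snapshot measures need an aggregation function other than Sum along the time Dimension.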

Page 16: Dimensional Modeling for OLAP Cubes

Hierarchies

• Identified by how the Fact measures will be analyzed, drilled or pivoted

– Display Total Volume “By Region, Country down to Exchange”

• Categories or paths to summarize Dimension attributes

– “Exchanges” Region > Country > Exchange // Asia Pacific > Japan > Jasdaq

– “Calendar Date” Year > Quarter > Month > Date // 2009 > Q2 > April > April 25, 2009

• Natural and Unnatural Hierarchies

– Natural hierarchies have attributes that are related between each level; “Exchanges” and “Calendar Date” are natural

– Unnatural hierarchies do not have relationships between levels; an example might be custom defined client groupings

• Hierarchy attribute relationships are critical for Analysis Services cube performance

• BIDS 2008 has new best practices alerts to aid dimension and attribute relationship design
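One way to test whether a candidate hierarchy is natural is to check that every member of a lower level rolls up to exactly one parent. The sketch below does that over plain Python dictionaries; the function name and the sample rows are illustrative assumptions, and this is not a description of the checks BIDS itself performs.

```python
def is_natural_hierarchy(rows, levels):
    """Return True if each member maps to exactly one parent member.

    rows   -- list of dicts, one per Dimension row
    levels -- attribute names ordered from top level to bottom level
    """
    for parent, child in zip(levels, levels[1:]):
        parent_of = {}
        for row in rows:
            # Remember the first parent seen for this child member and
            # fail if a later row gives the same child a different parent.
            if parent_of.setdefault(row[child], row[parent]) != row[parent]:
                return False
    return True

exchanges = [
    {"region": "Asia Pacific", "country": "Japan", "exchange": "Jasdaq"},
    {"region": "Asia Pacific", "country": "Japan", "exchange": "Tokyo"},
    {"region": "Europe", "country": "Germany", "exchange": "Frankfurt"},
]
client_groups = [
    {"group": "Top Accounts", "client": "Acme"},
    {"group": "Watch List", "client": "Acme"},  # same client, two groups
]

print(is_natural_hierarchy(exchanges, ["region", "country", "exchange"]))
print(is_natural_hierarchy(client_groups, ["group", "client"]))
```

Running a check like this against source data before defining attribute relationships can catch rows that would break a many-to-one relationship between levels.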

Page 17: Dimensional Modeling for OLAP Cubes

Surrogate Keys

• Critical, data warehouse created, artificial keys that keep Dimension member keys independent from source system natural keys

• Surrogate keys are NOT the source system natural primary key

• Data warehouse Dimension-Fact relationships are based on surrogate keys

• Allows the data warehouse to be flexible by addressing source system natural key changes as attribute changes

• Also handles unknown natural key situations elegantly

• Implemented in the ETL “surrogate key pipeline” using lookups to replace natural keys with a data warehouse surrogate key each time either a Dimension record or a Fact record is loaded

SOURCE: InformationManagement.com
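The surrogate key pipeline described above can be sketched as a lookup table that hands out warehouse keys. The class name, the reserved key -1 for unknown members and the sample natural keys are all assumptions made for illustration; a real ETL tool would usually do this with lookup transformations against the Dimension table.

```python
UNKNOWN_KEY = -1  # hypothetical reserved surrogate key for the "unknown" member

class SurrogateKeyPipeline:
    """Maps source system natural keys to warehouse surrogate keys."""

    def __init__(self):
        self._lookup = {}   # natural key -> surrogate key
        self._next_key = 1

    def load_dimension(self, natural_key):
        """Assign a new surrogate key the first time a natural key is seen."""
        if natural_key not in self._lookup:
            self._lookup[natural_key] = self._next_key
            self._next_key += 1
        return self._lookup[natural_key]

    def resolve_fact(self, natural_key):
        """Look up the surrogate key for a Fact row, or fall back to unknown."""
        return self._lookup.get(natural_key, UNKNOWN_KEY)

pipeline = SurrogateKeyPipeline()
pipeline.load_dimension("CUST-001")
pipeline.load_dimension("CUST-002")
print(pipeline.resolve_fact("CUST-002"), pipeline.resolve_fact("CUST-999"))
```

Because Fact rows only ever store the surrogate key, a change to a source system natural key becomes an attribute update on the Dimension row rather than a rewrite of the Fact table.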

Page 18: Dimensional Modeling for OLAP Cubes

Slowly Changing Dimensions

• Dimensions that change over time are called Slowly Changing Dimensions

– Examples could be Customer, Employee, Region

• Various table designs with implementation logic in ETL processes

• Type 1: Overwrite

– Historical values are updated in place (replaced) so comparisons use current data only

– Provides only an AS-IS view of the business

• Type 2: Create another Dimension record as the active record

– Tracks historical data by creating multiple records in the Dimension with separate keys, effective dates and status flags

– Keys referenced in Facts as of the effective date

– Historical values are not replaced and historical comparisons contain valid context

– Provides AS-IS and AS-WAS views of the business

• Type 3: Create new columns to extend the Dimension record

– Can be used to view current and prior values side-by-side in the same Dimension row

– History is limited to the number of SCD columns created

• Other Types 4, 6 and Hybrid
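As a rough sketch of the Type 2 approach, the function below expires the active Dimension row and appends a new one whenever a tracked attribute changes. The row layout (surrogate_key, start_date, end_date, is_current) and the sample customer are assumptions chosen for illustration, not a prescribed table design.

```python
from datetime import date

def apply_scd2(dimension, natural_key, attributes, effective):
    """Apply a Type 2 change and return the active surrogate key.

    dimension is a list of row dicts that is modified in place.
    """
    current = next(
        (row for row in dimension
         if row["natural_key"] == natural_key and row["is_current"]),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return current["surrogate_key"]  # nothing changed
        # Expire the old version instead of overwriting it (that would be Type 1).
        current["end_date"] = effective
        current["is_current"] = False
    new_row = {
        "surrogate_key": len(dimension) + 1,
        "natural_key": natural_key,
        "attributes": dict(attributes),
        "start_date": effective,
        "end_date": None,
        "is_current": True,
    }
    dimension.append(new_row)
    return new_row["surrogate_key"]

customer_dim = []
apply_scd2(customer_dim, "C1", {"region": "East"}, date(2008, 1, 1))
apply_scd2(customer_dim, "C1", {"region": "West"}, date(2009, 4, 25))
print([(r["surrogate_key"], r["attributes"], r["is_current"]) for r in customer_dim])
```

Facts loaded before the change keep pointing at the expired row's surrogate key, which is what preserves the AS-WAS view.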

Page 19: Dimensional Modeling for OLAP Cubes

Common Dimensional Modeling Errors

• Not using Surrogate Keys

• Over Normalizing Dimensions and Facts

– Refer specifically to Kimball Design Tips #25 and #95

• Not implementing Slowly Changing Dimensions

– The Data Warehouse should be able to provide accurate reporting over time

• Summarizing Facts versus keeping the grain at lowest level

– The Data Warehouse should provide access to the most detailed data so it can be rolled up based on a variety of business questions that may be unknown during design

• Designing departmental versus enterprise wide bus dimensional models

• “Build it and they will come” mentality

– The Data Warehouse should have executive sponsorship

– Design the Data Warehouse with the users around business processes

• Assuming Dimensions and Facts will be populated from a single source system

• Overlooking unknown, error, late arriving and early arriving data handling

Page 20: Dimensional Modeling for OLAP Cubes

Microsoft Analysis Services

Page 21: Dimensional Modeling for OLAP Cubes

Why Analysis Services

• Empowers users with sophisticated ad-hoc reporting and data analysis

• Reduces custom SQL report scripts and development tasks

• Speed!!! OLAP is FAST since it can store pre-calculated values

– Analysis Services can handle extremely large, multi-terabyte databases

– Query processing time can be reduced from minutes, hours or days to seconds

• Analysis Services data sources can be used with many existing tools

– Excel Pivot Tables for analyzing information using business terms

– Non-Microsoft products and platforms

• Cognos, Business Objects and other vendors' business intelligence tools

• Oracle, Teradata, SAP BI, Siebel and many other vendors' databases/data sources

• Easily distribute reports over the intranet or internet

– Excel Services for web based viewing of analysis

– Report Center in SharePoint for centralized report management

– SQL Server Reporting Services for personalized views of data

– PerformancePoint Scorecards and Dashboards

– Embed report viewers in analytics applications and many other possibilities!

Page 22: Dimensional Modeling for OLAP Cubes

Analysis Services Architecture

[Diagram: SQL Server 2005 Analysis Services architecture. Components shown:]

• Developer & Admin Tools: Visual Studio 2008/2005 and SQL Server 2008/2005 Management Studio, using Analysis Services Management Objects (DDL)

• Client applications, using an XML/A or ODBO driver (XML/A, MDX)

• XML for Analysis (XML/A)

• Multi-Dimensional (MDX) Engine, Data Mining (DMX) Engine, KPI Engine and Storage Engine

• Unified Dimensional Model with MOLAP, ROLAP and HOLAP storage, plus local cubes

• Data Sources

SOURCE: January 2007 presentation by Tom Casey, Microsoft

Page 23: Dimensional Modeling for OLAP Cubes

Analysis Services Highlighted Features

• Unified Dimensional Model (UDM) “Metadata Layer”

• Data Source Views (DSV)

• Facts, Dimensions and Hierarchies

• Calculations, Named Sets and KPIs

• Perspectives

• Actions

• Role-based Dimensional Cell Security

• Translations / Cube Localization

Page 24: Dimensional Modeling for OLAP Cubes

Analysis Services 2008 New Features

• Cube Performance and Scalability

– Much better cube subspace computation and MDX query performance

– New MOLAP write-back capability

– Enhanced scale-out and back up

• Improved Cube Design

– Best practices alerts

– Attribute relationship designer

– Aggregations design and management

– New dynamic named sets

– Personalization extensions

– Improved BIDS usability

SOURCE: SQL Server 2008 Whitepaper: An Introduction to New Data Warehouse Scalability Features in SQL Server 2008 by Eric N. Hanson and others

Page 25: Dimensional Modeling for OLAP Cubes

BIDS Helper for 2005/2008 Cube Design

• FREE! Check it out on CodePlex

• Calculation Helpers

• Delete Unused Aggregations

• Deploy MDX Script

• Dimension Health Check

• Dimension Optimization

• Printer Friendly Dimension Usage

• Smart Diff

• Visualize Attribute Lattice

SOURCE: CodePlex BIDS Helper Project

Page 26: Dimensional Modeling for OLAP Cubes

Unified Dimensional Model (UDM) “Metadata Layer”

• A central metadata repository defining business entities, logic, calculations, and metrics

• UDM has four key elements

– heterogeneous data access

– a rich end-user model

– advanced analytics

– proactive caching

• UDM provides a flexible, role-based security model to cell level granularity

• UDM unites Relational and Dimensional models by combining the best aspects from both

SOURCE: Introducing Microsoft Analysis Services

Page 27: Dimensional Modeling for OLAP Cubes

Calculations and KPIs

• A calculation is an MDX expression or script used to define

– a calculated member

– a named set

– a scoped assignment in a cube

• Calculations are defined not by the data of the cube, but by expressions that can reference other parts of the cube, other cubes, or even information outside Analysis Services

• A Key Performance Indicator (KPI) is a quantifiable measurement for gauging business performance

– An Analysis Services KPI provides a visual representation of KPI metrics over time

– Rather than displaying numbers, visual representations can be defined to indicate positive, neutral, or negative current status and trend progress against goals

SOURCE: MSDN
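The status and trend idea behind a KPI can be sketched as two small functions that map numbers to -1, 0 or 1, which a client tool could then render as an indicator graphic. In Analysis Services these would be written as MDX expressions; the Python below, the 5% tolerance band and the sample figures are illustrative assumptions only.

```python
def kpi_status(value, goal, tolerance=0.05):
    """Return 1 (positive), 0 (neutral) or -1 (negative) versus the goal.

    Assumes goal is greater than zero. The tolerance band is a made-up
    threshold: within 5% below the goal counts as neutral.
    """
    ratio = value / goal
    if ratio >= 1.0:
        return 1
    if ratio >= 1.0 - tolerance:
        return 0
    return -1

def kpi_trend(current, previous):
    """Return 1 if improving, -1 if declining, 0 if flat."""
    if current > previous:
        return 1
    if current < previous:
        return -1
    return 0

print(kpi_status(97, 100), kpi_trend(97, 90))
```

Keeping status and trend as separate expressions lets a scorecard show, for example, a measure that is below goal but improving.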

Page 28: Dimensional Modeling for OLAP Cubes

Perspectives

• A perspective defines a viewable subset of a cube that provides focused, business-specific or application-specific viewpoints on the cube. The perspective controls the visibility of objects that are contained by a cube

• The following objects can be displayed or hidden in a perspective:

– Dimensions

– Attributes

– Hierarchies

– Measure groups

– Measures

– Key Performance Indicators (KPIs)

– Calculations (calculated members, named sets, and script commands)

– Actions

SOURCE: MSDN

Page 29: Dimensional Modeling for OLAP Cubes

Actions

• Actions are used to extend cube functionality

– CommandLine

– Dataset

– Drill through

– Html

– Proprietary

– Report

– Rowset

– Statement

– URL

Page 30: Dimensional Modeling for OLAP Cubes

Dimensional Cell Security

• Analysis Services security is based on and integrated with Microsoft Windows security

• Each role can contain one or more specific user accounts or user groups

• Analysis Services contains role based security at the database and cube levels

• Security can dynamically restrict access to Dimensions and values

• Visual totals allow filtering of aggregate values for a Dimension so that the Grand Total only

displays the aggregate for the visible members for a Dimension

• To apply security down to a cell-level, MDX expressions are used

• Customized cube security can be developed using named sets or dynamic SSAS stored procedures to construct named sets

SOURCE: Microsoft
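The effect of visual totals on a secured Dimension can be sketched with a simple filter. The region names, figures and function below are hypothetical, and the code models only the arithmetic of the Grand Total, not how Analysis Services roles are configured.

```python
sales_by_region = {"East": 100, "West": 200, "North": 50}  # illustrative data

def grand_total(sales, allowed_members, visual_totals):
    """Grand Total as seen by a role restricted to allowed_members.

    With visual totals on, only visible members are aggregated.
    With visual totals off, the total still reflects every member.
    """
    if visual_totals:
        return sum(v for member, v in sales.items() if member in allowed_members)
    return sum(sales.values())

role_members = {"East", "West"}
print(grand_total(sales_by_region, role_members, visual_totals=True))
print(grand_total(sales_by_region, role_members, visual_totals=False))
```

Without visual totals, a restricted user could subtract the visible members from the Grand Total and infer the hidden value, which is the disclosure risk the setting addresses.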

Page 31: Dimensional Modeling for OLAP Cubes

Translations / Cube Localization

• Translations provide a simple, centrally managed mechanism for storing and presenting user interface elements to users in their preferred languages

SOURCE: Microsoft

Page 32: Dimensional Modeling for OLAP Cubes

Demo: Building a Cube

Top Down and Bottom Up

Page 33: Dimensional Modeling for OLAP Cubes

Basic Steps

• New Analysis Services Project in BIDS

• Create Data Source Connection

– Top Down without a source; BIDS creates an empty source schema

– Bottom Up with a populated source

• Create Data Source View

– Define relationships and calculated members

• Define Measures

• Create Dimensions

– Hierarchies and relationships

– Define default members, settings and error handling

• Define Measure Group/Dimension Relationships

• Create Calculations, Named Sets, KPIs and Actions

• Add Perspectives and Translations

• Add Security

• Design Partitions and Aggregation

• Process

• Browse, Review and Revise

Page 34: Dimensional Modeling for OLAP Cubes

Demo: Building a Cube Top Down

Page 35: Dimensional Modeling for OLAP Cubes

Demo: Building a Cube Bottom Up

Page 36: Dimensional Modeling for OLAP Cubes

Additional Resources

Page 37: Dimensional Modeling for OLAP Cubes

Resources

• Microsoft MSDN (msdn.microsoft.com)

• Microsoft TechNet (technet.microsoft.com)

• Kimball Group (www.kimballgroup.com)

• Atre (www.atre.com)

• Microsoft SQL Server 2005 Analysis Services Step by Step

by Reed Jacobson and Stacia Misner

• Microsoft Business Intelligence, January 2007 Presentation by Tom Casey

• Applied Microsoft Analysis Services 2005,

by Teo Lachev

• SQL Server Central (www.sqlservercentral.com)

• Database Journal (www.databasejournal.com)

• SQL Server Magazine (www.sqlmag.com)

• IBM Redbooks

Page 38: Dimensional Modeling for OLAP Cubes

Resources

• Numerous Microsoft Analysis Services White Papers and Webcasts

• http://www.ssas2008-info.com

• Excellent SSAS videos by Craig Utley at http://www.learnmicrosoftbi.com

• http://www.microsoft.com/bi

• http://www.codeplex.com/bidshelper

• http://www.solidq.com/webcasts

• http://channel9.msdn.com/wiki/sqlserver

• http://davefackler.blogspot.com/search/label/SSAS

Page 39: Dimensional Modeling for OLAP Cubes

Analysis Services Changes from 2005

• Issue: The shallow exists function now works differently with named sets that contain enumerated members or cross joins of enumsets.

– Description: In SQL Server 2005 Analysis Services (SSAS), the shallow exists function did not work with named sets that contained enumerated members or cross joins of enumsets.

• Issue: VBA functions handle null values and empty values differently than they were handled in SQL Server 2005 Analysis Services (SSAS).

– Description: In SQL Server 2005 Analysis Services (SSAS), VBA functions returned 0 or an empty string when either null values or empty values were used as arguments. In SQL Server 2008, they will return null.

• Issue: The Migration Wizard will fail because DSO is not installed by default.

– Description: By default, SQL Server 2008 does not install the DSO (Decision Support Objects) backward compatibility component. The backward compatibility package is installed by default but the DSO component of the package will be disabled. Since the SQL Server Analysis Services Migration Wizard relies on this component, it will fail unless the component is installed.

• Issue: It is not recommended to put the partition location in the Data folder.

– Description: The server manages the Data folder and creates or drops folders as objects are created, deleted, and altered. Therefore, specifying a partition storage location inside the Data folder is strongly discouraged, especially in the subfolders for databases, cubes, and Dimensions. Restore or Sync will require that you move partition storage locations outside the Data folder.

• Issue: You might get unexpected results for queries that use the "EXISTING" MDX keyword in ProClarity Analytics Server and Microsoft Office PerformancePoint Server 2007.

– Description: ProClarity Analytics Server and Microsoft Office PerformancePoint Server 2007 use the EXISTING keyword in MDX incorrectly in certain scenarios. Due to changes made in SQL Server 2008 Analysis Services, these queries might return unexpected results.

• Issue: Calculation precedence rules

– Description: Calculation precedence rules have changed from previous versions of Analysis Services. Because of the above custom rollup, expressions may return different results than with previous versions of Analysis Services.

SOURCE: SQL Server 2008 Books Online

***This is a partial listing. Refer to SQL Server 2008 Books Online for a complete listing!!!***

Page 40: Dimensional Modeling for OLAP Cubes

Analysis Services Discontinued Features

• Minimal feature removals from Analysis Services 2005 to 2008

• Analysis Services 2000 to 2005 has a much longer list of removed and changed features due to the major architectural changes between those versions

• If concurrently upgrading SQL Server 2008 relational data sources

– Related but different topic with a lot of great resources to reference

– There will be many other items to check and additional upgrade tasks

• Category: Tools

– Discontinued Feature: Surface Area Configuration Tool

– Comments: The Surface Area Configuration Tool is discontinued for SQL Server 2008. For more information, see Backward Compatibility

SOURCE: SQL Server 2008 Books Online

***This is a partial listing. Refer to SQL Server 2008 Books Online for a complete listing***
