Dimensional Modeling for OLAP Cubes
How to Build
Analysis Services Cubes
2005 / 2008
Jen Underwood
MCITP, MCDBA, MCSD
April 25, 2009
Session Agenda
• Roadmaps and Resources
• Dimensional Modeling and Data Marts
– Basic Concepts
– Common Mistakes
• Microsoft Analysis Services
– What, Why and How
– Architecture
– Features
© 2009 by Jen Underwood. All rights reserved.
• Demo: Building a Cube Top Down and Bottom Up
– Unified Dimensional Model (UDM) and Data Source Views (DSV)
– Facts, Dimensions and Hierarchies
– Calculations and Named Sets
– Perspectives
– Not Covered: Data Mining, Partitioning, Aggregation Design and Processing
• Additional Resources
Roadmaps and Resources
Microsoft Solution Framework BI Roadmap
• Great implementation kit that can be downloaded for free from Microsoft
• Sample project requirements templates, basic project schedule and set up guides
SOURCE: Microsoft Performance Point Methodology Kit
Microsoft Solution Framework BI Roadmap
SOURCE: Revised from the Microsoft Performance Point Methodology Kit
Kimball Data Warehouse Toolkit Roadmap
• Kimball Group Microsoft Data Warehouse and ETL toolkit books are excellent resources
• Kimball toolkits also contain many useful templates and utilities
SOURCE: Kimball Data Warehouse Toolkit Books
Moss Atre BI Roadmap
• Business Intelligence Roadmap by Larissa Moss
and Shaku Atre is also a great resource
• Moss/Atre’s CD includes checklists and an
extremely comprehensive, detailed project plan
Detailed Entry and Exit Criteria by Phase
SOURCE: Business Intelligence Roadmap, Larissa Moss and Shaku Atre
http://www.atre.com
Dimensional Modeling
Basic Dimensional Modeling
• A data modeling technique for data warehouses
• An Analysis Services cube is a logical structure
defined off of a Dimensional data model
• Cubes contain Dimensions and Facts
– Facts: numeric values, measures and calculations
– Dimensions: Attributes, hierarchies and properties
• Dimensional modeling is different from transactional modeling
SOURCE: SQL Server Central
• There is a learning curve!!!
– Expect to experiment
– Be ready for multiple data model revisions
– Excel Pivot tables can aid in testing design concepts
• Excellent resources for further self study
– Kimball Group books, website and numerous
Dimensional design tips
– IBM's free Dimensional Modeling Redbook guides
Relational vs. Dimensional Data Modeling
Relational
• Data is normalized
• Many tables and relationships
• Volatile data changes
• SQL is used to manipulate data
• Basic OLTP reports
• Data designed for business systems
• Data designed for concurrent handling of many small transactions by many users
• User is usually constrained by an application that understands the data design
• Users are typically operational staff
Dimensional
• Data is more de-normalized
• Several Fact tables related to Dimensions
• Non-volatile data
• MDX is used to manipulate data
• Interactive, drag and drop OLAP reports
• Data designed for analysis systems
• Suited for analyzing large amounts of data by a modest number of users
• Designed for do-it-yourself data analysis
• Users are typically analysts, company strategists, or executives
Dimensional Modeling Design Process
[Figure: design process diagram, including the label "Lowest level of Fact detail"]
SOURCE: IBM Dimensional Design Redbook
Kimball Data Warehouse Bus Architecture
• Data warehouse architecture that uses conformed / shared Dimensions across
business processes
• Matrix can assist in planning, visualizing, designing and validating
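As a rough illustration of the matrix idea, the sketch below models a bus matrix as a mapping from business process to the Dimensions it uses. The process and Dimension names are hypothetical, chosen only for the example; Dimensions that appear under more than one process are the candidates for conforming.

```python
from collections import defaultdict

# Hypothetical bus matrix: business process -> Dimensions it uses.
BUS_MATRIX = {
    "Orders": {"Date", "Customer", "Product"},
    "Shipments": {"Date", "Customer", "Product", "Carrier"},
    "Inventory": {"Date", "Product", "Warehouse"},
}


def conformed_dimensions(matrix):
    """Return the Dimensions used by more than one business process."""
    usage = defaultdict(set)
    for process, dims in matrix.items():
        for dim in dims:
            usage[dim].add(process)
    # Keep only Dimensions shared across processes, with a sorted process list.
    return {dim: sorted(procs) for dim, procs in usage.items() if len(procs) > 1}
```

Calling `conformed_dimensions(BUS_MATRIX)` lists each shared Dimension with the processes that would need to agree on its keys and attributes.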
Stars and Snowflakes
Star
– A database schema for representing multidimensional data
– Simplest form of data warehouse schema with one or more Dimensions and Facts
– Hierarchies are stored “flattened” in the Dimension table
Snowflake
– A star schema further normalized through the use of referenced “outrigger” tables
– Increased number of joins can impact performance; there are design tradeoffs between redundant storage and performance
– Hierarchies are separated into referenced Dimension tables
SOURCE: LearnDataModeling.com
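A minimal star schema can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and sample rows are assumptions (the "Tokyo" exchange and all volumes are invented), and a real warehouse would sit on SQL Server rather than SQLite. The point is that the flattened hierarchy lets a rollup to any level use a single join.

```python
import sqlite3


def build_star(conn):
    """Create a tiny star: one Fact table keyed to two Dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month TEXT);
        CREATE TABLE dim_exchange (exchange_key INTEGER PRIMARY KEY,
                                   region TEXT, country TEXT, exchange TEXT);
        CREATE TABLE fact_volume (date_key INTEGER REFERENCES dim_date,
                                  exchange_key INTEGER REFERENCES dim_exchange,
                                  volume INTEGER);
        INSERT INTO dim_date VALUES (1, 2009, 'April'), (2, 2009, 'May');
        INSERT INTO dim_exchange VALUES
            (1, 'Asia Pacific', 'Japan', 'Jasdaq'),
            (2, 'Asia Pacific', 'Japan', 'Tokyo');
        INSERT INTO fact_volume VALUES (1, 1, 100), (1, 2, 250), (2, 1, 50);
    """)


def volume_by_country(conn):
    """Roll the Fact up to a flattened hierarchy level with a single join."""
    cursor = conn.execute("""
        SELECT e.country, SUM(f.volume)
        FROM fact_volume f JOIN dim_exchange e USING (exchange_key)
        GROUP BY e.country
    """)
    return dict(cursor.fetchall())
```

In a snowflake version, region and country would move to their own referenced tables, and the same rollup would need additional joins.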
Types of Dimensions and Facts
Dimensions
• Regular: Directly relates to a Fact
• Referenced “Outrigger”: Relates indirectly to a Fact (snowflake)
• Parent-Child: Relates to itself
• Many-to-Many: N:1 key mapping to a Fact, overlapping in time
– Bank Account, Diagnosis
• Role Playing: One Dimension with multiple roles
– Order Date, Ship Date
• Junk: Contains low cardinality flags, codes
• Degenerate: 1:1 attribute put in the Fact
– Order Number
• Account: Special type with financial account aggregation intelligence
– Income, Expense
Facts
• Regular Fact: Measure is additive
– Sales Amount
• Fact-less Fact: Many-to-Many Dimension
– Event Tracking, Mappings and Unnatural Hierarchies
• Semi-additive: Snapshots that use an aggregation function (Count, LastNonEmpty, Max, Min, etc.)
– Average Daily Balance, Inventory
• Non-additive: Measure is not aggregated
– Ratio, Average, Computation
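The difference between an additive and a semi-additive measure can be sketched in a few lines. The snapshot data below is hypothetical, and the function only approximates the idea behind an aggregation such as LastNonEmpty (take the last recorded value over time, then sum across accounts); it is not how Analysis Services implements it.

```python
# Hypothetical balance snapshots: (account, day, balance).
SNAPSHOTS = [
    ("A", 1, 100.0), ("A", 2, 120.0),
    ("B", 1, 50.0), ("B", 3, 70.0),
]


def closing_balance(snapshots):
    """Semi-additive rollup: last recorded balance per account over time,
    then a plain sum across accounts."""
    last = {}
    for account, day, balance in sorted(snapshots, key=lambda s: s[1]):
        last[account] = balance  # later days overwrite earlier ones
    return sum(last.values())


def naive_sum(snapshots):
    """Treating the balance as fully additive double counts it over time."""
    return sum(balance for _, _, balance in snapshots)
```

With this data the semi-additive total is 190.0, while summing every snapshot gives a misleading 340.0.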
Hierarchies
• Identified by how the Fact measures will be analyzed, drilled or pivoted
– Display Total Volume “By Region, Country down to Exchange”
• Categories or paths to summarize Dimension attributes
– “Exchanges” Region > Country > Exchange // Asia Pacific > Japan > Jasdaq
– “Calendar Date” Year > Quarter > Month > Date // 2009 > Q2 > April > April 25, 2009
• Natural and Unnatural Hierarchies
– Natural hierarchy attributes are related between each level; “Exchanges” and “Calendar Date” are Natural
– Unnatural hierarchies do not have relationships between levels,
an example might be custom defined client groupings
• Hierarchy attribute relationships are critical for
Analysis Services cube performance
• BIDS 2008 has new best practices alerts to aid
dimension and attribute relationship design
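One way to test whether a candidate hierarchy is natural is to check that every member of a level has exactly one parent in the level above. The sketch below does this over plain Python dictionaries; the row layout, the "Tokyo" exchange and the client grouping data are assumptions made for the example.

```python
def is_natural(rows, levels):
    """Return True if each member of a level maps to exactly one parent
    in the level above. `levels` is ordered from top to bottom."""
    for parent, child in zip(levels, levels[1:]):
        parents_of = {}
        for row in rows:
            parents_of.setdefault(row[child], set()).add(row[parent])
        if any(len(parents) > 1 for parents in parents_of.values()):
            return False
    return True


EXCHANGES = [
    {"region": "Asia Pacific", "country": "Japan", "exchange": "Jasdaq"},
    {"region": "Asia Pacific", "country": "Japan", "exchange": "Tokyo"},
]

# Hypothetical custom grouping: the same client sits in two groups.
CLIENT_GROUPS = [
    {"group": "Key Accounts", "client": "Acme"},
    {"group": "Watch List", "client": "Acme"},
]
```

A check like this can be run against Dimension source data before defining attribute relationships, since a relationship declared on data that violates it will not behave as expected.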
Surrogate Keys
• Critical, data warehouse created, artificial keys that keep Dimension member keys independent from source system natural keys
• Surrogate keys are NOT the source system natural primary key
• Data warehouse Dimension-Fact relationships are based on surrogate keys
• Allows the data warehouse to be flexible by addressing source system natural key changes as attribute changes
• Also handles unknown natural key situations elegantly
• Implemented in an ETL “surrogate key pipeline” that uses lookups to replace natural keys with a data warehouse surrogate key each time either a Dimension record or a Fact record is loaded
SOURCE: InformationManagement.com
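The lookup step of a surrogate key pipeline can be sketched as follows. The class name, the field names and the use of -1 for the unknown member are all assumptions for illustration; in practice this logic usually lives in the ETL tool's lookup components rather than in hand-written code.

```python
UNKNOWN_KEY = -1  # assumed convention for the "unknown" Dimension member


class SurrogateKeyLookup:
    """Maps source natural keys to warehouse-generated surrogate keys."""

    def __init__(self):
        self._keys = {}
        self._next = 1

    def assign(self, natural_key):
        """Dimension load: return the existing key or mint a new one."""
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]

    def resolve(self, natural_key):
        """Fact load: look up the key, falling back to the unknown member."""
        return self._keys.get(natural_key, UNKNOWN_KEY)


def load_facts(source_rows, customers):
    """Swap each row's natural customer key for its surrogate key."""
    return [
        {"customer_key": customers.resolve(row["customer_id"]),
         "amount": row["amount"]}
        for row in source_rows
    ]
```

Routing unmatched natural keys to a dedicated unknown member keeps the Fact row loadable instead of rejecting it.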
Slowly Changing Dimensions
• Dimensions that change over time are called Slowly Changing Dimensions
– Examples could be Customer, Employee, Region
• Various table designs with implementation logic in ETL processes
• Type 1 Overwrite
– Historical values are updated in place (replaced) so comparisons use current data only
– Provides only an AS-IS view of the business
• Type 2 Create another Dimension record as the active record
– Tracks historical data by creating multiple records in the Dimension with separate keys, effective dates and status flags
– Keys referenced in Facts as of the effective date
– Historical values are not replaced and historical comparisons contain valid context
– Provides AS-IS and AS-WAS views of the business
• Type 3 Creating new columns to extend Dimension record
– Can be used to view current and prior values side-by-side in same Dimension row
– History is limited to the number of SCD columns created
• Other Types 4, 6 and Hybrid
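The Type 2 pattern can be sketched in memory as below. The `CustomerRow` layout and the `apply_type2` helper are hypothetical names for illustration; a real implementation would run as set-based ETL against the Dimension table, but the steps are the same: expire the current row, then insert a new version with its own surrogate key.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_key: int                     # surrogate key, one per version
    customer_id: str                      # source natural key
    region: str                           # attribute tracked as Type 2
    effective_from: date
    effective_to: Optional[date] = None   # None while the row is current
    is_current: bool = True


def apply_type2(dim: List[CustomerRow], customer_id: str, region: str,
                as_of: date) -> CustomerRow:
    """Expire the current row if the attribute changed, then add a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.is_current), None
    )
    if current is not None and current.region == region:
        return current  # nothing changed, keep the existing version
    if current is not None:
        current.effective_to = as_of
        current.is_current = False
    new_row = CustomerRow(
        customer_key=max((r.customer_key for r in dim), default=0) + 1,
        customer_id=customer_id,
        region=region,
        effective_from=as_of,
    )
    dim.append(new_row)
    return new_row
```

Facts loaded before the change keep pointing at the expired row's key, which is what preserves the AS-WAS view.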
Common Dimensional Modeling Errors
• Not using Surrogate Keys
• Over Normalizing Dimensions and Facts
– Refer specifically to Kimball Design Tips #25 and #95
• Not implementing Slowly Changing Dimensions
– The Data Warehouse should be able to provide accurate reporting over time
• Summarizing Facts versus keeping the grain at lowest level
– The Data Warehouse should provide access to the most detailed data so it can be rolled
up based on a variety of business questions that may be unknown during design
• Designing departmental versus enterprise wide bus dimensional models
• “Build it and they will come” mentality
– The Data Warehouse should have executive sponsorship
– Design the Data Warehouse with the users around business processes
• Assuming Dimensions and Facts will be populated from a single source system
• Overlooking unknown, error, late arriving and early arriving data handling
Microsoft Analysis Services
Why Analysis Services
• Empowers users with sophisticated ad-hoc reporting and data analysis
• Reduces custom SQL report scripts and development tasks
• Speed!!! OLAP is FAST since it can store pre-calculated values
– Analysis Services can handle extremely large, multi-terabyte databases
– Query processing time can be reduced from minutes, hours or days to seconds
• Analysis Services data sources can be used with many existing tools
– Excel Pivot Tables for analyzing information using business terms
– Non-Microsoft products and platforms
• Cognos, Business Objects and other vendors' business intelligence tools
• Oracle, Teradata, SAP BI, Siebel and many other vendors' databases/data sources
• Easily distribute reports over the intranet or internet
– Excel Services for web based viewing of analysis
– Report Center in SharePoint for centralized report management
– SQL Server Reporting Services for personalized views of data
– PerformancePoint Scorecards and Dashboards
– Embed report viewers in analytics applications and many other possibilities!
Analysis Services Architecture
[Figure: Analysis Services architecture diagram. Labels recovered from the slide:]
• Developer & Admin Tools: Visual Studio 2008/2005, SQL Server 2008/2005 Management Studio
• Client applications
• Access paths: Analysis Services Management Objects, XML for Analysis (XML/A), Driver (XML/A || ODBO), DDL, XML/A (MDX)
• SQL Server 2005 Analysis Services: Multi-Dimensional (MDX) Engine, Data Mining (DMX) Engine, KPI Engine
• Unified Dimensional Model
• Storage Engine: MOLAP, ROLAP, HOLAP
• Local cube
• Data Sources
SOURCE: January 2007 presentation by Tom Casey, Microsoft
Analysis Services Highlighted Features
• Unified Dimensional Model (UDM) “Metadata Layer”
• Data Source Views (DSV)
• Facts, Dimensions and Hierarchies
• Calculations, Named Sets and KPIs
• Perspectives
• Actions
• Role-based Dimensional Cell Security
• Translations / Cube Localization
Analysis Services 2008 New Features
• Cube Performance and Scalability
– Much better cube subspace computation
and MDX query performance
– New MOLAP write-back capability
– Enhanced scale-out and back up
• Improved Cube Design
– Best practices alerts
– Attribute relationship designer
– Aggregations design and
management
– New dynamic named sets
– Personalization extensions
– Improved BIDS usability
SOURCE: SQL Server 2008 Whitepaper: An Introduction to New Data Warehouse Scalability Features in SQL Server 2008 by Eric N. Hanson and others
BIDS Helper for 2005/2008 Cube Design
• FREE! Check it out on CodePlex
• Calculation Helpers
• Delete Unused Aggregations
• Deploy MDX Script
• Dimension Health Check
• Dimension Optimization
• Printer Friendly Dimension Usage
• Smart Diff
• Visualize Attribute Lattice
SOURCE: CodePlex BIDS Helper Project
Unified Dimensional Model (UDM) “Metadata Layer”
• A central metadata repository defining business entities, logic, calculations, and metrics
• UDM has four key elements
– heterogeneous data access
– a rich end-user model
– advanced analytics
– proactive caching
• UDM provides a flexible, role-based security model to cell level granularity
• UDM unites Relational and Dimensional models by combining the best aspects from both
SOURCE: Introducing Microsoft Analysis Services
Calculations and KPIs
• A calculation is an MDX expression or script used to define
– a calculated member
– a named set
– a scoped assignment in a cube
• Calculations are defined not by the data of the cube, but by expressions that can reference other parts of the cube, other cubes, or even information outside Analysis Services
• A Key Performance Indicator (KPI) is a
quantifiable measurement for gauging business
performance
– An Analysis Services KPI provides a visual
representation of KPI metrics over time
– Rather than displaying numbers, visual
representations can be defined to indicate
positive, neutral, or negative current status and
trend progress against goals
SOURCE: MSDN
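The status and trend idea can be sketched outside of MDX. The functions below are a hypothetical illustration, not the Analysis Services KPI API: they map a value and goal to 1 (positive), 0 (neutral) or -1 (negative), assuming a positive goal and an arbitrary 5% tolerance band for the neutral state.

```python
def kpi_status(value, goal, tolerance=0.05):
    """Return 1 (positive), 0 (neutral) or -1 (negative) for value vs. goal.

    Assumes a positive goal; the 5% tolerance band is an arbitrary choice.
    """
    if goal <= 0:
        raise ValueError("goal must be positive")
    ratio = value / goal
    if ratio >= 1.0:
        return 1
    if ratio >= 1.0 - tolerance:
        return 0
    return -1


def kpi_trend(current, previous):
    """Return 1, 0 or -1 depending on movement since the prior period."""
    return (current > previous) - (current < previous)
```

A client tool would then map each of the three values to an indicator such as a traffic light or arrow.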
Perspectives
• A perspective defines a viewable subset of a cube that provides focused, business-specific or application-specific viewpoints on the cube. The perspective controls the visibility of objects that are contained by a cube
• The following objects can be displayed or hidden in a perspective:
– Dimensions
– Attributes
– Hierarchies
– Measure groups
– Measures
– Key Performance Indicators (KPIs)
– Calculations (calculated members, named sets, and script commands)
– Actions
SOURCE: MSDN
Actions
• Actions are used to extend cube functionality
– CommandLine
– Dataset
– Drill through
– Html
– Proprietary
– Report
– Rowset
– Statement
– URL
Dimensional Cell Security
• Analysis Services security is based on and integrated with Microsoft Windows security
• Each role can contain one or more specific user accounts or user groups
• Analysis Services contains role based security at the database and cube levels
• Security can dynamically restrict access to Dimensions and values
• Visual totals allow filtering of aggregate values for a Dimension so that the Grand Total only
displays the aggregate for the visible members for a Dimension
• To apply security down to the cell level, MDX expressions are used
• Customized cube security can be developed using named sets or dynamic SSAS stored
procedures to construct named sets
SOURCE: Microsoft
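The effect of visual totals can be illustrated with a small sketch. The member names and values are hypothetical, and the function is only a model of the behavior described above, not Analysis Services code: with visual totals on, the Grand Total covers only the members a role is allowed to see.

```python
# Hypothetical member values for a Region Dimension.
SALES = {"East": 100, "West": 250, "North": 50}


def grand_total(sales, allowed, visual_totals=True):
    """Total shown to a role that may only see the `allowed` members."""
    if visual_totals:
        # Aggregate only the members visible to the role.
        return sum(v for member, v in sales.items() if member in allowed)
    # Without visual totals the total still reflects every member.
    return sum(sales.values())
```

For a role restricted to East and West, the total is 350 with visual totals and 400 without, so the second case lets a user infer the hidden member's value.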
Translations / Cube Localization
• Translations provide a simple, centrally managed mechanism for storing and presenting user interface elements to users in their preferred languages
SOURCE: Microsoft
Demo: Building a Cube
Top Down and Bottom Up
Basic Steps
• New Analysis Services Project in BIDS
• Create Data Source Connection
– Top Down without a source; BIDS creates an empty source schema
– Bottom Up with a populated source
• Create Data Source View
– Define relationships and calculated members
• Define Measures
• Create Dimensions
– Hierarchies and relationships
– Define default members, settings and error handling
• Define Measure Group/Dimension Relationships
• Create Calculations, Named Sets, KPIs and Actions
• Add Perspectives and Translations
• Add Security
• Design Partitions and Aggregation
• Process
• Browse, Review and Revise
Demo: Building a Cube Top Down
Demo: Building a Cube Bottom Up
Additional Resources
Resources
• Microsoft MSDN (msdn.microsoft.com)
• Microsoft TechNet (technet.microsoft.com)
• Kimball Group (www.kimballgroup.com)
• Atre (www.atre.com)
• Microsoft SQL Server 2005 Analysis Services Step by Step
by Reed Jacobson and Stacia Misner
• Microsoft Business Intelligence, January 2007 Presentation
by Tom Casey
• Applied Microsoft Analysis Services 2005,
by Teo Lachev
• SQL Server Central (www.sqlservercentral.com)
• Database Journal (www.databasejournal.com)
• SQL Server Magazine (www.sqlmag.com)
• IBM Redbooks
Resources
• Numerous Microsoft Analysis Services White Papers and Webcasts
• http://www.ssas2008-info.com
• Excellent SSAS videos by Craig Utley at http://www.learnmicrosoftbi.com
• http://www.microsoft.com/bi
• http://www.codeplex.com/bidshelper
• http://www.solidq.com/webcasts
• http://channel9.msdn.com/wiki/sqlserver
• http://davefackler.blogspot.com/search/label/SSAS
Analysis Services Changes from 2005
Issue Type / Issue Description
• The shallow exists function now works differently with named sets that contain enumerated members or cross joins of enumsets.
– In SQL Server 2005 Analysis Services (SSAS), the shallow exists function did not work with named sets that contained enumerated members or cross joins of enumsets.
• VBA functions handle null values and empty values differently than they were handled in SQL Server 2005 Analysis Services (SSAS).
– In SQL Server 2005 Analysis Services (SSAS), VBA functions returned 0 or an empty string when either null values or empty values were used as arguments. In SQL Server 2008, they will return null.
• The Migration Wizard will fail because DSO is not installed by default.
– By default, SQL Server 2008 does not install the DSO (Decision Support Objects) backward compatibility component. The backward compatibility package is installed by default but the DSO component of the package will be disabled. Since the SQL Server Analysis Services Migration Wizard relies on this component, it will fail unless the component is installed.
• It is not recommended to put the partition location in the Data folder.
– The server manages the Data folder and creates or drops folders as objects are created, deleted, and altered. Therefore, specifying a partition storage location inside the Data folder is strongly discouraged, especially in the subfolders for databases, cubes, and Dimensions. Restore or Sync will require that you move partition storage locations outside the Data folder.
• You might get unexpected results for queries that use the "EXISTING" MDX keyword in ProClarity Analytics Server and Microsoft Office PerformancePoint Server 2007.
– ProClarity Analytics Server and Microsoft Office PerformancePoint Server 2007 use the EXISTING keyword in MDX incorrectly in certain scenarios. Due to changes made in SQL Server 2008 Analysis Services, these queries might return unexpected results.
• Calculation precedence rules
– Calculation precedence rules have changed from previous versions of Analysis Services. Because of the above custom rollup, expressions may return different results than with previous versions of Analysis Services.
SOURCE: SQL Server 2008 Books Online
***This is a partial listing. Refer to SQL Server 2008 Books Online for a complete listing!!!***
Analysis Services Discontinued Features
• Minimal feature removals from Analysis Services 2005 to 2008
• Analysis Services 2000 to 2005 has a much longer list of removed and changed
features due to the major architectural changes between those versions
Category: Tools
Discontinued Feature: Surface Area Configuration Tool
Comments: The Surface Area Configuration Tool is discontinued for SQL Server 2008. For more information, see Backward Compatibility
SOURCE: SQL Server 2008 Books Online
***This is a partial listing. Refer to SQL Server 2008 Books Online for a complete listing***
• If concurrently upgrading SQL Server 2008 relational data sources
– Related but different topic with a lot of great resources to reference
– There will be many other items to check and additional upgrade tasks