ISQS 3358, Business Intelligence
Cubism – Measures and Dimensions
Zhangxi Lin
Texas Tech University

Outline
Where we’ve been
Populating fact tables
Creating a cube with SSAS
Measures
Types of dimensions
Cube design tabs

Structure and Components of Business Intelligence
[Diagram: SSMS, SSIS, SSAS, SSRS, SAS Enterprise Miner (SASEM), SAS Enterprise Guide (SASEG)]

Snowflake Schema of the Data Mart
[Diagram: fact table ManufacturingFact with dimension tables DimProduct, DimProductSubType, DimProductType, DimBatch, DimMachine, DimMachineType, DimMaterial, DimPlant, DimCountry]

Where we’ve been and where we are now
Exercise 1: Getting started
Exercise 2: Creating data marts
Exercise 3: Creating a cube from a data mart
Exercise 4: Populating dimensions of a data mart
Exercise 5: Exploring features of ETL data conversion tasks
Exercise 6: Loading fact tables

What do we need to do with the half-done data mart?
Populate the DimBatch dimension table
Populate the ManufacturingFact table
Build an OLAP cube (we already did this before)
Check measures
Check dimensions

LOADING FACT TABLES

Exercise 6: Loading Fact Tables
Project name: MMMFactLoad-lastname
Package name: FactLoad.dtsx
Tasks
◦ Create the Inventory Fact table
◦ Load Dim Batch
◦ Load Manufacturing Fact
◦ Load Inventory Fact
Deliverable: email a screenshot of the “green” (successful) outcome of the ETL project to [email protected]

Inventory Fact Table
Create a table InventoryFact in the database MaxMinManufacturingDM-lastname.
◦ Compound primary key: DateOfInventory, ProductCode, and Material
◦ Define two foreign keys

Column Name          Data Type      Allow Nulls
InventoryLevel       Int            No
NumberOnBackorder    Int            No
DateOfInventory      Datetime       No
ProductCode          Int            No
Material             Varchar(30)    No
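As a rough sketch of the table definition above, the snippet below uses Python's standard-library sqlite3 module (an assumption; the exercise itself targets SQL Server) to create InventoryFact with the compound primary key and two foreign keys, then shows which inserts the constraints reject. The slide does not name the foreign-key targets, so DimProduct(ProductCode) and DimMaterial(Material), like the sample values, are hypothetical stand-ins.

```python
import sqlite3

def create_inventory_fact(conn: sqlite3.Connection) -> None:
    """Create InventoryFact with a compound primary key and two foreign keys."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    # Hypothetical parent tables standing in for the two foreign-key targets.
    conn.execute("CREATE TABLE DimProduct (ProductCode INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE DimMaterial (Material VARCHAR(30) PRIMARY KEY)")
    conn.execute(
        """
        CREATE TABLE InventoryFact (
            InventoryLevel    INTEGER     NOT NULL,
            NumberOnBackorder INTEGER     NOT NULL,
            DateOfInventory   DATETIME    NOT NULL,
            ProductCode       INTEGER     NOT NULL REFERENCES DimProduct (ProductCode),
            Material          VARCHAR(30) NOT NULL REFERENCES DimMaterial (Material),
            PRIMARY KEY (DateOfInventory, ProductCode, Material)
        )
        """
    )

def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO InventoryFact VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
create_inventory_fact(conn)
conn.execute("INSERT INTO DimProduct VALUES (1)")
conn.execute("INSERT INTO DimMaterial VALUES ('Clay')")  # hypothetical sample values

first = try_insert(conn, (120, 4, "2009-01-05", 1, "Clay"))     # accepted
duplicate = try_insert(conn, (95, 0, "2009-01-05", 1, "Clay"))  # same compound key
orphan = try_insert(conn, (10, 0, "2009-01-05", 99, "Clay"))    # no DimProduct 99
```

The compound key lets the same product and material appear on many dates while still preventing two snapshots for the same (date, product, material) combination.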

Data Sources for Loading Facts
For loading the DimBatch table and the ManufacturingFact table
◦ BatchInfo.CSV
For loading the InventoryFact table
◦ OREDB.OrderProcessingSystem.Inventory

Control Flow for Loading Facts and the Remaining Dimension
Note: to ease debugging, you may use three packages and test them one by one instead of doing everything in one package.

Flat File Connection
Data types
◦ BatchNumber, MachineNumber: four-byte signed integer [DT_I4]
◦ ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4]
◦ TimeStarted, TimeStopped: database timestamp [DT_DBTIMESTAMP]
For Dim Batch, select only BatchNumber as input.
All columns are needed for the fact table.

Load DimBatch Data Flow

Load DimBatch Data Flow
Note: Because the source file contains duplicate BatchNumber values, insert an Aggregate transformation after the Flat File Source.

The Flat File Source

Sort Transformation
In the Aggregate transformation, set the operation for BatchNumber to “Group by”.
In the Derived Column transformation, define BatchName from BatchNumber.
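The Aggregate, Sort, and Derived Column steps can be mimicked in plain Python to see what each one contributes. This is a sketch under assumptions: the CSV text is a hypothetical excerpt of BatchInfo.CSV, and the BatchName format ("Batch " followed by the number) is invented for illustration, since the slides do not give the Derived Column expression.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical excerpt of BatchInfo.CSV (the real file has more columns).
SAMPLE = """BatchNumber,MachineNumber,ProductCode
1001,1,10
1001,2,10
1002,1,11
1000,3,12
"""

def load_dim_batch(csv_text: str) -> List[Tuple[int, str]]:
    reader = csv.DictReader(io.StringIO(csv_text))
    # Aggregate, "Group by" BatchNumber: keep one row per distinct batch number.
    distinct_batches = {int(row["BatchNumber"]) for row in reader}
    # Sort transformation: order the distinct batch numbers ascending.
    ordered = sorted(distinct_batches)
    # Derived Column: build BatchName from BatchNumber (hypothetical format).
    return [(number, f"Batch {number}") for number in ordered]

dim_batch = load_dim_batch(SAMPLE)
print(dim_batch)
```

Without the group-by step, batch 1001 would reach the destination twice; assuming BatchNumber is the primary key of DimBatch, the second row would fail the insert.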

Load Fact Data Flow

Derived Columns for the Fact Table

Expressions for the Derived Columns
AcceptedProducts
◦ [NumberProduced] - [NumberRejected]
ElapsedTimeForManufacture
◦ DATEDIFF("mi", [TimeStarted], [TimeStopped])
DateOfManufacture
◦ (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR, 25)[TimeStarted], 1, 10)
This expression converts TimeStarted into a string and selects the first ten characters of that string (the date portion). That string is then converted back into a date time without the time portion.
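To check what the three expressions produce for a given batch, here is a Python sketch of the same logic. It assumes, as we understand SSIS's DATEDIFF (like T-SQL's), that DATEDIFF("mi", ...) counts minute boundaries crossed rather than complete minutes, so both timestamps are truncated to the minute first; the function names and sample times are hypothetical.

```python
from datetime import datetime

def accepted_products(number_produced: int, number_rejected: int) -> int:
    # AcceptedProducts: [NumberProduced] - [NumberRejected]
    return number_produced - number_rejected

def elapsed_minutes(time_started: datetime, time_stopped: datetime) -> int:
    # ElapsedTimeForManufacture: DATEDIFF("mi", ...) counts minute boundaries
    # crossed, so both timestamps are truncated to the minute before subtracting.
    start = time_started.replace(second=0, microsecond=0)
    stop = time_stopped.replace(second=0, microsecond=0)
    return int((stop - start).total_seconds() // 60)

def date_of_manufacture(time_started: datetime) -> datetime:
    # DateOfManufacture: keep the first 10 characters ("yyyy-mm-dd") of the
    # string form and parse them back, which drops the time portion.
    return datetime.strptime(time_started.isoformat(sep=" ")[:10], "%Y-%m-%d")

started = datetime(2009, 1, 5, 8, 0, 59)  # hypothetical batch times
stopped = datetime(2009, 1, 5, 9, 30, 0)
print(accepted_products(100, 7), elapsed_minutes(started, stopped), date_of_manufacture(started))
```

Note the boundary-counting effect: the real elapsed time here is 89 minutes and 1 second, but the minute count is 90 because the hour crosses 90 minute boundaries.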

OLE DB Destination
For loading the fact table

Load Inventory Fact
OLE DB Source
◦ OrderProcessingSystem.InventoryFact
OLE DB Destination
◦ MaxMinManufacturingDM-lastname.InventoryFact
No transformation
There are two ways to load the table:
◦ Create the table and use ETL to load it
◦ Import directly from the source into the database MaxMinManufacturingDM-lastname

Debugging Results
[Screenshots: loading DimBatch; loading ManufacturingFact]

BUILDING AN OLAP CUBE

Three Steps to Create a Cube from Data Sources
Defining a data source
Defining a data source view
◦ Add three new columns (year, quarter, and month) to each of the two fact tables
Building a cube
◦ Define a new dimension, Dim Time, from the ManufacturingFact table
Customize the cube:
◦ Link the two fact tables in the cube
◦ Define a new primary key for Dim Time
◦ Define calculated measures
◦ Relate dimensions to measures

T-SQL Expressions for DS View Definition - Manufacture
YearOfManufacture
    CONVERT(char(4), YEAR(DateOfManufacture))
QuarterOfManufacture
    CONVERT(char(4), YEAR(DateOfManufacture)) +
    CASE WHEN MONTH(DateOfManufacture) BETWEEN 1 AND 3 THEN 'Q1'
         WHEN MONTH(DateOfManufacture) BETWEEN 4 AND 6 THEN 'Q2'
         WHEN MONTH(DateOfManufacture) BETWEEN 7 AND 9 THEN 'Q3'
         ELSE 'Q4'
    END
MonthOfManufacture
    CONVERT(char(4), YEAR(DateOfManufacture)) +
    RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfManufacture)), 2)

T-SQL Expressions for DS View Definition - Inventory
YearOfInventory
    CONVERT(char(4), YEAR(DateOfInventory))
QuarterOfInventory
    CONVERT(char(4), YEAR(DateOfInventory)) +
    CASE WHEN MONTH(DateOfInventory) BETWEEN 1 AND 3 THEN 'Q1'
         WHEN MONTH(DateOfInventory) BETWEEN 4 AND 6 THEN 'Q2'
         WHEN MONTH(DateOfInventory) BETWEEN 7 AND 9 THEN 'Q3'
         ELSE 'Q4'
    END
MonthOfInventory
    CONVERT(char(4), YEAR(DateOfInventory)) +
    RIGHT('0' + CONVERT(varchar(2), MONTH(DateOfInventory)), 2)
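The manufacture and inventory expressions build the same three labels from a date. A small Python sketch of that labeling (the function names are hypothetical) makes the expected values easy to check, e.g. "2009", "2009Q1", and "200907".

```python
from datetime import date

def year_label(d: date) -> str:
    # CONVERT(char(4), YEAR(d))
    return f"{d.year:04d}"

def quarter_label(d: date) -> str:
    # Year followed by 'Q1'..'Q4', mirroring the CASE ... BETWEEN branches.
    if 1 <= d.month <= 3:
        quarter = "Q1"
    elif 4 <= d.month <= 6:
        quarter = "Q2"
    elif 7 <= d.month <= 9:
        quarter = "Q3"
    else:
        quarter = "Q4"
    return year_label(d) + quarter

def month_label(d: date) -> str:
    # Year + RIGHT('0' + month, 2): the month is zero-padded to two digits.
    return year_label(d) + f"{d.month:02d}"

print(year_label(date(2009, 7, 4)), quarter_label(date(2009, 7, 4)), month_label(date(2009, 7, 4)))
```

The zero padding matters because the labels are strings: "200902" sorts before "200910", whereas an unpadded "20092" would sort after "200910".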

Data Source View
[Screenshot callout: the new columns]

Select Measures Page
Uncheck the ManufacturingFact Count measure.

Review New Dimensions Page
Rename the ManufacturingFact dimension to Dim Time.

The finished cube
[Screenshot callout: the new dimension created from the fact table]

Cube Structure

MEASURES

Facts
Facts are measurements associated with a specific business process.
Types of measures
◦ Most facts are additive: they can be summed along every dimension. Others are semi-additive (they can be summed along some dimensions but not others; for example, an inventory level can be summed across products but not across dates), non-additive (such as max or average), or descriptive (e.g., a factless fact table).
Many facts can be derived from other facts, so non-additive facts can be avoided by calculating them from additive facts.
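A worked example of why non-additive facts should be derived from additive ones: a rejection rate computed from summed counts weights each batch by its size, while averaging stored per-batch rates does not. The batch figures below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical batches: (NumberProduced, NumberRejected)
batches: List[Tuple[int, int]] = [(1000, 10), (100, 5)]

def percent_rejected_from_sums(rows: List[Tuple[int, int]]) -> float:
    # Sum the additive facts first, then divide once.
    produced = sum(p for p, _ in rows)
    rejected = sum(r for _, r in rows)
    return rejected / produced

def average_of_percentages(rows: List[Tuple[int, int]]) -> float:
    # Averaging a stored non-additive ratio ignores batch sizes.
    return sum(r / p for p, r in rows) / len(rows)

print(percent_rejected_from_sums(batches))  # 15 / 1100
print(average_of_percentages(batches))      # (0.01 + 0.05) / 2
```

The first result (about 1.4%) is the true overall rate; the second (3%) overstates it because the small batch counts as much as the large one.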

Calculated measures
The definition of a calculated measure is stored in the OLAP cube itself. The actual values are not calculated, however, until a query containing that calculated measure is executed. The results of that calculation are then cached in the cube, and the cached value is delivered to any subsequent users requesting the same calculation.
Calculation expressions are written in Multidimensional Expressions (MDX). MDX is different from T-SQL: it is designed for multidimensional data and has features for the advanced mathematics and formulas required by OLAP analysis that T-SQL lacks.
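The evaluate-on-first-query-then-cache behaviour can be illustrated with a small Python analogy. This is not how SSAS implements its formula cache; the class, the slice key, and the sample totals are all hypothetical. The only point is that the stored definition runs once per distinct request and later requests reuse the result.

```python
from typing import Callable, Dict, Hashable

Facts = Dict[str, float]

class CalculatedMeasure:
    """Analogy only: a stored formula that is evaluated on first query and cached."""

    def __init__(self, name: str, formula: Callable[[Facts], float]) -> None:
        self.name = name
        self.formula = formula  # the stored definition, like an MDX expression
        self._cache: Dict[Hashable, float] = {}
        self.evaluations = 0    # how many times the formula actually ran

    def query(self, slice_key: Hashable, facts: Facts) -> float:
        # Nothing is computed until a query asks for this slice.
        if slice_key not in self._cache:
            self.evaluations += 1
            self._cache[slice_key] = self.formula(facts)
        # Later requests for the same slice reuse the cached value.
        return self._cache[slice_key]

percent_rejected = CalculatedMeasure(
    "Percent Rejected", lambda f: f["NumberRejected"] / f["NumberProduced"]
)
facts_2009 = {"NumberProduced": 1100.0, "NumberRejected": 15.0}  # hypothetical totals
first = percent_rejected.query(("2009",), facts_2009)
second = percent_rejected.query(("2009",), facts_2009)
```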

Define the format string "#,#" for the measures AcceptedProducts and RejectedProducts.

[Screenshot: defining a format string]


Define Calculated Measures

DIMENSIONS

Managing Dimensions

Managing Dimensions (continued)

Relating Dimensions to Measure Groups

Completed Dimension Definitions

Types of Dimensions
Fact dimensions: dimensions created from attributes in a fact table
Parent-child dimensions: built on a table containing a self-referential relationship, such as a parent attribute
Role-playing dimensions: related to the same measure group multiple times; each relationship represents a different role the dimension plays. For example, a time dimension can play three different roles: date of sale, date of shipment, and date of payment
Reference dimensions: not related directly to the measure group, but to another regular dimension that is in turn related to the measure group
Data mining dimensions: built from the information discovered by data mining
Many-to-many dimensions: e.g., multiple ship-to addresses
Slowly changing dimensions (SCD)
◦ Type 1 SCD: no history is tracked
◦ Type 2 SCD: the entire history is tracked by adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status
◦ Type 3 SCD: similar to Type 2, but only the current state and the original state are tracked, using two additional attributes: SCD Start Date, SCD Initial Value
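To contrast Type 1 and Type 2 slowly changing dimensions, here is a minimal Python sketch. It assumes a product dimension whose tracked attribute is a name; the row class, the "Current"/"Expired" status labels, the surrogate-key scheme, and the sample values are hypothetical. Only the four Type 2 attributes (Original ID, Start Date, End Date, Status) come from the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class ProductRow:
    surrogate_key: int
    original_id: str          # SCD Original ID: the business key from the source
    product_name: str         # hypothetical tracked attribute
    start_date: date          # SCD Start Date
    end_date: Optional[date]  # SCD End Date (None while the row is current)
    status: str               # SCD Status: "Current" or "Expired" (hypothetical labels)

def find_current(rows: List[ProductRow], original_id: str) -> Optional[ProductRow]:
    return next((r for r in rows if r.original_id == original_id and r.status == "Current"), None)

def apply_type1_change(rows: List[ProductRow], original_id: str, new_name: str) -> None:
    # Type 1: overwrite in place; no history survives.
    for r in rows:
        if r.original_id == original_id:
            r.product_name = new_name

def apply_type2_change(rows: List[ProductRow], original_id: str,
                       new_name: str, change_date: date) -> None:
    current = find_current(rows, original_id)
    if current is not None and current.product_name == new_name:
        return  # no change, nothing to record
    if current is not None:
        # Type 2: close out the old version instead of overwriting it.
        current.end_date = change_date
        current.status = "Expired"
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(ProductRow(next_key, original_id, new_name, change_date, None, "Current"))

history: List[ProductRow] = []
apply_type2_change(history, "P-10", "Widget", date(2009, 1, 1))
apply_type2_change(history, "P-10", "Widget Pro", date(2009, 6, 1))
apply_type2_change(history, "P-10", "Widget Pro", date(2009, 7, 1))  # unchanged, ignored

snapshot = [ProductRow(1, "P-20", "Gadget", date(2009, 1, 1), None, "Current")]
apply_type1_change(snapshot, "P-20", "Gadget Plus")
```

After these calls the Type 2 history holds two versions of P-10, while the Type 1 table still holds a single row with only the new name.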

CUBE DESIGN TABS

Understanding the Cube Designer Tabs
Cube Structure: Use this tab to modify the architecture of a cube.
Dimension Usage: Use this tab to define the relationships between dimensions and measure groups, and the granularity of each dimension within each measure group.
Calculations: Use this tab to examine calculations that are defined for the cube, to define new calculations for the whole cube or for a subcube, to reorder existing calculations, and to debug calculations step by step by using breakpoints.
KPIs: Use this tab to create and modify the Key Performance Indicators (KPIs) in a cube.
Actions: Use this tab to create or modify drillthrough, reporting, and other actions for the selected cube.
Partitions: Use this tab to create and manage the partitions for a cube. Partitions let you store sections of a cube in different locations with different properties, such as aggregation definitions.
Perspectives: Use this tab to create and manage the perspectives in a cube. A perspective is a defined subset of a cube and is used to reduce the perceived complexity of a cube for the business user.
Translations: Use this tab to create and manage translated names for cube objects, such as month or product names.
Browser: Use this tab to view data in the cube.

ISQS 6339, Data Mgmt & Business Intelligence

Key Performance Indicators (KPIs)
[Screenshots: digital dashboard; creating a KPI]

The MDX Expression for the KPI Status (MaxMinManufacturingDM)
Case
  When ROUND([Measures].[percent Rejected],4) < 0.0103
    Then 1
  When ROUND([Measures].[percent Rejected],4) >= 0.0103 AND
       ROUND([Measures].[percent Rejected],4) < 0.0104
    Then .5
  When ROUND([Measures].[percent Rejected],4) >= 0.0104 AND
       ROUND([Measures].[percent Rejected],4) < 0.0105
    Then 0
  When ROUND([Measures].[percent Rejected],4) >= 0.0105 AND
       ROUND([Measures].[percent Rejected],4) < 0.0106
    Then -.5
  Else -1
End
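The status expression maps the rounded rejection rate onto five bands (1, .5, 0, -.5, -1 for successively worse rates). A Python sketch of that mapping, assuming half-open bands [0.0103, 0.0104), [0.0104, 0.0105), and [0.0105, 0.0106), makes it easy to test which status a given rate receives. Python's round() resolves exact halves by round-half-to-even, which may differ from MDX ROUND for values sitting exactly on a rounding boundary.

```python
def kpi_status(percent_rejected: float) -> float:
    # Round to 4 decimals first, as the MDX expression does.
    r = round(percent_rejected, 4)
    if r < 0.0103:
        return 1.0    # good
    if r < 0.0104:
        return 0.5
    if r < 0.0105:
        return 0.0    # neutral
    if r < 0.0106:
        return -0.5
    return -1.0       # bad

for rate in (0.0100, 0.01033, 0.01043, 0.01052, 0.0110):
    print(rate, kpi_status(rate))
```

Because each band only needs an upper bound once the lower bands have been ruled out, the chained comparisons are equivalent to the paired >= / < tests in the MDX Case.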

[Screenshot: calculated measure]

[Screenshot: KPI definition and deployment]

KPI Browser
[Screenshot: browser view]

Actions
Instructions stored inside the cube
Allow OLAP cubes to “reach out and touch someone”
Enable us to define commands, statements, and directives that are executed outside of the cube
Linked to certain objects in the cube and offered as a menu when a user is browsing those objects; the user can select one of these actions to accomplish a task

Types of Actions
Action
◦ Dataset
◦ Proprietary
◦ Rowset: retrieves a rowset
◦ Statement
◦ URL
Drillthrough action: defines a dataset to be returned as a drillthrough to a more detailed level
Report action: launches a SQL Server 2005 Reporting Services report

Defining Actions

Perspectives

Translations

Q & A
Conceptual level
◦ What is the rationale behind the structure of “Data Source”, “Data Source View”, and “Cube”?
◦ Why is the time dimension so important in a data mart?
◦ Why are multi-level dimensions, such as Material-MachineType-Machine in MaxMinManufacturingDM, useful?
◦ Why do you need to change the primary key of DimTime after it was created from the ManufacturingFact table?
◦ Can you summarize the main differences between a regular database design and a data mart design?
Technical level
◦ After you make changes in a data source node, why do you have to check “Mapping” in the data destination node again?
◦ When there is a red wavy line under an object, such as a table, during cube design, what does it imply? How can you resolve it? Specifically, when a fact table has such a problem, how could it be fixed?
◦ Why don’t all dimensions appear in the cube structure diagram?
◦ What is the difference between variable names in the format Name and [Name]?
◦ Do you understand the parameters configured in the data flow tasks, such as those in the data sources, data destinations, Aggregate node, Derived Column node, etc.?
Any other questions?

Data Mart Application Development Debugging
Problem 0: You cannot find your database entry.
Problem 1: The source node is red after running a data flow task
◦ Causes?
Problem 2: The destination node is red after running a data flow task
◦ Causes?
Problem 3: Even though you redefined the source node, the problem remains.
Open problems
◦ What are frequently encountered problems in ETL application implementation?
◦ What problems did you encounter in building a cube?