Dimensional Fact Model @ BI Academy - 2016


Page 1: Dimensional Fact Model @ BI Academy - 2016

Dimensional Fact Model

Stuttgart, 9/3/2016

Stefano Cazzella @StefanoCazzella

http://caccio.blogdns.net

http://bimodeler.com

stefano.cazzella{at}gmail.com

BI ACADEMY Conference - Stuttgart, 9/3/2016 - Stefano Cazzella

Page 2: My Professional Timeline

Timeline from 2001 to 2015:

• Master's degree in Software Engineering, University of Rome «La Sapienza»
• Business Intelligence Specialist
• Project Manager
• Business Consultant
• Delivery Manager
• Methodology: industrialization of the delivery phase

Companies: Datamat S.p.A. (a Finmeccanica company); Sopra Steria Group (Consulting – IT Services – Software Solutions)

Page 3: BI Trends

Figure: business value growing over time through successive stages:

• Data Integration: Data Warehouse
• Descriptive: Business Intelligence
• Predictive: Simulation & forecasting
• Prescriptive: Optimization & automation
• Deep learning: Semantic & AI

Drivers: digital transformation of every market; data explosion (exponential growth of digital data).

Page 4: Disruptive scenario

Innovative technologies
• Internet of Things
• Big Data
• Distributed computing
• In-memory systems
• Cloud
• Mobile
• … more …

Complex architectures
• Data federation
• Data store
• NoSQL
• Distributed file system
• Appliances
• Real-time data integration
• … more …

Business transformations
• Frenetic time-to-market
• API / service economy
• Data-driven company
• Business process automation
• … more …

Page 5: Waterfall Process vs. Iterative Process

Figure: the Business, Design and Build layers in a waterfall process and in an iterative process, with the roles involved: Business Analyst, Engineer, Technician and Data Scientist. New processes? New roles?

Page 6: Project Layers for a Data Mart

• Business: Dimensional Fact Model
• Design: relational model
• Build: DBMS-specific DDL

Page 7: Why the Dimensional Fact Model?

1. Formal language: a well-specified syntax and an unequivocal interpretation (semantics) based on a sound algebraic definition
2. Simple and effective graphical notation (representation)
3. Does not imply any technical or implementation choice
4. Specifically designed to represent multidimensional models

Page 8: Multi-dimensional model

The SALES event: on Nov. 25th, 2014, Store 2 sold 10 pieces of Product X for a total revenue of € 220.

Figure: the 3-dimensional SALES hyper-space, with dimensions Product (X, Y, Z), Store (1, 2, 3) and Day; the event above is the cell (Product X, Store 2, Nov. 25th, 2014), holding the measures Units sold = 10 pieces and Revenue = € 220.
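The hyper-space view can be sketched in Python as a mapping from cell coordinates (product, store, day) to measures; rolling up then means summing the measures over the dimensions that are dropped. This is a minimal illustration, not part of the DFM itself: the `roll_up` helper and every event except the Nov. 25th, 2014 one are hypothetical, and plain summing assumes both measures are additive.

```python
from collections import defaultdict
from datetime import date

# Each SALES event is a cell of the 3-D hyper-space, addressed by (product, store, day).
# Only the first event comes from the slide; the other two are hypothetical.
sales = {
    ("Product X", "Store 2", date(2014, 11, 25)): {"units_sold": 10, "revenue": 220.0},
    ("Product X", "Store 1", date(2014, 11, 25)): {"units_sold": 4, "revenue": 88.0},
    ("Product Y", "Store 2", date(2014, 11, 26)): {"units_sold": 3, "revenue": 45.0},
}


def roll_up(cube, keep):
    """Aggregate the cube onto the dimensions whose positions are in `keep`,
    summing the (additive) measures over the dimensions that are dropped."""
    result = defaultdict(lambda: {"units_sold": 0, "revenue": 0.0})
    for coords, measures in cube.items():
        key = tuple(coords[i] for i in keep)
        for name, value in measures.items():
            result[key][name] += value
    return dict(result)


# Sales by product, summed over stores and days.
by_product = roll_up(sales, keep=(0,))
print(by_product[("Product X",)])  # {'units_sold': 14, 'revenue': 308.0}
```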

Page 9: DFM Notation Compendium

Figure: the SALES fact schema annotated with the elements of the DFM notation: measure, dimension, dimensional attribute, non-dimensional attribute, dependency and hierarchy.
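The notation elements map naturally onto a small data structure: a fact schema owns its measures and dimensions, each dimension is the root of a hierarchy, and dependency arcs link attributes. The sketch below is a hypothetical encoding (class names, attribute names and the `roll_up_paths` helper are assumptions); it lists the roll-up paths of each hierarchy, skipping non-dimensional attributes, which describe a level but cannot be used for aggregation.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    dimensional: bool = True  # False for non-dimensional (descriptive) attributes
    children: list["Attribute"] = field(default_factory=list)  # dependency arcs: self -> child


@dataclass
class FactSchema:
    name: str
    measures: list[str]
    dimensions: list[Attribute]  # each dimension is the root of a hierarchy


def roll_up_paths(attr, path=()):
    """Yield every roll-up path (root -> ... -> leaf) of the hierarchy rooted at
    `attr`, following dependencies between dimensional attributes only."""
    path = path + (attr.name,)
    dims = [c for c in attr.children if c.dimensional]
    if not dims:
        yield path
    for child in dims:
        yield from roll_up_paths(child, path)


# Hypothetical SALES schema.
product = Attribute("product", children=[
    Attribute("weight", dimensional=False),
    Attribute("type", children=[Attribute("category")]),
])
store = Attribute("store", children=[Attribute("city", children=[Attribute("country")])])
day = Attribute("day", children=[Attribute("month", children=[Attribute("year")])])
sales = FactSchema("SALES", measures=["units_sold", "revenue"], dimensions=[product, store, day])

for dim in sales.dimensions:
    for path in roll_up_paths(dim):
        print(" -> ".join(path))
# product -> type -> category
# store -> city -> country
# day -> month -> year
```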

Page 10: Data Mart building process

Figure: the data mart building process:

• Requirements definition: business user's needs are captured in a multidimensional data model (Dimensional Fact Model)
• Model transformation into a logical data model (relational model: tables, columns, etc.)
• Model transformation into a physical data model (DDL with indexes, partitions, etc.)
• Result: the Data Mart

The model transformations draw on the implementation strategy, the deployment strategy and technical knowledge.
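The first model transformation (DFM to logical model) can be sketched for the simplest strategy only: a star schema with one key column per dimension. The `dfm_to_star` function, the `_KEY` column naming and the weblog hierarchies below are hypothetical; a real transformation also applies the per-hierarchy strategies, data types and naming conventions that the design choices introduce.

```python
def dfm_to_star(fact, measures, hierarchies):
    """Map a fact schema onto a star schema.

    Each hierarchy becomes one denormalized dimension table: a key column plus
    one column per attribute. The fact table gets one foreign-key column per
    dimension plus one column per measure.
    """
    tables = {}
    fact_columns = []
    for dim, attributes in hierarchies.items():
        key = f"{dim.upper()}_KEY"  # hypothetical key-column naming
        tables[f"D_{dim.upper()}"] = [key] + [a.upper() for a in attributes]
        fact_columns.append(key)
    tables[f"F_{fact.upper()}"] = fact_columns + [m.upper() for m in measures]
    return tables


# Hypothetical weblog fact schema; hierarchies listed from finest to coarsest attribute.
star = dfm_to_star(
    "visits",
    measures=["visits", "duration"],
    hierarchies={
        "page": ["page", "site", "domain"],
        "time": ["day", "month", "year"],
        "viewer": ["ip", "region"],
    },
)
for table, columns in star.items():
    print(table, columns)
```

Denormalizing each hierarchy into a single table is what makes this a star schema; a snowflake variant would instead emit one table per hierarchy level.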

Page 12: Business - From requirement to DFM

• Context: weblog analytics, i.e. the analysis of the visits to several websites belonging to different domains (e.g. Google Analytics)
• Requirement: monitor and analyze the number of visits and their monthly and daily average duration for each page of the websites, or for each domain, broken down by the geographic region of the visitors' IP addresses.

Figure: the resulting DFM, completed (+) with domain definitions, aggregation rules and optional dependencies.
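The aggregation rules matter for this requirement because an average duration cannot simply be averaged again when rolling up from day to month. A minimal sketch with hypothetical figures, assuming the fact stores the number of visits and their total duration (the `average_duration` helper is also hypothetical):

```python
# Hypothetical daily aggregates for one page: (day, number of visits, total duration in seconds)
daily = [
    ("2016-03-01", 100, 6000),  # daily average: 60 s
    ("2016-03-02", 10, 1200),   # daily average: 120 s
]


def average_duration(rows):
    """Roll an average up correctly: AVG is not additive, so re-derive it from
    the additive components SUM(duration) and COUNT(visits) at the coarser grain."""
    visits = sum(n for _, n, _ in rows)
    duration = sum(d for _, _, d in rows)
    return duration / visits


naive = sum(d / n for _, n, d in daily) / len(daily)  # average of daily averages: wrong
correct = average_duration(daily)                     # weighted by the number of visits
print(naive, round(correct, 2))  # 90.0 65.45
```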

Page 13: Design choice

Reference ROLAP model:
• Star schema (denormalized dimension tables)
• Snowflake (hierarchies implemented by tables in 3NF)

Hierarchy implementation strategy (for every dimension):
• Use natural key (the dimension attribute is the PK column)
• Use surrogate key (add a new column with no business meaning)
• Use slowly changing dimension (SCD) of type 2
• Use implicit dimension (no dimension table, only a column in the fact table)

Domain to data type association:
• Text → VARCHAR(250); Currency → NUMBER(9,2); etc.

Standard naming conventions and abbreviations:
• Table name prefix (D for dimensions, F for facts); Number → NBR; etc.
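The domain association and naming convention can be sketched as two lookup tables applied to logical names. Text, Currency, the D/F prefixes and NBR come from the slide; the underscore separator, Number → INTEGER, DESCRIPTION → DSC and the helper functions are assumptions for illustration.

```python
# Domain -> data type association. Text and Currency come from the slide;
# Number -> INTEGER is a hypothetical example.
DOMAIN_TYPES = {"Text": "VARCHAR(250)", "Currency": "NUMBER(9,2)", "Number": "INTEGER"}

# Naming convention. D/F prefixes and NBR come from the slide (the underscore
# separator is an assumption); DSC for DESCRIPTION is hypothetical.
TABLE_PREFIX = {"dimension": "D_", "fact": "F_"}
ABBREVIATIONS = {"NUMBER": "NBR", "DESCRIPTION": "DSC"}


def physical_name(logical_name):
    """Upper-case the words of a logical name, abbreviate them and join with '_'."""
    return "_".join(ABBREVIATIONS.get(w, w) for w in logical_name.upper().split())


def table_name(kind, logical_name):
    return TABLE_PREFIX[kind] + physical_name(logical_name)


def column_definition(logical_name, domain):
    return f"{physical_name(logical_name)} {DOMAIN_TYPES[domain]}"


print(table_name("fact", "visits"))                    # F_VISITS
print(column_definition("page number", "Number"))      # PAGE_NBR INTEGER
print(column_definition("page description", "Text"))  # PAGE_DSC VARCHAR(250)
```

Keeping both tables as data rather than code is what lets the same DFM be regenerated under a different convention or DBMS type mapping.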

Page 14: Transforming a DFM into a Relational Model

Technical design choices:
• Reference ROLAP model: star schema
• Hierarchy Viewer: use surrogate key
• Hierarchy Page: SCD type 2
• Hierarchy Time: denormalized, with natural key

Figure: the model transformation from the DFM (with its fact grain) to the relational model, highlighting the surrogate key and the SCD-2 start date and end date columns.
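The SCD type 2 choice for the Page hierarchy can be sketched as follows: each change to a tracked attribute closes the current row (end date) and inserts a new version with a fresh surrogate key and start date. The `PageVersion` fields and the `apply_scd2` helper are hypothetical, and whether the end date equals the next start date or the day before is a convention that varies between projects.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PageVersion:
    page_sk: int              # surrogate key: no business meaning, one per version
    page_url: str             # natural key of the page (hypothetical attribute)
    title: str                # tracked attribute (hypothetical)
    start_date: date
    end_date: Optional[date]  # None marks the current version


def apply_scd2(rows, page_url, title, as_of):
    """Type 2 change: close the current version and append a new one.

    Returns the surrogate key that facts loaded from `as_of` on should reference.
    """
    current = next(
        (r for r in rows if r.page_url == page_url and r.end_date is None), None
    )
    if current is not None and current.title == title:
        return current.page_sk  # no change: keep the current version
    if current is not None:
        current.end_date = as_of  # close the old version (here: end = next start)
    new_sk = max((r.page_sk for r in rows), default=0) + 1
    rows.append(PageVersion(new_sk, page_url, title, as_of, None))
    return new_sk


d_page = []
apply_scd2(d_page, "/index.html", "Home", date(2016, 1, 1))
apply_scd2(d_page, "/index.html", "Welcome", date(2016, 3, 1))
print([(r.page_sk, r.title, r.end_date) for r in d_page])
# [(1, 'Home', datetime.date(2016, 3, 1)), (2, 'Welcome', None)]
```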

Page 15: Build choice

Choose the DBMS:
• Microsoft SQL Server – Oracle DBMS – SAP HANA – Apache Hive / Hadoop

Generate constraints?
• Generate unique keys / primary keys / integrity constraints (foreign keys)

Add specific indexes:
• Add clustered indexes / column-store indexes / bitmap indexes / etc.

Define table partitions:
• Organize fact tables in partitions (by hash, value, range, etc.)

Distribute data over multiple volumes:
• Define file groups / tablespaces for tables, partitions, indexes
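The "Generate constraints?" choice can be sketched as a generator of foreign-key statements in generic SQL, with a flag for not generating them at all. Column and constraint names (`PAGE_SK`, `FK_<table>_<column>`) and the `foreign_key_ddl` helper are hypothetical, and the exact syntax and enforcement of constraints differ between the listed DBMSs, so treat the output as a starting point.

```python
def foreign_key_ddl(fact_table, references, generate=True):
    """Build ALTER TABLE statements adding one foreign key per dimension reference.

    references: (fk_column, dimension_table, dimension_pk) tuples.
    generate=False models the choice of not generating constraints at all.
    """
    if not generate:
        return []
    return [
        f"ALTER TABLE {fact_table} ADD CONSTRAINT FK_{fact_table}_{column} "
        f"FOREIGN KEY ({column}) REFERENCES {dim_table} ({dim_pk});"
        for column, dim_table, dim_pk in references
    ]


for stmt in foreign_key_ddl("F_VISITS", [("PAGE_SK", "D_PAGE", "PAGE_SK")]):
    print(stmt)
# ALTER TABLE F_VISITS ADD CONSTRAINT FK_F_VISITS_PAGE_SK FOREIGN KEY (PAGE_SK) REFERENCES D_PAGE (PAGE_SK);
```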

Page 16: SAP HANA Architecture

Figure: SAP HANA architecture. The In-Memory Computing Engine comprises session management; request processing / execution control (SQL parser, SQLScript, MDX, calc. engine); the transaction, metadata and authorization managers; the relational engines (row store, column store); and the persistence layer (page management, logger) over disk storage (data volumes, log volumes).

Modeling options highlighted:
• Row tables versus column tables
• Partitioning by HASH, RANGE, ROUNDROBIN
• Use extended tables for warm data
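To make the three partitioning schemes concrete, here is how each one assigns a row to a partition. This only illustrates the assignment logic: SAP HANA uses its own internal hash function, and crc32, the helper names and the monthly bounds are assumptions.

```python
import zlib
from bisect import bisect_right


def hash_partition(value, partitions):
    """HASH: a deterministic hash of the key (crc32 here) modulo the partition
    count, so every row with the same key lands in the same partition."""
    return zlib.crc32(str(value).encode("utf-8")) % partitions


def range_partition(value, bounds):
    """RANGE: `bounds` is sorted; partition 0 holds values < bounds[0], partition i
    holds bounds[i-1] <= value < bounds[i], and the last partition holds the rest."""
    return bisect_right(bounds, value)


def round_robin(rows, partitions):
    """ROUNDROBIN: rows are dealt to partitions in turn, ignoring their content."""
    return [(i % partitions, row) for i, row in enumerate(rows)]


bounds = ["2016-02-01", "2016-03-01"]  # hypothetical monthly bounds on DAY (ISO strings sort by date)
print(range_partition("2016-01-15", bounds), range_partition("2016-03-09", bounds))  # 0 2
print([p for p, _ in round_robin(["a", "b", "c", "d", "e"], 2)])                     # [0, 1, 0, 1, 0]
```

Hash and round-robin both spread rows evenly, but only hash keeps all rows for one DAY together; range additionally allows whole periods to be pruned or moved as a unit.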

Page 17: Physical model and DDL

Implementation choices & best practices:
• DBMS: SAP HANA
• All tables are column tables
• Fact F_VISITS partitioned by HASH on DAY
• Fact F_VISITS indexed by PAGE

Figure: the generated DDL, annotated: create a column table; partition by HASH; BTREE index; unload priority for memory optimization; preload columns for performance optimization.
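A minimal sketch of generating DDL for the choices above, assuming SAP HANA's `CREATE COLUMN TABLE ... PARTITION BY HASH (...) PARTITIONS n` and `CREATE BTREE INDEX` forms. The column names and types (`DAY_ID`, `PAGE_SK`, etc.), the partition count, the index name and the `hana_fact_ddl` helper are hypothetical; the unload priority and preload options shown on the slide are left out, so check the SAP HANA SQL reference for your version before using the output.

```python
def hana_fact_ddl(table, columns, hash_column, partitions, index_column):
    """Build a column table partitioned by HASH plus a BTREE index (SAP HANA SQL).

    columns: (name, data_type) pairs.
    """
    column_list = ",\n    ".join(f"{name} {data_type}" for name, data_type in columns)
    return (
        f"CREATE COLUMN TABLE {table} (\n    {column_list}\n)\n"
        f"PARTITION BY HASH ({hash_column}) PARTITIONS {partitions};\n"
        f"CREATE BTREE INDEX IDX_{table}_{index_column} ON {table} ({index_column});"
    )


print(hana_fact_ddl(
    "F_VISITS",
    [("DAY_ID", "DATE"), ("PAGE_SK", "INTEGER"), ("VISITS_NBR", "INTEGER"), ("DURATION_SEC", "INTEGER")],
    hash_column="DAY_ID",
    partitions=4,
    index_column="PAGE_SK",
))
```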

Page 18: BI Modeler

• In order to apply a model-driven approach, BI project teams need a software tool to manage (draw) all the models (DFM, relational, etc.) and to support (and drive) the model transformation process.
• There were (and still are) not many tools able to do that, so in 2006 I started working on the development of …

http://bimodeler.com

Page 19: DEMO

Create a DFM from scratch:
• Define the fact schema and its measures
• Add some dimensions / hierarchies
• Define domains and associate them with attributes and measures

Transform a DFM into a relational data model:
• Define an implementation strategy for hierarchies
• Associate data types with domains
• Apply a naming convention

Add physical properties to the relational model:
• Choose a DBMS
• Create partitions
• Create indexes
• Generate the DDL script