Dimensional Fact Model @ BI Academy - 2016
Dimensional Fact Model
Stuttgart, 9/3/2016
Stefano Cazzella @StefanoCazzella
http://caccio.blogdns.net
http://bimodeler.com
stefano.cazzella{at}gmail.com
BI ACADEMY Conference - Stuttgart, 9/3/2016 - Stefano Cazzella
My Professional Timeline
Timeline: 2001 to 2015
• Master degree in Software Engineering, University of Rome « La Sapienza »
• Roles: Business Intelligence Specialist, Business Consultant, Project Manager, Delivery Manager
• Methodology: Industrialization of the delivery phase
• Datamat S.p.A., a Finmeccanica company
• Sopra Steria Group: Consulting – IT Services – Software Solutions
BI Trends
Chart: Business Value over Time
• Stages: Data Integration, Descriptive, Predictive, Prescriptive, Deep learning
• Labels: Data Warehouse, Business Intelligence, Simulation & forecasting, Optimization & automation, Semantic & AI
Digital transformation of every market
Data explosion: exponential growth of digital data
Disruptive scenario
Innovative technologies
•Internet of Things
•Big Data
•Distributed computing
•In-memory systems
•Cloud
•Mobile
Complex architectures
•Data federation
•Data store
•NoSQL
•Distributed file system
•Appliances
•Real-time data integration
Business transformations
•Frenetic time-to-market
•API / service economy
•Data-driven company
•Business process automation
… more … … more … … more …
New processes? Roles?
• Waterfall process: Business, Design, Build
• Iterative process: Business, Design, Build
• Roles shown: Business Analyst, Engineer, Technician, Data Scientist
Project Layers for Data Mart
• Business: Dimensional Fact Model
• Design: Relational model
• Build: DBMS-specific DDL
Why Dimensional Fact Model?
• Formal language: well-specified syntax and an unequivocal interpretation (semantics) based on a sound algebraic definition
• Simple and effective graphical notation (representation)
• Does not imply any technical/implementation choice
• Specifically designed to represent multi-dimensional models
Multi-dimensional model
The SALES event: on Nov. 25th, 2014, Store 2 sold 10 pieces of Product X for a total revenue of € 220
Axes: Product (Product X, Product Y, Product Z), Store (Store 1, Store 2, Store 3), Day
Highlighted cell (Product X, Store 2): Units sold: 10 pieces; Revenue: € 220
3-dimensional SALES hyper-space
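A minimal Python sketch of the same idea, assuming a plain dictionary is an acceptable stand-in for the hyper-space: each event is a point whose coordinates are the dimension values (Product, Store, Day) and whose payload is the measures. The names `sales`, `lookup`, `units_sold` and `revenue` are illustrative only.

```python
from datetime import date

# Each fact (event) is a point in the Product x Store x Day space:
# the key holds the coordinates, the value holds the measures.
sales = {
    ("Product X", "Store 2", date(2014, 11, 25)): {"units_sold": 10, "revenue": 220.0},
}


def lookup(product, store, day):
    """Return the measures of the event at the given coordinates, or None."""
    return sales.get((product, store, day))


event = lookup("Product X", "Store 2", date(2014, 11, 25))
print(event)
```

Coordinates with no event simply have no entry, which is how a sparse cube is usually thought of.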
DFM Notation Compendium
Hierarchy
Dimension
Dimensional attribute
Non-dimensional attribute
Measure
Fact schema SALES
Dependency
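The notation elements above can be mirrored in a small data structure. This is only a sketch under assumptions, not the formal DFM definition: the class and field names are hypothetical, and the Month and Year levels are invented to show a hierarchy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    dimensional: bool = True  # False marks a non-dimensional (descriptive) attribute
    rolls_up_to: List["Attribute"] = field(default_factory=list)  # dependency arcs


@dataclass
class FactSchema:
    name: str
    dimensions: List[Attribute]  # each dimension is the root of a hierarchy
    measures: List[str]


def hierarchy(attr: Attribute) -> List[str]:
    """List the attribute names reachable from a dimension along its dependencies."""
    names = [attr.name]
    for coarser in attr.rolls_up_to:
        names.extend(hierarchy(coarser))
    return names


# Month and Year are hypothetical hierarchy levels, added only for illustration.
day = Attribute("Day", rolls_up_to=[Attribute("Month", rolls_up_to=[Attribute("Year")])])
sales = FactSchema("SALES", [Attribute("Product"), Attribute("Store"), day],
                   ["Units sold", "Revenue"])
print(hierarchy(day))
```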
Data Mart building process
• Business user’s needs → Requirements definition → Multidimensional data model (Dimensional Fact Model)
• Model transformation → Logical data model (Relational model: tables, columns, etc.)
• Model transformation → Physical data model (DDL with indexes, partitions, etc.)
• Deployment → Data Mart
• Inputs to the transformations: Implementation strategy, Technical knowledge
Business - From requirement to DFM
• Context: weblog analytics, i.e. the analysis of visits to several websites belonging to different domains (e.g. Google Analytics)
• Requirement: monitoring and analyzing the number of visits and their monthly and daily average duration for each page of the websites, or each domain, distributed by the geographic region of the visitors' IP addresses.
Domain definition
Aggregation rules
Optional dependencies
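To illustrate why aggregation rules matter for this requirement: the number of visits can be summed along any hierarchy, while the average duration cannot be obtained by averaging averages. The sketch below, with invented sample data and hypothetical names, keeps the additive components (count and total duration) and derives the average at each level.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample visits: (page, domain, day, region, duration in seconds)
visits = [
    ("/home", "example.org", date(2016, 3, 1), "Europe", 30),
    ("/home", "example.org", date(2016, 3, 1), "Europe", 90),
    ("/docs", "example.org", date(2016, 3, 2), "Europe", 120),
    ("/home", "example.net", date(2016, 3, 2), "Asia", 45),
]


def aggregate(rows, key):
    """Group rows by key(row); return the visit count and average duration per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [visit count, total duration]
    for row in rows:
        acc = totals[key(row)]
        acc[0] += 1
        acc[1] += row[4]
    return {k: {"visits": n, "avg_duration": total / n} for k, (n, total) in totals.items()}


# Finest grain: page, day and region (the domain is carried along with the page).
by_page_day = aggregate(visits, lambda r: (r[0], r[1], r[2], r[3]))
# Roll-up along the hierarchies: page to domain, day to month.
by_domain_month = aggregate(visits, lambda r: (r[1], (r[2].year, r[2].month), r[3]))
print(by_domain_month[("example.org", (2016, 3), "Europe")])
```

For example.org in Europe the two daily averages are 60 and 120 seconds, yet the monthly average is 80, not 90, because the first day has two visits.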
Design choice
Reference ROLAP model:
• Star schema (denormalized dimension tables)
• Snowflake schema (hierarchies implemented by tables in 3NF)
Hierarchy implementation strategy (for every dimension):
• Use natural key (the dimension attribute is the PK column)
• Use surrogate key (add a new column with no business meaning)
• Use slowly changing dimension (SCD) of type 2
• Use implicit dimension (no dimension table, only a column in the fact table)
Domain / data type association:
• Text → VARCHAR(250) ; Currency → NUMBER(9,2) ; etc.
Standard naming conventions and abbreviations:
• Table name prefix (D for Dimensions, F for Facts) ; Number → NBR ; etc.
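A possible way to encode two of these design choices as data, so that they can be applied mechanically during the model transformation. The mappings come from the examples above; the function names, the upper-casing and the underscore separator are assumptions made for this sketch.

```python
# Domain / data type association (values taken from the examples above).
DOMAIN_TYPES = {"Text": "VARCHAR(250)", "Currency": "NUMBER(9,2)"}
# Naming conventions: table prefixes and standard abbreviations.
PREFIXES = {"dimension": "D", "fact": "F"}
ABBREVIATIONS = {"NUMBER": "NBR"}


def physical_name(logical):
    """Upper-case the logical name, abbreviate known words, join with underscores."""
    words = logical.upper().split()
    return "_".join(ABBREVIATIONS.get(word, word) for word in words)


def table_name(kind, logical):
    """Prefix the physical name with D (dimension) or F (fact)."""
    return PREFIXES[kind] + "_" + physical_name(logical)


def column_def(logical, domain):
    """Build a column definition from a logical attribute name and its domain."""
    return physical_name(logical) + " " + DOMAIN_TYPES[domain]


print(table_name("fact", "Visits"))
print(column_def("Page title", "Text"))
```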
Transform DFM into a Relational Model
Model transformation
Technical design choices:
• Reference ROLAP model: star schema
• Hierarchy Viewer: use surrogate key
• Hierarchy Page: SCD Type 2
• Hierarchy Time: denormalized with natural key
Diagram callouts: Fact grain, Surrogate key, SCD-2 (Start date, End date)
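The SCD Type 2 choice for the Page hierarchy can be sketched as follows: each change closes the current version (End date) and appends a new one with a fresh surrogate key and Start date. This is an in-memory illustration with hypothetical names and sample values; closing a version on the same day the next one starts is one convention among several.

```python
from datetime import date


def apply_scd2(rows, natural_key, attributes, today):
    """Record a new version of a dimension member; return its surrogate key."""
    current = next(
        (r for r in rows if r["natural_key"] == natural_key and r["end_date"] is None),
        None,
    )
    if current is not None and current["attributes"] == attributes:
        return current["surrogate_key"]  # nothing changed, keep the open version
    if current is not None:
        current["end_date"] = today  # close the previous version
    new_key = max((r["surrogate_key"] for r in rows), default=0) + 1
    rows.append({
        "surrogate_key": new_key,
        "natural_key": natural_key,
        "attributes": attributes,
        "start_date": today,
        "end_date": None,  # None marks the currently valid version
    })
    return new_key


d_page = []
k1 = apply_scd2(d_page, "/home", {"domain": "example.org"}, date(2016, 1, 1))
k2 = apply_scd2(d_page, "/home", {"domain": "example.org"}, date(2016, 2, 1))
k3 = apply_scd2(d_page, "/home", {"domain": "example.net"}, date(2016, 3, 9))
print(k1, k2, k3)
```

Fact rows reference the surrogate key, so visits recorded before the change stay linked to the old version of the page.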
Build choice
Choose the DBMS:
• Microsoft SQL Server – Oracle DBMS – SAP HANA – Apache Hive / Hadoop
Generate constraints?
• Generate unique keys / primary keys / integrity constraints (foreign keys)
Add specific indexes:
• Add clustered indexes / column-store indexes / bitmap indexes / etc.
Define table partitions:
• Organize fact tables in partitions (by hash, value, range, etc.)
Distribute data over multiple volumes:
• Define file groups / tablespaces for tables, partitions, indexes
SAP HANA Architecture
In-Memory Computing Engine:
• Session management
• Request Processing / Execution Control: SQL Parser, SQL Script, Calc. Engine, MDX
• Transaction Manager, Metadata Manager, Authorization Manager
• Relational Engines: Row Store, Column Store
• Persistence Layer: Page Management, Logger
Disk Storage: Data Volumes, Log Volumes
• Row tables versus Column tables
• Partitioning by HASH, RANGE, ROUNDROBIN
• Use extended tables for warm data
Physical model and DDL
Implementation choices & best practice:
• DBMS SAP HANA
• All tables are Column-tables
• Fact F_VISITS partitioned by HASH on DAY
• Fact F_VISITS indexed by PAGE
DDL callouts:
• Create a column table
• Partition by HASH
• BTREE index
• Unload priority for memory optimization
• Preload columns for performance optimization
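As a rough illustration of the last step, the sketch below assembles DDL text for the implementation choices listed above (column table, HASH partitioning, BTREE index). It is not the DDL shown on the slide: the column names, types, partition count and index name are hypothetical, and the statement shapes should be checked against the SAP HANA SQL reference before use. Unload priority and preload settings are left out.

```python
def hana_fact_ddl(table, columns, hash_column, partitions, index_column):
    """Return DDL statements for a hash-partitioned column table plus one index."""
    cols = ",\n  ".join(name + " " + dtype for name, dtype in columns)
    create_table = (
        "CREATE COLUMN TABLE " + table + " (\n  " + cols + "\n)\n"
        "PARTITION BY HASH (" + hash_column + ") PARTITIONS " + str(partitions)
    )
    create_index = (
        "CREATE BTREE INDEX IX_" + table + "_" + index_column
        + " ON " + table + " (" + index_column + ")"
    )
    return [create_table, create_index]


# Hypothetical columns for the F_VISITS fact table.
statements = hana_fact_ddl(
    "F_VISITS",
    [("DAY_ID", "DATE"), ("PAGE_ID", "INTEGER"), ("VIEWER_ID", "INTEGER"),
     ("VISITS_NBR", "INTEGER"), ("DURATION_SEC", "INTEGER")],
    hash_column="DAY_ID",
    partitions=4,
    index_column="PAGE_ID",
)
for statement in statements:
    print(statement + ";")
```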
BI Modeler
• To apply a model-driven approach, BI project teams need a software tool to:
   • Manage (draw) all the models: DFM, relational, etc.
   • Support (and drive) the model transformation process
• There were (and still are) few tools able to do that, so in 2006 I started working on the development of …
http://bimodeler.com
DEMO
Create a DFM from scratch
• Define the fact schema and its measures
• Add some dimensions / hierarchies
• Define and associate domains to attributes and measures
Transform a DFM into a relational data model
• Define an implementation strategy for hierarchies
• Associate data types to domains
• Apply a naming convention
Add physical properties to the relational model
• Choose a DBMS
• Create partitions
• Create indexes
Generate DDL script