Data Architecture Process in a BI environment


Transcript of Data Architecture Process in a BI environment

Data Architecture Process in a Business Intelligence Environment

WHAT DO DATA ARCHITECTS IN A BUSINESS INTELLIGENCE (BI) ENVIRONMENT DO?

AUTHOR: SASHA CITINO, SENIOR CONSULTANT (DATA ARCHITECTURE)

PUBLISHED ON: SEPTEMBER 29TH, 2016

About the Author: Sasha Citino

9/29/2016 Written and Published by Sasha Citino (Data Architect)

Sasha Citino has 15+ years of experience in the Information Technology industry. Sasha got her start in IT as a VB 6 developer but quickly moved into the world of “Data”.

Sasha has 12+ years of experience in designing, developing and implementing Data Warehouses in SQL Server and Oracle environments.

Sasha has experience in Business Intelligence (architecture and development) in multiple industries, including Real Estate (commercial and industrial), Telecommunications, Retail, Fast Food, Casino Gaming, Supply Chain Distribution and Logistics, Healthcare and Supplementary Insurance.

Sasha has been the lead Data Architect for multiple multi-million dollar BI projects for the last 8 years. She thoroughly enjoys BI and all of its components.

Contact Sasha: [email protected]

Agenda

What is Business Intelligence?

Data Warehouse (DW) vs Business Intelligence (BI)

What is Data Architecture?

Visual representation of Data Warehouse Architecture

Components of a Data Warehouse

What do Data Architects need to know in a BI environment

Data Architect Relationships in a BI environment

Key Architecture Process Roles of a Data Architect in a BI environment

Note on Data Architecture Standards

Step by Step Data Architecture Process in a BI (traditional) environment


What is BI?

BI Encompasses:

Tools, applications and methodologies for data collection and transformation from a variety of internal and external data sources

Data analytical tools that allow end users to analyze data (ad hoc) and to report on or present important business KPIs (key performance indicators) via dashboards, reports and other data visualization tools

Avenues for external consumers of data to extract data from a single, stable, robust and dimensional data repository

Business Intelligence, or BI, is a technology-based process or mechanism for analyzing and presenting data in a format that allows business users, including executives, managers and other users, to make informed business decisions.


Data Warehouse (DW) vs Business Intelligence (BI)

So what really is a Data Warehouse?

A Data Warehouse is a large store of data collected from multiple data sources including, but not limited to, operational systems, financial systems, the internet and flat files.

A Data Warehouse is frequently known as the central repository for a company’s data.

The data in a Data Warehouse is extracted from multiple data sources in raw form and aligned to mature business processes. It then goes through a transformation phase that uses best-practice DW methodologies to turn the raw data into a format that allows simple, high-performance consumption via data visualization tools, ad hoc analytical tools and external consumers.

I have spent 15 years working in Business Intelligence: starting in custom application development, moving to report development, ETL development and database management in a DW environment, supporting multiple Data Warehouse environments in a variety of industries and eventually architecting dimensional Data Warehouses. In my professional opinion, a Data Warehouse is an integral, necessary “component” of Business Intelligence.


What is Data Architecture?

Defines rules, structures and policies to support business objectives

Mechanism for how data is governed, defined, stored and managed in a Data Warehouse

Integrates data from multiple source systems within an organization

Allows for consumption of data by reporting tools, data visualization tools, ad hoc analysis and external consumers


Data Warehouse Architecture

Note: Image used from the Oracle Data Warehousing Concepts whitepaper

Data Architecture Components

Data Sources

Data is extracted from multiple data sources. Data sources can be:

• Operational Systems

• ERP Systems

• CRM Systems

• Financial Systems

• Flat Files

• Internet

Data Warehouse

The Data Warehouse has multiple components:

• Data Staging Database

• Persistent Staging Database (stores raw data historically)

• Metadata

• Summary/Aggregated data in dimensional form (dimensions/facts)

• Data Marts

• Data Architecture Modeling Tools (e.g. Erwin, Embarcadero, R)

Consumers

Data Warehouse data is consumed by a variety of users:

• Data Analysts

• Report/Data Visualization Developers/Users

• Data mining

• External consumers such as other business applications

Data is extracted, transformed and loaded to target objects using ETL tools/processes.

Data Architects in a BI Environment

Data Architects in a BI Environment should:

Understand the end-to-end vision of the BI project

Get Business Buy-in (without support, the success of the project is at risk)

Understand legacy systems and how systems relate

Understand business processes and how they translate to one or more dimensional models

Address data migration, cleansing and storage requirements/issues

Work closely with, and develop strong relationships with, project SMEs (subject matter experts) and project teams throughout the BI project

Architect for the business process at the lowest grain, allowing for aggregation, while staying acutely aware of how time affects metrics, attributes and KPIs

Architect for flexibility, robustness and re-usability

ALWAYS verify concepts prior to transferring development to other teams (ETL, Reporting)
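To illustrate the point about architecting at the lowest grain, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, sale_date, store_id, amount) are hypothetical and chosen only for illustration. One row is stored per sale, and the same rows are rolled up to daily and monthly totals.

```python
import sqlite3

# Hypothetical fact table stored at the lowest grain: one row per sale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, store_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2016-09-01", 1, 10.0),
        ("2016-09-01", 1, 5.0),
        ("2016-09-02", 2, 7.5),
        ("2016-10-01", 1, 20.0),
    ],
)


def rollup(period_expr):
    """Aggregate the lowest-grain rows up to the period given by a SQL expression."""
    sql = (
        f"SELECT {period_expr}, SUM(amount) FROM fact_sales "
        f"GROUP BY {period_expr} ORDER BY {period_expr}"
    )
    return conn.execute(sql).fetchall()


daily = rollup("sale_date")                  # one row per day
monthly = rollup("substr(sale_date, 1, 7)")  # one row per year-month
print(daily)
print(monthly)
```

Because the detail rows are kept, any coarser grouping can be derived later. The reverse is not true: a table stored only at monthly grain could not answer a daily question.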


Data Architect Relationships in a BI environment

The Data Architect works with:

• Performance / DBA Team

• Report Developers

• External Customers

• ETL Developers

• Business Analysts

• Project SMEs

• Data Analysts

• Quality Assurance Testers


Key Architecture Process Roles of a Data Architect in a BI environment

Data Profiling

• “Data Investigation”

Integration Design

• Aligning data from multiple systems and sources

Dimensional Modeling

• Structures data in a conformed format for faster reporting on large data volumes

• Organizes data for effective and efficient analysis according to business processes

Define Data Architecture Standards

• See next slide for note on Standards


Note on Data Architecture Standards

Data Architecture standards may vary by company or architect but they should always include:

Consistent naming conventions for tables (staging, dimension, fact, helper, cross reference)

Consistent naming convention for fields

Consistent strategy and naming convention for Indexes/Partitions

Clear definition on how Nulls in dimensions and facts are handled

The data modeling tool in use/to be used

Clear definition of the data types that can be used

Metadata requirements for tables (e.g. insert_date, update_date, current_flg, sourcesystem, effective_from_dt and effective_to_dt) that should be present on each data warehouse dimension/fact table.

It is critical to any data warehouse environment to have well-defined and consistent standards surrounding naming conventions, handling of nulls, dimension/fact design strategies, types of data architect artifacts required and the data modeling tool(s) used.
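Standards like these are easier to keep when they can be checked automatically. The sketch below is one possible approach using Python's built-in sqlite3 module. The metadata column names come from the list above, but the table prefixes (stg_, dim_, fact_, xref_) and the example tables are assumptions made for illustration, not a prescribed convention.

```python
import sqlite3

# Metadata columns named in the standards note above.
REQUIRED_COLUMNS = {
    "insert_date", "update_date", "current_flg",
    "sourcesystem", "effective_from_dt", "effective_to_dt",
}
# Hypothetical naming convention: one prefix per table type.
ALLOWED_PREFIXES = ("stg_", "dim_", "fact_", "xref_")


def check_table(conn, table):
    """Return a list of readable standards violations for one table."""
    problems = []
    if not table.startswith(ALLOWED_PREFIXES):
        problems.append(f"{table}: name lacks an approved prefix")
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for missing in sorted(REQUIRED_COLUMNS - columns):
        problems.append(f"{table}: missing metadata column {missing}")
    return problems


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY, customer_name TEXT,
    insert_date TEXT, update_date TEXT, current_flg INTEGER,
    sourcesystem TEXT, effective_from_dt TEXT, effective_to_dt TEXT)""")
conn.execute("CREATE TABLE customer_sales (sale_id INTEGER, insert_date TEXT)")

ok = check_table(conn, "dim_customer")
bad = check_table(conn, "customer_sales")
print(ok)
print(bad)
```

In this sketch dim_customer passes, while customer_sales is flagged for its name and for each missing metadata column.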


Data Architecture Process

The Data Architecture Process, once matured, is repeatable, dependable, effective and efficient and aligns to business processes.

Components of Data Architecture in a BI Environment:

Step 1 – Receive/Understand Requirements

Step 2 – Data Profiling

Step 3 – Conceptual Model Design

Step 4 – Logical Model Design

Step 5 – Physical Model Design (also known as the ERD, or entity relationship diagram)

Step 6 – ETL Mapping

Step 7 – Data Model Reviews

Step 8 – Metadata/Data Validation post development


Data Architecture Process - Requirements

Business Requirements for the business process to be architected can be delivered to a data architect in multiple formats:

Through Business and/or User Requirement specifications for a new business process and/or enhancement to an existing business process/data model

Through source system SMEs, typically when upgrades to source system(s) affect the Data Warehouse (new fields, changed fields, changed logic)

Through self-examination (the data architect reviews existing data models and identifies new metrics/attributes that can be added to enhance the robustness of a data model and provide added business value)

Through listening! It is extremely important for a Data Architect to be an excellent listener. You may notice repeated statements from, for example, the reporting team on aggregations/calculations/groupings that a seasoned data architect can identify as an opportunity for improvement of the existing data model. While this may not provide added business value, it may help in performance of the environment and/or simplification of the DW environment for reporting.


Data Architecture Process – Data Profiling

What are you profiling?

• Select Business Process

• Decide on grain of data

• Identify dimensions/dimensional attributes

• Identify facts/metrics

Understand Metadata

• Analyze tables pertaining to business process subject area

• Data Sources

• Table sizes

• Row counts

• Fields/columns

• Relationships

• Natural/Primary Keys

Generate Profiling Outputs

Upon completion of the data profiling process, the following outputs can be generated:

• Summary analysis of Metadata

• Source queries that relate tables and select attributes and metrics according to filter/aggregation business process criteria

• These source queries can also be used to validate landed data post ETL development
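The profiling steps above can be sketched with a few queries. This is a minimal example using Python's built-in sqlite3 module, assuming a hypothetical source table src_orders. A real profiling exercise would run against the actual source systems and cover relationships and table sizes as well.

```python
import sqlite3


def profile_table(conn, table):
    """Collect the row count plus per-column null and distinct counts.

    Table and column names are interpolated into SQL, so they must come
    from a trusted catalog, never from user input.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    profile = {"table": table, "row_count": total, "columns": {}}
    for col in columns:
        nulls, distinct = conn.execute(
            f"SELECT SUM({col} IS NULL), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        profile["columns"][col] = {
            "nulls": nulls or 0,
            "distinct": distinct,
            # Natural-key candidate: never null and unique on every row.
            "key_candidate": total > 0 and not nulls and distinct == total,
        }
    return profile


# Hypothetical source table, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_no TEXT, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [("A1", 10, 99.0), ("A2", 10, 15.5), ("A3", None, 15.5)],
)

result = profile_table(conn, "src_orders")
for name, stats in result["columns"].items():
    print(name, stats)
```

Here order_no is the only natural-key candidate, and customer_id is flagged as containing a null that the ETL mapping will need a default for.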


Data Architecture Process – Conceptual Model Design

During the Conceptual Model Design phase, the Data Architect:

Creates a conceptual schema, which is a high-level visual description of the informational needs of the business process

Identifies dimensions that relate to the business process

Identifies at a high level the metrics/facts that relate to the business process

Output: The conceptual model (example seen in pic)

The conceptual model can be used to communicate with the business without too much technical information

The conceptual model can also be used to update the Bus Matrix (pivot of business processes and what dimensions are used by each)
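A Bus Matrix can be sketched as a simple pivot of business processes against dimensions. The processes and dimensions below are hypothetical examples, not taken from any specific project.

```python
# Hypothetical business processes mapped to the dimensions each one uses.
process_dimensions = {
    "Sales": ["Date", "Customer", "Product", "Store", "Promotion"],
    "Inventory": ["Date", "Product", "Store"],
    "Returns": ["Date", "Customer", "Product"],
}


def build_bus_matrix(mapping):
    """Pivot process -> dimensions into a process x dimension grid of flags."""
    dimensions = sorted({d for dims in mapping.values() for d in dims})
    matrix = {
        process: {d: (d in dims) for d in dimensions}
        for process, dims in mapping.items()
    }
    return dimensions, matrix


def render(dimensions, matrix):
    """Format the matrix as plain text with an X where a dimension is used."""
    width = max(len(p) for p in matrix)
    lines = [" " * width + " | " + " | ".join(dimensions)]
    for process, flags in matrix.items():
        cells = [("X" if flags[d] else "").center(len(d)) for d in dimensions]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


dimensions, matrix = build_bus_matrix(process_dimensions)
# Dimensions used by more than one process are candidates for sharing.
shared = [d for d in dimensions if sum(matrix[p][d] for p in matrix) > 1]

print(render(dimensions, matrix))
print("Shared dimensions:", shared)
```

Reading down a column shows every business process that depends on a dimension, which is useful when judging the impact of a change to that dimension.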


Data Architecture Process – Logical Model Design

During the Logical Model Design phase, a Data Architect:

Identifies data metrics (typically in raw form) that support the subject area

Documents the relationship between metrics and dimensions

Identifies all fields needed for the subject area and their metadata attributes

Output: The Logical Data Model

An example of a fact table logical design can be seen in the picture shown.


Data Architecture Process – Physical Model Design

Select Modeling Tool

The physical data modeling for a business process is typically completed using a Data Modeling tool.

Examples of Data modeling Tools:

• Erwin Data Modeler

• Embarcadero

• R

• Visio

**There are many tools; the choice depends on your company’s preference.

Create Dimensional Model

• Create the Entity Relationship Diagram for the dimensional model

• Create dimension tables/fact table(s)

• Define physical properties for each dimension attribute and fact metric. Physical properties are:

• Data type

• Data length/scale/precision

• Relationships

• Indexes

• Storage schemas

Output of Physical Model

Once the physical model has been created using a modeling tool, the following artifacts are produced:

• ERD (entity relationship diagram)

• DDL (Data Definition Language) for each dimension/fact table

• DDLs are used to create the physical tables in the database
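Modeling tools generate DDL from the physical model. As a rough illustration of that idea (not how any particular tool works), the sketch below describes columns with their physical properties and renders CREATE TABLE statements, then runs them against an in-memory SQLite database. All table, column and index names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Column:
    """Physical properties of one dimension attribute or fact metric."""
    name: str
    data_type: str            # includes length/precision/scale, e.g. DECIMAL(12,2)
    nullable: bool = True
    primary_key: bool = False
    references: str = ""      # "table(column)" for a relationship, else empty


def to_ddl(table, columns):
    """Render a CREATE TABLE statement from column definitions."""
    lines = []
    for c in columns:
        parts = [c.name, c.data_type]
        if c.primary_key:
            parts.append("PRIMARY KEY")
        elif not c.nullable:
            parts.append("NOT NULL")
        if c.references:
            parts.append(f"REFERENCES {c.references}")
        lines.append("    " + " ".join(parts))
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


dim_product = to_ddl("dim_product", [
    Column("product_key", "INTEGER", primary_key=True),
    Column("product_code", "VARCHAR(20)", nullable=False),
    Column("product_name", "VARCHAR(100)"),
])
fact_sales = to_ddl("fact_sales", [
    Column("product_key", "INTEGER", nullable=False,
           references="dim_product(product_key)"),
    Column("quantity", "INTEGER", nullable=False),
    Column("sales_amount", "DECIMAL(12,2)", nullable=False),
])
print(dim_product)
print(fact_sales)

# Use the DDL to create the physical tables, then add an index on the fact key.
conn = sqlite3.connect(":memory:")
conn.executescript(dim_product + "\n" + fact_sales)
conn.execute("CREATE INDEX ix_fact_sales_product ON fact_sales (product_key)")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

SQLite is used here only because it ships with Python. The data types and storage options in real DDL depend on the target platform (for example SQL Server or Oracle).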


Data Architecture Process – ETL Mapping

What is ETL Mapping?

• ETL means Extract, Transform, Load. This is the mechanism by which data is extracted from source systems, transformed according to business requirements and then loaded to target dimension and fact tables in the Data Warehouse.

• During the ETL mapping phase, the Data Architect identifies the rules/business logic that ETL Developers need to accurately extract, transform and load data to the defined dimensions and facts.

• The ETL Mapping document is absolutely critical to the ETL Team’s ability to develop the processes that populate data.

ETL Mapping Content

The Data Architect creates an ETL mapping template to:

• Identify source systems, source tables and source fields

• Identify target tables/fields

• Define the type of DW table (fact/dimension)

• Define/identify grouping logic, filters, column order/type and data type/length/precision/scale

• Define transformation logic (rules)

• Define default values for null attributes, keys and metrics

• Provide source queries that give ETL Developers insight into the data they are working with

ETL Mapping Outputs

• ETL Mapping Document

• Metadata for the Data Warehouse environment

• Data Dictionary (Note: this is not always done by the data architect; it may instead be done by a member of the Data Governance team)
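An ETL mapping document is usually a spreadsheet or template, but its core content can be sketched as data. In the example below, each mapping row names a source field, a target field, a transformation rule and a default for null values. The field names, rules and defaults are hypothetical.

```python
# Each mapping row: source field -> target field, transformation rule,
# and the default value to load when the source value is null or empty.
MAPPING = [
    {"source": "cust_nm", "target": "customer_name", "rule": str.strip, "default": "Unknown"},
    {"source": "cntry", "target": "country_code", "rule": str.upper, "default": "N/A"},
    {"source": "amt", "target": "sales_amount", "rule": float, "default": 0.0},
]


def apply_mapping(source_row, mapping):
    """Build one target row from one source row by following the mapping."""
    target = {}
    for m in mapping:
        value = source_row.get(m["source"])
        if value is None or value == "":
            target[m["target"]] = m["default"]       # default for null values
        else:
            target[m["target"]] = m["rule"](value)   # transformation logic
    return target


source_rows = [
    {"cust_nm": "  Acme Ltd ", "cntry": "us", "amt": "125.50"},
    {"cust_nm": None, "cntry": "", "amt": None},
]
loaded = [apply_mapping(row, MAPPING) for row in source_rows]
for row in loaded:
    print(row)
```

Keeping the rules in one structure means the same mapping can drive both the ETL development and the later validation of landed data.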


Data Architecture Process – Data Model Review

In a mature BI environment, the Data Architect conducts Data Model Reviews with ETL Developers, Report Developers and possibly Business Analysts to:

Ensure the data model meets business requirements

Provide ETL Developers with an overview of the business process/subject area

Review the ETL logic with Developers to ensure they understand what needs to be done

Provide Report Developers with an overview of the data model, giving them insight into the data they will soon report on

Prior to hand-off to the development teams, the Data Architect will perform Data Model Review(s) to ensure everyone is on the same page and understands the tasks that need to be completed and/or what data will become available for consumption by the data visualizers (reporting team).


Data Architecture Process – Metadata Validation

Metadata Validation by the Data Architect involves:

Checking data for consistency and completeness

Checking for duplicates

Verifying row grain uniqueness/natural keys

Verifying data formatting

Verifying that row counts and data match the expected row counts and data from the source queries (data profiling step)

Verifying that there are no orphaned or null surrogate keys and that landed data matches expected source query results

Note: If validation fails, the DA will work with the ETL team to resolve the issues. If validation passes, the DA will notify and hand off to the Reporting Team.

Upon completion of development work by the ETL Team, the Data Architect reviews the data landed to the target dimension/fact table(s) to ensure that it complies with the rules defined in the ETL mapping document.
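Several of these validation checks translate directly into queries. The sketch below uses Python's built-in sqlite3 module with hypothetical tables (dim_customer, fact_sales) that are deliberately loaded with a duplicate row, an orphaned key and a null key so that each check has something to find. The expected row count would normally come from the source queries written during data profiling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER, customer_id TEXT);
CREATE TABLE fact_sales (order_no TEXT, customer_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'C1'), (2, 'C2');
INSERT INTO fact_sales VALUES
    ('A1', 1, 10.0), ('A2', 2, 5.0), ('A2', 2, 5.0), ('A3', 9, 7.0), ('A4', NULL, 1.0);
""")

# Each check returns a count of offending rows; 0 means the check passes.
CHECKS = {
    # Grain here is assumed to be one row per order_no.
    "duplicate_grain_rows": """
        SELECT COUNT(*) FROM (
            SELECT order_no FROM fact_sales
            GROUP BY order_no HAVING COUNT(*) > 1)""",
    "null_surrogate_keys":
        "SELECT COUNT(*) FROM fact_sales WHERE customer_key IS NULL",
    # Fact rows whose surrogate key has no matching dimension row.
    "orphaned_surrogate_keys": """
        SELECT COUNT(*) FROM fact_sales f
        LEFT JOIN dim_customer d ON d.customer_key = f.customer_key
        WHERE f.customer_key IS NOT NULL AND d.customer_key IS NULL""",
}


def validate(conn, expected_row_count):
    """Run every check and compare the landed row count with the expectation."""
    results = {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
    actual = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    results["row_count_difference"] = actual - expected_row_count
    return results


results = validate(conn, expected_row_count=4)
passed = all(value == 0 for value in results.values())
print(results)
print("Hand off to Reporting" if passed else "Work with ETL team to resolve")
```

With the sample data every check reports a problem, so the outcome is the "work with the ETL team" path described in the note above.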


Data Architecture Process – Wrap up

Upon completion of all data architecture process steps, ETL development and successful Metadata Validation, the Data Architecture process is complete for this business process/enhancement. The Data Architect will continue to be a resource to the Reporting, Quality Assurance and Performance teams as needed.
