Data Architecture Process in a BI environment
Data Architecture Process in a Business Intelligence Environment
WHAT DO DATA ARCHITECTS IN A BUSINESS INTELLIGENCE (BI) ENVIRONMENT DO?
AUTHOR: SASHA CITINO, SENIOR CONSULTANT (DATA ARCHITECTURE)
PUBLISHED ON: SEPTEMBER 29TH, 2016
About the Author: Sasha Citino
9/29/2016 Written and Published by Sasha Citino (Data Architect)
Sasha Citino has 15+ years of experience in the Information Technology industry. Sasha got her start in IT as a VB 6 Developer but quickly moved into the world of “Data”.
Sasha has 12+ years of experience in designing, developing and implementing Data Warehouses in SQL Server and Oracle environments.
Sasha has experience in Business Intelligence (Architecture and Development) in multiple industries such as Real Estate (commercial and industrial), Telecommunications, Retail, Fast Food, Casino Gaming, Supply Chain distribution and logistics, Healthcare, and Supplementary Insurance.
Sasha has been the lead Data Architect for multiple multi-million dollar BI projects for the last 8 years. She thoroughly enjoys BI and all of its components.
Contact Sasha: [email protected]
Agenda
What is Business Intelligence?
Data Warehouse (DW) vs Business Intelligence (BI)
What is Data Architecture?
Visual representation of Data Warehouse Architecture
Components of a Data Warehouse
What do Data Architects need to know in a BI environment
Data Architect Relationships in a BI environment
Key Architecture Process Roles of a Data Architect in a BI environment
Note on Data Architecture Standards
Step by Step Data Architecture Process in a BI (traditional) environment
What is BI?
Business Intelligence, or BI, is a technology-based process or mechanism for analyzing and presenting data in a format that allows business users, including executives, managers and other users, to make informed business decisions.
BI encompasses:
Tools, applications and methodologies for data collection and transformation from a variety of internal and external data sources
Providing data analytical tools to end users to allow them to analyze data (ad hoc) and report on/present important business KPIs (key performance indicators) via dashboards, reports and other data visualization tools
Providing avenues for external consumers of data to extract data from a single, stable, robust and dimensional data repository
Data Warehouse (DW) vs Business Intelligence (BI)
So what really is a Data Warehouse?
A Data Warehouse is a large store of data that is collected from multiple data sources including, but not limited to, operational systems, financial systems, the internet, and flat files.
A Data Warehouse is frequently known as the central repository for a company’s data.
The data in a Data Warehouse is extracted from multiple data sources in raw form and aligned to mature business processes. It then goes through a transformation phase that uses best practice DW methodologies to turn the raw data into a format that allows for simple, high performance consumption of the data via data visualization tools, ad hoc analytical tools and external consumers.
I have spent 15 years working in Business Intelligence: starting in custom application development, moving to report development, ETL development and database management in a DW environment, supporting multiple Data Warehouse environments in a variety of industries, and eventually architecting dimensional Data Warehouses. In my professional opinion, a Data Warehouse is an integral, necessary “Component” of Business Intelligence.
What is Data Architecture?
Defines rules, structures and policies to support business objectives
Mechanism for how data is governed, defined, stored and managed in a Data Warehouse
Integrates data from multiple source systems within an organization
Allows for consumption of data by reporting tools, data visualization tools, ad hoc analysis and external consumers
Data Warehouse Architecture
Note: Image used from Oracle Data Warehousing Concepts whitepaper
Data Architecture Components
Data Sources
Data is extracted from multiple data sources. Data Sources can be:
• Operational Systems
• ERP Systems
• CRM Systems
• Financial Systems
• Flat Files
• Internet
Data Warehouse
The Data Warehouse has multiple components:
• Data Staging Database
• Persistent Staging Database (stores raw data historically)
• Metadata
• Summary/Aggregated data in dimensional form (dimensions/facts)
• Data Marts
• Data Architecture Modeling Tools (e.g. Erwin, Embarcadero, R)
Consumers
Data Warehouse data is consumed by a variety of users:
• Data Analysts
• Report/Data Visualization Developers/Users
• Data mining
• External consumers such as other business applications
Data is Extracted, Transformed and Loaded to Target Objects using ETL tools/processes.
Data Architects in a BI Environment
Data Architects in a BI Environment should:
Understand the end-to-end vision of the BI Project
Get Business Buy-in (without support, the success of the project is at risk)
Understand legacy systems and how systems relate
Understand business processes and how they translate to one or more dimensional models
Address data migration, cleansing and storage requirements/issues
Work closely with, and develop strong relationships with, project SMEs (subject matter experts) and project teams throughout the BI project
Architect for the Business Process at the lowest grain, allowing for aggregation, and stay acutely aware of how Time affects metrics, attributes and KPIs
Architect for flexibility, robustness and re-usability
ALWAYS verify concepts prior to transferring development to other teams (ETL, Reporting)
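To illustrate the point about architecting at the lowest grain, here is a minimal Python sketch. It assumes hypothetical sales rows stored at order-line grain (the field names and values are invented for illustration) and shows that the same rows can be rolled up by day, by month or by product, which would not be possible if only monthly totals had been stored.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at the lowest grain: one row per order line.
order_lines = [
    {"order_date": date(2016, 9, 1), "product": "A", "amount": 10.0},
    {"order_date": date(2016, 9, 1), "product": "B", "amount": 5.0},
    {"order_date": date(2016, 9, 2), "product": "A", "amount": 7.5},
    {"order_date": date(2016, 10, 1), "product": "A", "amount": 2.5},
]


def roll_up(rows, key_fn):
    """Sum the amount metric for each key produced by key_fn."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row["amount"]
    return dict(totals)


# The same lowest-grain rows support several aggregation levels.
daily = roll_up(order_lines, lambda r: r["order_date"])
monthly = roll_up(order_lines, lambda r: (r["order_date"].year, r["order_date"].month))
by_product = roll_up(order_lines, lambda r: r["product"])
```

Because the time attribute is kept on every row, the monthly view is derived rather than stored, so a later requirement for weekly or quarterly reporting would not require re-extracting the source data.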
Data Architect Relationships in a BI environment
[Diagram: the Data Architect at the center, connected to the Performance / DBA Team, Report Developers, External Customers, ETL Developers, Business Analysts, Project SMEs, Data Analysts and Quality Assurance Testers]
Key Architecture Process Roles of a Data Architect in a BI environment
Data Profiling: “Data Investigation”
Integration Design: Aligning data from multiple systems and sources
Dimensional Modeling: Structures data in conformed format for faster reporting on large data volumes; organizes data for effective and efficient analysis according to business processes
Define Data Architecture Standards: See next slide for note on Standards
Note on Data Architecture Standards
Data Architecture standards may vary by company or architect but they should always include:
Consistent naming conventions for tables (staging, dimension, facts, helper, cross reference)
Consistent naming convention for fields
Consistent strategy and naming convention for Indexes/Partitions
Clear definition on how Nulls in dimensions and facts are handled
The data modeling tool in use/to be used
Clear definition of the data types that can be used
Metadata requirements for tables (e.g. insert_date, update_date, current_flg, sourcesystem, effective_from_dt and effective_to_dt) that should be present on each data warehouse dimension/fact table.
It is critical to any data warehouse environment to have well defined and consistent standards surrounding naming conventions, handling of nulls, dimension/fact design strategies, types of data architect artifacts required and the data modeling tool(s) used.
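Standards like these are easier to keep consistent when they can be checked automatically. The following Python sketch is one possible way to do that. The metadata column names come from the list above; the table-name prefixes (`stg_`, `dim_`, `fact_`, `hlp_`, `xref_`) and the `check_table` function are assumptions made up for this illustration, since actual conventions vary by company.

```python
# Metadata columns named in the standards note above.
REQUIRED_METADATA = {
    "insert_date", "update_date", "current_flg",
    "sourcesystem", "effective_from_dt", "effective_to_dt",
}

# Hypothetical naming convention: one prefix per table type.
ALLOWED_PREFIXES = ("stg_", "dim_", "fact_", "hlp_", "xref_")


def check_table(name, columns):
    """Return a list of standards violations for one table definition."""
    issues = []
    if not name.startswith(ALLOWED_PREFIXES):
        issues.append(f"{name}: name lacks an approved prefix")
    # Only dimension and fact tables must carry the metadata columns.
    if name.startswith(("dim_", "fact_")):
        missing = sorted(REQUIRED_METADATA - set(columns))
        if missing:
            issues.append(f"{name}: missing metadata columns {missing}")
    return issues


good = check_table("dim_customer", ["customer_key", *REQUIRED_METADATA])
bad_name = check_table("customer", ["customer_key"])
bad_columns = check_table("fact_sales", ["sales_amount", "insert_date"])
```

A check of this kind could be run against the modeling tool's export before a model review, so that naming and metadata gaps are caught before DDL is generated.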
Data Architecture Process
The Data Architecture Process, once matured, is repeatable, dependable, effective and efficient and aligns to business processes.
Components of Data Architecture in a BI Environment:
Step 1 – Receive/Understand Requirements
Step 2 – Data Profiling
Step 3 – Conceptual Model Design
Step 4 – Logical Model Design
Step 5 – Physical Model Design (also known as an ERD, entity relationship diagram)
Step 6 – ETL Mapping
Step 7 – Data Model Reviews
Step 8 – Metadata/Data Validation post development
Data Architecture Process - Requirements
Business Requirements for the business process to be architected can be delivered to a data architect in multiple formats:
Through Business and/or User Requirement specifications for a new business process and/or enhancement to an existing business process/data model
Through Source System SMEs, typically when upgrades to source system(s) affect the Data Warehouse (new fields, changed fields, changed logic)
Through self-examination (the data architect reviews existing data models and identifies new metrics/attributes that can be added to enhance the robustness of a data model and provide added business value).
Through listening! It is extremely important for a Data Architect to be an excellent listener. You may notice repeated statements from, for example, the reporting team on aggregations/calculations/groupings that a seasoned data architect can identify as an opportunity for improvement of the existing data model. While this may not provide added business value, it may help in performance of the environment and/or simplification of the DW environment for reporting.
Data Architecture Process – Data Profiling
What are you profiling?
• Select Business Process
• Decide on grain of data
• Identify dimensions/dimensional attributes
• Identify facts/metrics
Understand Metadata
• Analyze tables pertaining to business process subject area
• Data Sources
• Table sizes
• Row counts
• Fields/columns
• Relationships
• Natural/Primary Keys
Generate Profiling Outputs
Upon completion of the data profiling process, the following outputs can be generated:
• Summary analysis of Metadata
• Source queries that relate tables and select attributes and metrics according to filter/aggregation business process criteria
• These source queries can also be used to validate landed data post ETL development
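As a rough illustration of the metadata summary described above, the sketch below profiles a table with Python's built-in `sqlite3` module. The `src_customer` table, its columns and the `profile_table` helper are hypothetical; a real profiling pass would run against the actual source system, usually with that database's own catalog views rather than SQLite's `PRAGMA table_info`.

```python
import sqlite3


def profile_table(conn, table):
    """Return row count plus null and distinct counts per column.

    The table name is interpolated directly, so it must come from a
    trusted list of source tables, never from user input.
    """
    cur = conn.cursor()
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info({table})")]
    profile = {"row_count": row_count, "columns": {}}
    for col in columns:
        nulls, distinct = cur.execute(
            f"SELECT SUM({col} IS NULL), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        nulls = nulls or 0  # SUM returns NULL on an empty table
        profile["columns"][col] = {
            "nulls": nulls,
            "distinct": distinct,
            # A column with no nulls and all-distinct values may be a natural key.
            "candidate_key": row_count > 0 and nulls == 0 and distinct == row_count,
        }
    return profile


# Hypothetical source table used only to exercise the profiler.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (customer_no TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO src_customer VALUES (?, ?)",
    [("C1", "East"), ("C2", "East"), ("C3", None)],
)
profile = profile_table(conn, "src_customer")
```

The row count captured here is the kind of figure that can be kept and compared against the landed data after ETL development.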
Data Architecture Process – Conceptual Model Design
During the Conceptual Model Design phase, the Data Architect:
Creates a conceptual schema, which is a high-level visual description of the informational needs of the business process.
Identifies dimensions that relate to the business process
Identifies, at a high level, the metrics/facts that relate to the business process
Output: The conceptual model (an example is shown in the picture)
The conceptual model can be used to communicate with the business without too much technical information
The conceptual model can also be used to update the Bus Matrix (pivot of business processes and what dimensions are used by each)
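The Bus Matrix mentioned above can be pictured as a simple grid of business processes against dimensions. The Python sketch below builds such a grid from a hypothetical list of processes and dimensions (the names are invented for illustration) and then lists the dimensions shared by every process.

```python
# Hypothetical business processes and the dimensions each one uses.
process_dimensions = {
    "Sales": ["Date", "Customer", "Product"],
    "Inventory": ["Date", "Product", "Warehouse"],
}


def build_bus_matrix(process_dims):
    """Pivot process -> dimensions into a process x dimension grid of booleans."""
    dimensions = sorted({d for dims in process_dims.values() for d in dims})
    matrix = {
        process: {d: d in dims for d in dimensions}
        for process, dims in process_dims.items()
    }
    return dimensions, matrix


def shared_dimensions(dimensions, matrix):
    """Dimensions used by every business process in the matrix."""
    return [d for d in dimensions if all(row[d] for row in matrix.values())]


dimensions, matrix = build_bus_matrix(process_dimensions)
shared = shared_dimensions(dimensions, matrix)

# Print the grid with an X wherever a process uses a dimension.
for process, row in matrix.items():
    print(process.ljust(10), " ".join("X" if row[d] else "." for d in dimensions))
```

Dimensions that appear in every row are natural candidates to design once and reuse across subject areas.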
Data Architecture Process – Logical Model Design
During the Logical Model Design Phase, a Data Architect:
Identifies Data Metrics (typically in raw form) that support the subject area.
Documents the relationship between metrics and dimensions.
Identifies all fields needed for the subject area and their metadata attributes
Output: The Logical Data Model
An example of a fact table logical design can be seen in the picture shown
Data Architecture Process – Physical Model Design
Select Modeling Tool
The physical data modeling for a business process is typically completed using a Data Modeling tool.
Examples of Data modeling Tools:
• Erwin Data Modeler
• Embarcadero
• R
• Visio
**There are many tools; the choice depends on your company’s preference.
Create Dimensional Model
• Create the Entity Relationship Diagram for Dimensional Model;
• Create Dimension Tables/Fact table(s)
• Define Physical properties for each Dimension Attribute and Fact metric. Physical properties are:
• Data Type,
• Data Length /Scale/Precision
• Relationships,
• Indexes,
• Storage Schemas
Output of Physical Model
Once the Physical model has been created using a modeling tool, the following artifacts are produced:
• ERD (entity relationship diagram)
• DDL (Data Definition Language) for each dimension/fact table
• The DDL scripts are used to create the physical tables on the database
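To make the DDL artifact concrete, the sketch below runs a small, hypothetical star-schema DDL script against an in-memory SQLite database using Python's `sqlite3` module. The table and column names are assumptions for illustration, and SQLite's type names stand in for whatever data types the target platform (for example SQL Server or Oracle) would actually use.

```python
import sqlite3

# Hypothetical DDL of the kind a modeling tool might generate:
# one dimension, one fact table, a relationship and an index.
DDL = """
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,   -- surrogate key
    product_no    TEXT NOT NULL,         -- natural key from the source
    product_name  TEXT NOT NULL,
    insert_date   TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key      INTEGER NOT NULL,
    product_key   INTEGER NOT NULL REFERENCES dim_product (product_key),
    quantity      INTEGER NOT NULL,
    sales_amount  NUMERIC NOT NULL,
    insert_date   TEXT NOT NULL
);
CREATE INDEX ix_fact_sales_product ON fact_sales (product_key);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Confirm that the physical objects now exist in the database catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
```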
Data Architecture Process – ETL Mapping
What is ETL Mapping?
• ETL means Extract, Transform, Load. This is the mechanism by which data is extracted from source systems, transformed according to business requirements and then loaded to target dimension and fact tables in the Data Warehouse.
• During the ETL mapping phase, the Data Architect identifies the rules/business logic that the ETL Developers need to accurately Extract, Transform and Load data to the defined dimensions and facts.
• The ETL Mapping document is absolutely critical to the ETL Team’s ability to develop the processes that populate data.
ETL Mapping Content
The Data Architect creates an ETL mapping template to:
• Identify Source Systems, source tables, source fields
• Identify Target Tables/fields
• Define the Type of DW Table (fact/dimension)
• Define/Identify Grouping Logic; Filters; Column Order/Type; Data Type/Length/Precision/Scale
• Define Transformation Logic (rules)
• Define Default values for Null attributes, keys, metrics
• Provide Source Queries that give ETL Developers insight into the data they are working with.
ETL Mapping Outputs
• ETL Mapping Document
• Metadata for the Data Warehouse environment
• Data Dictionary (Note: this is not always done by the data architect; it is sometimes done by a member of the Data Governance team)
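One way to picture an ETL mapping is as a table of rules, one row per target field. The Python sketch below is an illustration only: the source and target field names, the transformation rules and the default values are all hypothetical, and in practice the mapping usually lives in a document or spreadsheet that the ETL tool implements.

```python
# Hypothetical mapping rows: source field -> target field,
# transformation rule, and the default to use when the source is null.
MAPPING = [
    {"source": "cust_nm", "target": "customer_name",
     "transform": str.strip, "default": "Unknown"},
    {"source": "rgn_cd", "target": "region_code",
     "transform": str.upper, "default": "N/A"},
    {"source": "ord_amt", "target": "order_amount",
     "transform": float, "default": 0.0},
]


def apply_mapping(source_row, mapping):
    """Build one target row by applying each mapping rule to a source row."""
    target_row = {}
    for rule in mapping:
        value = source_row.get(rule["source"])
        if value is None:
            # Null handling is defined in the mapping, not left to the developer.
            target_row[rule["target"]] = rule["default"]
        else:
            target_row[rule["target"]] = rule["transform"](value)
    return target_row


source_row = {"cust_nm": " Acme ", "rgn_cd": None, "ord_amt": "12.50"}
target_row = apply_mapping(source_row, MAPPING)
```

Keeping defaults for nulls in the mapping itself means the same rule is applied everywhere the field is loaded.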
Data Architecture Process – Data Model Review
Prior to hand-off to the Development Teams, the Data Architect will perform Data Model Review(s) to ensure everyone is on the same page and understands the tasks that need to be completed and/or what data will become available for consumption by the data visualizers (reporting team).
In a mature BI environment, the Data Architect conducts Data Model Reviews with ETL Developers, Report Developers and possibly Business Analysts to:
Ensure the data model meets business requirements
Provide ETL Developers with an overview of the business process/subject area
Review the ETL logic with Developers to ensure they understand what needs to be done
Provide Report Developers with an overview of the data model, giving them insight into the data they will soon report on
Data Architecture Process – Metadata Validation
Upon completion of development work by the ETL Team, the Data Architect reviews the data landed to the target dimension/fact table(s) to ensure that it complies with the rules defined in the ETL mapping document.
Metadata Validation by the Data Architect involves:
Checking data for consistency and completeness;
Checking for duplicates;
Verifying row grain uniqueness/natural keys;
Verifying data formatting;
Verifying that row counts and data match the expected row counts and data from the source queries (data profiling step);
Verifying the non-existence of orphaned or null surrogate keys, and that landed data matches expected source query results.
Note: If validation fails, the DA will work with the ETL team to resolve the issues. If validation passes, the DA will notify and hand off to the Reporting Team.
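Several of these checks can be expressed as simple queries. The sketch below uses Python's `sqlite3` module with a tiny, deliberately flawed data set to show duplicate natural keys, null surrogate keys, orphaned keys and a row count comparison. The tables, data and expected row count are hypothetical; in practice the expected count would come from the source queries written during data profiling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER, product_no TEXT);
CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER);
-- Deliberately flawed sample data: P2 is duplicated in the dimension,
-- and the fact table has one orphaned key (9) and one null key.
INSERT INTO dim_product VALUES (1, 'P1'), (2, 'P2'), (3, 'P2');
INSERT INTO fact_sales VALUES (1, 5), (2, 1), (9, 4), (NULL, 2);
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


results = {
    # Natural keys that appear on more than one dimension row.
    "duplicate_natural_keys": scalar(
        "SELECT COUNT(*) FROM (SELECT product_no FROM dim_product "
        "GROUP BY product_no HAVING COUNT(*) > 1)"),
    # Fact rows with no surrogate key at all.
    "null_surrogate_keys": scalar(
        "SELECT COUNT(*) FROM fact_sales WHERE product_key IS NULL"),
    # Fact rows whose surrogate key has no matching dimension row.
    "orphaned_keys": scalar(
        "SELECT COUNT(*) FROM fact_sales f LEFT JOIN dim_product d "
        "ON d.product_key = f.product_key "
        "WHERE f.product_key IS NOT NULL AND d.product_key IS NULL"),
}

expected_fact_rows = 4  # assumed result of the profiling-step source query
row_count_matches = scalar("SELECT COUNT(*) FROM fact_sales") == expected_fact_rows

# Validation passes only when counts match and every defect count is zero.
validation_passed = row_count_matches and not any(results.values())
```

With this sample data the validation fails, which corresponds to the case where the DA works with the ETL team to resolve the issues before hand-off.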
Data Architecture Process – Wrap up
Upon completion of all data architecture process steps, ETL development and successful Metadata Validation, the Data Architecture process is complete for this business process/enhancement. The Data Architect will continue to be a resource to the Reporting, Quality Assurance and Performance teams as needed.