Data Warehousing – An Introductory Perspective DWCC BBSR DWCC BBSR.
-
Upload
santiago-dole -
Category
Documents
-
view
213 -
download
1
Transcript of Data Warehousing – An Introductory Perspective DWCC BBSR DWCC BBSR.
Data Warehousing – An Introductory Perspective
DWCC BBSRDWCC BBSR
Agenda
Why Data Warehouse Definition and Architecture Terminology
The Business Need
Business DecisionsAre not made byRolling Dices
We Don’t know
What we don’t know
I think…. errrr,
I guess so
Current Business Environment
Competitive Ever Changing Chaotic Global Urgency to make decisions Competitive advantages stems from well informed
decisions Based on an understanding of:
Your Products Your Customers Preferences The Competition Your own company strengths
The Value Pyramid
Each layer providesValue en route to atargeted business Outcome
Business Outcomes
Actions
Decisions
Knowledge
Information
Data
Increased revenueIncreased productivity
Reduced costsCompetitive advantage
Definitions
A collection of integrated, subject oriented databases designed to support the DSS function where each unit of data is relevant at some moment of time (Inmon 1991)
A copy of transaction data specifically structured to Query and Analysis (Kimball 1996)
Data Warehouse is NOT a specific technology It is a series of processes, procedures and tools that
help the enterprise understand more about itself, its products, its customers and the market it services.
It is NOT possible to purchase a Data Warehouse But, it is possible to build one.
FEATURES
•Non Volatile - Used mainly for reporting purpose and it is independent of transactional data.
•Subject Orientation- All relevant data is stored together. Ex: Sales, Finance, Marketing, Customer data etc.
•Historical data- Can contain data of several years depending on company requirements.
sachin_kambhoj:
sachin_kambhoj: sachin_kambhoj
:
sachin_kambhoj:
Subject Orientation.
Operational Datawarehouse
AUTO
HEALTH
LIFE
CASUALTY
Customer
Policy
Premium
Claims
Applications Subjects
Goals and Applications
Goals of a Data Warehouse Provide reliable, High performance access Consistent view of Data: Same query, same data. All users
should be warned if data load has not come in. Slice and dice capability Quality of data is a driver for business re-engineering.
Data Warehousing Applications: Customer Profitability Analysis Customer satisfaction and retention Buyer behavior. Pricing, Promotion Analysis Market research Inventory optimization
OLTP v/s Data Warehouse
OLTP system runs the business, Data Warehouses tell you how to run the business
Characteristic OLTP Data Warehouse
Orientation Transaction Analysis
Data Access Record at a time Set at a time
Updates Frequent & Unscheduled
Periodic & Scheduled
Response time Seconds required Minutes acceptable
Concurrent users Many Few
Availability Guaranteed As needed
Data structures Highly normalized Often de-normalized
Data nature Current historical
If most of your business needs are To report on data in a single transaction processing system All the historical data you need are in the system Data in the system is clean Your hardware can support reporting against the live system data The structure of the system data is relatively simple Your firm does not have much interest in end user adhoc
query/report tools
Data warehousing may not be for your business!!
Modeling Constructs
Entity Relationship Diagram Star schema Snow flake schema
Within the implementation of a warehouse, several of these constructs may be integrated to form an optimal design
Entity Relationship Diagram
• Based on set theory and SQL
• Highly normalized
• Optimized for update and fast transaction turnaround
• Not suited for querying in a data warehouse environment
• diagrams like these are very difficult for users to visualize and memorize.
Star Schema
A central fact table surrounded by a number of dimension tables.
Dimensions are business entities on which calculations are
done. They can be numeric or alphanumeric. Example: Product table comprising brand name, category, packaging type, size.
Facts are numerical measurements of business with
respect to dimensions.They are numeric and additive (summable across any combination)
e.g. A sales fact table could contain time, product and store
key along with dollars sold, units sold, dollars cost.
Snow Flake Schema
Normalized version of the star schema with the addition of normalized dimension tables.
Normalization helps to reduce redundancy in the dimension tables, but affects performance and user comprehension.
DW TerminologyGranularity Granularity (or grain) defines the level of detail stored in the
physical warehouse Low granularity indicates lot of detail while high granularity
indicates less detail. Example: A commercial airline is building a data warehouse. What
will the granularity be? Choice A: Each record represents a flight Choice B: Each record represents the customer on a flightThere is no correct answer. To a large extent, the granularity
depends on the business User’s exploitation needs.However, you should be aware that the granularity of data
affects Volumes of Data, Data Maintenance, Indexing Level of Data Exploration Query and Reporting constraints
DW TerminologyMetadata At all levels of the data warehouse, information is
required to support the maintenance and use of the data warehouse.
Metadata is data about data.There are two views of Metadata Business – are warehouse attributes and properties
for use by business users Technical – describe data flow from Operational
systems into the data warehouseOLAP Online Analytical processing Tool(s) for Analytical Reporting including Graphical
capabilities.
DW Terminology OLAP Tools available for exploring the information built
in a DW : Multi-dimensional On-line Analytical Processing
(MOLAP) The data from data warehouse is queried and dumped
periodically on to a server on local network to a data storage called Multi-dimensional Database (MDDB) provided by the OLAP tool. This MDDB forms a Data Mart which is then used for querying and reporting.
Relational On-Line Analytical Processing (ROLAP) Refers to the ability to conduct OLAP analysis directly
against a relational warehouse without any constraints on the number of dimensions, database size, analytical complexity, or number and type of users.
Hybrid On-line Analytical Processing (HOLAP) An environment with a combination of MOLAP and ROLAP
data storage. Summarized information is typically stored in an MDDB and detailed data is stored in a Relational environment.
Terminology
Data Mart- Contains Data about a specific subject. Eg. Official data, Customer data, Campaign data etc.
Metadata- Data about data. Describes the data stored in Data warehouse.
Data Cubes- Central object of data containing information in a multidimensional structure.
Data Cleansing- Regular cleaning of data.
ETL- Extraction, Transformation and Loading of Data.
Data Mining- A mechanism which uses intelligent algorithms to discover patterns, clusters and models from data.
Stages
Heterogeneous Source Systems
Operational
StagingArea Data
Warehouse
Business Intelligence
External
Legacy
Data Mining
Query & Reporting
OLAP
Extraction, Transformation & Loading (ETL)
A Typical Data Warehouse
Data Warehouse
Detailed Data
DataMart
DataMart
DataMart
Summarized DataMeta Data
Facilitates in firing queries on detailed data.
Data marts contain data specific to a subject.
MOLAP/ROLAP/HOLAP
Query Tool by MDD Vendor
CustomLoader
Data Warehouse(RDBMS)
OLAP Engine
MDD Proprietary API
Cubes
MDD Proprietary API
Rows
SQL
Rows
MDD
Database
Storage
Periodic,
Manual
Data
Load
OLAP Terminology
Analytical technique whereby the user navigates from the most summarized to the most detailed level.
Regio
n
Month
Product
RegionState
District
Location
OLAP Terminology Rotation Or Dicing
Regio
n
M
O
N
T
h
Product
Mon
th
P
R
O
D
U
C
t
Region
OLAP Terminology
SlicingReg
ion
M
O
N
T
h
Product
Products and Vendors
Data Warehouses Oracle Sybase DB2
OLAP tools Oracle Express Hyperion Essbase
Data Mining Oracle Darwin IBM Intelligent Data Miner
Querying & Reporting Oracle Discoverer Business Objects