Data Warehousing Final
Transcript of Data Warehousing Final
-
7/31/2019 Data Warehousing Final
1/31
-
TEAM
Rachana Kola 31
Tasneem Taj 30
Vijay Kumar 29
Shilpa Kasani 28
Ankita Golchha 27
Bharat Jain 26
-
Data warehousing
A data warehouse is a database used for reporting and analysis.
It focuses on data storage.
Essential components of a data warehouse system.
A data warehouse can be subdivided into data marts.
-
Characteristics of data warehouse
Conceptual view.
Unlimited dimensions.
Dynamic sparse matrix handling.
Client/server architecture.
Accessibility & transparency.
-
OLTP & OLAP
OLTP: Online Transaction Processing. It is characterized by a large number of short online transactions (Insert, Update, Delete).
OLAP: Online Analytical Processing. It is characterized by a relatively low volume of transactions.
-
OLTP v/s OLAP
Aspect | OLTP (Operational system) | OLAP (Data warehouse system)
Sources of data | Operational data. | Consolidated data.
Purposes of data | Control and run fundamental business tasks. | Planning, problem solving & decision making.
What the data reveals | Snapshot. | Multi-dimensional view.
Inserts & updates | Short and fast. | Periodic and long-running.
Queries | Simple queries. | Complex queries.
Processing speed | Very fast. | Depends on the amount of data involved.
Space requirements | Relatively small. | Large due to aggregation and history data.
Database design | Highly normalized. | Denormalized.
Backup & recovery | Regular backups are essential. | Reloading OLTP data as a recovery method.
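The contrast between the two workloads can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical orders table (the table name, columns, and values are assumptions made for illustration). In practice the analytical query would run against a separate warehouse copy rather than the operational database.

```python
import sqlite3

# Hypothetical table; names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP style: many short transactions, each inserting or updating one row.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 120.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("South", 80.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 50.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (90.0, 2))

# OLAP style: one read-only query that scans and aggregates many rows.
totals_by_region = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()
print(totals_by_region)
```

Each `with conn:` block commits one short transaction, while the final query reads every row to build a summary.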
-
ARCHITECTURE
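Diagram: operational systems and external data sources feed the extract, clean, transform, load, and refresh steps.
These steps populate the data warehouse, which is described by a metadata repository.
The data warehouse serves reports, OLAP, and data mining.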
-
COMPONENTS
Three main systems are required:
Source systems
Data staging area
Presentation servers
Operational data:
Internal data
External data
Load manager:
Simple transformation of data to prepare it for entry into the warehouse.
-
CONTD..
Warehouse manager:
Analysis of data.
Transformation & merging of data.
Backing up & archiving of data.
Detailed, summarized, and archived data.
Metadata:
Metadata means data about data.
Extraction & loading process.
Warehouse management process.
Query management process.
End-user access tools.
-
ETL PROCESS
Extract
Transform
Cleansing
Loading
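The four steps above can be sketched as a small pipeline. This is a minimal illustration, not a production design: the sample rows, the field names, the cleansing rules, and the `sales` table are all assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
raw_rows = [
    {"customer": " alice ", "amount": "100.50", "date": "2019-07-01"},
    {"customer": "BOB", "amount": "n/a", "date": "2019-07-02"},   # bad amount
    {"customer": "alice", "amount": "25", "date": "2019-07-03"},
    {"customer": "", "amount": "10", "date": "2019-07-04"},       # missing name
]

def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(raw_rows)

def parse_amount(text):
    """Convert an amount string to float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

def transform(rows):
    """Transform: standardize names and convert amounts to numbers."""
    return [
        {
            "customer": row["customer"].strip().title(),
            "amount": parse_amount(row["amount"]),
            "sale_date": row["date"],
        }
        for row in rows
    ]

def cleanse(rows):
    """Cleansing: drop rows with a missing customer or an unusable amount."""
    return [r for r in rows if r["customer"] and r["amount"] is not None]

def load(rows, conn):
    """Loading: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO sales VALUES (:customer, :amount, :sale_date)", rows
        )

warehouse = sqlite3.connect(":memory:")
load(cleanse(transform(extract())), warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY sale_date"
).fetchall()
print(loaded)
```

The steps run in the order listed on the slide. Some pipelines cleanse before transforming; the order here is only one reasonable choice.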
-
IMPORTANT TERMS
Drill down
Roll up
Aggregation
Granularity
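A short sketch may help relate these four terms. Aggregation summarizes many rows into totals, granularity is the level of detail of each row, roll up moves to a coarser level, and drill down moves to a finer one. The sample data and the `aggregate` helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts; granularity here is one row per city per month.
sales = [
    {"year": 2019, "month": 1, "city": "Pune", "units": 10},
    {"year": 2019, "month": 1, "city": "Delhi", "units": 5},
    {"year": 2019, "month": 2, "city": "Pune", "units": 7},
    {"year": 2018, "month": 12, "city": "Delhi", "units": 3},
]

def aggregate(rows, keys):
    """Aggregation: sum units for each distinct combination of the given keys."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["units"]
    return dict(totals)

# Roll up: summarize at the coarser year level.
by_year = aggregate(sales, ["year"])

# Drill down: break each year into months for more detail.
by_year_month = aggregate(sales, ["year", "month"])

print(by_year)
print(by_year_month)
```

Calling `aggregate` with fewer keys rolls the data up, and adding keys drills down toward the stored granularity.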
-
DIMENSIONAL DATA MODEL
The dimensional data model is most often used in data warehousing systems.
The objective of dimensional modeling is to represent a set of business measurements in a standard framework that is easily understandable by end users.
The main components of a dimensional model are fact tables and dimension tables.
A fact table is a table that contains the measures of interest.
A dimension is a structure, usually composed of one or more hierarchies, that categorizes data.
-
Example of dimensional model
-
STAR SCHEMA
The star schema is also called star-join schema, data cube, or multi-dimensional schema.
It is the simplest style of data warehouse schema.
The star schema consists of one or more fact tables referencing any number of dimension tables.
A star schema classifies the attributes of an event into facts (measured numeric/time data) and descriptive dimension attributes (product id, customer name, sale date) that give the facts a context.
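As a rough illustration of this layout, the sketch below builds one fact table and three dimension tables in an in-memory SQLite database, then runs a typical star query that joins the fact table to its dimensions. All table names, column names, and rows are assumptions chosen to echo the attributes mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes (hypothetical names).
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, sale_date TEXT, year INTEGER);

-- The fact table holds numeric measures plus a key to each dimension.
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    quantity    INTEGER,
    revenue     REAL
);

INSERT INTO dim_product  VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO dim_date     VALUES (1, '2019-07-30', 2019), (2, '2019-07-31', 2019);
INSERT INTO fact_sales   VALUES (1, 1, 1, 3, 30.0), (2, 1, 2, 1, 45.0), (1, 2, 2, 2, 20.0);
""")

# A typical star query: join the fact table to its dimensions,
# filter on one dimension, and group by another.
star_query = """
SELECT p.category, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date    AS d ON d.date_id = f.date_id
WHERE d.year = 2019
GROUP BY p.category
ORDER BY p.category
"""
revenue_by_category = conn.execute(star_query).fetchall()
print(revenue_by_category)
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.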
-
SAMPLE STAR SCHEMA
-
Advantages of star schema
The main advantages of star schemas are that they:
Provide a direct and intuitive mapping between the business entities
being analyzed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence
tools, which may anticipate or even require that the data-warehouse
schema contain dimension tables.
-
SNOWFLAKE SCHEMA
A dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.
Each point of the star explodes into more points.
It is an extension of the star schema.
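The normalization described above might look like the following sketch, in which a hypothetical product dimension is split into product, category, and department lookup tables, one per hierarchy level. The names and rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One lookup table per level of the hypothetical product hierarchy.
CREATE TABLE dim_department (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT,
    department_id INTEGER REFERENCES dim_department(department_id)
);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue    REAL
);

INSERT INTO dim_department VALUES (1, 'Office'), (2, 'Home');
INSERT INTO dim_category   VALUES (1, 'Stationery', 1), (2, 'Paper', 1), (3, 'Lighting', 2);
INSERT INTO dim_product    VALUES (1, 'Pen', 1), (2, 'Notebook', 2), (3, 'Lamp', 3);
INSERT INTO fact_sales     VALUES (1, 30.0), (2, 12.5), (3, 45.0), (1, 20.0);
""")

# Reaching the top hierarchy level takes one join per lookup table.
revenue_by_department = conn.execute("""
SELECT dep.department_name, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product    AS p   ON p.product_id = f.product_id
JOIN dim_category   AS c   ON c.category_id = p.category_id
JOIN dim_department AS dep ON dep.department_id = c.department_id
GROUP BY dep.department_name
ORDER BY dep.department_name
""").fetchall()
print(revenue_by_department)
```

Compared with a star schema, the lookup tables store each category and department name once, at the cost of extra joins in queries.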
-
SAMPLE SNOWFLAKE SCHEMA
-
FACT CONSTELLATION SCHEMA
Shaped like a constellation of stars.
More complex than a star or snowflake schema.
From any star schema it is possible to construct a fact constellation schema by splitting the original star schema into several star schemas, each of which describes facts at a different level of the dimension hierarchies.
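One way to picture this is two fact tables at different levels of the time hierarchy that share a dimension table. The sketch below is a simplified assumption: the table names are hypothetical, dates are stored directly in the fact tables rather than in a date dimension, and the monthly facts are derived from the daily ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension table (hypothetical).
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);

-- Two fact tables at different levels of the time hierarchy.
CREATE TABLE fact_sales_daily (
    product_id INTEGER REFERENCES dim_product(product_id),
    sale_date  TEXT,
    revenue    REAL
);
CREATE TABLE fact_sales_monthly (
    product_id INTEGER REFERENCES dim_product(product_id),
    sale_month TEXT,
    revenue    REAL
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO fact_sales_daily VALUES
    (1, '2019-06-30', 10.0),
    (1, '2019-07-01', 30.0),
    (1, '2019-07-31', 20.0),
    (2, '2019-07-31', 45.0);

-- Populate the month-level facts from the day-level facts.
INSERT INTO fact_sales_monthly
SELECT product_id, substr(sale_date, 1, 7), SUM(revenue)
FROM fact_sales_daily
GROUP BY product_id, substr(sale_date, 1, 7);
""")

monthly = conn.execute("""
SELECT p.product_name, m.sale_month, m.revenue
FROM fact_sales_monthly AS m
JOIN dim_product AS p ON p.product_id = m.product_id
ORDER BY p.product_name, m.sale_month
""").fetchall()
print(monthly)
```

Both fact tables join to the same `dim_product` table, so the two stars overlap and form a small constellation.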
-
PERSONAL PRODUCTIVITY APPLICATIONS
Useful for manipulating and presenting data on individual PCs.
Developed for a standalone environment.
Address applications requiring only small volumes of warehouse data.
-
DATA QUERY & REPORTING
Data access through simple, list-oriented queries, and the generation of basic reports.
Provide a view of historical data.
Do not address the enterprise need for in-depth analysis and planning.
-
PLANNING & ANALYSIS
Address essential business requirements such as in-depth analysis and planning.
Referred to as on-line analytical processing (OLAP) applications.
Mandates that the organization look not only at past performance but, more importantly, at the future performance of the business.
The combined analysis of historical data with future projections is critical to the success of today's corporation.
-
Advantages of data warehousing
Provides business users a customer-centric view of the company's heterogeneous data.
Added value to the company's customers through better access to information.
Historical information.
Enhanced data quality.
Supplements disaster recovery plans.
One-stop shop.
Provides savings in billing processes, reduces fraud losses, etc.
-
Disadvantages of data warehousing
Not optimal for unstructured data.
Data warehouses get outdated relatively quickly.
Duplicate functionality between data warehouses & operational systems.
Extremely expensive. Costs include:
Time spent in careful analysis.
Design & implementation.
Hardware costs.
Software costs.
Ongoing support & maintenance.
-
Conclusion
Data warehousing is necessary to analyze business needs, integrate data from several sources, and model the data in an appropriate manner so that business information can be presented in the form of dashboards and reports.