Data Conversion to a Data warehouse Presented By Sanjay Gunasekaran.


Page 1

Data Conversion to a Data warehouse

Presented By

Sanjay Gunasekaran

Page 2

Main Topics

• Brief Overview of Data Warehouse

• Concept of Data Conversion

• Importance of Data conversion and the steps involved

• Common Industry Methodology

• Outline and Analysis done in the Alternate Plan paper

Page 3

Data warehousing

• It is a concept and not a product

• A method to analyze massive amounts of data to make better business decisions.

• Helpful, for example, in analyzing sales data and making decisions that affect the company’s performance.

• A Data warehouse in general contains summarized, de-normalized and replicated data that is infrequently updated and is optimized for decision support applications.

Page 4

Comparison between Operational Environment and Data Warehouse

Operational Environment

• Detailed

• Current

• Transaction driven

• Minimum redundancy

• Static structure

• Small amount of data

• Constantly updated

Data Warehouse

• Summarized

• Variable over time

• Analysis driven

• Some redundancy

• Flexible structure

• Huge volumes of data

• Infrequently updated

Page 5

Data Warehouse Concepts

• Multidimensional Model

a) Facts

- Table containing aggregate information required for analysis.

b) Dimensions

- Classes of descriptors of the facts.

c) Hierarchies

- Level of Aggregation of data.

• Databases

a) Relational

i) Oracle

b) Multi-Dimensional

i) Oracle Express

ii) Essbase

iii) Gentium
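The multidimensional model on this slide can be sketched as a star schema in a relational database. The example below is only an illustration: it uses Python's built-in sqlite3 module rather than Oracle, and the table names, column names and sample rows (sales_fact, product_dim, location_dim) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: descriptors of the facts.
    CREATE TABLE product_dim  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    -- Fact table: the measures to analyze, keyed by the dimensions.
    CREATE TABLE sales_fact (
        product_id  INTEGER REFERENCES product_dim(product_id),
        location_id INTEGER REFERENCES location_dim(location_id),
        amount      REAL
    );
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(1, "Minneapolis", "Midwest"), (2, "Chicago", "Midwest"),
                  (3, "Boston", "East")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 2, 50.0), (1, 3, 25.0)])

# Hierarchy: city rolls up to region, so the same facts can be
# aggregated at a coarser level of the location dimension.
by_region = conn.execute("""
    SELECT l.region, SUM(f.amount)
    FROM sales_fact f
    JOIN location_dim l ON f.location_id = l.location_id
    GROUP BY l.region
    ORDER BY l.region
""").fetchall()
print(by_region)
```

Grouping by l.city instead of l.region would give the same facts one level lower in the hierarchy.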

Page 6

Implementation Steps

• Analyze user requirements for the Data warehouse.

• Analyze existing transaction Processing Data.

• Design the Data warehouse (Multi-dimensional Model)

• Create the Data warehouse (Relational or Multi-dimensional)

• Extract and clean the operational data.

• Migrate and load the data into the warehouse.

• Do decision support analysis on the warehouse data using OLAP tools.

• Create reports for end users.

Page 7

Data Warehouse Architecture

Terminology

a) OLTP systems

b) Metadata

c) Data Warehouse

d) Staging Area

e) Extraction, Loading & Migration

f) External Data

[Figure: Data Warehouse architecture. Components shown: OLTP systems (General Ledger, Accounts Payable, Purchase Order); external data from legacy systems; extraction, cleaning and loading; staging area; metadata; Data Warehouse; end user.]

Page 8

Data Warehouse Architecture (Contd..)

• OLTP Systems: Online Transaction Processing systems, also called production systems. They are used to manage and run the business.

• Metadata: information about the data that feeds into, is transformed for, and resides in the Data Warehouse.

• Data Warehouse: the core of the architecture.

- Supports informational processing by providing a solid platform of integrated, historical data from which to do analysis.

Page 9

Data Warehouse Architecture (Contd..)

• Staging Area: the Data Warehouse workbench.

- The place where raw data is brought in, cleaned, combined, archived and eventually exported to either the Data Warehouse or to one or more Data Marts.

• Extraction, Cleaning & Loading: known as the Data Conversion process.

- The process by which data from the operational systems is moved to the Warehouse.

- One of the most important steps in the implementation of a Data Warehouse.

• External Data

Page 10

Data Conversion

• Loading of data from the operational system to the Data warehouse.

• Process wherein data is extracted, cleaned, combined, archived and eventually loaded into the Data warehouse.

• Complex, time-consuming and unglamorous.

• Comprises the following processes:

a) Extraction

b) Cleaning

c) Loading

• A very important part of the Data warehousing process.

Page 11

Importance of Data Conversion

• The Data warehouse holds the information that is the key to a corporation’s decision making process.

• Unreliable and “dirty” data can affect the performance of the corporation.

• Examples

a) Marketing communications.

b) Retail Sales

c) Medical records

Page 12

Steps in Data Conversion

• Extract data from the operational systems to intermediate schema (Staging area).

- The staging area is the Data warehouse workbench, where the data is cleaned, combined, archived and eventually exported to the Data warehouse. It has the same schema structure as the operational system.

• Convert the intermediate schema to “load data”.

• Aggregate the “load data”.

• Migrate the “load data” from the staging area to the Data Warehouse server (if the staging area is not on the same server as the warehouse).

• Load the data into the Data warehouse.
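The first step above, extraction into a staging area that mirrors the source schema, can be sketched as follows. This is a minimal illustration, assuming SQLite (Python's built-in sqlite3 module) in place of the actual operational system; the orders table, its rows and the helper extract_to_staging are hypothetical.

```python
import sqlite3

# Hypothetical operational (source) system with a single table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "Acme", 120.0), (2, "acme ", 80.0), (3, "Zenith", 40.0)])

staging = sqlite3.connect(":memory:")

def extract_to_staging(src, stg, table):
    """Copy one source table into staging, reusing the source's own DDL."""
    ddl = src.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()[0]
    stg.execute(ddl)  # the staging table gets the same schema as the source
    rows = src.execute(f"SELECT * FROM {table}").fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        stg.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    stg.commit()
    return len(rows)

copied = extract_to_staging(source, staging, "orders")
print(copied)
```

The rows are copied unchanged, including the dirty value "acme "; cleaning happens later, in the staging area.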

Page 13

Data Conversion Process

[Figure: Data Conversion process. Steps shown: Plan Conversion; Create Conversion Specifications; Extract Source Data to Intermediate Schemas; Condition Data; Transform Data; Clean Data; Integrate Data; Aggregate Load Data; Move and Load Data; Quality Assurance of Data.]

Page 14

Data Conversion

• Extraction

- Routines are created to read source data and move it to an intermediate staging area.

- The staging area has the same schema as the source. It matters because the data is cleaned there before it is loaded into the warehouse.

• Convert intermediate Schemas to “Load Data”

- This is the data cleaning process. It comprises:

- Data examination

- Data parsing

- Data correction

- Record matching

- Data transformation
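The cleaning sub-steps listed above can be illustrated with a small sketch. The records, field layout and helper functions below are hypothetical, and the matching rule (ignore case and punctuation) is deliberately crude; real record matching is usually far more involved.

```python
import re

# Hypothetical raw staging records: (customer name, phone, amount as text).
raw = [
    ("Acme Corp", "(612) 555-0101", "120.50"),
    ("ACME CORP.", "612-555-0101", "80"),
    ("Zenith Ltd", "651.555.0199", "n/a"),
]

def parse_phone(text):
    """Parsing: keep only the digits of a phone number."""
    return re.sub(r"\D", "", text)

def correct_amount(text):
    """Correction: turn unparseable amounts into None instead of failing."""
    try:
        return float(text)
    except ValueError:
        return None

def match_key(name):
    """Record matching: a crude key that ignores case and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

# Examination: count the values that will need correction.
bad_amounts = sum(1 for _, _, amt in raw if correct_amount(amt) is None)

# Transformation: build load-ready rows, merging records that share a key.
customers = {}
for name, phone, amount in raw:
    entry = customers.setdefault(
        match_key(name), {"phone": parse_phone(phone), "total": 0.0}
    )
    entry["total"] += correct_amount(amount) or 0.0

print(bad_amounts, customers)
```

Here the two spellings of "Acme Corp" collapse into one customer, and the unparseable amount is counted during examination and contributes nothing to the total.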

Page 15

Data Conversion (Contd..)

• Aggregate “Load data”

- “Load data” is aggregated by executing a series of sorts externally.

• Move the “Load data” from the staging area onto the Data warehouse server

- Done only if the Data warehouse is on a different server from the staging area.

• Load the data onto the Data warehouse

- Done using SQL routines or bulk-load utilities.
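Aggregating by sorting and then loading in bulk can be sketched as below. This is a simplified illustration with assumptions: the sort runs in memory (the slide describes external sorts, which serve the same purpose for data too large for memory), sqlite3's executemany stands in for a bulk-load utility, and the rows and the sales_fact table are hypothetical.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Hypothetical "load data": (product, location, amount) rows from staging.
load_data = [
    ("widget", "east", 10.0),
    ("gadget", "west", 5.0),
    ("widget", "east", 2.5),
    ("gadget", "west", 1.0),
    ("widget", "west", 4.0),
]

# Sort on the grouping key so equal keys become adjacent, then sum each run.
key = itemgetter(0, 1)
aggregated = [
    (product, location, sum(row[2] for row in rows))
    for (product, location), rows in groupby(sorted(load_data, key=key), key=key)
]

# Load the aggregated rows in one batch inside a single transaction.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (product TEXT, location TEXT, amount REAL)")
with warehouse:
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", aggregated)

print(aggregated)
```

Sorting first is what lets a single pass over the data produce each total, because all rows for a given key arrive together.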

Page 16

Paper Outline

• Brief explanation of Data warehousing concept

• Data warehouse architecture

• Data conversion

• Importance of data conversion

• Common Industry methodology

• Analysis of Data conversion process using an example:

- Sales Order System

Page 17

Overall Analysis

• Concept of the paper was to outline the Data Conversion process.

• Design a Relational Database, Staging Area and Data Warehouse.

• Move Data from the Relational database to the Staging Area

• Move Data from the Staging area to the Warehouse.

Page 18

In-depth Analysis

• Designed the Relational Database to reflect the Transactional processing system of a common Organization.

• Designed the Staging Area to reflect only the Sales system.

• Designed the Data Warehouse for the Sales system.

• Built the relational database (source system) for the quoted example (Sales System) in Oracle.

• Built the Staging Area in Oracle.

• Built the Data Warehouse in Oracle (Multi Dimensional Design in a relational Database).

• Created views for the source tables (for transparency).

• Created synonyms for the views (as the source tables were on a different server).

Page 19

In-depth Analysis (Contd..)

• Wrote SQL scripts to first move data from the synonyms to the Staging area.

• Wrote SQL scripts and procedures to move data from the Staging Area to the Data Warehouse.

– Data was moved first from the Staging area tables to the dimension tables namely Product, Location and Customer.

– Time dimension table was populated with 10 years of data. Additional scripts were written to populate the time dimension with data every year.

– Data was moved from the Staging area to the fact table (Core Table).

• Wrote scripts to check the consistency of the data. These scripts checked the total number of records moved from the source system to the Staging area and from the Staging area to the Data Warehouse. They also checked the total amount moved from the source database to the Data Warehouse.
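A consistency check of the kind described in the last bullet might look like the sketch below. It is not the paper's actual script: it uses sqlite3 instead of Oracle, keeps all three hypothetical tables (source_orders, staging_orders, sales_fact) in one database for brevity, and assumes the fact table is loaded at the same grain as the source, so row counts are comparable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE source_orders  (order_id INTEGER, amount REAL);
    CREATE TABLE staging_orders (order_id INTEGER, amount REAL);
    CREATE TABLE sales_fact     (order_id INTEGER, amount REAL);
""")
rows = [(1, 100.0), (2, 50.0), (3, 25.5)]
for table in ("source_orders", "staging_orders", "sales_fact"):
    db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

def profile(conn, table):
    """Return (row count, total amount) for one table."""
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"
    ).fetchone()

def check_consistency(conn, tables):
    """List each (table, profile) whose profile differs from the first table's."""
    expected = profile(conn, tables[0])
    return [(t, profile(conn, t)) for t in tables[1:] if profile(conn, t) != expected]

stages = ["source_orders", "staging_orders", "sales_fact"]
mismatches = check_consistency(db, stages)
print(mismatches)
```

If the warehouse stores aggregated facts, the row counts will legitimately differ, and only the amount totals should be compared between the source and the warehouse.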

Page 20

Conclusion

• The value of the Data warehouse can only be realized through OLAP analysis and Data Mining.

• Data Conversion is one of the most critical processes in implementing a Data warehouse.

• Warehouse holds the information that is of great value to the enterprise

• Data conversion process must be done effectively and efficiently