
Data Warehouse Quick Start Bundle

User Guide

Overview

This bundle contains mappings that can help load dimension tables in a data warehousing project.

When you build a data warehouse based on a star schema, you create fact tables and dimension tables. Fact tables store transaction records such as sales and purchase orders. Dimension tables store master data such as products, customers, and dates. Because transactions are added and modified very frequently, fact tables grow very fast. Master tables, such as product tables, do not change very frequently. Since changes to dimension tables are smaller in magnitude compared to changes in fact tables, these dimensions are known as slowly growing or slowly changing dimensions.

Slowly changing dimensions

Slowly changing dimensions (SCD) are dimension tables that have slowly increasing dimension data and updates to existing dimensions. When updating existing dimensions, you decide whether to keep all historical dimension data, no historical data, or just the current and previous versions of dimension data.

SCD Type 1

When you do not need historical information in a slowly changing dimension table, you can drop or truncate the existing table and reload it. However, in most cases, inserting new dimensions and updating existing dimensions is more efficient than reloading the entire table. This type of dimension is called a Slowly Changing Dimension Type 1.
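To make the behavior concrete, the following is a minimal SQL sketch of an SCD Type 1 load against the sample SCD_1_PROMOTIONS and SCD_1_DIM_PROMOTIONS tables described later in this guide. It is an illustration only, not the SQL generated by the bundle mapping, and it assumes a hypothetical Oracle sequence PROMO_KEY_SEQ for the surrogate key; the mapping itself derives the key as described under "Handling dimension keys". Only a subset of columns is shown.

MERGE INTO SCD_1_DIM_PROMOTIONS dim
USING SCD_1_PROMOTIONS src
ON (dim.PROMO_CODE = src.PROMO_CODE)
WHEN MATCHED THEN UPDATE SET
  -- overwrite the existing row in place; no history is kept
  dim.PROMO_NAME   = src.PROMO_NAME,
  dim.PROMO_COST   = src.PROMO_COST,
  dim.DW_UPDATE_DT = SYSDATE
WHEN NOT MATCHED THEN INSERT
  (PROMO_KEY, PROMO_CODE, PROMO_NAME, PROMO_COST, DW_INSERT_DT, DW_UPDATE_DT)
  VALUES
  (PROMO_KEY_SEQ.NEXTVAL,   -- hypothetical sequence; see "Handling dimension keys"
   src.PROMO_CODE, src.PROMO_NAME, src.PROMO_COST, SYSDATE, SYSDATE);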

SCD Type 2

If you need historical information in a dimension table, you can implement the type of dimension called a Slowly Changing Dimension Type 2. You can decide how to differentiate between current and historical data in the target:

To keep a full history, you might version new data by:

Creating a version number and versioning the primary key.

Creating a composite key using a current version flag.

Creating an effective date range.

In this bundle, the SCD Type 2 implementation uses the effective date range to differentiate between current and historical records.
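As an illustration of how an effective date range separates current and historical rows, the queries below assume the sample DIM_CUSTOMER_SCD_2 table described later in this guide, and an assumed open-ended EFF_TO_DT value of 9999-12-31 for the current version (the value used by the mapping may differ):

-- Current version of every customer (assumed sentinel end date).
SELECT * FROM DIM_CUSTOMER_SCD_2
WHERE EFF_TO_DT = DATE '9999-12-31';

-- Version of each customer that was effective on a given date.
SELECT * FROM DIM_CUSTOMER_SCD_2
WHERE DATE '2015-06-30' >= EFF_FROM_DT
  AND DATE '2015-06-30' <  EFF_TO_DT;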

Date Dimension

A date dimension is a dimension table that contains a record for each day in the calendar. Each record carries multiple attributes about the day, such as the quarter number and the day of the week, to make reporting on date attributes easier.

Handling dimension keys

Each dimension table has a generated surrogate key to enable access to dimensional data. The surrogate key is generated as a primary key for each row written to the target. The logic for generating surrogate keys is as follows:

If the dimension has no rows when the mapping is run, the sequence starts at 1.

If the dimension has existing rows, the maximum value is fetched from the surrogate key column and new rows are inserted with a sequence starting from maximum_sequence_value + 1.
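As a rough SQL equivalent of this logic (an illustration only; the mapping computes it with a lookup and an expression transformation), using the sample SCD_1_DIM_PROMOTIONS target as an example:

-- Returns 1 for an empty dimension, otherwise MAX(surrogate key) + 1.
SELECT NVL(MAX(PROMO_KEY), 0) + 1 AS next_promo_key
FROM SCD_1_DIM_PROMOTIONS;

Subsequent rows inserted in the same run are then numbered consecutively from that starting value.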

Supported Sources:

Type | Names
Database | Oracle
Database | MySQL
Database | DB2
Database | Sybase
Database | Teradata
Application | SAP
SaaS Applications | Salesforce, Workday, NetSuite, etc.

Supported Targets:

Type | Names
Database | Oracle
Database | MySQL
Database | DB2
Database | Sybase
Database | Teradata

Bundle Information

This bundle contains the following mappings:

1. Data_Warehouse_Dimension_With_Upsert

Cloud mapping to load a data warehouse dimension with update and insert functionality, also called a Slowly Changing Dimension Type 1.

2. Data_Warehouse_Dimension_With_History

Cloud mapping to load a data warehouse dimension with historical records, also called a Slowly Changing Dimension Type 2.

3. Data_Warehouse_Date_Dimension

Cloud mapping to load the date dimension.

Installing the Bundle

The Data Warehouse Quick Start Bundle appears as an available bundle in your organization. To view and install the bundle, in your organization, click Configure > Published Bundles. After you install the bundle, you can use the objects in the bundle.

Prerequisites

Informatica Cloud Standard Edition.

Data_Warehouse_Dimension_With_Upsert (SCD Type 1)

This mapping can be used to load data from a master table into a data warehouse dimension table. No history of the changes will be maintained. New records from the source will be inserted, and if a record read from the source already exists in the target, it will be updated.

Sample Source Table Structure: SCD_1_PROMOTIONS

Column_Name | Data_Type | Notes
PROMO_CODE (Primary key) | VARCHAR2(20 BYTE) | The mapping is designed for a source table with a single-column primary key.
PROMO_NAME | VARCHAR2(30 BYTE)
PROMO_SUBCATEGORY | VARCHAR2(30 BYTE)
PROMO_CATEGORY | VARCHAR2(30 BYTE)
PROMO_COST | NUMBER(10,2)
PROMO_BEGIN_DATE | DATE
PROMO_END_DATE | DATE
PROMO_TOTAL | VARCHAR2(15 BYTE)
INSERT_DT | DATE
LAST_UPDATE_DT | DATE

Sample Target Dimension Table Structure: SCD_1_DIM_PROMOTIONS

Column_Name | Data_Type | Notes
PROMO_KEY (Primary key) | NUMBER(6,0) | Surrogate key
PROMO_CODE (Candidate key) | VARCHAR2(20 BYTE)
PROMO_NAME | VARCHAR2(30 CHAR)
PROMO_SUBCATEGORY | VARCHAR2(30 CHAR)
PROMO_CATEGORY | VARCHAR2(30 CHAR)
PROMO_COST | NUMBER(10,2)
PROMO_BEGIN_DATE | DATE
PROMO_END_DATE | DATE
PROMO_TOTAL | VARCHAR2(15 CHAR)
INSERT_DT | DATE
LAST_UPDATE_DT | DATE
DW_INSERT_DT | DATE | Date when the record was inserted into the data warehouse.
DW_UPDATE_DT | DATE | Date when the record was last updated in the data warehouse. For new records it will be the same as DW_INSERT_DT.

Mapping Data Flow

The following image shows the data flow of the mapping:

Transformation Logic:

EXP_Set_Flag

Calculates o_Flag based on the value returned by the lookup on the target. The flag is set to "I" or "U" to insert or update the row in the target.
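A rough SQL equivalent of this flag logic is shown below (an illustration only; the mapping uses a lookup and an expression transformation rather than a join):

-- Rows whose candidate key is not found in the dimension are flagged 'I' (insert),
-- all other rows are flagged 'U' (update).
SELECT src.PROMO_CODE,
       CASE WHEN dim.PROMO_CODE IS NULL THEN 'I' ELSE 'U' END AS o_Flag
FROM SCD_1_PROMOTIONS src
LEFT JOIN SCD_1_DIM_PROMOTIONS dim
  ON dim.PROMO_CODE = src.PROMO_CODE;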

EXP_gen_key

Generates the next sequence number for the surrogate key based on the maximum sequence number returned by the lookup.

Mapping Parameters

The following table describes the parameters in the mapping:

Parameter Name | Label in Task | Description
Source_Connection | Source Connection | Provide the source connection name
Source Table Object | Source Table Object | Select the source table
Target DW Connection | Target DW Connection | Select the target connection
Target Dimension Table Object | Target Dimension Table Object | Select the dimension table target
Choose a Primary Key Field From Source | Choose a Primary Key Field From Source | Select the primary key from the source table, e.g. PROMO_CODE
Choose target field corresponding to the source Primary Key | Choose target field corresponding to the source Primary Key | Choose the field from the target which has a relation to the source primary key, e.g. PROMO_CODE
Choose Primary Key field from target | Choose Primary Key (Dimension key) field from target | Choose the primary key field from the target table, e.g. PROMO_KEY
TGT_INS_Field_MAPPPING | TGT_INS_Field_MAPPPING | Link the field DIM_KEY_INSERT from the incoming fields to the dimension key field in the target, e.g. PROMO_KEY

Sample Files

You can use the following sample files to work with the mapping. Download the sample files and scripts from the link on the Resource tab.

Source Table: SCD_1_PROMOTIONS (Data_Warehouse_Dimension_With_Upsert/Script/CreateTable_SCD_1.sql)

Target Table: SCD_1_DIM_PROMOTIONS (Data_Warehouse_Dimension_With_Upsert/Script/CreateTable_SCD_1.sql)

Data_Warehouse_Dimension_With_History (SCD Type 2)

This mapping can be used to load data from a master table into a data warehouse dimension table. The history of the changes will be maintained by using effective date fields. New records from the source will be inserted. If a record read from the source already exists in the target, it will be compared against the existing row to check for any changes. If the source row data is the same, it is ignored. If the data is different, the record will be inserted as the latest version and the existing version's effective-to date will be updated.
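The following SQL sketch illustrates these two steps for a changed record, using the sample CUSTOMER_DETAILS and DIM_CUSTOMER_SCD_2 tables described below. It is an illustration only, not the SQL generated by the mapping; it assumes an open-ended EFF_TO_DT value of 9999-12-31 for the current version, uses bind variables for the changed row, and omits the MD5_CHECKSUM column for brevity.

-- Step 1: close the current version of the changed customer.
UPDATE DIM_CUSTOMER_SCD_2
SET EFF_TO_DT    = SYSDATE,
    DW_UPDATE_DT = SYSDATE
WHERE CUSTOMER_NO = :customer_no
  AND EFF_TO_DT   = DATE '9999-12-31';   -- assumed "open" end date

-- Step 2: insert the new data as the latest version.
INSERT INTO DIM_CUSTOMER_SCD_2
  (CUSTOMER_KEY, CUSTOMER_NO, NAME, CITY, STATE, COUNTRY, PHONE_NUMBER,
   EFF_FROM_DT, EFF_TO_DT, DW_INSERT_DT, DW_UPDATE_DT)
VALUES
  (:next_customer_key,                   -- surrogate key; see "Handling dimension keys"
   :customer_no, :name, :city, :state, :country, :phone_number,
   SYSDATE, DATE '9999-12-31', SYSDATE, SYSDATE);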

Sample Source Table Structure: CUSTOMER_DETAILS

Column Name | Data Type | Notes
CUSTOMER_NO (Primary key) | NUMBER(10,0)
NAME | VARCHAR2(50 BYTE)
CITY | VARCHAR2(50 BYTE)
STATE | VARCHAR2(50 BYTE)
COUNTRY | VARCHAR2(50 BYTE)
PHONE_NUMBER | VARCHAR2(50 BYTE)

Sample Target Dimension Table Structure: DIM_CUSTOMER_SCD_2

Column Name | Data Type | Notes
CUSTOMER_KEY (Primary key) | NUMBER(10,0) | The dimension key
CUSTOMER_NO (Candidate key) | NUMBER(10,0)
NAME | VARCHAR2(50 BYTE)
CITY | VARCHAR2(50 BYTE)
STATE | VARCHAR2(50 BYTE)
COUNTRY | VARCHAR2(50 BYTE)
PHONE_NUMBER | VARCHAR2(50 BYTE)
EFF_FROM_DT | DATE | The effective from date field, used to capture the history of changes
EFF_TO_DT | DATE | The effective to date field, used to capture the history of changes
MD5_CHECKSUM | VARCHAR2(50 BYTE) | Stores the MD5 checksum of the row. It is used to check for changes against the corresponding record from the source.
DW_INSERT_DT | DATE | Date when the record was inserted into the data warehouse.
DW_UPDATE_DT | DATE | Date when the record was last updated in the data warehouse. For new records it will be the same as DW_INSERT_DT.

Note: The following fields are required in the target (but need not have the same names):

EFF_FROM_DT
EFF_TO_DT
MD5_CHECKSUM

Transformation Logic:

The expression logic for sequence generation is similar to the SCD Type 1 mapping.

1. EXP_FLAG_REC

Calculates o_Flag based on the value returned by the lookup on the target and a comparison of the old and current MD5 values. The flag is set to "I" or "U" to insert or update the row in the target. The MD5 value for the current row is stored in var_SRC_MD5_VALS. Horizontal macro functionality is used to build the concatenated string for the MD5 calculation.
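A rough SQL analogue of this change check is shown below. It is an illustration only: the mapping builds the concatenated string with a horizontal macro and compares MD5 values in an expression transformation, whereas this sketch assumes the Oracle STANDARD_HASH function, a '|' delimiter, and an open-ended EFF_TO_DT of 9999-12-31 for the current version.

SELECT src.CUSTOMER_NO,
       CASE
         WHEN dim.CUSTOMER_NO IS NULL THEN 'I'   -- new record: insert
         WHEN RAWTOHEX(STANDARD_HASH(
                src.NAME || '|' || src.CITY || '|' || src.STATE || '|' ||
                src.COUNTRY || '|' || src.PHONE_NUMBER, 'MD5'))
              <> dim.MD5_CHECKSUM THEN 'U'       -- changed record: new version
         ELSE NULL                               -- unchanged record: ignore
       END AS o_Flag
FROM CUSTOMER_DETAILS src
LEFT JOIN DIM_CUSTOMER_SCD_2 dim
  ON  dim.CUSTOMER_NO = src.CUSTOMER_NO
  AND dim.EFF_TO_DT   = DATE '9999-12-31';       -- assumed current-version filter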

Mapping Data Flow

The following image shows the data flow of the mapping:

Mapping Parameters

The following table describes the parameters in the mapping:

Parameter Name | Label in Task | Description
Source_Connection | Source_Connection | Provide the source connection name
Source Table Object | Source Table Object | Select the source table
Target DW Connection | Target DW Connection | Select the target connection
Target Dimension Table Object | Target Dimension Table Object | Select the target table
Choose the primary key from the source | Choose the primary key from source | Select the primary key from the source table, e.g. CUSTOMER_NO
Choose the Field from target corresponding to source primary key | Choose the Field From target corresponding to source primary key | Choose the field from the target which has a relation to the source primary key, e.g. CUSTOMER_NO
Choose the primary key field from target | Choose a Primary Key field from target Parameter Details | Choose the primary key field from the target table, e.g. CUSTOMER_KEY
Choose the port from the target that contains the MD5 value | Choose the port from the target that contains the MD5 value Parameter Details | Choose a field from the target ports that holds the MD5 checksum values, e.g. MD5_CHECKSUM

Sample Files

You can use the following sample files to work with the mapping:

Source Table: CUSTOMER_DETAILS (Data_Warehouse_Dimension_With_History/Script/CreateTable_SCD_2.SQL)

Target Table: DIM_CUSTOMER_SCD_2 (Data_Warehouse_Dimension_With_History/Script/CreateTable_SCD_2.SQL)

Data_Warehouse_Date_Dimension (Date Dimension Load)

Use this cloud mapping to load the date dimension. The target date dimension table structure is packaged in the bundle documentation.

Sample Source file: date_range.txt

Column Name | Sample Value | Notes
IN_FromDate | 1/1/1990 | First day in the date dimension
IN_ToDate | 1/1/2050 | Last day in the date dimension

Sample Target Dimension Table Structure: DIM_DATE

Column Name | Data Type | Notes
DATE_KEY | NUMBER(20,0) | The dimension key. Based on the Julian date corresponding to the day.
DATE_DT | DATE
MONTH_DAY_NUM | NUMBER(20,0)
MONTH_NUM | NUMBER(20,0)
YEAR_NUM | NUMBER(20,0)
WEEK_DAY_NUM | NUMBER(20,0)
WEEK_OF_MONTH | NUMBER(20,0)
WEEK_OF_YEAR_NUM | NUMBER(20,0)
YEAR_DAY_NUM | NUMBER(20,0)
WEEK_NAME_LONG_STR | VARCHAR2(100 BYTE)
WEEK_NAME_SHORT_STR | VARCHAR2(100 BYTE)
MONTH_NAME_LONG_STR | VARCHAR2(100 BYTE)
MONTH_NAME_SHORT_STR | VARCHAR2(100 BYTE)
IS_LEAP_YR_FLAG | VARCHAR2(1 BYTE)
IS_WEEK_DAY_FLAG | VARCHAR2(1 BYTE)
QUATER_STR | VARCHAR2(1 BYTE)

Mapping Data Flow

The following image shows the data flow of the mapping:

Data_Warehouse_Date_Dimension

Transformation Logic:

mplt_caluculate_date_dim_using_jtx

A mapplet with a Java transformation is used to calculate the date attributes for all days in the input date interval.
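For reference, the following Oracle SQL sketch produces a comparable result for a few of the attributes (an illustration only; the bundle computes all attributes in the Java transformation inside the mapplet):

-- One row per day between the sample date range values, with a few derived attributes.
SELECT TO_NUMBER(TO_CHAR(d, 'J'))    AS DATE_KEY,        -- Julian day number
       d                             AS DATE_DT,
       TO_NUMBER(TO_CHAR(d, 'DD'))   AS MONTH_DAY_NUM,
       TO_NUMBER(TO_CHAR(d, 'MM'))   AS MONTH_NUM,
       TO_NUMBER(TO_CHAR(d, 'YYYY')) AS YEAR_NUM,
       TO_NUMBER(TO_CHAR(d, 'DDD'))  AS YEAR_DAY_NUM,
       TO_CHAR(d, 'Day')             AS WEEK_NAME_LONG_STR
FROM (SELECT DATE '1990-01-01' + LEVEL - 1 AS d
      FROM dual
      CONNECT BY LEVEL <= DATE '2050-01-01' - DATE '1990-01-01' + 1);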

Mapping Parameters

The following table describes the parameters in the mapping:

Parameter Name | Label in Task | Description
Date_Range_Src_Connection | Date_Range_Src_Connection | Provide the flat file connection object
Date_Range_Src Object | Date_Range_Src Object | Select the source file with the date range information
Date_Dimension_Tgt_Connection | Date_Dimension_Tgt_Connection | Provide the target connection
Date_Dimension_Tgt Object | Date_Dimension_Tgt Object | Select the target table object
Map fields from source to the Mapplet Parameter Details | Map fields from source to the Mapplet Parameter Details | Map the fields of the source to the fields in the mapplet
TGT_FIELD_MAPPING Parameter Details | TGT_FIELD_MAPPING Parameter Details | Do not clear the automatch; map only the fields that are unmapped

Sample Files

You can use the following sample files to work with the mapping:

Source Table: date_range.txt (Data_Warehouse_Date_Dimension/Sample Source/date_range.txt)

Target Table: DIM_DATE (Data_Warehouse_Date_Dimension/Script/Date_Dim_ORACLE.sql)

Additional Information

Informatica Global Customer Support

You can contact a Customer Support Center online or by telephone. For online support, click Submit Support Request in the Informatica Cloud application. You can also use Informatica MySupport to log a case. MySupport requires a user name and password. You can request a user name and password at https://mysupport.informatica.com. The telephone numbers for Informatica Global Customer Support are available from the Informatica web site at http://www.informatica.com/us/services-and-training/support-services/global-support-centers/.