Data Warehouse Quick Start Bundle User Guide Informatica Cloud Standard Edition....
Overview
This bundle contains mappings that can help load dimension tables in a data warehousing project.
When building a data warehouse based on a star schema, you create fact tables and dimension tables. Fact tables store transaction records such as sales and purchase orders. Dimension tables store information about master data such as products, customers, and dates. Because transactions are added and modified very frequently, fact tables grow very fast. Master tables such as product tables change far less often. Because changes to dimension tables are smaller in magnitude than changes to fact tables, these dimensions are known as slowly growing or slowly changing dimensions.
Slowly changing dimensions
Slowly changing dimensions (SCD) are dimension tables that have slowly increasing dimension data and updates to existing dimensions. When updating existing dimensions, you decide whether to keep all historical dimension data, no historical data, or just the current and previous versions of dimension data.
SCD Type 1
When you do not need historical information in a slowly changing dimension table, you can drop or truncate the existing table before using a new session in a workflow. However, in most cases, inserting new dimensions and updating existing dimensions can be more efficient than reloading the entire table. This type of dimension is called a Slowly Changing Dimension Type 1.
SCD Type 2
If you need historical information in a dimension table, then you can choose to implement the type of dimension called Slowly changing dimension type 2. You can decide how to differentiate between current and historical data in the target:
To keep a full history, you might version new data by:
Creating a version number and versioning the primary key.
Creating a composite key using a current version flag.
Creating an effective date range.
The SCD mappings in this bundle use the effective date range to differentiate between current and historical records.
Date Dimension
A date dimension is a dimension table that contains one record for each day in the calendar. Each record carries multiple attributes about the day, such as the quarter number and the day of the week, to make reporting on date attributes easier.
Handling dimension keys
Each dimension table has a generated surrogate key to enable access to dimensional data. The surrogate key is generated as a primary key for each row written to the target. The logic for generating the surrogate keys is as follows:
If the dimension has no rows when the mapping is run, the sequence starts at 1.
If the dimension has existing rows, the maximum sequence value is fetched from the surrogate key column, and the new rows are inserted with the sequence starting from maximum_sequence_value + 1.
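The key-generation rule above can be sketched in a few lines of Python. This is only an illustration of the "start at 1, otherwise continue from the maximum" logic, not the mapping's actual implementation; the function name assign_surrogate_keys and its arguments are hypothetical.

```python
from typing import Iterable, List


def assign_surrogate_keys(existing_keys: Iterable[int], new_row_count: int) -> List[int]:
    """Return surrogate keys for new rows, continuing from the current maximum."""
    # An empty dimension has no maximum, so treat it as 0 and start at 1.
    max_key = max(existing_keys, default=0)
    return [max_key + offset for offset in range(1, new_row_count + 1)]


print(assign_surrogate_keys([], 3))         # empty dimension -> [1, 2, 3]
print(assign_surrogate_keys([1, 2, 7], 2))  # maximum is 7 -> [8, 9]
```

Gaps in the existing keys (3 to 6 in the second call) are not reused, because only the maximum value is consulted.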
Supported Sources:
Type | Names
Database | Oracle
Database | MySQL
Database | DB2
Database | Sybase
Database | Teradata
Application | SAP
SaaS Applications | Salesforce, Workday, NetSuite etc.
Supported Targets:
Type | Names
Database | Oracle
Database | MySQL
Database | DB2
Database | Sybase
Database | Teradata
Bundle Information
This bundle contains the following mappings.
1. Data_Warehouse_Dimension_With_Upsert
Cloud mapping to load a data warehouse dimension with update and insert functionality, also known as Slowly Changing Dimension Type 1.
2. Data_Warehouse_Dimension_With_History
Cloud mapping to load a data warehouse dimension with historical records, also known as Slowly Changing Dimension Type 2.
3. Data_Warehouse_Date_Dimension
Cloud mapping to load the date dimension.
Installing the Bundle
The Data Warehouse Quick Start Bundle appears as an available bundle in your organization. To view and install the bundle, click Configure > Published Bundles in your organization. After you install the bundle, you can use the objects in the bundle.
Prerequisites
Informatica Cloud Standard Edition.
Data_Warehouse_Dimension_With_Upsert (SCD Type 1)
This mapping loads data from a master table into a data warehouse dimension table. No history of changes is maintained. New records from the source are inserted, and if a record read from the source already exists in the target, it is updated.
Sample Source Table Structure: SCD_1_PROMOTIONS
Column_Name | Data_Type | Notes
PROMO_CODE (Primary key) | VARCHAR2(20 BYTE) | The mapping is designed for a source table with a single-column primary key.
PROMO_NAME | VARCHAR2(30 BYTE) |
PROMO_SUBCATEGORY | VARCHAR2(30 BYTE) |
PROMO_CATEGORY | VARCHAR2(30 BYTE) |
PROMO_COST | NUMBER(10,2) |
PROMO_BEGIN_DATE | DATE |
PROMO_END_DATE | DATE |
PROMO_TOTAL | VARCHAR2(15 BYTE) |
INSERT_DT | DATE |
LAST_UPDATE_DT | DATE |
Sample Target Dimension Table Structure: SCD_1_DIM_PROMOTIONS
Column_Name | Data_Type | Notes
PROMO_KEY (Primary key) | NUMBER(6,0) | Surrogate key
PROMO_CODE (Candidate key) | VARCHAR2(20 BYTE) |
PROMO_NAME | VARCHAR2(30 CHAR) |
PROMO_SUBCATEGORY | VARCHAR2(30 CHAR) |
PROMO_CATEGORY | VARCHAR2(30 CHAR) |
PROMO_COST | NUMBER(10,2) |
PROMO_BEGIN_DATE | DATE |
PROMO_END_DATE | DATE |
PROMO_TOTAL | VARCHAR2(15 CHAR) |
INSERT_DT | DATE |
LAST_UPDATE_DT | DATE |
DW_INSERT_DT | DATE | Date when the record was inserted into the data warehouse.
DW_UPDATE_DT | DATE | Date when the record was last updated in the data warehouse. For new records it is the same as DW_INSERT_DT.
Mapping Data Flow
The following image shows the data flow of the mapping:
Transformation Logic:
EXP_Set_Flag
Calculates o_Flag based on the value returned by the lookup on the target. The flag is set to "I" to insert the row or "U" to update it in the target.
EXP_gen_key
Generates the next sequence number for the surrogate key based on the maximum sequence number returned by the lookup.
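As a rough illustration of how the flag and the generated key work together in a Type 1 load, here is a minimal in-memory sketch in Python. It is not the mapping itself: the function upsert_dimension and the use of plain dictionaries are assumptions made for the example, while the column names come from the sample tables above.

```python
from datetime import date
from typing import Dict, List


def upsert_dimension(target: List[dict], source: List[dict], today: date) -> Dict[str, int]:
    """Apply Type 1 logic: insert unseen PROMO_CODEs, overwrite existing ones."""
    by_code = {row["PROMO_CODE"]: row for row in target}  # stands in for the target lookup
    max_key = max((row["PROMO_KEY"] for row in target), default=0)
    counts = {"I": 0, "U": 0}
    for src in source:
        existing = by_code.get(src["PROMO_CODE"])
        flag = "I" if existing is None else "U"
        if flag == "I":
            max_key += 1  # next surrogate key continues from the maximum
            new_row = {"PROMO_KEY": max_key, **src,
                       "DW_INSERT_DT": today, "DW_UPDATE_DT": today}
            target.append(new_row)
            by_code[src["PROMO_CODE"]] = new_row
        else:
            existing.update(src)              # overwrite attributes, no history kept
            existing["DW_UPDATE_DT"] = today  # DW_INSERT_DT stays unchanged
        counts[flag] += 1
    return counts


dim = [{"PROMO_KEY": 1, "PROMO_CODE": "P1", "PROMO_NAME": "Spring",
        "DW_INSERT_DT": date(2024, 1, 1), "DW_UPDATE_DT": date(2024, 1, 1)}]
src = [{"PROMO_CODE": "P1", "PROMO_NAME": "Spring Sale"},
       {"PROMO_CODE": "P2", "PROMO_NAME": "Summer"}]
result = upsert_dimension(dim, src, date(2024, 2, 1))
print(result)
```

In the example, P1 already exists and is overwritten in place, while P2 is new and receives the next surrogate key.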
Mapping Parameters
The following table describes the parameters in the mapping:
Parameter Name | Label in Task | Description
Source_Connection | Source Connection | Provide the source connection name
Source Table Object | Source Table Object | Select the source table
Target DW Connection | Target DW Connection | Select the target connection
Target Dimension Table Object | Target Dimension Table Object | Select the dimension table target
Choose a Primary Key Field From Source | Choose a Primary Key Field From Source | Select the primary key from the source table, e.g. PROMO_CODE
Choose target field corresponding to the source Primary Key | Choose target field corresponding to the source Primary Key | Choose the target field that corresponds to the source primary key, e.g. PROMO_CODE
Choose Primary Key field from target | Choose Primary Key (Dimension key) field from target | Choose the primary key field from the target table, e.g. PROMO_KEY
TGT_INS_Field_MAPPPING | TGT_INS_Field_MAPPPING | Link the field DIM_KEY_INSERT from the incoming fields to the dimension key field in the target, e.g. PROMO_KEY
Sample Files
You can use the following sample files to work with the mapping. Download the sample files and scripts from the Resource tab from the link.
Source Table: SCD_1_PROMOTIONS (Data_Warehouse_Dimension_With_Upsert/Script/CreateTable_SCD_1.sql)
Target Table: SCD_1_DIM_PROMOTIONS (Data_Warehouse_Dimension_With_Upsert/Script/CreateTable_SCD_1.sql)
Data_Warehouse_Dimension_With_History (SCD Type 2)
This mapping loads data from a master table into a data warehouse dimension table. The history of changes is maintained by using effective date fields. New records from the source are inserted. If a record read from the source already exists in the target, it is compared against the existing row to check for changes. If the source row data is the same, the record is ignored. If the data is different, the record is inserted as the latest version and the effective to date of the existing version is updated.
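The insert, ignore, and new-version behavior described above can be sketched as follows. This is an assumption-laden illustration rather than the mapping's logic: the function apply_scd2 is hypothetical, changed rows are detected by comparing columns directly instead of using a checksum, and an empty EFF_TO_DT is assumed to mark the current version (the guide does not say how the current row is marked).

```python
from datetime import date
from typing import List

SURROGATE_KEY = "CUSTOMER_KEY"  # dimension key column from the sample target table


def apply_scd2(target: List[dict], src: dict, natural_key: str, load_date: date) -> str:
    """Apply one source row; returns 'insert', 'new_version', or 'ignored'."""
    # Find the current version: same natural key and an open EFF_TO_DT (assumption).
    current = next((r for r in target
                    if r[natural_key] == src[natural_key] and r["EFF_TO_DT"] is None), None)
    if current is not None:
        if all(current[col] == val for col, val in src.items()):
            return "ignored"              # nothing changed, keep the existing row
        current["EFF_TO_DT"] = load_date  # close the existing version
    next_key = max((r[SURROGATE_KEY] for r in target), default=0) + 1
    target.append({SURROGATE_KEY: next_key, **src,
                   "EFF_FROM_DT": load_date, "EFF_TO_DT": None})
    return "insert" if current is None else "new_version"


dim: List[dict] = []
outcomes = [
    apply_scd2(dim, {"CUSTOMER_NO": 10, "CITY": "Pune"}, "CUSTOMER_NO", date(2024, 1, 1)),
    apply_scd2(dim, {"CUSTOMER_NO": 10, "CITY": "Pune"}, "CUSTOMER_NO", date(2024, 2, 1)),
    apply_scd2(dim, {"CUSTOMER_NO": 10, "CITY": "Delhi"}, "CUSTOMER_NO", date(2024, 3, 1)),
]
print(outcomes)
```

After the three calls the dimension holds two rows for customer 10: the closed Pune version and the open Delhi version.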
Sample Source Table Structure: CUSTOMER_DETAILS
Column Name | Data Type | Notes
CUSTOMER_NO (Primary key) | NUMBER(10,0) |
NAME | VARCHAR2(50 BYTE) |
CITY | VARCHAR2(50 BYTE) |
STATE | VARCHAR2(50 BYTE) |
COUNTRY | VARCHAR2(50 BYTE) |
PHONE_NUMBER | VARCHAR2(50 BYTE) |
Sample Target Dimension Table Structure: DIM_CUSTOMER_SCD_2
Column Name | Data Type | Notes
CUSTOMER_KEY (Primary key) | NUMBER(10,0) | The dimension key
CUSTOMER_NO (Candidate key) | NUMBER(10,0) |
NAME | VARCHAR2(50 BYTE) |
CITY | VARCHAR2(50 BYTE) |
STATE | VARCHAR2(50 BYTE) |
COUNTRY | VARCHAR2(50 BYTE) |
PHONE_NUMBER | VARCHAR2(50 BYTE) |
EFF_FROM_DT | DATE | The effective from date field, used to capture the history of changes
EFF_TO_DT | DATE | The effective to date field, used to capture the history of changes
MD5_CHECKSUM | VARCHAR2(50 BYTE) | The field used to store the MD5 checksum of the row. It is used to check for changes against the corresponding record from the source.
DW_INSERT_DT | DATE | Date when the record was inserted into the data warehouse.
DW_UPDATE_DT | DATE | Date when the record was last updated in the data warehouse. For new records it is the same as DW_INSERT_DT.
Note: The following fields are required in the target, although they need not have the same names.
Column_Name
EFF_FROM_DT
EFF_TO_DT
MD5_CHECKSUM
Transformation Logic:
The expression logic for sequence generation is similar to that of the SCD Type 1 mapping.
EXP_FLAG_REC
Calculates o_Flag based on the value returned by the lookup on the target and on a comparison of the old and current MD5 values. The flag is set to "I" to insert or "U" to update in the target. The MD5 value for the current row is stored in var_SRC_MD5_VALS. Horizontal macro functionality is used to build the concatenated string for the MD5 calculation.
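To make the checksum comparison concrete, the sketch below builds a concatenated string from the compared columns, hashes it with MD5, and derives the flag. It uses Python's standard hashlib module and is only an approximation of the idea: the helper names row_md5 and flag_record, the "|" separator, the handling of NULLs as empty strings, and returning None for unchanged rows are all assumptions, not details taken from the mapping.

```python
import hashlib
from typing import Optional, Sequence


def row_md5(row: dict, columns: Sequence[str], sep: str = "|") -> str:
    """Concatenate the compared columns and return the MD5 hex digest."""
    # Missing or NULL values become empty strings; the separator keeps
    # ("ab", "c") distinct from ("a", "bc").
    joined = sep.join("" if row.get(c) is None else str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def flag_record(src_md5: str, tgt_md5: Optional[str]) -> Optional[str]:
    """'I' when the lookup found no row, 'U' when checksums differ, None when unchanged."""
    if tgt_md5 is None:
        return "I"
    return "U" if src_md5 != tgt_md5 else None


cols = ["NAME", "CITY", "STATE", "COUNTRY", "PHONE_NUMBER"]
old = {"NAME": "Ann", "CITY": "Pune", "STATE": "MH", "COUNTRY": "IN", "PHONE_NUMBER": "555"}
new = dict(old, CITY="Delhi")
print(flag_record(row_md5(new, cols), None))                # no target row found
print(flag_record(row_md5(new, cols), row_md5(old, cols)))  # CITY changed
print(flag_record(row_md5(old, cols), row_md5(old, cols)))  # identical row
```

An MD5 hex digest is 32 characters long, so it fits in the VARCHAR2(50 BYTE) MD5_CHECKSUM column of the sample target table.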
Mapping Parameters
The following table describes the parameters in the mapping:
Parameter Name | Label in Task | Description
Source_Connection | Source_Connection | Provide the source connection name
Source Table Object | Source Table Object | Select the source table
Target DW Connection | Target DW Connection | Select the target connection
Target Dimension Table Object | Target Dimension Table Object | Select the target table
Choose the primary key from the source | Choose the primary key from source | Select the primary key from the source table, e.g. CUSTOMER_NO
Choose the Field from target corresponding to source primary key | Choose the Field From target corresponding to source primary key | Choose the target field that corresponds to the source primary key, e.g. CUSTOMER_NO
Choose the primary key field from target | Choose a Primary Key field from target | Choose the primary key field from the target table, e.g. CUSTOMER_KEY
Choose the port from the target that contains the MD5 value | Choose the port from the target that contains the MD5 value | Choose the field from the target ports that holds the MD5 checksum values, e.g. MD5_CHECKSUM
Sample Files
You can use the following sample files to work with the mapping:
Source Table: CUSTOMER_DETAILS (Data_Warehouse_Dimension_With_History/Script CreateTable_SCD_2.SQL)
Target Table: DIM_CUSTOMER_SCD_2 (Data_Warehouse_Dimension_With_History/Script CreateTable_SCD_2.SQL)
Data_Warehouse_Date_Dimension (Date Dimension Load)
Use this cloud mapping to load the date dimension. The target date dimension table structure is packaged in the bundle documentation.
Sample Source File: date_range.txt
Column Name | Sample Value | Notes
IN_FromDate | 1/1/1990 | First day in the date dimension
IN_ToDate | 1/1/2050 | Last day in the date dimension
Sample Target Dimension Table Structure: DIM_DATE
Column Name | Data Type | Notes
DATE_KEY | NUMBER(20,0) | The dimension key. Based on the Julian date corresponding to the day.
DATE_DT | DATE |
MONTH_DAY_NUM | NUMBER(20,0) |
MONTH_NUM | NUMBER(20,0) |
YEAR_NUM | NUMBER(20,0) |
WEEK_DAY_NUM | NUMBER(20,0) |
WEEK_OF_MONTH | NUMBER(20,0) |
WEEK_OF_YEAR_NUM | NUMBER(20,0) |
YEAR_DAY_NUM | NUMBER(20,0) |
WEEK_NAME_LONG_STR | VARCHAR2(100 BYTE) |
WEEK_NAME_SHORT_STR | VARCHAR2(100 BYTE) |
MONTH_NAME_LONG_STR | VARCHAR2(100 BYTE) |
MONTH_NAME_SHORT_STR | VARCHAR2(100 BYTE) |
IS_LEAP_YR_FLAG | VARCHAR2(1 BYTE) |
IS_WEEK_DAY_FLAG | VARCHAR2(1 BYTE) |
QUATER_STR | VARCHAR2(1 BYTE) |
Mapping Data Flow
The following image shows the data flow of the mapping:
Data_Warehouse_Date_Dimension
Transformation Logic:
mplt_caluculate_date_dim_using_jtx
A mapplet with a Java transformation calculates the date attributes for every day in the input date interval.
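The kind of per-day calculation the mapplet performs can be approximated in Python as shown below. This is a sketch under several assumptions, not the mapplet's code: the function date_dimension_rows is hypothetical, only a subset of the DIM_DATE columns is produced, DATE_KEY is computed as the astronomical Julian Day Number (the guide does not state which Julian convention is used), weekdays are numbered from Monday = 1, and the flags use "Y"/"N".

```python
import calendar
from datetime import date, timedelta
from typing import Iterator


def date_dimension_rows(start: date, end: date) -> Iterator[dict]:
    """Yield one attribute row per calendar day in the inclusive range [start, end]."""
    day = start
    while day <= end:
        yield {
            # Julian Day Number: proleptic Gregorian ordinal plus a fixed offset (assumed convention).
            "DATE_KEY": day.toordinal() + 1721425,
            "DATE_DT": day,
            "MONTH_DAY_NUM": day.day,
            "MONTH_NUM": day.month,
            "YEAR_NUM": day.year,
            "WEEK_DAY_NUM": day.isoweekday(),        # 1 = Monday ... 7 = Sunday (assumption)
            "YEAR_DAY_NUM": day.timetuple().tm_yday,
            "IS_LEAP_YR_FLAG": "Y" if calendar.isleap(day.year) else "N",
            "IS_WEEK_DAY_FLAG": "Y" if day.isoweekday() <= 5 else "N",  # Monday to Friday
            "QUATER_STR": str((day.month - 1) // 3 + 1),
        }
        day += timedelta(days=1)


rows = list(date_dimension_rows(date(2000, 2, 28), date(2000, 3, 1)))
for row in rows:
    print(row["DATE_DT"], row["DATE_KEY"], row["YEAR_DAY_NUM"], row["QUATER_STR"])
```

With the sample range from date_range.txt (1/1/1990 to 1/1/2050), the same loop would yield one row for every day in those sixty years.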
Mapping Parameters
The following table describes the parameters in the mapping:
Parameter Name | Label in Task | Description
Date_Range_Src_Connection | Date_Range_Src_Connection | Provide the flat file connection object
Date_Range_Src Object | Date_Range_Src Object | Select the source file with the date range information
Date_Dimension_Tgt_Connection | Date_Dimension_Tgt_Connection | Provide the target connection
Date_Dimension_Tgt Object | Date_Dimension_Tgt Object | Select the target table object
Map fields from source to the Mapplet | Map fields from source to the Mapplet | Map the fields of the source to the fields in the mapplet
TGT_FIELD_MAPPING | TGT_FIELD_MAPPING | Do not clear the automatch; map only the fields that are unmapped
Sample Files
You can use the following sample files to work with the mapping:
Source Table: date_range.txt (Data_Warehouse_Date_Dimension/Sample Source/date_range.txt)
Target Table: DIM_DATE (Data_Warehouse_Date_Dimension/Script/Date_Dim_ORACLE.sql)
Additional Information
Informatica Global Customer Support
You can contact a Customer Support Center online or by telephone. For online support, click Submit Support Request in the Informatica Cloud application. You can also use Informatica MySupport to log a case. MySupport requires a user name and password. You can request a user name and password at https://mysupport.informatica.com. The telephone numbers for Informatica Global Customer Support are available from the Informatica web site at http://www.informatica.com/us/services-and-training/support-services/global-support-centers/.