Slowly Changing Dimensions


description

It contains a description of SCD types 1, 2 and 3.

Transcript of Slowly Changing Dimensions

Slowly Changing Dimensions (SCD) - Types | Data Warehouse

Slowly Changing Dimensions: Slowly changing dimensions are dimensions in which the data changes slowly and at irregular intervals, rather than on a regular time basis.

For example, you may have a customer dimension in a retail domain. Let's say the customer is in India and does some shopping every month. Creating the sales report for the customer is then easy. Now assume that the customer is transferred to the United States and shops there. How do you record such a change in your customer dimension?

You could sum or average the sales made by the customer. In this case you won't get an exact comparison of the sales made before and after the move. Because the customer's salary increases after the transfer, he or she might shop more in the United States than in India. If you sum the total sales, the customer's sales might look misleadingly strong. Alternatively, you can create a second customer record and treat the transferred customer as a new customer. However, this will create problems too.

Handling these issues involves SCD management methodologies, which are referred to as Type 1 to Type 3. The different types of slowly changing dimensions are explained in detail below.

SCD Type 1: SCD type 1 methodology is used when there is no need to store historical data in the dimension table. This method overwrites the old data in the dimension table with the new data. It is used to correct data errors in the dimension.

As an example, consider the customer table with the below data.

surrogate_key  customer_id  customer_name  Location
---------------------------------------------------
1              1            Marspton       Illions

Here the customer name is misspelt. It should be Marston instead of Marspton. If you use the type 1 method, it simply overwrites the data. The data in the updated table will be:

surrogate_key  customer_id  customer_name  Location
---------------------------------------------------
1              1            Marston        Illions

The advantage of type1 is ease of maintenance and less space occupied. The disadvantage is that there is no historical data kept in the data warehouse.
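The overwrite described above can be sketched in a few lines of Python. This is only an illustration on an in-memory list of rows; the function name apply_type1 and the dictionary layout are assumptions made for the example, not part of any tool.

```python
def apply_type1(dimension, customer_id, **changes):
    """Overwrite attributes in place for every row of the given customer (SCD Type 1)."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row.update(changes)  # the old value is lost; no history is kept
    return dimension


customer_dim = [
    {"surrogate_key": 1, "customer_id": 1,
     "customer_name": "Marspton", "location": "Illions"},
]

# Correct the misspelt name; the row count does not change.
apply_type1(customer_dim, 1, customer_name="Marston")
print(customer_dim)
```

After the call there is still exactly one row for the customer, which is why type 1 takes less space but cannot answer questions about earlier values.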

SCD Type 3: In the type 3 method, only the current status and the previous status of the row are maintained in the table. To track these changes, two separate columns are created in the table. The customer dimension table in the type 3 method will look as

surrogate_key  customer_id  customer_name  Current_Location  previous_location
------------------------------------------------------------------------------
1              1            Marston        Illions           NULL

Let's say the customer moves from Illions to Seattle; the updated table will look as

surrogate_key  customer_id  customer_name  Current_Location  previous_location
------------------------------------------------------------------------------
1              1            Marston        Seattle           Illions

Now again if the customer moves from Seattle to NewYork, then the updated table will be

surrogate_key  customer_id  customer_name  Current_Location  previous_location
------------------------------------------------------------------------------
1              1            Marston        NewYork           Seattle

The type 3 method will have limited history and it depends on the number of columns you create.
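A minimal Python sketch of the type 3 update, assuming one pair of current/previous columns as in the tables above. The function name apply_type3 is hypothetical.

```python
def apply_type3(row, new_location):
    """Shift the current location into previous_location, then store the new one."""
    row["previous_location"] = row["current_location"]
    row["current_location"] = new_location
    return row


row = {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
       "current_location": "Illions", "previous_location": None}

apply_type3(row, "Seattle")   # Illions becomes the previous location
apply_type3(row, "NewYork")   # Seattle becomes the previous location; Illions is dropped
print(row["current_location"], row["previous_location"])
```

After the second move the value Illions is no longer stored anywhere, which illustrates the limited history of this method.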

SCD Type 2: SCD type 2 stores the entire history of the data in the dimension table. With type 2 we can store unlimited history in the dimension table. In type 2, you can store the data in three different ways: Versioning, Flagging and Effective Date.

SCD Type 2 Versioning: In the versioning method, a sequence number is used to represent the change. The latest sequence number always represents the current row and the previous sequence numbers represent the past data.

As an example, let's use the same customer who changes location. Initially the customer is in the Illions location and the data in the dimension table will look as

surrogate_key  customer_id  customer_name  Location  Version
------------------------------------------------------------
1              1            Marston        Illions   1

The customer moves from Illions to Seattle and the version number will be incremented. The dimension table will look as

surrogate_key  customer_id  customer_name  Location  Version
------------------------------------------------------------
1              1            Marston        Illions   1
2              1            Marston        Seattle   2

Now again if the customer is moved to another location, a new record will be inserted into the dimension table with the next version number.
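The versioning method can be sketched in Python as follows. The helper names add_version and current_row are assumptions for the example, and the surrogate key is generated in a deliberately simplistic way.

```python
def add_version(dimension, customer_id, customer_name, location):
    """Insert a new row whose version is one higher than the customer's latest version."""
    versions = [r["version"] for r in dimension if r["customer_id"] == customer_id]
    new_row = {
        "surrogate_key": len(dimension) + 1,  # simplistic key generator for the sketch
        "customer_id": customer_id,
        "customer_name": customer_name,
        "location": location,
        "version": max(versions, default=0) + 1,
    }
    dimension.append(new_row)
    return new_row


def current_row(dimension, customer_id):
    """The row with the highest version number is the current one."""
    rows = [r for r in dimension if r["customer_id"] == customer_id]
    return max(rows, key=lambda r: r["version"])


customer_dim = []
add_version(customer_dim, 1, "Marston", "Illions")
add_version(customer_dim, 1, "Marston", "Seattle")
print(current_row(customer_dim, 1)["location"])
```

Existing rows are never modified here; every change only adds a row, so the full history stays available.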

SCD Type 2 Flagging: In flagging method, a flag column is created in the dimension table. The current record will have the flag value as 1 and the previous records will have the flag as 0.

Initially, the customer dimension will look as

surrogate_key  customer_id  customer_name  Location  flag
---------------------------------------------------------
1              1            Marston        Illions   1

Now when the customer moves to a new location, the old records will be updated with the flag value as 0 and the latest record will have the flag value as 1.

surrogate_key  customer_id  customer_name  Location  flag
---------------------------------------------------------
1              1            Marston        Illions   0
2              1            Marston        Seattle   1
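As a rough Python sketch of the flagging method: unlike versioning, the old row has to be updated as well as the new row inserted. The function name add_flagged is hypothetical.

```python
def add_flagged(dimension, customer_id, customer_name, location):
    """Retire the customer's current row (flag 1 -> 0) and append the new current row."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["flag"] == 1:
            row["flag"] = 0
    dimension.append({
        "surrogate_key": len(dimension) + 1,  # simplistic key generator for the sketch
        "customer_id": customer_id,
        "customer_name": customer_name,
        "location": location,
        "flag": 1,
    })


customer_dim = []
add_flagged(customer_dim, 1, "Marston", "Illions")
add_flagged(customer_dim, 1, "Marston", "Seattle")

# Exactly one row per customer should carry flag = 1.
current = [r for r in customer_dim if r["flag"] == 1]
print(current)
```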

SCD Type 2 Effective Date: In Effective Date method, the period of the change is tracked using the start_date and end_date columns in the dimension table.

surrogate_key  customer_id  customer_name  Location  Start_date   End_date
--------------------------------------------------------------------------
1              1            Marston        Illions   01-Mar-2010  20-Feb-2011
2              1            Marston        Seattle   21-Feb-2011  NULL

The NULL in the End_Date indicates the current version of the data and the remaining records indicate the past data.
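The effective date method can be sketched in Python as below. The sketch assumes, as in the table above, that the old row ends the day before the new row starts; the function names add_effective and location_on are made up for the example.

```python
from datetime import date, timedelta


def add_effective(dimension, customer_id, customer_name, location, start_date):
    """Close the customer's open row and append a new row with end_date = None."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            row["end_date"] = start_date - timedelta(days=1)
    dimension.append({
        "surrogate_key": len(dimension) + 1,  # simplistic key generator for the sketch
        "customer_id": customer_id,
        "customer_name": customer_name,
        "location": location,
        "start_date": start_date,
        "end_date": None,  # None plays the role of NULL: the current row
    })


def location_on(dimension, customer_id, day):
    """Return the location that was valid on the given day, or None if no row covers it."""
    for row in dimension:
        if (row["customer_id"] == customer_id
                and row["start_date"] <= day
                and (row["end_date"] is None or day <= row["end_date"])):
            return row["location"]
    return None


customer_dim = []
add_effective(customer_dim, 1, "Marston", "Illions", date(2010, 3, 1))
add_effective(customer_dim, 1, "Marston", "Seattle", date(2011, 2, 21))
print(location_on(customer_dim, 1, date(2010, 6, 1)))
```

The date range is what makes point-in-time queries possible, which neither the version number nor the flag can answer on their own.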

Design/Implement/Create SCD Type 2 Effective Date Mapping in Informatica

Q) How to create or implement slowly changing dimension (SCD) Type 2 Effective Date mapping in informatica?

SCD type 2 will store the entire history in the dimension table. In SCD type 2 effective date, the dimension table will have Start_Date (Begin_Date) and End_Date as the fields. If the End_Date is Null, then it indicates the current row. Know more about SCDs at Slowly Changing Dimensions Concepts.

We will see how to implement the SCD Type 2 Effective Date in informatica. As an example consider the customer dimension. The source and target table structures are shown below:

--Source Table

Create Table Customers
(
  Customer_Id Number Primary Key,
  Location    Varchar2(30)
);

--Target Dimension Table

Create Table Customers_Dim
(
  Cust_Key    Number Primary Key,
  Customer_Id Number,
  Location    Varchar2(30),
  Begin_Date  Date,
  End_Date    Date
);

The basic steps involved in creating a SCD Type 2 Effective Date mapping are:

- Identifying the new records and inserting them into the dimension table with Begin_Date as the current date (SYSDATE) and End_Date as NULL.
- Identifying the changed records and inserting them into the dimension table with Begin_Date as the current date (SYSDATE) and End_Date as NULL.
- Identifying the changed records and updating the existing records in the dimension table with End_Date as the current date.

We will divide the steps to implement the SCD type 2 Effective Date mapping into four parts.

SCD Type 2 Effective Date implementation - Part 1

Here we will see the basic setup and mapping flow required for SCD type 2 Effective Date. The steps involved are:

- Create the source and dimension tables in the database.
- Open the mapping designer tool, go to the source analyzer and either create or import the source definition.
- Go to the Warehouse designer or Target designer and import the target definition.
- Go to the mapping designer tab and create a new mapping.
- Drag the source into the mapping.
- Go to the toolbar, Transformation and then Create.
- Select the lookup transformation, enter a name and click on Create. You will get a window as shown in the below image.
- Select the customer dimension table and click on OK.
- Edit the lookup transformation, go to the ports tab and remove unnecessary ports. Keep only the Cust_key, customer_id and location ports in the lookup transformation.
- Create a new port (IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the customer_id port of the source qualifier transformation.
- Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id = IN_Customer_Id.
- Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL Override. Alternatively you can generate the SQL query by connecting to the database in the Lookup SQL Override expression editor and then adding the WHERE clause.

SELECT Customers_Dim.Cust_Key as Cust_Key,
       Customers_Dim.Location as Location,
       Customers_Dim.Customer_Id as Customer_Id
FROM   Customers_Dim
WHERE  Customers_Dim.End_Date IS NULL

- Click on OK in the lookup transformation.
- Connect the customer_id port of the source qualifier transformation to the In_Customer_Id port of the LKP transformation.
- Create an expression transformation with input/output ports as Cust_Key, LKP_Location, Src_Location and output ports as New_Flag, Changed_Flag. Enter the below expressions for the output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND LKP_Location != SRC_Location, 1, 0)
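The logic of the two flag expressions can be mirrored in Python for illustration. The function name scd_flags is hypothetical, and None stands in for a lookup that returned no row. Python's None does not behave exactly like NULL in an expression language, so NULL locations may need separate handling in a real mapping.

```python
def scd_flags(cust_key, lkp_location, src_location):
    """Return (new_flag, changed_flag) for one source row and its lookup result."""
    # No dimension row was found for this customer: it is a new record.
    new_flag = 1 if cust_key is None else 0
    # A dimension row exists but its location differs: it is a changed record.
    changed_flag = 1 if cust_key is not None and lkp_location != src_location else 0
    return new_flag, changed_flag


print(scd_flags(None, None, "Seattle"))      # customer not in the dimension yet
print(scd_flags(1, "Illions", "Seattle"))    # customer exists, location differs
print(scd_flags(1, "Seattle", "Seattle"))    # customer exists, nothing changed
```

A row for which both flags are 0 is an unchanged customer and is dropped by both filters in the later parts of the mapping.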

The part of the mapping flow is shown below.

SCD Type 2 Effective Date implementation - Part 2

In this part, we will identify the new records and insert them into the target with Begin Date as the current date. The steps involved are:

- Now create a filter transformation to identify and insert new records into the dimension table. Drag the ports of expression transformation (New_Flag) and source qualifier transformation (Customer_Id, Location) into the filter transformation.
- Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
- Now create an update strategy transformation and connect the ports of the filter transformation (Customer_Id, Location). Go to the properties tab and enter the update strategy expression as DD_INSERT.
- Now drag the target definition into the mapping and connect the appropriate ports of the update strategy transformation to the target definition.
- Create a sequence generator and an expression transformation. Call this expression transformation "Expr_Date".
- Drag and connect the NextVal port of the sequence generator to the expression transformation. In the expression transformation create a new output port (Begin_Date with date/time data type) and assign the value SYSDATE to it.
- Now connect the ports of the expression transformation (NextVal, Begin_Date) to the target definition ports (Cust_Key, Begin_Date).

The part of the mapping flow is shown in the below image.

SCD Type 2 Effective Date implementation - Part 3

In this part, we will identify the changed records and insert them into the target with Begin Date as the current date. The steps involved are:

- Create a filter transformation. Call this filter transformation FIL_Changed. This is used to find the changed records. Now drag the ports from expression transformation (changed_flag), source qualifier transformation (customer_id, location) and LKP transformation (Cust_Key) into the filter transformation.
- Go to the filter transformation properties and enter the filter condition as changed_flag=1.
- Now create an update strategy transformation and drag the ports of the filter transformation (customer_id, location) into the update strategy transformation. Go to the properties tab and enter the update strategy expression as DD_INSERT.
- Now drag the target definition into the mapping and connect the appropriate ports of the update strategy transformation to the target definition.
- Now connect the NextVal, Begin_Date ports of the expression transformation (Expr_Date created in part 2) to the cust_key, Begin_Date ports of the target definition respectively.

The part of the mapping diagram is shown below.

SCD Type 2 Effective Date implementation - Part 4

In this part, we will update the changed records in the dimension table with End Date as current date.

- Create an expression transformation and drag the Cust_Key port of filter transformation (FIL_Changed created in part 3) into the expression transformation.
- Go to the ports tab of the expression transformation and create a new output port (End_Date with date/time data type). Assign the value SYSDATE to this port.
- Now create an update strategy transformation and drag the ports of the expression transformation into it. Go to the properties tab and enter the update strategy expression as DD_UPDATE.
- Drag the target definition into the mapping and connect the appropriate ports of the update strategy to it.

The complete mapping image is shown below.
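The overall effect of the four parts can be sketched outside Informatica with Python's sqlite3 module. This is not the mapping itself, only a rough equivalent of its row-level logic: the function name load_scd2_effective_date is an assumption, an AUTOINCREMENT key stands in for the sequence generator, and the load date is passed in as a text value instead of using SYSDATE.

```python
import sqlite3


def load_scd2_effective_date(conn, source_rows, today):
    """Apply one batch of (customer_id, location) source rows to Customers_Dim."""
    cur = conn.cursor()
    for customer_id, location in source_rows:
        # Lookup restricted to the current row, like the Lookup SQL Override.
        current = cur.execute(
            "SELECT Cust_Key, Location FROM Customers_Dim "
            "WHERE Customer_Id = ? AND End_Date IS NULL",
            (customer_id,),
        ).fetchone()
        if current is not None and current[1] == location:
            continue  # unchanged record: neither flag would be set
        if current is not None:
            # Changed record: close the existing row (the DD_UPDATE branch).
            cur.execute(
                "UPDATE Customers_Dim SET End_Date = ? WHERE Cust_Key = ?",
                (today, current[0]),
            )
        # New or changed record: insert a fresh current row (the DD_INSERT branches).
        cur.execute(
            "INSERT INTO Customers_Dim (Customer_Id, Location, Begin_Date, End_Date) "
            "VALUES (?, ?, ?, NULL)",
            (customer_id, location, today),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers_Dim ("
    " Cust_Key INTEGER PRIMARY KEY AUTOINCREMENT,"  # stands in for the sequence generator
    " Customer_Id INTEGER, Location TEXT, Begin_Date TEXT, End_Date TEXT)"
)

load_scd2_effective_date(conn, [(1, "Illions")], "2010-03-01")
load_scd2_effective_date(conn, [(1, "Seattle"), (2, "NewYork")], "2011-02-21")

rows = conn.execute(
    "SELECT Customer_Id, Location, Begin_Date, End_Date "
    "FROM Customers_Dim ORDER BY Cust_Key"
).fetchall()
for row in rows:
    print(row)
```

As in the mapping, the closed row's End_Date and the new row's Begin_Date are both set to the load date.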

Design/Implement/Create SCD Type 2 Version Mapping in Informatica

Q) How to create or implement slowly changing dimension (SCD) Type 2 versioning mapping in informatica?

SCD type 2 will store the entire history in the dimension table. Know more about SCDs at Slowly Changing Dimensions DW Concepts.

We will see how to implement the SCD Type 2 version in informatica. As an example consider the customer dimension. The source and target table structures are shown below:

--Source Table

Create Table Customers
(
  Customer_Id Number Primary Key,
  Location    Varchar2(30)
);

--Target Dimension Table

Create Table Customers_Dim
(
  Cust_Key    Number Primary Key,
  Customer_Id Number,
  Location    Varchar2(30),
  Version     Number
);

The basic steps involved in creating a SCD Type 2 version mapping are:

- Identifying the new records and inserting them into the dimension table with the version number as one.
- Identifying the changed records and inserting them into the dimension table by incrementing the version number.

Let's divide the steps to implement the SCD type 2 version mapping into three parts.

SCD Type 2 version implementation - Part 1

Here we will see the basic setup and mapping flow required for SCD type 2 version. The steps involved are:

- Create the source and dimension tables in the database.
- Open the mapping designer tool, go to the source analyzer and either create or import the source definition.
- Go to the Warehouse designer or Target designer and import the target definition.
- Go to the mapping designer tab and create a new mapping.
- Drag the source into the mapping.
- Go to the toolbar, Transformation and then Create.
- Select the lookup transformation, enter a name and click on Create. You will get a window as shown in the below image.
- Select the customer dimension table and click on OK.
- Edit the lookup transformation, go to the ports tab and remove unnecessary ports. Keep only the Cust_key, customer_id, location and Version ports in the lookup transformation.
- Create a new port (IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the customer_id port of the source qualifier transformation.
- Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id = IN_Customer_Id.
- Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL Override. Alternatively you can generate the SQL query by connecting to the database in the Lookup SQL Override expression editor and then adding the order by clause.

SELECT Customers_Dim.Cust_Key as Cust_Key,
       Customers_Dim.Location as Location,
       Customers_Dim.Version as Version,
       Customers_Dim.Customer_Id as Customer_Id
FROM   Customers_Dim
ORDER BY Customers_Dim.Customer_Id, Customers_Dim.Version--

You have to use an order by clause in the above query. If you sort the version column in ascending order, then you have to specify "Use Last Value" in the "Lookup policy on multiple match" property. If you have sorted the version column in descending order, then you have to specify the "Lookup policy on multiple match" option as "Use First Value".

- Click on OK in the lookup transformation.
- Connect the customer_id port of the source qualifier transformation to the In_Customer_Id port of the LKP transformation.
- Create an expression transformation with input/output ports as Cust_Key, LKP_Location, Src_Location and output ports as New_Flag, Changed_Flag. Enter the below expressions for the output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND LKP_Location != SRC_Location, 1, 0)

The part of the mapping flow is shown below.
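The pairing of sort order and multiple-match policy described above can be illustrated with a small Python sketch. It only imitates the idea on a list of rows and is not how the lookup is implemented; the function name lookup_latest is hypothetical.

```python
def lookup_latest(rows, customer_id, ascending=True):
    """Return the latest version row for a customer, or None if there is no match."""
    matches = sorted(
        (r for r in rows if r["customer_id"] == customer_id),
        key=lambda r: r["version"],
        reverse=not ascending,
    )
    if not matches:
        return None
    # Ascending sort pairs with "Use Last Value"; descending with "Use First Value".
    return matches[-1] if ascending else matches[0]


dim_rows = [
    {"cust_key": 1, "customer_id": 1, "location": "Illions", "version": 1},
    {"cust_key": 2, "customer_id": 1, "location": "Seattle", "version": 2},
]
print(lookup_latest(dim_rows, 1, ascending=True))
print(lookup_latest(dim_rows, 1, ascending=False))
```

Either combination returns the highest version, which is the row the changed-record check has to compare against.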

SCD Type 2 version implementation - Part 2

In this part, we will identify the new records and insert them into the target with version value as 1. The steps involved are:

- Now create a filter transformation to identify and insert new records into the dimension table. Drag the ports of expression transformation (New_Flag) and source qualifier transformation (Customer_Id, Location) into the filter transformation.
- Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
- Now create an update strategy transformation and connect the ports of the filter transformation (Customer_Id, Location). Go to the properties tab and enter the update strategy expression as DD_INSERT.
- Now drag the target definition into the mapping and connect the appropriate ports of the update strategy transformation to the target definition.
- Create a sequence generator and an expression transformation. Call this expression transformation "Expr_Ver".
- Drag and connect the NextVal port of the sequence generator to the expression transformation. In the expression transformation create a new output port (Version) and assign the value 1 to it.
- Now connect the ports of the expression transformation (NextVal, Version) to the target definition ports (Cust_Key, Version).

The part of the mapping flow is shown in the below image.

SCD Type 2 Version implementation - Part 3

In this part, we will identify the changed records and insert them into the target by incrementing the version number. The steps involved are:

- Create a filter transformation. This is used to find the changed records. Now drag the ports from expression transformation (changed_flag), source qualifier transformation (customer_id, location) and LKP transformation (version) into the filter transformation.
- Go to the filter transformation properties and enter the filter condition as changed_flag=1.
- Create an expression transformation and drag the ports of the filter transformation, except the changed_flag port, into the expression transformation.
- Go to the ports tab of the expression transformation, create a new output port (O_Version) and assign the expression as (version+1).
- Now create an update strategy transformation and drag the ports of the expression transformation (customer_id, location, o_version) into the update strategy transformation. Go to the properties tab and enter the update strategy expression as DD_INSERT.
- Now drag the target definition into the mapping and connect the appropriate ports of the update strategy transformation to the target definition.
- Now connect the NextVal port of the expression transformation (Expr_Ver created in part 2) to the cust_key port of the target definition.

The complete mapping diagram is shown in the below image:

You can implement the SCD type 2 version mapping in your own way. Remember that the SCD type 2 version mapping is rarely used in practice.

Design/Implement/Create SCD Type 2 Flag Mapping in Informatica

Q) How to create or implement slowly changing dimension (SCD) Type 2 Flagging mapping in informatica?

SCD type 2 will store the entire history in the dimension table. Know more about SCDs at Slowly Changing Dimensions Concepts.

We will see how to implement the SCD Type 2 Flag in informatica. As an example consider the customer dimension. The source and target table structures are shown below:

--Source Table

Create Table Customers
(
  Customer_Id Number Primary Key,
  Location    Varchar2(30)
);

--Target Dimension Table

Create Table Customers_Dim
(
  Cust_Key    Number Primary Key,
  Customer_Id Number,
  Location    Varchar2(30),
  Flag        Number
);

The basic steps involved in creating a SCD Type 2 Flagging mapping are:

- Identifying the new records and inserting them into the dimension table with the flag column value as one.
- Identifying the changed records and inserting them into the dimension table with the flag value as one.
- Identifying the changed records and updating the existing records in the dimension table with the flag value as zero.

We will divide the steps to implement the SCD type 2 flagging mapping into four parts.

SCD Type 2 Flag implementation - Part 1

Here we will see the basic setup and mapping flow required for SCD type 2 Flagging. The steps involved are:

- Create the source and dimension tables in the database.
- Open the mapping designer tool, go to the source analyzer and either create or import the source definition.
- Go to the Warehouse designer or Target designer and import the target definition.
- Go to the mapping designer tab and create a new mapping.
- Drag the source into the mapping.
- Go to the toolbar, Transformation and then Create.
- Select the lookup transformation, enter a name and click on Create. You will get a window as shown in the below image.
- Select the customer dimension table and click on OK.
- Edit the lookup transformation, go to the ports tab and remove unnecessary ports. Keep only the Cust_key, customer_id and location ports in the lookup transformation.
- Create a new port (IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the customer_id port of the source qualifier transformation.
- Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id = IN_Customer_Id.
- Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL Override. Alternatively you can generate the SQL query by connecting to the database in the Lookup SQL Override expression editor and then adding the WHERE clause.

SELECT Customers_Dim.Cust_Key as Cust_Key,
       Customers_Dim.Location as Location,
       Customers_Dim.Customer_Id as Customer_Id
FROM   Customers_Dim
WHERE  Customers_Dim.Flag = 1

- Click on OK in the lookup transformation.
- Connect the customer_id port of the source qualifier transformation to the In_Customer_Id port of the LKP transformation.
- Create an expression transformation with input/output ports as Cust_Key, LKP_Location, Src_Location and output ports as New_Flag, Changed_Flag. Enter the below expressions for the output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND LKP_Location != SRC_Location, 1, 0)

SCD Type 2 Flag implementation - Part 2

In this part, we will identify the new records and insert them into the target with flag value as 1. The steps involved are:

- Now create a filter transformation to identify and insert new records into the dimension table. Drag the ports of expression transformation (New_Flag) and source qualifier transformation (Customer_Id, Location) into the filter transformation.
- Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
- Now create an update strategy transformation and connect the ports of the filter transformation (Customer_Id, Location). Go to the properties tab and enter the update strategy expression as DD_INSERT.
- Now drag the target definition into the mapping and connect the appropriate ports of the update strategy transformation to the target definition.
- Create a sequence generator and an expression transformation. Call this expression transformation "Expr_Flag".
- Drag and connect the NextVal port of the sequence generator to the expression transformation. In the expression transformation create a new output port (Flag) and assign the value 1 to it.
- Now connect the ports of the expression transformation (NextVal, Flag) to the target definition ports (Cust_Key, Flag).

The part of the mapping flow is shown in the below image.

SCD Type 2 Flag implementation - Part 3

In this part, we will identify the changed records and insert them into the target with flag value as 1. The steps involved are:

- Create a filter transformation. Call this filter transformation FIL_Changed. This is used to find the changed records. Now drag the ports from expression transformation (changed_flag), source qualifier transformation (customer_id, location) and LKP transformation (Cust_Key) into the filter transformation.
- Go to the filter transformation properties and enter the filter condition as changed_flag=1.
- Now create an update strategy transformation and drag the ports of the filter transformation (customer_id, location) into the update strategy transformation. Go to the properties tab and enter the update strategy expression as DD_INSERT.
- Now drag the target definition into the mapping and connect the appropriate ports of the update strategy transformation to the target definition.
- Now connect the NextVal, Flag ports of the expression transformation (Expr_Flag created in part 2) to the cust_key, Flag ports of the target definition respectively.

The part of the mapping diagram is shown below.

SCD Type 2 Flag implementation - Part 4

In this part, we will update the changed records in the dimension table with flag value as 0.

- Create an expression transformation and drag the Cust_Key port of filter transformation (FIL_Changed created in part 3) into the expression transformation.
- Go to the ports tab of the expression transformation and create a new output port (Flag). Assign the value "0" to this Flag port.
- Now create an update strategy transformation and drag the ports of the expression transformation into it. Go to the properties tab and enter the update strategy expression as DD_UPDATE.
- Drag the target definition into the mapping and connect the appropriate ports of the update strategy to it.

The complete mapping image is shown below.

I have added one more filter transformation to check for already existing users with no changes (changed flag = 0). It uses the filter condition changed flag = 1, which means that only the changed records are updated.