Data Warehousing Concept Using ETL Process For SCD Type-1



© 2012, http://www.journalofcomputerscience.com - TIJCSA All Rights Reserved 9


K. Srikanth 1, N.V.E.S. Murthy 2, J. Anitha 3

1 Andhra University, M.Tech (Ph.D), Visakhapatnam, India.

2 Andhra University, Professor, Visakhapatnam, India.

3 Andhra University, M.Tech (Ph.D), Visakhapatnam, India.


Abstract: 

A Type 1 change overwrites an existing dimensional attribute with new information. In the customer name-change example, the new name overwrites the old name, and the value for the old version is lost. A Type One change updates only the attribute, doesn't insert new records, and affects no keys. The new incoming record (changed/modified data set) replaces the existing old record in the target. It is easy to implement but does not maintain any history of prior attribute values. Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule.

Keywords- ETL; Metadata; Mapping; Transformation.

I. INTRODUCTION

With Slowly Changing Dimensions (SCDs), data changes slowly [1] rather than on a time-based, regular schedule. For example, you may have a dimension in your database that tracks the sales records of your company's salespeople. Creating sales reports seems simple enough until a salesperson is transferred from one regional office to another. How do you record such a change in your sales dimension?

You could calculate the sum or average of each salesperson's sales, but using that to compare the performance of salespeople might give misleading information. If the salesperson was transferred from a hot market where sales were easy to a market where sales are infrequent, his or her totals will look much stronger than those of the other salespeople in the new region. Or you could create a second salesperson record and treat the transferred person as a new salesperson, but that creates problems of its own.

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time, and people change their names for some reason [3]. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time. In SCD Type-1, the new incoming record (the changed or modified data set) replaces the existing old record in the target. This paper implements SCD Type-1 using the Oracle EMP table as source data and shows how the data in the EMP table is modified and stored (Table 1).

K. Srikanth 1, N.V.E.S. Murthy 2, J. Anitha 3, The International Journal of Computer Science & Applications (TIJCSA), ISSN – 2278-1080, Vol. 1 No. 10, December 2012

A.  Implementation:

Source:

Table 1: Oracle SQL Query On EMP Table
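The Type-1 behaviour described above can be illustrated outside any ETL tool. The following is a minimal Python sketch, not the Informatica implementation used in this paper. The function name `apply_type1` and the sample rows are hypothetical and only resemble the Oracle EMP table; they are not the data from Table 1.

```python
def apply_type1(target, incoming, key="EMPNO"):
    """Overwrite matching rows in place and append rows whose key is new."""
    index = {row[key]: row for row in target}
    for row in incoming:
        existing = index.get(row[key])
        if existing is None:
            # New key: add the row to the target.
            new_row = dict(row)
            target.append(new_row)
            index[row[key]] = new_row
        else:
            # Existing key: overwrite attributes, so prior values are lost.
            existing.update(row)
    return target


# Hypothetical EMP-style rows (assumed for illustration only).
target = [{"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800}]
incoming = [
    {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 950},
    {"EMPNO": 7499, "ENAME": "ALLEN", "SAL": 1600},
]
apply_type1(target, incoming)
```

After the call, the target still holds a single row for employee 7369, now with the new salary, and the old salary cannot be recovered from the target. That is the trade-off of Type-1: simple to implement, no history.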

II. SOURCE TABLE AND SOURCE ANALYZER

When you add a relational table source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica server reads when it runs a session (Figure 1).

Figure 1: Source Table and Source Analyzer


III.  TARGET TABLE AND TARGET DESIGNER

Target definitions define the structure of tables in the target database, or the structure of file targets that the Power Center Server creates when you run a workflow. If you add a target definition to the repository that does not exist in a relational database, you need to create the target tables in your target database (Figure 2). You do this by generating and executing the necessary SQL code within the Warehouse Designer.

Figure 2: Target Table and Target Designer

IV.  EXPRESSION TRANSFORMATION IN INFORMATICA

The Expression transformation is a connected, passive transformation used to calculate values on a single row [5]. Examples of calculations are concatenating the first and last name, adjusting employee salaries, and converting strings to dates. The Expression transformation can also be used to test conditional statements before passing the data to other transformations.
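The three example calculations can be sketched as a row-wise function in Python. This is an illustration of what "passive" means here (one output row per input row), not Informatica code. The column names, the 10% salary adjustment, and the date format are assumptions made for the example.

```python
from datetime import datetime


def expression_transform(row):
    """Return one output row per input row (passive: row count is unchanged)."""
    out = dict(row)
    # Concatenate first and last name.
    out["FULL_NAME"] = f"{row['FIRST_NAME']} {row['LAST_NAME']}"
    # Adjust the salary (a 10% raise is assumed for illustration).
    out["SAL"] = round(row["SAL"] * 1.10, 2)
    # Convert a string to a date (ISO format is assumed).
    out["HIREDATE"] = datetime.strptime(row["HIREDATE"], "%Y-%m-%d").date()
    return out


rows = [
    {"FIRST_NAME": "John", "LAST_NAME": "Smith", "SAL": 800.0, "HIREDATE": "1980-12-17"},
]
result = [expression_transform(r) for r in rows]
```

Each calculation reads only the current row, which is why such a transformation never changes the number of rows passing through it.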

A.  Creating an Expression Transformation: 

Follow the steps below to create an Expression transformation:

1.  In the Mapping Designer, create a new mapping or open an existing mapping.

2.  On the toolbar, click Transformation -> Create and select the Expression transformation (Figure 3).

3.  Enter a name, click Create, and then click Done.


Figure 3: Diagram for Expression Transformation

Figure 4: Creating Expression port logic

You can add ports to the Expression transformation either by selecting and dragging ports from other transformations or by opening the Expression transformation and creating ports manually (Figure 4). Here we add a port named Insert_flag with the string datatype. Its expression tests the employee key (EMPKEY) and returns 'TRUE' when the key is null and 'FALSE' otherwise:

IIF(ISNULL(EMPKEY), 'TRUE', 'FALSE')
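The flag logic can be mimicked in Python as follows. This sketch assumes that EMPKEY is obtained by looking up the source EMPNO in the target, which the paper does not state explicitly. The helper names `iif` and `add_insert_flag` and the sample keys are hypothetical.

```python
def iif(condition, when_true, when_false):
    """Python stand-in for the IIF(condition, value1, value2) expression."""
    return when_true if condition else when_false


def add_insert_flag(row, target_keys):
    """Attach EMPKEY (None when absent from the target) and Insert_flag."""
    empkey = target_keys.get(row["EMPNO"])  # None plays the role of NULL
    return {**row, "EMPKEY": empkey, "Insert_flag": iif(empkey is None, "TRUE", "FALSE")}


# Assumed lookup of EMPNO -> EMPKEY for rows already in the target.
target_keys = {7369: 1}
source = [{"EMPNO": 7369, "ENAME": "SMITH"}, {"EMPNO": 7499, "ENAME": "ALLEN"}]
flagged = [add_insert_flag(r, target_keys) for r in source]
```

Employee 7369 already has a key and is flagged 'FALSE', while employee 7499 has none and is flagged 'TRUE', marking it as a new record.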

V. ROUTER TRANSFORMATION IN INFORMATICA

The Router transformation is an active and connected transformation [8]. It is similar to the Filter transformation, which is used to test a condition and filter the data. In a Filter transformation, you can specify only one condition, and rows that do not satisfy the condition are dropped (Figure 5). In a Router transformation, by contrast, you can specify more than one condition and route the rows that meet each test condition to a separate group [6]. Use a Router transformation if you need to test the same input data against multiple conditions.
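The difference between filtering and routing can be sketched in Python. This is a simplified model, not Informatica's implementation: it assumes that a row is sent to every group whose condition it meets and that rows meeting no condition go to a default group. The function names, group names, and sample rows are hypothetical.

```python
def filter_rows(rows, condition):
    """Filter: one condition; rows that fail it are dropped."""
    return [r for r in rows if condition(r)]


def route_rows(rows, groups):
    """Router: several named conditions; unmatched rows go to DEFAULT."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for r in rows:
        matched = False
        for name, condition in groups.items():
            if condition(r):
                routed[name].append(r)
                matched = True
        if not matched:
            routed["DEFAULT"].append(r)
    return routed


rows = [
    {"EMPNO": 1, "Insert_flag": "TRUE"},
    {"EMPNO": 2, "Insert_flag": "FALSE"},
    {"EMPNO": 3, "Insert_flag": None},
]
groups = {
    "INSERT": lambda r: r["Insert_flag"] == "TRUE",
    "UPDATE": lambda r: r["Insert_flag"] == "FALSE",
}
only_inserts = filter_rows(rows, groups["INSERT"])
routed = route_rows(rows, groups)
```

The filter keeps one row and discards the other two, whereas the router keeps all three rows and only decides which group each one belongs to.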


A.  Creating Router Transformation 

Follow the steps below to create a Router transformation:

1.  In the Mapping Designer, create a new mapping or open an existing mapping.

2.  On the toolbar, click Transformation -> Create.

3.  Select the Router transformation, enter a name, click Create, and then click Done.

4.  Select the ports from the upstream transformation and drag them to the Router transformation. You can also create input ports manually on the Ports tab.

Figure 5: Creating Router Transformation

In this mapping, the Router transformation splits the rows into two new groups, one named Insert and the other named Update:

Insert: Insert_flag = 'TRUE'

Update: Insert_flag = 'FALSE'

VI. UPDATE STRATEGY TRANSFORMATION IN INFORMATICA

The Update Strategy transformation is an active and connected transformation. It is used to insert, update, and delete records in the target table. It can also reject records so that they do not reach the target table [7]. When you design a target table, you need to decide what data should be stored in the target.

When you want to maintain a history of the source in the target table, you insert a new record in the target table for every change in the source record. When you want the target table to hold an exact copy of the source data, you update the corresponding records in the target whenever the source data changes [2]. The design of the target table decides how changes to existing rows are handled (Figure 6). In Informatica, you can set the update strategy at two different levels:


•  Session Level: Configuring at the session level instructs the Integration Service either to treat all rows in the same way (insert, update, or delete) or to use instructions coded in the session mapping to flag rows for different database operations.

•  Mapping Level: Use the Update Strategy transformation to flag rows for insert, update, delete, or reject.

A.  Flagging Rows in Mapping with Update Strategy:

You have to flag each row for inserting, updating, deleting, or rejecting. The constants and their numeric equivalents for each database operation are listed below.

•  DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.

•  DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.

•  DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.

•  DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.

Figure 6: Update Strategy Transformation

This mapping uses only the Insert and Update flags, set in the Update Strategy transformations as follows:

Transformation Attribute        Value

Update Strategy Expression      0 (DD_INSERT)

Update Strategy Expression      1 (DD_UPDATE)
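How the numeric flags drive the load can be sketched in Python. The constants reuse the DD_ names and values listed above, but the `load_target` function, the dictionary-based target, and the sample rows are assumptions for illustration. The real work is done by the Integration Service against a database table.

```python
# Numeric equivalents as listed in the text.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def load_target(target, flagged_rows, key="EMPNO"):
    """Apply (flag, row) pairs to a target held as a dict keyed by `key`."""
    for flag, row in flagged_rows:
        if flag == DD_INSERT:
            target[row[key]] = dict(row)
        elif flag == DD_UPDATE:
            if row[key] in target:
                target[row[key]].update(row)
        elif flag == DD_DELETE:
            target.pop(row[key], None)
        # DD_REJECT: the row never reaches the target.
    return target


target = {7369: {"EMPNO": 7369, "SAL": 800}}
flagged_rows = [
    (DD_UPDATE, {"EMPNO": 7369, "SAL": 950}),
    (DD_INSERT, {"EMPNO": 7499, "SAL": 1600}),
    (DD_REJECT, {"EMPNO": 7521, "SAL": 1250}),
]
load_target(target, flagged_rows)
```

In the SCD Type-1 mapping only the first two branches are exercised: rows from the Insert group carry flag 0 and rows from the Update group carry flag 1.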

VII. SEQUENCE GENERATOR TRANSFORMATION

•  Passive and connected transformation.

•  The Sequence Generator transformation generates numeric values.

•  Use the Sequence Generator to create unique primary key values [5], replace missing primary keys, or cycle through a sequential range of numbers.


We mostly use it to generate surrogate keys in a data warehouse (DWH) environment. When we want to maintain history, we need a key other than the primary key to uniquely identify each record, so we create a sequence 1, 2, 3, 4, and so on (Figure 7) and use it as the key. Example: if EMPNO is the key, we can keep only one record per employee in the target and cannot maintain history [10]. So we use the surrogate key, not EMPNO, as the primary key.

A.  Sequence Generator Ports : 

The Sequence Generator transformation provides two output ports: NEXTVAL and CURRVAL.

•  We cannot edit or delete these ports.

•  Likewise, we cannot add ports to the transformation.

NEXTVAL:

Use the NEXTVAL port to generate sequence numbers by connecting it to a transformation or target.
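The role of NEXTVAL in assigning surrogate keys can be sketched in Python. The class below is a hypothetical stand-in that models only a NEXTVAL-like counter. The start value, increment, and sample rows are assumptions, and CURRVAL is deliberately not modelled.

```python
import itertools


class SequenceGenerator:
    """Hypothetical stand-in that hands out increasing numeric values."""

    def __init__(self, start=1, increment=1):
        self._counter = itertools.count(start, increment)

    def nextval(self):
        """Return the next number in the sequence."""
        return next(self._counter)


seq = SequenceGenerator()
new_rows = [{"EMPNO": 7499}, {"EMPNO": 7521}]
# Each new row receives its own surrogate key, independent of EMPNO.
keyed = [{"EMPKEY": seq.nextval(), **row} for row in new_rows]
```

Because the surrogate key EMPKEY is independent of the business key EMPNO, the target's primary key does not depend on values that come from the source system.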

Figure 7: Sequence Generator Transformation

VIII. SCD TYPE-1 MAPPING DESIGN

Figure 8 shows the complete mapping design flow for the Slowly Changing Dimension. The flow shows how SCD Type-1 source data is loaded into the target and how the data is maintained during processing.
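The overall flow can be approximated end to end in a few lines of Python. This is a sketch under stated assumptions, not a reproduction of the mapping in Figure 8. The target is modelled as a dictionary keyed by EMPNO, the lookup against the target is assumed, and the function name `run_scd_type1` and the sample rows are hypothetical.

```python
import itertools


def run_scd_type1(source_rows, target, next_key):
    """Flag each row, then insert it with a surrogate key or overwrite it."""
    for row in source_rows:
        existing = target.get(row["EMPNO"])  # assumed lookup on the target
        insert_flag = "TRUE" if existing is None else "FALSE"
        if insert_flag == "TRUE":
            # Insert group: new record with the next surrogate key.
            target[row["EMPNO"]] = {"EMPKEY": next(next_key), **row}
        else:
            # Update group: overwrite attributes; EMPKEY is left unchanged.
            existing.update(row)
    return target


target = {}
keys = itertools.count(1)
# First load: one employee.
run_scd_type1([{"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800}], target, keys)
# Second load: a changed salary for 7369 and a new employee 7499.
run_scd_type1(
    [
        {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 950},
        {"EMPNO": 7499, "ENAME": "ALLEN", "SAL": 1600},
    ],
    target,
    keys,
)
```

After the second load the target holds two rows: employee 7369 keeps surrogate key 1 but carries the new salary, and employee 7499 is inserted with surrogate key 2.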


Figure 8: Slowly Changing Dimensions (SCDs) Flow 

A.  Insert :

New employee records are inserted and existing records are updated; the complete data is shown in Table 2. The same data is displayed in graphical mode in the ETL tool, and the data after insert and update is available in Table 3.

Table 2: New record inserted table 


Table 3 : Display the Designer Preview Data

Result: The preview data is displayed using Slowly Changing Dimensions (SCDs) Type-1 only. The new incoming record (changed/modified data set) replaces the existing old record in the target.

Source Data: Table 1

Target Data: Table 2, Table 3 [Graphical view]

IX.  CONCLUSIONS AND FUTURE WORK 

Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources. In this paper, we have focused on the Type One change, which updates only the attribute, doesn't insert new records, and affects no keys. It is easy to implement but does not maintain any history of prior attribute values. Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule. Under the framework of conventional ETL, the ETL process is defined [7] for each data source: programs or scripts are developed and compiled to retrieve records from the database. In this paper, an engineering study useful for ETL tool selection was developed. In the end, all three initial objectives were achieved [9]. Comprehensive ETL criteria were identified, testing procedures were developed, and this work was applied to commercial ETL tools. The study covered all major aspects of ETL usage and can be used to effectively compare and evaluate various ETL tools.

REFERENCES

[1] W. H. Inmon, D. Strauss, and G. Neushloss, DW 2.0: The Architecture for the Next Generation of Data Warehousing. Burlington, MA: Morgan Kaufmann, 2008, pp. 215-229.

[2] R. J. Davenport, "ETL vs. ELT: A Subjective View," Insource IT Consulting Ltd., U.K., September 2007. [Online]. Available at: http://www.insource.co.uk/pdf/ETL_ELT.pdf.

[3] T. Jun, C. Kai, F. Yu, and T. Gang, "The Research and Application of ETL Tools in Business Intelligence Project," in Proc. International Forum on Information Technology and Applications, IEEE, 2009, pp. 620-623.

[4] R. Kimball and J. Caserta, The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. John Wiley & Sons, 2004.

[5] W. Labio and H. Garcia-Molina, "Efficient Snapshot Differential Algorithms for Data Warehousing," VLDB, 1996.

[6] Informatica Power Center. Available at: www.informatica.com/ products/ data integration/ power center/ default.htm.

[7] Teradata. Available at: www.teradata.com.

[8] Sun SPARC M9000 Processor. Available at: http://www.sun.com/servers/highend/m9000/

[9] L. Troy and C. Pydimukkala, How to Use Power Center with Teradata to Load and Unload Data, Informatica Corporation. [Online]. Available at: www.myinformatica.com.

[10] J. Widom, "Research Problems in Data Warehousing," CIKM, 1995.