Technet Etl Design Questionnaire

15
 Informatica Technology Network http://technet.informatica.com   © 2009 Informatica Corporation. All Rights Reserved. 1 ETL Design Questionnaire Applies to: Informatica PowerCenter Summary This article provides a questionnaire that can be useful when being involved in the design process of an interface. Author Bio Author(s): Matthias Urech Company: interface-development.com Created on: January 8, 2009 Matthias Urech, founder and project leader of interface - development.com, has a proven track record of applying data integration solutions for several companies across the industry. He played a key role to implement projects in the area of data integration, data migration, data consolidation and data warehousing. Matthias contributed several articles about data integration best practices for Informatica Technology Network, resulting in nomination as member of developer wall of fame and receiving an award as contributor of the year in 2008.

Transcript of Technet Etl Design Questionnaire

Page 1: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 1/15

 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

ETL Design Questionnaire

Applies to:

Informatica PowerCenter

Summary

This article provides a questionnaire that can be useful when being involved in the design process of aninterface.

Author BioAuthor(s): Matthias Urech

Company: interface-development.com

Created on: January 8, 2009

Matthias Urech, founder and project leader of interface - development.com, has a proven trackrecord of applying data integration solutions for several companies across the industry. Heplayed a key role to implement projects in the area of data integration, data migration, dataconsolidation and data warehousing.

Matthias contributed several articles about data integration best practices for InformaticaTechnology Network, resulting in nomination as member of developer wall of fame and

receiving an award as contributor of the year in 2008.

Page 2: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 2/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Table of Contents

Introduction ......................................................................................................................................................... 3 ETL Design Questionnaire..................................................................................................................................4 

Question 1: Who is involved? ......................................................................................................................... 4 Question 2: What is the scope of the interface? .............................................................................................5 Question 3: How is the data food chain? ........................................................................................................ 5 Question 4: How is the relationship between the systems? ...........................................................................6 Question 5: How is the data life cycle? ...........................................................................................................7 Question 6: How are changes captured? ....................................................................................................... 7 Question 7: What are the load frequencies? ..................................................................................................8 Question 8: What are the load types?.............................................................................................................8 Question 9: Which data volume is extracted? ................................................................................................9 Question 10: What is the strategy to load data?.............................................................................................9 Question 11: What is the load order? ...........................................................................................................10 Question 12: How is the interface data flow? ...............................................................................................10 Question 13: How is the target layout?.........................................................................................................11 Question 14: How is the logical mapping?....................................................................................................11 Question 15: Where are data quality issues addressed? .............................................................................12 Question 16: What are the operational actions?...........................................................................................12 Question 17: How is the scheduling configuration?......................................................................................13 Question 18: What level of documentation is provided?...............................................................................14 

Disclaimer and Liability notice .......................................................................................................................... 15 

Page 3: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 3/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Introduction

Do not design a bridge by counting the number of people who swim across the river today. That’s also truefor ETL projects. Depending on your role in the ETL project, your work starts sometimes when the data flowneeds to be built. Fine, someone will tell you what you have to do. But sometimes you will be involved earlierin the project to design the interface. As a matter of fact, you are faced then with the challenge of gatheringrequirements. That’s where the ETL Design Questionnaire comes in place. Asking the right questions is notonly essential, but will put you also in a position of controlling the design process. As a side effect, you will be

recognized as a professional ETL developer that has a plan.

Sounds that the questionnaire is an exciting and useful tool to work with. But what makes the differencecompared to other methodologies or frameworks (i.e. Informatica Velocity)? Each of the provided questionsis supported by figures, tables or graphical elements (let’s call them diagrams). It is neither about creatingcomprehensive documentation nor strictly answering all questions. Consider this set of questions rather as apresentation of multiple views in order to get a common understanding about the interface for all involvedparties. The main goal is to gather as much information with a simple approach. The key word here is“tailoring”. Decide yourself what you want to use!

Like everybody has a schema for getting rich that will not work, using the proposed questions in this articlewill neither prevent you for going through the design process nor is the list of questions complete. At the endof the day, your goal should be to provide the requirements for developing a working interface, respond to

changes, and have a good costumer communication and collaboration. I hope nonetheless that the providedETL Design Questionnaire will be useful to you and to your challenges.

Page 4: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 4/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

ETL Design Questionnaire

Question 1: Who is involved?

Basically, the ETL team is responsible for extracting, transforming and loading data. More specifically, thereare more tasks to think about within and outside the ETL team. In detail:

• Performing data analysis

• Defining data quality strategy

• Gather business rules

• Develop interface

• Establish test plans

• Perform reconciliation

• Execute tests

• Prepare deployment and support rollout

• Documentation of interface

These are just some of the tasks and the list is by far not complete. However, all those tasks haveto be done by someone. The objective of the role/task diagram is to define the involved people andtheir responsibilities. In short: you can simple ask "who does what?". For example: subject matterexpert (who) provides business rules (what).

Figure 1: Role/Task Diagram

Page 5: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 5/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 2: What is the scope of the interface?

Consider the interface as a black box for the moment. First, we want to understand the scope before buildingthe interface. In detail, you want to know what causes an event that provides input for the interface and whatis the output. For example: time and expense data has been posted after the month end (event) will readdaily charged hours for each employee (input) with the interface and deliver aggregated hours for thefinancial period (output).

Figure 2: Interface Scope Diagram

Question 3: How is the data food chain?

Understanding the data food chain is important in order to get the big picture about the involved systems.This gives you as early as possible an overview about constraints and dependencies in the project. The datafood chain diagram should state the involved systems (box), interfaces (arrow) and description of the datatypes (arrow description).

Figure 3: Data Food Chain Diagram

Page 6: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 6/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 4: How is the relationship between the systems?

After drawing the data food chain you are able to easily focus on the systems that are involved in the ETLproject. The goal is to identify the system relationship in order to find out the system that owns the data. Thiswill help you also to see if you need to build additional interfaces or checks to ensure referential integrity.

Figure 4: System Relationships

Figure 4 illustrates the three types of system relationships:

•  Master / Slave This is the most common relationship. Data will be maintained in system 1 andprovided to system 2.

•  Master / Master (one direction) In this relationship, data will be maintained in both systems. Only system 1 will beable to update data in system 2. Therefore, additional efforts (either manual orautomatic) have to be done to prevent data inconsistency and loss of dataquality.

•  Master / Master (both directions) As already mentioned in the previous relationship, data will be maintained in bothsystems. This relationship shows that both systems are able to update data.

Of course, the road still doesn’t end here since some systems are connected to more than one(see data food chain diagram). In such case you should prioritize the data flow order and check ifthe data food chain makes sense at all.

Page 7: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 7/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 5: How is the data life cycle?

By knowing the types of relationships, you are now able to draw the data flow in the data life cycle diagram.For example: the data flow arrow will point from system 1 to system 2 in case of a Master/Slave relationship.What’s left is to move the data flow arrow horizontal to define at which point of time an action (i.e. create) insystem 1 will cause a certain action in system 2.

Figure 5: Data Life Cycle Diagram

Please note that the given actions in system 1 are just examples. Some systems only allow to flag datainactive instead of delete them.

Question 6: How are changes captured?

The maintenance of data content is a key element in the development phase of an interface. There areseveral ways to capture changes:

•  Log Parsing Parsing logs is about scanning logs and capturing changes according to log entries. While this is aneffective way of doing it also contains some dangers since it’s the nature of logs to get full and do notlog anything anymore. Moreover, you will potentially loose all changes if the log is truncated before yourun the interface.

•  Audit Columns Most source systems have audit columns to store the date and time a record was added or modified.The most common approach to detect changes is to compare the last modified date with the date whenthe interface ran the last time. This approach works fine if the date in the audit columns is indicatingonly effective changes and the modified date does not overwrite the create date.

•  Current Data Selection In order to get only the current record is to select the end date field that contains NULL or a date in thefuture (e.g. 12/31/9999). This approach requires that you have a snapshot of data of the last interfacerun in order to compare and determine the changes. There’s no problem as long as the target systemdetects the changes. Otherwise, you have to think about providing additional data loads to deliver thefull dataset in case the snapshot runs out of sync.

  Timed Data Selection Timed data selection is when you check if the create or modified date fields are equal to a certain date(e.g. SYSDATE -1). This works fine if the interface is running only once per day and do never fail.Otherwise, you need to have a plan B to deliver the changes or a complete dataset.

•  Changed Data Capture Changed data capture preserves that either the source system or the interface captures the changes.Interface changes are captures by retrieving a full copy of the target system in order to compare it witha full copy of the source system. Although this approach is not the most efficient technique, it is the one

Page 8: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 8/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

that is most reliable. It’s essential that you only take an exact copy of the target system. Any otherattempt (e.g. copy of previous extraction) will have the risk to miss some data.

Choosing the approach that best fit to the environment is key. It’s important that you know all constraintsand consider them during development. Spending enough research time will prevent you from troublesrunning the interface in production.

Question 7: What are the load frequencies?

Interfaces are scheduled to be executed on a certain frequency. While scheduling itself is an operationaltask, the frequency has an impact how the interface has to deal with the data at the point of execution. Thefrequency to load data can by on a yearly, quarterly, monthly, weekly, daily, intra-daily or ad-hoc basis.Please consider that this list is by far not complete. Instead, you should think through all possible cases andwrite all load frequencies down.

Question 8: What are the load types?

The load type diagram outlines the load types and date ranges that are used in the interface.

Figure 6: Load Type Diagram

Page 9: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 9/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 9: Which data volume is extracted?

Knowing the extracted data volume is not only important for performance tuning but also essential fordelivering exactly the required amount of data. The data volume contains one of the following datasets:

•  Complete Dataset All data is provided. This includes current and history data without applying anyfilter.

•  Current Dataset 

For each row only the most current data (known by source) is provided.•  History Dataset 

Only history data is provided.

•  Reduced Dataset Filter is applied with one or more criteria (e.g. client number, report date, flag).

Determining the data volume may take some detective work. You might have to perform sourcesystem analysis, study the history concept of the source system or simply talk to subject matterexperts in order to find out the criteria that are applied to the dataset.

Question 10: What is the strategy to load data?

Defining a load strategy is nothing else than building a relation between load frequency, load type and datavolume. In other words: it’s like a cube that has three dimensions: frequency (load frequency), date (loadtype) and amount (data volume). Once you have defined the load strategy it’s recommended to review eachcombination of load frequency, load type and data volume in order to identify gaps, reduce duplicates andcomplexity.

Having a proper load strategy will help you in getting a clear understanding about what data theinterface will Extract, Transform and Load (ETL). In addition, the load strategy table also containsthe parameters that can be used in order to make the interface more flexible. This can reduce theoverhead of creating multiple interfaces when only certain attributes of an interface need to bechanged.

Table 1: Load Strategy Table

Page 10: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 10/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 11: What is the load order?

Defining dependencies between jobs is very important. Especially when you are loading data to a datawarehouse or a target with enabled constraints. Without, you can’t enforce referential integrity. Beside ofoutlining dependencies, a load order table also shows jobs that are running in parallel.

Table 2: Load Order Table

Question 12: How is the interface data flow?

The interface data flow diagram is mostly used to outline the extract, transform and load (ETL) process. Thegoal is to have a common understanding about the data flow and the involved applications and actions todeliver data between the systems.

Figure 7: Interface Data Flow Diagram

Page 11: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 11/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 13: How is the target layout?

All that matters is the result of the solution or in other words: what is loaded into the target. Therefore, theearlier you know what you have to provide the earlier you can begin with the development.

Table 3: Target Definition Table

Question 14: How is the logical mapping?

We presume that source and target is known. The logical mapping table helps you defining the linking ofsource and target fields and to document business rules. The logical mapping is like water. It’s easier to buildsomething on it if linking and business rules are frozen.

Table 4: Logical Mapping Table

Page 12: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 12/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 15: Where are data quality issues addressed?

Here, it is about defining if you should care about data quality. You should address as many data qualityissues to the source as possible since future interface development initiatives would otherwise have to dealwith it again. However, some issues like incomplete data might be best addressed in the interface.

Table 5: Data Quality Assignment Table

Question 16: What are the operational actions?

Some operational actions are overseen during development. As a result, you have to put your hands againon the interface. Thinking about operational steps from the beginning will help you identify hidden

requirements and perform accurate effort estimates.

Table 6: Operational Action Table

Page 13: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 13/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 17: How is the scheduling configuration?

In most organizations operational tasks are not done by the developer. Therefore, the interface will behanded over and an operational handbook is provided. The operational handbook contains information abouthow to schedule the interface. Figure 9 displays the scheduling workflow with all its dependencies:

Figure 8: Scheduling Workflow Diagram

Moreover, a scheduling configuration table is also provided explaining in detail how the scheduling job issetup and executed.

Table 7: Scheduling Configuration Table

Page 14: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 14/15

ETL Design Questionnaire 

Informatica Technology Network  http://technet.informatica.com  

  © 2009 Informatica Corporation. All Rights Reserved.

Question 18: What level of documentation is provided?

Table 8 supports you in defining the documentation scope:

Table 8: Documentation Decision Table

Page 15: Technet Etl Design Questionnaire

8/8/2019 Technet Etl Design Questionnaire

http://slidepdf.com/reader/full/technet-etl-design-questionnaire 15/15

ETL Design Questionnaire 

Informatica Technology Network http://technet informatica com

Disclaimer and Liability notice

Informatica offers no guarantees and assumes no responsibility or liability of any type with respect to thecontent of this software asset, including any liability resulting from incompatibility between the content withinthis asset and the materials and services offered by Informatica. You agree that you will not hold, or seek tohold, Informatica responsible or liable with respect to the content of this software asset.