ETL Using Informatica Power Center
www.edureka.co
View Informatica course details at www.edureka.co/informatica
For queries: post on Twitter @edurekaIN: #askEdureka; post on Facebook /edurekaIN
For more details, please contact us: US: 1800 275 9730 (toll free); India: +91 88808 62004
Slide 2 www.edureka.co/informatica
Objectives
At the end of this session, you will be able to understand:
The Information Economy
ETL - An Overview
Why ETL Is Still Relevant
Informatica Overview
The Informatica Platform
Why Informatica
Informatica Partners & Customers
Informatica Architecture Overview & Components
Use Case 1 - Loading a Product Dimension table using a Slowly Changing Dimension (SCD)
Use Case 2 - Populating a Sales summary table using Incremental Aggregation
Job Trends
Scope of This Course
Slide 3
The Information Economy
Lack of Trustworthy Data Impedes Key Business Imperatives
[Diagram: business imperatives impeded by a lack of relevant, trustworthy and timely data]
Mergers, Acquisitions & Divestitures
Acquire & Retain Customers
Outsource Non-core Functions
Improve Decisions
Modernize Business
Improve Efficiency & Reduce Costs
Governance, Risk & Compliance
Increase Partner Network Efficiency
Increase Business Agility
Consolidation, Globalization, Growth, Operational Efficiency, Governance
Slide 4
ETL - An Overview
ETL stands for Extract, Transform and Load
The "E" represents the ability to consistently and reliably extract data with high performance and minimal impact to the source system
The "T" represents the ability to transform one or more data sets in batch or real-time into a consumable format
The "L" stands for loading data into a persistent or virtual data store
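The three steps can be illustrated with a small, tool-agnostic Python sketch. It is not how PowerCenter itself is used (PowerCenter data flows are built as mappings in the Designer); the CSV source, the `sales` table and the validation rule are hypothetical, and an in-memory SQLite database stands in for the target data store.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational system (CSV export).
SOURCE_CSV = """order_id,customer,amount,order_date
1, alice ,120.50,2014-03-01
2,BOB,80.00,2014-03-01
3,carol,,2014-03-02
"""


def extract(text):
    """E: read the raw rows without changing the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """T: validate and reformat rows into a consumable shape."""
    clean = []
    for row in rows:
        if not row["amount"]:  # reject rows that fail validation
            continue
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().title(),  # standardize customer names
            float(row["amount"]),
            row["order_date"],
        ))
    return clean


def load(rows, conn):
    """L: persist the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT customer, amount FROM sales ORDER BY order_id").fetchall())
# [('Alice', 120.5), ('Bob', 80.0)]
```

Keeping extract, transform and load as separate functions mirrors the E/T/L split: each step can be tested or replaced on its own.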
Slide 5
Why ETL is Still Relevant
Is ETL becoming history with the advent of Big Data?
Data needs to flow from source applications into analytic data stores in a controlled, reliable, secure manner
Information needs to be standardized with regard to semantics, format and lexicon for accurate analysis
Operational results need to be consistent and repeatable
Operational results need to be verifiable and transparent
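As a rough illustration of standardizing semantics, format and lexicon, the Python sketch below maps two hypothetical source systems ("crm" and "billing") that encode gender and dates differently onto one agreed representation. The code tables, date formats and the source-prefixed key are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical code tables: each source encodes gender differently.
GENDER_CODES = {
    "crm": {"M": "Male", "F": "Female"},
    "billing": {"1": "Male", "2": "Female"},
}
# Hypothetical date formats used by each source.
DATE_FORMATS = {"crm": "%Y-%m-%d", "billing": "%d/%m/%Y"}


def standardize(record, source):
    """Map a source-specific record onto one agreed semantics, format and lexicon."""
    return {
        # Prefix the key with the source so IDs from different systems cannot collide.
        "customer_id": f"{source.upper()}-{record['id']}",
        "gender": GENDER_CODES[source][record["gender"]],
        "birth_date": datetime.strptime(record["dob"], DATE_FORMATS[source]).date().isoformat(),
    }


crm_row = {"id": 7, "gender": "F", "dob": "1985-04-23"}
billing_row = {"id": 7, "gender": "2", "dob": "23/04/1985"}
print(standardize(crm_row, "crm"))
print(standardize(billing_row, "billing"))
```

Both records come out with the same gender label and ISO date, so downstream analysis no longer needs to know which system a row came from.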
Slide 6
Why ETL is Still Relevant
Facilitates integration of data from various data sources for building a data warehouse
Businesses have data in multiple databases with different codification and formats
Transformation is required to convert and summarize operational data into a consistent, business-oriented format
Pre-computation of any derived information
Summarization is also carried out to pre-compute summaries and aggregates
Makes data available in a queryable format
Mergers and acquisitions also create disparities in data representation and pose more difficult challenges in ETL.
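A minimal Python sketch of keeping pre-computed aggregates current: saved per-product totals are updated with only the rows that arrived since the last run, instead of re-scanning all history. This is the idea behind incremental aggregation (the second use case in this session); the class and data are hypothetical and do not reflect how PowerCenter stores its aggregate cache.

```python
from collections import defaultdict


class SalesSummary:
    """Pre-computed sales totals per product, updated incrementally.

    The running totals play the role of saved aggregate state, so each
    run only has to process rows that arrived since the previous run.
    """

    def __init__(self):
        self.totals = defaultdict(float)  # product -> total amount
        self.counts = defaultdict(int)    # product -> number of sales

    def apply(self, new_rows):
        """Fold only the newly arrived (product, amount) rows into the summary."""
        for product, amount in new_rows:
            self.totals[product] += amount
            self.counts[product] += 1


summary = SalesSummary()
summary.apply([("pen", 2.0), ("book", 15.0), ("pen", 3.0)])  # initial load
summary.apply([("book", 5.0)])                               # later run: new rows only
print(dict(summary.totals), dict(summary.counts))
# {'pen': 5.0, 'book': 20.0} {'pen': 2, 'book': 2}
```

This works for additive aggregates such as SUM and COUNT. A measure like a median cannot be updated from running totals alone and would need more saved state.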
Slide 7
Informatica – A Product Company
Informatica Corp. provides data integration software and services to businesses, industries and government organizations, including telecommunications, health care, financial and insurance services.
Founded: 1993
2012 revenue: $811.6 million
7-year CAGR: 17% per year
Employees: 2,810+
Partners: 450+
» Major SI, ISV, OEM and On-Demand leaders
Customers: over 5,000
» Customers in 82 countries
» Direct presence in 28 countries
» #1 in customer loyalty rankings (7 years in a row)
Slide 8
The Informatica Approach
Comprehensive, Unified, Open and Economical Approach
Slide 9
Informatica Products & Their Functionalities
A wide range of products is available in the Informatica product suite to help satisfy data integration requirements within the enterprise and beyond.
Informatica's product portfolio is focused on data integration:
» Data Integration & ETL
» Information Lifecycle Management
» Complex Event Processing
» Data Masking
» Data Quality
» Data Replication
» Data Virtualization
» Master Data Management
» Ultra Messaging
Currently at version 9.6, these components form a toolset for establishing and maintaining enterprise-wide data warehouses.
Slide 10
Informatica Products & Their Functionalities
Slide 11
Why Informatica?
A Singular Focus on Data Integration
Proven technology leadership
A track record of continuous innovation
The most neutral trusted partner
Long history of customer success
Slide 12
Why Informatica?
A Track Record of Continuous Innovation
» Q4 2008: Informatica 8.6.1, Cloud Synch. (Business Glossary, ICC Manageability)
» Q1 2009: Application ILM (Application Information Lifecycle Management)
» Q2 2009: Address Validation (Address Validation for DQ)
» Q3 2009: CEP, PowerCenter CE (Complex Event Processing and Cloud IaaS)
» Q4 2009: Informatica 9.0, Informatica Cloud 9 (Collaboration, Pervasive DQ, Data Services)
» Q1 2010: MDM, Ultra Messaging (Multi-domain MDM, Ultra-low Latency Messaging)
» Q2 2010: Informatica Marketplace (Online exchange for solutions)
» Q3 2010: MDM, ILM (Test data mgmt)
» Q4 2010: Cloud (Trust framework, plug-ins)
Slide 13
Why Informatica?
Over 4,200 Leaders Rely on Informatica
Financial Services and Insurance
Telecommunications
Manufacturing
Retail and Services
Healthcare and Life Sciences
Utilities and Energy
Government and Public Sector
Transportation and Distribution
Slide 14
Introduction to PowerCenter
PowerCenter:
It is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover and integrate data from virtually any business system, in any format, and to deliver that data throughout the enterprise at any speed.
An ETL (Extract, Transform and Load) tool
PowerCenter can read from a variety of different sources and write to as many targets, transforming the data in between.
The main advantages of PowerCenter over other ETL tools, and hence the reasons for its popularity, are as follows:
» It is robust, and can be used on both Windows and UNIX-based systems
» It delivers high performance, yet is simple to develop, maintain and administer
Slide 15
PowerCenter Architecture - SOA
The architecture of Informatica PowerCenter (version 9.x onwards) is based on the Service-Oriented Architecture (SOA) concept.
A service-oriented architecture can be defined as a group of services that communicate with each other. The communication can involve simple data passing, or two or more services coordinating the same activity.
Informatica 9.X represents a major change in the architecture of the product line.
Aim: to provide improved performance and high availability.
Approach: the underlying architecture has been re-engineered to be even more services-based.
Slide 16
PowerCenter Architecture
Single Unified Architecture
Slide 17
PowerCenter Architecture - Proven Scalability
Threaded Parallel Processing
Slide 18
PowerCenter Architecture - Proven Scalability
Pipeline Parallel Processing
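Pipeline parallel processing means that the stages of a data flow run at the same time: while one row is being written, the next is being transformed and a third is being read. PowerCenter sessions are commonly described in terms of reader, transformation and writer threads. The Python sketch below only mimics that general idea with one thread per stage and bounded queues between them; it is not PowerCenter's implementation, and the data and the `* 10` rule are placeholders.

```python
import queue
import threading

SENTINEL = None  # marks the end of the data stream


def reader(rows, out_q):
    """Reader stage: hands each source row downstream as soon as it is read."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transformer(in_q, out_q):
    """Transformation stage: works on one row while the reader fetches the next."""
    while (row := in_q.get()) is not SENTINEL:
        out_q.put(row * 10)  # stand-in for real business rules
    out_q.put(SENTINEL)


def writer(in_q, target):
    """Writer stage: appends finished rows to the target."""
    while (row := in_q.get()) is not SENTINEL:
        target.append(row)


source = [1, 2, 3, 4, 5]
target = []
# Small bounded buffers between stages keep memory use low while the stages run concurrently.
q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
stages = [
    threading.Thread(target=reader, args=(source, q1)),
    threading.Thread(target=transformer, args=(q1, q2)),
    threading.Thread(target=writer, args=(q2, target)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()
print(target)  # [10, 20, 30, 40, 50]
```

In CPython the global interpreter lock limits CPU-bound threads, so this overlap mainly pays off when stages wait on I/O (reading from a database, writing to a file).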
Slide 19
Client Components of PowerCenter
PowerCenter Repository Manager
PowerCenter Designer
PowerCenter Workflow Manager
PowerCenter Workflow Monitor
PowerCenter Administration Console (browser based)
Slide 20
Server Components of PowerCenter
The PowerCenter server components comprise the following services:
Repository Service: manages the repository. It retrieves, inserts and updates metadata in the repository database tables.
Integration Service: runs sessions and workflows.
SAP BW Service: listens for RFC requests from SAP BW and initiates workflows to extract data from, or load data into, SAP BW.
Web Services Hub: receives requests from web service clients and exposes PowerCenter workflows as services.
Slide 21
Overall Architecture of PowerCenter
[Diagram components: Sources; Targets; PowerCenter Client; Administrator; Security; Domain Metadata Repository; DOMAIN containing the Repository Service, Repository Service Process and Integration Service. Connection types shown: TCP/IP, HTTPS, ODBC, native drivers.]
Slide 22
Informatica - Domain & Nodes
The salient features of a domain are as follows:
» A domain is a logical collection of nodes and services.
» The PowerCenter domain is the fundamental administrative unit of PowerCenter.
» A domain can be a single PowerCenter installation, or it can consist of multiple PowerCenter installations.
The salient features of a node are as follows:
» A node is a logical representation of a physical machine. It has physical attributes such as a hostname and a port number.
» Each node runs a Service Manager, which is responsible for the application and core services.
» A node can be a gateway node or a worker node, but it can belong to only one domain.
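The domain and node rules above can be captured in a small, hypothetical Python data model: a node has a hostname, a port and a gateway or worker role, and adding it to a second domain is rejected. Class names, host names and port numbers are made up for illustration and are not Informatica's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Logical representation of a physical machine (hypothetical model)."""
    name: str
    hostname: str
    port: int
    role: str = "worker"  # "gateway" or "worker"
    # A node can belong to only one domain; excluded from repr/eq to avoid cycles.
    domain: Optional["Domain"] = field(default=None, repr=False, compare=False)


@dataclass
class Domain:
    """Logical collection of nodes and services; the administrative unit."""
    name: str
    nodes: List[Node] = field(default_factory=list)
    services: List[str] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        if node.role not in ("gateway", "worker"):
            raise ValueError(f"unknown node role: {node.role}")
        if node.domain is not None:
            raise ValueError(f"{node.name} already belongs to {node.domain.name}")
        node.domain = self
        self.nodes.append(node)

    def gateways(self) -> List[Node]:
        return [n for n in self.nodes if n.role == "gateway"]


dev = Domain("Domain_Dev", services=["Repository Service", "Integration Service"])
gw = Node("node01", "etl-host-1", 6005, role="gateway")
wk = Node("node02", "etl-host-2", 6005)
dev.add_node(gw)
dev.add_node(wk)

prod = Domain("Domain_Prod")
try:
    prod.add_node(wk)  # rejected: node02 is already in Domain_Dev
except ValueError as err:
    print(err)  # node02 already belongs to Domain_Dev
```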
Slide 23
Informatica - Services
A service can be described as follows:
A service is a resource that provides specialized functions. All PowerCenter processes run as services on a node.
PowerCenter has two types of services:
» Application services represent server-based functions, including the Repository and Integration Services.
» Core services represent functions that manage and maintain the environment in which PowerCenter operates, and include the Log Service, Licensing Service and Domain Service, among others.
Slide 24
Component Based Development Techniques
Component-based development is a technique where predefined components or functional units (or both) with specific functionalities are used to assemble the final product.
PowerCenter follows component-based development methodologies by allowing you to build a data flow from a source to a target using different components (called transformations), linking them to each other as required.
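A sketch of the component-based idea in Python: each function is a small, reusable unit with one job, and `build_flow` links them in order, loosely analogous to connecting transformations in a mapping. All function names and data are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Component = Callable[[Iterable[Row]], Iterable[Row]]


# Reusable functional units, each with one specific job (hypothetical examples).
def filter_active(rows: Iterable[Row]) -> Iterable[Row]:
    return (r for r in rows if r["status"] == "A")


def add_full_name(rows: Iterable[Row]) -> Iterable[Row]:
    return ({**r, "full_name": f"{r['first']} {r['last']}"} for r in rows)


def uppercase_country(rows: Iterable[Row]) -> Iterable[Row]:
    return ({**r, "country": str(r["country"]).upper()} for r in rows)


def build_flow(*components: Component) -> Component:
    """Link components in order, loosely like connecting transformations in a mapping."""
    return lambda rows: reduce(lambda data, step: step(data), components, rows)


customer_flow = build_flow(filter_active, add_full_name, uppercase_country)
source = [
    {"first": "Ana", "last": "Lima", "country": "br", "status": "A"},
    {"first": "Raj", "last": "Iyer", "country": "in", "status": "I"},
]
target: List[Row] = list(customer_flow(source))
print(target)
```

Because each unit only consumes and produces rows, `filter_active` could be reused unchanged in a different flow, and a faulty unit can be fixed without touching the others.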
Slide 25
Component Based Development Techniques
The advantages of a component-based development model are listed below:
As the functional units are already built, the developer need not build them from scratch and can use them directly. Apart from making the entire process easier, this also reduces development time.
This approach makes bug-fixing easier and aids various maintenance activities, since only the malfunctioning components need to be fixed.
Reusability is another factor that works in favor of a component-based development model.
Slide 26
Transformation is the step in ETL where the business rules are actually applied to the data flow
It is here that data cleansing and formatting are performed, along with data validation, which is one of its main functions
In PowerCenter, transformations are the functional components
To meet all kinds of requirements, a wide range of transformations is available within Informatica
The hierarchy is as follows:
Transformation -> Mapping -> Session -> Workflow
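The containment hierarchy can be sketched as a hypothetical Python model: transformations make up a mapping, a session runs one mapping against a source and a target, and a workflow runs its sessions. The classes and the `m_`/`s_`/`wf_` names are illustrative only and are not PowerCenter objects or APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Transformation:
    """A functional component that applies one rule to the rows."""
    name: str
    fn: Callable[[list], list]


@dataclass
class Mapping:
    """A source-to-target data flow built from linked transformations."""
    name: str
    transformations: List[Transformation]

    def run(self, rows: list) -> list:
        for t in self.transformations:
            rows = t.fn(rows)
        return rows


@dataclass
class Session:
    """Runs one mapping, supplying where to read from and write to."""
    name: str
    mapping: Mapping
    source: list
    target: list = field(default_factory=list)

    def execute(self) -> int:
        rows = self.mapping.run(self.source)
        self.target.extend(rows)
        return len(rows)  # rows loaded in this run


@dataclass
class Workflow:
    """Runs its sessions in order and reports rows loaded per session."""
    name: str
    sessions: List[Session]

    def start(self) -> Dict[str, int]:
        return {s.name: s.execute() for s in self.sessions}


m = Mapping("m_load_customers", [
    Transformation("exp_trim", lambda rows: [r.strip() for r in rows]),
    Transformation("fil_not_empty", lambda rows: [r for r in rows if r]),
])
s = Session("s_m_load_customers", m, source=[" ann ", "", "bo "])
wf = Workflow("wf_daily_load", [s])
result = wf.start()
print(result)    # {'s_m_load_customers': 2}
print(s.target)  # ['ann', 'bo']
```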
Slide 27
Informatica PowerCenter – DI Solution
Informatica PowerCenter is the premium data integration solution available today
"Database neutral": will communicate with any database
Powerful data transformations convert one application’s data to another’s format
[Diagram: Informatica PowerCenter connecting Manufacturing (DB2), Sales (Salesforce), Billing (Sybase), Resource Planning (PSFT), Inventory (SQL Server), Marketing (ORCL) and Accounting (upgraded)]
Slide 28
Data Migration
A company purchases a new accounts payable application
PowerCenter can move the existing account data to the new application
» Preserves data lineage for tax, accounting and other legally mandated purposes
[Diagram: Accounting (Old) -> Informatica PowerCenter -> Accounting (New)]
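A minimal Python sketch of a migration that keeps lineage: each row written to the new application records which system and source key it came from and when it was moved, and a simple reconciliation compares row counts and balance totals. The field names, the `ACCT_OLD` system code and the reconciliation rule are assumptions for the example.

```python
from datetime import datetime, timezone


def migrate_accounts(old_accounts, source_system="ACCT_OLD"):
    """Reshape old accounts for the new application, keeping lineage columns."""
    migrated_at = datetime.now(timezone.utc).isoformat()
    migrated = []
    for acct in old_accounts:
        migrated.append({
            "account_number": acct["acct_no"],
            "account_name": acct["name"].strip(),
            "balance": round(acct["bal"], 2),
            # Lineage: which system and record each row came from, and when.
            "src_system": source_system,
            "src_key": acct["id"],
            "migrated_at": migrated_at,
        })
    return migrated


def reconcile(old_accounts, new_accounts):
    """Check that no rows were lost and balances still add up after the move."""
    same_count = len(old_accounts) == len(new_accounts)
    old_total = round(sum(a["bal"] for a in old_accounts), 2)
    new_total = round(sum(a["balance"] for a in new_accounts), 2)
    return same_count and old_total == new_total


old = [
    {"id": 101, "acct_no": "AP-0001", "name": " Acme Corp ", "bal": 1500.0},
    {"id": 102, "acct_no": "AP-0002", "name": "Globex", "bal": 230.25},
]
new = migrate_accounts(old)
print(reconcile(old, new))  # True
```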
Slide 29
Application Integration
Company A purchases Company B
To achieve the benefits of consolidation, Company B’s billing system must be integrated into Company A’s billing system
[Diagram: Billing B -> Informatica PowerCenter -> Billing A]
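One common consolidation problem is that both billing systems use overlapping customer IDs. The Python sketch below merges Company B's customers into Company A's billing data, reusing A's ID when the same customer already exists and assigning new IDs otherwise, while keeping a B-to-A cross-reference. Matching on a tax ID is a hypothetical rule chosen for the example.

```python
def integrate_billing(billing_a, billing_b):
    """Merge Company B's customers into Company A's billing data.

    Returns the merged customer list and a cross-reference from B's
    customer IDs to A's, so B's open invoices can be re-keyed.
    """
    next_id = max((c["cust_id"] for c in billing_a), default=0) + 1
    by_tax_id = {c["tax_id"]: c["cust_id"] for c in billing_a}
    merged = list(billing_a)
    xref = {}  # B's cust_id -> A's cust_id
    for cust in billing_b:
        if cust["tax_id"] in by_tax_id:  # customer already billed by A
            xref[cust["cust_id"]] = by_tax_id[cust["tax_id"]]
            continue
        xref[cust["cust_id"]] = next_id
        merged.append({**cust, "cust_id": next_id})
        by_tax_id[cust["tax_id"]] = next_id
        next_id += 1
    return merged, xref


billing_a = [{"cust_id": 1, "tax_id": "T-100", "name": "Acme"},
             {"cust_id": 2, "tax_id": "T-200", "name": "Globex"}]
billing_b = [{"cust_id": 1, "tax_id": "T-200", "name": "Globex Inc"},
             {"cust_id": 2, "tax_id": "T-300", "name": "Initech"}]
merged, xref = integrate_billing(billing_a, billing_b)
print(xref)  # {1: 2, 2: 3}
```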
Slide 30
Data Warehousing
Data warehouses put information from many sources together for analysis
Data is moved from many databases to the data warehouse
[Diagram: Inventory (SQL Server), Marketing (ORCL), Accounting (upgraded), Manufacturing (DB2), Resource Planning (PSFT), Sales (Salesforce) and Billing (Sybase) -> Informatica PowerCenter -> Data warehouse]
Slide 31
Middleware
Informatica can connect to a variety of sources, including most application sources
SAP-certified data integration tool
Can pull data from and push data into SAP R/3 and SAP BW systems
Has connectivity adapters for the majority of application sources
Can be used as middleware between two applications, such as SAP R/3 and SAP BW
Slide 32
Some Unique Features of Informatica
Single administration console to administer all the application services
Unified users, groups, privileges and roles administration across PC AE tools
Single sign-on for all the client tools: once you log in to one client tool, the others are logged in automatically
Built-in version control
Grid and high availability
Built-in scheduling tool
Slide 33
Use Cases
Loading a Product Dimension table using a Slowly Changing Dimension (SCD)
Populating a Sales summary table using Incremental Aggregation
Demonstrating Informatica PowerCenter partitioning capability
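The slides do not say which SCD type the product dimension use case applies; Type 2 (keep full history by versioning rows) is assumed in this minimal Python sketch. When a product's attributes change, the current row is expired and a new version with a new surrogate key is inserted. In PowerCenter this logic is typically built with Lookup and Update Strategy transformations; the code below only illustrates the technique, and its names are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ProductRow:
    product_key: int          # surrogate key of the dimension row
    product_id: str           # business key from the source system
    name: str
    price: float
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_scd2(dim: List[ProductRow], incoming: Dict[str, Tuple[str, float]],
               as_of: date) -> List[ProductRow]:
    """Type 2 SCD: expire the current row on change and insert a new version."""
    result = list(dim)
    next_key = max((r.product_key for r in result), default=0) + 1
    current = {r.product_id: i for i, r in enumerate(result) if r.valid_to is None}
    for pid, (name, price) in incoming.items():
        idx = current.get(pid)
        if idx is not None:
            old = result[idx]
            if (old.name, old.price) == (name, price):
                continue  # no change: keep the current version
            result[idx] = replace(old, valid_to=as_of)  # expire the old version
        result.append(ProductRow(next_key, pid, name, price, as_of, None))
        next_key += 1
    return result


dim = apply_scd2([], {"P1": ("Pen", 2.0), "P2": ("Book", 15.0)}, date(2014, 1, 1))
dim = apply_scd2(dim, {"P1": ("Pen", 2.5), "P2": ("Book", 15.0)}, date(2014, 6, 1))
for row in dim:
    print(row)
```

Here the expired row's `valid_to` equals the new row's `valid_from`; some designs instead end the old version one day earlier, which is a modeling choice rather than a rule.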
Slide 34
Job Trends
Informatica - Role Wise Comparison
Fresher
» Data Warehouse Developer
» ETL Developer
Mid Level
» Data Specialist
» Sr. ETL Developer
» Informatica Designer
» Informatica Administrator
Senior Level
» ETL Architect
» Informatica Architect
» Technical Manager
Slide 35
Job Trends
Informatica Skill Requirements
Slide 36
Job Trends
Informatica Other Skill Requirements
Slide 37
Scope of This Course
Informatica PowerCenter Basics
Informatica PowerCenter Advanced Transformations
Informatica PowerCenter Installation and Configuration
Informatica PowerCenter Administration and Operation Basics
PowerCenter Troubleshooting & Performance Tuning
Best Practices and Methodology
Ample lab work to be done after each module
Slide 38
Course Topics
Module 1
» Informatica PowerCenter 9.X – An overview
Module 2
» ETL Fundamentals
Module 3
» PowerCenter Designer
Module 4
» PowerCenter Workflow Manager & Monitor
Module 5
» Advanced Transformation Techniques
Module 6
» Parameters & Variables
Module 7
» Debugging, Troubleshooting, Error Handling & Recovery
Module 8
» Cache
Module 9
» Performance Tuning & Optimization
Module 10
» PowerCenter Repository Manager
Module 11
» Informatica Administration Console & Security
Module 12
» Informatica 9.X - Technical Architecture
Module 13
» Informatica Installation & Operations Manual
Module 14
» Command line utilities
Module 15
» ETL Scenarios using Informatica
Module 16
» Best Practices & Velocity Methodologies
Slide 39
How it Works
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate