Genio7 Technical Wp

Genio Suite™ A Technical Overview

Page 1: Genio7 Technical Wp

www.hummingbird.com

While every attempt has been made to ensure the accuracy and completeness of the information in this document, some typographical or technical errors may exist. Hummingbird Connectivity, a division of Open Text, cannot accept responsibility for customers’ losses resulting from the use of this document. The information contained in this document is subject to change without notice.

This document contains proprietary information that is protected by copyright. This document, in whole or in part, may not be photocopied, reproduced, or translated into another language without prior written consent from Hummingbird Connectivity.

This edition published December 2007

Contents

Abstract
Introduction
Data Integration for the Enterprise
    Genio Suite™
    Genio Suite Architecture
    Genio Suite Components
        Genio Engine
        Genio Repository
        Genio Designer
        Genio Scheduler
        Genio MetaLinks
        Genio DataLinks
    Genio Suite Features
        Environment Neutral
        Extraction
        Middleware and Standards Support
        Incremental Extraction
        Transformation Functions
        Support of Stored Procedures and SQL Functions
        Support for External Functions
        National Language and Unicode Support
        Mapping
        Joins, Aggregation, Distinction, Filter and Sorting
        Transaction and Nested Transaction Support
        Validation against Metadata
        Loading
        Scheduling
        Failover Capabilities
        Performance Measurement
        Process Optimization and Tuning
        Data Quality Management
        Error Handling
        Audit and Monitoring
        Parallelism and Process Slicing
Genio: Unique Processing Methodology
    Use of the Genio Engine Exclusively
    Use of the Genio Engine with Processing on the Remote Database
    Exclusive Use of the Transformation Capabilities of Remote Databases
Genio: Superior Productivity
    Procedural Scripting Language
    Reusability
    Tracking Changes
    Dynamic Impact Analysis
    Auto-Documentation
    Versioning
    Metadata Management
Looking Forward
    What is the Hummingbird Connectivity Vision?
    Genio Suite Development Directions
Conclusion

Abstract

This paper is intended for IT professionals who want to understand Genio Suite™. It begins with an introduction to the product, explores the product architecture and unique features, and concludes with an indication of future product directions.

Introduction

The challenge of managing and leveraging all of the data within an enterprise grows increasingly complex. More and more applications, such as Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), and Supply Chain Management (SCM), have become embedded in the enterprise’s daily business, and combined with Web applications and legacy systems, they have created an elaborate and complicated IT environment. Many of these applications represent large investments by the company and yet the data contained in these systems is often isolated and not easily accessible.

However, in today’s competitive and demanding business environment, organizations are recognizing the value of analyzing all enterprise data to gain a ‘single version of the truth’—truths about customer relationships, business performance, and supplier capabilities. And, this analysis is starting to take place in ‘real time,’ as businesses operate on 24x7 requirements.

The first, and most challenging, step in this analysis process is Data Integration—accessing and consolidating the disparate data and systems to feed data warehouses, operational data stores and analytic applications—which is the basis for analysis of the entire enterprise. Moreover, to enable faster implementation of business processes, organizations need a solution that will provide the ability to interchange data between all systems in their IT environment.

Data Integration for the Enterprise

Genio is Hummingbird Connectivity’s universal data integration solution that accesses, transforms, enriches, cleanses and directs information from across the entire spectrum of enterprise systems and applications. This powerful solution spans the functional areas of data extraction, transformation, and loading (ETL) and enterprise application integration (EAI) to handle today’s large data volumes and ‘near real-time’ requirements. It provides accurate, consistent, and timely information by connecting any data source to any target system throughout the enterprise. Genio allows organizations to make more informed, strategic decisions based on a single, accurate version of the truth.

Genio Suite™

Genio extends an organization’s existing investment in technology and human resources by seamlessly integrating its IT infrastructure. Genio distributes data extraction, transformation, and load processes across computing resources and is platform- and database-independent. This allows organizations to select the operating system and RDBMS of their choice to store the Genio repository rather than locking users into a single vendor or proprietary solution.

Adding to this capability is a powerful component of Genio that uses native capabilities of source and target relational databases to delegate transformation processes, leveraging existing technology and minimizing network traffic.

Genio Suite Architecture

The Genio architecture is built on an extensible, component-based, hub-and-spoke design. It features a centralized engine and metadata repository (the hub) and data sources and targets (the spokes) between which the hub exchanges data. Unlike other hub-and-spoke architecture products, Genio can optimize data management processes, avoid bottlenecks and reduce network traffic by leveraging the local database capabilities.

Figure 1 Genio Architecture

The benefit of a hub-and-spoke architecture with a centralized and open repository is that organizations can maintain full control of all data exchange processes, business rules, and metadata that make up any and all projects within the enterprise. This enhances environment management and empowers knowledge workers to make better, more efficient use of business intelligence and analytical applications.

Since its initial development, Genio has followed an open and extensible design concept in order to provide a solid platform for future development, simplifying development of additional functionality and unifying the look and feel of different applications in the suite.

This structural architecture has enabled Hummingbird Connectivity to develop a procedural approach to data transformation and exchange processing that gives users unlimited capabilities to transform and process all types of data. With this approach, users are not limited to the functions provided by the tool. Rather, they are free to develop their own reusable transformation code to any degree of complexity.

Genio is built on a client/server architecture, and incorporates a centralized and open metadata repository. It can be implemented within a distributed deployment model, allowing multiple developers to work on projects simultaneously with complete version control and customized access privileges.

Genio Suite Components

Figure 2 Genio Components

Genio Suite offers an integrated set of components that allows organizations to design, deploy and maintain data transformation and exchange processes. The main components of Genio Suite are Genio Engine, Genio Repository, Genio Designer, Genio Scheduler, Genio MetaLinks and Genio DataLinks.

Genio Engine

Genio harnesses the power of a scalable, multi-threaded, transformation engine that brokers information from any source to any target. The Genio architecture supports distribution and synchronization of data transformation and exchange processes over multiple Genio engines. This is crucial as data volumes increase in size and as transformation processes increase in complexity. It allows Genio to leverage the power of existing distributed computing resources. The Genio Engine supports Windows®, UNIX (Sun® Solaris, IBM AIX and HP/UX) and popular Linux platforms (SUSE and Red Hat).

Genio Repository

The Genio Repository stores and manages all aspects of data transformation and exchange process metadata. All technical metadata (e.g. data structures, transformation rules…), business metadata (e.g. business rules, Data Flows…) and production metadata (e.g. programs, logs…) are stored in this repository. This repository is database-neutral and completely open. It can reside on Oracle, Microsoft® SQL Server, Sybase Adaptive Server, Sybase SQL Anywhere, IBM UDB, Informix or MySQL and on any platform supported by these databases.

Each component of a data transformation and exchange process is created as an object and stored in this repository. Genio also automatically maintains relationships between those objects and delivers a comprehensive set of dependency management features. Moreover, Genio automatically identifies any change made to metadata, and it leverages its dependency management capabilities to deliver a one-of-a-kind dynamic impact analysis. Therefore, every dependent object impacted by a change (internal or external) is automatically identified even before the next data transformation and exchange process is executed. This ensures information quality and consistency, and reduces the time required to maintain data integration processes.
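The idea behind impact analysis over stored object relationships can be illustrated with a small graph traversal. The sketch below is not Genio's implementation; the object names and the `DEPENDS_ON` structure are hypothetical, and it simply assumes that the repository records which objects each object uses.

```python
from collections import defaultdict, deque

def build_dependents(depends_on):
    """Invert 'object -> objects it uses' into 'object -> objects that use it'."""
    dependents = defaultdict(set)
    for obj, used in depends_on.items():
        for item in used:
            dependents[item].add(obj)
    return dependents

def impacted_by(changed, depends_on):
    """Return every object that directly or transitively depends on `changed`."""
    dependents = build_dependents(depends_on)
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Hypothetical repository objects and the objects each one uses.
DEPENDS_ON = {
    "dataset.sales_by_region": {"table.orders", "table.customers"},
    "function.discount": set(),
    "process.load_warehouse": {"dataset.sales_by_region", "function.discount"},
    "process.export_crm": {"table.customers"},
}

print(sorted(impacted_by("table.orders", DEPENDS_ON)))
```

A change to the hypothetical `table.orders` flags the dataset built on it and the process that uses that dataset, while `process.export_crm` is left alone.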

Genio Designer

Genio Designer is a multi-user graphical environment for designing data transformation and exchange processes. Data structures can be imported directly from source and target systems or using metadata bridges (Genio MetaLinks). User-defined business rules, functions, and procedures created in Genio Designer are stored as objects within the Genio Repository, and are completely reusable from project to project. Genio also incorporates a graphical interface that provides a complete and powerful graphical procedural scripting environment for designing data transformation processes of any complexity.

Figure 3 Genio Designer

Genio Scheduler

Genio Scheduler provides the ability to schedule process execution on either a calendar basis or an event basis. Genio Scheduler also provides monitoring of process executions as well as full history and audit-trail reporting. If desired, Genio Scheduler can work alongside external schedulers such as IBM Tivoli® or CA Unicenter. The substitution process is straightforward, as it can be implemented using standard API or command line interface calls to the underlying architecture.

Genio MetaLinks

Genio MetaLinks are ‘pluggable’ metadata bridges embedded in Genio Designer that enable the import of data structures from CASE tools, ERP systems, XML Schema or Web Services Description Language documents.

Genio Suite currently provides the following MetaLinks:

• Genio MetaLink for Web Services

• Genio MetaLink for XML Schema

• Genio MetaLink for SAP R/3

• Genio MetaLink for SAP BW

• Genio MetaLink for SAP IDoc

• Genio MetaLink for Sybase PowerDesigner

• Genio MetaLink for Computer Associates ERwin

Genio DataLinks

Genio DataLinks provide native connectivity to most relational and multi-dimensional database systems and flat files. Data access is achieved through native APIs, ADO/OLEDB, pass-through ODBC or generic ODBC. Genio has a specific grammar for each type of connectivity, which it uses to generate native SQL statements. Genio is also able to natively access files, such as Fixed Length Files (including mainframe flat files), Delimited Files (CSV…) or XML files, and allows processing of complex files (EDI, IDoc, WebLogs, etc.).

Genio also offers the native population of multi-dimensional databases, such as Hyperion Essbase. With this functionality, users can directly create all hierarchies or members, set all necessary attributes, and load or refresh cubes. Through native access, users do not require an additional staging area or complex multi-layer tools from multiple vendors. This approach has two advantages: much better performance, because no staging area is needed, and programmatic control of multi-dimensional cubes within the transformation logic.

Genio also gives organizations the ability to access information stored in SAP applications, combine it with data from other sources, and then share it with other systems throughout the enterprise. It delivers native connectivity to extract SAP data, supports bi-directional data interchange with SAP through the SAP IDoc format, and populates SAP BW with data from other systems.

Genio is certified by SAP for both CA-ALE and BW-STA interfaces.

Genio can also interface with thousands of other systems by leveraging its Web Services connectivity. This universal and comprehensive Web Services support allows Genio to interface with all internal and external Web Services-compliant applications. It also allows Genio to participate in Service Oriented Architectures (SOA) and to integrate functions of other applications into Data Integration processes.

Genio Suite Features

Environment Neutral

Genio is completely platform- and database-neutral.

All interfaces developed with Genio leverage the Genio graphical interface and the Genio procedural language. These features allow Genio users to develop generic business rules without binding them to any specific environment. Objects created in Genio Designer are stored in the centralized metadata repository. This centralized development model eliminates the need to re-code business rules, lookup tables, and custom functions for each new transformation project. At execution time, the Genio Engine reloads those metadata-driven processes and generates the appropriate code for the target environment.

This makes these interfaces portable to virtually any platform.

Genio currently provides native support, in either 32-bit or 64-bit mode, for six main platforms (Windows, Sun Solaris, IBM AIX, HP/UX, SUSE Linux and Red Hat Linux).

Extraction

Genio extracts from the source databases using native SQL grammar, making it possible to optimize the use of source database power and minimize network traffic. This is done by accessing only the source rows that are pertinent to the transformation work, thereby avoiding loading of all the data into a staging area.

When working with text sources, Genio has a variety of tools to help manage complex structures like hierarchical data dumps from mainframes or EDI files. Even when dealing with these kinds of files, Genio functions work the same way and make no distinction between a text file and a database table as source or target.

Supported Sources

File Formats

• AS-400 Flat Files

• Delimited Text files (CSV…)

• Fixed Text files

• SAP IDoc

• Standard Flat Files (Cobol)

• Palm Pilot JFile, JfilePro

• XML files

RDBMS

• Microsoft Access 97, 2000, 2002, 2003 (Native, ODBC)

• Microsoft SQL Server 6.5, 7, 2000, 2005 (ADO/OLEDB, ODBC)

• Oracle 7, 8, 8i, 9i, 10g (Native, ADO/OLEDB, ODBC)

• Informix 7.3x, 9.x (ADO/OLEDB, ODBC)

• Sybase Adaptive Server 11.x, 12.x, 15 (Native, ADO/OLEDB, ODBC)

• Sybase IQ 12.x (Native SQL, ODBC)

• Sybase SQL Anywhere 5.5, 6.x, 7.x, 8.x, 9.x (Native SQL, ODBC)

• Teradata V2R3.x, V2R4.x, V2R5.x, V2R6.x (Native SQL, ODBC)

• IBM UDB 5.x, 6.x, 7.x, 8.x (ADO/OLEDB, ODBC)

• MySQL 4.1.x (Native SQL, ODBC)

• Generic ODBC

Mainframe Connectivity (Native, ODBC)

• IBM DB2 AS400

• IBM DB2 MVS, OS/390, z/OS

• IBM IMS

• Adabas

• VSAM

• Natural

• CICS

Other Systems and Standards

• SAP R/3 (Native)

• Web Services (Native)

Middleware and Standards Support

Genio supports all middleware providing ODBC, Web Services or command line interfaces. It also natively supports message-oriented middleware (MOM), such as IBM WebSphere MQ. In addition, Genio can use FTP, MAPI, RSH or any external application to access data or push data to the target environment.

By leveraging these capabilities, Genio provides access to virtually any legacy, ERP, CRM or SCM system and to custom applications.

Incremental Extraction

Genio offers multiple strategies to perform incremental extraction. Simple approaches, such as selection limits based on time stamps, use of database log tables (e.g. Oracle Snapshot) or use of database triggers to capture changes, can easily be implemented to enable incremental extraction. These techniques are environment-neutral and can be implemented without any additional software investments.
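As an illustration of the time-stamp approach, the following sketch selects only rows changed since the previous run and returns a new high-water mark for the next run. It uses Python's built-in `sqlite3` module purely as a stand-in database; the `orders` table, its `updated_at` column and the `extract_changes` helper are assumptions made for this example, not part of Genio.

```python
import sqlite3

def extract_changes(conn, last_run):
    """Return rows modified after `last_run` and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_run,),
    ).fetchall()
    # If nothing changed, keep the previous mark so no rows are skipped later.
    new_mark = rows[-1][2] if rows else last_run
    return rows, new_mark

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2007-12-01T08:00:00"),
        (2, 25.0, "2007-12-02T09:30:00"),
        (3, 40.0, "2007-12-03T11:15:00"),
    ],
)
conn.commit()

rows, mark = extract_changes(conn, "2007-12-01T23:59:59")
print([row[0] for row in rows], mark)
```

The time stamps are stored as ISO 8601 strings so that plain string comparison matches chronological order; a real source would more likely use a native timestamp column.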

Genio can also leverage its Web Services connectivity to capture data changes in most applications. By accessing the application layer of operational systems through Web Services, Genio can get access to any business transactions that occurred in these systems during a certain period of time (e.g., new purchase orders, updated product records, recently edited documents…).

Genio also provides specific DataLinks that can automatically capture changes from legacy databases or VSAM files.

Finally, Genio can also interface with products like Sybase Replication Server and directly process deltas created by this external middleware. This can also be done by leveraging MOM solutions or any message-based environment.

Transformation Functions

Genio has a complete set of transformation functions, making it as capable as a programming language while providing a graphical and optimized user interface that makes the design of transformation routines more productive.

Genio offers roughly 150 generic functions that can be used to build complex expressions or custom functions. These functions cover the entire spectrum of string, date, number and Boolean manipulation. Complex clauses, such as IF, THEN, ELSE or CASE, can also be written in expressions. These functions can be processed inside the Genio Engine on Windows, UNIX or Linux, but they can also be automatically translated into native SQL functions in order to execute them on the database engine side.

Standard Functions

String Functions

ASCII, CHR, CONVERTTO, FILL, FORMAT (from Binary), FORMAT (from Boolean), FORMAT (from Date), FORMAT (from Numbers), FORMAT (from String), LEFT, LENGTH, LOCATE, LOCATE (with start), LOWER, LTRIM, MID, MID (with length), REPLACE, RIGHT, RTRIM, SOUNDEX, STRING (from Date), STRING (from Numbers), TRIM, UCHAR, UNICODE, UPPER

Date Functions

ADDMONTH, DATE (from three numbers), DATE (from six numbers), DATE (from seven numbers), DATE (from string), DATE (from string with format), DATEADD (number of days), DATEADD (specified period of time), DATEDEC, DATEDIFF (number of days), DATEDIFF (specified period of time), DATEINC, DAY, DAYOFWEEK, DAYOFYEAR, HOUR, LASTDAYOFMONTH, MINUTE, MONTH, NANOSECOND, SECOND, TIMESTAMP, TODAY, WEEK, WEEK3, YEAR

Mathematical and Trigonometric Functions

ACOS, ASIN, ATAN, COS, COT, EXP, LOG, MOD, SIN, SQRT, TAN

Numeric Functions

ABS, CEILING, CEILING (with decimal), DTOR, FLOOR, FLOOR (with decimal), NULLIFZERO, POWER, RND, ROUND, ROUND (with decimal), RTOD, SIGN, TRUNCATE, VAL, VAL (with decimal separator), VAL (with format), ZEROIFNULL

XML Functions

GetNodeName, GetNodeValue, GetXPath

Aggregate Functions

AVG, AVG_DISTINCT, AVG_OVER, CORR, COUNT, COUNT_ALL, COUNT_DISTINCT, COUNT_OVER, COVAR_POP, COVAR_SAMP, KURTOSIS, MAX, MIN, PERCENT_RANK, RANK_OVER, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY, SKEW, SKEW_DISTINCT, STDDEV, STDDEV_DISTINCT, STDDEV_DISTINCT_OVER_1, STDDEV_DISTINCT_OVER_3, STDDEV_OVER_1, STDDEV_OVER_3, STDDEV_POP, STDDEV_POP_DISTINCT, STDDEV_SAMP, STDDEV_SAMP_DISTINCT, SUM, SUM_OVER_1, SUM_OVER_2, SUM_OVER_3, VAR, VAR_POP, VAR_POP_DISTINCT, VAR_SAMP, VAR_SAMP_DISTINCT

Miscellaneous Functions

BINARY (from string), BOOLEAN (from string), CASE, GETREGISTRYVALUE, GETINIFILEVALUE, GENERATEFILENAMEFORDM, IFTHEN, IFTHENELSE, ISDATE, ISEMPTY, ISNULL, ISNUMERIC, MAX, MIN, NBEXCEPTIONS, SCASE, SLEEP, SYSVAR, THREADID, VALUEOF, NVL

Operators

Arithmetic Operators

+, –, *, /

Boolean Operators

NOT, AND, OR

Relational Operators

=, <>, <, >, <=, >=

Additional Operators

BETWEEN, BITWISEAND, BITWISEOR, BITWISEXOR, Concatenation (&), IN, LIKE, NOT IN, NOT LIKE

Using these standard functions, Genio users can create their own macros to describe business rules. For example, a ‘Discount’ function can be calculated from a given sales amount and reused across all Genio transformations, whether it is processed inside the Genio Engine or on a remote database.
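The ‘Discount’ example can be pictured as one rule with two execution forms: evaluated row by row in the engine, or rendered as a SQL expression that the database evaluates. The sketch below is only an analogy written in Python with `sqlite3`; the threshold, the rate and both helper functions are hypothetical and do not reflect Genio's macro syntax.

```python
import sqlite3

# Hypothetical business rule: 10% discount on sales of 1000 or more.
THRESHOLD = 1000.0
RATE = 0.10

def discount(amount):
    """Engine-side form: evaluate the rule on one value."""
    return amount * RATE if amount >= THRESHOLD else 0.0

def discount_sql(column):
    """Database-side form: the same rule as a SQL CASE expression."""
    return (
        f"CASE WHEN {column} >= {THRESHOLD} "
        f"THEN {column} * {RATE} ELSE 0.0 END"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 500.0), (2, 1000.0), (3, 1500.0)]
)

amounts = [row[0] for row in conn.execute("SELECT amount FROM sales ORDER BY id")]
in_engine = [discount(a) for a in amounts]
in_database = [
    row[0]
    for row in conn.execute(
        f"SELECT {discount_sql('amount')} FROM sales ORDER BY id"
    )
]
print(in_engine, in_database)
```

Defining the rule once and deriving both forms from it keeps the two execution paths from drifting apart, which is the point of a shared, reusable function.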

Support of Stored Procedures and SQL Functions

Genio can also invoke stored procedures or any piece of SQL code that can be executed on databases. These SQL scripts can be declared in Genio’s repository to guarantee the reusability of existing code defined within relational databases, either source or target. These stored procedures and SQL functions can be used to retrieve data, to extend the Genio transformation feature set, or simply to trigger external processing on the database side. This also enables better distribution of processing by allowing the use of remote databases’ transformation functions within data integration processes. For example, Oracle sequences can be reused this way.

Support for External Functions

To extend the processing capabilities of Genio, it is also possible to use any legacy function in a DLL written in C++ or any other language. These external functions are declared once in Genio Designer, and can be used seamlessly in all Genio transformations, thereby preserving legacy investments.

Genio can also call any Web Service, executable, external batch or shell script for specialized transformation needs.

National Language and Unicode Support

Genio delivers comprehensive National Language Support and Unicode support.

It allows simultaneous connections to multiple systems encoded in different character sets and exchanges data between these systems. Genio supports most single-byte, double-byte and multi-byte character sets as well as Unicode. Whenever possible, Genio can convert data from one character set to another and simultaneously manipulate strings encoded in different code pages.

The Genio user interface also fully supports Unicode. It allows manipulation of metadata encoded in different character sets, and delivers support for international development teams.
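The qualifier "whenever possible" matters because not every character exists in every character set. The short sketch below uses Python's standard codecs to show one conversion that succeeds and one that cannot; the `try_transcode` helper is hypothetical and says nothing about how Genio itself performs conversions.

```python
def try_transcode(data, source_encoding, target_encoding):
    """Convert bytes between character sets, or return None if impossible."""
    text = data.decode(source_encoding)
    try:
        return text.encode(target_encoding)
    except UnicodeEncodeError:
        # At least one character has no representation in the target set.
        return None

# A Latin-1 string converts cleanly to UTF-8.
latin1_bytes = "café".encode("latin-1")
print(try_transcode(latin1_bytes, "latin-1", "utf-8"))

# Japanese text cannot be represented in Latin-1.
utf8_bytes = "日本語".encode("utf-8")
print(try_transcode(utf8_bytes, "utf-8", "latin-1"))
```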

Mapping

Genio provides different ways to define mapping. Whenever possible, the tool can automatically detect mapping based on field names, field order or any custom algorithm. Also, simple graphical mapping from the source to target can be manually defined using drag and drop functionality, and more complex mapping can be done using the Genio graphical procedural language.
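Automatic detection by field name and by field order can be sketched in a few lines. The following is a minimal illustration, assuming case-insensitive name matching first and positional matching for whatever remains; the `auto_map` function and the field names are invented for this example.

```python
def auto_map(source_fields, target_fields):
    """Map each target field to a source field by name, then by position."""
    by_name = {field.lower(): field for field in source_fields}
    mapping = {}
    for target in target_fields:
        if target.lower() in by_name:
            mapping[target] = by_name[target.lower()]
    # Fall back to field order for fields still unmatched on both sides.
    used = set(mapping.values())
    leftover_sources = [s for s in source_fields if s not in used]
    leftover_targets = [t for t in target_fields if t not in mapping]
    mapping.update(zip(leftover_targets, leftover_sources))
    return mapping

print(auto_map(["CUST_ID", "NAME", "CITY_NM"], ["cust_id", "name", "city"]))
```

Positional fallback is only a guess, so a generated mapping like this would normally be reviewed before use.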

Joins, Aggregation, Distinction, Filter and Sorting

When multiple sources, heterogeneous or otherwise, are required, users are able to define datasets (logical views on the information system) to de-normalize, join, aggregate, sort and select distinct data from the various source systems.

These datasets can combine multiple objects from heterogeneous systems, and can also be used in other datasets.

These operations are defined graphically inside Genio, with no need to write SQL code. Nevertheless, they cover the entire functional spectrum of relational database features. One can define regular joins, left and right outer joins, full outer joins, calculated joins, or recursive joins involving the same table or view several times, through aliases that Genio manipulates transparently. Filters are transformed into WHERE or HAVING clauses, and sorting becomes ORDER BY. Genio recognizes each SQL grammar, thus adapting itself to the source or target DBMS.
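To make the translation concrete, the sketch below assembles a SELECT statement from declarative pieces: row-level filters go to WHERE, filters on aggregates go to HAVING, and the sort order goes to ORDER BY. The `build_select` function, its parameters and the `orders` table are assumptions for illustration only; a real generator would also adapt to each DBMS grammar and quote identifiers.

```python
import sqlite3

def build_select(table, group_by, aggregates,
                 row_filters=(), group_filters=(), sort=()):
    """Assemble a SELECT statement from a declarative dataset description."""
    columns = list(group_by) + [
        f"{expr} AS {alias}" for alias, expr in aggregates.items()
    ]
    parts = [f"SELECT {', '.join(columns)}", f"FROM {table}"]
    if row_filters:       # filters on individual rows
        parts.append("WHERE " + " AND ".join(row_filters))
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    if group_filters:     # filters on aggregated values
        parts.append("HAVING " + " AND ".join(group_filters))
    if sort:
        parts.append("ORDER BY " + ", ".join(sort))
    return "\n".join(parts)

sql = build_select(
    table="orders",
    group_by=["region"],
    aggregates={"total": "SUM(amount)"},
    row_filters=["amount > 30"],
    group_filters=["SUM(amount) >= 100"],
    sort=["total DESC"],
)
print(sql)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 100), ("EU", 300), ("US", 50), ("US", 20), ("ASIA", 500)],
)
result = conn.execute(sql).fetchall()
print(result)
```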

Transaction and Nested Transaction Support

Genio uses the traditional transactional mechanisms of relational databases, COMMIT/ROLLBACK, on one or more databases. Transactions can be distributed across several systems. They can occur at any moment during the execution of the interface: for each process, for each important step inside a single process, or for each functional unit. For example, it is possible to commit a transaction for each client change when processing a single table containing invoices. The COMMIT/ROLLBACK mechanisms can be triggered automatically or conditionally, based on data test results on previous interface executions. This is typically used to add failover capabilities to existing interfaces.

Genio can do conditional COMMIT/ROLLBACK for any module within a process. This means that Genio Designer can create a complex condition for loading and data quality checking, and decide whether to commit or roll back the load.
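A conditional COMMIT/ROLLBACK driven by a data quality check can be sketched as follows. This is a minimal example using `sqlite3`; the `invoices` table, the validity rule and the `max_bad_ratio` parameter are hypothetical choices for the illustration, not Genio settings.

```python
import sqlite3

def load_with_quality_gate(conn, rows, max_bad_ratio=0.1):
    """Insert valid rows, then commit only if few enough rows were invalid."""
    bad = 0
    for customer, amount in rows:
        if amount is None or amount < 0:
            bad += 1          # count the row as a quality failure
            continue
        conn.execute(
            "INSERT INTO invoices (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )
    if rows and bad / len(rows) > max_bad_ratio:
        conn.rollback()       # discard the whole batch
        return False
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, amount REAL)")
conn.commit()

first = load_with_quality_gate(conn, [("A", 10.0), ("B", 20.0)])
second = load_with_quality_gate(conn, [("C", 5.0), ("D", -1.0)])
count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(first, second, count)
```

The second batch is rolled back as a unit, so the valid row for customer "C" is discarded along with the invalid one.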

Validation against Metadata

Genio is completely metadata-driven and always validates the business rules and changes against the metadata and operational sources and targets. While it is possible to write logically incorrect statements, their physical grammar is automatically tested and validated by Genio.

Metadata can be defined using Genio Designer, either by manually editing it, or by importing it. Metadata can be imported from database schema, using sample files or using Genio MetaLinks. By using third-party solutions, such as MetaIntegration, it’s also possible to implement bi-directional metadata exchange with complementary software such as BI tools (BO, Cognos…), metadata repositories or industry standards (CWM…).

Loading

Genio has multiple loading strategies: single row, packet and bulk. In certain cases, the loading can be done by the source database directly, when the developer has decided to bypass the engine altogether (explained later in ‘Genio Unique Processing Methodology’).

Genio’s design environment does not impose a pre-defined methodology to implement data loading processes. It has been designed to be highly productive and generic to support different kinds of needs. Genio provides a comprehensive and flexible solution that supports both a full data refresh strategy as well as data updates when needed.

Delete, Insert, and Update strategies are all supported natively. Genio also provides high-level, user-friendly commands such as SmartInsert and SmartUpdate, designed to simply perform row additions or updates in tables (database Merge).
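The general "insert or update" (merge) pattern that such commands wrap can be sketched with plain SQL. The example below is not the implementation of SmartInsert or SmartUpdate, whose exact semantics are not described here; it is a hypothetical `upsert` helper over a `customers` table that tries an UPDATE first and falls back to an INSERT.

```python
import sqlite3

def upsert(conn, key, name, city):
    """Update the row if the key exists, otherwise insert it."""
    cursor = conn.execute(
        "UPDATE customers SET name = ?, city = ? WHERE id = ?",
        (name, city, key),
    )
    if cursor.rowcount == 0:
        conn.execute(
            "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
            (key, name, city),
        )
        return "inserted"
    return "updated"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

actions = [
    upsert(conn, 1, "Acme", "Paris"),
    upsert(conn, 2, "Globex", "Toronto"),
    upsert(conn, 1, "Acme", "Lyon"),   # same key again: becomes an update
]
conn.commit()
print(actions, conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```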

Supported Targets

File Formats

• AS-400 Flat Files

• Delimited Text files (CSV…)

• Fixed Text files

• SAP IDoc

• Standard Flat Files (Cobol)

• Palm Pilot JFile, JfilePro

• XML files

RDBMS

• Microsoft Access 97, 2000, 2002, 2003 (Native, ODBC)

• Microsoft SQL Server 6.5, 7, 2000, 2005 (ADO/OLEDB, ODBC)

• Oracle 7, 8, 8i, 9i, 10g (Native, ADO/OLEDB, ODBC)

• Informix 7.3x, 9.x (ADO/OLEDB, ODBC)

• Sybase Adaptive Server 11.x, 12.x, 15 (Native, ADO/OLEDB, ODBC)

• Sybase IQ 12.x (Native SQL, ODBC)

• Sybase SQL Anywhere 5.5, 6.x, 7.x, 8.x, 9.x (Native SQL, ODBC)

• Teradata V2R3.x, V2R4.x, V2R5.x, V2R6.x (Native SQL, ODBC)

• IBM UDB 5.x, 6.x, 7.x, 8.x (ADO/OLEDB, ODBC)

• MySQL 4.1.x (Native SQL, ODBC)

• Generic ODBC

Mainframe Connectivity (Native, ODBC)

• IBM DB2 AS400

• IBM DB2 MVS, OS/390, z/OS

• IBM IMS

• Adabas

• VSAM

• Natural

• CICS

Multi-dimensional Databases

• Hyperion Essbase 4.x, 5.x, 6.x, 7.x

• Oracle Express 5.x, 6.x

Other Systems and Standards

• SAP R/3 (Native)

• SAP BW (1.2, 2.0, 3.0) (Native)

• Web Services (Native)

Scheduling

Genio includes a complete scheduling facility, making it possible to schedule process execution at a fixed time, periodically (daily, weekly, monthly), in response to outside events, or from the polling service (file-based events). Data Integration processes can also be triggered by external events or Message Oriented Middleware (MOM) such as IBM WebSphere MQ. By combining these functions, the Genio developer can build scheduling rules as complex as necessary.

Genio Scheduler is not always required, as external applications can set Genio variables, launch processes and receive the result of the process. This makes it easy to integrate system management applications such as IBM Tivoli or CA Unicenter. The substitution process is straightforward, and it can be implemented on UNIX, Linux or Windows using standard API calls or command line utilities.

Failover Capabilities

Genio does not impose any methodology on failover functionality. The open architecture enables the developer to use any preferred technology for failover systems, including “power-off” restarts or complex continuity rules.

To permit such implementations, Genio provides the basic features required to implement a complex failover strategy. When triggering a process execution, Genio users are able to define the list of Genio engines and a timeout for each one.

If the process execution fails, Genio Scheduler can automatically trigger a fail process that will implement the desired failover strategy or restart the same process. Within each process, users can define restart points and therefore automate process restart.
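The combination of an ordered engine list, a timeout per engine and restart points can be modelled in a few lines. The sketch below is a simplified illustration: `run_with_failover`, the engine names and the simulated `flaky_run_step` callback are all hypothetical, and the timeout is merely passed to the callback rather than enforced.

```python
def run_with_failover(steps, engines, run_step):
    """Run `steps` in order, moving to the next engine when one fails.

    `engines` is an ordered list of (name, timeout_seconds) pairs.
    `run_step(engine, step, timeout)` executes one step and raises on failure.
    Completed steps act as restart points: the next engine resumes after them.
    """
    completed = 0
    history = []
    for engine, timeout in engines:
        try:
            while completed < len(steps):
                run_step(engine, steps[completed], timeout)
                history.append((engine, steps[completed]))
                completed += 1
            return history
        except Exception as exc:
            history.append((engine, f"failed: {exc}"))
    raise RuntimeError("all engines failed")

def flaky_run_step(engine, step, timeout):
    # Simulated failure: the first engine cannot complete the load step.
    if engine == "engine-a" and step == "load":
        raise TimeoutError(f"no answer within {timeout}s")

history = run_with_failover(
    ["extract", "transform", "load"],
    [("engine-a", 30), ("engine-b", 60)],
    flaky_run_step,
)
print(history)
```

Because progress is tracked per step, the second engine only runs the step that failed instead of repeating the whole process.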

Hummingbird Connectivity training and best practices routinely teach these different approaches, and can help Genio developers find the best approach for each specific situation.

Performance Measurement

Genio has a unique performance meter inside its logs. All the different tasks are timed, including the module coherence tests and SQL statement performance, as well as the load processes. The volume of impacted data on every target system is also readily available in these logs. As a result, Data Integration process administrators can easily spot any potential performance problems.

Genio can automatically email this report to the developer or the system administrator after each execution as well as keep it in the repository, making it possible to analyze the performance.

It is also possible to use performance measurement tools to detect and isolate networks, machines or any other potential bottlenecks.

Process Optimization and Tuning

Once again, Genio’s openness delivers multiple ways of optimizing process executions.

Genio offers ways of offloading part or all of the process execution to the source or the target system.

For more information, see the ‘Genio Unique Processing Methodology’ section.

Also, Genio provides multiple reading and writing strategies (single, packet, and bulk) that enable Genio users to optimize data movements based on the topology of their information system.

When loading needs to be optimized, indexes can be dropped and restored on the target so that the database engine can accept rows at the speed they are sent from the Genio Engine. This is very simple to achieve with Genio’s SQL procedures and SQL functions.
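The drop-load-restore pattern looks roughly like the following Python sketch, again with sqlite3 standing in for the target database. The table and index names are hypothetical, and in Genio the equivalent steps would be issued through its SQL procedures and SQL functions rather than through Python.

```python
import sqlite3


def load_with_index_rebuild(conn, rows):
    """Drop the index, load the rows, then rebuild the index once."""
    conn.execute("DROP INDEX IF EXISTS idx_sales_customer")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")

load_with_index_rebuild(conn, [(i % 10, float(i)) for i in range(1000)])

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
# PRAGMA index_list returns one row per index; the name is the second column.
index_names = [r[1] for r in conn.execute("PRAGMA index_list('sales')")]
print(row_count, index_names)
```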

Also, when required, SQL statements can be tuned using the database SQL analyzer or database hints that will ensure maximum utilization of resources. The tuning of the target is usually based on the performance of the database and the type of end-user tools used, and how they are used. Typically, the database environment provides tools to assist tuning by showing statistics on the use of disk, indexes and processor time for queries.

Data Quality Management Genio delivers data cleansing capabilities through its own functions or through partner products. Using Genio built-in functions (string functions, Soundex, and so on) and the Genio procedural language, it is possible to implement a basic cleansing process. By leveraging third-party products such as Harte-Hanks Trillium Software or SPAD DQM, it is possible to implement a more complex cleansing process involving address cleansing, pattern matching, and similar tasks.
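To show what a Soundex-based match does, here is a sketch of the standard American Soundex algorithm in Python. It is written for illustration only; the behaviour of Genio’s own Soundex function may differ in details such as the handling of non-alphabetic characters.

```python
# Consonant groups of standard American Soundex.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels have no code and reset the run
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad with zeros, keep four characters


for a, b in [("Robert", "Rupert"), ("Smith", "Smyth")]:
    print(a, soundex(a), b, soundex(b))
```

Names that sound alike receive the same code, so a cleansing process can group candidate duplicates by code before applying stricter comparisons.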

Error Handling Genio, through its graphical procedural language, also delivers exceptional error handling capabilities. Genio automatically reports all errors and anomalies in its log. Technical exceptions (datatype issues, constraint violations, and so on) are automatically handled by the tool, while other exception types, such as business-rule-driven exceptions, can be handled through user-defined exceptions. The Genio user can then implement various exception handling strategies and decide whether the execution should be stopped after a certain number of exceptions, or whether the offending data should be output into rejection files. All of this can be done using the Genio procedural language, which provides the Genio user with an easy and comprehensive mechanism for error handling. It is possible to define any logical test and implement virtually any type of processing according to the organization’s business rules.
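The strategy described above (a user-defined exception, a rejection file, and a stop threshold) can be sketched in Python as follows. The exception class, the business rule and the threshold are hypothetical examples; in Genio these would be expressed in its graphical procedural language.

```python
import csv
import io


class BusinessRuleError(Exception):
    """Hypothetical user-defined exception for a violated business rule."""


def check_order(row):
    if row["amount"] < 0:
        raise BusinessRuleError("negative amount")
    return row


def process(rows, reject_file, max_rejects=2):
    """Keep valid rows, write rejected rows to reject_file, and stop the
    run once more than max_rejects rows have been rejected."""
    writer = csv.writer(reject_file)
    accepted, rejected = [], 0
    for row in rows:
        try:
            accepted.append(check_order(row))
        except BusinessRuleError as exc:
            rejected += 1
            writer.writerow([row["id"], row["amount"], str(exc)])
            if rejected > max_rejects:
                raise RuntimeError(
                    f"stopped after {rejected} rejected rows") from exc
    return accepted


rejects = io.StringIO()  # stands in for a rejection file on disk
rows = [{"id": 1, "amount": 10.0},
        {"id": 2, "amount": -5.0},
        {"id": 3, "amount": 7.5}]
accepted = process(rows, rejects)
print([r["id"] for r in accepted], rejects.getvalue().strip())
```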

Audit and Monitoring All process execution information is logged into the Genio repository. Genio logs everything from the SQL statements used to all timing information and anomalies, whether system or designer-defined exceptions. This log information can be accessed from the Genio Scheduler client, or by using reporting tools to query Genio’s open repository. It is also possible to use email as a delivery vehicle to inform DBAs or system administrators of processing results. Genio can also package the logs into different formats (text, XML, and so on). These can be sent as SMS messages to mobile phones or via email.

Parallelism and Process Slicing Genio has native multi-threading capabilities that enable processing tasks to be performed in parallel; the processes communicate through global variables and events. Increased performance can be achieved by splitting the load work across multiple CPUs or across multiple physical servers, depending on where the bottleneck resides.

Genio has a simple facility to perform source slicing based on the RDBMS sources. Again, this architecture is open and does not impose a foreign methodology on the developer. Also, through its ‘Running Context’ feature, Genio gives users an easy way of defining multiple execution contexts, each of which handles a subdivision of the entire process.
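Source slicing can be illustrated with a short Python sketch: the key range of a source table is split into contiguous slices, and each slice is read by its own worker thread. The table, the helper names and the use of a temporary SQLite file are assumptions made for the example; they are not how Genio implements slicing or running contexts.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def key_slices(lo, hi, n):
    """Split the inclusive key range [lo, hi] into at most n contiguous slices."""
    step = -(-(hi - lo + 1) // n)  # ceiling division
    return [(s, min(s + step - 1, hi)) for s in range(lo, hi + 1, step)]


def sum_slice(db_path, bounds):
    """Each worker opens its own connection and reads only its slice."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders "
            "WHERE id BETWEEN ? AND ?", bounds).fetchone()[0]
    finally:
        conn.close()


# Build a small source table in a temporary database file.
db_path = os.path.join(tempfile.mkdtemp(), "source.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i) for i in range(1, 1001)])
conn.commit()
conn.close()

slices = key_slices(1, 1000, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(lambda b: sum_slice(db_path, b), slices))
total = sum(partials)
print(slices, total)
```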

Genio — Unique Processing Methodology Genio offers a unique methodology that distributes transformation workload by offloading certain tasks to idle database engines during off-peak hours to maximize efficiency and system performance. The following graphics and descriptions depict the three modes of transformation processing that Genio offers:

Use of the Genio Engine Exclusively The first mode of transformation processing involves Genio extracting data from the source database, transforming it within its own engine, and then populating target databases directly. This is the standard ETL approach. This architectural schema is recommended when sources and targets are heterogeneous, or when the required transformations cannot be performed by remote source or target databases.

Figure 4 Data Access Method One

Use of the Genio Engine with Processing on the Remote Database The second Genio data access mode involves extracting data and transforming it on the source database platform. The resulting data is then transmitted through the Genio platform, and loaded into the target database where any further transformation is carried out. This is the ELT or ETLT approach. If possible, aggregations are processed on the source, thereby reducing the volume of data transmitted between source and target and, as a result, the required bandwidth to move data through the network. In this mode, data is transported through Genio and copied without transformations to the target.

Figure 5 Data Access Method Two
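The benefit of the second mode can be seen in a small Python sketch in which two separate SQLite connections stand in for the source and target databases. The aggregation runs inside the source, so only the summary rows cross the connection boundary and are copied to the target unchanged. Table and column names are invented for the example.

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stands in for the source database
target = sqlite3.connect(":memory:")  # stands in for the target database

source.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7), ("west", 3), ("west", 1)],
)
target.execute("CREATE TABLE sales_by_region (region TEXT, total INTEGER)")

# The GROUP BY is executed by the source database: five detail rows
# are reduced to two summary rows before anything is transmitted.
summary = source.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# The summary rows are copied to the target without further transformation.
target.executemany("INSERT INTO sales_by_region VALUES (?, ?)", summary)
target.commit()

loaded = target.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(loaded)
```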

Exclusive Use of the Transformation Capabilities of Remote Databases With the third model, the source and target are either on the same logical server or visible to each other (using a database link). In this situation, it is not necessary for the data to leave the server or to be transported through a communication layer (the network). In this context, Genio can operate as a dynamic code generator and send only SQL statements that have been adapted to the relational database. The RDBMS manages the extractions, transformations, and insertions (or updates). As a result of this architecture, this model outperforms the two previous ones, because no network bandwidth is used and it fully exploits the processing capabilities of the database platform, including Massively Parallel Processing (MPP) architectures.
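A minimal Python sketch of the code-generation idea follows: a single INSERT ... SELECT statement is built and handed to the database, which performs the extraction, transformation and insertion itself. The build_pushdown_sql helper and the table names are hypothetical, the identifiers are assumed to come from trusted metadata rather than user input, and the SQL that Genio actually generates is adapted to each RDBMS and will look different.

```python
import sqlite3


def build_pushdown_sql(target, source, select_list, group_by):
    """Generate one statement that does the whole job inside the database.
    Identifiers are interpolated directly, so they must be trusted."""
    columns = ", ".join(select_list)
    return (f"INSERT INTO {target} SELECT {columns} "
            f"FROM {source} GROUP BY {group_by}")


conn = sqlite3.connect(":memory:")  # source and target share one server
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7), ("west", 3), ("west", 1)],
)
conn.execute("CREATE TABLE sales_by_region (region TEXT, total INTEGER)")

sql = build_pushdown_sql(
    "sales_by_region", "sales", ["region", "SUM(amount)"], "region")
conn.execute(sql)  # no rows travel through the client
conn.commit()

result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(sql)
print(result)
```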

Each data access mode is accessible through a common user interface, and data integration processes leveraging these various modes are defined using the same graphical metaphor and programming methodology. By maximizing user control over data flow, the Genio data access architecture enables users to significantly improve the performance of their data exchange processes. Being able to select, manage and summarize only relevant data, and to control the platform on which work is executed, will vastly improve performance.

Regardless of which data access model is chosen, Genio impact analysis capabilities are maintained, ensuring that if changes are made to any element of the data exchange process, administrators are notified prior to the next scheduled execution.

Figure 6 Data Access Method Three

Genio — Superior Productivity

Procedural Scripting Language Genio provides a graphical scripting language, designed specifically for simple and complex data transformation, that supplies the transformation power typically found only in traditional 4GLs. The language permits the use of loops, tests, variables, as well as IF, ELSE, THEN and CASE statements.

Procedural transformations can be performed on individual rows of data, on a set of rows, or on an entire table. The ability of Genio to manipulate data on a row-by-row basis allows the development of extremely complex transformations. The ability to support complex transformations directly within the Genio procedural language is critical as it permits the full power of impact analysis.

Reusability Objects created in Genio Designer are stored in the centralized metadata repository, making them available for re-use. Objects are developed once and deployed many times, providing consistency across all data integration processes. The centralized development model of Genio eliminates the need to re-code business rules, lookup tables, and custom functions for each new data integration project.

Also, Genio allows the reuse of external code by providing native support for stored procedures, SQL functions, DLL functions, shell commands and Web Services.

This enables organizations to capitalize on existing investments and simplifies the reuse of existing IT developments within Genio processes.

Tracking Changes Genio provides the ability to track differences between an object definition stored in the Genio Repository and the state of the same object, as it exists in the remote system (physical object in the source or target). By utilizing the ‘Track Changes’ wizard, Genio users can automatically detect and import changes into the Genio repository. Every change made to an object, whether it’s located in a remote database or in Genio, is also stored and available for documentation purposes. This means that Genio is always consistent with data structures, as they exist on remote sources and targets, ensuring data accuracy and consistency in every data transformation and exchange process.
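Conceptually, tracking changes means comparing a stored object definition with the live structure in the remote system. The following Python sketch shows such a comparison for a table’s columns; the diff_columns function and the dictionary representation of a table are assumptions made for illustration, not the ‘Track Changes’ wizard’s actual data model.

```python
def diff_columns(stored, live):
    """Compare two {column_name: datatype} mappings and report differences."""
    added = {c: t for c, t in live.items() if c not in stored}
    removed = {c: t for c, t in stored.items() if c not in live}
    retyped = {c: (stored[c], live[c])
               for c in stored.keys() & live.keys()
               if stored[c] != live[c]}
    return {"added": added, "removed": removed, "retyped": retyped}


# Definition held in the repository versus the table as it exists remotely.
stored = {"id": "INTEGER", "name": "VARCHAR(50)", "fax": "VARCHAR(20)"}
live = {"id": "INTEGER", "name": "VARCHAR(100)", "email": "VARCHAR(80)"}

changes = diff_columns(stored, live)
print(changes)
```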

Dynamic Impact Analysis Genio provides a unique Dynamic Impact Analysis solution. The consistency and status of objects are automatically maintained in real time within the design environment. Genio Designer provides two impact analysis modes: immediate and deferred.

In the immediate impact analysis mode, Genio immediately checks the impact of a change on an object and on all dependent objects. If the change impacts the integrity of an object, its status is automatically changed to ‘Invalid Object’.

When an object is modified in the deferred impact analysis mode, Genio changes its status to ‘Undefined Object’. Objects with this status can be verified at a later date.

This impact analysis is triggered either by changes made by a developer within the Genio environment or by the ‘Track Changes’ feature. For example, if a source data structure changes, the ‘Track Changes’ feature detects it, and the impact analysis identifies the effect of the change on all related interfaces. Genio’s impact analysis eliminates the need for developers to spend time manually tracking down dependencies whenever a change is made. Genio provides a persistent list of invalid and undefined objects, allowing developers to know the exact state of their metadata, and the immediate consequence of making a change to it. This dynamic impact analysis helps developers fix impacted objects by providing a thorough description of required changes, and even auto-correction mechanisms. This decreases the maintenance cycle and increases developer productivity.
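Impact analysis of this kind amounts to walking a dependency graph. The Python sketch below finds every object that depends, directly or indirectly, on a changed object and assigns it a status. The object names and the graph are invented, and the sketch simplifies the behaviour described above: it flags every dependent, whereas the immediate mode is described as marking an object invalid only when the change actually affects its integrity.

```python
from collections import deque


def impacted_objects(dependents, changed):
    """Return every object that directly or transitively depends on changed."""
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def mark_impacted(dependents, changed, deferred=False):
    """Assign a status to each impacted object according to the mode."""
    status = "Undefined Object" if deferred else "Invalid Object"
    return {name: status for name in impacted_objects(dependents, changed)}


# Hypothetical repository: each key lists the objects that depend on it.
dependents = {
    "customers_table": ["customer_lookup", "load_customers"],
    "customer_lookup": ["load_orders"],
    "load_customers": ["nightly_job"],
    "load_orders": ["nightly_job"],
}

immediate = mark_impacted(dependents, "customers_table")
deferred = mark_impacted(dependents, "customer_lookup", deferred=True)
print(sorted(immediate), deferred)
```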

Auto-Documentation Genio Designer automatically manages the documentation of Genio projects, including dependencies between objects, modification history and comments.

Genio users can automatically print or generate HTML documentation, dependency graphs or dataflow graphs at any time. This significantly reduces documentation efforts, and ensures the accuracy of project documentation.

Versioning Genio is built on a client/server architecture and leverages an open metadata repository. It can be implemented in a centralized or distributed deployment model, allowing multiple developers to work on projects simultaneously with complete version control and customized access privileges.

Genio natively supports version management and status management (Development, Test, Production, and so on). All versions of data integration projects are independent from each other and can be used in parallel. All objects in these projects have timestamps for creation and modification as well as user information and comments. The history of data structure modifications is also maintained automatically by the tool.

Metadata Management With the vast amount of information that organizations currently have at their disposal, there is an ever-increasing need to collect, manage and reuse that information. Organizations want to know what information they possess, its location, its origin, and its size. This ‘data about data’ is called metadata. It can describe any characteristic of the data—such as the content, its structure, its quality, or any attributes related to its processing or changes.

Quite simply, metadata is an important catalog of information from any number of sources: data warehouses, data exchange tools, business intelligence tools, ERP, CRM, SCM, business process modeling, workflow, data quality tools, ECM systems, or any other application dealing with data.

Metadata secures the lineage of data, enabling knowledge workers to gain access to business rules, and to understand where the data came from and how it has been handled to date. This makes the time they spend on query and analysis activity more productive.

Metadata management provides critical access for both business users and technical users working with the data. Depending on the type of user, metadata can serve either as a blueprint to the inner technical workings of the warehouse, or as a roadmap to assist in navigating the warehouse and locating useful information.

Metadata also delivers valuable help to organizations when it comes to their compliance with regulatory rules.

The Genio repository contains all metadata used by data integration processes. This metadata is made available to users through Genio tools, either by querying Genio’s open database repository or through XML datagrams.

Looking Forward Today, the data integration market includes several tools. The three main categories used to implement data exchange are ETL (Extract, Transform and Load), EAI (Enterprise Application Integration) and EII (Enterprise Information Integration) tools.

ETL, EAI and EII architectures share many common concepts, although they address different business requirements and provide specific functionalities. Generally speaking, ETL tools are primarily batch data exchange solutions, focusing on ‘Technical’ data and working mainly with RDBMS.

On the other hand, EAI tools are considered real-time data exchange solutions, focusing on ‘Business’ data and working mainly with message-oriented middleware. However, both product categories often deal with the same data, and implement the same business rules to process this data.

EII tools provide data abstraction to address the data access challenges associated with data heterogeneity and data contextualization. They provide a dynamic view on multiple source systems and enable unified access to the data residing in those various systems.

Today, organizations that want to implement these three types of data exchange solutions typically have to use different technologies and products, resulting in increased total cost of ownership of the data integration platform.

What is the Hummingbird Connectivity Vision? To increase their competitiveness, organizations have to make more informed, strategic decisions based on a single and accurate version of the truth. This means that they require a single solution that will deliver them accurate, consistent and timely information by connecting their data sources to any target system throughout the enterprise.

To enable organizations to implement this vision, Hummingbird Connectivity already delivers a comprehensive data integration solution that offers a strong ETL package, with some extensions into the EAI world (XML connectivity, support for IBM WebSphere® MQ, support for Web Services, and so on).

Genio Suite Development Directions To further incorporate this model, Hummingbird Connectivity intends to develop and extend its data integration product line to deliver a single solution that will address all business needs, in a comprehensive metadata driven environment that operates as an enterprise wide information broker.

Hummingbird Connectivity will continue to extend and enhance existing functionalities of the product—including productivity, openness and ease of maintenance. Multiple improvements will be introduced to the user interface to ensure even better productivity. Interoperability and connectivity will be extended to guarantee maximum integration with IT environments. Finally, impact analysis, dependency management and auto-documentation will be enhanced to guarantee minimum maintenance time.

To respond to the increasing demand for data processing, performance, scalability and high availability will also receive specific attention. The Genio Engine and Genio DataLinks will be further enhanced and improved to guarantee maximum usage of resources. To enhance scalability, Hummingbird Connectivity will work on improving workload distribution across multiple CPUs or clustered environments, and the failover capabilities will be strengthened to guarantee 24/7 availability.

Conclusion It would be accurate to say that data just might represent one of the last great competitive advantages for today’s businesses. If this is true, then companies taking advantage of a robust, flexible, and fully functional data integration solution are going to be leaders in their market.

Because they address the complex requirements of sophisticated IT environments, data integration solutions are at the technology forefront for many organizations. With its advanced architecture, support for the most complete range of database and hardware systems, and its near real-time functionality, Genio Suite offers companies around the world a best-of-breed solution for today and for the future.

Sales & Support Corporate Head Office

Canada 38 Leek Crescent,

Richmond Hill, Ontario, L4B 4N8

Phone: +1 905-762-6400

Fax: +1 905-762-6407

Toll Free: 1 877-359-4866

www.hummingbird.com [email protected] [email protected]

North America Support 1 800-486-0095

Worldwide Support +1 905-762-6400

www.opentext.com [email protected]

North America Sales 1 800-499-6544

International Sales +800-4996-5440

If you’re a Hummingbird Connectivity partner or customer, visit www.hummingbird.com or online.opentext.com for more information about this and other Open Text solutions.

Open Text is a publicly traded company on the NASDAQ (OTEX) and the TSX (OTC).

Copyright © 2007 Hummingbird Ltd. All other trademarks or registered trademarks are the property of their respective owners. All rights reserved. Hummingbird Connectivity is a Division of Open Text Corporation. Printed in Canada. WP-03-00-EN-142.12/07