Java Extraction and Transformation Service for Transmitting Records and Exchanging Application Metadata: (JETSTREAM) Specifications Version 1.0 PHASE 2 12 April 2022 Page 1 of 39

Java Extraction and Transformation Service for Transmitting Records and Exchanging Application Metadata:

(JETSTREAM) Specifications

Version 1.0

PHASE 2

16 May 2023 Page 1 of 31

Data Migration Application (JETSTREAM) Design Specifications

Table of Contents

1.0 Introduction
2.0 Business Architecture
    2.1 Business Purpose Overview
    2.2 Data Migration Process
    2.3 Groups and Roles
    2.4 Non-Functional Requirements
3.0 System Architecture
    3.1 System Architecture Connectivity
    3.2 Server Specifications
    3.3 Client Specifications
    3.4 Future Directions
4.0 JETSTREAM Application Storyboard
    4.1 Process-Components Map/Use-Case Identification
    4.2 Use-Case Overview
    4.3 Transform Data Use-Case
        4.3.1 Process Logic
        4.3.2 UI and Message Functionality
        4.3.3 Specific Transform Use-Cases
        4.3.4 Test Cases
    4.4 Create XML Transform/Cleanse Doc Use-Case
        4.4.1 Use-Case Detail
        4.4.2 XSL Generator System Architecture
5.0 Site Implementation and Operation
    5.1 Development Team
    5.2 Phase 2 Project Plan
    5.4 Long-term Operational Needs
        5.4.1 Operational Skills Required
        5.4.2 Recommended Back-Up and Recovery Procedures
Appendices
    Appendix A: Reference Material
    Appendix B: Glossary of Acronyms
    Appendix C: Source Data Inclusion Requirements Changes Log
    Appendix D: Summary of Databases for Data Exchange

1.0 Introduction

These specifications are intended to communicate to the development team and open source community the functionality and technical details to be implemented in the second phase of development: ongoing data migration and synchronization from various databases to a single database. This is similar in many respects to what is required to populate a data warehouse.

A single database was the initial source for populating the database in Phase 1. Phase 2 has identified several data sources for information exchange, both to receive as well as send data. This requires extending the JETSTREAM application to meet these needs.

The desire is to expand the framework for a solution that meets the following overall objectives:

Allows a repeatable process to map, extract, transform, and cleanse data from existing databases into the single developed database.

Is extended to include non-relational data sources; XML and CSV are the most likely candidates. This may roll into Phase 3 if no data sources are identified that require this.

Is extended to allow scheduled data exchange with data sources requiring update.

Continues to be built upon a system architecture that is portable to other operating system environments and can be integrated into an architecture that supports web services.

Re-uses as many available components as possible; the developed system has been contributed to the open source community to ensure long-term availability and dramatically reduce maintenance costs. Additional developer membership is being sought from the open source community.

The Phase 2 implementation will provide the ability to:

1. Identify structures within an existing data source, from which the JETSTREAM user will develop a suitable structure for the target repository. The source and target structures will be identified within an XML document.

2. Identify transformation rules to be applied to one or more source data elements (fields within a table or structured file: XML, CSV, or SQL DML script) to get a value for one or more data elements into the target repository. These will be described within an XSL/XML document.

3. Apply defined cleansing rules to the data (within the staging repository or during processing); cleansing rules will be described in an XSL/XML document.

4. Perform the end-to-end process using the rule sets described above, either interactively or on a scheduled basis.

This Phase 2 transfer mechanism will be created on the Java platform integrating various open source components. This will provide the following advantages:

A re-usable component that will allow transfer between any JDBC (and by extension ODBC) compliant database or other structured data source.

Ability to run on any platform with a Java Virtual Machine.

Ability to control the transfer (transformation rules, cleansing rules, source tables, etc.) of information through XSL/XML.

Little or no software acquisition costs to assemble the necessary components.

This transfer mechanism will be supported in Phase 2 conceptually as shown in Figure 1.

Figure 1

It is assumed that most transfers between the various source databases will be on a scheduled basis.

Phase 2 will maintain the business rules for the mappings and transformations in an Extraction-Transformation Definition (.etd) file that is used to build an XSL document to fully describe the transformation. Stub files, also referred to as Stylesheet Rules Templates, that describe the basics of the transformation needs (e.g. items in this XML format should go to this type of database entry) will also be used. An XSL Generation Module will utilize both of these file types to produce the needed XSL. A source builder module will manage file locations or URLs and use plug-ins to process these into the needed XML node tree. The XML Source plug-in is not actually required as the Xalan XSL engine natively handles it; it is shown for completeness.

Phase 3 is intended to extend the suite with a graphical user interface for building the mapping and transformation rules (the .etd file), a reporting mechanism for these rules, and the ability to respond to a web service request for data exchange, allowing other external applications to provide data (e.g. a system supporting the procurement) or receive data (e.g. a remote system tracking their approved requests). Within the UI, the suite will also support creating and modifying the target database structure if it does not exist.

This application has been dubbed the Java Extraction Transformation Service for Transmitting Records and Exchanging Application Metadata, or JETSTREAM, and is sponsored by SAIC. Phase 2 will culminate in version 1.0 of JETSTREAM.

2.0 Business Architecture

This section describes the purpose of the JETSTREAM application and how the various personnel will interact with it in Phase 2.

2.1 Business Purpose Overview

Migrating data from one data source to another is often necessary. Typical uses are populating a data warehouse or operational data store, or moving data from one transactional system to another. Often these data sources are disparate in nature; the data sources may be of a different format (one uses XML and another uses an RDBMS), or the structures, even when of the same type, may be very different (one may be a more normalized structure, while the other uses a star schema). Within the structures, the field or attribute datatypes may be different; one may hold a date using a date type while another may use three integers representing the month, day, and year. Finally, there may be a need to cleanse or transform the data as it moves from one data source to another in order to meet certain criteria, for example, moving a Boolean value to a char(1) field that expects a Y/N value set.

The desire is to continue within a solution framework that meets the following overall objectives:

Allows a repeatable process to map, extract, transform, and cleanse data from existing databases into the target database.

Is extended to include non-relational data sources as either sources or targets; XML and CSV are the most likely candidates.

Is extended to allow scheduled data exchange with data sources requiring update.

Continues to be built upon a system architecture that is portable to other operating system environments and can be integrated into one that supports web services.

Re-uses as many available components as possible; the developed system has been contributed to the open source community to ensure long-term availability and dramatically reduce maintenance costs. Additional developer membership is being sought from the open source community.

2.2 Data Migration Process

The process for designing, constructing, and using the transformation component is shown in the Eriksson-Penker UML diagram of Figure 2.

Figure 2

Transformation rules are defined within XSL documents (used for both transformation and cleansing). The existing database is analyzed to determine whether it is a suitable source for any of the target elements; these elements are the critical ones to transform and cleanse. This output determines the transforms and cleansings required. Simultaneously, the frequency of updates (if required at all) is determined. From the outputs of these processes, the XSL-based rules can be created.

With the XSL transformation and cleansing documents and the data records from the source, the process for populating and synchronizing can occur. The transformation and cleansing occur automatically using the JETSTREAM transformation tool suite and the data is stored into the target; any records that failed transformation and/or cleansing are identified for reconciliation by the data manager.

When using a staging database, this process is performed twice. The first pass develops the structure and rules and performs the population for the staging database. The second pass identifies any required transformations from the staging database to the target database. One item to bear in mind is that the staging database structure must be created as well. Additionally, if the transfer from the staging database to the final target database is to be automated, it is recommended that one or more triggers that fire off this process be implemented within the staging database.

2.3 Groups and Roles

This process identifies two user roles that are required to carry it out. They can be viewed as falling into two overall groups: developers and technical business representatives. The following matrix identifies how these roles are organized within the groups.

Developer        Technical Business Reps
XML Designer     Data Manager

Currently this process has no need for management of these groups and roles; however, when the transformation and cleansing definitions are exposed with a web-enabled interface in Phase 3, additional business-oriented roles may be defined within the system and a suitable access control mechanism will need to be put in place.

2.4 Non-Functional Requirements

The following non-functional requirements have been derived:

The application is to be portable across server platforms: Windows 2000 and Solaris 8 are the primary target environments of the core JETSTREAM team.

The application should be extensible for other possible uses.

The application should be able to transform and cleanse 75% of the data with all non-handled data being placed in the exceptions list for reconciliation.

The data transformation should be 95% or greater in accuracy.

Industry standards should be practiced for transformation rule creation and process execution.

The application is to be able to transform XML data and comma-separated value (CSV) files, and is to be extensible to handle SQL data manipulation language scripts and to serve as a back-end for an event-driven web service.

3.0 System Architecture

This section describes the system environment within which the Phase 2 JETSTREAM tool operates.

3.1 System Architecture Connectivity

The component diagram shown in Figure 3 provides an overview of the system connectivity and dependencies. The grey items are considered optional.

Figure 3

Within this diagram, a source database can be reviewed using Druid to assist in mapping its data elements to those in the target database (the Extraction-Transformation Definition), or in creating a suitable staging database structure if desired. Druid runs on top of the Java Virtual Machine (JVM), and is therefore dependent on it, and connects to the databases via JDBC. Information regarding these databases is stored in a project file, which is an XML document.

By examining the structures of both the source and target databases, ETL documents (which are written in XSL) that describe transformations and/or cleansing logic can be defined. These are executed by JETSTREAM and specifically with the Apache Xalan component; this has a further dependency on the Apache Xerces XML parsing engine. Cleansing logic may be passed through Regular Expressions to do transformation on the data values.
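The execution step described above can be sketched with the standard JAXP interface (javax.xml.transform), which Xalan implements. This is a minimal illustration only, not the JETSTREAM code: the class name, the source element names, and the stylesheet are hypothetical, and the sketch runs with whichever XSLT processor the JVM provides if Xalan is not on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslTransformSketch {
    // Hypothetical source document; element names are illustrative only.
    static final String SOURCE_XML =
        "<applications>"
        + "<application><name>Payroll</name></application>"
        + "<application><name>Inventory</name></application>"
        + "</applications>";

    // Hypothetical rule document: emits one line of text per application name.
    static final String RULES_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/applications'>"
        + "<xsl:for-each select='application'>"
        + "<xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the XSL rules to the XML source and returns the transformed text. */
    static String transform(String xsl, String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Corresponds loosely to an "Invalid XSL file." condition.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(RULES_XSL, SOURCE_XML));
    }
}
```

In a real run the two strings would be replaced by the rule file and the XML node tree built from the source.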

Should the target database be a staging database, stored procedures could be used to automatically push information to the ultimate target database (hence the staging database also serves as a source and a target). Additional transformations could be performed either internal to the staging database or on the final transfer from the staging to the ultimate target database.

To ensure clarity, the diagram in Figure 4 describes the external view of the database transformation process being implemented by JETSTREAM; this diagram is identical to Figure 1.

Figure 4

Should a staging database be used, it would usually follow the concept shown in Figure 5; this also shows additional transformation and cleansing work that could be done within the database. For usage performed by the team (when required), the staging database will be the hSQL (HypersonicSQL) RDBMS.

Figure 5

As data arrives in the load tables, a trigger fires to transform data into the staging tables. These become the tables for the final extraction and transformation to the target data structure; a final trigger starts another JETSTREAM transfer to move the data to the actual final database.

By using a staging database that supports replicas of the source schemas and a replica of the target schema, verification of proper transformations and cleansings can take place before actual placement within the target database. This provides a checking mechanism before target update. It also allows the staging database to act as a full back-up (for that snapshot in time) for both the source and target databases.

3.2 Server Specifications

It is envisioned that all work will take place on a single machine, such that the client and server are the same. The following are the server requirements:

Hardware Component      Specifications                    Comments
Processor               Pentium III+ (650 MHz)
Network Connectivity    Ethernet/Fast Ethernet
Memory                  512MB minimum, 1GB preferable
Hard Drive Space        ~50MB                             This reflects an estimate for all required software components.

Software Component      Version     Comments
Java Development Kit    1.3.1_04    Downloaded from http://java.sun.com
HypersonicSQL (hSQL)    1.7.0       Java-based relational database with Java stored procedures; also has database transfer tool. Downloaded from http://www.sourceforge.net
Xalan                   2.4-D1      Java Xalan package. Downloaded from http://jakarta.apache.org (includes Xerces and RegExp package)
Druid (optional)        3.0-INT     Java-based package to manage database structures. Downloaded from http://www.sourceforge.net

3.3 Client Specifications

It is envisioned that all work will take place on a single machine, such that the client and server are the same; however, should it be desired to perform the work through a remote connection for database management and/or XML document development, the following are the anticipated requirements:

Hardware Component      Specifications                    Comments
Processor               Pentium III+ (650 MHz)
Network Connectivity    Ethernet/Fast Ethernet
Memory                  256MB minimum, 512MB preferred
Hard Drive Space        300MB minimum

Software Component                           Specifications    Comments
Remote Terminal Login (telnet, SSH, etc.)    N/A               telnet is included with most OSs
Java Development Kit                         1.3.1_04          Downloaded from http://java.sun.com
Druid (optional)                             3.0-INT           Java-based package to manage database structures. Downloaded from http://www.sourceforge.net

3.4 Future Directions

JETSTREAM is being constructed on the Java2 platform (initially J2SE). While not in scope within the first phase, this system was being designed so that minimal work would be required to define additional interface types beyond database connectivity through JDBC and XML and CSV file types. This is being performed in Phase 2 which will culminate in the JETSTREAM 1.0 release.

Phase 3 enhancements include the following: remote call wrapper (SOAP/XML-RPC), graphical interface for selecting fields to be extracted and editing the transformation rules, and development of a reporting capability that can show the mappings and transformations. A potential roadmap is shown in Figure 6.

Figure 6

The ability to wrap requests in XML-RPC or SOAP calls will allow on-demand updates and/or information requests from external data sources; this could be used to allow this authoritative database on applications to become distributed in nature. These could be developed as servlets or Java Server Pages (JSPs) so that not only web-enabled (HTML/XHTML) interfaces but also wireless requests (WML) and other web services could be supported. The graphical interface and reporting component will allow business users to build and review these.

Subsequent phases will reflect these enhancements as the architecture is expanded to embrace full J2EE capabilities. JETSTREAM will become a real best practice example of how to build and evolve applications that comply with Naval policies and industry standards.

4.0 JETSTREAM Application Storyboard

This section details the functionality of the JETSTREAM application and how it is to be implemented.

4.1 Process-Components Map/Use-Case Identification

The Eriksson-Penker assembly line diagram shown in Figure 7 shows how various components fit within the context of the process.

Figure 7

This identifies several of the high-level use-cases that will be performed along the process. These are covered in more detail within the next section.

4.2 Use-Case Overview

Figure 8 shows how the use-cases become organized around the various users. A new actor or role is identified, the actual user who will execute the translation program.

Figure 8

The XML Designer and the Data Manager review the database structure to build the ETL rules. These are consumed by the JETSTREAM application (Transfer Tool) to read the source database and then update the target database.

4.3 Transform Data Use-Case

The heart of the application is what is shown in the Use-Case diagram of Figure 9 and will be discussed in further detail in this section. Two cases of usage will be defined here. One will be the transfer/transformation of data between source and target databases.

Figure 9

4.3.1 Process Logic

The overall process logic for the JETSTREAM application is shown in the Activity diagram of Figure 10. The program is started and is passed the appropriate ETL rules, which are then read in and parsed. Once the rules are parsed, any temporary variables are created and initialized and a select statement is built. Database connections are then opened to the source and target databases and the select statement is executed.

As each row of the selection is handled, an insert, update, or delete statement is built, along with any rules for merging, splitting, or replacing data based on the ETL Rule that was defined; these are executed in order. When an error is encountered, a fallback rule is attempted first. This would normally be an insert for a failed update (the most likely case is that the row did not exist) or an update for a failed insert (the most likely case is that the row already exists); for deletes, an entry may be made in a table saying the item was not found. If the fallback rule also fails, an error message is logged; these messages become the exception list. This processing continues until there are no more rows.

Figure 10
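The fallback behavior in this process logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the JETSTREAM implementation: the target table is simulated with an in-memory map rather than a JDBC connection, and all class and method names (FallbackSketch, Target, applyWithFallback) are hypothetical. It is written against a current JDK rather than the JDK 1.3.1 listed in Section 3.2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FallbackSketch {
    enum Op { INSERT, UPDATE, DELETE }

    /** Hypothetical stand-in for a target table, keyed by primary key. */
    static final class Target {
        final Map<Integer, String> rows = new LinkedHashMap<>();

        /** Returns false where a real database would raise an error. */
        boolean apply(Op op, int key, String value) {
            switch (op) {
                case INSERT:
                    if (rows.containsKey(key)) return false; // row already exists
                    rows.put(key, value);
                    return true;
                case UPDATE:
                    if (!rows.containsKey(key)) return false; // row does not exist
                    rows.put(key, value);
                    return true;
                default: // DELETE
                    return rows.remove(key) != null;
            }
        }
    }

    /** Insert falls back to update and update to insert; delete has no fallback. */
    static Op fallbackFor(Op op) {
        switch (op) {
            case INSERT: return Op.UPDATE;
            case UPDATE: return Op.INSERT;
            default: return null;
        }
    }

    /** Tries the operation, then its fallback; records an exception entry if both fail. */
    static boolean applyWithFallback(Target target, Op op, int key, String value,
                                     List<String> exceptionList) {
        if (target.apply(op, key, value)) return true;
        Op fallback = fallbackFor(op);
        if (fallback != null && target.apply(fallback, key, value)) return true;
        exceptionList.add("Unable to " + op.name().toLowerCase()
            + " row with primary key: " + key);
        return false;
    }

    public static void main(String[] args) {
        Target target = new Target();
        List<String> exceptionList = new ArrayList<>();
        applyWithFallback(target, Op.INSERT, 1, "Payroll", exceptionList);
        applyWithFallback(target, Op.INSERT, 1, "Payroll v2", exceptionList); // falls back to update
        applyWithFallback(target, Op.DELETE, 2, null, exceptionList);         // goes to exception list
        System.out.println(target.rows);
        System.out.println(exceptionList);
    }
}
```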

4.3.2 UI and Message Functionality

There is no user interface required for this component; however, to start the transfer process, the invoking command must name java as the executable and pass the JAR file to be run along with the ETL Rule file to be used as a parameter. Suitable error messages should be presented within a console window and/or logged to a file. These include the following (in some form):

“Unable to connect to source <source name>”
“Unable to connect to target”
“Unable to open XSL file. Check path and filename.”
“Invalid XSL file.”
“Failed to connect: username and/or password incorrect for source”
“Failed to connect: username and/or password incorrect for target”
“Entity <table/entity name> not found in source <source name>”
“Entity <table/entity name> not found in target <target name>”
“Attribute <field/entity/attribute name> not found in Entity <entity name> in source <source name>”
“Attribute <field/entity/attribute name> not found in Entity <entity name> in target <target name>”
“Unable to [insert|update|delete] <table name.field name> with primary key:<list of primary key values> from <source name> into target <target name>”

Each successful completion of a transform will log the name, time, and date that it completed successfully.

4.3.3 Specific Transform Use-Cases

To fully understand how the transformation should work, it is necessary to examine some specific types of transformation that will take place. Figure 11 shows how data is transferred and transformed from the perspective of the Transform Program (JETSTREAM) within the Transfer Data use-case.

Figure 11

The specific use-cases that focus on the actual transformation are those used by the Populate Target Tables use-case. This is an internal use-case describing functionality that will be contained within the ETL Rules (XSL) document.

Additionally, it should be noted, that these singular use-cases represent various transformation and cleansing rules; they are to be considered building blocks. Each may be combined with another to create more complex transformation capabilities. Intermediate steps may require variables to temporarily hold values or arrays of values as each component transform step is executed.

4.3.3.1 Single Table Data Transfer

This transfer requires each source field to be associated with the target field into which it will be placed. The only transformation required is where field names differ or formats must be altered; for example, an application name may be CHAR(45) in Access, but merely VARCHAR2 in Oracle. Figure 12 graphically depicts this simple transfer.

Figure 12
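A single-table transfer of this kind reduces to a source-to-target field mapping. The following is a minimal sketch, assuming rows are held as name/value maps; the class name and the field names (AppName, APPLICATION_NAME, and so on) are hypothetical and are not taken from Figure 12.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldMappingSketch {
    /** Copies each mapped source field into the target row under its target field name. */
    static Map<String, Object> mapRow(Map<String, Object> sourceRow,
                                      Map<String, String> sourceToTarget) {
        Map<String, Object> targetRow = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sourceToTarget.entrySet()) {
            targetRow.put(entry.getValue(), sourceRow.get(entry.getKey()));
        }
        return targetRow;
    }

    public static void main(String[] args) {
        // Hypothetical mapping: source field name -> target field name.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("AppID", "APP_ID");
        mapping.put("AppName", "APPLICATION_NAME");

        Map<String, Object> sourceRow = new LinkedHashMap<>();
        sourceRow.put("AppID", 7);
        sourceRow.put("AppName", "Payroll");

        System.out.println(mapRow(sourceRow, mapping));
    }
}
```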

4.3.3.2 Merging Table Data within Transfer

In some cases, more than one table in the source database may contain data that is intended for a single table. Figure 13 illustrates this case; the rule may be that only the latest version is of interest from the Source, so only that version number is being transferred. This will be implemented as two single transfers where an update fallback is performed.

Figure 13

4.3.3.3 Merging Field Data within Transfer

There may also be cases where field data from the source constitutes parts of what will become one field within the target database. The example in Figure 14 depicts how the document’s creation date is stored as 3 integers on the source, but the target will use a date field. Thus the transformation needs to be able to handle this.

Figure 14
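The field merge for the creation date can be sketched as below. This is an illustration only: it uses java.time from a current JDK (not available in the JDK 1.3.1 listed in Section 3.2), and the class and method names are hypothetical. An invalid combination returns an empty result, which a caller could route to the exception list.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;

final class DateMergeSketch {
    /** Merges three integer source fields into one date; empty if they do not form a valid date. */
    static Optional<LocalDate> mergeDate(int year, int month, int day) {
        try {
            return Optional.of(LocalDate.of(year, month, day));
        } catch (DateTimeException e) {
            return Optional.empty(); // e.g. month 13 or 30 February
        }
    }

    public static void main(String[] args) {
        System.out.println(mergeDate(2002, 5, 16));
        System.out.println(mergeDate(2002, 2, 30));
    }
}
```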

4.3.3.4 Splitting Field Data within Transfer

There may also be cases where a field within the source may need to be separated into constituent components. In Figure 15, the user’s name is stored as a full name, where the target will have these separated into first and last names. Some form of delimiter would be identified within the source to be the split point.

Figure 15
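A split at a delimiter might look like the following sketch. The class and method names are hypothetical, and the handling of a value with no delimiter (the whole value is kept as the first name) is an assumption; the specification does not say what should happen in that case.

```java
final class NameSplitSketch {
    /**
     * Splits a full name at the first occurrence of the delimiter.
     * Returns a two-element array: {first part, remainder}.
     */
    static String[] splitName(String fullName, String delimiter) {
        int at = fullName.indexOf(delimiter);
        if (at < 0) {
            // Assumption: with no split point, keep the whole value in the first part.
            return new String[] { fullName.trim(), "" };
        }
        String first = fullName.substring(0, at).trim();
        String rest = fullName.substring(at + delimiter.length()).trim();
        return new String[] { first, rest };
    }

    public static void main(String[] args) {
        String[] parts = splitName("Jane Doe", " ");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```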

4.3.3.5 Replacing Table Data during Transfer

In some cases, there needs to be a means of replacing the field’s data with a specific value. This allows clean-up activities when the source may contain a field where items are inconsistently spelled or simply not represented in the manner desired for the target database. Figure 16 shows where occurrences of “MS Word” within the Source are replaced with “Microsoft Word” in the Target.

Figure 16
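The replacement rule can be illustrated with a simple lookup table. This is a sketch only; in JETSTREAM the rule would be expressed in the XSL rule document, and the class name and the choice of an exact-match lookup (rather than, say, a regular expression) are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class ReplaceSketch {
    // Replacement rules: source spelling -> value desired in the target.
    static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("MS Word", "Microsoft Word");
    }

    /** Returns the replacement for a value that has a rule; other values pass through unchanged. */
    static String cleanse(String value) {
        if (value == null) {
            return null;
        }
        return REPLACEMENTS.getOrDefault(value.trim(), value);
    }

    public static void main(String[] args) {
        System.out.println(cleanse("MS Word"));
        System.out.println(cleanse("Excel"));
    }
}
```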

4.3.3.6 Table Data Transfer with Conditions

Some data will only need to be transferred when it meets certain conditions. Figure 17 depicts a simple transfer which occurs as long as source field Application.AppName is not null.

Figure 17
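The not-null condition on Application.AppName can be sketched as a row filter. The example assumes rows are held as name/value maps and uses lambdas from a current JDK; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ConditionalTransferSketch {
    /** Condition from the use-case: transfer only rows whose AppName is not null. */
    static final Predicate<Map<String, Object>> APP_NAME_NOT_NULL =
        row -> row.get("AppName") != null;

    /** Returns the rows that satisfy the condition, in their original order. */
    static List<Map<String, Object>> select(List<Map<String, Object>> rows,
                                            Predicate<Map<String, Object>> condition) {
        List<Map<String, Object>> selected = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (condition.test(row)) {
                selected.add(row);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Object> named = new HashMap<>();
        named.put("AppName", "Payroll");
        Map<String, Object> unnamed = new HashMap<>();
        unnamed.put("AppName", null);
        System.out.println(select(Arrays.asList(named, unnamed), APP_NAME_NOT_NULL));
    }
}
```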

4.3.3.7 Using A Java Class for Specific Needs

The previous transformation capabilities may be inadequate for some very specific transformation rules that could be required. Therefore, the ability to perform other unforeseen transformation functions is required; this will be enabled by allowing the XML rule document to identify a Java class to be executed. For example, suppose a directory contains documents associated with an application being transferred. These are documents from a separate source, not the database being transferred. The goal is to annotate these using metadata about each file. Assuming the application name is known to match for these documents, a call could be made with the AppID to create a table entry for each related document, holding its name and creation date and associated with the AID within the target. This would most certainly be a part of a larger set of processing logic. Figure 18 shows this graphically.

Figure 18
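As a rough illustration of what such a Java class might do, the sketch below lists the regular files in a directory and builds one entry per document holding the AppID, the file name and the creation time. Everything here is hypothetical: the class DocumentAnnotator, the DocumentEntry type and the method signature are assumptions, and the interface by which JETSTREAM would invoke the class is not defined in this section. Writing the entries to the target table is deliberately left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rule class; the names and signature are assumptions. */
final class DocumentAnnotator {

    /** One prospective table entry for a document related to an application. */
    static final class DocumentEntry {
        final String appId;
        final String name;
        final String created;

        DocumentEntry(String appId, String name, String created) {
            this.appId = appId;
            this.name = name;
            this.created = created;
        }
    }

    /** Builds one entry per regular file found directly inside the directory. */
    static List<DocumentEntry> describe(String appId, Path directory) throws IOException {
        List<DocumentEntry> entries = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip sub-directories and other non-file entries
                }
                BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
                entries.add(new DocumentEntry(
                        appId, file.getFileName().toString(), attrs.creationTime().toString()));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        for (DocumentEntry entry : describe("APP-001", directory)) {
            System.out.println(entry.appId + " | " + entry.name + " | " + entry.created);
        }
    }
}
```

Note that the creation time reported by the file system may fall back to the last-modified time on platforms that do not record creation times.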

4.3.4 Test Cases

The test cases required for the JETSTREAM application are described as follows:

Basic operation:

Execute the JETSTREAM application, passing it an XML rule file:
o Did the application start?
o Is the process in memory?

Proper Transformation and Cleansing:

Assumes the Basic operation test completed successfully.

Perform one of each transform use case in the previous section (see footnote 1).

Perform the following combination of transforms:
o Conditional split
o Conditional merge
o Conditional replace
o Split with replace on a portion
o Merge with a replace on a portion of it
o Conditional split, then conditional replace on one of the split values

Did transforms execute properly?
o Each basic transform executed and stored the expected results
o Each combination executed and stored the expected result

Proper Error-Handling:

Attempt operation with the following tests:
o Mangled XML file
o Improper filename
o Improper source URL
o Improper target URL
o Improper username in source
o Improper password in source
o Improper username in target
o Improper password in target
o Improper table/entity name referenced in source
o Improper field/entity/attribute name referenced in source
o Improper table/entity name referenced in target
o Improper field/entity/attribute name referenced in target
o Improper data type reference in target for a specific field
o Establish one field that cannot be processed; ensure it goes into the exception list

Ensure proper error messages display and proper exception list generation

Stress:

Attempt operation with the following tests:
o Large # of records (>100, with no errors), straight transfer to identical structure
o Large # of records (with no errors), transfer where a condition tests true
o Large # of records (with no errors), transfer with split
o Large # of records (with no errors), transfer with merge

Ensure proper exception list generation (if one is generated)

Note execution time

4.4 Create XML Transform/Cleanse Doc Use-Case

The JETSTREAM application suite will contain an XSL Generator application; this takes in information described in XML and transforms it to a suitable XSL file for execution. To do this, it parses and analyzes an XML-based document, the extraction-transformation definition (.etd) document, that describes the source and targets for the transformation and the transformations and cleansings required. It also reads in XML-based documents, the stylesheet rules templates (.srt), that describe the basic transformations required (i.e. format transitions among XML, databases, CSV files, etc.). These specify the Xalan extensions to be used and implicit common data conversions. Together, these two XML document types are the ETL business rules that are used to create the XSL that is regularly run. Upon completion of the ETL rules, the XML designer executes the XSL Generator; this simplifies the development of transformations. In Phase 3, a GUI would allow this to be created even more easily by a business user.

1 If no Java-class based transformation has been identified when examining the structures of the sources and target, then this specific use-case may be ignored for testing.

4.4.1 Use-Case Detail

Figure 19 shows the detail for the Create XML Transform/Cleanse Doc use-case taken from the perspective of what the XSL Generator program will be performing.

Figure 19

The XML designer would have reviewed the structure of the incoming and outgoing databases (as discussed within the main use-case overview) and would create the ETL rules document (.etd). If a new type of format (e.g. a new form of XML or SQL DML) was being encountered, the designer would create a new set of format rules (.srt).

Once these are completed, the XSL Generator program will read the ETL rules and determine the source and target formats and then build the final ETL (XSL) utilizing appropriate extensions as needed. Once the XSL is generated (which is used to perform XSLT), it can be compiled to become XSLT-C whenever performance considerations dictate.
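Executing a generated XSL can be sketched with the standard Java XSLT interfaces (javax.xml.transform). The example below is an assumption about how the scheduled process might run a stylesheet, not the JETSTREAM implementation: the class name XslRunner and the tiny inline stylesheet and input are hypothetical. It compiles the stylesheet once into a reusable Templates object; whether that corresponds to the XSLT-C compilation intended here depends on the XSLT processor installed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical runner; the names and the sample stylesheet are illustrative only. */
final class XslRunner {

    /** Compiles the stylesheet once so that it can be reused across many runs. */
    static Templates compile(String xsl) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** Applies a compiled stylesheet to one XML input and returns the result as text. */
    static String run(Templates compiled, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        compiled.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xsl =
                "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
              + "<xsl:output method='text'/>"
              + "<xsl:template match='/'><xsl:value-of select='/app/name'/></xsl:template>"
              + "</xsl:stylesheet>";
        Templates compiled = compile(xsl);
        System.out.println(run(compiled, "<app><name>Microsoft Word</name></app>"));
    }
}
```

Reusing one Templates object avoids re-parsing the stylesheet on every scheduled run, which is the usual motivation for compiling it ahead of time.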

4.4.2 XSL Generator System Architecture

The XSL Generator uses Xalan as its primary engine; a “wrapper” program consisting of Java code and XSL is used to transform the extraction/transformation definitions and the stylesheet templates or XSL stubs into final XSL that is used by the scheduled process. The XSL Generator’s system architecture is shown in the component diagram of Figure 20.

Figure 20

The XSL Generator can be executed on either a server or any client machine with the proper software components installed. As shown in Figure 20, the wrapper code interfaces to Xalan to create the final XSL for use. This architecture provides a convenient means of abstraction, separating the business rule logic for extraction and transformation from the formatting logic for the basic underlying transformation from the source to the target.

A sample document that addresses the transformation needs for the use-cases from 4.3.3.1 through 4.3.3.2, and that shows how these use-cases are intended to be implemented, is contained in Appendix C: Sample In-work Extraction-Transformation Definition.

5.0 Site Implementation and Operation

5.4 Long-term Operational Needs

This section describes what is required for continued operation of the JETSTREAM application.

5.4.1 Operational Skills Required

Operating the JETSTREAM application will normally be performed by a database administrator familiar with XML and operating (but not necessarily creating) Java programs. The individual will be expected to create and maintain XML transformation rules and review physical schemas and implement any staging database structures.

For code maintenance, while it is expected that immediate 3 needs may require continued SAIC involvement, the code and design for JETSTREAM will be donated to the open source community for continued maintenance and enhancement. This will be licensed similarly to the Apache license.

5.4.2 Recommended Back-Up and Recovery Procedures

It is recommended that whenever transformation definitions (either .etd or .srt files) are modified, a back-up be made of the original definitions and any generated XSL. This ensures that if any are improperly defined, they can be recovered easily. Prior to executing the resulting XSL, a database back-up should also be made; this ensures that unintended changes to the database can be recovered. Upon completion of the execution of the XSL, a second back-up should be made. This will produce an incrementally revised snapshot of both structure and data.
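The file portion of this procedure could be automated along the following lines. This is only a sketch under stated assumptions: the class name DefinitionBackup is hypothetical, generated XSL is assumed to carry an .xsl extension and to sit in the same directory as the definitions, and the database back-ups are outside its scope.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical back-up helper; the directory layout and extensions are assumptions. */
final class DefinitionBackup {

    /** Copies every .etd, .srt and .xsl file into backupDir and returns the number copied. */
    static int backUp(Path definitionsDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        int copied = 0;
        try (DirectoryStream<Path> files =
                Files.newDirectoryStream(definitionsDir, "*.{etd,srt,xsl}")) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, backupDir.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args.length > 0 ? args[0] : ".");
        // A time-stamped directory keeps each back-up separate from earlier ones.
        Path backup = source.resolve("backup-" + System.currentTimeMillis());
        System.out.println(backUp(source, backup) + " file(s) backed up to " + backup);
    }
}
```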

Appendices

Appendix A: Reference Material

Business Process Diagram (Figure 2) & Assembly Diagram (Figure 7): Business Modeling with UML, Hans-Erik Eriksson & Magnus Penker, OMG Press/Wiley, 2000

Package/Component Diagram (Figure 3): Understanding UML: The Developer’s Guide, Paul Harmon & Mark Watson, Morgan Kaufmann Publishing, 1998; UML Distilled, Martin Fowler with Kendall Scott, Addison-Wesley, 1997; & Business Modeling with UML, Hans-Erik Eriksson & Magnus Penker, OMG Press/Wiley, 2000

Druid 3.0-pre: available from http://www.sourceforge.net

HypersonicSQL/hSQL: available from http://www.sourceforge.net

Apache Xalan and Xerces: available from http://xml.apache.org

Apache RegExp (Regular Expressions package): available from http://jakarta.apache.org

Appendix B: Glossary of Acronyms

CSV: Comma-Separated Values

DBA: Database Administrator

DBMS: Database Management System

DDL: Data Definition Language (the statements within SQL that define the structure of a relational database)

DML: Data Manipulation Language (the statements within SQL that manipulate data)

EAI: Enterprise Application Integration

ETD: Extraction-Transformation Definition

GUI: Graphical User Interface

HTML: Hypertext Mark-Up Language

hSQL: A derivative of the HypersonicSQL RDBMS; both of which are open source database projects. (See http://sourceforge.net/hsqldb)

ICE: Internet Content Exchange, an XML format common for newsfeeds

IP: Internet Protocol

J2EE: Java 2 Enterprise Edition

J2SE: Java 2 Standard Edition

JAR: Java Archive

JDBC: Java Database Connectivity

JDK: Java Development Kit

JETSTREAM: Java Extraction and Transformation Service for Transmitting Records and Exchanging Application Metadata

JSP: Java Server Pages

JVM: Java Virtual Machine

OCS: Open Content Syndication, an XML format common for newsfeeds

ODBC: Open Database Connectivity

RDBMS: Relational Database Management System

RegExp: Regular Expressions Package from the Apache Group (http://jakarta.apache.org)

RSS: Rich Site Summary, an XML format developed by Netscape common for newsfeeds

SOAP: Simple Object Access Protocol

SQL: Structured Query Language

SRT: Stylesheet Rules Template

TCP/IP: Transmission Control Protocol/Internet Protocol

UI: User Interface

W3C: World Wide Web Consortium

WML: Wireless Mark-Up Language

XALAN: Apache Group (http://xml.apache.org) XSL processor package named after a rare musical instrument.

XERCES: Apache Group (http://xml.apache.org) XML parser package named after a rare butterfly

XHTML: Extensible Hypertext Mark-Up Language

XML: Extensible Mark-Up Language

XSL: Extensible Stylesheet Language

XSLT: Extensible Stylesheet Language Transformations

XSLT-C: Extensible Stylesheet Language Transformation - Compiled

XML-RPC: XML Remote Procedure Call

Appendix C: Sample In-work Extraction-Transformation Definition (XML Document).

This section contains a sample XML document that describes how the business rules can be represented within the .etd document. This sample may change, but provides some idea of the type of XML abstraction required.

<?xml version="1.0" encoding='ISO-8859-1' ?>

<!DOCTYPE DATABASE_MIGRATION SYSTEM "JTD.dtd">

<TRANSFORM_RULES>

<!-- Run Parameters may not be used -->

<RUN_PARAMETERS></RUN_PARAMETERS>

<!-- The data source identifies the database or file to be used and any other special attributes required, there can be any number of data sources. -->

<DATA_SOURCE ID="__" TYPE="RDBMS|XML|CSV" CLASS="/.../..../__.class" URL="__" USER="__" PASSWORD="__" DESCRIPTION="__">

<!-- The source entity is the database table, element, etc. that is being selected from....-->

<SOURCE_ENTITY ID="__" NAME="tablename|elementname|etc." DESCRIPTION="__">

<!-- The source value is the table field, element, element attribute, etc. that is being selected from.... This can be recursive as XML files can be recursive in nature.-->

<SOURCE_ATTRIBUTE ID="__" NAME="fieldname|elementname|attributename|etc." DATATYPE="__" DESCRIPTION="__">

<SOURCE_ATTRIBUTE ID="__" NAME="elementname|attributename|etc." DATATYPE="__" DESCRIPTION="__"></SOURCE_ATTRIBUTE>

</SOURCE_ATTRIBUTE>

</SOURCE_ENTITY>

</DATA_SOURCE>

<!-- The temporary entity is used when complex operations are required. The action operations are the same as in the target area. Look there for a more complete set of examples for these. This is optional; there is no need for a temporary entity. A condition identifies when the particular temporary entity is set. -->

<TEMP_VARIABLE ID="__" ACTION="PUSH|REPLACE|SET|SPLIT|MERGE|EXPRESSION" CONDITION="CONDITION.ID" DATATYPE="__" DESCRIPTION="__">

<!-- Any number of source values could be here; at least one is required if a temporary entity is defined. -->

<SOURCE_VALUE ID="__" DESCRIPTION="__">SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

<!-- Connectors are not required except for expressions. For merges with no connectors, the values are simply concatenated. -->

<EXPRESSION_CONNECTOR ID="__" DESCRIPTION="__">Math or basic string expression usually... e.g. +|-|*|/ or perhaps what would be used in a merge e.g. taking in 3 numbers that will represent a date, as in mm/dd/yyyy</EXPRESSION_CONNECTOR>

<SOURCE_VALUE ID="__" DESCRIPTION="__">SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

</TEMP_VARIABLE>

<!-- Conditions specify how sources or temporary entities are used to make decisions. -->

<CONDITION ID="__" DESCRIPTION="__">

<COMPARISON ID="__" DESCRIPTION="__">A mathematical comparison type (e.g. !=, =, &lt;, &gt;, &lt;&gt;)

<VARIABLE ID="__" DESCRIPTION="__">SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value</VARIABLE>

<EXPRESSION_CONNECTOR ID="__" DESCRIPTION="__">Math or basic string expression usually... e.g. +|-|*|/ or perhaps what would be used in a merge e.g. taking in 3 numbers that will represent a date, as in mm/dd/yyyy</EXPRESSION_CONNECTOR>

<VARIABLE ID="__" DESCRIPTION="__">CONDITION.ID|SOURCE_ENTITY.ID|TEMP_VARIABLE ID|static value</VARIABLE>

</COMPARISON>

<COMPARISON_CONNECTOR ID="__" DESCRIPTION="__">AND|OR|NAND|NOR|UNION|INTERSECTION</COMPARISON_CONNECTOR>

<COMPARISON ID="__" DESCRIPTION="__">A mathematical comparison type (e.g. !=, =, &lt;, &gt;, &lt;&gt;)

<VARIABLE ID="__" DESCRIPTION="__">SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value</VARIABLE>

<EXPRESSION_CONNECTOR ID="__" DESCRIPTION="__">Math or basic string expression usually... e.g. +|-|*|/ or perhaps what would be used in a merge e.g. taking in 3 numbers that will represent a date, as in mm/dd/yyyy</EXPRESSION_CONNECTOR>

<VARIABLE ID="__" DESCRIPTION="__">CONDITION.ID|SOURCE_ENTITY.ID|TEMP_VARIABLE ID|static value</VARIABLE>

</COMPARISON>

</CONDITION>

<DATA_TARGET ID="__" TYPE="RDBMS|XML|CSV" CLASS="/.../..../__.class" URL="__" USER="__" PASSWORD="__" DESCRIPTION="__">

<TARGET_ENTITY ID="__" NAME="tablename|elementname|etc." TYPE="UPDATE|INSERT|DELETE" FALLBACK="UPDATE|INSERT" DESCRIPTION="__">

<!-- Attributes define the actual values to be placed. Any number of target attributes can be defined; this is an example of a push (a simple transfer of one value over). A condition identified here determines when the source is transferred. These can be nested just as sources to work with XML hierarchies. -->

<TARGET_ATTRIBUTE ID="__" NAME="fieldname|elementname|attributename|etc." ACTION="PUSH" DATATYPE="__" DATASIZE="_" CONDITION="CONDITION.ID" DESCRIPTION="__">

<SOURCE_VALUE ID="__" DESCRIPTION="__"> SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

</TARGET_ATTRIBUTE>

</TARGET_ENTITY>

<TARGET_ENTITY ID="__" NAME="tablename|elementname|etc." TYPE="UPDATE|INSERT|DELETE" FALLBACK="UPDATE|INSERT" DESCRIPTION="__">

<!-- Attributes define the actual values to be placed. Any number of target entities can be defined; this is an example of a replace (where one element is replaced with another). The matching value identifies what value is to be replaced with the static value identified. A condition identified here determines when the value is replaced with another; this is different than the matching value. Matching value specifies what is to be replaced. Multiple sources can be identified here; these would each have a different matching value. These can be nested just as sources to work with XML hierarchies. -->

<TARGET_ATTRIBUTE ID="__" NAME="fieldname|elementname|attributename|etc." ACTION="REPLACE" DATATYPE="__" DATASIZE="_" CONDITION="CONDITION.ID" DESCRIPTION="__">

<SOURCE_VALUE ID="__" DESCRIPTION="__" MATCHING_VALUE="__"> SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

<SOURCE_VALUE ID="__" DESCRIPTION="__" MATCHING_VALUE="__"> SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

</TARGET_ATTRIBUTE>

</TARGET_ENTITY>

<TARGET_ENTITY ID="__" NAME="tablename|elementname|etc." TYPE="UPDATE|INSERT|DELETE" FALLBACK="UPDATE|INSERT" DESCRIPTION="__">

<!-- Attributes define the actual values to be placed. Any number of target entities can be defined; this is an example of a set (where one element is simply set to a static value). A condition identified here determines when the value is set. The set is normally used to set a static value for a target. These can be nested just as sources to work with XML hierarchies. -->

<TARGET_ATTRIBUTE ID="__" NAME="fieldname|elementname|attributename|etc." ACTION="SET" DATATYPE="__" DATASIZE="_" CONDITION="CONDITION.ID" DESCRIPTION="__">

static value|system variable

</TARGET_ATTRIBUTE>

</TARGET_ENTITY>

<TARGET_ENTITY ID="__" NAME="tablename|elementname|etc." TYPE="UPDATE|INSERT|DELETE" FALLBACK="UPDATE|INSERT" DESCRIPTION="__">

<!-- Attributes define the actual values to be placed. Any number of target entities can be defined; this is an example of a split (where the target element is set to a substring of the source). A condition identified here determines when the value is set. The set is normally used to set a static value for a target. These can be nested just as sources to work with XML hierarchies. -->

<TARGET_ATTRIBUTE ID="__" NAME="fieldname|elementname|attributename|etc." ACTION="SPLIT" DATATYPE="__" DATASIZE="_" CONDITION="CONDITION.ID" DESCRIPTION="__">

<SOURCE_VALUE ID="__" DESCRIPTION="__" DELIMITER="__" SUBSTRING="BEFORE|AFTER">SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

</TARGET_ATTRIBUTE>

</TARGET_ENTITY>

<TARGET_ENTITY ID="__" NAME="tablename|elementname|etc." TYPE="UPDATE|INSERT|DELETE" FALLBACK="UPDATE|INSERT" DESCRIPTION="__">

<!-- Attributes define the actual values to be placed. Any number of target entities can be defined; this is an example of a merge (where the target element is set to a concatenation of the various sources). A condition identified here determines when the value is set. The set is normally used to set a static value for a target. These can be nested just as sources to work with XML hierarchies. -->

<TARGET_ATTRIBUTE ID="__" NAME="fieldname|elementname|attributename|etc." ACTION="MERGE" DATATYPE="__" DATASIZE="_" CONDITION="CONDITION.ID" DESCRIPTION="__">

<SOURCE_VALUE ID="__" DESCRIPTION="__"> SOURCE_ENTITY.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

<EXPRESSION_CONNECTOR ID="__" DESCRIPTION="__">What would be used in a merge e.g. taking in 2 numbers that will represent a time as in hh:mm - it is a static value of some kind to be placed between each source value</EXPRESSION_CONNECTOR>

<SOURCE_VALUE ID="__" DESCRIPTION="__"> SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

</TARGET_ATTRIBUTE>

</TARGET_ENTITY>

<TARGET_ENTITY ID="__" NAME="tablename|elementname|etc." TYPE="UPDATE|INSERT|DELETE" FALLBACK="UPDATE|INSERT" DESCRIPTION="__">

<!-- Attributes define the actual values to be placed. Any number of target entities can be defined; this is an example of an expression (where the target element is set to a calculation of the various sources). A condition identified here determines when the value is set. The set is normally used to set a static value for a target. These can be nested just as sources to work with XML hierarchies.-->

<TARGET_ATTRIBUTE ID="__" NAME="fieldname|elementname|attributename|etc." ACTION="EXPRESSION" DATATYPE="__" DATASIZE="_" CONDITION="CONDITION.ID" DESCRIPTION="__">

<SOURCE_VALUE ID="__" DESCRIPTION="__"> SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

<EXPRESSION_CONNECTOR ID="__" DESCRIPTION="__">Math or basic string expression usually... e.g. +|-|*|/</EXPRESSION_CONNECTOR>

<SOURCE_VALUE ID="__" DESCRIPTION="__"> SOURCE_ATTRIBUTE.ID|TEMP_VARIABLE ID|static value|system variable</SOURCE_VALUE>

</TARGET_ATTRIBUTE>

</TARGET_ENTITY>

</DATA_TARGET>

</TRANSFORM_RULES>
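To indicate how a program such as the XSL Generator might begin reading a definition like the one above, the following sketch parses a small, filled-in fragment with the standard Java DOM parser and lists the action requested for each target attribute. The class name EtdInspector and the sample values (Person, FirstName, AppName) are hypothetical; the element and attribute names are taken from the in-work sample and may change with it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for an .etd document; not the XSL Generator itself. */
final class EtdInspector {

    /** Returns "NAME=ACTION" for every TARGET_ATTRIBUTE element, in document order. */
    static List<String> targetActions(String etdXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(etdXml)));
        NodeList attributes = doc.getElementsByTagName("TARGET_ATTRIBUTE");
        List<String> actions = new ArrayList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Element attribute = (Element) attributes.item(i);
            actions.add(attribute.getAttribute("NAME") + "=" + attribute.getAttribute("ACTION"));
        }
        return actions;
    }

    public static void main(String[] args) throws Exception {
        // A filled-in fragment with illustrative values; no DTD is referenced here.
        String etd =
                "<TRANSFORM_RULES><DATA_TARGET ID='t1' TYPE='RDBMS'>"
              + "<TARGET_ENTITY ID='e1' NAME='Person' TYPE='INSERT'>"
              + "<TARGET_ATTRIBUTE ID='a1' NAME='FirstName' ACTION='SPLIT'/>"
              + "<TARGET_ATTRIBUTE ID='a2' NAME='AppName' ACTION='REPLACE'/>"
              + "</TARGET_ENTITY></DATA_TARGET></TRANSFORM_RULES>";
        System.out.println(targetActions(etd));
    }
}
```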
