IBM InfoSphere DataStage on the AWS Cloud

Quick Start Reference Deployment

November 2019

Shrumit Mehta and Shashidhar Yellareddy, IBM

Vinod Shukla and Jim McConnell, AWS Quick Start team

Visit our GitHub repository for source files and to post feedback,

report bugs, or submit feature ideas for this Quick Start.

Contents

Overview
IBM InfoSphere DataStage on AWS
Cost and licenses
Architecture
Single-AZ mode
Planning the deployment
Specialized knowledge
AWS account
Technical requirements
Deployment options
Deployment steps
Step 1. Sign in to your AWS account
Step 2. Retrieve IBM InfoSphere DataStage and Red Hat license information
Step 3. Launch the Quick Start
Option 1: Parameters for deploying InfoSphere DataStage into a new VPC
Option 2: Parameters for deploying InfoSphere DataStage into an existing VPC
Step 4. Test the deployment
Accessing the IIS Launchpad from a browser
Accessing the DataStage Windows Client
Logging in using Remote Desktop Protocol (RDP)
Using InfoSphere DataStage and QualityStage Designer
Using the IIS Launchpad
Accessing the cluster nodes through the Ansible config server
Transferring files from your local computer to the cluster
Bash
WinSCP
Manual cleanup
Best practices for using InfoSphere DataStage on AWS
Enabling backups
Administering OpenShift clusters
Security
Scaling compute
Horizontal scaling
Changing the number of is-engine-compute pods
Scaling back down
Provisioning more DataStage Client instances
Troubleshooting
IBM support
Create a case
Support forum
Send us feedback
Additional resources
Document revisions

This Quick Start was created by IBM in collaboration with Amazon Web Services (AWS).

Quick Starts are automated reference deployments that use AWS CloudFormation

templates to deploy key technologies on AWS, following AWS best practices.

Overview

This Quick Start reference deployment guide provides step-by-step instructions for

deploying IBM InfoSphere DataStage 11.7.1 Service Pack 1 (SP1) on a Red Hat OpenShift

Container Platform 3.11 cluster on the AWS Cloud.

This Quick Start is for users who want to deploy InfoSphere DataStage on the AWS Cloud to

integrate data from multiple sources and prepare data for insights.

IBM InfoSphere DataStage on AWS

IBM InfoSphere DataStage is a data integration, extract, transform, and load (ETL) tool

that enables users to move and transform data between operational, transactional, and

analytical target systems.

Data transformation and movement is the process by which source data is selected,

converted, and mapped to the format required by target systems. The process manipulates

data to bring it into compliance with business, domain, and integrity rules, and with other

data in the target environment.

This reference deployment provides AWS CloudFormation templates to deploy InfoSphere

DataStage on a new OpenShift cluster. This cluster includes:

• A Red Hat OpenShift Container Platform cluster created in a new or existing virtual private cloud (VPC) on Red Hat Enterprise Linux (RHEL) 7.7 instances, using the OpenShift on AWS Quick Start. See the OpenShift on AWS deployment guide for details about the underlying OpenShift deployment architecture.

• A GlusterFS distributed file system that uses encrypted Amazon Elastic Block Store (Amazon EBS) volumes.

• Scalable OpenShift worker nodes running InfoSphere DataStage.

• A Microsoft Windows-based DataStage Client machine.

Cost and licenses

You are responsible for the cost of AWS services used while running this Quick Start

reference deployment.

The AWS CloudFormation template for this Quick Start includes configuration parameters

that you can customize. Some of these settings, such as instance type, will affect the cost of

deployment. For cost estimates, see the pricing pages for each AWS service you will be

using. Prices are subject to change.

Tip: After you deploy the Quick Start, we recommend that you enable the AWS Cost

and Usage Report to track costs associated with the Quick Start. This report delivers

billing metrics to an S3 bucket in your account. It provides cost estimates based on

usage throughout each month, and finalizes the data at the end of the month. For

more information about the report, see the AWS documentation.

This Quick Start requires a Red Hat subscription. For detailed instructions, see step 1 of “Deployment Steps” in the Red Hat OpenShift on AWS Quick Start deployment guide.

This Quick Start requires licenses for IBM InfoSphere DataStage and IBM InfoSphere

DataStage and QualityStage Designer. You can purchase licenses from Passport Advantage

or an IBM representative. For general assistance with Passport Advantage, see the Passport Advantage Online for customers or eCustomer Care webpages.

After you purchase a license, IBM will email a Proof of Entitlement (PoE) certificate to the

primary contact person on the order form. The PoE confirms the eligible software and level

of use for which you are authorized, and contains your IBM Customer Number (ICN). To

use this Quick Start, you must provide the ICN and part numbers listed in your PoE.

If you’re an existing IBM client, please contact your IBM representative for additional

information about using your entitlements with this Quick Start.

When you launch the Quick Start, read the following software license agreements, and

agree to the terms:

• Software license agreement for IBM InfoSphere DataStage v11.7.1

• Software license agreement for IBM InfoSphere DataStage and QualityStage Designer v11.7.1

In addition, prior to using this Quick Start, please review IBM’s Eligible Public Cloud

BYOSL Policy.

Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters

builds the following InfoSphere DataStage environment in the AWS Cloud.

Figure 1: Quick Start architecture for IBM InfoSphere DataStage on AWS

The Quick Start sets up the following:

• A highly available architecture that spans three Availability Zones.*

• A VPC configured with public and private subnets according to AWS best practices, to provide you with your own virtual network on AWS.*

• In the public subnets, managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*

• In a public subnet, a Linux Ansible config server Amazon Elastic Compute Cloud (Amazon EC2) instance that also serves as a bastion host to allow inbound Secure Shell (SSH) access to EC2 instances in private subnets.

• In a public subnet, an EC2 instance (Windows Server 2012 R2) running the InfoSphere DataStage thick client. Inbound SSH to EC2 instances in the public and private subnets is also possible from this instance using PuTTY.

• In the private subnets:

– Three OpenShift Container Platform master instances in an Auto Scaling group.

– Three OpenShift Container Platform etcd instances in an Auto Scaling group.

– Three OpenShift Container Platform GlusterFS instances in an Auto Scaling group that use encrypted Amazon Elastic Block Store (Amazon EBS) volumes.

– Two OpenShift worker nodes in an Auto Scaling group that, combined, contain InfoSphere DataStage engine, services, and metadata repository tiers.

• A Classic Load Balancer spanning the public subnets for accessing DataStage from a web browser and from DataStage Client instances. Internet traffic to this load balancer is only permitted from ContainerAccessCIDR.

• A Classic Load Balancer spanning the public subnets for accessing the OpenShift Container Platform master instances. Internet traffic to this load balancer is only permitted from RemoteAccessCIDR.

• A Network Load Balancer spanning the private subnets, for routing internal OpenShift API traffic to the OpenShift Container Platform master nodes.

• An Amazon Route 53 private hosted zone for resolving internal Domain Name System (DNS) queries.

* The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.

Single-AZ mode

This Quick Start can be deployed as a non-highly-available cluster that spans a single

Availability Zone. To enable this option, set the ClusterAvailability parameter to Non-HA

when you launch the Quick Start.
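The difference between the two modes can be summarized as a small lookup. The following Python sketch is illustrative only: the counts are taken from the ClusterAvailability parameter description in this guide, while the `Topology` class and `topology_for` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Instance counts implied by a ClusterAvailability setting."""
    availability_zones: int
    masters: int
    etcd: int
    gluster: int


# Counts as described in this guide; GlusterFS uses three instances in both modes.
TOPOLOGIES = {
    "HA": Topology(availability_zones=3, masters=3, etcd=3, gluster=3),
    "Non-HA": Topology(availability_zones=1, masters=1, etcd=1, gluster=3),
}


def topology_for(cluster_availability):
    """Look up the topology for a ClusterAvailability value ("HA" or "Non-HA")."""
    try:
        return TOPOLOGIES[cluster_availability]
    except KeyError:
        raise ValueError(
            f"ClusterAvailability must be one of {sorted(TOPOLOGIES)}"
        ) from None


print(topology_for("Non-HA"))
```

Because a Non-HA cluster has a single master and a single etcd instance in one Availability Zone, losing that zone or either instance takes the cluster down, which is why the guide advises against this mode for business-critical deployments.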

Figure 2: Non-HA Quick Start architecture for IBM InfoSphere DataStage on AWS

Planning the deployment

Specialized knowledge

This Quick Start assumes basic familiarity with the InfoSphere DataStage application, including its browser-based Designer (thin client) and Windows-based Designer (thick client), and a basic awareness of the components of a DataStage installation. If you’re new to InfoSphere DataStage, see the Additional resources section.

This Quick Start also assumes familiarity with the OpenShift command line interface (CLI)

and Linux.

Lastly, this deployment guide requires a moderate level of familiarity with AWS services. If

you’re new to AWS, visit the Getting Started Resource Center and the AWS Training and

Certification website for materials and programs that can help you develop the skills to

design, deploy, and operate your infrastructure and applications on the AWS Cloud.

AWS account

If you don’t already have an AWS account, create one at https://aws.amazon.com by

following the on-screen instructions. Part of the sign-up process involves receiving a phone

call and entering a PIN using the phone keypad.

Your AWS account is automatically signed up for all AWS services. You are charged only for

the services you use.

Technical requirements

You must provide your IBM Customer Number (ICN) and the part numbers of the software

licenses purchased, as noted in your Proof of Entitlement (PoE) certificate.

Red Hat Enterprise Linux (RHEL) 7.7 is used for the OpenShift EC2 instances in this

deployment. Other distributions aren’t currently supported. The DataStage Windows Client

instance is deployed from a private Amazon Machine Image (AMI) based on Windows

Server 2012 R2, and the bastion host instance runs Amazon Linux. Your AWS account is

given launch permission for the private AMI when the Quick Start is deployed.

Before you launch the Quick Start, your account must be configured as specified in the

following table. Otherwise, deployment might fail.

Resources

If necessary, request service limit increases for the following resources. You might need to do this if you already have an existing deployment that uses these resources, and you think you might exceed the default limits with this deployment. For default limits, see the AWS documentation.

AWS Trusted Advisor offers a service limits check that displays your usage and limits for some aspects of some services.

Resource | This deployment uses (HA) | This deployment uses (Non-HA)
VPCs | 1 | 1
Elastic IP addresses | 4 | 2
IAM roles | 10 | 10
Auto Scaling groups | 4 | 4
Elastic Load Balancers | 3 | 3
i3.large instances | 3 | 3
m5.xlarge instances | 10 | 6
Route 53 hosted zones | 1 | 1
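As a rough illustration of the check described above, the following Python sketch compares what this deployment uses (copied from the table) with the headroom left in an account. The `limits` and `in_use` figures are hypothetical inputs that you would look up yourself, for example in Trusted Advisor; they are not values from this guide, and `find_shortfalls` is a name introduced only for this example.

```python
# Resources this deployment uses, copied from the table in this guide.
DEPLOYMENT_NEEDS = {
    "HA": {
        "VPCs": 1,
        "Elastic IP addresses": 4,
        "IAM roles": 10,
        "Auto Scaling groups": 4,
        "Elastic Load Balancers": 3,
        "i3.large instances": 3,
        "m5.xlarge instances": 10,
        "Route 53 hosted zones": 1,
    },
    "Non-HA": {
        "VPCs": 1,
        "Elastic IP addresses": 2,
        "IAM roles": 10,
        "Auto Scaling groups": 4,
        "Elastic Load Balancers": 3,
        "i3.large instances": 3,
        "m5.xlarge instances": 6,
        "Route 53 hosted zones": 1,
    },
}


def find_shortfalls(mode, limits, in_use):
    """Return {resource: extra capacity needed} for resources that would exceed a limit.

    Resources without an entry in `limits` are skipped, because there is
    nothing to compare them against.
    """
    shortfalls = {}
    for resource, needed in DEPLOYMENT_NEEDS[mode].items():
        if resource not in limits:
            continue
        available = limits[resource] - in_use.get(resource, 0)
        if needed > available:
            shortfalls[resource] = needed - available
    return shortfalls


# Hypothetical account: 5 Elastic IPs allowed, 3 already allocated.
example_limits = {"VPCs": 5, "Elastic IP addresses": 5}
example_in_use = {"VPCs": 2, "Elastic IP addresses": 3}
print(find_shortfalls("HA", example_limits, example_in_use))
```

In this made-up account, the HA deployment would need two more Elastic IP addresses than are available, so a limit increase would be requested before launch; the Non-HA deployment would fit.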

Regions

This Quick Start is supported only in Regions with three Availability Zones. The currently supported Regions are listed here.

Key pair

Make sure that at least one Amazon EC2 key pair exists in your AWS account in the Region where you plan to deploy the Quick Start. Make note of the key pair name. You’ll be prompted for this information during deployment. To create a key pair, follow the instructions in the AWS documentation.

If you’re deploying the Quick Start for testing or proof-of-concept purposes, we recommend that you create a new key pair instead of specifying a key pair that’s already being used by a production instance.

IAM permissions

To deploy the Quick Start, you must log in to the AWS Management Console with IAM permissions for the resources and actions the templates will deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions.

Deployment options

This Quick Start provides two deployment options:

• Deploy InfoSphere DataStage into a new VPC (end-to-end deployment). This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components, and then deploys InfoSphere DataStage into this new VPC.

• Deploy InfoSphere DataStage into an existing VPC. This option provisions InfoSphere DataStage in your existing AWS infrastructure.

The Quick Start provides separate templates for these options. It also lets you configure

CIDR blocks, instance types, and InfoSphere DataStage settings, as discussed later in this

guide.

Deployment steps

Step 1. Sign in to your AWS account

1. Sign in to your AWS account at https://aws.amazon.com with an IAM user role that has

the necessary permissions. For details, see Planning the deployment earlier in this

guide.

2. Make sure that your AWS account is configured correctly, as discussed in the Technical

requirements section.

Step 2. Retrieve IBM InfoSphere DataStage and Red Hat license information

1. Purchase a license from Passport Advantage or your IBM representative.

2. After the purchase is complete, IBM will email a Proof of Entitlement (PoE) certificate

to the primary contact person on the order form. The PoE certificate contains your IBM

customer number (ICN). To use this Quick Start for IBM InfoSphere DataStage, you will

need to provide your ICN and the part numbers you used to purchase the licenses for

the product.

3. Retrieve your Red Hat account credentials and the pool ID of the necessary subscription. For detailed instructions, see step 1 of “Deployment Steps” in the Red Hat OpenShift on AWS Quick Start deployment guide.

Step 3. Launch the Quick Start

Notes: The instructions in this section reflect the older version of the AWS

CloudFormation console. If you’re using the redesigned console, some of the user

interface elements might be different.

You are responsible for the cost of the AWS services used while running this Quick

Start reference deployment. There is no additional cost for using this Quick Start.

For full details, see the pricing pages for each AWS service you will be using in this

Quick Start. Prices are subject to change.

1. Sign in to your AWS account, and choose one of the following options to launch the

AWS CloudFormation template. For help choosing an option, see deployment options

earlier in this guide.

• Deploy InfoSphere DataStage into a new VPC on AWS

• Deploy InfoSphere DataStage into an existing VPC on AWS

Important: If you’re deploying InfoSphere DataStage into an existing VPC, make

sure that your VPC has three private subnets in different Availability Zones for the

workload instances, and that the subnets aren’t shared. This Quick Start doesn’t

support shared subnets. These subnets require NAT gateways in their route tables, to

allow the instances to download packages and software without exposing them to the

internet. You will also need the domain name option configured in the DHCP options

as explained in the Amazon VPC documentation. You will be prompted for your VPC

settings when you launch the Quick Start.

Each deployment takes about 2 hours to complete.

2. Check the region that’s displayed in the upper-right corner of the navigation bar and

change it if necessary. This is where the network infrastructure for InfoSphere

DataStage will be built. The template is launched in the US East (Ohio) Region by

default.

3. On the Select Template page, keep the default setting for the template URL, and then

choose Next.

4. On the Specify Details page, change the stack name if needed. Review the parameters

for the template. Provide values for the parameters that require input. For all other

parameters, review the default settings and customize them as necessary.

In the following tables, parameters are listed by category and described separately for

the two deployment options:

– Parameters for deploying InfoSphere DataStage into a new VPC

– Parameters for deploying InfoSphere DataStage into an existing VPC

When you finish reviewing and customizing the parameters, choose Next.
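If you script the launch rather than use the console, the same parameter values can be collected in code. The sketch below only builds a list in the `ParameterKey`/`ParameterValue` shape that the CloudFormation CreateStack API accepts; it makes no AWS call. The parameter names are copied as printed in the new-VPC tables of this guide and should be checked against the template you launch. The helper name `build_parameters` and all example values are hypothetical.

```python
# Parameters marked "Requires input" in the new-VPC tables of this guide.
REQUIRED_NEW_VPC = (
    "AvailabilityZones",
    "RemoteAccessCIDR",
    "ContainerAccessCIDR",
    "KeyPairName",
    "IISPassword",
    "OpenShiftAdminPassword",
    "RedHatSubscriptionUserName",
    "RedHatSubscriptionPassword",
    "RedhatSubscriptionPoolID",
    "IBMDataStageICN",
    "IBMDataStagePartNumber",
    "IBMDataStageClientICN",
    "IBMDataStageClientPartNumber",
)


def build_parameters(values):
    """Convert a plain dict into the list-of-dicts shape used for stack parameters.

    Raises ValueError if any parameter that requires input is missing or empty.
    """
    missing = [name for name in REQUIRED_NEW_VPC if not values.get(name)]
    if missing:
        raise ValueError("Missing required parameters: " + ", ".join(missing))

    parameters = []
    for key, value in values.items():
        # List-type parameters are passed as one comma-separated string.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        parameters.append({"ParameterKey": key, "ParameterValue": str(value)})
    return parameters


# Placeholder values only; replace every one of them with your own.
example = {name: "PLACEHOLDER" for name in REQUIRED_NEW_VPC}
example["AvailabilityZones"] = ["us-east-2a", "us-east-2b", "us-east-2c"]
example["LicenseAgreement"] = "Accept"
example["ClusterAvailability"] = "HA"
for entry in build_parameters(example)[:3]:
    print(entry)
```

The resulting list could then be supplied as the `Parameters` argument to a CloudFormation client's `create_stack` call, along with the template URL behind the launch link you chose in step 1.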

OPTION 1: PARAMETERS FOR DEPLOYING INFOSPHERE DATASTAGE INTO A NEW VPC

View template

VPC configuration:

Parameter label (name) | Default | Description

Cluster availability (ClusterAvailability) | Non-HA | HA deploys a cluster spanning three Availability Zones, provisioning three instances each of Master and Etcd. Non-HA deploys a cluster spanning one Availability Zone, creating one instance each of Master and Etcd. GlusterFS is deployed with three instances/EBS volumes in both cases. Note: Non-HA should NOT be used for business-critical cluster deployments.

Availability Zones (AvailabilityZones) | Requires input | List of Availability Zones to use for the subnets in the VPC. Three Availability Zones are required for this deployment, and the logical order of your selections is preserved.

Public subnet 1 CIDR (PublicSubnet1CIDR) | 10.0.128.0/20 | The CIDR block for the public subnet located in Availability Zone 1.

Public subnet 2 CIDR (PublicSubnet2CIDR) | 10.0.144.0/20 | The CIDR block for the public subnet located in Availability Zone 2.

Public subnet 3 CIDR (PublicSubnet3CIDR) | 10.0.160.0/20 | The CIDR block for the public subnet located in Availability Zone 3.

Private subnet 1 CIDR (PrivateSubnet1CIDR) | 10.0.0.0/19 | The CIDR block for the private subnet located in Availability Zone 1.

Private subnet 2 CIDR (PrivateSubnet2CIDR) | 10.0.32.0/19 | The CIDR block for the private subnet located in Availability Zone 2.

Private subnet 3 CIDR (PrivateSubnet3CIDR) | 10.0.64.0/19 | The CIDR block for the private subnet located in Availability Zone 3.

VPC CIDR (VPCCIDR) | 10.0.0.0/16 | The CIDR block for the VPC to be created.

Remote access CIDR (RemoteAccessCIDR) | Requires input | The CIDR IP range that is permitted to access the OpenShift master nodes and OpenShift Container Platform (OCP) user interface (UI). We recommend that you set this value to a trusted IP range. For example, you might want to grant only your corporate network access to the software.

Container access CIDR (ContainerAccessCIDR) | Requires input | The CIDR IP range that is permitted to access the Windows client instance and InfoSphere Launchpad from a web browser. We recommend that you set this value to a trusted IP range. For example, you might want to grant only your corporate network access to the software.
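If you change any of the CIDR defaults, it is worth confirming that every subnet still sits inside the VPC range and that no two subnets overlap. The following Python sketch does this with the standard `ipaddress` module, using the default values from the table. The function name `check_subnets` is introduced for this example only and is not part of the Quick Start.

```python
import ipaddress

# Default values from the VPC configuration table in this guide.
DEFAULT_CIDRS = {
    "VPCCIDR": "10.0.0.0/16",
    "PublicSubnet1CIDR": "10.0.128.0/20",
    "PublicSubnet2CIDR": "10.0.144.0/20",
    "PublicSubnet3CIDR": "10.0.160.0/20",
    "PrivateSubnet1CIDR": "10.0.0.0/19",
    "PrivateSubnet2CIDR": "10.0.32.0/19",
    "PrivateSubnet3CIDR": "10.0.64.0/19",
}


def check_subnets(cidrs):
    """Return a list of problems: subnets outside the VPC, or overlapping subnets."""
    vpc = ipaddress.ip_network(cidrs["VPCCIDR"])
    subnets = {
        name: ipaddress.ip_network(cidr)
        for name, cidr in cidrs.items()
        if name != "VPCCIDR"
    }

    problems = []
    for name, network in subnets.items():
        if not network.subnet_of(vpc):
            problems.append(f"{name} {network} is outside the VPC range {vpc}")

    # Compare each pair of subnets once.
    names = sorted(subnets)
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


print(check_subnets(DEFAULT_CIDRS))  # the defaults produce no problems: []
```

The same check applies to the existing-VPC option, where you would substitute the CIDR blocks of your own subnets.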

Cluster configuration:

Parameter label (name) | Default | Description

Key pair name (KeyPairName) | Requires input | A public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred region; see the Technical requirements section.

InfoSphere DataStage Windows Client instance type (DSClientInstanceType) | t3.xlarge | The type of EC2 instance for the DataStage Windows Client instance.

Worker nodes instance type (NodesInstanceType) | m5.xlarge | The EC2 instance type for the OpenShift node instances.

Master instance type (MasterInstanceType) | m5.xlarge | The EC2 instance type for the OpenShift master instances.

Etcd instance type (EtcdInstanceType) | m5.xlarge | The EC2 instance type for the OpenShift etcd instances.

Resource tag (ResourceTag) | ds1 | This will be used to label AWS resources and the DataStage project in OpenShift. Ensure that every InfoSphere DataStage deployment in your AWS account uses a unique resource tag.

IIS password (IISPassword) | Requires input | The password to be set on the DataStage application for the user name "isadmin".

OpenShift admin password (OpenShiftAdminPassword) | Requires input | The password for the OpenShift Admin UI. Must be at least 8 characters containing letters (minimum 1 capital letter), numbers, and symbols.
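The OpenShift admin password rule can be checked before launch so that a rejected value does not cost a failed deployment. The following Python sketch is one reading of the rule as stated in the table: at least 8 characters, at least one capital letter, at least one number, and at least one symbol. The template's own validation may be stricter or may accept a different set of symbols, so treat the function `password_problems` as a hypothetical pre-check only.

```python
import string


def password_problems(password):
    """Return a list of ways the password misses the rule described in this guide."""
    problems = []
    if len(password) < 8:
        problems.append("shorter than 8 characters")
    if not any(ch.isupper() for ch in password):
        problems.append("no capital letter")
    if not any(ch.isdigit() for ch in password):
        problems.append("no number")
    # Assumption: "symbols" means ASCII punctuation characters.
    if not any(ch in string.punctuation for ch in password):
        problems.append("no symbol")
    return problems


print(password_problems("password"))
# ['no capital letter', 'no number', 'no symbol']
```

An empty list means the value satisfies this reading of the rule.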

GlusterFS configuration:

Parameter label (name) | Default | Description

Gluster instance type (GlusterInstanceType) | i3.large | The EC2 instance type for the GlusterFS instances.

Gluster EBS volume type (GlusterStorageType) | gp2 | The EBS volume type to use for storage.

Gluster storage size (GlusterStorageSize) | 1500 | The size in GB of the available storage (will create 3x EBS volumes of this size).

Gluster storage IOPS (GlusterStorageIops) | 4500 | The EBS volume IOPS to allocate (only applicable if io1 has been selected for GlusterStorageType). A minimum of 4500 IOPS are needed for successful installation.
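The storage parameters interact, so a short calculation can help when you move away from the defaults. The following Python sketch totals the provisioned EBS capacity (three volumes of the chosen size) and enforces the 4500 IOPS minimum when io1 is selected. The gp2 figures are an assumption based on general Amazon EBS behavior (a baseline of 3 IOPS per GiB, bounded between 100 and 16,000) and do not come from this guide; the sketch also treats GB and GiB as equivalent. The name `gluster_plan` is hypothetical.

```python
GLUSTER_VOLUMES = 3   # this guide: three EBS volumes of the chosen size are created
MIN_IO1_IOPS = 4500   # this guide: minimum IOPS for a successful installation

# Assumption (general EBS gp2 behavior, not stated in this guide).
GP2_BASELINE_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16000


def gluster_plan(storage_type="gp2", size_gb=1500, iops=4500):
    """Summarize provisioned capacity and per-volume IOPS for the GlusterFS tier."""
    if storage_type == "io1":
        if iops < MIN_IO1_IOPS:
            raise ValueError(f"io1 volumes need at least {MIN_IO1_IOPS} IOPS")
        iops_per_volume = iops
    elif storage_type == "gp2":
        baseline = GP2_BASELINE_IOPS_PER_GIB * size_gb
        iops_per_volume = max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, baseline))
    else:
        iops_per_volume = None  # not modeled in this sketch
    return {
        "volumes": GLUSTER_VOLUMES,
        "total_gb": GLUSTER_VOLUMES * size_gb,
        "iops_per_volume": iops_per_volume,
    }


print(gluster_plan())
# {'volumes': 3, 'total_gb': 4500, 'iops_per_volume': 4500}
```

Under that gp2 assumption, the default 1500 GB volume has a baseline of 4500 IOPS, which matches the minimum quoted for io1. Reducing GlusterStorageSize on gp2 lowers the baseline proportionally, so a smaller volume may fall short of that figure.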

Red Hat subscription information:

Parameter label (name) | Default | Description

Red Hat subscription user name (RedHatSubscriptionUserName) | Requires input | Enter your Red Hat Network (RHN) user name.

Red Hat subscription password (RedHatSubscriptionPassword) | Requires input | Enter your Red Hat Network (RHN) password.

Red Hat pool ID (RedhatSubscriptionPoolID) | Requires input | Enter your Red Hat Network (RHN) pool ID.

DataStage license information:

Parameter label

(name)

Default Description

License agreement

(LicenseAgreement)

— Choose Accept to acknowledge that you have read and agree to

the license terms for IBM InfoSphere DataStage v11.7.1

(http://ibm.biz/isds1171) and IBM InfoSphere DataStage and

QualityStage Designer v11.7.1 (http://ibm.biz/isdsc1171).

IBM Customer Number

for InfoSphere

DataStage

(IBMDataStageICN)

Requires input The IBM Customer Number (ICN) listed in your Proof of

Entitlement for InfoSphere DataStage.

InfoSphere DataStage

part number

(IBMDataStagePartNumber)

Requires input The IBM part number associated with your InfoSphere

DataStage license.

IBM Customer Number

for InfoSphere

DataStage and

QualityStage client

(IBMDataStageClientICN)

Requires input The IBM Customer Number (ICN) listed in your Proof of

Entitlement for InfoSphere DataStage and QualityStage client.

InfoSphere DataStage

and QualityStage client

part number

(IBMDataStageClientPartNumber)

Requires input The IBM part number associated with your InfoSphere

DataStage and QualityStage client license.

AWS Quick Start configuration:

Note We recommend that you keep the default settings for the following two

parameters, unless you are customizing the Quick Start templates for your own

deployment projects. Changing the settings of these parameters will automatically

update code references to point to a new Quick Start location. For additional details,

see the AWS Quick Start Contributor’s Guide.

Parameter label

(name) Default Description

Quick Start S3 bucket

name

(QSS3BucketName)

aws-quickstart The S3 bucket you created for your copy of Quick Start assets,

if you decide to customize or extend the Quick Start for your

own use. The bucket name can include numbers, lowercase

letters, uppercase letters, and hyphens, but should not start or

end with a hyphen.

Quick Start S3 key

prefix

(QSS3KeyPrefix)

quickstart-ibm-infosphere-datastage/ The S3 key name prefix used to simulate a folder for your copy

of Quick Start assets, if you decide to customize or extend the

Quick Start for your own use. This prefix can include numbers,

lowercase letters, uppercase letters, hyphens, and forward

slashes.

Output S3 bucket

name

(OutputBucketName)

Optional [Optional] The bucket name where the zip file output should

be placed. If left blank, a bucket name is automatically

generated.

OPTION 2: PARAMETERS FOR DEPLOYING INFOSPHERE DATASTAGE INTO AN EXISTING

VPC

View template

VPC configuration:

Parameter label

(name) Default Description

Cluster availability

(ClusterAvailability)

Non-HA "HA" deploys a cluster spanning 3 AZs, provisioning 3

instances each of Master and Etcd. "Non-HA" deploys a cluster

spanning 1 AZ, creating one instance each of Master and Etcd.

GlusterFS is deployed with three instances/EBS volumes in

both cases.

Note: Non-HA should NOT be used for business-critical

cluster deployments.

Public subnet 1 ID

(PublicSubnet1ID)

Requires input The ID of the public subnet in Availability Zone 1.

Public subnet 2 ID

(PublicSubnet2ID)

Requires input The ID of the public subnet in Availability Zone 2.

Public subnet 3 ID

(PublicSubnet3ID)

Requires input The ID of the public subnet in Availability Zone 3.

Private subnet 1 ID

(PrivateSubnet1ID)

Requires input The ID of the private subnet in Availability Zone 1.

Private subnet 2 ID

(PrivateSubnet2ID)

Requires input The ID of the private subnet in Availability Zone 2.

Private subnet 3 ID

(PrivateSubnet3ID)

Requires input The ID of the private subnet in Availability Zone 3.

VPC ID

(VPCID)

Requires input The ID of your existing VPC for deployment.

VPC CIDR

(VPCCIDR)

10.0.0.0/16 The CIDR block of your existing VPC.

Remote access CIDR

(RemoteAccessCIDR)

Requires input The CIDR IP range that is permitted to access the OpenShift

master nodes and OpenShift Container Platform (OCP) user

interface (UI). We recommend that you set this value to a

trusted IP range. For example, you might want to grant only

your corporate network access to the software.

Container access CIDR

(ContainerAccessCIDR)

Requires input The CIDR IP range that is permitted to access the Windows

client instance and InfoSphere Launchpad from a web

browser. We recommend that you set this value to a trusted IP

range. For example, you might want to grant only your

corporate network access to the software.

Cluster configuration:

Parameter label

(name) Default Description

Key pair name

(KeyPairName)

Requires input A public/private key pair, which allows you to connect

securely to your instance after it launches. This is the key

pair you created in your preferred region; see the Technical

requirements section.

InfoSphere DataStage

Windows Client

instance type

(DSClientInstanceType)

t3.xlarge The type of EC2 instance for the DataStage Windows Client

instance.

Worker nodes

instance type

(NodesInstanceType)

m5.xlarge The EC2 instance type for the OpenShift node instances.

Master instance type

(MasterInstanceType)

m5.xlarge The EC2 instance type for the OpenShift master instances.

Etcd instance type

(EtcdInstanceType)

m5.xlarge The EC2 instance type for the OpenShift etcd instances.

Resource tag

(ResourceTag)

ds1 This will be used to label AWS resources and the DataStage

project in OpenShift. Ensure that every InfoSphere

DataStage deployment in your AWS account uses a unique

resource tag.

IIS password

(IISPassword)

Requires input The password to be set on the DataStage application for the

user name "isadmin".

OpenShift admin password

(OpenShiftAdminPassword)

Requires input The password for the OpenShift Admin UI. Must be at least

8 characters containing letters (minimum 1 capital letter),

numbers, and symbols.
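The password rule above can be checked locally before you launch the stack. The following is a minimal sketch, not the template's own validation: the function name check_openshift_password is hypothetical, and treating "symbols" as POSIX punctuation characters is an assumption.

```shell
# Hypothetical helper: returns 0 if the candidate password meets the rules
# stated for OpenShiftAdminPassword, and 1 otherwise.
check_openshift_password() {
  local pw="$1"
  # At least 8 characters.
  [ "${#pw}" -ge 8 ] || return 1
  # At least one capital letter.
  printf '%s' "$pw" | grep -q '[[:upper:]]' || return 1
  # At least one number.
  printf '%s' "$pw" | grep -q '[[:digit:]]' || return 1
  # At least one symbol (assumed here to mean a punctuation character).
  printf '%s' "$pw" | grep -q '[[:punct:]]' || return 1
  return 0
}
```

For example, check_openshift_password 'Passw0rd!' succeeds, while a value without a capital letter or a symbol is rejected.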

GlusterFS configuration:

Parameter label

(name)

Default Description

Gluster instance type

(GlusterInstanceType)

i3.large The EC2 instance type for the GlusterFS instances.

Gluster EBS volume

type

(GlusterStorageType)

gp2 The EBS volume type to use for storage.

Gluster storage size

(GlusterStorageSize)

1500 The size in GB of the available storage (will create 3x EBS

volumes of this size).

Gluster storage IOPS

(GlusterStorageIops)

4500 The EBS volume IOPS to allocate (only applicable if io1 has

been selected for GlusterStorageType). A minimum of 4500

IOPS are needed for successful installation.

Red Hat subscription information:

Parameter label

(name)

Default Description

Red Hat subscription user name

(RedHatSubscriptionUserName)

Requires input Enter your Red Hat Network (RHN) user name.

Red Hat subscription password

(RedHatSubscriptionPassword)

Requires input Enter your Red Hat Network (RHN) password.

Red Hat pool ID

(RedhatSubscriptionPoolID)

Requires input Enter your Red Hat Network (RHN) pool ID.

DataStage license information:

Parameter label

(name)

Default Description

License agreement

(LicenseAgreement)

— Choose Accept to acknowledge that you have read and agree

to the license terms for IBM InfoSphere DataStage v11.7.1

(http://ibm.biz/isds1171) and IBM InfoSphere DataStage and

QualityStage Designer v11.7.1 (http://ibm.biz/isdsc1171).

IBM Customer Number

for InfoSphere

DataStage

(IBMDataStageICN)

Requires input The IBM Customer Number (ICN) listed in your Proof of

Entitlement for InfoSphere DataStage.

InfoSphere DataStage

part number

(IBMDataStagePartNumber)

Requires input The IBM part number associated with your InfoSphere

DataStage license.

IBM Customer Number

for InfoSphere

DataStage and

QualityStage client

(IBMDataStageClientICN)

Requires input The IBM Customer Number (ICN) listed in your Proof of

Entitlement for InfoSphere DataStage and QualityStage client.

InfoSphere DataStage

and QualityStage client

part number

(IBMDataStageClientPartNumber)

Requires input The IBM part number associated with your InfoSphere

DataStage and QualityStage client license.

AWS Quick Start configuration:

Note We recommend that you keep the default settings for the following two

parameters, unless you are customizing the Quick Start templates for your own

deployment projects. Changing the settings of these parameters will automatically

update code references to point to a new Quick Start location. For additional details,

see the AWS Quick Start Contributor’s Guide.

Parameter label

(name) Default Description

Quick Start S3 bucket

name

(QSS3BucketName)

aws-quickstart The S3 bucket you created for your copy of Quick Start assets,

if you decide to customize or extend the Quick Start for your

own use. The bucket name can include numbers, lowercase

letters, uppercase letters, and hyphens, but should not start or

end with a hyphen.

Quick Start S3 key

prefix

(QSS3KeyPrefix)

quickstart-ibm-infosphere-datastage/ The S3 key name prefix used to simulate a folder for your copy

of Quick Start assets, if you decide to customize or extend the

Quick Start for your own use. This prefix can include numbers,

lowercase letters, uppercase letters, hyphens, and forward

slashes.

Output S3 bucket

name

(OutputBucketName)

Optional [Optional] The bucket name where the zip file output should

be placed. If left blank, a bucket name is automatically

generated.

5. On the Options page, you can specify tags (key-value pairs) for resources in your stack

and set advanced options. When you’re done, choose Next.

6. On the Review page, review and confirm the template settings. Under Capabilities,

select the two check boxes to acknowledge that the template will create IAM resources

and that it might require the capability to auto-expand macros.

7. Choose Create to deploy the stack.

8. Monitor the status of the stack. When the status is CREATE_COMPLETE, the

InfoSphere DataStage cluster is ready.

9. Use the URLs displayed in the Outputs tab for the stack to view the resources that were

created.

Figure 3: InfoSphere DataStage outputs after successful deployment
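The status check in step 8 can also be scripted with the AWS CLI instead of watching the console. The sketch below assumes that the AWS CLI is installed and configured for the account and Region of the deployment. The function names and the 60-second polling interval are illustrative choices, and the stack name is whatever you entered when you created the stack.

```shell
# Map a CloudFormation stack status to one of: done, failed, wait.
classify_stack_status() {
  case "$1" in
    CREATE_COMPLETE) echo "done" ;;
    CREATE_FAILED|ROLLBACK_*|DELETE_*) echo "failed" ;;
    *) echo "wait" ;;
  esac
}

# Poll the stack until it finishes creating or fails.
# Returns 0 on CREATE_COMPLETE, 1 on failure, 2 if the AWS CLI call fails.
wait_for_stack() {
  local stack_name="$1" status
  while true; do
    status="$(aws cloudformation describe-stacks \
      --stack-name "$stack_name" \
      --query 'Stacks[0].StackStatus' --output text)" || return 2
    case "$(classify_stack_status "$status")" in
      done)   echo "Stack $stack_name: $status"; return 0 ;;
      failed) echo "Stack $stack_name: $status" >&2; return 1 ;;
    esac
    sleep 60
  done
}
```

A call such as wait_for_stack <your-stack-name> returns once the stack reaches a terminal state. The AWS CLI also provides a built-in waiter, aws cloudformation wait stack-create-complete --stack-name <your-stack-name>, which may be sufficient if you do not need the status printed.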

Step 4. Test the deployment

You can access InfoSphere DataStage from the browser by using the URL for the IBM

InfoSphere Information Server (IIS) Launchpad, and from the DataStage Windows Client

instance that’s provisioned with the cluster.

ACCESSING THE IIS LAUNCHPAD FROM A BROWSER

After stack creation has finished, use the link from the Outputs section of the root stack

shown in the preceding figure to open the Launchpad. On password-protected pages, the

user name will be “isadmin” and the password will be what you entered for the parameter

IISPassword.

Note The Launchpad link will only work when accessed from an IP address in the

ContainerAccessCIDR range.

The Launchpad is a standard, single web interface for opening the various clients or

consoles for IBM InfoSphere Information Server. After deploying this Quick Start, the

Launchpad gives you access to the following services, shown in the following figure:

IBM DataStage Flow Designer (you might need to purchase additional licenses to use

this)

IBM InfoSphere DataStage Operations Console

Subscription Manager

IBM InfoSphere Metadata Asset Manager

IBM InfoSphere Information Server Administration Console

Figure 4: InfoSphere Information Server Launchpad

ACCESSING THE DATASTAGE WINDOWS CLIENT

Logging in using Remote Desktop Protocol (RDP)

1. Retrieve the initial password from the Amazon EC2 console:

a. Select the instance named “<ResourceTag>-DS-Client” in the Amazon EC2 console.

b. Choose Actions, and then choose Get Windows Password. It can take a few

minutes for this option to become available.

c. Upload your key pair file, and then choose Decrypt Password.

2. Log in to the DS-Client instance by using RDP, with the user name “Administrator” and

the password retrieved in step 1.

Using InfoSphere DataStage and QualityStage Designer

1. From the CloudFormation console page, click on the root stack for this cluster and, from

the Outputs tab, retrieve the value for ContainerELBName.

2. Open the Designer Client from the desktop.

3. Enter the following values. You can also find these values in README.txt on the

desktop:

Host name <ContainerELBName>:443

User name isadmin

Password <value entered for the IISPassword parameter>

Project <will be auto-populated after clicking on the

dropdown button>

You must choose the dropdown button next to the Project field before logging in

every time, even if the Project value has already been populated.

4. Choose Login.

5. If you encounter a timeout error the first time that you try to log in, try again in a couple of minutes.

Using the IIS Launchpad

Note For compatibility, we recommend that you install Mozilla Firefox (version

54 and later) or Google Chrome (version 63 and later).

1. Right-click the IIS Launchpad icon on the desktop and change the URL to

https://<ContainerELBName>/ibm/iis/launchpad/.

2. Click OK to save.

3. Click on the icon to open the site. Choose Continue to this website on the security

prompt page.

4. On the Security Alert dialog box, choose Yes.

ACCESSING THE CLUSTER NODES THROUGH THE ANSIBLE CONFIG SERVER

Note We recommend that you periodically run the command

sudo yum update -y --security; sudo reboot

or configure yum-cron on the Ansible config server to keep the instance hardened

with the latest security patches.

The recommended method of SSH access to the OpenShift cluster instances via the bastion

host is by using SSH agent forwarding, as in the following Bash instructions:

1. Run the command ssh-add -K <your-key.pem> to store the key in your keychain.

On Linux, you might need to omit the -K flag.

2. Retrieve the IP address of the Ansible config server from the Amazon EC2 console.

3. To log in to the bastion host, run ssh -A ec2-user@<config-server-ip>.

4. To log in to private subnet instances, run ssh <instance-ip> from the bastion host.

For details and Windows instructions, see the blog post Securely Connect to Linux

Instances Running in a Private Amazon VPC.

TRANSFERRING FILES FROM YOUR LOCAL COMPUTER TO THE CLUSTER

Files can be manually transferred to the OpenShift cluster by tunneling via SSH through the

Ansible config server. Files copied to the /mnt directory in any of the OpenShift EC2 worker

node instances will be visible to the DataStage application.

Bash

1. Run the command ssh-add -K <your-key.pem> to store the key in your keychain.

On Linux, you might need to omit the -K flag.

2. In one terminal window, run the following command to establish an SSH tunnel. Keep

this terminal open for the duration of your file transfer.

ssh -L 9999:<k8s-instance-ip>:22 ec2-user@<config-server-ip>

3. In another terminal, you may SFTP through the tunnel bound to port 9999 at localhost.

sftp -P 9999 ec2-user@localhost

WinSCP

Open an SCP connection using the following site configuration:

File protocol: SCP

Host name: <instance-ip>

User name: ec2-user

Advanced > SCP/Shell:

– Shell: sudo su

Advanced > Tunnel:

– Host name: <config-server-ip>

– Private key file: <your-key.ppk>

If the key pair file is in .pem format, you may still select it, and WinSCP will

offer to convert it to .ppk format.

Advanced > Authentication:

Private key file: <your-key.ppk>

Manual cleanup

When you delete the stack created by this Quick Start, the DataStage cluster will be deleted

automatically. However, some EBS volumes may persist that must be deleted manually

from the AWS Management Console:

1. Navigate to the Volumes page in the Amazon EC2 console.

2. Volumes called kubernetes-dynamic-pvc might have been created by this cluster and

must be deleted. To confirm that the volume was created by this cluster, check the Tags

tab of the volume for a key called kubernetes.io/cluster/<ResourceTag>.
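The same check can be made from the command line. This is a sketch that assumes the AWS CLI is installed and configured for the Region of the deployment; the function names are hypothetical, and the argument is the ResourceTag value you used at deployment (ds1 by default).

```shell
# Build the tag key that marks volumes created by a given cluster.
cluster_tag_key() {
  printf 'kubernetes.io/cluster/%s' "$1"
}

# List volumes that carry the cluster tag and are no longer attached
# to any instance (status "available").
list_leftover_volumes() {
  local resource_tag="$1"
  aws ec2 describe-volumes \
    --filters "Name=tag-key,Values=$(cluster_tag_key "$resource_tag")" \
              "Name=status,Values=available" \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' \
    --output text
}
```

Review the output of list_leftover_volumes <ResourceTag> before deleting anything. Each confirmed volume can then be removed with aws ec2 delete-volume --volume-id <volume-id>.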

The Quick Start automatically deregisters instances from Red Hat when the cluster is deleted. However, you can check the Systems page on the Red Hat Customer Portal to confirm that all subscriptions have been reclaimed.

Figure 5: Red Hat customer portal systems deregistration page

Best practices for using InfoSphere DataStage on AWS

Enabling backups

The Quick Start copies scripts to the Ansible controller instance to enable Db2 backup for

DataStage. You can find them in the /quickstart/backup_scripts directory.

Administering OpenShift clusters

Upon cluster deployment, the OpenShift CLI (oc) can be operated after connecting via SSH

to the Ansible config instance or to one of the master nodes.

The cluster can be administered from a remote machine using the following command.

Note that the remote machine’s IP address must lie in the ContainerAccessCIDR range:

oc login -s="<OpenShiftUI from outputs>" -u=devuser -p="<IISPassword>"

Security

When you build systems on the AWS infrastructure, security responsibilities are shared

between you and AWS. This shared model can reduce your operational burden. AWS

operates, manages, and controls the components from the host operating system and

virtualization layer down to the physical security of the facilities in which the services

operate.

In turn, you assume responsibility and management of the guest operating system

(including updates and security patches), other associated applications, and configuration

of the AWS-provided security group firewall. For more information about security on AWS,

visit the AWS Security Center.

This Quick Start creates an IAM role that’s attached to all instances of OpenShift configured

with a minimum set of access policies, and restricted to tagged resources when possible.

The DataStage Client instance is not attached to an IAM role.

GlusterFS EBS volumes that are used for storing application files and data are encrypted.

The password of the isadmin user is set to the value entered for the parameter IISPassword.

This password is also used for Db2 native encryption and the LWAS signer certificate. The

default password for other internal-facing users is set to “isadmin”.

Security groups are used to restrict network ingress traffic for instances in this Quick Start.

Inbound access from the internet is only permitted from IP addresses in the

RemoteAccessCIDR and ContainerAccessCIDR to relevant ports on the Classic Load Balancer.

By default, the OpenShift cluster is configured with the HTPasswd identity provider.

Different identity providers can be configured, such as keystone, LDAP, or basic

authentication. For details, see Configuring authentication and user agent in the OpenShift

documentation.

Scaling compute

At the time of deploying the Quick Start, two is-engine-compute pods are provisioned on

the OpenShift cluster (one each in the two node instances). The parallel compute capability

of the cluster can be scaled up by increasing the number of node instances and provisioning

more of these pods.

To ensure proper distribution of additional compute pods across all the instances, first

horizontally scale the node instances, as outlined in Horizontal scaling. Then, scale the

number of is-engine-compute pods. Depending on your use case and the instance type, you

might also want to have multiple compute pods on the same node instance.

Horizontal scaling

1. Navigate to the “OpenShiftNodeASG” Auto Scaling group in the Amazon EC2 console.

2. Increase the Desired capacity value as needed.

3. Wait 20 minutes for Ansible to prepare the instance and attach it to the cluster. The

instance has been successfully attached if it appears in oc get nodes.
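The steps above can also be carried out from a terminal. The following sketch assumes the AWS CLI is configured on the machine where you change the capacity and that oc is run on the Ansible config server or a master node. The function names are hypothetical, and the Auto Scaling group name must be the full name shown in the Amazon EC2 console.

```shell
# Set the desired capacity of the node Auto Scaling group.
# The value must lie within the group's configured minimum and maximum size.
scale_node_group() {
  local asg_name="$1" desired="$2"
  aws autoscaling set-desired-capacity \
    --auto-scaling-group-name "$asg_name" \
    --desired-capacity "$desired"
}

# Count nodes whose STATUS column is exactly "Ready".
# Reads the output of `oc get nodes --no-headers` on standard input.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

After calling scale_node_group <asg-name> <count>, running oc get nodes --no-headers | count_ready_nodes on the config server shows how many nodes have joined the cluster so far.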

Changing the number of is-engine-compute pods

1. Connect via SSH to the Ansible config server or one of the master nodes.

2. Run the following command as root:

oc scale --replicas=<number-of-replicas> statefulset is-engine-compute

Scaling back down

When the number of nodes of an Auto Scaling group is scaled down, the default termination

policy might terminate any of the instances of the group. To minimize disruption in the

cluster when scaling down, the instances that are present before scaling up should be

preserved. This can be achieved by enabling instance protection for the pre-scale-up

instances before initiating scale-down action.

1. Find the instance IDs of the openshift-node instances that you’d like to preserve.

2. On the Auto Scaling Group page in the EC2 Management Console, select the Auto

Scaling group named OpenShiftNode. From the Instances tab, select the instances from

step 1 and enable Actions > Instance Protection > Set Scale In Protection.

3. Set the desired capacity of the OpenShiftNode ASG to “2” from Actions > Edit for the

ASG. This will scale down the cluster.

4. Connect via SSH to the Ansible config server or one of the master instances, and change

to root. Wait for the terminated node to disappear from the oc get nodes list. This

might take a few minutes.

5. Run the following command to adjust the number of is-engine-compute replicas:

oc scale --replicas=<number-of-replicas> statefulset is-engine-compute
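Scale-in protection from step 2 can also be enabled with the AWS CLI. This is a minimal sketch under stated assumptions: the AWS CLI is configured, the function names are hypothetical, and the Auto Scaling group name is the full name shown in the console. The instance ID check is only a guard against typing errors.

```shell
# Return 0 if the argument looks like an EC2 instance ID
# ("i-" followed by 8 or 17 lowercase hexadecimal characters).
is_instance_id() {
  printf '%s\n' "$1" | grep -Eq '^i-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Enable scale-in protection for the given instances in one Auto Scaling group.
# Usage: protect_from_scale_in <asg-name> <instance-id> [<instance-id> ...]
protect_from_scale_in() {
  local asg_name="$1" id
  shift
  for id in "$@"; do
    is_instance_id "$id" || { echo "not an instance ID: $id" >&2; return 1; }
  done
  aws autoscaling set-instance-protection \
    --auto-scaling-group-name "$asg_name" \
    --instance-ids "$@" \
    --protected-from-scale-in
}
```

Protect the instances you want to keep first, and only then lower the desired capacity, so that the Auto Scaling group terminates the newer instances.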

Provisioning more DataStage Client instances

More DataStage Client instances can be provisioned as required.

1. Open the Amazon EC2 console, and then navigate to the Launch Templates page.

2. Select the template named <ResourceTag>-DSClientLT. Choose Actions, and then

choose Launch instance from template.

3. You can use the default values or change any parameters as desired. Ensure that the

selected subnet is one of the public subnets originally created by the Quick Start.

4. You can add an Elastic IP address to the new instance, so that the public IP persists

between reboots. Select the new instance on the Instances page, choose Actions >

Networking > Manage IP Addresses, and then choose Allocate an Elastic IP.

5. Inbound rules in the OpenShiftNode security group must be created for the public IP of

the new instance, allowing traffic on ports 80, 443, 31538, and 31531.
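The inbound rules in step 5 can be added with the AWS CLI. The sketch below makes several assumptions: the traffic is TCP, the rule is limited to the single public IP of the new instance (a /32 CIDR), and you have looked up the ID of the OpenShiftNode security group yourself. The variable and function names are hypothetical.

```shell
# Ports named in step 5.
DS_CLIENT_PORTS="80 443 31538 31531"

# Print one "cidr protocol port" line per required rule for a client IP.
client_ingress_rules() {
  local ip="$1" port
  for port in $DS_CLIENT_PORTS; do
    printf '%s/32 tcp %s\n' "$ip" "$port"
  done
}

# Create the inbound rules in the given security group.
# Usage: allow_client <security-group-id> <client-public-ip>
allow_client() {
  local group_id="$1" ip="$2" cidr proto port
  client_ingress_rules "$ip" | while read -r cidr proto port; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$group_id" \
      --protocol "$proto" --port "$port" --cidr "$cidr"
  done
}
```

Running client_ingress_rules <client-public-ip> on its own prints the rules without changing anything, which is a convenient way to review them first.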

Troubleshooting

Q. I encountered a CREATE_FAILED error when I launched the Quick Start.

A. If AWS CloudFormation fails to create the stack, we recommend that you relaunch the

template with Rollback on failure set to No. (This setting is under Advanced in the

AWS CloudFormation console, Options page.) With this setting, the stack’s state will be

retained and the instance will be left running, so you can troubleshoot the issue. (For

Windows, look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and

C:\cfn\log.)

Important When you set Rollback on failure to No, you will continue to incur

AWS charges for this stack. Please make sure to delete the stack when you finish

troubleshooting.

For additional information, see Troubleshooting AWS CloudFormation on the AWS

website.

Q. I encountered a size limitation error when I deployed the AWS CloudFormation

templates.

A. We recommend that you launch the Quick Start templates from the links in this guide or

from another S3 bucket. If you deploy the templates from a local copy on your computer or

from a non-S3 location, you might encounter template size limitations when you create the

stack. For more information about AWS CloudFormation limits, see the AWS

documentation.

Q. A CREATE_FAILED error occurred at the Custom::AMIInfo resource in the DataStageClientStack or at the Custom::EcrAccess resource in the DataStageStack.

A. You may have entered an invalid IBM Customer Number (ICN) and/or part number.

Please provide the ICN and part numbers noted in the Proof of Entitlement (PoE) that IBM

provided when you purchased your entitlements to IBM InfoSphere DataStage and

InfoSphere DataStage and QualityStage Designer. Please note that your entitlements might

be in different PoEs, if you purchased licenses for InfoSphere DataStage and for InfoSphere

DataStage and QualityStage Designer separately.

You can obtain your PoE from IBM’s Passport Advantage Online portal by following these

steps. For general assistance with Passport Advantage, see the Passport Advantage Online

for customers or the Customer eCare team webpages.

If you’re unable to locate your PoE or don’t have one, please contact your IBM

representative or IBM support. Please do not post your ICN and part number in a GitHub

Issue.

Q. I encountered a java.lang.NoClassDefFoundError: com.ibm.iis.isf.admin.Config

(initialization failure) error message when I tried to access the Launchpad URL.

A. After the stack has been created, the pods might require some time to finish initializing

the InfoSphere DataStage application. If the error persists for longer than 60 minutes after

stack creation has finished, please contact IBM support, or open an Issue in the Quick

Start’s GitHub repository.

Q. In the Launchpad Operations Console, one or more service status monitors are in an error

state.

A. If this occurs right after the stack has been deployed, all the containers might not have

finished initializing. Check back after a few minutes. In other cases, try the following

command to restart the monitoring services. (Note that the snippet is a single command.)

oc exec -i is-en-conductor-0 -- bash -c '/opt/IBM/InformationServer/Server/DSODB/bin/DSAppWatcher.sh -stop ResMonApp; /opt/IBM/InformationServer/Server/DSODB/bin/DSAppWatcher.sh -start ResMonApp; /opt/IBM/InformationServer/Server/DSODB/bin/DSAppWatcher.sh -stop EngMonApp; /opt/IBM/InformationServer/Server/DSODB/bin/DSAppWatcher.sh -start EngMonApp'

Q. After the Auto Scaling group adds worker instances, the application is in an error state.

A. After an OpenShift node instance replacement by the Auto Scaling group, it takes 30-40

minutes for the pods to finish deploying again. You can observe pod scheduling progress by

logging into the Ansible controller, changing to the root user (using sudo su), and using the

commands oc get nodes to check if the new instance has attached to the cluster, and oc

get pods -o wide to check if any DataStage pods are pending creation.
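To make pending pods easier to spot, the pod listing can be filtered. This is a small sketch, assuming the usual oc get pods column layout (NAME, READY, STATUS, and so on); the function name is hypothetical.

```shell
# Print the name and status of every pod that is not Running or Completed.
# Reads the output of `oc get pods -o wide --no-headers` on standard input.
pods_not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

On the Ansible controller, as root, run oc get pods -o wide --no-headers | pods_not_running. Empty output means that no pods are waiting to be scheduled or started.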

Q. Trying to connect with the DataStage Designer client gives the error: “Failed to

authenticate the current user against the selected Service Tier.”

A. Please make sure that you are following the instructions in Accessing the DataStage

Windows Client or the README.txt file and that you’ve entered the right credentials. As

noted, you must select the dropdown button every time before you connect, even if the

project name has been pre-populated in the Project field.

Q. Disk performance is poor.

A. If using gp2 as the GlusterStorageType, you can redeploy with a larger value for the

GlusterStorageSize parameter. This allocates a greater IOPS value for the GlusterFS EBS

volumes.

Alternatively, you can choose io1 for GlusterStorageType and enter a specific value in the

GlusterStorageIops parameter. Note that io1 EBS volumes are priced significantly higher than

gp2. For more information, see Amazon EBS features.
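The effect of volume size on gp2 performance can be estimated with simple arithmetic. The sketch below assumes the published gp2 baseline of 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000 IOPS per volume. The function name is hypothetical, and you should confirm the current limits in the Amazon EBS documentation.

```shell
# Estimate the baseline IOPS of a single gp2 volume from its size in GiB.
gp2_baseline_iops() {
  local size_gib="$1" iops
  iops=$(( size_gib * 3 ))
  if [ "$iops" -lt 100 ]; then
    iops=100      # assumed gp2 minimum
  fi
  if [ "$iops" -gt 16000 ]; then
    iops=16000    # assumed gp2 maximum
  fi
  echo "$iops"
}
```

With the default GlusterStorageSize of 1500, gp2_baseline_iops 1500 prints 4500, which matches the minimum IOPS value stated for the GlusterStorageIops parameter.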

IBM support

IBM support is available to users of the InfoSphere DataStage application and InfoSphere

DataStage and QualityStage Designer.

Create a case

1. Log in to the Cloud Service Portal using your IBMid. If you don’t have an IBMid, please

sign up for one.

2. In Need more help, choose Create a case.

3. In What type of support do you need, select Technical.

4. In the Category drop-down menu, select Analytics.

5. Under Offering, enter dsoncloud.

6. Fill in the Subject and Description fields, and include as much information about the

issue as possible. State that you’re an AWS Quick Start user.

7. Submit the case, and note the case number.

Support forum

Look for answers in our support forum.

Send us feedback

To post feedback, submit feature ideas, or report bugs, use the Issues section of the

GitHub repository for this Quick Start. If you’d like to submit code, please review the Quick

Start Contributor’s Guide.

Additional resources

AWS resources

Getting Started Resource Center

AWS General Reference

AWS Glossary


AWS services

AWS CloudFormation

Amazon EBS

Amazon EC2

Amazon EFS

IAM

Amazon VPC

IBM InfoSphere DataStage documentation

InfoSphere Information Server documentation

Getting started with InfoSphere DataStage and InfoSphere QualityStage

IBM DataStage Flow Designer (web-based; thin client)

IBM Support home page

Topology for DataStage with Kubernetes

Other Quick Start reference deployments

AWS Quick Start home page

Document revisions

Date | Change | In sections

November 2019 | Updated and added various sections to reflect that this Quick Start is now based on top of the OpenShift on AWS Quick Start | Architecture; Launch the Quick Start; other changes throughout guide

August 2019 | Added information about manually transferring files to the Kubernetes cluster; added troubleshooting information about how to resolve a CREATE_FAILED error that occurs at the WaitCondition resource in the DataStageStack | Transferring files from your local computer to the cluster; Troubleshooting

May 2019 | Initial publication | -


© 2019, Amazon Web Services, Inc. or its affiliates, and IBM. All rights reserved.

Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings

and practices as of the date of issue of this document, which are subject to change without notice. Customers

are responsible for making their own independent assessment of the information in this document and any

use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether

express or implied. This document does not create any warranties, representations, contractual

commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities

and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of,

nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, Version 2.0 (the "License"). You

may not use this file except in compliance with the License. A copy of the License is located at

http://aws.amazon.com/apache2.0/ or in the "license" file accompanying this file. This code is distributed on

an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and limitations under the License.