Relocatable Application Bundles for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation. Application Bundles and sundry related things. IBM InfoSphere Streams Version 4.0. Howard Nasgaard, InfoSphere Streams Compiler Development. For questions about this presentation contact Howard Nasgaard [email protected]

Transcript of Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Page 1: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation

Application Bundles and sundry related things

IBM InfoSphere Streams Version 4.0

Howard Nasgaard

InfoSphere Streams Compiler Development

For questions about this presentation contact Howard Nasgaard

[email protected]

Page 2: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Page 3: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Agenda

Application bundles

Changes to the data directory

Application and operator migration

Page 4: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

High-Level Overview

Prior to Version 4.0, a Streams application consisted of many artifacts

– The output directory hierarchy

– Possibly an impl directory hierarchy

– Possibly some toolkits

– Artifacts had to be available on all execution hosts

– Basically mandated a shared file system

Application relocatability was limited

– All application artifacts would have to be moved together

– Toolkits could not be relocated

• Must exist on all hosts on which the application would execute

Version 4.0 introduces the application bundle

– A single file that contains all the artifacts necessary to run an application

– Same name as the ADL file, but with the .sab extension

• e.g., sample.Vwap.sab

– Job is submitted as before, but using the sab file instead of the ADL file*

Page 5: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Benefits of an Application Bundle

Application is fully relocatable

– Only the sab file is needed to submit the application

– It can live anywhere that streamtool can access it

No toolkits are required on execution hosts

– Toolkit not required once compilation is complete

At submission time the Streams runtime sends the bundle to each node that will execute the application

– No longer any need for a shared file system*

– No longer a need for any toolkits on the execution hosts

Standalone also runs from the application bundle

– ./bundle.sab (the bundle is self-executing if the system is correctly configured)

– java -jar bundle.sab

– output/bin/standalone (as in previous versions)

– Fully relocatable*

Page 6: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

How do I create an application bundle?

The sc compiler now automatically produces a bundle

– Pre-Version 4.0 applications cannot be run on Version 4.0. They must be recompiled.

You must re-index all toolkits prior to compiling your application

– The toolkit.xml file that is generated has changed format from previous versions

– spl-make-toolkit -i ….

– This is done as a side-effect of compilation for the application toolkit

You must then recompile your applications

– Applications may need some migration*

Page 7: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

How are application bundles executed?

The Streams runtime determines which hosts the application will execute on and sends the bundle to each one

The bundle is un-bundled in a fixed location on each host

– The root of this location is controlled by a domain property

• domain.applicationBundlesPath

– Each unique application has a unique, non-predictable location under the root

– Multiple invocations of the same application share the un-bundled application code

The PEs are started as they were for previous versions

Page 8: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

What goes into an application bundle?

Many artifacts from the application directory are included by default

– etc

– lib

– nl

– opt

– impl/bin

– impl/lib

– impl/nl

– impl/java/bin

– impl/java/lib

– Select parts of the output directory

Similar parts of each of the toolkit directories for the toolkits used by the application

– Relies on the compiler knowing what an application uses

Page 9: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Can other artifacts be included in the bundle?

Yes! The info.xml format is extended to allow specification of other artifacts

– Uses a <sabFiles> element

• With <include> sub-element to include artifacts

• With <exclude> sub-element to exclude artifacts

– Format: <include path="..." root="..."/> <exclude path="..."/>

• Path is a mandatory attribute and specifies a path RELATIVE to the selected root (by default, the root of the toolkit). It uses ant path syntax

• Root is optional and is one of: toolkitDir (the default), applicationDir, outputDir

<sabFiles>
  <include path="schemas/*"/>
</sabFiles>
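The include/exclude selection described above can be sketched in a few lines of Python. This is an illustration only, not the compiler's implementation: it handles a small subset of ant path syntax (`*`, `**`, `?`), the names `ant_to_regex` and `select_files` are hypothetical, and it assumes that an exclude pattern wins over an include pattern, which the slides do not state.

```python
import re


def ant_to_regex(pattern):
    """Translate a small subset of Ant path syntax into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directory levels
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")           # anything, directory separators included
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")        # anything inside one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")         # exactly one character of a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def select_files(paths, includes, excludes=()):
    """Keep paths matched by some include pattern and by no exclude pattern.

    Assumption: excludes take precedence over includes.
    """
    inc = [ant_to_regex(p) for p in includes]
    exc = [ant_to_regex(p) for p in excludes]
    return [
        p for p in paths
        if any(r.match(p) for r in inc) and not any(r.match(p) for r in exc)
    ]


# Hypothetical toolkit contents, relative to the toolkit root.
files = ["schemas/a.xsd", "schemas/old/b.xsd", "schemas/a.xsd.bak", "etc/app.cfg"]
print(select_files(files, ["schemas/*"], ["**/*.bak"]))  # prints ['schemas/a.xsd']
```

Note that `schemas/*` stops at the first directory level, so `schemas/old/b.xsd` is not selected; a pattern such as `schemas/**` would be needed to pick up nested files.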

Page 10: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

What is the purpose of the “root” attribute?

Tells the compiler the root path to prepend to the RELATIVE path specified in the path attribute

– toolkitDir: prepends the root path of the toolkit containing the toolkit.xml file

– applicationDir: prepends the application directory path regardless of which toolkit's info.xml specifies this

– outputDir: prepends the output directory path regardless of which toolkit's info.xml specifies this
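As a rough model of the rule above, the root attribute simply selects which directory is prepended to the relative path. The Python sketch below assumes nothing about the compiler internals; the function name `resolve_sab_path` and the directory layout are hypothetical.

```python
import posixpath


def resolve_sab_path(path, root="toolkitDir", *, toolkit_dir, application_dir, output_dir):
    """Prepend the directory selected by `root` to a relative `path`."""
    if posixpath.isabs(path):
        raise ValueError("path must be relative: " + path)
    bases = {
        "toolkitDir": toolkit_dir,          # depends on which toolkit holds the entry
        "applicationDir": application_dir,  # the same for every toolkit
        "outputDir": output_dir,            # the same for every toolkit
    }
    if root not in bases:
        raise ValueError("unknown root: " + root)
    return posixpath.join(bases[root], path)


# Hypothetical directory layout, for illustration only.
dirs = dict(
    toolkit_dir="/opt/toolkits/com.example.parser",
    application_dir="/home/dev/vwap",
    output_dir="/home/dev/vwap/output",
)
print(resolve_sab_path("schemas/*", **dirs))
# prints /opt/toolkits/com.example.parser/schemas/*
print(resolve_sab_path("schemas/*", "applicationDir", **dirs))
# prints /home/dev/vwap/schemas/*
```

The same `path` value therefore points at different files depending on `root`, which is the distinction the two use cases turn on.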

This is a bit confusing...let's look at some use cases:

Page 11: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Use case 1: applicationDir

I am developing an operator in some arbitrary toolkit that will be used by some application

My operator needs to be given a schema file that it will load at run-time. The user of my toolkit must create and supply the schema file (I can't know the contents when I write my operator).

As part of the documentation of my operator I describe the syntax of the schema file and specify that the files should be placed in the schemas subdirectory of the application directory

As the operator developer I need a way of forcing any files in the <appDir>/schemas directory into the application bundle

Key point: an arbitrary toolkit can force files into an application bundle so the user of that toolkit doesn't have to

<sabFiles>
  <include path="schemas/*" root="applicationDir"/>
</sabFiles>

Page 12: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Use case 2: toolkitDir

I am developing an operator in some arbitrary toolkit that will be used by some application

My toolkit contains some files that are supplied with the toolkit and will be read by my operator at runtime.

I want these files to live in the schemas subdirectory in my toolkit, which is not one of the subdirectories included in the bundle by default

Key point: Any toolkit can include files from its own directory hierarchy

<sabFiles>
  <include path="schemas/*" root="toolkitDir"/>
</sabFiles>

Page 13: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

How do I trouble-shoot application bundles?

If your application fails to run and the logs suggest that a file or library is missing:

Use the spl-app-info utility to look into the bundle

– Lists toolkits and files in the bundle

– Can be used to manually un-bundle an application

• Needed to run gdb on standalone applications

– Gives the bundle's build ID and build date

– Tells you if the bundle was compiled for standalone or distributed execution

– Provides this information in machine-readable (XML) format
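The spl-app-info utility is the supported way to inspect a bundle. As a complementary illustration of the "is the file actually in there?" check, the Python sketch below lists the entries of an archive and reports expected entries that are absent. It rests on an assumption: that a .sab file can be read as a zip-compatible archive, which the `java -jar bundle.sab` invocation suggests but the slides do not state. The function names and entry names are hypothetical, and the internal layout of a real bundle is not described here.

```python
import zipfile


def list_bundle_entries(bundle_path):
    """Return the sorted entry names of a zip-compatible archive."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return sorted(bundle.namelist())


def missing_entries(bundle_path, expected):
    """Return the expected entry names that the archive does not contain."""
    present = set(list_bundle_entries(bundle_path))
    return [name for name in expected if name not in present]
```

A call such as `missing_entries("sample.Vwap.sab", ["etc/app.cfg"])` would return an empty list if the entry is present; the entry name here is a placeholder, not a documented bundle path.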

Page 14: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

The data directory

In previous versions of Streams the data directory was (by default) part of the application directory

– The compiler created it if it didn't already exist

In Version 4.0 the data directory is NOT part of the application bundle

– Application bundles are shared across multiple executions of the same app

– The application should not write into the application directory

• It is logically not accessible, so anything written there is inaccessible from outside the Streams environment

• There is no file sharing between hosts

– The compiler no longer creates a data directory

If your application needs a data directory

– You must create the data directory and populate it prior to submitting your application to run

– You must specify where the data directory is

• at compile time using --data-directory, or

• at submission-time using streamtool …. -C data-directory ….

Page 15: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

The data directory in a non-shared file system

There is only one data directory for an application

Operators that write to, or read from, the data directory must be located on a host that has the data directory

Operators that share information via files in the data directory must be co-located on the same host

Page 16: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

The data directory as CWD

Prior to Version 4.0 the CWD was changed to the data directory at startup

– Operators and other code could assume that the CWD was the data directory when dealing with relative paths

In Version 4.0 the data directory is NOT the CWD

– CWD is a temporary directory created by the Streams runtime

– Unique directory per application instance

– Root location controlled by the domain.applicationCurrentWorkingPath domain property

This means application migration is likely to be needed if:

– Your operators support relative file names

– You have SPL code that uses the spl::file functions with relative file names
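Why relative names break can be shown outside Streams entirely. The Python sketch below uses two temporary directories as stand-ins for the data directory and the runtime-created working directory (both hypothetical here): a bare relative name resolves against the CWD and misses the file, while a name built from the data directory finds it.

```python
import os
import tempfile


def lookup_results(file_name="myFile"):
    """Report whether file_name is found via the CWD and via the data directory."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as data_dir, \
            tempfile.TemporaryDirectory() as work_dir:
        # The data file lives in the data directory only.
        with open(os.path.join(data_dir, file_name), "w") as handle:
            handle.write("payload\n")
        os.chdir(work_dir)  # stand-in for the runtime-created working directory
        try:
            found_via_cwd = os.path.exists(file_name)
            found_via_data_dir = os.path.exists(os.path.join(data_dir, file_name))
        finally:
            os.chdir(original_cwd)  # restore before the temp dirs are removed
    return found_via_cwd, found_via_data_dir


print(lookup_results())  # prints (False, True)
```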

Page 17: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Data files vs Configuration Files

Version 4.0 distinguishes between data and configuration files

– Data files are processed by the application and would be unique to each instance of an application

• Are not part of the application bundle

• Are expected to be found in (or relative to) the data directory

– Configuration files are common across all instances of a given application

• Should be included in the application bundle

– Operators that support params that accept relative path names should document how relative paths are treated

If the application developer must supply the configuration files

– The operator should specify in what directory they are expected to be found. For example: etc or schema

– If the configuration files are in non-default locations, then the sabFiles section of the info.xml file should specify that they be included in the app bundle so the user of the operator doesn't have to do the inclusion

Page 18: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Application Migration

All SPL code that assumes the CWD is the data directory MUST change

– SPL code that uses relative paths

• open("myFile"); → open(dataDirectory() + "/myFile");

New compiler intrinsic functions are available to access toolkit and application directories

– getThisToolkitDir: returns the path to the toolkit containing the SPL code that calls this function

• For example: open(getThisToolkitDir() + "/etc/myFile.cfg");

– getApplicationDir: returns the path to the application directory

• For example: open(getApplicationDir() + "/etc/myFile.cfg");

Page 19: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Operator Migration

Operators that accept relative paths

– param file : "myFile"

– Should document where relative paths are rooted (data dir, application dir)

• For example: The configFile param accepts a relative file name. Relative file names are rooted in the application directory. The config files are expected to be in the etc directory.

• A correct relative param specification would be: param configFile : "etc/myFile.cfg";

• An alternative is to say: Relative file names are rooted in the etc subdirectory of the application directory

• Then the param specification could look like: param configFile : "myFile.cfg";

– Build an absolute file name by prepending the appropriate root path to the given relative path

– APIs are available in C++ and Java to get the data directory, application directory, and toolkit directory (of the toolkit containing the operator)
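The "prepend the appropriate root" step can be sketched as follows. This is plain Python, not the Streams C++ or Java operator API: in a real operator the root would come from those APIs, whereas here it is passed in as a string, and the function name `resolve_param_path` and the application directory are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_param_path(value, root_dir):
    """Return value unchanged if it is absolute, otherwise root it at root_dir."""
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(root_dir) / path)


app_dir = "/home/dev/vwap"  # hypothetical application directory

# Convention 1: relative names are rooted in the application directory.
print(resolve_param_path("etc/myFile.cfg", app_dir))
# Convention 2: relative names are rooted in the etc subdirectory.
print(resolve_param_path("myFile.cfg", app_dir + "/etc"))
# Absolute names pass through untouched.
print(resolve_param_path("/shared/myFile.cfg", app_dir))
```

Both conventions end up at the same file, `/home/dev/vwap/etc/myFile.cfg`; what matters is that the operator documents which one it uses so that the param value and the root agree.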

Page 20: Relocatable Application Bundles for IBM InfoSphere Streams V4.0

Questions?