Relocatable Application Bundles for IBM InfoSphere Streams V4.0
© 2015 IBM Corporation
Application Bundles and sundry related things
IBM InfoSphere Streams Version 4.0
Howard Nasgaard
InfoSphere Streams Compiler Development
For questions about this presentation contact Howard Nasgaard
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
Application bundles
Changes to the data directory
Application and operator migration
High-Level Overview
Prior to Version 4.0, a Streams application consisted of many artifacts
– The output directory hierarchy
– Possibly an impl directory hierarchy
– Possibly some toolkits
– All of these artifacts had to be available on all execution hosts
– In practice, this mandated a shared file system
Application relocatability was limited
– All application artifacts would have to be moved together
– Toolkits could not be relocated
• Must exist on all hosts on which the application would execute
Version 4.0 introduces the application bundle
– A single file that contains all the artifacts necessary to run an application
– Same name as the ADL file, but with the .sab extension
• For example: sample.Vwap.sab
– Job is submitted as before, but using the sab file instead of the ADL file*
Benefits of an Application Bundle
Application is fully relocatable
– Only the sab file is needed to submit the application
– It can reside anywhere that streamtool can access
No toolkits are required on execution hosts
– Toolkits are not required once compilation is complete
At submission time, the Streams runtime sends the bundle to each host that
will execute the application
– No longer any need for a shared file system*
– No longer any need for toolkits on the execution hosts
Standalone also runs from the application bundle
– ./bundle.sab (the bundle is self-executing if the system is correctly configured)
– java -jar bundle.sab
– output/bin/standalone (as in previous versions)
– Fully relocatable*
How do I create an application bundle?
The sc compiler now automatically produces a bundle
– Applications built with versions earlier than 4.0 cannot run on Version 4.0; they must be
recompiled.
You must re-index all toolkits before compiling your application
– The format of the generated toolkit.xml file has changed from previous versions
– spl-make-toolkit -i ….
– This is done as a side effect of compilation for the application toolkit
You must then recompile your applications
– Applications may need some migration*
How are application bundles executed?
The Streams runtime determines which hosts the application will
execute on and sends the bundle to each one
The bundle is un-bundled in a fixed location on each host
– The root of this location is controlled by a domain property
• domain.applicationBundlesPath
– Each unique application has a unique, non-predictable location under the root
– Multiple invocations of the same application share the un-bundled application
code
The PEs are started as they were for previous versions
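The sharing rule above can be modelled in a few lines of Python. This is a hypothetical sketch, not the Streams runtime's code: the `BundleCache` class, keying on an application identity such as a build ID, and the random directory names are all assumptions. Only the root (set by `domain.applicationBundlesPath`) and the "one shared, non-predictable location per unique application" behaviour come from the slide.

```python
import secrets
from pathlib import PurePosixPath


class BundleCache:
    """Hypothetical model of where one host un-bundles applications.

    One location per unique application, chosen unpredictably under a root
    directory (in Streams, the root comes from domain.applicationBundlesPath).
    """

    def __init__(self, bundles_root):
        self.root = PurePosixPath(bundles_root)
        self._locations = {}  # application identity -> un-bundled directory

    def location_for(self, app_id):
        # Assumption: app_id identifies a unique application (e.g. a build ID).
        if app_id not in self._locations:
            # A random name stands in for the "non-predictable" location.
            self._locations[app_id] = self.root / secrets.token_hex(8)
        # Later submissions of the same application reuse the same directory.
        return self._locations[app_id]
```

Asking twice for the same application returns the same directory, so both jobs share one copy of the un-bundled code; a different application gets its own directory under the same root.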
What goes into an application bundle?
Many artifacts from the application directory are included by default
– etc
– lib
– nl
– opt
– impl/bin
– impl/lib
– impl/nl
– impl/java/bin
– impl/java/lib
– Select parts of the output directory
Similar parts of the directory of each toolkit used by the application
– Relies on the compiler knowing what an application uses
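The default selection can be pictured as a prefix filter over paths relative to the application (or toolkit) directory. In this Python sketch the directory list is taken from the slide, while the function name is hypothetical. The "select parts of the output directory" rule is omitted because the slide does not say which parts are taken.

```python
# Default directories listed on the slide; output-directory selection is omitted.
DEFAULT_BUNDLE_DIRS = (
    "etc", "lib", "nl", "opt",
    "impl/bin", "impl/lib", "impl/nl",
    "impl/java/bin", "impl/java/lib",
)


def default_bundle_files(rel_paths):
    """Return the relative paths that fall under one of the default directories."""

    def is_under(path, directory):
        # The trailing '/' check keeps e.g. "libextra/x.so" out of "lib".
        return path == directory or path.startswith(directory + "/")

    return sorted(
        p for p in rel_paths if any(is_under(p, d) for d in DEFAULT_BUNDLE_DIRS)
    )
```

For example, given `etc/app.cfg`, `impl/lib/libop.so`, `impl/src/op.cpp`, `schemas/in.xsd` and `data/input.csv`, only the first two are selected. Files under `schemas` need the `<sabFiles>` mechanism, and `data` stays out of the bundle, which matches the data-directory rules later in this deck.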
Can other artifacts be included in the bundle?
Yes! The info.xml format is extended to allow specification of other
artifacts
– Uses a <sabFiles> element
• With an <include> sub-element to include artifacts
• With an <exclude> sub-element to exclude artifacts
– Format: <include path="..." root="..."/> <exclude path="..."/>
• path is a mandatory attribute that specifies a path RELATIVE to the directory selected by root (by default, the root of the toolkit)
Uses Ant path syntax
• root is optional and is one of: toolkitDir (the default)
applicationDir
outputDir
<sabFiles>
<include path="schemas/*"/>
</sabFiles>
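To make the include/exclude semantics concrete, here is a Python sketch of Ant-style matching: `*` and `?` match within a single path segment and `**` matches any number of directories. It assumes a file is bundled when it matches at least one include and no exclude. The function names are hypothetical and this is not the sc compiler's implementation.

```python
from fnmatch import fnmatchcase


def _match_segments(pattern, path):
    """Match lists of path segments; '**' may absorb zero or more segments."""
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        return any(_match_segments(rest, path[i:]) for i in range(len(path) + 1))
    # '*' and '?' cannot cross a '/' because the path is already split into segments.
    # Note: fnmatchcase also honours [...] classes, which Ant patterns do not define.
    return bool(path) and fnmatchcase(path[0], head) and _match_segments(rest, path[1:])


def ant_match(pattern, rel_path):
    """Ant-style match of a '/'-separated relative path."""
    if pattern.endswith("/"):
        pattern += "**"  # Ant convention: a trailing '/' means everything below it
    return _match_segments(pattern.split("/"), rel_path.split("/"))


def select_sab_files(rel_paths, includes, excludes=()):
    """Keep paths matched by some include pattern and by no exclude pattern."""
    return sorted(
        p for p in rel_paths
        if any(ant_match(i, p) for i in includes)
        and not any(ant_match(e, p) for e in excludes)
    )
```

With the files `schemas/a.xsd`, `schemas/old/b.xsd` and `etc/app.cfg`, the pattern `schemas/*` selects only `schemas/a.xsd`, because `*` stops at the directory boundary. `schemas/**` selects both schema files, and adding the exclude `**/old/**` drops `schemas/old/b.xsd` again.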
What is the purpose of the “root” attribute?
Tells the compiler the root path to prepend to the RELATIVE path
specified in the path attribute
– toolkitDir: prepends the root path of the toolkit whose info.xml contains the entry
– applicationDir: prepends the application directory path, regardless of which
toolkit's info.xml specifies it
– outputDir: prepends the output directory path, regardless of which toolkit's
info.xml specifies it
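The three roots can be summarized as a lookup followed by a path join. In this Python sketch the function name, keyword arguments and example directories are hypothetical; only the three root names and the default of `toolkitDir` come from the slides.

```python
from pathlib import PurePosixPath


def resolve_sab_root(rel_path, root="toolkitDir", *, toolkit_dir, application_dir, output_dir):
    """Prepend the directory selected by the root attribute to a relative path.

    toolkit_dir is the toolkit whose info.xml contains the <include>; the other
    two roots are the same no matter which toolkit declares the entry.
    """
    roots = {
        "toolkitDir": toolkit_dir,
        "applicationDir": application_dir,
        "outputDir": output_dir,
    }
    if root not in roots:
        raise ValueError(f"unknown root {root!r}")
    return str(PurePosixPath(roots[root]) / rel_path)
```

For an operator toolkit at `/opt/toolkits/com.acme.ops` used by an application in `/home/dev/MyApp`, `schemas/*` with the default root resolves to `/opt/toolkits/com.acme.ops/schemas/*`, while the same path with `applicationDir` resolves to `/home/dev/MyApp/schemas/*`.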
This is a bit confusing, so let's look at some use cases:
Use case 1: applicationDir
I am developing an operator in some arbitrary toolkit that will be used
by some application
My operator needs to be given a schema file that it will
load at runtime. The user of my toolkit must create and supply the
schema file (I can't know its contents when I write my operator).
As part of the documentation of my operator I describe the syntax of
the schema file and specify that the files should be placed in the
schemas subdirectory of the application directory
As the operator developer I need a way of forcing any files in the
<appDir>/schemas directory into the application bundle
Key point: an arbitrary toolkit can force files into an application
bundle so the user of that toolkit doesn't have to
<sabFiles>
<include path="schemas/*" root="applicationDir"/>
</sabFiles>
Use case 2: toolkitDir
I am developing an operator in some arbitrary toolkit that will be used
by some application
My toolkit contains some files that are supplied with the toolkit and
will be read by my operator at runtime.
I want these files to live in the schemas subdirectory of my toolkit,
which is not one of the subdirectories included in the bundle by default
Key point: Any toolkit can include files from its own directory
hierarchy
<sabFiles>
<include path="schemas/*" root="toolkitDir"/>
</sabFiles>
How do I trouble-shoot application bundles?
If your application fails to run and the logs suggest that a file or
library is missing:
Use the spl-app-info utility to look into the bundle
– Lists toolkits and files in the bundle
– Can be used to manually un-bundle an application
• Needed to run gdb on standalone applications
– Gives the bundle's build ID and build date
– Tells you if the bundle was compiled for standalone or distributed execution
– Provides this information in machine-readable (XML) format
The data directory
In previous versions of Streams the data directory was (by default)
part of the application directory
– The compiler created it if it didn't already exist
In Version 4.0 the data directory is NOT part of the application bundle
– Application bundles are shared across multiple executions of the same app
– The application should not write into the application directory
• It is logically not accessible, so anything written there is inaccessible from outside the
Streams environment
• There is no file sharing between hosts
– The compiler no longer creates a data directory
If your application needs a data directory
– You must create the data directory and populate it before submitting your
application to run
– You must specify where the data directory is
• at compile time using --data-directory, or
• at submission time using streamtool …. -C data-directory ….
The data directory in a non-shared file system
There is only one data directory for an application
Operators that write to, or read from, the data directory must be
located on a host that has the data directory
Operators that share information via files in the data directory must
be co-located on the same host
The data directory as CWD
Prior to Version 4.0 the CWD was changed to the data directory at
startup
– Operators and other code could assume that the CWD was the data directory
when dealing with relative paths
In Version 4.0 the data directory is NOT the CWD
– The CWD is a temporary directory created by the Streams runtime
– Unique directory per application instance
– Root location is controlled by the domain.applicationCurrentWorkingPath domain
property
This means application migration is likely needed if:
– Your operators support relative file names
– You have SPL code that uses the spl::file functions with relative file names
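The effect of the CWD change can be reproduced outside Streams. The Python sketch below uses two scratch directories, one standing in for the data directory and one for the runtime-created CWD; both are assumptions for the demo, since real operators are written in SPL, C++ or Java. It shows that a bare relative name lands in the CWD, while a name built from the data directory lands where pre-4.0 code expected it.

```python
import os
import tempfile
from pathlib import Path


def write_file(name, text):
    """Write text to name and return the absolute path the OS actually used."""
    with open(name, "w") as f:
        f.write(text)
    return Path(name).resolve()


def where_files_land():
    with tempfile.TemporaryDirectory() as data_dir, tempfile.TemporaryDirectory() as cwd:
        previous = os.getcwd()
        os.chdir(cwd)  # stand-in for the per-instance working directory
        try:
            relative = write_file("myFile", "x")  # pre-4.0 style: relies on the CWD
            rooted = write_file(os.path.join(data_dir, "myFile"), "x")  # explicit root
        finally:
            os.chdir(previous)  # restore before the scratch directories are removed
        return {
            "relative_in_cwd": relative.parent == Path(cwd).resolve(),
            "rooted_in_data_dir": rooted.parent == Path(data_dir).resolve(),
        }
```

`where_files_land()` reports both flags as `True`: the unqualified name ended up in the working directory rather than in the data directory, which is exactly the migration hazard listed above.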
Data files vs Configuration Files
Version 4.0 distinguishes between data and configuration files
– Data files are processed by the application and would be unique to each
instance of an application
• Are not part of the application bundle
• Are expected to be found in (or relative to) the data directory
– Configuration files are common across all instances of a given application
• Should be included in the application bundle
– Operators that support params that accept relative path names should
document how relative paths are treated
If the application developer must supply the configuration files
– The operator should specify the directory in which they are expected to be found, for
example etc or schemas
– If the configuration files are in non-default locations, then the sabFiles section
of the info.xml file should specify that they be included in the app bundle so the user
of the operator doesn't have to do the inclusion
Application Migration
All SPL code that assumes the CWD is the data directory MUST
change
– SPL code that uses relative paths
• open("myFile"); → open(dataDirectory() + "/myFile");
New compiler intrinsic functions are available to access the toolkit and application
directories
– getThisToolkitDir: returns the path to the toolkit containing the SPL code that calls
this function
• For example: open(getThisToolkitDir() + "/etc/myFile.cfg");
– getApplicationDir: returns the path to the application directory
• For example: open(getApplicationDir() + "/etc/myFile.cfg");
Operator Migration
Operators that accept relative paths
– param file : "myFile";
– Should document where relative paths are rooted (data dir, application dir)
• For example:
The configFile param accepts a relative file name
Relative file names are rooted in the application directory
The config files are expected to be in the etc directory
• A correct relative param specification would be param configFile : "etc/myFile.cfg";
• An alternative is to document that relative file names are rooted in the etc subdirectory of the application directory
• Then the param specification could look like: param configFile : "myFile.cfg";
– Build an absolute file name by prepending the appropriate root path to the
given relative path
– APIs are available in C++ and Java to get the data directory, application
directory, and toolkit directory (of the toolkit containing the operator)
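The "build an absolute file name" step is sketched below in Python, because the C++ and Java API names are not given here. The root directory is passed in as a plain argument (in a real operator it would come from those APIs), and `subdir` models the alternative convention in which relative names are rooted in the etc subdirectory. The function and parameter names are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_param_path(value, root, subdir=None):
    """Turn a file-name parameter into an absolute path.

    Absolute values are returned unchanged. Relative values are rooted at
    `root` (e.g. the application directory), optionally below `subdir`.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)
    base = PurePosixPath(root)
    if subdir:
        base = base / subdir
    return str(base / path)
```

With an application directory of `/home/dev/MyApp`, both `param configFile : "etc/myFile.cfg";` (rooted in the application directory) and `param configFile : "myFile.cfg";` (rooted in its etc subdirectory) resolve to `/home/dev/MyApp/etc/myFile.cfg`. Either convention works as long as the operator documents it; the point is that the result no longer depends on the CWD.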
Questions?