An Overview of IBM InfoSphere Streams V4.0

© 2015 IBM Corporation InfoSphere Streams V4 update Mike Spicer STSM, Lead Architect InfoSphere Streams For questions about this presentation contact Mike Spicer via [email protected]

Transcript of An Overview of IBM InfoSphere Streams V4.0

Page 1: An Overview of IBM InfoSphere Streams V4.0

© 2015 IBM Corporation

InfoSphere Streams V4 update

Mike Spicer

STSM, Lead Architect, InfoSphere Streams

For questions about this presentation contact Mike Spicer via [email protected]

Page 2: An Overview of IBM InfoSphere Streams V4.0

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Page 3: An Overview of IBM InfoSphere Streams V4.0

Streams V4 Update

A Major Release

–Next generation Architecture

–Automated System High Availability

–Application Resiliency

–Streams for Excel

–Toolkit Enhancements

Released March 2015

Page 4: An Overview of IBM InfoSphere Streams V4.0

Automated System High Availability

“Without specialized HA skills, an administrator can quickly and easily configure Streams to be resilient and use a single console to manage multiple instances with common users and hosts.”

New next-generation architecture

– Simpler Setup & Administration

– More Secure

– More Resilient

– More Automatic

– More Dynamic

– New JMX API

Page 5: An Overview of IBM InfoSphere Streams V4.0

Automated System High Availability

Simpler Setup & Administration

– Reduce dependencies (Shared FS, DB2, SSH) and support versioning

– Multi instance management with new Domain concept, single console, and interactive streamtool

– Simpler Management Service Configuration, fully automatic or controlled using tags

Comprehensive Monitoring and Management API

– JMX API provides a secure interface for full programmatic management and monitoring of Streams

More Secure

– Removed dependency on SSH and OS users

– Authentication and authorization checks for all API and tooling requests

– Improved LDAP support (Microsoft Active Directory and multi-part lookup)

– Improved security model (Roles and Job Groups)

– Improved Audit Log support

More Resilient

– Recovery always on (ZooKeeper), support redundant services to remove single point of failure

More Automatic

– Automatic failover & restart of services, automatic recovery of host failures

– Automatic notification of system changes and service relocation for resource changes

More Dynamic

– Support more dynamic configuration changes

Page 6: An Overview of IBM InfoSphere Streams V4.0

New Streams Domain

Moving beyond an instance-centric model

[Diagram labels: Tooling; a Domain with Domain Services and a Domain Metadata Catalog; two Instances, each with Instance Services and an Instance Metadata Catalog; four Hosts, each with a Host Controller and two PECs.]

Page 7: An Overview of IBM InfoSphere Streams V4.0

New Streams Domain

A container for instances which provides a single point for configuring and managing common resources, security and instances

– A domain can contain 0 or more instances

– A single management console for a domain and all of its instances

The domain is responsible for the following:

– Configuration : Global configuration for the Streams domain and defaults for new instances.

– Instance management : Allow users to configure and manage instances.

– Resources : Allow users to configure and manage the host resources available for instances in the domain.

– Security : Users are configured and managed by the domain. The domain is responsible for authenticating users and checking that they are authorized to perform actions against the domain and instances.

– Public API : Provide JMX and REST APIs to manage and monitor the domain and instances.

Page 8: An Overview of IBM InfoSphere Streams V4.0

Scenario 1: Management Host Failure

Services are running with an HA Count of 3

A Host failure is detected

If a Service on the Host was the leader, a standby takes over

A replacement service is started

Another Host becomes available and is tagged for management services

The Services are load balanced across the management hosts

[Diagram labels: Resources A, B, C and D, each tagged “Management”, running leader and standby copies of Service A.]
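The failover sequence in Scenario 1 can be sketched as a small simulation. This is only an illustration of the leader/standby idea, not Streams code: the `ServiceGroup` class and its methods are hypothetical names, and the sketch assumes at most one replica of a service per host, which the slide does not state. It also does not model load balancing beyond filling the free host.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceGroup:
    """Replicas of one management service (hypothetical model).

    `replicas` holds host names; the first entry is treated as the leader,
    the remaining entries are standbys.
    """
    name: str
    ha_count: int
    replicas: list = field(default_factory=list)

    @property
    def leader(self):
        return self.replicas[0] if self.replicas else None

    def place(self, management_hosts):
        """Start replicas on unused management hosts until ha_count is reached."""
        for host in management_hosts:
            if len(self.replicas) >= self.ha_count:
                break
            if host not in self.replicas:
                self.replicas.append(host)

    def host_failed(self, host, management_hosts):
        """Drop the failed replica; the first surviving standby becomes leader."""
        if host in self.replicas:
            self.replicas.remove(host)
        survivors = [h for h in management_hosts if h != host]
        self.place(survivors)  # starts a replacement only if a spare host exists
        return survivors


hosts = ["A", "B", "C"]
svc = ServiceGroup("Service A", ha_count=3)
svc.place(hosts)
leader_before = svc.leader

hosts = svc.host_failed("A", hosts)   # the leader's host fails
leader_after = svc.leader

hosts.append("D")                     # a new host is tagged for management
svc.place(hosts)
```

In this run the leader moves from host A to host B when A fails, and the HA count of 3 is restored once host D is tagged.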

Page 9: An Overview of IBM InfoSphere Streams V4.0

Scenario 2: Application Host Failure

An application's PEs are running across several Hosts

A Host failure is detected

PEs are started on alternative application Hosts

Streams are reconnected

[Diagram labels: Resources A, B and C, each tagged “Application”; Source, Op 1, Op 2, Sink 1 and Sink 2.]
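The relocation step in Scenario 2 can be sketched as follows. The function name `relocate_pes` and the "least-loaded surviving host" policy are assumptions made for illustration; the slide does not say how Streams chooses the alternative host, and reconnecting the streams is not modeled here.

```python
def relocate_pes(placement, failed_host, application_hosts):
    """Return a new PE -> host mapping after failed_host is lost.

    PEs on surviving hosts stay where they are. Each displaced PE is moved to
    the surviving host that currently runs the fewest PEs (ties broken by name).
    Every host in `placement` is assumed to appear in `application_hosts`.
    """
    survivors = [h for h in application_hosts if h != failed_host]
    if not survivors:
        raise RuntimeError("no application hosts left")

    new_placement = {pe: h for pe, h in placement.items() if h != failed_host}
    load = {h: 0 for h in survivors}
    for h in new_placement.values():
        load[h] += 1

    for pe, h in placement.items():
        if h == failed_host:
            target = min(survivors, key=lambda s: (load[s], s))
            new_placement[pe] = target
            load[target] += 1
    return new_placement


placement = {"Source": "A", "Op 1": "B", "Op 2": "B", "Sink 1": "C", "Sink 2": "C"}
after = relocate_pes(placement, "B", ["A", "B", "C"])
```

Here both PEs that ran on host B end up on host A, because A is the less loaded survivor when each one is placed.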

Page 10: An Overview of IBM InfoSphere Streams V4.0

All New Admin Console

A single console for a domain

A summary of system health is always visible

Dashboard widgets can be flipped for graphical & textual views

Tree based view similar to Streams Studio system Explorer

Context based actions

Page 11: An Overview of IBM InfoSphere Streams V4.0

Application Resiliency

“With a simple annotation and HA compliant operators, a developer can guarantee all data is processed.”

Consistent State – A point in time where all tuples for all streams in a consistent region have been fully processed by the operators in the consistent region.

[Diagram labels: op1, op2, op3.]

Page 12: An Overview of IBM InfoSphere Streams V4.0

Establishing A Consistent State

Source initiates consistency with the Controller

Source drains processing and checkpoints state

Operators in the region drain processing and checkpoint state

Controller confirms a consistent state has been established

Processing resumes

[Diagram labels: Source, Op 1, Op 2, Sink, Controller; “Initiate Consistent State”; “Region is Consistent”.]

Page 13: An Overview of IBM InfoSphere Streams V4.0

Recovering To A Consistent State On Failure

The Controller detects a failure and initiates a reset

Source resets state from the last consistent state checkpoint

Failed PEs are restarted and Streams are reconnected

Operators in the region reset state from the last consistent state checkpoint

Controller confirms recovering to a consistent state

Processing resumes with the source replaying tuples since the last consistent state

[Diagram labels: Source, Op 1, Op 2, Sink, Controller; “Reset Region”; “Region is Consistent”.]
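The checkpoint, reset and replay cycle can be illustrated with a single-threaded simulation. This is a rough sketch, not the Streams implementation: `ReplayableSource`, `RunningSum` and `run_region` are hypothetical names, the controller is reduced to a loop, and because processing is synchronous the region is trivially drained whenever a checkpoint is taken.

```python
class ReplayableSource:
    """Source that can rewind to a checkpointed position and replay from it."""
    def __init__(self, data):
        self.data = list(data)
        self.position = 0

    def has_next(self):
        return self.position < len(self.data)

    def next(self):
        item = self.data[self.position]
        self.position += 1
        return item

    def checkpoint(self):
        return self.position

    def reset(self, saved):
        self.position = saved


class RunningSum:
    """Stateful operator: keeps a running total of the tuples it has seen."""
    def __init__(self):
        self.total = 0

    def process(self, item):
        self.total += item

    def checkpoint(self):
        return self.total

    def reset(self, saved):
        self.total = saved


def run_region(source, operator, period, fail_at=None):
    """Process every tuple, establishing a consistent state every `period` tuples.

    If fail_at is given, one failure is simulated right after the source reaches
    that position: source and operator are reset to the last consistent state
    and the source replays the tuples processed since then.
    """
    saved = (source.checkpoint(), operator.checkpoint())
    resets = 0
    while source.has_next():
        operator.process(source.next())
        if fail_at is not None and source.position == fail_at:
            fail_at = None              # fail only once
            source.reset(saved[0])      # rewind the source
            operator.reset(saved[1])    # restore operator state
            resets += 1
            continue
        if source.position % period == 0:
            saved = (source.checkpoint(), operator.checkpoint())
    return operator.total, resets


total, resets = run_region(ReplayableSource(range(1, 11)), RunningSum(),
                           period=3, fail_at=5)
```

In this run the failure happens after the fifth tuple, the region falls back to the checkpoint taken after the third, and tuples four and five are processed a second time. Because the operator's state was reset as well, the final total still matches a failure-free run.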

Page 14: An Overview of IBM InfoSphere Streams V4.0

Consistent Region Application Semantics

Streams guarantees that a consistent region will process all data at least once

– Operators at the start of a consistent region must be able to replay data

• Can be achieved using a new replay operator which we provide

Exactly once semantics can be achieved when all operators in a consistent region have at least one of the following characteristics:

– Can reset their state and the state of any external system they interact with to the last consistent state on a reset marker

– Can detect duplicate tuples being replayed since the last consistent state and do not process them again

– Are idempotent (tuples can be processed multiple times without changing the result beyond the initial processing of the tuple)
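The difference between at-least-once and exactly-once behaviour shows up at operators that write to an external system. The sketch below compares a plain appending sink with two of the strategies listed above, duplicate detection and idempotent writes. All class and function names are hypothetical, and the duplicate detection assumes tuples carry a monotonically increasing sequence number and arrive in order, which the slide does not specify.

```python
class AppendSink:
    """Not idempotent: every delivery appends, so replays create duplicates."""
    def __init__(self):
        self.rows = []

    def write(self, seq, value):
        self.rows.append(value)


class DedupSink(AppendSink):
    """Detects replayed tuples by sequence number and skips them."""
    def __init__(self):
        super().__init__()
        self.last_seq = -1

    def write(self, seq, value):
        if seq <= self.last_seq:
            return                  # already written before the reset
        self.last_seq = seq
        self.rows.append(value)


class UpsertSink:
    """Idempotent: writing the same key and value again changes nothing."""
    def __init__(self):
        self.table = {}

    def write(self, seq, value):
        self.table[seq] = value


def deliver_with_replay(sink, tuples, replay_from):
    """Deliver all tuples, then replay those from index replay_from onward,
    as a source would after a reset to an earlier consistent state."""
    for seq, value in enumerate(tuples):
        sink.write(seq, value)
    for seq in range(replay_from, len(tuples)):
        sink.write(seq, tuples[seq])


data = ["a", "b", "c", "d"]
plain, dedup, upsert = AppendSink(), DedupSink(), UpsertSink()
for sink in (plain, dedup, upsert):
    deliver_with_replay(sink, data, replay_from=2)
```

After the replay the plain sink holds six rows, with "c" and "d" twice, while the other two sinks hold each value exactly once.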

Page 15: An Overview of IBM InfoSphere Streams V4.0

IBM InfoSphere Streams for Microsoft Excel

“An Excel user can quickly and easily identify and access streaming data, to enable analysis and visualization on continually updating data with the full power of Excel”

Page 16: An Overview of IBM InfoSphere Streams V4.0

Streams for Excel

An Excel add-in using Excel Real Time Data (RTD)

A Stream is made available to Excel using a simple annotation

@view(name = "VMStatData", port = Stats, sampleSize = 50, bufferSize = 100, description = "Memory related statistics", activateOption = automatic)

Streams for Excel shows the Streams the user has authorization for

–Name, description, attributes and properties of the Stream

–Search and Favorites to locate streams of interest

Streams are dragged onto spreadsheet

–Entire Stream or individual attributes

–Data is continually updated, and can be paused

Full Excel functionality on the data

–Charts, Formulas, Cut & Paste

Spreadsheets can be saved and sent to others

–Stream data will continue updating when the spreadsheet reconnects to Streams

Page 17: An Overview of IBM InfoSphere Streams V4.0

Toolkit Enhancements

Timeseries Toolkit

– New Operators & Functions

• AnomalyDetector: Online detection of anomalous patterns

• KmeansClustering: Builds a K-Means cluster model and scores incoming data against it

• DSPFilterFinite: filters a “fixed-length” time series

• CrossCorrelateMulti: Cross correlates more than two time series simultaneously

• Distance functions: Calculate the distance between two time series (DTW, LCSS & LpNorm)

– Improved Operators

• CrossCorrelate2, DWT2, VAR2, ReSample, FFT, RLSFilter
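To give a feel for what the distance functions compute, here is a plain-Python sketch of two of the measures named above, the Lp norm and dynamic time warping (DTW). These are textbook formulations, not the toolkit's code; the function names, signatures and the choice of |x - y| as the DTW local cost are assumptions.

```python
def lp_norm_distance(a, b, p=2):
    """Lp-norm distance between two equal-length series (p=2 is Euclidean)."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dtw_distance(a, b):
    """Dynamic time warping distance with |x - y| as the local cost.

    cost[i][j] is the cheapest alignment of the first i points of `a`
    with the first j points of `b`; runs in O(len(a) * len(b)).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


euclidean = lp_norm_distance([0, 0], [3, 4])
manhattan = lp_norm_distance([0, 0], [3, 4], p=1)
warped = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Unlike the Lp norm, DTW accepts series of different lengths: the second series above repeats one sample, and warping aligns it with the first at zero cost.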

GeoSpatial Toolkit

– GeoFence – returns set of polygons (fenced areas) that contain a location

– Hangout – determines if an entity is “hanging out”

– SpatialGridIndex – objects in the index within the given radius of the point

– SpatialSplit – route tuples based on location (similar to Split operator)
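The idea behind GeoFence, returning the fenced areas that contain a location, can be sketched with a ray-casting point-in-polygon test. This is a simplified illustration on planar (x, y) coordinates rather than the toolkit's geospatial types; the function names and the dictionary of fences are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fences_containing(point, fences):
    """Return the sorted names of all fences (name -> polygon) containing point."""
    return sorted(name for name, polygon in fences.items()
                  if point_in_polygon(point, polygon))


fences = {
    "depot": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "yard": [(2, 2), (6, 2), (6, 6), (2, 6)],
}
both = fences_containing((3, 3), fences)
one = fences_containing((1, 1), fences)
none = fences_containing((10, 10), fences)
```

A location inside the overlap of the two squares is reported in both fences, which matches the "set of polygons" wording above.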

Page 18: An Overview of IBM InfoSphere Streams V4.0

Open Source Streams Toolkits on GitHub

IBMStreams repository on GitHub

–https://github.com/IBMStreams

Over 25 Toolkits as well as samples, benchmarks and demos

–MongoDB, HBase, Kafka, Thrift, JSON, Parquet

–Streams YARN Resource Manager

The Messaging, iNet & HDFS Toolkits available on GitHub

–https://github.com/IBMStreams/streamsx.messaging

• New support for Kafka & improvements to JMS & MQTT

–https://github.com/IBMStreams/streamsx.inet

• New & extended inet operators

–https://github.com/IBMStreams/streamsx.hdfs

• Added support for compressed binary files

Page 19: An Overview of IBM InfoSphere Streams V4.0

InfoSphere Streams V4 & Roadmap

Mike Spicer - STSM, Lead Architect InfoSphere Streams