At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation At-least-once tuple processing with consistent regions IBM InfoSphere Streams Version 4.0 Gabriela Jacques da Silva Research Staff Member [email protected]

Page 1: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation

At-least-once tuple processing with consistent regions

IBM InfoSphere Streams Version 4.0

Gabriela Jacques da Silva

Research Staff Member

[email protected]

Page 2: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Page 3: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Agenda

What are consistent regions?

Tuple processing guarantees

Demonstration

Stages of consistent state establishment

@consistent annotation

Standard toolkit support

Page 4: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Consistent regions enable topologies to checkpoint a state consistent with fully processing a set of tuples

[Figure: two timelines for operators op1, op2, op3 processing tuples m1, m2, m3, m4; one checkpoint is labeled "consistent", the other "inconsistent"]
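
As a rough illustration of the idea on this slide, the Python sketch below models three chained operators that each record how many tuples they have fully processed when their state is saved. A snapshot counts as consistent when every operator has processed the same set of tuples. The function name `is_consistent` and the counter-based model are assumptions made for illustration; they are not part of Streams.

```python
def is_consistent(snapshot):
    """Return True if every operator saved its state after the same number of tuples.

    `snapshot` maps a (hypothetical) operator name to the count of tuples that
    operator had fully processed when its state was saved.
    """
    return len(set(snapshot.values())) == 1


# All three operators saved state after fully processing m1 and m2.
consistent = {"op1": 2, "op2": 2, "op3": 2}

# op1 saved state after m3, but op2 and op3 saved theirs after only m2.
inconsistent = {"op1": 3, "op2": 2, "op3": 2}

print(is_consistent(consistent))    # True
print(is_consistent(inconsistent))  # False
```

In the inconsistent case, restoring the snapshot would leave op1 believing m3 was handled while op2 and op3 never saw it.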

Page 5: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

On failures, state of the topology is reset to a consistent one and tuples are replayed

[Figure: timeline for operators op1, op2, op3 with tuple labels m1, m2, m3, m4, m3; m3 appears a second time as the replayed tuple]
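
The reset-and-replay behaviour can be sketched with a small simulation. This is a toy model under stated assumptions, not how the Streams runtime is implemented: a single operator keeps a running sum, a checkpoint stores that sum together with the source offset, and one simulated failure rolls both back. The names `SummingOperator`, `run`, `checkpoint_every`, and `fail_at` are hypothetical.

```python
class SummingOperator:
    """Toy stateful operator: keeps a running sum of the tuples it has processed."""

    def __init__(self):
        self.total = 0

    def process(self, value):
        self.total += value


def run(tuples, checkpoint_every, fail_at):
    """Process `tuples`, simulate one failure before index `fail_at`, then replay.

    Returns the final operator state and the full list of tuples the operator
    was handed, including any that were replayed.
    """
    op = SummingOperator()
    seen = []
    saved_total, saved_offset = 0, 0   # last consistent state and source position
    failed = False
    i = 0
    while i < len(tuples):
        if i == fail_at and not failed:
            failed = True
            op.total = saved_total     # reset operator state to the checkpoint
            i = saved_offset           # source replays from the checkpointed offset
            continue
        op.process(tuples[i])
        seen.append(tuples[i])
        i += 1
        if i % checkpoint_every == 0:
            saved_total, saved_offset = op.total, i
    return op.total, seen


total, seen = run([1, 2, 3, 4], checkpoint_every=2, fail_at=3)
print(total)  # 10
print(seen)   # [1, 2, 3, 3, 4]
```

The final state is the same as in a failure-free run, but the third tuple is processed twice. That is the at-least-once behaviour: nothing is lost, and anything downstream may observe the replayed tuple again.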

Page 6: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Use consistent regions when the application needs to process every tuple at least once

Fine-grained selection of regions by using @consistent and @autonomous

@consistent: at-least-once*

@autonomous: at-most-once

may receive duplicates on replay

*Exactly once if

• Operator can reset its state and the state of the external system

• Operator detects duplicates to avoid re-processing

• Tuple processing is idempotent
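
One way to picture the duplicate-detection condition is a sink that remembers the highest sequence number it has written and skips anything at or below it. The sketch below assumes that every tuple carries a monotonically increasing sequence number and that replay preserves order; neither assumption comes from the slides, and `DedupSink` is a hypothetical class, not a toolkit operator.

```python
class DedupSink:
    """Toy sink that skips tuples whose sequence number it has already written."""

    def __init__(self):
        self.written = []
        self.last_seq = -1

    def submit(self, seq, payload):
        """Write `payload` unless `seq` was already seen; return True if written."""
        if seq <= self.last_seq:
            return False               # duplicate delivered by a replay
        self.written.append(payload)
        self.last_seq = seq
        return True


sink = DedupSink()
# m3 arrives twice because it was replayed after a reset.
stream = [(0, "m1"), (1, "m2"), (2, "m3"), (2, "m3"), (3, "m4")]
results = [sink.submit(seq, payload) for seq, payload in stream]

print(sink.written)  # ['m1', 'm2', 'm3', 'm4']
print(results)       # [True, True, True, False, True]
```

With such a sink the external result looks as if every tuple had been processed exactly once, even though the region itself only guarantees at-least-once.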

Page 7: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Demo

1. Run without consistent regions – incomplete output on failures

2. Run with consistent regions – complete output on failures

Page 8: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Establishment of a consistent state has two stages, while restoration has a single stage

Drain: All in-flight tuples are forced to be processed

Checkpoint: Operator state (including state variables) is written to checkpoint backend

Checkpoint backend:

• File system (LevelDB)

• Redis (sharding, replicas)

Reset: Operator state is read back from checkpoint backend

New StateHandler interface exposes stages to primitive operators
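
The three stages can be sketched as callbacks on a toy operator. The method names below echo the stage names on this slide, but the signatures are illustrative assumptions and should not be read as the actual StateHandler API. `InMemoryBackend` is a hypothetical stand-in for a real checkpoint backend.

```python
import pickle


class InMemoryBackend:
    """Stand-in for a checkpoint backend; stores serialized state by sequence id."""

    def __init__(self):
        self._store = {}

    def write(self, seq_id, state):
        self._store[seq_id] = pickle.dumps(state)

    def read(self, seq_id):
        return pickle.loads(self._store[seq_id])


class WordCounter:
    """Toy operator with drain, checkpoint, and reset hooks."""

    def __init__(self):
        self.counts = {}
        self.pending = []              # tuples received but not yet processed

    def receive(self, word):
        self.pending.append(word)

    def drain(self):
        # Finish all in-flight work so the state reflects every received tuple.
        for word in self.pending:
            self.counts[word] = self.counts.get(word, 0) + 1
        self.pending.clear()

    def checkpoint(self, backend, seq_id):
        # Persist the operator state after a successful drain.
        backend.write(seq_id, self.counts)

    def reset(self, backend, seq_id):
        # Discard current state and restore the checkpointed one.
        self.counts = backend.read(seq_id)
        self.pending.clear()


backend = InMemoryBackend()
op = WordCounter()
for word in ["a", "b", "a"]:
    op.receive(word)
op.drain()
op.checkpoint(backend, seq_id=1)

op.receive("c")
op.drain()                             # state now includes "c", not yet checkpointed
op.reset(backend, seq_id=1)            # simulated failure: roll back

print(op.counts)  # {'a': 2, 'b': 1}
```

Draining before checkpointing is what makes the saved state correspond to a fully processed set of tuples; the tuple "c" is lost from operator state on reset and would have to be replayed by the source.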

Page 9: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Configure a consistent region by parameterizing the @consistent annotation

@consistent(
    trigger={periodic|operatorDriven},
    period=3.0,
    drainTimeout=30.0,
    resetTimeout=30.0,
    maxConsecutiveResetAttempts=5)

trigger: How to start the establishment of a consistent state?

period: How often?

drainTimeout: When to time out a drain?

resetTimeout: When to time out a reset?

maxConsecutiveResetAttempts: How many reset retries?
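
To make the retry question concrete, here is a minimal sketch of a bounded reset loop. It only illustrates the idea of capping consecutive reset attempts; the function `try_reset` and its return convention are hypothetical, and what the Streams runtime does once the limit is exceeded is not stated on this slide.

```python
def try_reset(attempt_reset, max_consecutive_reset_attempts=5):
    """Call `attempt_reset` until it succeeds or the attempt limit is reached.

    Returns the number of attempts used on success, or None if every
    attempt failed.
    """
    for attempt in range(1, max_consecutive_reset_attempts + 1):
        if attempt_reset():
            return attempt
    return None


# Simulated resets: the first two fail, the third succeeds.
outcomes = iter([False, False, True])
print(try_reset(lambda: next(outcomes)))   # 3

# A reset that never succeeds gives up after the configured limit.
print(try_reset(lambda: False, max_consecutive_reset_attempts=5))  # None
```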

Page 10: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

Many Standard Toolkit operators have been adapted

Aggregate

Filter

Functor

Punctor

Join

Barrier

Beacon

CharacterTransform

Compress

Custom

Decompress

DeDuplicate

Delay

DynamicFilter

Format

Pair

Split

ThreadedSplit

Throttle

Union

DirectoryScan

FileSink

FileSource

MetricsSink

UDPSink

XMLParse

ReplayableStart

Page 11: At-least-once Tuple Processing with Consistent Regions for IBM InfoSphere Streams V4.0

More details can be found at streamsdev and InfoCenter

http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.dev.doc/doc/consistentregions.html

https://developer.ibm.com/streamsdev/2015/02/20/processing-tuples-least-infosphere-streams-consistent-regions/

https://developer.ibm.com/streamsdev/docs/setup-redis-replication-infosphere-streams-4-0/

https://github.com/IBMStreams/samples/tree/master/ConsistentRegions/

Samples at $STREAMS_INSTALL/samples/spl/feature/ConsistentRegion