Telecommunications Event Data Analytics for IBM InfoSphere Streams V4.0


© 2015 IBM Corporation

Telecommunications Event Data Analytics

IBM InfoSphere Streams Version 4.0

Mark-Oliver Heger, Paul Zollna

IBM Research & Development

For questions about this presentation contact

Mark-Oliver Heger [email protected]

Paul Zollna [email protected]

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Agenda

Toolkit Overview

Project setup wizard installation

Sample application demonstration

Mediation use case demonstration (ASN.1 to CSV)

High-Level Overview

The Telecommunications Event Data Analytics toolkit provides

• a set of generic operators that are used in telecommunications applications

• an application framework that enables you to set up new file-to-file applications

Connect to various input sources and downstream applications

Speed up custom implementations and reduce development & test efforts

[Diagram: toolkit architecture. Labels: Input Data Files (files with Call Detail Records in mounted directories), Data Source Adapters, File / Directory Operators, (S)FTP Operators *, Priority Handling Queue, Parser (ASN.1, Structure, CSV), Application Framework, Utility Functions and Operators, Lookup & Enrichment Data, Metadata checkpoints, Target System Adapters, File Writer, DB Loader *, Output Data Files, storage labels "NFS, GPFS, HDFS" and "NFS, GPFS", Operator GUI, Setup Wizard, Cheat Sheets. * Release via GitHub]

High-Level Overview (2)

Applications are based on code templates

The applications support

• Customization

• Configurable parallel processing

• Graceful application shutdown

• Reliable file processing

Use cases

Revenue Assurance and Business Intelligence applications

Location based services

Campaign management

User experience, user behavior & statistics

Network and services usage

Fraud detection

Toolkit structure

Operators and functions

Parser operators: ASN1Parse, CSVParse, StructureParse

Utility operators & functions: BloomFilter, ExceptionCatcher, ScheduledBeacon, createDirectory(), rename()

CSVParse

parses an input line with comma-separated values and assigns the fields to output tuple attributes

StructureParse

parses a binary data stream that contains fixed-length binary data structures, extracts the specified data fields, and sends the fields as tuples to downstream operators

ASN1Parse

parses a binary data stream that contains ASN.1-encoded data, extracts parts of the data, and sends the data as tuples to downstream operators
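The three parsing styles described above can be sketched in a few lines of Python. This is an illustration only, not the toolkit's operators: the field names, the record layout `>IH`, and all function names are hypothetical, and the ASN.1 part handles just one BER tag-length-value element with a single-byte tag.

```python
import csv
import struct

def parse_csv_line(line, field_names):
    """Split one comma-separated line and assign the values to named fields."""
    values = next(csv.reader([line]))
    return dict(zip(field_names, values))

# Hypothetical fixed-length record: 4-byte unsigned id, 2-byte unsigned duration (big-endian).
RECORD = struct.Struct(">IH")

def parse_structures(data):
    """Yield one tuple (as a dict) per fixed-length record in the binary data."""
    for record_id, duration in RECORD.iter_unpack(data):
        yield {"id": record_id, "duration": duration}

def parse_ber_tlv(data, offset=0):
    """Decode one BER element with a single-byte tag.

    Returns (tag, value_bytes, offset_of_next_element).
    """
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: the low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    return tag, data[offset:offset + length], offset + length
```

In each case the result is a small record that a downstream step could consume, which mirrors the "sends the fields as tuples to downstream operators" behavior in spirit.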

Toolkit structure (2)

ExceptionCatcher

catches exceptions from fused downstream operators and reports these exceptions

ScheduledBeacon

utility source that generates tuples at the configured time

BloomFilter

detects duplicate tuples in a memory-efficient way
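To show why a Bloom filter is memory efficient for duplicate detection, here is a minimal Python sketch. It is not the toolkit's BloomFilter operator; the class name, the default sizes, and the use of salted SHA-256 hashes are assumptions chosen for the example. A Bloom filter never misses a real duplicate, but it can occasionally report a new key as a duplicate.

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, a small chance of false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def seen_before(self, key):
        """Return True if key was probably seen already; record it either way."""
        positions = list(self._positions(key))
        duplicate = all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)
        return duplicate
```

The filter stores only a fixed-size bit array regardless of how many keys pass through it, which is the memory saving compared with keeping every key in a set.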

Toolkit structure (3)

Operators and functions

[Diagram: the parser operators (ASN1Parse, CSVParse, StructureParse) and the utility operators & functions (BloomFilter, ExceptionCatcher, ScheduledBeacon, createDirectory(), rename()), with the labels Generic Parsers, File Utility Functions, and 19 Sample Applications]

Toolkit structure (4)

Application framework

• Multi-level de-duplication

• Multi-stage lookups

• Demo application

• ETL & Campaign Management

• Data integrity

• Multi-threading

• Setup wizard

• Configurable & customizable applications

• Monitoring GUI

Operators and functions

• Generic parsers

• File utility functions

• 19 sample applications

Application Framework

Ingest, transform, enrich data records for downstream applications

ITE application - File processing - ingest filenames

[Diagram: ITE file processing pipeline. Input filesystem (input, input/archive, input/failed); Directory Scan, Filename Dedup, Filetype Validator, Chain Split; parallel file processing per channel with File Reader, Record Validator, Transform, Lookup/Enrich (Lookup Data in Shared Memory), File Writer, Reject Writer, Statistic Writer, Chain Control, Chain Finalizer; output filesystem (output, output/rejected, output/load, output/statistics)]

• Scans files in one or more directories

• Duplicate filenames are moved to “duplicates” directory

• Distributes “file-info” tuples to the file processing channels
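The ingest steps above can be sketched as follows. This Python example is an assumption-laden illustration: the function name, the dictionary used as a "file-info" tuple, and the round-robin distribution are hypothetical, since the slides do not say how the framework assigns files to channels.

```python
import os

def ingest_filenames(paths, num_channels):
    """Split scanned paths into per-channel work lists and a duplicates list."""
    seen = set()
    channels = [[] for _ in range(num_channels)]
    duplicates = []
    accepted = 0
    for path in paths:
        name = os.path.basename(path)
        if name in seen:
            # A real application would move this file to the "duplicates" directory.
            duplicates.append(path)
            continue
        seen.add(name)
        # Round-robin distribution (assumed) of "file-info" records to channels.
        channels[accepted % num_channels].append({"path": path, "filename": name})
        accepted += 1
    return channels, duplicates
```

Deduplication here is by filename only, so the same name arriving from a second directory counts as a duplicate.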

ITE application - File processing – data streaming

• FileReader parses the file and generates data tuples for enrichment and transformation

• ChainControl ensures that one file is processed after another

ITE application - File processing – closing

• Depending on the file processing result, the input file is moved to the archive or failed directory

• Statistic tuple is generated when processing is completed

• File statistics are written to file
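A minimal Python sketch of the closing step, under stated assumptions: the function names, the statistics fields, and the CSV layout of the statistics file are hypothetical, while the `archive` and `failed` directory names follow the input filesystem layout shown on the slide.

```python
import os
import shutil

def finalize_file(path, succeeded, records_ok, records_rejected, input_dir):
    """Move the processed input file and return a statistics record for it."""
    target_dir = os.path.join(input_dir, "archive" if succeeded else "failed")
    os.makedirs(target_dir, exist_ok=True)
    filename = os.path.basename(path)
    moved_to = shutil.move(path, os.path.join(target_dir, filename))
    return {
        "filename": filename,
        "status": "archived" if succeeded else "failed",
        "records_ok": records_ok,
        "records_rejected": records_rejected,
        "moved_to": moved_to,
    }

def write_statistics(stat, stats_path):
    """Append one statistics line per processed file."""
    with open(stats_path, "a", encoding="utf-8") as out:
        out.write("{filename},{status},{records_ok},{records_rejected}\n".format(**stat))
```

Moving the file only after processing has finished means a file left in the input directory has not been fully processed.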

Scalability - share lookup data across hosts

Telecommunications applications require very high throughput (millions of records/second) and reference-data lookup functionality

The application framework provides shared memory functionality for common, cross-server lookup tables that can be used by multiple jobs, efficiently and cost-effectively.

[Diagram: shared memory on multiple hosts (Host A, Host B, ... Host n). The Streams jobs common::LookupManagerMain, demoapp::ITEMain, and sample::ITEMain run as Streams PEs; PEs are labeled SHM WRITE or SHM READ against the shared memory on each host]

File Writer configurations

The user can configure one of three storage types:

• tableFile - one input file can result in many output files prepared to be loaded into a database by the DBLoader application (available on GitHub)

• recordFile - one input file results in one output file (simple mediation use case)

• custom - user can plug in customized sink logic

Application Framework - Summary

• Proven, configurable, ready-to-use framework for high-performance file processing applications

• Facilitates implementation of customer-specific use cases processing telco network data

• Value-added operators and functions on top of Streams standard, e.g. Lookup Manager using shared memory

• Setup Wizard (Eclipse plugin and script)

Workflow to develop applications

Setup

• Create projects using Streams Studio Wizard or using a command line tool

Configure

• Configure the application using configuration files to enable or disable features and to define lookup stores for data enrichment

Customize

• Add custom SPL code to template operators

• Add operators to template composites

• Implement your business logic

Monitoring GUI

Providing the operator with a display of health status, metrics and statistics at a glance

Demo Java application

Data access via REST

Can be extended in projects

Live demo - begin

Preparation steps for the toolkit

Create applications based on the framework: LookupManager, ITE

Show customizing of Lookup Manager application

Show file processing of sample files

Show monitoring of applications

ITE application - variant A

No grouping used

• Scans one or more directories for input files

• Reads and parses input files

• Business logic: enrich, transform tuples

• Writes output files

Each chain processes one file after another. The more chains are configured, the more files can be processed in parallel.

Example use case:

Files are converted from ASN.1 format to CSV format. No logic across file boundaries is required.
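The chain model of variant A (each chain handles its files one after another, several chains run in parallel) can be sketched in Python as below. The function names and the placeholder conversion are hypothetical; a real chain would read, parse, transform, and write each file.

```python
from concurrent.futures import ThreadPoolExecutor

def convert_file(name):
    """Placeholder for read -> parse -> transform -> write of one file."""
    return name.rsplit(".", 1)[0] + ".csv"

def run_chains(channel_files, process=convert_file):
    """Run one worker per chain; each chain handles its files strictly in order."""
    def run_chain(files):
        return [process(f) for f in files]

    workers = max(1, len(channel_files))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as the input chains.
        return list(pool.map(run_chain, channel_files))
```

Because no state is shared between files, adding a chain only adds parallelism; this matches the "no logic across file boundaries" property of the variant.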

ITE application - variant B

Every single data entity (tuple) determines the group

Example use case:

Aggregate on transformed tuples across files per group (Campaign Management)

Each group represents a range of MSISDN numbers
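As a sketch of tuple-based grouping, the following Python example maps each tuple to a group by its MSISDN range and aggregates per group. The range boundaries, the attribute names `msisdn` and `duration`, and the function names are hypothetical.

```python
import bisect

# Hypothetical exclusive upper bounds of the MSISDN ranges; numbers at or above
# the last bound fall into one final group.
RANGE_BOUNDS = [491510000000, 491520000000, 491600000000]

def group_for_msisdn(msisdn):
    """Return the index of the MSISDN range that contains the number."""
    return bisect.bisect_right(RANGE_BOUNDS, int(msisdn))

def aggregate_by_group(tuples):
    """Sum durations per group; each tuple's MSISDN alone selects its group."""
    totals = {}
    for t in tuples:
        group = group_for_msisdn(t["msisdn"])
        totals[group] = totals.get(group, 0) + t["duration"]
    return totals
```

Tuples from the same input file can land in different groups here, which is the difference from a filename-based grouping.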

ITE application - variant C

The filename determines the group

Example use case:

Aggregate on transformed tuples across files per group (Campaign Management)

Each group represents a network element ID. Identifiers are part of the filename.
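A sketch of filename-based grouping in Python. The naming scheme `<network element ID>_<digits>.<extension>` is an assumption made for this example; the slides only state that the identifier is part of the filename.

```python
import re

# Hypothetical naming scheme: <network element ID>_<digits>.<extension>
FILENAME_PATTERN = re.compile(r"^(?P<element_id>[A-Za-z0-9]+)_\d+\.\w+$")

def group_for_filename(filename):
    """Extract the network element ID that selects the group for a whole file."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"no network element ID in filename: {filename}")
    return match.group("element_id")

def files_by_group(filenames):
    """Collect filenames per group, keeping arrival order within each group."""
    groups = {}
    for name in filenames:
        groups.setdefault(group_for_filename(name), []).append(name)
    return groups
```

All tuples of one file share the group of that file, so the group can be decided before the file is read.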

LookupMgrCustomizing.xml sample

<Application ApplicationNamespace="ite.workshop">
  <CommandMappings>
    <CommandMapping LookupCommand="init">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
    <CommandMapping LookupCommand="update">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
  </CommandMappings>

• ApplicationNamespace: application name defined by namespace

• LookupCommand: supported command type

• SegmentName: repository segments
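To make the structure of the sample concrete, the following Python sketch reads the command-to-segment mappings from it. The closing `</Application>` tag is added here as an assumption so that the excerpt parses; the slide shows only part of the file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Excerpt from the slide, completed with a closing tag (assumption).
SAMPLE = """
<Application ApplicationNamespace="ite.workshop">
  <CommandMappings>
    <CommandMapping LookupCommand="init">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
    <CommandMapping LookupCommand="update">
      <SegmentName>DimMaster1</SegmentName>
      <SegmentName>DimMaster2</SegmentName>
    </CommandMapping>
  </CommandMappings>
</Application>
"""

def read_command_mappings(xml_text):
    """Return (namespace, {lookup command: [segment names]})."""
    root = ET.fromstring(xml_text.strip())
    namespace = root.get("ApplicationNamespace")
    mappings = {
        mapping.get("LookupCommand"): [seg.text for seg in mapping.findall("SegmentName")]
        for mapping in root.iter("CommandMapping")
    }
    return namespace, mappings
```

For the sample, both the `init` and the `update` command map to the segments DimMaster1 and DimMaster2.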

Mediation use case demonstration (ASN.1 to CSV)

Customize the File Reader of the ITE application

[Diagram: ASN.1 input files → ITE → CSV output files]

ITE application

[Diagram: composites of the ITE application. Labels: Control, Statistics, IngestFiles, Context, ChainDirScan, FileType Validator, ApplCtrl Scheduler, LogWriter, Dedup, Filename Dedup, Chain Split, File Group Split, ChainProcessorReader, ChainControl, PreFile Reader, FileReader, Converter, Validator, ChainProcessorTransformer, Business Logic / Transform / Enrich, Tuple Group Split, Taps, Post Transformer, ChainSink, File Writer, RejectFileWriter, PostContext Processor, Chain Finalizer (Files Mover), Context Custom, ContextRestore Writer, Checkpoint Control. Legend: Common, Custom, Custom optional, Common or Custom, Variant B, Variant C. Detail view of ChainProcessorReader: ChainControl, PreFile Reader, Validator, FileReader, FileReaderASN1]

Live demo - begin

Create ITE application based on the framework

Configure ITE application project

Prepare the schema for the ASN.1 parser

Customize the File Reader

Build and launch application

Review output files

Questions?