Data integration with IBM InfoSphere Streams

Scott Schneider, Gabriela Jacques da Silva
{scott.a.s,g.jacques}@us.ibm.com
IBM Research



Streams Programming Model

• Streams applications are data flow graphs that consist of:
  – Tuples: structured data items
  – Operators: reusable stream analytics
  – Streams: series of tuples with a fixed type
  – Processing Elements: operator groups in execution
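The four terms above can be illustrated with a minimal sketch in plain Python. This is not SPL and not the Streams API: the `Reading` type, the `above` and `scale` operators, and the `processing_element` helper are all hypothetical names chosen for illustration. A tuple is modeled as a frozen dataclass, a stream as an iterable of one tuple type, an operator as a function from stream to stream, and a processing element as a group of operators executed together.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    """A tuple: one structured data item with fixed, typed fields (hypothetical type)."""
    sensor: str
    value: float


# A stream is a series of tuples that all share one type.
Stream = Iterable[Reading]
# An operator consumes a stream and produces a stream.
Operator = Callable[[Stream], Iterator[Reading]]


def above(threshold: float) -> Operator:
    """Reusable operator: keep only tuples whose value exceeds the threshold."""
    def op(stream: Stream) -> Iterator[Reading]:
        for t in stream:
            if t.value > threshold:
                yield t
    return op


def scale(factor: float) -> Operator:
    """Reusable operator: multiply each tuple's value by a constant."""
    def op(stream: Stream) -> Iterator[Reading]:
        for t in stream:
            yield Reading(t.sensor, t.value * factor)
    return op


def processing_element(*operators: Operator) -> Operator:
    """Group operators so that they execute together as one unit."""
    def pe(stream: Stream) -> Iterator[Reading]:
        for op in operators:
            stream = op(stream)
        yield from stream
    return pe


source = [Reading("a", 1.0), Reading("b", 5.0), Reading("a", 7.5)]
pe = processing_element(above(2.0), scale(2.0))
result = list(pe(source))
```

Here `result` holds the two readings that pass the filter, with their values doubled. Generators keep the sketch lazy, which loosely mirrors tuples flowing through a graph one at a time.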


[Figure: Sources, PEs, and Sinks placed across five x86 hosts, on top of the Streams Runtime (Job management, Security, Continuous Resource Management)]

Source → Compilation → Execution


[Figure labels: SPL source, SPL compiler, PEs, Connections, Source, Sink]

Application Example

    composite Main {
      type
        Entry = int32 uid, rstring server,
                rstring msg;
        Sum = uint32 uid, int32 total;
      graph
        stream<Entry> Msgs = ParSource() {
          param servers: "logs.*.com";
                partitionBy: server;
        }

        stream<Sum> Sums = Aggregate(Msgs) {
          window Msgs: tumbling, time(5),
                       partitioned;
          param partitionBy: uid;
        }

        stream<Sum> Suspects = Filter(Sums) {
          param filter: total > 100;
        }

        () as Sink = FileSink(Suspects) {
          param file: "suspects.csv";
        }
    }
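To make the window semantics of the example concrete, here is a rough simulation in plain Python, not SPL. It makes several assumptions that the slide does not state: `total` is taken to be the number of messages per `uid` in each window, timestamps are supplied explicitly instead of read from a clock, input arrives in timestamp order, and the CSV sink writes to an in-memory buffer. The function names `aggregate`, `suspects`, and `file_sink` and the sample events are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Tuple


class Entry(NamedTuple):
    uid: int
    server: str
    msg: str


class Sum(NamedTuple):
    uid: int
    total: int


def aggregate(timed: Iterable[Tuple[float, Entry]],
              window_seconds: float = 5.0) -> Iterator[Sum]:
    """Tumbling time window partitioned by uid.

    Assumes input sorted by timestamp and total = message count (an assumption).
    """
    counts: Counter = Counter()
    window_end = None
    for ts, entry in timed:
        if window_end is None:
            window_end = ts + window_seconds
        # Close every window that ended before this tuple arrived.
        while ts >= window_end:
            for uid, total in sorted(counts.items()):
                yield Sum(uid, total)
            counts.clear()  # tumbling: the window is emptied after each flush
            window_end += window_seconds
        counts[entry.uid] += 1
    # Flush the final, partially filled window at end of input.
    for uid, total in sorted(counts.items()):
        yield Sum(uid, total)


def suspects(sums: Iterable[Sum], threshold: int = 100) -> Iterator[Sum]:
    """Filter: keep sums whose total exceeds the threshold."""
    return (s for s in sums if s.total > threshold)


def file_sink(rows: Iterable[Sum], out) -> None:
    """Write each tuple as one CSV row."""
    writer = csv.writer(out)
    for row in rows:
        writer.writerow(row)


# Made-up input: uid 7 sends 101 messages and uid 8 sends 3 within the first
# 5 seconds; uid 7 sends 2 more in the next window.
events = [(i * 0.04, Entry(7, "logs.a.com", "m")) for i in range(101)]
events += [(t, Entry(8, "logs.b.com", "m")) for t in (1.0, 2.0, 3.0)]
events.sort(key=lambda e: e[0])
events += [(6.0, Entry(7, "logs.a.com", "m")), (6.5, Entry(7, "logs.a.com", "m"))]

buffer = io.StringIO()
file_sink(suspects(aggregate(events)), buffer)
```

Only uid 7 in the first window exceeds 100 messages, so the buffer ends up with a single row. The counts are reset at each window boundary, which is what distinguishes a tumbling window from a sliding one.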


[Figure: data flow graph for the example, showing four ParSrc, four Aggr, and four Filter operators and two Sink operators]

Data Formats are Fundamental

    type
      Trade = decimal64 price, decimal64 volume;
      Quote = decimal64 bidprice, decimal64 askprice,
              decimal64 asksize;
      TradesAndQuotes = Trade, Quote,
                        tuple<rstring ticker, rstring dayAndTime>;

    type
      User = rstring name, uint64 id, rstring screen_name;
      Twitter = User user, rstring text, rstring retweet_count;

    type
      CDR = rstring calling, rstring called,
            timestamp startTime, timestamp connectTime;

    type
      Video = blob video;
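As a rough illustration of why a fixed tuple type matters at the ingest boundary, the sketch below mirrors the `User` and `Twitter` types from the slide as Python dataclasses and maps a JSON document onto them. This is plain Python, not SPL. The `parse_tweet` helper and the sample JSON are made up for illustration, and the input field names are assumed to match the attribute names on the slide.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    id: int
    screen_name: str


@dataclass(frozen=True)
class Twitter:
    user: User            # nested tuple type, as in the slide
    text: str
    retweet_count: str    # declared rstring on the slide, so kept as text here


def parse_tweet(raw: str) -> Twitter:
    """Map a JSON document onto the fixed Twitter type, dropping extra fields."""
    doc = json.loads(raw)
    user = doc["user"]
    return Twitter(
        user=User(name=user["name"], id=user["id"], screen_name=user["screen_name"]),
        text=doc["text"],
        retweet_count=str(doc["retweet_count"]),
    )


# Made-up sample input; the extra "lang" field is not part of the type and is ignored.
raw = ('{"user": {"name": "Ada", "id": 42, "screen_name": "ada", "lang": "en"},'
       ' "text": "hello", "retweet_count": 3}')
tweet = parse_tweet(raw)
```

Once the input is in this shape, every downstream step can rely on the same set of fields and types.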

Social Media

Avid Movie Goer Buzz During Super Bowl

[Figure: buzz chart covering the movies Project X, John Carter, Battleship, Ghost Rider, 21 Jump Street, The Dark Knight Rises, G.I. Joe, Spider-man, The Avengers, Act of Valor, The Lorax, The Dictator]

Social Media Analytics Architecture

[Figure: architecture diagram with two flows. Recoverable labels:
  Online Flow: Data-in-motion analysis
  Offline Flow: Data-at-rest analysis
  Platforms: InfoSphere Streams, InfoSphere BigInsights
  Data: Social Media Data, Customer Database, Consumer Lists, Customer & Prospect profiles, Social Media Consumer Profiles, Customer Models
  Stages: Data Ingest & prep.; Text Analytics: Timely Insights; Entity Integration: Profile Resolution; Predictive Analytics: Action Determination
  Result: Timely Decisions]

Medical  


"Data  Baby"  

Real  Neonatal  ICU  

InfoSphere Streams and Excel Medical BedMasterEX integration

Automatic integration

Analytic composition via planning


[Figure labels: IDE, Developers]