Stream Computing: A New Paradigm To Gain Insight and Value

© 2008 IBM Corporation 06/24/22 IBM T. J. Watson Research Center Stream Computing: A New Paradigm To Gain Insight and Value Nagui Halim System S Team


Stream Computing: A New Paradigm To Gain Insight and Value. Nagui Halim, System S Team. News/weather. Text, data feeds. Market feeds. System S is a high performance computing platform designed to host a new class of stream analytic applications.

Transcript of Stream Computing: A New Paradigm To Gain Insight and Value

Page 1: Stream Computing: A New Paradigm To Gain Insight and Value

© 2008 IBM Corporation 04/22/23

IBM T. J. Watson Research Center

Stream Computing: A New Paradigm To Gain Insight and Value

Nagui Halim

System S Team

Page 3: Stream Computing: A New Paradigm To Gain Insight and Value

System S at IBM


System S is a high performance computing platform designed to host a new class of stream analytic applications

Designed for high ingest volumes and to adapt to changing data, needs, and capability

System S is an operational prototype, with a stable core that serves as the base for pilots and for systems and stream computing research

Page 4: Stream Computing: A New Paradigm To Gain Insight and Value

4 Making Sense of the Clutter | System S © 2008 IBM Corporation

System S Functional Overview

Diagram (component labels):

Inputs: high volume, structured and unstructured streaming data sources (image; audio, voice, VoIP; video, TV, financial news; radio, police scanners; web traffic, email, chat; GPS data; financial transaction data; satellite data; sensors, badge swipes, …)

Platform: Input Connectors, Output Connectors, Scheduler, Job Manager, Data Source Management, and a high performance scalability infrastructure for continuous processing of streaming data

Development: Workflow Development Tooling (IDE, Workflow Assembly), Component Repository, Component Generation

Hardware: heterogeneous, multi-scale and/or commodity hardware

Security: secure, privacy preserving, using certified downgraders

Outputs: result data delivery / visualization

Page 5: Stream Computing: A New Paradigm To Gain Insight and Value


System S Analytic Processing Building Blocks

Classifiers, Annotators, Correlators, Filters, Aggregators

Diagram labels: Correlate, Transform, Annotator, Classifier, Filter
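As a rough illustration of how such building blocks compose, the sketch below chains a filter, an annotator, a classifier, and an aggregator over a small in-memory stream using Python generators. The operator names, the event fields, and the threshold are assumptions made for this example; this is not the System S programming model or API.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, float]


def filter_op(events: Iterable[Event], predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Filter: pass through only the events that satisfy the predicate."""
    return (e for e in events if predicate(e))


def annotate_op(events: Iterable[Event], key: str, fn: Callable[[Event], float]) -> Iterator[dict]:
    """Annotator: attach a derived field to every event."""
    for e in events:
        yield {**e, key: fn(e)}


def classify_op(events: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Classifier: label each event by comparing its value to a threshold."""
    for e in events:
        yield {**e, "label": "high" if e["value"] >= threshold else "normal"}


def aggregate_op(events: Iterable[dict], key: str) -> Counter:
    """Aggregator: count events per distinct value of the given key."""
    return Counter(e[key] for e in events)


# Hypothetical sensor readings standing in for a live stream.
readings = [
    {"sensor": 1, "value": 3.0},
    {"sensor": 2, "value": -1.0},   # dropped by the filter
    {"sensor": 1, "value": 9.5},
    {"sensor": 3, "value": 7.2},
]

valid = filter_op(readings, lambda e: e["value"] >= 0)
annotated = annotate_op(valid, "value_squared", lambda e: e["value"] ** 2)
labelled = classify_op(annotated, threshold=5.0)
label_counts = aggregate_op(labelled, "label")
print(dict(label_counts))
```

Because each stage is a generator, events flow through the chain one at a time and nothing is materialized until the aggregator consumes the stream.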

Page 6: Stream Computing: A New Paradigm To Gain Insight and Value


Diagram: five Processing Element Containers sit on the System S Runtime Services and the System S Data Fabric transport layer, spread over an x86 box, x86 blades, a Cell blade, and an FPGA blade.

Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation

Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters

Page 7: Stream Computing: A New Paradigm To Gain Insight and Value


Diagram: five Processing Element Containers sit on the System S Runtime Services and the System S Data Fabric transport layer, spread over an x86 box, x86 blades, a Cell blade, an FPGA blade, and Blue Gene.

Adapts to changes in resources, workload, data rates

Capable of exploiting specialized hardware

Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters

Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation
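To make the idea of assigning operators to processing nodes concrete, here is a minimal sketch of a greedy placement that always puts the next most expensive operator on the node with the lowest relative load. The function name, the cost and capacity figures, and the node names are hypothetical. The slides do not describe the actual System S scheduling algorithm, so this is a stand-in technique, not a reconstruction.

```python
import heapq
from typing import Dict


def place_operators(operator_costs: Dict[str, float],
                    node_capacity: Dict[str, float]) -> Dict[str, str]:
    """Greedy least-loaded placement (illustrative only).

    Operators are taken in order of decreasing cost; each goes to the node
    whose used/capacity ratio is currently the smallest.
    """
    used = {name: 0.0 for name in node_capacity}
    # Heap entries are (relative load, node name); names break ties.
    heap = [(0.0, name) for name in sorted(node_capacity)]
    heapq.heapify(heap)

    placement: Dict[str, str] = {}
    for op, cost in sorted(operator_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        _, node = heapq.heappop(heap)
        used[node] += cost
        placement[op] = node
        heapq.heappush(heap, (used[node] / node_capacity[node], node))
    return placement


# Hypothetical workload and cluster.
operators = {"filter": 1.0, "classifier": 4.0, "correlator": 3.0, "aggregator": 2.0}
nodes = {"x86-blade-1": 4.0, "x86-blade-2": 4.0, "cell-blade": 8.0}

placement = place_operators(operators, nodes)
print(placement)
```

Continual resource management could be approximated by re-running the placement whenever measured costs or the set of available nodes change, although a production scheduler would also weigh the cost of moving running operators.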

Page 8: Stream Computing: A New Paradigm To Gain Insight and Value

8 System S © 2007 IBM Corporation

Overview, Beacon Institute for Rivers and Estuaries

Nonprofit organization, based in Beacon, NY

Patterned after Woods Hole Oceanographic Institution

Formed in 2000 by Gov. Pataki

Mission: “To create a global center for interdisciplinary research, policy-making and education regarding rivers, estuaries and their connection with society.”

$30M capital, additional $12M this year + program funds

90% NY State funding

Balance from NSF and private donors

Page 9: Stream Computing: A New Paradigm To Gain Insight and Value


Evolution/locations

Troy: Office, Research

Beacon: HQ, Harbor (pier for research vessel), Multi-use building, Research Center

Palisades: Columbia’s Lamont-Doherty Earth Observatory

Manhattan: Research pier

Center for Advanced Environmental Technology (40,000 ft²)

Page 10: Stream Computing: A New Paradigm To Gain Insight and Value


Core: An advanced sensor-based environment

Pictured: Autonomous Microbial Genosensor; Solar-powered Autonomous Underwater Vehicle; Sontek-YSI Array

Measurements: Conductivity, Temperature, Turbidity, pH/ORP, Chlorophyll

Open and scalable network: bearer network agnostic

Heterogeneous: physical, chemical, biological, radiological

Multiple deployment platforms: fixed, mobile

End-to-end middleware: device management, security

Page 11: Stream Computing: A New Paradigm To Gain Insight and Value

11

S&D/Research

2007 FOAK Program IBM Confidential © 2007 IBM Corporation

FSS Industry Point of View

The financial markets industry is growing quickly while experiencing rapid electronification and automation.

Speed and transparency will increase dramatically. To survive firms will specialize, and compete based on technology.

Sources: IBM Institute for Business Value “FM2015 – The Trader is Dead, Long Live the Trader”; IBM / EIU Macro Model, 2007; SIAC, OPRA, and NASDAQ courtesy the TABB Group

Chart: Average Daily Trade Volumes vs. Headcount, 1994-2004 (Millions of Shares; Number of Employees (‘000); Volume per Employee)

Average Daily Trade Volume (millions): 10 year CAGR 19%

Security brokers and services personnel ('000): 10 year CAGR 3%

Volume of daily shares traded per employee: 10 year CAGR 15%

Chart: Financial System Depth, 1995-2025 (Investable Assets, $ Trillions); series: Currency / Deposits Size, Securities Size

                      1995   2005   2015   2025
Securities % of GDP   219%   305%   393%   523%
Securities CAGR       N/A    8%     9%     9%
Deposits CAGR         N/A    5%     7%     6%
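The three growth rates quoted for the trade-volume chart can be cross-checked with a short calculation: if volume compounds at 19% a year and headcount at 3%, volume per employee compounds at about (1.19 / 1.03) - 1. The sketch below shows that arithmetic together with the standard CAGR formula. The function names are illustrative, and the underlying chart data is not available here, so only the quoted rates are used.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def ratio_growth(numerator_rate: float, denominator_rate: float) -> float:
    """Annual growth of a ratio whose parts grow at the given annual rates."""
    return (1.0 + numerator_rate) / (1.0 + denominator_rate) - 1.0


# Rates quoted on the slide: volume 19%, headcount 3%.
per_employee = ratio_growth(0.19, 0.03)
print(f"implied per-employee growth: {per_employee:.1%}")

# Sanity check of the CAGR formula on synthetic values growing 19% a year.
recovered = cagr(100.0, 100.0 * 1.19 ** 10, 10)
print(f"recovered CAGR: {recovered:.1%}")
```

The implied per-employee growth comes out at roughly 15.5%, which is in line with the 15% quoted on the slide.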

Page 12: Stream Computing: A New Paradigm To Gain Insight and Value

IBM Research


StreamSight Visualization – Market Making

Page 13: Stream Computing: A New Paradigm To Gain Insight and Value


StreamSight Visualization – Market Making Full Scale

Page 14: Stream Computing: A New Paradigm To Gain Insight and Value


M1 Rs Processing Pipeline: Wafer Processing

Diagram: a raw wafer passes through banks of tools (SiCOH 1,1, SiCOH 1,2, SiCOH 6,6; Anneal 1, Anneal 2, Anneal 4; CMP 1,1, CMP 1,2, CMP 13,2) and emerges as a processed wafer.

Photoresist and etch to create structures

Deposit metal (Cu) in structure

Use chemical and mechanical polishing to planarize surface

Page 15: Stream Computing: A New Paradigm To Gain Insight and Value


m1 Rs Processing Pipeline: Instrumentation

Diagram: the wafer pipeline (raw wafer, SiCOH, Anneal, and CMP tools, processed wafer) instrumented with data feeds into a Data Warehouse, arranged along a data availability time axis.

Data categories: Process Data, Trace Data, FDC Summary Statistics, Defect Data, Test Data, Other Data (event, sensor, alarm, tool log, control job, process job)

Example measurements: oxide thickness, refractive index, anneal duration, pad hrs, dresser hrs, slurry composition, m1 Rs value, yield

Statistical Process Control (SPC): identify tool/product drift and automatically shut down recipe/tool

Fault Detection and Classification (FDC): multivariate monitoring for real-time process fault detection and classification

Advanced Process Control (APC): feedback and feed-forward controls to compensate for variations in incoming material and prior level processing
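A common textbook form of statistical process control is a control chart: limits are derived from a baseline of in-control measurements, and any later reading outside those limits is flagged. The sketch below applies that idea to a stream of readings. The three-sigma rule, the function names, and the sample values are assumptions for illustration, not the fab's actual SPC rules.

```python
from statistics import mean, stdev
from typing import Iterable, Iterator, List, Tuple


def control_limits(baseline: List[float], sigmas: float = 3.0) -> Tuple[float, float]:
    """Lower and upper control limits from in-control baseline readings."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def monitor(stream: Iterable[float], limits: Tuple[float, float]) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for every reading outside the control limits."""
    low, high = limits
    for index, value in enumerate(stream):
        if not (low <= value <= high):
            yield index, value


# Hypothetical baseline and live readings for one tool parameter.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
limits = control_limits(baseline)

live_readings = [10.1, 9.9, 10.4, 11.5, 10.0]
violations = list(monitor(live_readings, limits))
print(limits, violations)
```

In a deployment, each violation would trigger the automated response, for example pausing the recipe or tool; that hook is left out here because the slides do not specify it.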

Page 16: Stream Computing: A New Paradigm To Gain Insight and Value


Two-Class Decision Tree

Confusion Matrix

True Label   Predicted Bad-OK   Predicted Good-Exc   Accuracy
Bad-OK       52                 6                    89.65%
Good-Exc     7                  51                   87.9%

Built Decision Tree (diagram): binary tree with Y/N branches and leaves labeled Bad-OK or Good-Exc

90% prediction accuracy

Prediction accuracy with tool based operating thresholds ~10%

Sensitivity varies across FDC values
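The accuracy figures can be reproduced from the confusion matrix counts. The sketch below assumes rows are true labels and columns are predicted labels, which matches the per-row percentages shown; the helper names are illustrative.

```python
from typing import Dict

# Counts from the slide; outer key = true label, inner key = predicted label
# (row/column orientation is an assumption consistent with the row accuracies).
confusion: Dict[str, Dict[str, int]] = {
    "Bad-OK": {"Bad-OK": 52, "Good-Exc": 6},
    "Good-Exc": {"Bad-OK": 7, "Good-Exc": 51},
}


def per_class_accuracy(matrix: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Fraction of each true class that was predicted correctly."""
    return {label: row[label] / sum(row.values()) for label, row in matrix.items()}


def overall_accuracy(matrix: Dict[str, Dict[str, int]]) -> float:
    """Fraction of all samples on the diagonal of the matrix."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


class_acc = per_class_accuracy(confusion)
overall = overall_accuracy(confusion)
print({k: round(v, 4) for k, v in class_acc.items()}, round(overall, 4))
```

The per-class values come out at about 89.7% and 87.9%, and the overall accuracy at about 88.8% (103 of 116), which is consistent with the roughly 90% prediction accuracy quoted.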

Page 17: Stream Computing: A New Paradigm To Gain Insight and Value


Century (architecture diagram; component labels only)

Data acquisition: Client, Device Adapter, Device Agent, Sensor Data, SODA, Event Preprocessor, DB Agent

CENTURY Server: Administrator Portal, Patient Registration Service (Patient Info), Application Registration Service (Application Info), Interoperability Container (HIE Adapter), PE Repository

External Data Access Manager: EMR Data Plug-in, Other Plug-ins

Registration System: RegSPE, Enrollment Trigger

System S SPC and Analysis Framework: Source PEs, DeMux, Filters, Analysis Jobs (operators labeled QRS, FA, RR, SP, BP, AR, AP, PT, SPA, BPA, EP, WB, WT, WTA), alerts (Angina Pectoris, Well-Being), Sink PEs, JDL from IDE

Provenance: Stream Element Engine, Stream Element Provenizer, Provenance Server, Data Provenance Manager, Data Provenance Query Manager, Process Provenance Query Manager, Dynamic Provenance Storage Manager, Dynamic Provenance, TVC Accessor, TVC Rule, Provenance Cache, Provenance Query Service

Delivery System: WAS, Event Delivery, Subscription Service (Subscription Data), Event Store Query Service, APPs, GUIs

Event Management Service: Event Storage Manager, Remote Access Manager, Event Store

QoI Management: QoI Manager, QoI Data, DB Agents, SDO2SE

Page 18: Stream Computing: A New Paradigm To Gain Insight and Value

18 Market Insights & Business Development


Solution positioning based on processing needs (indicative positioning)

Chart dimensions:

Event complexity (diversity): structured, unstructured

Analytics complexity (event correlation and pattern matching)

Decision latency (ms, s, m, h): automated decisions, human decisions

Predictive processing (pattern matching and inferencing)

Segments:

Segment 1: exception detection

Segment 2: operational monitoring

Segment 3: high performance processing

Segment 4: adaptive BPM

Applications plotted on the chart: real-time information delivery; automated trading; telecom billing; industrial process control; lease management system; retail inventory optimization; battlespace command & control; call center monitoring (cross sale); salesforce enablement; baggage handling; clickstream analysis; real-time game monitoring; cross-sales; health monitoring; artwork safety; early warning system for energy trading; health records screening; fraud detection & prevention; telecom network security; geospatial tracking; multisource monitoring; capital market surveillance; database monitoring; card fraud detection & prevention; astrophysical data mining; asset tracking; telco QoS & SLA monitoring; trade desk monitoring; liquidity management system; retail goods receipt; location based services; risk management in energy trading; manufacturing process control; shop floor monitoring; risk analytics platform; online hotel booking; call center monitoring (quality); sensor based water mgmt

Page 19: Stream Computing: A New Paradigm To Gain Insight and Value


Vision

• Stream Computing is a new computing paradigm that opens up entirely new ways of conducting science and business

• System S is a prototype platform that enables new insights to be gained from large volumes of complex data through sophisticated on-the-fly analysis

• The new insights can drive value to organizations by giving them more accurate answers more quickly

• System S is one element of an overall solution framework that will include other elements such as databases, messaging, and modeling

• The quantities and types of data that organizations can take advantage of will increase by orders of magnitude over time; new computational paradigms are necessary to drive new value from this information