Introduction to WSO2 Analytics Platform

Posted on 28-Jul-2015


Srinath Perera, VP Research

WSO2 Inc.

Analytics is Growing Up

▪ It is no longer about doing your first analytics use case.

▪ It is about
  ▪ How to do it every day, efficiently?
  ▪ How to recover?
  ▪ How to make decisions?
  ▪ How to do other forms such as real-time, interactive, and predictive analytics?

Analytics 2.0 Platform

▪ One platform for all four forms of analytics

▪ Single consistent programming model

▪ One analytics archive format

▪ Support for the lifecycle of analytics apps

▪ Integrates well with the rest of the enterprise!

Collect Data

▪ One Sensor API to publish events
  - REST, Thrift, JMS, Kafka
  - Java clients, JavaScript clients*

▪ First you define streams (think of a stream as an infinite table in a SQL DB)

▪ Then send events via the Sensor API

▪ Events can be sent to the batch pipeline, the realtime pipeline, or both via configuration!

Collecting Data: Example

Java example: create and send events. Events are sent asynchronously. See the client given in http://goo.gl/vIJzqc for more info.

    // Initialize Agent
    Agent agent = new Agent(agentConfiguration);
    publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );

    // Define Stream
    StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
    definition.addPayloadData("sid", STRING);
    ...
    publisher.addStreamDefinition(definition);
    ...

    // Send events
    Event event = new Event();
    event.setPayloadData(eventData);
    publisher.publish(STREAM_NAME, VERSION, event);

Analysis: Batch Analytics

Complex Event Processing

Analytics Logic with SQL-like Queries

▪ Both BAM and CEP provide a SQL-like data processing language

▪ Since many people understand SQL, these languages made large-scale (Big Data) processing accessible to many

▪ Expressive, short, and sweet.

▪ Define core operations that cover 90% of problems

▪ Let experts dig in when they like (via user-defined functions)!
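
To give a feel for what one such short query expresses, the sketch below emulates in plain Java a filter followed by a sliding-window average ("average of the last N values that are at least some minimum"). This is only an illustration: it does not use the BAM or CEP query languages or APIs, and the class name WindowedAverage and its parameters are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only: plain Java, not the WSO2 CEP query language or API.
final class WindowedAverage {
    private final int windowLength;   // how many of the most recent matching events to keep
    private final double minValue;    // filter: events below this value are ignored
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    WindowedAverage(int windowLength, double minValue) {
        if (windowLength <= 0) {
            throw new IllegalArgumentException("windowLength must be positive");
        }
        this.windowLength = windowLength;
        this.minValue = minValue;
    }

    // Feeds one event and returns the average over the current window,
    // or NaN if no event has passed the filter yet.
    double onEvent(double value) {
        if (value >= minValue) {
            window.addLast(value);
            sum += value;
            if (window.size() > windowLength) {
                sum -= window.removeFirst();   // evict the oldest event
            }
        }
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }

    public static void main(String[] args) {
        WindowedAverage query = new WindowedAverage(3, 10.0);
        for (double v : new double[] {12, 5, 18, 30, 24}) {
            System.out.println(v + " -> " + query.onEvent(v));
        }
    }
}
```

In a SQL-like language the same logic would typically fit in a single statement, which is the point of the slide: the filter, window, and aggregate are core operations rather than hand-written code.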

Scaling CEP Queries on top of Storm

▪Accepts CEP queries with hints about how to partition streams

▪Partition streams, build an Apache Storm topology running CEP nodes as Storm Spouts, and run it. (see http://goo.gl/pP3kdX )
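
The partitioning idea can be sketched in a few lines: events carrying the same partition key are always routed to the same worker, so each worker can evaluate its query independently. The sketch below is plain Java and does not use the Storm or CEP APIs; the class name StreamPartitioner and the hash-based routing are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: hash partitioning of events by key, no Storm API involved.
final class StreamPartitioner {
    private final int partitions;

    StreamPartitioner(int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        this.partitions = partitions;
    }

    // Maps a partition key (e.g. a house or sensor id) to a worker index in [0, partitions).
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Distributes the keys of incoming events into one bucket per worker.
    List<List<String>> route(List<String> eventKeys) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : eventKeys) {
            buckets.get(partitionFor(key)).add(key);
        }
        return buckets;
    }
}
```

Math.floorMod is used instead of % so that keys with a negative hash code still map to a valid worker index.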

Predictive Analytics

▪ Predictive analytics learns a decision function (a model) using examples
  ▪ Is this fraud?
  ▪ How to drive?
  ▪ Handwritten text

▪ Build models and use them with WSO2 CEP, BAM and ESB using the WSO2 Machine Learner product (2015 Q3)

▪ Build models using R, export them as PMML, and use them within WSO2 CEP
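
As a toy illustration of "learning a decision function from examples", the sketch below learns a single threshold on one numeric feature (say, a transaction amount) that best separates labelled examples. It is a deliberately minimal stand-in, not what WSO2 Machine Learner, R, or PMML models do internally; the class name ThresholdModel and the training procedure are assumptions chosen for brevity.

```java
import java.util.Arrays;

// Hypothetical illustration only: a one-feature threshold classifier learned from examples.
final class ThresholdModel {
    final double threshold;

    ThresholdModel(double threshold) {
        this.threshold = threshold;
    }

    // The learned decision function: flag values above the threshold.
    boolean predict(double x) {
        return x > threshold;
    }

    // Counts training examples that a given threshold would misclassify.
    private static int errors(double threshold, double[] xs, boolean[] labels) {
        int wrong = 0;
        for (int i = 0; i < xs.length; i++) {
            if ((xs[i] > threshold) != labels[i]) {
                wrong++;
            }
        }
        return wrong;
    }

    // Tries a threshold below all values, every midpoint between neighbours,
    // and one above all values; keeps the candidate with the fewest training errors.
    static ThresholdModel train(double[] xs, boolean[] labels) {
        if (xs.length == 0 || xs.length != labels.length) {
            throw new IllegalArgumentException("need one label per example");
        }
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        double best = sorted[0] - 1.0;
        int bestErrors = errors(best, xs, labels);
        for (int i = 0; i + 1 < sorted.length; i++) {
            double candidate = (sorted[i] + sorted[i + 1]) / 2.0;
            int e = errors(candidate, xs, labels);
            if (e < bestErrors) {
                bestErrors = e;
                best = candidate;
            }
        }
        double above = sorted[sorted.length - 1] + 1.0;
        if (errors(above, xs, labels) < bestErrors) {
            best = above;
        }
        return new ThresholdModel(best);
    }
}
```

Once trained, the model is just a function that can be called per event, which is why a model built offline can be plugged into a realtime pipeline.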

WSO2 Machine Learner

▪ A wizard to sample, explore, and understand data through visualizations

▪ A wizard to configure, train machine learning models, and select the best model

▪ Find and use those models with WSO2 CEP, BAM and ESB

▪ Powered by Apache Spark MLlib

Communicate: Dashboards

▪ The idea is to give an overall picture at a glance (e.g. a car dashboard)

▪ Support for personalization: you can build your own dashboard.

▪ Also the entry point for drill-down

▪ How to build?
  - Dashboard via Google Gadgets and content via HTML5 + JavaScript
  - Use charting libraries like Vega or D3

Communicate: Alerts

▪ Detecting conditions can be done via CEP queries

▪ Key is the "last mile"
  - Email
  - SMS
  - Push notifications to a UI
  - Pager
  - Trigger a physical alarm

▪ How?
  - Select the Email sender "Output Adaptor" from CEP, or send from CEP to ESB, which has a lot of connectors

Communicate: APIs

▪ With mobile apps, most data is exposed and shared as APIs (REST/JSON) to end users.

▪ Need to expose analytics results as APIs

▪ Some of the challenges:
  - Security and permissions
  - API discovery
  - Billing, throttling, quotas & SLA

▪ How?
  - Write data to a database from CEP event tables
  - Build services via WSO2 Data Services
  - Expose them as APIs via API Manager

Event Stream Store

▪ One-stop place for all event stream definitions

▪ Lets users
  ▪ Publish and consume through multiple protocols like REST, JMS, Thrift, WebSockets, etc.
  ▪ Discover event streams
  ▪ Enforce security and authorization
  ▪ Per-pay subscriptions
  ▪ Effectively an Event Stream Marketplace!

▪ This will automate the API creation discussed in the previous slide.

What is it good for?

▪ Batch analytics

▪ Realtime streaming analytics

▪ Realtime interactive analytics

▪ Lambda Architecture

▪ Train and use an ML model

▪ Selective detailed analysis

http://tinybuddha.com/blog/a-simple-technique-to-solve-problems-before-they-get-bigger/

Selective Detailed Analysis

• Too expensive to do detailed analysis on all the data

• Instead detect the condition, and dig into related data

• Fraud toolbox

• Other use cases
  – Dynamic offers at a retail site
  – Weather
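
The two-step pattern above (cheap detection on everything, detailed analysis only where a condition fires) can be sketched as follows. This is plain Java under stated assumptions: the class name SelectiveAnalysis, the "amount above a limit" condition, and the max-to-mean score standing in for the expensive analysis are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: cheap detection first, detailed analysis on flagged data.
final class SelectiveAnalysis {

    // Cheap check applied to every account: is any single amount above the limit?
    static boolean looksSuspicious(List<Double> amounts, double limit) {
        for (double amount : amounts) {
            if (amount > limit) {
                return true;
            }
        }
        return false;
    }

    // Stand-in for an expensive analysis: ratio of the largest amount to the mean amount.
    static double detailedScore(List<Double> amounts) {
        double max = Double.NEGATIVE_INFINITY;
        double total = 0.0;
        for (double amount : amounts) {
            max = Math.max(max, amount);
            total += amount;
        }
        return max / (total / amounts.size());
    }

    // Runs the detailed analysis only for accounts where the cheap condition fired.
    static Map<String, Double> analyze(Map<String, List<Double>> historyByAccount, double limit) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> entry : historyByAccount.entrySet()) {
            if (looksSuspicious(entry.getValue(), limit)) {
                scores.put(entry.getKey(), detailedScore(entry.getValue()));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> history = new LinkedHashMap<>();
        history.put("account-a", Arrays.asList(10.0, 20.0, 30.0));
        history.put("account-b", Arrays.asList(10.0, 5000.0, 15.0));
        System.out.println(analyze(history, 1000.0).keySet());
    }
}
```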

Lambda Architecture

• Same code in both batch and realtime layers

• Idea is to fill the time between two batch runs

• Batch layer writes the data to a DB

• Realtime layer merges with batch data via Event Tables
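
A minimal sketch of the merge, assuming a simple per-key count: the batch layer periodically recomputes a view over all data, the realtime layer counts only what arrived since the last batch run, and a query adds the two. The class name LambdaView is hypothetical, in-memory maps stand in for the DB and the Event Tables, and the sketch assumes each batch run covers every event seen up to that point.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: in-memory maps stand in for the DB and the Event Tables.
final class LambdaView {
    private final Map<String, Long> batchView = new HashMap<>();
    private final Map<String, Long> realtimeView = new HashMap<>();

    // The same aggregation code is shared by the batch and realtime layers.
    private static void aggregate(Map<String, Long> view, String key) {
        view.merge(key, 1L, Long::sum);
    }

    // Realtime layer: counts events that arrived since the last batch run.
    void onRealtimeEvent(String key) {
        aggregate(realtimeView, key);
    }

    // Batch layer: recomputes the view from all events so far, then resets the realtime delta.
    void runBatch(List<String> allEventsSoFar) {
        batchView.clear();
        for (String key : allEventsSoFar) {
            aggregate(batchView, key);
        }
        realtimeView.clear();
    }

    // Query: batch result plus the realtime delta that fills the gap between batch runs.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }
}
```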

Real Life Use Cases

▪ Health, Smart Parking solutions

▪ Financial Monitoring

▪ Smart City project, Vehicle tracking, Building monitoring

▪ Railway monitoring

▪ Throttling and Anomaly Detection

▪ API Analytics (13+ customers)

▪ Connected Car

Case Study: DEBS Grand Challenges

▪ DEBS (Distributed Event Based Systems) Grand Challenge is a yearly event processing challenge.

▪ 2014 Challenge:
  ▪ Smart home electricity data: 2000 sensors, 40 houses, 4 billion events. We posted 400K events/sec, and close to one million in distributed throughput with 4 nodes.
  ▪ One of the four finalists

▪ 2015 Challenge:
  ▪ Based on taxi activities collected from New York City over the year 2013: 14,144 taxis, 173 million taxi trip records. We posted 300K/sec on a single node and were one of the finalists.

https://www.flickr.com/photos/shedboy/3681317392/

Case Study: Realtime Soccer Analysis

Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM

Case Study: TFL Traffic Analysis

Built using TFL (Transport for London) open data feeds.

http://goo.gl/04tX6k

http://goo.gl/9xNiCm

Select the Product

Product                               Features

WSO2 Data Analytics Server (DAS)      Everything: batch, realtime, interactive, and predictive analytics

WSO2 Complex Event Processor (CEP)    Realtime analytics only

WSO2 Machine Learner                  Predictive analytics only

Questions?

Thank You