Make Streaming Analytics work for you: The Devil is in the Details

Kanishk Mahajan, Director Product Management, Hortonworks
Ryan Medlin, Director Software Engineering, Neustar
June 2016
© Hortonworks Inc. 2011 – 2016. All Rights Reserved



Agenda

• Building a Stream Analytics Platform: Requirements
• Building a Stream Analytics Platform: Capabilities
• Use Case: CyberSecurity - Threat Protection
• Q&A


Building a Stream Analytics Platform: Requirements


5 Attributes of a Streaming Platform

• Ingest
• Process
• Analyze
• Visualize
• Respond


Data Flow versus Stream Analytics

• Data Flow
– Ingest and route terabytes of data into a “unified firehose”
– Actively manage the latency and quality of these data flows across high variability in data formats, data size and data speed
– Produces streams: an “unbounded sequence of events”

• Real Time Stream Analytics
– Sub-second event processing with linear scalability to billions of events
– Predictive analytics at scale
– Real time data aggregation across edge nodes while processing 10s of millions of events and 100s of gigabytes per second
– Guaranteed no data loss and events processed in order


6 key focus areas for building a Streaming Platform

• Common Abstraction Layer
• Latency
• Lambda Architecture
– “Orchestrate” over static and real time data
• Scale-out
• Rapid Application Development
• Data Visualization


Focus Areas for Streaming Platform

• Common Abstraction Layer
– Select one or more streaming engines
– Select one or more cloud providers
– Select one or more event sources
– …
– Future Proof
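One way to picture a common abstraction layer is an engine-neutral interface that application logic is written against once. The sketch below is only an illustration under that assumption: `StreamEngine`, `InMemoryEngine`, `MicroBatchEngine` and `tag_severity` are hypothetical names, and neither stand-in class uses a real Storm, Spark Streaming or Flink API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]


class StreamEngine(ABC):
    """Hypothetical engine-neutral interface; not a real Storm/Spark/Flink API."""

    @abstractmethod
    def run(self, source: Iterable[Event], operator: Callable[[Event], Event]) -> List[Event]:
        ...


class InMemoryEngine(StreamEngine):
    """Stand-in engine that applies the operator event by event in-process."""

    def run(self, source, operator):
        return [operator(event) for event in source]


class MicroBatchEngine(StreamEngine):
    """Stand-in engine that processes events in fixed-size batches."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size

    def run(self, source, operator):
        results: List[Event] = []
        batch: List[Event] = []
        for event in source:
            batch.append(event)
            if len(batch) == self.batch_size:
                results.extend(operator(e) for e in batch)
                batch = []
        results.extend(operator(e) for e in batch)  # flush the final partial batch
        return results


def tag_severity(event: Event) -> Event:
    """Application logic written once, independent of the engine that runs it."""
    return {**event, "severe": event["score"] >= 0.8}


events = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.1}, {"id": 3, "score": 0.8}]
per_event = InMemoryEngine().run(events, tag_severity)
micro_batch = MicroBatchEngine(batch_size=2).run(events, tag_severity)
```

Both stand-in engines return the same result for the same operator, which is the property an abstraction layer is meant to preserve when the underlying engine is swapped.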


Focus Areas for Streaming Platform

• Latency
– 500 milliseconds or less: Dashboards, Security Incidents, Asset Performance
– 20 milliseconds or less: Ad Networks, Preventive Maintenance
– …


Focus Areas for Streaming Platform

• Lambda Architecture
– Integrate static and real time data
– Enrichment
– Orchestration of Batch workflows
– Predictive Classification and Scoring
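As a rough illustration of integrating static and real time data, the sketch below merges a batch view recomputed from stored history with an incrementally updated real-time view. The per-device counting use case and the names `batch_view`, `SpeedLayer` and `serve` are assumptions made for the example, not part of the presented platform.

```python
from collections import Counter
from typing import Dict, Iterable


def batch_view(historical_events: Iterable[dict]) -> Counter:
    """Batch layer: recompute counts per device from the full stored history."""
    return Counter(e["device"] for e in historical_events)


class SpeedLayer:
    """Speed layer: incrementally counts events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: dict) -> None:
        self.counts[event["device"]] += 1


def serve(batch: Counter, speed: SpeedLayer) -> Dict[str, int]:
    """Serving layer: answer queries by merging the batch and real-time views."""
    return dict(batch + speed.counts)


history = [{"device": "a"}, {"device": "a"}, {"device": "b"}]
speed = SpeedLayer()
for live in [{"device": "b"}, {"device": "c"}]:
    speed.on_event(live)

merged = serve(batch_view(history), speed)
```

In this sketch a query sees history and in-flight events together; a real deployment would also have to reset the speed layer whenever a new batch run absorbs its events.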


Focus Areas for Streaming Platform

• Scale Out
– Linear scale out or scale down
– Resource Management
– Handle transient workloads


Focus Areas for Streaming Platform

• Rapid Application Development
– Aggregations
– Filters
– Multi Stream Correlations
– Splits
– Joins
– Normalizations
– Business Rules Editor


Focus Areas for Streaming Platform

• Data Visualization
– Time Series Visualizations
– Metrics Dashboarding
– Trends
– Comparisons
– Thresholds
– Custom UI extensions


Real Time Stream Analytics versus Streaming Engine

• Abstracts the underlying Streaming Engine: Storm, Spark Streaming, Flink, …
• OOTB support for multiple (cloud) event sources: Kafka, AWS Kinesis, Azure Event Hubs
• Built-in Operators for Complex Event Processing
• Built-in Real Time Dashboarding: Metrics and Events
• Pluggable Workflow Management
• Business Rules Editor and Rapid Application Development Framework
• Cloud Deployment
• Scalable Architecture
• Handles different latency requirements


Building a Stream Analytics Platform: Capabilities


Stream Analytics Platform - Capabilities

• Continuous, Real Time Processing
– Continuous query and near real time processing of event data
– Distributed, scalable platform for processing continuous unbounded streams of data
– Abstraction across multiple open source stream engine technologies
– Large scale event processing
– No-coding approach for event processing logic

• Event Source
– Multiple event source support
– Multiple data format support
– Drag and drop, no-coding approach

• Event Storage
– Multiple event storage support
– Event retention time period support
– Drag and drop, no-coding approach


Streaming Platform Capabilities - Analytics

Stages: Descriptive, Proactive, Predictive, Prescriptive

• Detect Trends; Alert on probable Failures
• Reporting and Dashboarding; Visual Pattern Recognition
• Monitor System; Alert before Failure
• Take Action on Data; Avoid Failure and Drive Operational Efficiency

• Rules Calculations on predefined settings
• High Throughput, Low Latency Event Processing Framework
• Train Models Offline
• Provide Query interface and storage of Models
• Integrate Real Time Data with Batch Data
• Connect with Enterprise Applications


Stream Analytics Platform - Capabilities

• Filtering
– Event selection based on pre-defined criteria

• Transformations
– Aggregations
– Event Prioritization
– Event Deduplication
– Event Time Stamping
– Custom User Defined Functions

• Enrichment
– Pull in reference data for additional context
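The three capabilities above compose naturally as a chain of small operators. The following sketch assumes events are plain dictionaries with `id`, `host` and `severity` fields and uses a hypothetical `ASSET_OWNERS` lookup table as the reference data; none of these names come from the presented platform.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical reference data used for enrichment.
ASSET_OWNERS: Dict[str, str] = {"10.0.0.5": "finance", "10.0.0.9": "engineering"}


def select(events: Iterable[dict], min_severity: int) -> Iterator[dict]:
    """Filtering: keep only events that meet a pre-defined criterion."""
    return (e for e in events if e["severity"] >= min_severity)


def deduplicate(events: Iterable[dict]) -> Iterator[dict]:
    """Transformation: drop events whose id was already seen.

    The seen-set grows without bound here; a real stream would expire old ids.
    """
    seen: Set[str] = set()
    for e in events:
        if e["id"] not in seen:
            seen.add(e["id"])
            yield e


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Enrichment: pull in reference data for additional context."""
    for e in events:
        yield {**e, "owner": ASSET_OWNERS.get(e["host"], "unknown")}


raw = [
    {"id": "e1", "host": "10.0.0.5", "severity": 7},
    {"id": "e1", "host": "10.0.0.5", "severity": 7},  # duplicate delivery
    {"id": "e2", "host": "10.0.0.9", "severity": 2},  # below threshold
    {"id": "e3", "host": "10.0.0.7", "severity": 9},  # no reference data
]
result = list(enrich(deduplicate(select(raw, min_severity=5))))
```

Because each operator is a generator, the chain processes one event at a time and would work unchanged on an unbounded source.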


Stream Analytics Platform - Capabilities

• Correlations
– Correlation of in-flight event data from multiple event streams

• Temporal Calculations
– Time based calculations on in-flight data in an event stream, e.g. a windowing computation with a time based window

• Pattern Detection
– Calculating trends on in-flight data in an event stream
– Outlier detection: pattern matching with historical data
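To make the temporal and pattern-detection items concrete, here is a minimal sketch of a tumbling (fixed, non-overlapping) time window and a simple outlier check against historical values. The choice of a tumbling window and of a standard-deviation threshold are assumptions for illustration; the slides do not say which window types or detection methods the platform uses.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def tumbling_counts(events: Iterable[Tuple[float, str]], window_seconds: int) -> Dict[int, int]:
    """Count events per fixed, non-overlapping time window, keyed by window start."""
    counts: Dict[int, int] = defaultdict(int)
    for timestamp, _payload in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


def is_outlier(value: float, history: List[float], threshold: float = 3.0) -> bool:
    """Flag a value more than `threshold` standard deviations from the historical mean."""
    spread = pstdev(history)
    if spread == 0:
        return value != mean(history)
    return abs(value - mean(history)) / spread > threshold


events = [(0.5, "a"), (3.2, "b"), (9.9, "c"), (10.0, "d"), (25.1, "e")]
per_window = tumbling_counts(events, window_seconds=10)
spike = is_outlier(50, [10, 11, 9, 10, 10])
normal = is_outlier(11, [10, 11, 9, 10, 10])
```

Feeding the per-window counts into `is_outlier` combines the two ideas: each new window is compared with the windows that came before it.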


Stream Analytics Platform - Capabilities

• Serialization/Deserialization
– Marshalling/unmarshalling event data into a series of bytes
– Drag and drop, no-coding support

• Rule Engine
– Support for business rules (business directives that impact a decision)
– Support for invoking an external business rule engine

• Visualization
– Reporting and Dashboarding
– Visual Pattern Recognition
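The serialization and rule-engine items can be sketched together: an event is marshalled to bytes, unmarshalled on the other side, and evaluated against declarative rules. JSON is used only because it ships with Python; the `Rule` class, the rule names and the sensor fields are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """A business rule: a named condition plus the decision it leads to."""
    name: str
    condition: Callable[[dict], bool]
    decision: str


def evaluate(rules: List[Rule], event: dict) -> List[str]:
    """Return the decisions of every rule whose condition matches the event."""
    return [rule.decision for rule in rules if rule.condition(event)]


def serialize(event: dict) -> bytes:
    """Marshal an event into bytes (JSON here; another wire format would also fit)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Unmarshal bytes back into an event."""
    return json.loads(payload.decode("utf-8"))


rules = [
    Rule("high-temperature", lambda e: e["temperature"] > 90, "alert"),
    Rule("any-reading", lambda e: "temperature" in e, "store"),
]
wire = serialize({"sensor": "s1", "temperature": 95})
decisions = evaluate(rules, deserialize(wire))
```

Keeping rules as data rather than code paths is what makes a business rules editor, or a call out to an external rule engine, possible.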


Stream Analytics - Capabilities Summary

#    Capability
1.   Continuous Real Time Processing
2.   Event Sources
3.   Event Storage
4.   Temporal Calculations
5.   Rules Engine
6.   Visualization
7.   Filtering
8.   Transformation
9.   Event Storage
10.  Enrichment
11.  Correlations
12.  Pattern Detection


Use Cases: CyberSecurity - Threat Protection


CyberSecurity - Threat Protection Use Case


CyberSecurity - Threat Protection - Steps: Applying Streaming Analytics

• Expose
– Provide visibility into your enterprise
– Passively watch network traffic and construct events to describe the activity it sees
– “Ingest”

• Transmit
– Compress, encrypt, de-identify event data
– Send to the cloud for centralized log retention, real-time threat analysis and incident investigation
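A minimal sketch of the Transmit step, assuming events are dictionaries and that the `user` and `src_ip` fields are the identifying ones. It covers de-identification (keyed hashing) and compression with the Python standard library only; encryption is deliberately left out because it would normally come from TLS on the transport or a dedicated cryptography library. `PSEUDONYM_KEY` is a placeholder, not a recommended way to hold a secret.

```python
import hashlib
import hmac
import json
import zlib

# Hypothetical secret; in practice it would come from a key management system.
PSEUDONYM_KEY = b"example-key"


def de_identify(event: dict, fields: tuple = ("user", "src_ip")) -> dict:
    """Replace identifying fields with keyed hashes so events stay joinable but not readable."""
    cleaned = dict(event)
    for field in fields:
        if field in cleaned:
            digest = hmac.new(PSEUDONYM_KEY, str(cleaned[field]).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()
    return cleaned


def pack(events: list) -> bytes:
    """Serialize a batch as JSON and compress it before it leaves the network."""
    return zlib.compress(json.dumps(events).encode("utf-8"))


def unpack(payload: bytes) -> list:
    """Reverse of pack(), as the receiving side would run it."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


batch = [{"user": "alice", "src_ip": "10.0.0.5", "action": "login"} for _ in range(100)]
safe_batch = [de_identify(e) for e in batch]
payload = pack(safe_batch)
```

Because the hash is keyed and deterministic, the same user maps to the same pseudonym in every event, so correlation in the cloud still works without exposing the original value.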


CyberSecurity - Threat Protection - Steps: Applying Streaming Analytics

• Prioritize
– Aggregate intelligence across all control points to identify and prioritize those systems that remain compromised and require immediate remediation.
– Allows analysts to “zero in” on just those events of most importance.
– Significantly reduce the number of incidents that security analysts need to investigate.
– Combine global telemetry from cyber intelligence networks with local customer context across endpoints, networks and email, to uncover attacks that would otherwise evade detection.
– “Correlate” and “Enrich”
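The Prioritize step can be sketched as a correlation of local events with a global intelligence feed, followed by ranking. Everything named here is an assumption for illustration: the `THREAT_INTEL` table, its risk weights, the event fields and the additive scoring are invented and do not describe any particular product.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical global threat-intelligence feed: indicators with a risk weight.
THREAT_INTEL: Dict[str, int] = {"203.0.113.7": 8, "bad.example.net": 5}


def prioritize(events: Iterable[dict]) -> List[Tuple[str, int, Set[str]]]:
    """Correlate events per host across control points and rank hosts by total risk."""
    score: Dict[str, int] = defaultdict(int)
    sources: Dict[str, Set[str]] = defaultdict(set)
    for e in events:
        weight = THREAT_INTEL.get(e["indicator"])
        if weight is None:
            continue  # no match in the intelligence feed
        score[e["host"]] += weight
        sources[e["host"]].add(e["control_point"])
    ranked = sorted(score, key=lambda host: score[host], reverse=True)
    return [(host, score[host], sources[host]) for host in ranked]


events = [
    {"host": "pc-1", "control_point": "network", "indicator": "203.0.113.7"},
    {"host": "pc-1", "control_point": "email", "indicator": "bad.example.net"},
    {"host": "pc-2", "control_point": "endpoint", "indicator": "bad.example.net"},
    {"host": "pc-3", "control_point": "network", "indicator": "198.51.100.1"},
]
queue = prioritize(events)
```

Hosts with no intelligence match never enter the queue, which is one simple way to reduce the number of incidents an analyst has to look at.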


CyberSecurity - Threat Protection - Steps: Applying Streaming Analytics

• Analyze
– Create comprehensive detection rules, behavioral analytics and guided investigations to detect the latest threats.
– Enable quick and nimble data exploration across billions of events to proactively hunt for hidden indicators of compromise (IOCs).
– Provide agile investigation tooling/UI to pivot from one indicator to the next, reconstruct the attack storyline and plan a forceful response to disrupt the attack.
– Allow security analysts to visualize all related attack components, e.g., all files used in the attack, all email addresses, all malicious IP addresses, etc.
– “Proactive”, “Predictive”, “Prescriptive” and “Descriptive” Analytics
– “Real Time Analytics across large volumes of data at scale”


CyberSecurity - Threat Protection - Steps: Applying Streaming Analytics

• Remediate
– Generate alerts to expedite validating and scoping the incident.
– Enrich alerts with supporting data. This data could include threat intelligence and point-in-time context regarding users impacted, actions taken and hosts involved, to help you validate and scope the incident.
– Allow a Security Investigator to hunt for Indicators-of-Compromise across all your endpoints from a single console.
– Leverage your existing installations of Endpoint Protection and Email Security for enforcement without having to install any new endpoint agents.
– Export rich security intelligence into third-party security incident and event management systems (SIEMs).
– “Notifications”, “Event Storage”, “Workflow”


Copyright © 2015 Neustar, Inc. All Rights Reserved

AREAS OF FOCUS AND INNOVATION FOR NEUSTAR

• Security
• User
• Device
• Anomaly Detection
• Multi-Factor Authentication (PKI, etc.)
• Policy Creation/Decisioning/Enforcement
• Local / Edge / Distributed functionality
• Openness and Standardization (OCF/OneM2M/MQTT/CoAP/GitHub)
• Partnerships


DEVICE AND IOT PLATFORM

• Devices
• Data Processing Services
• Local & Offline Functionality
• Device and User Authentication
• Device Directory & Identity
• User and Device Policy
• Device Certification & Public / Private Key Distribution / MFA
• OIC/OCF and IoTivity standards-based platform for interoperability & device definition
• Absolute, Group/Collection, Local & Cloud
• Verification / Fraud / Anomaly Detection
• Location Services
• Future
• Peer-to-peer communication & discovery, decentralization
• Secure discovery & communication: OCF-based security & communication protocols
• Data pipeline, enrichment and analytics


CORE DATA PLATFORM REQUIREMENTS

• Provide ingestion for all machine data exhaust from all system components (e.g. Anomaly Detection, Monitoring/Alerting).

• Deserialize Avro messages using the given schema(s) and, by default, store each message in HDFS.

• Given a message attribute (for example, device type and customer ID, or device ID), execute a rule to determine the next course of action, which should comprise one or more of these actions:
– Notify a user via email, SMS or a 3rd party API
– Notify or pass the message to the registry service/API
– Store data in HDFS
– Store data in a relational data store (HBase Phoenix or other?)
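The attribute-driven rule requirement above could look roughly like the sketch below. It assumes messages have already been deserialized into dictionaries, and every action is a stub that only records what a real integration would do; the `RULES` table, the device types and the customer IDs are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Each action is a stub that records what a real integration would do.
log: List[Tuple[str, str]] = []


def store_in_hdfs(msg: dict) -> None:
    log.append(("hdfs", msg["device_id"]))


def notify_user(msg: dict) -> None:
    log.append(("notify", msg["device_id"]))


def pass_to_registry(msg: dict) -> None:
    log.append(("registry", msg["device_id"]))


Action = Callable[[dict], None]

# Hypothetical rule table keyed by (device type, customer id).
RULES: Dict[Tuple[str, str], List[Action]] = {
    ("smoke_detector", "cust-1"): [notify_user, store_in_hdfs],
    ("light", "cust-1"): [pass_to_registry],
}


def route(msg: dict) -> None:
    """Look up the rule for the message attributes; default to storing the message."""
    actions = RULES.get((msg["device_type"], msg["customer_id"]), [store_in_hdfs])
    for action in actions:
        action(msg)


route({"device_type": "smoke_detector", "customer_id": "cust-1", "device_id": "d1"})
route({"device_type": "thermostat", "customer_id": "cust-2", "device_id": "d2"})
```

A message with no matching rule falls through to storage only, which mirrors the "by default store a message" requirement.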


Outgoing Messages


Turn a Light on Manually: Inbound Message Flow


Incoming Messages


INTERESTING PROBLEMS

• We need to work in offline and temporarily offline modes, so data replication is the hardest problem to solve, and the solution depends on the specific use case.
• We also need to run a subset of the Cloud Inbound real time service rules and processing locally.
• How to reuse this logic both in the cloud and on a Local Gateway is a design challenge with tradeoffs.
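One common pattern for the temporarily offline case is store-and-forward: buffer messages locally and replay them in order once connectivity returns. The sketch below is only an illustration of that pattern, not the approach Neustar chose; the `StoreAndForward` class and the drop-oldest policy are assumptions.

```python
from collections import deque
from typing import Callable, Deque, List


class StoreAndForward:
    """Buffer messages while offline and replay them in order once the link returns."""

    def __init__(self, send: Callable[[dict], None], capacity: int = 1000) -> None:
        self.send = send
        self.online = True
        # Bounded buffer: when full, the oldest message is dropped (a deliberate tradeoff).
        self.pending: Deque[dict] = deque(maxlen=capacity)

    def publish(self, message: dict) -> None:
        if self.online:
            self.send(message)
        else:
            self.pending.append(message)

    def reconnect(self) -> None:
        """Mark the link as up and flush the backlog oldest-first."""
        self.online = True
        while self.pending:
            self.send(self.pending.popleft())


cloud: List[dict] = []
gateway = StoreAndForward(send=cloud.append)
gateway.publish({"seq": 1})
gateway.online = False  # connectivity lost
gateway.publish({"seq": 2})
gateway.publish({"seq": 3})
buffered_while_offline = len(gateway.pending)
gateway.reconnect()
```

The bounded buffer makes the tradeoff explicit: a gateway with limited storage must decide what to lose during a long outage, which is one reason replication depends on the use case.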


Q & A