Make Streaming Analytics work for you: The Devil is in the Details
-
Upload
dataworks-summithadoop-summit -
Category
Technology
-
view
918 -
download
0
Transcript of Make Streaming Analytics work for you: The Devil is in the Details
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Make Streaming Analytics Work for you: The Devil is in the DetailsKanishk Mahajan
June 2016
Director Product Management, Hortonworks
Ryan MedlinDirector Software Engineering, Neustar
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda• Building a Stream Analytics Platform:
Requirements
• Building a Stream Analytics Platform: Capabilities
• Use Case -• CyberSecurity - Threat Protection
• Q&A
Building a Stream Analytics PlatformRequirements
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
5 Attributes of a Streaming Platform
Ingest Process Analyze Visualize Respond
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data Flow versus Stream Analytics
Data Flow– Ingest and route terabytes of data into a ”unified firehose”–Actively performance manage the latency and quality of these data flows -
– across high variability of data formats, size of data and speed of data
–Produces Streams - ”unbounded sequence of events” Real Time Stream Analytics
–Sub second event processing with linear scalability to billions of events –Predictive Analytics at Scale
– Real time data aggregation across edge nodes while processing 10s of millions of events and 100s of gigabytes per second
–Guaranteed no data loss and events processed in order
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
6 key focus areas for building a Streaming Platform
Common Abstraction Layer Latency Lambda Architecture
–“Orchestrate” over static and real time data Scale-out Rapid Application Development Data Visualization
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Focus Areas for Streaming Platform
Common Abstraction Layer–Select one or more streaming engines–Select one or more cloud providers–Select one or more event sources–…Future Proof
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Focus Areas for Streaming Platform
Latency–500 millisecond or less- Dashboards, Security Incidents, Asset Performance–20 milliseconds or less - Ad Networks, Preventive Maintenance–…
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Focus Areas for Streaming Platform
Lambda Architecture–Integrate static and real time data–Enrichment–Orchestration of Batch workflows–Predictive Classification and Scoring
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Focus Areas for Streaming Platform
Scale Out–Linear Scale out or scale down–Resource Management–Handle Transient workloads
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Focus Areas for Streaming Platform Rapid Application Development
–Aggregations–Filters–Multi Stream Correlations–Splits–Joins–Normalizations–Business Rules Editor
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Focus Areas for Streaming Platform Data Visualization
–Time Series Visualizations–Metrics Dashboarding–Trends–Comparisons–Thresholds–Custom UI extensions
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Real Time Stream Analytics versus Streaming Engine
Abstracts underlying Streaming Engine- Storm, Spark Streaming, Flink.. OOTB Support for multiple (cloud) event sources - Kafka, AWS Kinesis, Azure Event
Hub Built-in Operators for Complex Event Processing Built-in Real Time Dashboarding- Metrics and Events Pluggable Workflow Management Business Rules Editor and Rapid Application Development Framework Cloud Deployment Scalable Architecture Handles different latency requirements
Hortonworks Inc Confidential 2016
Building a Stream Analytics PlatformCapabilities
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Analytics Platform - Capabilities
• Continuous, Real Time Processing–Continuous Query and near real time processing of event data–Distributed, scalable platform for processing continuous unbounded streams of data–Abstraction across multiple open source stream engine technologies– Large Scale Event Processing–No coding approach for event processing logic
• Event Source–Multiple event source support–Multiple data format support–Drag and Drop, no coding approach
• Event Storage–Multiple event storage support– Event Retention time period support–Drag and Drop, no coding approach
Page 15
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Platform Capabilities - Analytics
Detect Trends Alert on probable Failures
Descriptive Proactive Predictive Prescriptive
• Rules Calculations on predefined settings
• High Throughput, Low Latency Event Processing Framework
• Train Models Offline• Provide Query interface and
storage of Models• Integrate Real Time Data
with Batch Data
• Connect with Enterprise Applications
Reporting and DashboardingVisual Pattern Recognition
Monitor System Alert before Failure
Take Action on Data Avoid Failure and
Drive Operational Efficiency
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Analytics Platform - Capabilities
• Filtering–Event Selection based on pre-defined criteria
• Transformations–Aggregations–Event Prioritization–Event Deduplication–Event Time Stamping–Custom User Defined Functions
• Enrichment–Pull in reference data for additional context
Page 17
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Analytics Platform - Capabilities
• Correlations–Correlation of in-flight event data from multiple event streams
• Temporal Calculations–Time Based Calculations on in-flight data in an event stream for e.g. a windowing
computation with a time based window
• Pattern Detection–Calculating trends on in-flight data in an event stream–Outlier detection : Pattern Matching with historical data
Page 18
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Analytics Platform - Capabilities
• Serialization/Deserialization–Marshalling/Unmarshalling event data into a series of bytes–Drag and Drop, no coding support
• Rule Engine–Support for business rules (business directives that impact a decision) –Support for invoking external business rule engine
• Visualization–Reporting and Dashboarding–Visual Pattern Recognition
Page 19
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Analytics - Capabilities Summary
# Capability
1. Filtering
2. Transformations
3. Enrichment
4. Correlations
5. Temporal Calculations
6. Pattern Detection
# Capability
1. Continuous Real Time Processing
2. Event Sources
3. Event Storage
4. Temporal Calculations
5. Rules Engine
6. Visualization
# Capability
7. Filtering
8. Transformation
9. Event Storage
10. Enrichment
11. Correlations
12. Pattern Detection
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Use CasesCyberSecurity - Threat Protection
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
CyberSecurity - Threat Protection Use Case
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
CyberSecurity - Threat Protection - StepsApplying Streaming Analytics
• Expose• Provide Visibility into your enterprise• Passively watch network traffic and construct events to describe the activity it sees.• “Ingest”
• Transmit• Compress, Encrypt, De-identify event data• Send to the cloud for centralized log retention, real-time threat analysis and incident investigation
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
CyberSecurity - Threat Protection - StepsApplying Streaming Analytics
• Prioritize• Aggregate intelligence across all control points to identify and prioritize those systems that remain
compromised and require immediate remediation.• Allows analysts to “zero in” on just those events of most importance.• Significantly reduce the number of incidents that security analysts need to investigate• Combine global telemetry from cyber intelligence networks with local customer context across
endpoints, networks and email, to uncover attacks that would otherwise evade detection.• “Correlate” and “Enrich”
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
CyberSecurity - Threat Protection - StepsApplying Streaming Analytics
• Analyze• Create comprehensive detection rules, behavioral analytics and guided investigations to detect the
latest threats.• Enable quick and nimble data exploration across billions of events to proactively hunt for hidden
indicators of compromise (IOCs).• Provide agile investigation tooling/UI to pivot from one indicator to the next, reconstruct the attack
storyline and plan a forceful response to disrupt the attack.• Allow Security analysts to visualize all related attack components, e.g., all files used in the attack,
all email addresses, all malicious IP addresses, etc.• “Proactive”, “Proactive”, “Prescriptive” and “Descriptive” Analytics• “Real Time Analytics across large volumes of data at scale”
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
CyberSecurity - Threat Protection - StepsApplying Streaming Analytics
• Remediate• Generate alerts to expedite validating and scoping the incident.• Enrich alerts with supporting data. This data could include threat intelligence, point-in-time
context regarding users impacted, actions taken and hosts involved help you validate and scope the incident.
• Allow a Security Investigator to hunt for Indicators-of-Compromise across all your endpoints from a single console.
• Leverage your existing installations of Endpoint Protection and Email Security for enforcement without having to install any new endpoint agents.
• Export rich security intelligence into third-party security incident and event management systems (SIEMs).
• “Notifications”, “Event Storage”, “Workflow”
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Make Streaming Analytics Work for you: The Devil is in the DetailsKanishk Mahajan
June 2016
Director Product Management, Hortonworks
Ryan MedlinDirector Software Engineering, Neustar
Copyright © 2015 Neustar, Inc. All Rights Reserved 28
AREAS OF FOCUS AND INNOVATION FOR NEUSTAR
Security
User
Device
Anomaly Detection
Multi-Factor Authentication (PKI, etc)
Policy Creation/Decisioning/Enforcement
Local / Edge / Distributed functionality
Openness and Standardization
(OCF/OneM2M/MQTT/CoAP/GitHub)
Partnerships
Copyright © 2015 Neustar, Inc. All Rights Reserved 29
DEVICE AND IOT PLATFORM
Devices
Data Processing Services
Local & Offline Functionality
Device and User Authentication
Device Directory & Identity
User and Device Policy
Device Certification & Public / Private Key Distribution/MFA
OIC/OCF and IoTivity Standards Based Platform for Interoperability & device definition
Absolute, Group/Collection, Local & cloud
Verification / Fraud /
Anomaly detection
Location Services
Future
Peer-to-Peer communication & Discovery. decentralization
Secure Discovery & Communication OCF based security & communication protocols
Data pipeline, enrichment and analytics
CORE DATA PLATFORM REQUIREMENTS
Provide Ingestion for all Machine Data Exhaust from all system components. (ex: Anomaly Detection, Monitoring/Alerting)
Deserializing avro messages using given schema(s) and by default store a message in hdfs.
Given a message attribute (for example device type and customer Id or device Id) execute a rule to determine next course of action which should comprise of one or more of these actions:– Notify a user via email or sms or 3rd party api – Notify or pass message to registry service/API– Store data in hdfs– Store data in a relational data store (hbase phoenix or other?)
Copyright © 2015 Neustar, Inc. All Rights Reserved 31
Copyright © 2015 Neustar, Inc. All Rights Reserved 32
Copyright © 2015 Neustar, Inc. All Rights Reserved 33
Copyright © 2015 Neustar, Inc. All Rights Reserved 34
Outgoing Messages
Copyright © 2015 Neustar, Inc. All Rights Reserved 35
Turn a Light on ManuallyInbound Message Flow
Copyright © 2015 Neustar, Inc. All Rights Reserved 36
Incoming Messages
INTERESTING PROBLEMS We need to work in offline and temporary offline mode so data replication is the hardest
problem to solve and is based on unique use cases. Also need to run a subset of the Cloud Inbound real time service rules and processing.
How to reuse this logic both in the cloud and locally on a Local Gateway is a design challenge with tradeoffs.
Copyright © 2015 Neustar, Inc. All Rights Reserved 37
38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Q & A