Hortonworks Data In Motion Webinar Series Pt. 2
Make Your Big Data Ecosystem Work Better for You
@MarkLochbihler, Partner Engineering
October 12th, 2016
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Agenda
• Hortonworks Connected Data Platforms
• HDF 2.0 Platform
• Data Ingestion into Hadoop made EASY
• HDF 2.0 Platform Use Cases
• HDF 2.0 Product Integration Certification
• HDF Partner Ecosystem Solutions
• More Information
Actionable Intelligence from Connected Data Platforms
Capturing perishable insights from data in motion
Ensuring rich, historical insights on data at rest
Necessary for modern data applications
[Diagram: DATA IN MOTION (Hortonworks DataFlow) and DATA AT REST (Hortonworks Data Platform) produce ACTIONABLE INTELLIGENCE for Modern Data Applications]
Modern Data Apps: Custom or Off the Shelf
• Real-Time Cyber Security protects systems with superior threat detection
• Smart Manufacturing dramatically improves yields by managing more variables in greater detail
• Connected, Autonomous Cars drive themselves and improve road safety
• Future Farming optimizes soil, seeds and equipment to measured conditions on each square foot
• Automatic Recommendation Engines match products to preferences in milliseconds
Hortonworks DataFlow Manages Data in Motion
[Diagram: Sources, Regional Infrastructure, Core Infrastructure. One end is labeled "Constrained, High-latency, Localized context"; the other is labeled "Hybrid (cloud / on-premises), Low-latency, Global context"]
Easy, Real-Time, Coding Free Data Movement
Dynamic data pipeline as not all data is equal
AWS, Azure, Google Cloud, Hadoop
Expand To The Very Edge With MiNiFi
AWS, Azure, Google Cloud, Hadoop
Capture new sources of data
Edge Intelligence with Apache MiNiFi
Key Features
• Guaranteed delivery
• Data buffering
‒ Backpressure
‒ Pressure release
• Prioritized queuing
• Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
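The buffering and queuing features listed on this slide can be illustrated with a small sketch. This is not how MiNiFi or NiFi implement them; it is a hypothetical in-memory buffer (the class name `BoundedPriorityQueue`, the `max_size` limit, and the numeric priority scheme are all assumptions made for illustration). It shows the two ideas in miniature: backpressure refuses new data once the buffer is full, and prioritized queuing releases the most important items first.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Hypothetical buffer: a lower priority number is delivered first."""

    def __init__(self, max_size):
        self.max_size = max_size            # assumed backpressure limit
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps FIFO order within a priority

    def offer(self, priority, item):
        """Return False (apply backpressure) when the buffer is full."""
        if len(self._heap) >= self.max_size:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Return the highest-priority item, or None if the buffer is empty."""
        if not self._heap:
            return None
        _, _, item = heapq.heappop(self._heap)
        return item


queue = BoundedPriorityQueue(max_size=3)
accepted = [
    queue.offer(5, "routine telemetry"),
    queue.offer(1, "critical alarm"),
    queue.offer(5, "more telemetry"),
    queue.offer(1, "second alarm"),   # buffer is full, so this offer is refused
]
drained = [queue.poll() for _ in range(3)]
```

In this sketch a refused `offer` simply returns `False`, so the caller decides whether to retry, slow down, or drop the item, which is roughly the trade-off the slide calls loss tolerance.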
Immediate Insights At Massive Scale
AWS, Azure, Google Cloud, Hadoop
Adapt to differing rates of data creation & delivery (Kafka) with real-time streaming analytics (Storm)
Hortonworks DataFlow Management and Stream Processing
[Diagram: Sources, Regional Infrastructure, Core Infrastructure. One end is labeled "Constrained, High-latency, Localized context"; the other is labeled "Hybrid (cloud / on-premises), Low-latency, Global context"]
HDF 2.0: Data-in-Motion Platform
Flow management + Stream Processing
[Diagram: DATA IN MOTION to DATA AT REST. IoT Data Sources feed MiNiFi agents and NiFi instances (flow management), with Kafka and Storm for stream processing, delivering to AWS, Azure, Google Cloud, Hadoop, and others]
Enterprise Services: Ambari, Ranger, Other services
Problems Today: Timely Access to Data and Decisions
"HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business." - Royal Mail
http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise/
Simplistic View of Dataflows: Easy, Definitive
[Diagram: a single dataflow connecting Acquire Data, Store Data, and Process and Analyze Data]
Realistic View of Dataflows: Complex, Convoluted
[Diagram: dataflows crisscrossing between many Acquire Data points, many Store Data points, and Process and Analyze Data]
HDF Makes Big Data Ingest Fast, Easy
Without HDF: Complicated, messy, and takes weeks to months to move the right data into Hadoop
With HDF: Streamlined, Efficient, Easy
[Logo: HDP, Hortonworks Data Platform, Powered by Apache Hadoop]
Use Cases for Data in Motion
When Only DataFlow Management is Required
Use Cases for Data-in-Motion Using DataFlow Mgmt
• Data Ingestion
• Edge Intelligence
• First Mile Problem
• Physical Data Movement
• Simple event processing such as Route, Filter, Enrich, Transform, etc.
When Both DataFlow Management and Stream Processing are Required
Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing
• Flow Management to deliver data for Stream Processing
• PLUS: Complex pattern matching on unbounded streams of data.
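To make "simple event processing" concrete, here is a minimal sketch of a filter, enrich, and transform chain. It is plain Python, not NiFi configuration; the event fields (`device`, `value`, `unit`), the lookup table `sites`, and the function names are all hypothetical. The point is only that each step handles one event at a time and keeps no state across events, which is what separates it from the pattern matching on unbounded streams mentioned on this slide.

```python
def filter_events(events, min_value):
    """Filter: drop readings below min_value."""
    return (e for e in events if e["value"] >= min_value)


def enrich_events(events, site_by_device):
    """Enrich: attach a site name looked up from the device id."""
    for e in events:
        yield {**e, "site": site_by_device.get(e["device"], "unknown")}


def transform_events(events):
    """Transform: convert Celsius readings to Fahrenheit."""
    for e in events:
        yield {**e, "value": e["value"] * 9 / 5 + 32, "unit": "F"}


# Hypothetical sample data, for illustration only.
raw = [
    {"device": "d1", "value": 20.0, "unit": "C"},
    {"device": "d2", "value": -5.0, "unit": "C"},
    {"device": "d3", "value": 100.0, "unit": "C"},
]
sites = {"d1": "plant-a"}

# Steps are chained lazily, so each event flows through all three in turn.
processed = list(transform_events(enrich_events(filter_events(raw, 0.0), sites)))
```

Because the steps are generators, they compose in any order and process events as they arrive rather than waiting for a complete batch.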
Data Ingestion: Optimize Log Analytics with Content Based Routing
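The slide gives only the title, so the following is one plausible reading of content-based routing for logs, not the setup shown in the webinar: inspect each log line and send only the lines that matter to the analytics destination, while everything else goes to cheaper storage. The destination names (`alerts`, `review`, `archive`), the patterns, and the sample log lines are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical routing rules: (destination, pattern matched against the line content).
# The first matching rule wins; unmatched lines fall through to "archive".
ROUTES = [
    ("alerts", re.compile(r"\b(ERROR|FATAL)\b")),
    ("review", re.compile(r"\bWARN\b")),
]


def route(line):
    """Pick a destination based on the content of the log line."""
    for destination, pattern in ROUTES:
        if pattern.search(line):
            return destination
    return "archive"


log_lines = [
    "2016-10-12 10:00:01 INFO service started",
    "2016-10-12 10:00:02 WARN disk usage at 85%",
    "2016-10-12 10:00:03 ERROR connection refused",
]

# Group the lines by the destination chosen for each one.
routed = defaultdict(list)
for line in log_lines:
    routed[route(line)].append(line)
```

Keeping the rules in a data structure rather than in code mirrors the idea that routing rules can change without rewriting the flow.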
Example: Company X provides alerting services when a user's resting heart rate is higher than a threshold
Real-Time Insights
[Diagram: Acquire Data from Company X Cloud Instance 1, Company X Cloud Instance 2, and Company X Cloud Instance 3 (Acquire Data Across Cloud Instances); Parse, Filter, Validate, Enrich and Route; Core Data Center with Analytics/Pattern Match, Data Store, Alerts, and Dashboards/Visualization]
Legend: Flow Management, Stream Processing
Data in Motion Needs Dataflow Management and Stream Processing
Flow Management (NiFi, MiNiFi and Partner Integration)
• Acquire data from the cloud instances of various wearable devices.
• Move data from customer cloud instances to the on-premises instance.
• Perform intelligent routing and filtering of data. The routing and filtering rules will often change at run-time.
• Deliver the data to various downstream systems. New downstream apps will keep appearing, and the data should be fed to each one when it comes online.
• Parse the device data into a standardized format that downstream systems can understand.
• Enrich the data with contextual information, including patient/customer info (age, sex, etc.).
Stream Processing (Storm, Kafka and Partner Integration)
• Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
• Run an outlier detection model on the streaming heart rate data as it comes in. If the score is above a certain threshold, alert on the heart rate.
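The two stream processing steps in this example (a fixed threshold and an outlier score) can be sketched in a few lines. This is plain Python, not a Storm topology, and the slide does not specify the outlier model: a rolling z-score is swapped in here as a stand-in. The threshold of 100 bpm, the z-score cutoff of 3.0, the window of 5 readings, and the sample stream are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev

RESTING_THRESHOLD_BPM = 100   # assumed alert threshold
OUTLIER_Z_SCORE = 3.0         # assumed outlier cutoff
WINDOW_SIZE = 5               # assumed number of recent readings to compare against


def detect(readings):
    """Yield (index, bpm, reason) for each reading that should raise an alert."""
    window = deque(maxlen=WINDOW_SIZE)
    for i, bpm in enumerate(readings):
        if bpm > RESTING_THRESHOLD_BPM:
            # Pattern 1: the reading exceeds the fixed threshold.
            yield (i, bpm, "threshold")
        elif len(window) == WINDOW_SIZE:
            # Pattern 2: the reading is far from the recent rolling average.
            spread = pstdev(window)
            if spread > 0 and abs(bpm - mean(window)) / spread > OUTLIER_Z_SCORE:
                yield (i, bpm, "outlier")
        window.append(bpm)


# Hypothetical stream of resting heart rate readings in beats per minute.
stream = [62, 64, 63, 61, 65, 63, 90, 64, 120]
alerts = list(detect(stream))
```

Unlike per-event routing or filtering, the outlier check needs state (the rolling window) carried across events, which is why the slide assigns it to stream processing rather than flow management.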
The Future of Data: Architectural Transformation Enabled By Connected Data Platforms
[Diagram: Cloud and On Prem. Edge Data and Edge Analytics; Data in Motion (Cloud); Data in Motion (on-premises); Data at Rest (Cloud); Data at Rest (on-premises); Machine Learning; Deep Historical Analysis; Closed Loop Analytics]
Product Integration Certification for HDF 2.0
http://hortonworks.com/partners/product-integration-certification/
• Announced August 10th, 2016
• As the adoption of HDF expands, enterprises are looking for proven integrations, pre-tested and certified, that mitigate deployment risk. Partners earn the HDF Certified badge by certifying their integrations with HDF.
• Email [email protected] to get started
Product Certification: 3 SIMPLE STEPS
Step 1: Join Partnerworks
Step 2: Complete HDF 2.0 Certification Kit
‒ Certification Report
‒ Reference Architecture
‒ Solutions Overview
Step 3: Joint Review with Partner Engineering
Hortonworks DataFlow: Connecting Data Between Ecosystems
Operations: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch
Protocols and formats: HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, HTML, Image, AMQP, MQTT
All Apache project logos are trademarks of the ASF and the respective projects. All other trademarks are the property of their respective owners.
Connected Vehicle Case – Big Data Primary Data Flow
[Diagram labels]
• Data sources: sensitive structured and unstructured data; sensitive structured sources; ~2 billion real-time transactions/day; driver blood pressure sensor data; server or laptop log files; other real-time data feeds (customer data from dealerships, manufacturers); existing data sets and 3rd party data, e.g. accident data; public data sources such as NHTSA
• Components: Hadoop Edge Nodes; Hadoop Cluster; Teradata EDW; Exadata TDE; HPE SecureData Hadoop Tools; HPE SecureData Key Servers & WS APIs; Storm; Kafka; Sqoop; Hive UDFs; UDFs; Map Reduce; IBM DataStage; Cognos; Analytics & Data Science
• Annotations: "Landing zone"; "Integration Controls"; Real time ingest
Analytics and Hortonworks
[Diagram: CLOUD and ON PREMISE. DATA IN MOTION from the INTERNET OF ANYTHING passes through ESP to DATA AT REST in HDP and STORAGE (GROUP 1 to GROUP 4), supporting Edge Analytics, Machine Learning, and Deep Historical Analysis]
USAGE CASE EXAMPLES: Cyber Security; Fraud; Predictive Maintenance; Customer Experience; Stream Data Management
Data Streaming into Kafka, HDF, HDP
[Diagram: two paths, CDC (from Transaction logs) and Bulk Load, each producing a Data Streaming sequence of messages (MSG 1, MSG 2, ... MSG n) delivered to a Message broker. Caption: In memory optimized metadata management and data transport]
Filtered data flow from edge to central site
Perform basic operations like ingest, alert, filter, transform, etc.
• Data is read from multiple sensors
• The NiFi installation on the edge node ingests the data from the sensors
[Diagram: SITE 1, SITE 2, and SITE 3 connected to the CENTRAL NODES]
[Diagram: DATA IN MOTION to DATA AT REST]
Stages: Data Sources, Polling & Protocol Translation, Real Time Database, Long Term Storage
Labels: DCS, PLC, Meters, Vehicles, Analyzers, RTU, MiNiFi, Polling Engine, Protocol Proxy Service, Time Series Storage, Archive Files, Varied Support, Data Access, Data Management
Cross-cutting: Governance & Integration, Security, Operations
[Diagram: Edge (Gateway Server with MiNiFi; Mobile, IoT Devices, and Devices with Client Libraries and MiNiFi), Regional Center (Server Cluster with NiFi), Core Data Center (Server Cluster with NiFi; AWS, Azure, Google Cloud; DB; Data WH)]
eFabric® Data Plane: Software Defined Fabric (eCompute, eStorage, eNetwork)
eFabric® Control Plane
• Zone based micro-segmentation for data security
• AppHub: Application gallery for building, deploying and managing data pipelines
• Seamless connectivity to public cloud services
www.midfinsystems.com
More Information, Resources
Hortonworks Community Connection (Data Ingestion and Streaming): https://community.hortonworks.com
Partnerworks: http://hortonworks.com/partners/
HDF Certification: http://hortonworks.com/partners/product-integration-certification/
Webinars: http://hortonworks.com/events-webcasts/
Sandbox: http://hortonworks.com/events-webcasts/
HDF: http://hortonworks.com/hdf/
HDP: http://hortonworks.com/hdp/