Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Real-Time Processing in Hadoop
Phoenix Hadoop User Group
Shane Kumpf & Mac Moore
Solutions Engineers, Hortonworks
July 2015
Agenda
• Introduction & about Hortonworks HDP
• Overview of logistics industry scenario
• Overview of streaming architecture on HDP
• Streaming Demo #1
• Integrating Predictive Analytics in streaming scenarios
• Streaming Demo with Predictive additions
• Q & A
Preface: Enabling Technologies
Enablers: Key technologies from mass consumer-scale deployments.
• Problems solved at scale, via fundamentally new approaches…
• Make it possible, even simple, to produce new products/applications that would have been too cost-prohibitive, or simply impossible, beforehand.
• Where foundation tech like Li-Ion batteries, retina displays, GPS & tiny HD cameras (from smartphones) have enabled electric cars, quad-copters, VR displays, & more…
• Hadoop has similarly led to breakthroughs in big data scale & capability, and enables new real-time advanced analytic applications.
Why did Hadoop emerge?
April 2015
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to scale
[Chart: data growth from 2.8 zettabytes (2012) to 40 zettabytes (2020). Labels: Business Value; Laggards; Industry Leaders. Traditional data: ERP, CRM, SCM. New data: clickstream, geolocation, web data, Internet of Things, docs and emails, server logs.]
Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP
Spring 2015
Hortonworks. We do Hadoop.
Customer Momentum
• 430+ customers (Q1 2015)
Hortonworks Data Platform
• Completely open multi-tenant platform for any app & any data.
• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
Partner for Customer Success
• Open source community leadership focus on enterprise needs
• Unrivaled world class support
• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• 600+ Employees
• 1000+ Ecosystem Partners
Customer Partnerships matter
Driving our innovation through Apache Software Foundation Projects

Apache Project   Committers   PMC Members
Hadoop           27           21
Pig              5            5
Hive             18           6
Tez              16           15
HBase            6            4
Phoenix          4            4
Accumulo         2            2
Storm            3            2
Slider           11           11
Falcon           5            3
Flume            1            1
Sqoop            1            1
Ambari           34           27
Oozie            3            2
Zookeeper        2            1
Knox             13           3
Ranger           10           n/a
TOTAL            161          108
Source: Apache Software Foundation. As of 11/7/2014.
Hortonworkers are the architects and engineers that lead development of open source Apache Hadoop at the ASF
• Expertise: Uniquely capable to solve the most complex issues & ensure success with latest features
• Connection: Provide customers & partners direct input into the community roadmap
• Partnership: We partner with customers through a subscription offering. Our success is predicated on yours.
[Chart values: 27 (unlabeled); Cloudera: 11; Facebook: 5; LinkedIn: 2; IBM: 2; Others: 23; Yahoo: 10]
Technology Partnerships matter
[Table: Hortonworks relationship by partner. Columns: Named Partner, Certified Solution, Resells, Joint Engr. Rows: Microsoft, HP, SAS, SAP, IBM, Pivotal, Redhat, Teradata, Informatica, Oracle.]
It is not just about packaging and certifying software…
Our joint engineering with our partners drives open source standards for Apache Hadoop
HDP is Apache Hadoop
HDP delivers a Centralized Architecture
Modern Data Architecture
• Unifies data and processing.
• Enables applications to have access to all your enterprise data through an efficient centralized platform
• Supported with a centralized approach to governance, security and operations
• Versatile to handle any applications and datasets no matter the size or type
[Diagram: Modern Data Architecture. Sources: existing systems (ERP, CRM, SCM) plus clickstream, web & social, geolocation, sensor & machine, server logs, and unstructured data. Platform: HDFS (Hadoop Distributed File System) under YARN: Data Operating System, running batch, interactive, real-time, and partner ISV workloads, alongside EDW and MPP systems. Analytics: data marts, business analytics, visualization & dashboards, applications.]
HDP delivers a completely open data platform
Hortonworks Data Platform 2.2
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.
Completely Open
• HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations
• All components are developed in open source and then rigorously tested, certified, and delivered as an integrated open source platform that’s easy to consume and use by the enterprise and ecosystem.
[Diagram: Hortonworks Data Platform 2.2 components]
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
GOVERNANCE: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
SECURITY: Apache Ranger, Apache Knox, Apache Falcon
OPERATIONS: Apache Ambari, Apache Zookeeper, Apache Oozie
Real World Use Case: Trucking Company
Scenario Overview.
Trucking company with a large fleet of trucks in the Midwest
A truck generates millions of events for a given route; an event could be:
• 'Normal' events: starting / stopping of the vehicle
• 'Violation' events: speeding, excessive acceleration and braking, unsafe tail distance
Company uses an application that monitors truck locations and violations from the truck/driver in real-time
Route? Truck? Driver?
Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers
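The scenario above can be sketched in a few lines of Python. This is only an illustration: the event-type strings, the `TruckEvent` fields, and the helper names are assumptions made for the example, not the demo's actual schema.

```python
from dataclasses import dataclass

# Hypothetical event-type names; the slides only describe the two categories.
VIOLATION_TYPES = {
    "speeding",
    "excessive_acceleration",
    "excessive_braking",
    "unsafe_tail_distance",
}


@dataclass(frozen=True)
class TruckEvent:
    truck_id: int
    driver_id: int
    route_id: int
    event_type: str


def is_violation(event):
    """True for 'violation' events, False for 'normal' ones."""
    return event.event_type in VIOLATION_TYPES


def count_violations(events, key):
    """Count violations grouped by 'route_id', 'truck_id', or 'driver_id'."""
    counts = {}
    for event in events:
        if is_violation(event):
            group = getattr(event, key)
            counts[group] = counts.get(group, 0) + 1
    return counts


events = [
    TruckEvent(1, 10, 100, "start"),
    TruckEvent(1, 10, 100, "speeding"),
    TruckEvent(2, 11, 100, "unsafe_tail_distance"),
    TruckEvent(2, 11, 101, "stop"),
]
violations_by_route = count_violations(events, "route_id")
violations_by_driver = count_violations(events, "driver_id")
```

Grouping the same events by route, truck, or driver is the "Route? Truck? Driver?" question the analysts ask of the historical data.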
Trucking Company’s YARN-enabled Architecture
[Diagram components: Truck Sensors; Inbound Messaging (Kafka); Stream Processing (Storm); Real-time Serving (HBase); Alerts & Events (ActiveMQ); Real-Time User Interface; SQL Interactive Query (Hive on Tez). All run as Many Workloads: YARN on Distributed Storage: HDFS.]
One cluster with consistent security, governance & operations
What is Kafka?
• High throughput distributed messaging system
• Publish-Subscribe semantics but re-imagined at the implementation level to operate at speed with big data volumes
Kafka @LinkedIn:
• 800 billion messages per day
• 175 terabytes of data written per day
• 650 terabytes of data read per day
• Over 13 million messages / 2.75 GB of data per second
[Diagram: several producers write to the Kafka cluster; several consumers read from it.]
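Kafka: Anatomy of a Topic
[Diagram: a topic with three partitions (Partition 0, Partition 1, Partition 2). Each partition is an ordered sequence of messages numbered from 0, running from old to new; the partitions have different lengths (ending at 9, 11, and 12). Writes are appended at the new end.]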
Partitioning allows topics to scale beyond a single machine/node
Topics can also be replicated, for high availability.
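A minimal in-memory model of a partitioned topic, assuming a key-hash partitioner. The `Topic` class and the use of `zlib.crc32` are illustrative stand-ins, not Kafka's client API or its actual partitioner.

```python
import zlib


class Topic:
    """Toy model: each partition is an append-only list; the index is the offset."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # crc32 is a stand-in hash so the example is stable across runs.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        """Return every message in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


topic = Topic(num_partitions=3)
# Twelve events from four hypothetical trucks; the key is the truck name.
placements = [topic.append(f"truck-{i % 4}", f"event-{i}") for i in range(12)]
```

Messages with the same key land in the same partition, so their relative order is preserved there, while different partitions could live on different machines.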
Apache Storm
• Distributed, real-time, fault-tolerant stream processing platform.
• Provides processing guarantees.
• Key concepts include:
  – Tuples
  – Streams
  – Spouts
  – Bolts
  – Topology
Tuples and Streams
• What is a Tuple?
  – The fundamental data structure in Storm: a named list of values that can be of any data type.
• What is a Stream?
  – An unbounded sequence of tuples.
  – Streams are the core abstraction in Storm and are what you “process” in Storm.
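The two definitions above map naturally onto plain Python: a named tuple for the tuple and a generator that never ends for the stream. The field names and the synthetic event pattern below are assumptions for illustration, not Storm's API.

```python
import itertools
from collections import namedtuple

# A tuple is a named list of values; these field names are hypothetical.
TruckTuple = namedtuple("TruckTuple", ["driver_id", "truck_id", "event_type", "speed_mph"])


def truck_stream():
    """An unbounded stream: yields tuples for as long as the caller keeps asking."""
    for n in itertools.count():
        speeding = n % 5 == 4  # every fifth synthetic event is a violation
        yield TruckTuple(
            driver_id=n % 3,
            truck_id=100 + n % 3,
            event_type="speeding" if speeding else "normal",
            speed_mph=85 if speeding else 55,
        )


# A consumer can only ever look at a finite window of an unbounded stream.
first_ten = list(itertools.islice(truck_stream(), 10))
```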
Spouts
• What is a Spout?
  – Generates streams, or acts as a source of streams
  – E.g.: JMS, Twitter, Log, Kafka Spout
  – Can spin up multiple instances of a Spout and dynamically adjust as needed
Bolts
• What is a Bolt?
  – Processes any number of input streams and produces output streams
  – Common processing in bolts includes functions, aggregations, joins, reads/writes to data stores, and alerting logic
  – Can spin up multiple instances of a Bolt and dynamically adjust as needed
• Bolts used in the Use Case:
  1. HBaseBolt: persisting and counting in HBase
  2. HDFSBolt: persisting into HDFS as Avro files using Flume
  3. MonitoringBolt: reads from HBase and creates alerts via email and a message to ActiveMQ if the number of illegal driver incidents exceeds a given threshold.
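The threshold logic described for the MonitoringBolt could look roughly like the sketch below. The class name, the in-memory counter (standing in for counts kept in HBase), and the alert list (standing in for email and ActiveMQ) are all assumptions for illustration.

```python
from collections import Counter


class MonitoringSketch:
    def __init__(self, threshold):
        self.threshold = threshold
        self.incidents = Counter()  # stand-in for per-driver counts stored in HBase
        self.alerts = []            # stand-in for email / ActiveMQ messages

    def process(self, driver_id, is_violation):
        """Count a violation and raise an alert once the driver exceeds the threshold."""
        if not is_violation:
            return
        self.incidents[driver_id] += 1
        count = self.incidents[driver_id]
        if count > self.threshold:
            self.alerts.append(f"driver {driver_id}: {count} incidents")


monitor = MonitoringSketch(threshold=2)
for driver_id, violation in [(7, True), (7, True), (8, True), (7, False), (7, True)]:
    monitor.process(driver_id, violation)
```

Here driver 7 triggers an alert on the third violation; driver 8, with a single violation, does not.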
Topology
• What is a Topology?
  – A network of spouts and bolts wired together into a workflow
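As a rough analogy in plain Python (not Storm's API), a topology can be pictured as generators chained together: a spout emits raw records and each bolt consumes one stream and produces another. The record format and function names are assumptions for illustration.

```python
def spout(raw_events):
    """Source of the stream (a Kafka spout in the slides; a plain list here)."""
    for raw in raw_events:
        yield raw


def parse_bolt(stream):
    """Turns 'driver_id,event_type' strings into dict tuples."""
    for raw in stream:
        driver_id, event_type = raw.split(",")
        yield {"driver_id": int(driver_id), "event_type": event_type}


def filter_bolt(stream):
    """Passes through only non-normal (violation) events."""
    for tup in stream:
        if tup["event_type"] != "normal":
            yield tup


def run_topology(raw_events):
    # Wiring: spout -> parse_bolt -> filter_bolt
    return list(filter_bolt(parse_bolt(spout(raw_events))))


result = run_topology(["1,normal", "2,speeding", "1,unsafe_tail_distance"])
```

A real topology is a graph rather than a single chain, and each spout or bolt can run as many parallel instances; the chain only shows the wiring idea.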
Key Constructs in Apache HBase
• HBase = Key / Value store
• Designed for petabyte scale
• Supports low latency reads, writes and updates
• Key features
  – Updateable records
  – Versioned records
  – Distributed across a cluster of machines
  – Low latency
  – Caching
• Popular use cases:
  – User profiles and session state
  – Object store
  – Sensor apps
Data Assignment
[Diagram: an HBase table whose keys are divided among different RegionServers.]
Data Access
• Get
  – Retrieves a single cell, all cells with a matching rowkey, or all cells in a column family with a matching rowkey
• Put
  – Inserts a new version of a cell.
• Scan
  – Reads the whole table, row by row, or a section of that table starting at a particular start key and ending at a particular end key
• Delete
  – Actually a form of put: it adds a new version carrying a deletion marker
• SQL via Apache Phoenix
  – Unique capability in the NoSQL market
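The access operations above can be modeled with a small in-memory store. This is a toy sketch, not the HBase client API: `VersionedStore`, the row keys, and the single logical clock are assumptions, and the end key of a scan is treated as exclusive.

```python
import itertools

_TOMBSTONE = object()  # deletion marker


class VersionedStore:
    def __init__(self):
        self._cells = {}  # (rowkey, column) -> list of (version, value)
        self._clock = itertools.count(1)

    def put(self, rowkey, column, value):
        """Add a new version of a cell; older versions are kept."""
        self._cells.setdefault((rowkey, column), []).append((next(self._clock), value))

    def delete(self, rowkey, column):
        """A delete is a put of a deletion marker."""
        self.put(rowkey, column, _TOMBSTONE)

    def get(self, rowkey, column):
        """Return the newest value, or None if missing or deleted."""
        versions = self._cells.get((rowkey, column))
        if not versions:
            return None
        value = versions[-1][1]
        return None if value is _TOMBSTONE else value

    def scan(self, start_key, end_key):
        """Yield (rowkey, column, value) in key order for start_key <= rowkey < end_key."""
        for rowkey, column in sorted(self._cells):
            if start_key <= rowkey < end_key:
                value = self.get(rowkey, column)
                if value is not None:
                    yield rowkey, column, value


store = VersionedStore()
store.put("driver#010", "incidents", 1)
store.put("driver#010", "incidents", 2)  # second version of the same cell
store.put("driver#011", "incidents", 5)
store.put("driver#020", "incidents", 9)
store.delete("driver#011", "incidents")
scan_result = list(store.scan("driver#010", "driver#020"))
```

Because rows are kept in sorted key order, a scan over a key range touches only a contiguous slice of the table, which is also what makes dividing keys among RegionServers workable.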
[Diagram: evolution from Hadoop 1 to Hadoop 2. Timeline markers: 2006, 2009, October 23, 2013.]
Hadoop w/ MapReduce: MapReduce (largely batch processing) on HDFS (Hadoop Distributed File System)
• Silo’d clusters
• Largely batch system
• Difficult to integrate
MR-279: YARN
Hadoop 2 & YARN-based architecture: batch, interactive, and real-time workloads on YARN: Data Operating System, over HDFS (Hadoop Distributed File System)
Architected & led development of YARN to enable the Modern Data Architecture
Benefits of YARN as the Data Operating System
• The container-based model allows for running nearly any workload.
  – Enables the centralized architecture.
  – MapReduce is no longer the only data processing engine.
  – Docker containers managed by YARN. Yes Please!
• Decouples resource scheduling from application lifecycle.
  – Improved scalability and fault tolerance
• Dynamically allocated resources, resulting in HUGE utilization gains
  – Versus static allocation of “slots” in Hadoop 1.0
Yahoo has over 30,000 nodes running YARN across over 365 PB of data. They calculate running about 400,000 jobs per day for about 10 million hours of compute time.
They also have estimated a 60% – 150% improvement on node usage per day since moving to YARN.
Apache HDFS – Hadoop Distributed File System
• Very large scale distributed file system
  – 10K nodes, tens of millions of files and PBs of data
  – Supports large files
• Designed to run on commodity hardware, assumes hardware failures
  – Files are replicated to handle hardware failure
  – Detects failures and recovers from them automatically
• Optimized for large-scale processing
  – Data locations are exposed so that computations can move to where the data resides
• Data coherency
  – Write once, read many times access pattern
• Files are broken up into chunks called ‘blocks’
  – Blocks are distributed over nodes
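To make the block and replication ideas concrete, here is a small sketch that splits a file into blocks and spreads replicas over nodes. The 128 MB block size and replication factor of 3 are assumed values for the example, and the round-robin placement is a deliberate simplification: real HDFS placement also takes racks into account.

```python
import math

MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the size of each block; only the last block may be smaller."""
    num_blocks = math.ceil(file_size / block_size)
    return [min(block_size, file_size - i * block_size) for i in range(num_blocks)]


def place_replicas(num_blocks, nodes, replication):
    """Round-robin placement for illustration only (assumes replication <= len(nodes))."""
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], 3)
```

Losing any single node leaves at least two copies of every block, and a scheduler that knows `placement` can send computation to a node that already holds the data.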
Streaming Demo - High Level Architecture
[Diagram components: Truck Streaming Data from trucks T(1), T(2), … T(N); Inbound Messaging (Kafka) with the Truck Events Topic; Storm Stream Processing with a Kafka Spout, HBase Bolt, HDFS Bolt, and Monitoring Bolt; HBase with the Dangerous Events table and Truck Events; ActiveMQ; Web App. Everything runs on YARN over Distributed Storage: HDFS.]
Demo – Streaming Dashboard.
A New Challenge.
CDO’s vision: Build a Predictive Business, not a Reactive one
CDO’s Requirements
• Offline predictions: identify investments that will increase safety and reduce the company’s liabilities
• Real-time predictions: anticipate driver violations before they happen and take precautionary actions
Data Scientist’s Response
• Need to explore data & form a hypothesis
• Verify trends against TBs of events data via machine learning
• Generate predictive models with Spark MLlib on HDP
• Plug models into the Storm topology to predict driver violations in real-time
♬ I’ve been waiting for this moment all my life ♬
Demo – Analyzing Events with Tableau.
Analyzing Raw Events – dangerous drivers
Analyzing Raw Events – dangerous routes
Analyzing Raw Events – violations by location
Enriching truck events for analysis with Pig
[Diagram: Pig pipeline. Inputs: Raw Truck Events on HDFS, Raw Weather Data (Weather Data Sets), and Payroll Data (HR & Payroll DBs), with metadata in HCatalog. Steps: Load Raw Truck Events → Clean & Filter (Cleaned Events) → Transform (Transformed Events) → Join with HR & weather data (Enriched Events) → Store. The stored Enriched Events are analyzed in Tableau.]
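The pipeline stages can be mimicked in plain Python to show what each step does to a record. This is an illustration of the clean, transform, and join steps, not the Pig script from the demo; the field names, the sample records, and the choice to join weather by date are assumptions.

```python
# Hypothetical sample inputs.
raw_events = [
    {"driver_id": 10, "event_type": "Overspeed", "date": "2015-06-01"},
    {"driver_id": 11, "event_type": "normal", "date": "2015-06-01"},
    {"driver_id": None, "event_type": "Normal", "date": "2015-06-02"},  # malformed
]
payroll = {
    10: {"certified": "No", "wage_plan": "Miles"},
    11: {"certified": "Yes", "wage_plan": "Hourly"},
}
weather = {"2015-06-01": {"foggy": "Yes"}, "2015-06-02": {"foggy": "No"}}


def clean(events):
    """Clean & Filter: drop records without a driver id."""
    return [e for e in events if e["driver_id"] is not None]


def transform(events):
    """Transform: normalize the casing of the event type."""
    return [{**e, "event_type": e["event_type"].capitalize()} for e in events]


def enrich(events, payroll, weather):
    """Join each event with the driver's payroll record and that day's weather."""
    return [{**e, **payroll[e["driver_id"]], **weather[e["date"]]} for e in events]


enriched = enrich(transform(clean(raw_events)), payroll, weather)
```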
Analyzing Enriched Events – noncertified and fatigued drivers more dangerous
Analyzing Enriched Events – top 3 dangerous routes seem to be driven by fatigued drivers
Analyzing Enriched Events – foggy weather leads to violations
Analyzing Enriched Events – but top 3 safest routes are also foggy
Integrating Predictive Analytics
Building the Predictive Model on HDP
1. Explore a small subset of events in Tableau to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations”
2. Identify suitable ML algorithms to train a model. We will use classification algorithms as we have labeled events data
3. Transform enriched events data to a format that is friendly to Spark MLlib. Many ML libs expect training data in a certain format
4. Train a logistic regression model in Spark on YARN, with the above events as training input, and iterate to fine-tune the generated model
5. Integrate the Spark MLlib model in a Storm bolt to predict violations in real time
Integrate Predictive Analytics in Stream Processing
[Diagram components: Truck Sensors; Inbound Messaging (Kafka); Stream Processing (Storm) with a Prediction Bolt; Real-time Serving (HBase); Interactive Query (Hive on Tez); Millions of Enriched Truck Events; Machine Learning (Spark). All on YARN over HDFS.]
• Train Spark ML model with millions of truck events
• Plug Spark model into Storm bolt
Streaming Demo - Updated Architecture
[Diagram components: Truck Streaming Data from trucks T(1), T(2), … T(N); Inbound Messaging (Kafka) with the Truck Events Topic; Storm Stream Processing with a Kafka Spout, HBase Bolt, HDFS Bolt, Monitoring Bolt, and Prediction Bolt; HBase with the Payroll table and Truck Events; ActiveMQ; Web App. Everything runs on YARN over Distributed Storage: HDFS.]
• Enrich event
• Predict violation in real time & alert via MQ
• Render real-time predictions on UI
Transforming training data for Spark MLlib

Enriched Events Data
Event Type | Is Driver Certified? | Wage Plan | Hours Driven | Miles Driven | Longitude | Latitude | Weather Foggy | Weather Rainy | Weather Windy
Normal     | Yes                  | Hourly    | 45           | 2721         | -91.3     | 38.14    | No            | No            | No
Overspeed  | No                   | Miles     | 72           | 4152         | -94.23    | 37.09    | Yes           | Yes           | No
…          | …                    | …         | …            | …            | …         | …        | …             | …             | …

Spark MLlib Training Data
Label | Is Driver Certified? | Wage Plan | Hours Driven | Miles Driven | Weather Foggy | Weather Rainy | Weather Windy
0     | 1                    | 1         | 0.45         | 0.2721       | 0             | 0             | 0
1     | 0                    | 0         | 0.72         | 0.4152       | 1             | 1             | 0
…     | …                    | …         | …            | …            | …             | …             | …

• Normal events labeled as 0 and violation events as 1
• Feature scaling applied to hours and miles to improve algorithm performance
• Features with binary values denoted as 0 and 1
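The transformation shown in the two tables can be written as a small function. The dictionary keys are hypothetical, and the scaling divisors (100 for hours, 10,000 for miles) are inferred from the two sample rows rather than stated on the slide.

```python
def yes_no(value):
    """Encode a Yes/No feature as 1/0."""
    return 1 if value == "Yes" else 0


def to_training_row(event):
    """Map one enriched event to [label, features...]; longitude/latitude are dropped."""
    return [
        0 if event["event_type"] == "Normal" else 1,   # label: normal=0, violation=1
        yes_no(event["certified"]),
        1 if event["wage_plan"] == "Hourly" else 0,
        event["hours_driven"] / 100,                   # assumed scaling factor
        event["miles_driven"] / 10000,                 # assumed scaling factor
        yes_no(event["foggy"]),
        yes_no(event["rainy"]),
        yes_no(event["windy"]),
    ]


sample_events = [
    {"event_type": "Normal", "certified": "Yes", "wage_plan": "Hourly",
     "hours_driven": 45, "miles_driven": 2721, "longitude": -91.3, "latitude": 38.14,
     "foggy": "No", "rainy": "No", "windy": "No"},
    {"event_type": "Overspeed", "certified": "No", "wage_plan": "Miles",
     "hours_driven": 72, "miles_driven": 4152, "longitude": -94.23, "latitude": 37.09,
     "foggy": "Yes", "rainy": "Yes", "windy": "No"},
]
training_rows = [to_training_row(e) for e in sample_events]
```

Applied to the two sample rows of the enriched table, this reproduces the two rows of the training table.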
Running Spark ML on YARN
1. Run the spark-submit script to launch a Spark job on YARN (/user/root/truck_training is the training data location on HDFS):
spark-submit --class org.apache.spark.examples.mllib.BinaryClassification \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  truckml.jar \
  --algorithm LR --regType L2 --regParam 1.0 \
  /user/root/truck_training \
  --numIterations 100
2. Monitor progress of the Spark job in the YARN Resource Manager UI
Interpreting Spark Logistic Regression Results
Precision: 87.5%
Recall: 88%
Top three predictors of violations
1. Foggy Weather
2. Rainy Weather
3. Driver Certification
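For readers less familiar with the two metrics: precision is the share of predicted violations that were real, and recall is the share of real violations that were predicted. The counts in this sketch are hypothetical and are not the counts behind the figures on the slide.

```python
def precision_recall(tp, fp, fn):
    """tp: true positives, fp: false positives, fn: false negatives."""
    precision = tp / (tp + fp)  # of all predicted violations, how many were real
    recall = tp / (tp + fn)     # of all real violations, how many were predicted
    return precision, recall


# Hypothetical confusion counts, for illustration only.
precision, recall = precision_recall(tp=8, fp=2, fn=2)
```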
Integrating Spark model in Storm
[Diagram components: Kafka Spout; Storm Prediction Bolt; Real-time Serving (HBase); ActiveMQ; Ops Center and LOB Dashboards.]
Storm Prediction Bolt steps:
1. Initialize Spark model
2. Parse truck event
3. Enrich event with HBase data
4. Predict violation with model
5. Send alert if violation predicted
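The prediction bolt's steps can be sketched as plain Python: a logistic regression model is just a weight per feature plus an intercept, so scoring an event is a dot product passed through a sigmoid. Every number and name below is hypothetical; in the deck the weights would come from the trained Spark MLlib model and the driver profile from HBase.

```python
import math

# Hypothetical model parameters, standing in for a trained model.
WEIGHTS = {
    "certified": -1.0,
    "wage_plan_hourly": 0.0,
    "hours_scaled": 0.5,
    "miles_scaled": 0.5,
    "foggy": 2.0,
    "rainy": 1.5,
    "windy": 0.2,
}
INTERCEPT = -1.0


def violation_probability(features):
    """Logistic regression score: sigmoid of intercept + weighted feature sum."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def prediction_bolt(event, driver_profiles, alerts, threshold=0.5):
    """Enrich the event with the driver profile, score it, and alert if needed."""
    profile = driver_profiles[event["driver_id"]]  # stand-in for the HBase lookup
    features = {
        "certified": profile["certified"],
        "wage_plan_hourly": profile["wage_plan_hourly"],
        "hours_scaled": profile["hours_driven"] / 100,
        "miles_scaled": profile["miles_driven"] / 10000,
        "foggy": event["foggy"],
        "rainy": event["rainy"],
        "windy": event["windy"],
    }
    probability = violation_probability(features)
    if probability > threshold:
        alerts.append((event["driver_id"], round(probability, 3)))  # stand-in for MQ
    return probability


driver_profiles = {
    10: {"certified": 1, "wage_plan_hourly": 1, "hours_driven": 45, "miles_driven": 2721},
    11: {"certified": 0, "wage_plan_hourly": 0, "hours_driven": 72, "miles_driven": 4152},
}
alerts = []
p_clear = prediction_bolt({"driver_id": 10, "foggy": 0, "rainy": 0, "windy": 0},
                          driver_profiles, alerts)
p_foggy = prediction_bolt({"driver_id": 11, "foggy": 1, "rainy": 1, "windy": 0},
                          driver_profiles, alerts)
```

Because scoring needs only the weights, the model can be loaded once when the bolt starts and applied to each event without calling back into Spark.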
Summary: Solution Value.
Value of large scale ML on HDP
• Accelerate time to market/value
  – Test out multiple ML algorithms against TBs of training data in reasonable time frames
• Confirm hypothesis against TBs of training data with confidence
  – We confirmed that fog does impact safety and wage plans do not, whereas BI tools indicated otherwise
• Easily integrate predictive models in data driven apps
  – Run predictive models in Storm or any other app in your enterprise
• Run all of the above in a multi-tenant YARN cluster
  – Large scale ML on YARN respects other tenants in an HDP cluster
Recommendations to CDO
Investment recommendations, in order of priority
1. Invest in visibility sensors and auto braking systems to deal with foggy conditions
2. Invest in slip resistant tires to fight rainy conditions
3. Invest in certifying drivers to reduce violation probability
Power of real time predictions
• 40% reduction in violation rates by predicting high risk situations in real-time and sending immediate alerts to drivers
Predictive Demo.
Q & A