Positioning, Campaigns & 2.0 Launch Nicolas MAILLARD.pdf · Page 1 © Hortonworks Inc. 2011 –...
Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
HDP Enabling the Modern Data Architecture
Nicolas Maillard – Hortonworks
Page 3
Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform)
• Founded in 2011
• Original 24 architects, developers, and operators of Hadoop from Yahoo!
• We are leaders in the Hadoop community
• 500+ employees
Customer Momentum
• 300+ customers in seven quarters, growing at 75+/quarter
• Two thirds of customers come from F1000
Hortonworks and Hadoop at Scale
• HDP in production on the largest clusters on the planet
• Multiple 1,000+ node clusters, including 35,000 nodes at Yahoo! and 800 nodes at Spotify
Page 4
Our Mission
To enable Apache Hadoop to be the enterprise data platform that powers the modern data architecture and processes half the world's data
Page 5
Our Strategy: A Commitment to Enterprise Hadoop
1. Innovate the Core
Architect and build innovation at the core of Hadoop
• YARN transformed Hadoop to enable multiple workloads across a multi-tenant architecture
2. Extend Hadoop as an Enterprise Data Platform
Extend Hadoop with enterprise capabilities for governance, security & operations
Apply enterprise software rigor to the open source development process
3. Enable the Ecosystem
Enable the leaders in the data center to easily adopt & extend their platforms
• Establish Hadoop as standard component of a modern data architecture
• Joint engineering
HDP 2.1 (diagram): Governance & Integration; Security; Operations; Data Access; Data Management
YARN: Data Operating System (nodes 1 … N)
HDFS (Hadoop Distributed File System)
Workloads: Interactive, Real-Time, Batch
Page 6
Apache Project    Committers    PMC Members
Hadoop            27            20
Pig               5             5
Hive              16            4
Tez               15            15
HBase             6             4
Phoenix           4             4
Accumulo          2             2
Storm             3             2
Slider            10            10
Flume             1             0
Sqoop             1             0
Ambari            32            27
Oozie             3             2
Zookeeper         2             1
Knox              11            5
Argus             10            n/a
Falcon            5             3
TOTAL             153           104
YARN: Data Operating System
• Script: Pig
• Memory: Spark
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Batch: MapReduce
HDFS (Hadoop Distributed File System)
Contributes more to the Apache Hadoop
ecosystem in the ASF than any other vendor
Innovating within the community for the enterprise
• Open Source: fastest path to innovation for a platform technology
• Complete open source platform speeds enterprise and ecosystem
adoption and minimizes lock in
• Enables the market to function much bigger, much faster
…all done completely in Open Source 4
HDP 2.1 (diagram): Governance & Integration; Security; Operations; Data Access; Data Management; YARN
Driving our innovation through
Apache Software Foundation Projects
Page 7
Hadoop Driver: Enabling the data lake
(Chart axes: SCALE vs. SCOPE)
Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data, affording new insights and opportunities.
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value.
Drivers:
1. Cost Optimization
2. Advanced Analytic Apps
Goal:
• Centralized Architecture
• Data-driven Business
DATA LAKE
Journey to the Data Lake with Hadoop
Systems of Insight
Page 8
2013 Digital universe: 4.4 Zettabytes
2020 Digital universe: 44 Zettabytes & Hadoop Market $50B
85% of growth from new types of data, with machine-generated data increasing 15x
Data is doubling in size every two years
1 Zettabyte (ZB) = 1 million Petabytes (PB); Sources: IDC, IDG Enterprise, and AMR Research
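As a quick sanity check on the figures above, the sketch below works out the doubling time implied by growing from 4.4 ZB (2013) to 44 ZB (2020), assuming steady exponential growth between the two years. The variable names are illustrative only; the inputs are the slide's own numbers.

```python
import math

# Figures quoted on the slide, in zettabytes (ZB).
size_2013_zb = 4.4
size_2020_zb = 44.0
years = 2020 - 2013

# Assumption: steady exponential growth between the two data points.
growth_factor = size_2020_zb / size_2013_zb
doubling_time_years = years * math.log(2) / math.log(growth_factor)

# Unit conversion stated on the slide: 1 ZB = 1 million PB.
PB_PER_ZB = 1_000_000
size_2013_pb = size_2013_zb * PB_PER_ZB

print(f"growth factor: {growth_factor:.1f}x over {years} years")
print(f"implied doubling time: {doubling_time_years:.2f} years")
print(f"2013 size: {size_2013_pb:,.0f} PB")
```

The implied doubling time comes out at roughly 2.1 years, which is in line with the "doubling every two years" claim.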
Page 9
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
Business Value (chart): LAGGARDS vs. INDUSTRY LEADERS
New data: Clickstream, Geolocation, Web Data, Internet of Things, Docs, emails, Server logs
Traditional data: ERP, CRM, SCM
Data volume: 2.8 Zettabytes (2012), 40 Zettabytes (2020)
Page 10
Big Data & Hadoop Market Drivers and Opportunities
Business Drivers
• From reactive analytics to proactive customer interaction
• Insights that drive competitive advantage & optimal returns
Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source software
Technical Drivers
• Data is growing exponentially & existing systems overwhelmed
• Predominantly driven by NEW types of data that can inform analytics
Page 11
…to shift from reactive to proactive interactions
HDP and Hadoop allow organizations to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision):
• A shift in Advertising: from mass branding to 1x1 Targeting
• A shift in Financial Services: from Educated Investing to Automated Algorithms
• A shift in Healthcare: from mass treatment to Designer Medicine
• A shift in Retail: from static branding to Real-time Personalization
• A shift in Telco: from break then fix to repair before break
Page 12
Existing silos under pressure from new data sources
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP (each a SILO)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
Data growth: New Data Types (85%, Source: IDC)
• OLTP, ERP, CRM Systems
• Unstructured docs, emails
• Clickstream
• Server logs
• Social/Web Data
• Sensor, Machine Data
• Geolocation
Page 13
Enterprise Goals for the Modern Data Architecture
• Consolidate siloed data sets; structured and unstructured
• Provide single view of the customer, product, supply chain
• Serve batch, interactive and real time applications on shared datasets
• Central data set on a single cluster
• Central services for security, governance and operation
• Preserve existing investment in current tools and platforms
Diagram:
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP alongside YARN: Data Operating System (Interactive, Real-Time, Batch) on HDFS (Hadoop Distributed File System), nodes 1 … N
SOURCES: EXISTING Systems (CRM, ERP, Other); Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Page 14
Traditional Hadoop, challenges & limitations
Architectural Limitations
• Primarily a batch system using MapReduce
• Single purpose clusters, specific data sets
Enterprise Challenges
• Limited enterprise capabilities: Operations, Security & Governance
• Created additional Silos
Interoperability Challenges
• Difficult to natively integrate existing applications
Diagram:
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP; MapReduce (Largely Batch Processing) on HDFS (Hadoop Distributed File System), nodes 1 … N
SOURCES: EXISTING Systems; Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Page 15
YARN and HDP Enable the Modern Data Architecture
HDP (Hortonworks Data Platform) diagram:
OPERATIONS
• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie
GOVERNANCE & INTEGRATION
• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS
SECURITY
• Authentication, Authorization, Accounting, Data Protection
• Storage: HDFS
• Resources: YARN
• Access: Hive, …
• Pipeline: Falcon
• Cluster: Knox
DATA ACCESS (BATCH, INTERACTIVE & REAL-TIME)
• Script: Pig
• Search: Solr
• SQL: Hive, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• In-Memory: Spark
• Other: ISVs
• Tez
DATA MANAGEMENT
• YARN: Data Operating System
• HDFS (Hadoop Distributed File System), nodes 1 … N
YARN is the architectural center of Hadoop and HDP
• YARN enables a common data set across all applications
• Batch, interactive & real-time workloads
• Supports multi-tenant access & processing
HDP enables Apache Hadoop to become an enterprise-viable data platform with centralized services
• Security
• Governance
• Operations
• Productization
Enabled broad ecosystem adoption
Hortonworks drove this innovation of Hadoop through YARN
Page 16
Key Drivers of Hadoop
Diagram:
OPERATIONS TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
Layers: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM (REPOSITORIES: RDBMS, EDW, MPP); SOURCES
A. Unlock New Approach to Analytics
• Agile analytics via “Schema on Read” with ability to store all data in native format
• Create new apps from new types of data
B. Optimize Investments, Cut Costs
• Focus EDW on high value workloads
• Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration
C. Enable a Modern Data Architecture
• Integrate new & existing data sets
• Make all data available for shared access and processing in multitenant infrastructure
• Batch, interactive & real-time use cases
• Integrated with existing tools & skills
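To make the "Schema on Read" idea concrete, here is a minimal sketch in plain Python (not Hive or HCatalog): raw records are stored untouched in their native format, and each reader projects only the fields it cares about at read time. The sample records, the `read_with_schema` helper, and the field names are all hypothetical.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"ts": "2014-06-01T10:00:00", "user": "u1", "url": "/home"}',
    '{"ts": "2014-06-01T10:00:05", "user": "u2", "url": "/cart", "ref": "ad"}',
    'malformed line that stays in storage but is skipped at read time',
]


def read_with_schema(lines, fields):
    """Yield one dict per parseable record, projected onto `fields`.

    The schema (the list of field names) is supplied by the reader,
    not enforced when the data is written.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreadable for this reader; the raw line is untouched
        if not isinstance(record, dict):
            continue
        # Missing fields become None instead of failing the read.
        yield {name: record.get(name) for name in fields}


# Two applications read the same stored data with different schemas.
clickstream_view = list(read_with_schema(RAW_LINES, ["user", "url"]))
referrer_view = list(read_with_schema(RAW_LINES, ["user", "ref"]))

print(clickstream_view)
print(referrer_view)
```

A new application can ask for a field such as `ref` later without rewriting anything already stored, which is the agility the bullet describes.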
Sources: EXISTING Systems; Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
YARN: Data Operating System (Interactive, Real-Time, Batch)
HDFS: Hadoop Distributed File System
Page 17
Create New Applications from New Types of Data
Table columns: INDUSTRY; USE CASE; then one ✔ column per data type: Sentiment & Web; Clickstream & Behavior; Machine & Sensor; Geographic; Server Logs; Structured & Unstructured
Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔ ✔
Insurance Underwriting ✔ ✔ ✔
Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔ ✔ ✔
Retail
360° View of the Customer ✔ ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
Manufacturing
Supply Chain and Logistics ✔
Assembly Line Quality Assurance ✔
Crowd-sourced Quality Assurance ✔
Healthcare
Use Genomic Data in Medical Trials ✔ ✔
Monitor Patient Vitals in Real-Time ✔ ✔
Pharmaceuticals
Recruit and Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔ ✔
Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔ ✔
Government
ETL Offload/Federal Budgetary Pressures ✔ ✔
Sentiment Analysis for Government Programs ✔
Page 18
YARN: Traditional to Modern Hadoop
Owen O’Malley – Founder, Hortonworks
Page 19
In the beginning…
2006: Traditional Hadoop
• HDFS (Hadoop Distributed File System), nodes 1 … N
• MapReduce: Largely Batch Processing
Traditional Hadoop allowed early adopters to deal with data at scale via:
• Single purpose clusters, specific data sets
• Primarily batch-oriented applications using MapReduce
However…
• No direct way to integrate interactive and real-time applications
• Limited enterprise capabilities: Operations, Security & Governance
Page 20
…with increased adoption and breadth of use cases, a new approach was needed
2006: Traditional Hadoop (HDFS, nodes 1 … N; MapReduce, Largely Batch Processing)
JAN 2008: MAPREDUCE-279 outlines a NEW architecture for Hadoop which allows for efficient use of resources across many types of apps
2011: Hortonworks Founded. Work accelerates on Hadoop’s next-gen architecture
Page 21
Enterprise Hadoop Era Begins
Timeline: 2006 (Traditional Hadoop: HDFS, nodes 1 … N; MapReduce, Largely Batch Processing); 2008 (MAPREDUCE-279); 2011; October 23, 2013 (Hadoop 2 & YARN)
Core of Enterprise Hadoop: YARN: Data Operating System on HDFS (Hadoop Distributed File System), nodes 1 … N, serving Batch, Interactive and Real-Time workloads
Architected & led development of YARN to enable the Modern Data Architecture
Page 22
Benefits Enabled by MDA and YARN
SOLUTION: A single set of data across the entire cluster with multiple access methods using “zones” for processing
Zones on nodes 1 … n: Batch (Pig); Interactive (Hive); Real Time Streams (Storm); Real Time (HBase); In Memory (Spark)
Single Cluster, Multiple Workloads
• Maximize compute resources to lower TCO
• No standalone, siloed clusters
• Simple management & operations
…all enabled by YARN
Page 23
Hadoop Does Interactive & Real-Time
Trucking Company
Use Case
Tom Benton
Page 24
Trucking company w/ large fleet of trucks in Midwest
A truck generates millions of events for a given route; an event could be:
• ‘Normal’ events: starting / stopping of the vehicle
• ‘Violation’ events: speeding, excessive acceleration and braking, unsafe tail distance
Company uses an application that monitors truck locations and violations from the truck/driver in real-time
Route? Truck? Driver?
Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers
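The analysts' question (route, truck, or driver?) can be illustrated with a small, self-contained Python sketch. This is not the deck's Storm or Hive implementation: the `TruckEvent` record, the event type names, and the sample events are assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed event type names for the 'Violation' category described above.
VIOLATION_TYPES = {
    "speeding",
    "excessive_acceleration",
    "excessive_braking",
    "unsafe_tail_distance",
}


@dataclass(frozen=True)
class TruckEvent:
    """Hypothetical shape of one sensor event."""
    truck_id: str
    driver_id: str
    route_id: str
    event_type: str


def is_violation(event):
    """'Normal' events (start/stop) are anything not in VIOLATION_TYPES."""
    return event.event_type in VIOLATION_TYPES


def violations_by(events, key):
    """Count violation events grouped by 'route_id', 'truck_id' or 'driver_id'."""
    return Counter(getattr(e, key) for e in events if is_violation(e))


# Made-up history standing in for the stored event stream.
EVENTS = [
    TruckEvent("t1", "d1", "r1", "start"),
    TruckEvent("t1", "d1", "r1", "speeding"),
    TruckEvent("t2", "d2", "r1", "unsafe_tail_distance"),
    TruckEvent("t2", "d2", "r2", "stop"),
    TruckEvent("t1", "d1", "r2", "excessive_braking"),
]

by_route = violations_by(EVENTS, "route_id")
by_truck = violations_by(EVENTS, "truck_id")
by_driver = violations_by(EVENTS, "driver_id")

print(by_route.most_common())
print(by_truck.most_common())
print(by_driver.most_common())
```

Grouping the same history three ways shows whether violations cluster on one route, one truck, or one driver, which is the kind of question the interactive query layer is meant to answer at scale.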
Page 25
Trucking Company’s YARN-enabled Architecture
• Truck Sensors
• Inbound Messaging (Kafka)
• Stream Processing (Storm)
• Alerts & Events (ActiveMQ)
• Real-Time User Interface
• Real-time Serving (HBase)
• Interactive Query (Hive on Tez)
• Microsoft Excel
• Distributed Storage: HDFS
• Many Workloads: YARN
One cluster with consistent security, governance & operations