Why Cloudera · The Platform for Production Success
© Cloudera, Inc. All rights reserved.
Why Cloudera?
Enterprise Security: Meet compliance requirements and reduce risk exposure from storing sensitive data.
Data Governance: Enable compliance and maximize analyst productivity.
Complete Management: Deliver optimum system utilization and meet SLA commitments, on-premises or in the cloud, with minimum effort.
We deliver long-term production success with enterprise Hadoop.
Open Source Innovation: No one knows Hadoop better than Cloudera. Cloudera leads development of enterprise Hadoop and offers the best support, training, and services.
Powerful Enterprise Tools: Cloudera extends open source Hadoop with capabilities required by the largest enterprises.
Ecosystem: Cloudera partners with industry leaders to ensure Hadoop works with the platforms, tools, and integrators our customers rely on.
Cloudera is Built for Production Success
Hadoop delivers:
• One place for unlimited data
• Unified, multi-framework data access
Cloudera delivers:
• Enterprise Security
• Data Governance
• Complete Management
• And more…
Platform diagram: Security and Administration; Process, Discover, Model, Serve; Unlimited Storage
Deployment Flexibility:
• On-Premises, Appliances, Engineered Systems
• Public Cloud, Private Cloud, Hybrid Cloud
A modern data platform plus what the enterprise requires.
Industrial Multi-Workload Performance
Batch, Interactive, and Real-Time. Leading performance and usability in one platform.
• End-to-end analytic workflows
• Access more data
• Work with data in new ways
• Enable new users
Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
Process
• Ingest: Sqoop, Flume
• Transform: MapReduce, Hive, Pig, Spark
Discover
• Analytic Database: Impala
• Search: Solr
Model
• Machine Learning: SAS, R, Spark, Mahout
Serve
• NoSQL Database: HBase
• Streaming: Spark Streaming
Unlimited Storage: HDFS, HBase
Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.
Latest SQL Performance
Chart: Single User vs 10 User Response Time & Impala Times Faster (time in seconds; lower bars = better)
Engine | Single User | 10 Users
Impala | 5 | 11
Spark SQL | 25 (5.0x) | 120 (10.6x)
Presto | 37 (7.4x) | 302 (27.4x)
Hive-on-Tez | 77 (15.4x) | 202 (18.3x)
Independent validation by IBM Research SQL-on-Hadoop VLDB paper: “Impala’s database architecture provides significant performance gains”
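The "times faster" multipliers on this chart read as each engine's response time divided by Impala's at the same concurrency level. A minimal Python sketch of that arithmetic follows. It assumes the bar labels pair with the engines in axis order (Impala, Spark SQL, Presto, Hive-on-Tez); the dictionary layout and the function name slowdown_vs_impala are illustrative and do not come from the deck.

```python
# Response times in seconds, as read from the chart's bar labels.
# Assumption: values are matched to engines by label order on the chart.
times = {
    "Impala": {"single": 5, "ten_users": 11},
    "Spark SQL": {"single": 25, "ten_users": 120},
    "Presto": {"single": 37, "ten_users": 302},
    "Hive-on-Tez": {"single": 77, "ten_users": 202},
}


def slowdown_vs_impala(times, baseline="Impala"):
    """Divide each engine's response time by the baseline's, per load level."""
    base = times[baseline]
    return {
        engine: {load: round(t / base[load], 1) for load, t in loads.items()}
        for engine, loads in times.items()
        if engine != baseline
    }


ratios = slowdown_vs_impala(times)
for engine, r in ratios.items():
    print(f"{engine}: {r['single']}x single user, {r['ten_users']}x with 10 users")
```

Recomputing from the rounded bar labels reproduces the single-user multipliers exactly and lands close to, but not exactly on, the 10-user ones, which suggests the slide's multipliers were computed from unrounded timings.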
Hadoop Security is Different
Hadoop Benefit | Security Side Effect
A single platform for all the data | Combining data and audiences that used to be securely siloed
A rich, flexible ecosystem of tools & utilities | Security method proliferation can increase costs and introduce coverage gaps
Ingest data of any type | Sensitive fields added without review
Active Archive provides lower cost storage than legacy systems | Lose the built-in compliance controls that legacy systems provided
The Only Comprehensively Secure Hadoop Platform
Cloudera is the leader in Hadoop security.
Unique Capabilities:
• Comprehensive and Unified
  • Secure at the core
• No Performance Impact
  • Jointly engineered with Intel
• Compliance-Ready
  • Only distribution to pass PCI audit
1. Perimeter: Standards-based Authentication
2. Access: Unified Role-based Authorization
3. Visibility: Auditing & Governance
4. Data: Encryption & Key Management
Platform diagram: Security and Administration; Process, Discover, Model, Serve; Unlimited Storage
Meet compliance requirements and reduce risk exposure from storing sensitive data.
The Only Hadoop Data Governance Solution
Cloudera Navigator: Minimize risk and maintain compliance with the only native end-to-end data governance solution for Apache Hadoop.
Unique Capabilities:
• Auditing
• Lineage
• Metadata Tagging and Discovery
• Lifecycle Management
Enable compliance and maximize analyst productivity.
MasterCard
Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification
Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS V 2.0 security standards so it can host PCI datasets and potentially integrate with other internal systems
Cloudera: The first PCI-Certified Hadoop Platform
“Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same standards, we now have additional options in how we manage our data center.”
Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard
Security and Governance
Cloudera: Unified, Compliance-Ready, Transparent
Hortonworks: Fragmented, Incomplete, Complex
Perimeter (protecting access to the cluster)
• Cloudera: Kerberos with Cloudera Manager. Automated, industry-standard authentication integrated with existing systems
• Hortonworks: Kerberos. Manual configuration and integration
Access (securing access to data)
• Cloudera: Apache Sentry. Working within the community to deliver centralized, granular RBAC across frameworks
• Hortonworks: Hive ATZ-NG, Ranger. RBAC configuration silos, GUI “Band-Aid”
Visibility (reporting on data access and lineage)
• Cloudera: Cloudera Navigator. Transparent end-to-end data and metadata visibility
• Hortonworks: Apache Falcon, Knox, Ranger. Manual and limited auditing through a single workflow framework, and multiple tools
Data (protecting data at rest or in transmission)
• Cloudera: Cloudera Navigator. Transparent, comprehensive, high-performance, compliance-ready encryption and key management
• Hortonworks: N/A
Rating symbols, in extraction order: ● ◐ ● ● ● ◔ ○ ◐
The Only Complete Hadoop Management Suite
Cloudera Manager: Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.
Unique Capabilities:
• Unified configuration, management, and monitoring across all services
• Online installation and upgrades
• Direct connection to Cloudera Support
• 3rd Party Extensibility
Deliver optimum system utilization and meet SLA commitments.
Cloudera Manager vs. Ambari
Cloudera: Unified, Directed, Streamlined
Ambari: Federated, Chaotic, Disjointed
Manage (deploying and configuring services)
• Cloudera: Parcels and Workflows. Holistic, service-oriented components enable streamlined, comprehensive, and straightforward operations
• Ambari: YUM and Shell Commands. Manual configuration and time-consuming, error-prone integration
Monitor (system health and QoS and SLA notification)
• Cloudera: Integrated Charting and SNMP Alerts. Catalog of chart metrics and visualization with easy-to-build, easy-to-share dashboards and common alerts
• Ambari: Nagios, Ganglia. Manual configuration, limited native visualization, and manual integration of separate, disparate systems and services
Diagnose (root cause discovery, analysis, and solution)
• Cloudera: Time Control and Log Collection. Centralized log aggregation of all services with integrated faceted search and visual timeframe controls
• Ambari: SSH/SCP to /var/log. Manual log collection via CLI tools from diverse locations with limited, service-specific search and no historical views
Integrate (extending security policies, adding 3rd party services)
• Cloudera: Enterprise Kerberos Integration. Automated, industry-standard authentication with integration to existing enterprise systems
• Ambari: Kerberos. Assisted CLI configuration, manual deployment, and limited integration
Rating symbols, in extraction order: ● ◐ ● ● ● ◔ ◐ ◐
The Only Portable Cloud Experience for Hadoop
Cloudera Director: The first portable, self-service solution for deploying and managing enterprise-grade Hadoop in the Cloud.
Unique Capabilities:
• Dynamic cluster lifecycle management
• Cloud blueprints
• Multi-cluster health visibility
• Usage reporting for billing models
Maximize flexibility in Hadoop deployment architectures.
Focusing on Open Standards, not just Open Source
Open Standards are just as important as Open Source.
Why does it matter?
• Diverse engineering is more sustainable.
• Broad support ensures vendor portability.
• Project utility depends on ecosystem compatibility, which depends on standards.
Cloudera leads in defining the de facto open standards adopted by the market.
Vendor Support
Component (Founder) Cloudera Pivotal MapR Amazon IBM Hortonworks
Impala (Cloudera) ✔ ✖ ✔ ✔ ✖ ✖
Spark (UC Berkeley) ✔ ✔ ✔ ✔ ✔ ✔
Hue (Cloudera) ✔ ✔ ✔ ✔ ✖ ✔
Sentry (Cloudera) ✔ ✔ ✔ ✖ ✔ ✖
Flume (Cloudera) ✔ ✔ ✔ ✖ ✔ ✔
Parquet (Cloudera/Twitter) ✔ ✔ ✔ ✔ ✔ ✖
Sqoop (Cloudera) ✔ ✔ ✔ ✔ ✔ ✔
Falcon (Hortonworks) ✖ ✖ ✖ ✖ ✖ ✔
Knox (Hortonworks) ✖ ✖ ✖ ✖ ✔ ✔
Tez (Hortonworks) ✖ ✖ ✔ ✖ ✖ ✔
Ranger (Hortonworks) ✖ ✖ ✖ ✖ ✖ ✔
ORCfile (Hortonworks) ✖ ✖ ✖ ✖ ✖ ✔
Sustainable Innovation
A Hybrid Open Source Model: combining the power of open source with the enterprise capabilities customers need.
• Deep open source commitment
  • 2/3 of engineering on open source
  • 19 Hadoop ecosystem projects founded
  • 90 ASF committer seats, 67 PMC seats
• Enterprise-ready extensions
  • Security, governance, and system management
• Comprehensive partner integrations
  • 160+ certified solutions
Open Platform: 100% Open Source & Open Standards
Supporting the Entire Ecosystem, not just the Core
Source: Apache JIRA, January 2012 – March 2015
54%
Chart legend: Hortonworks, IBM, MapR, Microsoft, Pivotal, WANdisco
90 Committer* Seats deliver the fastest issue resolution and enable us to drive the Apache roadmap for our customers.
Cloudera and Intel committers resolve over 50% of all JIRA tickets among all Hadoop vendors.
Projects Included: Accumulo, Avro, Bigtop, Crunch, Flume, Hadoop Core, HBase, Hive, Kafka, Mahout, Oozie, Pig, Solr, Spark, Sqoop, Tez, Whirr, ZooKeeper
* “Committer” = A developer who has earned community privileges to commit patches
Leading Innovation in the Hadoop Ecosystem
Timeline, 2008 to 2014
Cloudera | Hortonworks
Cloudera Founded | Hortonworks Founded
First Training Offered, Cloudera U (Over 20,000 Trained) | Hortonworks U (Less than 1,000 Trained)
CDH 1 Released | HDP 1.0 Released
Cloudera Manager 1.0 | Ambari 1.0 (Missing many enterprise features)
HUE Ships in CDH3 | HUE Ships in HDP 2.0
Impala Launches | Stinger “Final Phase” (Still 5-9x slower)
Navigator Launches | Falcon (Missing many enterprise features)
Search Launches | LucidWorks (Reseller Only)
Spark for CDH 4.4 | ???
Key Management | N/A
Data Encryption | N/A
Cloud Deployment | N/A
Sentry Ships CDH 4.3 | XA Secure / Ranger (Limited scope)
Best-In-Class Support
8.9 Overall satisfaction makes Cloudera the industry benchmark for support
95% Customers agree they benefit from Cloudera technical support outreach
#1 Ability to solve technical issues is the top reason to recommend Cloudera for Hadoop
Industry-Leading Training and University Programs
Cloudera has trained over 40,000 people on Hadoop since 2009.
Big Data professionals from 60% of the Fortune 100 have attended live Cloudera training.
Source: Fortune, “Fortune 500” and “Global 500,” May 2012.
The Most Complete Partner Ecosystem
Enterprise Data Hub (platform diagram: Security and Administration; Process, Discover, Model, Serve; Unlimited Storage)
Partner categories: Data Systems, Applications, System Integration, Infrastructure, Operational Tools
More than 1,400 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.
Why Cloudera?
Enterprise Security: Meet compliance requirements and reduce risk exposure from storing sensitive data.
Data Governance: Enable compliance and maximize analyst productivity.
Complete Management: Deliver optimum system utilization and meet SLA commitments, on-premises or in the cloud, with minimum effort.
We deliver long-term production success with enterprise Hadoop.
Open Source Innovation: No one knows Hadoop better than Cloudera. Cloudera leads development of enterprise Hadoop and offers the best support, training, and services.
Powerful Enterprise Tools: Cloudera extends open source Hadoop with capabilities required by the largest enterprises.
Ecosystem: Cloudera partners with industry leaders to ensure Hadoop works with the platforms, tools, and integrators our customers rely on.