
© Cloudera, Inc. All rights reserved.

The Platform for Production Success

Why Cloudera


Why Cloudera?

We deliver long-term production success with enterprise Hadoop.

Enterprise Security: Meet compliance requirements and reduce risk exposure from storing sensitive data.

Data Governance: Enable compliance and maximize analyst productivity.

Complete Management: Deliver optimum system utilization and meet SLA commitments, on-premises or in the cloud, with minimum effort.

Open Source Innovation: No one knows Hadoop better than Cloudera. Cloudera leads development of enterprise Hadoop and offers the best support, training, and services.

Powerful Enterprise Tools: Cloudera extends open source Hadoop with capabilities required by the largest enterprises.

Ecosystem: Cloudera partners with industry leaders to ensure Hadoop works with the platforms, tools, and integrators our customers rely on.


Our Platform


Cloudera is Built for Production Success

Hadoop delivers:
• One place for unlimited data
• Unified, multi-framework data access

Cloudera delivers:
• Enterprise Security
• Data Governance
• Complete Management
• And more…

Platform layers: Security and Administration; Unlimited Storage; Process, Discover, Model, Serve

Deployment Flexibility:
• On-Premises, Appliances, Engineered Systems
• Public Cloud, Private Cloud, Hybrid Cloud

A modern data platform plus what the enterprise requires.


Industrial Multi-Workload Performance

Batch, Interactive, and Real-Time. Leading performance and usability in one platform.

• End-to-end analytic workflows

• Access more data

• Work with data in new ways

• Enable new users

Platform components:
• Process: Ingest (Sqoop, Flume); Transform (MapReduce, Hive, Pig, Spark)
• Discover: Analytic Database (Impala); Search (Solr)
• Model: Machine Learning (SAS, R, Spark, Mahout)
• Serve: NoSQL Database (HBase); Streaming (Spark Streaming)
• Unlimited Storage: HDFS, HBase
• Security and Administration: YARN, Cloudera Manager, Cloudera Navigator

Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.


Latest SQL Performance

Figure: Single User vs. 10 User Response Time, and Impala Times Faster (time in seconds; lower bars = better)

Engine        Single User     10 Users
Impala        5               11
Spark SQL     25 (5.0x)       120 (10.6x)
Presto        37 (7.4x)       302 (27.4x)
Hive-on-Tez   77 (15.4x)      202 (18.3x)

Independent validation by IBM Research SQL-on-Hadoop VLDB paper: “Impala’s database architecture provides significant performance gains”
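The "times faster" figures on this slide appear to be each engine's response time divided by Impala's for the same workload. The sketch below reproduces that arithmetic. The engine-to-value mapping is an assumption read from the chart's rounded bar labels, and the names `response_times` and `times_faster` are illustrative only. The single-user ratios match the slide (5.0x, 7.4x, 15.4x); the 10-user ratios computed from rounded labels differ slightly from the printed ones, presumably because the slide used unrounded timings.

```python
# Response times in seconds, as read from the slide's rounded bar labels.
# The assignment of values to engines is an assumption based on chart order.
response_times = {
    "Impala": {"single_user": 5, "ten_users": 11},
    "Spark SQL": {"single_user": 25, "ten_users": 120},
    "Presto": {"single_user": 37, "ten_users": 302},
    "Hive-on-Tez": {"single_user": 77, "ten_users": 202},
}


def times_faster(times, baseline="Impala"):
    """Return, per engine and workload, the engine's time divided by the baseline's."""
    base = times[baseline]
    return {
        engine: {load: seconds / base[load] for load, seconds in loads.items()}
        for engine, loads in times.items()
        if engine != baseline
    }


ratios = times_faster(response_times)
for engine, loads in ratios.items():
    print(
        f"{engine}: {loads['single_user']:.1f}x (single user), "
        f"{loads['ten_users']:.1f}x (10 users)"
    )
```

A ratio above 1 means the baseline (Impala) was faster for that workload.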


Hadoop Security is Different

Hadoop Benefit | Security Side Effect
A single platform for all the data | Combining data and audiences that used to be securely siloed
A rich, flexible ecosystem of tools & utilities | Security method proliferation can increase costs and introduce coverage gaps
Ingest data of any type | Sensitive fields added without review
Active Archive provides lower cost storage than legacy systems | Lose the built-in compliance controls that legacy systems provided


The Only Comprehensively Secure Hadoop Platform

Cloudera is the leader in Hadoop security.

Unique Capabilities:

• Comprehensive and Unified
  • Secure at the core
• No Performance Impact
  • Jointly engineered with Intel
• Compliance-Ready
  • Only distribution to pass PCI audit

Security layers:
1. Perimeter: Standards-based Authentication
2. Access: Unified Role-based Authorization
3. Visibility: Auditing & Governance
4. Data: Encryption & Key Management

Meet compliance requirements and reduce risk exposure from storing sensitive data.


The Only Hadoop Data Governance Solution

Cloudera Navigator
Minimize risk and maintain compliance with the only native end-to-end data governance solution for Apache Hadoop.

Unique Capabilities:

• Auditing

• Lineage

• Metadata Tagging and Discovery

• Lifecycle Management

Enable compliance and maximize analyst productivity.


MasterCard

Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification

Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS V 2.0 security standards so it can host PCI datasets and potentially integrate with other internal systems

Cloudera: The first PCI-Certified Hadoop Platform

“Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same standards, we now have additional options in how we manage our data center.”

Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard


Security and Governance

Cloudera: Unified, Compliance-Ready, Transparent
Hortonworks: Fragmented, Incomplete, Complex

Perimeter: Protecting access to the cluster
• Cloudera: Kerberos with Cloudera Manager. Automated, industry-standard authentication integrated with existing systems.
• Hortonworks: Kerberos. Manual configuration and integration.

Access: Securing access to data
• Cloudera: Apache Sentry. Working within the community to deliver centralized, granular RBAC across frameworks.
• Hortonworks: Hive ATZ-NG, Ranger. RBAC configuration silos, GUI “Band-Aid”.

Visibility: Reporting on data access and lineage
• Cloudera: Cloudera Navigator. Transparent end-to-end data and metadata visibility.
• Hortonworks: Apache Falcon, Knox, Ranger. Manual and limited auditing through a single workflow framework and multiple tools.

Data: Protecting data at rest or in transmission
• Cloudera: Cloudera Navigator. Transparent, comprehensive, high-performance, compliance-ready encryption and key management.
• Hortonworks: N/A


The Only Complete Hadoop Management Suite

Cloudera Manager
Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.

Unique Capabilities:

• Unified configuration, management, and monitoring across all services

• Online installation and upgrades

• Direct connection to Cloudera Support

• 3rd Party Extensibility

Deliver optimum system utilization and meet SLA commitments.


Cloudera Manager vs. Ambari

Cloudera: Unified, Directed, Streamlined
Ambari: Federated, Chaotic, Disjointed

Manage: Deploying and configuring services
• Cloudera: Parcels and Workflows. Holistic, service-oriented components enable streamlined, comprehensive, and straightforward operations.
• Ambari: YUM and Shell Commands. Manual configuration and time-consuming, error-prone integration.

Monitor: System health and QoS and SLA notification
• Cloudera: Integrated Charting and SNMP Alerts. Catalog of chart metrics and visualization with easy-to-build, easy-to-share dashboards and common alerts.
• Ambari: Nagios, Ganglia. Manual configuration, limited native visualization, and manual integration of separate, disparate systems and services.

Diagnose: Root cause discovery, analysis, and solution
• Cloudera: Time Control and Log Collection. Centralized log aggregation of all services with integrated faceted search and visual timeframe controls.
• Ambari: SSH/SCP to /var/log. Manual log collection via CLI tools from diverse locations with limited, service-specific search and no historical views.

Integrate: Extending security policies, adding 3rd party services
• Cloudera: Enterprise Kerberos Integration. Automated, industry-standard authentication with integration to existing enterprise systems.
• Ambari: Kerberos. Assisted CLI configuration, manual deployment, and limited integration.


The Only Portable Cloud Experience for Hadoop

Cloudera Director
The first portable, self-service solution for deploying and managing enterprise-grade Hadoop in the Cloud.

Unique Capabilities:

• Dynamic cluster lifecycle management

• Cloud blueprints

• Multi-cluster health visibility

• Usage reporting for billing models

Maximize flexibility in Hadoop deployment architectures.


Our Approach


Focusing on Open Standards, not just Open Source

Open Standards are just as important as Open Source.

Why does it matter?

• Diverse engineering is more sustainable.

• Broad support ensures vendor portability.

• Project utility depends on ecosystem compatibility, which depends on standards.

Cloudera leads in defining the de facto open standards adopted by the market.

Vendor Support

Component (Founder)          Cloudera  Pivotal  MapR  Amazon  IBM  Hortonworks
Impala (Cloudera)            ✔         ✖        ✔     ✔       ✖    ✖
Spark (UC Berkeley)          ✔         ✔        ✔     ✔       ✔    ✔
Hue (Cloudera)               ✔         ✔        ✔     ✔       ✖    ✔
Sentry (Cloudera)            ✔         ✔        ✔     ✖       ✔    ✖
Flume (Cloudera)             ✔         ✔        ✔     ✖       ✔    ✔
Parquet (Cloudera/Twitter)   ✔         ✔        ✔     ✔       ✔    ✖
Sqoop (Cloudera)             ✔         ✔        ✔     ✔       ✔    ✔
Falcon (Hortonworks)         ✖         ✖        ✖     ✖       ✖    ✔
Knox (Hortonworks)           ✖         ✖        ✖     ✖       ✔    ✔
Tez (Hortonworks)            ✖         ✖        ✔     ✖       ✖    ✔
Ranger (Hortonworks)         ✖         ✖        ✖     ✖       ✖    ✔
ORCfile (Hortonworks)        ✖         ✖        ✖     ✖       ✖    ✔


Sustainable Innovation

A Hybrid Open Source Model combining the power of open source with the enterprise capabilities customers need.

• Deep open source commitment
  • 2/3 of engineering on open source
  • 19 Hadoop ecosystem projects founded
  • 90 ASF committer seats, 67 PMC seats
• Enterprise-ready extensions
  • Security, governance, and system management
• Comprehensive partner integrations
  • 160+ certified solutions

Open Platform: 100% Open Source & Open Standards


Supporting the Entire Ecosystem, not just the Core

Source: Apache JIRA, January 2012 – March 2015

Chart: share of resolved JIRA tickets by vendor. Labeled share: 54%. Other vendors in the legend: Hortonworks, IBM, MapR, Microsoft, Pivotal, WANdisco.

90 Committer* Seats deliver the fastest issue resolution and enable us to drive the Apache roadmap for our customers.

Cloudera and Intel committers resolve over 50% of all JIRA tickets among all Hadoop vendors.

Projects Included: Accumulo, Avro, Bigtop, Crunch, Flume, Hadoop Core, HBase, Hive, Kafka, Mahout, Oozie, Pig, Solr, Spark, Sqoop, Tez, Whirr, ZooKeeper

* “Committer” = A developer who has earned community privileges to commit patches


Leading Innovation in the Hadoop Ecosystem

Timeline, 2008 to 2014 (Cloudera milestone | Hortonworks counterpart):

Cloudera Founded | Hortonworks Founded
First Training Offered; Cloudera U (Over 20,000 Trained) | Hortonworks U (Less than 1,000 Trained)
CDH 1 Released | HDP 1.0 Released
Cloudera Manager 1.0 | Ambari 1.0 (Missing many enterprise features)
HUE Ships in CDH3 | HUE Ships in HDP 2.0
Impala Launches | Stinger “Final Phase” (Still 5-9x slower)
Navigator Launches | Falcon (Missing many enterprise features)
Search Launches | LucidWorks (Reseller Only)
Spark for CDH 4.4 | ???
Key Management | N/A
Data Encryption | N/A
Cloud Deployment | N/A
Sentry Ships CDH 4.3 | XA Secure / Ranger (Limited scope)


Best-In-Class Support

8.9 Overall satisfaction makes Cloudera the industry benchmark for support

95% Customers agree they benefit from Cloudera technical support outreach

#1 Ability to solve technical issues is the top reason to recommend Cloudera for Hadoop


Industry-Leading Training and University Programs

Cloudera has trained over 40,000 people on Hadoop since 2009.

Big Data professionals from 60% of the Fortune 100 have attended live Cloudera training.

Source: Fortune, “Fortune 500” and “Global 500,” May 2012.


The Most Complete Partner Ecosystem

Enterprise Data Hub: Security and Administration; Unlimited Storage; Process, Discover, Model, Serve

Partner categories: Data Systems, Applications, System Integration, Infrastructure, Operational Tools

More than 1,400 partners ensure compatibility with existing investments, lower skill barriers, and help maximize value from your data.


Why Cloudera?

We deliver long-term production success with enterprise Hadoop.

Enterprise Security: Meet compliance requirements and reduce risk exposure from storing sensitive data.

Data Governance: Enable compliance and maximize analyst productivity.

Complete Management: Deliver optimum system utilization and meet SLA commitments, on-premises or in the cloud, with minimum effort.

Open Source Innovation: No one knows Hadoop better than Cloudera. Cloudera leads development of enterprise Hadoop and offers the best support, training, and services.

Powerful Enterprise Tools: Cloudera extends open source Hadoop with capabilities required by the largest enterprises.

Ecosystem: Cloudera partners with industry leaders to ensure Hadoop works with the platforms, tools, and integrators our customers rely on.


Thank You!
Matt Brandwein
@mattbrandwein