Hadoop and other animals

Matthew Aslett, Research Director

Copyright (C) 2016 451 Research LLC

451 Research is a leading IT research & advisory company

Founded in 2000

250+ employees, including over 100 analysts

1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers

50,000+ IT professionals, business users and consumers in our research community

Over 52 million data points published each quarter and 4,500+ reports published each year

2,000+ technology & service providers under coverage

451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group

Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia

Research & Data

Advisory

Events

Go 2 Market

A combination of research & data is delivered across fifteen channels aligned to the prevailing topics and technologies of digital infrastructure… from the datacenter core to the mobile edge.

The future of Hadoop

How the meaning of ‘Hadoop’ has evolved over time and how it will continue to do so.

Comparison of Hadoop distributions.

Converging data platforms – the future evolution of ‘Hadoop’.

Beyond the zoo – focus on use-cases rather than projects.

Hadoop (disambiguation)

What do we mean by ‘Hadoop’?

In the beginning ‘Hadoop’ referred to the Hadoop Distributed File System, Hadoop MapReduce, and the Hadoop Common set of utilities.

Since then, ‘Hadoop’ has evolved to become a catch-all brand for a wider distributed data-processing ecosystem that encompasses a mix of data processing and storage capabilities.

Hadoop (disambiguation)

“There’s two things that people mean by Hadoop.”
• The Apache Hadoop project
• The set of technologies built around it

Doug Cutting (and Hadoop)

Hadoop and other animals

The Hadoop ecosystem

The Table of Hadoop elements

ASF projects in more than one Hadoop distribution:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), MAHOUT (Ma6), KAFKA (K19), WHIRR (W11), AMBARI (Am22), OOZIE (O20), IMPALA (Im59), TEZ (Te28), KNOX (Kn27), SENTRY (Se32), STORM (St33), DATAFU (Da39), PARQUET (Pa40), SLIDER (Sl38), HUE (Hu13), SOLR (So0)

Element categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT

The Table of Hadoop elements

Other Hadoop-related ASF projects:
SAMZA (Sa31), GIRAPH (G21), HAMA (Ha7), ACCUMULO (Ac23), FLINK (Fl37), TINKERPOP (Ti47), APEX (Ap53), S2GRAPH (Sg58), BEAM (Be63), CASSANDRA (C9), GEODE (Ge50), TRAFODION (Tr53), BIGTOP (B18), MRUNIT (Mu15), TWILL (Tw34), RANGER (R42), METRON (Me62), EAGLE (Ea57), AVRO (A10), CALCITE (Ca41), ATLAS (At51), RYA (Ry56), KUDU (Ku61), ARROW (Ar64), CRUNCH (Cr24), FALCON (Fa29), CHUKWA (Ch12), MYRIAD (My49), MADLIB (Md55), SYSTEMML (Sm59), HAWQ (Hq54), ZEPPELIN (Z46), KYLIN (K44), MRQL (Mq36), TAJO (T14), DRILL (D25), PHOENIX (Ph35), IGNITE (I43), ASTERIXDB (As48), NIFI (N45)

And non-ASF Hadoop products/projects:
CLOUDERA MANAGER, AMAZON S3, EMC ISILON, IBM BIG SQL, MAPR-FS

Element categories: MANAGEMENT, CORE, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT, NON-ASF

Combining Hadoop elements

[Figure: combinations of elements from the table, three labelled = “Hadoop” and one labelled = “Hadoop”?. Elements shown, in order: AMAZON S3, IMPALA (Im59), SPARK (Sp30), KAFKA (K19), STORM (St33), HDFS (H2), YARN (Y26), SOLR (So0), ZOOKEEPER (Z5), MAPREDUCE (M1), HDFS (H2), YARN (Y26), FLUME (F16), MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), TEZ (Te28), PIG (P4)]

And if not Hadoop – then what?

Hadoop (disambiguation)

“If people stop using MapReduce and HDFS we’ll let them disappear, we’re not religious about that.”

Doug Cutting (and Hadoop)

Hadoop (disambiguation)

“As long as it’s open source we can bring it in to this platform.”

Doug Cutting (and Hadoop)

Hadoop distributions

Timeline axis: 2009 2010 2011 2012 2013 2014 2015 2016

• Hortonworks Data Platform (HDP)
• Cloudera’s Distribution including Apache Hadoop
• Cloudera Distribution for Hadoop
• Cloudera CDH
• Yahoo! Distribution of Hadoop
• IBM InfoSphere BigInsights Basic Edition
• IBM Distribution of Apache Hadoop
• IBM Open Platform with Apache Hadoop
• Greenplum HD
• Greenplum HD Community Edition
• Pivotal HD
• Greenplum MR
• Greenplum HD Enterprise Edition
• MapR Distribution including Hadoop
• MapR Distribution for Apache Hadoop
• Intel Distribution for Apache Hadoop
• WANdisco Distro
• Teradata Open Distribution for Hadoop
• Apache Hadoop for MapR CDP
• Pivotal HDP

Hadoop distributions

Timeline axis: 2009 2010 2011 2012 2013 2014 2015 2016

• Hortonworks Data Platform (HDP)
• Cloudera’s Distribution including Apache Hadoop
• Cloudera Distribution for Hadoop
• Cloudera CDH
• IBM InfoSphere BigInsights Basic Edition
• IBM Distribution of Apache Hadoop
• IBM Open Platform with Apache Hadoop

Comparing Hadoop distributions

Hortonworks Data Platform:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), KAFKA (K19), AMBARI (Am22), OOZIE (O20), TEZ (Te28), KNOX (Kn27), STORM (St33), SLIDER (Sl38), HUE (Hu13), CASCADING, ACCUMULO (Ac23), CALCITE (Ca41), ATLAS (At50), RANGER (R42), FALCON (Fa29), PHOENIX (Ph35), SOLR (So0), MAHOUT (Ma6), DATAFU (Da39), CLOUDBREAK

Comparing Hadoop distributions

IBM Open Platform - 17 projects in common with HDP:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), KAFKA (K19), AMBARI (Am22), OOZIE (O20), KNOX (Kn27), SLIDER (Sl38), SOLR (So0), DATAFU (Da39)

Also in IBM Open Platform: PARQUET (Pa40)

Comparing Hadoop distributions

Cloudera CDH - 15 projects in common with HDP:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), OOZIE (O20), HUE (Hu13), SOLR (So0), MAHOUT (Ma6), DATAFU (Da39)

Also in CDH: IMPALA (Im59), PARQUET (Pa40), WHIRR (W11), SENTRY (Se32), AVRO (A10), CRUNCH (Cr24), CLOUDERA SEARCH, LLAMA, KITE

Plus others supported on Cloudera Enterprise (e.g. Kafka)
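
The overlap counts quoted on the comparison slides can be reproduced with plain set operations. What follows is a minimal sketch, assuming the project lists as transcribed from the three distribution slides; the variable and function names are illustrative and do not come from the presentation. Components shown without an element symbol (Cascading, Cloudbreak, Cloudera Search, Llama, Kite) are left out, which is a choice made for this sketch only.

```python
# Projects per distribution, transcribed from the comparison slides.
# Components shown without an element symbol (Cascading, Cloudbreak,
# Cloudera Search, Llama, Kite) are omitted; that is an assumption
# of this sketch, not something the slides state.
HDP = {
    "MAPREDUCE", "HDFS", "YARN", "HIVE", "HBASE", "PIG", "FLUME", "SQOOP",
    "SPARK", "ZOOKEEPER", "KAFKA", "AMBARI", "OOZIE", "TEZ", "KNOX", "STORM",
    "SLIDER", "HUE", "ACCUMULO", "CALCITE", "ATLAS", "RANGER", "FALCON",
    "PHOENIX", "SOLR", "MAHOUT", "DATAFU",
}

IBM_OPEN_PLATFORM = {
    "MAPREDUCE", "HDFS", "YARN", "HIVE", "HBASE", "PIG", "FLUME", "SQOOP",
    "SPARK", "ZOOKEEPER", "KAFKA", "AMBARI", "OOZIE", "KNOX", "SLIDER",
    "SOLR", "DATAFU", "PARQUET",
}

CDH = {
    "MAPREDUCE", "HDFS", "YARN", "HIVE", "HBASE", "PIG", "FLUME", "SQOOP",
    "SPARK", "ZOOKEEPER", "OOZIE", "HUE", "SOLR", "MAHOUT", "DATAFU",
    "IMPALA", "PARQUET", "WHIRR", "SENTRY", "AVRO", "CRUNCH",
}


def projects_in_common(a, b):
    """Return the projects present in both distributions, sorted by name."""
    return sorted(a & b)


ibm_common = projects_in_common(IBM_OPEN_PLATFORM, HDP)
cdh_common = projects_in_common(CDH, HDP)
cdh_only = sorted(CDH - HDP)

print(f"IBM Open Platform and HDP: {len(ibm_common)} projects in common")
print(f"Cloudera CDH and HDP: {len(cdh_common)} projects in common")
print(f"In CDH but not in HDP: {cdh_only}")
```

With the lists as transcribed, the two intersections contain 17 and 15 projects, matching the slide captions, and six projects appear in CDH but not in HDP.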

Hadoop bifurcation

Hortonworks HDP - 11 ASF projects not in CDH:
TEZ (Te28), STORM (St33), ACCUMULO (Ac23), AMBARI (Am22), FALCON (Fa29), KAFKA (K19), KNOX (Kn27), SLIDER (Sl38), CALCITE (Ca41), ATLAS (At50), RANGER (R42)
Also shown for HDP: CASCADING, CLOUDBREAK

Cloudera CDH – 6 ASF projects not in HDP:
IMPALA (Im59), PARQUET (Pa40), WHIRR (W11), SENTRY (Se32), AVRO (A10), CRUNCH (Cr24)
Also shown for CDH: CLOUDERA DIRECTOR, CLOUDERA NAVIGATOR, CLOUDERA MANAGER, CLOUDERA SEARCH, LLAMA, KITE

Converging data platforms

• RELATIONAL OPERATIONAL DATABASE
• NOSQL DATABASE
• DISTRIBUTED GRID/CACHE
• ANALYTIC DATABASE
• STREAM PROCESSING
• CONTAINERIZATION
• HADOOP

Converging data platforms

• HADOOP
• HDFS
• MAPREDUCE
• SPARK
• HBASE
• ANALYTIC DBMS
• OPERATIONAL DBMS
• DOCUMENT DATABASE
• STREAMING: STORM/SPARK STREAMING/DATATORRENT
• SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO

• HADOOP
• HDFS
• MAPREDUCE
• SPARK
• HBASE
• ANALYTIC DBMS
• OPERATIONAL DBMS
• STREAMING: STORM/SPARK STREAMING/DATATORRENT
• SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
• DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
• DOCUMENT DATABASE: MONGODB, CLOUDANT LOCAL, COUCHDB, COUCHBASE, OBJECTROCKET
• DISTRIBUTED KEY VALUE STORE: DATASTAX/CASSANDRA, REDIS, RIAK, AEROSPIKE, AWS DYNAMODB
• GRAPH DATABASE/ENGINE: NEO4J, TITAN, APACHE GIRAPH, STARDOG, INFINITEGRAPH

• HADOOP
• HDFS
• MAPREDUCE
• SPARK
• HBASE
• ANALYTIC DBMS
• OPERATIONAL DBMS
• FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
• STREAMING: STORM/SPARK STREAMING/DATATORRENT
• SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
• DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
• DOCUMENT DATABASE: MONGODB, CLOUDANT LOCAL, COUCHDB, COUCHBASE, OBJECTROCKET
• DISTRIBUTED KEY VALUE STORE: DATASTAX/CASSANDRA, REDIS, RIAK, AEROSPIKE, AWS DYNAMODB
• GRAPH DATABASE/ENGINE: NEO4J, TITAN, APACHE GIRAPH, STARDOG, INFINITEGRAPH

• HADOOP
• HDFS
• MAPREDUCE
• SPARK
• HBASE
• ANALYTIC DBMS
• OPERATIONAL DBMS
• FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
• STREAMING: STORM/SPARK STREAMING/DATATORRENT
• SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
• DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
• MULTI-MODEL DATABASE: DATASTAX/CASSANDRA*, MARKLOGIC, ARANGODB, ORIENTDB, MONGODB*, RIAK*, SQRRL DATA, OBJECTROCKET*, COUCHBASE*, ORCHESTRATE, AWS DYNAMODB, CLOUDANT LOCAL*, AEROSPIKE*

Multi-model databases support a combination of data models, including (potentially) key value, graph and document.

*Anticipated functionality

• HADOOP
• HDFS
• MAPREDUCE
• SPARK
• HBASE
• ANALYTIC DBMS
• OPERATIONAL DBMS
• FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
• STREAMING: STORM/SPARK STREAMING/DATATORRENT
• SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
• DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
• MULTI-MODEL DATABASE: DATASTAX/CASSANDRA*, MARKLOGIC, ARANGODB, ORIENTDB, MONGODB*, RIAK*, SQRRL DATA, OBJECTROCKET*, COUCHBASE*, ORCHESTRATE, AWS DYNAMODB, CLOUDANT LOCAL*, AEROSPIKE*
• NEO4J
• REDIS
• COUCHDB

• HADOOP
• HDFS
• MAPREDUCE
• SPARK
• HBASE
• ANALYTIC DBMS
• OPERATIONAL DBMS
• FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
• STREAMING: STORM/SPARK STREAMING/DATATORRENT
• SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
• DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
• MULTI-MODEL DATABASE: DATASTAX/CASSANDRA*, MARKLOGIC, ARANGODB, ORIENTDB, MONGODB*, RIAK*, SQRRL DATA, OBJECTROCKET*, COUCHBASE*, ORCHESTRATE, AWS DYNAMODB, CLOUDANT LOCAL*, AEROSPIKE*
• NEO4J
• REDIS
• COUCHDB
• MESOS/DOCKER/KUBERNETES/ATOMIC etc

Converging data platforms

C________ DATA PLATFORM

• RELATIONAL OPERATIONAL DATABASE
• NOSQL DATABASE
• DISTRIBUTED GRID/CACHE
• ANALYTIC DATABASE
• STREAM PROCESSING
• CONTAINERIZATION
• HADOOP

Toward a c________ data platform

• MapR – Converged Data Platform
• Hortonworks – Connected Data Platforms
• Cloudera Data Platform?
• Chimeric Data Platform?

Chimera (Greek mythology)
• a multi-headed hybrid creature composed of the parts of more than one animal

Image source: Wikimedia https://commons.wikimedia.org/wiki/File:Chimera_di_Arezzo.jpg

Toward a c________ data platform

• MapR – Converged Data Platform
• Hortonworks – Connected Data Platforms
• Cloudera Data Platform?
• Chimeric Data Platform?

Chimera (Merriam Webster)
• “something that exists only in the imagination and is not possible in reality”

Toward a chimeric data platform

• RELATIONAL OPERATIONAL DATABASE
• NOSQL DATABASE
• DISTRIBUTED GRID/CACHE
• ANALYTIC DATABASE
• STREAM PROCESSING
• CONTAINERIZATION
• HADOOP

Rather than projects, focus on use-cases

MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), MAHOUT (Ma6), KAFKA (K19), WHIRR (W11), AMBARI (Am22), OOZIE (O20), IMPALA (Im59), TEZ (Te28), KNOX (Kn27), SENTRY (Se32), STORM (St33), DATAFU (Da39), PARQUET (Pa40), SLIDER (Sl38), HUE (Hu13), SOLR (So0)

Element categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT

Rather than projects, focus on use-cases

Hadoop for data engineering:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), TEZ (Te28)

Element categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT

Rather than projects, focus on use-cases

Hadoop for search:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), SOLR (So0), FLUME (F16), ZOOKEEPER (Z5)

Element categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT

Rather than projects, focus on use-cases

Hadoop for operational applications:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), HBASE (Hb3), PHOENIX (Ph35)

Element categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT

Rather than projects, focus on use-cases

Hadoop for stream processing:
MAPREDUCE (M1), HDFS (H2), YARN (Y26), SPARK (Sp30), KAFKA (K19), STORM (St33)

Element categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
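
The four use-case slides can be read as a mapping from use-case to project set. The sketch below is one hedged way to express that, assuming the element lists as transcribed from the slides; the dictionary and variable names are illustrative and are not part of the presentation. It computes which projects every use-case shares and which ones are specific to each.

```python
# Use-case -> projects, transcribed from the four "focus on use-cases" slides.
# The structure and names here are illustrative assumptions.
USE_CASES = {
    "data engineering": {
        "MAPREDUCE", "HDFS", "YARN", "PIG", "FLUME", "SQOOP", "SPARK", "TEZ",
    },
    "search": {
        "MAPREDUCE", "HDFS", "YARN", "SOLR", "FLUME", "ZOOKEEPER",
    },
    "operational applications": {
        "MAPREDUCE", "HDFS", "YARN", "HBASE", "PHOENIX",
    },
    "stream processing": {
        "MAPREDUCE", "HDFS", "YARN", "SPARK", "KAFKA", "STORM",
    },
}

# Projects that appear in every use-case.
shared_core = set.intersection(*USE_CASES.values())
print("Shared by all use-cases:", sorted(shared_core))

# Projects that distinguish each use-case from the shared core.
specific = {name: sorted(projects - shared_core)
            for name, projects in USE_CASES.items()}
for name, projects in specific.items():
    print(f"Hadoop for {name}: {projects}")
```

On the transcribed lists, the shared set is HDFS, MapReduce and YARN, and everything else varies by use-case.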

Toward a chimeric data platform

Platform components:
• RELATIONAL OPERATIONAL DATABASE
• NOSQL DATABASE
• DISTRIBUTED GRID/CACHE
• ANALYTIC DATABASE
• STREAM PROCESSING
• CONTAINERIZATION
• HADOOP

Use-cases:
• DATA SCIENCE
• ANALYTIC
• OPERATIONAL
• SEARCH
• DATA ENGINEERING

Thank you

[email protected]

@maslett

www.451research.com