Hadoop and other animals
Copyright (C) 2016 451 Research LLC
451 Research is a leading IT research & advisory company
Founded in 2000
250+ employees, including over 100 analysts
1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
50,000+ IT professionals, business users and consumers in our research community
Over 52 million data points published each quarter and 4,500+ reports published each year
2,000+ technology & service providers under coverage
451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group
Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia
Research & Data
Advisory
Events
Go 2 Market
A combination of research & data is delivered across fifteen channels aligned to the prevailing topics and technologies of digital infrastructure… from the datacenter core to the mobile edge.
The future of Hadoop
How the meaning of ‘Hadoop’ has evolved over time and how it will continue to do so.
Comparison of Hadoop distributions.
Converging data platforms – the future evolution of ‘Hadoop’.
Beyond the zoo – focus on use-cases rather than projects.
Hadoop (disambiguation)
What do we mean by ‘Hadoop’?
In the beginning ‘Hadoop’ referred to the Hadoop Distributed File System, Hadoop MapReduce, and the Hadoop Common set of utilities.
Since then, ‘Hadoop’ has evolved to become a catch-all brand for a wider distributed data-processing ecosystem that encompasses a mix of data processing and storage capabilities.
Hadoop (disambiguation)
“There’s two things that people mean by Hadoop.”
• The Apache Hadoop project
• The set of technologies built around it
Doug Cutting (and Hadoop)
The Table of Hadoop elements
ASF projects in more than one Hadoop distribution
Categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), MAHOUT (Ma6), KAFKA (K19), WHIRR (W11), AMBARI (Am22), OOZIE (O20), IMPALA (Im59), TEZ (Te28), KNOX (Kn27), SENTRY (Se32), STORM (St33), DATAFU (Da39), PARQUET (Pa40), SLIDER (Sl38), HUE (Hu13), SOLR (So0)
The Table of Hadoop elements
Other Hadoop-related ASF projects
Categories: MANAGEMENT, CORE, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT, NON-ASF
Elements: SAMZA (Sa31), GIRAPH (G21), HAMA (Ha7), ACCUMULO (Ac23), FLINK (Fl37), TINKERPOP (Ti47), APEX (Ap53), S2GRAPH (Sg58), BEAM (Be63), CASSANDRA (C9), GEODE (Ge50), TRAFODION (Tr53), BIGTOP (B18), MRUNIT (Mu15), TWILL (Tw34), RANGER (R42), METRON (Me62), EAGLE (Ea57), AVRO (A10), CALCITE (Ca41), ATLAS (At51), RYA (Ry56), KUDU (Ku61), ARROW (Ar64), CRUNCH (Cr24), FALCON (Fa29), CHUKWA (Ch12), MYRIAD (My49), MADLIB (Md55), SYSTEMML (Sm59), HAWQ (Hq54), ZEPPELIN (Z46), KYLIN (K44), MRQL (Mq36), TAJO (T14), DRILL (D25), PHOENIX (Ph35), IGNITE (I43), ASTERIXDB (As48), NIFI (N45)
And non-ASF Hadoop products/projects
CLOUDERA MANAGER, AMAZON S3, EMC ISILON, IBM BIG SQL, MAPR-FS
Combining Hadoop elements
[Figure: combinations of elements drawn from AMAZON S3, IMPALA (Im59), SPARK (Sp30), KAFKA (K19), STORM (St33), HDFS (H2), YARN (Y26), SOLR (So0), ZOOKEEPER (Z5), MAPREDUCE (M1), FLUME (F16), HIVE (Hi8), TEZ (Te28) and PIG (P4). Three combinations are labeled = “Hadoop”; a fourth is labeled = “Hadoop”?]
And if not Hadoop – then what?
Hadoop (disambiguation)
“If people stop using MapReduce and HDFS we’ll let them disappear, we’re not religious about that.”
Doug Cutting (and Hadoop)
Hadoop (disambiguation)
“As long as it’s open source we can bring it in to this platform.”
Doug Cutting (and Hadoop)
Hadoop distributions
[Timeline of distribution names, 2009 to 2016]
Hortonworks Data Platform (HDP)
Cloudera’s Distribution including Apache Hadoop
Cloudera Distribution for Hadoop
Cloudera CDH
Yahoo! Distribution of Hadoop
IBM InfoSphere BigInsights Basic Edition
IBM Distribution of Apache Hadoop
IBM Open Platform with Apache Hadoop
Greenplum HD
Greenplum HD Community Edition
Pivotal HD
Greenplum MR
Greenplum HD Enterprise Edition
MapR Distribution including Hadoop
MapR Distribution for Apache Hadoop
Intel Distribution for Apache Hadoop
WANdisco Distro
Teradata Open Distribution for Hadoop
Apache Hadoop for MapR CDP
Pivotal HDP
Hadoop distributions
[Timeline of distribution names, 2009 to 2016]
Hortonworks Data Platform (HDP)
Cloudera’s Distribution including Apache Hadoop
Cloudera Distribution for Hadoop
Cloudera CDH
IBM InfoSphere BigInsights Basic Edition
IBM Distribution of Apache Hadoop
IBM Open Platform with Apache Hadoop
Comparing Hadoop distributions
Hortonworks Data Platform
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), KAFKA (K19), AMBARI (Am22), OOZIE (O20), TEZ (Te28), KNOX (Kn27), STORM (St33), SLIDER (Sl38), HUE (Hu13), ACCUMULO (Ac23), CALCITE (Ca41), ATLAS (At50), RANGER (R42), FALCON (Fa29), PHOENIX (Ph35), SOLR (So0), MAHOUT (Ma6), DATAFU (Da39)
Also shown: CASCADING, CLOUDBREAK
Comparing Hadoop distributions
IBM Open Platform - 17 projects in common with HDP
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), KAFKA (K19), AMBARI (Am22), OOZIE (O20), KNOX (Kn27), SLIDER (Sl38), SOLR (So0), DATAFU (Da39), PARQUET (Pa40)
Comparing Hadoop distributions
Cloudera CDH - 15 projects in common with HDP
Elements: IMPALA (Im59), MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), OOZIE (O20), HUE (Hu13), SOLR (So0), MAHOUT (Ma6), DATAFU (Da39), PARQUET (Pa40), WHIRR (W11), SENTRY (Se32), AVRO (A10), CRUNCH (Cr24)
Also shown: CLOUDERA SEARCH, LLAMA, KITE
Plus others supported on Cloudera Enterprise (e.g. Kafka)
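The "projects in common with HDP" counts on the comparison slides can be checked with plain set intersection. The following is a minimal sketch, assuming the project lists as transcribed from the three distribution slides (element-table projects only; Cascading, Cloudbreak, Cloudera Search, Llama and Kite are left out). The variable and function names are illustrative and not part of the original deck.

```python
# Project lists transcribed from the three "Comparing Hadoop distributions"
# slides. Only projects shown with an element symbol are included.
HDP = set(
    "MAPREDUCE HDFS YARN HIVE HBASE PIG FLUME SQOOP SPARK ZOOKEEPER KAFKA "
    "AMBARI OOZIE TEZ KNOX STORM SLIDER HUE ACCUMULO CALCITE ATLAS RANGER "
    "FALCON PHOENIX SOLR MAHOUT DATAFU".split()
)
IBM_OPEN_PLATFORM = set(
    "MAPREDUCE HDFS YARN HIVE HBASE PIG FLUME SQOOP SPARK ZOOKEEPER KAFKA "
    "AMBARI OOZIE KNOX SLIDER SOLR DATAFU PARQUET".split()
)
CDH = set(
    "IMPALA MAPREDUCE HDFS YARN HIVE HBASE PIG FLUME SQOOP SPARK ZOOKEEPER "
    "OOZIE HUE SOLR MAHOUT DATAFU PARQUET WHIRR SENTRY AVRO CRUNCH".split()
)


def projects_in_common(distribution, reference):
    """Return the projects that appear in both distributions."""
    return distribution & reference


ibm_common = projects_in_common(IBM_OPEN_PLATFORM, HDP)
cdh_common = projects_in_common(CDH, HDP)

print("IBM Open Platform in common with HDP:", len(ibm_common))  # 17
print("Cloudera CDH in common with HDP:", len(cdh_common))       # 15
```

With these lists the sketch reproduces the 17 and 15 quoted on the slides; the only IBM Open Platform project outside the overlap is PARQUET.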
Hadoop bifurcation
Hortonworks HDP - 11 ASF projects not in CDH
Elements: TEZ (Te28), STORM (St33), ACCUMULO (Ac23), AMBARI (Am22), FALCON (Fa29), KAFKA (K19), KNOX (Kn27), SLIDER (Sl38), CALCITE (Ca41), ATLAS (At50), RANGER (R42)
Also shown: CASCADING, CLOUDBREAK
Cloudera CDH – 6 ASF projects not in HDP
Elements: IMPALA (Im59), PARQUET (Pa40), WHIRR (W11), SENTRY (Se32), AVRO (A10), CRUNCH (Cr24)
Also shown: CLOUDERA DIRECTOR, CLOUDERA NAVIGATOR, CLOUDERA MANAGER, CLOUDERA SEARCH, LLAMA, KITE
Converging data platforms
RELATIONAL OPERATIONAL DATABASE
NOSQL DATABASE
DISTRIBUTED GRID/CACHE
ANALYTIC DATABASE
STREAM PROCESSING
CONTAINERIZATION
HADOOP
Converging data platforms
HADOOP
HDFS
MAPREDUCE
SPARK
HBASE
ANALYTIC DBMS
OPERATIONAL DBMS
DOCUMENT DATABASE
STREAMING: STORM/SPARK STREAMING/DATATORRENT
SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
HADOOP
HDFS
MAPREDUCE
SPARK
HBASE
ANALYTIC DBMS
OPERATIONAL DBMS
STREAMING: STORM/SPARK STREAMING/DATATORRENT
SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
DOCUMENT DATABASE: MONGODB, CLOUDANT LOCAL, COUCHDB, COUCHBASE
DISTRIBUTED KEY VALUE STORE: DATASTAX/CASSANDRA, REDIS, RIAK, AEROSPIKE, AWS DYNAMODB
GRAPH DATABASE/ENGINE: NEO4J, TITAN, APACHE GIRAPH, STARDOG, INFINITEGRAPH
OBJECTROCKET
HADOOP
HDFS
MAPREDUCE
SPARK
HBASE
ANALYTIC DBMS
OPERATIONAL DBMS
STREAMING: STORM/SPARK STREAMING/DATATORRENT
SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
DOCUMENT DATABASE: MONGODB, CLOUDANT LOCAL, COUCHDB, COUCHBASE
DISTRIBUTED KEY VALUE STORE: DATASTAX/CASSANDRA, REDIS, RIAK, AEROSPIKE, AWS DYNAMODB
GRAPH DATABASE/ENGINE: NEO4J, TITAN, APACHE GIRAPH, STARDOG, INFINITEGRAPH
OBJECTROCKET
HADOOP
HDFS
MAPREDUCE
SPARK
HBASE
ANALYTIC DBMS
OPERATIONAL DBMS
FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
STREAMING: STORM/SPARK STREAMING/DATATORRENT
SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
DOCUMENT DATABASE: MONGODB, CLOUDANT LOCAL, COUCHDB, COUCHBASE
DISTRIBUTED KEY VALUE STORE: DATASTAX/CASSANDRA, REDIS, RIAK, AEROSPIKE, AWS DYNAMODB
GRAPH DATABASE/ENGINE: NEO4J, TITAN, APACHE GIRAPH, STARDOG, INFINITEGRAPH
OBJECTROCKET
HADOOP
HDFS
MAPREDUCE
SPARK
HBASE
ANALYTIC DBMS
OPERATIONAL DBMS
FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
STREAMING: STORM/SPARK STREAMING/DATATORRENT
SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
MULTI-MODEL DATABASE: DATASTAX/CASSANDRA*, MARKLOGIC, ARANGODB, ORIENTDB, MONGODB*, RIAK*, SQRRL DATA, OBJECTROCKET*, COUCHBASE*, ORCHESTRATE, AWS DYNAMODB, CLOUDANT LOCAL*, AEROSPIKE*
Multi-model databases support a combination of data models, including (potentially) key value, graph and document
*Anticipated functionality
HADOOP
HDFS
MAPREDUCE
SPARK
HBASE
ANALYTIC DBMS
OPERATIONAL DBMS
FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
STREAMING: STORM/SPARK STREAMING/DATATORRENT
SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
MULTI-MODEL DATABASE: DATASTAX/CASSANDRA*, MARKLOGIC, ARANGODB, ORIENTDB, MONGODB*, RIAK*, SQRRL DATA, OBJECTROCKET*, COUCHBASE*, ORCHESTRATE, AWS DYNAMODB, CLOUDANT LOCAL*, AEROSPIKE*
NEO4J, REDIS, COUCHDB
HADOOP
HDFS
MAPREDUCE
SPARK
HBASE
ANALYTIC DBMS
OPERATIONAL DBMS
FEDERATED QUERY: PIVOTAL HAWQ/IBM BIG SQL/ORACLE BIG DATA SQL/TERADATA QUERY GRID/MICROSOFT POLYBASE
STREAMING: STORM/SPARK STREAMING/DATATORRENT
SQL-ON-HADOOP: IMPALA/SPARK SQL/HIVE/DRILL/PRESTO
DISTRIBUTED RELATIONAL DATABASE: SPLICE MACHINE, MEMSQL, NUODB*, VOLTDB, COCKROACHDB, CLUSTRIX
MULTI-MODEL DATABASE: DATASTAX/CASSANDRA*, MARKLOGIC, ARANGODB, ORIENTDB, MONGODB*, RIAK*, SQRRL DATA, OBJECTROCKET*, COUCHBASE*, ORCHESTRATE, AWS DYNAMODB, CLOUDANT LOCAL*, AEROSPIKE*
NEO4J, REDIS, COUCHDB
MESOS/DOCKER/KUBERNETES/ATOMIC etc
Converging data platforms
C________ DATA PLATFORM
RELATIONAL OPERATIONAL DATABASE
NOSQL DATABASE
DISTRIBUTED GRID/CACHE
ANALYTIC DATABASE
STREAM PROCESSING
CONTAINERIZATION
HADOOP
Toward a c________ data platform
MapR – Converged Data Platform
Hortonworks – Connected Data Platforms
Cloudera Data Platform?
Chimeric Data Platform?
Chimera (Greek mythology)
• a multi-headed hybrid creature composed of the parts of more than one animal
Image source: Wikimedia https://commons.wikimedia.org/wiki/File:Chimera_di_Arezzo.jpg
Toward a c________ data platform
MapR – Converged Data Platform
Hortonworks – Connected Data Platforms
Cloudera Data Platform?
Chimeric Data Platform?
Chimera (Merriam Webster)
• “something that exists only in the imagination and is not possible in reality”
Toward a chimeric data platform
RELATIONAL OPERATIONAL DATABASE
NOSQL DATABASE
DISTRIBUTED GRID/CACHE
ANALYTIC DATABASE
STREAM PROCESSING
CONTAINERIZATION
HADOOP
Rather than projects, focus on use-cases
Categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), HIVE (Hi8), HBASE (Hb3), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), ZOOKEEPER (Z5), MAHOUT (Ma6), KAFKA (K19), WHIRR (W11), AMBARI (Am22), OOZIE (O20), IMPALA (Im59), TEZ (Te28), KNOX (Kn27), SENTRY (Se32), STORM (St33), DATAFU (Da39), PARQUET (Pa40), SLIDER (Sl38), HUE (Hu13), SOLR (So0)
Rather than projects, focus on use-cases
Hadoop for data engineering
Categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), PIG (P4), FLUME (F16), SQOOP (Sq17), SPARK (Sp30), TEZ (Te28)
Rather than projects, focus on use-cases
Hadoop for search
Categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), SOLR (So0), FLUME (F16), ZOOKEEPER (Z5)
Rather than projects, focus on use-cases
Hadoop for operational applications
Categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), HBASE (Hb3), PHOENIX (Ph35)
Rather than projects, focus on use-cases
Hadoop for stream processing
Categories: CORE, MANAGEMENT, PROCESSING, ANALYTICS, OTHER, SECURITY, DATA MANAGEMENT
Elements: MAPREDUCE (M1), HDFS (H2), YARN (Y26), SPARK (Sp30), KAFKA (K19), STORM (St33)
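The four use-case slides can be read as a lookup from use-case to project stack. The sketch below is one way to express that, assuming the element lists as transcribed from the slides; the dictionary layout and variable names are illustrative and do not come from the deck. It derives the projects shared by every use-case and the ones specific to each.

```python
# Use-case stacks transcribed from the "Hadoop for ..." slides.
USE_CASES = {
    "data engineering": {
        "MAPREDUCE", "HDFS", "YARN", "PIG", "FLUME", "SQOOP", "SPARK", "TEZ",
    },
    "search": {"MAPREDUCE", "HDFS", "YARN", "SOLR", "FLUME", "ZOOKEEPER"},
    "operational applications": {
        "MAPREDUCE", "HDFS", "YARN", "HBASE", "PHOENIX",
    },
    "stream processing": {
        "MAPREDUCE", "HDFS", "YARN", "SPARK", "KAFKA", "STORM",
    },
}

# Projects that appear in every use-case stack.
shared = set.intersection(*USE_CASES.values())

# Projects that each use-case adds on top of the shared ones.
specific = {
    name: sorted(projects - shared) for name, projects in USE_CASES.items()
}

print("Shared by all use-cases:", sorted(shared))
for name, projects in specific.items():
    print(f"Hadoop for {name}: {', '.join(projects)}")
```

On these lists the shared set is MAPREDUCE, HDFS and YARN, so choosing a use-case amounts to choosing which projects to add on top of those three.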
Toward a chimeric data platform
RELATIONAL OPERATIONAL DATABASE
NOSQL DATABASE
DISTRIBUTED GRID/CACHE
ANALYTIC DATABASE
STREAM PROCESSING
CONTAINERIZATION
HADOOP
DATA SCIENCE
ANALYTIC
OPERATIONAL
SEARCH
DATA ENGINEERING