Hadoop ecosystem
Moscow, November 16th, 2011
The Hadoop Ecosystem
Kai Voigt, Cloudera Inc.
©2011 Cloudera, Inc. All Rights Reserved.
Cloudera
| | Hadoop | Linux |
|---|---|---|
| Licence | Apache | GPL and others |
| Distribution Vendor | Cloudera | Red Hat |
| Free Distribution | Cloudera's Distribution Including Hadoop (CDH) | Fedora Core |
| Commercial Distribution | Cloudera Enterprise | Red Hat Enterprise Linux (RHEL) |
Hadoop Core
HDFS
MapReduce
HDFS
• Hadoop Distributed File System
• Redundancy
• Fault Tolerant
• Scalable
• Self Healing
• Write Once, Read Many Times
• Java API
• Command Line Tool
MapReduce
• Two Phases of Functional Programming
• Redundancy
• Fault Tolerant
• Scalable
• Self Healing
• Java API
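The "two phases" in the list above are map and reduce, both borrowed from functional programming. The following is a minimal single-process sketch of that idea in Python, not the Hadoop Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: turn one input record into a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each call to `map_phase` sees only one record and each call to `reduce_phase` sees only one key, a framework is free to spread both phases across many machines and to rerun a failed piece on another node.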
Hadoop Core
HDFS (interface: Java)
MapReduce (interface: Java)
HDFS-FUSE
/mnt/hdfs/
HDFS-FUSE
HDFS
HDFS-FUSE Examples
$ mount
...
fuse on /mnt/hdfs type fuse (rw,nosuid,nodev,user_id=0,group_id=0,default_permissions,allow_other)
$ cp /boot/vmlinuz-* /mnt/hdfs/user/cloudera/
$ hadoop fs -ls vmlinuz-*
-rw-r--r-- 3 cloudera supergroup 2107004 2011-11-08 16:14 /user/cloudera/vmlinuz-2.6.18-274.7.1.el5
Sqoop
RDBMS
Sqoop
HDFS
Sqoop
• Import & Export
• ODBC, JDBC Data Sources
• CSV Files in HDFS
Sqoop Examples
$ sqoop import --connect jdbc:mysql://localhost/world --username root --table City ...
$ hadoop fs -cat City/part-m-00000
1,Kabul,AFG,Kabol,1780000
2,Qandahar,AFG,Qandahar,237500
3,Herat,AFG,Herat,186800
4,Mazar-e-Sharif,AFG,Balkh,127800
5,Amsterdam,NLD,Noord-Holland,731200
...
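The imported table lands in HDFS as plain comma-separated text, so any CSV reader can consume it. The sketch below parses the five rows shown in the output above. The column names are an assumption (the slide shows only values), and the text is held in a string rather than read from HDFS.

```python
import csv
import io

# Rows copied from the `hadoop fs -cat` output above.
PART_FILE = """\
1,Kabul,AFG,Kabol,1780000
2,Qandahar,AFG,Qandahar,237500
3,Herat,AFG,Herat,186800
4,Mazar-e-Sharif,AFG,Balkh,127800
5,Amsterdam,NLD,Noord-Holland,731200
"""

# Assumed column names; the slide shows only the values.
COLUMNS = ["id", "name", "country_code", "district", "population"]


def read_cities(text):
    """Parse Sqoop-style CSV text into a list of dicts with typed numbers."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), fieldnames=COLUMNS):
        record["id"] = int(record["id"])
        record["population"] = int(record["population"])
        rows.append(record)
    return rows


cities = read_cities(PART_FILE)
# Sum the population of all rows whose country code is AFG.
afghan_total = sum(c["population"] for c in cities if c["country_code"] == "AFG")
print(afghan_total)
```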
Hive
MapReduce
Hive
SQL
Hive
• Data Warehouse System for Hadoop
• Data Aggregation
• Ad-Hoc Queries
• SQL-like Language (HiveQL)
• Developed at Facebook
Hive Examples
CREATE TABLE newmovie (id INT, name STRING, year INT, numratings INT, avgrating FLOAT);
INSERT OVERWRITE TABLE newmovie
SELECT id, name, year, COUNT(1), AVG(rating)
FROM movie JOIN movierating
ON movie.id = movierating.movieid
GROUP BY id, name, year;
Pig
MapReduce
Pig
Script
Pig
• Data Warehouse System for Hadoop
• Data Aggregation
• Ad-Hoc Queries
• High-Level Scripting Language (Pig Latin)
• Developed at Yahoo
Pig Examples
movierating = LOAD 'movierating' AS (userid, movieid, rating:INT);
groupmr = GROUP movierating BY movieid;
ratings = FOREACH groupmr GENERATE group AS movieid, COUNT(movierating.rating) AS numratings, AVG(movierating.rating) AS avgrating;
movie = LOAD 'movie' AS (id, name, year);
mr = JOIN movie BY id, ratings BY movieid;
result = FOREACH mr GENERATE id, name, year, numratings, avgrating;
STORE result INTO 'ratedmovie';
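The Pig script and the earlier Hive query compute the same result: the number of ratings and the average rating per movie, joined to the movie's name and year. To make the data flow concrete, here is a rough Python sketch that mirrors the Pig steps on in-memory lists. The sample rows are invented for illustration, and nothing here runs on Hadoop.

```python
from collections import defaultdict

# Invented sample data: (userid, movieid, rating) and (id, name, year).
movierating = [(1, 10, 4), (2, 10, 5), (1, 20, 3)]
movie = [(10, "Movie A", 1999), (20, "Movie B", 2005), (30, "Movie C", 2010)]

# groupmr = GROUP movierating BY movieid;
groupmr = defaultdict(list)
for userid, movieid, rating in movierating:
    groupmr[movieid].append(rating)

# ratings = FOREACH groupmr GENERATE group, COUNT(...), AVG(...);
ratings = {
    movieid: (len(values), sum(values) / len(values))
    for movieid, values in groupmr.items()
}

# mr = JOIN movie BY id, ratings BY movieid;  (inner join: unrated movies drop out)
# result = FOREACH mr GENERATE id, name, year, numratings, avgrating;
result = [
    (mid, name, year, *ratings[mid])
    for mid, name, year in movie
    if mid in ratings
]

for row in result:
    print(row)
```

"Movie C" has no ratings in the sample data, so the inner join leaves it out of `result`.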
The Story So Far
• Hive (SQL) and Pig (Script) run on MapReduce (Java)
• Sqoop connects RDBMS (SQL) with HDFS
• FUSE exposes HDFS (Java) as a Posix FS
HBase
• Low Latency
• Random Reads And Writes
• Distributed Key/Value Store
• Simple API
  – PUT
  – GET
  – DELETE
  – SCAN
HBase Data Model
Key: RowID, Column Name, Timestamp

| RowID | Column Name | Timestamp | Value |
|---|---|---|---|
| com.apple.www | Size | yesterday | 1234 |
| com.apple.www | Content | yesterday | `<html>...` |
| com.cloudera.www | Size | yesterday | 2345 |
| com.cloudera.www | Content | yesterday | `<html>...` |
| com.cloudera.www | Size | today | 3456 |
| com.cloudera.www | Content | today | `<html>...` |
| com.facebook.www | Size | yesterday | 4567 |
| com.facebook.www | Content | yesterday | `<html>...` |
| com.yahoo.www | Size | today | 5678 |
| com.yahoo.www | Content | today | `<html>...` |
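One way to read the table above is as a sorted map from a composite key (row ID, column name, timestamp) to a value. The sketch below models that with a plain Python dict. It is not the HBase client API: the `get` and `scan` helpers are hypothetical, and "yesterday" and "today" are replaced by the assumed integer timestamps 1 and 2.

```python
# Assumption: "yesterday" and "today" are modelled as integer timestamps.
YESTERDAY, TODAY = 1, 2

# Key = (row id, column name, timestamp) -> value, as in the table above.
cells = {
    ("com.apple.www", "Size", YESTERDAY): 1234,
    ("com.cloudera.www", "Size", YESTERDAY): 2345,
    ("com.cloudera.www", "Size", TODAY): 3456,
    ("com.facebook.www", "Size", YESTERDAY): 4567,
    ("com.yahoo.www", "Size", TODAY): 5678,
}


def get(row, column):
    """Return the newest version of one cell, or None if it does not exist."""
    versions = [(ts, v) for (r, c, ts), v in cells.items() if (r, c) == (row, column)]
    return max(versions)[1] if versions else None


def scan(start_row, stop_row):
    """Yield (row, column, timestamp, value) in key order for start_row <= row < stop_row."""
    for (row, column, ts) in sorted(cells):
        if start_row <= row < stop_row:
            yield row, column, ts, cells[(row, column, ts)]


print(get("com.cloudera.www", "Size"))
print([row for row, *_ in scan("com.c", "com.g")])
```

Reversed domain names as row IDs keep pages of the same site next to each other in key order, which is what makes a range scan over one domain cheap. This sketch sorts timestamps in ascending order for simplicity; HBase itself returns newer versions first.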
HBase Flow
GET/PUT/DELETE
MEMORY
HDFS Logfile
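The diagram suggests that each operation is recorded in a log file and applied in memory, with data eventually stored on HDFS. The toy Python class below sketches such a write path under stated assumptions. The class name `TinyStore`, the `flush_threshold` parameter and the list-based "files" are all hypothetical and do not reflect HBase internals.

```python
class TinyStore:
    """Toy write path: append to a log first, then update memory, flush when full."""

    def __init__(self, flush_threshold=2):
        self.log = []            # stands in for the log file on HDFS
        self.memory = {}         # stands in for the in-memory buffer
        self.flushed_files = []  # stands in for immutable files on HDFS
        self.flush_threshold = flush_threshold  # hypothetical tuning knob

    def put(self, key, value):
        self.log.append(("PUT", key, value))  # record the change before touching memory
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self._flush()

    def delete(self, key):
        self.log.append(("DELETE", key, None))
        self.memory[key] = None  # a marker that hides older flushed values

    def get(self, key):
        if key in self.memory:  # newest data lives in memory
            return self.memory[key]
        for snapshot in reversed(self.flushed_files):  # then newest file first
            if key in snapshot:
                return snapshot[key]
        return None

    def _flush(self):
        self.flushed_files.append(dict(self.memory))  # write-once snapshot
        self.memory.clear()


store = TinyStore()
store.put("row1", "val1")
store.put("row2", "val2")   # reaches the threshold and triggers a flush
store.put("row1", "val1b")  # newer value stays in memory
print(store.get("row1"), store.get("row2"), len(store.flushed_files))
```

Flushed snapshots are never modified, which fits the write-once, read-many model of HDFS. Random writes are absorbed in memory, and the log exists so that they can be replayed after a crash.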
HBase Examples
hbase> create 'mytable', 'mycf'
hbase> list
hbase> put 'mytable', 'row1', 'mycf:col1', 'val1'
hbase> put 'mytable', 'row1', 'mycf:col2', 'val2'
hbase> put 'mytable', 'row2', 'mycf:col1', 'val3'
hbase> scan 'mytable'
hbase> disable 'mytable'
hbase> drop 'mytable'
Flume
• Many Servers with Many Log Files
  – Webserver
  – Mailserver
  – Syslog
• Store All Logs in One Place
  – Manageable
  – Extensible
  – Reliable
Flume Architecture
Log
Flume Node
Log
Flume Node
...
HDFS
Flume Sources and Sinks
• Local Files
• HDFS
• Stdin, Stdout
• IRC
• IMAP
Whirr
• Automatic Cluster Setup in the Cloud
  – Amazon
  – Rackspace
Whirr Example
$ cat hadoop.properties
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,7 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
$ bin/whirr launch-cluster --config hadoop.properties
$ . ~/.whirr/myhadoopcluster/hadoop-proxy.sh
$ export HADOOP_CONF_DIR=~/.whirr/myhadoopcluster
$ bin/whirr destroy-cluster --config hadoop.properties
Oozie Concept
• crond for Hadoop
• Job Flow Control
  – Branching
  – Serial
  – Loops
• Triggered
  – Time
  – Data
Job 1
Job 3
Job 2
Job 4 Job 5
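A job flow with serial and branching steps can be modelled as a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive an execution order. The dependencies between the five jobs are assumptions, since the arrows of the diagram did not survive in this transcript, and the code does not use Oozie's own workflow definition format.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: job -> set of jobs that must finish first.
workflow = {
    "Job 1": set(),
    "Job 2": {"Job 1"},
    "Job 3": {"Job 1"},
    "Job 4": {"Job 2", "Job 3"},
    "Job 5": {"Job 4"},
}


def run_order(graph):
    """Return batches of jobs; jobs within one batch could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = run_order(workflow)
print(batches)
```

With these assumed edges, "Job 2" and "Job 3" form a branch that can run side by side once "Job 1" has finished, while "Job 4" and "Job 5" run serially afterwards.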
Oozie Features
• Component Independent
  – MapReduce
  – Hive
  – Pig
  – Sqoop
  – Streaming
Mahout
• Machine Learning Library for Hadoop
  – Regression
  – Classification
  – Recommendations
  – Pattern Mining
Mahout Use Cases
• Yahoo: Spam Detection
• Foursquare: Recommendations
• SpeedDate.com: Recommendations
• Adobe: User Targeting
• Amazon: Personalization Platform
CDH3u2
• Cloudera's Distribution Including Hadoop
• http://www.cloudera.com/download/
• Linux Packages
  – Red Hat
  – Debian
  – Tar Archive
• Virtual Machines
• Cloud Installation with Whirr
CDH Components
Hadoop Hive
Pig HBase
Zookeeper Flume
Sqoop Whirr
Hue Oozie
FUSE-DFS Mahout
Thank you!
• Kai Voigt
• http://www.cloudera.com/