Treasure Data on The YARN - Hadoop Conference Japan 2014

Copyright ©2014 Treasure Data. All Rights Reserved. Treasure Data on The YARN Ryu Kobayashi Hadoop Conference Japan 2014 8 July 2014


Page 1: Treasure Data on The YARN - Hadoop Conference Japan 2014

Copyright ©2014 Treasure Data. All Rights Reserved.

Treasure Data on The YARN

Ryu Kobayashi

Hadoop Conference Japan 2014 8 July 2014

Page 2: Treasure Data on The YARN - Hadoop Conference Japan 2014

Who am I?

• Ryu Kobayashi
• @ryu_kobayashi
• https://github.com/ryukobayashi
• Treasure Data, Inc.
• Software Engineer
• Background
• Hadoop, Cassandra, Machine Learning, ...
• I developed the Huahin (Hadoop) Framework: http://huahinframework.org/

Page 3: Treasure Data on The YARN - Hadoop Conference Japan 2014

What is Treasure Data?

Page 4: Treasure Data on The YARN - Hadoop Conference Japan 2014

Our Service

[Diagram: service overview in three stages]

Data Collection: Web Log, App Log, Sensor, RDBMS, CRM, ERP. Streaming Upload (Open-Source Log Collector); Bulk Upload / Parallel Upload (Bulk Loader: CSV / TSV, MySQL, Postgres, Oracle, etc.).

Data Warehouse: schema-less Columnar Storage + Hadoop MapReduce.

Data Analysis: SQL (HiveQL) or Pig; TD command / Web Console; REST API; JDBC / ODBC; BI Tools (Tableau, QlikView, Pentaho, Excel, etc.); Result push to External Service/Storage (Custom App, RDBMS, FTP, etc.).

Page 6: Treasure Data on The YARN - Hadoop Conference Japan 2014

Our Query Language

Page 7: Treasure Data on The YARN - Hadoop Conference Japan 2014

Our Service

[Service overview diagram repeated; queries are written in SQL (HiveQL) or Pig]

Page 8: Treasure Data on The YARN - Hadoop Conference Japan 2014

Our System

[Diagram: Hadoop Cluster and PlazmaDB]

HDFS is not used

Page 9: Treasure Data on The YARN - Hadoop Conference Japan 2014

Our System

[Diagram: Hadoop Cluster and PlazmaDB]

HDFS is not used

• Customized Hadoop
• Customized Hive
• Customized Pig
• Customized Impala
• Customized Presto

Page 10: Treasure Data on The YARN - Hadoop Conference Japan 2014

We have 4 production Hadoop clusters

Page 11: Treasure Data on The YARN - Hadoop Conference Japan 2014

We have 4 production Hadoop clusters

user1, user4, user5, …
user2, user9, user34, …
user10, user40, user102, …
user50, user88, user1023, …

Page 12: Treasure Data on The YARN - Hadoop Conference Japan 2014

Our Scheduler and Queue

[Diagram: Queue, Scheduler, Hadoop Cluster, Hadoop Cluster]

Page 13: Treasure Data on The YARN - Hadoop Conference Japan 2014

We have 4 production Hadoop clusters and a Hadoop (YARN) cluster

[Diagram: YARN Cluster]

Page 14: Treasure Data on The YARN - Hadoop Conference Japan 2014

MRv1 and YARN Queue

[Diagram: Queue, Hadoop Cluster, Hadoop Cluster]

Page 15: Treasure Data on The YARN - Hadoop Conference Japan 2014

Our Service

• About 4,700 users
• About 6 trillion records
• About 12 million jobs
• About 40,000 jobs per day

Page 16: Treasure Data on The YARN - Hadoop Conference Japan 2014

What is YARN?

Page 17: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN (Yet Another Resource Negotiator) Architecture

Page 18: Treasure Data on The YARN - Hadoop Conference Japan 2014

• MRv1

• JobTracker

• TaskTracker

Page 19: Treasure Data on The YARN - Hadoop Conference Japan 2014

• YARN

• ResourceManager

• NodeManager

• ApplicationMaster

• Job History Server

Page 20: Treasure Data on The YARN - Hadoop Conference Japan 2014

• MRv1

• JobTracker

• TaskTracker

• YARN

• ResourceManager

• NodeManager

• ApplicationMaster

• Job History Server (the log history cannot be viewed unless it is installed)

Page 21: Treasure Data on The YARN - Hadoop Conference Japan 2014

Note!!!

Page 22: Treasure Data on The YARN - Hadoop Conference Japan 2014

Use Hadoop 2.4.0 or later!!!

Page 23: Treasure Data on The YARN - Hadoop Conference Japan 2014

• Versions that must not be used
• Apache Hadoop 2.2.0
• Apache Hadoop 2.3.0
• HDP 2.0 (based on 2.2.0)

Page 24: Treasure Data on The YARN - Hadoop Conference Japan 2014

• Current versions
• Apache Hadoop 2.4.1
• CDH 5.0.2 (based on 2.3.0, with patches)
• HDP 2.1 (based on 2.4.0)

Page 25: Treasure Data on The YARN - Hadoop Conference Japan 2014

• Why should they not be used?

• Capacity Scheduler

• There is a bug

• Fair Scheduler

• There is a bug

Page 26: Treasure Data on The YARN - Hadoop Conference Japan 2014

• What kind of bugs?
• Each scheduler can cause a deadlock
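The version advice on the preceding pages can be checked mechanically. This is a minimal sketch, assuming plain upstream Apache version strings such as 2.4.1; the function names are hypothetical. Vendor builds such as CDH 5.0.2 carry patches on an older base, so the base version alone does not decide the question for them.

```python
# Minimum upstream Apache Hadoop version recommended in the slides.
MINIMUM_SAFE = (2, 4, 0)

def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def is_recommended(apache_version):
    """True when an upstream Apache Hadoop version is 2.4.0 or later."""
    return parse_version(apache_version) >= MINIMUM_SAFE

for version in ["2.2.0", "2.3.0", "2.4.1"]:
    print(version, is_recommended(version))
```

Comparing tuples of integers rather than raw strings keeps a version like 2.10.0 ordered after 2.4.0.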

Page 27: Treasure Data on The YARN - Hadoop Conference Japan 2014

Distribution

• CDH 5.0.2
• Red Hat/CentOS/Oracle 5
• Red Hat/CentOS/Oracle 6
• Ubuntu/Debian
• HDP 2.1
• Red Hat/CentOS/SLES (64-bit)
• (Ubuntu 12 packages are already in the repository)
• Windows Server 2008 & 2012

Page 28: Treasure Data on The YARN - Hadoop Conference Japan 2014

Several configuration files have changed (from MRv1 to YARN)

reference: http://goo.gl/vBIYQP

Page 29: Treasure Data on The YARN - Hadoop Conference Japan 2014

Deprecated Properties
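As an illustration of what a deprecated property means in practice, the sketch below scans a set of settings for old names and reports the replacement. Only a few well-known renames from the Hadoop deprecated properties table are included; the `find_deprecated` helper and the sample settings are hypothetical.

```python
# A small subset of old property names and their Hadoop 2 replacements.
# The complete table is in the Hadoop documentation.
DEPRECATED = {
    "fs.default.name": "fs.defaultFS",
    "mapred.job.name": "mapreduce.job.name",
    "mapred.map.tasks": "mapreduce.job.maps",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
}

def find_deprecated(config):
    """Return {old_name: new_name} for every deprecated key in config."""
    return {key: DEPRECATED[key] for key in config if key in DEPRECATED}

# Hypothetical settings copied over from an MRv1 cluster.
site = {"mapred.reduce.tasks": "2", "mapreduce.job.maps": "4"}
for old, new in find_deprecated(site).items():
    print(f"{old} is deprecated, use {new}")
```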

Page 30: Treasure Data on The YARN - Hadoop Conference Japan 2014

Other notes for configuration files

• hadoop-conf-pseudo does not work
• Some settings contain mistakes, e.g. yarn.nodemanager.aux-services: mapreduce.shuffle -> mapreduce_shuffle
• 2.2.0 and 2.4.0
• There are some differences
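The aux-services mistake noted above can be detected and corrected with a short script. This is a sketch only: the XML snippet is a hypothetical yarn-site.xml, the helper names are made up for illustration, and it assumes the property holds a single service name rather than a comma-separated list.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml content that still uses the old value.
YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>"""

def read_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

def fix_aux_services(props):
    """Return a copy with the old shuffle service name replaced."""
    fixed = dict(props)
    key = "yarn.nodemanager.aux-services"
    if fixed.get(key) == "mapreduce.shuffle":
        fixed[key] = "mapreduce_shuffle"
    return fixed

print(fix_aux_services(read_properties(YARN_SITE)))
```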

Page 31: Treasure Data on The YARN - Hadoop Conference Japan 2014

What should we do?

• Copy the configuration files from the CDH VM or the HDP VM
• Use Ambari or Cloudera Manager
• Work hard and figure it out on your own!

Page 32: Treasure Data on The YARN - Hadoop Conference Japan 2014

Slots have changed (from MRv1 to YARN)

• MRv1
• map slot, reduce slot
• YARN (MRv2)
• resource (container)

Page 33: Treasure Data on The YARN - Hadoop Conference Japan 2014

MRv1

mapred-site.xml
• mapred.tasktracker.map.tasks.maximum
• mapred.tasktracker.reduce.tasks.maximum

scheduler.xml
• maxMaps, minMaps
• maxReduces, minReduces

Page 34: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN (MRv2)

yarn-site.xml
• yarn.nodemanager.resource.memory-mb
• (yarn.nodemanager.vmem-pmem-ratio)
• (yarn.scheduler.minimum-allocation-mb)

mapred-site.xml
• yarn.app.mapreduce.am.resource.mb
• mapreduce.map.memory.mb
• mapreduce.reduce.memory.mb

fair-scheduler.xml
• maxResources, minResources

Page 35: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN Resource Management

yarn.nodemanager.resource.memory-mb => Memory that the NodeManager can allocate to containers
yarn.app.mapreduce.am.resource.mb => Memory that the ApplicationMaster uses
mapreduce.map.memory.mb => Memory that each Map task uses
mapreduce.reduce.memory.mb => Memory that each Reduce task uses

Page 36: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN Resource Example

yarn.nodemanager.resource.memory-mb = 4096
yarn.app.mapreduce.am.resource.mb = 1024
mapreduce.map.memory.mb = 1024
mapreduce.reduce.memory.mb = 2048

MRv2 Application
ApplicationMaster => 1
Mapper => 3
Reducer => 1
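One way to read the counts in this example is as simple integer division on a single node. The sketch below assumes one NodeManager, one running ApplicationMaster, and mappers and reducers counted separately against the memory left over; it ignores vcores and any rounding the scheduler applies (for example yarn.scheduler.minimum-allocation-mb). The helper name is hypothetical.

```python
NODE_MEMORY_MB = 4096    # yarn.nodemanager.resource.memory-mb
AM_MEMORY_MB = 1024      # yarn.app.mapreduce.am.resource.mb
MAP_MEMORY_MB = 1024     # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 2048  # mapreduce.reduce.memory.mb

def containers_that_fit(free_mb, container_mb):
    """How many containers of a given size fit into the free memory."""
    return free_mb // container_mb

# The ApplicationMaster takes its container first.
free_after_am = NODE_MEMORY_MB - AM_MEMORY_MB

mappers = containers_that_fit(free_after_am, MAP_MEMORY_MB)
reducers = containers_that_fit(free_after_am, REDUCE_MEMORY_MB)
print("Mapper =>", mappers)    # Mapper => 3
print("Reducer =>", reducers)  # Reducer => 1
```

With 3072 MB left after the ApplicationMaster, three 1024 MB mappers fit, but only one 2048 MB reducer does.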

Page 37: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN Resource Example

In addition to this (e.g. Fair Scheduler):
minResources
maxResources
maxRunningApps
schedulingPolicy
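To show where these four settings live, the sketch below builds a small Fair Scheduler allocation file. The queue name and every value are hypothetical; only the element names come from the slide, and the resource strings follow the "X mb, Y vcores" form used by the Fair Scheduler.

```python
import xml.etree.ElementTree as ET

def build_allocations(queue_name, settings):
    """Build an <allocations> tree with one queue and its settings."""
    allocations = ET.Element("allocations")
    queue = ET.SubElement(allocations, "queue", name=queue_name)
    for tag, value in settings.items():
        ET.SubElement(queue, tag).text = str(value)
    return allocations

# Hypothetical values for illustration only.
settings = {
    "minResources": "1024 mb,1vcores",
    "maxResources": "4096 mb,4vcores",
    "maxRunningApps": 10,
    "schedulingPolicy": "fair",
}
root = build_allocations("example_queue", settings)
print(ET.tostring(root, encoding="unicode"))
```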

Page 38: Treasure Data on The YARN - Hadoop Conference Japan 2014

Fair Scheduler changes

In addition to this (e.g. Fair Scheduler):
pool -> queue
user.maxRunningJobs -> user.maxRunningApps
userMaxJobsDefault -> userMaxAppsDefault
etc…
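The three renames listed here can be applied to an existing allocation file with a short script. This is a sketch covering only those three renames, not the other differences the slide leaves as "etc…"; the sample file, pool name, and user name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical MRv1 Fair Scheduler allocation file.
MRV1_ALLOCATIONS = """<allocations>
  <pool name="example_pool"/>
  <user name="example_user">
    <maxRunningJobs>5</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>"""

def convert_allocations(xml_text):
    """Rename the MRv1 elements listed on the slide to their YARN names."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter()):
        if element.tag == "pool":
            element.tag = "queue"
        elif element.tag == "userMaxJobsDefault":
            element.tag = "userMaxAppsDefault"
    # maxRunningJobs is renamed only inside <user>, as the slide states.
    for user in list(root.iter("user")):
        for limit in user.findall("maxRunningJobs"):
            limit.tag = "maxRunningApps"
    return root

converted = convert_allocations(MRV1_ALLOCATIONS)
print(ET.tostring(converted, encoding="unicode"))
```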

Page 39: Treasure Data on The YARN - Hadoop Conference Japan 2014

yarn.nodemanager.resource.memory-mb

Page 40: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN Scheduler Management

Page 41: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN Resource Management

e.g.
• Use the hdp-configuration-utils.py script: http://goo.gl/L2hxyq
• Use Ambari: http://ambari.apache.org/ (Ubuntu 12 is not supported yet; support is coming soon)

Page 42: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN Container Executor

DefaultContainerExecutor
• Launches containers as ordinary processes
• Same as the conventional approach (MRv1)

LinuxContainerExecutor
• Linux only
• Some restrictions
• cgroups, etc…
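The executor is selected through a single NodeManager property in yarn-site.xml. The sketch below only builds that name/value pair; the helper function is hypothetical, and the LinuxContainerExecutor needs additional setup that is not shown here.

```python
# Property and package names as used by the YARN NodeManager.
EXECUTOR_PROPERTY = "yarn.nodemanager.container-executor.class"
PACKAGE = "org.apache.hadoop.yarn.server.nodemanager"

def executor_setting(use_linux_executor):
    """Return the yarn-site.xml name/value pair for the chosen executor."""
    if use_linux_executor:
        class_name = "LinuxContainerExecutor"
    else:
        class_name = "DefaultContainerExecutor"
    return {EXECUTOR_PROPERTY: f"{PACKAGE}.{class_name}"}

print(executor_setting(use_linux_executor=False))
```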

Page 43: Treasure Data on The YARN - Hadoop Conference Japan 2014

YARN Directory Structure

MRv1
• Initial directory setup is required

YARN
• Initial directory setup is required
• There are changes from MRv1 (e.g. /tmp/hadoop-yarn/)

Page 44: Treasure Data on The YARN - Hadoop Conference Japan 2014

What should we do?

• Refer to the HDFS directory layout in the CDH VM or the HDP VM
• Use Ambari or Cloudera Manager
• Work hard and figure it out on your own!

Page 45: Treasure Data on The YARN - Hadoop Conference Japan 2014

Enjoy the YARN!!!

Page 46: Treasure Data on The YARN - Hadoop Conference Japan 2014

We are hiring!!!

Page 47: Treasure Data on The YARN - Hadoop Conference Japan 2014

Thanks!!!