How We Lost the Etu Hadoop Competition

Posted on 27-Aug-2014


Our experience taking part in a Taiwanese Hadoop deployment competition.


Evans Ye

2014.6.16

04/07/2023 Confidential | Copyright 2013 TrendMicro Inc. 1


This April, Etu announced a Hadoop competition


It’s about Hadoop deployment


I have a dream… to win that 150 grand


Our Team

• Fann Wu, Mammi Chang
  – Solid hardware-related knowledge
  – Know well how to tune performance on Hadoop clusters
• Evans Ye
  – Some experience developing an automatic Hadoop deployment tool


Agenda

• The preliminary
  – Winning criteria
  – What we prepared
• The final
  – Winning criteria
  – What we prepared
• Why we lost the competition
• Lessons learned


The preliminary

• Deploy an all-in-one Hadoop EC2 instance


Criteria to win the preliminary

• NameNode daemon exists
• Put a 100 MB file into HDFS
• YARN daemons exist
• Run a pi job
• ZooKeeper daemon exists
• HBase daemon exists
• Run HBase put and scan
• Run a Pig script
• Run a Hive query


And the most important one: finish time


Prepare for the fight


What we prepared

• To achieve the fastest finish time, we needed to practice over and over:
  – Vagrant-based scripts to simulate the AWS environment
  – A shell script that automatically provisions an all-in-one Hadoop
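Practicing over and over mostly means tearing down and re-provisioning the same VM many times while watching the clock. Below is a minimal POSIX shell sketch of such a rehearsal loop. The `rehearse` helper is hypothetical and is not the script we actually used. It only assumes a `date` that supports `+%s`, as GNU and BSD `date` do.

```shell
#!/bin/sh
# Hypothetical rehearsal helper: run one full provisioning cycle N times
# and print the wall-clock seconds each run took. Stops at the first failure.
rehearse() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)
    if ! "$@"; then
      echo "run $i failed" >&2
      return 1
    fi
    end=$(date +%s)
    echo "run $i: $((end - start))s"
    i=$((i + 1))
  done
}
```

For example, `rehearse 5 sh -c 'vagrant destroy -f && vagrant up --provider=aws'` would time five full cycles against the vagrant-aws provider.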


Vagrant

• An open source command-line VM provisioning tool
  – http://www.vagrantup.com/
• Supports VirtualBox, VMware, AWS and more as VM providers
• Supports shell, Puppet, and Chef for provisioning
• Previous sharing


Vagrant-aws plugin

• https://github.com/mitchellh/vagrant-aws
• Vagrantfile


Provision script

• Jazz Wang already leaked the script that provisions an all-in-one Hadoop on Ubuntu at OSDC.TW
  – Package-based deployment (you can also start from tarballs)


Our hack #1

• Use a self-cloned S3 repo instead of the worldwide public repos
  – Avoids a SPOF
  – Co-located in the Singapore region to speed up network transfer
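With a package-based deployment, switching apt to the mirror comes down to rewriting the repository base URL in the source list. The sketch below assumes an apt `.list` file and an S3 bucket that serves the same directory layout as the public repo. `use_mirror` and the example URLs are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: point an apt .list file at a self-hosted mirror
# (e.g. an S3 bucket in the same region) instead of the public repo.
# A .bak copy of the original file is kept next to it.
use_mirror() {
  list_file=$1
  public_base=$2
  mirror_base=$3
  # '|' is the sed delimiter because URLs are full of '/'.
  # Note: '.' in public_base is still a regex wildcard; good enough here.
  sed -i.bak "s|$public_base|$mirror_base|g" "$list_file"
}
```

For example, after `use_mirror /etc/apt/sources.list.d/hadoop.list http://repo.example.com http://my-mirror.example.org`, an `apt-get update` picks up the mirror.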


Our hack #2
• The evil /usr/lib/hadoop/libexec/init-hdfs.sh


Our hack #2

• /usr/lib/hadoop/libexec/init-hdfs.sh
  – An HDFS directory bootstrap script
• Creates /user/hbase, /tmp, /var/log/hadoop-yarn/apps…
  – Executes lots of hadoop shell commands
• HELL SLOW!
  – BIGTOP-952 attempts to solve it by calling the HDFS API directly using Groovy
  – Our hack is to concatenate similar commands into one command
    • hadoop fs -mkdir -p /tmp /var/log /tmp/hadoop-yarn
    • 50 → 15 calls
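Each `hadoop fs` invocation starts a fresh JVM. For small operations like `mkdir`, that means the per-command cost is largely JVM start-up rather than HDFS work. The POSIX shell sketch below contrasts the two styles. `HADOOP_CMD` is a hypothetical override that lets the functions be dry-run with `echo`.

```shell
#!/bin/sh
# The hadoop client binary; override (e.g. HADOOP_CMD=echo) for a dry run.
HADOOP_CMD=${HADOOP_CMD:-hadoop}

# Slow: one `hadoop fs` process (and one JVM start-up) per directory.
mkdirs_one_by_one() {
  for dir in "$@"; do
    $HADOOP_CMD fs -mkdir -p "$dir" || return 1
  done
}

# Fast: all directories in a single `hadoop fs -mkdir -p` call.
mkdirs_batched() {
  $HADOOP_CMD fs -mkdir -p "$@"
}
```

The same batching works for any `hadoop fs` subcommand that accepts several paths.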


Our hack #3

• Run the HDFS, HBase, Pig, and Hive test cases in parallel
  – (hdfs test case here) &
  – (hbase test case here) &
  – (…) &
  – wait
  – send my score
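A bare `wait` returns 0 once all background jobs finish, no matter how they exited. In the pattern above, a failed test case would therefore still be followed by "send my score". The sketch below waits on each PID separately to collect its exit status. The test script names in the usage line are hypothetical.

```shell
#!/bin/sh
# Run each argument as a shell command in the background, wait for all of
# them, and succeed only if every command succeeded.
run_parallel() {
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    # `wait PID` returns that job's own exit status.
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed failed"
  [ "$failed" -eq 0 ]
}
```

For example: `run_parallel ./test_hdfs.sh ./test_hbase.sh ./test_pig.sh ./test_hive.sh && ./send_score.sh`. Total time is then roughly that of the slowest suite rather than the sum of all four.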


Pretty good result in the preliminary


The Final


Evans: GJ, let’s get some rest

• 2 weeks gone


The Final


Criteria to win the final

• Held on 5/31 at Etu’s building


Criteria to win the final

• Deployment completeness (20%)
  – ZooKeeper, HDFS, YARN deployed
• High-availability verification (20%)
  – NameNode HA using JournalNodes
• System security verification (10%)
  – Kerberos enabled
• Runtime performance (30%)
  – DFSIO (write throughput)
  – TeraSort (sort speed)
  – HBaseEvaluation (HBase write throughput)
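You can self-check the HA criterion before the judges do: `hdfs haadmin -getServiceState <serviceId>` prints `active` or `standby` for a NameNode. The sketch below assumes the NameNode IDs `nn1` and `nn2`. The real IDs come from `dfs.ha.namenodes.<nameservice>` in hdfs-site.xml. With Kerberos enabled, the check needs a valid ticket from `kinit` first. `check_namenode_ha` and `HDFS_CMD` are hypothetical names.

```shell
#!/bin/sh
# The hdfs client binary; override for a dry run or a test stub.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Print each NameNode's HA state and succeed only if exactly one is active.
check_namenode_ha() {
  active=0
  for id in "$@"; do
    state=$($HDFS_CMD haadmin -getServiceState "$id" 2>/dev/null)
    echo "$id: ${state:-unreachable}"
    if [ "$state" = "active" ]; then
      active=$((active + 1))
    fi
  done
  [ "$active" -eq 1 ]
}
```

A typical run is `check_namenode_ha nn1 nn2`. Repeating it after stopping the active NameNode shows whether failover actually happened.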


Environment

• Hardware

• Software


Summary of what we need to do

• This time, finish time doesn’t matter. We need to focus on correctness and performance
  – Choose a Hadoop deployment tool that supports
    • NameNode HA
    • Kerberos
    • YARN
  – Figure out how to get the best performance on YARN and VirtualBox


Choosing the deployment tool

• Cloudera Manager
  – You need to install/configure Kerberos by yourself
• Ambari
  – “Claims” to support Kerberos, but actually does not
• Bigtop
  – Does have Kerberos and NameNode HA Puppet recipes, but they are currently somewhat buggy
• Hadooppet
  – Need to implement YARN deployment


Cloudera Manager

…Kerberos installation/configuration is on your own


Ambari has great UI design, but…


Comparison

Deployment tool  | NameNode HA | Kerberos           | YARN | Hadoop distro                     | Troubleshooting
Cloudera Manager | YES         | NO                 | YES  | Hadoop 2.3.0 (CDH5)               | HARD
Ambari           | YES         | NO (enable failed) | YES  | Hadoop 2.4.0 (HDP2.1)             | HARD
Bigtop           | NO (NFS)    | NO (buggy)         | YES  | Hadoop 2.0.6-alpha (bigtop-0.7.0) | MIDDLE
Hadooppet        | YES         | YES                | NO   | Hadoop 2.3.0 (CDH5)               | EASY

Winner: Hadooppet


Getting our deployment tool ready


Trap #1

• Got “connection refused” from the JournalNodes while formatting the NameNodes
• The root cause
  – When a hostname is defined in the Vagrantfile, Vagrant sets up the VM’s hostname AND /etc/hosts
  – This led the JournalNodes to listen on 127.0.0.1, which resulted in the connection refused error while formatting the NameNodes
• The fix
  – cat /dev/null > /etc/hosts
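Before formatting, it is worth checking what each VM’s own hostname resolves to. The sketch below assumes a glibc-based guest where `getent hosts` is available. The helper names are hypothetical.

```shell
#!/bin/sh
# Succeed if the argument is a loopback address (127.0.0.0/8 or ::1).
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn if this host's own name resolves to loopback, the condition that
# left the JournalNodes unreachable from the other VMs.
check_hostname_resolution() {
  name=$(hostname)
  addr=$(getent hosts "$name" | awk '{ print $1; exit }')
  if is_loopback "$addr"; then
    echo "WARN: $name resolves to $addr" >&2
    return 1
  fi
  echo "$name resolves to ${addr:-nothing}"
}
```

Run `check_hostname_resolution` on every VM. Each should report a non-loopback address before `hdfs namenode -format` is attempted.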


Trap #2

• Kerberos database initialization failed because it exceeded the timeout
• The root cause
  – VirtualBox has poor entropy performance (Ticket #11297)
  – Kerberos DB init cannot get enough random data
  – Entropy is often collected from hardware sources for use in cryptography


Trap #2

• A quick test to get entropy
  – A Xen VM
  – A VirtualBox VM
• The fix
  – Set up the havege package, which improves entropy performance
    • havege official site, Installation
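A common quick test is to read `/proc/sys/kernel/random/entropy_avail`, the kernel’s estimate of available entropy in bits. On Ubuntu the HAVEGE daemon is packaged as `haveged` (`apt-get install haveged`). The sketch below is a hypothetical helper, and its 1000-bit threshold is an arbitrary assumption. Recent Linux kernels reworked the random subsystem, so this counter is far less informative there than on the 2014-era kernels discussed here.

```shell
#!/bin/sh
# Kernel's estimate of available entropy, in bits. Overridable for testing.
ENTROPY_FILE=${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}

# Print the available entropy and fail if it is below a threshold.
# The default threshold of 1000 bits is an arbitrary assumption.
entropy_report() {
  min=${1:-1000}
  avail=$(cat "$ENTROPY_FILE") || return 1
  if [ "$avail" -lt "$min" ]; then
    echo "low entropy: $avail < $min, consider installing haveged"
    return 1
  fi
  echo "entropy ok: $avail"
}
```

Running `entropy_report` before and after installing haveged on the same VirtualBox guest shows whether the fix took effect.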


Performance Tuning


VirtualBox tuning

• Raw hard disk access
  – Directly access host disks from the guest VM
  – Create a VMDK file to represent the disk/partition
  – Mount it on the guest through the VirtualBox GUI
  – fdisk the newly added disk in the guest VM
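The GUI steps can also be scripted with `VBoxManage`, which helps when several VMs need raw disks. The sketch below uses `internalcommands createrawvmdk` as it existed in VirtualBox releases of that era; newer releases may have replaced that subcommand. The VM name, VMDK path, port, and the `SATA` controller name are all assumptions. The user running the script also needs read/write access to the host device.

```shell
#!/bin/sh
# VBoxManage binary; override (e.g. VBOXMANAGE=echo) for a dry run.
VBOXMANAGE=${VBOXMANAGE:-VBoxManage}

# Map a whole host disk into a VMDK descriptor and attach it to a VM.
# Arguments: VM name, host device, VMDK path, controller port.
attach_raw_disk() {
  vm=$1
  device=$2
  vmdk=$3
  port=$4
  $VBOXMANAGE internalcommands createrawvmdk -filename "$vmdk" -rawdisk "$device" || return 1
  # "SATA" is an assumed storage controller name; check `VBoxManage showvminfo`.
  $VBOXMANAGE storageattach "$vm" --storagectl "SATA" --port "$port" \
    --device 0 --type hdd --medium "$vmdk"
}
```

For example, run `attach_raw_disk vm3 /dev/sdb /vms/vm3-sdb.vmdk 1` with the VM powered off, then fdisk the new disk inside the guest.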


YARN tuning

• HDFS cache for reads (available since 2.3.0)
• YARN:
  – yarn.nodemanager.resource.memory-mb
• MapReduce:
  – io.sort.mb
  – mapreduce.map.memory.mb
  – mapreduce.map.java.opts
  – mapreduce.map.speculative
  – …
  – Most properties are job-specific
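These settings interact:

- `yarn.nodemanager.resource.memory-mb` caps the memory that all containers on a node may use.
- `mapreduce.map.memory.mb` sets each map container’s size, and therefore how many maps fit at once.
- The `-Xmx` in `mapreduce.map.java.opts` has to fit inside that container.

The sketch below shows the arithmetic. The reserved memory and the 80% heap ratio are common rules of thumb, not values from our setup. `yarn_sizing` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sizing helper. Arguments (all in MB):
#   vm_mb        memory of the worker VM
#   reserved_mb  memory kept back for the OS, DataNode and NodeManager (assumption)
#   map_mb       container size requested for each map task
yarn_sizing() {
  vm_mb=$1
  reserved_mb=$2
  map_mb=$3
  nm_mb=$((vm_mb - reserved_mb))
  # Leave ~20% of the container for non-heap JVM memory (rule of thumb).
  heap_mb=$((map_mb * 8 / 10))
  echo "yarn.nodemanager.resource.memory-mb=$nm_mb"
  echo "mapreduce.map.memory.mb=$map_mb"
  echo "mapreduce.map.java.opts=-Xmx${heap_mb}m"
  echo "concurrent map containers per node: $((nm_mb / map_mb))"
}
```

For a 15 GB worker like VM3/VM4 in our deployment, `yarn_sizing 15360 3072 1536` gives the NodeManager 12288 MB and a heap of `-Xmx1228m`. That fits 8 map containers memory-wise.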


Deployment Architecture


VMs configuration

      | RAM | CPU      | Disk           | Daemons
VM1   | 7G  | 3 vCPUs  | Local disk     | NameNode, ResourceManager
VM2   | 7G  | 3 vCPUs  | Local disk     | NameNode, ResourceManager
VM3   | 15G | 8 vCPUs  | 1T raw disk ×2 | DataNode, NodeManager
VM4   | 15G | 8 vCPUs  | 1T raw disk ×2 | DataNode, NodeManager
Total | 44G | 22 vCPUs | 4T for HDFS    | -


5/31: The Day


The check we’re so eager to win


And the result


WE LOST


The reason we lost

• VirtualBox performance is sluggish with hyper-threading
• To avoid that:
  – Disable hyper-threading
  – Set an equal number of cores for the host and the guests
• VMs != physical machines
  – We all assumed that hyper-threading helps performance a lot; at least it does on our Hadoop cluster
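You can check whether the host has hyper-threading on with `lscpu` (util-linux). A `Thread(s) per core` value above 1 means it is enabled. The sketch below uses hypothetical helper names. Actually disabling hyper-threading is usually done in the BIOS/firmware setup.

```shell
#!/bin/sh
# Read `lscpu` output on stdin and print the "Thread(s) per core" value.
# A value greater than 1 means hyper-threading (SMT) is enabled.
threads_per_core() {
  awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Succeed if the host appears to have hyper-threading turned on.
ht_enabled() {
  tpc=$(lscpu | threads_per_core)
  [ "${tpc:-1}" -gt 1 ]
}
```

Once hyper-threading is off, guest core counts can be set per VM with `VBoxManage modifyvm <vm> --cpus <n>`.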


Poor support for multi-cores

• VMs with multiple vCPUs require all allocated cores to be free before processing can begin
  – Do not configure too many vCPUs for a single VM
  – A strong VM will not perform as well as you expect


The better architecture

      | RAM | CPU                                | Disk           | Daemons
VM1   | 10G | 4 vCPUs                            | 1T raw disk ×1 | NameNode, ResourceManager, DataNode, NodeManager
VM2   | 10G | 4 vCPUs                            | 1T raw disk ×1 | NameNode, ResourceManager, DataNode, NodeManager
VM3   | 10G | 4 vCPUs                            | 1T raw disk ×1 | DataNode, NodeManager
VM4   | 10G | 4 vCPUs                            | 1T raw disk ×1 | DataNode, NodeManager
Total | 40G | 16 vCPUs (equal to physical cores) | 4T for HDFS    | -
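The rule behind this layout is simple arithmetic: the vCPUs of all VMs together should not exceed the host’s physical cores. Our first layout gave 3 + 3 + 8 + 8 = 22 vCPUs to a host that, judging from the table above, has 16 physical cores. The sketch below performs that check. The helper names are hypothetical, and `physical_cores` assumes a Linux host with util-linux `lscpu`.

```shell
#!/bin/sh
# Count physical cores (unique core/socket pairs) on a Linux host.
physical_cores() {
  lscpu --parse=CORE,SOCKET | grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Compare the vCPUs given to all VMs with the host's physical core count.
# Arguments: physical core count, then one vCPU count per VM.
check_vcpu_budget() {
  cores=$1
  shift
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  if [ "$total" -gt "$cores" ]; then
    echo "$total vCPUs on $cores physical cores: overcommitted"
    return 1
  fi
  echo "$total vCPUs on $cores physical cores: ok"
}
```

For example, `check_vcpu_budget "$(physical_cores)" 4 4 4 4` checks the better architecture against the actual host.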


How about Hadoop performance tuning?

• Pretty much everybody used the defaults, including the team that won the competition

• …


Lessons learned

• Don’t judge too soon
• Don’t stay up for a week. If you do, you can’t make decisions wisely
• We need better project management
  – We spent too much time tuning our deployment tool
  – We didn’t test many different deployment architectures


Acknowledgments

• Thanks to Fann for sorting out the trivial work
  – Packaging the box
  – Cloning repositories
  – Preparing the testing environment
• Thanks to Mammi for the great presentation on the day


Q&A
