Structor - Automated Building of Virtual Hadoop Clusters

Owen O’Malley, [email protected], @owen_omalley
July 2014, © Hortonworks Inc. 2014

Description

Discusses the Vagrant scripts used to set up and deploy a working multi-node Hadoop cluster, with or without security. All source code is available at https://github.com/hortonworks/structor .

Transcript of Structor - Automated Building of Virtual Hadoop Clusters

Page 1: Structor - Automated Building of Virtual Hadoop Clusters

© Hortonworks Inc. 2014, July 2014

Owen O’Malley, [email protected], @owen_omalley

Page 2: What’s the Problem?

• Creating a virtual Hadoop cluster is hard
  – It takes time to set up and configure a VM
  – A “learning experience” for new engineers
  – Each engineer has a different setup
  – Experimenting is hazardous!
• Setting up security is even harder
  – Most developers don’t test with security
• Need to test both Ambari and manual installs
• Need to test various operating systems

Page 3: Solution

• Scripts that create a working Hadoop cluster
  – Secure or non-secure
  – Multiple nodes
• Vagrant
  – Used for creating and managing the VMs
  – Each VM starts as a base box with no Hadoop installed
• Puppet
  – Used for provisioning the Hadoop packages
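Vagrant and Puppet meet in the Vagrantfile: Vagrant boots each VM from the base box, then hands provisioning off to Puppet. A minimal sketch of that wiring with a single node and illustrative paths (structor’s real Vagrantfile builds the node list from a profile, so treat the details here as hypothetical):

```ruby
# Sketch of a Vagrantfile pairing Vagrant (VM lifecycle) with Puppet
# (package provisioning). Names and paths are illustrative.
Vagrant.configure("2") do |config|
  config.vm.box = "omalley/centos6_x64"          # base box, no Hadoop yet

  config.vm.define "nn" do |node|
    node.vm.hostname = "nn.example.com"
    node.vm.network "private_network", ip: "240.0.0.11"
  end

  # Puppet does the actual Hadoop install once the VM is up.
  config.vm.provision "puppet" do |puppet|
    puppet.manifests_path = "manifests"
    puppet.manifest_file  = "default.pp"
    puppet.module_path    = "modules"
  end
end
```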

Page 4: Simplest Case – Development Box

• Everything for a development box is in the Vagrant base box (CentOS 6)
  – Build tools: ant, git, java, maven, protobuf, thrift
• Downloaded once and cached
• Setup:

% vagrant init omalley/centos6_x64
% vagrant up
% vagrant ssh

• Takes less than a minute

Page 5: Using the Box

• SSH in with “vagrant ssh”
  – Account: vagrant, Password: vagrant
  – Become root with “sudo -i”
• Clone the directory to make copies
• Other useful Vagrant commands:

% vagrant status    # list virtual machines
% vagrant suspend   # suspend virtual machines
% vagrant resume    # resume virtual machines
% vagrant destroy   # destroy virtual machines

Page 6: Setting up Non-Secure Cluster

• Commands to start the cluster:

% git clone git@github.com:hortonworks/structor.git
% cd structor
% vagrant up

• The default profile has 3 machines
  – gw – client gateway machine
  – nn – master (NameNode, ResourceManager)
  – slave1 – slave (DataNode, NodeManager)
• Installs HDFS, YARN, Hive, Pig, and ZooKeeper

Page 7: Setting up your Mac

• Add hostnames to /etc/hosts:

240.0.0.10 gw.example.com
240.0.0.11 nn.example.com
240.0.0.12 slave1.example.com
240.0.0.13 slave2.example.com
240.0.0.14 slave3.example.com

• HDFS – http://nn.example.com:50070/
• YARN – http://nn.example.com:8088/
• For security:
  – Modify /etc/krb5.conf as described in README.md
  – Use Safari or Firefox (needs a config change)

Page 8: Setting up Secure Cluster

• Commands to start the cluster:

% ln -s profiles/3node-secure.profile current.profile
% mkdir generated    # bug workaround
% vagrant up

• Brings up 3 machines with security
  – Includes a KDC and the Kerberos principals
• YARN Web UI – https://nn.example.com:8090
• Run “kinit vagrant” on your Mac for the Web UI
• SSH to gw and kinit there for the CLI

Page 9: Profiles

• JSON files that control the cluster
• The 3-node secure cluster:

{ "domain": "example.com", "realm": "EXAMPLE.COM",
  "security": true,
  "vm_mem": 2048, "server_mem": 300, "client_mem": 200,
  "clients": [ "hdfs", "yarn", "pig", "hive", "zk" ],
  "nodes": [
    { "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] },
    { "hostname": "nn", "ip": "240.0.0.11",
      "roles": [ "kdc", "nn", "yarn", "hive-meta", "hive-db", "zk" ] },
    { "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ] } ] }
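Because a profile is plain JSON, tooling outside Puppet can read it too. A standalone Ruby sketch (not structor’s actual code) that loads a trimmed copy of the profile above and derives the NameNode’s fully qualified hostname:

```ruby
require 'json'

# Trimmed copy of the 3-node secure profile shown above.
profile = JSON.parse(<<~PROFILE)
  { "domain": "example.com",
    "nodes": [
      { "hostname": "gw",     "ip": "240.0.0.10", "roles": [ "client" ] },
      { "hostname": "nn",     "ip": "240.0.0.11", "roles": [ "kdc", "nn", "yarn" ] },
      { "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ] } ] }
PROFILE

# Pick the node that carries the "nn" role and append the domain.
nn = profile["nodes"].find { |node| node["roles"].include?("nn") }
namenode = "#{nn["hostname"]}.#{profile["domain"]}"
puts namenode    # prints "nn.example.com"
```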

Page 10: Additional Profiles

• Various profiles:
  – 1node-nonsecure
  – 3node-secure
  – 5node-nonsecure
  – ambari-nonsecure
  – knox-nonsecure
• A great way to set up an Ambari cluster
• Project owners should add their project
  – Help other developers use your project

Page 11: Choosing HDP versions

• The master branch is Hadoop 2.4
  – There is also a Hadoop 1.1 (hdp-1.3) branch
• All packages are installed via Puppet
  – Uses the built-in OS package tools
• The repo file is in files/repos/hdp.repo
  – Can override the source of the packages
  – Easy to change to download custom builds

Page 12: Configuration Files (e.g. core-site.xml)

• Each configuration file is templated
• The HDFS configuration is in
  – modules/hdfs_client/templates/*.erb
  – Changes will apply to all nodes
• We use Ruby to find the NameNode:

<% @namenode = eval(@nodes).select {|node|
     node[:roles].include? 'nn'}[0][:hostname] + "." + @domain; %>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://<%= @namenode %>:8020</value>
</property>
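A template like this can be exercised outside the cluster with Ruby’s bundled ERB library. A self-contained sketch with simplified stand-in inputs (structor passes @nodes in as a string that the template eval()s back into a node list):

```ruby
require 'erb'

# Stand-ins for the variables the real templates receive: @nodes is a
# string that eval() turns back into an array of role-tagged hosts.
@domain = "example.com"
@nodes  = '[{:hostname => "nn",     :roles => ["nn"]},
            {:hostname => "slave1", :roles => ["slave"]}]'

template = <<~XML
  <% @namenode = eval(@nodes).select { |node|
       node[:roles].include? 'nn' }[0][:hostname] + "." + @domain %>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<%= @namenode %>:8020</value>
  </property>
XML

# Render the template; the <value> element expands to the NameNode URI.
puts ERB.new(template).result(binding)
```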

Page 13: Puppet

• The actual work is done via Puppet
  – Hides the details of each OS
• Modularized
  – The top level is manifests/default.pp
  – Each module is in modules/*
• The top level looks like:

include selinux
include ntp
if $security == "true" and hasrole($roles, 'kdc') {
  include kerberos_kdc
}
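hasrole() in the manifest above is a custom Puppet function, and Puppet custom functions of that era were written in Ruby. A standalone sketch of the same check (a hypothetical shape, not structor’s actual function body):

```ruby
# Standalone sketch of the role test used by the manifests: true when
# the node's role list contains the given role. Hypothetical shape.
def hasrole(roles, role)
  roles.include?(role)
end

roles = ["kdc", "nn", "yarn", "hive-meta", "hive-db", "zk"]
puts hasrole(roles, "kdc")      # prints "true"
puts hasrole(roles, "slave")    # prints "false"
```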

Page 14: Future Directions

• Add other Hadoop ecosystem tools
  – Tez
  – HBase
• Add other operating systems
  – Ubuntu, SUSE, CentOS 5
• Support other Vagrant providers
  – Amazon EC2
  – Docker
• Support other backing relational databases

Page 15: Thank You! Questions & Answers