Structor - Automated Building of Virtual Hadoop Clusters
© Hortonworks Inc. 2014
Structor – Automated Building of Virtual Hadoop Clusters
July 2014
Owen O'Malley ([email protected], @owen_omalley)
What's the Problem?
• Creating a virtual Hadoop cluster is hard
  – It takes time to set up and configure the VMs
  – A "learning experience" for new engineers
  – Each engineer has a different setup
  – Experimenting is hazardous!
• Setting up security is even harder
  – Most developers don't test with security
• Need to test both Ambari and manual installs
• Need to test various operating systems
Solution
• Scripts that create a working Hadoop cluster
  – Secure or non-secure
  – Multiple nodes
• Vagrant
  – Used for creating and managing the VMs
  – Each VM starts as a base box with no Hadoop
• Puppet
  – Used for provisioning the Hadoop packages
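The division of labor above can be sketched as a minimal Vagrantfile. This is an illustrative sketch only, not Structor's actual Vagrantfile; it simply wires the base box named later in the deck to the Puppet layout (manifests/default.pp, modules/*) described on the Puppet slide:

```ruby
# Illustrative sketch -- Vagrant builds the bare VM from a base box,
# then hands provisioning over to Puppet, which installs Hadoop.
Vagrant.configure("2") do |config|
  config.vm.box = "omalley/centos6_x64"   # base box with no Hadoop
  config.vm.provision "puppet" do |puppet|
    puppet.manifests_path = "manifests"   # assumed layout, per the Puppet slide
    puppet.manifest_file  = "default.pp"
    puppet.module_path    = "modules"
  end
end
```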
Simplest Case – Development Box
• We've put everything for a development box into the Vagrant base box (CentOS 6)
  – Build tools: ant, git, java, maven, protobuf, thrift
• Downloaded once and cached
• Setup:
  % vagrant init omalley/centos6_x64
  % vagrant up
  % vagrant ssh
• Takes less than a minute
Using the Box
• SSH in with "vagrant ssh"
  – Account: vagrant, Password: vagrant
  – Become root with "sudo -i"
• Clone the directory to make copies
• Other useful vagrant commands:
  % vagrant status   – list virtual machines
  % vagrant suspend  – suspend virtual machines
  % vagrant resume   – resume virtual machines
  % vagrant destroy  – destroy virtual machines
Setting up a Non-Secure Cluster
• Commands to start the cluster:
  % git clone git@github.com:hortonworks/structor.git
  % cd structor
  % vagrant up
• The default profile has 3 machines
  – gw – client gateway machine
  – nn – master (NameNode, ResourceManager)
  – slave1 – slave (DataNode, NodeManager)
• Installs HDFS, YARN, Hive, Pig, and ZooKeeper
Setting up your Mac
• Add the hostnames to /etc/hosts:
  240.0.0.10 gw.example.com
  240.0.0.11 nn.example.com
  240.0.0.12 slave1.example.com
  240.0.0.13 slave2.example.com
  240.0.0.14 slave3.example.com
• HDFS – http://nn.example.com:50070/
• YARN – http://nn.example.com:8088/
• For security:
  – Modify /etc/krb5.conf as described in README.md
  – Use Safari or Firefox (needs a config change)
Setting up a Secure Cluster
• Commands to start the cluster:
  % ln -s profiles/3node-secure.profile current.profile
  % mkdir generated    (bug workaround)
  % vagrant up
• Brings up 3 machines with security
  – Includes a KDC and principals
• YARN Web UI – https://nn.example.com:8090
• Run "kinit vagrant" on your Mac for the Web UI
• SSH to gw and kinit for the CLI
Profiles
• JSON files that control the cluster
• The 3-node secure cluster:
  { "domain": "example.com", "realm": "EXAMPLE.COM",
    "security": true,
    "vm_mem": 2048, "server_mem": 300, "client_mem": 200,
    "clients": [ "hdfs", "yarn", "pig", "hive", "zk" ],
    "nodes": [
      { "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] },
      { "hostname": "nn", "ip": "240.0.0.11", "roles": [ "kdc", "nn", "yarn",
          "hive-meta", "hive-db", "zk" ] },
      { "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ] } ] }
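Because a profile is plain JSON, it is easy to inspect programmatically. A minimal Ruby sketch (using a trimmed-down sample profile, not a file from the repo) that finds the NameNode host by role, the way the templates do:

```ruby
require 'json'

# Illustrative only: parse a trimmed-down Structor-style profile and
# locate the node carrying the "nn" role.
profile = JSON.parse(<<~JSON)
  { "domain": "example.com", "security": true,
    "nodes": [
      { "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] },
      { "hostname": "nn", "ip": "240.0.0.11", "roles": [ "kdc", "nn", "yarn" ] },
      { "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ] } ] }
JSON

namenode = profile["nodes"].find { |n| n["roles"].include?("nn") }
fqdn = "#{namenode["hostname"]}.#{profile["domain"]}"
puts fqdn  # prints "nn.example.com"
```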
Additional Profiles
• Available profiles:
  – 1node-nonsecure
  – 3node-secure
  – 5node-nonsecure
  – ambari-nonsecure
  – knox-nonsecure
• A great way to set up an Ambari cluster
• Project owners should add their projects
  – Help other developers use your project
Choosing HDP Versions
• The master branch is Hadoop 2.4
  – There is also a Hadoop 1.1 (hdp-1.3) branch
• All packages are installed via Puppet
  – Uses the built-in OS package tools
• The repo file is in files/repos/hdp.repo
  – Can override the source of packages
  – Easy to change to download custom builds
Configuration Files (e.g. core-site.xml)
• Each configuration file is templated
• HDFS configuration is in
  – modules/hdfs_client/templates/*.erb
  – Changes apply to all nodes
• We use Ruby to find the NameNode:
  <% @namenode = eval(@nodes).select {|node|
       node[:roles].include? 'nn'}[0][:hostname] + "." + @domain; %>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<%= @namenode %>:8020</value>
  </property>
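The lookup above can be exercised outside Puppet. A minimal sketch: the TemplateScope class and the sample node list are hypothetical stand-ins for the variables Puppet binds into the template (@nodes arrives as a Ruby literal string, hence the eval in the template):

```ruby
require 'erb'

# Hypothetical stand-in for the scope Puppet gives the template:
# @nodes is the node list serialized as a Ruby literal, @domain the DNS domain.
class TemplateScope
  def initialize(nodes, domain)
    @nodes  = nodes
    @domain = domain
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

template = <<~ERB
  <% @namenode = eval(@nodes).select { |node|
       node[:roles].include? 'nn' }[0][:hostname] + "." + @domain %>
  <value>hdfs://<%= @namenode %>:8020</value>
ERB

nodes = '[{:hostname => "nn", :roles => ["nn", "yarn"]},' \
        ' {:hostname => "slave1", :roles => ["slave"]}]'
puts TemplateScope.new(nodes, "example.com").render(template)
# output contains "hdfs://nn.example.com:8020"
```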
Puppet
• The actual work is done via Puppet
  – Hides the details of each OS
• Modularized
  – The top level is manifests/default.pp
  – Each module lives in modules/*
• The top level looks like:
  include selinux
  include ntp
  if $security == "true" and hasrole($roles, 'kdc') {
    include kerberos_kdc
  }
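hasrole() is not a Puppet builtin, so it is presumably a custom function shipped with the project (Puppet custom functions are written in Ruby). A hedged sketch of the check it performs, which amounts to membership in the node's role list:

```ruby
# Hedged sketch of the role test behind hasrole() -- an assumption,
# not Structor's actual function body: membership in the role list.
def hasrole(roles, role)
  roles.include?(role)
end

puts hasrole(["kdc", "nn", "yarn"], "kdc")  # prints "true"
puts hasrole(["slave"], "kdc")              # prints "false"
```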
Future Directions
• Add other Hadoop ecosystem tools
  – Tez
  – HBase
• Add other operating systems
  – Ubuntu, SUSE, CentOS 5
• Support other Vagrant providers
  – Amazon EC2
  – Docker
• Support other backing relational databases
Thank You! Questions & Answers