Ansible- Durham Meetup: Using Ansible for Cisco ACI deployment
Ansible + Hadoop
-
Upload
michael-young -
Category
Technology
-
view
440 -
download
1
Transcript of Ansible + Hadoop
![Page 1: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/1.jpg)
Ansible + HadoopDeploying Hortonworks Data Platform with Ansible
Michael YoungSolutions EngineerFebruary 23, 2017
![Page 2: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/2.jpg)
2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
About Me
Michael Young– Solutions Engineer @ Hortonworks– 16+ years of experience (Almost all in Public Sector)– Information Retrieval (Solr, Elasticsearch)– Hadoop (HDP, MapR, Cloudera)– DevOps (Ansible, Puppet, Docker, Vagrant)– Development (Python, Perl, Node.js)
@jaraxal [email protected]
![Page 3: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/3.jpg)
3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
About Hortonworks
Only 100% Open Source Hadoop Company Over 1,000 customers Over 2,100 partners Hortonworks Data Platform (HDP) Hortonworks Data Flow (HDF) Hortonworks Community Connection (HCC)
![Page 4: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/4.jpg)
4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hortonworks Data Platform 2.5
![Page 5: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/5.jpg)
5 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Ambari: Management and Monitoring
![Page 6: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/6.jpg)
6 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
HDP Provisioning Workflow
Prepare Infrastructure– Package Repos– DNS– NTP
Prepare OS– Disable Transparent Huge Pages– Disable Swapping– Jumbo Frames– Format and mount disk drives
Bootstrap Ambari– Install Ambari Server– Install Ambari Agents
Install HDP– Interactively via Ambari’s web-based UI– Automatically via Ambari Blueprints
![Page 7: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/7.jpg)
7 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Ambari Blueprints for HDP Deployments
https://cwiki.apache.org/confluence/display/AMBARI/Blueprints
Declarative definition of a cluster written in JSON.
Preserves best practice configuration across deployments
Requires OS configuration pre-requisites already in place – Ambari will perform checks and warn you.
![Page 8: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/8.jpg)
8 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Automation! Why Ansible?
![Page 9: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/9.jpg)
9 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Ansible for HDP Deployments
Playbooks– Bootstrap baseline configuration– Install DBs– Install HDP software
Roles– Master Servers– Slave Servers– Ambari Server– Ambari Agent
Tasks– Install prerequisite packages– Install Ambari Server packages– Install Ambari Agent packages– Disable SELinux– Turn on NTP
Templates– /etc/hosts– Ambari Blueprints
Files– Disable THP– Disable Swapping
![Page 10: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/10.jpg)
10 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Create 6-node Environment
• Using Amazon AWS• 6 x c4.4xlarge instances
• Simple Ansible solution• AWS provisioning using ec2 and ec2_group
modules• Simple inventory• Simple playbook• Simple ansible.cfg
![Page 11: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/11.jpg)
11 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Simple Inventory
All Ansible commands run locally Uses AWS API Using Anaconda Python
![Page 12: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/12.jpg)
12 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Simple Playbook: hadoop-demo.yml
• 2 Tasks• Create Security Group• Create EC2 Instances
![Page 13: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/13.jpg)
13 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Task: Provision Security Group
ec2_group module– Region– VPC– Rules
![Page 14: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/14.jpg)
14 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Task: Provision Servers
ec2 module– Region– Group– Instance type– AMI– Volumes– Counts– Tags
![Page 15: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/15.jpg)
15 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Run Playbook
ansible-playbook -i inventory/hosts playbooks/hadoop-demo.yml
Takes ~35 seconds
![Page 16: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/16.jpg)
16 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
DEMO
![Page 17: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/17.jpg)
17 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Ansible AWS Ad-Hoc Examples
Dynamic Inventory– https://aws.amazon.com/blogs/apn/gettin
g-started-with-ansible-and-dynamic-amazon-ec2-inventory-management/
– https://raw.githubusercontent.com/ansible/ansible/devel/contrib/inventory/ec2.py
– https://raw.githubusercontent.com/ansible/ansible/devel/contrib/inventory/ec2.ini
– Handy Python script allows you to interact with AWS instances
![Page 18: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/18.jpg)
18 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Ready to Create?
Inventory– Dev– Test– Prod
Playbook– Roles– Tasks– Templates– Files– Handlers
Generally an iterative process Start small, move towards more
complex Entire process could take a couple of
days to a couple of weeks
![Page 19: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/19.jpg)
19 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Why re-invent the wheel?• https://github.com/objectrocket/
ansible-hadoop• ObjectRocket is a Rackspace
company.• Enables deployment of hadoop
clusters using Ansible• Supports Rackspace cloud and
existing environments• Ansible == 2.1.3.0 (2.2 is not
supported at the moment)• Expects RHEL/CentOS 6/7 or
Ubuntu 14 hosts.• Simple – Configure, then run two
scripts
![Page 20: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/20.jpg)
20 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
DEMO
![Page 21: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/21.jpg)
21 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Minimal Configuration Needed
inventory/static playbooks/group_vars/master_nodes playbooks/group_vars/slave_nodes playbooks/group_vars/hortonworks ansible.cfg Optional: custom repos and blueprints
![Page 22: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/22.jpg)
22 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Modify inventory/static
Add information for master, slave and edge nodes
Use public IP for ansible_host Default user for my AMI is centos. Set
ansible_ssh_user appropriately. Using key, so no password specified Don’t forget to comment unused node
types (edge-nodes)
![Page 23: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/23.jpg)
23 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Modify playbook/group_vars/*_nodes
Refer to template files for examples Most options are geared towards
Rackspace cloud
![Page 24: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/24.jpg)
24 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Modify playbook/group_vars/hortonworks
Specify Configuration Details– version of HDP and Ambari to install– components to install– admin and service passwords– repo URL
I left this as-is
![Page 25: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/25.jpg)
25 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Modify ansible.cfg
Change library value to playbooks/library/site_facts
Specify location of private_key_file.
![Page 26: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/26.jpg)
26 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Run bootstrap_static.sh
Performs the common bootstrap configurations
$ bash bootstrap_static.sh Takes ~8 minutes Consistent timing regardless of node
count Same tasks done on all servers in
parallel – Ansible approach.
![Page 27: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/27.jpg)
27 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
DEMO
![Page 28: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/28.jpg)
28 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Run hortonworks_static.sh
Performs the Hortonworks installation $ bash hortonworks_static.sh Takes ~19 minutes (4-node m4.xlarge
cluster) master01 had significantly more tasks to
implement
![Page 29: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/29.jpg)
29 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Retrying Tasks is Normal
The last task is waiting for the cluster to be built
Normal to see many failed checks with retry attempts.
![Page 30: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/30.jpg)
30 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Monitor Ambari During Install
Monitor Ambari during cluster installation.
![Page 31: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/31.jpg)
31 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
One Node: ~1,000 seconds
One node took ~1,000 seconds to complete install and startup
This node is the master node, has more components
Room to decrease deployment time by adding more master nodes
![Page 32: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/32.jpg)
32 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Five Node Cluster
5 x m4.xlarge 2 master and 3 slave nodes Took ~15 minutes ~3 minutes faster than 4-node cluster. More even distribution of components
on master servers
![Page 33: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/33.jpg)
33 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Six Node Cluster
6 x m4.xlarge 3 master and 3 slave nodes Took ~15 minutes No apparent improvement in
deployment times over 5-node cluster.
![Page 34: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/34.jpg)
34 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Comparing Instance Sizes - Six Node Cluster
m4.xlarge vs c4.4xlarge Same cluster configuration 3 master and 3 slave nodes Took ~12 minutes ~3 minutes faster than m4.xlarge cluster
![Page 35: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/35.jpg)
35 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Number & Size of Nodes
Factoring the number and size of nodes to decrease deployment time is interesting, but not generally important
Size your cluster on based on data size and workload– More Data: more local storage per slave node, more slave nodes– More Queries: more memory and cpu per slave node, more slave nodes– High Availability: Use at least 3 master nodes, at least 3 slave nodes
Minimum recommended cluster size for production is ~12 nodes
![Page 36: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/36.jpg)
36 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Summary
Easily created an AWS environment using a simple Ansible playbook– Takes ~1-2 minutes, includes modifying playbook
Easily deployed 6-node HDP cluster– Ran playbook from an AWS node with Ansible – Modify a couple of configuration files– Run 2 commands and have an HDP cluster in < 20 minutes
Demonstrated how cluster size and instance type affected deployment times
![Page 37: Ansible + Hadoop](https://reader036.fdocuments.us/reader036/viewer/2022062223/58cf8f2e1a28ab65538b49d1/html5/thumbnails/37.jpg)
37 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Questions?