Lajos Papp / DevOps / SequenceIQ Inc.

Page 1: Lajos Papp / DevOps / SequenceIQ Inc.2014.adattarhazforum.hu/letoltes/2014dwforum/sequenceiq_papp_lajos… · dnsmasq is running on each Docker container ! Serf member-xxx events

OVERVIEW

GOAL / MOTIVATION

TECHNOLOGY STACK

PROBLEM RESOLUTION / HOW IT WORKS

RESULTS / ACHIEVEMENTS

GOAL / MOTIVATION

!  Ease Hadoop provisioning – everywhere

!  Automate and unify the process

!  Arbitrary cluster size

!  Same process through a cluster lifecycle (Dev, QA, UAT, Prod)

!  (Auto) scaling Hadoop

!  QoS

OUR APPROACH

!  Use Docker

!  Build cloud-specific ‘Dockerized’ images

!  Provision the cluster

!  Use Ambari

DOCKER

!  Lightweight, portable

!  Build once, run anywhere

!  VM – without the overhead of a VM

!  Isolated containers

!  Automated and scripted

DOCKER – CONTAINERS vs. VMs

!  Containers are isolated, but share OS and, where appropriate, bins/libraries

APACHE AMBARI – ARCHITECTURE

!  Easy Hadoop cluster provisioning

!  Management and monitoring

!  Key features – blueprints

!  REST API

APACHE AMBARI – CREATE CLUSTER

!  Define a blueprint (POST /api/v1/blueprints)

!  Create cluster (POST /api/v1/clusters/mycluster)
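
The two calls above can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: the server address, blueprint name, host name, and component list are hypothetical, and the payload field names reflect a common reading of the Ambari blueprint format that should be checked against the Ambari version in use (some versions expect the blueprint name in the URL path). The requests are only built, never sent, and authentication is omitted.

```python
import json
import urllib.request

AMBARI_URL = "http://ambari.example.com:8080"  # hypothetical server address


def build_post(path, payload):
    """Build (but do not send) a JSON POST request against the Ambari REST API."""
    return urllib.request.Request(
        url=AMBARI_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Ambari expects this header on modifying requests.
            "X-Requested-By": "ambari",
        },
        method="POST",
    )


# Assumed blueprint layout: a single host group with two HDFS components.
# A real blueprint needs a complete set of components for the chosen stack.
blueprint = {
    "Blueprints": {
        "blueprint_name": "single-node",
        "stack_name": "HDP",
        "stack_version": "2.1",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "DATANODE"}],
        }
    ],
}

# Assumed cluster request: maps each host group to concrete hosts.
cluster = {
    "blueprint": "single-node",
    "host_groups": [{"name": "master", "hosts": [{"fqdn": "node1.example.com"}]}],
}

# Paths are taken from the slide.
requests_to_send = [
    build_post("/api/v1/blueprints", blueprint),
    build_post("/api/v1/clusters/mycluster", cluster),
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Sending each request with `urllib.request.urlopen(req)` would perform the actual call once credentials are added.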

HADOOP PROVISIONING ISSUES

!  Each cloud provider has a proprietary API

!  Create images for each provider

!  Network configuration

!  Service discovery

!  Resize, failover, member join support

OUR APPROACH – DETAILS

!  Build your Docker image

!  Install or pre-install Hadoop services with Ambari

!  Install Serf and dnsmasq

!  Build your cloud image

!  Use Ansible to create an image

!  Provision the cluster

BUILD DOCKER IMAGES

!  Create the Dockerfile

!  Have Docker.io build the image

!  Optionally pre-install services

!  Use Ambari

!  Push image to Docker.io

!  Licensing questions

BUILD CLOUD IMAGES

!  Use a Docker ready base image

!  Use Ansible to provision the image template

!  Pull the Docker images

!  Apply custom infrastructure

!  Use cloud provider specific playbooks

!  AWS EC2

!  Azure

ANSIBLE

!  Configuration as data

!  Simplest way to automate IT

!  Secure and agentless

!  Goal oriented

!  One playbook – multiple modules

!  We use it to “burn” cloud images/templates

PROVISIONING – ISSUES

!  FQDN

!  /etc/hosts is read-only in Docker

!  Everybody needs to know everybody

!  DNS

!  Single point of failure

!  Dynamic cluster – nodes joining, leaving, failing

!  Routing

!  Cloud – ability to route between containers on different hosts

!  Collision free private IP range for Docker bridge

!  We need predefined host names/IP addresses

!  /etc/hosts is read-only in Docker

!  Use Ansible to provision the image template

!  Pull the Docker images

!  Start a DNS server

!  Reference it from each container: docker run --dns <IP_OF_DNS>

!  Nodes need to know each other
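
As a rough sketch of the docker run invocation mentioned above, the following Python helper assembles the command line for a container with a fixed hostname and an explicit DNS server. The image name, hostname, and IP address are hypothetical placeholders; only the `-d`, `-h`, and `--dns` options of `docker run` are used.

```python
import shlex


def docker_run_command(image, hostname, dns_ip):
    """Return the argv list for starting a detached container with a
    fixed hostname and an explicit DNS server."""
    return [
        "docker", "run",
        "-d",              # run detached
        "-h", hostname,    # set the container's hostname
        "--dns", dns_ip,   # resolver the container should use
        image,
    ]


# Hypothetical values, for illustration only.
cmd = docker_run_command("example/ambari", "node1.example.com", "172.17.0.2")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the container on a host where Docker is installed.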

PROVISIONING – SOLUTION

!  FQDN

!  Use the -h and --dns Docker params

!  DNS

!  dnsmasq runs in each Docker container

!  Serf member-xxx events trigger dnsmasq reconfiguration

!  Routing

!  Docker bridge configuration – follows a convention
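
The slides do not spell out the bridge convention, so the sketch below shows one possible scheme purely as an assumption: each host gets its own /24 carved out of a shared /16, indexed by the host's position in the cluster. The range `172.18.0.0/16` and the function names are hypothetical.

```python
import ipaddress

# Hypothetical convention: carve one /24 per host out of a shared /16.
CLUSTER_RANGE = ipaddress.ip_network("172.18.0.0/16")


def bridge_subnet(host_index):
    """Return the /24 reserved for the Docker bridge on the given host."""
    subnets = list(CLUSTER_RANGE.subnets(new_prefix=24))
    return subnets[host_index]


def bridge_address(host_index):
    """Return the first usable address of the host's subnet in CIDR form."""
    subnet = bridge_subnet(host_index)
    first_host = next(subnet.hosts())
    return f"{first_host}/{subnet.prefixlen}"


for i in range(3):
    print(i, bridge_subnet(i), bridge_address(i))
```

Because every host derives its range from its index alone, no two bridges overlap and no coordination service is needed; the resulting address is the kind of value that could be handed to the Docker daemon's bridge settings (for example its `--bip` option).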

SERF

!  Gossip based membership

!  Service discovery

!  Decentralized

!  Lightweight, fault tolerant

!  Highly available

!  DevOps friendly

!  Keep an eye on Consul, Open vSwitch, pipework

SERF - DECENTRALIZED SERVICE DISCOVERY

!  Gossip instead of heartbeat

!  LAN, WAN profiles

!  Provides membership information

!  Event handlers: member-join, member-leave, member-failed, member-update, member-reap, user

!  Query
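
To illustrate how membership events can drive DNS updates, here is a minimal Python sketch of the logic a Serf event handler might contain. It assumes Serf's documented handler interface (event type in the `SERF_EVENT` environment variable, one tab-separated member per line on stdin); the function names, host names, and addresses are hypothetical, and the choice to drop failed members from DNS is a design assumption rather than something stated in the slides.

```python
def parse_members(payload):
    """Parse a membership payload: one member per line,
    tab-separated as name, address, role, tags."""
    members = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        members.append((fields[0], fields[1]))
    return members


def apply_event(hosts, event, members):
    """Return a new name -> address map updated for the given event type."""
    updated = dict(hosts)
    for name, address in members:
        if event in ("member-join", "member-update"):
            updated[name] = address
        elif event in ("member-leave", "member-failed", "member-reap"):
            updated.pop(name, None)
    return updated


def render_hosts(hosts):
    """Render the map in /etc/hosts format, which dnsmasq can load."""
    return "".join(f"{addr} {name}\n" for name, addr in sorted(hosts.items()))


# In a real handler the event would come from os.environ["SERF_EVENT"]
# and the payload from sys.stdin; literals are used here for illustration.
payload = "node1\t172.18.0.2\tambari-agent\t\nnode2\t172.18.1.2\tambari-agent\t\n"
hosts = apply_event({}, "member-join", parse_members(payload))
hosts = apply_event(hosts, "member-failed", [("node2", "172.18.1.2")])
print(render_hosts(hosts), end="")
```

After the rendered file is written to a location dnsmasq reads as an additional hosts file, dnsmasq has to be told to reload it, for example by sending it SIGHUP.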

SERF – GOSSIPING

SERF – MEMBERSHIP, EVENT HANDLERS

DNSMASQ

!  Network infrastructure for small networks

!  Lightweight DNS, DHCP server

!  Comes with most Linux distributions

AWS EC2 – HADOOP CLUSTER

!  Use EC2 REST API to provision instances (from Dockerized image)

!  Start Docker containers

!  One Ambari server

!  N-1 Ambari agents connecting to server

!  Connect ambari-shell to

!  Define blueprint

!  Provision the cluster
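
The container layout described above (one Ambari server, N-1 agents) can be expressed as a small planning step. This is only a sketch: the host names are hypothetical, and picking the first host as the server is an assumption made for the example.

```python
def plan_roles(hosts):
    """Assign the first host the Ambari server role and the rest agent roles."""
    if not hosts:
        raise ValueError("at least one host is required")
    server, *agents = hosts
    plan = {server: "ambari-server"}
    plan.update({host: "ambari-agent" for host in agents})
    return plan


# Hypothetical host names, for illustration only.
hosts = ["node1.example.com", "node2.example.com", "node3.example.com"]
for host, role in plan_roles(hosts).items():
    print(host, role)
```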

AWS EC2 – NETWORK SECURITY

!  Create a VPC

!  Configure subnets

!  Routing tables

!  Security gateway

!  Set ACL

!  Configure VPN

AWS EC2 - CLOUDFORMATION

!  Setting up a VPC manually is too complicated

!  Use CloudFormation

!  Manage the stack together

!  Template-based

!  Environments under version control

!  Customizable at runtime

!  No extra charge

    "VpcId" : {
      "Type" : "String",
      "Description" : "VpcId of your existing Virtual Private Cloud (VPC)"
    },
    "SubnetId" : {
      "Type" : "String",
      "Description" : "SubnetId of an existing subnet (for the primary network) in your Virtual Private Cloud (VPC)"
    },
    "SecondaryIPAddressCount" : {
      "Type" : "Number",
      "Default" : "1",
      "MinValue" : "1",
      "MaxValue" : "5",
      "Description" : "Number of secondary IP addresses to assign to the network interface (1-5)",
      "ConstraintDescription": "must be a number from 1 to 5."
    },
    "SSHLocation" : {
      "Description" : "The IP address range that can be used to SSH to the EC2 instances",
      "Type": "String",
      "MinLength": "9",
      "MaxLength": "18",
      "Default": "0.0.0.0/0",
      "AllowedPattern": "(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})/(\\d{1,2})",
      "ConstraintDescription": "must be a valid IP CIDR range of the form x.x.x.x/x."
    }
  },
  "Mappings" : {
    "RegionMap" : {
      "us-east-1" : { "AMI" : "ami-7f418316" },
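
To show what the SSHLocation constraints in the template fragment above accept, the following Python sketch applies the same length limits and pattern to a few candidate values. The helper name and the sample values are made up for the example, and whole-string matching is assumed for the pattern.

```python
import re

# Constraints copied from the SSHLocation parameter in the template fragment.
SSH_LOCATION = {
    "MinLength": 9,
    "MaxLength": 18,
    "AllowedPattern": r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})",
}


def is_valid_ssh_location(value):
    """Check a candidate value against the length and pattern constraints."""
    if not SSH_LOCATION["MinLength"] <= len(value) <= SSH_LOCATION["MaxLength"]:
        return False
    return re.fullmatch(SSH_LOCATION["AllowedPattern"], value) is not None


for candidate in ["0.0.0.0/0", "203.0.113.0/24", "anywhere"]:
    print(candidate, is_valid_ssh_location(candidate))
```

Note that the pattern only checks the shape of the value: a string such as `999.999.999.999/99` also passes, so the constraint guards against typos rather than guaranteeing a routable CIDR range.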

CLOUDBREAK

Cloudbreak is a powerful left surf break over a coral reef, a mile off the southwest of the island of Tavarua, Fiji.

Cloudbreak is a cloud-agnostic Hadoop as a Service API. It abstracts the provisioning and eases the management and monitoring of on-demand clusters.

Provisioning Hadoop has never been easier

CLOUDBREAK

!  Benefits

!  Elastic

!  Scalable

!  Blueprints

!  Flexible

!  Main REST resources

!  /template – specifies a cluster infrastructure

!  /stack – creates a cloud infrastructure built from a template

!  /blueprint – describes a Hadoop cluster

!  /cluster – creates a Hadoop cluster

RESULTS AND ACHIEVEMENTS

!  Hadoop as a Service API

!  Available for EC2 and Azure cloud

!  OpenStack and bare metal are coming soon

!  Open source under Apache 2 licence

!  Same goals as the Apache Ambari Launchpad project

!  What's next?

HADOOP SERVICES - AS A SERVICE

!  Leverage YARN

!  Slider (Hoya) providers

!  HBase, Accumulo

!  SequenceIQ providers - Flume, Tomcat

!  YARN-1964

!  QoS for YARN – heuristic scheduler

!  Platform as a Service API

BANZAI PIPELINE

Banzai Pipeline is a surf reef break located in Hawaii, off Ehukai Beach Park in Pupukea on O'ahu's North Shore.

Banzai Pipeline is a RESTful application development platform for building on-demand data and job pipelines running on Hadoop YARN.

Banzai Pipeline is a big data API for the REST

THANK YOU

!  Get the code: https://github.com/sequenceiq

!  Read about: http://blog.sequenceiq.com

!  Facebook: http://facebook.com/sequenceiq

!  Twitter: http://twitter.com/sequenceiq

!  LinkedIn: http://linkedin.com/sequenceiq

!  Contact: [email protected]

FEEL FREE TO CONTRIBUTE