Hadoop on Dockers

Post on 22-Jan-2018

What is Docker?

• Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.

• At a high level, Docker is a Linux utility that can efficiently create, ship, and run containers.

• Docker containers wrap a piece of software in a complete file system that contains everything needed to run: code, runtime, system tools, and system libraries.

• Docker enables you to quickly, reliably, and consistently deploy applications regardless of environment.

What are containers?

• Linux containers are self-contained execution environments -- with their own isolated CPU, memory, block I/O, and network resources.

• A container feels like a virtual machine, but sheds all the weight and startup overhead of a guest operating system.

• Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.
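
On Linux, the resource isolation described above is enforced by kernel features such as control groups (cgroups), and a process can see its own cgroup membership in `/proc/self/cgroup`. The following is a minimal sketch of reading that file. The function names and the `SAMPLE` text (including the container ID `abc123`) are made up for illustration, and the real contents vary by host and cgroup version.

```python
from pathlib import Path


def parse_cgroup_text(text):
    """Parse lines of the form 'hierarchy-ID:controller-list:cgroup-path'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path may itself contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy_id": int(hierarchy_id),
            # An empty controller list (cgroup v2) becomes an empty Python list.
            "controllers": [c for c in controllers.split(",") if c],
            "path": path,
        })
    return entries


def current_cgroups(proc_file="/proc/self/cgroup"):
    """Return the cgroup entries of this process, or [] if the file is absent."""
    path = Path(proc_file)
    if not path.exists():  # e.g. when not running on Linux
        return []
    return parse_cgroup_text(path.read_text())


# Hypothetical sample resembling what a containerized process might see.
SAMPLE = "4:memory:/docker/abc123\n3:cpu,cpuacct:/docker/abc123\n0::/\n"

if __name__ == "__main__":
    for entry in current_cgroups():
        print(entry)
```

Running the same script on the host and inside a container is one way to see that the two processes are placed in different resource groups.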

Containers vs Virtual Machines

• Containers and virtual machines have similar resource isolation and allocation benefits -- but a different architectural approach allows containers to be more portable and efficient.


Virtual machines include the application, the necessary binaries and libraries, and an entire guest operating system -- all of which can amount to tens of GBs.


Containers include the application and all of its dependencies -- but share the kernel with other containers, running as isolated processes in user space on the host operating system. Docker containers are not tied to any specific infrastructure: they run on any computer, on any infrastructure, and in any cloud.



Advantages

• Containers running on a single machine share the same operating system kernel; they start instantly and use less RAM. Images are constructed from layered file systems and share common files, making disk usage and image downloads much more efficient.

• Docker containers are based on open standards, enabling containers to run on all major Linux distributions and on Microsoft Windows -- and on top of any infrastructure.

• Containers isolate applications from one another and the underlying infrastructure, while providing an added layer of protection for the application.

• Eliminate environment inconsistencies.

• Distribute and share content.

• Simply share your application with others without worrying about their environment.

• Scale quickly.

• Docker makes it easy to identify issues, isolate the problem container, quickly roll back to make the necessary changes, and then push the updated container into production

• Docker lets you build once and run anywhere.
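
The point about layered file systems can be illustrated with a little arithmetic. The sketch below compares the disk space two images would need if every image stored its own copy of each layer with the space needed when common layers are stored once. The image names, layer names, and sizes are invented for illustration and are not measurements.

```python
def disk_usage(images, layer_sizes_mb):
    """Return (naive_total_mb, shared_total_mb) for a set of images.

    naive:  every image keeps a private copy of each of its layers.
    shared: each distinct layer is stored once, however many images use it.
    """
    naive = sum(layer_sizes_mb[layer]
                for layers in images.values()
                for layer in layers)
    unique_layers = {layer for layers in images.values() for layer in layers}
    shared = sum(layer_sizes_mb[layer] for layer in unique_layers)
    return naive, shared


# Hypothetical layers and sizes (MB), chosen only to make the sums easy to follow.
LAYER_SIZES_MB = {"base-os": 120, "java-runtime": 200, "hadoop": 350, "spark": 250}

# Two hypothetical images that share their bottom two layers.
IMAGES = {
    "hadoop-node": ["base-os", "java-runtime", "hadoop"],
    "spark-node": ["base-os", "java-runtime", "spark"],
}

if __name__ == "__main__":
    naive, shared = disk_usage(IMAGES, LAYER_SIZES_MB)
    print(f"without sharing: {naive} MB, with shared layers: {shared} MB")
```

With these made-up numbers, the shared base and Java layers are counted once instead of twice, which is the effect the first advantage describes for both disk usage and image downloads.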

Dockerfile and Image
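
A Dockerfile is a text file of build instructions; `docker build` turns it into an image, and `docker run` starts a container from that image. As a rough sketch of what a Dockerfile for a Hadoop image might contain, the helper below renders one as a string. Everything passed in (base image, Java package, `JAVA_HOME` path, tarball name) and the `/opt/hadoop` install location are assumptions for illustration, not values taken from the slides, and the `apt-get` line assumes a Debian- or Ubuntu-based base image.

```python
def render_dockerfile(base_image, java_package, java_home, hadoop_tarball):
    """Render a minimal, hypothetical Dockerfile for a Hadoop image."""
    lines = [
        # Start from an existing base image (one image layer per instruction below).
        f"FROM {base_image}",
        # Install a Java runtime; assumes an apt-based distribution.
        f"RUN apt-get update && apt-get install -y {java_package}",
        f"ENV JAVA_HOME={java_home}",
        # Assumed install location for Hadoop inside the image.
        "ENV HADOOP_HOME=/opt/hadoop",
        # Copy a Hadoop release tarball from the build context into the image.
        f"COPY {hadoop_tarball} /tmp/hadoop.tar.gz",
        "RUN mkdir -p $HADOOP_HOME && "
        "tar -xzf /tmp/hadoop.tar.gz -C $HADOOP_HOME --strip-components=1",
        # Default command when a container starts from this image.
        'CMD ["/opt/hadoop/bin/hadoop", "version"]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All argument values here are illustrative placeholders.
    print(render_dockerfile(
        base_image="ubuntu:22.04",
        java_package="openjdk-8-jdk-headless",
        java_home="/usr/lib/jvm/java-8-openjdk-amd64",
        hadoop_tarball="hadoop-3.3.6.tar.gz",
    ))
```

Saving the output as `Dockerfile` next to the tarball and running `docker build -t <name> .` would produce an image, and `docker run <name>` would then start a container from it.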



• Docker is the new quick-start option for Apache Hadoop and Cloudera.



Some Challenges

• Which container manager to choose? Swarm, Kubernetes, AWS ECS, Mesos?

• How to handle storage configuration? Overlay filesystems, Flocker, Convoy?

• Which network configuration to use?

• Software compatibility: which OS (Linux/Ubuntu), build of Hadoop, and application layer, and how to make sure they all work together?

• Maintenance: availability, multi-container setups, upgrades, patches, backups?

