Stampede.io CoreOS + Digital Ocean Meetup

Posted on 17-Nov-2014

Overview of Stampede.io

STAMPEDE.IO

CoreOS + Digital Ocean Meetup 9/8/14

Darren Shepherd

About Me – Darren Shepherd

@ibuildthecloud, darren0

Fancy Title Engineer @ Citrix

Previously Cloud Architect @ GoDaddy, 3 yrs

Building IaaS systems for past 5 yrs

Apache CloudStack – Committer

OpenStack

2 from scratch proprietary clouds

Stampede.io is the 4th orchestration platform I’ve built

What is Stampede.io

Hybrid IaaS/Docker orchestration platform

Can run both VMs and Containers in a consistent fashion

Shares approaches where it makes sense, but still respects that Containers != VMs

Easy to install/upgrade and use

Tailored installation for CoreOS

Cattle.io is the more raw framework under the hood

Demo

Why would you build this?

Boring Reason

Personal R&D project @ Citrix

Just me, my opinions, and a MacBook Pro (running Linux)

Loose upfront goals

Make IaaS somehow better

○ Apply 5 years of lessons learned

Include Docker/Containers

○ Shiny new tech is fun

Spent 6 months locked in a closet

Hacking – ~50k lines of code

Staring at wall – “How do I think containers can be useful?”

Quite Literally…

My home office is a closet

UI developed by Vincent Fiduccia (vincent99 on GitHub)

Node.js and Ember

Event driven using WebSockets

Grandiose Reason

Either changing the world or world domination are expected outcomes

Container Playbook

Containers are portable

Containers can run on a laptop, in a VM, or on bare metal

Bare metal becomes more attractive

Faster and cheaper

Bare metal + Container == world domination

Hypervisors, Virtualization, VMware, AWS, etc. are doomed

But there’s a problem

Containers are portable

Containers are a compute technology

Storage and Networking are not portable

EBS – Reliable storage w/ snapshots

VPC – IP addressing, firewall, L2

Nothing we can’t Architect around

Ephemeral apps

Distributed storage

NoSQL - Cassandra

Architect for the cloud!

Darren’s brief history of EC2

2006 – Ephemeral only VMs

2008 – EBS

2009 – VPC

2010-2014 – Crazy exponential growth

EBS and VPC are essential

You can architect for the cloud

But many won’t

Amazon didn’t convince everyone to re-architect for the cloud

Amazon supported legacy architectures

Containers are currently EC2 2006

Most people will continue to run on the infrastructure they already have

Not a game changer

Containers won’t change the infrastructure world unless we tackle storage and networking

The theory

Containers are portable because Linux is ubiquitous

Great idea, why didn’t I think of that

We can build a portable EBS and VPC with just Linux

Linux has the majority of the technology needed, we just need to piece it together.

Stampede is about building a portable cloud – compute, storage, and networking

What does this mean?

Infrastructure providers only need to provide Linux

Simple block storage

L3 connectivity

If VT-x/SVM is available, VMs can be launched

Stampede provides everything else

Stampede can be provided “as a Service”

Normalize the infrastructure market

Massive scale infrastructure provider is not needed

Fun

Launching Containers on Digital Ocean

Default Stampede Deployment

CoreOS Node 1

Controller

Agent

Libvirt

CoreOS Node 2

Agent

Libvirt

CoreOS Node 3

Agent

Libvirt

One node becomes controller
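
The slides do not say how the controller node is chosen. As a minimal sketch of one common approach, assuming election by an atomic create-if-absent write to a shared key-value store: the first node to claim the key runs the controller, and every node runs an agent and Libvirt as in the diagram. `KeyValueStore`, `elect_controller`, the `controller` key, and the node names are hypothetical and are not Stampede.io's API.

```python
import threading


class KeyValueStore:
    """In-memory stand-in for a shared store with an atomic create-if-absent."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def create_if_absent(self, key, value):
        # Returns True only for the first caller to claim the key.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def elect_controller(store, node_name):
    """Try to claim the controller role; return whichever node holds it."""
    store.create_if_absent("controller", node_name)
    return store.get("controller")


store = KeyValueStore()
nodes = ["coreos-node-1", "coreos-node-2", "coreos-node-3"]  # hypothetical names

roles = {}
for node in nodes:
    winner = elect_controller(store, node)
    # Every node runs an agent and Libvirt; only the winner adds the controller.
    roles[node] = ["agent", "libvirt"]
    if winner == node:
        roles[node].insert(0, "controller")

for node, node_roles in roles.items():
    print(node, node_roles)
```

In a real cluster the in-memory store would be replaced by a distributed one, and the claim would need a lease or TTL so another node can take over if the controller fails.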

Logical Components in Controller

Deployed in single process for simplicity

Should scale for clouds <50 servers

Controller

Database

Lock Manager

Event Bus

API Server

Process Server

Agent Server

Fully Distributed Controller

Stampede on Stampede

Digital Ocean Deployment

3 Etcd clusters w/ Fleet

Management Stack – 3 x 12GB

SFO1 Nodes – 100 x 2GB

NYC3 Nodes – 100 x 2GB

Tests

Sustained rate 6-7 containers per second

Not tuned for raw throughput (naively stupid scheduler)

Gigantic batches

10000 containers in one API call (count=10000)

Restart everything during deployments

Agents, Servers, Database, ZooKeeper, etc.

Just tried to break things

How Many Containers?

A lot – 127,884

127,884 Running Containers

~600 per VM

17 failures (0.01%)

~6 hours

Was not testing for throughput

Focus on reliability

Note: This was done with unmanaged networking
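
As a rough cross-check of the figures on these slides, the arithmetic can be reproduced in a few lines. This sketch assumes the containers ran only on the 200 2GB nodes (100 in SFO1 plus 100 in NYC3) and that the run lasted exactly 6 hours; the slides give only "~6 hours", so the average rate is approximate.

```python
# Figures taken from the slides.
running_containers = 127_884
failures = 17
worker_nodes = 100 + 100   # SFO1 + NYC3 nodes, 2GB each
duration_hours = 6         # assumption: the slides say "~6 hours"

# Failure rate as a percentage of running containers.
failure_pct = failures / running_containers * 100

# Average containers per worker node.
containers_per_node = running_containers / worker_nodes

# Average launch rate over the whole run, in containers per second.
avg_rate_per_sec = running_containers / (duration_hours * 3600)

print(f"failure rate:        {failure_pct:.3f}%")
print(f"containers per node: {containers_per_node:.0f}")
print(f"average rate:        {avg_rate_per_sec:.2f} containers/s")
```

Under these assumptions the failure rate rounds to the quoted 0.01%, the per-node count lands slightly above the quoted ~600, and the whole-run average sits just under the quoted sustained rate of 6-7 containers per second, which is plausible given that everything was deliberately restarted during the test.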

Questions?