[2C4]Clustered computing with CoreOS, fleet and etcd

291
Clustered computing with CoreOS, fleet and etcd DEVIEW 2014 Jonathan Boulle CoreOS, Inc.

description

DEVIEW 2014 [2C4]Clustered computing with CoreOS, fleet and etcd

Transcript of [2C4]Clustered computing with CoreOS, fleet and etcd

Page 1: [2C4]Clustered computing with CoreOS, fleet and etcd

Clustered computing with CoreOS, fleet and etcd

DEVIEW 2014

Jonathan BoulleCoreOS, Inc.

Page 2: [2C4]Clustered computing with CoreOS, fleet and etcd

@baronboulle

jonboulle

Who am I?

Jonathan Boulle

Page 3: [2C4]Clustered computing with CoreOS, fleet and etcd

South Africa -> Australia -> London -> San Francisco

@baronboulle

jonboulle

Who am I?

Jonathan Boulle

Page 4: [2C4]Clustered computing with CoreOS, fleet and etcd

South Africa -> Australia -> London -> San Francisco

Red Hat -> Twitter -> CoreOS

@baronboulle

jonboulle

Who am I?

Jonathan Boulle

Page 5: [2C4]Clustered computing with CoreOS, fleet and etcd

South Africa -> Australia -> London -> San Francisco

Red Hat -> Twitter -> CoreOS

Linux, Python, Go, FOSS

@baronboulle

jonboulle

Who am I?

Jonathan Boulle

Page 6: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

Page 7: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux

Page 8: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet

Page 9: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers

Page 10: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

Page 11: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet

Page 12: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet– cluster-level init system

Page 13: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet– cluster-level init system– etcd + systemd

Page 14: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet– cluster-level init system– etcd + systemd

● fleet and...

Page 15: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet– cluster-level init system– etcd + systemd

● fleet and...– systemd: the good, the bad

Page 16: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet– cluster-level init system– etcd + systemd

● fleet and...– systemd: the good, the bad

– etcd: the good, the bad

Page 17: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet– cluster-level init system– etcd + systemd

● fleet and...– systemd: the good, the bad

– etcd: the good, the bad

– Golang: the good, the bad

Page 18: [2C4]Clustered computing with CoreOS, fleet and etcd

Agenda

● CoreOS Linux– Securing the internet– Application containers– Automatic updates

● fleet– cluster-level init system– etcd + systemd

● fleet and...– systemd: the good, the bad

– etcd: the good, the bad

– Golang: the good, the bad

● Q&A

Page 19: [2C4]Clustered computing with CoreOS, fleet and etcd

CoreOS Linux

Page 20: [2C4]Clustered computing with CoreOS, fleet and etcd

CoreOS Linux

Page 21: [2C4]Clustered computing with CoreOS, fleet and etcd

CoreOS Linux

A minimal, automatically-updated Linux distribution,

designed for distributed systems.

Page 22: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

Page 23: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

● CoreOS mission: “Secure the internet”

Page 24: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

● CoreOS mission: “Secure the internet”● Status quo: set up a server and never touch it

Page 25: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

● CoreOS mission: “Secure the internet”● Status quo: set up a server and never touch it● Internet is full of servers running years-old

software with dozens of vulnerabilities

Page 26: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

● CoreOS mission: “Secure the internet”● Status quo: set up a server and never touch it● Internet is full of servers running years-old

software with dozens of vulnerabilities● CoreOS: make updating the default, seamless

option

Page 27: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

● CoreOS mission: “Secure the internet”● Status quo: set up a server and never touch it● Internet is full of servers running years-old

software with dozens of vulnerabilities● CoreOS: make updating the default, seamless

option● Regular

Page 28: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

● CoreOS mission: “Secure the internet”● Status quo: set up a server and never touch it● Internet is full of servers running years-old software

with dozens of vulnerabilities● CoreOS: make updating the default, seamless option

● Regular● Reliable

Page 29: [2C4]Clustered computing with CoreOS, fleet and etcd

Why?

● CoreOS mission: “Secure the internet”● Status quo: set up a server and never touch it● Internet is full of servers running years-old software

with dozens of vulnerabilities● CoreOS: make updating the default, seamless option

● Regular● Reliable● Automatic

Page 30: [2C4]Clustered computing with CoreOS, fleet and etcd

How do we achieve this?

Page 31: [2C4]Clustered computing with CoreOS, fleet and etcd

How do we achieve this?

● Containerization of applications

Page 32: [2C4]Clustered computing with CoreOS, fleet and etcd

How do we achieve this?

● Containerization of applications● Self-updating operating system

Page 33: [2C4]Clustered computing with CoreOS, fleet and etcd

How do we achieve this?

● Containerization of applications● Self-updating operating system● Distributed systems tooling to make applications

resilient to updates

Page 34: [2C4]Clustered computing with CoreOS, fleet and etcd

KERNELSYSTEMDSSHDOCKER

PYTHONJAVANGINXMYSQLOPENSSL

dist

ro d

istr

o di

stro

dis

tro

dist

ro d

istr

o di

stro

dis

tro

dist

ro d

istr

o di

stro

APP

Page 35: [2C4]Clustered computing with CoreOS, fleet and etcd

KERNELSYSTEMDSSHDOCKER

PYTHONJAVANGINXMYSQLOPENSSL

dist

ro d

istr

o di

stro

dis

tro

dist

ro d

istr

o di

stro

dis

tro

dist

ro d

istr

o di

stro

APP

Page 36: [2C4]Clustered computing with CoreOS, fleet and etcd

KERNELSYSTEMDSSHDOCKER

PYTHONJAVANGINXMYSQLOPENSSL

dist

ro d

istr

o di

stro

dis

tro

dist

ro d

istr

o di

stro

dis

tro

dist

ro d

istr

o di

stro

APP

Application Containers (e.g. Docker)

Page 37: [2C4]Clustered computing with CoreOS, fleet and etcd

KERNELSYSTEMDSSHDOCKER

- Minimal base OS (~100MB)- Vanilla upstream components wherever possible- Decoupled from applications- Automatic, atomic updates

Page 38: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic updates

Page 39: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic updates

How do updates work?

Page 40: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic updates

How do updates work?● Omaha protocol (check-in/retrieval)

Page 41: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic updates

How do updates work?● Omaha protocol (check-in/retrieval)

– Simple XML-over-HTTP protocol developed by Google to facilitate polling and pulling updates from a server

Page 42: [2C4]Clustered computing with CoreOS, fleet and etcd

Client sends application id andcurrent version to the update server

Omaha protocol

<request protocol="3.0" version="CoreOSUpdateEngine-0.1.0.0"> <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="410.0.0" track="alpha" from_track="alpha"> <event eventtype="3"></event> </app></request>

Page 43: [2C4]Clustered computing with CoreOS, fleet and etcd

Client sends application id andcurrent version to the update server

Update server responds with theURL of an update to be applied

Omaha protocol

<url codebase="https://commondatastorage.googleapis.com/update-storage.core-os.net/amd64-usr/452.0.0/"></url> <package hash="D0lBAMD1Fwv8YqQuDYEAjXw6YZY=" name="update.gz" size="103401155" required="false">

Page 44: [2C4]Clustered computing with CoreOS, fleet and etcd

Client sends application id andcurrent version to the update server

Update server responds with theURL of an update to be applied

Client downloads data, verifieshash & cryptographic signature,and applies the update

Omaha protocol

Page 45: [2C4]Clustered computing with CoreOS, fleet and etcd

Client sends application id andcurrent version to the update server

Update server responds with theURL of an update to be applied

Client downloads data, verifieshash & cryptographic signature,and applies the update

Updater exits with response codethen reports the update to theupdate server

Omaha protocol

Page 46: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic updates

Page 47: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic updates

How do updates work?● Omaha protocol (check-in/retrieval)

– Simple XML-over-HTTP protocol developed by Google to facilitate polling and pulling updates from a server

● Active/passive read-only root partitions

Page 48: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic updates

How do updates work?● Omaha protocol (check-in/retrieval)

– Simple XML-over-HTTP protocol developed by Google to facilitate polling and pulling updates from a server

● Active/passive read-only root partitions– One for running the live system, one for updates

Page 49: [2C4]Clustered computing with CoreOS, fleet and etcd

Booted off partition A.Download update and committo partition B. Change GPT.

Active/passive root partitions

Page 50: [2C4]Clustered computing with CoreOS, fleet and etcd

Reboot (into Partition B).If tests succeed, continue normal operation and mark success in GPT.

Active/passive root partitions

Page 51: [2C4]Clustered computing with CoreOS, fleet and etcd

But what if partition B fails update tests...

Active/passive root partitions

Page 52: [2C4]Clustered computing with CoreOS, fleet and etcd

Change GPT to point to previous partition, reboot.Try update again later.

Active/passive root partitions

Page 53: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive root partitions

core-01 ~ # cgpt show /dev/sda3 start size contents 264192 2097152 Label: "USR-A" Type: Alias for coreos-rootfs UUID: 7130C94A-213A-4E5A-8E26-6CCE9662 Attr: priority=1 tries=0 successful=1

core-01 ~ # cgpt show /dev/sda4 start size contents2492416 2097152 Label: "USR-B" Type: Alias for coreos-rootfs UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A Attr: priority=2 tries=1 successful=0

Page 54: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive root partitions

core-01 ~ # cgpt show /dev/sda3 start size contents 264192 2097152 Label: "USR-A" Type: Alias for coreos-rootfs UUID: 7130C94A-213A-4E5A-8E26-6CCE9662 Attr: priority=1 tries=0 successful=1

core-01 ~ # cgpt show /dev/sda4 start size contents2492416 2097152 Label: "USR-B" Type: Alias for coreos-rootfs UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A Attr: priority=2 tries=1 successful=0

Page 55: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive root partitions

core-01 ~ # cgpt show /dev/sda3 start size contents 264192 2097152 Label: "USR-A" Type: Alias for coreos-rootfs UUID: 7130C94A-213A-4E5A-8E26-6CCE9662 Attr: priority=1 tries=0 successful=1

core-01 ~ # cgpt show /dev/sda4 start size contents2492416 2097152 Label: "USR-B" Type: Alias for coreos-rootfs UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A Attr: priority=2 tries=1 successful=0

Page 56: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

core-01 ~ # cgpt show /dev/sda3 start size contents 264192 2097152 Label: "USR-A" Type: Alias for coreos-rootfs UUID: 7130C94A-213A-4E5A-8E26-6CCE9662 Attr: priority=1 tries=0 successful=1

core-01 ~ # cgpt show /dev/sda4 start size contents2492416 2097152 Label: "USR-B" Type: Alias for coreos-rootfs UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A Attr: priority=2 tries=1 successful=0

Page 57: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

Page 58: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

● Single image containing most of the OS

Page 59: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

● Single image containing most of the OS– Mounted read-only onto /usr

Page 60: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

● Single image containing most of the OS– Mounted read-only onto /usr

– / is mounted read-write on top (persistent data)

Page 61: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

● Single image containing most of the OS– Mounted read-only onto /usr

– / is mounted read-write on top (persistent data)– Parts of /etc generated dynamically at boot

Page 62: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

● Single image containing most of the OS– Mounted read-only onto /usr

– / is mounted read-write on top (persistent data)– Parts of /etc generated dynamically at boot– A lot of work moving default configs from /etc to /usr

Page 63: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

● Single image containing most of the OS– Mounted read-only onto /usr

– / is mounted read-write on top (persistent data)– Parts of /etc generated dynamically at boot– A lot of work moving default configs from /etc to /usr

# /etc/nsswitch.conf:

passwd: files usrfilesshadow: files usrfilesgroup: files usrfiles

Page 64: [2C4]Clustered computing with CoreOS, fleet and etcd

Active/passive /usr partitions

● Single image containing most of the OS– Mounted read-only onto /usr

– / is mounted read-write on top (persistent data)– Parts of /etc generated dynamically at boot– A lot of work moving default configs from /etc to /usr

# /etc/nsswitch.conf:

passwd: files usrfilesshadow: files usrfilesgroup: files usrfiles

/usr/share/baselayout/passwd/usr/share/baselayout/shadow/usr/share/baselayout/group

Page 65: [2C4]Clustered computing with CoreOS, fleet and etcd

Atomic updates

Page 66: [2C4]Clustered computing with CoreOS, fleet and etcd

Atomic updates

● Entire OS is a single read-only imagecore-01 ~ # touch /usr/bin/foo touch: cannot touch '/usr/bin/foo': Read-only file system

Page 67: [2C4]Clustered computing with CoreOS, fleet and etcd

Atomic updates

● Entire OS is a single read-only imagecore-01 ~ # touch /usr/bin/foo touch: cannot touch '/usr/bin/foo': Read-only file system

– Easy to verify cryptographically● sha1sum on AWS or bare metal gives the same result

Page 68: [2C4]Clustered computing with CoreOS, fleet and etcd

Atomic updates

● Entire OS is a single read-only imagecore-01 ~ # touch /usr/bin/foo touch: cannot touch '/usr/bin/foo': Read-only file system

– Easy to verify cryptographically● sha1sum on AWS or bare metal gives the same result

– No chance of inconsistencies due to partial updates● e.g. pull a plug on a CentOS system during a yum update● At large scale, such events are inevitable

Page 69: [2C4]Clustered computing with CoreOS, fleet and etcd

Automatic, atomic updates are great! But...

Page 70: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem:

Automatic, atomic updates are great! But...

Page 71: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: – updates still require a reboot (to use new kernel and

mount new filesystem)

Automatic, atomic updates are great! But...

Page 72: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: – updates still require a reboot (to use new kernel and

mount new filesystem)– Reboots cause application downtime...

Automatic, atomic updates are great! But...

Page 73: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: – updates still require a reboot (to use new kernel and

mount new filesystem)– Reboots cause application downtime...

● Solution: fleet

Automatic, atomic updates are great! But...

Page 74: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: – updates still require a reboot (to use new kernel and

mount new filesystem)– Reboots cause application downtime...

● Solution: fleet– Highly available, fault tolerant, distributed process

scheduler

Automatic, atomic updates are great! But...

Page 75: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: – updates still require a reboot (to use new kernel and

mount new filesystem)– Reboots cause application downtime...

● Solution: fleet– Highly available, fault tolerant, distributed process

scheduler ● ... The way to run applications on a CoreOS cluster

Automatic, atomic updates are great! But...

Page 76: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: – updates still require a reboot (to use new kernel and mount

new filesystem)– Reboots cause application downtime...

● Solution: fleet– Highly available, fault tolerant, distributed process scheduler

● ... The way to run applications on a CoreOS cluster– fleet keeps applications running during server downtime

Automatic, atomic updates are great! But...

Page 77: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – the “cluster-level init system”

Page 78: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – the “cluster-level init system”

fleet is the abstraction between machine and application:

Page 79: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – the “cluster-level init system”

fleet is the abstraction between machine and application:– init system manages process on a machine

– fleet manages applications on a cluster of machines

Page 80: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – the “cluster-level init system”

fleet is the abstraction between machine and application:– init system manages process on a machine

– fleet manages applications on a cluster of machines Similar to Mesos, but very different architecture

(e.g. based on etcd/Raft, not Zookeeper/Paxos)

Page 81: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – the “cluster-level init system”

fleet is the abstraction between machine and application:– init system manages process on a machine– fleet manages applications on a cluster of machines

Similar to Mesos, but very different architecture (e.g. based on etcd/Raft, not Zookeeper/Paxos)

Uses systemd for machine-level process management, etcd for cluster-level co-ordination

Page 82: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

Page 83: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

Page 84: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

– encapsulates two roles:

Page 85: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

– encapsulates two roles:

• engine (cluster-level unit scheduling – talks to etcd)

Page 86: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

– encapsulates two roles:

• engine (cluster-level unit scheduling – talks to etcd)

• agent (local unit management – talks to etcd and systemd)

Page 87: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

– encapsulates two roles:

• engine (cluster-level unit scheduling – talks to etcd)

• agent (local unit management – talks to etcd and systemd) fleetctl command-line administration tool

Page 88: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

– encapsulates two roles:

• engine (cluster-level unit scheduling – talks to etcd)

• agent (local unit management – talks to etcd and systemd) fleetctl command-line administration tool

– create, destroy, start, stop units

Page 89: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

– encapsulates two roles:

• engine (cluster-level unit scheduling – talks to etcd)

• agent (local unit management – talks to etcd and systemd) fleetctl command-line administration tool

– create, destroy, start, stop units

– Retrieve current status of units/machines in the cluster

Page 90: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – low level view

fleetd binary (running on all CoreOS nodes)

– encapsulates two roles:

• engine (cluster-level unit scheduling – talks to etcd)• agent (local unit management – talks to etcd and systemd)

fleetctl command-line administration tool

– create, destroy, start, stop units

– Retrieve current status of units/machines in the cluster HTTP API

Page 91: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – high level view

Single machine

Cluster

Page 92: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd

Page 93: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd

Linux init system (PID 1) – manages processes

Page 94: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd

Linux init system (PID 1) – manages processes– Relatively new, replaces SysVinit, upstart, OpenRC, ...

Page 95: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd

Linux init system (PID 1) – manages processes– Relatively new, replaces SysVinit, upstart, OpenRC, ...

– Being adopted by all major Linux distributions

Page 96: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd

Linux init system (PID 1) – manages processes– Relatively new, replaces SysVinit, upstart, OpenRC, ...

– Being adopted by all major Linux distributions Fundamental concept is the unit

Page 97: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd

Linux init system (PID 1) – manages processes– Relatively new, replaces SysVinit, upstart, OpenRC, ...

– Being adopted by all major Linux distributions Fundamental concept is the unit

– Units include services (e.g. applications), mount points, sockets, timers, etc.

Page 98: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd

Linux init system (PID 1) – manages processes– Relatively new, replaces SysVinit, upstart, OpenRC, ...– Being adopted by all major Linux distributions

Fundamental concept is the unit– Units include services (e.g. applications), mount points,

sockets, timers, etc.– Each unit configured with a simple unit file

Page 99: [2C4]Clustered computing with CoreOS, fleet and etcd

Quick comparison

Page 100: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

Page 101: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd exposes a D-Bus interface

Page 102: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd exposes a D-Bus interface– D-Bus: message bus system for IPC on Linux

Page 103: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd exposes a D-Bus interface– D-Bus: message bus system for IPC on Linux

– One-to-one messaging (methods), plus pub/sub abilities

Page 104: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd exposes a D-Bus interface– D-Bus: message bus system for IPC on Linux

– One-to-one messaging (methods), plus pub/sub abilities fleet uses godbus to communicate with systemd

Page 105: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd exposes a D-Bus interface– D-Bus: message bus system for IPC on Linux

– One-to-one messaging (methods), plus pub/sub abilities fleet uses godbus to communicate with systemd

– Sending commands: StartUnit, StopUnit

Page 106: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd exposes a D-Bus interface– D-Bus: message bus system for IPC on Linux– One-to-one messaging (methods), plus pub/sub abilities

fleet uses godbus to communicate with systemd– Sending commands: StartUnit, StopUnit

– Retrieving current state of units (to publish to the cluster)

Page 107: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Page 108: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:

Page 109: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:– Process daemonization

Page 110: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:– Process daemonization

– Resource isolation/containment (cgroups)

Page 111: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:– Process daemonization

– Resource isolation/containment (cgroups)• e.g. MemoryLimit=512M

Page 112: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:– Process daemonization

– Resource isolation/containment (cgroups)• e.g. MemoryLimit=512M

– Health-checking, restarting failed services

Page 113: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:– Process daemonization

– Resource isolation/containment (cgroups)• e.g. MemoryLimit=512M

– Health-checking, restarting failed services

– Logging (journal)

Page 114: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:– Process daemonization

– Resource isolation/containment (cgroups)• e.g. MemoryLimit=512M

– Health-checking, restarting failed services

– Logging (journal) • applications can just write to stdout, systemd adds metadata

Page 115: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is great!

Automatically handles:– Process daemonization– Resource isolation/containment (cgroups)

• e.g. MemoryLimit=512M

– Health-checking, restarting failed services– Logging (journal)

• applications can just write to stdout, systemd adds metadata

– Timers, inter-unit dependencies, socket activation, ...

Page 116: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

Page 117: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd takes care of things so we don't have to

Page 118: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd takes care of things so we don't have to fleet configuration is just systemd unit files

Page 119: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd takes care of things so we don't have to fleet configuration is just systemd unit files fleet extends systemd to the cluster-level, and adds

some features of its own (using [X-Fleet])

Page 120: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd takes care of things so we don't have to fleet configuration is just systemd unit files fleet extends systemd to the cluster-level, and adds

some features of its own (using [X-Fleet])– Template units (run n identical copies of a unit)

Page 121: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd takes care of things so we don't have to fleet configuration is just systemd unit files fleet extends systemd to the cluster-level, and adds

some features of its own (using [X-Fleet])– Template units (run n identical copies of a unit)

– Global units (run a unit everywhere in the cluster)

Page 122: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + systemd

systemd takes care of things so we don't have to fleet configuration is just systemd unit files fleet extends systemd to the cluster-level, and adds

some features of its own (using [X-Fleet])– Template units (run n identical copies of a unit)– Global units (run a unit everywhere in the cluster)– Machine metadata (run only on certain machines)

Page 123: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Page 124: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub

Page 125: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub– fleet agent initially used a systemd D-Bus subscription

to track unit status

Page 126: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub– fleet agent initially used a systemd D-Bus subscription

to track unit status

– Every change in unit state in systemd triggers an event in fleet (e.g. “publish this new state to the cluster”)

Page 127: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub– fleet agent initially used a systemd D-Bus subscription

to track unit status

– Every change in unit state in systemd triggers an event in fleet (e.g. “publish this new state to the cluster”)

– Under heavy load, or byzantine conditions, unit state changes would be dropped

Page 128: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub– fleet agent initially used a systemd D-Bus subscription

to track unit status– Every change in unit state in systemd triggers an event

in fleet (e.g. “publish this new state to the cluster”)– Under heavy load, or byzantine conditions, unit state

changes would be dropped– As a result, unit state in the cluster became stale

Page 129: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Page 130: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub

Page 131: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub Solution: polling for unit states

Page 132: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub Solution: polling for unit states

– Every n seconds, retrieve state of units from systemd, and synchronize with cluster

Page 133: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub Solution: polling for unit states

– Every n seconds, retrieve state of units from systemd, and synchronize with cluster

– Less efficient, but much more reliable

Page 134: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub Solution: polling for unit states

– Every n seconds, retrieve state of units from systemd, and synchronize with cluster

– Less efficient, but much more reliable

– Optimize by caching state and only publishing changes

Page 135: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd is... not so great

Problem: unreliable pub/sub Solution: polling for unit states

– Every n seconds, retrieve state of units from systemd, and synchronize with cluster

– Less efficient, but much more reliable– Optimize by caching state and only publishing changes– Any state inconsistencies are quickly fixed

Page 136: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Page 137: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker

Page 138: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker– Docker is de facto application container manager

Page 139: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker– Docker is de facto application container manager

– Docker and systemd do not always play nicely together..

Page 140: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker– Docker is de facto application container manager– Docker and systemd do not always play nicely together..– Both Docker and systemd manage cgroups and

processes, and when the two are trying to manage the same thing, the results are mixed

Page 141: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Page 142: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container

Page 143: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container– Given a simple container:

[Service]ExecStart=/usr/bin/docker run busybox /bin/bash -c \ "while true; do echo Hello World; sleep 1; done"

Page 144: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container– Given a simple container:

[Service]ExecStart=/usr/bin/docker run busybox /bin/bash -c \ "while true; do echo Hello World; sleep 1; done"

– Try to kill it with systemctl kill hello.service

Page 145: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container– Given a simple container:

[Service]ExecStart=/usr/bin/docker run busybox /bin/bash -c \ "while true; do echo Hello World; sleep 1; done"

– Try to kill it with systemctl kill hello.service– ... Nothing happens

Page 146: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container– Given a simple container:

[Service]ExecStart=/usr/bin/docker run busybox /bin/bash -c \ "while true; do echo Hello World; sleep 1; done"

– Try to kill it with systemctl kill hello.service– ... Nothing happens– Kill command sends SIGTERM, but bash in a Docker

container has PID1, which happily ignores the signal...

Page 147: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Page 148: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container

Page 149: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container OK, SIGTERM didn't work, so escalate to SIGKILL:

systemctl kill -s SIGKILL hello.service

Page 150: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container OK, SIGTERM didn't work, so escalate to SIGKILL:

systemctl kill -s SIGKILL hello.service

● Now the systemd service is gone: hello.service: main process exited, code=killed, status=9/KILL

Page 151: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Example: sending signals to a container OK, SIGTERM didn't work, so escalate to SIGKILL:

systemctl kill -s SIGKILL hello.service

● Now the systemd service is gone: hello.service: main process exited, code=killed, status=9/KILL

● But... the Docker container still exists?# docker ps CONTAINER ID COMMAND STATUS NAMES7c7cf8ffabb6 /bin/sh -c 'while tr Up 31 seconds hello # ps -ef|grep '[d]ocker run'root 24231 1 0 03:49 ? 00:00:00 /usr/bin/docker run -name hello ...

Page 152: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Page 153: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Why?

Page 154: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Why? – Docker client does not run containers itself; it just sends

a command to the Docker daemon which actually forks and starts running the container

Page 155: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Why? – Docker client does not run containers itself; it just sends

a command to the Docker daemon which actually forks and starts running the container

– systemd expects processes to fork directly so they will be contained under the same cgroup tree

Page 156: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Why? – Docker client does not run containers itself; it just sends

a command to the Docker daemon which actually forks and starts running the container

– systemd expects processes to fork directly so they will be contained under the same cgroup tree

– Since the Docker daemon's cgroup is entirely separate, systemd cannot keep track of the forked container

Page 157: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

# systemctl cat hello.service[Service]ExecStart=/bin/bash -c 'while true; do echo Hello World; sleep 1; done'

# systemd-cgls... ├─hello.service │ ├─23201 /bin/bash -c while true; do echo Hello World; sleep 1; done │ └─24023 sleep 1

Page 158: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

# systemctl cat hello.service[Service]ExecStart=/usr/bin/docker run -name hello busybox /bin/sh -c \"while true; do echo Hello World; sleep 1; done"

# systemd-cgls...│ ├─hello.service│ │ └─24231 /usr/bin/docker run -name hello busybox /bin/sh -c while true; do echo Hello World; sleep 1; done...│ ├─docker-51a57463047b65487ec80a1dc8b8c9ea14a396c7a49c1e23919d50bdafd4fefb.scope│ │ ├─24240 /bin/sh -c while true; do echo Hello World; sleep 1; done│ │ └─24553 sleep 1

Page 159: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Page 160: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker Solution: ... work in progress

Page 161: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker Solution: ... work in progress

– systemd-docker – small application that moves cgroups of Docker containers back under systemd's cgroup

Page 162: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker Solution: ... work in progress

– systemd-docker – small application that moves cgroups of Docker containers back under systemd's cgroup

– Use Docker for image management, but systemd-nspawn for runtime (e.g. CoreOS's toolbox)

Page 163: [2C4]Clustered computing with CoreOS, fleet and etcd

systemd (and docker) are... not so great

Problem: poor integration with Docker Solution: ... work in progress

– systemd-docker – small application that moves cgroups of Docker containers back under systemd's cgroup

– Use Docker for image management, but systemd-nspawn for runtime (e.g. CoreOS's toolbox)

– (proposed) Docker standalone mode: client starts container directly rather than through daemon

Page 164: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – high level view

Single machine

Cluster

Page 165: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

Page 166: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

A consistent, highly available key/value store

Page 167: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

A consistent, highly available key/value store– Shared configuration, distributed locking, ...

Page 168: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

A consistent, highly available key/value store– Shared configuration, distributed locking, ...

Driven by Raft

Page 169: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

A consistent, highly available key/value store– Shared configuration, distributed locking, ...

Driven by Raft Consensus algorithm similar to Paxos

Page 170: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

A consistent, highly available key/value store– Shared configuration, distributed locking, ...

Driven by Raft Consensus algorithm similar to Paxos Designed for understandability and simplicity

Page 171: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

A consistent, highly available key/value store– Shared configuration, distributed locking, ...

Driven by Raft Consensus algorithm similar to Paxos Designed for understandability and simplicity

Popular and widely used

Page 172: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd

A consistent, highly available key/value store– Shared configuration, distributed locking, ...

Driven by Raft Consensus algorithm similar to Paxos Designed for understandability and simplicity

Popular and widely used– Simple HTTP API + libraries in Go, Java, Python, Ruby, ...

Page 173: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd connects CoreOS hosts

Page 174: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet + etcd

Page 175: [2C4]Clustered computing with CoreOS, fleet and etcd

● fleet needs a consistent view of the cluster to make scheduling decisions: etcd provides this view

fleet + etcd

Page 176: [2C4]Clustered computing with CoreOS, fleet and etcd

● fleet needs a consistent view of the cluster to make scheduling decisions: etcd provides this view– What units exist in the cluster?

fleet + etcd

Page 177: [2C4]Clustered computing with CoreOS, fleet and etcd

● fleet needs a consistent view of the cluster to make scheduling decisions: etcd provides this view– What units exist in the cluster?– What machines exist in the cluster?

fleet + etcd

Page 178: [2C4]Clustered computing with CoreOS, fleet and etcd

● fleet needs a consistent view of the cluster to make scheduling decisions: etcd provides this view– What units exist in the cluster?– What machines exist in the cluster?– What are their current states?

fleet + etcd

Page 179: [2C4]Clustered computing with CoreOS, fleet and etcd

● fleet needs a consistent view of the cluster to make scheduling decisions: etcd provides this view– What units exist in the cluster?– What machines exist in the cluster?– What are their current states?

● All unit files, unit state, machine state and scheduling information is stored in etcd

fleet + etcd

Page 180: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd is great!

Page 181: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast and simple API

etcd is great!

Page 182: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast and simple API● Handles all cluster-level/inter-machine

communication so we don't have to

etcd is great!

Page 183: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast and simple API● Handles all cluster-level/inter-machine

communication so we don't have to● Powerful primitives:

etcd is great!

Page 184: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast and simple API● Handles all cluster-level/inter-machine

communication so we don't have to● Powerful primitives:

– Compare-and-Swap allows for atomic operations and implementing locking behaviour

etcd is great!

Page 185: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast and simple API● Handles all cluster-level/inter-machine

communication so we don't have to● Powerful primitives:

– Compare-and-Swap allows for atomic operations and implementing locking behaviour

– Watches (like pub-sub) provide event-driven behaviour

etcd is great!

Page 186: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd is... not so great

Page 187: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches

etcd is... not so great

Page 188: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– fleet initially used a purely event-driven architecture

etcd is... not so great

Page 189: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– fleet initially used a purely event-driven architecture– watches in etcd used to trigger events

etcd is... not so great

Page 190: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– fleet initially used a purely event-driven architecture– watches in etcd used to trigger events

● e.g. “machine down” event triggers unit rescheduling

etcd is... not so great

Page 191: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– fleet initially used a purely event-driven architecture– watches in etcd used to trigger events

● e.g. “machine down” event triggers unit rescheduling– Highly efficient: only take action when necessary

etcd is... not so great

Page 192: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– fleet initially used a purely event-driven architecture– watches in etcd used to trigger events

● e.g. “machine down” event triggers unit rescheduling– Highly efficient: only take action when necessary– Highly responsive: as soon as a user submits a new

unit, an event is triggered to schedule that unit

etcd is... not so great

Page 193: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– fleet initially used a purely event-driven architecture– watches in etcd used to trigger events

● e.g. “machine down” event triggers unit rescheduling– Highly efficient: only take action when necessary– Highly responsive: as soon as a user submits a new

unit, an event is triggered to schedule that unit– Unfortunately, many places for things to go wrong...

etcd is... not so great

Page 194: [2C4]Clustered computing with CoreOS, fleet and etcd

Unreliable watches

Page 195: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:

Unreliable watches

Page 196: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd is undergoing a leader election or otherwise

unavailable, watches do not work during this period

Unreliable watches

Page 197: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd is undergoing a leader election or otherwise

unavailable, watches do not work during this period– Change occurs (e.g. a machine leaves the cluster)

Unreliable watches

Page 198: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd is undergoing a leader election or otherwise

unavailable, watches do not work during this period– Change occurs (e.g. a machine leaves the cluster)– Event is missed

Unreliable watches

Page 199: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd is undergoing a leader election or otherwise

unavailable, watches do not work during this period– Change occurs (e.g. a machine leaves the cluster)– Event is missed – fleet doesn't know machine is lost!

Unreliable watches

Page 200: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd is undergoing a leader election or otherwise

unavailable, watches do not work during this period– Change occurs (e.g. a machine leaves the cluster)– Event is missed – fleet doesn't know machine is lost!– Now fleet doesn't know to reschedule units that were

running on that machine

Unreliable watches

Page 201: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd is... not so great

Page 202: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: limited event history

etcd is... not so great

Page 203: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: limited event history– etcd retains a history of all events that occur

etcd is... not so great

Page 204: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: limited event history– etcd retains a history of all events that occur– Can “watch” from an arbitrary point in the past, but..

etcd is... not so great

Page 205: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: limited event history– etcd retains a history of all events that occur– Can “watch” from an arbitrary point in the past, but..– History is a limited window!

etcd is... not so great

Page 206: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: limited event history– etcd retains a history of all events that occur– Can “watch” from an arbitrary point in the past, but..– History is a limited window!– With a busy cluster, watches can fall out of this window

etcd is... not so great

Page 207: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: limited event history– etcd retains a history of all events that occur– Can “watch” from an arbitrary point in the past, but..– History is a limited window!– With a busy cluster, watches can fall out of this window– Can't always replay event stream from the point we

want to

etcd is... not so great

Page 208: [2C4]Clustered computing with CoreOS, fleet and etcd

Limited event history

Page 209: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd holds history of last 1000 events

Limited event history

Page 210: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd holds history of last 1000 events– fleet sets watch at i=100 to watch for machine loss

Limited event history

Page 211: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd holds history of last 1000 events– fleet sets watch at i=100 to watch for machine loss– Meanwhile, many changes occur in other parts of the

keyspace, advancing index to i=1500

Limited event history

Page 212: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd holds history of last 1000 events– fleet sets watch at i=100 to watch for machine loss– Meanwhile, many changes occur in other parts of the

keyspace, advancing index to i=1500– Leader election/network hiccup occurs and severs

watch

Limited event history

Page 213: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd holds history of last 1000 events– fleet sets watch at i=100 to watch for machine loss– Meanwhile, many changes occur in other parts of the

keyspace, advancing index to i=1500– Leader election/network hiccup occurs and severs

watch– fleet tries to recreate watch at i=100 and fails:

Limited event history

Page 214: [2C4]Clustered computing with CoreOS, fleet and etcd

● Example:– etcd holds history of last 1000 events– fleet sets watch at i=100 to watch for machine loss– Meanwhile, many changes occur in other parts of the

keyspace, advancing index to i=1500– Leader election/network hiccup occurs and severs watch– fleet tries to recreate watch at i=100 and fails:

err="401: The event in requested index is outdated and cleared (the requested history has been cleared [1500/100])

Limited event history

Page 215: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd is... not so great

Page 216: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– Missed events lead to unrecoverable situations

● Problem: limited event history– Can't always replay entire event stream

etcd is... not so great

Page 217: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– Missed events lead to unrecoverable situations

● Problem: limited event history– Can't always replay entire event stream

● Solution: move to “reconciler” model

etcd is... not so great

Page 218: [2C4]Clustered computing with CoreOS, fleet and etcd

Reconciler model

Page 219: [2C4]Clustered computing with CoreOS, fleet and etcd

Reconciler model

In a loop, run periodically until stopped:

Page 220: [2C4]Clustered computing with CoreOS, fleet and etcd

Reconciler model

In a loop, run periodically until stopped:

1. Retrieve current state (how the world is) and desired state (how the world should be) from datastore (etcd)

Page 221: [2C4]Clustered computing with CoreOS, fleet and etcd

Reconciler model

In a loop, run periodically until stopped:

1. Retrieve current state (how the world is) and desired state (how the world should be) from datastore (etcd)

2. Determine necessary actions to transform current state --> desired state

Page 222: [2C4]Clustered computing with CoreOS, fleet and etcd

Reconciler model

In a loop, run periodically until stopped:

1. Retrieve current state (how the world is) and desired state (how the world should be) from datastore (etcd)

2. Determine necessary actions to transform current state --> desired state

3. Perform actions and save results as new current state

Page 223: [2C4]Clustered computing with CoreOS, fleet and etcd

Reconciler model

Example: fleet's engine (scheduler) looks something like:for { // loop forever

select {

case <- stopChan: // if stopped, exit return

case <- time.After(5 * time.Minute): units = fetchUnits() machines = fetchMachines() schedule(units, machines)

}

}

Page 224: [2C4]Clustered computing with CoreOS, fleet and etcd

etcd is... not so great

Page 225: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– Missed events lead to unrecoverable situations

● Problem: limited event history– Can't always replay entire event stream

● Solution: move to “reconciler” model

etcd is... not so great

Page 226: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– Missed events lead to unrecoverable situations

● Problem: limited event history– Can't always replay entire event stream

● Solution: move to “reconciler” model– Less efficient, but extremely robust

etcd is... not so great

Page 227: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: unreliable watches– Missed events lead to unrecoverable situations

● Problem: limited event history– Can't always replay entire event stream

● Solution: move to “reconciler” model– Less efficient, but extremely robust– Still many paths for optimisation (e.g. using watches to

trigger reconciliations)

etcd is... not so great

Page 228: [2C4]Clustered computing with CoreOS, fleet and etcd

fleet – high level view

Single machine

Cluster

Page 229: [2C4]Clustered computing with CoreOS, fleet and etcd

golang

Page 230: [2C4]Clustered computing with CoreOS, fleet and etcd

● Standard language for all CoreOS projects (above OS)

golang

Page 231: [2C4]Clustered computing with CoreOS, fleet and etcd

● Standard language for all CoreOS projects (above OS)

– etcd

golang

Page 232: [2C4]Clustered computing with CoreOS, fleet and etcd

● Standard language for all CoreOS projects (above OS)

– etcd– fleet

golang

Page 233: [2C4]Clustered computing with CoreOS, fleet and etcd

● Standard language for all CoreOS projects (above OS)

– etcd– fleet– locksmith (semaphore for reboots during updates)

golang

Page 234: [2C4]Clustered computing with CoreOS, fleet and etcd

● Standard language for all CoreOS projects (above OS)

– etcd– fleet– locksmith (semaphore for reboots during updates)– etcdctl, updatectl, coreos-cloudinit, ...

golang

Page 235: [2C4]Clustered computing with CoreOS, fleet and etcd

● Standard language for all CoreOS projects (above OS)

– etcd– fleet– locksmith (semaphore for reboots during updates)– etcdctl, updatectl, coreos-cloudinit, ...

● fleet is ~10k LOC (and another ~10k LOC tests)

golang

Page 236: [2C4]Clustered computing with CoreOS, fleet and etcd

Go is great!

Page 237: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast!

Go is great!

Page 238: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast!– to write (concise syntax)

Go is great!

Page 239: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast!– to write (concise syntax)– to compile (builds typically <1s)

Go is great!

Page 240: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast!– to write (concise syntax)– to compile (builds typically <1s)– to run tests (O(seconds), including with race detection)

Go is great!

Page 241: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast!– to write (concise syntax)– to compile (builds typically <1s)– to run tests (O(seconds), including with race detection)– Never underestimate power of rapid iteration

Go is great!

Page 242: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast!– to write (concise syntax)– to compile (builds typically <1s)– to run tests (O(seconds), including with race detection)– Never underestimate power of rapid iteration

● Simple, powerful tooling

Go is great!

Page 243: [2C4]Clustered computing with CoreOS, fleet and etcd

● Fast!– to write (concise syntax)– to compile (builds typically <1s)– to run tests (O(seconds), including with race detection)– Never underestimate power of rapid iteration

● Simple, powerful tooling– Built in package management, code coverage, etc.

Go is great!

Page 244: [2C4]Clustered computing with CoreOS, fleet and etcd

Go is great!

Page 245: [2C4]Clustered computing with CoreOS, fleet and etcd

● Rich standard library

Go is great!

Page 246: [2C4]Clustered computing with CoreOS, fleet and etcd

● Rich standard library– “Batteries are included”

Go is great!

Page 247: [2C4]Clustered computing with CoreOS, fleet and etcd

● Rich standard library– “Batteries are included”– e.g.: completely self-hosted HTTP server, no need for

reverse proxies or worker systems to serve many concurrent HTTP requests

Go is great!

Page 248: [2C4]Clustered computing with CoreOS, fleet and etcd

● Rich standard library– “Batteries are included”– e.g.: completely self-hosted HTTP server, no need for

reverse proxies or worker systems to serve many concurrent HTTP requests

● Static compilation into a single binary

Go is great!

Page 249: [2C4]Clustered computing with CoreOS, fleet and etcd

● Rich standard library– “Batteries are included”– e.g.: completely self-hosted HTTP server, no need for

reverse proxies or worker systems to serve many concurrent HTTP requests

● Static compilation into a single binary– Ideal for a minimal OS with no libraries

Go is great!

Page 250: [2C4]Clustered computing with CoreOS, fleet and etcd

Go is... not so great

Page 251: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: managing third-party dependencies

Go is... not so great

Page 252: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: managing third-party dependencies– modular package management but: no versioning

Go is... not so great

Page 253: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: managing third-party dependencies– modular package management but: no versioning– import “github.com/coreos/fleet” - which SHA?

Go is... not so great

Page 254: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: managing third-party dependencies– modular package management but: no versioning– import “github.com/coreos/fleet” - which SHA?

● Solution: vendoring :-/

Go is... not so great

Page 255: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: managing third-party dependencies– modular package management but: no versioning– import “github.com/coreos/fleet” - which SHA?

● Solution: vendoring :-/– Copy entire source tree of dependencies into repository

Go is... not so great

Page 256: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: managing third-party dependencies– modular package management but: no versioning– import “github.com/coreos/fleet” - which SHA?

● Solution: vendoring :-/– Copy entire source tree of dependencies into repository– Slowly maturing tooling: goven, third_party.go, Godep

Go is... not so great

Page 257: [2C4]Clustered computing with CoreOS, fleet and etcd

Go is... not so great

Page 258: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: (relatively) large binary sizes

Go is... not so great

Page 259: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: (relatively) large binary sizes– “relatively”, but... CoreOS is the minimal OS

Go is... not so great

Page 260: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: (relatively) large binary sizes– “relatively”, but... CoreOS is the minimal OS– ~10MB per binary, many tools, quickly adds up

Go is... not so great

Page 261: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: (relatively) large binary sizes– “relatively”, but... CoreOS is the minimal OS– ~10MB per binary, many tools, quickly adds up

● Solutions:

Go is... not so great

Page 262: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: (relatively) large binary sizes– “relatively”, but... CoreOS is the minimal OS– ~10MB per binary, many tools, quickly adds up

● Solutions: – upgrading golang!

Go is... not so great

Page 263: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: (relatively) large binary sizes– “relatively”, but... CoreOS is the minimal OS– ~10MB per binary, many tools, quickly adds up

● Solutions: – upgrading golang!

● go1.2 to go1.3 = ~25% reduction

Go is... not so great

Page 264: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: (relatively) large binary sizes– “relatively”, but... CoreOS is the minimal OS– ~10MB per binary, many tools, quickly adds up

● Solutions: – upgrading golang!

● go1.2 to go1.3 = ~25% reduction

– sharing the binary between tools

Go is... not so great

Page 265: [2C4]Clustered computing with CoreOS, fleet and etcd

Sharing a binary

Page 266: [2C4]Clustered computing with CoreOS, fleet and etcd

● client/daemon often share much of the same code

Sharing a binary

Page 267: [2C4]Clustered computing with CoreOS, fleet and etcd

● client/daemon often share much of the same code– Encapsulate multiple tools in one binary, symlink the

different command names, switch off command name

Sharing a binary

Page 268: [2C4]Clustered computing with CoreOS, fleet and etcd

● client/daemon often share much of the same code– Encapsulate multiple tools in one binary, symlink the

different command names, switch off command name– Example: fleetd/fleetctl

Sharing a binary

Page 269: [2C4]Clustered computing with CoreOS, fleet and etcd

● client/daemon often share much of the same code– Encapsulate multiple tools in one binary, symlink the

different command names, switch off command name– Example: fleetd/fleetctl

Sharing a binary

func main() { switch os.Args[0] { case “fleetctl”: Fleetctl() case “fleetd”: Fleetd() }}

Page 270: [2C4]Clustered computing with CoreOS, fleet and etcd

● client/daemon often share much of the same code– Encapsulate multiple tools in one binary, symlink the

different command names, switch off command name– Example: fleetd/fleetctl

Sharing a binary

func main() { switch os.Args[0] { case “fleetctl”: Fleetctl() case “fleetd”: Fleetd() }}

Before:

9150032 fleetctl 8567416 fleetd

After:

11052256 fleetctl 8 fleetd -> fleetctl

Page 271: [2C4]Clustered computing with CoreOS, fleet and etcd

Go is... not so great

Page 272: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: young language => immature libraries

Go is... not so great

Page 273: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: young language => immature libraries– CLI frameworks

Go is... not so great

Page 274: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: young language => immature libraries– CLI frameworks– godbus, go-systemd, go-etcd :-(

Go is... not so great

Page 275: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: young language => immature libraries– CLI frameworks– godbus, go-systemd, go-etcd :-(

● Solutions:

Go is... not so great

Page 276: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: young language => immature libraries– CLI frameworks– godbus, go-systemd, go-etcd :-(

● Solutions:– Keep it simple

Go is... not so great

Page 277: [2C4]Clustered computing with CoreOS, fleet and etcd

● Problem: young language => immature libraries– CLI frameworks– godbus, go-systemd, go-etcd :-(

● Solutions:– Keep it simple– Roll your own (e.g. fleetctl's command line)

Go is... not so great

Page 278: [2C4]Clustered computing with CoreOS, fleet and etcd

type Command struct {

Name string

Summary string

Usage string

Description string

Flags flag.FlagSet

Run func(args []string) int

}

fleetctl CLI

Page 279: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

Page 280: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

CoreOS Linux

Page 281: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

CoreOS Linux– Minimal OS with cluster capabilities built-in

Page 282: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

CoreOS Linux– Minimal OS with cluster capabilities built-in

– Containerized applications --> a[u]tom[at]ic updates

Page 283: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

CoreOS Linux– Minimal OS with cluster capabilities built-in

– Containerized applications --> a[u]tom[at]ic updates fleet

Page 284: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

CoreOS Linux– Minimal OS with cluster capabilities built-in

– Containerized applications --> a[u]tom[at]ic updates fleet

– Simple, powerful cluster-level application manager

Page 285: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

CoreOS Linux– Minimal OS with cluster capabilities built-in

– Containerized applications --> a[u]tom[at]ic updates fleet

– Simple, powerful cluster-level application manager

– Glue between local init system (systemd) and cluster-level awareness (etcd)

Page 286: [2C4]Clustered computing with CoreOS, fleet and etcd

Wrap up/recap

CoreOS Linux– Minimal OS with cluster capabilities built-in– Containerized applications --> a[u]tom[at]ic updates

fleet– Simple, powerful cluster-level application manager– Glue between local init system (systemd) and cluster-level

awareness (etcd)– golang++

Page 287: [2C4]Clustered computing with CoreOS, fleet and etcd

Questions?

Page 288: [2C4]Clustered computing with CoreOS, fleet and etcd

Thank you :-)

● Everything is open source – join us!– https://github.com/coreos

● Any more questions, feel free to email– [email protected]

● CoreOS stickers!

Page 289: [2C4]Clustered computing with CoreOS, fleet and etcd

References

● CoreOS updates https://coreos.com/using-coreos/updates/

● Omaha protocol https://code.google.com/p/omaha/wiki/ServerProtocol

● Raft algorithmhttp://raftconsensus.github.io/

Page 290: [2C4]Clustered computing with CoreOS, fleet and etcd

References

● fleet https://github.com/coreos/fleet

● etcd https://github.com/coreos/etcd

● toolboxhttps://github.com/coreos/toolbox

● systemd-dockerhttps://github.com/ibuildthecloud/systemd-docker

● systemd-nspawn

http://0pointer.de/public/systemd-man/systemd-nspawn.html

Page 291: [2C4]Clustered computing with CoreOS, fleet and etcd

Brief side-note: locksmith

● Reboot manager for CoreOS● Uses a semaphore in etcd to co-ordinate reboots● Each machine in the cluster:

1. Downloads and applies update

2. Takes lock in etcd (using Compare-And-Swap)

3. Reboots and releases lock