[2C4] Clustered computing with CoreOS, fleet and etcd
DEVIEW 2014
Jonathan Boulle, CoreOS, Inc.
@baronboulle
jonboulle
Who am I?
Jonathan Boulle
South Africa -> Australia -> London -> San Francisco
Red Hat -> Twitter -> CoreOS
Linux, Python, Go, FOSS
@baronboulle / jonboulle
Agenda
● CoreOS Linux
  – Securing the internet
  – Application containers
  – Automatic updates
● fleet
  – cluster-level init system
  – etcd + systemd
● fleet and...
  – systemd: the good, the bad
  – etcd: the good, the bad
  – Golang: the good, the bad
● Q&A
CoreOS Linux
A minimal, automatically-updated Linux distribution, designed for distributed systems.
Why?
● CoreOS mission: “Secure the internet”
● Status quo: set up a server and never touch it
● The internet is full of servers running years-old software with dozens of vulnerabilities
● CoreOS: make updating the default, seamless option
  – Regular
  – Reliable
  – Automatic
How do we achieve this?
● Containerization of applications
● Self-updating operating system
● Distributed systems tooling to make applications resilient to updates
[Figure: the application stack – APP on top of PYTHON/JAVA/NGINX/MYSQL/OPENSSL on top of KERNEL/SYSTEMD/SSH/DOCKER, with “distro” filling the space around the components]
Application Containers (e.g. Docker)
[Figure: the CoreOS base – KERNEL, SYSTEMD, SSH, DOCKER]
- Minimal base OS (~100MB)
- Vanilla upstream components wherever possible
- Decoupled from applications
- Automatic, atomic updates
Automatic updates
How do updates work?
● Omaha protocol (check-in/retrieval)
  – Simple XML-over-HTTP protocol developed by Google to facilitate polling and pulling updates from a server
Omaha protocol
1. Client sends the application id and current version to the update server:

<request protocol="3.0" version="CoreOSUpdateEngine-0.1.0.0">
  <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="410.0.0" track="alpha" from_track="alpha">
    <event eventtype="3"></event>
  </app>
</request>

2. Update server responds with the URL of an update to be applied:

<url codebase="https://commondatastorage.googleapis.com/update-storage.core-os.net/amd64-usr/452.0.0/"></url>
<package hash="D0lBAMD1Fwv8YqQuDYEAjXw6YZY=" name="update.gz" size="103401155" required="false">

3. Client downloads the data, verifies the hash & cryptographic signature, and applies the update.
4. Updater exits with a response code, then reports the update to the update server.
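To make the check-in concrete, here is a minimal Go sketch that marshals and sends a request of the shape shown above. The endpoint URL is a placeholder, and the struct covers only a small subset of the protocol:

package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"net/http"
)

// Minimal subset of the Omaha request schema shown above.
type Request struct {
	XMLName  xml.Name `xml:"request"`
	Protocol string   `xml:"protocol,attr"`
	Version  string   `xml:"version,attr"`
	App      App      `xml:"app"`
}

type App struct {
	AppID   string `xml:"appid,attr"`
	Version string `xml:"version,attr"`
	Track   string `xml:"track,attr"`
}

func main() {
	req := Request{
		Protocol: "3.0",
		Version:  "CoreOSUpdateEngine-0.1.0.0",
		App: App{
			AppID:   "{e96281a6-d1af-4bde-9a0a-97b76e56dc57}",
			Version: "410.0.0",
			Track:   "alpha",
		},
	}
	body, err := xml.Marshal(req)
	if err != nil {
		panic(err)
	}
	// The endpoint here is hypothetical; a real client would use the
	// update server configured for the machine.
	resp, err := http.Post("https://update.example.com/v1/update/", "text/xml", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("update server responded:", resp.Status)
}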
Automatic updates
● Active/passive read-only root partitions
  – One for running the live system, one for updates
Active/passive root partitions
1. Booted off partition A. Download the update and commit it to partition B. Change the GPT.
2. Reboot (into partition B). If tests succeed, continue normal operation and mark success in the GPT.
3. But what if partition B fails the update tests?
4. Change the GPT to point to the previous partition and reboot. Try the update again later.
Active/passive /usr partitions

core-01 ~ # cgpt show /dev/sda3
     start     size    contents
    264192  2097152    Label: "USR-A"
                       Type: Alias for coreos-rootfs
                       UUID: 7130C94A-213A-4E5A-8E26-6CCE9662
                       Attr: priority=1 tries=0 successful=1

core-01 ~ # cgpt show /dev/sda4
     start     size    contents
   2492416  2097152    Label: "USR-B"
                       Type: Alias for coreos-rootfs
                       UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A
                       Attr: priority=2 tries=1 successful=0
Active/passive /usr partitions
● Single image containing most of the OS
  – Mounted read-only onto /usr
  – / is mounted read-write on top (persistent data)
  – Parts of /etc generated dynamically at boot
  – A lot of work moving default configs from /etc to /usr

# /etc/nsswitch.conf:
passwd: files usrfiles
shadow: files usrfiles
group:  files usrfiles

/usr/share/baselayout/passwd
/usr/share/baselayout/shadow
/usr/share/baselayout/group
Atomic updates
● Entire OS is a single read-only image

core-01 ~ # touch /usr/bin/foo
touch: cannot touch '/usr/bin/foo': Read-only file system

  – Easy to verify cryptographically
    ● sha1sum on AWS or bare metal gives the same result
  – No chance of inconsistencies due to partial updates
    ● e.g. pull the plug on a CentOS system during a yum update
    ● At large scale, such events are inevitable
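As an illustration of how cheap that verification is, a few lines of Go can hash the raw /usr partition. The device path here is an assumption; the point is simply that the digest matches across any machines running the same release:

package main

import (
	"crypto/sha1"
	"fmt"
	"io"
	"os"
)

func main() {
	// Hypothetical device path: hash the raw /usr partition. On machines
	// running the same CoreOS release, the digest is identical whether
	// the machine is on AWS or bare metal.
	f, err := os.Open("/dev/sda3")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	h := sha1.New()
	if _, err := io.Copy(h, f); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", h.Sum(nil))
}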
Automatic, atomic updates are great! But...
● Problem:
  – updates still require a reboot (to use the new kernel and mount the new filesystem)
  – Reboots cause application downtime...
● Solution: fleet
  – Highly available, fault tolerant, distributed process scheduler
● ... The way to run applications on a CoreOS cluster
  – fleet keeps applications running during server downtime
fleet – the “cluster-level init system”
fleet is the abstraction between machine and application:
  – an init system manages processes on a machine
  – fleet manages applications on a cluster of machines
Similar to Mesos, but a very different architecture (e.g. based on etcd/Raft, not Zookeeper/Paxos)
Uses systemd for machine-level process management, etcd for cluster-level co-ordination
fleet – low level view
● fleetd binary (running on all CoreOS nodes)
  – encapsulates two roles:
    • engine (cluster-level unit scheduling – talks to etcd)
    • agent (local unit management – talks to etcd and systemd)
● fleetctl command-line administration tool
  – create, destroy, start, stop units
  – retrieve the current status of units/machines in the cluster
● HTTP API
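A typical fleetctl session looks roughly like this (the commands are real; the exact output columns are approximate, from memory):

$ fleetctl start hello.service
Unit hello.service launched on 148a18ff.../10.10.1.1

$ fleetctl list-units
UNIT            MACHINE                 ACTIVE   SUB
hello.service   148a18ff.../10.10.1.1   active   running

$ fleetctl list-machines
MACHINE        IP          METADATA
148a18ff...    10.10.1.1   region=us-east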
fleet – high level view
[Diagram: fleet on a single machine vs. across the cluster]
systemd
● Linux init system (PID 1) – manages processes
  – Relatively new; replaces SysVinit, upstart, OpenRC, ...
  – Being adopted by all major Linux distributions
● Fundamental concept is the unit
  – Units include services (e.g. applications), mount points, sockets, timers, etc.
  – Each unit is configured with a simple unit file
Quick comparison
fleet + systemd
● systemd exposes a D-Bus interface
  – D-Bus: message bus system for IPC on Linux
  – One-to-one messaging (methods), plus pub/sub abilities
● fleet uses godbus to communicate with systemd
  – Sending commands: StartUnit, StopUnit
  – Retrieving the current state of units (to publish to the cluster)
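For flavour, a minimal sketch of driving systemd over D-Bus using the go-systemd dbus package (a higher-level wrapper over godbus); treat the exact signatures as approximate for the era:

package main

import (
	"fmt"

	"github.com/coreos/go-systemd/dbus"
)

func main() {
	// Connect to systemd's D-Bus API (roughly what fleet's agent does).
	conn, err := dbus.New()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Ask systemd to start a unit; the channel receives the job result
	// ("done", "failed", ...) when the job completes.
	result := make(chan string)
	if _, err := conn.StartUnit("hello.service", "replace", result); err != nil {
		panic(err)
	}
	fmt.Println("job result:", <-result)
}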
systemd is great!
Automatically handles:
  – Process daemonization
  – Resource isolation/containment (cgroups)
    • e.g. MemoryLimit=512M
  – Health-checking, restarting failed services
  – Logging (journal)
    • applications can just write to stdout, systemd adds metadata
  – Timers, inter-unit dependencies, socket activation, ...
fleet + systemd
● systemd takes care of things so we don't have to
● fleet configuration is just systemd unit files
● fleet extends systemd to the cluster level, and adds some features of its own (using [X-Fleet]):
  – Template units (run n identical copies of a unit)
  – Global units (run a unit everywhere in the cluster)
  – Machine metadata (run only on certain machines)
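For example, a hypothetical template unit ([email protected]) combining these options might look like the following (MachineMetadata and Conflicts are fleet's documented [X-Fleet] options; the values here are illustrative):

[Unit]
Description=Hello World

[Service]
ExecStart=/usr/bin/docker run busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"

[X-Fleet]
# Only schedule onto machines carrying this metadata...
MachineMetadata=region=us-east
# ...and never place two instances of this template on the same machine.
Conflicts=hello@*.service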
systemd is... not so great
Problem: unreliable pub/sub
  – fleet's agent initially used a systemd D-Bus subscription to track unit status
  – Every change in unit state in systemd triggers an event in fleet (e.g. “publish this new state to the cluster”)
  – Under heavy load, or byzantine conditions, unit state changes would be dropped
  – As a result, unit state in the cluster became stale
systemd is... not so great
Problem: unreliable pub/sub
Solution: polling for unit states
  – Every n seconds, retrieve the state of units from systemd and synchronize with the cluster
  – Less efficient, but much more reliable
  – Optimize by caching state and only publishing changes
  – Any state inconsistencies are quickly fixed
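A sketch of that polling loop in Go (fetchUnitStates and publish are hypothetical stand-ins for fleet's internals, not its actual API):

package main

import "time"

// fetchUnitStates would ask systemd (over D-Bus) for current unit states;
// publish would write one unit's state to etcd. Both are hypothetical.
func fetchUnitStates() map[string]string { return map[string]string{} }
func publish(unit, state string)         {}

func pollUnitStates(interval time.Duration, stop <-chan struct{}) {
	cache := map[string]string{}
	for {
		select {
		case <-stop:
			return
		case <-time.After(interval):
			current := fetchUnitStates()
			// Publish only units whose state changed since the last
			// poll; a dropped event is repaired on the next cycle.
			for unit, state := range current {
				if cache[unit] != state {
					publish(unit, state)
				}
			}
			cache = current
		}
	}
}

func main() {
	stop := make(chan struct{})
	go pollUnitStates(5*time.Second, stop)
	time.Sleep(12 * time.Second)
	close(stop)
}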
systemd (and docker) are... not so great
Problem: poor integration with Docker
  – Docker is the de facto application container manager
  – Docker and systemd do not always play nicely together...
  – Both Docker and systemd manage cgroups and processes, and when the two are trying to manage the same thing, the results are mixed
systemd (and docker) are... not so great
Example: sending signals to a container
  – Given a simple container:

[Service]
ExecStart=/usr/bin/docker run busybox /bin/bash -c \
  "while true; do echo Hello World; sleep 1; done"

  – Try to kill it with systemctl kill hello.service
  – ... Nothing happens
  – The kill command sends SIGTERM, but bash in a Docker container has PID 1, which happily ignores the signal...
systemd (and docker) are... not so great
Example: sending signals to a container
OK, SIGTERM didn't work, so escalate to SIGKILL:
  systemctl kill -s SIGKILL hello.service
● Now the systemd service is gone:
  hello.service: main process exited, code=killed, status=9/KILL
● But... the Docker container still exists?

# docker ps
CONTAINER ID   COMMAND                STATUS          NAMES
7c7cf8ffabb6   /bin/sh -c 'while tr   Up 31 seconds   hello

# ps -ef | grep '[d]ocker run'
root  24231  1  0 03:49 ?  00:00:00 /usr/bin/docker run -name hello ...
systemd (and docker) are... not so great
Why?
  – The Docker client does not run containers itself; it just sends a command to the Docker daemon, which actually forks and starts running the container
  – systemd expects processes to fork directly so they will be contained under the same cgroup tree
  – Since the Docker daemon's cgroup is entirely separate, systemd cannot keep track of the forked container
systemd (and docker) are... not so great
With a plain service, the processes sit under the service's cgroup:

# systemctl cat hello.service
[Service]
ExecStart=/bin/bash -c 'while true; do echo Hello World; sleep 1; done'

# systemd-cgls
...
├─hello.service
│ ├─23201 /bin/bash -c while true; do echo Hello World; sleep 1; done
│ └─24023 sleep 1

With Docker, only the client is in the service's cgroup; the container's processes land in a separate scope:

# systemctl cat hello.service
[Service]
ExecStart=/usr/bin/docker run -name hello busybox /bin/sh -c \
  "while true; do echo Hello World; sleep 1; done"

# systemd-cgls
...
│ ├─hello.service
│ │ └─24231 /usr/bin/docker run -name hello busybox /bin/sh -c while true; do echo Hello World; sleep 1; done
...
│ ├─docker-51a57463047b65487ec80a1dc8b8c9ea14a396c7a49c1e23919d50bdafd4fefb.scope
│ │ ├─24240 /bin/sh -c while true; do echo Hello World; sleep 1; done
│ │ └─24553 sleep 1
systemd (and docker) are... not so great
Problem: poor integration with Docker
Solution: ... work in progress
  – systemd-docker – a small application that moves the cgroups of Docker containers back under systemd's cgroup
  – Use Docker for image management, but systemd-nspawn for runtime (e.g. CoreOS's toolbox)
  – (proposed) Docker standalone mode: the client starts the container directly rather than through the daemon
fleet – high level view
[Diagram: fleet on a single machine vs. across the cluster]
etcd
● A consistent, highly available key/value store
  – Shared configuration, distributed locking, ...
● Driven by Raft
  – Consensus algorithm similar to Paxos
  – Designed for understandability and simplicity
● Popular and widely used
  – Simple HTTP API + libraries in Go, Java, Python, Ruby, ...
etcd connects CoreOS hosts
fleet + etcd
● fleet needs a consistent view of the cluster to make scheduling decisions: etcd provides this view
  – What units exist in the cluster?
  – What machines exist in the cluster?
  – What are their current states?
● All unit files, unit state, machine state and scheduling information is stored in etcd
etcd is great!
● Fast and simple API
● Handles all cluster-level/inter-machine communication so we don't have to
● Powerful primitives:
  – Compare-and-Swap allows for atomic operations and implementing locking behaviour
  – Watches (like pub/sub) provide event-driven behaviour
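A small sketch against the go-etcd client of this era (signatures from memory; treat as illustrative rather than authoritative):

package main

import (
	"fmt"

	"github.com/coreos/go-etcd/etcd"
)

func main() {
	// 4001 was etcd's default client port at the time.
	client := etcd.NewClient([]string{"http://127.0.0.1:4001"})

	// Plain write: publish a value to the whole cluster.
	if _, err := client.Set("/services/web/leader", "machine-1", 0); err != nil {
		fmt.Println("set failed:", err)
		return
	}

	// Compare-and-Swap: succeeds only if the current value is still
	// "machine-1" – the building block for locks and leader election.
	_, err := client.CompareAndSwap("/services/web/leader", "machine-2", 0, "machine-1", 0)
	if err != nil {
		fmt.Println("someone else changed the key first:", err)
		return
	}
	fmt.Println("took over leadership")
}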
etcd is... not so great
● Problem: unreliable watches
  – fleet initially used a purely event-driven architecture
  – watches in etcd were used to trigger events
    ● e.g. a “machine down” event triggers unit rescheduling
  – Highly efficient: only take action when necessary
  – Highly responsive: as soon as a user submits a new unit, an event is triggered to schedule that unit
  – Unfortunately, many places for things to go wrong...
Unreliable watches
● Example:
  – etcd is undergoing a leader election or is otherwise unavailable; watches do not work during this period
  – A change occurs (e.g. a machine leaves the cluster)
  – The event is missed – fleet doesn't know the machine is lost!
  – Now fleet doesn't know to reschedule the units that were running on that machine
etcd is... not so great
● Problem: limited event history
  – etcd retains a history of all events that occur
  – Can “watch” from an arbitrary point in the past, but...
  – History is a limited window!
  – With a busy cluster, watches can fall out of this window
  – Can't always replay the event stream from the point we want to
Limited event history
● Example:
  – etcd holds a history of the last 1000 events
  – fleet sets a watch at i=100 to watch for machine loss
  – Meanwhile, many changes occur in other parts of the keyspace, advancing the index to i=1500
  – A leader election/network hiccup occurs and severs the watch
  – fleet tries to recreate the watch at i=100 and fails:

err="401: The event in requested index is outdated and cleared (the requested history has been cleared [1500/100])"
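A sketch of this failure mode and one possible recovery, again with go-etcd (field and method names from memory; illustrative only):

package main

import (
	"fmt"

	"github.com/coreos/go-etcd/etcd"
)

func main() {
	client := etcd.NewClient([]string{"http://127.0.0.1:4001"})
	waitIndex := uint64(100)

	for {
		// Block until something under /machines changes after waitIndex.
		resp, err := client.Watch("/machines", waitIndex, true, nil, nil)
		if err != nil {
			// If waitIndex has fallen out of etcd's history window this
			// fails (the "401 ... outdated and cleared" error above);
			// the only safe recovery is to re-read the current state.
			resp, err = client.Get("/machines", false, true)
			if err != nil {
				panic(err)
			}
			waitIndex = resp.EtcdIndex + 1
			// ...reconcile against resp.Node here...
			continue
		}
		fmt.Println("change:", resp.Action, resp.Node.Key)
		waitIndex = resp.Node.ModifiedIndex + 1
	}
}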
etcd is... not so great
● Problem: unreliable watches
  – Missed events lead to unrecoverable situations
● Problem: limited event history
  – Can't always replay the entire event stream
● Solution: move to a “reconciler” model
Reconciler model
In a loop, run periodically until stopped:
1. Retrieve the current state (how the world is) and desired state (how the world should be) from the datastore (etcd)
2. Determine the actions necessary to transform current state --> desired state
3. Perform the actions and save the results as the new current state
Example: fleet's engine (scheduler) looks something like:

for { // loop forever
	select {
	case <-stopChan: // if stopped, exit
		return
	case <-time.After(5 * time.Minute):
		// periodically reconcile: fetch the world's state from etcd
		// and (re)schedule units onto machines as needed
		units := fetchUnits()
		machines := fetchMachines()
		schedule(units, machines)
	}
}
etcd is... not so great
● Problem: unreliable watches
  – Missed events lead to unrecoverable situations
● Problem: limited event history
  – Can't always replay the entire event stream
● Solution: move to a “reconciler” model
  – Less efficient, but extremely robust
  – Still many paths for optimisation (e.g. using watches to trigger reconciliations)
fleet – high level view
[Diagram: fleet on a single machine vs. across the cluster]
golang
● Standard language for all CoreOS projects (above the OS)
  – etcd
  – fleet
  – locksmith (semaphore for reboots during updates)
  – etcdctl, updatectl, coreos-cloudinit, ...
● fleet is ~10k LOC (and another ~10k LOC of tests)
Go is great!
● Fast!
  – to write (concise syntax)
  – to compile (builds typically <1s)
  – to run tests (O(seconds), including with race detection)
  – Never underestimate the power of rapid iteration
● Simple, powerful tooling
  – Built-in package management, code coverage, etc.
Go is great!
● Rich standard library
  – “Batteries are included”
  – e.g. a completely self-hosted HTTP server; no need for reverse proxies or worker systems to serve many concurrent HTTP requests
● Static compilation into a single binary
  – Ideal for a minimal OS with no libraries
Go is... not so great
● Problem: managing third-party dependencies
  – modular package management, but: no versioning
  – import “github.com/coreos/fleet” – which SHA?
● Solution: vendoring :-/
  – Copy the entire source tree of dependencies into the repository
  – Slowly maturing tooling: goven, third_party.go, Godep
Go is... not so great
● Problem: (relatively) large binary sizes
  – “relatively”, but... CoreOS is the minimal OS
  – ~10MB per binary; many tools, quickly adds up
● Solutions:
  – upgrading golang!
    ● go1.2 to go1.3 = ~25% reduction
  – sharing the binary between tools
Sharing a binary
● client/daemon often share much of the same code
  – Encapsulate multiple tools in one binary, symlink the different command names, and switch off the command name
  – Example: fleetd/fleetctl

func main() {
	// Dispatch on the name the binary was invoked as (in practice the
	// leading path would be trimmed, e.g. with filepath.Base).
	switch os.Args[0] {
	case "fleetctl":
		Fleetctl()
	case "fleetd":
		Fleetd()
	}
}

Before:
 9150032 fleetctl
 8567416 fleetd

After:
11052256 fleetctl
       8 fleetd -> fleetctl
Go is... not so great
● Problem: young language => immature libraries
  – CLI frameworks
  – godbus, go-systemd, go-etcd :-(
● Solutions:
  – Keep it simple
  – Roll your own (e.g. fleetctl's command line)
fleetctl CLI

type Command struct {
	Name        string                  // name of the subcommand
	Summary     string                  // one-line description
	Usage       string                  // usage string shown in help
	Description string                  // longer help text
	Flags       flag.FlagSet            // command-specific flags
	Run         func(args []string) int // returns the exit status
}
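A hypothetical sketch of how such a Command might be registered and dispatched (not fleet's actual code; the struct is trimmed to its essentials):

package main

import (
	"flag"
	"fmt"
	"os"
)

type Command struct {
	Name    string
	Summary string
	Flags   flag.FlagSet
	Run     func(args []string) int
}

// A toy registry in the spirit of fleetctl's hand-rolled CLI.
var commands = []*Command{
	{
		Name:    "list-units",
		Summary: "List units in the cluster",
		Run: func(args []string) int {
			fmt.Println("UNIT\tMACHINE\tACTIVE\tSUB")
			return 0
		},
	},
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fleetctl <command>")
		os.Exit(2)
	}
	for _, c := range commands {
		if c.Name == os.Args[1] {
			if err := c.Flags.Parse(os.Args[2:]); err != nil {
				os.Exit(2)
			}
			os.Exit(c.Run(c.Flags.Args()))
		}
	}
	fmt.Fprintf(os.Stderr, "unknown command: %s\n", os.Args[1])
	os.Exit(2)
}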
Wrap up/recap
● CoreOS Linux
  – Minimal OS with cluster capabilities built-in
  – Containerized applications --> a[u]tom[at]ic updates
● fleet
  – Simple, powerful cluster-level application manager
  – Glue between the local init system (systemd) and cluster-level awareness (etcd)
● golang++
Questions?
Thank you :-)
● Everything is open source – join us!
  – https://github.com/coreos
● Any more questions, feel free to email
  – [email protected]
● CoreOS stickers!
References
● CoreOS updates: https://coreos.com/using-coreos/updates/
● Omaha protocol: https://code.google.com/p/omaha/wiki/ServerProtocol
● Raft algorithm: http://raftconsensus.github.io/
● fleet: https://github.com/coreos/fleet
● etcd: https://github.com/coreos/etcd
● toolbox: https://github.com/coreos/toolbox
● systemd-docker: https://github.com/ibuildthecloud/systemd-docker
● systemd-nspawn: http://0pointer.de/public/systemd-man/systemd-nspawn.html
Brief side-note: locksmith
● Reboot manager for CoreOS
● Uses a semaphore in etcd to co-ordinate reboots
● Each machine in the cluster:
  1. Downloads and applies the update
  2. Takes the lock in etcd (using Compare-And-Swap)
  3. Reboots and releases the lock
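A simplified sketch of that semaphore in Go with go-etcd (the key name and single-slot semantics are illustrative, not locksmith's actual schema):

package main

import (
	"strconv"

	"github.com/coreos/go-etcd/etcd"
)

// tryTakeRebootLock attempts to claim one reboot slot; it returns false if
// no slot is free or another machine won the race.
func tryTakeRebootLock(client *etcd.Client) bool {
	const key = "/example.com/rebootlock/semaphore" // hypothetical key

	resp, err := client.Get(key, false, false)
	if err != nil {
		return false
	}
	free, err := strconv.Atoi(resp.Node.Value)
	if err != nil || free <= 0 {
		return false // no reboot slots available right now
	}
	// Atomically decrement: succeeds only if nobody else touched the key
	// since our read (prevIndex guards against races).
	_, err = client.CompareAndSwap(key, strconv.Itoa(free-1), 0,
		"", resp.Node.ModifiedIndex)
	return err == nil
}

func main() {
	client := etcd.NewClient([]string{"http://127.0.0.1:4001"})
	if tryTakeRebootLock(client) {
		// safe to reboot; release by incrementing the value afterwards
	}
}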