Ignacy Kowalczyk

Post on 15-Apr-2017


Kubernetes: lessons learnt from cluster management at Google

Ignacy Kowalczyk

Engineering Manager

Google

Image by Connie Zhou

For the past 15 years, Google has been building out the fastest, most powerful, highest-quality cloud infrastructure on the planet.

job hello_world = {
  runtime = { cell = 'ic' }             // Cell (data center) to run in
  binary = '.../hello_world_webserver'  // Program to run
  args = { port = '%port%' }            // Command line parameters
  requirements = {                      // Resource requirements (optional)
    ram = 100M
    disk = 100M
    cpu = 0.1
  }
  replicas = 10000                      // Number of tasks
}

User view
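As a rough sanity check on the hello_world job, its aggregate footprint can be worked out directly from the per-task requirements and the replica count. This is a minimal sketch, assuming that `M` means megabytes and that every task receives exactly the stated requirements; the names `TaskRequirements` and `job_footprint` are hypothetical and not part of Borg.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRequirements:
    """Per-task requirements, kept as integers to avoid float rounding."""
    ram_mb: int          # 'ram = 100M' in the config (assumed megabytes)
    disk_mb: int         # 'disk = 100M' in the config (assumed megabytes)
    cpu_millicores: int  # 'cpu = 0.1' expressed as 100 millicores


def job_footprint(req: TaskRequirements, replicas: int) -> dict:
    """Total resources the job asks for across all replicas."""
    return {
        "ram_mb": req.ram_mb * replicas,
        "disk_mb": req.disk_mb * replicas,
        "cpu_cores": req.cpu_millicores * replicas / 1000,
    }


hello_world = TaskRequirements(ram_mb=100, disk_mb=100, cpu_millicores=100)
total = job_footprint(hello_world, replicas=10000)
print(total)  # {'ram_mb': 1000000, 'disk_mb': 1000000, 'cpu_cores': 1000.0}
```

Under these assumptions, a dozen lines of configuration request about 1000 cores and roughly a terabyte of RAM.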

Google Cloud Platform

Images by Connie Zhou

Google has been developing and using containers to manage our applications for over 10 years.


Everything at Google runs in containers:

• Gmail, Web Search, Maps, ...
• MapReduce, batch, ...
• GFS, Colossus, ...
• Even Google’s Cloud Platform: our VMs run in containers!

We launch over 2 billion containers per week

What just happened?

[Diagram: borgcfg and web browsers talk to the BorgMaster of a cell. The BorgMaster is replicated, each replica with a link shard and a read/UI shard, and is backed by a persistent store (Paxos). A scheduler assigns work, and a Borglet runs on each machine. The inputs are the config file and the binary.]


User view

Hello world! [repeated across the slide]

Image by Connie Zhou


task-eviction rates and causes

Failures

Images by Connie Zhou

A 2000-machine service will have >10 task exits per day.

This is not a problem: it's normal.

Failures
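To see why more than 10 task exits per day counts as normal at this scale, the figure can be turned into a per-machine rate. This is a minimal sketch, assuming exits are independent and arrive at a constant rate (a Poisson process). That model is an assumption made for illustration, not something the talk states.

```python
import math

MACHINES = 2000
EXITS_PER_DAY = 10  # the slide says ">10", so treat this as a lower bound


def prob_at_least_one_exit(rate_per_day: float, days: float) -> float:
    """P(at least one exit in the window) under a Poisson model."""
    return 1.0 - math.exp(-rate_per_day * days)


# Average number of machine-days between exits on any single machine.
machine_days_per_exit = MACHINES / EXITS_PER_DAY
print(machine_days_per_exit)  # 200.0

# Chance that the service as a whole sees at least one exit today.
p_today = prob_at_least_one_exit(EXITS_PER_DAY, days=1.0)
print(round(p_today, 4))
```

Each individual machine looks reliable, with one exit every 200 days or so, yet the service as a whole is almost certain to lose a task on any given day, so the software has to treat exits as routine.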


Pets Cattle

Failures

Advanced bin-packing algorithms

Experimental placement of production VM workload, July 2014

Efficiency

[Figure legend: stranded resources; available resources; one machine]
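The idea of bin-packing and of stranded resources can be illustrated with a toy placement routine. This is a minimal sketch using first-fit decreasing over two resource dimensions, a textbook heuristic chosen for brevity. It is not the algorithm Borg uses, and the `Machine` class and all task sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    cpu: int  # free CPU, in millicores
    ram: int  # free RAM, in MiB
    tasks: list = field(default_factory=list)

    def fits(self, cpu: int, ram: int) -> bool:
        return cpu <= self.cpu and ram <= self.ram

    def place(self, name: str, cpu: int, ram: int) -> None:
        self.cpu -= cpu
        self.ram -= ram
        self.tasks.append(name)


def first_fit_decreasing(tasks, machines):
    """Place (name, cpu, ram) tasks, largest first; return names that did not fit."""
    unplaced = []
    for name, cpu, ram in sorted(tasks, key=lambda t: (t[1], t[2]), reverse=True):
        for machine in machines:
            if machine.fits(cpu, ram):
                machine.place(name, cpu, ram)
                break
        else:
            unplaced.append(name)
    return unplaced


machines = [Machine(cpu=4000, ram=8192), Machine(cpu=4000, ram=8192)]
tasks = [("a", 3000, 2048), ("b", 1000, 6144), ("c", 2000, 4096), ("d", 2000, 4096)]
unplaced = first_fit_decreasing(tasks, machines)
print(unplaced, [(m.tasks, m.cpu, m.ram) for m in machines])

# A RAM-heavy task strands CPU: 3000 millicores stay free, but no task
# that needs any RAM can use them.
lonely = Machine(cpu=4000, ram=8192)
first_fit_decreasing([("ram-hog", 1000, 8192)], [lonely])
print(lonely.cpu, lonely.ram, lonely.fits(1, 1))
```

In the first run the four tasks fill both machines exactly. In the second, one dimension is exhausted while the other sits idle, which is what "stranded" means in the legend.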

Plain IaaS downsides: Not portable & Opaque

machine image locked into a platform

Hypervisor

Guest environment

app code

libraries

guest kernel

Efficiency

Plain IaaS downsides: No Isolation

Hypervisor

Guest environment

app code

libraries

guest kernel

dependency???

app code

Efficiency

Plain IaaS downsides: Little Reuse

Hypervisor

Guest environment

app code

libraries

guest kernel

Guest environment

app code

libraries

guest kernel

Guest environment

app code

libraries

guest kernel

redundant

Efficiency

Containers create a better abstraction layer

Hypervisor

Guest environment

app code

libraries

guest kernel

cut here

Efficiency

Node environment

Much better: Portable, isolated, static app environments

Hypervisor

node kernel

app code

libraries

app code

libraries

app code

libraries

container 1 container 2 container 3

Efficiency

Key observation from Google’s experience:

A datacenter is not a collection of computers; a datacenter is a computer.


Kubernetes

Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”

• Runs and manages containers

• Inspired by Borg

• Runs on VMs and bare metal

• 100% Open source, written in Go

Run applications on clusters, not processes on machines


Kubernetes

Getting started

http://kubernetes.io/docs/hellonode/

Containers

Docker containers

● Isolation - OS-level virtualization features:
  ○ cgroups - access to resources (CPU, RAM, I/O)
  ○ namespaces - isolated filesystems, networking
  ○ Copy-on-Write file systems

● Packaging
  ○ Repositories of images
  ○ Overlay file systems

Pods

Small group of tightly coupled containers & volumes

The atom of scheduling & placement

Shared namespaces
• shared IP address & localhost
• shared IPC, etc.
• shared volumes

Example: data puller & web server

Consumers

Content Manager

File Puller

Web Server

Volume

Pod
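The data puller and web server example can be written down as a pod manifest. This is a minimal sketch that builds the manifest as a Python dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The pod name, image names, and mount paths are hypothetical; the field names (`containers`, `volumes`, `volumeMounts`, `emptyDir`) follow the core v1 Pod API.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "content-server", "labels": {"App": "MyApp"}},
    "spec": {
        # One scratch volume that lives as long as the pod does.
        "volumes": [{"name": "content", "emptyDir": {}}],
        "containers": [
            {
                "name": "file-puller",
                "image": "example.com/file-puller:1.0",  # hypothetical image
                "volumeMounts": [{"name": "content", "mountPath": "/data"}],
            },
            {
                "name": "web-server",
                "image": "example.com/web-server:1.0",  # hypothetical image
                "ports": [{"containerPort": 80}],
                "volumeMounts": [
                    {"name": "content", "mountPath": "/srv/www", "readOnly": True}
                ],
            },
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Both containers mount the same volume, so the puller can write files that the web server reads, and because they share one network namespace they can also reach each other on localhost.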

Labels & Selectors

Arbitrary metadata

Attached to any API object

Generally represent identity

Queryable by selectors
• think SQL ‘select ... where ...’

Labels

App: MyApp, Phase: prod, Role: FE

App: MyApp, Phase: test, Role: FE

App: MyApp, Phase: prod, Role: BE

App: MyApp, Phase: test, Role: BE

App = MyApp (selects all four pods)

App = MyApp, Role = FE (selects the two front-end pods)

App = MyApp, Role = BE (selects the two back-end pods)

App = MyApp, Phase = prod (selects the two prod pods)

App = MyApp, Phase = test (selects the two test pods)

Selectors
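Equality-based selection over the four labeled pods can be sketched in a few lines. This is a minimal illustration, assuming a selector is a set of key/value pairs that must all match. The pod names and the `matches` and `select` helpers are hypothetical, not Kubernetes API calls.

```python
def matches(labels: dict, selector: dict) -> bool:
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects: dict, selector: dict) -> list:
    """Names of the objects whose labels satisfy the selector, sorted."""
    return sorted(name for name, labels in objects.items() if matches(labels, selector))


pods = {
    "fe-prod": {"App": "MyApp", "Phase": "prod", "Role": "FE"},
    "fe-test": {"App": "MyApp", "Phase": "test", "Role": "FE"},
    "be-prod": {"App": "MyApp", "Phase": "prod", "Role": "BE"},
    "be-test": {"App": "MyApp", "Phase": "test", "Role": "BE"},
}

print(select(pods, {"App": "MyApp"}))                   # all four pods
print(select(pods, {"App": "MyApp", "Role": "FE"}))     # ['fe-prod', 'fe-test']
print(select(pods, {"App": "MyApp", "Phase": "prod"}))  # ['be-prod', 'fe-prod']
```

The same set of objects can be sliced along any label dimension without deciding on a hierarchy up front, which is the point of the SQL analogy.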

ReplicationControllers

A simple control loop

Has 1 job: ensure N copies of a pod
• if too few, start some
• if too many, kill some
• grouped by a selector

Cleanly layered on top of the core
• all access is by public APIs

Replicated pods are fungible
• No implied order or identity

ReplicationController
- name = “my-rc”
- selector = {“App”: “MyApp”}
- podTemplate = { ... }
- replicas = 4

API Server

How many?

3

Start 1 more

OK

How many?

4
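The exchange with the API server (How many? 3. Start 1 more. OK. How many? 4.) is one pass of a control loop. This is a minimal sketch of that loop against an in-memory stand-in; `FakeCluster` and `reconcile` are hypothetical names, and a real controller would call the Kubernetes API instead.

```python
import itertools


class FakeCluster:
    """In-memory stand-in for the API server (hypothetical, for illustration)."""

    def __init__(self):
        self.pods = {}  # pod name -> labels
        self._ids = itertools.count(1)

    def list_pods(self, selector):
        return sorted(
            name for name, labels in self.pods.items()
            if all(labels.get(k) == v for k, v in selector.items())
        )

    def create_pod(self, labels):
        self.pods[f"pod-{next(self._ids)}"] = dict(labels)

    def delete_pod(self, name):
        del self.pods[name]


def reconcile(cluster, selector, template_labels, replicas):
    """One pass of the loop: observe the count, then start or kill pods."""
    running = cluster.list_pods(selector)
    for _ in range(replicas - len(running)):  # too few: start some
        cluster.create_pod(template_labels)
    for name in running[replicas:]:           # too many: kill some
        cluster.delete_pod(name)
    return len(cluster.list_pods(selector))


cluster = FakeCluster()
labels = {"App": "MyApp"}
for _ in range(3):
    cluster.create_pod(labels)

print(reconcile(cluster, labels, labels, replicas=4))  # 3 running, start 1 more: 4
print(reconcile(cluster, labels, labels, replicas=2))  # 4 running, kill 2: 2
```

Because the loop compares observed state with desired state on every pass, it recovers from failures without needing to know what caused them.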

Services

A group of pods that work together
• grouped by a selector

Gets a stable virtual IP and port
• sometimes called the service portal
• also a DNS name

Hides complexity

Client

Virtual IP
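What a service hides can be shown with a toy model: clients always use one virtual IP, while the set of pods behind it changes. This is a minimal sketch; the `Service` class, the addresses, and the round-robin choice of backend are assumptions made for illustration, not a description of how Kubernetes implements services.

```python
import itertools


class Service:
    """A stable virtual IP and port in front of whichever pods match the selector."""

    def __init__(self, virtual_ip, port, selector):
        self.virtual_ip = virtual_ip
        self.port = port
        self.selector = selector
        self._counter = itertools.count()

    def endpoints(self, pods):
        """Pod IPs whose labels satisfy the selector, in a stable order."""
        return sorted(
            ip for ip, labels in pods.items()
            if all(labels.get(k) == v for k, v in self.selector.items())
        )

    def route(self, pods):
        """Pick a backend for one connection to the virtual IP (round-robin)."""
        backends = self.endpoints(pods)
        if not backends:
            raise LookupError("no pods match the selector")
        return backends[next(self._counter) % len(backends)]


pods = {
    "10.1.1.1": {"App": "MyApp", "Role": "FE"},
    "10.1.2.1": {"App": "MyApp", "Role": "FE"},
    "10.1.3.1": {"App": "MyApp", "Role": "BE"},
}
frontend = Service("10.0.0.10", 80, {"App": "MyApp", "Role": "FE"})  # hypothetical VIP
first_three = [frontend.route(pods) for _ in range(3)]
print(first_three)  # ['10.1.1.1', '10.1.2.1', '10.1.1.1']

# Pods come and go; clients keep using the same virtual IP.
del pods["10.1.1.1"]
after_removal = frontend.route(pods)
print(after_removal)  # 10.1.2.1
```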

Networking

Kubernetes networking

[Diagram: containers A (172.16.1.1, port 3306), B (172.16.1.2, port 80) and C (172.16.1.1, port 8000), with ports 9376 and 11878 and SNAT on the paths between them. In the second frame a REJECTED label appears next to port 8000.]

10.1.1.0/24: 10.1.1.1, 10.1.1.2

10.1.2.0/24: 10.1.2.1

10.1.3.0/24: 10.1.3.1
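The addressing scheme on this slide can be checked with Python's standard `ipaddress` module. This is a minimal sketch, assuming each /24 belongs to one node and that all of them are carved out of a 10.1.0.0/16 cluster range; the node names and the cluster range are assumptions that do not appear on the slide.

```python
import ipaddress

CLUSTER_CIDR = ipaddress.ip_network("10.1.0.0/16")  # assumed cluster-wide range

# One /24 per node (assumed); node names are hypothetical.
node_subnets = {
    "node-1": ipaddress.ip_network("10.1.1.0/24"),
    "node-2": ipaddress.ip_network("10.1.2.0/24"),
    "node-3": ipaddress.ip_network("10.1.3.0/24"),
}


def node_for(pod_ip: str) -> str:
    """Find which node's subnet owns a pod IP."""
    address = ipaddress.ip_address(pod_ip)
    for node, subnet in node_subnets.items():
        if address in subnet:
            return node
    raise LookupError(f"{pod_ip} is not in any node subnet")


for ip in ["10.1.1.1", "10.1.1.2", "10.1.2.1", "10.1.3.1"]:
    print(ip, "->", node_for(ip))
```

Because the subnets do not overlap, every pod IP is unique across the cluster and can be routed directly, with no address translation or port remapping.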

Design principles

Declarative > imperative: State your desired results, let the system actuate

Control loops: Observe, rectify, repeat

Network-centric: IP addresses are cheap

Cattle > Pets: Manage your workload in bulk

Borg contributors

Core: Abhishek Rai, Abhishek Verma, Andy Zheng, Ashwin Kumar, Ben Smith, Beng-Hong Lim, Bin Zhang, Bolu Szewczyk, Brad Strand, Brian Budge, Brian Grant, Brian Wickman, Chengdu Huang, Chris Colohan, Cliff Stein, Cynthia Wong, Daniel Smith, Dave Bort, David Oppenheimer, David Wall, Divyesh Shah, Dawn Chen, Eric Haugen, Eric Tune, Eric Wilcox, Ethan Solomita, Gaurav Dhiman, Geeta Chaudhry, Greg Roelofs, Grzegorz Czajkowski, James Eady, Jarek Kusmierek, Jaroslaw Przybylowicz, Jason Hickey, Javier Kohen, Jeff Dean, Jeremy Dion, Jeremy Lau, Jerzy Szczepkowski, Joe Hellerstein, John Wilkes, Jonathan Wilson, Joso Eterovic, Jutta Degener, Kai Backman, Kamil Yurtsever, Ken Ashcraft, Kenji Kaneda, Kevan Miller, Kurt Steinkraus, Leo Landa, Liza Fireman, Madhukar Korupolu, Maricia Scott, Mark Logan, Mark Vandevoorde, Markus Gutschke, Matt Sparks, Maya Haridasan, Michael Abd-El-Malek, Michael Kenniston, Ming-Yee Iu, Monika Henzinger, Mukesh Kumar, Nate Calvin, Onufry Wojtaszczyk, Olcan Sercinoglu, Paul Menage, Patrick Johnson, Pavanish Nirula, Pedro Valenzuela, Percy Liang, Piotr Witusowski, Praveen Kallakuri, Rafal Sokolowski, Rajmohan Rajaraman, Richard Gooch, Rishi Gosalia, Rob Radez, Robert Hagmann, Robert Jardine, Robert Kennedy, Rohit Jnagal, Roy Bryant, Rune Dahl, Scott Garriss, Scott Johnson, Sean Howarth, Sheena Madan, Smeeta Jalan, Stan Chesnutt, Temo Arobelidze, Tim Hockin, Todd Wang, Tomasz Blaszczyk, Tomasz Wozniak, Tomek Zielonka, Victor Marmol, Vish Kannan, Vrigo Gokhale, Walfredo Cirne, Walt Drummond, Weiran Liu, Xiaopan Zhang, Xiao Zhang, Ye Zhao, and Zohaib Maya.

SRE: Adam Rogoyski, Alex Milivojevic, Anil Das, Cody Smith, Cooper Bethea, Folke Behrens, Matt Liggett, James Sanford, John Millikin, Matt Brown, Miki Habryn, Peter Dahl, Robert van Gent, Seppi Wilhelmi, Seth Hettich, Torsten Marek, and Viraj Alankar.

BCL and borgcfg: Marcel van Lohuizen and Robert Griesemer.

Reviewers: Christos Kozyrakis, Eric Brewer, Malte Schwarzkopf, and Tom Rodeheffer.

K8s is helping you with
● deployment automation
● utilization optimization

Q&A

kubernetes.io/

ignac@google.com

Ignacy Kowalczyk

Engineering Manager

Google