Ignacy Kowalczyk
Transcript of Ignacy Kowalczyk
Kubernetes: lessons learnt from cluster management at Google
Ignacy
Kowalczyk
Engineering
Manager
Image by Connie Zhou
For the past 15 years, Google has been building out the world’s fastest, most powerful, highest quality cloud infrastructure on the planet.
job hello_world = {
runtime = { cell = 'ic' } // Cell (data center) to run in
binary = '.../hello_world_webserver' // Program to run
args = { port = '%port%' } // Command line parameters
requirements = { // Resource requirements (optional)
ram = 100M
disk = 100M
cpu = 0.1
}
replicas = 10000 // Number of tasks
}
User view
Google Cloud PlatformImages by Connie
Zhou
Google has been developing and using containers to manage our applications for over 10 years.
Google Cloud Platform
Everything at Google runs in containers:
• Gmail, Web Search, Maps, ...• MapReduce, batch, ...• GFS, Colossus, ...• Even Google’s Cloud Platform:
our VMs run in containers!
We launch over 2 billion containers per week
What justhappened?
web browsers
BorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shard
Cell
Scheduler
borgcfg web browsers
scheduler
Borglet Borglet Borglet Borglet
BorgMaster
link shard
read/UI shard
Config file
persistent store (Paxos)
Binary
User view
What justhappened?
web browsers
BorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shard
Cell
Scheduler
borgcfg web browsers
scheduler
Borglet Borglet Borglet Borglet
BorgMaster
link shard
read/UI shard
Config file
persistent store (Paxos)
Binary
User view
What justhappened?
web browsers
BorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shard
Cell
Scheduler
borgcfg web browsers
scheduler
Borglet Borglet Borglet Borglet
BorgMaster
link shard
read/UI shard
Config file
persistent store (Paxos)
Binary
User view
What justhappened?
web browsers
BorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shard
Cell
Scheduler
borgcfg web browsers
scheduler
Borglet Borglet Borglet Borglet
BorgMaster
link shard
read/UI shard
Config file
persistent store (Paxos)
Binary
User view
What justhappened?
web browsers
BorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shard
Cell
Scheduler
borgcfg web browsers
scheduler
Borglet Borglet Borglet Borglet
BorgMaster
link shard
read/UI shard
Config file
persistent store (Paxos)
Binary
User view
What justhappened?
web browsers
BorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shardBorgMaster
link shard
UI shard
Cell
Scheduler
borgcfg web browsers
scheduler
Borglet Borglet Borglet Borglet
BorgMaster
link shard
read/UI shard
Config file
persistent store (Paxos)
Binary
User view
Hello world!
Hello world!
Hello world!
Hello world!Hello
world! Hello world! Hello
world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world! Hello
world!
Hello world!
Hello world!
Hello world!
Image by Connie Zhou
User view
Hello world!
Hello world!
Hello world! Hello
world!
Hello world! Hello
world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world! Hello
world!
Hello world! Hello
world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world! Hello
world!
Hello world! Hello
world!
Hello world!
Hello world!
task-eviction rates and causes
Failures
Images by Connie Zhou
A 2000-machine service will have >10 task exits per dayThis is not a problem: it's normal
Failures
Google Cloud Platform
Pets Cattle
Failures
Advanced bin-packing algorithms
Experimental placement of production VM workload, July 2014
Efficiency
stranded resourcesavailable resourcesone
machine
machine image locked into a
platform
Plain IaaS downsides:Not portable & Opaque
Hypervisor
Guest environment
app code
libraries
guest kernel
Efficiency
Plain IaaS downsides:No Isolation
Hypervisor
Guest environment
app code
libraries
guest kernel
dependency???app code
Efficiency
Plain IaaS downsides:Little Reuse
Hypervisor
Guest environment
app code
libraries
guest kernel
Guest environment
app code
libraries
guest kernel
Guest environment
app code
libraries
guest kernelredundant
Efficiency
Containers create a better abstraction layer
Hypervisor
Guest environment
app code
libraries
guest kernel
cut here
Efficiency
Node environment
Much better: Portable, isolated, static app environments
Hypervisor
node kernel
app code
libraries
app code
libraries
app code
libraries
container 1 container 2 container 3
Efficiency
Key observation from Google’s experience:
A datacenter is not a collection of computers,a datacenter is a computer.
Google Cloud Platform
Kubernetes
Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
• Runs and manages containers
• Inspired by Borg
• Runs on VMs and bare metal
• 100% Open source, written in Go
Run applications on clustersnot processes on machines
Google Cloud Platform
Kubernetes
Getting startedhttp://kubernetes.io/docs/hellonode/
Google Cloud Platform
Containers
Google confidential │ Do not distribute
Docker containers
● Isolation - OS-level virtualization features:○ cgroups - access to resources (CPU, RAM, I/O○ namespaces - isolated filesystems, networking○ Copy-on-Write file systems
● Packaging○ Repositories of images○ Overlay file systems
Google Cloud Platform
Pods
Google Cloud Platform
Pods
Small group of tightly coupled containers & volumes
The atom of scheduling & placement
Shared namespaces• shared IP address & localhost• shared IPC, etc.• shared volumes
Example: data puller & web server
ConsumersContent Manager
File Puller
Web Server
Volume
Pod
Google Cloud Platform
Labels & Selectors
Google Cloud Platform
Arbitrary metadata
Attached to any API object
Generally represent identity
Queryable by selectors• think SQL ‘select ... where ...’
Labels
Google Cloud Platform
App: MyApp
Phase: prod
Role: FE
App: MyApp
Phase: test
Role: FE
App: MyApp
Phase: prod
Role: BE
App: MyApp
Phase: test
Role: BE
Selectors
Google Cloud Platform
App: MyApp
Phase: prod
Role: FE
App: MyApp
Phase: test
Role: FE
App: MyApp
Phase: prod
Role: BE
App: MyApp
Phase: test
Role: BE
App = MyApp
Selectors
Google Cloud Platform
App: MyApp
Phase: prod
Role: FE
App: MyApp
Phase: test
Role: FE
App: MyApp
Phase: prod
Role: BE
App: MyApp
Phase: test
Role: BE
App = MyApp, Role = FE
Selectors
Google Cloud Platform
App: MyApp
Phase: prod
Role: FE
App: MyApp
Phase: test
Role: FE
App: MyApp
Phase: prod
Role: BE
App: MyApp
Phase: test
Role: BE
App = MyApp, Role = BE
Selectors
Google Cloud Platform
Selectors
App: MyApp
Phase: prod
Role: FE
App: MyApp
Phase: test
Role: FE
App: MyApp
Phase: prod
Role: BE
App: MyApp
Phase: test
Role: BE
App = MyApp, Phase = prod
Google Cloud Platform
App: MyApp
Phase: prod
Role: FE
App: MyApp
Phase: test
Role: FE
App: MyApp
Phase: prod
Role: BE
App: MyApp
Phase: test
Role: BE
App = MyApp, Phase = test
Selectors
Google Cloud Platform
ReplicationControllers
Google Cloud Platform
ReplicationControllers
A simple control loop
Has 1 job: ensure N copies of a pod• if too few, start some• if too many, kill some• grouped by a selector
Cleanly layered on top of the core• all access is by public APIs
Replicated pods are fungible• No implied order or identity
ReplicationController- name = “my-rc”- selector = {“App”: “MyApp”}- podTemplate = { ... }- replicas = 4
API Server
How many?
3
Start 1 more
OK
How many?
4
Google Cloud Platform
Services
Google Cloud Platform
Services
A group of pods that work together• grouped by a selector
Gets a stable virtual IP and port• sometimes called the service portal• also a DNS name
Hides complexity
Client
Virtual IP
Google Cloud Platform
Networking
Google Cloud Platform
Kubernetes networking
A: 172.16.1.1
3306
B: 172.16.1.2
80
9376
11878SNAT
SNATC: 172.16.1.1
8000
Google Cloud Platform
Kubernetes networking
A: 172.16.1.1
3306
B: 172.16.1.2
80
9376
11878SNAT
SNATC: 172.16.1.1
8000REJECTED
Google Cloud Platform
Kubernetes networking
10.1.1.0/24
10.1.1.1
10.1.1.2
10.1.2.0/2410.1.2.1
10.1.3.0/24
10.1.3.1
Design principles
Declarative > imperative: State your desired results, let the system actuate
Control loops: Observe, rectify, repeat
Network-centric: IP addresses are cheap
Cattle > Pets: Manage your workload in bulk
Borg contributors
Core: Abhishek Rai, Abhishek Verma, Andy Zheng, Ashwin Kumar, Ben Smith, Beng-Hong Lim, Bin Zhang, Bolu Szewczyk, Brad Strand, Brian Budge, Brian Grant, Brian Wickman, Chengdu Huang, Chris Colohan, Cliff Stein, Cynthia Wong, Daniel Smith, Dave Bort, David Oppenheimer, David Wall, Divyesh Shah, Dawn Chen, Eric Haugen, Eric Tune, Eric Wilcox, Ethan Solomita, Gaurav Dhiman, Geeta Chaudhry, Greg Roelofs, Grzegorz Czajkowski, James Eady, Jarek Kusmierek, Jaroslaw Przybylowicz, Jason Hickey, Javier Kohen, Jeff Dean, Jeremy Dion, Jeremy Lau, Jerzy Szczepkowski, Joe Hellerstein, John Wilkes, Jonathan Wilson, Joso Eterovic, Jutta Degener, Kai Backman, Kamil Yurtsever, Ken Ashcraft, Kenji Kaneda, Kevan Miller, Kurt Steinkraus, Leo Landa, Liza Fireman, Madhukar Korupolu, Maricia Scott, Mark Logan, Mark Vandevoorde, Markus Gutschke, Matt Sparks, Maya Haridasan, Michael Abd-El-Malek, Michael Kenniston, Ming-Yee Iu, Monika Henzinger, Mukesh Kumar, Nate Calvin, Onufry Wojtaszczyk, Olcan Sercinoglu, Paul Menage, Patrick Johnson, Pavanish Nirula, Pedro Valenzuela, Percy Liang, Piotr Witusowski, Praveen Kallakuri, Rafal Sokolowski, Rajmohan Rajaraman, Richard Gooch, Rishi Gosalia, Rob Radez, Robert Hagmann, Robert Jardine, Robert Kennedy, Rohit Jnagal, Roy Bryant, Rune Dahl, Scott Garriss, Scott Johnson, Sean Howarth, Sheena Madan, Smeeta Jalan, Stan Chesnutt, Temo Arobelidze, Tim Hockin, Todd Wang, Tomasz Blaszczyk, Tomasz Wozniak, Tomek Zielonka, Victor Marmol, Vish Kannan, Vrigo Gokhale, Walfredo Cirne, Walt Drummond, Weiran Liu, Xiaopan Zhang, Xiao Zhang, Ye Zhao, and Zohaib Maya.SRE: Adam Rogoyski, Alex Milivojevic, Anil Das, Cody Smith, Cooper Bethea, Folke Behrens, Matt Liggett, James Sanford, John Millikin, Matt Brown, Miki Habryn, Peter Dahl, Robert van Gent, Seppi Wilhelmi, Seth Hettich, Torsten Marek, and Viraj Alankar.BCL and borgcfg: Marcel van Lohuizen and Robert Griesemer.Reviewers: Christos Kozyrakis, Eric Brewer, Malte Schwarzkopf, and Tom Rodeheffer.
K8s is helping you with ● deployment automation● utilization optimization
Q&A
kubernetes.io/[email protected]
Ignacy Kowalczyk
EngineeringManager