Federation of Kubernetes Clusters (a.k.a. "Ubernetes") - KubeCon 2015 slides - Quinton Hoole

Post on 12-Jan-2017


Federation of Kubernetes Clusters ("Übernetes")
KubeCon 2015

Quinton Hoole <quinton@google.com>
Staff Software Engineer - Google
quinton_hoole@github

Google has beeeg data centers...
... but you know that already.

Images by Connie Zhou

But we also have rather a lot of them...

Treating these differently can have benefits...

[Diagram: Kubernetes. Users -> UI / CLI / API -> Control Plane Servers -> containers, all within one Cluster / Data Center / Availability Zone]

All you really care about?

[Diagram: UI, API, Containers]

[Diagram: Federation. Users -> UI / CLI / API -> Übernetes Control Plane -> API of each member cluster: two "Kubernetes on ..." clusters (labels truncated in the source) and Kubernetes on Premise]

Why is this interesting?

Reason 1: High Availability

• Cloud providers have outages, yes, but...
• Has one of your application software upgrades ever gone terribly wrong?
• How about infrastructure upgrades (auth systems? quota? data store?)
• How about a fat-fingered config change?
• There are several interesting variants:
  • Multiple availability zones?
  • Multiple cloud providers?

[Diagram: Your paying customer -> Cross-cluster Load Balancer -> Cluster 1, Cluster 2, Cluster 3]

Reason 2: Application Migration

• Migrating applications between clusters is tedious and error-prone if done manually
• Much like software upgrades, you *can* script them, but (K)ubernetes just does it quicker/safer/better.
• Now with rollback too!
• On-premise ↔ Cloud
• Amazon ↔ Google :-)
• ...

[Diagram: Ubernetes UI in front of an On-Premise Cluster, an In-Cloud Cluster and a Different Cloud Provider. Migrate: On Premise → Cloud]

Reason 3: Policy Enforcement

• Some data must be stored and processed within specified political jurisdictions, by law.

• Some software/data must be on premise and air-gapped, by company policy.

• Some business units get to use the expensive gear, some don't.

• Auditing is also a big deal, so funnelling all operations through a central control point makes this easier.

[Diagram: Ubernetes UI in front of a U.S. Cloud Cluster, an E.U. Cloud Cluster and an On-premise Cluster]
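The policy bullets above can be read as a filter over candidate clusters. The following is only a minimal sketch of that idea, not the Ubernetes implementation: the `Cluster` and `Policy` types, their fields and the cluster names are hypothetical and chosen to mirror the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    jurisdiction: str   # hypothetical label, e.g. "US" or "EU"
    on_premise: bool


@dataclass(frozen=True)
class Policy:
    allowed_jurisdictions: frozenset
    require_on_premise: bool = False


def eligible_clusters(clusters, policy):
    """Return the names of clusters a workload with this policy may use."""
    return [
        c.name
        for c in clusters
        if c.jurisdiction in policy.allowed_jurisdictions
        and (c.on_premise or not policy.require_on_premise)
    ]


# Hypothetical clusters mirroring the diagram.
CLUSTERS = [
    Cluster("us-cloud", "US", on_premise=False),
    Cluster("eu-cloud", "EU", on_premise=False),
    Cluster("on-premise", "US", on_premise=True),
]

eu_only = Policy(frozenset({"EU"}))
air_gapped = Policy(frozenset({"US", "EU"}), require_on_premise=True)

print(eligible_clusters(CLUSTERS, eu_only))
print(eligible_clusters(CLUSTERS, air_gapped))
```

Because every placement request passes through one function, that function is also a natural place to record an audit trail.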

Reason 4: Vendor Lock-in Avoidance

• Make it easy to migrate applications between cloud providers.

• Run the same app on multiple cloud providers and choose the best one for your:
  • workload characteristics
  • budget
  • performance requirements
  • availability requirements

[Diagram: Ubernetes UI in front of Kubernetes on GCE, Kubernetes on AWS and Kubernetes On-Premise]

Reason 5: Capacity Overflow

• Make intelligent placement decisions
  • Utilization
  • Cost
  • Performance

[Diagram: User -> "Run my stuff" -> Ubernetes -> On Premise Cluster, Preferred Cloud Provider, Other Cloud Provider]
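One way to picture capacity overflow is a greedy placement that fills the cheapest cluster first and spills the remainder into the next one. This is a sketch under stated assumptions, not how Ubernetes schedules: the `Cluster` fields, the cost figures and the cheapest-first rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    free_slots: int          # hypothetical: how many more replicas fit
    cost_per_replica: float  # hypothetical relative cost


def place(replicas, clusters):
    """Fill the cheapest clusters first and overflow into the next one."""
    placement = {}
    remaining = replicas
    for cluster in sorted(clusters, key=lambda c: c.cost_per_replica):
        if remaining == 0:
            break
        take = min(remaining, cluster.free_slots)
        if take > 0:
            placement[cluster.name] = take
            remaining -= take
    if remaining:
        raise RuntimeError(f"no capacity left for {remaining} replicas")
    return placement


clusters = [
    Cluster("on-premise", free_slots=4, cost_per_replica=1.0),
    Cluster("preferred-cloud", free_slots=10, cost_per_replica=2.0),
    Cluster("other-cloud", free_slots=100, cost_per_replica=3.0),
]

print(place(12, clusters))
```

A real placement decision would weigh utilization and performance as well as cost; the single sort key here just keeps the sketch short.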

"OK, I'm sold. Where's the catch?"

Federation comes with some challenges...

[Diagram labels: Provider 1, Zone A, Zone B, Provider 2, Zone C, Provider 1, Zone D]

● Different bandwidth charges/latency/throughput/reliability

● Different service discovery (but DNS!)

● Consolidated monitoring & alerting

Cross-cluster load balancing

• Geographically aware DNS gets clients to the "closest" healthy cluster.

• Standard Kubernetes service load balancing within each cluster.

• New L7 LBs available soon.
• Can be extended to divert traffic away from "healthy-but-saturated" clusters.

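The selection rule behind the load-balancing bullets can be sketched as: prefer the closest healthy cluster, and skip saturated ones when an alternative exists. This is an illustrative sketch only. The `ClusterEndpoint` type, the distance and load fields and the 0.9 threshold are assumptions, and a real geo-aware DNS service would make this choice from its own health and location data.

```python
from dataclasses import dataclass


@dataclass
class ClusterEndpoint:
    name: str
    distance_km: float  # hypothetical distance from the client
    healthy: bool
    load: float         # hypothetical utilization, 0.0 to 1.0


def pick_cluster(endpoints, saturation_threshold=0.9):
    """Return the name of the closest healthy cluster, avoiding saturated ones."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    unsaturated = [e for e in healthy if e.load < saturation_threshold]
    # Fall back to saturated-but-healthy clusters rather than failing outright.
    candidates = unsaturated or healthy
    return min(candidates, key=lambda e: e.distance_km).name


endpoints = [
    ClusterEndpoint("cluster-1", distance_km=50, healthy=False, load=0.2),
    ClusterEndpoint("cluster-2", distance_km=300, healthy=True, load=0.95),
    ClusterEndpoint("cluster-3", distance_km=900, healthy=True, load=0.4),
]

print(pick_cluster(endpoints))
```

Here cluster-1 is closest but unhealthy, and cluster-2 is healthy but saturated, so the client is sent to cluster-3.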
Cross-cluster service discovery

• DNS + Kubernetes cluster-local service discovery.

• Can default to cluster-local with failover to remote clusters.
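The "cluster-local with failover" default can be sketched as a two-step lookup. The data layout (a dict of cluster name to service endpoints), the addresses and the `resolve` function are hypothetical; in practice the lookup would go through DNS rather than an in-memory table.

```python
def resolve(service, local_cluster, endpoints_by_cluster):
    """Prefer endpoints in the local cluster, else fail over to a remote one."""
    local = endpoints_by_cluster.get(local_cluster, {}).get(service, [])
    if local:
        return local
    # Fail over: take the first remote cluster that has endpoints.
    for cluster, services in endpoints_by_cluster.items():
        if cluster != local_cluster and services.get(service):
            return services[service]
    return []


# Hypothetical endpoint table; the addresses are placeholders.
endpoints_by_cluster = {
    "cluster-1": {"frontend": ["10.0.1.5"], "backend": []},
    "cluster-2": {"frontend": ["10.0.2.7"], "backend": ["10.0.2.9"]},
}

print(resolve("frontend", "cluster-1", endpoints_by_cluster))
print(resolve("backend", "cluster-1", endpoints_by_cluster))
```

The first lookup stays inside cluster-1; the second finds no local backend and fails over to cluster-2.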

Location affinity

• Strictly coupled pods/applications
  • High bandwidth requirements
  • Low latency requirements
  • High fidelity requirements
  • Cannot easily span clusters
• Loosely coupled
  • Opposite of above
  • Relatively easily distributed across clusters
• Preferentially coupled
  • Strongly coupled but can be migrated piecemeal.

Cross-cluster monitoring and auditing...

• "Cluster per tab" might suffice for small numbers of clusters

• Some monitoring solutions provide stronger integration and global summarization

Cluster Federation - The Implementation...

API Compatible with Kubernetes

• Less new stuff to learn
• Can learn incrementally, as you need new functionality.
• Analogous argument applies to existing automation systems (PaaS etc).
  • These can be ported to Ubernetes relatively easily.
• All Kubernetes entities are "federatable".

[Diagram: Client -> "Run my stuff" -> Ubernetes or Kubernetes -> Applications]

State and control reside in underlying clusters (for the most part)

• Better scalability
  • Kubernetes scales with number of nodes per cluster (<10,000)
  • Ubernetes scales with number of clusters (~100)
• Better fault isolation
  • Kubernetes clusters fail independently of Ubernetes

[Diagram: Ubernetes (API) above several Kubernetes Clusters, each with its own API, Repl. Ctrl etc. and State]

Similar Control loops to Kubernetes

• Drive current state -> desired state
  • But per-cluster state, not per node, per pod etc.
• Observed state is the truth
• Recurring pattern in the system
  • Examples: ReplicationController, Service

[Diagram: observe -> diff -> act]
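The observe / diff / act cycle can be sketched at per-cluster granularity as follows. This is a toy illustration, not the Ubernetes controller code: the replica-count state, the `observe` and `act` callables and the cluster names are all hypothetical stand-ins for calls to each cluster's API.

```python
def diff(desired, observed):
    """Per-cluster replica deltas; a positive value means scale up."""
    clusters = set(desired) | set(observed)
    return {
        c: desired.get(c, 0) - observed.get(c, 0)
        for c in clusters
        if desired.get(c, 0) != observed.get(c, 0)
    }


def reconcile(desired, observe, act):
    """One pass of the loop: observe, diff against desired, then act."""
    observed = observe()              # observed state is the truth
    delta = diff(desired, observed)
    for cluster, change in sorted(delta.items()):
        act(cluster, change)
    return delta


# In-memory stand-in for the real clusters (hypothetical).
state = {"cluster-1": 3, "cluster-2": 0}
desired = {"cluster-1": 2, "cluster-2": 2}


def observe():
    return dict(state)


def act(cluster, change):
    state[cluster] = state.get(cluster, 0) + change


first = reconcile(desired, observe, act)
second = reconcile(desired, observe, act)
print(first, second, state)
```

The second pass finds nothing to do, which is the property that makes the loop safe to run repeatedly.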

Modularity

Loose coupling is a goal everywhere
• simpler
• composable
• extensible

Code-level plugins where possible

Multi-process where possible

Isolate risk by interchangeable parts

Examples:
• MigrationController
• Scheduler

Federation status & plans

Federation Lite (single cluster, multiple zones)
• In alpha Q4 2015
• Productionized ~Q1 2016

Federation Proper (multiple clusters, federated)
• Alpha Q1 2016

Google Container Engine (GKE)
• hosted Federation too
• GKE Federation Lite ~Q1-Q2 2016

PaaSes and Distros
• Red Hat OpenShift, CoreOS Tectonic, Red Hat Atomic...
• ... watch this space...

I want more!

• Requirements doc - comments welcome
  • tinyurl.com/ubernetesv2
• Special interest group
  • groups.google.com/forum/kubernetes-sig-federation
• quinton@google.com
• quinton_hoole@github
