KubeCon EU 2016: A Practical Guide to Container Scheduling
Uploaded by kubeacademy
Transcript of KubeCon EU 2016: A Practical Guide to Container Scheduling
@tekgrrl #kubecon #kubernetes
[Diagram: Borg architecture. borgcfg and web browsers connect to the replicated BorgMaster (each replica has a link shard and a UI shard), which works with the scheduler and a persistent store (Paxos). A Borglet runs on each machine. Also shown: Config file, Binary, Cell Storage.]
Developer View
job hello_world = {
  runtime = { cell = 'ic' }                // Cell (cluster) to run in
  binary = '.../hello_world_webserver'     // Program to run
  args = { port = '%port%' }               // Command line parameters
  requirements = {                         // Resource requirements
    ram = 100M
    disk = 100M
    cpu = 0.1
  }
  replicas = 5                             // Number of tasks
}
10000
[Slide: "Hello world!" repeated many times over a photo. Image by Connie Zhou]
Developer View
Hello world!
“Internally, we don't use VMs - we just use containers to pack multiple tasks onto one machine, and stop them treading on one another.” - John Wilkes
Images by Connie Zhou
A 2000-machine service will have >10 task exits per day.
This is not a problem: it's normal.
Efficiency
Advanced bin-packing algorithms
[Chart: experimental placement of production VM workload, July 2014. Labels: one machine, available resources, stranded resources.]
Efficiency
[Chart: Used CPU (in cores) and Used Memory, with regions marked Available Resources and Stranded Resources.]
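To make the idea of stranded resources concrete, here is a minimal Python sketch. It assumes that a machine's free capacity counts as stranded when the smallest task no longer fits on it, for example when plenty of CPU is free but almost no memory is. The `Machine` type, the `stranded` helper and the sample numbers are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    free_cpu: float  # free CPU, in cores
    free_mem: float  # free memory, in GiB


def stranded(machine, min_cpu, min_mem):
    """Return (stranded_cpu, stranded_mem) for one machine.

    Assumption: if the smallest task (min_cpu, min_mem) does not fit,
    whatever is still free on the machine cannot be used and is stranded.
    """
    fits = machine.free_cpu >= min_cpu and machine.free_mem >= min_mem
    if fits:
        return (0.0, 0.0)  # still schedulable: nothing is stranded
    return (machine.free_cpu, machine.free_mem)


# Four free cores are stranded because there is no memory left to go with them.
cpu_heavy_leftover = stranded(Machine(free_cpu=4.0, free_mem=0.05), 0.1, 0.1)
```

A better bin-packing algorithm aims to keep this quantity small across the whole cell.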
Efficiency
Multiple applications per machine
[Chart: tasks per machine, with the median marked. Source: CPI^2 paper, EuroSys 2013.]
Efficiency
Cells run both Prod and Non Prod tasks
[Diagram: the Borg architecture again (borgcfg, web browsers, BorgMaster replicas with link and UI shards, scheduler, persistent store (Paxos), Config file, Binary, Cell Storage), with batch tasks shown in the Cell.]
Efficiency
Sharing Cells between prod/non-prod is Better
[Chart, per Cell: shared cell (original), shared cell (compacted), Non-Prod load (compacted), Prod load (compacted). Annotation: represents the overhead of running prod and non-prod in their own cells.]
Efficiency
Resource reclamation
limit: amount of resource requested
usage: actual resource consumption
reservation: estimate of future usage
[Chart: limit, reservation and usage plotted over time, with the potentially reusable resources marked.]
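The three terms on this slide can be related with a small Python sketch. It assumes the reservation is estimated as current usage plus a fixed safety margin, capped at the limit, and that the reusable amount is the gap between limit and reservation. The function names, the margin parameter and the millicore units are illustrative assumptions, not the production estimator.

```python
def reservation(usage, limit, safety_margin):
    """Estimate future usage: current usage plus a margin, never above the limit."""
    return min(limit, usage + safety_margin)


def reclaimable(usage, limit, safety_margin):
    """Potentially reusable resources: what was requested minus what is reserved."""
    return limit - reservation(usage, limit, safety_margin)


# A task that requested 1000 millicores but uses 300, with a 200 millicore margin,
# leaves 500 millicores that lower-priority work could borrow.
spare = reclaimable(usage=300, limit=1000, safety_margin=200)
```

A smaller safety margin reclaims more, at the cost of a higher chance that the borrowed resources are needed back.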
Efficiency
Resource reclamation could be more aggressive
[Chart: Nov/Dec 2013]
[Diagram: Kubernetes architecture. kubectl and web browsers connect to the K8s Master (API Server, Dashboard, scheduler, Controllers, etcd); Kubelets run on the nodes. Also shown: Config file, Image, Container Registry.]
Kubernetes without a Scheduler
[Diagram: K8s Master (API Server, Dashboard, scheduler, Controllers, etcd) and four nodes, each running a Kubelet: k8s-minion-xyz, k8s-minion-abc, k8s-minion-fig, k8s-minion-cat]
apiVersion: v1
kind: Pod
metadata:
  name: bursty-static
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
Kubernetes without a Scheduler
[Diagram: K8s Master (API Server, Dashboard, Controllers, etcd) and four nodes, each running a Kubelet: k8s-minion-xyz, k8s-minion-abc, k8s-minion-fig, k8s-minion-cat. The pod "poddy" runs on k8s-minion-xyz.]
apiVersion: v1
kind: Pod
metadata:
  name: poddy
spec:
  nodeName: k8s-minion-xyz
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
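The difference between the two pod specs on these slides comes down to one field. As a minimal Python sketch (the `needs_scheduling` helper is hypothetical, and the dictionaries simply mirror the YAML from the slides), a pod that already names its node has nothing left for a scheduler to decide:

```python
def needs_scheduling(pod):
    """True when spec.nodeName is unset, so something still has to pick a node."""
    return not pod.get("spec", {}).get("nodeName")


containers = [{"name": "nginx", "image": "nginx",
               "ports": [{"containerPort": 80}]}]

# Pod from the first slide: no nodeName, so it waits for a scheduler.
bursty_static = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "bursty-static"},
    "spec": {"containers": containers},
}

# Pod from the second slide: pinned to k8s-minion-xyz by hand.
poddy = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "poddy"},
    "spec": {"nodeName": "k8s-minion-xyz", "containers": containers},
}

pending = [p["metadata"]["name"] for p in (bursty_static, poddy)
           if needs_scheduling(p)]
```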
Kubernetes Resources
A Resource is something that can be requested by, allocated to, or consumed by a pod or a container
CPU: specified in units of Cores; what a Core is depends on the provider
Memory: specified in units of Bytes
CPU is Compressible (i.e. it has a rate and can be throttled)
Memory is Incompressible; it can't be throttled
Kubernetes Resources (contd)
Future Plans:
More Resources:
● Network Ops
● Network Bandwidth
● Storage
● IOPS
● Storage Time
Kubernetes Compute Unit (KCU)
Resource based Scheduling
my-controller.yaml
...
spec:
  containers:
  - name: locust
    image: gcr.io/rabbit-skateboard/guestbook:gdg-rtv
    resources:
      requests:
        memory: "300Mi"
        cpu: "100m"
      limits:
        memory: "300Mi"
        cpu: "100m"
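The quantity strings in my-controller.yaml use suffixes: "100m" is 100 millicores (0.1 of a core) and "300Mi" is 300 mebibytes. The Python sketch below converts such strings to plain numbers. It is a simplified illustration that handles only a few common suffixes; `cpu_millicores` and `memory_bytes` are hypothetical helper names, not part of any Kubernetes client.

```python
# Multipliers for a subset of memory suffixes (binary and decimal).
MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def cpu_millicores(quantity):
    """Convert a CPU quantity such as "100m" or "0.1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity):
    """Convert a memory quantity such as "300Mi" to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: already in bytes


cpu = cpu_millicores("100m")      # 100 millicores
memory = memory_bytes("300Mi")    # 300 * 2**20 bytes
```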
Resource based Scheduling (Work In Progress)
Provide QoS for Scheduled Pods
Per Container CPU and Memory requirements
Specified as Request and Limit
Future releases will [better] support:
● Best Effort (Request == 0)
● Burstable (Request < Limit)
● Guaranteed (Request == Limit)
Best Effort Scheduling for low priority workloads improves Utilization at Google by 20%
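The three classes listed on this slide can be written as a small decision function. This Python sketch applies the slide's rules to a single request/limit pair; how a real cluster combines several containers and resources into one class is not covered here, and the `qos_class` name and returned strings are illustrative assumptions.

```python
def qos_class(request, limit):
    """Classify one resource setting using the rules from the slide."""
    if request == 0:
        return "Best Effort"
    if request == limit:
        return "Guaranteed"
    if request < limit:
        return "Burstable"
    raise ValueError("request must not exceed limit")


# The locust container requests 100m CPU with a 100m limit.
locust_cpu_class = qos_class(request=100, limit=100)
```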
Scheduling Pods: Nodes
[Diagram: a K8s Node running a Kubelet, with Resources, Disks and Labels (disk = ssd)]
Nodes may not be homogeneous; they can differ in important ways:
● CPU and Memory Resources
● Attached Disks
● Specific Hardware
Location may also be important
Pod Scheduling: Identifying Potential Nodes
What CPU and Memory Resources does it need?
Can also be used as a measure of priority
[Diagram: K8s Node with Kubelet and Proxy, showing CPU and Mem]
Pod Scheduling: Finding Potential Nodes
What Resources does it need?
What Disk(s) does it need (GCE PD and EBS) and can it/they be mounted without conflict?
Note: 1.1 limits to
[Diagram: K8s Node with Kubelet and Proxy, showing CPU and Mem]
Pod Scheduling: Identifying Potential Nodes
What Resources does it need?
What Disk(s) does it need?
What node(s) can it run on (Node Selector)?
[Diagram: K8s Node with Kubelet and Proxy, showing CPU, Mem and the label disktype = ssd]
kubectl label nodes node-3 disktype=ssd
(pod)
spec:
  nodeSelector:
    disktype: ssd
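A node selector is an exact-match filter: a node qualifies only if it carries every label the pod asks for. A minimal Python sketch of that rule follows; node-3 and its label come from the slide, while node-1, node-2 and the helper name are made up for illustration.

```python
def matches_node_selector(node_labels, node_selector):
    """True when the node has every key/value pair listed in the selector."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


nodes = {
    "node-1": {},                      # hypothetical unlabeled node
    "node-2": {"disktype": "hdd"},     # hypothetical node with a different disk
    "node-3": {"disktype": "ssd"},     # labeled as on the slide
}
selector = {"disktype": "ssd"}

eligible = [name for name, labels in nodes.items()
            if matches_node_selector(labels, selector)]
```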
nodeAffinity (Alpha in 1.2)
{
  "nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [
        {
          "matchExpressions": [
            {
              "key": "beta.kubernetes.io/instance-type",
              "operator": "In",
              "values": ["n1-highmem-2", "n1-highmem-4"]
            }
          ]
        }
      ]
    }
  }
}
http://kubernetes.github.io/docs/user-guide/node-selection/
Implemented through Annotations in 1.2, through fields in 1.3
Can be ‘Required’ or ‘Preferred’ during scheduling
In future it can also be ‘Required’ during execution (Node labels can change)
Will eventually replace NodeSelector
If you specify both nodeSelector and nodeAffinity, both must be satisfied
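The required node affinity rule above can be evaluated with a few lines of Python. This sketch assumes the usual selector semantics, where the terms in nodeSelectorTerms are alternatives (any one may match) and the expressions inside a term must all match. It covers only four operators, and the function names are hypothetical.

```python
def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, operator = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        return labels.get(key) not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError("unsupported operator: " + operator)


def node_matches(labels, required):
    """Any term may match; within a term every expression must match."""
    return any(
        all(expression_matches(labels, expr)
            for expr in term["matchExpressions"])
        for term in required["nodeSelectorTerms"]
    )


required = {
    "nodeSelectorTerms": [{
        "matchExpressions": [{
            "key": "beta.kubernetes.io/instance-type",
            "operator": "In",
            "values": ["n1-highmem-2", "n1-highmem-4"],
        }]
    }]
}

highmem = {"beta.kubernetes.io/instance-type": "n1-highmem-2"}
standard = {"beta.kubernetes.io/instance-type": "n1-standard-1"}
```

Here `node_matches(highmem, required)` is true and `node_matches(standard, required)` is false.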
Pod Scheduling: Ranking Potential Nodes
Prefer the node with the most free resources left after the pod is deployed
Prefer nodes with the specified label
Minimise the number of Pods from the same service on the same node
CPU and Memory are balanced after the Pod is deployed [Default]
[Diagram: candidate nodes Node2, Node3, Node1]
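Two of the ranking rules on this slide, most free resources and balanced CPU and memory, lend themselves to a short worked example. The Python sketch below scores each candidate node as if the pod were already placed on it. The 0 to 10 scale, the equal weighting of the two rules and all node capacities are assumptions chosen for illustration, not the scheduler's actual configuration.

```python
def least_requested(requested, capacity):
    """Score 0-10: higher when more of the resource would remain free."""
    return (capacity - requested) * 10 / capacity


def balanced(cpu_fraction, mem_fraction):
    """Score 0-10: higher when CPU and memory utilisation are closer together."""
    return 10 - abs(cpu_fraction - mem_fraction) * 10


def score(node, pod):
    """Combine both rules for one node, assuming the pod is placed on it."""
    cpu = node["cpu_used"] + pod["cpu"]
    mem = node["mem_used"] + pod["mem"]
    spare = (least_requested(cpu, node["cpu_cap"])
             + least_requested(mem, node["mem_cap"])) / 2
    return spare + balanced(cpu / node["cpu_cap"], mem / node["mem_cap"])


# Hypothetical nodes: CPU in millicores, memory in MB.
nodes = {
    "Node1": {"cpu_cap": 4000, "mem_cap": 8000, "cpu_used": 3000, "mem_used": 2000},
    "Node2": {"cpu_cap": 4000, "mem_cap": 8000, "cpu_used": 1000, "mem_used": 2000},
    "Node3": {"cpu_cap": 4000, "mem_cap": 8000, "cpu_used": 2000, "mem_used": 7000},
}
pod = {"cpu": 500, "mem": 1000}

ranked = sorted(nodes, key=lambda name: score(nodes[name], pod), reverse=True)
```

With these numbers Node2 ranks first: it keeps the most capacity free and ends up with equal CPU and memory utilisation.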
Extending the Scheduler
1. Add rules to the scheduler and recompile
2. Run your own scheduler process instead of, or as well as, the Kubernetes scheduler
3. Implement a "scheduler extender" that the Kubernetes scheduler calls out to as a final pass when making scheduling decisions
Admission Control
Admission Control enforces certain conditions before a request is accepted by the API Server
AC functionality is implemented as plugins, which are executed in the sequence in which they are specified
AC is performed after AuthN checks
Enforcement usually results in either
● A Request denial
● Mutation of the Request Resource
● Mutation of related Resources
[Diagram: K8s Master with Admission Control in front of the API Server, alongside the scheduler and Controllers]
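The plugin chain described above can be pictured as a loop in which each plugin either rejects the request or passes on a possibly modified copy. The Python sketch below is a toy model of that flow. The two plugins, the request fields and the `cluster` dictionary are invented for illustration and do not reflect the API Server's real plugin interface.

```python
class Rejected(Exception):
    """Raised by a plugin to deny the request."""


def namespace_exists(request, cluster):
    """Denial example: refuse requests aimed at an unknown namespace."""
    if request["namespace"] not in cluster["namespaces"]:
        raise Rejected("namespace does not exist: " + request["namespace"])
    return request


def default_cpu_limit(request, cluster):
    """Mutation example: fill in a CPU limit when the request has none."""
    mutated = dict(request)
    mutated.setdefault("cpu_limit", cluster["default_cpu_limit"])
    return mutated


def admit(request, cluster, plugins):
    """Run the plugins in the order given; the first rejection stops the chain."""
    for plugin in plugins:
        request = plugin(request, cluster)
    return request


cluster = {"namespaces": {"default"}, "default_cpu_limit": "100m"}
plugins = [namespace_exists, default_cpu_limit]

admitted = admit({"namespace": "default"}, cluster, plugins)
```

Because plugins run in sequence, the order matters: a mutating plugin placed before a checking plugin changes what the check sees.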
Admission Control Examples
NamespaceLifecycle: enforces that a Namespace that is undergoing termination cannot have new objects created in it, and ensures that requests in a non-existent Namespace are rejected
LimitRanger: observes the incoming request and ensures that it does not violate any of the constraints enumerated in the LimitRange object in a Namespace
ServiceAccount: implements automation for serviceAccounts
ResourceQuota: observes the incoming request and ensures that it does not violate any of the constraints enumerated in the ResourceQuota object in a Namespace
Default plug-ins in 1.2: --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota,PersistentVolumeLabel
Resources
Mandy’s Canonical K8s deck: http://bit.ly/1oRMS0r
One little-o R M S Zero little-r
Setting Pod and CPU Limits
Runtime Constraints Example
Extending the Scheduler
Resource Model Design Doc (beyond 1.1)
Kubernetes is Open Source
We want your help!
http://kubernetes.io
https://github.com/kubernetes/kubernetes
Slack: #kubernetes-users
@kubernetesio