孙健波 namespace cgroups docker...
Agenda
• Namespace
• ipc, uts, pid, network, mount, user
• Cgroup
• what are cgroups?
• usage, concepts, implementation……
What is a Namespace?
• Lightweight process virtualization
• Isolation: enables a process (or several processes) to have a different view of the system than other processes.
[Figure: two groups of processes, each with its own hostname, IPC, network stack, filesystem, PIDs (PID1, PID2, ….), and uid, gid, capabilities]
Namespaces
There are currently 6 namespaces:
• uts (hostname)
• ipc (System V IPC)
• net (network stack)
• mnt (mount points, filesystems)
• pid (processes)
• user (UIDs)
clone()
• creates a new process and, when called with a CLONE_NEW* flag, a new namespace for it
[Figure: clone() takes a process and produces a new process inside a new namespace]
unshare()
• creates a new namespace
• attaches the current process to it
[Figure: unshare() moves the calling process into a new namespace]
UTS namespace
struct task_struct
    ……
    *nsproxy
struct nsproxy
    ……
    *uts_ns
    *mnt_ns
    *net_ns
    *pid_ns
    *ipc_ns
struct uts_namespace
    ……
    name (struct new_utsname): nodename, sysname, release, version, machine

    /* Look up the UTS data of the namespace the current task belongs to. */
    static inline struct new_utsname *utsname(void)
    {
        return &current->nsproxy->uts_ns->name;
    }

    SYSCALL_DEFINE2(gethostname, char __user *, name, int, len)
    {
        struct new_utsname *u;
        ...
        u = utsname();   /* per-namespace data, not a global */
        if (copy_to_user(name, u->nodename, i))
            errno = -EFAULT;
        ...
    }
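Network namespace
• logically another copy of the network stack
• a veth pair works like a pipe between two namespaces and lets them communicate
[Figure: container namespaces A and B each have their own eth0; each eth0 is paired with a veth device attached to the docker0 bridge on the host, which is connected to the physical network device]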
Mount namespace
[Figure: a mount namespace containing /bin, /lib, /proc and /root, together with a child namespace and another namespace. Mounts are labelled with their propagation type: shared, master/slave, private, or unbindable.]
• First namespace in history
• By default, a new mount namespace gets its own copy of the mount points instead of pointing to the root namespace
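PID namespace
• The same PID can exist in different namespaces
• can be nested up to 32 levels
• PID 1 = init process
• child reaping: orphaned processes in the namespace are reparented to PID 1
• ignores SIGKILL sent from inside its own namespace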
User namespace
• Will be supported by Docker in future.
• Docker in Yarn
• ……
[Figure: a normal user on the host maps to a privileged user inside a user namespace, which contains the pid, network, uts, ipc and mount namespaces]
What are cgroups?
• Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour.
Usage of cgroups
• Resource limitation: groups can be set to not exceed a configured limit
• Prioritization: some groups may get a larger share of CPU utilization or disk I/O throughput
• Accounting: measures how many resources certain systems use
• Control: freezing groups of processes, checkpointing them, and restarting them
Concepts
• cgroup – a group of tasks with shared characteristics
• subsystem – a module that applies parameters to cgroups to control them in particular ways, typically for resource management
• hierarchy – a set of cgroups organized in a hierarchical tree, plus one or more subsystems associated with that tree
• VFS -> API: cgroups are configured through a virtual filesystem, which acts as the user-space API
Cgroups: Example (two hierarchies)
/cgroup
    /cgroup/memlimits (memory subsystem mount point & hierarchy)
        /cgroup/memlimits/student    memory.limit=1G    tasks=1,2,3,4,5
        /cgroup/memlimits/teacher    memory.limit=2G    tasks=6,7,8
    /cgroup/cpulimits (cpuset subsystem mount point & hierarchy)
        /cgroup/cpulimits/student    cpuset.cpus=0-1    tasks=1,2,3,4,5
        /cgroup/cpulimits/teacher    cpuset.cpus=0-3    tasks=6,7,8
Parameters: Examples
• cpuset subsystem
• cpuset.cpus: defines the set of CPUs that the tasks in the cgroup are allowed to execute on
• echo "0-2" > /cgroup/cpuset/lab2/cpuset.cpus
• memory subsystem
• memory.limit_in_bytes: sets the maximum amount of user memory
• echo 1G > /cgroup/memory/lab1/memory.limit_in_bytes
Current subsystems used by Docker
• cpuset – controls access to individual CPUs and memory nodes by a cgroup
• cpu – schedules CPU access to cgroups
• cpuacct – reports CPU resource usage by a cgroup
• memory – controls access to memory resources and reports memory resource usage by a cgroup
• devices – controls access to devices by a cgroup, for example GPUs
• freezer – suspends and resumes tasks in a cgroup
• blkio – tracks I/O ownership, allowing control of access to block I/O resources
cgroups hooks
task_struct
    ……
    css_set *cgroups
    list_head cg_list
    ……
css_set
    ……
    hlist_node hlist
    list_head tasks
    list_head cg_links
    cgroup_subsys_state *subsys[]
    ……
cg_cgroup_link
    list_head cgrp_link_list
    cgroup *cgrp
    list_head cg_link_list
    css_set *cg
cgroup
    ……
    cgroup_subsys_state *subsys[]
    list_head css_sets
    cgroupfs_root *root
    ……
cgroup_subsys_state
    ……
    cgroup *cgroup
    ……
cgroupfs_root
    ……
    int hierarchy_id
    list_head root_list
    list_head subsys_list
    ……
cgroup_subsys
    func create
    func destroy
    func attach
    func fork
    func exit
    ……
    int subsys_id
    cgroupfs_root *root
    list_head sibling
    ……
css_set_table
    ……
    css_set_hash()
Subsystem-specific states: cpuset, freezer, blkio_cgroup, ……
References
• http://lwn.net/Articles/531114/
• https://www.kernel.org/doc/Documentation/cgroups
• https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/