Inside Docker for Fedora20/RHEL7
ver1.8e Etsuji Nakai (Twitter: @enakai00)
Open Cloud Campus
$ who am i
Etsuji Nakai
– Senior solution architect and cloud evangelist at Red Hat.
– The author of the "Professional Linux Systems" series.
  • Available only in Japanese (some have Korean translations).
  • Translation offers from publishers are welcome ;-)
Books: "Self-study Linux: Deploy and Manage by Yourself," "Professional Linux Systems: Deployment and Management," "Professional Linux Systems: Network Management," "Professional Linux Systems: Technology for the Next Decade."
New OpenStack book is in store now!
Contents
 What is Linux Container
 Device Mapper Thin-Provisioning
 Network Namespace
 systemd and cgroups
(*) The contents of this document are based on Fedora20 with docker-io-1.0.0-1.fc20.x86_64.
What is Linux Container
Traditional server virtualization
Traditional "server virtualization" is a technology to create software-emulated "virtual machines" hosting various guest operating systems.
(Diagram: four setups, each on a physical machine)
– Bare metal: the OS runs directly on the physical machine.
– Hardware-assisted virtualization: the hypervisor is embedded in firmware and hosts virtual machines with guest OSes.
– Software-assisted virtualization: the hypervisor is installed on the physical machine (VMware vSphere, Xen, etc.).
– Software-assisted virtualization: the host OS provides the hypervisor feature as a kernel module (Linux KVM).
"Linux Container" is a Linux kernel feature to contain a group of processes in an independent execution environment called a container.
The Linux kernel provides an independent application execution environment for each container, which includes:
– Independent filesystem.
– Independent network interface and IP address.
– Usage limits for memory and CPU time.
You can use containers on Linux virtual machines in addition to bare metal servers, since containers co-exist with the traditional server virtualization technology.
(Diagram, "What is container technology?": whether on bare metal or on a virtual machine's OS, a single Linux kernel hosts multiple containers, each with its own user space and its own user processes.)
Containers support separation of various resources. These are internally realized with different kernel technologies called "namespaces":
– Filesystem separation → Mount namespace (kernel 2.4.19)
– Hostname separation → UTS namespace (kernel 2.6.19)
– IPC separation → IPC namespace (kernel 2.6.19)
– User (UID/GID) separation → User namespace (kernel 2.6.23 - kernel 3.8)
– Process table separation → PID namespace (kernel 2.6.24)
– Network separation → Network namespace (kernel 2.6.24)
– Usage limits for CPU/memory → Control groups
(*) Reference: "Namespaces in operation, part 1: namespaces overview"• http://lwn.net/Articles/531114/
A Linux container is realized by integrating these namespace features. There are multiple container management tools such as lxc tools, libvirt, and Docker; they may use different subsets of these features.
Under the hood
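As a minimal illustration, the hostname (UTS) separation above can be reproduced without any container tool, using util-linux's unshare(1). The command needs root (CAP_SYS_ADMIN), so this sketch wraps it in a function rather than running it directly:

```shell
# Sketch: UTS-namespace separation with unshare(1). Needs root, so the
# commands are wrapped in a function and not executed here.
uts_demo() {
    # The hostname change is visible only inside the new UTS namespace;
    # the host's own hostname is untouched.
    unshare --uts /bin/sh -c 'hostname container01; hostname'
    hostname    # still prints the original host name
}
```

Container tools combine several such namespaces at once; this shows just one of them in isolation.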
Processes in all containers are executed on the same Linux kernel, but inside a container you can see only the processes in that container.
– This is because each container has its own process table. On the host Linux, which is outside the containers, you can see all processes, including the ones in containers.
Resource separation / Process tables
# ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
root         1     0  0 09:49 ?        00:00:00 /bin/sh /usr/local/bin/init.sh
root        35     1  0 09:49 ?        00:00:00 /usr/sbin/sshd
root        47     1  0 09:49 ?        00:00:00 /usr/sbin/httpd
apache      49    47  0 09:49 ?        00:00:00 /usr/sbin/httpd
apache      50    47  0 09:49 ?        00:00:00 /usr/sbin/httpd
...
apache      56    47  0 09:49 ?        00:00:00 /usr/sbin/httpd
root        57     1  0 09:49 ?        00:00:00 /bin/bash

# ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
...
root       802     1  0 18:10 ?        00:01:20 /usr/bin/docker -d --selinux-enabled -H fd://
...
root      3687   802  0 18:49 pts/2    00:00:00 /bin/sh /usr/local/bin/init.sh
root      3736  3687  0 18:49 ?        00:00:00 /usr/sbin/sshd
root      3748  3687  0 18:49 ?        00:00:00 /usr/sbin/httpd
48        3750  3748  0 18:49 ?        00:00:00 /usr/sbin/httpd
...
48        3757  3748  0 18:49 ?        00:00:00 /usr/sbin/httpd
root      3758  3687  0 18:49 pts/2    00:00:00 /bin/bash
Processes seen inside container
Processes seen outside container
Resource separation / Process tables (cont.)
In the example on the previous page, the docker daemon fork/exec-ed the initial process "init.sh" and put it in a new PID namespace. After that, all processes fork/exec-ed from init.sh are put in the same namespace.
– Inside the container, the initial process has PID=1, independent of the host. Likewise, its child processes have independent PIDs.
– Since Docker 1.0 doesn't support the user namespace, the same UID/GIDs are used inside the container as on the host. User/group names could be different because /etc/passwd is different in the container.
• Reference: "Docker 1.0 and user namespaces"
  https://groups.google.com/forum/#!topic/docker-dev/MoIDYDF3suY
#!/bin/sh
service sshd start
service httpd start
while [[ true ]]; do
    /bin/bash
done
init.sh
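The fork/exec-into-a-new-PID-namespace step can be sketched without Docker using util-linux's unshare(1). It needs root, so the commands are wrapped in a function rather than executed here:

```shell
# Sketch: start a shell as PID 1 in a fresh PID namespace, the same kernel
# feature the docker daemon uses for init.sh. Needs root.
pid_ns_demo() {
    # --fork: the child (not unshare itself) becomes PID 1 in the new namespace
    # --mount-proc: remount /proc so ps shows only namespace-local processes
    unshare --fork --pid --mount-proc /bin/sh -c 'echo "my PID: $$"; ps -ef'
}
```

Inside the function's shell, `$$` reports PID 1, just like init.sh in the slides.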
Resource separation / Filesystem
A specific directory on the host is bind-mounted as the root directory of the container. Inside the container, that directory is seen as the root directory, a mechanism very similar to the "chroot jail."
When using traditional container management tools such as lxc tools or libvirt, you need to prepare the directory contents by hand.
– You can put minimal contents for a specific application, such as application binaries and shared libraries, in the directory.
– It's also possible to copy the whole root filesystem of a specific Linux distribution to the directory.
– If necessary, special filesystems such as /dev, /proc, and /sys are mounted in the container by the management tool.
(Diagram, mount namespace: the host directory /export/container01/rootfs/, containing etc, bin, sbin, ..., is bind-mounted as the container's / .)
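The bind-mount-plus-chroot mechanism described above can be sketched outside of any container tool. The path is the illustrative one from the diagram; the commands need root, so they are wrapped in a function rather than executed here:

```shell
# Sketch of the "chroot jail"-like mechanism described above. Needs root;
# the rootfs path is hypothetical and must be prepared by hand first.
enter_rootfs() {
    rootfs=/export/container01/rootfs    # prepared directory from the diagram
    mount --bind "$rootfs" "$rootfs"     # make the directory a mount point
    mount -t proc proc "$rootfs/proc"    # special filesystems, if needed
    chroot "$rootfs" /bin/sh             # that directory is now seen as /
}
```

Tools like libvirt automate exactly these steps (plus the mount namespace, so the mounts stay private to the container).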
Resource separation / Filesystem (cont.)
Docker provides its own disk image management system, which mounts the specified image on the host and makes it the root filesystem of the container.
# df -a
Filesystem      1K-blocks    Used Available Use% Mounted on
rootfs           10190136  169036   9480428   2% /
/dev/mapper/docker-252:3-130516-d798a41bcba1dbe621bf2dd87de0f9c6dd9f9c8aadb79f84e01705ee82f364c6
                 10190136  169036   9480428   2% /
proc                    0       0         0    - /proc
sysfs                   0       0         0    - /sys
tmpfs             1025136       0   1025136   0% /dev
shm                 65536       0     65536   0% /dev/shm
devpts                  0       0         0    - /dev/pts
/dev/vda3        14226800 3013432  10467640  23% /.dockerinit
/dev/vda3        14226800 3013432  10467640  23% /etc/resolv.conf
/dev/vda3        14226800 3013432  10467640  23% /etc/hostname
/dev/vda3        14226800 3013432  10467640  23% /etc/hosts
devpts                  0       0         0    - /dev/console
...
# df
Filesystem      1K-blocks    Used Available Use% Mounted on
...
/dev/dm-2        10190136  169036   9480428   2% /var/lib/docker/devicemapper/mnt/d798a41bcba1dbe621bf2dd87de0f9c6dd9f9c8aadb79f84e01705ee82f364c6
Filesystem seen in a container (the specified disk image is mounted as /, and some files are separately bind-mounted).
Disk image mounted on the host.
Resource separation / Network
Containers use Linux's "veth" devices for network communication.
– A veth is a pair of logical NIC devices connected through a (virtual) crossover cable.
One side of the veth pair is placed in the container's network namespace so that it can be seen only inside the container. The other side is connected to a Linux bridge on the host.
– The device in the container is renamed to something like "eth0." By means of the namespace, network settings such as the IP address, routing table, and iptables are configured independently in the container.
– The connection between the bridge and the physical network is up to the host configuration.
(Diagram: inside the container, eth0; on the host, its veth peer vethXX is attached to the bridge docker0 (172.17.42.1), which reaches the physical network via IP masquerade.)
Docker creates a bridge "docker0," and packets from containers are forwarded with IP masquerade.
– Packets from the physical network targeted at specified ports are forwarded to the container using the port-forwarding feature of iptables.
Resource separation / CPU and Memory
Processes inside a container see all physical memory and CPU cores, but allocation is restricted with Linux control groups (cgroups).
– In theory, fine-grained allocation control is possible, including the number of CPU cores, CPU time quota, and I/O bandwidth.
Docker uses systemd's unit mechanism to manage the group of processes in a container.
– When creating a container, Docker asks systemd to create a new unit to start the initial process. As a result, all processes fork/exec-ed from the initial process belong to the same unit. At the same time, systemd creates a new cgroups group for the unit.
# systemd-cgls
...
└─system.slice
  ├─docker-cc08291a81556ba55f049e50fd2c04287b04c6cf657a8a9971ef42468a2befa7.scope
  │ ├─7444 nginx: master process ngin
  │ ├─7458 nginx: worker proces
  │ ├─7459 nginx: worker proces
  │ ├─7460 nginx: worker proces
  │ └─7461 nginx: worker proces
...
"docker-<Container ID>.scope" is the cgroups group name.
Device Mapper Thin-Provisioning
Device Mapper is the Linux kernel's mechanism for creating logical devices that provide additional features on top of physical block devices. This is done through a wrapper of software modules. Typical modules are:
– dm-raid: adds a software RAID feature
– dm-multipath: adds multipath access to LUNs
– dm-crypt: adds an encryption feature
– dm-delay: adds an access-delay emulation feature
What is Device Mapper?
(Diagram: dm-raid mirrors /dev/sda and /dev/sdb behind /dev/dm1; dm-crypt adds encryption/decryption on top of /dev/sda; dm-delay adds an access delay on top of /dev/sda; each exposes a logical device /dev/dm1.)
Device Mapper Thin-Provisioning (dm-thin) is a relatively new module which provides "thin-provisioning" and "snapshot" features similar to those of commercial storage appliances.
dm-thin uses two block devices: one is the "block pool" and the other is the "metadata device."
– Fixed-size blocks are dynamically allocated to logical devices, so blocks are consumed only when data are actually written.
– Pointers from segments of logical devices to blocks in the block pool are stored in the metadata device.
– CoW (Copy on Write) snapshots are created by allowing different logical devices to point to the same block. You can create multi-generation snapshots with this mechanism.
What is Device Mapper Thin-Provisioning?
(Diagram: logical devices #001-#003 draw blocks from a shared block pool; the metadata device stores the pointers from segments of the logical devices to blocks in the pool.)
On recent Linux distributions, you can use dm-thin through the LVM interface as below.
– First, create a volume group as usual.
– Then define a "thin pool." This creates LVs for the block pool and metadata in the background.
Using dm-thin through LVM interface
# fallocate -l $((1024*1024*1024)) pooldev.img
# losetup -f pooldev.img
# losetup -a
/dev/loop0: [64768]:39781720 (/root/pooldev.img)
# pvcreate /dev/loop0
# vgcreate vg_data /dev/loop0
# lvcreate -L 900M -T vg_data/thinpool
  Logical volume "lvol1" created
  Logical volume "thinpool" created
# lvs
  LV       VG      Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
...
  lvol0    vg_data -wi-------   4.00m
  thinpool vg_data twi-a-tz-- 900.00m               0.00
(Diagram: in VG vg_data, the LV "thinpool" is the block pool and the LV "lvol1" is the metadata device; logical devices vol00, vol01, ... are carved out of the pool.)
– Define a new logical device, specifying its logical size with the -V option.
– Create a snapshot with the following command.
– Snapshots are inactive by default for the sake of data protection. You can use one after activating it with the following command.
Using dm-thin through LVM interface (cont.)
# lvcreate -V 100G -T vg_data/thinpool -n vol00
  Logical volume "vol00" created
# lvs
  LV       VG      Attr       LSize   Pool     Origin Data%  Move Log Cpy%Sync Convert
...
  lvol0    vg_data -wi-------   4.00m
  thinpool vg_data twi-a-tz-- 900.00m                 0.00
  vol00    vg_data Vwi-a-tz-- 100.00g thinpool        0.00
# lvcreate -s --name vol01 vg_data/vol00
  Logical volume "vol01" created
# lvs
  LV       VG      Attr       LSize   Pool     Origin Data%  Move Log Cpy%Sync Convert
...
  lvol0    vg_data -wi-------   4.00m
  thinpool vg_data twi-a-tz-- 900.00m                 0.00
  vol00    vg_data Vwi-a-tz-- 100.00g thinpool        0.00
  vol01    vg_data Vwi---tz-k 100.00g thinpool vol00
# lvchange -K -ay /dev/vg_data/vol01
Docker has a plugin mechanism for image management drivers, and the "Device Mapper driver" is used on Fedora20/RHEL7. It stores each image in a logical device of Device Mapper Thin-Provisioning (dm-thin).
– When starting a new container, a snapshot of the specified image is attached to the container.
– When storing the image with "docker commit", Docker creates a new snapshot of the snapshot. You'd better stop the container with "docker stop" before executing "docker commit."
Use of Thin Provisioning in Docker
(Diagram, container lifecycle: "run" creates a snapshot of the local image when starting the container; "stop" stops all processes in the container but does not delete the snapshot image; "start" resumes them; "commit" saves a new local image by taking a snapshot of the snapshot; "rm" removes the container and deletes the associated snapshot.)
Docker uses the native dm interface of the dm-thin module instead of the LVM interface.
– When the docker service is launched, it loop-mounts the following "data" and "metadata" disk image files, and creates a block pool with them.
How Docker uses Device Mapper Thin-Provisioning
# ls -lh /var/lib/docker/devicemapper/devicemapper/
total 1.2G
-rw-------. 1 root root 100G May 11 21:37 data
-rw-------. 1 root root 2.0G May 11 22:05 metadata
# losetup
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0         0      0         1  0 /var/lib/docker/devicemapper/devicemapper/data
/dev/loop1         0      0         1  0 /var/lib/docker/devicemapper/devicemapper/metadata
# lsblk
NAME                        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
loop0                         7:0    0  100G  0 loop
└─docker-252:3-130516-pool  253:0    0  100G  0 dm
loop1                         7:1    0    2G  0 loop
└─docker-252:3-130516-pool  253:0    0  100G  0 dm
Block pool device
Metadata device
Configuration data of logical devices are stored in the following JSON files.
– /var/lib/docker/devicemapper/metadata/<Image ID>
– The logical device with device ID "0" has a special role. It is created with a 10GB size when the Docker service is started for the first time, and Docker initializes it as an empty ext4 filesystem.
– When you download images from an external registry, snapshots of this device are used to store them. Therefore, all logical devices have the same 10GB size and ext4 filesystem.
How Docker uses Device Mapper Thin-Provisioning (cont.)
# docker images enakai/httpd
REPOSITORY     TAG     IMAGE ID      CREATED       VIRTUAL SIZE
enakai/httpd   ver1.0  d3d92adfcafb  36 hours ago  206.6 MB
# cat /var/lib/docker/devicemapper/metadata/d3d92adfcafb* | python -mjson.tool
{
    "device_id": 72,
    "initialized": false,
    "size": 10737418240,
    "transaction_id": 99
}

# cat /var/lib/docker/devicemapper/metadata/base | python -mjson.tool
{
    "device_id": 0,
    "initialized": true,
    "size": 10737418240,
    "transaction_id": 1
}
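The same fields can be pulled out programmatically. The sketch below recreates a metadata file with the values shown above (a temporary file stands in for /var/lib/docker/devicemapper/metadata/<Image ID>, and python3 is used here instead of the slide-era python):

```shell
# Recreate a metadata file with the structure shown above and read it back.
meta=$(mktemp)   # stand-in for /var/lib/docker/devicemapper/metadata/<Image ID>
cat > "$meta" <<'EOF'
{"device_id": 72, "initialized": false, "size": 10737418240, "transaction_id": 99}
EOF
device_id=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["device_id"])' "$meta")
size=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["size"])' "$meta")
echo "device_id=$device_id size=$size"    # prints: device_id=72 size=10737418240
rm -f "$meta"
```

These are exactly the two values ("device_id" and "size") that the by-hand mounting procedure on the following pages needs.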
As a sort of hacking technique, you can mount disk image contents by hand, using the dmsetup command to interact with the dm-thin module.
– First, using the commands on the previous page, check the "device_id" and "size" of the disk image you want to mount. In addition, check the name of the thin pool with the following command; it's "docker-252:3-130516-pool" in this example.
– For the sake of simplicity, set these values in shell variables.
Manipulating image contents by hand
# lsblk
NAME                        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
loop0                         7:0    0  100G  0 loop
└─docker-252:3-130516-pool  253:0    0  100G  0 dm
loop1                         7:1    0    2G  0 loop
└─docker-252:3-130516-pool  253:0    0  100G  0 dm
# device_id=72
# size=10737418240
# pool=docker-252:3-130516-pool
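One detail worth spelling out: the dmsetup "thin" table on the next page takes the device length in 512-byte sectors, which is why the size from the metadata file is divided by 512:

```shell
# The dmsetup "thin" table expects the length in 512-byte sectors, so the
# size read from the metadata JSON must be converted before use.
size=10737418240            # bytes, from the metadata file
sectors=$((size / 512))
echo "$sectors"             # prints: 20971520  (= 10 GiB in sectors)
```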
– Activate and mount the logical device with the following commands. Under "rootfs" is the root filesystem seen from the container.
– Finally, unmount and deactivate the logical device.
(*) Modifying the contents of images is not a supported procedure of Docker. You should do it at your own risk, as it may damage the image.
– Reference: https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt
Manipulating image contents by hand (cont.)
# dmsetup create myvol --table "0 $(($size / 512)) thin /dev/mapper/$pool $device_id"
# lsblk
...
loop0                         7:0    0  100G  0 loop
└─docker-252:3-130516-pool  253:0    0  100G  0 dm
  └─myvol                   253:1    0   10G  0 dm
loop1                         7:1    0    2G  0 loop
└─docker-252:3-130516-pool  253:0    0  100G  0 dm
  └─myvol                   253:1    0   10G  0 dm
# mount /dev/mapper/myvol /mnt
# ls /mnt
id  lost+found  rootfs
# cat /mnt/rootfs/var/www/html/index.html
Hello, World!
# umount /mnt
# dmsetup remove myvol
Network Namespace
Network configuration in Docker
The container's logical NIC "eth0" is connected to the Linux bridge "docker0." Communication between the container and the external network is controlled with iptables on the host.
– Packets from a container are forwarded with IP masquerade.
– Packets from the external network to specified ports are forwarded to a container with the port-forwarding feature of iptables.
(Diagram: the container's eth0 pairs with vethXX on the host, attached to the bridge docker0 (172.17.42.1); traffic leaves via IP masquerade.)
As an example, start a container with port forwarding from 8000 to 80 and from 2222 to 22.
– One end of a veth pair is connected to the bridge "docker0."
# docker run -itd -p 8000:80 -p 2222:22 enakai/httpd:ver1.0
a7838c84cd008161086839379e4a0be2d0e109e02c779229cde49f53b79ae1d5

# brctl show
bridge name  bridge id          STP enabled  interfaces
docker0      8000.56847afe9799  no           veth66c0

# ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.42.1  netmask 255.255.0.0  broadcast 0.0.0.0
...
Network configuration in Docker (cont.)
– The nat table of iptables is configured as below.
① Packets from an external network are processed in the DOCKER chain for port forwarding.
② Packets from localhost to localhost's IP address (except "127.0.0.0/8") are processed in the DOCKER chain, too.
③ Packets from a container to an external network are forwarded with IP masquerade.
④⑤ Port-forwarding configuration specified with "docker run".
– I'm not sure why "127.0.0.0/8" is excluded in ②, but packets to "127.0.0.0/8" are still processed appropriately because... (see the next page.)
# iptables-save
# Generated by iptables-save v1.4.19.1 on Fri Jun 13 22:36:14 2014
*nat
...
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER                                        ①
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER                           ②
-A POSTROUTING -s 172.17.0.0/16 ! -d 172.17.0.0/16 -j MASQUERADE                            ③
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 2222 -j DNAT --to-destination 172.17.0.23:22   ④
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8000 -j DNAT --to-destination 172.17.0.23:80   ⑤
COMMIT
Network configuration in Docker (cont.)
– The docker daemon provides a port-forward proxy feature, and packets which are not processed with iptables are handled with it.
– Originally, the feature was intended for hosts without iptables. I'm not sure why packets to "127.0.0.0/8" are selectively handled with it.
# lsof -i -P
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
...
docker  20003 root   11u  IPv6 177010      0t0  TCP *:2222 (LISTEN)
docker  20003 root   12u  IPv6 178468      0t0  TCP *:8000 (LISTEN)
...
Network namespace manipulation
As a sort of hacking technique, you can directly manipulate network namespaces. Without Docker, you would use network namespaces in the following steps.
– Define a new namespace.
– Add network configuration in the namespace, such as a logical NIC, IP address, routing table, and iptables.
– Launch processes in the namespace.
You can use the "ip netns" command to manipulate network namespaces, but you need some additional operations to manipulate the namespaces created by Docker.
– Find the PID of one of the processes in the container.
– There is a symlink to the descriptor for manipulating the namespace in the /proc filesystem of that process.
# systemd-cgls
...
└─system.slice
  ├─docker-61151db106a7fd6d5cf937a03eac0e9b33c7799d3d48b6cddc83070839afeea9.scope
  │ ├─502 /bin/sh /usr/local/bin/init.sh
  │ ├─545 /usr/sbin/sshd
  │ ├─557 /usr/sbin/httpd
...
# ls -l /proc/502/ns/net
lrwxrwxrwx 1 root root 0 June 13 22:52 /proc/502/ns/net -> net:[4026532255]
Network namespace manipulation (cont.)
– By creating a symlink under /var/run/netns/ to the descriptor, the ip command recognizes the namespace.
– From this point, you can execute any command inside the namespace "foo-ns."
– For example, by starting bash inside the namespace, you can see the network configuration of the container. Configuration other than the network is the same as the host's, since you switched only the network namespace.
# mkdir /var/run/netns
# ln -s /proc/502/ns/net /var/run/netns/foo-ns
# ip netns
foo-ns
# ip netns exec foo-ns <command>
# ip netns exec foo-ns bash
# ifconfig eth0
eth0: flags=67<UP,BROADCAST,RUNNING>  mtu 1500
        inet 172.17.0.2  netmask 255.255.0.0  broadcast 0.0.0.0
...
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.17.42.1     0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
# exit
Adding more logical NIC's
With the "ip netns" hacking technique, you can add logical NICs after starting a new container. The following is an example of adding a logical NIC which connects to the physical network through a bridge "br0." (This is not a supported operation of Docker.)
– Create a bridge "br0" and move the IP address of the physical NIC (192.168.200.20/24 in this case) to the bridge.

# brctl addbr br0; ip link set br0 up
# ip addr del 192.168.200.20/24 dev eth0; \
  ip addr add 192.168.200.20/24 broadcast 192.168.200.255 dev br0; \
  brctl addif br0 eth0; route add default gw 192.168.200.1
# echo 'NM_CONTROLLED="no"' >> /etc/sysconfig/network-scripts/ifcfg-eth0
# systemctl enable network.service
(Diagram: alongside the existing eth0/vethXX/docker0 path with IP masquerade, the container gets a second NIC eth1 (192.168.200.99) whose veth peer on the host attaches to br0 (192.168.200.20, together with the physical eth0), connecting directly to the external network.)
(*) You should understand what you're doing with these commands; a mistake may disable the host's network connection.
Adding more logical NIC's (cont.)
– Create a veth pair "veth-host / veth-guest", and attach "veth-host" to the bridge br0.
# ip link add name veth-host type veth peer name veth-guest
# ip link set veth-guest down
# brctl addif br0 veth-host
# brctl show br0
bridge name  bridge id          STP enabled  interfaces
br0          8000.525400677470  no           eth0
                                             veth-host
(Diagram: veth-host is now attached to br0; its peer veth-guest still sits on the host, not yet in the container.)
• At this point, both veth-host and veth-guest are visible on the host, not in the container.
Adding more logical NIC's (cont.)
– Add veth-guest to the container's namespace. At this point, veth-guest becomes invisible on the host.
– From this point, you can use "ip netns exec" to make additional network configurations in the container. The following renames the logical NIC to "eth1," adds an IP address, and modifies the routing table to make eth1 the default route.
# ip link set veth-guest netns foo-ns
# ifconfig veth-guest
veth-guest: error fetching interface information: Device not found
# ip netns exec foo-ns ip link set veth-guest name eth1
# ip netns exec foo-ns ip addr add 192.168.200.99/24 dev eth1
# ip netns exec foo-ns ip link set eth1 up
# ip netns exec foo-ns ip route delete default
# ip netns exec foo-ns ip route add default via 192.168.200.1
Adding more logical NIC's (cont.)
– Log in to the container and check the network configuration inside it.
– Now you can directly access the container without port forwarding.
– You can remove the symlink in /var/run/netns once you have finished the configuration.
By the way, there is a shell script to automate this procedure:
– jpetazzo/pipework
– https://github.com/jpetazzo/pipework
# ssh enakai@localhost -p 2222
$ ifconfig eth1
eth1      Link encap:Ethernet  HWaddr BE:53:16:06:BF:3A
          inet addr:192.168.200.99  Bcast:0.0.0.0  Mask:255.255.255.0
...
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.200.1   0.0.0.0         UG    0      0        0 eth1
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
192.168.200.0   0.0.0.0         255.255.255.0   U     0      0        0 eth1
$ curl http://192.168.200.99:80
Hello, World!
# rm /var/run/netns/foo-ns
systemd and cgroups
Basics of systemd and cgroups
Refer to the following slides for systemd basics.
– "Your first dive into systemd"
• http://www.slideshare.net/enakai/systemd-study-v14e
In particular, you need to understand how systemd manages cgroups in conjunction with units.
– systemd defines various "units" corresponding to services and daemons.
– When systemd starts a service as a unit, it dynamically creates a cgroups group for that unit. All processes of the service are placed under this group.
– If you specify "CPUShares" and "MemoryLimit" in the unit's configuration file, they are translated into the corresponding cgroups settings. (CPUShares specifies the relative weight of CPU time allocation, and MemoryLimit specifies the upper limit of memory usage.)
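As a sketch, such settings would look like this in a unit file (the service name, command, and values are illustrative, not taken from the slides):

```ini
# Hypothetical unit file sketch: /etc/systemd/system/myapp.service
[Service]
ExecStart=/usr/sbin/httpd -DFOREGROUND
# Relative weight of CPU time allocation (default 1024)
CPUShares=512
# Upper limit of memory usage
MemoryLimit=256M
```

Note that in systemd unit files, comments must be on their own lines; trailing comments are not allowed.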
Basics of systemd and cgroups (cont.)
You can check the cgroups status managed by systemd with the following command.
# systemd-cgls
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 23
├─user.slice
│ └─user-0.slice
│   ├─session-1.scope
│   │ ├─439 sshd: root@pts/0
│   │ ├─444 -bash
│   │ ├─464 systemd-cgls
│   │ └─465 systemd-cgls
│   └─[email protected]
│     ├─441 /usr/lib/systemd/systemd --user
│     └─442 (sd-pam)
└─system.slice
  ├─polkit.service
  │ └─352 /usr/lib/polkit-1/polkitd --no-debug
  ├─auditd.service
  │ └─301 /sbin/auditd -n
  ├─systemd-udevd.service
  │ └─248 /usr/lib/systemd/systemd-udevd
...
How Docker works with systemd
When starting a container, Docker asks systemd to create a new unit to start the initial process.
– As a result, all processes fork/exec-ed from the initial process belong to the same unit and are placed under the same cgroups group. The unit name is "docker-<container ID>.scope".
# docker run -td -p 8000:80 -p 2222:22 enakai/httpd:ver1.0
# systemd-cgls -a
...
└─system.slice
  ├─var-lib-docker-devicemapper-mnt-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.mount
  ├─docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope
  │ ├─496 /bin/sh /usr/local/bin/init.sh
  │ ├─538 /usr/sbin/sshd
  │ ├─550 /usr/sbin/httpd
  │ ├─552 /bin/bash
  │ ├─553 /usr/sbin/httpd
  │ ├─554 /usr/sbin/httpd
  │ ├─555 /usr/sbin/httpd
  │ ├─556 /usr/sbin/httpd
  │ ├─557 /usr/sbin/httpd
  │ ├─558 /usr/sbin/httpd
  │ ├─559 /usr/sbin/httpd
  │ └─560 /usr/sbin/httpd
...
How Docker works with systemd (cont.)
– You can check the unit status corresponding to a container.
# unitname=docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope
# systemctl status $unitname
docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope - docker container a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b
   Loaded: loaded (/run/systemd/system/docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope; static)
  Drop-In: /run/systemd/system/docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope.d
           └─90-BlockIOAccounting.conf, 90-CPUAccounting.conf, 90-Description.conf, 90-MemoryAccounting.conf, 90-Slice.conf
   Active: active (running) since 金 2014-06-13 23:05:27 JST; 1min 41s ago
   CGroup: /system.slice/docker-a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a47f7c4bc8b37e3b488b.scope
           ├─496 /bin/sh /usr/local/bin/init.sh
           ├─538 /usr/sbin/sshd
           ├─550 /usr/sbin/httpd
           ├─552 /bin/bash
           ├─553 /usr/sbin/httpd
           ├─554 /usr/sbin/httpd
           ├─555 /usr/sbin/httpd
...
           └─560 /usr/sbin/httpd
6月 13 23:05:27 fedora20 systemd[1]: Started docker container a985fc6dbe8dfc6335474ae68291ad3c51cddcbc28c1a...488b.
Hint: Some lines were ellipsized, use -l to show in full.
How Docker works with systemd? (cont.)
– The "docker run" command has "-c" and "-m" options. They are translated into the unit's configuration parameters "CPUShares" and "MemoryLimit".
– After starting a container, you can change these parameters through systemd's interface.
systemd will be more integrated with cgroups in the future. After that, additional resource controls (CPU pinning, CPU quota, I/O bandwidth) may be added to Docker.
# systemctl show $unitname | grep -E "(CPUShares=|MemoryLimit=)"
CPUShares=1024
MemoryLimit=18446744073709551615
# systemctl set-property $unitname CPUShares=512 --runtime
# systemctl show $unitname | grep -E "(CPUShares=|MemoryLimit=)"
CPUShares=512
MemoryLimit=18446744073709551615
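Putting the two bullets together, the start-time variant of the same limits can be sketched as follows. The image name is the one used throughout these slides, and the command needs a running docker daemon, so it is wrapped in a function rather than executed here:

```shell
# Sketch: passing resource limits at container start time.
# Assumes a running docker daemon; the image name is illustrative.
run_limited() {
    # -c 512:  becomes the unit's CPUShares=512 (relative CPU weight)
    # -m 256m: becomes the unit's MemoryLimit (upper memory bound)
    docker run -td -c 512 -m 256m enakai/httpd:ver1.0
}
```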
Etsuji Nakai (Twitter: @enakai00)
Let's learn the up-to-datetechnology with Fedora/RHEL