Seven problems of Linux Containers

parallels.com || openvz.org || criu.org Seven Problems of Linux Containers Kir Kolyshkin <[email protected] > 28 April 2013 LinuxFest Northwest

description

OpenVZ, which has turned 7 recently, is an implementation of lightweight virtualization technology for Linux, something which is also referred to as LXC or just containers. The talk gives an insight into 7 different problems with containers and how they were solved. While most of these problems and solutions belong in the Linux kernel, kernel knowledge is not expected from the audience.

Transcript of Seven problems of Linux Containers

Page 1: Seven problems of Linux Containers

parallels.com || openvz.org || criu.org

Seven Problems of Linux Containers

Kir Kolyshkin <[email protected]>

28 April 2013 LinuxFest Northwest

Page 2: Seven problems of Linux Containers

Seventy Seven Problems of Linux Containers

Kir Kolyshkin <[email protected]>

28 April 2013 LinuxFest Northwest

(of which I am going to cover six)

Page 3: Seven problems of Linux Containers

Problem 1: Effective virtualization

● Virtualization is partitioning
● Historical way: $M mainframes
● Modern way: virtual machines
● Problem: performance overhead
● Partial solution: hardware support (Intel VT, AMD-V)

Page 4: Seven problems of Linux Containers

Solution: isolation

● Run many isolated userspace instances on top of one single (Linux) kernel
● All processes see each other
  – files, process information, network, shared memory, users, etc.
● Make them unsee it!

Page 5: Seven problems of Linux Containers

Page 6: Seven problems of Linux Containers

One historical way to unsee

chroot()

Page 7: Seven problems of Linux Containers

Namespaces

● Implemented in the Linux kernel
  – PID
  – net
  – IPC
  – UTS
  – mnt
  – user
● clone() with CLONE_NEW* flags
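
As a rough illustration of the last bullet, the sketch below maps the namespace names from the slide onto the corresponding CLONE_NEW* bits and ORs them into one flags value of the kind passed to clone(). The numeric values are the ones from the Linux <linux/sched.h> header; the helper name clone_flags_for is made up for this example. The code only builds the bitmask and does not create any namespace, since that normally needs privileges.

```python
# CLONE_NEW* values as defined in Linux <linux/sched.h>.
CLONE_NEW_FLAGS = {
    "mnt":  0x00020000,  # CLONE_NEWNS
    "uts":  0x04000000,  # CLONE_NEWUTS
    "ipc":  0x08000000,  # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid":  0x20000000,  # CLONE_NEWPID
    "net":  0x40000000,  # CLONE_NEWNET
}


def clone_flags_for(namespaces):
    """OR together the CLONE_NEW* bits for the given namespace names.

    Hypothetical helper, not part of any library.
    """
    flags = 0
    for name in namespaces:
        flags |= CLONE_NEW_FLAGS[name]
    return flags


# A container-like process would typically ask for several namespaces at once.
container_flags = clone_flags_for(["pid", "net", "ipc", "uts", "mnt"])
print(hex(container_flags))
```

Each namespace is a separate bit, so a process can be given any subset of them.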

Page 8: Seven problems of Linux Containers

Problem 2: Shared resources

● All containers share the same set of resources (CPU, RAM, disk, various kernel things ...)
● Need fair distribution of goods so everyone gets their share
● Need DoS prevention
● Need prioritization
  – “All animals are equal, but some animals are more equal than others” -- George Orwell

Page 9: Seven problems of Linux Containers

Page 10: Seven problems of Linux Containers

Solution: OpenVZ resource controls

● OpenVZ:
  – user beancounters (controls 20 parameters)
  – hierarchical CPU scheduler
  – disk quota per container
  – I/O priorities per container
● Dynamic control, can “resize” at runtime

Page 11: Seven problems of Linux Containers

Solution: cgroups

● Cgroups is a mechanism to control resources per hierarchical groups of processes
● Cgroups is nothing without controllers:
  – blkio, cpu, cpuacct, cpuset, devices, freezer, memory, net_cls, net_prio
● Cgroups are orthogonal to namespaces
● Still a work in progress (kernel memory)
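
To make "resources per hierarchical groups" a bit more concrete, here is a toy model in Python. It is not the kernel interface: the Group class and its single numeric limit are assumptions made for illustration, standing in for one controller such as memory. The sketch assumes that usage is charged to a group and to all of its ancestors, and that a charge is refused if it would push any of them over its limit.

```python
class Group:
    """Toy model of one node in a cgroup-like hierarchy with a single limit."""

    def __init__(self, name, limit=None, parent=None):
        self.name = name
        self.limit = limit    # None: no limit configured at this level
        self.parent = parent
        self.usage = 0        # usage of this group plus all of its descendants

    def _chain(self):
        """Yield this group and then each ancestor up to the root."""
        group = self
        while group is not None:
            yield group
            group = group.parent

    def effective_limit(self):
        """Tightest limit on the path to the root, or None if unlimited."""
        limits = [g.limit for g in self._chain() if g.limit is not None]
        return min(limits) if limits else None

    def charge(self, amount):
        """Account `amount` here and in every ancestor.

        Returns False, changing nothing, if any limit would be exceeded.
        """
        chain = list(self._chain())
        if any(g.limit is not None and g.usage + amount > g.limit for g in chain):
            return False
        for g in chain:
            g.usage += amount
        return True


root = Group("root", limit=100)
ct1 = Group("ct1", limit=80, parent=root)
ct2 = Group("ct2", parent=root)        # no own limit, still bounded by root
print(ct1.effective_limit(), ct2.effective_limit())
```

In this model a group without its own limit is still bounded by its parent, which is what lets per-container limits nest inside a host-wide one.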

Page 12: Seven problems of Linux Containers

Problem 3: easy resources

● User Beancounters are complicated:
  – http://wiki.openvz.org/UBC_consistency_check
  – user has to set all these parameters
  – some of which are interdependent
● We created a collection of valid configs,
● ... wrote a whole book about UBC
● ... and a set of tools to help

Page 13: Seven problems of Linux Containers

Page 14: Seven problems of Linux Containers

Solution: VSwap

● Only two primary parameters: RAM and swap
  – others still exist, but no longer required to set
● Swap is virtual, no actual I/O is performed
● Slow down to emulate real swap
● Only when actual global RAM shortage occurs, virtual swap goes into the real swap
● Currently only available in OpenVZ kernel
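
The accounting idea behind these bullets can be sketched as a small Python function. This is a toy model, not OpenVZ code: the function name place_pages, its parameters, and the notion of counting everything in "pages" are all assumptions made for the example. It splits a container's memory into pages within its RAM limit, pages over the RAM limit that stay in host RAM as "virtual swap" (where the kernel would only slow access down), and pages that go to real swap because the host itself is short on RAM.

```python
def place_pages(used, ram, swap, host_free_ram):
    """Classify `used` pages of one container (toy model, all values in pages).

    ram, swap      -- the two per-container limits from the slide
    host_free_ram  -- host RAM still available for pages beyond the RAM limit
    """
    if used > ram + swap:
        # Beyond RAM + swap the container is simply out of memory.
        raise MemoryError("container exceeds its RAM + swap limit")
    resident = min(used, ram)
    over_limit = used - resident
    # Pages over the RAM limit stay in host RAM while there is room:
    # no I/O happens, access is only slowed down artificially.
    virtual_swap = min(over_limit, host_free_ram)
    # Only under a global RAM shortage do pages reach the real swap device.
    real_swap = over_limit - virtual_swap
    return {"resident": resident, "virtual_swap": virtual_swap, "real_swap": real_swap}


print(place_pages(used=150, ram=100, swap=100, host_free_ram=1000))
print(place_pages(used=150, ram=100, swap=100, host_free_ram=20))
```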

Page 15: Seven problems of Linux Containers

Problem 4: fast live migration

● We can migrate an OpenVZ container from one physical server to another without a shutdown
● We want to do it fast even for huge containers
  – huge disk: use shared storage
  – huge RAM: ???

Page 16: Seven problems of Linux Containers

Normal migration process

● (Assuming shared storage)
● 1 Freeze the container
● 2 Dump its complete state to a dump file
● 3 Copy dump file to destination server
● 4 Undump
● 5 Unfreeze
● Problem: huge dump file

Page 17: Seven problems of Linux Containers

Solution 1: network swap

● 1 Dump the minimal memory, lock the rest
● 2 Restore the minimal memory, mark the rest as swapped out
● 3 Set up network swap from the source
● 4 Unfreeze. Missing RAM will be “swapped in”
● 5 Migrate the rest of RAM and kill it on source

Page 18: Seven problems of Linux Containers

Page 19: Seven problems of Linux Containers

Solution 1: network swap

● 1 Dump the minimal memory, lock the rest
● 2 Copy, undump what we have, mark the rest as swapped out
● 3 Set up network swap served from the source
● 4 Unfreeze. Missing RAM will be “swapped in”
● 5 Migrate the rest of RAM and kill it on source
● PROBLEM? Reliability, no way to rollback
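
The steps above can be simulated in a few lines of Python. This is only a sketch of the idea: the LazyMemory class, its method names, and the use of dictionaries as "memory" are invented for illustration, and the network is replaced by moving entries between two dictionaries.

```python
class LazyMemory:
    """Destination-side memory that pulls missing pages from the source on demand."""

    def __init__(self, source_pages, minimal):
        # Pages still held ("locked") on the source server.
        self.source = dict(source_pages)
        # Steps 1-2: only the minimal set is dumped, copied and restored up front.
        self.local = {n: self.source.pop(n) for n in minimal}
        self.faults = 0

    def read(self, page):
        """Step 4: a page marked as swapped out is fetched from the source on first use."""
        if page not in self.local:
            self.local[page] = self.source.pop(page)
            self.faults += 1
        return self.local[page]

    def migrate_rest(self):
        """Step 5: move whatever is left and drop it from the source."""
        self.local.update(self.source)
        self.source.clear()


mem = LazyMemory({0: "a", 1: "b", 2: "c", 3: "d"}, minimal=[0])
mem.read(2)          # not local yet, so it is "swapped in" from the source
mem.migrate_rest()   # background transfer of the remaining pages
print(mem.faults, sorted(mem.local))
```

The sketch also shows where the reliability problem comes from: between the unfreeze and migrate_rest() the container's memory is split across two machines, so losing either one loses the container.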

Page 20: Seven problems of Linux Containers

Solution 2: Iterative RAM migration

● 1 Ask kernel to track modified pages
● 2 Copy all memory to destination system
● 3 Ask kernel for list of modified pages
● 4 Copy those pages
● 5 GOTO 3 until satisfied
● 6 Freeze and do migration as usual
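
The loop in steps 1 to 6 can be sketched as a small simulation. Everything here is illustrative: iterative_migrate, demo_workload, the threshold and max_rounds parameters, and the use of a callback in place of the kernel's page tracking are assumptions, not OpenVZ or CRIU interfaces. "Until satisfied" is modelled as "the set of modified pages is small enough, or we have tried enough rounds".

```python
def iterative_migrate(memory, run_workload, threshold=2, max_rounds=10):
    """Simulate iterative RAM migration; returns (destination_copy, rounds).

    memory       -- dict page -> value on the source, mutated by the workload
    run_workload -- callable(memory) -> set of pages modified since the last call
                    (stands in for steps 1 and 3, the kernel's page tracking)
    """
    dest = dict(memory)                   # step 2: copy all memory
    dirty = run_workload(memory)          # step 3: pages modified meanwhile
    rounds = 0
    while len(dirty) > threshold and rounds < max_rounds:
        for page in dirty:                # step 4: copy only those pages
            dest[page] = memory[page]
        dirty = run_workload(memory)      # step 5: GOTO 3
        rounds += 1
    # Step 6: the container is frozen now, so nothing changes any more;
    # the few remaining modified pages go with the usual dump.
    for page in dirty:
        dest[page] = memory[page]
    return dest, rounds


def demo_workload(sizes):
    """Hypothetical workload that dirties sizes[i] pages during the i-th round."""
    remaining = iter(sizes)
    stamp = [0]

    def run(memory):
        dirtied = set(range(next(remaining, 0)))
        for page in dirtied:
            stamp[0] += 1
            memory[page] = stamp[0]
        return dirtied

    return run


source = {page: 0 for page in range(8)}
copy, rounds = iterative_migrate(source, demo_workload([6, 4, 1]))
print(rounds, copy == source)
```

The freeze time then depends on the last, small set of modified pages rather than on the total amount of RAM.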

Page 21: Seven problems of Linux Containers

Problem 5: upstreaming

● OpenVZ was developed separately
● Then we wanted to merge it upstream (i.e. to vanilla Linux kernel)
● Problem?

Page 22: Seven problems of Linux Containers

Page 23: Seven problems of Linux Containers

Problem 5: upstreaming

● OpenVZ was developed separately
● Then we wanted to merge it upstream (i.e. to vanilla Linux kernel)
● Problem:
● upstream devs are not accepting our work

Page 24: Seven problems of Linux Containers

Solution 1: rewrite from scratch

● User Beancounters -> CGroups
● Did 2 rewrites for PID namespace until it finally got accepted
● Network namespace redone
● It works!
● about 1500 patches landed in vanilla
● Parallels made it to top 10 contributors

Page 25: Seven problems of Linux Containers

Solution 2: CRIU

● We tried hard to merge checkpoint/restore
● Other people tried hard too, no luck
● Can't make it to the kernel, let's go userspace
● With minimal kernel intervention when required
● Kernel exports most of information already, so let's just add missing bits and pieces

Page 26: Seven problems of Linux Containers

CRIU

● Checkpoint / Restore (mostly) In Userspace
● Tools currently at version 0.4
● Will do 1.0 release this year
● Kernel 3.8 has about 120 patches from us
  – 95% of needed features are there
● Memory snapshot recently made it to -mm tree

Page 27: Seven problems of Linux Containers

Page 28: Seven problems of Linux Containers

Problem 6: common file system

● Container is just a directory on host, all CTs reside on the same FS
● File system journal is a bottleneck
● Lots of small-size files I/O on CT backup
● No sub-tree disk quota support in upstream
● No per-container snapshots
● Live migration: rsync -- changed inodes
● File system type and properties are fixed

Page 29: Seven problems of Linux Containers

Solution 1: LVM

● Only works on top of a block device
● Hard to manage (e.g. how to migrate huge volume?)
● No dynamic allocation
● Complicated management

Page 30: Seven problems of Linux Containers

Solution 2: loop device

● VFS operations lead to double page-caching
  – (already fixed in the recent kernels)
● No dynamic allocation, max space is used
● Limited feature set

Page 31: Seven problems of Linux Containers

Solution 3: ploop

● Basic idea: same as loop, just better
● Modular design:
  – various image formats (qcow2 in TODO)
  – various I/O backends
● More features:
  – live resize
  – instant live snapshots
  – write tracker to help in live migration
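
Two of the listed features, instant snapshots and the write tracker, can be illustrated with a toy block image in Python. The slides do not describe how ploop implements them, so this sketch rests on an assumption: it uses a common technique in which an image is a stack of layers, a snapshot freezes the top layer and starts a new writable one, and the tracker simply records which blocks were written. The Image class and all its methods are hypothetical.

```python
class Image:
    """Toy block image: a stack of layers plus an optional write tracker."""

    def __init__(self):
        self.layers = [{}]     # oldest first; only the last layer is written to
        self.tracked = None    # set of written block numbers while tracking is on

    def write(self, block, data):
        self.layers[-1][block] = data   # blocks are allocated only when written
        if self.tracked is not None:
            self.tracked.add(block)

    def _lookup(self, layers, block):
        for layer in reversed(layers):  # the newest layer wins
            if block in layer:
                return layer[block]
        return None                     # never written: nothing allocated

    def read(self, block):
        return self._lookup(self.layers, block)

    def snapshot(self):
        """Freeze the current top layer; instant, since no data is copied."""
        self.layers.append({})
        return len(self.layers) - 2     # index of the frozen layer

    def read_snapshot(self, snap, block):
        return self._lookup(self.layers[:snap + 1], block)

    def start_tracking(self):
        self.tracked = set()

    def stop_tracking(self):
        """Return the blocks written since start_tracking() and stop tracking."""
        written, self.tracked = self.tracked, None
        return written


img = Image()
img.write(0, "old")
snap = img.snapshot()
img.write(0, "new")
print(img.read(0), img.read_snapshot(snap, 0))
```

A write tracker of this kind lets a disk be migrated the same way as RAM: copy everything once, then repeatedly copy only the blocks reported as written.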

Page 32: Seven problems of Linux Containers

Any problems questions?

● [email protected]
● Twitter: @kolyshkin