Dockerizing the Hard Services: Neutron and Nova


Clayton O’Neill <clayton.oneill@charter.com>

Overview
● What magic incantations are needed to run these services at all?
● How to prevent HA router failover on service restarts
● How to prevent network namespaces from breaking everything
● Bonus: How are network namespaces related to Cinder!?

Previously Seen On...
● “Deploying OpenStack Using Docker in Production”
• Why use Docker?
• How do we deploy OpenStack with Docker?
• Pain points
● Video: https://youtu.be/3pc85InNR20
● Puppet module: https://github.com/twc-openstack/puppet-os_docker

Docker & OpenStack @ Charter
● Docker in production since July 2015
● Running in Docker in production:
• Cinder, Designate, Glance, Heat, Keystone, Nova & Neutron
● Using Ceph & SolidFire for volume storage
● Using VXLAN tenant networks with HA routers
● Using Docker 1.12

/var/run vs /run
● On modern distributions /var/run is a symlink to /run, so mounting either path exposes the same files
● The examples in this deck use the two paths interchangeably

Overview
● What magic incantations are needed to run these services at all?
● How to prevent HA router failover on service restarts
● How to prevent network namespaces from breaking everything
● Bonus: How are network namespaces related to Cinder!?

How to Run Neutron OVS Agent in Docker

docker run --net host --privileged \
  -v /run/openvswitch:/run/openvswitch \
  -v /lib/modules:/lib/modules:ro \
  -v /etc/neutron:/etc/neutron:ro \
  -v /var/log/neutron:/var/log/neutron \
  -v /var/lib/neutron:/var/lib/neutron \
  -v /run/lock/neutron:/run/lock/neutron \
  -v /run/neutron:/run/neutron \
  my-docker-registry:5000/cirrus/neutron:7.0.4-120-g1a1224a.19.7a17221 \
  /usr/bin/neutron-openvswitch-agent \
    --config-file=/etc/neutron/neutron.conf \
    --config-file=/etc/neutron/plugins/ml2/ml2_conf.ini \
    --config-file=/etc/neutron/plugins/ml2/openvswitch_agent.ini

How to Run Nova Compute in Docker

docker run --net host --privileged \
  -e OS_DOCKER_GROUP_DIR=/etc/nova/groups \
  -e OS_DOCKER_HOME_DIR=/var/lib/nova \
  -v /etc/nova:/etc/nova:ro \
  -v /etc/ceph:/etc/ceph:ro \
  -v /etc/iscsi:/etc/iscsi \
  -v /dev:/dev \
  -v /etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro \
  -v /lib/modules:/lib/modules:ro \
  -v /run/libvirt:/run/libvirt \
  -v /run/openvswitch:/run/openvswitch \
  -v /run/lock:/run/lock \
  -v /var/log/nova:/var/log/nova \
  -v /var/lib/nova:/var/lib/nova \
  -v /var/lib/libvirt:/var/lib/libvirt \
  my-docker-registry:5000/cirrus/nova:12.0.4-2-gc55aacf.19.0522b22 \
  /usr/bin/nova-compute

Docker “--net host”
● The “--net host” flag turns off Docker networking
● Slightly faster...
● Nova and Neutron both interact directly with host networking

Docker “--privileged”
● Similar to giving the container root privileges
● Needed for iptables, mount, etc.
● Neutron & Nova still run as an unprivileged user
● Rootwrap is used to execute privileged commands
● More fine-grained options exist now (--cap-add & --cap-drop); see the sketch below
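
As a finer-grained alternative, capabilities can be granted individually instead of using --privileged. This is only a sketch, not the configuration from this deck: the exact capability set each agent needs would have to be determined by testing, and the remaining volume mounts from the full command shown earlier still apply.

# Sketch only: replace --privileged with individual capabilities.
# NET_ADMIN  - interface, routing and iptables changes
# SYS_ADMIN  - mount and network namespace operations
# SYS_MODULE - loading kernel modules
# Additional capabilities and the remaining volume mounts may be required.
docker run --net host \
  --cap-add NET_ADMIN --cap-add SYS_ADMIN --cap-add SYS_MODULE \
  -v /run/openvswitch:/run/openvswitch \
  -v /etc/neutron:/etc/neutron:ro \
  my-docker-registry:5000/cirrus/neutron:7.0.4-120-g1a1224a.19.7a17221 \
  /usr/bin/neutron-openvswitch-agent --config-file=/etc/neutron/neutron.conf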

Neutron OVS “Data Volume” Mounts
● Docker volumes are used to expose the host filesystem into the container
● -v /etc/neutron:/etc/neutron:ro
• Allows read-only access to config files
● -v /var/log/neutron:/var/log/neutron
• Allows writing log files
● -v /run/openvswitch:/run/openvswitch
• Allows communicating with OVS & OVSDB via control socket
• Allows OVS commands to work inside the container
● /var/lib/neutron, /run/lock/neutron, /run/neutron
• Mounted to store state outside of the container

Nova Compute “Data Volume” Mounts
● /etc/ceph - Ceph keys (read-only)
● /etc/iscsi - iSCSI (SolidFire configuration)
● /dev - iSCSI mount validation
● /etc/ssh/ssh_known_hosts - Live migration (read-only)
● /run/openvswitch - OVS/OVSDB control
● /run/libvirt - Libvirt control socket

Overview
● What magic incantations are needed to run these services at all?
● How to prevent HA router failover on service restarts
● How to prevent network namespaces from breaking everything
● Bonus: How are network namespaces related to Cinder!?

How Do Neutron HA Routers Work Anyway?
● Keepalived is used to provide the HA feature for virtual routers
● Keepalived sends heartbeats between the two network nodes
● Keepalived fails over if no heartbeats are heard
● Keepalived fails over on shutdown
● IP failover interrupts data path traffic
• Not instantaneous
• NAT/firewall mappings are lost
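
One way to see which network node currently hosts the active instance of a given HA router (not from the talk, just a handy check): keepalived only assigns the router's IP addresses on the master, so listing addresses inside the qrouter namespace on each network node shows where the active copy lives. The router UUID below is a placeholder.

# Placeholder router UUID; substitute a real one.
ROUTER=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
# Run on each network node: the master shows the router's qr-/qg-
# addresses, the backup shows the same interfaces without them.
ip netns exec qrouter-$ROUTER ip -4 addr show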

The Problem with HA Routers and Docker
● L3-agent spawns Keepalived as a child process
● L3-agent in the container means Keepalived is also inside
● Keepalived lifetime is tied to the container lifetime
● Container restarts lead to router failovers!
● L3-agent rolling restart causes all routers to fail over!
● DHCP-agent has the same issue with DNSMasq

Stabilizing Keepalived & DNSMasq
● Separate Keepalived and DNSMasq from the agents
● Keepalived and DNSMasq run in separate containers
● Start Keepalived/DNSMasq containers from inside a container!
● L3/DHCP agent restarts don’t affect Keepalived/DNSMasq

Enable Docker Inside Agent Containers
● L3-Agent and DHCP-Agent need to start Docker containers
● Need access to the Docker Engine socket
• The socket provides API access to the Docker Engine
• -v /var/run/docker.sock:/var/run/docker.sock
● Need the Docker client inside the container
● Docker client API version has to match the Docker Engine API version (see the check below)
• https://docs.docker.com/engine/reference/api/docker_remote_api/
● The Neutron user *cannot* access the Docker Engine
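
A quick way to verify the plumbing (a sketch, assuming the Docker client is on the image's PATH as described above): mount the Engine socket into a container and compare client and server API versions.

# Mount the host's Docker Engine socket and compare client vs. server
# API versions; a mismatch shows up here immediately.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  my-docker-registry:5000/cirrus/neutron:7.0.4-120-g1a1224a.19.7a17221 \
  docker version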

How to Get Neutron to Cooperate?
● Intercept keepalived and DNSMasq invocations
• Place keepalived/dnsmasq wrappers in /usr/local/bin (sketch below)
● Update rootwrap filters
● Add ‘--pid host’ to ‘docker run’
• Agents need to see keepalived/dnsmasq process IDs
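
The real wrappers live in the puppet-os_docker module; the sketch below only illustrates the idea for keepalived. Rootwrap ends up invoking keepalived inside the qrouter namespace, so a wrapper installed as /usr/local/bin/keepalived can recover that namespace and re-launch keepalived in its own container, decoupling its lifetime from the L3 agent container. The image name, container naming and argument handling here are illustrative assumptions, not the production wrapper.

#!/bin/bash
# Illustrative keepalived wrapper (NOT the production wrapper from
# twc-openstack/puppet-os_docker). Installed as /usr/local/bin/keepalived
# inside the L3 agent container so it shadows the real binary.
set -e
# rootwrap launches us via "ip netns exec qrouter-<uuid> keepalived ...",
# so we can recover the namespace name we were started in.
NETNS=$(ip netns identify $$)
# Re-launch keepalived in its own container, re-entering that namespace.
# --pid host keeps keepalived's PID visible to the L3 agent on the host.
exec docker run --detach \
  --net host --pid host --privileged \
  --name "keepalived-${NETNS#qrouter-}" \
  -v /run/netns:/run/netns:shared \
  -v /var/lib/neutron:/var/lib/neutron \
  my-docker-registry:5000/cirrus/openstack-dev:20160731.0-11002e.25.70d62e6 \
  ip netns exec "$NETNS" keepalived --dont-fork "$@"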

Upgrade to Docker 1.12
● Docker 1.12 allows Engine restarts w/o container restarts
● Initially mounted /var/run/docker.sock into containers
● Worked until we restarted the Docker Engine
● The socket inside the container pointed to the old socket
● Reconfigured the Docker Engine with a second socket (sketch below)
• /var/run/docker-sock/docker.sock
● Updated containers to mount /var/run/docker-sock
● Updated scripts to use the socket in its new location
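
One way to add that second socket (a sketch; the exact packaging, and whether -H flags are also passed on the daemon command line, will vary): Docker 1.12 supports both live-restore and extra listen sockets in /etc/docker/daemon.json.

# Sketch: keep containers running across Engine restarts and listen on a
# second Unix socket. If the daemon is also started with -H on its command
# line (e.g. via systemd), "hosts" must be configured in one place only.
cat > /etc/docker/daemon.json <<'EOF'
{
  "live-restore": true,
  "hosts": [
    "unix:///var/run/docker.sock",
    "unix:///var/run/docker-sock/docker.sock"
  ]
}
EOF
systemctl restart docker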

Neutron HA Routers And Docker – It Works!

# docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Command}}'
NAMES                                            IMAGE                                        COMMAND
neutron-metadata-agent                           neutron:7.0.4-120-g1a1224a.19.7a17221        "/venv/wrapper.sh /us"
neutron-l3-agent                                 neutron:7.0.4-120-g1a1224a.19.7a17221        "/venv/wrapper.sh /us"
neutron-openvswitch-agent                        neutron:7.0.4-120-g1a1224a.19.7a17221        "/venv/wrapper.sh /us"
keepalived-035063f3-e480-4ce6-9e16-087d862ca0c1  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-9718ffd2-5125-4894-88f0-da93a6cf451d  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-488b09fc-f17b-4db3-b3ab-46d54533d291  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-f0b84d2d-dc61-4179-b3fd-7a8966801d8d  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-e86e466f-ddb4-4fa7-92d9-87ff71a6ed6c  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-66a30a15-df7c-4cde-be52-20a1d8cc772a  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-4e120804-959d-45c9-8c9a-8376351508d2  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-c589785a-5594-46dc-b726-b499025f1c80  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-34fb33ea-fee0-4fb8-addc-32ed27b4b02c  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-48529201-aeab-4f17-9d4f-1ea2a631311c  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"
keepalived-d9dcb6b1-9dc6-49b8-86e7-2ef7a97fedbe  openstack-dev:20160731.0-11002e.25.70d62e6   "ip netns exec qroute"

Overview
● What magic incantations are needed to run these services at all?
● How to prevent HA router failover on service restarts
● How to prevent network namespaces from breaking everything
● Bonus: How are network namespaces related to Cinder!?

Network Namespace Magic
● What is a network namespace?
● How do network namespaces work?
● How can Dockerized services break namespaces?

What Is a Network Namespace?

From ip-netns(8):

A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices.

By default a process inherits its network namespace from its parent. Initially all the processes share the same default network namespace from the init process.

By convention a named network namespace is an object at /var/run/netns/NAME that can be opened. The file descriptor resulting from opening /var/run/netns/NAME refers to the specified network namespace. Holding that file descriptor open keeps the network namespace alive. The file descriptor can be used with the setns(2) system call to change the network namespace associated with a task.
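
A quick illustration of the named-namespace workflow described above:

# Create a named namespace, look inside it, then remove it.
ip netns add demo
ip netns exec demo ip addr show   # only an isolated, down loopback
ip netns delete demo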

How Is a Network Namespace Created?
● The “ip netns add” command calls unshare(CLONE_NEWNET)
● This places the “ip netns” process in a new, empty namespace
● It changes what /proc/self/ns/net points to
● If nothing is using the namespace, it disappears
● “Using” the namespace means:
• A process is running in the namespace
• A process holds this file open

# ls -l /proc/self/ns/net
lrwxrwxrwx 1 root root 0 Sep 6 20:18 /proc/self/ns/net -> net:[4026531957]
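
The “disappears if nothing is using it” behaviour can be seen with an anonymous namespace created by unshare(1), which exists only as long as a process holds it (a small sketch, not from the talk):

# The new namespace lives only as long as the backgrounded sleep.
unshare --net sleep 60 &
sleep 1                      # give unshare a moment to switch namespaces
readlink /proc/$!/ns/net     # a different net:[...] inode...
readlink /proc/self/ns/net   # ...than the host's default namespace
wait                         # once sleep exits, the namespace is gone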

How Are Namespaces Persisted?
● Running “strace ip netns add test” shows:
• open("/var/run/netns/test", O_RDONLY|O_CREAT|O_EXCL, 0) = 4
• close(4)
• unshare(CLONE_NEWNET)
• mount("/proc/self/ns/net", "/var/run/netns/test", 0x43981d, MS_BIND, NULL)
● The bind mount creates an alias for the namespace file
● The alias outlives the “ip netns add test” process
● The namespace becomes permanent and reusable

What Does This Have To Do With Docker?
● Docker volumes interact with mounts in unintuitive ways
● Network namespaces are persisted as filesystem mounts!
● The L3-Agent and DHCP-Agent need /var/run/netns!

Docker Volumes And Filesystem Mounts
● By default:
• Mounts that exist when the container is started are visible inside it
• New mounts made inside the container aren’t visible outside
• New mounts made outside the container aren’t visible inside
• Unmounts don’t propagate from host to container or vice-versa
● Mounts are set up on container start, but not synchronized afterwards

Docker Namespace Test

# ip netns add test1
# docker run --name test --detach -v /run/netns:/run/netns ubuntu sleep 1h
8fda0cf570d33f4984cdcaa4e540dd07531e7a4ecb51df3594a85b3346aa294c
# ip netns add test2

Docker Namespace Test - Host

# ls -l /run/netns
total 0
-r--r--r-- 1 root root 0 Sep 29 15:15 test1
-r--r--r-- 1 root root 0 Sep 29 15:15 test2
# grep /run/netns /proc/mounts
tmpfs /run/netns tmpfs rw,nosuid,noexec,relatime,size=791316k,mode=755 0 0
nsfs /run/netns/test1 nsfs rw 0 0
nsfs /run/netns/test2 nsfs rw 0 0

Docker Namespace Test - Container

# docker exec test ls -l /var/run/netns/
total 0
-r--r--r-- 1 root root 0 Sep 29 15:15 test1
---------- 1 root root 0 Sep 29 15:15 test2
# docker exec test grep /run/netns /proc/mounts
tmpfs /run/netns tmpfs rw,nosuid,noexec,relatime,size=791316k,mode=755 0 0
nsfs /run/netns/test1 nsfs rw 0 0

Solution: Docker Volume Flags!
● Docker supports flags to change the default mount behavior
● Flags on the mount determine propagation of new mounts/unmounts
● Uses Linux shared subtree flags
● Kernel docs
• https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
● Docker docs
• https://github.com/docker/docker/blob/master/docs/reference/commandline/service_create.md#bind-propagation
● The fix is simple: use the “shared” flag for /run/netns
• -v /run/netns:/run/netns:shared
• Enables bidirectional propagation of mounts (see the re-run below)
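
Re-running the earlier test with the shared flag (same steps as before, sketched here) should show the second namespace inside the container as well:

# Same test as before, but with bidirectional mount propagation.
ip netns add test1
docker run --name test --detach -v /run/netns:/run/netns:shared ubuntu sleep 1h
ip netns add test2
docker exec test ls /var/run/netns/              # now lists test1 and test2
docker exec test grep /run/netns /proc/mounts    # both nsfs mounts visible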

Overview
● What magic incantations are needed to run these services at all?
● How to prevent HA router failover on service restarts
● How to prevent network namespaces from breaking everything
● Bonus: How are network namespaces related to Cinder!?

Cinder NFS Backend and Nova
● The NFS backend wants to be able to mount NFS shares itself
● Libvirt/QEMU need to be able to see the NFS shares
• Libvirt & QEMU run outside the nova-compute container
● If nova-compute mounts the share inside its container, Libvirt can’t see it!
● We need mounts under /var/lib/cinder/mnt to propagate
● The shared flag must be applied to a filesystem mount on the host
● Solution:
• mount --bind /var/lib/cinder/mnt /var/lib/cinder/mnt
• mount --make-shared /var/lib/cinder/mnt
• docker run -v /var/lib/cinder/mnt:/var/lib/cinder/mnt:shared
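
To confirm the host side is set up correctly (not from the talk, but a handy check), findmnt can show the propagation flag on the bind mount:

# Verify the bind mount exists and carries shared propagation.
findmnt -o TARGET,PROPAGATION /var/lib/cinder/mnt
# Expected output: the PROPAGATION column reads "shared"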

Questions?
● Email: clayton.oneill@charter.com
● IRC: clayton
● Twitter: clayton_oneill