Networking approaches in a container world
Disclaimer
● There are many container engines; I’m going to focus on Docker
● Multiple networking solutions are available:
– Introduce the core concepts
– Many projects → cover only some of them
● Container orchestration engines:
– Tightly coupled with networking
– I’m going to focus on Docker Swarm and Kubernetes
● Remember: the container ecosystem moves at a fast pace, things can suddenly change
2
The problem● Given:
– Containers are lightweight
– Containers are great for microservices
– Microservices: multiple distributed processes communicating
→ Lots of containers that need to be connected together
3
Single host
4
Reuse the host network
5
host
container-01
eth0  lo  ...
Container has full access to host’s interfaces!
Reuse the host network
6
$ docker run --rm --name container-01 --net=host -ti busybox /bin/sh
/ # ifconfig
docker0   Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::x/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:19888 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19314 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3063342 (2.9 MiB)  TX bytes:29045336 (27.6 MiB)

eth0      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:192.168.1.121  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::x/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:135513 errors:0 dropped:0 overruns:0 frame:0
          TX packets:109723 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:102680118 (97.9 MiB)  TX bytes:22766730 (21.7 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:230 errors:0 dropped:0 overruns:0 frame:0
          TX packets:230 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:37871 (36.9 KiB)  TX bytes:37871 (36.9 KiB)
Warning: the container can see and control all the host interfaces
Network bridge
7
host
container-01
eth0
docker0
container-02
172.17.0.0/16
● An internal, virtual switch
● Containers are plugged into that switch
● Containers on the same bridge can talk to each other
● Users can create multiple bridges
Network bridge: as seen by the host
8
$ ifconfig docker0
docker0   Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:a2ff:fe10:ccf7/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:480 (480.0 B)  TX bytes:5025 (5.0 KB)
$ ip route
default via 192.168.1.1 dev wlan0 proto static metric 600
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
...
route handling traffic from host to containers
docker0 is by default at 172.17.0.1
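Beyond the default docker0 bridge, the user-defined bridges mentioned earlier can be created from the CLI. A minimal sketch (the network name `my-bridge` and container names are illustrative; it assumes a running Docker daemon and the `busybox` image):

```shell
# create an additional user-defined bridge (docker0 is the default one)
docker network create --driver bridge my-bridge

# attach two containers to it; they land on the same virtual switch
docker run -d --name c1 --net=my-bridge busybox sleep 3600
docker run --rm --net=my-bridge busybox ping -c 1 c1
```

Containers on different bridges cannot reach each other directly, which is one way to segment workloads on a single host.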
How to expose a service
9
host
container-01 container-02
172.17.0.0/16
eth0
docker0
port 80
port 8080
● Port 80 of container-02 is mapped to port 8080 of the host
● Risk: port exhaustion on the host
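The mapping in the picture is done with the `-p` flag. A sketch, assuming the stock `nginx` image listening on port 80:

```shell
# map port 80 of the container to port 8080 of the host
docker run -d --name container-02 -p 8080:80 nginx

# the service is now reachable through the host's address
curl http://localhost:8080/
```

Each exposed service consumes one host port, which is where the port-exhaustion risk comes from.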
Multi-host networking
10
Multi-host networking scenarios
11
host-A host-B host-C
container-02
container-01
container-03
container-04
container-05
container-06
frontend network
application network
database network
eth0 eth0 eth0
Multi-host networking scenarios
12
a big host-A
frontend network
container-02
container-01
container-03
container-04
container-05
container-06
application network
database network
Multi-host networking scenarios
13
a big host-A
frontend network
container-02
container-01
container-03
container-04
container-05
container-06
application network
database network
VM-1 VM-2 VM-3
Routing solutions
14
Routing approach
● Create a common IP space at the container level
● Assign a /24 subnet to each host
● Setup IP routes between the hosts
● Main projects:
– Calico
– Flannel
– Romana
15
Routing approach
16
192.168.1.2
host-A
container-01
eth0
docker0
10.0.9.0/24
10.0.9.4 10.0.9.5
10.0.9.1
container-02
host-B
container-03
eth0
docker0
10.0.10.0/24
10.0.10.8 10.0.10.9
10.0.10.1
container-04
Routing rule: 10.0.10.* goes through eth0
192.168.1.3
Routing rule: 10.0.10.* goes through docker0
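The two routing rules above amount to a longest-prefix lookup over the per-host subnets. A minimal Python sketch of host-A's view (interface names and subnets taken from the diagram):

```python
import ipaddress

# host-A's routing table, as in the diagram
routes = {
    ipaddress.ip_network("10.0.9.0/24"):  "docker0",  # local containers
    ipaddress.ip_network("10.0.10.0/24"): "eth0",     # containers on host-B
}

def egress_device(dst: str) -> str:
    """Return the interface a packet to dst would leave through."""
    addr = ipaddress.ip_address(dst)
    for net, dev in routes.items():
        if addr in net:
            return dev
    return "default"

print(egress_device("10.0.10.8"))  # eth0: routed towards host-B
print(egress_device("10.0.9.5"))   # docker0: delivered locally
```

On a real host these entries are plain kernel routes, which is why the routing approach gets native performance.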
Calico's approach
17
192.168.1.2
host-A
container-01
eth0
docker0
10.0.9.0/24
10.0.9.4 10.0.9.5
10.0.9.1
container-02
host-B
container-03
eth0
docker0
10.0.10.0/24
10.0.10.8 10.0.10.9
10.0.10.1
container-04
192.168.1.3
BGP:
● One of the protocols used to build the Internet
● Used to advertise routes
Felix agent:
● Uses the kernel’s L3 forwarding capabilities
● Handles ACLs
Flannel's approach
18
192.168.1.2
host-A
container-01
eth0
docker0
10.0.9.0/24
10.0.9.4 10.0.9.5
10.0.9.1
container-02
host-B
container-03
eth0
docker0
10.0.10.0/24
10.0.10.8 10.0.10.9
10.0.10.1
container-04
192.168.1.3
etcd:
● Network configuration
● Network topology
flanneld process:
● Keeps routes up-to-date
Calico + flannel = Canal
● Collaboration announced on May 9th, 2016
● Use Calico and flannel together
● Project still in its early days
19
Overlay solutions
20
Overlay network approach
● Create a parallel network for cross-host communication
● Connect hosts with encapsulation tunnels
● Connect containers to the virtual networks
● Main projects:
– Docker (native)
– Flannel
– Weave
21
Overlay network
22
192.168.1.2
host-A
container-01
eth0
docker0
10.0.9.0/24
container-02
host-B
container-03
eth0
docker0
10.0.9.0/24
container-04
192.168.1.3
Capture traffic leaving for another container in 10.0.9.x
Encapsulated traffic (e.g. VXLAN):
Outer Ethernet Header (src/dst) | Outer IP Header (src/dst) | Outer UDP Header | VXLAN Header | Inner Ethernet Frame
Overlay traffic
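The VXLAN header in the layout above is only 8 bytes. A sketch building it by hand, following the layout in RFC 7348:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build an 8-byte VXLAN header (RFC 7348).

    Byte 0 carries the flags (0x08 = 'VNI present'), bytes 1-3 are
    reserved, bytes 4-6 carry the 24-bit VNI, byte 7 is reserved.
    """
    return struct.pack("!B3xI", 0x08, vni << 8)

hdr = vxlan_header(42)
print(len(hdr), hdr.hex())  # 8 0800000000002a00
```

The inner Ethernet frame is appended after this header, and the whole thing travels inside an ordinary UDP datagram between the hosts.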
Overlay network and k/v store
Network state and configuration can be saved into a k/v store:
● Docker < 1.12: supports etcd, consul and zookeeper via libkv
● Docker >= 1.12: no external dependency, built-in component
● Flannel: etcd
● Weave: no external dependency, doesn’t use k/v store at all
23
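With the built-in store of Docker >= 1.12, creating an overlay is a one-liner. A sketch (network and service names are illustrative; it assumes a node already in Swarm mode):

```shell
# on a Docker >= 1.12 swarm manager: create an overlay network
# (the swarm's built-in store keeps the network state, no external k/v needed)
docker network create --driver overlay my-overlay

# services attached to it can talk across hosts
docker service create --name web --network my-overlay nginx
```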
Overlay backends
● VXLAN:
– faster than UDP, traffic doesn't go to userspace
– Some hardware acceleration available
● UDP: can add encryption easily
24
          VXLAN   UDP
Docker      X      -
Weave       X      X*
Flannel     X      X*
* backed by a custom protocol
Routing vs Overlay
25
Routing
● Good: native performance; easy debugging
● Bad: requires control over the infrastructure; hybrid cloud more complicated (requires VPN); can run out of addresses

Overlay
● Good: easier inter-cloud; doesn’t require control over the infrastructure
● Bad: lower performance; debugging more complicated; no IP multicast (except for Weave)
How to use these projects
● Container Network Model (CNM):
– Specification used by Docker
– Plugins for: calico, weave, …
– Note well: Docker 1.12+ Swarm mode works only with the native overlay network driver
● Container Network Interface (CNI):
– Derived from rkt networking proposal
– Supported by rkt, kubernetes, Cloud Foundry, Mesos,…
– Support for: calico, flannel, weave,...
26
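On the CNI side, a network is declared with a small JSON file that the runtime hands to the plugin. A sketch of such a configuration for the reference bridge plugin (network name, bridge name and subnet are illustrative):

```json
{
  "cniVersion": "0.2.0",
  "name": "demo-net",
  "type": "bridge",
  "bridge": "cni0",
  "ipam": {
    "type": "host-local",
    "subnet": "10.0.9.0/24"
  }
}
```

Swapping `"type"` for another plugin (e.g. calico or flannel) is how the runtimes listed above switch networking solutions without changing their own code.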
More troubles...
27
Are we done?
● Now we can:
– Connect containers running on different hosts
– React to network changes
● Is that enough? Unfortunately not...
28
Service discovery
● A container runs a service: producer
● A container accesses this service: consumer
● The consumer needs to find where the producer is located (IP address, in some cases even port #)
29
Challenge #1: find the producer
30
host-A host-B
web-01 redis-01
eth0 eth0
Where is redis?
host-A host-B
web-01 redis-01
eth0 eth0
Challenge #2: react to changes
31
host-A host-B
web-01 redis-01
eth0 eth0
host-C
eth0
“web” is already connected to “redis”
Challenge #2: react to changes
32
host-A host-B
web-01 redis-01
eth0 eth0
host-C
redis-02
eth0
“web” points to the old location → it’s broken
“redis” is moved to another host → different IP
Challenge #2: react to changes
33
The link has to be reestablished
host-A host-B
web-01
eth0 eth0
host-C
redis-02
eth0
Containers can be moved at any time:
● The producer can be moved to a different host
● The consumer should keep working
Challenge #3: multiple choices
34
Multiple instances of the “redis” image
host-A host-B
web-01 redis-01
eth0 eth0
host-C
redis-02
eth0
Which redis?
Workloads can be scaled:
● More instances of the same producer
● How to choose between all of them?
Addressing service discovery
35
Use DNS
● Not a good solution:
– Containers die or get moved around more often than traditional hosts
– Returning DNS responses with a short TTL → more load on the server
– Some clients ignore the TTL → old entries are cached
Note well:
● Docker < 1.11: updates /etc/hosts dynamically
● Docker >= 1.11: integrates a DNS server
36
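From the consumer's side, name-based discovery is an ordinary lookup; with Docker >= 1.11 the embedded DNS server answers for container names on the same network. A standard-library sketch (it resolves `localhost` so it runs anywhere; inside a container you would resolve the producer's name, e.g. `redis`):

```python
import socket

def find_producer(name: str) -> str:
    """Resolve a service name via the resolver configured for this
    host (inside a Docker network, that is the embedded DNS server
    listening on 127.0.0.11)."""
    return socket.gethostbyname(name)

print(find_producer("localhost"))  # 127.0.0.1
```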
Key-value store
● Rely on a k/v store (etcd, consul, zookeeper)
● The producer registers itself: IP, port #
● The orchestration engine hands this data to the consumer
● At run time, either:
– Change your application to read data straight from the k/v store
– Rely on some helper that exposes the values via an environment file or configuration file
37
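The register/lookup cycle above can be sketched with a plain dict standing in for the k/v store (a real setup would use etcd, consul or zookeeper; the `/services/...` key layout is illustrative):

```python
# a dict standing in for the k/v store (etcd/consul/zookeeper)
store = {}

def register(service: str, ip: str, port: int) -> None:
    """Producer side: publish where the service can be reached."""
    store[f"/services/{service}"] = f"{ip}:{port}"

def lookup(service: str) -> str:
    """Consumer side: read the producer's location back."""
    return store[f"/services/{service}"]

register("redis", "10.0.10.8", 6379)
print(lookup("redis"))  # 10.0.10.8:6379
```

In the real thing, the store also notifies watchers when a key changes, which is how consumers react to a producer moving.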
Handling changes & multiple choices
38
DIY solution
● Use a load balancer
● Point all the consumers to a load balancer
● Expose the producer(s) using the load balancer
● Configure the load balancer to react to changes
→ More moving pieces
39
Rely on the orchestration engine
● Service has a unique and stable IP address
● Consumers are pointed to the service
● Service redirects the request to one of the containers running the producer
● Traditional DNS can be added on top of it → no changes to legacy applications
● Feature offered by Kubernetes and Docker >= 1.12
40
Kubernetes and Swarm services
41
host-B
redis-01
eth0
host-C
redis-02
eth0
host-A
web-01
eth0
redisservice
VIP
● User declares a service
● Orchestration engine allocates a virtual IP address for it
● On each container node:
– iptables rules to handle VIP ↔ container IP translation
– A process keeps the iptables rules up-to-date
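The VIP translation above can be sketched as a lookup plus a per-connection backend choice (in reality iptables DNAT rules do this in the kernel; the VIP and backend IPs here are illustrative):

```python
import random

# VIP -> real container IPs, as kept up-to-date by the agent
services = {
    "10.32.0.1": ["10.0.10.8", "10.0.11.2"],  # redis-01, redis-02
}

def dnat(vip: str, rng: random.Random) -> str:
    """Pick one real backend for a new connection to the VIP."""
    return rng.choice(services[vip])

backend = dnat("10.32.0.1", random.Random(0))
print(backend in services["10.32.0.1"])  # True
```

Because the choice happens per connection, scaling the producer up or down only means editing the backend list; consumers keep talking to the same VIP.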
Are we really done?
42
Ingress traffic
● Your production application is running inside of a container cluster
● How to route customers’ requests to these containers?
● How to react to changes (containers moved, scaling,…)?
43
Kubernetes’ approach
Services can be of three different types:
● ClusterIP: virtual IP reachable only by containers inside of the cluster
● NodePort: ClusterIP + the service is exposed on all the nodes of the cluster on a specific port → <NodeIP>:<NodePort>
● LoadBalancer: NodePort + k8s allocates a load balancer using the underlying cloud provider, then configures it and keeps it up-to-date
44
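A NodePort service from the middle bullet might be declared like this (service name, selector and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  type: NodePort
  selector:
    app: redis
  ports:
  - port: 6379        # ClusterIP port, inside the cluster
    targetPort: 6379  # port the container actually listens on
    nodePort: 30079   # exposed on every node: <NodeIP>:30079
```

A LoadBalancer service is the same declaration with `type: LoadBalancer`; the cloud provider's balancer then fronts the node ports.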
Docker 1.12 approach
● Define a service using the `--publish` flag
● The service is exposed on all the nodes of the cluster on a specific port → <NodeIP>:<ServicePort>
45
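A sketch of the `--publish` flag in Swarm mode (service name and ports are illustrative; it assumes a Docker >= 1.12 swarm manager):

```shell
# expose port 6379 of the service on port 30079 of every node
docker service create --name redis --publish 30079:6379 redis

# any node answers on <NodeIP>:30079, even ones not running a redis task
```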
Ingress traffic flow
46
Load balancer
http://guestbook.com
host-B
guestbook-01
8081
blog-01
8080
host-A
guestbook-01
8081 8080
host-C
8081
blog-01
8080
● Load balancer picks a container host
● Traffic is handled by the internal service
● Works even when the node chosen by the load balancer is not running the container
Recap
47
               Calico     Docker built-in   Flannel            Weave
Approach       routing    overlay           routing, overlay   overlay
Specification  CNI, CNM   CNM               CNI, CNM           CNI, CNM
It’s not just a matter of connecting containers:
● Service discovery
● Handling changes & multiple choices
● Handling ingress traffic
Questions?
48