Post on 30-Jun-2015
VyOS VXLAN and Linux Device Driver
2014/11/2
VyOS users meeting #2
Ryo Nakamura
upa@haeena.net
A talk about VXLAN in VyOS and Linux device drivers
Virtual eXtensible LAN
• An Ethernet over IP overlay. RFC7348.
– Ethernet frame is encapsulated in IP + UDP + VXLAN headers.
– The VXLAN header contains a 24-bit Virtual Network Identifier (VNI) field,
so 2^24 L2 segments can be multiplexed in one VXLAN overlay network
domain.
– Unicast traffic is encapsulated in IP Unicast.
– BUM traffic is encapsulated in IP Multicast.
• Multicast based VTEP learning is described in Sec. 4 of the RFC.
– Many vendors propose and use their own control planes instead.
– Of course, I know that multicast is difficult in actual environments, but
those control planes don’t have INTEROPERABILITY :(
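The header layout described above can be sketched in a few lines of Python. This is only an illustration of the RFC 7348 format (8 bytes: flags, reserved bits, 24-bit VNI, reserved byte); the function names are made up for this example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag in RFC 7348: the VNI field is valid
MAX_VNI = 2**24 - 1           # 24-bit field, so 2^24 = 16,777,216 segments


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI out of range: %d" % vni)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header):
    """Extract the VNI from the first 8 bytes of a VXLAN packet payload."""
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return word1 >> 8
```

The encapsulated Ethernet frame follows directly after these 8 bytes, inside the UDP payload.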
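Multicast based VTEP learning
[Figure: four VTEPs (A, B, C, D) with Nodes 1 to 4 behind them. Node 1 sends an ARP request for Node 4. The encapsulated packet has Outer IP Src A, Outer IP Dst M (the multicast group), SrcMAC 1, DstMAC FF (broadcast). The receiving VTEPs learn: Node 1 is in VTEP A !!]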
Multicast based VTEP learning
[Figure: VTEPs A to D and Nodes 1 to 4. Node 4 sends an ARP reply to Node 1. The encapsulated packet has Outer IP Src D, Outer IP Dst A, SrcMAC 4, DstMAC 1. Learned so far: Node 4 is in VTEP D !! Node 1 is in VTEP A !!]
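The two learning steps above can be replayed with a toy model. This is a sketch, not how any real VTEP is implemented: the Vtep class is hypothetical, the VTEP names A to D stand in for outer IP addresses, and multicast delivery is simulated with a plain loop.

```python
class Vtep:
    """Toy VTEP: learns inner source MAC -> outer source IP from received packets."""

    def __init__(self, ip, group):
        self.ip = ip
        self.group = group
        self.fdb = {}  # inner MAC -> address of the VTEP that MAC sits behind

    def encapsulate(self, src_mac, dst_mac):
        # Known unicast goes to the learned VTEP; BUM traffic goes to the group.
        outer_dst = self.fdb.get(dst_mac, self.group)
        return {"outer_src": self.ip, "outer_dst": outer_dst,
                "src_mac": src_mac, "dst_mac": dst_mac}

    def receive(self, pkt):
        # Learning step: remember which VTEP the inner source MAC is behind.
        self.fdb[pkt["src_mac"]] = pkt["outer_src"]


GROUP = "M"
vteps = {name: Vtep(name, GROUP) for name in "ABCD"}

# Node 1 (behind A) broadcasts an ARP request; every other group member gets it.
request = vteps["A"].encapsulate(src_mac="1", dst_mac="FF")
for name in "BCD":
    vteps[name].receive(request)

# Node 4 (behind D) replies; D already knows Node 1 is behind A, so it unicasts.
reply = vteps["D"].encapsulate(src_mac="4", dst_mac="1")
vteps["A"].receive(reply)
```

After the exchange, A and D know about each other's node, while B and C only know that Node 1 is behind A.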
Linux kernel version issue
• The Linux VXLAN driver first appeared in kernel 3.7
– 2012/9/24, the first patch was posted to netdev.
– I was really looking forward to Vyatta Core with
kernel 3.7 or later.
• The kernel version of VyOS Helium is 3.13.11 !!
– HooooooooOOOO!!! WrrrrryyyyyyYYYYYYYYYY !!!!!!!!
– Hydrogen uses kernel 3.3
VyOS VXLAN CLI
• Under the interfaces section
– set interfaces vxlan vxlan0
– set interfaces vxlan vxlan0 group 239.0.0.1
– set interfaces vxlan vxlan0 vni 0
– and basic interface operations
• IPv4/v6 routing
• bridge-group
• policy
interfaces {
    vxlan vxlan0 {
        group 239.0.0.1
        vni 0
    }
}
Operation example
interfaces {
    vxlan vxlan0 {
        address 172.16.0.1/24
        group 239.0.0.10
        ip {
            ospf {
                cost 10
            }
        }
        vni 0
    }
}
protocols {
    ospf {
        area 0 {
            network 172.16.0.0/24
        }
    }
}
Operation example
vyos@vyos:~$ show interfaces vxlan vxlan0
vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether b2:74:c9:fa:1d:fd brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.1/24 brd 172.16.0.255 scope global vxlan0
       valid_lft forever preferred_lft forever
    inet6 fe80::b074:c9ff:fefa:1dfd/64 scope link
       valid_lft forever preferred_lft forever

    RX:  bytes    packets     errors    dropped    overrun      mcast
             0          0          0          0          0          0
    TX:  bytes    packets     errors    dropped    carrier collisions
          2446         25          0          0          0          0
Operation example
vyos@vyos:~$ show ip ospf interface vxlan0
vxlan0 is up
  ifindex 3, MTU 1500 bytes, BW 0 Kbit <UP,BROADCAST,RUNNING,MULTICAST>
  Internet Address 172.16.0.1/24, Broadcast 172.16.0.255, Area 0.0.0.0
  MTU mismatch detection:enabled
  Router ID 10.10.20.189, Network Type BROADCAST, Cost: 10
  Transmit Delay is 1 sec, State DR, Priority 1
  Designated Router (ID) 10.10.20.189, Interface Address 172.16.0.1
  No backup designated router on this network
  Multicast group memberships: OSPFAllRouters OSPFDesignatedRouters
  Timer intervals configured, Hello 10s, Dead 40s, Wait 40s, Retransmit 5
    Hello due in 7.900s
  Neighbor Count is 0, Adjacent neighbor count is 0
node.def
• VXLAN interface name
– The number in an interface name may differ from the VNI.
But I think that is really confusing :(
val_help: <vxlanN>; VXLAN interface name
syntax:expression: pattern $VAR(@) "vxlan[0-9]+$"
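To see which names the pattern above accepts, here is a small Python check. It assumes the template engine applies the regular expression as an unanchored search, which I have not verified; ANCHORED_PATTERN is a hypothetical stricter variant for comparison and is not part of the slide.

```python
import re

SLIDE_PATTERN = re.compile(r"vxlan[0-9]+$")       # as written on the slide
ANCHORED_PATTERN = re.compile(r"^vxlan[0-9]+$")   # hypothetical stricter variant


def is_valid_name(name, pattern=SLIDE_PATTERN):
    """Return True if `name` matches the pattern under search semantics."""
    return pattern.search(name) is not None
```

Under that assumption, "vxlan0" and "vxlan42" pass, "vxlan" and "vxlan0a" fail, and a name such as "xvxlan0" passes only with the unanchored pattern.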
node.def (cont’d)
• REQUIRED
– A vxlan overlay network is identified by its VNI.
– A multicast group address is required to encapsulate BUM
traffic in IP multicast.
The group address can be reused for other VNIs.
commit:expression: $VAR(./group/) != ""; \
    "Must configure vxlan group for $VAR(@)"
commit:expression: $VAR(./vni/) != ""; \
    "Must configure vxlan vni for $VAR(@)"
node.def (cont’d)
• create interface
VXLAN_VNI="id $VAR(./vni/@)"
VXLAN_GROUP="group $VAR(./group/@)"
VXLAN_TTL="ttl 16"

# underlay device
if [ ! $VAR(./link/) == "" ]; then
    VXLAN_DEV="dev $VAR(./link/@)"
fi

# and, execute iproute2
ip link add name $VAR(@) type vxlan \
    $VXLAN_VNI $VXLAN_GROUP $VXLAN_TTL $VXLAN_DEV
ip link set $VAR(@) up

touch /tmp/vxlan-$VAR(@)-create

skimped work...
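The same command construction can be mirrored outside the template, which makes it easy to see what ends up on the iproute2 command line. This is a sketch: build_vxlan_add_cmd is a made-up helper, and the fixed TTL of 16 simply copies the template.

```python
def build_vxlan_add_cmd(name, vni, group, link=None, ttl=16):
    """Return the argv for `ip link add ... type vxlan`, as the template builds it."""
    argv = ["ip", "link", "add", "name", name, "type", "vxlan",
            "id", str(vni), "group", group, "ttl", str(ttl)]
    if link:
        # The underlay device is optional, as in the template.
        argv += ["dev", link]
    return argv


def build_link_up_cmd(name):
    """Return the argv that brings the new interface up."""
    return ["ip", "link", "set", name, "up"]
```

Passing the lists to subprocess.run() on a Linux host with root privileges would create the interface; the sketch itself only builds the arguments.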
Change vni or group of existing vxlan interfaces
• Sorry, it is not supported.
• Changing group or vni requires
deleting and re-creating the vxlan
interface.
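What a delete-and-re-create step could look like, as a sketch only: plan_change is a hypothetical helper that compares an old and a new configuration and returns the iproute2 commands to run. It is not something VyOS does today.

```python
IMMUTABLE_KEYS = ("vni", "group")  # fixed once the interface exists


def plan_change(name, old, new):
    """Return the ip commands needed to move interface `name` from `old` to `new`."""
    if all(old[key] == new[key] for key in IMMUTABLE_KEYS):
        return []  # nothing changed that needs a re-create
    return [
        ["ip", "link", "delete", name],
        ["ip", "link", "add", "name", name, "type", "vxlan",
         "id", str(new["vni"]), "group", new["group"]],
        ["ip", "link", "set", name, "up"],
    ]
```

Note that addresses and anything else attached to the interface are lost on delete and would have to be re-applied afterwards.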
VXLAN in Linux
• ip link add type vxlan
– Pseudo ethernet interface : vxlanX
– Interfaces are connected to each vxlan overlay network corresponding
to a VNI (vxlan_dev and FDB / VNI)
– Namespace is supported
[Figure: inside the Linux kernel, vxlan0 and vxlan1 are each a struct net_device with its own FDB, on top of a kernel udp socket. Receive path: udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv, then netif_rx(skb). Transmit path: iptunnel_xmit().]
How to specify attributes
• ip link add type vxlan id 0 group X
– Netlink API : An API to communicate with the kernel
– NETLINK_ROUTE, NETLINK_NETFILTER and more
[Figure: a userland application talks to the Linux kernel (interfaces, routing table, Netfilter) through a Netlink socket, socket(AF_NETLINK, SOCK_RAW, netlink_family), exchanging struct nlmsghdr, rtattr, etc.]
How to specify attributes (cont’d)
• ip link add type vxlan id 0 group X
– RTNETLINK : routing socket
• RTM_NEWLINK message is sent with attributes related
to VXLAN (see man ip-link)
int do_iplink(int argc, char **argv)
{
    if (argc > 0) {
        if (iplink_have_newlink()) {
            if (matches(*argv, "add") == 0)
                return iplink_modify(RTM_NEWLINK,
                                     NLM_F_CREATE|NLM_F_EXCL,
                                     argc-1, argv+1);
The iproute2 package is a good textbook for
Netlink !!
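To make the message format concrete, here is a sketch that builds the bytes of an RTM_NEWLINK request for "ip link add vxlan0 type vxlan id 0 group 239.0.0.1". It only constructs the message and does not send it. The numeric constants are written from memory of the Linux headers (linux/netlink.h, linux/rtnetlink.h, linux/if_link.h), so check them against your own headers; the helper names are made up.

```python
import socket
import struct

# Constants as I recall them from the Linux UAPI headers (please verify).
RTM_NEWLINK = 16
NLM_F_REQUEST, NLM_F_ACK, NLM_F_EXCL, NLM_F_CREATE = 0x1, 0x4, 0x200, 0x400
IFLA_IFNAME, IFLA_LINKINFO = 3, 18
IFLA_INFO_KIND, IFLA_INFO_DATA = 1, 2
IFLA_VXLAN_ID, IFLA_VXLAN_GROUP = 1, 2


def rtattr(rta_type, payload):
    """struct rtattr (len, type) followed by the payload, padded to 4 bytes."""
    length = 4 + len(payload)
    return struct.pack("=HH", length, rta_type) + payload + b"\0" * (-length % 4)


def build_newlink(name, vni, group, seq=1):
    """Build nlmsghdr + ifinfomsg + attributes for a new vxlan interface."""
    info_data = (rtattr(IFLA_VXLAN_ID, struct.pack("=I", vni))
                 + rtattr(IFLA_VXLAN_GROUP, socket.inet_aton(group)))
    linkinfo = (rtattr(IFLA_INFO_KIND, b"vxlan")
                + rtattr(IFLA_INFO_DATA, info_data))
    body = (struct.pack("=BxHiII", socket.AF_UNSPEC, 0, 0, 0, 0)  # struct ifinfomsg
            + rtattr(IFLA_IFNAME, name.encode() + b"\0")
            + rtattr(IFLA_LINKINFO, linkinfo))
    flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL
    header = struct.pack("=IHHII", 16 + len(body), RTM_NEWLINK, flags, seq, 0)
    return header + body


msg = build_newlink("vxlan0", 0, "239.0.0.1")
```

Actually sending msg would need a socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) and root privileges, which is exactly what iplink_modify() ends up doing in iproute2.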
Attributes of vxlan interface
• id : Virtual Network Identifier
• dev : Underlay device (in VyOS, link)
• group : Multicast group address
• remote : A unicast IP address of the VTEP for BUM traffic
• local : Source IP address of encapsulated packets
• ttl : TTL of encapsulated packets
• port : Source port range of encapsulated packets
But these attributes can only be specified when the pseudo interface is created !!
How to specify attributes (cont’d)
• VXLAN driver kernel-source/drivers/net/vxlan.c
– RTM messages are processed by rtnl_link_ops
static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
    .kind      = "vxlan",
    .maxtype   = IFLA_VXLAN_MAX,
    .policy    = vxlan_policy,
    .priv_size = sizeof(struct vxlan_dev),
    .setup     = vxlan_setup,
    .validate  = vxlan_validate,
    .newlink   = vxlan_newlink,
    .dellink   = vxlan_dellink,
    .get_size  = vxlan_get_size,
    .fill_info = vxlan_fill_info,
};
vxlan_newlink () is called when RTM_NEWLINK is received
vxlan_newlink ()
• The code is too long to paste here...
1. Parse attributes
2. Set the parsed parameters in vxlan_dev
3. register_netdevice
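Since the real function is too long to show, the three steps can be pictured with a toy model in Python. This is not the kernel code: VxlanDev, registered and toy_newlink are hypothetical stand-ins for struct vxlan_dev, the kernel's device list and vxlan_newlink(), and the attributes arrive as a plain dict instead of netlink attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VxlanDev:
    """Stand-in for struct vxlan_dev (private data of the net_device)."""
    name: str
    vni: int
    group: Optional[str] = None
    ttl: int = 0


registered = {}  # stand-in for the kernel's list of registered net devices


def toy_newlink(name, attrs):
    # 1. Parse attributes: the VNI is mandatory in this model.
    if "id" not in attrs:
        raise ValueError("missing VNI")
    # 2. Set the parsed parameters in the device's private data.
    dev = VxlanDev(name=name, vni=attrs["id"],
                   group=attrs.get("group"), ttl=attrs.get("ttl", 0))
    # 3. Register the device so that it becomes visible as an interface.
    if name in registered:
        raise FileExistsError(name)
    registered[name] = dev
    return dev
```

In this model, as on the slides, the parameters are only read at creation time; nothing looks at attrs again afterwards.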
And, you can see vxlan0
asano2:/home/upa % ifconfig vxlan0
vxlan0    Link encap:Ethernet  HWaddr 02:0a:1e:ad:7f:31
          inet6 addr: fe80::a:1eff:fead:7f31/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:690 (690.0 B)

asano2:/home/upa % ip -d link show dev vxlan0
9: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 02:0a:1e:ad:7f:31 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 0 group 239.0.0.1 srcport 32768 61000 dstport 8472 ageing 300

asano2:/home/upa % bridge fdb show dev vxlan0
00:00:00:00:00:00 dst 239.0.0.1 self permanent
As a result
• vxlan parameters cannot be changed after the
pseudo interface is created.
• Do you have any good ideas ?
– I have only one idea.
• Use Generic Netlink like the l2tp driver
• Generic Netlink is a mechanism to add user-defined
netlink families dynamically.
• It requires patches to the vxlan driver and iproute2...
Future work ?
• Change destination port ?
– Default is 8472 (OTV). 4789 is assigned to VXLAN by IANA
– It can be changed through module_param.
But that requires rmmod/insmod whenever the port is changed.
Of course, all pseudo interfaces are removed...
• Support “remote” attribute
– Easy. Is it needed by the community ?