DEVOOPS: LESSONS LEARNED FROM A CLOUD NETWORK ARCHITECT
James Denton, Principal Architect (@jimmdenton)
Jonathan Almaleh, OpenStack Network Architect (@ckent99999)
SLIDES AVAILABLE AT
http://www.slideshare.net/JamesDenton1
WHAT WE SUPPORT
WHY WE'RE HERE: THE LESSON
• Consider everything when designing:
  - Business requirements
  - "Community" opinions
  - Available resources
  - Scale
  - Performance requirements
• Be willing and able to change and find creative solutions when issues arise or priorities change for your private cloud
• Understand your options and how to migrate or transition rather than scratch and rebuild
NOT WHY WE'RE HERE
• To knock or discourage any technology
• To advocate one method or solution over another
• To say that one is better than the other (but we all know Superman > Batman)
Open vSwitch vs LinuxBridge
BATTLE OF THE SWITCHES
WHY NEUTRON
• Chose Neutron over nova-network
  - Eventual nova-network deprecation. How eventual, no one was sure.
• Gained tenant-managed networking
• First glimpse of overlay networking
• Obvious community direction
THE ISSUES
• Open vSwitch 1.x
  - Packet loss / corruption
  - Slow (microflows vs megaflows)
  - Kernel panics
• Neutron agents immature
  - Oops, sorry about the bridging loops

Bugs:
- 1228313 - Multiple tap interfaces on controller have overlapping tags
- 1324703 - Default NORMAL flows on OVS bridges at boot have the potential to cause a network storm
Live migration from OVS to LinuxBridge

THE FIX
- Upgrade from Grizzly to Havana
- Migrate from the OVS plugin to the ML2 plugin
- Hack the database (see the sketch below)
  - Convert the OVS schema to the ML2 schema
  - Convert GRE networks to VLAN
- Delete the bridges and interfaces
- Restart the agent
- …
- Profit!
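The "hack the database" step is left vague in the deck, so the following is only an illustrative sketch of the GRE-to-VLAN conversion. The ml2_network_segments table matches the Havana-era ML2 schema, but the connection URL and the VLAN numbering are placeholders, and a real conversion would also touch the VLAN allocation tables. Back up the database first.

# Illustrative sketch only: rewrite GRE segments as VLAN segments in the
# ML2 schema during an offline maintenance window. Assumes the Havana-era
# ml2_network_segments table; DB URL and VLAN numbering are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://neutron:secret@db-host/neutron")

with engine.begin() as conn:
    gre_segments = conn.execute(text(
        "SELECT id, segmentation_id FROM ml2_network_segments "
        "WHERE network_type = 'gre'")).fetchall()
    for seg_id, old_tunnel_id in gre_segments:
        # Map each GRE tunnel ID onto a pre-provisioned VLAN ID (placeholder
        # logic); the VLAN allocation table would also need matching updates
        new_vlan = 100 + old_tunnel_id
        conn.execute(text(
            "UPDATE ml2_network_segments "
            "SET network_type = 'vlan', physical_network = 'physnet1', "
            "segmentation_id = :vlan WHERE id = :id"),
            {"vlan": new_vlan, "id": seg_id})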
THE FUTURE
• LinuxBridge became the standard driver for Rackspace Private Cloud starting with the Icehouse release
• Open vSwitch, and Neutron itself, continue to mature and focus on stability, speed, and functionality
• Some features depend on the use of OVS, such as TAPaaS, OVN, DVR, and more
• Open vSwitch and LinuxBridge are both supported mechanism drivers and switching technologies for RPC (Rackspace Private Cloud)
VXLAN vs VLAN
LAYER 2 SEGMENTATION
TENANT NETWORKING

VXLAN
• Neutron network type that uses a VXLAN overlay to tunnel instance traffic between hosts
• Runs over UDP
• Uses a unique segmentation ID, the VXLAN Network Identifier (VNI)
• ~16 million unique IDs
• Considered a better tunneling protocol for cloud use than GRE

VLAN
• Neutron network type that uses the more traditional 802.1q VLAN tagging to pass and segment traffic between hosts
• Limited to 4096 "real" datacenter VLANs; the practical limit is much lower depending on spanning tree mode
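The "~16 million" and 4096 figures fall straight out of the header field widths; a quick check in Python:

# The VXLAN VNI is a 24-bit field; an 802.1q VLAN ID is a 12-bit field.
print(2 ** 24)  # 16777216 possible VNIs
print(2 ** 12)  # 4096 possible VLAN IDs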
THE ISSUES
• MTU woes (see the arithmetic sketch after this list)
  - Configure instances to drop MTU by ~50
  - Or, configure a jumbo MTU on the VTEP interface
  - 1242534 - Linux Bridge MTU bug when VXLAN tunneling is used
• L2population
  - Dropped packets due to slow FDB and ARP table updates
  - Missing entries may require static programming for quick resolution
  - Inability to leverage the allowed-address-pairs extension due to ARP proxy
  - 1445089 - allowed-address-pairs broken with l2pop/arp responder and LinuxBridge/VXLAN
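The "~50" above is just the VXLAN-over-IPv4 encapsulation overhead; a minimal sketch of the arithmetic:

# Why instances drop their MTU by ~50 bytes on VXLAN networks.
# Header sizes are standard for VXLAN over IPv4; nothing here is Neutron-specific.
OUTER_IPV4 = 20   # outer IPv4 header
OUTER_UDP = 8     # outer UDP header
VXLAN_HDR = 8     # VXLAN header (flags + VNI)
INNER_ETH = 14    # encapsulated inner Ethernet header

VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH  # 50 bytes

def instance_mtu(underlay_mtu=1500):
    """Largest MTU an instance can use without fragmenting the outer packet."""
    return underlay_mtu - VXLAN_OVERHEAD

print(instance_mtu())      # 1450 on a standard 1500-byte underlay
print(instance_mtu(9050))  # 9000 for instances when the VTEP carries jumbo frames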
THE ISSUES
• Slow throughput of TCP traffic compared to VLAN
  - Newer network cards needed to handle VXLAN offloading
THE ISSUES
• FDB bug resulted in duplicate flooding entries
  - 1531013 - Duplicate entries in FDB table
  - 1568969 - FDB table grows out of control

00:00:00:00:00:00 dev vxlan-15 dst 172.29.243.133 self permanent
00:00:00:00:00:00 dev vxlan-15 dst 172.29.243.133 self permanent
…
00:00:00:00:00:00 dev vxlan-15 dst 172.29.243.133 self permanent
(28,870 entries)
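A quick diagnostic along these lines (our sketch, not from the deck) makes runaway duplicates like those above easy to spot; the vxlan-15 device name is taken from the output shown:

# Count duplicate forwarding entries on a VXLAN device by parsing the
# output of 'bridge fdb show' (iproute2). Run on the affected host.
import subprocess
from collections import Counter

def duplicate_fdb_entries(dev="vxlan-15"):
    out = subprocess.run(["bridge", "fdb", "show", "dev", dev],
                         capture_output=True, text=True, check=True).stdout
    counts = Counter(line.strip() for line in out.splitlines() if line.strip())
    return {entry: n for entry, n in counts.items() if n > 1}

for entry, n in duplicate_fdb_entries().items():
    print(f"{n:>6}x {entry}")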
Live migration from VXLAN to VLAN

THE FIX
- Provision and trunk a range of VLANs to replace the VNIs
- Set the default tenant network type to VLAN
- Hack the database, again
  - Convert VXLAN networks to VLAN
- Delete the vxlan interfaces on all hosts (see the cleanup sketch below)
- Restart the agent
- Watch the magic

Shall I dare tempt fate again?
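A hypothetical per-host version of the last two steps: remove the stale vxlan interfaces and restart the LinuxBridge agent so it rebuilds its state. The vxlan-<VNI> naming matches the LinuxBridge agent's convention, but the service name varies by distribution; run as root.

# Hypothetical cleanup sketch for the VXLAN-to-VLAN cutover (run as root).
import subprocess

def remove_vxlan_interfaces():
    links = subprocess.run(["ip", "-o", "link", "show"],
                           capture_output=True, text=True, check=True).stdout
    for line in links.splitlines():
        # 'ip -o link' lines look like '7: vxlan-15: <BROADCAST,...> ...'
        name = line.split(":")[1].strip().split("@")[0]
        if name.startswith("vxlan-"):
            subprocess.run(["ip", "link", "delete", name], check=True)

remove_vxlan_interfaces()
# Service name is distribution-specific; adjust as needed
subprocess.run(["systemctl", "restart", "neutron-linuxbridge-agent"], check=True)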
SIDE BY SIDE
THE FUTURE
• VLAN tenant networks may have a place after all
  - Less work for the Neutron agents
  - Better performance and better scale in some cases
• VXLAN moving from the host to the ToR switch
• Invest in newer hardware
  - Mellanox ConnectX-3 Pro
  - Intel XL710
L3 Agent, or Not.
TO ROUTE OR TO BRIDGE
I HAVE THE POWER!!
NORTH/SOUTH CONNECTIVITY

L3 Agent w/ Routers
• Allows multiple tenant networks to attach to a single provider network via Neutron routers
• Provides routing between instances in different tenant networks
• Allows users to easily create/modify/delete NATs to instances
• Requires little to no change to the physical infrastructure as tenants and their respective routers are created
• Allows for overlapping subnets
• Ability to use VPNaaS and FWaaS

Provider Networks Only
• Instances sit directly in a VLAN provider network behind a physical network gateway such as a router or firewall
• Each compute node can forward instance traffic to a physical gateway device
• As new networks are created, the physical gateway device must be updated with new interfaces
• Requires NATing to be handled on the physical device
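To illustrate the "easily create/modify/delete NATs" point: a floating IP is a 1:1 NAT managed entirely through the Neutron API. A sketch using the era-appropriate python-neutronclient; credentials and UUIDs are placeholders:

# Associate a floating IP (a 1:1 NAT) with an instance port via the
# Neutron API. Credentials and UUIDs below are placeholders.
from neutronclient.v2_0 import client

neutron = client.Client(username="admin", password="secret",
                        tenant_name="admin",
                        auth_url="http://controller:5000/v2.0")

fip = neutron.create_floatingip(
    {"floatingip": {"floating_network_id": "<provider-net-uuid>"}})
neutron.update_floatingip(
    fip["floatingip"]["id"],
    {"floatingip": {"port_id": "<instance-port-uuid>"}})

# Removing the NAT is a one-liner:
# neutron.delete_floatingip(fip["floatingip"]["id"])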
THE ISSUES
• A single router became a single point of failure
• Possible network congestion when routing through a single router, or a network node hosting multiple routers
• Router failover or reschedule requires reprogramming of interfaces and floating IPs
  - Could take minutes to restore connectivity
THE ISSUES
• Increased time to reach newly booted instances
  - Build times stayed consistent at 6-10 seconds, but time until an instance was accessible grew to over 2 minutes
  - The eventual drop in TTP was the result of a bug fix: 1566007 - l3 iptables floating IP rules don't match iptables rules
THE ISSUES
• The use of NAT had a negative impact on some applications
  - WMI/DCOM on Windows
• Sometimes, users wanted access via both the fixed and the floating IP
  - Required static routes to the Neutron routers
So long, L3…

THE FIX
- Detach the routers from the Neutron networks
- Create and address new interfaces on the physical network device
- Address upstream routing
- Delete the routers and restart the DHCP agents (see the sketch below)
- Reboot instances or renew DHCP leases
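An illustrative sketch of the router-teardown steps via python-neutronclient, reusing the client from the earlier example. UUIDs are placeholders, and the physical gateway and upstream routing must already be in place:

# Detach and delete a Neutron router once the physical gateway takes over.
router_id = "<router-uuid>"

# Detach the router from the external (provider) network
neutron.remove_gateway_router(router_id)

# Detach each tenant subnet, then remove the router itself
for port in neutron.list_ports(device_id=router_id)["ports"]:
    for fixed_ip in port["fixed_ips"]:
        neutron.remove_interface_router(
            router_id, {"subnet_id": fixed_ip["subnet_id"]})
neutron.delete_router(router_id)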
SIDE BY SIDE
THE FUTURE
• BGP speaker functionality will allow access directly to tenant networks behind Neutron routers
  - This will also eliminate the need for floating IPs in some cases
• Distributed virtual routers (DVR) address the SPoF and throughput issues
  - Requires OVS
• L3 HA provides automatic failover and resiliency via VRRP/keepalived
  - No reliance on an external script or check
  - A few bugs addressed in recent releases
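For context, L3 HA is opt-in per router. A sketch of creating a VRRP-backed router through the API, again reusing the earlier client; the "ha" flag is an admin-only attribute and assumes L3 HA is enabled on the deployment:

# Create an HA (VRRP/keepalived-backed) Neutron router; name is a placeholder.
router = neutron.create_router(
    {"router": {"name": "ha-router-example", "ha": True}})
print(router["router"]["id"])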
L2population vs Multicast … vs L2population
VXLAN VTEP LEARNING
UPDATE UPDATE UPDATE
VTEP LEARNING

L2population
• VTEP learning process that relies on static programming of the forwarding table and ARP table on all hosts
• Implemented and managed by a Neutron agent
• Developed by the Neutron community
• Requires consistent programming across all hosts for proper operation
• Does not require physical switch or router changes

Multicast
• VTEP learning process that uses multicast to distribute forwarding information to hosts in the multicast group
• Not managed by a Neutron agent
• Leverages the vxlan or OVS kernel module for operation
• Requires IGMP configuration on switches and routers
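The "static programming" in the L2population column boils down to kernel FDB and neighbor entries. A simplified sketch of what the agent effectively does on each host; the MAC, IPs, and device name are illustrative:

# Pre-program the FDB and ARP (neighbor) tables for a remote VM, rather
# than relying on multicast learning. The real work is done by the agent.
import subprocess

def program_remote_vm(mac, vm_ip, vtep_ip, vxlan_dev="vxlan-15"):
    # Unicast FDB entry: frames for this MAC go straight to the remote VTEP
    subprocess.run(["bridge", "fdb", "replace", mac,
                    "dev", vxlan_dev, "dst", vtep_ip], check=True)
    # Permanent neighbor entry: resolve ARP locally instead of flooding
    subprocess.run(["ip", "neigh", "replace", vm_ip, "lladdr", mac,
                    "dev", vxlan_dev, "nud", "permanent"], check=True)

program_remote_vm("fa:16:3e:11:22:33", "192.168.1.10", "172.29.243.101")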
MULTICAST
[Diagram: VTEPs on five compute nodes and the network node (hosting the DHCP server and router) all join multicast group 239.1.1.1 to exchange forwarding information]
L2 POPULATION
[Diagram: the Neutron server pushes forwarding entries to the five compute nodes and the network node (hosting the DHCP server and router)]
L2POP ISSUES
• With l2pop, some agents would fail to properly build the FDB and ARP tables
  - An issue with the agent, server, or message bus could result in missing messages
  - A kernel bug resulted in millions of duplicate flooding entries
  - Connectivity issues between instances and gateways
• As a result, multicast was recommended over L2population and became the default in OpenStack-Ansible
MULTICAST ISSUES
• Physical infrastructure had not been configured to support multicast
  - Requires IGMP snooping, and an IGMP querier if no multicast router is present
• Without a reboot, the forwarding database and ARP table failed to properly populate or contained stale data
• As a result, all traffic between instances, DHCP servers, and routers failed on VXLAN networks
Multicast, out!

THE FIX
- The smaller cloud had little to no scaling issues with l2pop or L3 Neutron routers
- Created an Ansible override to make L2population the default, again
- Deleted the VXLAN interfaces and restarted the agents to rebuild the VXLAN mesh and properly populate the FDB and ARP tables
THE FUTURE
• L2population can continue to be used for smaller-scale clouds
  - Lower number of hosts and networks
• Multicast may work better for larger-scale clouds
  - Will require proper configuration of physical switches/routers
• BGP/EVPN for robust propagation between ToR switches
  - Move VXLAN off the hosts!
RECAP
• If you live on the bleeding edge, prepare to feel some pain
• Sometimes, short-term pain is needed for long-term gain
  - Gaining operational experience in the early days was crucial for understanding and adoption of features
• Available hardware and overall network requirements should be considered when choosing supported network types
  - How many networks am I expected to support? Do I require 'scalable' networking? What types of networking features are necessary?
• Investments in hardware should be made for optimal performance
• End users can lose faith in the system if it doesn't provide reasonable access times, stability, and consistency
• There's nothing wrong with keeping things simple.
Copyright © 2016 Rackspace | Rackspace® Fanatical Support® and other Rackspace marks are either registered service marks or service marks of Rackspace US, Inc. in the United States and other countries. Features, benefits and pricing presented depend on system configuration and are subject to change without notice. Rackspace disclaims any representation, warranty or other legal commitment regarding its services except for those expressly stated in a Rackspace services agreement. All other trademarks, service marks, images, products and brands remain the sole property of their respective holders and do not imply endorsement or sponsorship.
ONE FANATICAL PLACE | SAN ANTONIO, TX 78218
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM