Virtualisation des réseaux: commutation, protocoles, orchestration
(Network virtualization: switching, protocols, orchestration)
Lecture transcript, École RESCOM 2016

Stefano Secci
Associate Professor, Université Pierre et Marie Curie
LIP6 - Office 25-26/518 - BC 169, 4 place Jussieu, 75005 Paris, France
http://lip6.fr/Stefano.Secci
Agenda
- Principles
- Virtualization and switching
  - Virtual bridging
  - Programmable switches and virtual network functions
  - Mobile edge
- Cloud network protocols
  - Virtual machine mobility
  - Cloud network overlay protocols
  - Mobile edge computing
  - High availability
- Wrap-up
Principles
Cloud Networking evolution
From localized cloud (a) to multi-site cloud (b) and multi-tenancy networks (c): cloud resources (shown in green) are spread much finer and deeper into the network, close to the actual point of usage.
Source: B. Ahlgren et al., "Content, Connectivity, and Cloud: Ingredients for the Network of the Future", IEEE Communications Magazine, 2011.
The challenge of decentralization
S. Secci, S. Murugesan, "Cloud Networks: Enhancing Performance and Resiliency", IEEE Computer Magazine, Oct. 2014.
In-network / edge clouds (a.k.a. cloudlets, Mobile Edge Computing)
Source: B. Ahlgren et al., "Content, Connectivity, and Cloud: Ingredients for the Network of the Future", IEEE Communications Magazine, 2011.
Virtual Network Interfaces
Virtual port deployments are growing faster than physical port deployments.
B. Davie (VMware), "Network Virtualization: Enabling the Software-Defined Data Center", keynote at IEEE CloudNet 2013.
Virtualization: historical perspective
- Taxonomy
  - VM (Virtual Machine): an entity abstracting a set of hardware resources
  - VMM (VM Monitor, or hypervisor): software providing the abstraction of a VM
- Hardware-based virtualization ("old" concept)
  - 1970s: implemented in hardware, with electronic isolation of the different application components
- Popek & Goldberg requirements for virtualization (1974)
  - Equivalence: any VM should execute the same way on a hypervisor (VMM) as on a physical machine
  - Resource control: the hypervisor must have exclusive control of shared resources; any VM must go through it to access a shared resource
  - Efficiency: most instructions should be executed by the CPU without hypervisor intervention
    - violated by modern full-virtualization hypervisors
Popek, G. J.; Goldberg, R. P. (July 1974). "Formal requirements for virtualizable third generation architectures". Communications of the ACM 17 (7): 412-421.
Infrastructure as a Service (IaaS)
- IaaS
  - a virtual network of a group of VMs managed by a single user/customer
  - composed of public VMs (front-office) and private VMs (back-office)
  - private VMs are typically not visible to the end-user
- IaaS interconnect
  - each IaaS's traffic needs to be isolated from other IaaSs' traffic
  - VMs can be distributed over multiple physical machines
  - the physical machines running an IaaS can be in different, distant networks
Virtualization and switching
Virtual Bridging: the Linux (default) way
- Older versions of Citrix XenServer use the simple Linux bridge.
- Most hypervisor-based virtualization stacks (e.g., KVM via libvirt) use the Linux bridge model.
- All bridging configuration is done with 'brctl'.
- The default bridge provides simple L2 switching functions, with IEEE 802.1Q VLAN support (a minimal configuration sketch follows the figure below).
[Figure: Xen Server virtual bridging. In dom0, bridges xenbr0-xenbr3 and VLAN bridges xapi1/xapi2/xapi30 connect the VMs' virtual interfaces (vif1.0, vif1.1, ..., vif4.1, seen as eth0/eth1 inside the domU guests VM1-VM4, on VLANs 1, 2, and 30) to the physical NICs eth0 and eth1. VLAN sub-interfaces (eth1.1, eth1.2, eth1.30) insert the VLAN tags, so tagged traffic reaches the switch trunk port while management traffic stays untagged. Networks: XenMgmt/Internet (192.168.0.0/16), VmMgmt (10.5.0.0/16), and the VLAN data network (172.v.v.h/16).]
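As a rough illustration of the setup above, here is a minimal sketch of creating such a VLAN-tagged Linux bridge from Python; it assumes a host with brctl and iproute2 installed, and the interface/bridge names (eth1, xapi30) simply mirror the figure.

```python
import subprocess

def sh(cmd: str) -> None:
    """Run a shell command, raising on failure."""
    subprocess.run(cmd.split(), check=True)

def vlan_bridge(nic: str, vlan_id: int, bridge: str) -> None:
    """Create an 802.1Q sub-interface on `nic` and attach it to `bridge`."""
    sub = f"{nic}.{vlan_id}"
    sh(f"ip link add link {nic} name {sub} type vlan id {vlan_id}")  # tag insertion
    sh(f"brctl addbr {bridge}")        # create the Linux bridge
    sh(f"brctl addif {bridge} {sub}")  # plug the tagged sub-interface in
    sh(f"ip link set {sub} up")
    sh(f"ip link set {bridge} up")
    # a VM's vif is then attached with: brctl addif <bridge> <vif>

if __name__ == "__main__":
    vlan_bridge("eth1", 30, "xapi30")  # data-network VLAN 30, as in the figure
```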
Refresh: IEEE 802.1Q Ethernet switching
An Ethernet frame reaches the ingress bridge. Within its VLAN (if any), the bridge applies the "3Fs" by querying the (VLAN-specific) MAC table:
- Filter, if the destination MAC address was recently seen as a source MAC behind the same port;
- otherwise, if it was seen behind another port, Forward to that port;
- otherwise, Flood to all Spanning Tree (STP) ports.
Destination MAC addresses are thus used to transmit Ethernet frames (using the 3Fs, MAC tables, and STP). If a reply comes back quickly, the return frames will not need the Spanning Tree: the MAC table entries along the reverse symmetric path are already in place.
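A toy sketch of this 3Fs logic, assuming one MAC table per VLAN (STP and entry ageing are omitted); port numbers and MAC addresses are illustrative:

```python
from collections import defaultdict

class Bridge:
    def __init__(self, ports):
        self.ports = set(ports)
        # mac_table[vlan][mac] = port behind which `mac` was last seen
        self.mac_table = defaultdict(dict)

    def handle(self, vlan, src_mac, dst_mac, in_port):
        """Return the set of ports the frame is sent out of."""
        table = self.mac_table[vlan]
        table[src_mac] = in_port              # learn the source
        out = table.get(dst_mac)
        if out == in_port:
            return set()                      # Filter: destination behind same port
        if out is not None:
            return {out}                      # Forward: destination behind a known port
        return self.ports - {in_port}         # Flood: unknown destination

br = Bridge(ports=[1, 2, 3])
print(br.handle(30, "aa:aa", "bb:bb", in_port=1))  # flood: {2, 3}
print(br.handle(30, "bb:bb", "aa:aa", in_port=2))  # quick reply, forward back: {1}
```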
Virtual bridging: switching at the edge (VM container)
- Strengths
  - greater context
  - enforce policies early
  - inter-VM traffic has less overhead
- Weaknesses
  - CPU overhead
  - additional switches to configure and monitor
Impact of virtualization on DC topologies: horizontal traffic volume vs. network capacity, and the need for virtual bridging
Virtual bridging (2): switching at the edge (VM container)
- Strengths
  - greater context
  - enforce policies early
  - inter-VM traffic has less overhead
- Weaknesses
  - CPU overhead
  - additional switches to configure and monitor
- Advanced edge switches
  - hardware offloading
  - centralized management
  - approaching feature parity with hardware switches (visibility, ACLs, QoS)
  - examples: VMware vSwitch, Cisco Nexus 1000V, Open vSwitch
Open vSwitch (OVS)
- Open vSwitch is a multilayer software switch licensed under the open-source Apache 2 license.
- A flow-based software L2/L3 switch.
- Supports multiple Linux-based virtualization technologies, including Xen/XenServer, XCP, KVM, and VirtualBox.
- Open vSwitch is built for programmatic state distribution, through two interfaces:
  - the OpenFlow protocol, for managing the forwarding behavior of the fast path;
  - a JSON-RPC configuration protocol, used for configuration and monitoring functions (tunnels, QoS, NetFlow, etc.).
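For instance, here is a minimal sketch of driving both interfaces from Python through the standard OVS CLI tools (ovs-vsctl, ovs-ofctl); the bridge name, port names, and controller address are illustrative assumptions:

```python
import subprocess

def sh(cmd: str) -> None:
    subprocess.run(cmd.split(), check=True)

sh("ovs-vsctl add-br br0")            # create an OVS bridge (JSON-RPC/config side)
sh("ovs-vsctl add-port br0 eth1")     # attach a physical NIC
sh("ovs-vsctl add-port br0 vif1.0")   # attach a VM's virtual interface
# OpenFlow side: delegate fast-path forwarding to a central controller
sh("ovs-vsctl set-controller br0 tcp:192.0.2.10:6633")
# without a controller, flows can also be installed locally, e.g.:
sh("ovs-ofctl add-flow br0 in_port=1,actions=output:2")
```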
OpenFlow (Eth-IP-TCP/UDP data plane, no distributed control plane, centralized routing)
An Ethernet/IP/TCP-UDP frame reaches the ingress OpenFlow switch. The flow switching tables are pre-set by a central server, a flow being defined as any combination of Ethernet MACs, VLANs, IP fields, and TCP/UDP fields. Forwarding is based on flow tables at every node in the network. For the central server, a master-slave hierarchy can be defined using many PoPs (e.g., Google).
The OpenFlow switching freedom
- Match scale:
  - such freedom requires a huge switching-rule space: O(10^9+)
  - the flow table size on commodity (COTS) switches is however small: O(10^4)
  - live memory (TCAM) is expensive and power-hungry
  - multiple match tables partially solve the scalability aspect
- Action: theoretically, there is no limit to the function f() one could apply.
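A toy sketch of this match/action freedom, with a flow defined as an arbitrary combination of header fields and its action as an arbitrary function f(); the field names and values are illustrative, not the OpenFlow wire format:

```python
def drop(pkt):
    return None                               # firewall-like action

def out(port):
    return lambda pkt: (port, pkt)            # plain forwarding action

flow_table = [
    # (match fields, action f()) pairs; the first matching rule wins
    ({"dl_vlan": 30, "tp_dst": 22}, drop),    # e.g., block SSH on VLAN 30
    ({"nw_dst": "10.5.1.7"}, out(2)),         # steer one IP destination
    ({}, out(1)),                             # table-miss: default port
]

def matches(pkt: dict, rule: dict) -> bool:
    return all(pkt.get(k) == v for k, v in rule.items())

def switch(pkt: dict):
    for rule, action in flow_table:
        if matches(pkt, rule):
            return action(pkt)

print(switch({"dl_vlan": 30, "tp_dst": 22}))  # None: dropped
print(switch({"nw_dst": "10.5.1.7"}))         # (2, {...}): forwarded to port 2
```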
vSwitches and virtual network functions: the relationship between SDN and function virtualization
[Figure: VMs attach through vNICs to a vSwitch, which connects to the physical NICs; each switching rule carries an action semantic f().]
- How rich can a switching rule be? f() can potentially be anything:
  - a firewall action (stop/pass)
  - a tunnelling action (encap)
  - a ciphering action (encrypt)
  - a load-balancing action
  - ... a network function "nf()" 8-)
- When the switching state is too rich, too critical, too *, it shall be isolated as a "vnf()" ;-) and can then be "orchestrated" (see next).
- If the routing complexity is left to VNFs, then f() can stay very simple (e.g., an 802.1Q Linux bridge or a default OVS) and there is no need for OpenFlow/SDN switching rules.
- If stateless VNFs are preferable and/or advanced orchestration is needed, then f() shall be made programmable à la OpenFlow.
Network Functions Virtualization (NFV)
- Convergence between IT and networks
- Service-oriented multi-tenant systems: pay as you go, on demand...
- Software-defined systems: programmability, virtualization, automation...
ETSI's vision for NFV: how are networks becoming carrier network 4.0?
[Figure: VNFs, as groups of VMs, deployed between the Internet and the mobile (4G, 5G) and residential triple-play accesses.]
Challenges: VNF sharing, VNF switching latency, high availability, dynamic system & network allocation.
Scaling up/down: the ability to scale by changing the allocated resources on the fly, e.g., memory, CPU, storage.
Challenges: multi-threading process performance upon rescaling.
Possible orchestration actions in virtualized networks (an abstract view):
(a) multipath load-balancing
(b) node resizing
(c) node replication
(d) node replication at distant locations
(e) node migration
[Figure legend: disabled/new NFV nodes, network links, and VM synchronization links; resized NFV node.]
Scaling out/in: the ability to scale by adding/removing VM instances.
Fault recovery and high availability (1): the ability to synchronize VM states and restore from a VNF failure.
Fault recovery and high availability (2): the ability to synchronize VM states and restore from a physical machine failure.
Challenges for scaling up/down and fault recovery (specific to NFV; HA is less important in the legacy cloud): dynamically programmable (virtual and edge) network switches; stateless vs. stateful VM state management.
Virtual switching performance
- Intra-VM traffic (between processes, no "vSwitching"): ~< 20 microseconds with current systems
- Inter-VM traffic, with the VMs running on the same physical machine:
  - without socket/memory acceleration: ~100 microseconds
  - with socket/memory acceleration (e.g., MPI, Nahanni, Xway, XenLoop, MMNet...): ~< 30 microseconds
- External traffic / VNF switching:
  - without data-plane/NIC acceleration: with current systems, difficult to go beyond 10 Gbps
  - with data-plane acceleration (e.g., DPDK, NF.io, etc.): with current systems, reaching about 200 Gbps
NFV ETSI Architecture
Composed of three key elements:
- NFV Infrastructure (NFVI): the hardware and software required to deploy, manage, and execute VNFs
- Virtual Network Functions (VNFs): software implementations of network functions
- Management and Orchestration (MANO)
NFV use-case: virtual Customer Premises Equipment (vCPE)
[Figure: without NFV vs. with NFV.]
NFV use-case: virtualization of the cellular Evolved Packet Core (vEPC)
[Figure: without NFV vs. with NFV; horizontal vs. vertical EPC functions virtualization. Credit: D. Lopez, Telefónica.]
NFV Service Chaining
(VNF = Virtual Network Function. Credit: M. Bouet, Thales.)
(pre-)NFV use-case: virtualization of the cellular base station (Cloud-RAN)
[Figure: without NFV vs. with NFV. Credit: China Mobile.]
Mobile Edge Computing (MEC) & Fog
- Bring computing servers closer to users, to increase throughput and decrease cloud access latency
  - clear decoupling between the virtualization layer and the physical layer
  - NFV (chaining) and SDN are not strictly required
- MEC servers at the access-point and aggregation levels
  - support computation offloading from devices to the local cloud
- Use-cases identified by ETSI
  - connected cars
  - IoT gateway
  - content & DNS caching
  - RAN-aware & application-aware content optimization
  - intelligent video analytics
  - augmented reality
NFV, SDN, MEC, Fog: overlapping scopes
Commonalities:
1. from physical to virtual network node management
2. require dynamic allocation at the virtualization-system level
3. require dynamic virtual switch reconfiguration
4. can require dynamic physical switch/router reconfiguration
5. can require the externalization of VM states
[Figure: Venn diagram of the overlapping NFV & SDN, MEC, and Fog scopes.]
Where are we?
Cloud network protocols
Virtual Machine Mobility
- A major innovation of cloud networking is VM volatility
- Originally, for disaster recovery
  - read "VM relocation" rather than "VM mobility" (VM mobility = VM relocation)
  - mostly based on the synchronization of VM state snapshots
  - planned
- Recently, also to increase performance
  - decrease the user-VM network/SLA distance (VM mobility = VM migration)
  - live migrations vs. bulk migrations
  - can be triggered by events
Disaster recovery needed: how could this have been avoided?
[Example: the http://www.comsoc.org outage, Nov. 2, 2012 to Nov. 7, 2012.]
Virtual Machine Migration types
- VM migration allows a server/cloud administrator to move a running VM between different physical machines without disconnecting the client.
- Two main types:
  - Full migration
    - everything is copied, notably the storage disks too
    - used mostly by desktop/private users
  - Live "hot" migration
    - only the memory, storage, and network connectivity of the virtual machine need to be migrated to the destination
    - the storage is handled separately (e.g., SAN) or not handled at all
    - the one used in cloud services
- In live migration, the critical part is memory-page transmission, in two phases:
  - pre-copy memory-page transfer (can be a lot)
  - post-copy memory-page transfer (often a negligible amount)
Live Virtual Machine Migration: pre-copy phase
- VM memory migration
- Pre-copy memory migration
  - Warm-up phase
    - the hypervisor copies all the memory pages from source to destination while the VM is still running on the source
    - dirty pages: the memory pages changing during the copy process are re-copied, in rounds, until the rate of re-copied pages falls below a pre-set page-dirtying threshold
  - Stop-and-copy phase
    - the VM is stopped at the source and the remaining dirty pages are copied to the destination
    - the VM is then resumed at the destination
- Post-copy memory migration
  - page-fault management
VM mobility (VMM) rerouting solutions
- Triangular routing solutions
  - there remains a home DC where the Internet traffic arrives, and from which it is rerouted toward the current VM location
  - router-level solutions based on Mobile IP
  - hypervisor-level solutions: cloud network overlay protocols
  - drawbacks: long transmission latency, and not robust against site failures
- "Ahead/source" rerouting solutions
  - rerouting done at the source, or in between the source and the DC facility
  - no notion of the VM's home DC
  - cloud access overlay protocol: LISP is the single standardized, implemented solution
  - some L4+ overlay solutions also exist
[Figure: an Internet client reaching a migrated VM via IP-level or hypervisor-level triangular routing vs. ahead/source rerouting.]
Cloud/virtual network overlay protocols
- Able to manage
  - VM mobility at the Eth/IP level
  - VM grouping in extended LANs / Infrastructure-as-a-Service (IaaS) networks
- Working at the virtualization server / hypervisor level
  - can stay independent of the physical legacy network infrastructure
- Three main protocols
  - Network Virtualization using Generic Routing Encapsulation (NVGRE)
  - Virtual eXtensible LAN (VXLAN)
  - Stateless Transport Tunnelling (STT)
Network Virtualization using Generic Routing Encapsulation (NVGRE)
http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-03
- Tenant Network ID (TNI): 24 bits, like the VXLAN VNI
- Supported by Microsoft, in Hyper-V
- Does not pass through firewalls, which need to be configured to let GRE pass
  - cannot easily cross the Internet
NVGRE (IP-GRE data plane, no control plane)
An Ethernet frame goes from a VM to the virtualization server, which performs Ethernet-over-IP-GRE encapsulation with the tenant ID; the destination hypervisor's IP is manually given by external tools (no control plane). After IP routing in the DC fabric, the destination server decapsulates and virtually forwards the frame toward the vif.
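A minimal sketch of the NVGRE shim header, packed per the draft's layout (the outer IP header is omitted): a GRE header with the Key-present bit set, EtherType 0x6558 (Transparent Ethernet Bridging), and the 24-bit tenant ID plus an 8-bit FlowID in the Key field:

```python
import struct

def nvgre_encap(inner_eth_frame: bytes, vsid: int, flow_id: int = 0) -> bytes:
    assert vsid < 2**24 and flow_id < 2**8
    flags = 0x2000                        # K (Key Present) bit set
    proto = 0x6558                        # Transparent Ethernet Bridging
    key = (vsid << 8) | flow_id           # 24-bit VSID/TNI + 8-bit FlowID
    gre = struct.pack("!HHI", flags, proto, key)
    return gre + inner_eth_frame          # then wrapped in an outer IP header

pkt = nvgre_encap(b"\x00" * 60, vsid=0xABCDE)
print(pkt[:8].hex())                      # 200065580abcde00
```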
Virtual eXtensible LAN (VXLAN)
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-05
- VXLAN Segment ID / VXLAN Network Identifier (VNI)
  - a 24-bit value designating the individual VXLAN overlay network on which the communicating VMs are situated
  - VMs in different VXLAN overlay networks cannot communicate with each other
- Implemented by a few vendors (e.g., Brocade) and in OVS
  - a simplified version is under adoption by IBM (called "Dove")
- Passes through firewalls thanks to UDP
  - can cross the Internet
VXLAN (IP-UDP-VXLAN data plane, control plane based on multicast signaling)
An Ethernet frame goes from a VM to the virtualization server, which performs Ethernet-over-IP-UDP-VXLAN encapsulation with the VNI; the destination hypervisor's IP is obtained from control-plane multicast signaling. After IP routing in the DC fabric, the destination server decapsulates and virtually forwards the frame toward the vif.
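The corresponding 8-byte VXLAN shim header, sketched the same way (outer IP/UDP headers omitted; the I flag marks a valid VNI, and the well-known UDP destination port is 4789):

```python
import struct

def vxlan_encap(inner_eth_frame: bytes, vni: int) -> bytes:
    assert vni < 2**24
    flags = 0x08 << 24                            # I flag: the VNI field is valid
    header = struct.pack("!II", flags, vni << 8)  # 24-bit VNI + reserved byte
    return header + inner_eth_frame               # then wrapped in outer UDP/IP

pkt = vxlan_encap(b"\x00" * 60, vni=5001)
print(pkt[:8].hex())                              # 0800000000138900 (VNI 5001 = 0x001389)
```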
Stateless Transport Tunnelling (STT)
in VMware NSX (formerly Nicira); used between VMware hypervisors
http://tools.ietf.org/html/draft-davie-stt-04
- Fake TCP header
  - no real TCP session establishment
  - the TCP fields are just used as a vehicle to encode other information, including 32 bits for the virtual network ID (within a 64-bit context ID)
- Objectives
  - exploit the TCP offload done by some physical NICs, i.e., the fragmentation of large TCP packets, improving performance
  - exploit load-balancing (e.g., ECMP) that looks at TCP header differences
STT (IP-TCP/STT data plane, no control plane)
An Ethernet frame goes from a VM to the virtualization server, which performs Ethernet-over-IP-TCP/STT encapsulation with the context ID, leaving TCP offloading to the NIC; the destination hypervisor's IP is given by external tools (no control plane). After IP routing in the DC fabric, the destination server decapsulates and virtually forwards the frame toward the vif.
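An illustrative sketch of the "fake" TCP header idea (a simplification, not the exact draft-davie-stt wire format): the source port carries inner-flow entropy so that ECMP spreads tunnels across paths, the sequence-number field is reused to encode the frame length, and no handshake ever takes place; port 7471 is the STT port from the draft.

```python
import struct

def stt_like_header(inner_flow_hash: int, payload_len: int) -> bytes:
    src_port = 49152 + (inner_flow_hash % 16384)  # ECMP entropy from the inner flow
    dst_port = 7471                               # STT port per the draft
    seq = (payload_len & 0xFFFF) << 16            # SEQ reused for frame length (simplified)
    ack = 0                                       # no real session to acknowledge
    offset_flags = (5 << 12) | 0x10               # looks like a plain ACK segment
    window, checksum, urgent = 0xFFFF, 0, 0
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       offset_flags, window, checksum, urgent)

hdr = stt_like_header(hash(("10.0.0.1", "10.0.0.2", 80)), payload_len=1450)
print(len(hdr), "bytes")   # 20: looks like TCP to middleboxes and offloading NICs
```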
Comparison between cloud network overlays
- Common outer IP/Eth header & inner Eth/IP payload
- Differences in the shim header, but equivalent functionality
- Security
  - STT raises problems: it won't go through firewalls, and raises security alarms
  - VXLAN, using UDP, will pass through (NVGRE needs firewalls configured to let GRE pass)
- Common control-plane problem
  - how to get the destination VM's hypervisor IP address (for the outer header)
  - solved by VXLAN, left to network-stack platforms by NVGRE & STT
Recall the VM mobility rerouting solutions above: triangular routing (IP-level or hypervisor-level) vs. ahead/source rerouting, for which LISP is the single standardized, implemented cloud access overlay protocol.
Locator/Identifier Separation Protocol (LISP) (IP data plane, BGP+LISP control plane)

[Figure: an endpoint 11.3.9.5, behind routing locators RLOC1/RLOC2, sends to endpoint identifier (EID) 19.76.2.4, behind RLOC3/RLOC4. The ingress tunnel router has no local mapping for 19.76.2.4, so it sends a map-request to the mapping server (over a LISP DDT/ALT session) and receives a map-reply with the mapping. EIDs are preregistered using periodically sent map-register messages.]

Mapping for the destination EID:
  Network          Routing Locator   Priority/Weight
  > 19.76.2.0/24   RLOC3             1/100
    19.76.2.0/24   RLOC4             2/100
The packet is encapsulated toward the highest-priority RLOC (the lowest priority value); BGP routing carries it to RLOC3's IP, where it is decapsulated.

Mapping for the source site (return traffic):
  Network          Routing Locator   Priority/Weight
  > 11.3.0.0/8     RLOC1             1/30
  > 11.3.0.0/8     RLOC2             1/70
With equal priorities, packets are encapsulated with load-balancing across RLOC1 and RLOC2 according to the weights, and decapsulated at the receiving RLOC.
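A minimal sketch of the RLOC selection rule implied by these tables (the lowest priority value wins, and ties are load-balanced by weight); the tuples reproduce the figure's mappings:

```python
import random

def select_rloc(mapping: list) -> str:
    """mapping: list of (rloc, priority, weight) tuples from a map-reply."""
    best = min(p for _, p, _ in mapping)                    # lowest value = preferred
    candidates = [(r, w) for r, p, w in mapping if p == best]
    rlocs, weights = zip(*candidates)
    return random.choices(rlocs, weights=weights)[0]        # weighted load-balancing

print(select_rloc([("RLOC3", 1, 100), ("RLOC4", 2, 100)]))  # always RLOC3
print(select_rloc([("RLOC1", 1, 30), ("RLOC2", 1, 70)]))    # ~30/70 split
```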
LISP-VMM tests
P. Raad, S. Secci, D. C. Phung, A. Cianfrani, P. Gallard, G. Pujolle, "Achieving Sub-Second Downtimes in Large-Scale Virtual Machine Migrations with LISP", IEEE Transactions on Network and Service Management, Vol. 11, No. 2, pp. 133-143, June 2014.
Experimentation results
[Testbed sites: Nice, Rio, Hanoi.]
Downtime bounds:
- SMR (solicit map-request): max {1.5 RTT(source DC, mapping server), mapping-entry timeout + RTT(user, mapping server)}
- CP (change priority): max {OWD(source DC, destination DC), OWD(source DC, user)}
P. Raad, S. Secci, D. C. Phung, A. Cianfrani, P. Gallard, G. Pujolle, "Achieving Sub-Second Downtimes in Large-Scale Virtual Machine Migrations with LISP", IEEE Transactions on Network and Service Management, Vol. 11, No. 2, pp. 133-143, June 2014.
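Restating the two downtime bounds above in notation (MS is the mapping server, tau_map the mapping-entry timeout):

```latex
T_{\mathrm{SMR}} = \max\left\{\, 1.5\,\mathrm{RTT}(\mathrm{srcDC}, \mathrm{MS}),\;
                                 \tau_{\mathrm{map}} + \mathrm{RTT}(\mathrm{user}, \mathrm{MS}) \,\right\}
\qquad
T_{\mathrm{CP}} = \max\left\{\, \mathrm{OWD}(\mathrm{srcDC}, \mathrm{dstDC}),\;
                                \mathrm{OWD}(\mathrm{srcDC}, \mathrm{user}) \,\right\}
```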
Linking VM Mobility to User Mobility
[Figure: a terminal moves across heterogeneous IP access networks (A, B, C); adaptive computing and content-service provisioning follows it across the Internet edge (ISPs) and a multi-cloud.]
S. Secci, "Cloud and Mobility: a Castling Move?", Global Security Magazine, Apr. 2012.
With LISP-capable mobile equipment and/or mobile access network

[Figure: node 2.3.4.5 reaches VM 13.12.11.10, which can run in DC1 (cloud/mobile access provider 1, locators 132.227.62.1 and 132.227.22.2) or DC2 (cloud/mobile access provider 2, locators 137.194.32.4 and 137.194.0.5). LISP encapsulation is performed at the mobile equipment (with LISP-MN) or at the NodeB/BS.]

Initial mappings (">" marks the selected locator):
  Network           Locator        Priority/Weight
  > 13.12.11.10/32  132.227.62.1   5/100
    13.12.11.10/32  132.227.22.1   7/50
    13.12.11.10/32  137.194.0.5    7/50
    13.12.11.10/32  137.194.32.4   22/100
  > 2.3.4.5/32      132.227.8.3    5/100

After the terminal moves (its locator switches from 132.227.8.3 to 137.194.76.6):
  Network           Locator        Priority/Weight
  > 13.12.11.10/32  132.227.62.1   5/100
    13.12.11.10/32  132.227.22.1   7/50
    13.12.11.10/32  137.194.0.5    7/50
    13.12.11.10/32  137.194.32.4   22/100
    2.3.4.5/32      132.227.8.3    5/100
  > 2.3.4.5/32      137.194.76.6   1/100

A VM migration is then triggered by some intelligence (e.g., latency measurements), and the VM's mapping is updated to prefer DC2:
  Network           Locator        Priority/Weight
    13.12.11.10/32  132.227.62.1   5/100
    13.12.11.10/32  132.227.22.1   7/100
  > 13.12.11.10/32  137.194.0.5    1/100
    13.12.11.10/32  137.194.32.4   22/100
    2.3.4.5/32      132.227.8.3    5/100
  > 2.3.4.5/32      137.194.76.6   1/100
Linking VM mobility to user mobility
- Two online decisions to take:
  - when/if to switch the user VM's routing locator
  - when/if to migrate/relocate the VM
- Such decisions are context-specific
  - is the time to migrate/relocate much lower than the time over which the user latency profile significantly changes?
  - they depend on the geographical density of the cloud/MEC deployment
- The meaning of "VM mobility" is also context-specific
  - "migration" appears more appropriate for end-user VMs
  - "relocation" of redundant VMs is particularly interesting for ensuring high availability of VNFs
S. Secci, "Cloud and Mobility: a Castling Move?", Global Security Magazine, Apr. 2012.
Testbed results
S. Secci, P. Raad, P. Gallard, "Linking Virtual Machine Mobility to User Mobility", IEEE Transactions on Network and Service Management, to appear.
PACAO-1: only RLOC switching; PACAO-2: RLOC switching + VM mobility.
Simulation results based on 36k real individual traces from Orange mobile
S. Secci, P. Raad, P. Gallard, "Linking Virtual Machine Mobility to User Mobility", IEEE Transactions on Network and Service Management, to appear.
[Plots: roughly 40% and 30% gains.]
MEC infrastructure planning
- An offline decision to take: how and where to plan the installation of cloudlet/MEC facilities
- Such decisions depend on:
  - user mobility patterns
  - the density of access points
  - the load on access-network links
A. Ceselli, M. Premoli, S. Secci, "Cloudlet Network Design Optimization", Proc. of IFIP Networking 2015, May 20-22, 2015, Toulouse, France.
MEC infrastructure planning for the Paris metropolitan area (4G load)
A. Ceselli, M. Premoli, S. Secci, "Cloudlet Network Design Optimization", Proc. of IFIP Networking 2015, May 20-22, 2015, Toulouse, France.
Wrap-up
Summary
- Network virtualization put in perspective with respect to switching practices and requirements
  - physical network programmability can be a 'plus', but may not be strictly required
  - virtual network node programmability is instead required
  - adequate virtual network protocols exist to cope with VM states and mobility
- Some scientific challenges: algorithms to feed the orchestrators of cloud infrastructures and telco networks
  - Offline orchestration
    - dimension and re-dimension the virtual network infrastructure, with computing capacity scaling with state variations and including protection/high-availability requirements
  - Online orchestration
    - take robust routing decisions at the flow arrival rate
    - integrate prediction of traffic patterns and network states
    - take virtualization-infrastructure re-allocation decisions as a function of the aggregate traffic load on VMs/VNFs and network states
Questions?
Annexes
More on SDN; cloud platforms (OpenStack); DC network architectures.