TU Berlin / T-Labs
20 Nov ’12
We Live in a Connected World
20 Nov 2012
Network
UCL - Doctoral School day in Cloud Computing
When Users Notice the Network
Like electricity, we assume it is magically always there
Network Failure Example 1: Software Bugs in Inter-Domain Routers
Router type A, Router type B
0-length AS4_PATH attribute!
Protocol-compliant but confusing message
On 19th August 2009, CNCI (AS9354), a small ISP in Japan, advertised a handful of BGP updates containing an empty AS4_PATH attribute
Reset session!
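To make the failure mode concrete, the sketch below parses the path-attribute section of a BGP UPDATE and shows one tolerant way to treat an empty AS4_PATH. The wire layout (flags octet, type code, one- or two-octet length) and type code 17 follow the BGP specifications; the function names and the "ignore it" policy are illustrative assumptions, not what any router vendor actually shipped.

```python
import struct

AS4_PATH = 17            # path attribute type code for AS4_PATH
EXTENDED_LENGTH = 0x10   # flag bit: the length field is two octets instead of one


def parse_path_attributes(data: bytes):
    """Split a path-attribute section into (flags, type_code, value) tuples."""
    attrs = []
    offset = 0
    while offset < len(data):
        flags, type_code = data[offset], data[offset + 1]
        if flags & EXTENDED_LENGTH:
            (length,) = struct.unpack_from("!H", data, offset + 2)
            header = 4
        else:
            length = data[offset + 2]
            header = 3
        value = data[offset + header: offset + header + length]
        if len(value) != length:
            raise ValueError("truncated path attribute")
        attrs.append((flags, type_code, value))
        offset += header + length
    return attrs


def handle_update(attrs):
    """Tolerant policy (an assumption): drop an empty AS4_PATH, keep the session."""
    kept = []
    for flags, type_code, value in attrs:
        if type_code == AS4_PATH and len(value) == 0:
            continue  # well-formed but carries no path; ignore instead of resetting
        kept.append((flags, type_code, value))
    return kept


# ORIGIN attribute followed by a zero-length AS4_PATH (optional transitive flags)
raw = bytes([0x40, 1, 1, 0, 0xC0, AS4_PATH, 0])
attrs = parse_path_attributes(raw)
survivors = handle_update(attrs)
```

The point of the sketch is that the zero-length attribute parses cleanly, so whether the session survives depends entirely on what the receiving software does next.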
…what could possibly go wrong?
BGP Update Rate Percentage Increase
10x increase → routing instabilities
[renesys]
What Went Wrong: (CISCO) Session Reset Flood
Unaffected router, Affected router
Unreachable!
Repeated service disruptions
Innocuous software fault caused Internet-wide outage
Network Failure Example 2: Planned Network Maintenance
• Amazon EC2 disruption on 21st April 2011
– Incorrectly executed network change during a planned network capacity upgrade
Misconfiguration caused catastrophic outage
Software- and config-related issues
Affect even well tested, standard Internet technology
With more software in networks, need ways to deal with reliability issues
Why is network reliability so difficult to achieve?
Networks are Hard to Manage
New control requirements led to great complexity
– Network virtualization, VM migration, perf. isolation, …
Kept working by “Masters of Complexity”
When things don’t work?
– Only limited tools: ping, traceroute, tcpdump, SNMP, NetFlow
Software-Defined Networking (SDN)
Diagram labels: Control (four times), Third-party control program
SDN Promises
Advantages over status quo of management
Reduce complexity
New functionality through programmability
SDN is great, but …
… at the risk of bugs
Software Faults
• Will make communication unreliable
• Major hurdle for success of SDN
We need effective ways to test SDN networks
Roadmap
• Intro
• OpenFlow background
• NICE [NSDI’12]: systematically testing OpenFlow Apps
• SOFT [CoNEXT’12]: automating interop testing of OpenFlow Agents
• Conclusions
Quick OpenFlow 101
Host A, Switch 1, Switch 2, Host B; Controller running an OpenFlow program
Flow Table: Rule 1, Rule 2, …, Rule N
Flow table entry: Match (Dst: Host B), Actions (Fwd: Switch 2), Counters (pkts / bytes)
Packet
Default: forward to controller
Execute packet_in event handler
Install rule; forward packet
System is distributed and asynchronous → can misbehave under corner cases
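The packet walk on this slide can be sketched as a toy model. Everything here (class names, the dictionary-based flow table, the `routes` lookup) is a hypothetical simplification for illustration and not the OpenFlow wire protocol or any controller's real API.

```python
class Switch:
    """Toy switch: a flow table plus a default 'forward to controller'."""

    def __init__(self, name, controller):
        self.name = name
        self.controller = controller
        # match (destination) -> {"action": next hop, "pkts": n, "bytes": n}
        self.flow_table = {}

    def install_rule(self, dst, next_hop):
        self.flow_table[dst] = {"action": next_hop, "pkts": 0, "bytes": 0}

    def receive(self, pkt):
        rule = self.flow_table.get(pkt["dst"])
        if rule is None:
            # Default: no rule matches, so hand the packet to the controller
            return self.controller.packet_in(self, pkt)
        rule["pkts"] += 1
        rule["bytes"] += pkt["size"]
        return rule["action"]


class Controller:
    def __init__(self, routes):
        self.routes = routes  # (switch name, destination) -> next hop

    def packet_in(self, switch, pkt):
        next_hop = self.routes[(switch.name, pkt["dst"])]
        switch.install_rule(pkt["dst"], next_hop)  # install rule
        return next_hop                            # forward packet


ctrl = Controller({("Switch 1", "Host B"): "Switch 2"})
s1 = Switch("Switch 1", ctrl)
pkt = {"dst": "Host B", "size": 100}
first = s1.receive(pkt)    # miss: goes through the controller
second = s1.receive(pkt)   # hit: handled by the installed rule
```

In this synchronous toy the rule is in place before the next packet arrives; the corner cases the slide warns about appear once rule installation and packet arrival can interleave.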
Bugs in OpenFlow Apps
Host A, Switch 1, Switch 2, Host B; Controller running an OpenFlow program
Packet
Install rule
Install rule: Delayed!
Drop packet
Inconsistent distributed state!
Goal: systematically test possible behaviors to detect bugs
Systematically Testing OpenFlow Apps
State-space exploration via Model Checking (MC)
Target system: Unmodified OpenFlow program
Complex environment
Environment model: Switch 1, Switch 2, Host A, Host B
• Carefully-crafted streams of packets
• Many orderings of packet arrivals and events
Scalability Challenges
Data-plane driven
Complex network behavior
Huge space of possible packets → Equivalence classes of packets
Huge space of possible event orderings → Domain-specific search strategies
Enumerating all inputs and event orderings is intractable
NICE: No bugs In Controller Execution
Input: Unmodified OpenFlow program, Network topology, Correctness properties (e.g., no loops)
State-space search
Output: Traces of property violations
NICE found 11 bugs in 3 real OpenFlow Apps
Model Checking
State-Space Model: graph of State 0 through State 9
System State
State
Controller (global variables)
Environment:
– Switches (flow table): Simplified switch model
– End-hosts (network stack): Simple clients/servers
– Communication channels (in-flight pkts)
Transition System
Graph of State 0 through State 9
Example transition: ctrl packet_in(pkt B)
Run actual packet_in handler
Data-dependent transitions!
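A minimal sketch of state-space exploration over such a transition system follows. It uses plain breadth-first search and a deliberately tiny, made-up model (one packet, one possibly delayed rule on switch 2); the state encoding and transition labels are assumptions chosen to mirror the delayed-rule scenario, not NICE's actual model.

```python
from collections import deque


def explore(initial, transitions, violates):
    """Breadth-first search; returns the shortest trace to a violating state."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if violates(state):
            return trace
        for label, nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [label]))
    return None


def transitions(state):
    """Toy model: state = (where the packet is, whether switch 2 has its rule)."""
    pkt_at, sw2_rule = state
    if not sw2_rule:
        yield "controller installs rule on switch 2", (pkt_at, True)
    if pkt_at == "switch 1":
        yield "switch 1 forwards packet", ("switch 2", sw2_rule)
    elif pkt_at == "switch 2":
        if sw2_rule:
            yield "switch 2 forwards packet", ("host B", True)
        else:
            yield "switch 2 drops packet", ("dropped", False)


def packet_dropped(state):
    return state[0] == "dropped"


trace = explore(("switch 1", False), transitions, packet_dropped)
```

Because every enabled transition is tried from every reachable state, the ordering in which the packet overtakes the rule installation is explored just like the common ordering.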
Combating Huge Space of Packets
Packet arrival handler (input: pkt)
is dst broadcast?
– yes: Flood packet
– no: dst in mactable?
  – no: Flood packet
  – yes: Install rule and forward packet
Equivalence classes of packets:
1. Broadcast destination
2. Unknown unicast destination
3. Known unicast destination
Code itself reveals equivalence classes of packets
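The handler in the flowchart can be written out as a short sketch, with one representative packet per equivalence class. The learning of the source address and all names are assumptions added for illustration; only the two branch conditions come from the slide.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def packet_in(mactable, pkt, in_port):
    """Simplified MAC-learning handler; returns the action taken."""
    mactable[pkt["src"]] = in_port        # assumption: learn the sender's port
    if pkt["dst"] == BROADCAST:
        return "flood"                    # class 1: broadcast destination
    if pkt["dst"] not in mactable:
        return "flood"                    # class 2: unknown unicast destination
    return ("install_rule_and_forward", mactable[pkt["dst"]])  # class 3


table = {"00:00:00:00:00:0a": 1}
broadcast = packet_in(table, {"src": "00:00:00:00:00:0b", "dst": BROADCAST}, 2)
unknown = packet_in(table, {"src": "00:00:00:00:00:0b", "dst": "00:00:00:00:00:0c"}, 2)
known = packet_in(table, {"src": "00:00:00:00:00:0b", "dst": "00:00:00:00:00:0a"}, 2)
```

Any other packet takes one of these three paths, so injecting one packet per class already exercises every branch of the handler.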
Code Analysis: Symbolic Execution (SE)
Packet arrival handler, symbolic packet λ
is λ.dst broadcast?
– yes: Flood packet; path constraint λ.dst ∈ {Broadcast}
– no (λ.dst ∉ {Broadcast}): λ.dst in mactable?
  – no: path constraint λ.dst ∉ {Broadcast} ∧ λ.dst ∉ mactable
  – yes: Install rule and forward packet; path constraint λ.dst ∉ {Broadcast} ∧ λ.dst ∈ mactable (infeasible from initial state)
1 path = 1 equivalence class of packets = 1 packet to inject
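The idea of deriving one path constraint per branch can be imitated without a constraint solver. In the sketch below a stand-in object answers each branch from a scripted list of outcomes and records the corresponding constraint; a path is marked infeasible when it requires a table hit while the table is empty. This replaces real symbolic execution with brute-force branch enumeration, and every name in it is hypothetical.

```python
class NeedDecision(Exception):
    """Raised when the handler reaches a branch with no scripted outcome."""


class SymbolicDst:
    """Stands in for λ.dst: each test consults the next scripted outcome."""

    def __init__(self, decisions, mactable):
        self.pending = list(decisions)
        self.mactable = mactable
        self.constraints = []
        self.feasible = True

    def _decide(self, if_true, if_false, true_possible=True):
        if not self.pending:
            raise NeedDecision
        outcome = self.pending.pop(0)
        self.constraints.append(if_true if outcome else if_false)
        if outcome and not true_possible:
            self.feasible = False
        return outcome

    def is_broadcast(self):
        return self._decide("dst in {Broadcast}", "dst not in {Broadcast}")

    def in_mactable(self):
        return self._decide("dst in mactable", "dst not in mactable",
                            true_possible=bool(self.mactable))


def handler(dst):
    if dst.is_broadcast():
        return "flood"
    if dst.in_mactable():
        return "install rule and forward"
    return "flood"


def explore_paths(mactable):
    """Enumerate feasible (constraints, action) pairs of the handler."""
    paths, worklist = [], [[]]
    while worklist:
        decisions = worklist.pop()
        dst = SymbolicDst(decisions, mactable)
        try:
            action = handler(dst)
        except NeedDecision:
            # Fork: rerun the handler once for each outcome of the new branch
            worklist.append(decisions + [False])
            worklist.append(decisions + [True])
            continue
        if dst.feasible:
            paths.append((dst.constraints, action))
    return paths


initial_paths = explore_paths({})
later_paths = explore_paths({"00:00:00:00:00:0a": 1})
```

With an empty table only two paths are feasible, which matches the slide's note that the "known unicast" path cannot be taken from the initial state.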
Combining SE with Model Checking
State graph: State 0, State 1, State 2, State 3, State 4
Transitions: host send(pkt A); host discover_packets; host send(pkt B)
discover_packets transition: symbolic execution of packet_in handler
New packets enable new transitions: host / send(pkt B), host / send(pkt C)
Controller state 1
Controller state changes
Combating Huge Space of Orderings
MC + SE
OpenFlow-specific search strategies for up to 20x state-space reduction:
FLOW-IR
NO-DELAY
UNUSUAL
Specifying App Correctness
• Library of common properties
– No forwarding loops
– No black holes
– Direct paths (no unnecessary flooding)
– Etc…
• Correctness is app-specific in nature
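As an illustration of what a library property such as "no forwarding loops" might check, the sketch below follows next-hop entries for one destination and reports a cycle if a switch is revisited. The single-destination `next_hop` map is an assumed simplification of real flow tables.

```python
def find_forwarding_loop(next_hop, start):
    """Follow next_hop from start; return the loop as a list, or None."""
    path = []
    position = {}  # switch -> index of its first visit in path
    node = start
    while node in next_hop:
        if node in position:
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = next_hop[node]
    return None  # reached something with no next hop, e.g. the end host


looping = find_forwarding_loop({"s1": "s2", "s2": "s3", "s3": "s2"}, "s1")
loop_free = find_forwarding_loop({"s1": "s2", "s2": "host B"}, "s1")
```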
API to Define App-Specific Properties
Example transition: State 0 → State 1 via ctrl packet_in(pkt A)
Register callbacks to observe transitions:
def init():
    init local vars
    register("packet_in")
Execute after transitions:
def on_packet_in():
    check system-wide state
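The callback pattern on this slide could look roughly like the following runnable sketch. The `Checker` class, its `register` and `after_transition` methods, and the example property are all hypothetical stand-ins; the slide only shows that a property registers for `packet_in` and inspects system-wide state after the transition.

```python
class Checker:
    """Minimal stand-in for the model checker's property interface."""

    def __init__(self):
        self.callbacks = {}
        self.violations = []

    def register(self, event, callback):
        self.callbacks.setdefault(event, []).append(callback)

    def after_transition(self, event, state):
        # Run every callback registered for this kind of transition
        for callback in self.callbacks.get(event, []):
            error = callback(state)
            if error:
                self.violations.append((event, error))


class ConsistentRules:
    """Example property: every installed rule is known to the controller."""

    def init(self, checker):
        checker.register("packet_in", self.on_packet_in)

    def on_packet_in(self, state):
        # Check system-wide state: switch flow table versus controller view
        for dst in state["flow_table"]:
            if dst not in state["controller_mactable"]:
                return f"rule for {dst} is unknown to the controller"
        return None


checker = Checker()
ConsistentRules().init(checker)
checker.after_transition(
    "packet_in", {"flow_table": {"B": 2}, "controller_mactable": {"B": 2}})
checker.after_transition(
    "packet_in", {"flow_table": {"C": 3}, "controller_mactable": {}})
```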
Prototype Implementation
• Built a NICE prototype in Python
• Target the Python API of NOX
Diagram labels: Unmodified OpenFlow program, Stub NOX API, NICE, Controller state & transitions
Experiences
• Tested 3 unmodified NOX OpenFlow Apps
– MAC-learning switch
– LB: Web server load balancer [Wang et al., HotICE’11]
– TE: Energy-aware traffic engineering [CoNEXT’11]
• Setup
– Iterated with 1, 2 or 3-switch topologies; 1,2,… pkts
– App-specific properties
  • LB: All packets of same request go to same server replica
  • TE: Use appropriate path based on network load
Results
• NICE found 11 property violations → bugs
– Few secs to find 1st violation of each bug (max 30m)
– Few simple mistakes (not freeing buffered packets)
– 3 insidious bugs due to network race conditions
Take Aways
• Why were mistakes easy to make?
– Centralized programming model only an abstraction
• Why could the programmer not detect them?
– Bugs don’t always manifest
– TCP masks transient packet loss
– Platform lacks runtime checks
• Why did NICE easily find them?
– Makes corner cases as likely as normal cases
Interoperability at Deployment Time
OpenFlow program
NICE
Systematic testing
Mininet
Testbed testing
Release
OpenFlow messages
One OpenFlow API specification… Are OF switches interoperable?
Interop is critical for the success of SDN
Interop: How Hard Can It Be?
OF Switch: ASIC switch chip, OS, OpenFlow Agent
Definition of Interoperability*
“Being able to accomplish end-user applications using different types of systems, whose interfaces are completely understood, in a manner that requires the user to have little or no knowledge of the unique characteristics of those systems”
* NB: Many other definitions exist
Interop: How Hard Can It Be?
OF Switch: ASIC switch chip, OS, OpenFlow Agent, Flow Table, Hardware Abstraction Layer
Inputs:
– Packets (“Forwarding” interface)
– OpenFlow messages (OpenFlow interface)
Hardware correctness is formally verified
Likely source of OpenFlow interop issues
OpenFlow Software Agent
Specifications
• Rapid flux (3 revisions in ~ 1 year)
• Ambiguities (FlowMod is 2.5 pages long)
Specifications → Implementation
• Implementation freedom
• Vendors may not follow the specs
Switch software is not provably correct :(
Testing, testing and testing…
Interop’12 Testing Event
Open Networking Foundation White Paper
issues will be disclosed in this document. Many of the products tested are not commercially available yet.
The original proposal was to focus on test cases that applied to service provider, data center and enterprise use-cases. Vendors of controllers and applications running on controllers really determine what can be tested. For this event we were able to test:
• Topology discovery (LLDP method)
• Layer 2 Ethernet/VLAN path (circuit) provisioning (primary and backup)
• Layer 3 (IP) learning (shortest path primary and backup path)
• Layer 3 (IP) load balancing
• Enabling multi-controller connectivity using FlowVisor to slice the network
Each one of these applications requires the switches to support the OpenFlow v1.0 protocol.
Testing at the Interoperability Event
• Gather various vendors in Vegas
• Hook up switches and controllers
• Create and run test cases
• See what breaks and …
• Very high manual effort
• Test cases are not exhaustive
• It is not a one-time thing
What happens in Vegas, stays in Vegas
Automating Interop Testing
Insight: systematically crosscheck OF implementations
The 10,000 foot view
Test inputs
OF Agent 1, OF Agent 2
Input-driven execution
Observable behaviors
Inconsistency!
Challenges
• Manage test inputs and coverage efficiently
– Or manage “path explosion”
• Capture behaviors
• Avoid simultaneous access to all code
SOFT (Systematic OpenFlow Testing)
Test inputs
OF Agent 1, OF Agent 2
Input-driven execution
Observable behaviors
Determine mapping inputs → behaviors through symbolic execution
Identify inconsistencies
• Automated solution to interop testing
• Systematic code coverage
• No simultaneous access to all agents
Structured Inputs
Message diagram: FLOW MOD (1.0, N1) and STAT REQ (1.0, N2) messages with symbolic fields (*); labels C1, C2
Further reductions
• Some inputs are independent
• Many inputs are entirely concrete
• Small number of messages
• Concrete values at cost of completeness
Capturing Behaviors
Externally observable outputs
• OpenFlow reply messages
• Data plane packets
• Normalize harmless nondeterminism (e.g., Buffer IDs)
Internal state changes affect successive inputs
• Use concrete probe packets
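One way to normalize harmless nondeterminism such as buffer IDs is to rename them in order of first appearance before comparing two runs. The message dictionaries and the `buffer_id` key below are illustrative assumptions, not the OpenFlow wire format.

```python
def normalize(messages):
    """Rename buffer IDs by order of first appearance so runs are comparable."""
    mapping = {}
    normalized = []
    for msg in messages:
        msg = dict(msg)  # copy so the caller's messages stay untouched
        if "buffer_id" in msg:
            msg["buffer_id"] = mapping.setdefault(msg["buffer_id"], len(mapping))
        normalized.append(msg)
    return normalized


run_a = [{"type": "packet_in", "buffer_id": 311},
         {"type": "packet_in", "buffer_id": 312},
         {"type": "error", "buffer_id": 311}]
run_b = [{"type": "packet_in", "buffer_id": 7},
         {"type": "packet_in", "buffer_id": 9},
         {"type": "error", "buffer_id": 7}]
same_behavior = normalize(run_a) == normalize(run_b)
```

The two runs use different raw identifiers but the same pattern of reuse, so they compare equal after normalization.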
Example
Agent 1:
if ( p == OFPP_CTRL ) send_to_ctrl( )
else if ( p < 25 ) send_to_port( p )
else error( BAD_PORT )
Agent 2:
if ( p < 25 ) send_to_port( p )
else error( BAD_PORT )
Behaviors over p = 1 … 65535:
Agent 1: FWD (1 to 24), ERR (25 …), CTRL, ERR (… 65535)
Agent 2: FWD (1 to 24), ERR (25 to 65535)
N-version Comparison
Overlaying the two behavior maps over p = 1 … 65535: both agents answer FWD for 1 to 24 and ERR from 25 upward, except at p = OFPP_CTRL, where Agent 1 answers CTRL and Agent 2 answers ERR
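The comparison can be reproduced with a small sketch. It enumerates every port value by brute force, which is feasible only because the input is a single 16-bit field; SOFT itself derives the input-to-behavior mapping through symbolic execution. The numeric value 0xFFFD is OpenFlow 1.0's controller port; the function names are hypothetical.

```python
OFPP_CTRL = 0xFFFD  # controller port number in OpenFlow 1.0


def agent1(p):
    if p == OFPP_CTRL:
        return "CTRL"
    if p < 25:
        return "FWD"
    return "ERR"


def agent2(p):
    if p < 25:
        return "FWD"
    return "ERR"


def partition(agent, lo, hi):
    """Group consecutive inputs with the same behavior into (start, end, behavior)."""
    ranges = []
    for p in range(lo, hi + 1):
        out = agent(p)
        if ranges and ranges[-1][2] == out:
            ranges[-1] = (ranges[-1][0], p, out)
        else:
            ranges.append((p, p, out))
    return ranges


def inconsistencies(a, b, lo, hi):
    """Inputs on which the two agents behave differently."""
    return [(p, a(p), b(p)) for p in range(lo, hi + 1) if a(p) != b(p)]


map1 = partition(agent1, 1, 65535)
map2 = partition(agent2, 1, 65535)
diff = inconsistencies(agent1, agent2, 1, 65535)
```

The partition of Agent 1 reproduces the FWD, ERR, CTRL, ERR bands from the slide, and the only disagreement is the controller port.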
Results
• Compared
– OpenFlow 1.0 Switch Reference Implementation
– Open vSwitch 1.0.0
• Input sequences containing 1 to 4 messages
Results
Found 7 classes of inconsistencies
Mostly related to message validation
Result of underspecification
• No expected behavior in the specification
• Inconsistent interpretation of the specification
Results - Example
FlowMod message
1. Modify VLAN to value greater than 2^12
2. Forward packet
Reference Implementation
1. Trim VLAN value to 12 bits
2. Install the rule
Open vSwitch
1. Silently ignore the message
Network in 2 different states
Which one is assumed by the controller?
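The divergence can be mimicked in a few lines. The two functions below model only the behaviors stated on the slide (trim and install versus silently ignore); they are not code from either implementation, and the list-based flow table is an assumption.

```python
VLAN_MASK = 0x0FFF  # a VLAN ID is 12 bits wide


def reference_switch(flow_table, vlan):
    """Slide's description of the reference implementation: trim, then install."""
    flow_table.append({"set_vlan": vlan & VLAN_MASK})


def open_vswitch(flow_table, vlan):
    """Slide's description of Open vSwitch: silently ignore the message."""
    if vlan > VLAN_MASK:
        return
    flow_table.append({"set_vlan": vlan})


ref_table, ovs_table = [], []
reference_switch(ref_table, 5000)  # 5000 does not fit in 12 bits
open_vswitch(ovs_table, 5000)
```

After the same FlowMod, one switch holds a rule with a different VLAN value than requested and the other holds no rule at all, and neither outcome is reported back to the controller in this model.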
Conclusions
SDN: a new role for software tool chains to make networks more dependable.
NICE automates the testing of OpenFlow Apps
SOFT automates interop testing of OpenFlow Agents
NICE and SOFT are a step in this direction!
http://code.google.com/p/nice-of/
Thanks
Peter Perešíni (EPFL)
Maciej Kuźniar (EPFL)
Daniele Venzano (EPFL)
Dejan Kostić (EPFL → IMDEA Networks)
Jennifer Rexford (Princeton)
Thank you! Questions?