Virtually Eliminating Router Bugs

Minlan Yu, Princeton University
http://verb.cs.princeton.edu
Joint work with Eric Keller (Princeton), Matt Caesar (UIUC), Jennifer Rexford (Princeton)
CoNEXT’09




Router Bugs in the News


Example of Router Bugs

• 1 misconfiguration tickled 2 bugs (2 vendors)
  – Real bugs on Feb 16, 2009
  – Huge increase in the global rate of updates
  – 10x increase in global instability for an hour

[Diagram of the incident, involving AS47868 and AS29113:]
  – Misconfiguration: as-path prepend 47868
  – MikroTik bug: no range check; the AS was prepended 252 times
  – AS29113 did not filter the long path
  – Cisco bug: long AS paths; after AS-path prepending, len > 255 triggers a Notification

[Figure: Global Instability by Country]
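The numbers on this slide fit together if the buggy router kept the prepend count in a single byte: 47868 mod 256 = 252. That byte-truncation mechanism is an assumption made for illustration (the slide says only "no range check"), and the function names below are hypothetical. The sketch contrasts unchecked and range-checked handling of the count and applies the slide's len > 255 threshold.

```python
# Hypothetical sketch of the arithmetic behind the Feb 16, 2009 incident.
# Assumption (not stated on the slide): the prepend count was kept in one
# byte, so an out-of-range value was truncated instead of rejected.

MAX_PATH_LEN = 255  # the slide's threshold: trouble once len > 255


def buggy_prepend_count(requested: int) -> int:
    """No range check: silently keep only the low 8 bits of the count."""
    return requested & 0xFF


def checked_prepend_count(requested: int) -> int:
    """Range-checked variant: reject counts that do not fit in one byte."""
    if not 0 <= requested <= 0xFF:
        raise ValueError(f"prepend count {requested} is out of range 0..255")
    return requested


def too_long(as_path: list) -> bool:
    """True if the AS path exceeds the slide's 255-hop threshold."""
    return len(as_path) > MAX_PATH_LEN


count = buggy_prepend_count(47868)
print(count)  # 252
try:
    checked_prepend_count(47868)
except ValueError as err:
    print("rejected:", err)
# Hypothetical: four more (private-use) ASNs on the path push it past 255.
path = [47868] * count + [64512, 64513, 64514, 64515]
print(len(path), too_long(path))  # 256 True
```

Rejecting the out-of-range value, as `checked_prepend_count` does, would stop this kind of misconfiguration at its source; the extra ASNs in the last lines are placeholders, not the ASes involved in the incident.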


Router Bugs

• Router bugs are a serious problem
  – Routers are getting more complicated
    • Quagga 220K lines, XORP 826K lines
  – Vendors are allowing third-party software
  – Other outages are becoming less common

• Router bugs are hard to detect and fix
  – Byzantine failures don’t simply crash the router
  – They violate the protocol and can cause cascading outages
  – Often discovered only after a serious outage

How can we detect bugs and stop their effects before they spread?


Avoiding Bugs via Diversity

• Run multiple, diverse routing instances
  – Use voting to select majority result
  – Software and Data Diversity (SDD) ensures correctness
    • E.g., XORP and Quagga, different update timing
  – Similar approach applied in other fields
  – But new challenges and opportunities in routing

[Figure: diverse routing instances feeding a Vote]
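The voting step can be sketched in a few lines, assuming every routing instance answers the same question (for example, which interface to use for a prefix) with a value that can be compared for equality. The `majority_vote` helper and its strict-majority rule are illustrative assumptions, not the voter from the talk.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas, else None.

    Each element of `outputs` is one routing instance's answer to the same
    question, e.g. the outgoing interface it chose for one prefix.
    """
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    return value if votes * 2 > len(outputs) else None


# Three diverse instances; one of them disagrees with the other two.
print(majority_vote(["IF 2", "IF 2", "IF 1"]))  # IF 2
print(majority_vote(["IF 1", "IF 2", "IF 3"]))  # None
```

Returning `None` when no value has a strict majority leaves it to the caller to decide what to do during disagreement, which is exactly where routing's transient behavior makes voting harder than in other fields.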


SDD Challenges in Routers

• Making replication transparent
  – Interoperate with existing routers
  – Duplicate network state to routing instances
  – Present a common configuration interface

• Handling the transient, real-time nature of routers
  – React quickly to network events
    • E.g., buggy behaviors, link failures
  – But don’t over-react to transient inconsistency

[Figure: outputs of Routing Instance I and Routing Instance II over time]


SDD Opportunities in Routers

• Easy to vote on standardized output
  – Control plane: IETF-standardized routing protocols
  – Data plane: forwarding-table entries

• Easy to recover from errors via bootstrap
  – Routing has limited dependency on history
  – Don’t need much information to bootstrap an instance

• Diversity is effective in avoiding router bugs
  – Based on our studies of router bugs and code


Outline

• Exploiting software and data diversity (SDD)
  – Effective in avoiding bugs
  – Enough hardware resources to support diversity

• Bug-tolerant router (BTR) architecture
  – Make replication transparent with low overhead
  – React quickly and handle transient inconsistency

• Prototype and evaluation
  – Small, trusted code base
  – Low processing overhead



Why Does Diversity Work?

• Enough diversity in routers
  – Software: Quagga, XORP, BIRD
  – Protocols: OSPF and IS-IS
  – Environment: timing, ordering, memory

• Enough resources for diversity
  – Extra processor blades for hardware reliability
  – Multi-core processors, separate route servers

• Effective in avoiding bugs


Evaluating the Effect of Diversity

• Most bugs can be avoided by diversity
  – Reproduce and avoid real bugs from the XORP and Quagga Bugzilla databases

• Diversity in the execution environment

  Diversity mechanism                  Bugs avoided (% of database)
  Timing/order of messages             39%
  Configuration                        25%
  Timing/order of connections          12%
  Combining all execution diversity    88%


Effect of Software Diversity

• Sanity check on implementation diversity
  – Picked 10 bugs from XORP, 10 bugs from Quagga
  – None were present in the other implementation

• Static code analysis on version diversity
  – Overlap decreases quickly between versions
    • 75% of bugs in Quagga 0.99.1 are fixed in Quagga 0.99.9
    • 30% of bugs in Quagga 0.99.9 are newly introduced

• Vendors can also achieve software diversity
  – Different code versions, different code trains
  – Code from acquired companies, open-source
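The two version-diversity percentages can be read as set differences over the bugs present in each release. The sketch below assumes that reading (a bug counts as "fixed" if it is in the old release but not the new one, and as "newly introduced" if it is in the new release but not the old one); `version_overlap` is a hypothetical helper and the bug IDs are made up, not the Quagga data.

```python
def version_overlap(old_bugs: set, new_bugs: set) -> tuple:
    """Return (share of old bugs fixed in new, share of new bugs newly introduced)."""
    if not old_bugs or not new_bugs:
        raise ValueError("both bug sets must be non-empty")
    fixed = len(old_bugs - new_bugs) / len(old_bugs)
    introduced = len(new_bugs - old_bugs) / len(new_bugs)
    return fixed, introduced


# Made-up bug identifiers for two releases, not the study's data.
old_release = {"b1", "b2", "b3", "b4"}
new_release = {"b4", "b5"}
print(version_overlap(old_release, new_release))  # (0.75, 0.5)
```

Under this reading, the smaller the intersection of the two sets, the less likely two versions run side by side are to share a bug.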



Bug-tolerant Router Architecture

[Figure: BTR architecture. Components shown: REPLICA MANAGER, UPDATE VOTER, FIB VOTER, and the hypervisor; three protocol daemons, each with its own routing table; the forwarding table (FIB); Interface 1 and Interface 2.]

Replicating Incoming Routing Messages

[Figure: the BTR architecture, with an incoming update for 12.0.0.0/8 passed to the routing instances]

No need for protocol parsing – operates at socket level
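Because the replica manager works at the socket level, replicating an incoming message amounts to copying bytes to each instance without parsing them. A minimal sketch, assuming each routing instance is reached through its own connected stream socket; `socket.socketpair()` stands in for real routing sessions, and `replicate` and `recv_exact` are hypothetical helpers.

```python
import socket
from typing import List


def replicate(message: bytes, replicas: List[socket.socket]) -> None:
    """Copy one incoming routing message, unparsed, to every replica's socket."""
    for sock in replicas:
        sock.sendall(message)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket (recv may return fewer)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed early")
        buf += chunk
    return buf


# Stand-ins for three routing instances: (replica-manager end, instance end).
pairs = [socket.socketpair() for _ in range(3)]
update = b"opaque routing message for 12.0.0.0/8"  # placeholder bytes
replicate(update, [mgr for mgr, _ in pairs])
print([recv_exact(inst, len(update)) == update for _, inst in pairs])
# [True, True, True]
for mgr, inst in pairs:
    mgr.close()
    inst.close()
```

Since `replicate` never looks inside `message`, the same code works for any routing protocol carried over a stream socket.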

Voting: Updates to Forwarding Table

[Figure: the BTR architecture, with an update for 12.0.0.0/8 and the resulting forwarding-table entry 12.0.0.0/8 → IF 2]

Transparent by intercepting calls to “Netlink”
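One way to picture the FIB voter is as a per-prefix record of each replica's latest choice: an intercepted route change from a replica updates only that replica's vote, and the single real FIB changes only when a strict majority agrees. This is a sketch under those assumptions; `FibVoter` and its methods are hypothetical, route deletions are left out, and no real Netlink messages are decoded.

```python
from collections import Counter
from typing import Dict


class FibVoter:
    """Hypothetical FIB voter: installs a route only on majority agreement."""

    def __init__(self, num_replicas: int) -> None:
        self.num_replicas = num_replicas
        # votes[prefix][replica_id] = next hop that replica wants installed
        self.votes: Dict[str, Dict[int, str]] = {}
        self.fib: Dict[str, str] = {}  # the single, real forwarding table

    def update(self, replica_id: int, prefix: str, next_hop: str) -> None:
        """Record one replica's intercepted route change, then re-vote."""
        self.votes.setdefault(prefix, {})[replica_id] = next_hop
        hop, count = Counter(self.votes[prefix].values()).most_common(1)[0]
        if count * 2 > self.num_replicas:
            self.fib[prefix] = hop  # strict majority: install in the real FIB
        # Otherwise leave the FIB unchanged while the replicas disagree.


voter = FibVoter(num_replicas=3)
voter.update(0, "12.0.0.0/8", "IF 2")
print(voter.fib)  # {}
voter.update(1, "12.0.0.0/8", "IF 2")
print(voter.fib)  # {'12.0.0.0/8': 'IF 2'}
voter.update(2, "12.0.0.0/8", "IF 1")  # a divergent replica is outvoted
print(voter.fib)  # {'12.0.0.0/8': 'IF 2'}
```

Leaving the FIB untouched while replicas disagree is what lets this kind of voter ride out transient disagreement during convergence.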

Voting: Control-Plane Messages

[Figure: the BTR architecture, with an update for 12.0.0.0/8 and the route 12.0.0.0/8 → IF 2]

Transparent by intercepting socket system calls
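Outgoing control-plane messages can be voted on the same way, treating each message as an opaque byte string. In this sketch (the `UpdateVoter` class and its `send` callback are hypothetical), a message reaches the neighbor only once a strict majority of replicas has written byte-identical copies, and it is sent exactly once.

```python
from collections import Counter
from typing import Callable


class UpdateVoter:
    """Hypothetical voter for outgoing control-plane messages.

    Messages are opaque bytes: the voter never parses them, it only checks
    whether a strict majority of replicas emitted byte-identical output.
    Simplification: each replica emits a given message at most once.
    """

    def __init__(self, num_replicas: int, send: Callable[[bytes], None]) -> None:
        self.num_replicas = num_replicas
        self.send = send  # writes to the real session with the neighbor
        self.pending: Counter = Counter()  # message -> replicas that sent it

    def on_replica_output(self, message: bytes) -> None:
        """Called when a replica writes to its (intercepted) socket."""
        self.pending[message] += 1
        if self.pending[message] == self.num_replicas // 2 + 1:
            self.send(message)  # first moment a strict majority agrees
        if self.pending[message] == self.num_replicas:
            del self.pending[message]  # every replica has caught up


sent = []
voter = UpdateVoter(num_replicas=3, send=sent.append)
voter.on_replica_output(b"UPDATE 12.0.0.0/8")
voter.on_replica_output(b"BOGUS UPDATE")  # one buggy replica
voter.on_replica_output(b"UPDATE 12.0.0.0/8")
print(sent)  # [b'UPDATE 12.0.0.0/8']
```

Comparing whole byte strings keeps the voter free of any protocol parsing, at the cost of requiring replicas to produce bit-identical messages.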


Simple Voting Mechanisms

• Tolerate transient periods of disagreement
  – Different replicas can have different outputs during routing-protocol convergence

• Several different voting mechanisms
  – Master-slave: speeding reaction time
  – Continuous majority: handling transience

[Figure: outputs (A, B, C) of Routing Instances I, II, and III over time; panel label: master]
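A master-slave voter trades some protection for speed. The sketch below assumes one concrete policy that the slide does not spell out: the master's output is forwarded immediately, and the master is replaced only if it keeps disagreeing with a strict majority of replicas for longer than a grace period. `MasterSlaveVoter`, `grace`, and the event timeline are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class MasterSlaveVoter:
    """Hypothetical master-slave voter for a single prefix (assumed policy)."""

    def __init__(self, num_replicas: int, grace: float, master: int = 0) -> None:
        self.num_replicas = num_replicas
        self.grace = grace
        self.master = master
        self.latest: Dict[int, str] = {}        # replica id -> latest output
        self.output: Optional[str] = None       # value neighbors currently see
        self.log: List[Tuple[float, str]] = []  # (time, value) changes sent out
        self.disagree_since: Optional[float] = None

    def _forward(self, t: float, value: str) -> None:
        if value != self.output:
            self.output = value
            self.log.append((t, value))

    def on_output(self, t: float, replica_id: int, value: str) -> None:
        self.latest[replica_id] = value
        if replica_id == self.master:
            self._forward(t, value)  # react immediately, no waiting
        self.check(t)

    def check(self, t: float) -> None:
        """Compare the master with the majority; also callable from a timer."""
        if not self.latest:
            return
        top, count = Counter(self.latest.values()).most_common(1)[0]
        if count * 2 <= self.num_replicas or self.latest.get(self.master) == top:
            self.disagree_since = None  # no majority, or master agrees with it
        elif self.disagree_since is None:
            self.disagree_since = t     # start the grace period
        elif t - self.disagree_since >= self.grace:
            # Master outvoted for too long: replace it, forward the majority.
            self.master = next(r for r, v in self.latest.items() if v == top)
            self._forward(t, top)
            self.disagree_since = None


voter = MasterSlaveVoter(num_replicas=3, grace=1.0)
for t, replica, value in [(0.0, 0, "A"), (0.1, 1, "A"), (0.2, 2, "A"),
                          (5.0, 0, "C"), (5.3, 1, "C"), (5.4, 2, "C"),
                          (9.0, 0, "B")]:  # the master turns buggy at t = 9.0
    voter.on_output(t, replica, value)
voter.check(10.0)
print(voter.log)     # [(0.0, 'A'), (5.0, 'C'), (9.0, 'B'), (10.0, 'C')]
print(voter.master)  # 1
```

The grace period keeps the voter from replacing a master that is merely ahead of the others during convergence (t = 5.0 in the timeline), but in this sketch it also means a buggy master's output (B at t = 9.0) stays visible until the period expires.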

[Figure: outputs of Routing Instances I, II, and III over time; panel label: Continuous majority]
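Continuous majority can be sketched as re-voting on every replica output: the forwarded value changes only when a strict majority of replicas holds a new value, and stays put while there is no majority. `ContinuousMajorityVoter` is hypothetical, and the event timeline is made up, reusing only the A, B, C labels from the figure.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class ContinuousMajorityVoter:
    """Hypothetical continuous-majority voter for a single prefix."""

    def __init__(self, num_replicas: int) -> None:
        self.num_replicas = num_replicas
        self.latest: Dict[int, str] = {}        # replica id -> latest output
        self.output: Optional[str] = None       # value neighbors currently see
        self.log: List[Tuple[float, str]] = []  # (time, value) changes sent out

    def on_output(self, t: float, replica_id: int, value: str) -> None:
        """Re-vote whenever any replica's output changes."""
        self.latest[replica_id] = value
        top, count = Counter(self.latest.values()).most_common(1)[0]
        if count * 2 > self.num_replicas and top != self.output:
            self.output = top  # a new strict majority has formed
            self.log.append((t, top))
        # With no strict majority, keep forwarding the previous value.


voter = ContinuousMajorityVoter(num_replicas=3)
events = [(0.0, 0, "A"), (0.1, 1, "B"), (0.2, 2, "A"),
          (1.0, 1, "C"), (1.5, 0, "C"), (2.0, 2, "C")]
for t, replica, value in events:
    voter.on_output(t, replica, value)
print(voter.log)  # [(0.2, 'A'), (1.5, 'C')]
```

Nothing is forwarded until two of the three replicas agree, which adds delay but keeps a single replica's divergent output (B at t = 0.1) from ever reaching the neighbors.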


Simple Voting and Recovery

• Recovery
  – Hiding replica failure from neighboring routers
  – Hypervisor kills faulty instance, invokes new one

• Small, trusted software component
  – No parsing, treats data as opaque strings
  – Just 514 lines of code in voter implementation
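Recovery combines the voter's view of which replicas are outvoted with a restart. A minimal sketch, assuming each routing instance runs as a separate OS process: `minority_replicas`, `ReplicaSupervisor`, and the sleeping placeholder daemons are hypothetical, and bootstrapping the new instance's state is not shown.

```python
import subprocess
import sys
from collections import Counter
from typing import Dict, List


def minority_replicas(outputs: Dict[int, str]) -> List[int]:
    """Replicas whose output differs from a strict majority (none if no majority)."""
    if not outputs:
        return []
    top, count = Counter(outputs.values()).most_common(1)[0]
    if count * 2 <= len(outputs):
        return []
    return [r for r, v in outputs.items() if v != top]


class ReplicaSupervisor:
    """Hypothetical recovery logic: kill a faulty instance, start a new one.

    Assumption: neighbors only talk to the voter, so they never see the
    restart; the fresh instance's bootstrapping is left out of this sketch.
    """

    def __init__(self, cmds: List[List[str]]) -> None:
        self.cmds = cmds
        self.procs = [subprocess.Popen(cmd) for cmd in cmds]

    def recover(self, replica_id: int) -> None:
        old = self.procs[replica_id]
        old.kill()
        old.wait()  # reap the killed instance
        self.procs[replica_id] = subprocess.Popen(self.cmds[replica_id])

    def shutdown(self) -> None:
        for proc in self.procs:
            proc.kill()
            proc.wait()


# Placeholder "routing daemons": Python processes that just sleep.
daemon = [sys.executable, "-c", "import time; time.sleep(60)"]
supervisor = ReplicaSupervisor([daemon, daemon, daemon])
for r in minority_replicas({0: "IF 2", 1: "IF 1", 2: "IF 2"}):
    supervisor.recover(r)  # replaces replica 1 only
supervisor.shutdown()
```

Keeping this logic to "find the outvoted replica, restart it" is in the spirit of the slide's small trusted component: the supervisor needs no knowledge of the routing protocol itself.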



Prototype

• Prototype implementation
  – No modification of routing software
  – Simple, trusted hypervisor
  – Built on Linux with XORP and Quagga

• Evaluation environment
  – Evaluated on a 3 GHz Intel Xeon
  – BGP trace from Route Views, March 2007

• Evaluation metrics
  – Voting delay and fault rate of different voting algorithms
  – Delay of the hypervisor


Effectiveness of Voting

• Setup
  – 3 XORP and 3 Quagga routing instances
  – Inject bugs of realistic frequency and duration

  Voting algorithm       Avg voting delay (sec)   Fault rate
  Single router          -                        0.066%
  Master-slave           0.02                     0.0006%
  Continuous-majority    0.035                    0.00001%


Small Overhead

• Small increase in FIB pass-through time
  – Time from receiving an update to the resulting FIB change
  – Delay overhead of the hypervisor alone is 0.1% (0.06 sec)
  – Delay overhead of 5 routing instances is 4.6%

• Little effect on network-wide convergence
  – ISP networks from Rocketfuel, and cliques
  – Found no significant change in convergence (beyond the pass-through time)


Conclusion

• Seriousness of routing software bugs
  – Cause outages, misbehaviors, vulnerabilities
  – Violate protocol semantics, so not handled by traditional failure detection and recovery

• Software and data diversity (SDD)
  – Effective, has reasonable overhead

• Design and prototype of bug-tolerant router
  – Works with Quagga and XORP software
  – Low overhead, and small trusted code base


• More information at http://verb.cs.princeton.edu

• Thanks!

• Questions?
