Virtualization of network environment and study of dynamic routing protocols

A thesis submitted in fulfilment of the requirements

for the degree of Engineering of Telecommunications

Universitat Politecnica de Catalunya, 2014

Author: Jaume Tremolosa

Director: Jose L. Muñoz

Escola Tecnica Superior d’Enginyeria de Telecomunicacions de Barcelona

Universitat Politecnica de Catalunya

Spain

Contents

1 Preamble
  1.1 Road Map
  1.2 Part I - Virtualization
  1.3 Part II - Routing
  1.4 Part III - Appendix

I Virtualization

2 Introduction to Virtualization
  2.1 Introduction
  2.2 Types of Virtualization
  2.3 What is UML
  2.4 Building your UML Kernel and Filesystem
  2.5 Starting an UML Machine
  2.6 Copy on Write Filesystems
  2.7 Problems and Solutions
  2.8 Networking with UML

3 Virtual Network UML (VNUML)
  3.1 Introduction
  3.2 Preliminaries: XML
    3.2.1 Introduction
    3.2.2 XML Comments
    3.2.3 Escaping
    3.2.4 Well-formed XML
    3.2.5 Valid XML
  3.3 VNUML Overview
    3.3.1 VNUML DTD
    3.3.2 Structure of a VNUML Specification
    3.3.3 Simple Example
  3.4 VNUML language
    3.4.1 The Global Section
    3.4.2 The Section of Virtual Networks
    3.4.3 The Section of Virtual Machines
  3.5 The VNUML processor command: vnumlparser.pl

II Routing

4 Shortest Path Algorithms
  4.1 Bellman-Ford
    4.1.1 The Algorithm
    4.1.2 Example
    4.1.3 Proof of Correctness
    4.1.4 Distributed Version
  4.2 Dijkstra
    4.2.1 The Algorithm
    4.2.2 Example
    4.2.3 Proof of Correctness

5 RIP
  5.1 Introduction
  5.2 Routing Database
  5.3 Update Algorithm
    5.3.1 Static Topology
    5.3.2 Dynamic Topology
    5.3.3 Enhancements
  5.4 RIP Protocol
    5.4.1 RIP version 1
    5.4.2 RIP version 2
    5.4.3 RIPng
  5.5 Limitations of RIP
  5.6 RIP Practices
  5.7 Answers to practices

6 Linux/Quagga
  6.1 Architecture
  6.2 Routing daemons
  6.3 FIB and Host Forwarding Cache
  6.4 Longest Match Rule
  6.5 Administrative distances
  6.6 Quagga
    6.6.1 vtysh
    6.6.2 Static and Kernel Routes
  6.7 RIP Router
    6.7.1 Basic RIP
    6.7.2 Loopback Interfaces
    6.7.3 Routes in RIP

7 OSPF
  7.1 Introduction
    7.1.1 Link State
    7.1.2 Areas
    7.1.3 ABRs and ASBRs
    7.1.4 Basic Quagga: Adding Links
    7.1.5 Router Identifier (router-id)
  7.2 Broadcast Segments
    7.2.1 Flooding
    7.2.2 Designated Routers
    7.2.3 DR/BDR Election
  7.3 States & Packets
    7.3.1 DOWN State
    7.3.2 INIT State
    7.3.3 2-WAY State
    7.3.4 Hello Packets
    7.3.5 EXSTART State
    7.3.6 DD Packets
    7.3.7 LOADING State
    7.3.8 LSR Packets
    7.3.9 LSU Packets
    7.3.10 LSACK Packets
    7.3.11 FULL State
  7.4 Costs
    7.4.1 Set the Cost
    7.4.2 Load Balancing
  7.5 Basic LSAs
    7.5.1 Router-LSA (type-1)
    7.5.2 Network-LSA (type-2)
    7.5.3 ABR Summary LSA (type-3)
    7.5.4 AS External LSA (type-5)
    7.5.5 ASBR Location LSA (type-4)
    7.5.6 Link State IDs
    7.5.7 Advertising Router
    7.5.8 LS sequence numbers
    7.5.9 LS age
  7.6 Types of Areas
    7.6.1 Stub Area
    7.6.2 Totally Stub Areas
    7.6.3 Area Design
  7.7 Practices
  7.8 Answers to practices

8 Conclusions
  8.1 Virtual Network Laboratory Environment as Learning Environment
  8.2 Virtual Network Laboratory Environment as Working Environment
  8.3 RIP versus OSPF Conclusions
  8.4 Future work

III Appendices

A Simulation Tools
  A.1 A Wrapper for VNUML: simctl
  A.2 Installation
  A.3 Profile for simctl
  A.4 Simple Example Continued
  A.5 Getting Started with simctl
  A.6 Start and Stop Scenarios
  A.7 Troubleshooting
  A.8 Access to Virtual Machines
  A.9 Network Topology Information
  A.10 Managing and Executing Labels
  A.11 Install Software
  A.12 Drawbacks of Working with Screen

B Ubuntu in a Pen-drive
  B.1 Install
  B.2 Tuning the system

C Introduction to Unix/Linux
  C.1 Introduction to OS
  C.2 Resources Management
    C.2.1 History
    C.2.2 OS Rings and the Kernel
    C.2.3 System Calls
    C.2.4 Modules
  C.3 User Interaction
  C.4 Implementations and Distros
  C.5 Switching Users
  C.6 Installing Software
    C.6.1 Static and Dynamic Libraries
    C.6.2 Software Packages
    C.6.3 Advanced Package Management Systems
    C.6.4 Installing from the Source

List of Figures

2.1 Virtualization: a physical host and several guests
2.2 Types of Virtualization
2.3 Two UML Guests Connected to an uml_switch
2.4 Two UML Guests and the Phyhost Connected to an uml_switch
2.5 Guests Connected with a Bridge in the Hypervisor
2.6 Guests Connected with a Router (with or without NAT)

3.1 Simple Network Topology

4.1 Sample Topologies
4.2 Decentralized Bellman-Ford
4.3 Example of Distributed BF
4.4 Example Topology to Compute Dijkstra
4.5 Proof of Dijkstra

5.1 Basic RIP Update Algorithm
5.2 Basic RIP Update Algorithm with a Broken Link
5.3 The RIP Count To Infinity
5.4 The RIP Split Horizon
5.5 The RIP Split Horizon with Poisoned Reverse
5.6 Poisoned Reverse Drawbacks
5.7 Three Router Loop
5.8 Triggered Updates
5.9 The RIP-1 message format
5.10 RIP Version 2 (RIP-2) Message Format
5.11 RIPng Message Format
5.12 Scenario for testing RIP

6.1 Routing architecture with Linux and Quagga

7.1 OSPF Areas
7.2 Unicast Flooding
7.3 Updates with DR/BDR
7.4 Hello Messages of R2 in Different Networks
7.5 DD Messages during the EXSTART State
7.6 Messages exchanged during the LOADING State
7.7 OSPF Final States in a Broadcast Segment
7.8 External Routes with One or More ASBRs
7.9 Motivation for the Type 4 LSA
7.10 Stub and Totally Stub Areas Design
7.11 Basic network for configuring OSPF

A.1 Simple Network Topology (Continued)

B.1 First selection windows
B.2 Time zone and Keyboard layout
B.3 Advanced specification of disk partitions
B.4 Advanced specification of disk partitions 2
B.5 Account Information
B.6 Import documents and settings
B.7 Final installation step
B.8 Selecting the partition of the Boot loader
B.9 Firefox cache in /tmp

C.1 Origins of Linux
C.2 OS Rings
C.3 System Calls & Modular Design
C.4 Mainframe with Old Physical Terminals
C.5 Linux Terminals
C.6 How sudo works
C.7 Static and dynamic libraries

List of Tables

4.1 Example of Bellman-Ford
4.2 Example of Dijkstra

6.1 Example of administrative distances

C.1 dpkg and rpm
C.2 apt and yum

Acknowledgements

The development of this Master’s thesis has been a joint undertaking in the Department of Telematics Engineering. Some people have been instrumental in allowing this project to be completed. First and foremost, I would like to thank my supervisor, Jose Luis Muñoz, whose expertise, understanding and patience have been a great help in accomplishing the aims of the thesis. I really appreciate his vast knowledge and skills in many areas, and his assistance not only with the thesis but also with managing my enrollment, since I now live abroad.

I am deeply indebted to Juanjo Alins. Without his guidance and support with the VNUML simulator, I would never have been able to develop this thesis successfully.

Last but not least, my gratitude goes to my family. First of all, to my parents, Jaume Tremolosa and Encarnacio Rovira: it is difficult to find words to express my gratitude and thanks to both of you, especially to my mother, who undoubtedly would have wanted to be here at this special moment. I would also like to thank my wife, Ninoska Burgos, and my daughter, Abril Tremolosa, for their support. I am grateful to them for being patient when I was more obsessed with my work than with real life.

I realize that not all the people who contributed, directly or indirectly, to my study are mentioned on this page. From the bottom of my heart, I would like to thank all of you.

Chapter 1

Preamble

1.1 Road Map

Virtualization provides many benefits: greater efficiency in CPU utilization, greener IT with less power consumption, better management through central environment control, more availability, reduced project timelines by eliminating hardware procurement, improved disaster recovery capability, more central control of the desktop, and improved outsourcing services.

Nowadays, virtualization is a technology applied to share the capabilities of physical computers by splitting their resources among OSs. The concept of Virtual Machines (VMs) dates back to 1964 with an IBM project called CP/CMS. Currently, there are several virtualization techniques that can be used to support the execution of entire operating systems. We classify the virtualization techniques from the OS point of view. First, we discuss two techniques that execute modified guest OSs: operating system-level virtualization and para-virtualization. Second, we discuss techniques that execute unmodified guest OSs: binary translation and hardware-assisted virtualization. Finally, we present our choice, UML, a para-virtualization technique that helps us study the behaviour of dynamic routing protocols.

The thesis is organized in three parts. The first part provides the background on virtualization required to understand the thesis. In the second part, we introduce the best-known shortest path routing algorithms: Bellman-Ford and Dijkstra. These are needed to understand how the RIP and OSPF protocols work. Afterwards, we study the characteristics of both protocols in depth with the help of the virtualization techniques. Finally, the third part provides appendices with information about the simulation tool and about how to install and prepare the study environment. The last part can be read independently of the others, since understanding the internals of the modeling tool is not required to understand the simulation results.

1.2 Part I - Virtualization

Chapter 2 provides the background notions required to understand the thesis. We first give an overview of virtualization and explain the different types of virtualization from the OS point of view. Of all the techniques, we use User Mode Linux (UML), which was created as a kernel development tool to boot a kernel in the user space of another kernel. Then, we explain how to build a UML kernel and filesystem, the problems of working with UML and the solutions that can be used. Finally, we explain how to build a virtual TCP/IP network with UML guests.

In Chapter 3, we discuss VNUML (Virtual Network User Mode Linux), an open-source general-purpose virtualization tool designed to quickly define and test complex network simulation scenarios based on the User Mode Linux (UML) virtualization software. This is the environment we use later on to study dynamic routing protocols. In short, the VNUML framework is a tool made of two components: the VNUML language for describing simulations in XML and the VNUML interpreter for processing the VNUML language. Throughout the chapter, we explain the basics of VNUML in detail.

1.3 Part II - Routing

In the second part, we address the main purpose of the thesis: the study of dynamic routing protocols.

Chapter 4 explains the best-known shortest path algorithms: the Bellman-Ford (BF) algorithm and the Dijkstra algorithm. These algorithms allow us to find the shortest path from a source (s) to any destination in a given topology. We describe both of them in detail and provide examples.
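As a preview of the centralized Bellman-Ford algorithm covered in Chapter 4, the following is a minimal Python sketch and not the formulation used later in the thesis. The function name `bellman_ford`, the four-node topology and its link costs are hypothetical and chosen only for illustration.

```python
from math import inf


def bellman_ford(nodes, edges, source):
    """Return the cost of the shortest path from source to every node.

    edges is a list of directed (u, v, cost) tuples.
    """
    dist = {n: inf for n in nodes}
    dist[source] = 0
    # A shortest path has at most len(nodes) - 1 edges, so that many
    # relaxation rounds are enough.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                changed = True
        if not changed:  # converged early
            break
    # One extra pass: any further improvement means a negative-cost cycle.
    for u, v, cost in edges:
        if dist[u] + cost < dist[v]:
            raise ValueError("negative-cost cycle reachable from source")
    return dist


# Hypothetical topology (not taken from the thesis); links are bidirectional.
nodes = ["s", "a", "b", "c"]
links = [("s", "a", 1), ("s", "b", 4), ("a", "b", 2), ("b", "c", 1)]
edges = links + [(v, u, cost) for u, v, cost in links]

distances = bellman_ford(nodes, edges, "s")
print(distances)
```

In this sketch the direct link from s to b (cost 4) loses to the two-hop path through a (cost 3), which is the kind of decision the chapter examines step by step.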

In Chapter 5 we present the Routing Information Protocol (RIP), a dynamic routing protocol that can be used in small and medium IP networks. RIP is based on a distance-vector exchange and a distributed version of the Bellman-Ford algorithm. We discuss the information that the RIP routing database includes and explain the algorithm in detail. Afterwards, we explain some enhancements that deal with network dynamics in a RIP domain. Next, we describe the message format used by each of the three versions of RIP, as well as certain features that are not common to all versions. We begin with the original RIP, now known as RIP Version 1. Then we cover the updated version, called RIP Version 2 or RIP-2, and we discuss RIPng, the version of the protocol for IP version 6 (IPv6), also called RIPv6. Finally, we propose a couple of practices to get acquainted with the benefits of RIP. We use the Quagga environment, which is described in the next chapter.

Chapter 6 provides an overview of the routing process. The description tries to be rather generic, but in some aspects it is based on the Linux/Quagga implementation. We describe how dynamic routing protocols operate internally based on a Routing Information Base (RIB), and we explain that the Forwarding Information Base (FIB) contains the so-called “active routes”. We also introduce the concept of “administrative distances”, which decide which route to use when routes from different protocols have the same prefix and prefix length. Afterwards, we explain how Quagga helps to get familiar with the RIP protocol: basic commands, default route, etc.
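The interplay between administrative distances and the longest match rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the distance values in `ADMIN_DISTANCE` are commonly cited defaults and may differ from those in Table 6.1, and the helper names `build_fib` and `lookup`, as well as all prefixes and next hops, are hypothetical.

```python
import ipaddress

# Assumed, illustrative administrative distances (lower is preferred).
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ospf": 110, "rip": 120}


def build_fib(rib):
    """For each prefix, keep the route with the lowest administrative distance."""
    fib = {}
    for prefix, protocol, next_hop in rib:
        net = ipaddress.ip_network(prefix)
        best = fib.get(net)
        if best is None or ADMIN_DISTANCE[protocol] < ADMIN_DISTANCE[best[0]]:
            fib[net] = (protocol, next_hop)
    return fib


def lookup(fib, address):
    """Longest match rule: the most specific prefix containing address wins."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return best, fib[best]


# Hypothetical routes learned from several sources.
rib = [
    ("10.0.0.0/8", "rip", "192.168.1.2"),
    ("10.0.0.0/8", "ospf", "192.168.1.3"),   # same prefix: OSPF beats RIP
    ("10.1.0.0/16", "rip", "192.168.1.4"),   # more specific prefix
    ("0.0.0.0/0", "static", "192.168.1.1"),  # default route
]
fib = build_fib(rib)

for destination in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
    print(destination, "->", lookup(fib, destination))
```

Note that the administrative distance is only compared between routes for the same prefix; the more specific 10.1.0.0/16 route learned by RIP is still used for 10.1.2.3 even though OSPF has a lower distance for 10.0.0.0/8.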

In Chapter 7 we present Open Shortest Path First (OSPF), a link state protocol. Each router has a map of the network and runs Dijkstra’s algorithm to choose the shortest path to each destination. An OSPF router maintains three basic tables: the Neighbor table, the Topology table or Link State Database (LSDB), expressed as a set of Link State Advertisements (LSAs), and the Routing table, which is the RIB that contains the OSPF routes.

Then, we discuss how separating the network into smaller networks called areas helps to decrease routing overhead and speed up convergence in OSPF. Some important OSPF concepts are introduced: Autonomous System, Router Identifier (router-id), Area Border Routers (ABRs) and Autonomous System Boundary Routers (ASBRs). Afterwards, we explain that OSPF uses a Designated Router (DR), a Backup Designated Router (BDR) and multicast to reduce the flood of packets in the network. Then, we explain in detail all the OSPF neighbor states and discuss OSPF costs: unlike RIP, OSPF can use a cost different from the number of traversed hops.
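To illustrate why link costs matter, the following Python sketch runs Dijkstra’s algorithm over a small map of the network. It is a simplified illustration, not the OSPF route calculation itself; the routers R1 to R3, the link costs and the function name `dijkstra` are hypothetical.

```python
import heapq
from math import inf


def dijkstra(graph, source):
    """Return (dist, prev): path costs from source and each node's predecessor.

    graph maps every node to a dict {neighbor: link cost}; costs must be
    non-negative.
    """
    dist = {node: inf for node in graph}
    prev = {}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, cost in graph[u].items():
            candidate = d + cost
            if candidate < dist[v]:
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, prev


# Hypothetical topology: the direct R1-R3 link is slow (high cost).
graph = {
    "R1": {"R2": 10, "R3": 100},
    "R2": {"R1": 10, "R3": 10},
    "R3": {"R1": 100, "R2": 10},
}

dist, prev = dijkstra(graph, "R1")
print(dist)
print(prev)
```

With hop counts, R1 would reach R3 over the direct link (one hop). With these costs, the two-hop path through R2 (total cost 20) is preferred over the direct link (cost 100).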

OSPF has five types of packets: Hello packets, Database Description (DD) packets, Link State Request (LSR) packets, Link State Update (LSU) packets and Link State Acknowledgment (LSACK) packets. There are also five basic types of Link State Advertisements (LSAs): Router-LSA (type-1), Network-LSA (type-2), ABR Summary-LSA (type-3), ASBR Location LSA, also called ASBR-summary-LSA (type-4), and AS-external-LSA (type-5). We describe each of them in detail.

Area types help to control the advertisement of routes into an area. There are several types; two of the most widely used are stub areas and totally stub areas. We then answer the question of what these types of areas are useful for. To finish this chapter, we propose a couple of practices to get acquainted with the benefits of OSPF.

Finally, Chapter 8 presents the main conclusions and future work.

1.4 Part III - Appendix

Appendix A describes the simulation tools. With the aim of simplifying and extending the management capabilities of VNUML, the Department of Telematics Engineering (ENTEL) of the UPC has developed several modifications to the vnumlparser.pl of VNUML 1.8, a Bash wrapper called simctl and some other scripts. The modifications to vnumlparser.pl essentially (i) allow a virtual guest machine to have several consoles connected to several pts in the phyhost (mpts functionality) and (ii) allow the implementation of virtual networks with other virtual switches such as VDE (Virtual Distributed Ethernet).

Then, we show how to install the tools to build VNUML virtual networks and how to use the simctl wrapper.

Appendix B provides an overview of how to install Ubuntu on a USB pen-drive. Once the system has been installed, we configure it properly in order to extend the life of the USB device as much as possible.

In Appendix C we present some basic background about the Linux operating system. In short, an Operating System (OS) is a set of software whose purpose is to (i) manage the resources of a computer system while (ii) providing an interface for interaction with human beings. This background is useful to understand the basics of the tools used in the thesis and how to install them.

Part I

Virtualization

Chapter 2

Introduction to Virtualization

2.1 Introduction

Virtualization is a methodology for dividing the resources of a physical computer into multiple operating system (OS) environments. Virtualization techniques create multiple isolated guests, also called Virtual Machines (VMs) [1] or Virtual Environments (VEs), on a single physical server. Virtualized environments have three basic elements (see Figure 2.1):

• Hypervisor or Physical Host (Phyhost). This is the hardware, the operating system and any other software needed to run the virtual machines.

• Guests. These are the virtual systems running over the phyhost. A guest might be a traditional OS running just as if it were on a real host. To do so, the host emulates all the system calls for hardware. This makes the guests behave as if they were running on a real computer.

• Virtual switches. The virtual network is composed of virtual switches that connect the guests as in a real network. As an additional feature, the phyhost can provide connectivity for its guests, allowing them to exchange traffic with real networks such as the Internet.

[Diagram labels: hypervisor, phyhost OS, eth0, tap0, Virtual Switch, Virtual Network, guest1, guest2, Real Network (e.g. Internet)]

Figure 2.1: Virtualization: a physical host and several guests

The guests have to be accessible by some method: via a CLI, a GUI or some network service (SSH, etc.). In addition, the phyhost can have a virtual network interface (TUN/TAP interface) that can be connected to the virtual switch.

2.2 Types of Virtualization

There are several kinds of virtualization techniques which provide similar features but differ in the degree of abstraction and the methods used [2].

• Hardware Emulation or Virtual machines (VMs). This approach allows an hypervisor (phyhost)to run an arbitrary guest operating system. The guest OS is not modified and it is not aware that itis not running on real hardware. The main issue with this approach is that some OS instructionsrequire to be in supervisor mode and this causes problems since the guest OS is being executedin the user space of the hypervisor. As a result, we need a virtual machine monitor (VMM) inthe hypervisor to analyze executed code and to make it safe on-the-fly. This VMM is part of the“virtualization middleware”. Hardware emulation approach is used by VirtualBox, QEMU, Parallelsand Microsoft Virtual Server.

• Paravirtualization. In this virtualization approach most of the work of the VMM is implementedin the guest OS code, which is modified to avoid the use of privileged instructions. The paravirtu-alization technique also enables running different OSs on a single server, but requires them to beported, i.e. guest kernels must “know” that they are running in a user space of an hypervisor. Theparavirtualization approach is used by projects such as Wine and User Mode Linux (UML).

• Virtualization on the OS level, a.k.a. container virtualization. Most applications running on a server can easily share a machine with others, provided they can be isolated and secured. Further, in most situations, different operating systems are not required on the same server, merely multiple instances of a single operating system. OS-level virtualization systems have been designed to provide the required isolation and security to run multiple applications or copies of the same OS (but different distributions of the OS) on the same server. OpenVZ, Virtuozzo, Linux-VServer, Solaris Zones and FreeBSD Jails are examples of OS-level virtualization.

Some technologies, like VMware and Xen, support several virtualization techniques: both support hardware emulation and paravirtualization. The three techniques differ in complexity of implementation, OS support, level of access to common resources and performance in comparison with a standalone server. For example, hardware emulation has a wider scope of usage (many OSs), but poorer performance. Paravirtualization has better performance, but can support fewer OSs because these OSs have to be modified. Container virtualization also provides good performance and scalability compared to hardware emulation. Figure 2.2 shows a picture of the different virtualization types.

[Figure 2.2: Types of Virtualization. Three stacks, each on top of the hardware and a hypervisor kernel: virtualization middleware running a regular guest kernel; a paravirtual kernel (e.g. UML); and several virtual containers.]


2.3 What is UML

Of all the virtualization possibilities, we are going to use User Mode Linux (UML) [3]. UML was created as a kernel development tool to be able to boot a kernel in the user space of another kernel. So if a developer messes with the code and the kernel becomes unstable, it is not necessary to reboot the phyhost, just to kill the kernel process.

As you can observe in Figure 2.2, UML is a type of paravirtualization. In particular, UML is designed to be run over another Linux kernel, so UML does not require an intermediate virtualization layer or VMM in the phyhost. Notice that paravirtualization is less complex than hardware emulation but less flexible too: the guest has to be a UML kernel and the phyhost must run a Linux kernel (but a conventional kernel is enough).

2.4 Building your UML Kernel and Filesystem

To run a UML machine we need a compiled UML kernel and a Linux filesystem. The steps to build these elements are shown below.

UML Kernel

We will be working in the directory ~/uml. To create it:

phyhost:~$ mkdir ~/uml
phyhost:~$ cd ~/uml
phyhost:~/uml$

Download the kernel source code (see footnote 1) and copy it to ~/uml, then untar it and change into the new directory:

phyhost:~/uml$ tar -jxvf linux-XXX.tar.bz2
phyhost:~/uml$ cd linux-XXX
phyhost:~/uml/linux-XXX$

In this case, the version of the UML kernel that we are going to compile is XXX. Compiling a UML kernel follows the same procedure as compiling a standard kernel, except that every make command you type in the process must have the option ‘ARCH=um’ appended. To compile, you also need the package build-essential installed in the phyhost. Type the following commands:

phyhost:~/uml/linux-XXX$ sudo apt-get install build-essential
phyhost:~/uml/linux-XXX$ make mrproper ARCH=um
phyhost:~/uml/linux-XXX$ make defconfig ARCH=um
phyhost:~/uml/linux-XXX$ make ARCH=um

When this completes, you will have an executable file ‘linux’ in the ~/uml/linux-XXX/ directory. This is the UML kernel, adapted to run in user space. This kernel is quite big because we have not stripped the debug symbols from it. They may be useful in some cases, but for now we really do not need them, so let's remove this debugging info:

phyhost:~/uml/linux-XXX$ strip linux

The UML Kernel that we have compiled contains the default settings and it is prepared to use modules.

Footnote 1: You can download Linux kernels from http://www.kernel.org.


File System for UML

We will show how to create a basic root filesystem that can be used by the UML kernel. With these two elements you will have a fully functional Linux machine, running inside your system (phyhost), but fully isolated and independent. To create the new virtual filesystem, we use the debootstrap command. To install debootstrap, type:

phyhost# apt-get install debootstrap

Then, we create a 2 GB file to hold the new root filesystem. Create the empty file and format it as ext4:

phyhost$ cd ~/uml
phyhost:~/uml$ dd bs=1M if=/dev/zero of=debian7.fs count=2048
phyhost:~/uml$ mkfs.ext4 debian7.fs -F

Now, create a mount point, and mount this new file so we can begin to fill it up:

phyhost:~/uml$ mkdir image
phyhost:~/uml$ sudo mount -o loop debian7.fs image/

Use debootstrap to populate this directory with a basic Debian Wheezy:

phyhost:~/uml$ sudo debootstrap --arch i386 wheezy image/ ftp://ftp.de.debian.org/debian/

The previous command contacts the Debian archive servers and downloads the required packages to get a minimal Debian Wheezy (Debian 7) system installed. Additional packages, such as my preferred command line text editor vim, can be requested with debootstrap's --include option (e.g. --include=vim). Once it completes, if you list the directory image/ you will see a familiar Linux root system.

Kernel Modules

Before running the UML machine we have to install the modules of the kernel into the filesystem. With the image still mounted, type the following commands:

phyhost$ cd ~/uml/linux-XXX/
phyhost:~/uml/linux-XXX# make modules_install INSTALL_MOD_PATH=../image ARCH=um

The previous commands are equivalent to copying the kernel modules into the directory /lib/modules of the filesystem for the UML machine.

fstab

We must edit the file etc/fstab in the image/ directory to mount the root filesystem when the system boots. Open this file with your preferred text editor (e.g. gedit) and change the contents of the file to the following:

/dev/ubda / ext4 defaults 0 1
proc /proc proc defaults 0 0

Password for root

Finally, we must set the password for the root user. For this purpose, in a terminal we have to change the root filesystem to the one under the directory image/. This is accomplished with the chroot command:

phyhost:~/uml# chroot image
phyhost# passwd
<type your new UML root password here>
<repeat it>


Then to finish the configuration, we exit chroot and umount the mount point image/:

phyhost# exit
phyhost:~/uml# umount image

Update and Install Software

To update the system or install a package XXX use the following commands:

phyhost# mount -o loop debian7.fs image/
phyhost# cp /etc/resolv.conf image/etc/resolv.conf
phyhost# mount -t proc none image/proc
phyhost# chroot image

To install software in the system:

phyhost# apt-get update
phyhost# apt-get install XXX #install package

To finish:

phyhost# exit
phyhost# umount image/proc
phyhost# fuser -k image
phyhost# umount image

2.5 Starting an UML Machine

Let us show you how UML works. Firstly, you have to install the package uml-utilities, which can be installed in the phyhost system with the following command:

phyhost$ sudo apt-get install uml-utilities

Then, let us assume that you have compiled the Linux UML kernel and that you have created a filesystem (as explained in Section 2.4). It is not absolutely necessary, but let us make a copy of the UML kernel called uml-linux in the directory uml/:

phyhost:~/uml$ cp linux-XXX/linux uml-linux

Then, you can start the UML machine executing the following command:

phyhost:~/uml$ ./uml-linux ubda=debian7.fs mem=128M

Note. If the previous command does not work, go to Section 2.7.

The previous command executes the UML kernel in the user space of the phyhost using the filesystem debian7.fs. Notice that we have also specified the amount of RAM that is going to be used (128 megabytes). This is a minimal configuration, but UML kernels support a large number of parameters.

2.6 Copy on Write Filesystems

Now, let us examine how to boot two virtual guests at the same time. We could try to open two terminals and execute the previous command twice, but if both kernels operate on the same filesystem we are in trouble, because the filesystem will end up in an unpredictable state. A naive solution would be to copy the filesystem to another file and start a couple of UML kernel processes, each using a different filesystem file. A better solution is to use the UML technology called COW (Copy-On-Write). COW allows changes to a filesystem to be stored in a file in the phyhost separate from the filesystem itself. This has two advantages:


• We can start two UML kernels from the same filesystem.

• Undoing changes to a filesystem is simply a matter of deleting the COW file (the file that contains the changes).

Now, let’s fire up our UML kernels with COW. This is achieved basically using the same command line as before, with a couple of changes:

phyhost:~/uml$ ./uml-linux ubda=cowfile1,debian7.fs mem=128M

If the COW file “cowfile1” does not exist, the previous command will create and initialize it. Once the COW file has been initialized, it can be used alone in the command line:

phyhost:~/uml$ ./uml-linux ubda=cowfile1 mem=128M

The name of the backing file (“debian7.fs”) is stored in the COW file header, so it would be redundant to continue specifying it on the command line. The normal way to create a COW file is to specify a non-existent COW file on the UML command line and let UML create it for you. However, if you want to create a new COW file without booting the UML machine, you can use the uml_mkcow command. This command comes with the uml-utilities package. To build a COW file, you can type:

phyhost:~/uml$ uml_mkcow cowfile1 debian7.fs

Finally, in another terminal we can fire up our second UML kernel with another COW file (cowfile2):

phyhost:~/uml$ ./uml-linux ubda=cowfile2,debian7.fs mem=128M umid=uml1

When you have finished, simply type ‘halt’ to stop. You can even ‘reboot’ and do pretty much anything else without affecting the phyhost system in any way.
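The COW idea described above can be illustrated with a small Python sketch. This is only a toy model under our own assumptions (the class CowDisk and its methods are hypothetical names, and blocks are plain byte strings); it does not reproduce the on-disk COW format used by UML. It shows the two properties mentioned in this section: several guests can share one backing filesystem, and discarding the private changes restores the original contents.

```python
class CowDisk:
    """Toy copy-on-write block device (hypothetical model, not UML's format)."""

    def __init__(self, backing_blocks):
        # Shared, read-only backing store (plays the role of debian7.fs).
        self._backing = backing_blocks
        # Private overlay holding only the blocks this guest has written
        # (plays the role of cowfile1 or cowfile2).
        self._overlay = {}

    def read(self, index):
        # Prefer the private copy; otherwise fall back to the shared block.
        if index in self._overlay:
            return self._overlay[index]
        return self._backing[index]

    def write(self, index, data):
        if not 0 <= index < len(self._backing):
            raise IndexError("block index out of range")
        # Writes never touch the backing store.
        self._overlay[index] = data

    def discard_changes(self):
        # Comparable in spirit to deleting the COW file.
        self._overlay.clear()

    def changed_blocks(self):
        return len(self._overlay)


# An immutable tuple stands in for the shared filesystem image.
backing = (b"boot", b"etc", b"home")
guest1 = CowDisk(backing)
guest2 = CowDisk(backing)

guest1.write(1, b"etc-modified-by-guest1")
print(guest1.read(1))   # guest1 sees its own change
print(guest2.read(1))   # guest2 still sees the shared block
guest1.discard_changes()
print(guest1.read(1))   # back to the shared block
```

Note that the overlay only grows with the number of blocks a guest actually modifies, which is why each COW file is typically much smaller than the backing filesystem.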

2.7 Problems and Solutions

To start a UML guest you must be sure that the UML kernel has permission to be executed and that the filesystem has permission to be written. To be sure that these permissions are granted, type:

phyhost$ chmod u+x uml-linux
phyhost$ chmod u+w debian7.fs

If something goes wrong while the UML guest is booting, the kernel process might go into a bad state. In this case, the best way to “clean” the system is to kill all the processes generated while booting. In our case, as the UML kernel is called uml-linux, to kill all these processes we can type the following:

phyhost$ killall uml-linux

In addition, it might also be necessary to remove the COW files and all the UML-related files (~/.uml is a directory, hence the -r option):

phyhost$ rm cowfile?
phyhost$ rm -r ~/.uml

Finally, unless otherwise stated, all the UML programs have to be launched with your unprivileged user (not with the root user).

2.8 Networking with UML

Virtual Switch

In this section, we explain how to build a virtual TCP/IP network with UML guests. To build the virtual network we will use the uml_switch application (which is in the uml-utilities package). An uml_switch can be defined as a virtual switch or a software switch for connecting guests. UML instances internally use Ethernet interfaces which are connected to the uml_switch. This connection uses a Unix domain socket (see footnote 2) on the phyhost (see Figure 2.3). In three different terminals (t1, t2 and t3) type the following commands to start two UML machines with COW connected to an uml_switch:

[Figure 2.3: Two UML Guests Connected to an uml_switch. The diagram shows the UML1 and UML2 guests attached through their eth0 interfaces to the uml_switch, which listens on the socket /tmp/uml.ctl.]

t1-phyhost$ uml_switch
uml_switch attached to unix socket '/tmp/uml.ctl'

t2-phyhost:~/uml$ ./uml-linux ubda=cowfile1 mem=128M eth0=daemon

t3-phyhost:~/uml$ ./uml-linux ubda=cowfile2 mem=128M eth0=daemon

Once you have the two UML guests running, you can log in with your user/password (root/xxxx) and configure your IP address and subnet mask using ifconfig. For example:

UML1$ ifconfig eth0 192.168.0.1 netmask 255.255.255.0

UML2$ ifconfig eth0 192.168.0.2 netmask 255.255.255.0

Then, you can try a ping from one UML guest to the other one:

UML1$ ping 192.168.0.2
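To give an intuition of what a software switch does with the frames it receives from the guests, the following Python sketch models a generic learning Ethernet switch. It is an illustration under our own assumptions: the class LearningSwitch, the port names and the MAC addresses are hypothetical, and the code does not describe the internals of uml_switch.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Generic learning switch (illustrative model, not uml_switch itself)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # source MAC address -> port where it was seen

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports through which the frame is forwarded."""
        # Learn (or refresh) the location of the sender.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Unknown or broadcast destination: flood to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the same port as the sender: nothing to do.
            return []
        return [out_port]


switch = LearningSwitch(["uml1", "uml2", "tap0"])
# uml1 sends a broadcast (e.g. an ARP request): the frame is flooded.
print(switch.handle_frame("uml1", "02:00:00:00:00:01", BROADCAST))
# uml2 answers uml1: the switch already knows where uml1 is.
print(switch.handle_frame("uml2", "02:00:00:00:00:02", "02:00:00:00:00:01"))
```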

Connecting Phyhost

Now, our goal is to enable network communications between the phyhost and the UML guests (see Figure 2.4).

[Figure 2.4: Two UML Guests and the Phyhost Connected to an uml_switch. The diagram shows the UML1 and UML2 guests (eth0) and the host OS (tap0) attached to the uml_switch, which listens on the socket /tmp/uml.ctl.]

Footnote 2: Simplifying, a Unix socket is like connecting to a file.


For this purpose, we need to create a virtual Ethernet network interface in the phyhost and then connect this virtual interface to the uml_switch. The command to create the special network interface is tunctl, which is included in the uml-utilities package. This command has to be executed as root (or with sudo) and you have to indicate which user is going to be able to read/write over this virtual interface.

phyhost# tunctl -u telematics -t tap0
Set 'tap0' persistent and owned by uid 1000

The previous command line creates a special Ethernet interface called tap0 and enables the user telematics to read/write on tap0. Obviously, you must replace telematics with your unprivileged username. Then, we can simply run the uml_switch (with your unprivileged user), connecting the virtual switch to the special interface tap0 we have just created. For this, type:

phyhost$ uml_switch -tap tap0
uml_switch attached to unix socket '/tmp/uml.ctl' tap device 'tap0'
New connection

Now, you can give an IP address and a mask to the tap0 interface, start the UML guests and try a ping from the phyhost to a UML guest:

phyhost# ifconfig tap0 192.168.0.3 netmask 255.255.255.0
phyhost# ping 192.168.0.1

If everything is configured correctly, the ping will succeed.
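One condition for the ping to succeed without any router is that tap0 and the guest interfaces belong to the same IP subnet. As a quick sanity check, the following Python sketch (the helper same_subnet is our own hypothetical name) uses the standard ipaddress module to verify this for the addresses and netmask used in this section.

```python
import ipaddress


def same_subnet(addresses, netmask):
    """Return True if all addresses fall in the same network for the given mask."""
    networks = {
        ipaddress.ip_interface(f"{address}/{netmask}").network
        for address in addresses
    }
    return len(networks) == 1


# Addresses configured in this section: UML1, UML2 and tap0 in the phyhost.
addresses = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
print(same_subnet(addresses, "255.255.255.0"))
# A host configured with 192.168.1.3 would be in a different /24 network.
print(same_subnet(["192.168.0.1", "192.168.1.3"], "255.255.255.0"))
```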

Multiple Switches

If you want multiple switches on the host, then the UNIX domain sockets have to be different for each switch. A different socket can be specified with:

phyhost$ uml_switch -unix /tmp/uml2.ctl

In order to attach to this switch, the same socket must be provided to the UML network driver:

phyhost:~/uml$ ./uml-linux ubda=cowfile2 mem=128M eth0=daemon,,unix,/tmp/uml2.ctl

Interfaces tun/tap

Tun/tap interfaces are software-only interfaces, that is, they exist only in the kernel and, unlike regular network interfaces, they have no physical hardware component. Thus, there is no physical “wire” connected to them. You can think of a tun/tap interface as a regular network interface that, when the kernel decides that the moment has come to send data “on the wire”, instead sends the data to a user-space program that is attached to the interface. When the program attaches to the tun/tap interface, it gets a special file descriptor; reading from this descriptor returns the data that the interface is sending out. In a similar fashion, the program can write to this special descriptor, and the data will appear as input to the tun/tap interface. To the kernel, it looks as if the tun/tap interface were receiving data “from the wire”.

The difference between a tap interface and a tun interface is that:

• A tap interface outputs and must be given full Ethernet frames.

• A tun interface outputs and must be given “raw” IP packets.

Once a tun/tap interface is in place, it can be used just like any other interface. IP addresses can be assigned, its traffic can be analyzed, firewall rules can be created, routes pointing to it can be established, etc.
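The difference between the two kinds of interface can be made concrete by looking at the bytes a user-space program would read from the file descriptor. The following Python sketch builds a sample Ethernet frame by hand (the MAC addresses and the minimal IPv4 header are made-up values) and parses it; it does not open a real tun/tap device, which would require privileges. With a tap interface the program would receive the whole frame, whereas with a tun interface it would receive only the IP packet carried inside.

```python
import struct

ETHERTYPE_IPV4 = 0x0800


def format_mac(raw):
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame):
    """Split a full Ethernet frame (what a tap interface delivers)."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is 14 bytes long")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "ethertype": ethertype,
        "payload": frame[14:],
    }


def ip_version(packet):
    """Read the version field of a raw IP packet (what a tun interface delivers)."""
    return packet[0] >> 4


# Made-up sample data: a minimal 20-byte IPv4 header (version 4, IHL 5).
ip_packet = bytes([0x45]) + bytes(19)
frame = (
    bytes.fromhex("ffffffffffff")      # destination MAC (broadcast)
    + bytes.fromhex("020000000001")    # source MAC (hypothetical guest)
    + struct.pack("!H", ETHERTYPE_IPV4)
    + ip_packet
)

header = parse_ethernet_header(frame)
print(header["src"], "->", header["dst"], hex(header["ethertype"]))
print("IP version of the payload:", ip_version(header["payload"]))
```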


More Networking with Guests

One way of enabling external communications for guests is to use a bridge in the hypervisor (see Figure 2.5). With this configuration, guests can send their L2 (Ethernet) frames outside the virtual network. A frame originated by a guest is switched by the virtual switch, enters the tap interface of the phyhost and is finally switched to a physical interface. Through this physical interface (eth0 in our example) the frame can leave the hypervisor and reach a physical network. In this case, a tap interface must be used because we need an interface that understands frames. The main advantage of this configuration is that guests can use any protocol that can be encapsulated in an Ethernet frame (which is practically any existing protocol).

[Figure 2.5: Guests Connected with a Bridge in the Hypervisor. The diagram shows guest1 and guest2 (eth0) attached to the virtual switch of the virtual network; in the phyhost OS, an OS bridge (brctl) joins tap0 and the physical eth0, so the bridged guests reach the physical network.]

Another possibility is to make the hypervisor behave as a router (see Figure 2.6). In this case, the hypervisor (router) forwards the IP datagrams originated by the guests. This is done by re-encapsulating the IP packets in new frames that are sent through the output physical interface (Ethernet frames through eth0 in our example). The router can also perform NAT on the IP packets sent by the guests to allow access to the public network (Internet).

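[Figure 2.6: Guests Connected with a Router (with or without NAT). The diagram shows guest1 and guest2 (eth0) attached to the virtual switch of the virtual network; the phyhost OS acts as a router between tap0 and the physical eth0, either with routing only or with routing + NAT, towards the physical network.]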

Chapter 3

Virtual Network UML (VNUML)

3.1 Introduction

Implementing large UML networks by hand is hard and error-prone. For this reason, it is very helpful to have some systematic way of defining these networks. In this respect, a research group of the UPM (Universidad Politecnica de Madrid), Spain, has developed a virtualization tool that allows you to easily define and run virtual networks using UML [4]. The related project is called VNUML (Virtual Network User Mode Linux; see footnote 1). According to its authors, the purpose of VNUML is:

«VNUML (Virtual Network User Mode Linux) is an open-source general purpose virtualization tool designed to quickly define and test complex network simulation scenarios based on the User Mode Linux (UML) virtualization software. VNUML is a useful tool that can be used to simulate general Linux based network scenarios. It is aimed to help in testing network applications and services over complex testbeds made of several nodes (even tenths) and networks inside one Linux machine, without involving the investment and management complexity needed to create them using real equipment.»

In short, the VNUML framework is a tool made of two components:

• A VNUML language for describing simulations in XML (Extensible Markup Language).

• A VNUML interpreter for processing the VNUML language.

Using the VNUML language, the user can write a simple text file describing the elements of the VNUML scenario, such as virtual machines, virtual switches and the interconnection topology. Then, the user can use the VNUML interpreter, called vnumlparser.pl, to read the VNUML file and to run/manage the virtual network scenario. This scheme also hides the complex details of UML from the user. In the following sections, we describe VNUML and how to use it.

3.2 Preliminaries: XML

3.2.1 Introduction

Footnote 1: There are other projects like NetKit from the University of Rome. A new project called VNX has also been proposed as an evolution of VNUML, but we will still use VNUML.

The essential component of VNUML is its specification language. As the VNUML language is based on XML, in this section we provide a brief overview of this technology. XML (eXtensible Markup Language) defines a set of rules for encoding documents in a readable form. An XML document is a “text” file, i.e. a string of characters coded with UTF-8 or with an ISO standard like ISO-8859-1 (Latin1). The characters which make up an XML document are divided into markup and content. All strings which constitute markup either begin with the character "<" and end with a ">", or begin with the character "&" and end with a ";". Strings which are not markup are content. In particular, a tag is a markup construct that begins with "<" and ends with ">". Tags come in three flavors:

• start-tags, for example <section>

• end-tags, for example </section>

• empty-element tags, for example <line-break />

Another special component in an XML file is the element. An element is a logical document component that either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start-tag and the end-tag, if any, are the element’s content. The element content may also contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. A more elaborate example is the following:

<person>
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>

Finally, the attribute of an element is a markup construct consisting of a name="value" pair that exists within a start-tag or empty-element tag. For example, the above person record can be modified using attributes to add the age and the gender to the person definition:

<person age="17" gender="male">
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>
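To see how a program views elements, child elements and attributes, the person record can be loaded with an XML parser. The following sketch uses Python's standard xml.etree.ElementTree module; the variable names are our own choice.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """\
<person age="17" gender="male">
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>
"""

person = ET.fromstring(PERSON_XML)

# The root element, its attributes and its direct child elements.
print(person.tag)
print(person.attrib)
child_tags = [child.tag for child in person]
print(child_tags)

# Content of nested elements, addressed by a simple path.
print(person.findtext("nif"))
print(person.findtext("name/first"), person.findtext("name/last"))
```

Note that attribute values are always returned as strings; interpreting the age as an integer is up to the application.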

3.2.2 XML Comments

You can use comments to leave a note or to temporarily edit out a portion of XML code. Although XML is supposed to be self-describing data, you may still come across some instances where an XML comment might be necessary. XML comments have exactly the same syntax as HTML comments: they start with "<!--" and end with "-->". Below is an example of a comment of the kind you would use to leave a note to yourself or to someone who may be viewing your XML.

<person age="17" gender="male">
  <!-- Peter is a really nice person -->
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>

3.2.3 Escaping

XML uses several characters in special ways as part of its markup, in particular the less-than symbol (<), the greater-than symbol (>), the double quotation mark ("), the apostrophe (’), and the ampersand (&). But what if you need to use these characters in your content, and you don’t want them to be treated as part of the markup by XML processors? For this purpose, XML provides escape facilities for including characters which are problematic to include directly. These escape facilities to reference problematic characters or “entities” are implemented with the ampersand (&) and semicolon (;). There are five predefined entities in XML:

• &amp; refers to an ampersand (&)

• &lt; refers to a less-than symbol (<)

• &gt; refers to a greater-than symbol (>)

• &apos; refers to an apostrophe symbol (’)

• &quot; refers to a quotation symbol (")

For example, suppose that our XML file should contain the following text line:

<command> echo "1" >/proc/sys/net/ipv4/ip_forward </command>

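The greater-than character in the previous line can be mistaken for markup (strictly speaking, XML only requires escaping the less-than symbol and the ampersand in element content, but escaping the greater-than symbol is always safe). To avoid our XML parser being confused by it, we use:

<command> echo "1" &gt; /proc/sys/net/ipv4/ip_forward </command>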

In the same way, the quotation mark (") might be problematic if you need to use it inside an attribute. In this case, you have to escape this symbol. Notice, however, that escaping the quotation mark is not necessary in our previous example, since the quotation mark appears inside the content of the element (and not in the value of an attribute).
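Escaping is rarely done by hand when XML is generated by a program. As a sketch, Python's standard xml.sax.saxutils.escape function replaces &, < and > by default, and accepts an extra dictionary to escape the quotation mark when the text goes inside an attribute value. The element name exec and the attribute name cmd used below to test the attribute case are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

command = 'echo "1" >/proc/sys/net/ipv4/ip_forward'

# Element content: &, < and > are escaped; quotation marks can stay.
escaped_content = escape(command)
print(escaped_content)

# Attribute value delimited by double quotes: also escape the quotation mark.
escaped_attr = escape(command, {'"': "&quot;"})
print(escaped_attr)

# A parser undoes the escaping and returns the original text in both cases.
as_content = ET.fromstring(f"<command>{escaped_content}</command>")
as_attribute = ET.fromstring(f'<exec cmd="{escaped_attr}"/>')
print(as_content.text == command, as_attribute.get("cmd") == command)
```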

3.2.4 Well-formed XML

A “well-formed” XML document is a text document that satisfies the list of syntax rules provided in the XML specification. The list of syntax rules is fairly lengthy but some key rules are the following:

• The document contains only properly encoded legal Unicode characters.

• None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.

• The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.

• The element tags are case-sensitive; the beginning and end tags must match exactly.

• Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~ nor a space character, and cannot start with - (dash), . (point), or a numeric digit.

• There must be a single "root" element that contains all the other elements.

3.2.5 Valid XML

In addition to being well-formed, an XML document has to be “valid”. This means that all the elements and attributes used in the XML document must be in the set defined in the language specification and must be used correctly. For example, if we define a language specification for a person registry, we can define the elements: person, nif, name, first, last. We can also define the person attributes age and gender, and the type of values for each of the attributes (e.g. the age attribute is an integer number and the gender attribute has a value inside the set {male, female}). We might also define the order in which elements can appear and the nesting rules.

To address all these issues, XML defines a special file called a “Document Type Definition” (DTD) file. A DTD file defines an XML specification language, including all the elements, attributes and grammatical rules. Finally, the DTD file is used by XML processors to check whether an XML document is “valid”.
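To make the difference between well-formed and valid more tangible, the following Python sketch checks the rules of the person registry by hand. It is not a DTD validator (Python's standard library does not include one): the table ALLOWED_CHILDREN and the function validation_errors are hypothetical, and the exact order of child elements is our own assumption about the registry language.

```python
import xml.etree.ElementTree as ET

# Assumed nesting rules: each element and the exact children it must contain.
ALLOWED_CHILDREN = {
    "person": ["nif", "name"],
    "name": ["first", "last"],
    "nif": [],
    "first": [],
    "last": [],
}


def validation_errors(xml_text):
    """Return a list of rule violations (empty if the document is 'valid')."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    errors = []
    if root.tag != "person":
        errors.append(f"root element must be <person>, found <{root.tag}>")
    age = root.get("age")
    if age is not None and not age.isdigit():
        errors.append(f"age must be an integer, found {age!r}")
    gender = root.get("gender")
    if gender is not None and gender not in {"male", "female"}:
        errors.append(f"gender must be male or female, found {gender!r}")
    for element in root.iter():
        expected = ALLOWED_CHILDREN.get(element.tag)
        if expected is None:
            errors.append(f"unknown element <{element.tag}>")
            continue
        found = [child.tag for child in element]
        if found != expected:
            errors.append(f"<{element.tag}> must contain {expected}, found {found}")
    return errors


valid = ('<person age="17" gender="male"><nif>46117234</nif>'
         '<name><first>Peter</first><last>Scott</last></name></person>')
invalid = ('<person age="seventeen" gender="male"><nif>46117234</nif>'
           '<name><first>Peter</first></name></person>')

print(validation_errors(valid))
print(validation_errors(invalid))
```

Both documents above are well-formed; only the first one respects the rules of the (assumed) registry language.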


3.3 VNUML Overview

3.3.1 VNUML DTD

The VNUML language (see footnote 2) defines a set of elements, their corresponding attributes and the way these elements have to be used inside an XML document for it to be "valid". In Code 3.1 we show the beginning of the DTD file of the VNUML language.

<!-- VNUML DTD version 1.8 -->
<!ELEMENT vnuml (global,net*,vm*,host?)>
<!ELEMENT global (version,simulation_name,ssh_version?,ssh_key*,automac?,netconfig?,vm_mgmt?,
                  tun_device?,vm_defaults?)>
<!ELEMENT vm_defaults (filesystem?,mem?,kernel?,shell?,basedir?,
                       mng_if?,console*,xterm?,route*,forwarding?,user*,filetree*)>
...

Code 3.1: Beginning of the VNUML DTD file.

In the previous DTD file we can see several quantifiers. A quantifier in a DTD file is a single character that immediately follows the item to which it applies, to restrict the number of successive occurrences of this item at the specified position in the content of the element. The quantifier may be either:

• + for specifying that there must be one or more occurrences of the item. The effective content of each occurrence may be different.

• * for specifying that any number (zero or more) of occurrences are allowed. The item is optional and the effective content of each occurrence may be different.

• ? for specifying that there must not be more than one occurrence. The item is optional.

• If there is no quantifier, the specified item must occur exactly one time at the specified position in the content of the element.
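DTD quantifiers have the same meaning as the +, * and ? operators of regular expressions. The following Python sketch exploits this to check a sequence of child element names against the content model of the vnuml element shown in Code 3.1. The functions content_model_to_regex and children_match are hypothetical helpers of our own, and they only handle flat, comma-separated content models like the ones in Code 3.1.

```python
import re

# Content model taken from: <!ELEMENT vnuml (global,net*,vm*,host?)>
VNUML_CONTENT_MODEL = "(global,net*,vm*,host?)"


def content_model_to_regex(model):
    """Translate a flat, comma-separated DTD content model into a regex."""
    parts = []
    for item in model.strip("()").split(","):
        item = item.strip()
        quantifier = item[-1] if item[-1] in "+*?" else ""
        name = item.rstrip("+*?")
        # Each child name is followed by a space so that names cannot run together.
        parts.append(f"(?:{re.escape(name)} ){quantifier}")
    return re.compile("".join(parts))


def children_match(model, child_names):
    """Check whether a list of child element names satisfies the content model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match(VNUML_CONTENT_MODEL, ["global", "net", "net", "vm", "vm", "vm"]))
print(children_match(VNUML_CONTENT_MODEL, ["global"]))          # net*, vm*, host? are optional
print(children_match(VNUML_CONTENT_MODEL, ["net", "vm"]))       # global is mandatory
print(children_match(VNUML_CONTENT_MODEL, ["global", "vm", "net"]))  # wrong order
```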

3.3.2 Structure of a VNUML Specification

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vnuml SYSTEM "/usr/share/xml/vnuml/vnuml.dtd">
<vnuml>
  <!-- Global definitions -->
  <global>
    .......
  </global>
  <!-- Network definitions -->
  <net name="Net0" .... />
  ...
  <!-- Virtual machines definition -->
  <vm name="uml1">
    ...
  </vm>
  ....
</vnuml>

Code 3.2: Structure of a VNUML file.

According to the VNUML DTD file, a VNUML specification document has the structure shown in Code 3.2. The first two lines of the VNUML file are the usual XML prolog: the first line declares the XML version and the text encoding scheme used, and the second line tells the processor where to find the corresponding DTD file.

Footnote 2: For an extensive description of the VNUML language see http://neweb.dit.upm.es/vnumlwiki/index.php/Reference

Following the first two lines, the main body of the virtual network definition is inside the element "vnuml". Inside the <vnuml> tag we find, in first place, the global definitions section, which is marked with the <global> tag. This tag groups all global definitions for the simulation that do not fit inside other, more specific tags. Following the global definitions section, we can find zero or more definitions of virtual networks. These definitions use the tag <net>. Each network created with <net> is a point of interconnection of virtual machines. This point of interconnection is implemented with a virtual switch like uml_switch. The third part of a VNUML specification is devoted to the definition of virtual machines. In this part, we can include zero or more definitions of UML virtual machines using the <vm> tag. The last part of a VNUML specification is devoted to defining actions to be performed in the host using the <host> tag. This part is optional and we are not going to use it.

3.3.3 Simple Example

Before going deeper into the details and possibilities of the VNUML language, we show a simple and self-explanatory example to illustrate the VNUML working cycle. The design phase for this example is described by the topology in Figure 3.1. As shown, the topology is composed of two networks, Net0 and Net1, built with two soft switches, switch0 and switch1, and three virtual machines, uml1, uml2 and uml3, where the machine uml2 is connected to both networks.

[Figure 3.1: Simple Network Topology. The diagram shows uml1 (eth1) and uml2 (eth1) attached to Net0 (switch0), and uml2 (eth2) and uml3 (eth1) attached to Net1 (switch1).]

Now, we have to write the VNUML file describing this topology. As mentioned, this is an XML file whose syntax and semantics are described by the VNUML language. Code 3.3 shows a VNUML file that describes the design of Figure 3.1.



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vnuml SYSTEM "/usr/share/xml/vnuml/vnuml.dtd">
<vnuml>
  <!-- Global definitions -->
  <global>
    <version>1.8</version>
    <simulation_name>simple_example</simulation_name>
    <automac/>
    <vm_mgmt type="none" />
    <vm_defaults exec_mode="mconsole">
      <filesystem type="cow">/usr/share/vnuml/filesystems/root_fs_tutorial</filesystem>
      <kernel>/usr/share/vnuml/kernels/linux</kernel>
      <console id="0">xterm</console>
    </vm_defaults>
  </global>
  <!-- Network definitions -->
  <net name="Net0" mode="uml_switch" />
  <net name="Net1" mode="uml_switch" />
  <!-- Virtual machines definition -->
  <vm name="uml1">
    <if id="1" net="Net0"></if>
  </vm>
  <vm name="uml2">
    <if id="1" net="Net0"></if>
    <if id="2" net="Net1"></if>
  </vm>
  <vm name="uml3">
    <if id="1" net="Net1"></if>
  </vm>
</vnuml>

Code 3.3: VNUML File for a Simple Example

Code 3.3 is rather self-explanatory, but we are going to discuss it briefly. The global section states that the version of the VNUML language is 1.8 and that the simulation name is “simple_example”. The MAC addresses of the virtual machines are auto-generated. The vm_mgmt tag with the attribute type="none" means that the virtual machines are not accessed via a management network shared with the host. Instead, virtual machines are accessed via a console (attribute exec_mode="mconsole"). This console is the tty0 of the guest, and an xterm terminal is used in the host to connect to tty0 in the guest. Virtual machines use COW and all of them use the same root filesystem and kernel. In the virtual networks section, we describe two networks, each of which uses an uml_switch to connect its machines. Finally, in the virtual machines section, we describe three virtual machines with names uml1, uml2 and uml3. The machine uml1 has an Ethernet NIC called eth1 which is “connected” to network Net0, uml3 has an Ethernet NIC called eth1 connected to network Net1, and uml2 has two Ethernet NICs called eth1 and eth2, which are connected to networks Net0 and Net1 respectively. Now, if we save the previous VNUML description in a file called simple_example.xml, we can run the scenario using the "-t" option of vnumlparser.pl:

phyhost$ vnumlparser.pl -t simple_example.xml

This command builds the virtual network topology described in simple_example.xml and boots all three virtual machines defined. Once you have finished playing around with the simulation scenario, you can release it using the "-d" option of vnumlparser.pl:

phyhost$ vnumlparser.pl -d simple_example.xml
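Since a VNUML specification is plain XML, the topology it describes can also be inspected with any XML library before running the scenario. The following Python sketch is our own illustration and is not part of VNUML: the function topology is a hypothetical helper, and the embedded document is a shortened copy of Code 3.3 (prolog and most global settings omitted). It lists which virtual machine interfaces are attached to each network, assuming, as in the discussion above, that the interface with id N appears in the guest as ethN.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Code 3.3, kept only for illustration.
SIMPLE_EXAMPLE = """\
<vnuml>
  <global>
    <version>1.8</version>
    <simulation_name>simple_example</simulation_name>
  </global>
  <net name="Net0" mode="uml_switch" />
  <net name="Net1" mode="uml_switch" />
  <vm name="uml1">
    <if id="1" net="Net0"></if>
  </vm>
  <vm name="uml2">
    <if id="1" net="Net0"></if>
    <if id="2" net="Net1"></if>
  </vm>
  <vm name="uml3">
    <if id="1" net="Net1"></if>
  </vm>
</vnuml>
"""


def topology(vnuml_text):
    """Map each network name to the (machine, interface) pairs attached to it."""
    root = ET.fromstring(vnuml_text)
    networks = {net.get("name"): [] for net in root.findall("net")}
    for vm in root.findall("vm"):
        for interface in vm.findall("if"):
            nic = "eth" + interface.get("id")
            networks[interface.get("net")].append((vm.get("name"), nic))
    return networks


for network, attachments in topology(SIMPLE_EXAMPLE).items():
    print(network, attachments)
```

The result should match Figure 3.1: uml1 and uml2 on Net0, and uml2 and uml3 on Net1.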


3.4 VNUML language

In this section, we provide a general description of the VNUML language. For this purpose, we describe the main tags that can be found in each of the three sections of a VNUML specification: the global section, the virtual networks section and the virtual machines section. We will find three kinds of tags in VNUML: structural tags, with few or no semantics; topology tags, used to describe the topology; and simulation tags, used to describe simulation parameters and commands. The following sections describe these tags.

3.4.1 The Global Section

A non-exhaustive list of the tags that may appear inside a <global> tag is the following:

• <version>. This tag is required and must be unique.
Specifies which VNUML language version is being used in the VNUML file. The current stable version is 1.8 and this is the one that we will use.

<version>1.8</version>

• <simulation_name>. Required and unique.
Specifies the simulation name. Each simulation must have a different name.

<simulation_name>simple_example</simulation_name>

• <automac/>. Optional and unique.
Empty tag. When used, the MAC addresses of the virtual machine interfaces are generated automatically.

<automac/>

• <vm_mgmt>. Optional and unique.
This tag defines aspects related to a management network interface in the guest, used to build a management network shared with the host. This defines a way of interacting with the guest from the host. If you do not want this management network, you can use the attribute type="none".

<vm_mgmt type="none"/>

• <vm_defaults>. Optional and unique.
This tag specifies the default values that are used by the virtual machines. This tag has an attribute called exec_mode which is used to indicate the execution mode. One of the “execution modes” is "mconsole", which allows executing commands in the virtual machines from the host without requiring a management network. With the "mconsole" mode, we can specify commands in the VNUML file to be executed when desired (see the <exec> tag later). Note. The UML kernel has to be compiled with support for mconsole to use this functionality.

<vm_defaults exec_mode="mconsole">

In addition, the tags allowed inside this element are the following:


– <filesystem>. Optional and unique.
It is used to define the file that contains the filesystem of the virtual machine. It accepts an attribute named type. When the type attribute is set to "cow" (type="cow"), the <filesystem> value is a master filesystem that will be used in a copy-on-write (COW) fashion.
The way to share a filesystem between two virtual machines is to use the copy-on-write (COW) layering capability of the ubd block driver. This block driver supports layering a read-write private device over a read-only shared device. A machine’s writes are stored in the private device, while reads come from either device. Using this scheme, the majority of the data, which is unchanged, is shared between an arbitrary number of virtual machines, each of which has a much smaller file containing the changes that it has made. With a large number of UMLs booting from a large root filesystem, this leads to a huge saving of disk space, so the COW mode is recommended to boot UMLs. Example:

<filesystem type="cow">/usr/share/vnuml/filesystems/root_fs_tutorial</filesystem>

– <mem>. Optional and unique.
Specifies the amount of RAM used by the virtual machine. Suffixes can be used for (k|K)ilobytes and (m|M)egabytes. The default value is 64M. We can change the default:

<mem>128M</mem>

– <kernel>. Optional and unique.
Specifies the absolute path name of the UML kernel file used to boot the virtual machine. Note that the file must be executable. Example:

<kernel>/usr/share/vnuml/kernels/linux</kernel>

– <shell>. Optional and unique.
Path to the shell. The default value is /bin/bash. We can change the default:

<shell>/bin/sh</shell>

– <basedir>. Optional and unique.
Value of the root path used by the <filetree> tags (explained later). Example:

<basedir>/usr/share/vnuml/scenarios/files</basedir>

– <console>. Optional and multiple.
Example:

<console id="0">xterm</console>

There are more types of consoles. For more information, see the description of the same tag in the section of the virtual machines.
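The per-tag examples above can be put together into one global section and inspected programmatically. The following Python sketch is only an illustration: it uses the standard xml.etree.ElementTree module rather than any VNUML tool, the helper name summarize_global is hypothetical, and the fragment is not validated against the VNUML DTD (in particular, the tag order is an assumption).

```python
import xml.etree.ElementTree as ET

# Global section assembled from the per-tag examples of this section.
# The tag order is an assumption; it is not checked against the VNUML DTD.
GLOBAL_XML = """
<global>
  <version>1.8</version>
  <simulation_name>simple_example</simulation_name>
  <automac/>
  <vm_mgmt type="none"/>
  <vm_defaults exec_mode="mconsole">
    <filesystem type="cow">/usr/share/vnuml/filesystems/root_fs_tutorial</filesystem>
    <kernel>/usr/share/vnuml/kernels/linux</kernel>
    <console id="0">xterm</console>
  </vm_defaults>
</global>
"""


def summarize_global(xml_text):
    """Return a dict with the main settings found in a <global> section."""
    root = ET.fromstring(xml_text)
    defaults = root.find("vm_defaults")
    return {
        "version": root.findtext("version"),
        "name": root.findtext("simulation_name"),
        "automac": root.find("automac") is not None,
        "mgmt": root.find("vm_mgmt").get("type"),
        "exec_mode": defaults.get("exec_mode"),
        "cow": defaults.find("filesystem").get("type") == "cow",
        # <mem> is absent in the fragment, so the documented default applies.
        "mem": defaults.findtext("mem", default="64M"),
    }


summary = summarize_global(GLOBAL_XML)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Since <mem> does not appear in the fragment, the sketch falls back to the 64M default mentioned above.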


3.4.2 The Section of Virtual Networks

Following the global definitions section, we find the definition of zero or more virtual networks. The <net> tag is used for this purpose. The <net> tag has some attributes:

• The name attribute (mandatory), which identifies the network.

• The mode attribute (mandatory), which defines how the interconnection is implemented. By default, we will use mode="uml_switch" to indicate that the virtual network is implemented using an uml_switch process.

For example, to define a virtual network named Net0 using an uml_switch process, we use:

<net name="Net0" mode="uml_switch"/>

Other attributes of the <net> tag help us to define the behavior of the virtual network more accurately. In this context, we can set the hub attribute to "yes" to configure the uml_switch process in hub mode (by default it behaves as a switch):

<net name="Net0" mode="uml_switch" hub="yes"/>

Another interesting attribute is the sock attribute, which contains the file name of a UNIX socket on which an uml_switch instance is running. If the file exists and is readable/writable, then instead of starting a new uml_switch process on the host, a symbolic link is created, pointing to the existing socket. In this way, we can create uml_switch instances and set their permissions and their configuration ahead of time. In particular, we can attach a tap interface in the host to the uml_switch (this can be done using the -tap option of uml_switch). This allows the host to monitor the virtual networks or to be part of these virtual networks using its tap interface. For example, in the VNUML specification we can use:

<net name="Net0" mode="uml_switch" hub="yes" sock="/var/run/vnuml/Net0.ctl"/>

And then start an uml_switch in the host in the following way:

phyhost# uml_switch -tap tap0 -unix /var/run/vnuml/Net0.ctl

3.4.3 The Section of Virtual Machines

The virtual machines definition completes the simulation scenario. Virtual machines are defined with the <vm> tag. Each <vm> tag describes a virtual UML machine. The tag uses the name attribute to specify the name of the virtual machine. Version 1.8 has a limit of 7 characters for the length of the name.

The optional order attribute, which takes a positive integer value (for example, order="2"), establishes the order in which the virtual machines will be processed (for example, which virtual machine will be booted or halted first). Virtual machines with no order are processed last, in the same order in which they appear in the VNUML file. You can define as many <vm> tags as you need (including zero). The only restriction is that the names of the virtual machines (i.e. the values of the name attribute) cannot be duplicated. Example:

<vm name="server">.....</vm>

Within <vm>, several tags configure the virtual machine environment (this is a non-exhaustive list of tags):


• <filesystem>. Optional (default specified with <vm_defaults>) and unique.
• <mem>. Optional (default specified with <vm_defaults>) and unique.
• <kernel>. Optional (default specified with <vm_defaults>) and unique.
• <console>. Optional and multiple.

By default, each virtual machine is booted without any I/O channel so, apart from networking (supposing it has been configured properly), there is no way for the user to interact with the virtual machine. This approach is fine for some host environments (for example, a server where no X server is available), but you may want a way of directly accessing the virtual machine to log in and enter commands, as you would do in a conventional machine. The <console> tag solves this problem. It allows you to specify that you want to access the virtual machine through an xterm, a tty line or a pts line. You can specify several consoles (each one with a different id attribute). Examples:

– If you use the console “xterm”, then a tty in the guest is connected to an xterm application in the host. For example:

<console id="0">xterm</console>

If the previous element is present in the definition of a guest virtual machine, then, when the simulation is started, an xterm connected to the tty0 of the guest will appear in the host.

– If you use the console “pts”, then a tty in the guest is connected to a pseudo-terminal (pts) in the host. For example:

<console id="1">pts</console>

If the previous element is present in the definition of a guest virtual machine, then, when the simulation is started, a pts in the host is connected to the tty1 (notice that id="1") of the guest. In particular, the name of the pts device is stored by vnumlparser.pl in a file. For example, if your simulation name is “simple_example” and the virtual machine name is “uml1”, the filename for the pts will be $HOME/.vnuml/simulations/simple_example/vms/uml1/run/pts. If you execute cat over this file while the simulation is running, you will obtain a result like this:

$ cat $HOME/.vnuml/simulations/simple_example/vms/uml1/run/pts
/dev/pts/7

This means that /dev/tty1 inside the guest is connected to /dev/pts/7 inside the host. To access pseudo-terminal devices, we can use the screen command as follows:

$ screen /dev/pts/7

• <if>. Optional and multiple.
This tag describes a network interface in the virtual machine. It uses two attributes: id and net. Attribute id identifies the interface. The name of a virtual machine interface with id=n is ethn. Attribute net specifies the virtual network (using the name value of the corresponding <net>) to which the interface is connected. Example:

<if id="1" net="Net1">
  <ipv4>10.0.1.2/24</ipv4>
</if>

As shown in the example, several tags can be used inside <if>:


– <mac>. Optional and unique.
Specifies the MAC address for the interface inside UML. If not used, an address is assigned automatically if <automac> is in use; otherwise, the choice is left to UML.

– <ipv4>. Optional and multiple.
Specifies an IPv4 address for the interface. The mask can be specified as part of the tag value. For example:

<ipv4>10.1.1.1/24</ipv4>

or using the optional mask attribute, either in dotted or slashed notation, for example:

<ipv4 mask="/24">10.1.1.1</ipv4>

or

<ipv4 mask="255.255.255.0">10.1.1.1</ipv4>

If the mask is not specified (for example, <ipv4>10.1.1.1</ipv4>), 255.255.255.0 (equivalently /24) is used as default. Using the mask attribute and the mask prefix in the tag value at the same time is not allowed.

– <ipv6>. Optional and multiple.
Specifies an IPv6 address for the interface. The mask can be specified as part of the tag value. For example:

<ipv6>3ffe::3/64</ipv6>

You can also use the optional mask attribute in slashed notation. For example:

<ipv6 mask="/64">3ffe::3</ipv6>

Note that, unlike <ipv4>, dotted notation is not allowed in <ipv6>. If the mask is not specified (for example, <ipv6>3ffe::3</ipv6>), /64 is used as default. Using the mask attribute and the mask prefix in the tag value at the same time is not allowed.

• <route>. Optional and multiple.
Specifies a static route that will be configured in the virtual machine routing table at boot time. The routes added with this tag are of gateway type (gw). Two attributes are used: type (allowed values: "ipv4" for IPv4 routes or "ipv6" for IPv6 routes) and gw, which specifies the gateway address. The value of the tag is the destination (including the mask, using the ’/’ prefix) of the route.

<route type="ipv4" gw="10.0.0.3">default</route>

• <forwarding>. Optional (default specified with <vm_defaults>) and unique.
Activates IP packet forwarding for the virtual machine (packets arriving at one interface can be forwarded to another, using the information in the routing table). This tag uses the optional type attribute (default is "ip"). The allowed values are: "ipv4", which enables forwarding for IPv4 only; "ipv6", which enables forwarding for IPv6 only; and "ip", which enables forwarding for both IPv4 and IPv6. Forwarding is enabled by setting the appropriate kernel parameters under /proc/sys/net.

<forwarding type="ip"/>
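The mask rules given above for the <ipv4> tag (prefix in the value, mask attribute in dotted or slashed notation, /24 by default, never both at once) can be sketched in Python with the standard ipaddress module. The function name ipv4_interface is hypothetical and this is not the code used by vnumlparser.pl; it only illustrates the rules as stated in the text.

```python
import ipaddress


def ipv4_interface(value, mask=None):
    """Combine the value of an <ipv4> tag with its optional mask attribute."""
    if "/" in value and mask is not None:
        # The text forbids giving the mask in both places at the same time.
        raise ValueError("mask given both in the tag value and in the attribute")
    if "/" in value:
        return ipaddress.ip_interface(value)
    if mask is None:
        mask = "255.255.255.0"  # documented default, equivalent to /24
    # Accept both "/24" (slashed) and "255.255.255.0" (dotted) notation.
    return ipaddress.ip_interface(f"{value}/{mask.lstrip('/')}")


for tag_value, tag_mask in [("10.1.1.1/24", None),
                            ("10.1.1.1", "/24"),
                            ("10.1.1.1", "255.255.255.0"),
                            ("10.1.1.1", None)]:
    print(tag_value, tag_mask, "->", ipv4_interface(tag_value, tag_mask))
```

The four forms shown in the examples of the <ipv4> tag all resolve to the same interface address, 10.1.1.1/24.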


Remote command execution: <exec> and <filetree>

The VNUML framework has two important features which are very useful to control and manage the virtual machines of a simulation. One of these features is the possibility of performing remote command execution from the host OS (Operating System) on the guest OSs while the simulation is running. This feature is provided by the <exec> tag. Remote command execution makes it possible to automate procedures on the virtual machines once they are up and running.

The other important feature is the remote file copy procedure, which allows copying files from the host OS to the guest OSs at runtime. This feature is provided by the <filetree> tag.

<exec>

This is an optional tag and it can appear multiple times in a VNUML file. It specifies one command to be executed by the virtual machine in the command sequence execution mode. In this document, we show the mandatory attributes of this tag. Optional attributes are described in the VNUML reference manual.

Mandatory attributes:
• seq. A string that identifies a command sequence. This string is used to select the commands to be executed.
• type (allowed values: "verbatim", "file"). Using "verbatim" specifies that the tag value is the verbatim command to be executed. Using "file", the tag value points (with an absolute pathname) to a file (in the host filesystem) with the commands that will be executed (line by line).

In the following example (see Code 3.4), two labels have been defined as command sequences: “start” and “remove”. When the “start” label is executed, uml1 will execute /usr/bin/streamsender whereas uml2 will execute /usr/bin/streamreceiver. If the “remove” label is executed, then uml1 will execute rm /etc/motd.


<vnuml>
  <global>
    ....
  </global>
  <net name="Net0" mode="uml_switch" />
  <vm name="uml1">
    ...
    <exec seq="remove" type="verbatim">rm /etc/motd</exec>
    <exec seq="start" type="verbatim">/usr/bin/streamsender</exec>
  </vm>
  <vm name="uml2">
    ....
    <exec seq="start" type="verbatim">/usr/bin/streamreceiver</exec>
  </vm>
</vnuml>

Code 3.4: Example exec labels.
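To make the role of the seq label more concrete, the following Python sketch selects, for a given label, the verbatim commands that each virtual machine of Code 3.4 would run. It only parses the XML with the standard library; the function name commands_for is hypothetical and the sketch does not reproduce how vnumlparser.pl actually dispatches the commands.

```python
import xml.etree.ElementTree as ET

# Reduced version of Code 3.4 (the elided parts are simply left out).
VNUML = """
<vnuml>
  <vm name="uml1">
    <exec seq="remove" type="verbatim">rm /etc/motd</exec>
    <exec seq="start" type="verbatim">/usr/bin/streamsender</exec>
  </vm>
  <vm name="uml2">
    <exec seq="start" type="verbatim">/usr/bin/streamreceiver</exec>
  </vm>
</vnuml>
"""


def commands_for(xml_text, seq):
    """Map each virtual machine name to its verbatim commands labelled `seq`."""
    root = ET.fromstring(xml_text)
    selected = {}
    for vm in root.iter("vm"):
        cmds = [e.text.strip() for e in vm.findall("exec")
                if e.get("seq") == seq and e.get("type") == "verbatim"]
        if cmds:
            selected[vm.get("name")] = cmds
    return selected


print(commands_for(VNUML, "start"))
print(commands_for(VNUML, "remove"))
```

With the label "start" both machines are selected, whereas "remove" only affects uml1, as described in the text.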

<filetree>

This is an optional tag and it can appear multiple times in a VNUML file.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vnuml SYSTEM "/usr/share/xml/vnuml/vnuml.dtd">

<vnuml>
  <global>
    ....
    <vm_defaults exec_mode="mconsole">
      ....
      <basedir>/home/user/config_files/</basedir>
      ...
    </vm_defaults>
  </global>
  <vm name="uml1">
    ...
    <filetree seq="stconf" root="/etc/streamer">streamer/</filetree>
  </vm>
</vnuml>

Code 3.5: Example filetree labels.

The <filetree> tag specifies a filetree (a directory, as well as all its files and subdirectories) in the host filesystem that will be copied to the virtual machine filesystem (overwriting existing files) in the command sequence execution mode. This tag makes it easy to copy entire configuration directories (such as /etc) that are stored and edited on the host when preparing simulations.

• If the directory (in the host filesystem) starts with "/", then it is an absolute directory.
• If the directory doesn’t start with "/", then it is relative to the <basedir> tag.

<basedir> is an optional tag (default specified with <vm_defaults>) and unique. It sets the root path used for <filetree> tags; that is to say, when the filetree path doesn’t start with "/", the path specified in <filetree> is taken as relative to the value in <basedir>.
Important note: if <basedir> is not specified, the value of basedir is set to the directory in which the VNUML file is stored.

The <filetree> tag uses two mandatory attributes:
• root. Specifies where (in the virtual machine filesystem) to copy the filetree.
• seq. The name of the command sequence that triggers the copy operation. Note that the filetree copy is made before processing the <exec> commands.

Other optional attributes can be found in the VNUML language reference manual. Code 3.5 shows how to use the <filetree> tag. In this example, when the label "stconf" is executed, a filetree copy between the host and the uml1 virtual machine is performed. Specifically, the filetree in the host below "/home/user/config_files/streamer" is copied to the uml1 filesystem at "/etc/streamer". Note that it is possible to have the same sequence label assigned to an <exec> tag and to a <filetree> tag. In this case, the copy using <filetree> is executed first and the commands within <exec> are executed next.
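The resolution of the host directory described above (absolute path, path relative to <basedir>, or path relative to the directory of the VNUML file when <basedir> is missing) can be sketched as follows. The function name filetree_source and the path /home/user/sims/simple_example.xml are hypothetical; the sketch only encodes the rules stated in the text and is not the implementation of vnumlparser.pl.

```python
import posixpath


def filetree_source(value, basedir, vnuml_file):
    """Host directory that a <filetree> tag with the given value refers to."""
    if value.startswith("/"):
        # Absolute directory: <basedir> is ignored.
        return posixpath.normpath(value)
    if basedir is None:
        # No <basedir>: fall back to the directory that stores the VNUML file.
        basedir = posixpath.dirname(vnuml_file)
    return posixpath.normpath(posixpath.join(basedir, value))


# Values of Code 3.5; the location of the VNUML file is a made-up example.
VNUML_FILE = "/home/user/sims/simple_example.xml"
print(filetree_source("streamer/", "/home/user/config_files/", VNUML_FILE))
print(filetree_source("streamer/", None, VNUML_FILE))
print(filetree_source("/etc", "/home/user/config_files/", VNUML_FILE))
```

The first call reproduces the case of Code 3.5, in which the relative value streamer/ is resolved below /home/user/config_files.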

3.5 The VNUML processor command: vnumlparser.pl

Now that we have been introduced to the VNUML language and we are able to write a VNUML document describing a simulation scenario, it’s time to present how this VNUML document is executed by the VNUML processor command: vnumlparser.pl. As we previously mentioned, the working cycle using VNUML has three phases. The last one, the execution phase, is the phase where we build, start and manage the simulation scenario. This phase is carried out by an application that processes the VNUML document and executes the appropriate commands: linux, uml_switch, etc. An application for processing an XML file is called, in general, a “parser”³. In VNUML, this parser is called vnumlparser.pl. The vnumlparser.pl command performs three steps (actually, one of them is optional):

1. Build scenario. The parser creates the virtual networks that will interconnect the virtual machines and the host, and then boots and configures the virtual machines defined, adding IP addresses, static routes or any other network-related parameters. The UML boot process makes this step very processor intensive.

2. Execute commands. Once the scenario has been built, you can run command sequences on it. Basically, in this step the parser takes the commands defined in the <exec> and <filetree> tags of the VNUML definition file and executes them. Several command sequences may be defined (e.g., one to start routing daemons, another to perform a sanity check in the virtual machines, etc.), and you specify which one to execute at each moment.
This step is optional. If you don’t need to execute command sequences (because you prefer to interact with the virtual machines directly), you can skip it.

3. Release scenario. In this final step, all the simulation components previously created (UMLs, virtual networks, etc.) are cleanly released. The UML shutdown process also makes this step very processor intensive.

vnumlparser.pl has several operation modes, one for each of the three steps of the execution phase:

1. Build the scenario: -t mode. The command syntax is:

phyhost$ vnumlparser.pl -t VNUML-file

2. Execute commands: -x mode. In this case, once we know the sequence label (labelname) we want to execute, the command syntax is:

phyhost$ vnumlparser.pl -x labelname@VNUML-file

3. Release Scenario: -d mode. The command syntax is:

phyhost$ vnumlparser.pl -d VNUML-file
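As a small illustration of the three operation modes, the following Python sketch builds the corresponding command lines. The helper vnumlparser_command and the step names "build", "exec" and "release" are hypothetical; only the -t, -x and -d options and the labelname@VNUML-file syntax come from the text. The sketch prints the commands instead of running them.

```python
def vnumlparser_command(step, vnuml_file, label=None):
    """Return the argument list for one step of the execution phase."""
    if step == "build":
        return ["vnumlparser.pl", "-t", vnuml_file]
    if step == "exec":
        if not label:
            raise ValueError("the -x mode needs a sequence label")
        return ["vnumlparser.pl", "-x", f"{label}@{vnuml_file}"]
    if step == "release":
        return ["vnumlparser.pl", "-d", vnuml_file]
    raise ValueError(f"unknown step: {step}")


# Typical life cycle of a scenario: build, run one sequence, release.
for args in (vnumlparser_command("build", "simple_example.xml"),
             vnumlparser_command("exec", "simple_example.xml", label="start"),
             vnumlparser_command("release", "simple_example.xml")):
    print("phyhost$", " ".join(args))
```

Such an argument list could be handed to subprocess.run on a host where VNUML is installed, which is left out here on purpose.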

³ In computing, a parser (syntax analyzer) is one of the components of an interpreter or compiler, which checks for correct syntax and builds data structures from the input tokens.

Part II

Routing


Chapter 4

Shortest Path Algorithms

4.1 Bellman-Ford

4.1.1 The Algorithm

In this section we explain the Bellman-Ford (BF) algorithm [5]. This algorithm allows us to find the shortest path from a source (s) to any destination in a certain topology. Topologies are composed of vertices (nodes) and edges (links). We will use the following notation:

• V is the set of vertices of the topology.
• E is the set of edges of the topology.
• (i, j) denotes the edge that directly connects the vertex i with the vertex j.
• w(i, j) = c denotes that the weight or cost associated with crossing the edge (i, j) is c.
• Psj is a best path from s to j.
• d[j] is the current estimate of the cost to go from s to j.
• δ[j] is the best cost to go from s to j.

The algorithm works in a topology that might have edges with negative weights, but the topology cannot contain negative cycles. See Figure 4.1a for an example of a negative cycle. The pseudo-code of the Bellman-Ford algorithm is shown in Algorithm 1.

input : s, V, E, w(i, j) ∀ i, j
output: Psj, δ[j] ∀ j
 1  d[s] ← 0
 2  forall i ∈ V − {s} do
 3      d[i] ← ∞
 4  end
 5  for n = 1 to |V| − 1 do
 6      forall edges (i, j) ∈ E do
 7          if d[j] > d[i] + w(i, j) then
 8              d[j] ← d[i] + w(i, j);
 9              Psj = Psi → (i, j);
10          end
11      end
12  end
Algorithm 1: Pseudo-code for the Bellman-Ford Algorithm

Lines 1 to 4 initialize the algorithm. Lines 7 to 10 apply the relax condition. At each iteration, the algorithm converges to the shortest path by trying to relax edges.


[Figure 4.1: Sample Topologies. (a) A Negative Cycle. (b) Sample Topology to Compute BF.]

4.1.2 Example

Let’s use BF to calculate the best paths from s = 1 in the topology of Figure 4.1b. Notice that in BF the edges can be visited in any order.

Table 4.1: Example of Bellman-Ford

n=0 (initialization): d[1] = 0, d[2] = ∞, d[3] = ∞, d[4] = ∞, d[5] = ∞

Edge            (1,2)          (1,3)          (2,5)          (2,4)          (3,4)          (4,5)          (5,4)
Weight          15             20             10             15             5              -5             10
Order (random)  4              7              5              2              3              1              6
n=1             d[2] = 15      d[3] = 20      d[5] = 25      -              -              -              d[4] = 35
                P12=(1,2)      P13=(1,3)      P15=(1,2,5)                                                 P14=(1,2,5,4)
n=2             -              -              -              d[4] = 30      d[4] = 25      -              -
                                                             P14=(1,2,4)    P14=(1,3,4)
n=3             -              -              -              -              -              d[5] = 20      -
                                                                                           P15=(1,3,4,5)
n=4             -              -              -              -              -              -              -
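The iterations of Table 4.1 can be reproduced with a short Python sketch of Algorithm 1. The function names bellman_ford and path_to are hypothetical, and two choices are ours rather than part of Algorithm 1: the paths are stored as predecessor pointers, and the loop stops early when a full pass relaxes no edge. The edges are listed in the random processing order of the table.

```python
import math


def bellman_ford(vertices, edges, source):
    """Run at most |V|-1 passes over `edges`, a list of (i, j, w) tuples."""
    d = {v: math.inf for v in vertices}
    pred = {v: None for v in vertices}   # predecessor of each vertex on its path
    d[source] = 0
    for _ in range(len(vertices) - 1):
        changed = False
        for i, j, w in edges:
            if d[j] > d[i] + w:          # relax condition of Algorithm 1
                d[j] = d[i] + w
                pred[j] = i
                changed = True
        if not changed:                  # a pass without relaxations ends the algorithm
            break
    return d, pred


def path_to(pred, source, target):
    """Rebuild the path from `source` to a reachable `target`."""
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return path[::-1]


# Edges of Table 4.1 in the order in which the table processes them.
EDGES = [(4, 5, -5), (2, 4, 15), (3, 4, 5), (1, 2, 15),
         (2, 5, 10), (5, 4, 10), (1, 3, 20)]
d, pred = bellman_ford([1, 2, 3, 4, 5], EDGES, source=1)
print(d)
print(path_to(pred, 1, 4), path_to(pred, 1, 5))
```

The final estimates agree with the last updates of the table: d[4] = 25 through (1, 3, 4) and d[5] = 20 through (1, 3, 4, 5).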

4.1.3 Proof of Correctness

We will prove that BF finds the best paths (shortest paths). Let’s consider any shortest path (s, v1, v2, ..., vk−1, vk). Then,

• We can always be sure that k ≤ |V| − 1. That is to say, a shortest path cannot contain more than |V| − 1 edges, since it visits each vertex of the topology at most once.

• (s, v1) must also be a shortest path (d[v1] = δ[v1]). In other words, there is no shorter path from s to v1. We know this because, by definition, a shortest path is formed of smaller shortest paths. Finally, we can be sure that BF will find this path at n = 1 because BF will relax (s, v1) in its first iteration.

• For the same reasons, (s, v1, v2) is another shortest path. BF will certainly find this path by n = 2 because the second edge (v1, v2) must be relaxed at n = 2 at the latest. Recall that the previous edge of our shortest path has been relaxed at n = 1.


• After |V| − 1 iterations, no edge of any possible shortest path can be relaxed any further, and d[vk] = δ[vk] ∀ vk ∈ V.

• Finally, if we execute an iteration of BF and no link is relaxed, the algorithm ends. In other words, at least one link must be relaxed for the algorithm to continue because, if BF does not relax any link in a round, a new round of the algorithm will not produce any new shortest path.

4.1.4 Distributed Version

In the centralized BF, a source node must completely know the topology (links and nodes) to compute BF. Nevertheless, there is a method to build a distributed or decentralized version of BF. In this version, a node that wants to compute the shortest path to other nodes in the topology does not know the exact topology but only the estimated distances provided by its neighbors. In other words, neighbors exchange their vectors of distances dx (see Figure 4.2). Since the update messages contain a vector of distances, this type of dynamic routing algorithm is called “distance vector”.

[Figure 4.2: Decentralized Bellman-Ford.]

When a node u receives the vector of distances dk of a neighbor k, it can use the distances in this vector to relax the edge to this neighbor (u, k):

forall i ∈ dk do
    if d[i] > w(u, k) + dk[i] then
        d[i] ← w(u, k) + dk[i];
        next hop for i = k;
    end
end
Algorithm 2: Decentralized BF

In Figure 4.3 we provide an example of a topology in which we show the distribution of the route to the node a with the distributed version of BF. In the figure you can observe the update messages and the route to the node a learned by each node.

Notice the decentralized computation: d sends only the best path to a; in other words, it sends the path to a with all the relaxations computed. On the other hand, h will learn the shortest path to a after:

• b relaxes (a, b) and sends the update to its neighbors.

• d receives the previous update from b, relaxes (d, b) and sends the update to its neighbors.

• g receives the previous update from d, relaxes (g, d) and sends the update to its neighbors.

• h receives the previous update from g and relaxes (h, g).


[Figure 4.3: Example of Distributed BF.]

Notice that the algorithm converges to the optimal solution (shortest path) because this decentralized process is like running BF in an endless loop in which each node relaxes its adjacent edges. A round in which all the nodes have sent and received update messages is like an iteration of the centralized algorithm.
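The propagation of the route to a along the chain b, d, g, h described above can be simulated with a few lines of Python. The sketch assumes a reduced topology that contains only the links (a, b), (b, d), (d, g) and (g, h) with unit cost, which is a simplification of Figure 4.3, and it assumes synchronous rounds in which every node first sends its current estimate and then applies Algorithm 2. The names LINKS, neighbors and distribute are hypothetical.

```python
import math

# Assumed sub-topology of Figure 4.3: a chain with unit link costs.
LINKS = {("a", "b"): 1, ("b", "d"): 1, ("d", "g"): 1, ("g", "h"): 1}


def neighbors(links):
    """Build a symmetric adjacency map {node: {neighbor: cost}}."""
    adj = {}
    for (u, v), w in links.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def distribute(links, dest):
    """Exchange estimates for `dest` in synchronous rounds until nothing changes."""
    adj = neighbors(links)
    dist = {u: math.inf for u in adj}
    next_hop = {u: None for u in adj}
    dist[dest] = 0
    rounds = 0
    while True:
        snapshot = dict(dist)            # estimates announced in this round
        changed = False
        for u in adj:
            for k, w in adj[u].items():
                if dist[u] > w + snapshot[k]:   # relax the edge (u, k)
                    dist[u] = w + snapshot[k]
                    next_hop[u] = k
                    changed = True
        if not changed:
            return dist, next_hop, rounds
        rounds += 1


dist, next_hop, rounds = distribute(LINKS, "a")
print(dist, next_hop, rounds)
```

In this reduced topology h needs four rounds to learn its route, one for each relaxation listed above, and ends with distance 4 and next hop g.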

4.2 Dijkstra

4.2.1 The Algorithm

In this section we explain the Dijkstra algorithm [6]. This algorithm allows us to find the shortest path from a source (s) to any destination in a certain topology. We will use the same notation as in Section 4.1.1 and the following additional definitions:

• S is the set of visited vertices.
• R is the set of remaining vertices.
• extract_min(R) is a function that extracts from R the node i with the smallest distance to s.
• Adj[i] is the set of nodes adjacent to i.

The algorithm works in a topology that must be free of edges with negative weights. The pseudo-code of the Dijkstra algorithm is shown in Algorithm 3. Lines 1 to 6 initialize the algorithm. Lines 8 and 9 manage the sets S and R, and lines 10 to 15 apply the relax condition. The Dijkstra algorithm requires, in general, fewer operations than BF (it is faster than BF). This is because the assumption of non-negative weights allows it to be a greedy algorithm.

 1  d[s] ← 0;
 2  S ← ∅;
 3  R ← V;
 4  forall i ∈ V − {s} do
 5      d[i] ← ∞;
 6  end
 7  while R ≠ ∅ do
 8      i ← extract_min(R);
 9      S ← S ∪ {i};
10      Psi = compute_path(i);
11      foreach j ∈ Adj[i] do
12          if d[j] > d[i] + w(i, j) then
13              d[j] ← d[i] + w(i, j);
14          end
15      end
16  end
Algorithm 3: Pseudo-code for the Dijkstra Algorithm


4.2.2 Example

Let’s use Dijkstra to calculate the best paths from s = 1 in the topology of Figure 4.4. The results of each round of the algorithm are shown in Table 4.2.

[Figure 4.4: Example Topology to Compute Dijkstra.]

Step            S                  R                  Path                  d[1]  d[2]  d[3]  d[4]  d[5]
Initialization  ∅                  {1, 2, 3, 4, 5}    -                     0     ∞     ∞     ∞     ∞
Extract 1       {1}                {2, 3, 4, 5}       P11 = (1)             0     15    20    ∞     ∞
Extract 2       {1, 2}             {3, 4, 5}          P12 = (1, 2)          0     15    20    30    20
Extract 5       {1, 2, 5}          {3, 4}             P15 = (1, 2, 5)       0     15    20    25    20
Extract 3       {1, 2, 5, 3}       {4}                P13 = (1, 3)          0     15    20    25    20
Extract 4       {1, 2, 5, 3, 4}    ∅                  P14 = (1, 2, 5, 4)    0     15    20    25    20

Table 4.2: Example of Dijkstra
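A compact Python sketch of Algorithm 3 follows. Two points are assumptions and not part of the original material. First, extract_min(R) is implemented with a binary heap (heapq), which is a common implementation choice. Second, the edge weights are inferred from the distances of Table 4.2, since the figure itself is not reproduced here; in particular, the weight 10 given to the edge (3, 4) is a guess that is merely consistent with the table. The name dijkstra and the dictionary ADJ are hypothetical.

```python
import heapq
import math

# Directed edges with weights inferred from Table 4.2.
# The weight of (3, 4) is an assumption that does not change the result.
ADJ = {1: {2: 15, 3: 20}, 2: {4: 15, 5: 5}, 3: {4: 10}, 4: {}, 5: {4: 5}}


def dijkstra(adj, source):
    """Return (d, pred) for a graph without negative weights."""
    d = {v: math.inf for v in adj}
    pred = {v: None for v in adj}
    d[source] = 0
    visited = set()                      # the set S of Algorithm 3
    heap = [(0, source)]                 # plays the role of extract_min(R)
    while heap:
        dist_i, i = heapq.heappop(heap)
        if i in visited:
            continue                     # outdated heap entry, already extracted
        visited.add(i)
        for j, w in adj[i].items():
            if d[j] > dist_i + w:        # relax condition
                d[j] = dist_i + w
                pred[j] = i
                heapq.heappush(heap, (d[j], j))
    return d, pred


d, pred = dijkstra(ADJ, 1)
print(d)
print(pred)
```

Vertices 3 and 5 are tied at distance 20, so the extraction order may differ from the one in Table 4.2, but the final distances and the path (1, 2, 5, 4) to vertex 4 are the same.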

4.2.3 Proof of Correctness

We will prove the correctness of this algorithm by induction. This means that we have to prove that the initial step of the algorithm is optimal and then that each new step is also optimal.

Let’s assume that d[i] = δ[i] for each i ∈ S with k ≥ 1, where k = |S|. In other words, the current distance estimate of each element of S to s is the shortest possible distance.

Then, we have to prove that:
(1) The inductive hypothesis holds for k = 1. This is trivial, since the first element added to S is s and d[s] = 0.
(2) If j is the next vertex to be added to S and i is the vertex of S that connects j with the minimum distance, then the path Psj constructed using the path to i and the edge (i, j) is a shortest path from s to j.


[Figure 4.5: Proof of Dijkstra.]

To prove (2), let’s consider another path P′sj from s to j (see Figure 4.5). We will prove that there is no path P′sj shorter than the path Psj constructed with Dijkstra. To do so, we take a generic path P′sj that leaves S from some vertex, say x. Then, let (x, y) be the edge of P′sj that is used to leave S. In particular, y can be equal to j. We want to prove that w(P′sj) ≥ w(Psj) always holds. The proof is the following:

\[
w(P'_{sj}) \;\underbrace{\geq}_{\text{non-negative weights}}\; w(P_{sx}) + w(x, y) \;\underbrace{\geq}_{\text{by construction: extract\_min}(\mathcal{R})}\; w(P_{si}) + w(i, j)
\]

Chapter 5

RIP

5.1 Introduction

The Routing Information Protocol (RIP) is a dynamic routing protocol that can be used in small and medium-sized IP networks. RIP is based on a distance-vector exchange and a distributed version of the Bellman-Ford algorithm. The popularity of RIP is due to its simplicity and to its long history. RIP was first defined in RFC1058 [7] (1988) and extended to Version 2 in RFC2453 [8]. Compared to RIPv1, RIPv2 defines a new message format and includes a number of new features such as support for classless addressing, authentication, and the use of multicasting instead of broadcasting to improve network performance. Finally, RIP has also been adapted for use in IPv6 networks in a standard known as RIPng (RIP next generation), RFC2080 [9]. While RIP is still in use in some networks today, it is considered technically obsolete compared to more advanced protocols such as Open Shortest Path First (OSPF), RFC2328 and RFC5340 [10, 11], and the OSI protocol IS-IS, RFC1142 [12]. RIP, like any routing protocol, defines a routing database, a protocol for exchanging information about routes and an algorithm for updating routing information. These three parts are described in the following sections.

5.2 Routing Database

Each RIP entity (typically a router) keeps track in its routing database of all networks (and possibly individual hosts) in the RIP routing domain. Each entry in the routing database includes the next intermediary router (called next hop) to which datagrams have to be delivered so that they can reach the final destination. In addition, the routing database includes a “metric” for measuring the total distance to the final destination. More specifically, each entry in the routing database includes the following information¹:

• Address. The address of a destination network or a destination host (@IP/MASK).

• Metric. The metric or cost from that node to the destination.

• Router. The IP address of the next intermediary or router to which datagrams must be sent to eventually get to the destination network (or host).

• Interface. The network interface that must be used to reach the next router.

• Timers. Timers are used to manage dynamics of the routing information.

The metric for distances can be any assessment of cost but, in practice, RIP uses the number of hops to the destination as its metric. In an IP network, a datagram makes a hop when it passes through a router. More specifically, if a RIP entity is directly connected to a destination, then the RIP distance between these two entities is 1 hop; if the source and destination are connected through a single intermediary (router), then the distance is 2 hops, and so on. In RIP, the valid metrics range between 1 and 16, inclusive. The maximum number of hops allowed for any destination is 15 (we will see why later) and the RIP distance 16 is reserved for the meaning “infinity”. In other words, if a destination in a routing table has a RIP distance equal to 16, this means that “the destination (network or host) is unreachable”.

¹ In a real implementation, there are various additional flags and other internal information in the routing table.

5.3 Update Algorithm

Distance vector algorithms get their name from the fact that it is possible to compute optimal routes (those with minimal distances) by periodically exchanging the vector of distances to the different destinations that each node in the network has. The routing database of each RIP node is initialized with a description of the RIP entities that are directly connected (at one hop or metric=1) and then it is updated with a distributed version of the Bellman-Ford algorithm, according to the information received in RIP messages from neighboring RIP entities. Next, we show the basic version of the algorithm for a static topology and then we present the modifications necessary to deal with dynamic topologies.

5.3.1 Static Topology

Distributed Bellman-Ford Algorithm

To describe the algorithm, we define w(i, k) as the weight or cost of the edge that connects the nodes i and k. So, if we use the number of hops as metric, then w(i, k) = 1 if i and k are directly connected and w(i, k) = 16 (infinity) if they are not. Let us define δ(P_ij) as the metric of the best path or route between two entities i and j, which are not directly connected in general. Since costs are additive, it is easy to show that the best metric must satisfy the following equation, where the minimum is taken over the neighbors k of i:

$$\delta(P_{ij}) = \min_{k} \left[ w(i,k) + \delta(P_{kj}) \right] \qquad (5.1)$$

Equation (5.1) is the Bellman-Ford equation and it says that the best route is through the neighbor for which the cost of the link plus the neighbor's own distance to the destination is minimal. Based on this equation we can implement the distributed version of the Bellman-Ford algorithm. The pseudo-code of the algorithm is shown in Algorithm 4.

Input: dk: distance vector to N destinations from neighbor k.

    for n ← 1 to N do
        if R(n).metric > w(i, k) + dk(n) then    // Consider better metrics to destination
            R(n).metric = w(i, k) + dk(n);
            R(n).next_hop = k;
        end
    end

Algorithm 4: Pseudo-code for the Basic RIP Update Algorithm

The explanation of Algorithm 4 is the following. When a RIP entity i receives the distance vector dk with the estimates of neighbor k, it adds w(i, k) to each of the estimates received. Then, for each destination n, the node i compares the metric provided by the neighbor with its current routing entry metric for this destination, R(n).metric. The node picks the new route if the metric provided by the neighbor is smaller. With this algorithm, once the estimates of all the nodes in the network have reached i through its neighbors, i will have the smallest distance to all the destinations.

Note. By default, the metric between neighbors is one (w(i, k) = 1) but many implementations allow you to change this metric for the links that you consider necessary.
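As a minimal sketch of Algorithm 4, the following Python fragment applies the update rule to a routing table stored as a dictionary. The function name basic_update, the table layout and the clamping of metrics at 16 are assumptions made for illustration; they are not taken from any particular RIP implementation.

```python
INFINITY = 16  # RIP metric that means "unreachable"


def basic_update(table, neighbor, link_cost, vector):
    """Apply the basic update rule using the distance vector of `neighbor`.

    table:  dict destination -> (metric, next_hop), modified in place
    vector: dict destination -> metric advertised by the neighbor
    """
    for dest, advertised in vector.items():
        # Assumption: metrics are clamped so that they never exceed infinity.
        candidate = min(link_cost + advertised, INFINITY)
        current_metric, _ = table.get(dest, (INFINITY, None))
        if current_metric > candidate:  # keep only strictly better routes
            table[dest] = (candidate, neighbor)
    return table


# RD first hears about N1 from RC (which is 2 hops away), then from RB (1 hop).
rd = {}
basic_update(rd, "RC", 1, {"N1": 2})
basic_update(rd, "RB", 1, {"N1": 1})
print(rd)  # {'N1': (2, 'RB')}
```

Notice that a worse estimate never replaces the current entry here, which is exactly the limitation of the basic algorithm.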


5.3.2 Dynamic Topology

Increasing the metric

The method so far only has a way to lower the metric, as the existing metric is kept until a smaller one shows up. However, it is possible that the initial estimate might be too low. In this case, we need a method for “increasing the metric”. For this purpose, it is enough to always accept the information received from the next hop of a route. For example, suppose the current route to a destination has metric D and uses router R. If a new set of information arrives from some source other than R, only update the route if the new metric is better than D. But if a new set of information arrives from R itself, always update D to the new value. The pseudo-code of the modified algorithm is shown in Algorithm 5.

Input: dk: distance vector to N destinations from neighbor k.

    for n ← 1 to N do
        if R(n).next_hop == k then    // Always consider the information from the next_hop
            R(n).metric = w(i, k) + dk(n);
        else if R(n).metric > w(i, k) + dk(n) then    // Consider better metrics to destination
            R(n).metric = w(i, k) + dk(n);
            R(n).next_hop = k;
        end
    end

Algorithm 5: Pseudo-code for the Basic RIP Update Algorithm Always Considering Next Hop Information.
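The next-hop rule of Algorithm 5 can be sketched in Python as follows. As before, the function name, the dictionary layout and the clamping at 16 are assumptions chosen for illustration.

```python
INFINITY = 16  # RIP metric that means "unreachable"


def update_with_next_hop_rule(table, neighbor, link_cost, vector):
    """Update `table` (dict destination -> (metric, next_hop)) in place."""
    for dest, advertised in vector.items():
        candidate = min(link_cost + advertised, INFINITY)
        metric, next_hop = table.get(dest, (INFINITY, None))
        if next_hop == neighbor:
            # News from the current next hop is always believed,
            # even when the metric gets worse.
            table[dest] = (candidate, neighbor)
        elif metric > candidate:
            # News from any other neighbor is used only if it is better.
            table[dest] = (candidate, neighbor)
    return table


rd = {"N1": (2, "RB")}
update_with_next_hop_rule(rd, "RB", 1, {"N1": 2})  # RB's own route got longer
print(rd)  # {'N1': (3, 'RB')}
update_with_next_hop_rule(rd, "RC", 1, {"N1": 5})  # worse news from another neighbor
print(rd)  # {'N1': (3, 'RB')}
```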

Updates

The algorithm described works without making assumptions about when updates are sent. It is safe to run the algorithm asynchronously, that is, each RIP entity can send updates with its distance vector according to its own clock. The algorithm will converge to the correct distances in finite time in the absence of topology changes as long as not all RIP updates get dropped.

Originally, each RIP router transmitted full updates every 30 seconds. In the early deployments, routing tables were small enough that the traffic was not significant. As networks grew in size, however, it became evident that there could be a massive traffic burst every 30 seconds, even if the routers had been initialized at random times. It was thought that, as a result of random initialization, the routing updates would spread out in time, but this was not true in practice. Sally Floyd and Van Jacobson showed in 1994 [13] that, without slight randomization of the update timer, the timers synchronized over time and the routers sent their updates at the same time. Modern RIP implementations introduce deliberate variation into the update timer intervals of each router.
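The randomization of the update timer can be sketched as follows. This is only an illustration: the function name and the jitter range of plus or minus 5 seconds are assumptions, and real implementations may choose the variation differently.

```python
import random


def next_update_delay(base=30.0, jitter=5.0, rng=random):
    """Return the delay in seconds until the next regular update.

    Assumption: the delay is drawn uniformly from [base - jitter, base + jitter],
    so that routers started at the same time drift apart instead of synchronizing.
    """
    return base + rng.uniform(-jitter, jitter)


print(25.0 <= next_update_delay() <= 35.0)  # True
```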

Examples

Let us illustrate the route update process with an example (see Figure 5.1). As shown in Figure 5.1, Network 1 (N1) is connected to two routers: RA and RB. We also assume that all the routers in the domain (RA, RB, RC and RD) run RIP. We are going to illustrate how the information about N1 can be distributed by RIP. As the order of the RIP updates is unpredictable, the description that follows is just one possible realization of the RIP update process:

(1) As initial condition we assume that the only routers that have information about N1 are the two directly connected routers (RA and RB). Then, RA is the first router to send information about N1 in a RIP message. We will denote this information as {N1,1}, which means that the RIP message sent by RA includes an entry for N1 showing that this router can reach this network with one hop. RA sends this RIP message to its one-hop neighbors: RB and RC. Upon receiving this information, RC updates its routing database because the information received from RA informs about a new reachable network. RB does nothing because it already knows N1 with a better (shorter) path.

(2) RC decides that it has to send a RIP message and it includes the new information it knows about N1. This information is that it can reach N1 with two hops {N1,2}. RC sends this information to its one-hop neighbors: RA and RD. In the case of RD, this is the first time it hears about N1, so it updates its routing database with the new information. Obviously, RA does nothing.

(3) This time RB sends its RIP message including the entry {N1,1} to its one-hop neighbors: RA and RD. RA does nothing, but RD updates its routing database because the new information from RB provides a shorter path to the destination than the one that RD previously had.

(4) This time RD sends its RIP message but no router changes anything because the information provided by RD is worse than the information present in the routing databases of the rest of the routers (this is logical because RD is the farthest router).

Figure 5.1: Basic RIP Update Algorithm
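The four steps of this example can be replayed with a small simulation. This is a sketch under stated assumptions: the neighbor lists are inferred from the description (RA-RB, RA-RC, RB-RD and RC-RD), every link has cost 1, the marker "D" denotes a directly connected network, and the helper names are hypothetical.

```python
INFINITY = 16
NEIGHBORS = {"RA": ["RB", "RC"], "RB": ["RA", "RD"],
             "RC": ["RA", "RD"], "RD": ["RB", "RC"]}

# Each table maps destination -> (metric, next_hop).
tables = {"RA": {"N1": (1, "D")}, "RB": {"N1": (1, "D")}, "RC": {}, "RD": {}}


def receive(receiver, sender, vector):
    """Basic update rule: accept only strictly better metrics."""
    for dest, advertised in vector.items():
        candidate = min(1 + advertised, INFINITY)
        metric, _ = tables[receiver].get(dest, (INFINITY, None))
        if metric > candidate:
            tables[receiver][dest] = (candidate, sender)


def send_update(sender):
    """Send the sender's whole distance vector to all its neighbors."""
    vector = {dest: metric for dest, (metric, _) in tables[sender].items()}
    for neighbor in NEIGHBORS[sender]:
        receive(neighbor, sender, vector)


for router in ["RA", "RC", "RB", "RD"]:  # order of steps (1) to (4)
    send_update(router)

print({r: t["N1"] for r, t in tables.items()})
# {'RA': (1, 'D'), 'RB': (1, 'D'), 'RC': (2, 'RA'), 'RD': (2, 'RB')}
```

A different order of updates would go through different intermediate states but would end with the same metrics.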

Following our example, we are now going to assume that one of the links is broken. In particular, the link that connected RB to N1 (as shown in Figure 5.2).

In this case, we are going to observe that some nodes have estimates for N1 that are too low and we will see how nodes use the method for increasing the metric. The description that follows is a possible realization of the RIP update process:

(1) As an initial condition, we assume that all the routers have converged to the topology depicted in Figure 5.1. Thus, they have entries for N1 as shown at the top of the diagram of Figure 5.2. Then, RB detects that its link to N1 is broken. At this moment, RB updates the entry for N1 in its database, setting the metric for this network to infinity (16).

(2) RA sends a RIP message to its one-hop neighbors: RB and RC. Upon receiving this information, RB updates its routing database because the information received from RA informs about a new path to reach the network N1 again. RC does nothing because it already knows N1 with the same metric.

[Message sequence diagram. Entries for N1 at RA, RB, RC and RD before the failure: {N1,1,D}, {N1,1,D}, {N1,2,RA}, {N1,2,RB}; after step (3): {N1,1,D}, {N1,2,RA}, {N1,2,RA}, {N1,3,RB}.]

Figure 5.2: Basic RIP Update Algorithm with a Broken Link

(3) At this time RB sends a RIP message to its one-hop neighbors: RA and RD. RA does nothing, but RD updates its routing database because, although the information received contains a worse metric, it comes from the next hop router. Recall that according to the “increasing the metric” method, we always have to update our routing entries when the information comes from the next hop. Notice also that this can be interpreted as an indication that our estimate for N1 is currently too low. Indeed, with the new network topology, RD cannot reach N1 with just two hops.

In practice, routers and lines often fail and come back up. The algorithm presented so far is not yet suitable to properly handle these dynamics, and some enhancements are required.

5.3.3 Enhancements

Next, we explain some enhancements to deal with the network dynamics in a RIP domain. These enhancements are the use of timers, split horizon rules and triggered updates.

Timers

Notice that if a certain router X is included in the best route to a certain destination of some other router Y, and the router X is no longer available (for example because it crashed or because some network connection to it is broken), the algorithm explained so far might never reflect the change at router Y. The algorithm as shown so far depends upon routers notifying their neighbors if their metrics change. In order to handle problems of this kind, distance vector protocols must make some provision for timing out routes. For this purpose, there are two timers associated with each route: a “timeout” timer and a “garbage-collection” timer.

• The timeout is used to limit the amount of time a route can stay in a routing database without being updated. Recall that RIP entities have to send update messages approximately every 30 seconds. The timeout is initialized to 180 seconds whenever a new route is established and is reset to the initial value whenever an update is heard for that route. If an update for a route is not heard within those 180 seconds (six update periods), the hop count for the route is changed to 16, marking the route as unreachable.

• The other timer, the garbage-collection timer, is used to let neighbors know that the route is no longer valid. An unreachable route will be advertised with the infinite metric (16) until the garbage-collection timer expires (120 seconds by default). After this, the route is removed from the routing database.


Note. Some implementors (like Cisco) call the timeout timer the “invalid timer” and the garbage-collection timer the “flush timer”.
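The interplay of the two timers can be sketched as a periodic aging pass over the routing table. The Route class and the age_routes function below are hypothetical names used only for illustration; the two constants are the default values given above.

```python
from dataclasses import dataclass
from typing import Optional

INFINITY = 16
TIMEOUT = 180             # seconds without updates before a route becomes unreachable
GARBAGE_COLLECTION = 120  # seconds an unreachable route is still advertised


@dataclass
class Route:
    metric: int
    next_hop: str
    last_update: float                  # time of the last update heard for the route
    expired_at: Optional[float] = None  # time at which the timeout fired, if it did


def age_routes(table, now):
    """Expire and flush routes in `table` (dict destination -> Route)."""
    for dest in list(table):
        route = table[dest]
        if route.expired_at is None:
            if now - route.last_update >= TIMEOUT:
                route.metric = INFINITY  # advertised as unreachable from now on
                route.expired_at = now
        elif now - route.expired_at >= GARBAGE_COLLECTION:
            del table[dest]              # finally removed from the database


table = {"N1": Route(metric=2, next_hop="RA", last_update=0.0)}
age_routes(table, 180.0)
print(table["N1"].metric)  # 16
age_routes(table, 300.0)
print(table)               # {}
```

In this sketch, hearing an update for a route would simply set last_update to the current time and clear expired_at.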

Count to Infinity

The algorithm as presented up to this point is still not quite enough to make it useful in practice due to the count to infinity problem. To illustrate this problem, we continue our example but this time we “break” the link between RA and N1. As a result, N1 is now unreachable for our four RIP routers (see Figure 5.3). Let us show what happens in this situation.

(1) As shown in Figure 5.3, at a certain moment the last link to N1 is broken. At this moment, RA, which is the only one-hop RIP neighbor of N1, detects this situation and sets the metric for this network to infinity in its routing database.

(2) At a certain moment, RB sends a RIP message with its current information about N1. That is to say, RB sends {N1,2} since it has not received any new information about N1. This out-of-date information arrives at RA since it is a one-hop neighbor of RB, and it causes the update of RA’s entry for N1.

(3) RA sends its RIP update message including {N1,3}. This causes an update in the entries for N1 in RB and RC.

(4) RC sends its RIP update message including {N1,4}, which does not cause any update in the neighbors.

(5) RB sends its RIP update message including {N1,4}, which causes RA to increase the metric for this network to 5. At this moment, RA and RB are in mutual deception because each RIP update message sent by one of them causes an increase of the metric for the N1 network in the other.

Figure 5.3: The RIP Count To Infinity

Notice that the behavior of the algorithm is correct in the sense that the network is now at a distance of infinity and in fact, with the successive updates, the metric for the route is slowly increasing to infinity. However, we have a problem: if the metric had no upper bound, the “counting to infinity” would never end and thus the routing databases would never converge. Thus, at this point it should be clear why we have to limit the maximum number of hops of a RIP domain and why “infinity” should be chosen as small as possible. Notice, however, that infinity must be large enough so that no real route is that long. Therefore, the choice of infinity is a trade-off between network size and speed of convergence in case counting to infinity happens. The designers of RIP believed that the protocol was unlikely to be practical for networks with a diameter larger than 15, and thus they decided to set infinity to 16.
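The trade-off in the choice of infinity can be made concrete with a small sketch of two routers in mutual deception. The function below is hypothetical and simplified: it assumes that the two routers use each other as next hop, that they exchange updates strictly in turn, and that the starting metrics are those reached after step (3) of the example.

```python
def count_to_infinity(infinity=16, metric_a=3, metric_b=2):
    """Return the number of update exchanges needed until both metrics reach infinity."""
    rounds = 0
    while metric_a < infinity or metric_b < infinity:
        metric_b = min(metric_a + 1, infinity)  # RB believes its next hop RA
        metric_a = min(metric_b + 1, infinity)  # RA believes its next hop RB
        rounds += 1
    return rounds


print(count_to_infinity(infinity=16))  # 7
print(count_to_infinity(infinity=64))  # 31
```

The number of exchanges grows linearly with the value chosen for infinity, which is why a small value speeds up convergence.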

Split Horizon

The counting to infinity problem that we saw in our previous example is caused by the fact that RA and RB are engaged in a pattern of mutual deception. Each claims to be able to get to N1 via the other. This can be prevented in many cases by being a bit more careful about which information is sent to which neighbors. In general, the idea of providing a “split view” of your available information receives the name of “split horizon”. In the context of networking, split horizon is used to solve several problems and also to provide certain functionalities. In the context of RIP, in its simplest version, the split horizon rule is just to omit routes learned from one neighbor in updates sent to that neighbor. The reason for this is that it is never useful to send information about a certain route to the neighbor that you are using as next hop for the route in question. Let us illustrate this technique with an example (see Figure 5.4).

(1) As shown in Figure 5.4, at a certain moment the last link to N1 is broken. At this moment, RA, which is the only one-hop RIP neighbor of N1, detects this situation and sets the metric for this network to infinity in its routing database.

(2) At a certain moment, RC sends a RIP message with its current information about N1, but in this case RC applies the split horizon rule and thus it sends information about N1 only to RD and not to RA, because RA is the next hop for N1 in RC’s routing database. This avoids contaminating the routing entry for N1 of RA with out-of-date information coming from RC.

(3) RA sends its update to its neighbors RB and RC. After receiving the update, RB and RC update their routing databases because both had RA as next hop router (they apply the “increasing the metric” rule).

(4) Using the simple split horizon rule, RB sends its update to RD. The final result is that RD updates its routing entry for N1 and that all the routers in the RIP domain now know that N1 is unreachable.

The simplest version of split horizon prevents the majority of the situations of mutual deception. However, in the eventual situation in which a router, say RC, thinks that it can get to a network, say N1, via another router, say RD, and RD thinks that it can get to N1 via RC, we have a loop. To solve this situation, the RIP standard proposes a modification in the behavior of split horizon called “split horizon with poisoned reverse”. With poisoned reverse, there is also a “split view” of the routing information available, but the idea is not to omit routes learned from one neighbor in updates sent to that neighbor but to include them with their metrics set to infinity.
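The difference between the two variants can be sketched as a function that builds the distance vector advertised to a given neighbor. The function name build_update, the table layout and the network N2 are hypothetical and used only for illustration.

```python
INFINITY = 16


def build_update(table, neighbor, poisoned_reverse=False):
    """Return the distance vector to advertise to `neighbor`.

    table: dict destination -> (metric, next_hop)
    """
    vector = {}
    for dest, (metric, next_hop) in table.items():
        if next_hop == neighbor:
            if poisoned_reverse:
                vector[dest] = INFINITY  # advertise the route as unreachable
            # Simple split horizon: the route is omitted.
        else:
            vector[dest] = metric
    return vector


rc = {"N1": (2, "RA"), "N2": (1, "D")}  # N2: hypothetical directly connected network
print(build_update(rc, "RA"))                         # {'N2': 1}
print(build_update(rc, "RA", poisoned_reverse=True))  # {'N1': 16, 'N2': 1}
print(build_update(rc, "RD"))                         # {'N1': 2, 'N2': 1}
```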

Figure 5.4: The RIP Split Horizon

Figure 5.5 shows a rather particular way in which we might arrive at a situation of mutual deception although we have activated split horizon. Then, we show how activating split horizon with poisoned reverse might help. For this example, we consider that the packets that convey the RIP update data may suffer delays. These delays are typically due to the process of transmitting the data through the network and also to the time spent by the packets in different queues at the sender and the receiver. Because of these delays, RIP update messages sent to a neighbor are not immediately received and processed by that neighbor. Issues related to delays are illustrated in Figure 5.5 with sloping lines (the different slopes mean different delays between neighbors).

Figure 5.5: The RIP Split Horizon with Poisoned Reverse

Let us describe the situation:

(1) As mentioned, we consider at first that only the simple split horizon is activated. Then, as shown in Figure 5.5, at a certain moment, the last link to N1 is broken. At this moment, RA, which is the only one-hop RIP neighbor of N1, detects this situation and sets the metric for this network to infinity in its routing database.

(2) RA sends RIP update messages containing the entry {N1,16} to neighbors RB and RC. As shown in Figure 5.5, these update messages arrive at the neighbors at different instants of time.


(3) After receiving the update and setting its entry for N1 to infinity, RB sends a message to its neighbor RD. Notice that RB does not send any update message to RA since split horizon is activated.

(4) RD sends an update to RC including {N1,3}, just a little bit before receiving the update for N1 from RB. As split horizon is activated, RD does not send information about N1 to RB. The update from RD is received by RC after the update from RA, and thus, the entry in RC’s database for N1 is {N1,4,RD}.

(5) RC sends an update about N1 to RD just before receiving the update from RA telling that N1 is unreachable. As a result, RD will end with an entry for N1 that is {N1,3,RC}. At this point, RC and RD are in mutual deception. Furthermore, they will not send each other any update message about N1 because of the split horizon rule. Therefore, to clean the route to N1 from the routing tables, we will have to wait for the timeout.

(6) In the final step, we see that if we activate split horizon with poisoned reverse, RD might send an update message to RC and remove the loop with RC before the timeout for the route expires.

Notice that the problems have not ended completely since RD is still claiming to RB that it can reach N1 with 3 hops. This type of situation might generate a pattern of mutual deception between three or more routers. We will discuss this later and show a solution: triggered updates.

Certainly it is rather improbable that with simple split horizon activated two routers end with routes pointing at each other, but it can happen. This might be due to a combination of circumstances when transmitting and processing the packets of the protocol as we have explained, because of some error in the routing software or by any other problem or circumstance. Thus, in general, split horizon with poisoned reverse is safer than simple split horizon because advertising reverse routes with a metric of 16 will break any loop of mutual deception between two routers immediately. If the reverse routes are simply not advertised, the erroneous routes will have to be eliminated by waiting for a timeout.

However, poisoned reverse does have a disadvantage: it increases the size of the routing messages. Consider the case of a backbone connecting a number of different buildings. In each building, there is a router connecting the backbone to a local network (see Figure 5.6). Consider what routing updates those routers should broadcast on the backbone network. All that the rest of the network really needs to know about each router is what local networks it is connected to. Using simple split horizon, only those routes would appear in update messages sent by the router to the backbone network. If split horizon with poisoned reverse is used, the router must mention all routes that it learns from the backbone, with metrics of 16. If the system is large, this can result in a large update message, almost all of whose entries indicate unreachable networks.

In a static sense, advertising reverse routes with a metric of 16 provides no additional information. If there are many routers on one broadcast network, these extra entries can use significant bandwidth. The reason they are there is to improve dynamic behavior. When topology changes, mentioning routes that should not go through the router as well as those that should can speed up convergence. However, in some situations, network managers may prefer to accept somewhat slower convergence in order to minimize routing overhead.

Triggered updates

Split horizon with poisoned reverse will prevent routing loops that involve two routers. However, it is still possible to end up with patterns in which three routers are engaged in mutual deception. An example of such a situation is shown in Figure 5.7.

Figure 5.6: Poisoned Reverse Drawbacks

Figure 5.7: Three Router Loop

Let us describe the situation:

(1) As shown in Figure 5.7, the last link to N1 is broken. RA, which is the only one-hop RIP neighbor of N1, detects this situation and sets the metric for this network to infinity in its routing database.

(2) RA sends an update about N1 to its neighbors RB and RC, and these routers update their routing databases.

(3) RD sends an update. As RD has activated split horizon with poisoned reverse and the routing entry for N1 says that RB is the next hop, RD sends {N1,3} to RC and {N1,16} to RB. The result is that RC is contaminated with out-of-date information despite using the split horizon rules.

(4) Now, if RC sends its update using split horizon with poisoned reverse, it will create a loop of mutual deception between itself, RD and RB. As shown, split horizon cannot stop such a loop. In any case, with the successive updates, the metric for N1 will count to infinity and finally the algorithm will converge. However, this example points out the slow convergence problem that arises when three or more routers are involved in the pattern of mutual deception.

Triggered updates are used to speed up convergence by preventing out-of-date updates from producing patterns of three or more routers in mutual deception. To implement triggered updates, we simply add a rule that whenever a router changes the metric for a route, it is required to send update messages to its neighbors almost immediately, even if it is not yet time for a regular update message.

For example, suppose a router’s route to destination N1 goes through router RA. If an update arrives from RA itself, the receiving router is required to believe the new information, whether the new metric is higher or lower than the old one. If the result is a change in metric, then the receiving router will send triggered updates to all the hosts and routers directly connected to it. They in turn may each send updates to their neighbors. The result is a cascade of triggered updates.

Suppose a router RA times out a route to destination N1. RA will send triggered updates to all of its neighbors. However, the only neighbors who will believe the new information are those whose routes for N1 go through RA. The other routers and hosts will see this as information about a new route that is worse than the one they are already using, and ignore it. The neighbors whose routes go through RA will update their metrics and send triggered updates to all of their neighbors. Again, only those neighbors whose routes go through them will pay attention. Thus, the triggered updates will propagate backwards along all paths leading to router RA, updating the metrics to infinity. This propagation will stop as soon as it reaches a portion of the network whose route to destination N1 takes some other path.

Figure 5.8 shows an example of how triggered updates combine with the rules for computing new metrics and how loops between several routers are avoided. Let us describe the situation:

(1) After the link to N1 is broken, RA sets the metric for this network to infinity in its routing database.

(2) RA sends an update about N1 to its neighbors RB and RC. Since this update message causes a change of metric for the entries of network N1 in RB and RC, these routers must “immediately” send a triggered update for this network.

(3) In particular, we can observe that after RB sends its triggered update and this message is processed, all the routers in the RIP routing domain have already converged to the correct metric (16) for N1.

Figure 5.8: Triggered Updates

Notice that if the system could be made to sit still while the cascade of triggered updates happens, it would be possible to prove that counting to infinity will never happen. Bad routes would always be removed immediately, and so no routing loops could form. Unfortunately, things are not so nice. While the triggered updates are being sent, regular updates may be happening at the same time. Routers that haven’t received the triggered update yet will still be sending out information based on the route that no longer exists. It is possible that after the triggered update has gone through a router, it might receive a normal update from one of these routers that hasn’t yet gotten the word. This could reestablish an out-of-date version of the faulty route. If triggered updates happen quickly enough, this is very unlikely. However, counting to infinity is still possible.

The final aspect of triggered updates to be taken into account is related to performance. Triggered updates can cause excessive load on networks with limited capacity or networks with many routers on them. This is due to the fact that triggered updates cause a lot of network traffic in a short period of time. Therefore, the protocol requires that implementors include some mechanisms to avoid these performance problems. In this respect, the standard proposes two mechanisms:

• The first mechanism is to limit the frequency of triggered updates. After a triggered update is sent, a timer should be set for a random interval between 1 and 5 seconds. If other changes that would trigger updates occur before the timer expires, a single update is triggered when the timer expires. The timer is then reset to another random value between 1 and 5 seconds. Furthermore, a triggered update should be suppressed if a regular update is due by the time the triggered update would be sent.

• The second mechanism says that triggered updates do not need to include the entire routing table. In principle, only those routes which have changed need to be included. Therefore, messages generated as part of a triggered update must include at least those routes that have their route change flag set. They may include additional routes, at the discretion of the implementor; however, sending complete routing updates is strongly discouraged. Split horizon processing is done when generating triggered updates as well as normal updates. The only difference between a triggered update and other update messages is the possible omission of routes that have not changed. The remaining mechanisms must be applied to all updates.
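The first mechanism can be sketched as a small rate limiter. The class and method names below are hypothetical, and the sketch deliberately leaves out the suppression of a triggered update when a regular update is about to be sent.

```python
import random


class TriggeredUpdateLimiter:
    """Coalesce route changes so that triggered updates are at least 1 to 5 s apart."""

    def __init__(self, rng=random):
        self.rng = rng
        self.holdoff_until = 0.0  # no triggered update may be sent before this time
        self.pending = False      # a change arrived while the timer was running

    def route_changed(self, now):
        """Return True if a triggered update should be sent right now."""
        if now < self.holdoff_until:
            self.pending = True   # remember it; one update goes out at expiry
            return False
        self._arm(now)
        return True

    def timer_expired(self, now):
        """Call when the timer expires; True if a deferred update must be sent."""
        send = self.pending
        self.pending = False
        if send:
            self._arm(now)        # the timer is reset to another random value
        return send

    def _arm(self, now):
        self.holdoff_until = now + self.rng.uniform(1.0, 5.0)


limiter = TriggeredUpdateLimiter()
print(limiter.route_changed(0.0))  # True: the first change is sent immediately
print(limiter.route_changed(0.5))  # False: deferred until the timer expires
```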

Hold-Down Timer

The hold-down timer works as follows. Each router starts the hold-down timer when it first receives information about a network that is no longer reachable (RIP distance=16). Until the hold-down timer expires, the router will discard any subsequent update messages that indicate the route is again reachable. A typical hold-down timer ranges from 60 to 120 seconds. The main advantage of the hold-down timer is that a router will not be confused by receiving spurious information about a route being accessible, when it was just recently told that the route was no longer valid. This provides a period of time for out-of-date information to be flushed from the system. However, this has a disadvantage because the hold-down timer forces a delay in a router responding to a route once it is fixed. For example, let us suppose that a network “hiccup” causes a route to go down for five seconds. After the network is up again, the hold-down timer must expire before the router will try to use that network again. This makes routers that use hold-down relatively slow to respond and may lead to delays in accessing networks that fail intermittently.

As a final remark, we have to mention that the hold-down timer is not a standard mechanism but it is implemented by some routers (e.g. it is implemented by Cisco but not by the Quagga [14] implementation of Linux).
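The hold-down rule can be sketched as a filter applied to every advertisement received for a route. The function name and the value of 120 seconds are assumptions chosen for illustration (the text above gives 60 to 120 seconds as typical values).

```python
INFINITY = 16
HOLD_DOWN = 120  # seconds; assumed value within the typical range


def accept_advertisement(hold_down_started, advertised_metric, now):
    """Return True if an advertisement for a route should be processed.

    hold_down_started: time at which the route was first reported unreachable,
                       or None if the route is not in hold-down.
    """
    in_hold_down = (hold_down_started is not None
                    and now - hold_down_started < HOLD_DOWN)
    if in_hold_down and advertised_metric < INFINITY:
        return False  # ignore claims that the route is reachable again
    return True


# The "hiccup" example: the route fails at t=0 and is back at t=5.
print(accept_advertisement(0.0, 2, now=5.0))    # False: still in hold-down
print(accept_advertisement(0.0, 2, now=120.0))  # True: the timer has expired
```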

5.4 RIP Protocol

RIP messages are sent using the User Datagram Protocol (UDP) with UDP port number 520 for RIP-1 and RIP-2, and 521 for RIPng. Notice that even though RIP is considered part of layer three, in terms of message exchange, RIP behaves like an application (using UDP/IP). The format of RIP messages is version-dependent. RIP messages can be either sent to a specific RIP neighbor (unicast), or they can be sent to multiple neighbors (broadcast or multicast). For the three versions of RIP, we have only two basic types of messages:

• RIP Requests. Requests are messages sent by a RIP entity to another RIP entity asking it to send back all or part of its routing table.

• RIP Responses. Responses are messages sent by a RIP entity containing all or part of its routing table. Despite the name “response”, as we have already seen, these messages are sent most of the time without any preceding request.

Next, we describe the message format used by each of the three versions of RIP, as well as certain specific features not common to all versions. We begin with the description of the original RIP, now also known as RIP Version 1. Then, we describe the updated version of RIP called RIP Version 2 or RIP-2 and, finally, we discuss RIPng, the protocol for IP version 6 (IPv6), also called RIPv6. Technically, RIPng is not a new version of the original RIP protocol but a new protocol closely based on RIP versions 1 and 2.

5.4.1 RIP version 1

RIP-1, the original specification of RIP, is defined in RFC 1058 [7] and it uses classful network addresses because the message format of RIP-1 does not consider sending masks. As a result, this protocol lacks support for subnetting or supernetting. Another limitation of RIP-1 is that there is no support for router authentication, making RIP vulnerable to various attacks.

Message Format

The message format of RIP-1 is illustrated in Figure 5.9.

    Bit offsets: 0, 8, 16, 24, 32 (each row is 32 bits wide)

    Header:        Command Type (8 bits) | Version Number = 1 (8 bits) | Zeros (16 bits)
    RIP entry #1:  Address Family Identifier = 2 (16 bits) | Zeros (16 bits)
                   IP Address (32 bits)
                   Zeros (32 bits)
                   Zeros (32 bits)
                   Metric (32 bits)
    RIP entry #2
    ...
    RIP entry #N

Figure 5.9: The RIP-1 message format

As shown in Figure 5.9, a RIP message has three fields in the header: command type, version and “Must Be Zero”. Then, there is one RIP entry for each destination. The command type identifies the type of message: 1 indicates a request and 2 a response. Version in this case is 1. The “Must Be Zero” field is reserved and it must be set to all zeros. Finally, there are from 1 to 25 RIP entries of 20 bytes each, that is, from 20 to 500 bytes of entries. RIP entries contain the actual route information that the message is conveying. Each entry has the following fields:

• Address Family Identifier: (2 bytes). Identifies the type of address in the entry.

• Must Be Zero: (2 bytes).

• IP Address: (4 bytes). The address of the IP destination. This is a network or host address.

• Must Be Zero: (8 bytes). The two reserved 32-bit words shown as “Zeros” in Figure 5.9.

• Metric: (4 bytes). The RIP distance to the IP address or network.

The first thing that comes to mind looking at this message format is that there seems to be a lot of reserved space. This seeming wastefulness is actually an artifact of the generality of the original RIP design. The protocol was intended to be able to support routing for a variety of network-layer protocols (not just IP). So, the Address Family Identifier was included to specify the address type, and RIP entries were made large enough to handle large addresses. As IP only requires 4 bytes per address, some space is not used.
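The layout described above can be checked with a few lines of Python. The following is only a minimal sketch, assuming the field sizes listed in this section; the helper names build_rip1_response and parse_rip1 are hypothetical and do not belong to any RIP implementation.

```python
import socket
import struct

RIP_HEADER = struct.Struct("!BBH")      # command, version, must-be-zero
RIP1_ENTRY = struct.Struct("!HHIIII")   # AFI, zero, IP address, zero, zero, metric

def build_rip1_response(routes):
    """Pack a RIP-1 response from a list of (dotted_ip, metric) pairs."""
    if not 1 <= len(routes) <= 25:
        raise ValueError("a RIP-1 message carries 1 to 25 entries")
    message = RIP_HEADER.pack(2, 1, 0)   # command 2 = response, version 1
    for ip, metric in routes:
        address = struct.unpack("!I", socket.inet_aton(ip))[0]
        message += RIP1_ENTRY.pack(2, 0, address, 0, 0, metric)  # AFI 2 = IP
    return message

def parse_rip1(message):
    """Return (command, version, [(dotted_ip, metric), ...])."""
    command, version, _zero = RIP_HEADER.unpack_from(message, 0)
    entries = []
    for offset in range(RIP_HEADER.size, len(message), RIP1_ENTRY.size):
        _afi, _z1, address, _z2, _z3, metric = RIP1_ENTRY.unpack_from(message, offset)
        entries.append((socket.inet_ntoa(struct.pack("!I", address)), metric))
    return command, version, entries

msg = build_rip1_response([("192.168.1.0", 1), ("192.168.5.0", 2)])
print(len(msg))          # 4-byte header plus two 20-byte entries
print(parse_rip1(msg))
```

Of the 20 bytes of each entry, only the 4-byte address and the metric carry routing information for IP, which is the unused space discussed above.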

UDP/IP Parameters

RIP messages are sent using the UDP/IP network. Regarding the IP layer, the RIP-1 entity can select a unicast transmission by setting the destination IP address of the neighbor, or a broadcast transmission by setting the universal broadcast IP address 255.255.255.255. Regarding the UDP layer, RIP-1 uses the UDP reserved port number 520. The UDP port numbers in RIP-1 are used as follows:

• RIP Request messages are sent to UDP destination port 520. They may have a source port of 520 or may use an ephemeral port number.

• RIP Response messages sent in reply to a RIP Request are sent with a source port of 520, and a destination port equal to whatever source port the RIP Request used.

• Unsolicited RIP Response messages (sent on a routine basis and not in response to a request) are sent with both the source and destination ports set to 520.
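The three port rules above can be summarised in a small function. This is an illustrative sketch only; response_ports is a hypothetical helper name.

```python
RIP_PORT = 520

def response_ports(request_source_port=None):
    """Return (source_port, destination_port) for a RIP-1 response.

    request_source_port is the source port of the request being answered,
    or None for an unsolicited (periodic or triggered) response.
    """
    if request_source_port is None:
        return RIP_PORT, RIP_PORT           # unsolicited: 520 -> 520
    return RIP_PORT, request_source_port    # reply goes back to the requester's port

print(response_ports())         # unsolicited response
print(response_ports(40000))    # reply to a request sent from an ephemeral port
```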


5.4.2 RIP version 2

RIP-2 represents a very modest change to the basic Routing Information Protocol. The new features introduced in RIP-2 are described as “extensions” to the basic protocol. The five RIP-2 extensions are:

• Classless Addressing Support and Subnet Mask Specification. RIP-2 adds explicit support for subnets by allowing a subnet mask within the route entry for each network address. RIP-2 provides support for fixed-length subnet masking (FLSM), variable-length subnet masking (VLSM) and classless addressing (CIDR).

• Use of Multicasting. To help reduce network load, RIP-2 allows routers to be configured to use multicast with the address 224.0.0.9.

• Next Hop Specification. Each RIP-2 entry can carry the immediate next hop IP address to which packets to the destination specified by this route entry should be forwarded. Specifying a value of 0.0.0.0 in this field indicates that routing should be via the originator of the RIP advertisement. An address specified as a next hop must, per force, be directly reachable on the logical subnet over which the advertisement is made. The purpose of the Next Hop field is to eliminate packets being routed through extra hops in the system. It is particularly useful when RIP is not being run on all of the routers on a network. Note that Next Hop is an “advisory” field. That is, if the provided information is ignored, a possibly sub-optimal, but absolutely valid, route may be taken. If the received Next Hop is not directly reachable, it should be treated as 0.0.0.0.

• Authentication. RIP-2 provides an optional authentication scheme, which allows a router to ascertain the identity of another router before accepting RIP messages from it.

• Route Tag. Each RIP-2 entry includes a Route Tag field, where additional information about a route can be stored. This information is propagated along with other data about the route.

[Figure 5.10: RIP Version 2 (RIP-2) Message Format. Each row is 32 bits wide (bit marks at 0, 8, 16, 24 and 32). Header: Command Type, Version Number = 2, Must be zero. RTE #1: Address Family Identifier = 2, Route Tag; IP Address; Subnet Mask; Next Hop; Metric. It is followed by Route Table Entry (RTE) #2 to Route Table Entry (RTE) #N.]

Message Format

As you can observe in Figure 5.10, the basic message format for RIP-2 is pretty much the same as it was for RIP-1, with the Version field set to 2 to identify the message as being RIP-2. For compatibility, RIP-2 uses the same basic message format as RIP-1, putting the extra information required for its new features into some of the unused fields of the RIP-1 message format. RIP Entries are now called Route Table Entries (RTEs). As in RIP-1, a message carries from 1 to 25 RTEs (from 20 to 500 bytes), but when authentication is used, one of the RTEs contains authentication information, limiting the message to 24 “real” RTEs (for further information, see RFC 2082 [15]). Each RTE is 20 bytes long and has the following subfields:

• Address Family Identifier: (2 bytes). Same meaning as for RIP-1.

• Route Tag: (2 bytes). Additional information for the route.

• IP Address: (4 bytes). The IP address of the RIP destination network or host.

• Subnet Mask: (4 bytes). The IP netmask of the RIP destination network or host.

• Next Hop: (4 bytes). The IP address of the next hop router for the RIP destination.

• Metric: (4 bytes). The RIP distance to the destination.

As you can observe, the unused fields allow the new RIP-2 features to be implemented without changing the basic structure of the RIP entry format. This allows RIP-1 and RIP-2 messages and devices to coexist in the same network. A RIP-2 device can handle both RIP-1 and RIP-2 messages, and will look at the version number to see which version the message is. A RIP-1 device should handle both RIP-2 and RIP-1 messages the same way, simply ignoring the extra RIP-2 fields it doesn’t understand.
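The coexistence argument can be illustrated by decoding the same 20 bytes in two ways. This is a minimal sketch, assuming the RTE layout listed above; pack_rte, read_as_rip2 and read_as_rip1 are hypothetical helper names, not the API of any routing daemon.

```python
import ipaddress
import struct

# AFI, route tag, IP address, subnet mask, next hop, metric (20 bytes in total)
RIP2_RTE = struct.Struct("!HH4s4s4sI")

def pack_rte(network, next_hop="0.0.0.0", metric=1, tag=0):
    """Pack one RIP-2 RTE for an IPv4 prefix such as '192.168.0.128/25'."""
    net = ipaddress.IPv4Network(network)
    return RIP2_RTE.pack(2, tag, net.network_address.packed, net.netmask.packed,
                         ipaddress.IPv4Address(next_hop).packed, metric)

def read_as_rip2(rte):
    """Decode every field, as a RIP-2 router would."""
    _afi, tag, address, mask, next_hop, metric = RIP2_RTE.unpack(rte)
    prefix = ipaddress.IPv4Network(
        f"{ipaddress.IPv4Address(address)}/{ipaddress.IPv4Address(mask)}")
    return prefix, str(ipaddress.IPv4Address(next_hop)), metric, tag

def read_as_rip1(rte):
    """Decode only the fields RIP-1 knows, skipping tag, mask and next hop."""
    _afi, _tag, address, _mask, _next_hop, metric = RIP2_RTE.unpack(rte)
    return str(ipaddress.IPv4Address(address)), metric

rte = pack_rte("192.168.0.128/25", metric=2)
print(read_as_rip2(rte))   # prefix with its /25 mask, next hop, metric and tag
print(read_as_rip1(rte))   # bare address and metric: the mask is lost
```

The second call shows why a RIP-1 reader cannot tell a subnet from a host or classful network address: the mask sits in a field that RIP-1 treats as reserved.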

UDP/IP Parameters

RIP-2 messages are exchanged using the same basic mechanism as RIP-1 messages, that is to say, using the UDP/IP network. However, to help reduce the network load, RIP-2 allows routers to be configured to use multicast instead of broadcast for sending out unsolicited RIP Response messages. In this case, UDP/IP datagrams are sent out using the special reserved multicast address 224.0.0.9. All routers in a RIP-2 domain must use multicast for this feature to work properly.

5.4.3 RIPng

RIPng is the version of RIP for IPv6. RIPng, which is also occasionally referred to as RIPv6, was designed to be as similar as possible to the current version of RIP for IPv4, which is RIP Version 2 (RIP-2). In fact, RFC 2080 [9], the standard that describes RIPng, says that RIPng represents the minimum change possible to RIP to allow it to work on IPv6. Despite this effort, it was not possible to define RIPng as just a new version of the older RIP protocol because of the change in the length of the addresses: from 32 bits in IPv4 to 128 bits in IPv6. This forced a new message format for RIPng. The main differences between RIPv2 and RIPng are:

• Support of IPv6 networking.

• The maximum number of RTEs in RIPng is not restricted to 25 as it is in RIP-2 (and also RIP-1). It is limited only by the maximum transmission unit (MTU) of the network over which the message is being sent.

• While RIPv2 supports authentication, RIPng does not include its own authentication mechanism. It is assumed that if authentication and/or encryption are needed, they will be provided using the standard IPSec features defined for IPv6 at the IP layer. This is more efficient than implementing authentication for each individual protocol.

• RIPv2 allows attaching arbitrary tags to routes, RIPng does not.


• RIPv2 encodes the next hop into each route entry, whereas RIPng requires a specific encoding of the next hop for a set of route entries. Due to the large size of IPv6 addresses, including a Next Hop field in the format of RIPng RTEs would almost double the size of every entry. Since Next Hop is an optional feature, this would be wasteful. Instead, when a Next Hop is needed, it is specified in a separate routing entry.

Message Format

The message format for RIPng is similar to that of RIP-1 and RIP-2, except for the format of the RouteTable Entries (see Figure 5.11).

[Figure 5.11: RIPng Message Format. Each row is 32 bits wide (bit marks at 0, 8, 16, 24 and 32). Header: Command Type, Version Number = 1, Must be zero. RTE #1: IPv6 Prefix (128 bits); Route Tag, Prefix Length, Metric. It is followed by Route Table Entry (RTE) #2 to Route Table Entry (RTE) #N.]

The Version Number is set to 1 (not 6), since this is the first version of the new protocol RIPng. The number of Route Table Entries (RTEs) is variable, because the limit of 25 entries per message has been eliminated. Each RTE is 20 bytes long and has the following subfields:

• IPv6 Prefix: (16 bytes). The 128-bit IPv6 address of the network.

• Route Tag: (2 bytes). Additional information to be carried with this route.

• Prefix Len: (1 byte). The number of bits of the IPv6 address that form the network portion (analogous to an IPv4 subnet mask).

• Metric: (1 byte). The RIP distance for the network indicated by the IPv6 prefix.

UDP/IP Parameters

RIPng uses multicast for transmissions, using the reserved IPv6 multicast address FF02::9. Since RIPng is a new protocol, it cannot use the same UDP reserved port number 520 used for RIP-1/RIP-2. Instead, RIPng uses well-known port number 521. The semantics for the use of this port are the same as those used for port 520 in RIP-1 and RIP-2.
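As a rough illustration of the RIPng entry format and of the MTU-based limit on the number of RTEs, consider the following sketch. It assumes a 40-byte IPv6 header without extension headers, an 8-byte UDP header and a 4-byte RIPng header; the function names are hypothetical. The next hop entry follows the convention of RFC 2080, which marks it with a metric of 0xFF.

```python
import ipaddress
import struct

RIPNG_RTE = struct.Struct("!16sHBB")   # IPv6 prefix, route tag, prefix length, metric

def pack_ripng_rte(prefix, metric, tag=0):
    """Pack one RIPng RTE for a prefix such as '2001:db8:1::/48'."""
    net = ipaddress.IPv6Network(prefix)
    return RIPNG_RTE.pack(net.network_address.packed, tag, net.prefixlen, metric)

def pack_next_hop_rte(next_hop):
    """Pack the separate next hop RTE: tag and prefix length 0, metric 0xFF."""
    return RIPNG_RTE.pack(ipaddress.IPv6Address(next_hop).packed, 0, 0, 0xFF)

def max_rtes(mtu, ipv6_header=40, udp_header=8, ripng_header=4):
    """Number of 20-byte RTEs that fit in one message for the given MTU."""
    return (mtu - ipv6_header - udp_header - ripng_header) // RIPNG_RTE.size

print(len(pack_ripng_rte("2001:db8:1::/48", 1)))   # size of one RTE in bytes
print(max_rtes(1500))                              # RTEs per message on Ethernet
```

Under these assumptions, a single RIPng message on a link with an MTU of 1500 bytes can carry many more routes than the 25 allowed by RIP-1 and RIP-2.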


5.5 Limitations of RIP

The RIP protocol does not solve every possible routing problem. RIP is primarily intended for use as an IGP in networks of moderate size. In addition, according to the standard, the following specific limitations are to be mentioned:

• The protocol is limited to networks whose longest path (the network’s diameter) is 15 hops. The designers of RIP believe that the basic protocol design is inappropriate for larger networks. Note that this statement of the limit assumes that a cost of 1 is used for each network. This is the way RIP is normally configured. If the system administrator chooses to use larger costs, the upper bound of 15 can easily become a problem.

• RIP depends upon “counting to infinity” to resolve certain unusual situations. If the system of networks has several hundred networks, and a routing loop was formed involving all of them, the resolution of the loop would require either much time (if the frequency of routing updates were limited) or bandwidth (if updates were sent whenever changes were detected). Such a loop would consume a large amount of network bandwidth before the loop was corrected. However, various precautions are taken that should prevent these problems in most cases.

• RIP uses fixed “metrics” to compare alternative routes. It is not appropriate for situations where routes need to be chosen based on real-time parameters such as measured delay, reliability, or load. The obvious extensions to allow metrics of this type are likely to introduce instabilities of a sort that the protocol is not designed to handle.
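The first limitation can be made concrete with a short calculation. The sketch below assumes that the metric of a path is the sum of the costs of the networks it crosses, capped at 16 (infinity); path_metric and is_reachable are hypothetical names.

```python
INFINITY = 16   # RIP metric that means "unreachable"

def path_metric(costs):
    """Add the cost of each network on the path, saturating at infinity."""
    metric = 0
    for cost in costs:
        metric = min(metric + cost, INFINITY)
    return metric

def is_reachable(costs):
    return path_metric(costs) < INFINITY

print(path_metric([1] * 15), is_reachable([1] * 15))   # longest usable path with cost 1
print(path_metric([1] * 16), is_reachable([1] * 16))   # one network more is unreachable
print(path_metric([3] * 6), is_reachable([3] * 6))     # with cost 3, six networks are too many
```

With a cost of 3 per network the diameter shrinks to five networks, which is why larger costs make the bound of 15 a problem so quickly.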

[Figure 5.12: Scenario for testing RIP. The scenario contains the hosts h11, h33 and h223, the routers r1, r222, r3, r4 and r5, and the switches SW0 to SW7. The RIP networks are 172.16.0.0/16, 192.168.0.0/24, 192.168.0.128/25, 192.168.1.0/24, 192.168.2.0/24, 192.168.3.0/24, 192.168.4.0/24 and 192.168.5.0/24. The external networks (Internet) are labelled 10.10.10.10/24 and 10.20.20.20/24.]

5.6 RIP Practices

Note. Read Chapter 6 before starting with these practices.

Exercise 1 – The goal of this practice is to study RIP version 1 (RIPv1). Remember that RIPv1 is “classful”. The networking scheme for the exercise is shown in Figure 5.12. Interfaces are initially configured with an IP address whose IP HostID is set to the number in the name of each machine. For example, the network interface eth1 of r222 is set to 192.168.3.222. Also, tapX interfaces are configured in the physical host, where X is the number of the switch. The hosts have a default route to their corresponding routers. Start the simulation by executing the following command:

host:~# simctl rip start

After the scenario has been started, you have to load the initial configuration. Execute the initial label with the following command:

phyhost$ simctl rip exec initial

In this exercise, we will only use the hosts (h11, h223 and h33) and the routers r1, r222 and r3.

1. initial Execute a ping from router r3 to 192.168.2.1, 192.168.3.1 and 192.168.4.11. Discuss the results.

2. Start a capture on tap2. In r3, open the Quagga command tool and add the networks 192.168.1.0/24 and 192.168.2.0/24 to RIP:

root@r3:~# vtysh
r3# configure terminal
r3(config)# router rip
r3(config-router)# version 1
r3(config-router)# network 192.168.1.0/24
r3(config-router)# network 192.168.2.0/24

Explain the RIP response messages that you observe in tap2. In your explanation include the MAC addresses (L2), IP addresses (L3), ports (L4) and the RIP information.


3. Capture on tap2 and type the following RIP command in r3:

r3(config-router)# neighbor 192.168.2.1

Describe what the command does and explain why you receive an error message (ICMP) from 192.168.2.1. To finish this exercise, disable the neighbor with:

r3(config-router)# no neighbor 192.168.2.1

4. Capture in tap2 and then, in r3, set eth3 down:

root@r3:~# ifconfig eth3 down

Describe the RIP response messages captured, waiting for at least 2 minutes.

5. In r3, set eth3 up. Describe and explain the RIP response messages that you capture on tap2 during at least 30 seconds.

6. In r3, remove the network 192.168.1.0/24 from RIP.

Describe the RIP response messages that you observe in tap2. Wait for at least 2 minutes before ending the capture.

Finally, in r3, remove also the network 192.168.2.0/24 from RIP.

7. initial Capture in all the networks to which r1 is connected and, in this router, type the following RIP command:

r1(config-router)# network 192.168.0.0/16

In which networks do you see a RIP response packet? Why?

Describe the RIP messages including L2 MAC addresses, L3 IP addresses, L4 ports and the RIP information.

8. Do you think that it makes sense to send RIP response messages through the 192.168.4.0/24 network? For example, does it make sense to send RIP response messages to h11?

9. Capture on tap2 and type the following command in r1:

r1(config-router)# no network 192.168.0.0/16

Wait for a few seconds. Do you see any RIP message? Why?

Then, in r1, in less than 2 minutes, activate RIP for the networks 192.168.2.0/24 and 192.168.3.0/24.

Explain the RIP messages captured on tap2.

10. Capture on tap2 and tap3. In r3, start RIPv1 for the network 192.168.5.0/24.

Do you observe RIP messages on tap2 from r3? why?

Now, in r3, start RIPv1 for the network 192.168.2.0/24.

For the captured traffic, explain the networks that you observe in the RIP response messages. In addition, explain whether triggered updates, split horizon and poison reverse are activated in r1 and how you can know this.

11. initial, ripv1-a For your information, currently RIPv1 has been activated as follows:

r1.v1.eth2.192.168.2.0/24.active
r1.v1.eth3.192.168.3.0/24.active
------------------------------------
r3.v1.eth1.192.168.5.0/24.active
r3.v1.eth2.192.168.2.0/24.active

To set this configuration, you can execute the simctl labels initial and then ripv1-a.

Try a ping from r3 to 192.168.3.1.

Does it work? Why?

Try a ping from r3 to 192.168.4.11.

Does it work? Why?

Explain the RIB and FIB of the router r3.

• The RIB entries for RIP can be shown entering in Quagga and typing:

router# show ip rip

• The FIB/RIB entries for all the protocols can be shown in Quagga with the command:

router# show ip route

You can also see the FIB on a Linux command line by typing:

root@r3:~# route -n

12. Capture on tap4 and set the interface eth4 in r1 as passive. Then, add the network 192.168.4.0/24 to RIP and try again a ping from r3 to 192.168.4.11.

Does it work now? why?

Do you see RIP messages on tap4? Why?

Explain the RIB and FIB of the router r3.

13. initial, ripv1-b Capture on tap3 and tap5.

In r3, add 192.168.5.0/24 to RIP and also add 192.168.0.0/24 to RIP but as passive.

In r222, add 192.168.3.0/24 and 192.168.5.0/24 to RIP.

For your information, currently RIPv1 has been activated as follows (label ripv1-b):

r1.v1.eth2.192.168.2.0/24.active
r1.v1.eth3.192.168.3.0/24.active
r1.v1.eth4.192.168.4.0/24.passive
------------------------------------
r222.v1.eth1.192.168.3.0/24.active
r222.v1.eth2.192.168.5.0/24.active
------------------------------------
r3.v1.eth1.192.168.5.0/24.active
r3.v1.eth2.192.168.2.0/24.active
r3.v1.eth3.192.168.1.0/24.active
r3.v1.eth4.192.168.0.0/24.passive

Explain the RIP messages sent by r222 and its RIP RIB.

Should a ping from r222 to 172.16.0.1 work?

14. Another way of including routes in RIP is to use “redistribution”. Redistribute the connected networks of r1 and r222 using the following Quagga command:

router(config-router)# redistribute connected

Describe the entries (if any) of the networks 172.16.0.0/16 and 192.168.0.128/25 present in the RIB of r1 and r222. Observing the RIB of the different routers, explain if you notice any differences between the networks distributed with redistribution and the networks distributed with the network command.

15. Try a ping from r222 to 172.16.0.1. Does it work this time? Why? Why do you think that 192.168.0.128/25 has not been redistributed? To give you a clue, recall that 172.16 is a class B network. Try changing the IP address of the eth1 of r1 to 172.16.0.1/24. Is this network redistributed now? Why? When you finish this exercise, restore the IP address on the eth1 of r1 to 172.16.0.1/16.

16. Start a capture on tap4 and tap7. Try a ping from h223 to h11. Does it work? Discuss the result, explaining the messages that you observe (including ICMP and ARP) on each tap interface.

Describe the final configuration after this exercise as:

router.(v1/v2).interface.network.(active/passive/redistr/failed)

17. (*) Set eth4 down on r1 and r3. After 10 seconds, analyse the FIB of r222. Describe the information that you see about these networks and explain the reason.

Set eth4 up on r1 and r3. Then, on r1 type no network 192.168.4.0/24 and on r3 type no network 192.168.0.0/24. After 10 seconds, observe the FIB of r222. Describe the information that you see about these networks and explain the reason. Add again 192.168.4.0/24 and 192.168.0.0/24 to RIP in their respective routers.

Exercise 2 – The goal of this exercise is to test more features of RIP, including its version 2 (RIPv2). The networking scheme for the exercise is the same as in the previous exercise (Figure 5.12).

1. initial,ripv2-a Start a capture on tap2. We start with the configuration of the previous exercise, but this time with RIP version 2 in r1, r222 and r3. You can do this using the configuration of the previous exercise and setting version 2 with Quagga, or by executing sequentially the labels initial and ripv2-a. After approximately 10 seconds, try a ping from h223 to h11.

Does it work now? Why?

Explain the entries in the RIB of the router r3 and the RIP messages that you observe in tap2.

2. Capture in tap2 and add all the interfaces of r4 to RIP with the Quagga network command.

Then, using the command ip as explained in Chapter 6, set an IP address of the form 192.168.100.X/32 in each loopback interface of each RIP router (r1, r222, r3 and r4), where X is the number of the router (192.168.100.1 for r1 and so on). Then, add these IP addresses to RIP and check that now you can reach any router from h11 using the loopback addresses.

Describe the commands introduced in each router and the responses sent through tap2 by r1 in the steady state.

Explain also the entries in the RIB of the routers r1 and r4.

3. initial,ripv2-a,ripv2-b For the rest of the exercise, we simulate a link has been broken.

Capture in tap3 and set eth2 down in r3.

After the link is down, describe the responses that you observe in tap3 during at least 5 minutes. In addition, describe the entries in the RIB of the router r1.

4. Try a ping from h11 to the loopback address of r3. Does it work? Which is the path that ICMP messages follow? Under the current network state (link eth2 of r3 is down), is this path optimal?

Note. You can use the traceroute command.

Now, try a ping from h11 to the loopback address of r4. Does it work? Which is the path that ICMP messages follow? Under the current network state, is this path optimal? Why? Can you change the RIP configuration of any router to make this path more efficient (i.e. use a shorter path)?

5. In r1, start RIP for the network 172.16.0.0/16, configure a static route to the network 10.10.10.0/24 and redistribute static routes.

Try a ping from h33 to 10.10.10.10 and describe the path. Is it optimal? Why? Discuss how the network 10.10.10.0/24 is announced by r1 through eth1 and eth3.

6. initial,ripv2-a,ripv2-b,ripv2-c In this exercise, we deal with default routes and how they can be included in a RIP domain. In the first place, explain in which RIP routers of our topology you can originate a default route and why. Then, select one of these routers, create a default route and originate it for RIP.

Capture in tap6 and describe the traffic captured when you send a ping from h33 and from h11 to the host in the Internet 10.20.20.20. Use three requests for each ping (option -c3). Discuss the path that the ICMP messages follow.

Can you improve the configuration with more routers originating the default route in RIP? If so, explain and test your configuration.

5.7 Answers to practices

Exercise 1

1. The direct routes are configured, so the ping from r1 to 192.168.2.3 works but not the pings to 192.168.5.3 and 192.168.0.33, because they are not directly connected to r1. We get “Network is unreachable”.

2. We type the following:

root@r1:~# vtysh
r1# configure terminal
r1(config)# router rip
r1(config-router)# version 1
r1(config-router)# network 192.168.1.0/24
r1(config-router)# network 192.168.2.0/24

We observe in tap2:

MAC: src IF address dst ff:ff:ff:ff:ff:ff
IP: src 192.168.2.3 dst 192.168.2.255
UDP: src and dst 520
RIP: 192.168.1.0 Metric 1

3. This command serves to use the unicast address of the neighbor instead of the broadcast address (RIPv1) or multicast address (RIPv2). We send a unicast RIP message but, as RIP is not yet activated in r1, we obtain an ICMP Destination unreachable (Port unreachable) message from 192.168.2.1 to 192.168.2.3.

4. We type:

root@r3:~# ifconfig eth3 down

We see five or six RIP responses (120 seconds) with:

RIP: 192.168.1.0 Metric 16

The first response is due to the triggered update. The other 4 or 5 responses are related to the fact that the garbage collection timer is set by default to 120 seconds. Recall that the garbage collection timer is the time that the route stays in the RIP database with metric=16 (infinity).

5. We type:

root@r3:~# ifconfig eth3 up

We see again RIP response messages including:

RIP: 192.168.1.0 Metric 1

6. We type:

no network 192.168.1.0/24

The same happens as when we set the interface down:

RIP: 192.168.1.0 Metric 16

Finally, we remove eth2 from RIP:

no network 192.168.2.0/24

Obviously, we don’t see any message on tap2.

7. We type the following:

r1(config-router)# network 192.168.0.0/16

We can see RIP messages in tap2, tap3 and tap4 but not on tap6. This is because the network 172.16.0.0/16 is not included in the network command. r1 announces on:

tap2

MAC: src IF address dst ff:ff:ff:ff:ff:ff
IP: src 192.168.2.1 dst 192.168.2.255
UDP: src and dst 520
RIP:
192.168.3.0 Metric 1
192.168.4.0 Metric 1

tap3

MAC: src IF address dst ff:ff:ff:ff:ff:ff
IP: src 192.168.3.1 dst 192.168.3.255
UDP: src and dst 520
RIP:
192.168.2.0 Metric 1
192.168.4.0 Metric 1

8. It makes no sense to send RIP responses through the 192.168.4.0/24 network because there is no router there that understands RIP.

9. We type:

r1(config-router)# no network 192.168.0.0/16

As we have completely disabled RIP, we do not see any message on tap2.

Then, we type:

r1(config-router)# network 192.168.2.0/24
r1(config-router)# network 192.168.3.0/24

When we activate RIP in eth2 and eth3, we see on tap2 RIP responses announcing:

192.168.3.0 Metric 1
192.168.4.0 Metric 16

This is because 192.168.4.0 is now unreachable for RIP but it is still in the RIP routing table.

10. We type:

root@r3:~# vtysh
r3# configure terminal
r3(config)# router rip
r3(config-router)# version 1
r3(config-router)# network 192.168.5.0/24

We do not observe any message from r3 on tap2 because RIP is not activated on this interface. Then, in r3 we activate RIP in eth2:

r3(config-router)# network 192.168.2.0/24

Triggered updates are activated:

On tap2, 192.168.2.3 (r3) announces 192.168.5.0/24.
On tap3, 192.168.3.1 (r1) announces 192.168.5.0/24 (metric 2) alone when it hears this network on tap2.

Therefore, we can conclude that triggered updates are activated on r1.

In the regular updates:

On tap2, 192.168.2.3 (r3) announces 192.168.5.0/24.
On tap2, 192.168.2.1 (r1) announces 192.168.3.0/24 (metric 1). Since r1 does not announce the 192.168.5.0/24 network to r3, we can conclude that split horizon is activated. Notice also that poisoned reverse is not activated, since r1 would otherwise announce 192.168.5.0/24 back to r3 with metric 16.
On tap3, 192.168.3.1 (r1) announces 192.168.2.0/24 (metric 1) and 192.168.5.0/24 (metric 2).
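The reasoning used to detect split horizon and poisoned reverse in the captures can be sketched in Python. This is a simplified model and not the code of the Quagga daemon: the RIB is a hypothetical dictionary that maps each network to its metric, its interface and a flag telling whether the route was learned through RIP, and build_update is an illustrative name.

```python
INFINITY = 16

def build_update(rib, out_iface, poisoned_reverse=False):
    """Return {network: metric} as announced through out_iface."""
    update = {}
    for network, (metric, iface, learned) in rib.items():
        if iface != out_iface:
            update[network] = metric          # normal announcement
        elif learned and poisoned_reverse:
            update[network] = INFINITY        # announce the reverse path as unreachable
        # otherwise split horizon: say nothing about routes of this interface
    return update

# RIB of r1 in answer 10: two connected networks and one route learned from r3
r1_rib = {
    "192.168.2.0": (1, "eth2", False),
    "192.168.3.0": (1, "eth3", False),
    "192.168.5.0": (2, "eth2", True),
}
print(build_update(r1_rib, "eth2"))                         # what we capture on tap2
print(build_update(r1_rib, "eth3"))                         # what we capture on tap3
print(build_update(r1_rib, "eth2", poisoned_reverse=True))  # tap2 if poisoned reverse were on
```

In this model, the absence of 192.168.5.0 in the update sent through eth2 corresponds to plain split horizon, while an entry with metric 16 would indicate poisoned reverse.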

11. The first ping now works:

root@r1:~# ping -c1 192.168.5.3
PING 192.168.5.3 (192.168.5.3) 56(84) bytes of data.
64 bytes from 192.168.5.3: icmp_req=1 ttl=64 time=0.940 ms

--- 192.168.5.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.940/0.940/0.940/0.000 ms

The second ping from r3 fails because 192.168.4.0/24 is not under RIP:

root@r3:~# ping -c1 192.168.4.11
connect: Network is unreachable

FIB of r3

r3# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
       I - ISIS, B - BGP, > - selected route, * - FIB route

C>* 127.0.0.0/8 is directly connected, lo
C>* 192.168.0.0/24 is directly connected, eth4
C>* 192.168.1.0/24 is directly connected, eth3
C>* 192.168.2.0/24 is directly connected, eth2
R>* 192.168.3.0/24 [120/2] via 192.168.2.1, eth2, 00:33:34
C>* 192.168.5.0/24 is directly connected, eth1

root@r3:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth4
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth3
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth2
192.168.3.0     192.168.2.1     255.255.255.0   UG    2      0        0 eth2
192.168.5.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1

RIB of r3

r3# show ip rip
Codes: R - RIP, C - connected, S - Static, O - OSPF, B - BGP
Sub-codes:
      (n) - normal, (s) - static, (d) - default, (r) - redistribute,
      (i) - interface

     Network           Next Hop       Metric From                Tag Time
C(i) 192.168.2.0/24    0.0.0.0        1      self                0
R(n) 192.168.3.0/24    192.168.2.1    2      192.168.2.1         0   02:56
C(i) 192.168.5.0/24    0.0.0.0        1      self                0


The metric of the connected networks shown with route -n is 0 because the Quagga RIP daemon has not installed these routes in the FIB. However, you see the correct metric (1) in the RIB of RIP. The flags UG mean that the route is usable (U) and needs a gateway (G), that is, a router.

12. To make the interface passive:

r1(config-router)# passive-interface eth4
r1(config-router)# network 192.168.4.0/24

Now the ping works because 192.168.4.0/24 is announced by r1. There are no RIP messages on tap4 because the interface is passive.

RIB of r3


     Network           Next Hop       Metric From                Tag Time
C(i) 192.168.2.0/24    0.0.0.0        1      self                0
R(n) 192.168.3.0/24    192.168.2.1    2      192.168.2.1         0   02:55
R(n) 192.168.4.0/24    192.168.2.1    2      192.168.2.1         0   02:59
C(i) 192.168.5.0/24    0.0.0.0        1      self                0

FIB of r3

r3# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
       I - ISIS, B - BGP, > - selected route, * - FIB route

C>* 127.0.0.0/8 is directly connected, lo
C>* 192.168.0.0/24 is directly connected, eth4
C>* 192.168.1.0/24 is directly connected, eth3
C>* 192.168.2.0/24 is directly connected, eth2
R>* 192.168.3.0/24 [120/2] via 192.168.2.1, eth2, 00:45:24
R>* 192.168.4.0/24 [120/2] via 192.168.2.1, eth2, 00:01:41
C>* 192.168.5.0/24 is directly connected, eth1

13. Activate the RIP as mentioned:

r3# configure terminal
r3(config)# router rip
r3(config-router)# version 1
r3(config-router)# passive-interface eth4
r3(config-router)# network 192.168.0.0/24
r3(config-router)# network 192.168.1.0/24

r222# configure terminal
r222(config)# router rip
r222(config-router)# version 1
r222(config-router)# network 192.168.3.0/24
r222(config-router)# network 192.168.5.0/24

RIPv1 has been activated as follows:

r1.eth2.192.168.2.0/24.active
r1.eth3.192.168.3.0/24.active
r1.eth4.192.168.4.0/24.passive
------------------------------------
r222.eth1.192.168.3.0/24.active
r222.eth2.192.168.5.0/24.active
------------------------------------
r3.eth1.192.168.5.0/24.active
r3.eth2.192.168.2.0/24.active
r3.eth3.192.168.1.0/24.active
r3.eth4.192.168.0.0/24.passive

The RIP responses of r222 on tap3:

192.168.0.0 Metric 2
192.168.1.0 Metric 2
192.168.5.0 Metric 1

The RIP responses of r222 on tap5:

192.168.2.0 Metric 2
192.168.3.0 Metric 1
192.168.4.0 Metric 2

The routing tables are:

     Network           Next Hop       Metric From                Tag Time
R(n) 192.168.0.0/24    192.168.5.3    2      192.168.5.3         0   02:48
R(n) 192.168.1.0/24    192.168.5.3    2      192.168.5.3         0   02:48
R(n) 192.168.2.0/24    192.168.3.1    2      192.168.3.1         0   02:54
C(i) 192.168.3.0/24    0.0.0.0        1      self                0
R(n) 192.168.4.0/24    192.168.3.1    2      192.168.3.1         0   02:54
C(i) 192.168.5.0/24    0.0.0.0        1      self                0

Notice that as r222 has learned 192.168.2.0/24 from r1, it can send this route to r3 (split horizon).

A ping from r222 to 172.16.0.1 should NOT work because 172.16.0.0/16 is not currently under RIP.

14. We type redistribute connected on r1 and r222.

Now, the RIP routing tables on r1 and r222 are:

r1# show ip rip
Codes: R - RIP, C - connected, S - Static, O - OSPF, B - BGP
      (n) - normal, (s) - static, (d) - default, (r) - redistribute,
      (i) - interface

     Network           Next Hop       Metric From                Tag Time
C(r) 172.16.0.0/16     0.0.0.0        1      self (connected:1)  0
R(n) 192.168.0.0/24    192.168.2.3    2      192.168.2.3         0   02:46
R(n) 192.168.1.0/24    192.168.2.3    2      192.168.2.3         0   02:46
C(i) 192.168.2.0/24    0.0.0.0        1      self                0
C(i) 192.168.3.0/24    0.0.0.0        1      self                0
C(i) 192.168.4.0/24    0.0.0.0        1      self                0
R(n) 192.168.5.0/24    192.168.3.222  2      192.168.3.222       0   02:45

r222# show ip rip
     Network           Next Hop       Metric From                Tag Time
R(n) 172.16.0.0/16     192.168.3.1    2      192.168.3.1         0   02:33
R(n) 192.168.0.0/24    192.168.5.3    2      192.168.5.3         0   02:55
C(r) 192.168.0.128/25  0.0.0.0        1      self (connected:1)  0
R(n) 192.168.1.0/24    192.168.5.3    2      192.168.5.3         0   02:55
R(n) 192.168.2.0/24    192.168.3.1    2      192.168.3.1         0   02:33
C(i) 192.168.3.0/24    0.0.0.0        1      self                0
R(n) 192.168.4.0/24    192.168.3.1    2      192.168.3.1         0   02:33
C(i) 192.168.5.0/24    0.0.0.0        1      self                0

We see 172.16.0.0/16 in the routing table of r222 because it has been redistributed.

The network 192.168.0.128/25 isn’t redistributed because RIPv1 is a classful routing protocol and it does not support sub-netted networks.
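The classful reasoning can be reproduced with a small check. This is a simplified sketch, assuming that a RIPv1 router only announces a prefix to other networks when its mask equals the mask of its address class; it ignores the special handling of subnets inside their own major network, and the function names are hypothetical.

```python
import ipaddress

def classful_prefixlen(address):
    """Prefix length implied by the class of an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return 8      # class A
    if first_octet < 192:
        return 16     # class B
    if first_octet < 224:
        return 24     # class C
    raise ValueError("class D and E addresses have no classful network mask")

def matches_classful_mask(network):
    """True when the prefix has exactly the mask of its address class."""
    net = ipaddress.IPv4Network(network)
    return net.prefixlen == classful_prefixlen(net.network_address)

for prefix in ("172.16.0.0/16", "192.168.0.128/25", "172.16.0.0/24"):
    print(prefix, matches_classful_mask(prefix))
```

Under this assumption, 172.16.0.0/16 is announced because it is a whole class B network, whereas 192.168.0.128/25 and 172.16.0.0/24 are subnets of a class C and a class B network respectively.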

You can see the difference in the RIP routing table:

C(i) 192.168.1.0/24 is a directly connected network announced with “network”.
C(r) 172.16.0.0/16 is a directly connected network announced with “redistribute connected”.

15. Obviously, now the ping from r222 to 172.16.0.1 works. If we change the IP address of the eth1 of r1 to 172.16.0.1/24, the network is not redistributed.

16. The ping does not work and we see ARPs trying to find 192.168.0.223 on the network of SW7. Since this IP address is not active on the network connected with SW7, the ARPs fail and r3 sends an ICMP destination unreachable to 192.168.4.11 (the origin of the echo reply). The origin of the ping (h223) never knows what happened.

Now, RIPv1 has been activated as follows:

r1.v1.eth1.172.16.0.0/16.redistr
r1.v1.eth2.192.168.2.0/24.active
r1.v1.eth3.192.168.3.0/24.active
r1.v1.eth4.192.168.4.0/24.passive
------------------------------------
r222.v1.eth1.192.168.3.0/24.active
r222.v1.eth2.192.168.5.0/24.active
r222.v1.eth3.192.168.0.128/25.failed
------------------------------------
r3.v1.eth1.192.168.5.0/24.active
r3.v1.eth2.192.168.2.0/24.active
r3.v1.eth3.192.168.1.0/24.active
r3.v1.eth4.192.168.0.0/24.passive

17.(*) When we set the interfaces down, both networks disappear from the kernel routing table of r222.

When we remove the networks from RIP, the network 192.168.4.0/24 is still present in the routing table of r222. This is because we have redistributed connected networks on r1. On the other hand, the network 192.168.0.0/24 disappears from the routing table of r222 because we do not redistribute connected on r3.

We add again the networks to RIP as specified:



Exercise 2

1. On each router (r1, r222 and r3), we type:

r1# configure terminal
r1(config)# router rip
r1(config-router)# version 2

Or we execute the labels initial,ripv2-a.
Now we see R(n) 192.168.0.128/25:


R(n) 172.16.0.0/16 192.168.2.1 2 192.168.2.1 0 02:38
C(i) 192.168.0.0/24 0.0.0.0 1 self 0
R(n) 192.168.0.128/25 192.168.5.222 2 192.168.5.222 0 02:50
C(i) 192.168.1.0/24 0.0.0.0 1 self 0
C(i) 192.168.2.0/24 0.0.0.0 1 self 0
R(n) 192.168.3.0/24 192.168.5.222 2 192.168.5.222 0 02:50
R(n) 192.168.4.0/24 192.168.2.1 2 192.168.2.1 0 02:38
C(i) 192.168.5.0/24 0.0.0.0 1 self 0

The ping works:

root@h223:~# ping -c1 192.168.4.11
PING 192.168.4.11 (192.168.4.11) 56(84) bytes of data.
64 bytes from 192.168.4.11: icmp_req=1 ttl=62 time=2.27 ms

--- 192.168.4.11 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.275/2.275/2.275/0.000 ms

Internet Protocol Version 4
    Source: 192.168.2.1 (192.168.2.1)
    Destination: 224.0.0.9 (224.0.0.9)
User Datagram Protocol, Src Port: router (520), Dst Port: router (520)
Routing Information Protocol
    Command: Response (2)
    Version: RIPv2 (2)
    IP Address: 172.16.0.0, Metric: 1
        Address Family: IP (2)
        Route Tag: 0
        IP Address: 172.16.0.0 (172.16.0.0)
        Netmask: 255.255.0.0 (255.255.0.0)
        Next Hop: 0.0.0.0 (0.0.0.0)
        Metric: 1
    IP Address: 192.168.0.128, Metric: 2
    IP Address: 192.168.3.0, Metric: 1
    IP Address: 192.168.4.0, Metric: 1

Internet Protocol Version 4
    Source: 192.168.2.3 (192.168.2.3)
    Destination: 224.0.0.9 (224.0.0.9)
User Datagram Protocol, Src Port: router (520), Dst Port: router (520)
Routing Information Protocol
    IP Address: 192.168.0.0, Metric: 1
    IP Address: 192.168.0.128, Metric: 2
    IP Address: 192.168.1.0, Metric: 1
    IP Address: 192.168.3.0, Metric: 2
    IP Address: 192.168.5.0, Metric: 1

2. We add the interfaces of r4:

r4(config-router)# network 192.168.1.0/24
r4(config-router)# network 172.16.0.0/24

Then, we add the loopback addresses in each router and announce them in RIP:

root@rX:~# ip address add 192.168.100.4/32 dev lo
rX(config-router)# network 192.168.100.4/32

Or execute labels: initial,ripv2-a,ripv2-b
Responses in tap2:

Routing Information Protocol
    Command: Response (2)
    Version: RIPv2 (2)
    IP Address: 172.16.0.0, Metric: 1
    IP Address: 192.168.0.128, Metric: 2
    IP Address: 192.168.3.0, Metric: 1
    IP Address: 192.168.4.0, Metric: 1
    IP Address: 192.168.5.0, Metric: 2
    IP Address: 192.168.100.1, Metric: 1
    IP Address: 192.168.100.222, Metric: 2


C(r) 172.16.0.0/16 0.0.0.0 1 self (connected:1) 0
R(n) 192.168.0.0/24 192.168.2.3 2 192.168.2.3 0 02:54
R(n) 192.168.0.128/25 192.168.3.222 2 192.168.3.222 0 02:43
R(n) 192.168.1.0/24 192.168.2.3 2 192.168.2.3 0 02:54
C(i) 192.168.2.0/24 0.0.0.0 1 self 0
C(i) 192.168.3.0/24 0.0.0.0 1 self 0
C(i) 192.168.4.0/24 0.0.0.0 1 self 0
R(n) 192.168.5.0/24 192.168.3.222 2 192.168.3.222 0 02:43
C(i) 192.168.100.1/32 0.0.0.0 1 self 0
R(n) 192.168.100.3/32 192.168.2.3 2 192.168.2.3 0 02:54
R(n) 192.168.100.4/32 192.168.2.3 3 192.168.2.3 0 02:54
R(n) 192.168.100.222/32 192.168.3.222 2 192.168.3.222 0 02:43


C(i) 172.16.0.0/16 0.0.0.0 1 self 0
R(n) 192.168.0.0/24 192.168.1.3 2 192.168.1.3 0 02:45
R(n) 192.168.0.128/25 192.168.1.3 3 192.168.1.3 0 02:45
C(i) 192.168.1.0/24 0.0.0.0 1 self 0
R(n) 192.168.2.0/24 192.168.1.3 2 192.168.1.3 0 02:45
R(n) 192.168.3.0/24 192.168.1.3 3 192.168.1.3 0 02:45
R(n) 192.168.4.0/24 192.168.1.3 3 192.168.1.3 0 02:45
R(n) 192.168.5.0/24 192.168.1.3 2 192.168.1.3 0 02:45
R(n) 192.168.100.1/32 192.168.1.3 3 192.168.1.3 0 02:45
R(n) 192.168.100.3/32 192.168.1.3 2 192.168.1.3 0 02:45
C(i) 192.168.100.4/32 0.0.0.0 1 self 0
R(n) 192.168.100.222/32 192.168.1.3 3 192.168.1.3 0 02:45

3. Before setting eth2 of r3 down, r1 sends responses through tap3 with the following content:

IP Address: 172.16.0.0, Metric: 1
IP Address: 192.168.0.0, Metric: 2
IP Address: 192.168.1.0, Metric: 2
IP Address: 192.168.2.0, Metric: 1
IP Address: 192.168.4.0, Metric: 1
IP Address: 192.168.100.1, Metric: 1
IP Address: 192.168.100.3, Metric: 2
IP Address: 192.168.100.4, Metric: 3

And r222:

IP Address: 192.168.0.0, Metric: 2
IP Address: 192.168.0.128, Metric: 1
IP Address: 192.168.1.0, Metric: 2
IP Address: 192.168.5.0, Metric: 1
IP Address: 192.168.100.3, Metric: 2
IP Address: 192.168.100.4, Metric: 3
IP Address: 192.168.100.222, Metric: 1

Then, after we set the interface down, we get a response from r1 with:

IP Address: 192.168.0.0, Metric: 16

And another response from r3 with:

IP Address: 192.168.1.0, Metric: 16
IP Address: 192.168.100.3, Metric: 16
IP Address: 192.168.100.4, Metric: 16

Then, r3 receives a response from r222:

IP Address: 192.168.0.0, Metric: 2
IP Address: 192.168.0.128, Metric: 1
IP Address: 192.168.1.0, Metric: 2
IP Address: 192.168.5.0, Metric: 1
IP Address: 192.168.100.3, Metric: 2
IP Address: 192.168.100.4, Metric: 3
IP Address: 192.168.100.222, Metric: 1

Then, responses from r1 include the following:

IP Address: 172.16.0.0, Metric: 1
IP Address: 192.168.2.0, Metric: 1
IP Address: 192.168.4.0, Metric: 1
IP Address: 192.168.100.1, Metric: 1

Notice that there are no more responses with metric 16 because r1 learns new routes from r222.
The RIP table of r1:


C(r) 172.16.0.0/16 0.0.0.0 1 self (connected:1) 0
R(n) 192.168.0.0/24 192.168.3.222 3 192.168.3.222 0 02:33
R(n) 192.168.0.128/25 192.168.3.222 2 192.168.3.222 0 02:33
R(n) 192.168.1.0/24 192.168.3.222 3 192.168.3.222 0 02:33
C(i) 192.168.2.0/24 0.0.0.0 1 self 0
C(i) 192.168.3.0/24 0.0.0.0 1 self 0
C(i) 192.168.4.0/24 0.0.0.0 1 self 0
R(n) 192.168.5.0/24 192.168.3.222 2 192.168.3.222 0 02:33
C(i) 192.168.100.1/32 0.0.0.0 1 self 0
R(n) 192.168.100.3/32 192.168.3.222 3 192.168.3.222 0 02:33
R(n) 192.168.100.4/32 192.168.3.222 4 192.168.3.222 0 02:33
R(n) 192.168.100.222/32 192.168.3.222 2 192.168.3.222 0 02:33
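The way r1 replaces the routes that were marked with metric 16 can be sketched as a distance-vector update. The following Python fragment is a simplified illustration, not the Quagga implementation: the function name and the table layout are assumptions, and timers, split horizon and triggered updates are left out.

```python
INFINITY = 16  # RIP metric that marks a destination as unreachable

def process_response(table, neighbor, entries):
    """Apply the entries of a RIP response received from neighbor.

    table maps prefix -> (metric, next_hop); entries is a list of
    (prefix, advertised_metric) pairs taken from the response.
    """
    for prefix, advertised in entries:
        metric = min(advertised + 1, INFINITY)
        current = table.get(prefix)
        if current is None:
            if metric < INFINITY:
                table[prefix] = (metric, neighbor)
        elif current[1] == neighbor or metric < current[0]:
            # Always believe the current next hop; otherwise only accept better routes.
            table[prefix] = (metric, neighbor)
    return table

# Routes of r1 that became unreachable through 192.168.2.3.
table = {
    "192.168.0.0/24": (INFINITY, "192.168.2.3"),
    "192.168.100.4/32": (INFINITY, "192.168.2.3"),
}
# Response received from r222 (192.168.3.222), metrics as captured above.
process_response(table, "192.168.3.222",
                 [("192.168.0.0/24", 2), ("192.168.100.4/32", 3)])
print(table)
```

With these inputs the sketch installs 192.168.0.0/24 with metric 3 and 192.168.100.4/32 with metric 4, both through 192.168.3.222, which agrees with the table of r1.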

Notice that now indirect routes go through 192.168.3.222.

4. The ping from h11 to the loopback address of r3 works and follows these paths:
echo-request: h11→r1→r222→r3. echo-reply: r3→r222→r1→h11.
The path is optimal.
The ping from h11 to the loopback address of r4 works and follows these paths:
echo-request: h11→r1→r222→r3→r4. echo-reply: r4→r3→r222→r1→h11.
The path is not optimal. The optimal path is h11→r1→r4, but the eth1 of r1 is not under RIP. So, we activate eth1 of r1:




C(i) 172.16.0.0/16 0.0.0.0 1 self 0
R(n) 192.168.0.0/24 192.168.3.222 3 192.168.3.222 0 02:46
R(n) 192.168.0.128/25 192.168.3.222 2 192.168.3.222 0 02:46
R(n) 192.168.1.0/24 172.16.0.4 2 172.16.0.4 0 02:46
C(i) 192.168.2.0/24 0.0.0.0 1 self 0
C(i) 192.168.3.0/24 0.0.0.0 1 self 0
C(i) 192.168.4.0/24 0.0.0.0 1 self 0
R(n) 192.168.5.0/24 192.168.3.222 2 192.168.3.222 0 02:46
C(i) 192.168.100.1/32 0.0.0.0 1 self 0
R(n) 192.168.100.3/32 192.168.3.222 3 192.168.3.222 0 02:46
R(n) 192.168.100.4/32 172.16.0.4 2 172.16.0.4 0 02:46
R(n) 192.168.100.222/32 192.168.3.222 2 192.168.3.222 0 02:46

5. r1 announces 10.10.10.0/24 with next_hop=0.0.0.0 in eth3 and with next_hop=172.16.0.5 in eth1. With this feature, the path is optimal from h33 to 10.10.10.10.

6. We can set a default route in r1 or r4 because these are the routers that connect the RIP domain with the Internet.

If we generate the default route in just one router, the configuration is inefficient. In particular, the next_hop parameter of RIPv2 is not set for default routes.

For example, if we originate the default route in r1 and send a ping from h33 to 10.20.20.20, the echo-request messages go from r4 to r1 before leaving the RIP domain, making an extra hop. With r1 and r4 originating the default route this problem disappears.

Chapter 6

Linux/Quagga

6.1 Architecture

This section provides an overview of the routing process. The description tries to be rather generic, but in some aspects it is based on the Linux/Quagga implementation [14].

In modern implementations, a router that runs multiple routing protocols actually instantiates a separate routing process for each protocol. To store protocol-specific routing information, there is a routing database called Routing Information Base (RIB). Depending on the particular implementation of the router, each routing process can have its own RIB or several routing processes can share a RIB. Routers do not forward packets using the RIB directly; they forward packets according to routes stored in a protocol-agnostic table called Forwarding Information Base (FIB). In general, the FIB contains the so-called “active routes” and is optimized for fast lookup of destination addresses, while the RIB is optimized for efficient updating by routing protocols and other control plane methods, and contains the full set of routes learned by the router. Figure 6.1 shows the routing architecture of a Linux router using Quagga. The next sections discuss each component in Figure 6.1.

[Figure labels: dynamic routing daemons (e.g. quagga) for OSPF, RIP, BGP and other protocols; RIB (routes of a protocol); other routers; admin dist; redist input; redist output; ICMP redirects; ifconfig command; ip or route command; FIB; longest match; Kernel Forwarding cache.]

Figure 6.1: Routing architecture with Linux and Quagga.


6.2 Routing daemons

Dynamic routing protocols internally operate based on a Routing Information Base (RIB) that has awareness of preference/administrative distance. The “essence” (a single best prefix entry among several possible candidates) of these protocol daemon RIBs is presented to the operating system via a protocol master daemon such as Quagga in a consolidated metric-only fashion. This prefix is often referred to as the active route. Therefore, the FIB is a collection of active routes.

Figure 6.1 shows a schematic representation of a routing engine (Quagga) that includes the routing daemons for Open Shortest Path First (OSPF), Routing Information Protocol (RIP) and Border Gateway Protocol (BGP). These protocol daemons directly exchange signaling information over IP networks.

6.3 FIB and Host Forwarding Cache

The FIB can be altered by manual route instructions (the route, ip or ifconfig commands) and by Internet Control Message Protocol (ICMP) redirect messages. On the other hand, the Kernel Forwarding Cache contains IP-related information to reach the next hop for particular destinations. When a routing lookup occurs, the cache table is consulted first and, as a fallback mechanism, the FIB is then consulted. The lookup result is then placed in the cache to speed up future lookups. In a Linux router you can show the FIB with either the route or the ip command.

As shown, Linux routers provide a FIB and a derived Kernel Forwarding Cache.

# route -Fn
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.99.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
0.0.0.0 192.168.99.254 0.0.0.0 UG 0 0 0 eth0
# ip route show
192.168.99.0/24 dev eth0 scope link
127.0.0.0/8 dev lo scope link
default via 192.168.99.254 dev eth0

The -F option tells the route command to operate on the kernel’s FIB (this is the default, i.e. route -Fn is equivalent to route -n). To show the cache you can use:

# route -nC
Kernel IP routing cache
Source Destination Gateway Flags Metric Ref Use Iface
192.168.1.254 192.168.1.1 192.168.1.1 il 0 0 1 lo
192.168.1.1 192.168.1.254 192.168.1.254 0 0 1 eth2
192.168.1.1 192.168.1.254 192.168.1.254 0 0 0 eth2
# ip route show cache
192.168.1.254 from 192.168.1.1 dev eth2
    cache mtu 1500 advmss 1460 hoplimit 64
192.168.1.254 dev eth2 src 192.168.1.1
    cache mtu 1500 advmss 1460 hoplimit 64
local 192.168.1.1 from 192.168.1.254 dev lo src 192.168.1.1
    cache <local,src-direct> iif eth2

6.4 Longest Match Rule

In the FIB there can be different entries that match a certain destination. In this case, routers always follow the Longest Match Rule, which says that “routing to any destination is always done on a longest match basis, i.e. a router that has to decide between two different length prefixes of the same network will always follow the longer mask”. Suppose, for example, that a router has the following two entries in the FIB:

192.168.0.0/24 via path1
192.168.0.0/25 via path2

First, these two prefixes are different, so both can be present in the FIB at the same time. Second, when attempting to deliver traffic to host 192.168.0.1, the router will always select path2 because it is the longest match. An obvious question arises now: what happens if we have routes from different protocols that have the same prefix and length? Notice that if the prefix or the length are different, there is no problem in installing both routes in the FIB, as they are considered different routes. To answer this question, we have administrative distances.
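The Longest Match Rule with the two FIB entries of this example can be sketched in Python using the standard ipaddress module. This is only an illustration: the function name and the dictionary used as FIB are assumptions, and a real kernel uses a trie or a similar structure instead of a linear scan.

```python
import ipaddress

def longest_match(fib, destination):
    """Return the (network, path) entry with the longest mask that
    contains destination, or None if no entry matches."""
    addr = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), path)
        for prefix, path in fib.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None
    return max(matching, key=lambda entry: entry[0].prefixlen)

fib = {"192.168.0.0/24": "path1", "192.168.0.0/25": "path2"}
print(longest_match(fib, "192.168.0.1"))    # matches both entries, /25 wins
print(longest_match(fib, "192.168.0.200"))  # only the /24 entry matches
```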

6.5 Administrative distances

Different routing processes may offer routes for the same prefix and thus, the router must select one of the routes to be included in the FIB. To add flexibility to the selection procedure, router vendors have introduced a configurable integer parameter called administrative distance (AD). Table 6.1 gives some of the default administrative distances used by some router vendors.

Table 6.1: Example of administrative distances

Protocol        AD
Connected       0
Static          1
eBGP            20
OSPF            110
IS-IS           115
RIP             120
iBGP            200
DHCP-learned    254
Unknown         255

Notice that there is a difference in terms of AD between static routes pointing to a directly connected interface (AD=0) and static routes pointing toward the far end of an attached link (AD=1).

So, upon creation, each routing process is assigned a protocol-specific default administrative distance. The administrator may override the default AD of a protocol, according to local policy and on a per-router, per-prefix basis, as part of the protocol configuration. The administrative distance values are local to a router and are not propagated in any signaling message. Among the routing processes announcing a route to a destination, the one with the lowest administrative distance will be selected.
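The selection by administrative distance can be sketched as follows. The default values are those of Table 6.1; the function name, the protocol keys and the example prefixes are hypothetical and only serve the illustration.

```python
# Default administrative distances taken from Table 6.1.
DEFAULT_AD = {
    "connected": 0, "static": 1, "ebgp": 20, "ospf": 110,
    "isis": 115, "rip": 120, "ibgp": 200,
}

def select_active_routes(offers, ad=DEFAULT_AD):
    """offers is a list of (prefix, protocol, next_hop) tuples.

    For each prefix, keep the offer whose protocol has the lowest AD.
    Returns a dict prefix -> (protocol, next_hop).
    """
    active = {}
    for prefix, protocol, next_hop in offers:
        current = active.get(prefix)
        if current is None or ad[protocol] < ad[current[0]]:
            active[prefix] = (protocol, next_hop)
    return active

offers = [
    ("10.0.0.0/24", "rip", "192.168.1.10"),
    ("10.0.0.0/24", "ospf", "192.168.1.11"),
    ("10.0.1.0/24", "rip", "192.168.1.10"),
]
print(select_active_routes(offers))

# A local policy may override a default AD, e.g. to prefer RIP over OSPF.
custom_ad = dict(DEFAULT_AD, rip=100)
print(select_active_routes(offers, custom_ad))
```

Since the AD is local to the router, the override in the second call changes only the choice of this router, not what its neighbors select.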

6.6 Quagga

6.6.1 vtysh

The vtysh command opens a console for Quagga:

root@r1:~# vtysh
r1#

The commands to display the available commands are:

r1# ?
r1# list

“exit” closes each session in turn.
You can view the current configuration with:

r1# write terminal

Or with:

r1# show running-config

You can write the current configuration with the command:

r1# write

The previous command writes the configuration to the configuration files of Quagga, which are in the directory /etc/quagga. On the other hand, to run routing commands, or to change configurations, change to configure mode:

r1# configure terminal
r1(config)#

6.6.2 Static and Kernel Routes

With Quagga running over a Linux box, we have kernel and static routes. Both are routes defined manually, but the subtle difference is that kernel routes are defined with the route or ip command, while static routes are defined inside Quagga. Example:

root@r1:~# route add 10.0.1.0/24 gw 192.168.1.11
root@r1:~# vtysh
r1# configure terminal
r1(config)# ip route 10.0.0.0/24 192.168.1.10
r1(config)# exit
r1# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
       I - ISIS, B - BGP, > - selected route, * - FIB route
S>* 10.0.0.0/24 [1/0] via 192.168.1.10, eth1
K>* 10.0.1.0/24 via 192.168.1.11, eth1
C>* 127.0.0.0/8 is directly connected, lo
C>* 192.168.1.0/24 is directly connected, eth1

6.7 RIP Router

6.7.1 Basic RIP

In Quagga, to show the RIP configuration you can type:

root@r1:~# vtysh
r1# show ip rip

Type the following commands to configure the RIP router in version 1:

r1# configure terminal
r1(config)# router rip
r1(config-router)# version 1

Use the network command to add links (networks) to RIP:


The previous command adds to RIP all the interfaces of the router configured with networks in the range of 192.168.1.0/16. You can remove a network from being announced by RIP with the command:


You can make an interface passive with the command:

r1(config-router)# passive-interface eth4

A passive interface is one through which the router will not send any RIP message. This is useful to avoid sending RIP messages to segments of the network that do not have any other router understanding RIP.

We can also define all the interfaces as passive by default and then define which interface is in fact active:

r1(config-router)# passive-interface default
r1(config-router)# no passive-interface eth0

You can use the unicast address of a neighbor for RIP messages using the following command:

r3(config-router)# neighbor 192.168.2.1

Finally, you can activate the “debug mode” to view debug messages:

r1(config)# do debug rip

Debug messages provide us with information about what is happening.

6.7.2 Loopback Interfaces

It is a common practice to use the loopback interface (lo) in dynamic routing. A loopback interface is a virtual interface that resides on the router but it is not a physical NIC. The idea is to assign an IP address to the loopback interface, and distribute that address with the dynamic routing protocol.

Loopback interfaces are very useful because they will never go down, unless the entire router goes down. Therefore, we can reach the router with the IP address of the loopback interface if there is at least one physical NIC available. With this configuration, the IP address of the lo interface can be used as a unique identifier and this helps in managing routers. In practice, we can add an IP address to the lo interface with the ip command:

root@r1:~# ip address add 192.168.0.1/32 dev lo

Then, you can redistribute the address with RIP to make it visible to the rest of the network:


6.7.3 Routes in RIP

The FIB/RIB entries for all the protocols can be shown in Quagga with the command:

router# show ip route

You can see the FIB from a Linux command-line typing:

root@r1:~# route -n

To see only the RIP routes (FIB/RIB) use the Quagga command:

r1# show ip rip

A router can select itself as the default route for the rest of the RIP routers using the command:


r1(config-router)# default-information originate

A default route must exist in the FIB to make the previous command work properly.
On the other hand, you can redistribute routes from one protocol to another with the redistribute command. For example, to distribute the static routes of a router into the RIP domain you can use the command:

r1(config-router)# redistribute static

The previous command makes the router announce its static routes in its RIP responses.
Finally, you can use a route-map to control the redistribute command. The following commands perform a redistribution that sets the RIP next hop field to a desired IP address. However, in general, RIP selects the proper next_hop value to be used through each network interface.

r1(config-router)# route-map NEXT_HOP_FOR_STATIC permit 10
r1(config-route-map)# set ip next-hop 192.168.0.1
r1(config-route-map)# exit
r1(config)# router rip
r1(config-router)# redistribute static route-map NEXT_HOP_FOR_STATIC

Chapter 7

OSPF

7.1 Introduction

7.1.1 Link State

Open Shortest Path First (OSPF), specified in RFC 2328 [10], is a link state protocol. In a link state protocol each router has a map of the network and runs Dijkstra’s algorithm, placing itself as the source, to find the shortest path to each possible destination. Routers use signaling packets to exchange information about the network topology. Any change in the topology is announced. For example, if a router detects a change in any of its directly connected links, it announces the change to its neighbors (directly connected routers). Then, each neighbor resends this information to its neighbors and so on. In this way, a link state change is spread all over the network. In addition, OSPF routers synchronize their Link State Database (LSDB) when they join the OSPF domain and periodically (at long intervals).

An OSPF router maintains three basic tables:

• Neighbor table. This table keeps track of the state of the directly connected routers. Routers periodically send packets called “Hello” to show that they are alive.

• Topology table. This table or Link State Database (LSDB) stores a map of the network (network topology). The network topology is expressed as a set of Link State Advertisements (LSAs). An LSA can be understood as a piece of topology.

• Routing table. This table stores the shortest paths. In other words, it is the RIB that contains OSPF routes.

Link state protocols require more CPU than distance-vector protocols like RIP, but they are less prone to loops and their convergence time is also shorter. Convergence time is better because a link state change is immediately spread over the network and then, the Dijkstra algorithm is applied to update the shortest path.
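The shortest path computation that each router runs over its map of the network can be sketched with a textbook version of Dijkstra’s algorithm. The topology, the router names and the link costs below are hypothetical, and the dictionary of adjacencies only stands in for the LSDB; it is not how an OSPF daemon stores LSAs.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm.

    graph maps node -> {neighbor: link_cost}. Returns (dist, prev), where
    dist[n] is the cost of the best path from source to n and prev[n] is
    the node that precedes n on that path.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev

# Hypothetical map of the network as seen by r1.
topology = {
    "r1": {"r2": 10, "r3": 30},
    "r2": {"r1": 10, "r3": 10},
    "r3": {"r1": 30, "r2": 10},
}
dist, prev = shortest_paths(topology, "r1")
print(dist)  # cost from r1 to every router
print(prev)  # predecessor of every router on its shortest path
```

In this example r1 reaches r3 through r2 (cost 20) instead of using the direct link (cost 30).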

7.1.2 Areas

On large networks, the signaling traffic required to keep the topology updated by all the OSPF routers might be high. The design of OSPF allows us to separate the network into smaller networks called areas. This helps us create OSPF networks that decrease routing overhead and speed up convergence. In particular, the solution adopted by OSPF is to use a Hub-And-Spoke topology (see Figure 7.1). This is a typical networking scheme that uses a hub to connect many spokes. This scheme easily avoids routing loops.

In OSPF, all traffic between areas, also known as inter-area traffic, must traverse area 0, the hub, which is also called the backbone area. All areas must have at least one router with an interface attached to area 0. The design with areas confines the link state updates about the topology of an area to that area. The idea is that routers inside the area know all the topology information of the area, but the rest of the areas only receive a summary of the relevant routing information.

[Figure labels: Area 0 (OSPF Backbone); areas A1 and A2; ABRs; ASBRs; external networks.]

Figure 7.1: OSPF Areas.

Finally, it is worth mentioning that the set of IP networks under the control of one or more network operators that presents a common, clearly defined routing policy to the Internet is called Autonomous System (AS). In this chapter, we will assume that the AS is completely routed using OSPF and thus, the terms OSPF domain and AS can be used interchangeably.

7.1.3 ABRs and ASBRs

Routers that connect two areas (remember that one must be the area 0) are called ABRs (Area Border Routers). The ABR summarizes routing information to minimize the traffic exchanged between areas. Finally, it is worth mentioning that:

• A hierarchical IP addressing plan is needed to make an effective summarization.
• Only ABRs make summarization of IP addresses inside OSPF.
• These summaries are implemented using a vector of distances.

On the other hand, external routes to OSPF (static, kernel, RIP, etc.) are inserted or redistributed into OSPF by ASBRs (Autonomous System Border Routers). ASBRs redistribute and summarize external routes.

7.1.4 Basic Quagga: Adding Links

In Quagga, you can add/remove links (networks) to the OSPF link state database using the commands:

root@r1:~# vtysh
Hello, this is Quagga (version 0.99.20.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
r1# configure terminal
r1(config)# router ospf
r1(config-router)# network 10.0.0.0/24 area 0
r1(config-router)# no network 10.0.0.0/24 area 0

Note that you have to specify the area to which the network is being added. A network belongs to only one area, while routers can belong to several areas (these are the ABRs).

To see the OSPF routes (RIB/FIB) use the Quagga command:

r1# show ip ospf route

7.1.5 Router Identifier (router-id)

Each router that runs OSPF is identified by a unique identifier. This identifier is a 32-bit string called router-id. The router-id is included in all the packets that an OSPF router sends. In other words, the router-id is the “name” of the router in the OSPF domain. In Quagga, you can manually configure the router-id with the command:

r1(config-router)# router-id 192.168.0.1

Note that the router-id has the format of an IP address but it is not mandatory that the router-id corresponds to any address assigned to any network interface of the router. Instead of assigning the router-id manually, we can also let the OSPF routing daemon select it automatically. In this case, the OSPF daemon uses the following criterion:

• Use as router-id the highest address assigned to the loopback interface that is different from a local one (127.0.0.0/8).

• If there is no address different from a local one assigned to the loopback interface, use as router-id the highest IP address assigned to any physical NIC.
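The criterion above can be written as a short Python sketch. It only mirrors the two rules as stated in this section; the function name and the example addresses are hypothetical, and the actual daemon may apply additional rules.

```python
import ipaddress

def select_router_id(loopback_addrs, nic_addrs):
    """Pick a router-id following the two rules of the text.

    Addresses in 127.0.0.0/8 are ignored. Returns None if no usable
    address exists at all.
    """
    def highest(addrs):
        usable = [ipaddress.ip_address(a) for a in addrs]
        usable = [a for a in usable if not a.is_loopback]
        return str(max(usable)) if usable else None

    return highest(loopback_addrs) or highest(nic_addrs)

# Loopback interface with an extra address: that address is used.
print(select_router_id(["127.0.0.1", "192.168.0.1"], ["172.16.0.1", "10.0.0.1"]))
# Loopback interface with only 127.0.0.1: highest NIC address is used.
print(select_router_id(["127.0.0.1"], ["172.16.0.1", "10.0.0.1"]))
```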

In practice, it is typical to add an address to the loopback interface and use this address as router-id. In Linux, we can add an IP address to the lo interface with the ip command:

root@r1:~# ip address add 192.168.0.1/32 dev lo

Then, it is also typical to include the loopback address as an OSPF network so that we can use this address to access the router. This has the advantage that the loopback interface will never go down, unless the entire router goes down. Therefore, we can reach the router with the IP address of the loopback interface if there is at least one physical NIC available. To make the previous loopback address visible to the rest of the OSPF network you can use the command:

r1(config-router)# network 192.168.0.1/32 area 0

7.2 Broadcast Segments

7.2.1 Flooding

OSPF routers exchange pieces of topology information in Link State Advertisements (LSAs). Each LSA has a serial number. If a router receives an LSA about an unknown part of the network or a newer LSA (an LSA with a bigger serial number) it updates its Link State DB (LSDB). In addition, if an LSA causes an update, the router sends that LSA to its neighbors. This flooding results in creating the same LSDB in all the routers. However, when several routers are connected to the same broadcast segment (e.g. a switched Ethernet), it is inefficient to exchange updates (LSAs) between each pair of routers of the segment. An example of this situation is shown in Figure 7.2.

Note that if routers exchange LSAs in pairs (unicast), then, there will be a flood of LSAs. In Figure 7.2, we see a flood of LSAs because R5 sends an LSA about the failed link with R6 to its neighbors, which are the routers in the broadcast segment R1, R2, R3 and R4. In turn, each router that receives the update sends an LSA using unicast to its neighbors, generating a peak of traffic.
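The cost of pairwise flooding can be estimated with a small simulation. The sketch below is a simplification under stated assumptions: every router of the segment is a neighbor of every other one, an LSA is identified by a hypothetical key and a serial number, a router never sends the LSA back to the router it came from, and acknowledgments and retransmissions are ignored.

```python
from collections import deque

def flood(neighbors, origin, lsa_id, serial, lsdbs):
    """Flood an LSA with unicast copies and count the transmissions.

    neighbors maps router -> list of adjacent routers; lsdbs maps
    router -> {lsa_id: serial}. A router installs and re-floods an LSA
    only if it is unknown or newer than the stored copy.
    """
    sent = 0
    lsdbs[origin][lsa_id] = serial
    queue = deque([(origin, None)])
    while queue:
        router, received_from = queue.popleft()
        for peer in neighbors[router]:
            if peer == received_from:
                continue
            sent += 1
            known = lsdbs[peer].get(lsa_id)
            if known is None or serial > known:
                lsdbs[peer][lsa_id] = serial
                queue.append((peer, router))
    return sent

routers = ["R1", "R2", "R3", "R4", "R5"]
neighbors = {r: [p for p in routers if p != r] for r in routers}
lsdbs = {r: {} for r in routers}
total = flood(neighbors, "R5", "link R5-R6", 2, lsdbs)
print(total)  # LSA copies sent on the segment
```

With these assumptions, R5 sends 4 copies and each of the other four routers sends 3 more, so a single link change produces 16 transmissions on one segment.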

7.2.2 Designated Routers

To avoid a flood of LSAs, OSPF uses a Designated Router (DR), a Backup Designated Router (BDR) and multicast. The command show ip ospf neighbor shows the roles of neighbors (DR, BDR or DROTHER):


Figure 7.2: Unicast Flooding.

r1# show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface RXmtL RqstL DBsmL
192.168.0.2 1 Full/DR 32.220s 172.16.0.2 eth0:172.16.0.1 0 0 0

As you can observe in Figure 7.3, the DROTHER routers (all non-DR/BDR routers) send their LSAs to the DR/BDR using the multicast address 224.0.0.6 (AllDRouters). The DR and the BDR process the information and if an update is necessary, the DR sends the corresponding LSAs to the rest of neighbors using the multicast address 224.0.0.5 (AllSPFRouters). In this process, the BDR is in stand-by unless an update is necessary and the DR does not send it. In this case, it is assumed after a time out that the DR has failed and then, the BDR becomes the DR and a new BDR is elected.

[Figure labels: DR; BDR (backup); routers R1 to R6; @224.0.0.6 (AllDRouters); @224.0.0.5 (AllSPFRouters).]

Figure 7.3: Updates with DR/BDR.

In the previous example, R5 is a DROTHER router and as such, it sends its update to the address 224.0.0.6. This update is processed by R1 (DR) and R2 (BDR). Then, the DR router sends the update to the rest of the neighbors using the address 224.0.0.5.

The DR/BDR pair appears in Hello packets and is also used by new neighbors to initially synchronize their databases. Note that there is a DR/BDR pair per broadcast segment, not per OSPF area, and that on point-to-point links there is no need to use DR/BDR.

7.2.3 DR/BDR Election

The election of the DR/BDR is performed as follows:

• The election of the DR and BDR takes place by comparing the Priority ID, which is located in the Hello packet.
• The router with the highest Priority ID is elected DR and the router with the second highest Priority ID will become the BDR.
• By default all router interfaces have a priority ID of 1.
• If on a particular segment the Priority IDs of all routers match, the router-id will be the next ID to compare in order to elect the DR/BDR.
• The router-id is also located in the Hello packets.
• The OSPF router with the highest router-id will be elected the DR and/or BDR.

Once the DR/BDR are elected, if a new OSPF router is added with the highest priority of all, the DR/BDR will not change. To start the election process again, you will have to restart the OSPF daemon. Once the DR and BDR are elected, the BDR will only listen to the exchange between the peers. On the other hand, in case the DR fails, the BDR elects itself as the new DR and a new BDR is selected. Finally, if a router does not want to participate in the DR/BDR election, it can set its Priority ID to 0; it will then be shown as DROTHER.
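The election rules listed above can be condensed into a Python sketch. It follows the simplified description of this section (highest Priority ID, then highest router-id, no preemption, BDR promotion when the DR fails) and is not the complete election algorithm of the OSPF specification; the function name and the router-ids are hypothetical.

```python
import ipaddress

def elect(routers, current_dr=None, current_bdr=None):
    """Return (dr, bdr) for one broadcast segment.

    routers maps router-id -> Priority ID. Routers with priority 0 never
    become DR or BDR. An already elected DR/BDR keeps its role while it
    is still present on the segment.
    """
    eligible = [rid for rid, prio in routers.items() if prio > 0]
    ranked = sorted(
        eligible,
        key=lambda rid: (routers[rid], ipaddress.ip_address(rid)),
        reverse=True,
    )
    dr = current_dr if current_dr in eligible else None
    bdr = current_bdr if current_bdr in eligible else None
    if dr is None and bdr is not None:
        dr, bdr = bdr, None  # the BDR takes over when the DR is gone
    for rid in ranked:
        if dr is None and rid != bdr:
            dr = rid
        elif bdr is None and rid != dr:
            bdr = rid
    return dr, bdr

segment = {"192.168.0.1": 1, "192.168.0.2": 1, "192.168.0.3": 1, "192.168.0.4": 0}
dr, bdr = elect(segment)
print(dr, bdr)

# A new router with a higher priority does not preempt the elected pair.
segment["192.168.0.9"] = 10
print(elect(segment, dr, bdr))

# If the DR disappears, the BDR becomes DR and a new BDR is chosen.
del segment["192.168.0.3"]
print(elect(segment, dr, bdr))
```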

7.3 States & Packets

OSPF does not use a transport protocol (UDP/TCP), but is encapsulated directly in IP datagrams with protocol number 89. OSPF handles its own error detection/correction functions, has its own state machine and its packet formats:

• The OSPF state machine has 6 states: DOWN, INIT, 2-WAY, EXSTART, LOADING and FULL.

• OSPF uses 5 types of packets: Hello, Database Description (DD), Link State Request (LSR), Link State Update (LSU) and Link State Acknowledgment (LSACK) packets.

7.3.1 DOWN State

When you add a network to OSPF, the corresponding NIC starts sending “Hello” packets and the router goes to the DOWN state.

• Actions on this state. Send Hello packets.

• Actions to change the state. When the router receives a valid Hello packet from a neighbor, it moves to the INIT state.

In this state the router sends Hello packets but no information (hellos) has been received from the neighbor. By default, Hello packets are sent every 10 seconds. The information that a Hello packet contains is:

• The router-id.
• The hello and dead timers.
• The network mask.
• The area-id.
• Might contain authentication and password information.
• List of active neighbors (router-ids).
• Router priority.
• DR and BDR (link interface addresses).

When a neighbor receives a Hello packet from another router, it checks the hello/dead interval, the network mask, the area-id and the authentication/passwords (if used). These parameters must be equal to those configured by the neighbor to establish a neighbor relationship. If some of these parameters of the neighbor do not match the parameters configured by the router, the relationship stays in the DOWN state. A relationship might also go to the DOWN state if we do not receive any information about a neighbor during a dead timer. The dead timer is typically set to four times the hello interval (40s).
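The comparison of Hello parameters can be sketched as follows. The class and field names are hypothetical and only model the parameters that, according to the text, must match; they do not reproduce the on-wire layout of the Hello packet.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class HelloParams:
    hello_interval: int       # seconds
    dead_interval: int        # seconds
    network_mask: str
    area_id: str
    authentication: str = ""  # empty string means no authentication

def mismatches(local, received):
    """Names of the parameters that differ between our configuration
    and a received Hello. An empty list means the Hello is acceptable."""
    return [
        f.name for f in fields(local)
        if getattr(local, f.name) != getattr(received, f.name)
    ]

local = HelloParams(10, 40, "255.255.255.0", "0.0.0.0")
good = HelloParams(10, 40, "255.255.255.0", "0.0.0.0")
bad = HelloParams(5, 20, "255.255.255.0", "0.0.0.0")

print(mismatches(local, good))  # nothing differs, the relationship can progress
print(mismatches(local, bad))   # timers differ, the relationship stays in DOWN
```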

Chapter 7. OSPF 86

7.3.2 INIT State

• Actions on this state. Send Hello packets listing the neighbor.
• Actions to change the state. When we receive from a neighbor a valid Hello packet with our router-id in the list of neighbors, we move to the 2-WAY state.

This state specifies that the router has received a valid Hello packet from its neighbor with the correct parameters, but the router-id of the receiving router was not included in the Hello. When a router receives a Hello packet from a neighbor, it lists the sender’s router-id in its Hello packet as an acknowledgment that it received a valid Hello packet.

7.3.3 2-WAY State

• Actions on this state. Elect the DR/BDR pair and send Hello packets listing the neighbors and the DR/BDR.

• Actions to change the state. We move to the next state to exchange the LSDB only with the DR and BDR. The rest of the neighbors remain in this state.

This state designates that bi-directional communication has been established between two routers. Bi-directional means that each router has seen the other’s Hello packet. This state is attained when the router receiving the Hello packet sees its own router-id within the received Hello packet’s neighbor field. Also, the DR/BDR pair is elected.
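The first three states can be summarized with a small decision function. This is a simplified sketch that ignores timers and the DR/BDR election; the function name neighbor_state is hypothetical.

```python
def neighbor_state(hello_received: bool, own_id_listed: bool) -> str:
    """Simplified early neighbor state, following the three states in the text."""
    if not hello_received:
        return "DOWN"      # nothing valid heard from the neighbor yet
    if not own_id_listed:
        return "INIT"      # neighbor heard, but it has not listed us yet
    return "2-WAY"         # each router has seen the other's Hello


my_router_id = "192.168.0.1"
hello_neighbors = ["192.168.0.1"]   # neighbor list carried by a received Hello

state = neighbor_state(True, my_router_id in hello_neighbors)
print(state)
```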

7.3.4 Hello Packets

Hello packets are used for forming and maintaining neighborships, for DR/BDR selection (where applicable) and for exchanging optional capabilities of the neighbors. Hello packets are always sourced from the primary IP address of the interface and the destination IP is the AllSPFRouters multicast IP address (224.0.0.5). An example Hello packet is the following:

Internet Protocol Version 4, Src: 172.16.0.2 (172.16.0.2), Dst: 224.0.0.5 (224.0.0.5)
Open Shortest Path First
    OSPF Header
        OSPF Version: 2
        Message Type: Hello Packet (1)
        Packet Length: 48
        Source OSPF Router: 192.168.0.2 (192.168.0.2)
        Area ID: 0.0.0.0 (Backbone)
        Packet Checksum: 0x2322 [correct]
        Auth Type: Null
        Auth Data (none)
    OSPF Hello Packet
        Network Mask: 255.255.255.0
        Hello Interval: 10 seconds
        Options: 0x02 (E)
        Router Priority: 1
        Router Dead Interval: 40 seconds
        Designated Router: 172.16.0.2
        Backup Designated Router: 172.16.0.1
        Active Neighbor: 192.168.0.1

In the previous listing, we can observe a Hello message from a router that is the DR of the broadcast segment with IP network address 172.16.0.0/24. The router has the IP address 172.16.0.2 in this segment and it has the address 192.168.0.2 as router-id. Note that neighbors are listed using router-ids.

In Figure 7.4, we show a situation of a router (R2) connected to different broadcast segments and the associated Hello messages that this router will send. As you can observe, R2 is the DR of the network 172.16.0.0/24 but it is not the DR of the network 10.0.0.0/24.


[Figure 7.4: Hello Messages of R2 in Different Networks. On 172.16.0.0/24, R2 (172.16.0.2, router-id 192.168.0.2) sends Hellos to 224.0.0.5 announcing DR 172.16.0.2 and neighbor 192.168.0.1. On 10.0.0.0/24, R2 (10.0.0.2) sends Hellos to 224.0.0.5 announcing DR 10.0.0.5, BDR 10.0.0.4 and neighbors 192.168.0.3, 192.168.0.4 and 192.168.0.5.]

7.3.5 EXSTART State

• Actions on this state. This state starts the process of exchanging link state information to synchronize the LSDBs of two routers. A router will only start the synchronization process with the DR and BDR; between DROTHER routers we will never observe this state. In this state, routers establish a master-slave relationship and choose an initial sequence number for exchanging information. The master-slave relationship is determined by the router-id: the router with the higher router-id becomes the master. Once the master is decided, it is the first to send its database description packets (DD packets). DD packets contain a sequence number and a short description of the router’s link state database. More precisely, DD packets contain the headers of the set of LSAs that each router has. The master is also the only router that can increment the sequence number.

• Actions to change the state. After DDs are acknowledged, we move to the next state.

7.3.6 DD Packets

DD packets are exchanged when an adjacency between two routers is going to be established. DD packets describe the contents of the topological database of each router, and several of these packets may be used to describe this database. In Figure 7.5, you can observe a typical dialog for the EXSTART state using DD messages.

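[Figure 7.5: DD Messages during the EXSTART State. R2 (192.168.0.2) and R1 (192.168.0.1) first send DD packets with the I, M and MS flags set (SEQ=970 and SEQ=949). Once the master and slave roles are settled, DD packets carrying LSA headers follow with SEQ=949 and SEQ=950, and the dialog ends with a DD with SEQ=950.]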


A poll-response procedure is used, in which one of the routers acts as master and the other as slave. First both routers declare themselves as master; then the master is elected by comparing router-ids, which also determines the slave. Each DD packet has the following flags:

• I-bit. The Init bit. When set to 1, this packet is the first in the sequence of Database Description packets.

• M-bit. The More bit. When set to 1, it indicates that more Database Description packets are to follow.

• MS-bit. The Master/Slave bit. When set to 1, it indicates that the router is the master during the Database Exchange process. Otherwise, the router is the slave.

During the synchronization, the master sends Database Description packets (polls) which are acknowledged by Database Description packets sent by the slave (responses). The responses are linked to the polls via the packets’ DD sequence numbers. The DD sequence number is used to sequence the collection of DD packets. The initial value (indicated by the Init bit being set) should be unique. The DD sequence number then increments until the complete database description has been sent.
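The following sketch decodes the DD flags byte and picks the master between two routers. It assumes the usual OSPFv2 bit layout (I = 0x04, M = 0x02, MS = 0x01) and that the master is the router with the numerically higher router-id; the function names are hypothetical.

```python
import ipaddress

# Bit positions of the DD flags: 0x07 means that I, M and MS are all set.
DD_FLAGS = {"I": 0x04, "M": 0x02, "MS": 0x01}


def decode_dd_flags(value: int) -> list:
    """Return the names of the flags set in the DD flags byte."""
    return [name for name, bit in DD_FLAGS.items() if value & bit]


def elect_master(router_id_a: str, router_id_b: str) -> str:
    """The router with the numerically higher router-id becomes the master."""
    return max(router_id_a, router_id_b,
               key=lambda r: int(ipaddress.IPv4Address(r)))


print(decode_dd_flags(0x07))
print(decode_dd_flags(0x00))
print(elect_master("192.168.0.1", "192.168.0.2"))
```

Router-ids are compared as 32-bit numbers and not as strings, otherwise "9.0.0.1" would wrongly rank above "10.0.0.1".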


An example DD is the following:

Internet Protocol Version 4, Src: 172.16.0.2 (172.16.0.2), Dst: 172.16.0.1 (172.16.0.1)
Open Shortest Path First
    OSPF Header
    OSPF DB Description
        Interface MTU: 1500
        Options: 0x02 (E)
        DB Description: 0x07 (I, M, MS)
            .... 0... = R: OOBResync bit is NOT set
            .... .1.. = I: Init bit is SET
            .... ..1. = M: More bit is SET
            .... ...1 = MS: Master/Slave bit is SET
        DD Sequence: 1383937455

The previous DD is the first packet that a router sends, trying to become the master and setting the initial sequence number. Then, if the router is indeed the master, the slave responds with a DD including a summary of its topological database (a list of the headers of its LSAs). Example:

Internet Protocol Version 4, Src: 172.16.0.1 (172.16.0.1), Dst: 172.16.0.2 (172.16.0.2)
Open Shortest Path First
    OSPF Header
    OSPF DB Description
        Interface MTU: 1500
        Options: 0x02 (E)
        DB Description: 0x00
            .... 0... = R: OOBResync bit is NOT set
            .... .0.. = I: Init bit is NOT set
            .... ..0. = M: More bit is NOT set
            .... ...0 = MS: Master/Slave bit is NOT set
        DD Sequence: 1383937455
    LSA Header
        LS Age: 2 seconds
        Do Not Age: False
        Options: 0x02 (E)
        Link-State Advertisement Type: Router-LSA (1)
        Link State ID: 192.168.0.1
        Advertising Router: 192.168.0.1 (192.168.0.1)
        LS Sequence Number: 0x80000003
        LS Checksum: 0xf040
        Length: 48

Then, the master should send a DD (or several) with its list of LSA headers, increasing the sequence number, and the slave must acknowledge this transmission with a final DD with the same sequence number.

7.3.7 LOADING State

• Actions on this state. In this state, the actual exchange of link state information occurs. Based on the information provided by the DDs, routers send link-state request (LSR) packets. The neighbor provides the requested link-state information in link-state update (LSU) packets. During the adjacency establishment, if a router finds that one of its LSAs is outdated or missing, it requests that LSA by sending a link-state request packet. All link-state update packets are acknowledged with LSACK packets.

• Actions to change the state. When all the LSAs have been received we move to the FULL state.

In Figure 7.6, you can observe a typical dialog for the LOADING state using LSR, LSU and LSACK messages.

7.3.8 LSR Packets

[Figure 7.6: Messages exchanged during the LOADING State. Two LSR, LSU (with LSAs inside) and LSACK exchanges take place between R2 (192.168.0.2) and R1 (192.168.0.1), after which the FULL state is reached.]

LSR packets are sent after exchanging DD packets with a neighboring router because a router may find that parts of its topological database are out of date. The LSR packet is used to request the pieces of the neighbor’s database that are more up to date. Multiple LSR packets may need to be used. A router that sends an LSR has in mind the precise instance of the database pieces it is requesting, defined by LS sequence number, LS checksum and LS age. However, the router may receive even more recent instances in the responses from its neighbor because LSR packets are understood to be requests for the most recent possible information. An example LSR is the following:

Internet Protocol Version 4, Src: 172.16.0.1 (172.16.0.1), Dst: 172.16.0.2 (172.16.0.2)
Open Shortest Path First
    OSPF Header
    Link State Request
        Link-State Advertisement Type: Router-LSA (1)
        Link State ID: 192.168.0.2
        Advertising Router: 192.168.0.2 (192.168.0.2)

7.3.9 LSU Packets

LSU packets are used to implement the flooding of link state advertisements (LSAs). Each LSU packet carries a collection of LSAs one hop further from its origin. Several LSAs may be included in a single LSU. An example LSU is the following:

Internet Protocol Version 4, Src: 172.16.0.1 (172.16.0.1), Dst: 224.0.0.6 (224.0.0.6)
Open Shortest Path First
    OSPF Header
    LS Update Packet
        Number of LSAs: 1
        LS Type: Router-LSA
            LS Age: 3 seconds
            Do Not Age: False
            Options: 0x02 (E)
            Link-State Advertisement Type: Router-LSA (1)
            Link State ID: 192.168.0.1
            Advertising Router: 192.168.0.1 (192.168.0.1)
            LS Sequence Number: 0x80000003
            LS Checksum: 0xf040
            Length: 28
            Flags: 0x00
            Number of Links: 1
            Type: Stub     ID: 192.168.0.1     Data: 255.255.255.255 Metric: 10
                IP network/subnet number: 192.168.0.1
                Link Data: 255.255.255.255
                Link Type: 3 - Connection to a stub network
                Number of TOS metrics: 0
                TOS 0 metric: 10

As shown, the body of the LSU consists of a list of LSAs. There are several types of LSAs (see Section 7.5) and the format of each LSA is different.

LSUs use multicast on those physical networks that support multicast/broadcast. In order to make the flooding procedure reliable, flooded advertisements are acknowledged. If retransmission of certain advertisements is necessary, the retransmitted advertisements are always carried by unicast LSU packets.

7.3.10 LSACK Packets

LSACK packets are used to make the flooding of LSAs reliable. Every flooded advertisement is explicitly acknowledged with an LSACK packet. Multiple link state advertisements can be acknowledged in a single LSACK packet. The format of this packet is similar to that of the DD packet: the body of both packets is simply a list of LSA headers. Such a packet is sent back out the interface that has received the advertisements. Depending on the state of the sending interface (DR, BDR or DROTHER) and the source of the advertisements being acknowledged, the LSACK packet is sent to the multicast address AllSPFRouters (224.0.0.5), to the multicast address AllDRouters (224.0.0.6), or to a unicast address. In this context, the LSACK packet can be sent in one of two ways: delayed-multicast or direct-unicast. The particular acknowledgment strategy used depends on the circumstances surrounding the receipt of the advertisement:

• Delayed ACKs are sent on an interval timer using multicast. Sending delayed acknowledgments accomplishes several things. On one hand, it facilitates the packaging of multiple acknowledgments in a single LSACK packet. On the other hand, it enables a single LSACK to indicate acknowledgments to several neighbors at once (through multicasting) and it randomizes the LSACK packets sent by the various routers attached to a multi-access network.

• Direct acknowledgments are sent to a particular neighbor (unicast) in response to the receipt of duplicate LSAs. These acknowledgments are sent as unicasts, and are sent immediately when the duplicate is received. Duplicate LSAs are retransmitted LSAs.
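The choice of destination address can be sketched as below. The mapping used for delayed acknowledgments (DR and BDR send to AllSPFRouters, any other router sends to AllDRouters) is our reading of RFC 2328 for broadcast networks and is not spelled out in the text above; the function name is hypothetical.

```python
ALL_SPF_ROUTERS = "224.0.0.5"
ALL_D_ROUTERS = "224.0.0.6"


def lsack_destination(interface_state: str, duplicate: bool, sender_ip: str) -> str:
    """Pick the destination IP of an LSACK on a broadcast segment (sketch)."""
    if duplicate:
        # Direct acknowledgment: answer the sender immediately by unicast.
        return sender_ip
    if interface_state in ("DR", "BDR"):
        # Delayed acknowledgment from the DR or BDR reaches every router.
        return ALL_SPF_ROUTERS
    # Delayed acknowledgment from a DROTHER only needs to reach the DR and BDR.
    return ALL_D_ROUTERS


print(lsack_destination("BDR", False, "172.16.0.2"))
print(lsack_destination("DROTHER", False, "172.16.0.2"))
print(lsack_destination("DROTHER", True, "172.16.0.2"))
```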

An example LSACK is the following:

Internet Protocol Version 4, Src: 172.16.0.1 (172.16.0.1), Dst: 224.0.0.5 (224.0.0.5)
Open Shortest Path First
    OSPF Header
        OSPF Version: 2
        Message Type: LS Acknowledge (5)
        Packet Length: 64
        Source OSPF Router: 192.168.0.1 (192.168.0.1)
        Area ID: 0.0.0.0 (Backbone)
        Packet Checksum: 0xafb6 [correct]
        Auth Type: Null
        Auth Data (none)
    LSA Header
        LS Age: 1 seconds
        Do Not Age: False
        Options: 0x02 (E)
        Link-State Advertisement Type: Router-LSA (1)
        Link State ID: 192.168.0.2
        Advertising Router: 192.168.0.2 (192.168.0.2)
        LS Sequence Number: 0x80000003
        LS Checksum: 0xea43
        Length: 48
    LSA Header
        LS Age: 1 seconds
        Do Not Age: False
        Options: 0x02 (E)
        Link-State Advertisement Type: Network-LSA (2)
        Link State ID: 172.16.0.2
        Advertising Router: 192.168.0.2 (192.168.0.2)
        LS Sequence Number: 0x80000001
        LS Checksum: 0xb0a9
        Length: 32

Finally, it is worth mentioning that LSAs are usually acknowledged by sending LSACK packets. However, acknowledgments can also be accomplished implicitly with an LSU that includes the corresponding LSA. This is termed an “implied acknowledgment”.

7.3.11 FULL State

FULL is the normal state for an OSPF router. If a router is stuck in another state, it is an indication that there are problems in forming adjacencies. The only exception to this is the 2-WAY state which, as mentioned, is normal in a broadcast segment between DROTHER neighbors (see Figure 7.7).

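[Figure 7.7: OSPF Final States in a Broadcast Segment. Routers R1 to R6 share the segment; the relationships with the DR and the BDR end in the FULL state, while DROTHER routers remain in the 2-WAY state with each other.]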

In Quagga, you can view the state with your neighbors with the command:

r1# show ip ospf neighbor
Neighbor ID     Pri State     Dead Time  Address      Interface         RXmtL RqstL DBsmL
192.168.0.2       1 Full/DR   32.220s    172.16.0.2   eth0:172.16.0.1       0     0     0

In the FULL state, routers have their databases fully synchronized. After this, the OSPF routers only exchange LSU messages when there is a change in the network state, that is to say, when there is a new LSA or a new instance of an existing LSA. The “full synchronization” takes place only at long intervals.

7.4 Costs

7.4.1 Set the Cost

Unlike RIP, OSPF can use a cost different from the number of traversed hops. In general, an OSPF router can announce its links with any cost. To manually configure the cost of an interface, type the following in Quagga:

r1# configure terminal
r1(config)# interface eth1
r1(config-if)# ospf cost 30

However, by default OSPF uses the following expression to compute costs for each interface:

reference_bandwidth/configured_bandwidth

By default, many implementations use 100 Mbps as reference_bandwidth. This means that a 10 Mbps link has a cost of 10, but 100 Mbps, 1 Gbps and 10 Gbps links all have a cost of 1. To enable a more fine-grained cost for high speed links you have to change the reference_bandwidth. In Quagga:


r1(config)# router ospf
r1(config-router)# auto-cost reference-bandwidth 1000

The previous command sets the reference_bandwidth to 1 Gbps.

Finally, the cost of an OSPF route is the sum of the costs of the links traversed. Note that the costs of the same path in the reverse direction can be different. This happens when the interfaces in the forward direction have different costs than the interfaces used in the reverse direction.
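The cost arithmetic of this section can be checked with a few lines of Python. This is a sketch under the assumption that the division is truncated to an integer and never yields less than 1; the function names and the per-interface costs of the forward and reverse paths are made up for the example.

```python
def interface_cost(bandwidth_mbps: float, reference_mbps: float = 100) -> int:
    """Default cost: reference bandwidth over link bandwidth, never below 1."""
    return max(1, int(reference_mbps // bandwidth_mbps))


def path_cost(link_costs) -> int:
    """Cost of a route: sum of the outgoing interface costs along the path."""
    return sum(link_costs)


# Link speed, cost with the default reference, cost with a 1 Gbps reference.
for mbps in (10, 100, 1000, 10000):
    print(mbps, interface_cost(mbps), interface_cost(mbps, reference_mbps=1000))

forward = path_cost([10, 30])   # hypothetical costs on the outgoing interfaces
reverse = path_cost([10, 10])   # hypothetical costs used by the return traffic
print(forward, reverse)
```

With the default reference, every link of 100 Mbps or faster collapses to cost 1; with a reference of 1000, a 100 Mbps link gets cost 10 and a 1 Gbps link gets cost 1.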

7.4.2 Load Balancing

A routing protocol can install several routes in the FIB if these routes have equal cost. You can see if several routes are installed with the Linux command:

root@r1:~# ip route show
10.0.1.0/24  proto zebra  metric 20
        nexthop via 172.16.0.3  dev eth0 weight 1
        nexthop via 172.16.0.4  dev eth0 weight 1
10.0.2.0/24 via 172.16.0.4 dev eth0  proto zebra  metric 20
10.0.3.0/24 dev eth1  proto kernel  scope link  src 10.0.3.1
172.16.0.0/24 dev eth0  proto kernel  scope link  src 172.16.0.1

In the previous example, the network 10.0.1.0/24 has two equal cost routes installed in the FIB: one through 172.16.0.3 and another through 172.16.0.4. Then, the router has to do some load balancing among these routes. In general we have several ways of doing so:

• Per-packet. Each packet can be routed through a different route.

• Per-flow (a.k.a. per-destination). All the packets of a flow will follow the same path.

Per-packet load balancing is generally discouraged due to the impact of rapidly changing latency, packet reordering and maximum transmission unit (MTU) differences within a network flow. This can disrupt the operation of many Internet protocols, most notably TCP and path MTU discovery. In addition, there might be a considerable increase in CPU/memory utilization due to intensive processing in the router in large networks.

On the other hand, there is a standard about how to implement a per-flow load balancing called Equal-Cost Multi-Path (ECMP). The typical way is by hashing the source and destination IP address, resulting in a unique hash ID that randomizes the assignment across the end-to-end paths. For further details: RFC 2991 [16], “Multipath Issues in Unicast and Multicast Next-Hop Selection” and RFC 2992 [17], “Analysis of an Equal-Cost Multi-Path Algorithm”. Finally, notice that in many situations, ECMP may not offer any real advantage over best-path routing: for example, if the multiple best next-hop paths to a destination re-converge downstream into a single low-bandwidth path (a common scenario).

ECMP can be activated in Linux if the kernel is compiled with Equal Cost Multi-Path routing enabled (configuration option CONFIG_IP_ROUTE_MULTIPATH=y). This means that the kernel will permit multiple routers (gateways) in the routing table, and will do a per-flow load balancing of outgoing traffic across them.
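The idea of per-flow hashing can be illustrated with the sketch below. The use of CRC32 over the two addresses is an arbitrary choice made for this example and is not the hash function used by the Linux kernel or by any particular router; the source and destination hosts are also made up.

```python
import ipaddress
import zlib


def pick_next_hop(src: str, dst: str, next_hops: list) -> str:
    """Per-flow selection: hash the address pair and index the next-hop list."""
    key = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    return next_hops[zlib.crc32(key) % len(next_hops)]


next_hops = ["172.16.0.3", "172.16.0.4"]   # two equal-cost next hops

first = pick_next_hop("10.0.3.7", "10.0.1.20", next_hops)
again = pick_next_hop("10.0.3.7", "10.0.1.20", next_hops)
print(first, again)   # the same flow always maps to the same next hop
```

Because the hash depends only on the address pair, all the packets of a flow follow the same path, which avoids the reordering problems of per-packet balancing.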

7.5 Basic LSAs

There are five basic types of LSAs¹. These LSAs are: Router, Network, ABR Summary, ASBR Location and ASBR Summary. In addition, each LSA has a Link State ID that identifies the piece of the routing domain that is being described by the LSA.

¹ Actually, there are more types but these are beyond the scope of this document.


7.5.1 Router-LSA (type-1)

Each OSPF router originates a Router-LSA (type-1) for each area that it belongs to. The Router-LSA describes the links of the router in the area. The LSA has a Link State ID field that is set to the originating router’s router-id. This field is used to rapidly identify the LSA. In addition, each link (or L2 network) is individually identified inside the LSA by the tuple “Type”, “Link-id” and “Data”. There are 3 types of links/networks:

Type  Description                                 Link-id             Data
1     Point-to-point link (two routers)           Neighbor Router-id  Router IF Address
2     Link to transit network (multiple routers)  DR IF Address       Router IF Address
3     Link to Stub network (single router)        IP Network          Netmask

Point-to-point links are links that connect two routers, a transit network is a broadcast segment that has multiple OSPF routers attached to it and a stub network is a network with just one OSPF router.

An example Router-LSA is the following:

LS Type: Router-LSA
    LS Age: 1 seconds
    Do Not Age: False
    Options: 0x02 (E)
    Link-State Advertisement Type: Router-LSA (1)
    Link State ID: 192.168.0.2
    Advertising Router: 192.168.0.2 (192.168.0.2)
    LS Sequence Number: 0x80000004
    LS Checksum: 0x333a
    Length: 48
    Flags: 0x00
    Number of Links: 2
    Type: Stub     ID: 192.168.0.1     Data: 255.255.255.255 Metric: 10
        IP network/subnet number: 192.168.0.1
        Link Data: 255.255.255.255
        Link Type: 3 - Connection to a stub network
        Number of TOS metrics: 0
        TOS 0 metric: 10
    Type: Transit  ID: 172.16.0.2      Data: 172.16.0.1      Metric: 10
        IP address of Designated Router: 172.16.0.2
        Link Data: 172.16.0.1
        Link Type: 2 - Connection to a transit network
        Number of TOS metrics: 0
        TOS 0 metric: 10

In the previous example LSA, the router is connected to two networks (links): a stub and a transit.
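The links of a Router-LSA can be modelled with a small data structure, as in the following sketch. The class and function names are hypothetical, and the length formula assumes that no additional TOS metrics are present, as in the example (Number of TOS metrics: 0).

```python
from dataclasses import dataclass

LINK_TYPES = {1: "point-to-point", 2: "transit", 3: "stub"}


@dataclass
class RouterLink:
    link_type: int   # 1, 2 or 3
    link_id: str
    data: str
    metric: int


def router_lsa_length(links: list) -> int:
    """20-byte LSA header + 4 bytes (flags, link count) + 12 bytes per link."""
    return 20 + 4 + 12 * len(links)


links = [
    RouterLink(3, "192.168.0.1", "255.255.255.255", 10),   # stub network
    RouterLink(2, "172.16.0.2", "172.16.0.1", 10),          # transit network
]
for link in links:
    print(LINK_TYPES[link.link_type], link.link_id, link.data, link.metric)
print(router_lsa_length(links))
```

For two links the computed length is 48 bytes, which agrees with the Length field of the example Router-LSA.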

7.5.2 Network-LSA (type-2)

The Network-LSA (type-2) is used by the DR to describe the transit networks. This LSA lists the router-ids of all the routers that are attached to the transit network, including the DR itself. The Link State ID is the IF IP address of the DR. Thanks to this LSA, only the DR has to care about which neighbors become active or inactive in the transit network to update and synchronize the LSDBs. If a new neighbor becomes active or a neighbor goes down, the DR informs its neighbors in the transit network using this LSA. Note that an OSPF router can also know which neighbors are active from Hello packets; however, these packets do not directly serve to install any data in the LSDB. For this purpose you need to use an LSA.

An example Network-LSA is the following:

LS Type: Network-LSA
    LS Age: 1 seconds
    Do Not Age: False
    Options: 0x02 (E)
    Link-State Advertisement Type: Network-LSA (2)
    Link State ID: 172.16.0.2
    Advertising Router: 192.168.0.2 (192.168.0.2)
    LS Sequence Number: 0x80000001
    LS Checksum: 0xb0a9
    Length: 32
    Netmask: 255.255.255.0
    Attached Router: 192.168.0.1
    Attached Router: 192.168.0.2

7.5.3 ABR Summary LSA (type-3)

Summary-LSAs (type-3) are generated by Area Border Routers (ABRs) to advertise networks from an area to the rest of the areas in the OSPF domain. There is one LSA per network but advertised networks can be summarized. If a complete summarization is possible, the ABR will generate just one summary-LSA for an area. However, if summarization is not used or if we cannot properly aggregate the area’s addresses with only one prefix, then the ABR will generate more than one summary-LSA (one for each network). The Link State ID used by each LSA is the network number advertised. An example ABR Summary-LSA is the following:

LS Type: Summary-LSA (IP network)
    LS Age: 1 seconds
    Do Not Age: False
    Options: 0x02 (E)
    Link-State Advertisement Type: Summary-LSA (IP network) (3)
    Link State ID: 10.0.5.0
    Advertising Router: 192.168.0.3 (192.168.0.3)
    LS Sequence Number: 0x80000001
    LS Checksum: 0x5c71
    Length: 28
    Netmask: 255.255.255.0
    Metric: 20

In Quagga, we can summarize addresses of an area with the command:

r1(config-router)# area 2 range 10.0.0.0/16

With the previous command we summarize intra-area paths to any network of the range 10.0.0.0/16. That means that this range will be announced to other areas by the ABR with a summary-LSA. But this will happen only if area 2 contains at least one intra-area network from this range. By default, the cost of the summarized routes will be the highest cost of the routes being summarized.
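The behaviour of the range command can be sketched as follows. The function name and the set of intra-area routes are hypothetical; the sketch only models the two rules stated above (the range is announced only if it covers at least one intra-area network, and its cost is the highest cost among the covered routes).

```python
import ipaddress


def summary_for_range(area_range: str, intra_area_routes: dict):
    """Return (prefix, cost) to advertise, or None if the range is not active.

    intra_area_routes maps a prefix string to its intra-area cost.
    """
    summary = ipaddress.ip_network(area_range)
    covered = [cost for prefix, cost in intra_area_routes.items()
               if ipaddress.ip_network(prefix).subnet_of(summary)]
    if not covered:
        return None            # no network of the area falls inside the range
    return str(summary), max(covered)


# Hypothetical intra-area routes of area 2 with their costs.
area2 = {"10.0.5.0/24": 20, "10.0.6.0/24": 30, "172.16.9.0/24": 10}
print(summary_for_range("10.0.0.0/16", area2))
print(summary_for_range("10.1.0.0/16", area2))
```

Networks that fall outside the range, such as 172.16.9.0/24 here, would still be announced with their own summary-LSA; this is not modelled in the sketch.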

On the other hand, remember that routers always use longest-match (match with the longer mask) and that different routes can be installed in the FIB if they have different prefixes. Therefore, if we summarize the networks of an area, the route installed in the FIB for the summarized network will be less specific than the routes that would be installed announcing each individual network.
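A longest-match lookup over a FIB that holds both a summarized and a more specific route can be sketched as below. The FIB contents and next hops are made up for the example, and a real router uses a much more efficient data structure than this linear search.

```python
import ipaddress


def lookup(fib: dict, destination: str):
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(p) for p in fib
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[str(best)]


fib = {
    "10.0.0.0/16": "172.16.0.3",   # summarized route (hypothetical next hop)
    "10.0.5.0/24": "172.16.0.4",   # more specific route (hypothetical next hop)
}
print(lookup(fib, "10.0.5.9"))    # matched by both prefixes, the /24 wins
print(lookup(fib, "10.0.7.9"))    # only matched by the summary
```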

Finally, it is worth mentioning that summary-LSAs help to reduce the size of the LSDB and constrain flooding to an area. In addition, summary-LSAs make areas somewhat insensitive to link or router failures in another area. Since OSPF prefers intra-area paths, duplicate routes in another area do not affect routing inside the area.

7.5.4 AS External LSA (type-5)

An ASBR is an OSPF router that redistributes external routes into OSPF. External routes are NOT considered part of the OSPF domain. External routes can be direct networks to which the router is connected, indirect routes statically defined or routes dynamically learned using another routing protocol such as RIP or BGP. The AS-external-LSA or ASBR-summary-LSA (type-5) contains information imported into OSPF from other routing processes.

The ASBR has to set a cost for the external routes that it redistributes. External routes fall under two categories: external type 1 and external type 2. The difference between these two is in the way the cost (metric) of the route is calculated in the OSPF domain:


• The cost of a type 1 external route is the external cost plus the internal cost used to reach the advertising ASBR.

• The cost of a type 2 external route is the external cost, irrespective of the interior cost to reach the advertising ASBR.

The external route type 2 is designed for saving computation when in the OSPF domain there is only one exit (ASBR) towards the external network. As shown in Figure 7.8a, when there is only one ASBR, it is irrelevant to compute the interior costs for the external route. We can just store the AS-external-LSA and use the route to the corresponding ASBR. In this case, defining the external route as type 2 saves computation.

[Figure 7.8: External Routes with One or More ASBRs. (a) Single ASBR: r3 reaches the external network X through ASBR1 with type 1 cost 40 and type 2 cost 20; both types are fine. (b) Multiple ASBRs: r3 reaches X through ASBR1 with type 1 cost 40 and through ASBR2 with type 1 cost 50, while the type 2 cost is 20 in both cases, so type 2 is not optimal.]

On the other hand, when there are several ASBRs, there are several routes for reaching the external network and these routes might have different costs. For example, if in the configuration of Figure 7.8b we define the external route as type 2, the routers in the domain are going to have two routes of equal cost (20) to the external network and, if ECMP is activated, the traffic will be shared between these two routes. However, note that in fact, for r3, the route through ASBR1 has less cost than the route through ASBR2. In this case, it is more efficient to use external routes of type 1.
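The costs quoted for Figure 7.8b can be reproduced with the sketch below. The function name is hypothetical, and the internal costs from r3 to each ASBR (20 and 30) are derived from the type 1 totals of the figure with an external cost of 20.

```python
def external_route_cost(metric_type: int, external_cost: int, internal_cost: int) -> int:
    """Cost of an external route as seen by a router inside the OSPF domain."""
    if metric_type == 1:
        return external_cost + internal_cost   # type 1: add the path to the ASBR
    return external_cost                        # type 2: external cost only


# Internal costs from r3 to each ASBR, derived from the totals in Figure 7.8b.
internal = {"ASBR1": 20, "ASBR2": 30}
for asbr, cost in internal.items():
    print(asbr, external_route_cost(1, 20, cost), external_route_cost(2, 20, cost))
```

With type 1 the two exits are distinguishable (40 versus 50), whereas with type 2 both appear with cost 20.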

Finally, it is worth mentioning several more issues about external routes. In the first place, AS-external-LSAs are flooded to all areas unchanged. In addition, if a router receives different AS-external-LSAs containing the same network prefix, and the prefix is defined as type 1 in one LSA and as type 2 in another, the router always selects type 1 over type 2. In this context, it is common practice to always use type 1, even when there is only one path/ASBR in the network, since type 1 always yields the best routes. Finally, when redistributing, by default the cost is 1 for routes learned from the BGP protocol and 20 for routes from other routing protocols, but we can also redistribute with other costs:

r1(config-router)# redistribute static metric-type 1 metric 40

Or change the default cost:

r1(config-router)# default-metric 40

An example AS-External-LSA is the following:

LS Type: AS-External-LSA (ASBR)
    LS Age: 2 seconds
    Do Not Age: False
    Options: 0x02 (E)
    Link-State Advertisement Type: AS-External-LSA (ASBR) (5)
    Link State ID: 10.0.5.0
    Advertising Router: 192.168.0.5 (192.168.0.5)
    LS Sequence Number: 0x80000001
    LS Checksum: 0xc779
    Length: 36
    Netmask: 255.255.255.0
    External Type: Type 2 (metric is larger than any other link state path)
    Metric: 20
    Forwarding Address: 0.0.0.0
    External Route Tag: 0

Note that the Link State ID of the type 5 LSA is the external network number. The Forwarding Address² contains the address of the router to which traffic for the prefix has to be forwarded. 0.0.0.0 means “follow-path-to-router-ID”.

7.5.5 ASBR Location LSA (type-4)

The ABR Summary LSA (type-3) and the ASBR Location LSA (type-4) have the same format; Wireshark labels them Summary-LSA (IP network) and Summary-LSA (ASBR), respectively. The difference is that the Link State ID in a type-4 LSA is always the router-id of the ASBR that is being advertised. An example type-4 LSA is the following:

LS Type: Summary-LSA (ASBR)
    LS Age: 1 seconds
    Do Not Age: False
    Options: 0x02 (E)
    Link-State Advertisement Type: Summary-LSA (ASBR) (4)
    Link State ID: 192.168.0.5
    Advertising Router: 192.168.0.3 (192.168.0.3)
    LS Sequence Number: 0x80000001
    LS Checksum: 0xbfb7
    Length: 28
    Netmask: 255.255.255.255
    Metric: 10

Now the question is why is this LSA needed? Next we provide the answer.

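[Figure 7.9: Motivation for the Type 4 LSA. An OSPF domain with areas A0, A1 and A2, in which ASBR1 (router-id 192.168.0.5) imports external networks with type 5 LSAs, and the ABRs (ABR2 and ABR3, the latter with router-id 192.168.0.3) flood the type 5 LSAs into the other areas together with type 4 LSAs.]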

In a regular area, the type-1 and type-2 LSAs are used to build a full shared view of the topology, i.e., an interconnected topology graph of router nodes and interconnecting links. The nodes and links are identified by router-ids and link-ids. On the other hand, a type-5 LSA contains information about an external route and its router-id is that of the ASBR. However, remember that for scaling purposes ABRs deliberately hide the topology of one area from the other areas they connect. As a result, the router-id of an ASBR only makes sense to routers in its native area (area 0 in our example of Figure 7.9), and is not helpful to non-native routers (areas 1 and 2 in Figure 7.9). If the router-id advertised by the type-5 LSA is not meaningful, then the path to the advertised prefix cannot be resolved and the prefix of the external route is unreachable. Using the type 4 LSA fixes the situation because with this LSA, an ABR can announce itself as the way of reaching a certain ASBR. To do so, the ABR includes its router-id as the “Advertising Router” in the type 4 LSA. As shown in Figure 7.9, the ABR that announces a certain type 5 LSA has to also generate the corresponding type 4 LSA.

² This is equivalent to the next hop field of RIPv2.

As a final remark, it is worth mentioning that we could announce all the router-ids of our OSPF domain as prefixes to other areas (for example 192.168.0.5/32, the router-id of our ASBR). In this case, the type 4 LSA would not be necessary. However, the router-id is functionally equivalent to a name: we can set the ASBR’s router-id to 192.168.0.5 but never advertise 192.168.0.5/32 into OSPF. In this case, OSPF will still work thanks to the type-4 LSA, which acts like a glue record.
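The role of the type 4 LSA as a glue record can be sketched with simple dictionary lookups. This is a toy model: the function name and the three tables are hypothetical simplifications of the LSDB, filled with the router-ids used in the examples of this section.

```python
def resolve_external(prefix, type5_lsas, intra_area_routers, type4_lsas):
    """Return the router to forward towards for an external prefix, or None.

    type5_lsas:         prefix -> router-id of the advertising ASBR
    intra_area_routers: router-ids reachable inside the local area
    type4_lsas:         ASBR router-id -> router-id of the ABR announcing it
    """
    asbr = type5_lsas.get(prefix)
    if asbr is None:
        return None
    if asbr in intra_area_routers:
        return asbr                  # the ASBR is in our own area
    return type4_lsas.get(asbr)      # otherwise go through the announcing ABR


type5 = {"10.0.5.0/24": "192.168.0.5"}
local_routers = {"192.168.0.3"}      # only the ABR is visible in our area

without_t4 = resolve_external("10.0.5.0/24", type5, local_routers, {})
with_t4 = resolve_external("10.0.5.0/24", type5, local_routers,
                           {"192.168.0.5": "192.168.0.3"})
print(without_t4, with_t4)
```

Without the type 4 entry the external prefix cannot be resolved; with it, the router forwards towards the ABR that announced the ASBR.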

7.5.6 Link State IDs

Actually, for type 3 and type 5 LSAs the Link State ID may additionally have one or more of the destination network's "host" bits set. For example, when originating an AS-external-LSA for the network 10.0.0.0 with mask of 255.0.0.0, the Link State ID can be set to anything in the range 10.0.0.0 through 10.255.255.255 inclusive (although 10.0.0.0 should be used whenever possible).

The freedom to set certain host bits allows a router to originate separate LSAs for two networks having the same address but different masks.
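As a small illustration of this rule, the following Python sketch checks whether a candidate Link State ID is acceptable for a given network, i.e., whether it is the network address with zero or more host bits set. The function name `valid_link_state_id` is hypothetical; only the standard `ipaddress` module is used.

```python
import ipaddress


def valid_link_state_id(ls_id: str, network: str) -> bool:
    """True if ls_id lies inside the advertised network (host bits may be set)."""
    return ipaddress.ip_address(ls_id) in ipaddress.ip_network(network)


# The example of the text: network 10.0.0.0 with mask 255.0.0.0.
print(valid_link_state_id("10.0.0.0", "10.0.0.0/8"))
print(valid_link_state_id("10.255.255.255", "10.0.0.0/8"))
print(valid_link_state_id("11.0.0.0", "10.0.0.0/8"))

# Two networks with the same address but different masks can get
# different Link State IDs by setting host bits in one of them.
print(valid_link_state_id("10.0.255.255", "10.0.0.0/16"))
```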

7.5.7 Advertising Router

This field specifies the OSPF Router ID of the LSA's originator. For router-LSAs, this field is identical to the Link State ID field. Network-LSAs are originated by the network's Designated Router. Summary-LSAs are originated by area border routers and AS-external-LSAs are originated by AS boundary routers.

7.5.8 LS sequence numbers

The sequence number field is a signed 32-bit integer. It is used to detect old and duplicate LSAs. The LSA's sequence number is incremented each time the router originates a new instance of the LSA.

7.5.9 LS age

This field is the age of the LSA in seconds. Whenever a new instance of an LSA is originated, its LS sequence number is incremented and its LS age is set to 0. The LS age is incremented by InfTransDelay on every hop of the flooding procedure. LSAs are also aged as they are held in each router's database. The LS age field is examined when a router receives two instances of an LSA, both having identical LS sequence numbers and LS checksums. In that case, an instance of age MaxAge is always accepted as the most recent; otherwise, if the ages differ by more than MaxAgeDiff, the instance with the smaller age is accepted as the most recent.
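The way the sequence number, checksum and age fields are combined to decide which of two instances of the same LSA is more recent can be sketched as follows. The sketch follows the comparison rules of RFC 2328 (section 13.1), with MaxAge = 3600 s and MaxAgeDiff = 900 s; the class and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_AGE = 3600      # MaxAge, in seconds
MAX_AGE_DIFF = 900  # MaxAgeDiff, in seconds


@dataclass
class LsaInstance:
    seq: int       # LS sequence number (signed 32-bit integer)
    checksum: int  # LS checksum (16 bits)
    age: int       # LS age in seconds


def compare(a: LsaInstance, b: LsaInstance) -> int:
    """Return 1 if a is more recent, -1 if b is more recent, 0 if equivalent."""
    # 1. The larger sequence number wins.
    if a.seq != b.seq:
        return 1 if a.seq > b.seq else -1
    # 2. With equal sequence numbers, the larger checksum wins.
    if a.checksum != b.checksum:
        return 1 if a.checksum > b.checksum else -1
    # 3. If only one instance has age MaxAge, it wins (fast flushing).
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return 1 if a.age == MAX_AGE else -1
    # 4. If the ages differ by more than MaxAgeDiff, the smaller age wins.
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return 1 if a.age < b.age else -1
    # 5. Otherwise both instances are considered identical.
    return 0


initial = -2147483647  # 0x80000001, the first sequence number used
print(compare(LsaInstance(initial + 1, 0x1234, 5), LsaInstance(initial, 0x1234, 5)))
print(compare(LsaInstance(initial, 0x1234, MAX_AGE), LsaInstance(initial, 0x1234, 100)))
```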

7.6 Types of Areas

Area types help to control the advertisement of routes into an area. In other words, area types allow filtering and managing the LSAs that the ABRs announce. There are several types; in this document we describe the two most widely used: stub and totally stub areas.

7.6.1 Stub Area

By designating an area border router (ABR) interface to the area as a stub interface, you suppress external route advertisements (LSAs type-5 and type-4) through the ABR. OSPF inter-area routes (type-3 LSAs) are still advertised into the stub area. To make external routes reachable, the ABR injects a default route (through itself) into a stub area. Then, packets destined for external routes are automatically sent to the ABR, which acts as a gateway for outbound traffic and routes the traffic appropriately. In a stub area, the ABR generates a summary LSA (type-3 LSA) with the link-state ID 0.0.0.0 (default route). This is true even if the ABR does not have a default route of its own. In this case, you do not need to use the default-information originate command.

All OSPF routers inside a stub area have to be configured as stub routers. This is because whenever an area is configured as stub, all interfaces that belong to that area will start exchanging Hello packets with a flag that indicates that the interface is stub. Actually this is just a bit in the Hello packet (E bit) that gets set to 0. All routers that have a common segment have to agree on that flag. If they do not, then they will not become neighbors and routing will not take effect. Note also that an ASBR cannot be internal to a stub area. The backbone, of course, cannot be configured as stub. These restrictions are made because a stub area is mainly configured not to carry external routes, and any of the above situations causes external links to be injected in that area.
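The agreement on the E bit can be sketched as part of a simplified Hello acceptance check. This is a minimal sketch, assuming a broadcast segment on which the network mask, the Hello interval, the dead interval and the E bit must all match, as RFC 2328 requires; the class and function names are hypothetical and the timer values are only illustrative.

```python
from dataclasses import dataclass

E_BIT = 0x02  # External Routing Capability bit of the Options field


@dataclass
class Hello:
    network_mask: str
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    options: int         # Options byte; the E bit is 0 in a stub area


def can_become_neighbors(a: Hello, b: Hello) -> bool:
    """Simplified check of the Hello fields that two routers must agree on."""
    return (
        a.network_mask == b.network_mask
        and a.hello_interval == b.hello_interval
        and a.dead_interval == b.dead_interval
        and (a.options & E_BIT) == (b.options & E_BIT)
    )


regular = Hello("255.255.255.0", 10, 40, 0x02)  # E bit set
stub = Hello("255.255.255.0", 10, 40, 0x00)     # E bit cleared

print(can_become_neighbors(regular, regular))
print(can_become_neighbors(regular, stub))
```

A router configured as stub and a router configured as regular on the same segment fail this check, which is why all routers of a stub area have to be configured consistently.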

The Quagga command to define, for example, area 2 as stub is the following:

r1(config-router)# area 2 stub

7.6.2 Totally Stub Areas

A totally stubby area takes it a step further: OSPF external routes (type-5 and type-4 LSAs) and inter-area routes (type-3 LSAs) are not advertised into a totally stubby area. Instead, the ABR injects a default route into a totally stubby area.

The Quagga command to define, for example, area 2 as totally stub is the following:

r1(config-router)# area 2 stub no-summary
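The filtering performed by the ABR for each area type can be summarized with the following Python sketch. It is only a summary of the rules described above (intra-area type-1 and type-2 LSAs are not considered); the names `ADVERTISED`, `abr_advertises` and `abr_injects_default` are hypothetical.

```python
# LSA types (other than the injected default route) that the ABR
# lets into an area, depending on the area type.
ADVERTISED = {
    "regular": {3, 4, 5},
    "stub": {3},
    "totally-stub": set(),
}


def abr_advertises(area_type: str, lsa_type: int) -> bool:
    """True if LSAs of lsa_type are advertised into an area of area_type."""
    return lsa_type in ADVERTISED[area_type]


def abr_injects_default(area_type: str) -> bool:
    """Stub and totally stub areas receive a type-3 default route (0.0.0.0)."""
    return area_type in ("stub", "totally-stub")


for area in ("regular", "stub", "totally-stub"):
    allowed = sorted(t for t in (3, 4, 5) if abr_advertises(area, t))
    print(area, allowed, abr_injects_default(area))
```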

7.6.3 Area Design

Now the question is: what are these types of areas useful for? The answer is that configuring stub or totally stub areas reduces the topological database size (and the memory/processing requirements) of the routers inside these areas. Hiding or limiting topology information makes the network more stable, helps provide faster convergence, and allows for scalable OSPF routing design. Stub and totally stub areas build on this idea of reducing flooding and effectively limit the size of the flooding domain. It is convenient to qualify an area as stub or as totally stub when (see Figure 7.10):

• There is a single exit point from that area.
• We can tolerate suboptimal routes.

Let's illustrate this with an example. In Figure 7.10a, we can observe that area N has only one ABR that connects this area with routes X and Y, which, generically, can be external or inter-area routes. In this case, it is not worth injecting LSAs T3, T4 or T5 into area N. A default route from the ABR is enough to optimally route the traffic to X or Y.

On the other hand, in Figure 7.10b, if we filter the LSAs for X and Y, the two ABRs will inject a default route into area N. Then, the route from r4 to network X takes a suboptimal path. This is because the default route injected by ABR2 has a cost of 20, while the default route injected by ABR1 has a cost of 30. Thus, r4 selects ABR2 as its exit point. However, the route to network X through ABR2 has a total cost of 220, while this cost would be 130 through ABR1.
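The arithmetic of this example can be checked with a short Python sketch. The costs of the default routes (20 and 30) and the totals (220 and 130) come from the text; the costs from each ABR to network X (200 and 100) are an assumption derived from those totals.

```python
# Cost of the default route as seen by r4 through each ABR (from the text).
default_cost = {"ABR1": 30, "ABR2": 20}
# Cost from each ABR to network X (assumed: total cost minus default cost).
cost_to_x = {"ABR1": 100, "ABR2": 200}

# With only default routes, r4 picks the ABR with the cheapest default route.
chosen = min(default_cost, key=default_cost.get)

# With full information, r4 would compare the end-to-end costs instead.
total = {abr: default_cost[abr] + cost_to_x[abr] for abr in default_cost}
optimal = min(total, key=total.get)

print("exit point chosen with default routes:", chosen, "total cost:", total[chosen])
print("optimal exit point:", optimal, "total cost:", total[optimal])
```

The difference between both totals (90 in this example) is the price paid for hiding the LSAs of X inside area N.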

As a final remark, notice that we could obtain a behavior similar to stub and totally stub areas by defining a static default route and then redistributing it. Nevertheless, in general, area types are a better option since everything is managed by OSPF and the number of routes is minimal due to LSA filtering.

[Figure 7.10. Panel (a), Single Exit Point: area N with routers r1, r2, r3 and r4 and a single ABR towards Network X and Network Y; link costs 10, 100 and 200. Panel (b), Multiple Exit Points: area N with routers r1, r2, r3 and r4 and two exit points, ABR1 and ABR2, towards Network X and Network Y; link costs 10, 100 and 200; the suboptimal path is marked.]

Figure 7.10: Stub and Totally Stub Areas Design.

7.7 Practices

[Figure 7.11. Labels in the figure: routers r1 to r6; switches SW0, SW1 and SW2; interfaces eth0.1 to eth0.6 and eth1.1 to eth1.5; networks 172.16.0.0/24, 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24, 10.0.4.0/24 and 10.0.5.0/24.]

Figure 7.11: Basic network for configuring OSPF.

Exercise 3 – In this exercise, we are going to study the basic OSPF messages and how neighbors establish relationships. To do so, you have to start the scenario ospf-basic on your host platform. You can obtain the initial configuration by executing the label “initial”. The scenario is depicted in Figure 7.11.

1. Capture tap0 on phyhost, open vtysh in r1 and activate eth0 for OSPF area 0.

Show the OSPF database and annotate which is the current router-id.

Describe the IGMP messages and the OSPF Hello messages that you observe.

In particular, describe the 5th Hello message and its previous IGMP message.

In r1, deactivate OSPF on eth0 and describe the packets that you observe.

2. With the ip command, add the IP addresses 192.168.0.1/32 and 192.168.0.2/32 to the loopback interfaces (lo) of r1 and r2 respectively.

Set 192.168.0.1 and 192.168.0.2 as router-ids on r1 and r2 respectively, and add the addresses 192.168.0.1/32 and 192.168.0.2/32 to OSPF area 0.

Finally, activate OSPF area 0 on the eth0 interfaces of r1 and r2 almost at the same time in both routers.


Describe the traffic captured between these routers.

Which is finally the BDR router and which is the DR?

3. After r1 and r2 are in the FULL state, add the network 10.0.3.0/24 to OSPF area 0.

Describe the command used and traffic captured in tap0.

4. With the ip command, add the IP address 192.168.0.3/32 to the loopback interface (lo) of r3.

Set the router-id in OSPF to 192.168.0.3 and activate OSPF area 0 in eth0 and in the loopback interface.

Explain each command that you use and describe the traffic captured in tap0.

5. After r3 is in the FULL state with r1 and r2, add the network 10.0.4.0/24 to OSPF area 0.


6. With the ip command, add the IP address 192.168.0.4/32 to the loopback interface (lo) of r4.

Set the router-id in OSPF to 192.168.0.4 and activate OSPF area 0 in eth0 and in the loopback interface.

Explain each command that you use and describe the traffic captured in tap0.

7. After r4 is in the FULL state with r1 and r2, add the network 10.0.2.0/24 to OSPF area 0 on this router.


8. Stop OSPF in r2.

Describe the command used and the traffic captured in tap0.

Which is now the DR? and the BDR?

9. In r5, set the router-id to 192.168.0.5 but do not assign this address to the loopback interface. Activate the network 10.0.1.0/24 in OSPF area 1 in routers r3 and r5.

Describe the commands used and the traffic captured in tap0.

10. In r6, set the router-id to 192.168.0.6 but do not assign this address to the loopback interface. Activate the network 10.0.2.0/24 in routers r4 and r6 in area 2, defined as stub.

Describe the commands used and the traffic captured in tap0, tap1 and tap2.

Describe the OSPF routes and OSPF database of r6.

11. In r5, use the command redistribute connected to redistribute the network 10.0.5.0/24.

Describe the commands used and the traffic captured (LSAs) in tap0, tap1 and tap2.

How is it possible that r6 reaches 10.0.5.5?

12. In r5, configure a default route pointing to 10.0.5.111.

Then, originate in r5 a default route in OSPF.

Describe the commands used and the traffic captured in tap0 and tap1.

13. Describe the list of border-routers that r1 has. Then, set the eth0 of r5 down.

Describe the commands used and the traffic captured in tap0. In particular, take a look at the age of the LSAs.

Describe the list of border-routers that r1 has now.

Set the eth0 of r5 up again.

Describe the commands used and the traffic captured in tap0.


7.8 Answers to practices

Exercise 3

Labels:
exec two-neighbors
exec r1-add-net-10
exec r3-add
exec r2-add-net-10
exec r4-add
exec r4-add-net-10-a0
exec quagga_stop r2
exec r3-r5-a1
exec r4-r6-a2s
exec r5-redis
exec r5-default
exec r5-eth0-down
exec r5-eth0-up

1. We activate OSPF on eth0 of r1 and we can see that the router-id is 10.0.1.1:

root@r1:~# vtysh
Hello, this is Quagga (version 0.99.20.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
r1# configure terminal
r1(config)# router ospf
r1(config-router)# network 172.16.0.0/24 area 0
r1(config-router)# exit
r1(config)# exit
r1# show ip ospf route
============ OSPF network routing table ============
N    172.16.0.0/24         [10] area: 0.0.0.0
                           directly attached to eth0
============ OSPF router routing table =============
============ OSPF external routing table ===========
r1# show ip ospf database

       OSPF Router with ID (10.0.1.1)
                Router Link States (Area 0.0.0.0)

IGMPv3 message: You can see that r1 joins the multicast group 224.0.0.5 and accepts any source. The group record for 224.0.0.5 is of type Change To Exclude Mode with Num Src: 0, which means that no source is excluded, i.e., any source is accepted.

--------------------------------------------------------------------------
Internet Group Management Protocol
    [IGMP Version: 3]
    Type: Membership Report (0x22)
    Header checksum: 0xf9f8 [correct]
    Num Group Records: 1
    Group Record : 224.0.0.5 Change To Exclude Mode
        Record Type: Change To Exclude Mode (4)
        Aux Data Len: 0
        Num Src: 0
        Multicast Address: 224.0.0.5 (224.0.0.5)
--------------------------------------------------------------------------

OSPF Hello Messages: The packet is from the OSPF router 10.0.1.1, which is in area 0.0.0.0. There is no Designated Router yet.


--------------------------------------------------------------------------
Open Shortest Path First
    OSPF Header
        OSPF Version: 2
        Message Type: Hello Packet (1)
        Packet Length: 44
        Source OSPF Router: 10.0.1.1 (10.0.1.1)
        Area ID: 0.0.0.0 (Backbone)
        Packet Checksum: 0xf19d [correct]
        Auth Type: Null
        Auth Data (none)
    OSPF Hello Packet
        Network Mask: 255.255.255.0
        Hello Interval: 10 seconds
        Options: 0x02 (E)
            0... .... = DN: DN-bit is NOT set
            .0.. .... = O: O-bit is NOT set
            ..0. .... = DC: Demand Circuits are NOT supported
            ...0 .... = L: The packet does NOT contain LLS data block
            .... 0... = NP: NSSA is NOT supported
            .... .0.. = MC: NOT Multicast Capable
            .... ..1. = E: External Routing Capability
            .... ...0 = MT: NO Multi-Topology Routing
        Router Priority: 1
        Router Dead Interval: 40 seconds
        Designated Router: 0.0.0.0
        Backup Designated Router: 0.0.0.0
--------------------------------------------------------------------------

5th Hello packet and the previous IGMP message: You can see that r1 joins a new multicast group, 224.0.0.6. This is the address that the Designated Router listens to. In the 5th Hello message, the router with address 172.16.0.1 declares itself as the DR.


    [IGMP Version: 3]
    Type: Membership Report (0x22)
    Header checksum: 0xf9f7 [correct]
    Num Group Records: 1
    Group Record : 224.0.0.6 Change To Exclude Mode
        Record Type: Change To Exclude Mode (4)
        Aux Data Len: 0
        Num Src: 0
        Multicast Address: 224.0.0.6 (224.0.0.6)
--------------------------------------------------------------------------
Source: 172.16.0.1 (172.16.0.1)
Destination: 224.0.0.5 (224.0.0.5)
Open Shortest Path First
    OSPF Header
        OSPF Version: 2
        Message Type: Hello Packet (1)
        Packet Length: 44
        Source OSPF Router: 10.0.1.1 (10.0.1.1)
        Area ID: 0.0.0.0 (Backbone)
        Packet Checksum: 0x458c [correct]
        Auth Type: Null
        Auth Data (none)
    OSPF Hello Packet
        Network Mask: 255.255.255.0
        Hello Interval: 10 seconds
        Options: 0x02 (E)
        Router Priority: 1
        Router Dead Interval: 40 seconds
        Designated Router: 172.16.0.1
        Backup Designated Router: 0.0.0.0
--------------------------------------------------------------------------

When we deactivate OSPF on eth0:

root@r1:~# vtysh

Hello, this is Quagga (version 0.99.20.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

r1# configure terminal
r1(config)# router ospf
r1(config-router)# no network 172.16.0.0/24 area 0

An IGMPv3 Membership Report is sent. Its group records are of type Change To Include Mode with Num Src: 0, which means that no source is included for these multicast groups. In other words, the router leaves the multicast groups 224.0.0.6 and 224.0.0.5.


    [IGMP Version: 3]
    Type: Membership Report (0x22)
    Header checksum: 0x17f1 [correct]
    Num Group Records: 2
    Group Record : 224.0.0.6 Change To Include Mode
        Record Type: Change To Include Mode (3)
        Aux Data Len: 0
        Num Src: 0
        Multicast Address: 224.0.0.6 (224.0.0.6)
    Group Record : 224.0.0.5 Change To Include Mode
        Record Type: Change To Include Mode (3)
        Aux Data Len: 0
        Num Src: 0
        Multicast Address: 224.0.0.5 (224.0.0.5)
--------------------------------------------------------------------------

——————————————————————————————————————————————

2. Configuring loopback interfaces:

r1:~# ip address add 192.168.0.1/32 dev lo
r2:~# ip address add 192.168.0.2/32 dev lo

Configuring the router-id and adding the loopback to OSPF:

r1(config-router)# router-id 192.168.0.1
r1(config-router)# network 192.168.0.1/32 area 0
r2(config-router)# router-id 192.168.0.2
r2(config-router)# network 192.168.0.2/32 area 0

Activate OSPF on eth0:

r1(config-router)# network 172.16.0.0/24 area 0
r2(config-router)# network 172.16.0.0/24 area 0

This is the dialog between routers.

No. Time       Source             Destination        Protocol Info
 1   0.000000  172.16.0.1         224.0.0.22         IGMP     Join group 224.0.0.5 any sources
 2   0.000357  172.16.0.1         224.0.0.5          OSPF     Hello Packet
 3   2.295988  172.16.0.2         224.0.0.22         IGMP     Join group 224.0.0.5 any sources
 4   2.296421  172.16.0.2         224.0.0.5          OSPF     Hello Packet
 5   6.116575  172.16.0.2         224.0.0.22         IGMP     Join group 224.0.0.5 any sources
 6   6.217459  172.16.0.1         224.0.0.22         IGMP     Join group 224.0.0.5 any sources
 7  10.003493  172.16.0.1         224.0.0.5          OSPF     Hello Packet
 8  12.302013  172.16.0.2         224.0.0.5          OSPF     Hello Packet
 9  20.012674  172.16.0.1         224.0.0.5          OSPF     Hello Packet
10  22.310419  172.16.0.2         224.0.0.5          OSPF     Hello Packet
11  30.022904  172.16.0.1         224.0.0.5          OSPF     Hello Packet
12  32.310923  172.16.0.2         224.0.0.5          OSPF     Hello Packet
13  39.970169  fe:fd:00:00:01:00  ff:ff:ff:ff:ff:ff  ARP      Who has 172.16.0.2? Tell 172.16.0.1
14  39.970292  fe:fd:00:00:02:00  fe:fd:00:00:01:00  ARP      172.16.0.2 is at fe:fd:00:00:02:00
15  39.970379  172.16.0.1         172.16.0.2         OSPF     DB Description
16  40.031263  172.16.0.1         224.0.0.5          OSPF     Hello Packet
17  42.259545  172.16.0.2         172.16.0.1         OSPF     DB Description
18  42.260308  172.16.0.1         172.16.0.2         OSPF     DB Description
19  42.260945  172.16.0.2         172.16.0.1         OSPF     DB Description
20  42.261430  172.16.0.1         172.16.0.2         OSPF     DB Description
21  42.261791  172.16.0.1         172.16.0.2         OSPF     LS Request
22  42.262416  172.16.0.2         172.16.0.1         OSPF     LS Request
23  42.262783  172.16.0.2         224.0.0.5          OSPF     LS Update
24  42.263457  172.16.0.1         224.0.0.6          OSPF     LS Update
25  42.264027  172.16.0.1         224.0.0.6          OSPF     LS Update
26  42.264607  172.16.0.2         224.0.0.5          OSPF     LS Update
27  42.284870  172.16.0.2         224.0.0.22         IGMP     Join group 224.0.0.6 any sources
28  42.315440  172.16.0.2         224.0.0.5          OSPF     Hello Packet
29  42.316713  172.16.0.1         224.0.0.5          OSPF     LS Update
30  42.336952  172.16.0.1         224.0.0.22         IGMP     Join group 224.0.0.6 for any sources
31  42.992393  172.16.0.1         224.0.0.5          OSPF     LS Acknowledge
32  43.144049  172.16.0.2         224.0.0.22         IGMP     Join group 224.0.0.6 for any sources
33  43.265322  172.16.0.2         224.0.0.5          OSPF     LS Acknowledge
34  44.263642  172.16.0.1         224.0.0.22         IGMP     Join group 224.0.0.6 for any sources
35  47.266813  fe:fd:00:00:02:00  fe:fd:00:00:01:00  ARP      Who has 172.16.0.1? Tell 172.16.0.2
36  47.266923  fe:fd:00:00:01:00  fe:fd:00:00:02:00  ARP      172.16.0.1 is at fe:fd:00:00:01:00
37  47.267541  172.16.0.2         172.16.0.1         OSPF     LS Update
38  48.025313  172.16.0.1         224.0.0.5          OSPF     LS Acknowledge
39  50.041379  172.16.0.1         224.0.0.5          OSPF     Hello Packet
40  52.269193  172.16.0.1         172.16.0.2         OSPF     LS Update
41  52.320645  172.16.0.2         224.0.0.5          OSPF     Hello Packet
42  52.341199  172.16.0.2         224.0.0.5          OSPF     LS Acknowledge
43  60.050822  172.16.0.1         224.0.0.5          OSPF     Hello Packet
44  62.330172  172.16.0.2         224.0.0.5          OSPF     Hello Packet
45  70.059256  172.16.0.1         224.0.0.5          OSPF     Hello Packet

The database exchange is started with DB Description (DD) packets. The first DB Description packet (packet 15) is the following:

Source: 172.16.0.1 (172.16.0.1)
Destination: 172.16.0.2 (172.16.0.2)
Open Shortest Path First
    OSPF Header
        OSPF Version: 2
        Message Type: DB Description (2)
        Packet Length: 32
        Source OSPF Router: 192.168.0.1 (192.168.0.1)
        Area ID: 0.0.0.0 (Backbone)
        Packet Checksum: 0xe0a6 [correct]
        Auth Type: Null
        Auth Data (none)
    OSPF DB Description
        Interface MTU: 1500
        Options: 0x02 (E)
        DB Description: 0x07 (I, M, MS)
            .... 0... = R: OOBResync bit is NOT set
            .... .1.. = I: Init bit is SET
            .... ..1. = M: More bit is SET
            .... ...1 = MS: Master/Slave bit is SET
        DD Sequence: 1351681049

In the previous packet the bits I,M,MS are set to 1:

I-bit: The Init bit. When set to 1, this packet is the first in the sequence of Database Description Packets.

M-bit: The More bit. When set to 1, it indicates that more Database Description Packets are to follow.

MS-bit: The Master/Slave bit. When set to 1, it indicates that the router is the master during the Database Exchange process. Otherwise, the router is the slave.
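The decoding of this flags byte can be reproduced with a few lines of Python. The bit positions are those shown in the capture of packet 15; the names `DD_FLAGS` and `decode_dd_flags` are hypothetical.

```python
# Bit masks of the DB Description flags, as displayed in the capture.
DD_FLAGS = [
    (0x08, "R"),   # OOBResync
    (0x04, "I"),   # Init
    (0x02, "M"),   # More
    (0x01, "MS"),  # Master/Slave
]


def decode_dd_flags(value: int) -> list:
    """Return the names of the flags that are set in the DD flags byte."""
    return [name for mask, name in DD_FLAGS if value & mask]


print(decode_dd_flags(0x07))  # value carried by packet 15
```

For the value 0x07 of packet 15 the function returns the I, M and MS flags, which matches the dissection shown above.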

.............


Chapter 8

Conclusions

The main purpose of this thesis has been to build a Virtual Network Laboratory Environment. This environment has been built with VNUML, a virtual environment generator developed by the “Universidad Politecnica de Madrid”. All the components of these labs, such as VNUML and Quagga, and the tools used to test them, such as Wireshark, have been studied beforehand. In fact, we have learnt how to install Ubuntu on a USB pen drive, and the whole study environment has been developed using this USB drive. During the thesis, we have realised that a virtual machine can act as a router if Quagga is installed on it, and that several Quagga routers can communicate with each other using static routing or dynamic routing protocols such as RIP and OSPF. Moreover, we know that a Quagga router can exchange information with a physical network of commercial routers. All these results lead to the following considerations:

• A Virtual Network Laboratory Environment can be used by any student to exercise their networking skills.

• A Virtual Network Laboratory Environment can be used in a business system.

These two points are discussed in the next sections.

8.1 Virtual Network Laboratory Environment as Learning Environment

Nowadays, a student who wants to practise and learn how to configure routers has to use a laboratory with commercial routers installed in a room of his/her University. Sometimes the labs are busy because other students are using them, and the free routers available are not enough to perform the exercises, so the student must wait. There are two reasons for this: a lab can host only a limited number of routers for space reasons, and each router has a cost for the University. A Virtual Network Laboratory Environment allows students to create and test, on a single physical machine, labs that behave like real labs of commercial routers. This allows students to save time and the University to save money. Using a Virtual Network Laboratory Environment, a student can learn how to configure commercial routers, and he/she can also learn how to set up static routing and dynamic routing protocols like RIP, OSPF and so on. A student can also learn how to troubleshoot networks. For example, a teacher can use Wireshark, which shows the packets in detail, on the implemented lab to study TCP/IP protocols. However, it is necessary to remark that a virtual lab cannot replace a physical laboratory, and this is not its purpose. The goal is to improve the ways a student can learn.

Once a student has built a lab scenario, he/she has a network available. This means that the student can enhance his/her knowledge not only of networking but also of other subjects. A network consists of routers and workstations, but also of servers. So a student can, for example, set up a server farm or start building a DMZ (demilitarized zone). This allows the student, for example, to start configuring a web server or a proxy server.


However, a Virtual Network Laboratory Environment has some limits. Its performance depends on the capacity of its hardware. For example, a virtual machine with only 250 MB of RAM cannot run a Quagga router, a web server, a proxy server and so on at the same time. So the first limit of a virtual machine is the hardware capacity of the physical machine.

8.2 Virtual Network Laboratory Environment as Working Environment

In this section, we consider whether a Virtual Network Laboratory Environment can help or give some advantages if used in a working environment. This term is used to identify all the possible ways to implement the lab in an enterprise or a company. Honestly, it is difficult to think that an enterprise may need to swap its physical network of commercial routers for Quagga routers, even if it might save money. The reason is the heavy amount of traffic handled in a big company: commercial routers can cope with it, but it is not yet possible to say the same of virtual machines. The main limit is how fast a machine can forward packets. However, such an environment could be used to help manage traffic in a virtual network in VMware Infrastructure or Xen.

Moreover, processor speed can influence routing updates, routing convergence and route lookups. This is one reason why commercial routers from vendors such as Cisco, Juniper and Foundry are expensive: route lookups are cached in hardware and take nanoseconds, whereas a PC might take a few milliseconds. In a small business that has a normal ADSL modem to connect to the Internet and a few workstations, a Virtual Network Laboratory Environment could be useful. If this company wants to manage its small amount of traffic and enhance its security, it can deploy a physical machine running, for example, Ubuntu Server. This machine can run a Quagga router to manage the traffic between the workstations, a Linux firewall to protect the network, and an IPS/IDS to verify what kind of packets are entering the network.

8.3 RIP versus OSPF Conclusions

The differences that we can see between both protocols lie mainly in how they work, because both use algorithms that determine distances and select the lowest cost. OSPF always finds the path with the lowest total cost, whereas RIP seeks the fewest number of hops between routers. For example, with RIP, a better path in terms of cost from the origin router to the destination may exist, but if this path has a greater number of routers or hops, it is discarded.
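This difference can be illustrated with a small Python sketch that runs the same shortest-path search with two metrics: hop count (as RIP does) and link cost (as OSPF does). The three-router topology and its costs are hypothetical and are not taken from the scenarios of this thesis.

```python
import heapq


def shortest_path(graph, src, dst, weight):
    """Dijkstra's algorithm; weight(u, v) returns the metric of link u-v."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                new_cost = cost + weight(node, neighbor)
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical topology: a slow direct link A-B and a fast path through C.
graph = {
    "A": {"B": 100, "C": 10},
    "B": {"A": 100, "C": 10},
    "C": {"A": 10, "B": 10},
}

rip_like = shortest_path(graph, "A", "B", lambda u, v: 1)             # hops
ospf_like = shortest_path(graph, "A", "B", lambda u, v: graph[u][v])  # cost

print("hop-count metric:", rip_like)
print("cost metric:", ospf_like)
```

With the hop-count metric the direct link A-B is selected (1 hop, cost 100), whereas with the cost metric the path through C is preferred (2 hops, cost 20).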

RIP and OSPF have many features and mechanisms that can interact. In this thesis, we have taken into account some of them, such as split horizon, poison reverse and triggered updates for RIP, and the router-id, LSAs (Link State Advertisements) and areas for OSPF.

RIP is primarily intended for use as an IGP in networks of moderate size. In addition, RIP has some limitations: the protocol is limited to networks whose longest path is 15 hops, it depends upon “counting to infinity” to resolve certain unusual situations, and it uses fixed “metrics” to compare alternative routes. So it is not appropriate for situations where routes need to be chosen based on real-time parameters.

We have also learnt that link state protocols such as OSPF require more CPU than distance-vector protocols like RIP, but they are less prone to loops and their convergence time is also shorter. Convergence time is better because a link state change is immediately spread over the network and then the Dijkstra algorithm is applied to update the shortest paths. Now we know that configuring stub and totally stub areas reduces the topological database size (and the memory/processing requirements) of the routers inside these areas. Hiding or limiting topology information makes the network more stable, helps to provide faster convergence, and allows for scalable OSPF routing design.

8.4 Future work

At present, Quagga still does not support MPLS, but MPLS support is under development. In the future, TCP/IP filtering control, QoS control and diffserv configuration could be added to Quagga with the purpose of making it a productive, high-quality, free TCP/IP routing software. MPLS support could be very useful for students, who could learn how to build an MPLS network. At the moment there is no VoIP support for Quagga routers. It could be interesting for a student to configure a Quagga router to support VoIP traffic, using for example some softphones for testing calls. Recently, the Multi-Router Looking Glass (MRLG) has been implemented. It is a Web-based utility that can be used to display the interfaces and routes recognized by zebra. MRLG is really nothing more than a Web interface to the zebra shell with a limited set of commands. In the future, it could be improved with more commands that can be executed through the Web interface. Today, there is no real performance comparison between Quagga routers and physical routers, so it could be interesting to build two different networks, one with Quagga routers and the other with, for example, Cisco routers, and then test the performance of the two networks to measure the throughput, packet loss and latency.

Additionally, it would be interesting to test the interworking between the OSPF and RIP protocols in the same simulation environment, and also to test different scenarios with more types of areas in the case of OSPF. Another possible line of future work would be to test BGP and IS-IS using this environment with VNUML, which makes it possible to test systems with tens of nodes and more complex environments.


Part III

Appendices


Appendix A

Simulation Tools

A.1 A Wrapper for VNUML: simctl

With the aim of simplifying and extending the management capabilities of VNUML, the Department of Telematics Engineering (ENTEL) of the UPC has developed several modifications to the vnumlparser.pl of VNUML 1.8, a wrapper written in Bash called simctl, and some other scripts. The modifications to vnumlparser.pl essentially (i) allow a virtual guest machine to have several consoles connected to several pts in the phyhost (mpts functionality) and (ii) allow the implementation of virtual networks with other virtual switches like VDE (Virtual Distributed Ethernet).

On the other hand, the script simctl allows you to: search for the different scenarios that you can run, start a simulation, stop a simulation, list the virtual machines that are part of a simulation, list the “labels” (seq attributes of <exec> tags) defined on each machine of a simulation, run defined labels, manage the consoles to access the virtual machines, view the network structure, and some more things.

Note. The simctl wrapper and other scripts and utilities are distributed as a Debian package called simtools. This document describes version 2.5.22 of simtools.

A.2 Installation

The following instructions allow you to install the tools to build VNUML Virtual Networks and use our simctl wrapper. This installation has been tested using the 32-bit version of Ubuntu 12.04. It is known that the installation does not work for the 64-bit version. You can check your architecture by typing:

$ arch
i686

To install the simulation tools on a fresh Linux box you have to type the following command to add our repository to your list of APT (software) repositories:

$ echo deb http://sertel.upc.es/~vanetes/debs i386/|sudo tee /etc/apt/sources.list.d/simtools.list

Then, type the following commands to update the software repository list and to install all the packages related to simtools.

$ sudo apt-get update
$ sudo apt-get install metasimtools -y --force-yes

Note: you can repeat these steps if the software is not installed correctly the first time.

On the other hand, in case you have a 64-bit OS, we can provide you with a Virtual Machine with everything already installed on it that can run with hardware virtualization (with VirtualBox or VMware). In this case, it is very important that you check that your processor supports hardware virtualization and that you activate this feature in your BIOS. You can check this with the following command:


$ grep -E "(vmx|svm)" --color=always /proc/cpuinfo

If nothing is displayed after running that command, then your processor does not support hardware virtualization, and you will not be able to use our virtual machine smoothly.

Finally, another possibility, if your physical machine is able to boot from USB (most modern computers support this), is to make a raw copy with dd of our ISO image onto a USB pen drive and use this device as a hard disk.
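The same check can be scripted, for example, in Python. This is a minimal sketch: the function name `supports_hw_virtualization` is hypothetical and, unlike the grep command above, it matches vmx and svm only as whole words.

```python
import os
import re


def supports_hw_virtualization(cpuinfo_text: str) -> bool:
    """True if the vmx (Intel) or svm (AMD) CPU flag appears in the text."""
    return re.search(r"\b(vmx|svm)\b", cpuinfo_text) is not None


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as cpuinfo:
            print(supports_hw_virtualization(cpuinfo.read()))
    else:
        print("cannot read", path)
```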

A.3 Profile for simctl

The simctl wrapper is compatible with version 1.8 of VNUML, but to be able to fully exploit the functionalities of this script you should consider the issues that are listed below:

• We don’t use the management network. Thus, we always use:
<vm_mgmt type="none" />

• The filesystem is always of type COW:
<filesystem type="cow">/usr/share/vnuml/filesystems/root_fs_tutorial</filesystem>

• We always use the "mconsole" execution mode:
<vm_defaults exec_mode="mconsole">

• We use consoles of type “pts” and we can use multiple consoles of this type (mpts functionality) as follows:
<console id="0">pts</console>
<console id="1">pts</console>

If there are <console> tags in both <vm_defaults> and <vm>, they are merged. Our wrapper, simctl, internally uses the screen application to automatically manage the connection to these pseudo-terminal devices. Our wrapper is able to list the pseudo-terminals available and to allow you to always connect to the virtual machines. With simctl, you will never lose the possibility of having a console with a virtual machine while the simulation is running. You can even close a console and later reopen it without losing any data. On the other hand, the mpts functionality is implemented by our modified version of vnumlparser.pl, which stores the names of the multiple pts devices in the same directory as the original vnumlparser.pl, but using the filenames pts.0, pts.1, etc.

• simctl always executes the label “start”, i.e. the <exec> tags with attribute seq="start", when it initiates a simulation.

• simctl automatically creates a tap interface in the phyhost for each virtual network definition in which it finds a sock attribute. For example, if you define a virtual network like this:
<net name="Net0" mode="uml_switch" hub="yes" sock="/var/run/vnuml/Net0.ctl" />

Then, simctl creates a tap interface in the phyhost called tap0. In more detail, simctl calls another bash script called simtun. The script simtun is executed with root permissions, which allows us to create the tap interfaces and execute the uml_switch instances in the phyhost. In this creation, the tap is connected with the uml_switch, which, in turn, is connected with the virtual machines creating the virtual network.


• simctl uses a configuration file for defining some basic parameters using the syntax of bash. To locate this file, the script first checks the file .simrc in the home directory of the user that is running simctl; if this file is not found, then simctl accesses the system-wide configuration file located at /usr/local/etc/simrc. An example configuration file is the following:


    # simrc: tuning of environment variables
    # for simctl
    # Definition of scenario files directory
    DIRPRACT=/usr/share/vnuml/scenarios
    # Change the default terminal type (xterm)
    # values: (gnome | kde | local)
    # TERM_TYPE=gnome
    # KDE Konsole tuning
    # For Konsole version >= 2.3.2 (KDE 4.3.2) use these options:
    # KONSOLE_OPTS="-p ColorScheme=GreenOnBlack --title"
    # For Konsole version <= 1.6.6 (KDE 3.5.10) use these options
    # KONSOLE_OPTS="--schema GreenOnBlack -T"

The configuration file can be customized. For example, the DIRPRACT environment variable contains the path where VNUML simulation files can be found. On the other hand, if you want to use a GNOME terminal instead of an xterm terminal, you can set the variable TERM_TYPE=gnome.

Note. When simctl runs console terminals, it tries to use a classic color setting with green foreground on black background. This feature is modifiable for GNOME and KDE terminals. The KDE Konsole terminal can be configured with the variable KONSOLE_OPTS, keeping in mind that the last parameter must be the option that sets the window title of the terminal. For gnome-terminal, you can define and save a profile named vnuml with custom features. The gnome-terminal profiles can be edited from the Edit menu.

• With simctl, you can always use the “TAB” key to auto-complete the commands and options available at each moment.
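The lookup order described above (first the per-user .simrc, then the system-wide file) can be illustrated with a short Python sketch. This is not the code of simctl, which is a bash script; the function names locate_simrc and read_settings are hypothetical, and the parser only handles plain NAME=value lines such as those in the example file.

```python
import os
import tempfile


def locate_simrc(home, system_path="/usr/local/etc/simrc"):
    """Return the per-user .simrc if present, else the system-wide file, else None."""
    user_path = os.path.join(home, ".simrc")
    if os.path.isfile(user_path):
        return user_path
    if os.path.isfile(system_path):
        return system_path
    return None


def read_settings(path):
    """Collect simple NAME=value assignments, skipping blank lines and comments."""
    settings = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, value = line.split("=", 1)
            settings[name.strip()] = value.strip().strip('"')
    return settings


# Demonstration in a throwaway directory that stands in for a home directory.
with tempfile.TemporaryDirectory() as fake_home:
    with open(os.path.join(fake_home, ".simrc"), "w") as handle:
        handle.write("# simrc example\n")
        handle.write("DIRPRACT=/usr/share/vnuml/scenarios\n")
        handle.write("# TERM_TYPE=gnome\n")
    found = locate_simrc(fake_home)
    demo_found_name = os.path.basename(found)
    demo_settings = read_settings(found)

print(demo_found_name, demo_settings)
```

Commented-out assignments such as TERM_TYPE are ignored, which mirrors how a shell would treat them when reading the file.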

A.4 Simple Example Continued

Now, following the example of Section 3.3.3, we are going to show how to manage the scenario with simctl instead of with vnumlparser.pl, and we are going to complete the scenario with more functionality by including the definition of IP network addresses, routes, the execution of commands, multiple consoles, etc. Figure A.1 shows the design of the topology including IP addresses and networks.

[Figure: uml1 (eth1, 10.0.0.2/24) and uml2 (eth1, 10.0.0.1/24) are attached to Net0; uml2 (eth2, 10.0.1.1/24) and uml3 (eth1, 10.0.1.5/24) are attached to Net1; the host OS is attached to Net0.]

Figure A.1: Simple Network Topology (Continued).

Code A.1 shows a VNUML file that implements the topology and network configuration described above. The XML file also contains the configuration of additional functionality. Next, we discuss the most relevant aspects of this VNUML specification file. The first relevant aspect is that the definition uses multiple pseudo-terminals. In particular, notice that there is a tag <console id="0"> in the global element of the specification. This means that all the virtual machines will have one console of type "pts". In addition, the definition of virtual machine uml1 includes another console tag: <console id="1">. This means that this virtual machine is going to have two consoles of type "pts". Regarding the definition of virtual networks, notice that Net0 has been defined with the sock attribute, which means that this network is going to be connected to the phyhost with a tap interface called tap0. The other virtual network defined, Net1, does not have the sock attribute and thus it will not be connected to any tap interface of the phyhost.

Then, we have the definition of the three virtual machines uml1, uml2 and uml3. Regarding the IP configuration, as you can observe, we have configured the IP addresses as specified in Figure A.1, and uml1 and uml3 have uml2 as their default router. The forwarding in uml2 has been activated too. Finally, we have several <exec> tags in the definition of each virtual machine. Notice that all the virtual machines have the label “start”, which in this example enables the “source routing” functionality of the virtual machines. Likewise, all the virtual machines have the label “reset_ips”, which simply removes the IP addresses of the Ethernet network interfaces of the virtual machines (and as a result this action also removes all the routes from the routing tables). Finally, the virtual machine uml2 has two labels called “enable_forwarding” and “disable_forwarding” that allow us to enable and disable IP forwarding (which by default is enabled when uml2 is booted).



<vnuml>
  <!-- Global definitions -->
  <global>
    <version>1.8</version>
    <simulation_name>simple_example</simulation_name>
    <automac/>
    <vm_mgmt type="none" />
    <vm_defaults exec_mode="mconsole">
      <filesystem type="cow">/usr/share/vnuml/filesystems/root_fs_tutorial</filesystem>
      <kernel>/usr/share/vnuml/kernels/linux</kernel>
      <console id="0">pts</console>
    </vm_defaults>
  </global>
  <!-- Network definitions -->
  <net name="Net0" mode="uml_switch" hub="yes" sock="/var/run/vnuml/Net0.ctl" />
  <net name="Net1" mode="uml_switch" />
  <!-- Virtual machines definition -->
  <vm name="uml1">
    <console id="1">pts</console>
    <if id="1" net="Net0"> <ipv4>10.0.0.2/24</ipv4> </if>
    <route type="ipv4" gw="10.0.0.1">default</route>
    <exec seq="start" type="verbatim">echo "1" >/proc/sys/net/ipv4/conf/all/accept_source_route</exec>
    <exec seq="reset_ips" type="verbatim">ifconfig eth1 0.0.0.0</exec>
  </vm>
  <vm name="uml2">
    <if id="1" net="Net0"> <ipv4>10.0.0.1/24</ipv4> </if>
    <if id="2" net="Net1"> <ipv4>10.0.1.1/24</ipv4> </if>
    <forwarding type="ip" />
    <exec seq="start" type="verbatim">echo "1" >/proc/sys/net/ipv4/conf/all/accept_source_route</exec>
    <exec seq="reset_ips" type="verbatim">ifconfig eth1 0.0.0.0</exec>
    <exec seq="reset_ips" type="verbatim">ifconfig eth2 0.0.0.0</exec>
    <exec seq="enable_forwarding" type="verbatim"> echo "1" >/proc/sys/net/ipv4/ip_forward </exec>
    <exec seq="disable_forwarding" type="verbatim"> echo "0" >/proc/sys/net/ipv4/ip_forward </exec>
  </vm>
  <vm name="uml3">
    <if id="1" net="Net1"> <ipv4 mask="255.255.255.0">10.0.1.5</ipv4> </if>
    <route type="ipv4" gw="10.0.1.1">default</route>
    <exec seq="start" type="verbatim">echo "1" >/proc/sys/net/ipv4/conf/all/accept_source_route</exec>
    <exec seq="reset_ips" type="verbatim">ifconfig eth1 0.0.0.0</exec>
  </vm>
</vnuml>

Code A.1: VNUML File for the Simple Example (Continued)
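To make the structure of Code A.1 more tangible, the following Python sketch reads a trimmed copy of the specification with the standard xml.etree.ElementTree module and extracts two pieces of information discussed in the text: the interface-to-network mapping and the networks that carry a sock attribute (the ones for which simctl creates a tap interface). The helper names interfaces and host_connected_nets are hypothetical, and the "eth" + id naming is an assumption taken from the interface names used in this appendix.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Code A.1: only the elements needed for this illustration.
SPEC = """
<vnuml>
  <net name="Net0" mode="uml_switch" hub="yes" sock="/var/run/vnuml/Net0.ctl" />
  <net name="Net1" mode="uml_switch" />
  <vm name="uml1">
    <if id="1" net="Net0"><ipv4>10.0.0.2/24</ipv4></if>
  </vm>
  <vm name="uml2">
    <if id="1" net="Net0"><ipv4>10.0.0.1/24</ipv4></if>
    <if id="2" net="Net1"><ipv4>10.0.1.1/24</ipv4></if>
  </vm>
  <vm name="uml3">
    <if id="1" net="Net1"><ipv4 mask="255.255.255.0">10.0.1.5</ipv4></if>
  </vm>
</vnuml>
"""


def interfaces(root):
    """Return (vm, interface, network) tuples in document order."""
    rows = []
    for vm in root.findall("vm"):
        for iface in vm.findall("if"):
            rows.append((vm.get("name"), "eth" + iface.get("id"), iface.get("net")))
    return rows


def host_connected_nets(root):
    """Return the names of the networks that define a sock attribute."""
    return [net.get("name") for net in root.findall("net") if net.get("sock")]


root = ET.fromstring(SPEC.strip())
for row in interfaces(root):
    print(*row)
print("Networks with a sock attribute:", host_connected_nets(root))
```

The sketch only inspects the specification file; it does not query running virtual machines.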

A.5 Getting Started with simctl

Now, if you store the previous file with, for example, the name “simple_example.vnuml” in a place in which simctl can locate it (by default, you can use the directory /usr/share/vnuml/scenarios or properly set the variable DIRPRACT in the .simrc file), then you can execute simctl without parameters, and you should obtain the list of possible scenarios that you can run. For example:

    phyhost$ simctl
    simctl ( icmp | routing | subnetting | simple_example ) (OPTIONS)

    OPTIONS
      start                         Starts scenario
      stop                          Stops scenario
      status                        State of the selected simulation
                                    (running | stopped)
      vms                           Shows vm's from simulation
      labels [vm]                   Shows sequence labels for ALL vm's or for vm
      exec label [vm]               Exec label in the vms where label is defined
                                    or exec the label only in the vm
      netinfo                       Provides info about network connections
      get [-t term_type] vm [pts]   Gets a terminal for vm
                                    term_type selects the terminal
                                    (xterm | gnome | kde | local)
                                    pts is an integer to select the console

The output of simctl in this example tells us that it has located four scenarios named icmp, routing, subnetting and simple_example. This output also shows all the options that simctl provides to manage a scenario. These options are explored in the following sections.

A.6 Start and Stop Scenarios

To start a particular scenario, you must execute simctl in the phyhost with the name of the selected scenario and use the start option. For example:

    phyhost$ simctl simple_example start
    ..............
    Total time elapsed: 77 seconds
    phyhost$

Please be patient, because the starting process might take up to several minutes. Finally, the command ends indicating the time taken to start the scenario and we get the console prompt again. At this moment, all the virtual machines and their respective interconnections with virtual switches have been created. After the scenario is started, you can check in the phyhost that the corresponding tap interfaces have been created. In particular, after you run the simple_example scenario, if you type ifconfig -a in the phyhost to view all the network interfaces, you should obtain an output as follows:

    phyhost$ ifconfig -a
    eth1      Link encap:Ethernet  HWaddr 00:23:ae:1c:51:29
              inet addr:192.168.234.252  Bcast:192.168.234.255  Mask:255.255.255.0
              inet6 addr: fe80::223:aeff:fe1c:5129/64 Scope:Link
              .....

    tap0      Link encap:Ethernet  HWaddr be:50:6a:c1:55:49
              BROADCAST MULTICAST  MTU:1500  Metric:1
              .....

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              .....

On the other hand, when you wish to stop the simulation, you can type the following:

    phyhost$ simctl simple_example stop
    .......
    Total time elapsed: 17 seconds
    phyhost$

Usually stopping a simulation is faster than starting it.


A.7 Troubleshooting

A machine might spend some time booting. During that time, a message might appear asking us to retry, continue or abort. Always type retry (r).

However, if a simulation never starts or stops, to clear the system, type Ctrl+c and then:

    phyhost$ simctl simulation_name stop
    phyhost$ simctl forcestop

Finally, reboot the physical host. The forcestop option kills all the “linux” processes (UML kernels) and removes the directory .vnuml in the user’s home. After rebooting the system you should be able to start the simulation again.

On the other hand, you should never run two different simulations at the same time. If by mistake you start two simulations, type:

    phyhost$ simctl simulation_name1 stop
    phyhost$ simctl simulation_name2 stop
    phyhost$ simctl forcestop

Finally, you should never use the superuser “root” to execute simctl. If by mistake you start a simulation with the root user, you must clear the system and start it again using your own user:

    phyhost$ sudo -s
    phyhost# simctl simulation_name stop
    phyhost# simctl forcestop
    phyhost# exit

A.8 Access to Virtual Machines

Once we have a simulation running, we can list the available virtual machines and connect to them using simctl. Following our example, let us assume that we have the simple_example scenario already running. At this moment, we can use the command simctl simple_example vms to list the virtual machines that are part of the simulation (vms stands for "virtual machines"):

    phyhost$ simctl simple_example vms
    Virtual machines from simple_example:
    1 uml1
    2 uml2
    3 uml3

The “get” option of the simctl command is used to access the consoles of the virtual machines. If you run the “get” option without parameters, you will obtain information about the state of the virtual machines (“Running” or “Not Running”) and the possibility of accessing their command consoles. In the example shown, the dashed lines (--------) indicate that all virtual machines have enabled consoles but that we have not yet accessed any of them.

    phyhost$ simctl simple_example get
    uml1 Running --------
    uml2 Running --------
    uml3 Running --------

To access the console of a virtual machine you have to execute “simctl simname get virtual_machine”. For example, you can get a command console of the virtual machine uml1 using the following command:

phyhost$ simctl simple_example get uml1


The “get” option accepts an argument (-t) to indicate what type of terminal you want to use (it requires that the selected terminal emulator is already installed on the system). The value of -t can be any of: xterm (classic in X11), gnome (terminal from GNOME) or kde (Konsole from KDE). For example, to get a gnome-terminal for the uml2 virtual machine, you can type the following (you can also define this terminal as the default in your preferences file “simrc”):

phyhost$ simctl simple_example get -t gnome uml2

Once you have accessed the consoles of the virtual machines uml1 and uml2, you can type:

    phyhost$ simctl simple_example get
    uml1 10121.0 (Attached)
    uml2 10169.0 (Attached)
    uml3 Running --------

“Attached” indicates that there is already a terminal associated with the command console of the virtual machine. If you close the uml1 terminal, then its state becomes “Detached”.

Finally, you can also access the other console of uml1 using the following command:

    phyhost$ simctl simple_example get uml1 1

Now, if you run the “get” option again without parameters, you obtain:

    phyhost$ simctl simple_example get
    uml1 14272.1 (Attached)
    uml1 10121.0 (Attached)
    uml2 10169.0 (Attached)
    uml3 Running --------

As you can observe, uml1 has two consoles attached. Notice that if you try to get a second console on uml2 you will obtain an error:

    phyhost$ simctl simple_example get uml2 1
    Error: This virtual machine has not the console 1 enabled

A.9 Network Topology Information

Using the syntax “simctl simname netinfo” you can obtain the connection topology of the virtual machines. For example:

    phyhost$ simctl simple_example netinfo
    UML   IFACE  NET
    uml1  eth1   Net0
    uml2  eth1   Net0
    uml2  eth2   Net1
    uml3  eth1   Net1

As you can observe, the output is the topology defined in the XML file. This command can be useful to detect mistakes in the topology configuration. It is also worth mentioning that, when the simulation is running, the “netinfo” option uses information obtained directly from the virtual machines (not from the VNUML file).

A.10 Managing and Executing Labels

VNUML allows you to define a set of actions to be executed on virtual machines while running a simulation scenario. VNUML uses labels specified in the “seq” attributes of the <exec> and <filetree> tags to associate a label name with a set of actions. Labels allow you to easily distinguish a certain set of actions from another set of actions. The assignment of label names to the associated actions takes place in the VNUML specification file, in particular, in the XML element that defines each virtual machine. The simctl wrapper has two options, “labels” and “exec”, to facilitate the management and execution of labels. Let us show how the first option (“labels”) works with our simple_example scenario.

    phyhost$ simctl simple_example labels
    uml1 : reset_ips
    uml2 : reset_ips enable_forwarding disable_forwarding
    uml3 : reset_ips

As shown in the output of the previous command, with the “labels” option we obtain the list of defined labels per virtual machine. You can also view the labels of a specific virtual machine by adding its name:

    phyhost$ simctl simple_example labels uml2
    uml2 : reset_ips enable_forwarding disable_forwarding

The other option of simctl (“exec”) allows you to execute the actions (commands) of a label. The command syntax “simctl simname exec labelname” executes the commands associated with the label “labelname” on all the virtual machines where that label is defined. For example:

    phyhost$ simctl simple_example exec reset_ips
    Virtual machines group: uml3 uml2 uml1
    OK The command has been started successfully.
    OK The command has been started successfully.
    OK The command has been started successfully.
    Total time elapsed: 0 seconds

Recall that in our example, the “reset_ips” label removes the IP addresses from all the interfaces of the virtual machines. Finally, you can also run a label on a single machine with the following syntax:

    phyhost$ simctl simple_example exec disable_forwarding uml2
    OK The command has been started successfully.
    Total time elapsed: 0 seconds

Notice that in this particular example, the same result can be obtained by executing the previous command without specifying the virtual machine (uml2). This is true because in this case the label “disable_forwarding” is only defined for uml2. In general, if a label is defined on several virtual machines, the previous command executes the actions of the label only on the specified virtual machine and not on the rest of the virtual machines that define that label.
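The selection rule described above can be summarised with a small Python sketch. It is only a model of the behaviour explained in the text, not the implementation of simctl; the dictionary LABELS reproduces the output of the “labels” option for simple_example, and the function name targets is hypothetical. The order of the returned machines is not meant to match the order printed by simctl.

```python
# Labels per virtual machine, as reported by "simctl simple_example labels".
LABELS = {
    "uml1": ["reset_ips"],
    "uml2": ["reset_ips", "enable_forwarding", "disable_forwarding"],
    "uml3": ["reset_ips"],
}


def targets(labels, label, vm=None):
    """Return the machines on which a label would be executed.

    Without vm, every machine that defines the label is selected.
    With vm, only that machine is selected, and only if it defines the label.
    """
    if vm is not None:
        return [vm] if label in labels.get(vm, []) else []
    return [name for name, defined in labels.items() if label in defined]


print(targets(LABELS, "reset_ips"))                   # all three machines
print(targets(LABELS, "disable_forwarding"))          # only uml2 defines it
print(targets(LABELS, "reset_ips", "uml2"))           # restricted to uml2
```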

A.11 Install Software

To install a package XXX in the master filesystem, you have to use the following commands:

    phyhost$ sudo -s
    phyhost# cd /usr/share/vnuml/filesystems
    phyhost# mkdir img
    phyhost# mount -o loop filesystem.fs img/
    phyhost# cp /etc/resolv.conf img/etc/resolv.conf
    phyhost# mount -t proc none img/proc
    phyhost# chroot img

Where “filesystem.fs” must be replaced by the name of the file under the /usr/share/vnuml/filesystems directory that contains the filesystem in which you want to install software. Then, to install software type:

    phyhost# apt-get update
    phyhost# apt-get install XXX #install package

To finish:

    phyhost# exit
    phyhost# umount img/proc
    phyhost# fuser -k img
    phyhost# umount img
    phyhost# exit
    phyhost$ simctl forcestop

A.12 Drawbacks of Working with Screen

The screen application is a screen manager with VT100/ANSI terminal emulation. screen is used by simctl to manage the virtual machine terminals through the get function (simctl simname get vm) when the pts option is used in the <console> tag of the VNUML file. However, two problems arise when using pseudo-ttys (pts) with screen. While these problems are not critical, they can be annoying. Next, we present the problems and their workarounds.

• Problem 1. The UML kernel is not aware of the terminal size. The consequence is that when you resize the terminal in which you are executing screen, the size of the terminal is not refreshed.

Workaround to problem 1. We can use the stty command to indicate the terminal size to the UML kernel, for example "stty cols 80 rows 24". Elaborating on this solution, we can use the key binding facility of screen to do so.

– Copy the file /usr/local/share/doc/simtools/screenrc.user into $HOME/.screenrc

– Put the /usr/local/share/doc/simtools/setgeometry script in a directory included in your PATH environment variable and enable the execution permissions for this script.

The file .screenrc contains a binding of the key combination Ctrl+a f to the setgeometry command. Once in a virtual machine terminal, when the terminal size is modified, press Ctrl+a f and the script setgeometry will be executed. It finds the new terminal geometry and then builds, flushes and executes the appropriate stty command (stty cols NUMCOLS rows NUMROWS) in the virtual machine shell.

• Problem 2. The other problem is the lack of terminal scrolling in screen.

Workaround to problem 2. screen has a scrollback history buffer of 100 lines for each virtual terminal. To put screen into scrollback mode press Ctrl+a ESC. In this mode the cursor keys can be used to scroll across the terminal screen. To exit scrollback mode press ESC.
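The idea behind the workaround to problem 1 (read the geometry of the outer terminal and build the matching stty command) can be sketched in Python as follows. This is not the setgeometry script itself, whose contents are not reproduced in this appendix; the function name stty_command is hypothetical, and the sketch only prints the command instead of sending it to a virtual machine shell.

```python
import shutil


def stty_command(columns, rows):
    """Build the stty invocation that announces a terminal size."""
    return f"stty cols {columns} rows {rows}"


# Query the size of the terminal running this script; fall back to 80x24
# when the size cannot be determined (for example, when output is piped).
size = shutil.get_terminal_size(fallback=(80, 24))
print(stty_command(size.columns, size.lines))
```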

Appendix B

Ubuntu in a Pen-drive

B.1 Install

In this appendix we show how to install Ubuntu on a USB pen-drive. Once the system has been installed, we properly configure it in order to extend the life of the USB device as much as possible.

Important note. This installation procedure is only valid for x86 architectures, which means that it is not valid for MAC computers.

Let’s start with the process. First, we must insert the Ubuntu CD and configure the BIOS of our computer to boot from the CD unit. Next, we have to choose the language for installation, then select “Install Ubuntu” and choose the language for the OS (Figure B.1).

Then, we select the time zone and the keyboard layout (Figure B.2).

Now, we must select the disk and the partition in which we want to install the OS. To do so, we choose to specify the disk partitions manually (advanced).

Figure B.3: Advanced specification of disk partitions

We must properly identify the disk device

Figure B.4: Advanced specification of disk partitions 2


Figure B.1: First selection windows

In this sample installation /dev/sda is our SATA hard disk, so the disk selected is /dev/sdb, which is the device of our USB pen-drive. Note. The size of the disks can help us to correctly identify the device of the pen-drive.

The next step requires filling in the fields for our account (see Figure B.5). Note. This account will be included by default in the “sudoers” file with administration capabilities.

Figure B.5: Account Information


Figure B.2: Time zone and Keyboard layout

This step is tricky, because if we make a mistake when selecting the disk/partition we can cause data loss in the system.

The next step allows us to import documents and settings for programs (like Firefox) or other OS (like Windows) already installed in the disk units. In this case, we will press "Next" as we do not want to import anything.

Figure B.6: Import documents and settings

We are at the last step before starting the installation. This window reports all the parameters we have set. However, it also has a button called “Advanced”. We must click this button.


Figure B.7: Final installation step

In this window we have the possibility of selecting where we want to install the boot loader (Grub2 in the current Ubuntu version). In this case, we simply need to specify /dev/sdb, which is the device on which we are doing the installation.

Figure B.8: Selecting the partition of the Boot loader

GRUB (GRand Unified Bootloader) is a multi-boot manager used in many Linux distributions to start several operating systems within the same computer. It is very important to complete this final step correctly, because with this window we modify the boot sector of the corresponding disk (the USB pen-drive), called MBR (Master Boot Record), in order to properly point to the boot loader code. If we skip this step, we will install the boot loader on the default disk, probably /dev/sda, and the computer will be able to boot neither from its main disk (/dev/sda) nor from the USB pen-drive (/dev/sdb).

Once the installation is complete, we might have to modify our BIOS settings to select the USB device as the primary boot device.

B.2 Tuning the System

Flash drives and solid state drives are shock resistant, consume less power than traditional disks, produce less heat, and have very fast seek times. However, these types of disks support a more limited total number of write operations than traditional disks. Fortunately, there are some tweaks you can make to increase performance and extend the life of these types of disks.

• The simplest tweak is to mount volumes using the noatime option. By default Linux writes the last accessed time attribute to files. This can reduce the life of your disk by causing a lot of writes. The noatime mount option turns this off. Ubuntu uses the relatime option by default. For your disk partitions, replace relatime with noatime and nodiratime in /etc/fstab. You can also add the option commit=120 so that pending data writes are flushed to disk every 120 seconds:

/dev/sda1 / ext4 noatime,nodiratime,commit=120,errors=remount-ro 0 0

• Another tweak is using a ramdisk instead of a physical disk to store temporary files. This will speed things up and will protect your disk at the cost of a few megabytes of RAM. To make this modification in our system, edit your /etc/fstab file and add the following lines:

    tmpfs /tmp tmpfs defaults 0 0
    tmpfs /var/tmp tmpfs defaults 0 0

These lines tell the kernel to mount these temporary directories as tmpfs, a temporary file system. We will avoid unnecessary disk activity and, in addition, when we halt the system we will automatically get rid of all temporary files.

• Reduce “swap” operations that write to disk by changing the swappiness value to a lower number:

    sudo sysctl -w vm.swappiness=0

To keep a low value across reboots, add a line such as vm.swappiness=5 to /etc/sysctl.conf. Reboot and check /proc/sys/vm/swappiness to confirm that the change was applied.

• The final tweak is related to our web browser. If you use Firefox, this browser puts its cache in your home partition. By moving this cache to RAM you can speed up Firefox and reduce disk writes. Complete the previous tweak to mount /tmp in RAM, and you can put the cache there as well. Open about:config in Firefox. Right click in an open area and create a new string value called browser.cache.disk.parent_directory and set the value to /tmp (see Figure B.9). When we reboot the system all the previous changes must be active.


Figure B.9: Firefox cache in /tmp
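As an illustration of the first tweak, the following Python sketch rewrites the options field of an /etc/fstab line so that relatime is replaced by noatime and nodiratime and a commit interval is added. The function names tune_options and tune_fstab_line are hypothetical, and restricting the change to ext4 entries is an assumption made here for safety; the sketch works on strings only and does not touch any file.

```python
def tune_options(options, commit_seconds=120):
    """Replace access-time options and set the commit interval in a mount option list."""
    replaced = ("relatime", "atime", "noatime", "nodiratime")
    kept = [
        opt
        for opt in options.split(",")
        if opt not in replaced and not opt.startswith("commit=")
    ]
    return ",".join(["noatime", "nodiratime", f"commit={commit_seconds}"] + kept)


def tune_fstab_line(line):
    """Return the line with tuned options if it describes an ext4 filesystem."""
    fields = line.split()
    if line.lstrip().startswith("#") or len(fields) < 4 or fields[2] != "ext4":
        return line  # comments, malformed lines and other filesystems are left alone
    fields[3] = tune_options(fields[3])
    return " ".join(fields)


print(tune_fstab_line("/dev/sda1 / ext4 relatime,errors=remount-ro 0 0"))
print(tune_fstab_line("tmpfs /tmp tmpfs defaults 0 0"))
```

Editing /etc/fstab by hand, as described above, remains the simplest approach for a single system; a script like this is only worthwhile when the same change has to be reviewed or repeated.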

Appendix C

Introduction to Unix/Linux

C.1 Introduction to OS

This chapter provides some basic background about the Linux Operating System. In short, an Operating System (OS) is a set of software whose purpose is to (i) manage the resources of a computer system while (ii) providing an interface for the interaction with human beings.

• Resources management. The computer resources that can be managed by an OS include CPU, RAM, and input/output (I/O) devices. For example, in all modern computer systems, it is possible to run more than one process simultaneously. So, the OS is responsible for allocating the CPU execution cycles and RAM memory for each process. Regarding I/O devices, these include storage devices like a Hard Disk (HDD), a Compact Disk (CD), a Digital Versatile Disc (DVD), a Universal Serial Bus (USB) device, etc. but also communication devices like wired networking (Ethernet cards) or wireless networking (WIFI cards). An introduction to the organization of Linux that allows this OS to properly manage and access system resources is provided in Section C.2.

• User Interaction. Another essential issue is how the OS allows interaction between the user (human being) and the computer. This interaction includes operations over the file system (copy, move, delete files), execution of programs, network configuration, and so on. The two main system interfaces for interaction between the user and the computer - CLI and GUI - are further discussed in Section C.3.

C.2 Resources Management

C.2.1 History

Nowadays, most deployed OS come either from Microsoft WINDOWS/DOS or from UNIX. We must remark that UNIX and DOS were initially designed for different purposes. While DOS was developed for rather simple machines called Personal Computers (PC), UNIX was designed for more powerful machines called Mainframes. DOS was originally designed to run only one user process¹ or task at a time (mono-process or mono-task) and to manage only one user in the system (mono-user).

On the other hand, UNIX was directly designed as a multi-user system. This has numerous advantages. Security, for example, which is necessary for the protection of sensitive information, was designed into UNIX from the very beginning. UNIX was also designed as a multi-task system, capable of managing several users running several programs at the same time. As shown in Figure C.1, today both types of OS (WINDOWS/UNIX) are capable of managing multiple users and multiple processes, and both can run over the same type of hardware. However, we would like to point out that many of these capabilities were present in the first design of UNIX, while they have been added later in DOS-like systems.

¹ A process is basically a running program.

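[Figure: in the 80-90s, UNIX ran on mainframes accessed through terminals (multi-process, multi-user), while DOS ran on personal computers (mono-process, mono-user). Today, both families run on common hardware: Linux (Ubuntu, redhat...), Android, FreeBSD and MAC OS (iOS) on the UNIX side, and Microsoft Windows on the DOS side.]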

Figure C.1: Origins of Linux.

C.2.2 OS Rings and the Kernel

A modern OS can be divided into at least three parts: hardware, kernel and user space (user applications or processes). See Figure C.2.

[Figure: three layers labelled Hardware, Kernel (ring0) and User Space (ring1).]

Figure C.2: OS Rings.

These parts can be seen as “rings”. The kernel or “ring 0” is an intermediary between user applications and the actual data processing done at the hardware level. The kernel’s primary function is to manage the computer’s resources and allow other programs to run and use these resources. These resources are at least:

• Central Processing Unit (CPU). The CPU is responsible for executing programs. The kernel takes responsibility for deciding which of the running programs should be placed in the processor or processors.

• Memory. Memory is used to store both program instructions and data. For a program in execution, both program instructions and data need to be present in memory. Often, multiple programs want to access memory, and the kernel is responsible for deciding which memory each process can use and for determining what to do if not enough memory is available.

• Input/Output (I/O) Devices. These devices add some functionality to the system. Examples are keyboard, mouse, disk drives, printers, displays, etc. The kernel manages requests from user applications to perform input and output operations and provides convenient methods for using each device (typically abstracted to the point where the application does not need to know the implementation details of each device).

C.2.3 System Calls

The kernel manages all the low level operations with the hardware (CPU, memory and other devices). To do so, the kernel runs in what is called “supervisor mode”, which provides unrestricted access to hardware. On the other hand, the kernel provides an interface for user processes so that they can use the operating system. This interface is implemented with system calls (see Figure C.3). A system call defines how a user program or user application has to request a service from the kernel. For example, a system call may allow a user program to save a file in a Hard Disk but not to write bytes at certain locations of the HDD (this operation might require supervisor mode).

[Figure: User's Processes reach the Kernel and its Modules through System Calls; the Kernel manages the Hardware (CPU, Memory, Other Devices).]

Figure C.3: System Calls & Modular Design.

Generally, systems provide a library or API that sits between user programs and the kernel. On Unix-like systems, that API is usually part of an implementation of the C library (libc), such as glibc, that provides wrapper functions for the system calls, often named the same as the system calls that they call. For example:

    int kill(pid_t pid, int sig);

The previous function is a wrapper for a system call called kill, which is used to send a signal to a process.
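Higher-level languages expose the same system call through their own wrappers. As a small illustration on a Unix-like system, Python's os.kill(pid, sig) ends up invoking kill. The sketch below uses signal number 0, which performs the permission and existence checks without delivering any signal; the helper name process_exists is hypothetical.

```python
import os


def process_exists(pid):
    """Check whether a process exists by sending it the null signal (0)."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False     # no process with this PID
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


# The current process certainly exists.
print(process_exists(os.getpid()))
```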

C.2.4 Modules

The Linux Kernel is a monolithic hybrid kernel. Monolithic means that the entire operating system is working in kernel space and is alone in supervisor mode. Unlike traditional monolithic kernels, the Linux Kernel is hybrid. This means that kernel extensions, called modules, can be loaded and unloaded into the kernel upon demand while the kernel is running. In other words, modules are pieces of code that extend the functionality of the kernel without the need to reboot the system.

One type of module is the device driver, which allows the kernel to access a hardware device connected to the system. Device drivers interact with devices like hard disks, printers, network cards, etc. and provide system calls.


Notice that a traditional monolithic kernel (without the possibility of having modules) has to include in its code all the possible device drivers. This has two main drawbacks: we will have larger kernels and we will need to rebuild and reboot the kernel each time we want a new functionality.

Finally, when you build a Linux kernel you can decide if you include a certain module inside the kernel (statically compiled module) or if you allow this module to be loaded at run time by your kernel (dynamic module).

The commands to list, insert and remove modules are: lsmod, insmod, modprobe and rmmod.
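As a rough illustration of what lsmod reports: lsmod formats the contents of /proc/modules, so the list of loaded modules can also be read directly from that file. The sketch below is in Python; the function name `parse_modules` and the two sample lines (with made-up sizes) are assumptions for illustration only.

```python
def parse_modules(text):
    """Parse text in /proc/modules format into a list of dictionaries."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, size, use_count, used_by = fields[:4]
        modules.append({
            "name": name,
            "size": int(size),
            # the use count may be "-" on kernels built without module unloading
            "use_count": int(use_count) if use_count.isdigit() else None,
            # "-" means that no other loaded module depends on this one
            "used_by": [] if used_by == "-" else [m for m in used_by.split(",") if m],
        })
    return modules


# Hypothetical sample in /proc/modules format (sizes are invented).
SAMPLE = """\
e1000 151552 0 - Live 0x0000000000000000
snd_pcm 106496 2 snd_hda_intel,snd_hda_codec, Live 0x0000000000000000
"""

if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on systems without /proc/modules
    for module in parse_modules(text):
        print(module["name"], module["size"], ",".join(module["used_by"]))
```

Loading and unloading (insmod, modprobe, rmmod) require root privileges, so this sketch only reads the current state.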

C.3 User Interaction

In the past, mainframe systems had several physical terminals connected via a serial port (often using the RS-232 serial interface). Terminals were simple monochrome displays with the minimal hardware and logic to send the text typed by the user in a keyboard and display the text received from the mainframe in the display (see Figure C.4). On the mainframe side, there was a command-line interpreter (CLI) or shell. A shell is a process that interprets and executes commands.

Figure C.4: Mainframe with Old Physical Terminals. (Each physical terminal, or console, has a keyboard, a display and an RS-232 line. It sends the typed commands as text to a command-line interpreter (CLI), also called a command line shell or simply shell, and displays the text results it receives.)

In current systems, we also have Graphical User Interfaces or GUIs. A GUI requires a graphical server (often Unix systems use the X server). Processes launched from the GUI are typically applications that have graphical input/output. This graphical I/O is implemented with devices such as a mouse, a keyboard, a screen, a touch screen, etc. GUIs are easier to use for novice users but, in general, the CLI provides you with more control and flexibility than the GUI for performing system administration and configuration.

Physical terminals are not very common today; instead, virtual consoles are used. If you use virtual consoles to interact with your Linux system, you will not require a graphic server running on the system. Virtual consoles just manage text and they can emulate the behavior of several physical terminals. The different virtual consoles are accessed using different key combinations. By default, when Linux boots it starts six virtual consoles, which can be accessed with CTRL+ALT+F1 ... CTRL+ALT+F6. The communication between the virtual console and the shell is performed using a special device file in the system of the form /dev/ttyX (where X is the number of the virtual console). This device file is called TTY and it emulates the “old physical communication channel”. To interact with the terminal you can write to and read from the TTY. If you want to see the TTY of a terminal just type the tty command:

$ tty
/dev/tty1

Commands executed from a terminal are connected with a shell using the TTY. In the Unix jargon, it is said that the command is “attached” to a TTY.

Figure C.5: Linux Terminals. (Virtual consoles: <Ctrl><Alt><F1> ... <Ctrl><Alt><F6>, with TTYs /dev/tty1 ... /dev/tty6; no need for a graphic server. Terminal emulators or pseudo-terminals: xterm (classical from X), gnome-terminal (GNOME), konsole (KDE), etc., reached with <Ctrl><Alt><F7>, with TTYs /dev/pts/X; graphic server running.)

On the other hand, we can also use a GUI if our Linux system has a graphical server running. In fact, the GUI is the default interface with the system for most desktop Linux distributions. To go from a virtual console to the graphic server, you can type CTRL+ALT+F7 in the majority of Linux systems. Once you log into the GUI, you can also start a “terminal”. In this case, the terminal is called terminal emulator or pseudo-terminal. For example, to start a pseudo-terminal you can use ALT+F2 and then type gnome-terminal or xterm. You can also use the main menu: MENU -> Accessories -> Terminal. This will open a gnome-terminal, which is the default terminal emulator in our Linux distribution (Ubuntu). In Figure C.5 you can observe the main features of virtual consoles and pseudo-terminals including their TTY device files. Notice that in the case of pseudo-terminals, the TTY is of the form /dev/pts/X (where X is the number of the pseudo-terminal).
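To make the idea of a pseudo-terminal more concrete, the following Python sketch (an illustration assuming a Unix-like system, not something a terminal emulator does literally in this form) asks the system for a new pseudo-terminal pair, prints the name of its TTY device file and passes a few bytes through it. The function name `pseudo_terminal_demo` is hypothetical.

```python
import os
import pty


def pseudo_terminal_demo(message=b"hello"):
    """Open a pseudo-terminal, report its TTY name and pass a message through it."""
    # The master side is what a terminal emulator holds; the slave side is
    # the TTY device that a shell and its commands would be attached to.
    master_fd, slave_fd = pty.openpty()
    try:
        tty_name = os.ttyname(slave_fd)   # e.g. /dev/pts/X on Linux
        is_tty = os.isatty(slave_fd)
        os.write(slave_fd, message)       # what a command would print ...
        received = os.read(master_fd, 1024)  # ... reaches the emulator side
        return tty_name, is_tty, received
    finally:
        os.close(master_fd)
        os.close(slave_fd)


if __name__ == "__main__":
    name, is_tty, received = pseudo_terminal_demo()
    print(name, is_tty, received)
```

On Linux the reported name is expected to have the /dev/pts/X form described above, while the tty command run from a virtual console reports /dev/ttyX.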

Regarding the shell, this documentation refers only to Bash (Bourne Again Shell). This is because this shell is the most widely used one in Linux and includes a complete structured programming language and a variety of internal functions.

Note. When you open a terminal (either a virtual console or a pseudo-terminal) you will see a line of text that ends with the dollar sign “$” and a blinking cursor. When using “$” throughout this document, we will mean that we have opened a terminal with our user that is ready to receive commands.

C.4 Implementations and Distros

UNIX is now more a philosophy than a particular OS. UNIX has led to many implementations, that is to say, different UNIX-style operating systems. Some of these implementations are supported/developed by private companies, like Solaris of Sun/Oracle, AIX of IBM, SCO of Unisys, IRIX of SGI, Mac OS of Apple, Android of Google, etc. Other implementations of Unix, such as “Linux” or “FreeBSD”, are not commercial and they are supported/developed by the open source community.

In the context of Linux, we also have several distributions, which are specific forms of packaging and distributing Linux (Kernel) and its applications. These distributions include Debian, Red Hat and some derivatives of these like Fedora, Ubuntu, Linux Mint, SuSE, etc.

C.5 Switching Users

We need some method to interact with the system as superuser (or as another user). Obviously, one possibility is to log into the system using the proper user account, but it would be desirable to have commands that enable this without having to “relog”. In this respect, we can use the commands su and sudo. The su command stands for “switch user”, and allows you to become another user or execute commands as another user. For example:

$ su telematic

The previous command prompts you for the password of the user “telematic”. If you don’t provide a user, the su command defaults to the root account, which in Unix is the system administrator account. In either case, with su you will be prompted to enter the password associated with the account to which you are switching. After you execute the su command you will be logged in as the new user until you exit. You can exit typing Ctrl-d or typing exit.

To use the su command on a per-command basis, you can type:


$ su user -c command

However, using su creates security hazards. It is potentially dangerous since it is not a good practice, for example, to have numerous people knowing and using the password of root. Notice that when logged in as root, you can do anything in the system.

For this reason, Linux people came up with another command: sudo. Using the sudoers file (/etc/sudoers), system administrators can define which users or groups will be able to execute certain commands (or even any command) as root, but these users will not have to know the password of root. The sudo command prompts for the password of the user who is executing sudo (see Figure C.6).

Figure C.6: How sudo works. (When the user types “user$ sudo cmd”: (1) sudo prompts for the password of user; (2) sudo checks /etc/sudoers for the user's permissions regarding cmd.)

In this way, the command sudo makes it easier to implement the principle of “least privilege”. It also logs all commands and arguments so there is a record of who used it for what, and when. To use the sudo command, at the command prompt, enter:

$ sudo command

Replace command with the command for which you want to use sudo. If your user is configured as system administrator in the sudoers file you can get a shell as root typing:

user$ sudo -s
root#

Note. We will use $ to mean that the command is being executed as a regular user and # to mean that the command is being executed as root.

Finally, you can use the command whoami to know which user you are at this moment.
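As a small illustration of the $ and # convention, the sketch below (Python, Unix-like systems only) looks up the user name associated with the effective user ID, which is the information that whoami prints, and derives the matching prompt symbol. The helper names `current_user` and `prompt_symbol` are hypothetical.

```python
import os
import pwd


def current_user():
    """Return (user name, is_root) for the effective user of this process."""
    euid = os.geteuid()
    try:
        name = pwd.getpwuid(euid).pw_name
    except KeyError:
        name = str(euid)  # no passwd entry for this UID; fall back to the number
    # root is, by convention, the account with user ID 0
    return name, euid == 0


def prompt_symbol():
    """Return '#' when running as root and '$' otherwise."""
    return "#" if os.geteuid() == 0 else "$"


if __name__ == "__main__":
    name, is_root = current_user()
    print(name, prompt_symbol())
```

Run once as a regular user and once through sudo, the sketch is expected to print a different name and prompt symbol each time.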

C.6 Installing Software

C.6.1 Static and Dynamic Libraries

We have to understand the differences between static and dynamic libraries to fully understand the process of installing software in our Linux box (see Figure C.7).

Figure C.7: Static and dynamic libraries. (With static libraries, processes A, B and C each carry their own copy of Lib K; with dynamic libraries, processes A, B and C share a single copy of Lib K.)


• Static libraries² or statically-linked libraries are a set of routines, external functions and variables which are resolved at build time and copied into the final executable by the compiler toolchain (strictly speaking, by the linker). This is called “static compiling” and the program produced by this process is called a “static executable”.

• With dynamic libraries, the kernel provides facilities for the creation and use of dynamically bound shared libraries. Dynamic binding allows external functions and variables that are referenced in user code and defined in a shared library to be resolved at run time, that is to say, when the program is loaded to become a process in the system. Therefore, the shared library code is not present in the executable image on disk. Shared code is loaded into memory once in the shared library segment and shared by all processes that reference it.

The main advantage of static executables is that they avoid dependency problems. Since libraries are included at compile time, we can be sure that anything used from external libraries is available (with the correct version) before the program is executed. The main drawback of static linking is that the size of the executable becomes greater than in dynamic linking, as the library code is stored within the executable rather than in separate files and, which is worse, static processes in execution consume more memory than dynamically linked processes (see Figure C.7).

On the other hand, the main advantages of shared libraries are that they use less disk space because the shared library code is not included in the executable programs. They also use less memory because the shared library code is only loaded once. The load time may also be reduced because the shared library code might already be in memory. Dynamic libraries also allow the library to be updated to fix bugs and security flaws without updating the applications that use the library. The main drawback of dynamic libraries is that they usually establish complex relationships between the different packages of software installed in a system. For example, a configuration might enforce having several versions of the same library simultaneously installed in the system to satisfy the dependencies (required dynamic libraries) of different software packages.

In Linux, you can use the ldd command to see the dependencies of a program. For example:

user@host:~$ ldd /bin/echo
	linux-gate.so.1 => (0xb77cd000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7609000)
	/lib/ld-linux.so.2 (0xb77ce000)
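The output of ldd is easy to process in a script, for instance to check which libraries a program needs and where they were found. The following Python sketch parses text in the format shown above; the function name `parse_ldd` is hypothetical and the parsing rules are inferred only from this output format.

```python
import shutil
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (or None)."""
    deps = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing load address, e.g. "(0xb7609000)".
        if line.endswith(")") and "(" in line:
            line = line[: line.rfind("(")].strip()
        if "=>" in line:
            name, _, path = line.partition("=>")
            path = path.strip()
            # Virtual objects have no path; missing libraries show "not found".
            deps[name.strip()] = path if path and path != "not found" else None
        else:
            # Lines without "=>" already contain an absolute path.
            deps[line] = line
    return deps


if __name__ == "__main__":
    if shutil.which("ldd"):
        result = subprocess.run(["ldd", "/bin/echo"], capture_output=True, text=True)
        for name, path in parse_ldd(result.stdout).items():
            print(name, "->", path)
```

With the sample output above, libc.so.6 maps to /lib/i386-linux-gnu/libc.so.6, while linux-gate.so.1 has no file on disk and maps to None.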

The management of the different libraries on the system results in a challenge colloquially known as "dependency hell". On Windows systems, this is called "DLL hell" (DLL stands for Dynamic-Link Library). On Windows systems, it is common to distribute and install the library files that an application needs with the application itself.

On Unix-like systems, this is less common, as Package Management Systems can be used to ensure that the correct library files are available in the system. This allows the library files to be shared between many applications, leading to disk space and memory savings.

C.6.2 Software Packages

In Linux systems, a package of software tracks where all its files are, allowing the user to easily manage the installed software: view dependencies, uninstall, etc. Generally, in Linux, software packages copy their executables in /usr/bin, their libraries in /usr/lib and their documentation in /usr/share/doc/package/. It’s worth noting here that to take full advantage of packages, one should not go installing or deleting files behind the package system’s back.

There are multiple different package systems in the Linux world, the two main ones being:

• Red Hat Packages (.rpm files).

• Debian Packages (.deb files).

² In the past, libraries could only be static.


Packages can be managed with the commands rpm and dpkg on Red Hat and Debian respectively. With these commands, we can install and remove packages, view the files installed by a package and so on. Table C.1 shows the main parameters of the commands rpm and dpkg.

Table C.1: dpkg and rpm.

Debian                            Red Hat                           Description
dpkg -Gi package(s).deb           rpm -Uvh package(s).rpm           install/upgrade package file(s)
dpkg -r package                   rpm -e package                    remove package
dpkg -l '*spell*'                 rpm -qa '*spell*'                 show all packages whose names contain the word spell
dpkg -l package                   rpm -q package                    show version of package installed
dpkg -s package                   rpm -q -i package                 show all package metadata
dpkg -I package.deb               rpm -q -i -p package.rpm          show all package file's metadata
dpkg -S /path/file                rpm -q -f /path/file              what package does file belong to
dpkg -L package                   rpm -q -l package                 list where files were installed
dpkg -c package.deb               rpm -q -l -p package.rpm          list where files would be installed
dpkg -x package.deb               rpm2cpio package.rpm | cpio -id   extract package files to current directory
dpkg -s package | grep ^Depends:  rpm -q --requires package         list files/packages that package needs
dpkg --purge --dry-run package    rpm -q --whatrequires package     list packages that need package (see also what requires)

C.6.3 Advanced Package Management Systems

The package systems previously described do not resolve dependencies automatically. So, if you need to install a package that has dependencies, you have to manually install the proper versions of the packages containing the dynamic libraries on which your package depends (at least all the libraries not currently installed on your system).

In this context, a new generation of package management systems was developed to make this tedious process easier for the user. These advanced package management systems automatically manage package dependencies. For .rpm files we have yum and for .deb files we have apt. With these tools one can essentially say “install this package” and all dependent packages will be installed/upgraded as appropriate.

Of course, one has to configure where these tools must go to find the software packages. These packages are online in package repositories. Thus, you have to configure the addresses of the repositories you are interested in. In the case of Debian, APT uses files that list the “sources” from which packages can be obtained. These files are in the directory /etc/apt.

Table C.2 shows the main parameters of the commands apt and yum.

Table C.2: apt and yum.

Debian                            Red Hat                       Description
apt-get update                    -                             update the list of available packages from online repositories
apt-get dist-upgrade              yum update [package list]     upgrade specified packages (or all installed packages if none specified)
apt-get install <package list>    yum install <package list>    install latest version of package(s)
apt-get remove <package list>     yum remove <package list>     remove specified packages from system
apt-cache pkgnames [prefix]       yum list [package list]       list available packages from repositories


Note. APT requires you to run apt-get update to refresh the list of packages available in the online repositories. If this command is not executed, apt works with the local cache, which might be outdated. Yum works the other way round: it consults the repositories by default, and you have to add the -C option to tell it to operate with the local cache.

On the other hand, in an Ubuntu system, we can type the name of an application in the console and, if it is not installed, the system will tell us how to install it:

$ ipcalc
The program 'ipcalc' is currently not installed. You can install it by typing:
sudo apt-get install ipcalc

Finally, for apt, there is a useful tool called apt-file that allows us to search for the package to which a file belongs. For example:

$ apt-file search /etc/init/ssh.conf
openssh-server: /etc/init/ssh.conf

C.6.4 Installing from the Source

When you need to install software that is neither in a repository nor available as an individual package, you can install it from the source code. This is compatible with all Linux distributions. The source code typically comes as a set of files of the application, packed in a .tar archive and compressed using GNU Zip (.gz) or BZip2 (.bz2). Format: <filename>.tar.gz or <filename>.tar.bz2. These types of files can be unzipped and unpacked on a directory using the tar command:

$ tar xvzf <filename>.tar.gz
$ tar xvjf <filename>.tar.bz2

By convention, there are files called “INSTALL” or “README” giving application-specific usage information. The typical compilation/installation steps are:

1. Unpack the tar archive (tarball):

$ tar xzvf <package_name>.tar.gz
$ tar xvjf <package_name>.tar.bz2

2. Change to the extracted directory

$ cd <extracted_dir_name>

3. Run source configuration script as follows:

$ ./configure

4. Build the source code using the GNU Make utility. In the directory of the sources there will be a Makefile file describing how to compile. You can call make (compile) as follows:

$ make

5. Install the package as follows:

# make install
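The unpacking step can also be scripted. As a hedged illustration, the following Python sketch performs the equivalent of `tar xzf` or `tar xjf` with the standard tarfile module; the function name `unpack_tarball` is hypothetical, and the remaining steps (./configure, make, make install) would still be run from the extracted directory as described above.

```python
import tarfile


def unpack_tarball(archive_path, dest_dir):
    """Unpack a .tar.gz or .tar.bz2 archive into dest_dir and return its member names."""
    # Mode "r:*" lets tarfile detect the compression (gzip, bzip2, ...) itself.
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can refuse unsafe members (e.g. absolute paths).
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return names


if __name__ == "__main__":
    import sys

    # Usage: python unpack.py <package_name>.tar.gz <destination directory>
    for member in unpack_tarball(sys.argv[1], sys.argv[2]):
        print(member)
```

Printing the member names plays the role of the v (verbose) flag in the tar commands above.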

As a final remark, we would like to mention that in general, you should avoid installing software from the source if there is a package available³.

³ Many times it is not too hard to make the package yourself.

Bibliography

[1] RJ. Creasy. he origin of the vm/370 time-sharing system, IBM Journal of Research and Development.May 1981.

[2] Fernando Rodríguez-Haro, Felix Freitag, Leandro Navarro, Efraín Hernánchez-sánchez, NicandroFarías-Mendoza, Juan Antonio Guerrero-Ibáñez, and Apolinar González-Potes. A summary ofvirtualization techniques. Procedia Technology, 3:267–272, January 2012.

[3] Unified modeling language (UML). http://www.uml.org/.

[4] Reference 1.8 - VNUML-WIKI. http://neweb.dit.upm.es/vnumlwiki/index.php/Reference.

[5] Jørgen Bang-Jensen. Digraphs: theory, algorithms, and applications. Springer monographs inmathematics. Springer, London, 2nd ed edition, 2009.

[6] Thomas H. Cormen and Thomas H. Cormen, editors. Introduction to algorithms. MIT Press,Cambridge, Mass, 2nd ed edition, 2001.

[7] C.L. Hedrick. Routing Information Protocol. RFC 1058, Internet Engineering Task Force, June1988.

[8] G. Malkin. RIP Version 2. RFC 2453, Internet Engineering Task Force, November 1998.

[9] G. Malkin and R. Minnear. RIPng for IPv6. RFC 2080, Internet Engineering Task Force, January1997.

[10] J. Moy. OSPF Version 2. RFC 2328, Internet Engineering Task Force, April 1998.

[11] R. Coltun, D. Ferguson, J. Moy, and A. Lindem. OSPF for IPv6. RFC 5340, Internet EngineeringTask Force, July 2008.

[12] D. Oran. OSI IS-IS Intra-domain Routing Protocol. RFC 1142, Internet Engineering Task Force,February 1990.

[13] Sally Floyd and Van Jacobson. The synchronization of periodic routing messages. IEEE/ACM Trans.Netw., 2(2):122–136, 1994.

[14] Quagga software routing suite. http://www.nongnu.org/quagga/.

[15] F. Baker and R. Atkinson. RIP-2 MD5 Authentication. RFC 2082, Internet Engineering Task Force,January 1997.

[16] D. Thaler and C. Hopps. Multipath Issues in Unicast and Multicast Next-Hop Selection. RFC 2991,Internet Engineering Task Force, November 2000.

[17] C. Hopps. Analysis of an Equal-Cost Multi-Path Algorithm. RFC 2992, Internet Engineering TaskForce, November 2000.
