A STUDY OF APPLICATIONS

7/29/2019 A STUDY OF APPLICATIONS

1/91

A STUDY OF APPLICATIONS

FOR

OPTICAL CIRCUIT-SWITCHED NETWORKS

A Thesis

Presented to

the faculty of the School of Engineering and Applied Science

University of Virginia

In Partial Fulfillment

of the requirements for the Degree

Master of Science

Computer Science

by

Xiuduan Fang

May 2006


2/91

APPROVAL SHEET

This thesis is submitted in partial fulfillment of the requirements for the degree of

Master of Science

Computer Science

Xiuduan Fang

This thesis has been read and approved by the examining committee:

Malathi Veeraraghavan (Advisor)

Marty Humphrey (Chair)

Alfred Weaver

Accepted for the School of Engineering and Applied Science:

Dean, School of Engineering and Applied Science

May 2006


3/91

Abstract

The networking community has made a significant investment in GMPLS networks, which are

connection-oriented networks that support dynamic call-by-call bandwidth sharing. Currently,

GMPLS switches are call blocking and GMPLS control-plane protocols only support immediate

requests for bandwidth. This thesis first addresses the question of suitability for different types

of applications for GMPLS networks. Using the Erlang-B formula, we reason that GMPLS net-

works are well suited for applications in which the required per-circuit bandwidth is on the order of

one-hundredth the shared link capacity.

Then, we propose two applications for the GMPLS network, CHEETAH, which we have de-

ployed as part of an NSF-sponsored project. The first is a web transfer application, for which we

design and implement a software package called WebFT. We integrate the CHEETAH end-host

software modules into WebFT to provide deterministic data-transfer services transparently to users.

The CHEETAH network provides connection-oriented services in addition to the connectionless

service offered by the Internet. This add-on design allows the WebFT package to provide normal

web access to nonCHEETAH clients through the Internet while simultaneously serving CHEE-

TAH clients on dedicated circuits. The experiments conducted on the CHEETAH testbed show

that WebFT can achieve low-variance, end-to-end transfer delays at different circuit rates and low

transfer delays when high-speed circuits are possible.

The second application is parallel file transfers on CHEETAH. We identify that two factors

limit file-transfer throughput on networks with a high bandwidth-delay product: TCPs congestion-

control algorithm and end-host limitations. We propose a general cluster solution to overcome these

two factors. The solution uses GridFTP striped transfer and Parallel Virtual File System, version

iii


4/91

iv

2 (PVFS2) to transfer data amongst multiple hosts in parallel over dedicated circuits. To minimize

end-host networkanddisk contention, we modify GridFTP and PVFS2 code such that all pairs

of sending and receiving hosts are only responsible for blocks located in their local disks, which

results in improved throughput.


5/91

Acknowledgments

I am indebted to my advisor, Professor Malathi Veeraraghavan, for her consistent guidance and

support. Professor Veeraraghavan has tirelessly guided me, teaching me how to do research in a

systematic way. She has spent significant time on improving my writing skills. She has been and

will always be an excellent role model for me.

I am also grateful to all the other members in our research group, Dr. Xuan Zheng, Xiangfei

Zhu, Zhanxiang Huang, Tao Li, and Anant P. Mudambi, for all their help.

I am especially grateful to my grandmother, my parents, my brother Kevin, and my husband

Lin for their continuous love and support. Without them, I could not have achieved what I have

achieved today.

Finally, this work was carried out under the sponsorship of NSF ITR-0312376, NSF EIN-

0335190, and DOE DE-FG02-04ER25640 grants.

v


6/91

Contents

Acknowledgments v

1 INTRODUCTION 1

2 BACKGROUND 3

2.1 CO Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1.1 CO Networks and GMPLS Control-Plane Protocols . . . . . . . . . . . . . 3

2.1.2 Existing Switches, Gateways, and Networks . . . . . . . . . . . . . . . . . 8

2.2 CHEETAH Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.1 CHEETAH Concept and Network . . . . . . . . . . . . . . . . . . . . . . 11

2.2.2 CHEETAH End-Host Software . . . . . . . . . . . . . . . . . . . . . . . 13

3 ANALYTICAL MODELS OF GMPLS NETWORKS 15

3.1 Bandwidth Sharing Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.1.1 Model for Applications in which Call-Holding Time is Independent of Per-

Circuit Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.1.2 Model for Applications in which Call-Holding Time is Dependent on Per-

Circuit Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2.1 Applications in which Call-Holding Time is Independent of Per-Circuit

Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

vi


7/91

Contents vii

3.2.2 Applications in which Call-Holding Time is Dependent on Per-Circuit

Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4 WEB TRANSFER APPLICATION ON CHEETAH 29

4.1 WebFT Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.1.1 WebFT Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.1.2 CGI Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.1.3 The WebFT Sender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.1.4 The WebFT Receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2 Experimental Testbed and Results . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5 PARALLEL FILE TRANSFERS ON CHEETAH 38

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.2.1 FTP and GridFTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.2.2 PVFS2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

5.3 The Single-Host Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.4 The General-Case Cluster Solution . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.4.1 The Splitting Degree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.4.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5.4.3 ImplementationModifications to PVFS2 . . . . . . . . . . . . . . . . . 53

5.4.4 ImplementationModifications to GridFTP . . . . . . . . . . . . . . . . . 61

5.4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

5.5 The Specific Cluster Solution for TSI . . . . . . . . . . . . . . . . . . . . . . . . 68

5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69


8/91

Contents viii

6 CONCLUSIONS AND FUTURE WORK 70

6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Bibliography 73


9/91

List of Figures

2.1 Distributed call-setup process progressing hop-by-hop . . . . . . . . . . . . . . . 6

2.2 CHEETAH concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 CHEETAH experimental testbed . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 CHEETAH end-host software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.1 Call-based sharing model for any single link of a switch . . . . . . . . . . . . . . 15

3.2 A bandwidth sharing model for file transfers . . . . . . . . . . . . . . . . . . . . 17

3.3 Plots ofPb vs. m for U = 40%,60%,80%, and 90% . . . . . . . . . . . . . . . . . 20

3.4 Plots of vs. m and /m vs. m . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.5 Plots ofPb vs. and U vs. for m = 10, 100, and 1000, N 0 = 50 and 100,

= 1.1, and k= 1.25 MB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.6 Plot ofN0 vs. for m = 10, 100, and 1000, U = 60% and 80%, = 1.1, and

k= 1.25 MB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.7 Plots ofN vs. m for U = 40%, 60%, 80%, and 90% . . . . . . . . . . . . . . . . . 25

4.1 WebFT architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.2 The flow of events from running CGI scripts . . . . . . . . . . . . . . . . . . . . 32

4.3 The flow chart for the WebFT sender . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.4 CHEETAH testbed for WebFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.5 The web page to test WebFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.1 The single-host solution vs. the general-case cluster solution . . . . . . . . . . . . 40

5.2 The model and flow chart of third-party control . . . . . . . . . . . . . . . . . . . 42

ix


10/91

List of Figures x

5.3 The model and flow chart of GridFTP striped transfer . . . . . . . . . . . . . . . . 43

5.4 PVFS system architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.5 A model of using GridFTP partial file transfer to implement the transferring step . 52

5.6 A model of using GridFTP striped transfer to implement the transferring step . . . 53

5.7 A snippet ofpvfs2-fs2.conf, the PVFS2 configuration file on sunfire6 . . . . . . . . 55

5.8 A part of the output for pvfs2-fs-dump . . . . . . . . . . . . . . . . . . . . . . . . 55

5.9 The content of an s KB file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.10 A part of the output for the command more testfile/pvfs2cp2 | grep connect . . . . . 57

5.11 A part of the output of the command more testfile/pvfs2cp2 | grep writev | more . . 58

5.12 The pvfs2-fs-dump output for the test 1000M file . . . . . . . . . . . . . . . . . . 59

5.13 A snippet from the file pvfs2cp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.14 A part of the output for the strace command . . . . . . . . . . . . . . . . . . . . . 60

5.15 A snippet of the source code for PINT cached config get next io() . . . . . . . . . 61

5.16 The commands to start GridFTP servers on sunfire . . . . . . . . . . . . . . . . . 62

5.17 A part of the debug output for the GridFTP striped transfer . . . . . . . . . . . . . 63

5.18 The tcptrace outputs for GridFTP striped transfer before we modified GridFTP code 645.19 The tcptrace outputs for GridFTP striped transfer after we modified GridFTP code . 67

5.20 The specific cluster solution for TSI . . . . . . . . . . . . . . . . . . . . . . . . . 68


11/91

List of Tables

2.1 A classification of networks that reflects sharing modes . . . . . . . . . . . . . . . 4

4.1 Average throughputs and delays at a variety of circuit rates . . . . . . . . . . . . . 37

5.1 A summary of possible approaches to implement the general-case cluster solution . 54

5.2 The logical server numbers for the physical I/O servers . . . . . . . . . . . . . . . 56

5.3 The file descriptors and IP addresses for sunfire6 through sunfire10 . . . . . . . . . 57

5.4 The data-distribution pattern for /pvfs2/test 1000M . . . . . . . . . . . . . . . . . 58

xi


12/91

List of Abbreviations

API application programming interface

AS autonomous system

CHEETAH Circuit-switched High-speed End-to-End Transport ArcHitecture

CGI Common Gateway Interface

CL connectionless

CN compute node

CO connection-oriented

C-TCP Circuit-TCP

DNS Domain Name Server

DRAGON Dynamic Resource Allocation via GMPLS Optical Networks

FTP File Transfer Protocol

GbE Gigabit Ethernet

Gb/s gigabit per second

GB gigabyte

GFP Generic Framing Procedure

GMPLS Generalized Multiprotocol Label Switching

GPFS General Parallel File System

GSR Gigabit Switch Router

GT Globus Toolkit

I/O Input/Output

ION I/O node

xii


13/91

List of Abbreviations xiii

IP Internet Protocol

KB kilobyte

LAN Local Area Network

LMP Link Management Protocol

MAN Metropolitan Area Network

Mb/s megabit per second

MB megabyte

MPLS Multiprotocol Label Switching

MSPP Multi-Service Provisioning Platform

MTU Maximum Transmission Unit

NCSU North Carolina State University

NFS Network File System

NIC network interface card

OC Optical Carrier

OCS Optical Connectivity Service

ORNL Oak Ridge National Laboratory (ORNL)

PCIX Peripheral Component Interconnect Extended

PVFS2 Parallel Virtual File System, version 2

QoS Quality of Service

RAID redundant array of inexpensive disks

RD routing decision

RSVPTE Resource ReSerVation ProtocolTraffic Engineering

RTP Research Triangle Park

RTT round-trip delay time

SDM Space Division Multiplexing

SLR Southern Light Rail

SNMP Simple Network Management Protocol

SONET Synchronous Optical Network


14/91

List of Abbreviations xiv

SOX Southern Crossroads

TB terabyte

TCP Transmission Control Protocol

TDM Time Division Multiplexing

TE traffic engineering

TSI Terascale Supernova Initiative

VC virtual circuit

VLSR Virtual Label Switch Router

WAN Wide Area Network

WDM Wavelength Division Multiplexing


15/91

Chapter 1

INTRODUCTION

The networking community has made a significant investment in connection-oriented (CO) net-

working. Allowing the reservation of bandwidth in the form of a dedicated circuit, or virtual circuit

(VC), through a CO network prior to data transfers, this networking mode is recognized for its

ability to offer service guarantees at some cost of utilization and fairness.

A number of optical CO testbeds, some of which use Generalized Multiprotocol Label Switch-

ing (GMPLS), have been deployed for research and educational purposes. These include CA-

NARIEs CA*net 4 [11], OMNInet [34], SURFnet [49], UKLight [55], DOEs UltraScience net

[41], Dynamic Resource Allocation via GMPLS Optical Networks (DRAGON) [46], and Circuit-

switched High-speed End-to-End Transport ArcHitecture (CHEETAH) [13]. Further software

projects to enable the use of MPLS tunnels across Internet2 [26] and across the Department of

Energys ESnet [15] are also underway.

Most of these networks are primarily designed for large-scale scientific applications. Some of

these applications require high-bandwidth circuits and long call-holding times. To create large-

scale circuit or VC networks, we need to extend the usage of these networks beyond scientific

applications to millions of users. Thus, we need to identify and design more applications to use

these networks efficiently.

The first goal of this thesis is to determine what applications are well served by GMPLS net-

works, which currently only support immediate-request calls. We use the Erlang-B formula to

analyze the suitability of different types of applications. The study of application suitability for

1


16/91

Chapter 1. INTRODUCTION 2

GMPLS networks identifies applications suited to these networks in general, and specifically the

CHEETAH testbed.

Then, we study two applications for CHEETAH. The first is a web transfer application, where

we present a solution to improve web performance by leveraging CHEETAH without requiring

modifications to existing web server and client software. We implement a CGI-based software pack-

age called WebFT. WebFT is integrated with the CHEETAH end-host software modules to provide

deterministic data-transfer services transparently to users. With dedicated circuits on CHEETAH,

WebFT can achieve low-variance, end-to-end transfer delays at different circuit rates and low trans-

fer delays when high-speed circuits are possible.

The second application is parallel file transfers on CHEETAH, where we study how to achieve

multi-Gb/s throughput for bulk data transfers over WANs. We identify two factors that limit

throughput to hundreds of Mb/s: TCPs congestion-control algorithm and end-host limitations.

Then, we present a cluster solution over dedicated circuits, using GridFTP striped transfer and Par-

allel Virtual File System, version 2 (PVFS2) to achieve multiple-host parallelism, and thus, improve

overall throughput.

The rest of this thesis is organized as follows. In Chapter 2, we provide background informationon a class of call-blocking CO networks and the CHEETAH experimental testbed. In Chapter 3, we

explore the suitability of different types of applications for call-blocking CO networks. In Chap-

ter 4, we design and implement a software package, called WebFT, to improve web performance

through CHEETAH. In Chapter 5, we propose a cluster solution using GridFTP striped transfer and

PVFS2 for parallel file transfers. Finally, we present our conclusions and list future-work items in

Chapter 6.


17/91

Chapter 2

BACKGROUND

In this chapter, we first review different types of GMPLS networks and control-plane protocols. We

point out that current GMPLS implementations use a call-blocking approach. Then, we briefly de-

scribe existing equipment and networks in which CO services can be enabled. Finally, we overview

the CHEETAH network and CHEETAH end-host software because all the work in this thesis has

been conducted as a part of the CHEETAH project.

2.1 CO Networking

Networks are commonly classified by scale into Local Area Networks (LANs), Metropolitan Area

Networks (MANs), Wide Area Networks (WANs), wireless networks, home networks, and inter-

networks [50]. This classification, however, misses the critical aspect of networkingresource

sharing. To reflect how resources are shared in networks , Veeraraghavan and Karol gave a classifi-

cation of networks based on both switching type and networking type, as shown in Table 2.1 [56]. In

this section, we focus on the CO networking mode and, more specifically, on a class of call-blocking

GMPLS networks.

2.1.1 CO Networks and GMPLS Control-Plane Protocols

There are two types of CO networks: packet-switched and circuit-switched (see Table 2.1). Packet-

switched CO networks include

3


18/91

Chapter 2. BACKGROUND 4

Table 2.1: A classification of networks that reflects sharing modes

PPPPPPPPPPPPPPP

Networkingtype

Multiplexing/Switching type

Circuit-switched Packet-switched

Connectionless Not an option e.g., IP networks; Ethernet

networks

Connection-oriented e.g., Telephone network,

SONET/SDH, WDM

e.g., X.25, ATM, MPLS

Intserv IP networks [8]

Multiprotocol Label Switched (MPLS) [42] and Asynchronous Transfer Mode (ATM) net-

works

IEEE 802.1p and 802.1q Virtual LAN (VLAN) Ethernet switch based networks [25]

Circuit-switched networks include

Time-Division Multiplexed (TDM) SONET/SDH networks

All-optical Wavelength Division Multiplexed (WDM) networks

Space-Division Multiplexed (SDM) Ethernet switch based networks (an SDM connection is

created by mapping two ports into an untagged VLAN)

The GMPLS control-plane protocols are defined as a common control plane for these differ-

ent types of CO networks even though their data-plane protocols differ significantly. This common

control plane consists of:

1. Link Management Protocol (LMP) [29]

2. Open Shortest Path FirstTraffic Engineering (OSPFTE) routing protocol [27]

3. Resource Reservation ProtocolTraffic Engineering (RSVPTE) signaling protocol [3]


19/91


These three protocols are designed to be implemented in a control processor at each network

switch. Each of these protocols provides an increasing degree of automation, and a corresponding

decreasing dependence upon manual network administration. This triple combination serves as an

excellent basis on which to create large-scale CO networks, in which switches can cooperate in a

completely automated fashion to respond to requests for end-to-end bandwidth. We consider each

protocol in a little more detail below, starting with LMP.

Primarily, the LMP module automatically establishes and manages the control channels be-

tween adjacent nodes, to discover and verify data-plane connectivity, and to correlate data-plane

link properties. In GMPLS networks, there could be multiple data-plane links between two adja-

cent nodes and the control channel could be established on a separate physical link from any of the

data-plane links. A mechanism is required to automatically discover these data-plane links, verify

their properties, combine them into a single traffic-engineering (TE) link, and correlate data-plane

links to the control channel. Thus, LMP contributes to our plug-and-play goal for CO networks by

minimizing manual administration.

The OSPFTE routing protocol software module, located at a switch, enables the switch to

send topology, reachability, and the loading conditions of its interfaces to other switches, and re-ceive corresponding information from them. This data-dissemination process allows the route com-

putation module at the switch to determine the next-hop switch toward which to direct a connection

setup (this module could be part of the signaling-protocol module or could be used to pre-compute

routing data ahead of when call-setup requests arrive). As a routing protocol, its value in creating

large-scale connectionless networks has already been observed with the success of the Internet. Ad-

mittedly, being a link-state protocol, it is only used intra-domainthat is, within the network of an

organization, referred to as an autonomous system (AS). Even within this intra-domain context, it

organizes the AS as a two-layer hierarchy, meaning that the AS is partitioned into self-contained ar-

eas interconnected by a backbone area. In conjunction with the distance-vector based inter-domain

routing protocol, Border Gateway Protocol (BGP), we have a highly decentralized automated mech-

anism to spread routing information, which was critical to the scaling of the Internet.


20/91


Finally, an RSVPTE signaling engine at a switch manages the bandwidth of all the interfaces

on the switch, and programs the data-plane switch hardware to enable it to forward demultiplexed

incoming user bits or packets as and when they arrive. Given that dynamic bandwidth sharing in

CO networks is controlled by the signaling engine, the call-handling performance of this engine is

critical to the scaling of CO networks. The faster the response times of signaling engines, the lower

the cost to an application to release and reacquire bandwidth as and when needed. This allows

applications to hold circuits only for the duration of their communication bursts, which, in turn,

improves link utilization. The need for high call-handling performance from signaling engines can

be met with a completely automated and distributed bandwidth-management implementation. This

will allow for both temporal and spatial scalability (i.e., shorter call-holding times and networks

with large numbers of switches and hosts).

An RSVPTE engine implemented in a control card at a switch executes three steps when it

receives a connection setup Path message (i.e., a request for bandwidth), as show in Fig. 2.1.

BW: Bandwidth;

D: Destination address

Route lookup Bandwidth andlabel management

Switch fabricconfiguration



GMPLS switch GMPLS switch

Path message (BW, D)

(from previous switch on path)

Path message (BW, D)Path message (BW, D)

(to next switch on path)

Control plane

Data plane





Figure 2.1: Distributed call-setup process progressing hop-by-hop

1. Route computation: Based on the destination address to which the connection is requested

(D, in the example shown in Fig. 2.1), the RSVPTE engine determines the next-hop switch


21/91


toward which to route the connection or a subset of switches on the end-to-end path within

its area of its domain. Constrained Shortest Path First (CSPF) algorithms can only be exe-

cuted intra-area because of the intra-area scope of bandwidth related parameters in OSPFTE

messages.

2. Bandwidth and label management: If the switch is in a position to only compute the next-hop

switch in the route computation phase, then it needs to check if there is sufficient bandwidth

on a link connected to the next-hop switch. If it performs CSPF to determine a part of the

end-to-end route (i.e., the subset of switches on the path within its area of its domain), then

this step of bandwidth management is integrated with the partial route computation. But at

subsequent switches within the area, this step is required to check if there is sufficient band-

width available on the link to the next-hop indicated in the partial source route passed within

the Path signaling message (see Fig. 2.1 for how Path messages travel hop-by-hop). This

is because local conditions can change between the last routing protocol update, which pro-

vided the data used in the CSPF computation, and the arrival of the call being set up. Typical

implementations use a call-blocking approach where calls are simply rejected if sufficient

bandwidth is not available. Label management is the selection of labels to be used on in-

coming and outgoing switch interfaces. In the data plane, labels can be either explicit in the

data plane (e.g., labels used within packet headers in VC networks), or implicit (e.g., time

slots, wavelengths or interface identifiers in TDM, WDM, and SDM networks). In the con-

trol plane, labels are explicit in both types of switches, with the labels identifying time slots,

wavelengths and interface identifiers to be used for the connection across a circuit switch.

These labels are used in the next step.

3. Switch fabric configuration: This step is needed to configure the switch fabric to forward

user data as and when they arrive. This function maps incoming labels associated with input

interfaces to outgoing labels on appropriate outgoing interfaces. In packet switches, there is

an additional step to program the scheduler to enable it to serve packets arriving on the VC

being set up at the requested bandwidth level.


22/91


We do not show the rest of the call-setup procedure in Fig. 2.1, the continuation of the Path

message propagation hop-by-hop, or the Resv message returning in the opposite direction, which

implicitly confirms successful connection setup. Detailed procedures are also defined in RSVPTE

for call-setup failure.

As mentioned in step 2, the bandwidth-management procedure implemented in most GMPLS

switches is based on call blocking. In other words, if the requested bandwidth is not available when

a call arrives, the call request is rejected. There is support for preemption, but if no existing call is

preemptable (because of priority levels), then the call is blocked.

The counterpart call-queuing model, though analyzed in textbooks [44], is seldom imple-

mented. This is because a call traversing multiple links requires a simultaneous allocation of

bandwidth on all these links. A distributed call-queuing model requires a call (an RSVPTE Path

message) to wait in a queue until resources become available at the first switch, and then to join a

queue at the next switch in a hop-by-hop manner as shown in Fig. 2.1. Resources allocated to a call

at upstream switches will lie unused while the Path messages are queued at downstream switches.

Parallelizing this wait time by simultaneously queuing the call at multiple switches will decrease

wasted bandwidth, but not eliminate it. Therefore, call queuing is seldom implemented.The RSVPTE and OSPFTE control-plane protocols do not support advance reservations of

bandwidth. For example, there are no objects defined in RSVPTE to specify a future start time in

a Path message. Nor are there parameters defined in OSPFTE to report future loading conditions

in the TE link state advertisements. Hence, these GMPLS control-plane protocols only support

immediate-request or on-demand calls.

2.1.2 Existing Switches, Gateways, and Networks

The most common network switches today are Ethernet switches, IP routers and SONET/SDH

switches. The first two are primarily connectionless packet switches; however, Ethernet switches

have VLAN capabilities with limited Quality of Service (QoS) support. A VLAN is constructed

by programming the switch to include two or more ports. It can be tagged or untagged. In tagged

mode, all Ethernet frames are tagged with a VLAN header that includes a VLAN ID. Frames


23/91


tagged with the same VLAN ID are treated in the same manner; that is, they are forwarded to all

the ports belonging to that VLAN. An untagged VLAN with two ports is essentially a SDM circuit

because all Ethernet frames arriving on either port are sent exclusively to the other port. No frames

arriving on other ports are forwarded to ports in an untagged VLAN. Ethernet switches available

from Extreme Networks, Dell, Cisco, Intel, Foundry, and Force 10, just to name a few vendors,

have these capabilities. Thus, the data-plane capabilities required to create circuits or VCs through

Ethernet switches are now available. However, control-plane software used to set up and release

circuits dynamically is not implemented within these switches. The Dragon project has developed a

software module called the Virtual Label Switch Router (VLSR), which implements the RSVPTE

and OSPFTE protocols. It runs on an external Linux host connected to the Ethernet switch [46] and

manages the bandwidth of the switch. It issues Simple Network Management Protocol (SNMP) [7]

commands to create the VLANs for admitted connections. With this external software, the Ethernet

switches become fully equipped CO switches.

IP routers are equipped with MPLS engines and RSVPTE signaling software for dynamic

control of MPLS VCs. Both Cisco and Juniper routers support MPLS.

SONET/SDH and WDM switches are circuit switches in which time slots and wavelengthsare respectively mapped from incoming to outgoing interfaces. Some of these switches now sup-

port RSVPTE and OSPFTE control-plane implementations. For example, Sycamore SONET

switches implement these protocols. Examples of WDM switches that implement GMPLS control-

plane protocols include Movaz and Calient WDM equipment.

In addition to supporting pure CO-switching functionality, some of this equipment can be used

as gateways to interconnect different types of networks. Before describing the gateway functional-

ity of these pieces of equipment, we establish some terminology.

We define the term network to consist of switches and endpoints (data-sourcing and sink-

ing entities) interconnected by shared communication links, on which the sharing (multiplexing)

mechanism is the same on all links. Further, we define the term switch as an entity in which all

links (interfaces) support the same (single) form of multiplexing (referred to as switching capabil-

ity [45]). For example, a SONET switch is one in which all interfaces carry TDM signals formatted


24/91


according to the SONET multiplexing standards, and a SONET network is one in which all the

switches are SONET switches. Typical endpoints in a SONET network are IP routers with SONET

line cards; these nodes are endpoints in the SONET network as they source and sink data carried on

to the SONET network.

We use the term internetwork to denote an interconnection of networks (referred to as multi-

region networks) [45]. Entities (nodes) that interconnect networks necessarily need the ability to

support interfaces with different types of multiplexing capabilities, minimally two. We use the term

gateways to refer to such nodes. An IP router is a gateway in the connectionless Internet with

different line cards implementing the protocols of the networks to which they are connected. The

gateway functionality is achieved by the IP implementation within the router examining IP datagram

headers to determine how to route a packet from an incoming network to an appropriate outgoing

network. In contrast, gateways in a CO internetwork move data from one network to another using

circuit or VC techniques. For example, Ethernet cards in a Sycamore SN16000 implement the

Generic Framing Procedure (GFP) Ethernet-to-SONET encapsulation to map all frames received

on any of its Ethernet ports into a port on a SONET line card, which connects this gateway node

to a SONET network. In this scenario, the circuit is a simple SDM circuit. We thus refer to thesegateways as circuit or VC gateways to contrast them with packet-based IP routers. An example of

a VC gateway is a Cisco GSR 12008, which supports line cards that can be programmed to map all

frames arriving on a specific VLAN into an MPLS tunnel set up on one of its other ports. It thus

interconnects a VLAN based CO network to an MPLS based CO network.

While the data-plane capabilities for extracting data from one type of multiplexed connection

and sending it on to a different type of multiplexed connection are available, the control-plane capa-

bilities for controlling such circuits or VCs are not yet standardized, and hence, not implemented.

Finally, as for current CO network deployments, SONET/SDH and WDM networks are al-

ready in widespread deployment. However, the dynamic bandwidth provisioning capability sup-

ported by the GMPLS control-plane protocols, while available on some switches in deployment, is

not yet made available to users. Similarly, the Abilene backbone of Internet2 and DOEs ESnet has

routers with built-in MPLS and RSVPTE capabilities. There are ongoing research projects [22,24]


25/91


to enable the use of dynamically requested VCs through these networks, including CHEETAH [13],

a SONET based network, and DRAGON [46], a WDM based network. Both CHEETAH and

DRAGON are call-blocking and immediate-request GMPLS networks.

2.2 CHEETAH Network

Our research group has deployed the CHEETAH network as part of an NSF-sponsored project

proposed to provide high-speed, end-to-end connectivity on a call-by-call basis. In this section, we

review the CHEETAH concept and the current experimental testbed. We also describe the end-host

software needed in CHEETAH-connected computers.

2.2.1 CHEETAH Concept and Network

CHEETAH is a networking solution to provide end-host applications access to end-to-end CO ser-

vices, while preserving the connectionless services already available to them via the Internet. In

other words, CHEETAH is designed as an add-on service to existing Internet connectivity, and

further, it leverages the services of the latter.

As shown in Fig. 2.2, end hosts are equipped with two Ethernet Network Interface Cards (NICs).

The primary NICs (NIC I) in the end hosts are connected to the public Internet through the usual

Packet-switched

Internet

Packet-switched

Internet

End

host

Optical Circuit-switched

CHEETAH Network

Optical Circuit-switched

CHEETAH Network

NIC I

NIC II

End

host

NIC I

NIC II

IP routers IP routers

Ethernet-SONET

gateway

Ethernet-SONET

gateway

Figure 2.2: CHEETAH concept


26/91


LAN Ethernet switches or IP routers, while the secondary NICs (NIC II) are connected to Ethernet

ports on Ethernet-to-SONET circuit gateways.

Ethernet-to-SONET circuit gateways, in turn, are connected to wide-area SONET circuit-

switched networks, in which both circuit gateways and pure SONET switches are equipped with

GMPLS protocols to support call-by-call dynamic bandwidth sharing. End-to-end CHEETAH cir-

cuits (as shown in the dashed line in Fig. 2.2) are set up dynamically between end hosts with

RSVPTE signaling messages being processed at each intermediate gateway or switch in a hop-by-

hop manner.

The add-on design of CHEETAH network brings two benefits:

1. Connectivity to the Internet allows a CHEETAH end host to communicate with other non

CHEETAH hosts on the Internet while it communicates with another CHEETAH end host

through a dedicated CHEETAH circuit.

2. Applications can selectively choose to request CHEETAH circuits only when the Internet

path is estimated to provide a lower service quality than the CHEETAH circuit, and further

fall back to the Internet path if the CHEETAH circuit-setup attempt fails due to an unavail-

ability of circuit resources on the CHEETAH network.

Currently, the CHEETAH network consists of three Ethernet-to-SONET circuit gateways,

which are Sycamore SN16000 switches, deployed at MCNC in Research Triangle Park (RTP),

NC, Southern Crossroads (SOX) and Southern Light Rail (SLR) in Atlanta, GA, and Oak Ridge

National Laboratory (ORNL) in Oak Ridge, TN. The testbed layout is shown in Fig. 2.3. Hosts,

running Linux, are connected via Gigabit Ethernet (GbE) NICs to the SN16000 switches. The cir-

cuits, set up and released dynamically, consist of Ethernet segments from the hosts to the switches

mapped to Ethernet-over-SONET segments between the switches. The GbE signal is mapped to a

21-OC1 virtually concatenated SONET signal to create an end-to-end 1 Gb/s dedicated circuit.


27/91


zelda4

zelda5

Juniperrouter

Controlcard

OC192card

Crossconnectcard

zelda1

zelda2

zelda3

Sycamore SN16000

Juniper

router

InternetInternet

ORNL, TN

SOX/SLR, GA

C

ontrolcard

OC192card

Cros

sconnectcard

Sycamore SN16000

wukong

MCNC/NCSU, NC

Figure 2.3: CHEETAH experimental testbed

2.2.2 CHEETAH End-Host Software

We have developed a software package for Linux hosts, called CHEETAH end-host software,

to enable the automatic use of CHEETAH circuits. Wherever possible, our goal is to integrate li-

braries of this CHEETAH end-host software into application software modules to make CHEETAH

services transparent to human users.

The CHEETAH end-host software architecture is shown in Fig. 2.4. The Optical Connectivity

Service (OCS) client module is used to determine whether the correspondent end host (called

party) is on the CHEETAH network. It does this by sending a TXT query to a Domain Name

Server (DNS). The TXT resource record is a generic type supported by DNS to allow users to store

any data about hosts. The TXT data we store for a CHEETAH end host consist of an indication that

it is a CHEETAH end host, along with the IP and MAC addresses of the hosts secondary NIC.

The routing decision (RD) module answers queries from applications as to whether to attempt

a circuit setup. It makes these decisions by using collected measurements about the two paths, the


28/91


Application

RSVP-TE client

TCP/IPNIC 1

NIC 2

End hostCHEETAH software

Routing decision

C-TCP

OCS clientInternet

CHEETAHnetwork

Application

RSVP-TE client

TCP/IP NIC 1

NIC 2

End hostCHEETAH software

Routing decision

C-TCP

OCS client

Figure 2.4: CHEETAH end-host software

Internet path and the CHEETAH path, along with the size of the file to be transferred.

The RSVPTE client module is used to initiate the setup and release of CHEETAH circuits

[59]. Parameters provided to this module include the secondary NIC IP address of the destination

to which a circuit is being requested and the desired bandwidth. The Sycamore switches in the

CHEETAH network receive these RSVPTE messages, process them and set up circuits if the

requested bandwidth is available to the specified destination. It is a distributed switch-by-switch

signaling procedure.

The Circuit-TCP (C-TCP) module is the transport protocol that we have developed for CHEE-

TAH circuits [33]. Given that the bandwidth of a dedicated circuit is known before a file transfer

starts, any changes in the sending rate will either cause the circuit to remain idle or cause the receiver

buffer to fill up. Since neither option is desirable, we essentially removed the congestion-control

algorithms of TCP that were designed to keep adjusting the sending rate based on IP network con-

ditions in order to create our C-TCP module. This disabling of the congestion control is selectively

done only by TCP connections traversing the secondary NIC, which is used for CHEETAH circuits.

TCP connections traversing the primary NIC connected to the Internet continue using the standard

TCP code.

Corresponding to each CHEETAH software module is a library providing application program-

ming interfaces (APIs) to invoke the services of each module. These libraries are expected to be

linked into applications using the CHEETAH software and network.


29/91

Chapter 3

ANALYTICAL MODELS OF GMPLS NETWORKS

In Chapter 2, we reasoned that GMPLS networks are call-blocking networks that only support

immediate-request calls. One important question is, what applications, if any, are suitable for GM-

PLS networks. This chapter addresses this problem. First, we present bandwidth sharing models for

two types of applications, ones in which the per-circuit bandwidth and mean call-holding time are

independent and ones in which they are dependent (file transfers). Then, we provide numerical re-

sults for both models. Finally, we conclude that, GMPLS networks are well suited for applications

in which the required per-circuit bandwidth on the order of one-hundredth the shared link capacity

for both types of applications.

3.1 Bandwidth Sharing Model

The switch model used in our analysis is illustrated in Fig. 3.1, in which calls originating from hosts

on the N links (e.g., the N Ethernet links connecting hosts to Ethernet interfaces on a gateway)

share the link capacity C on link L (e.g., the SONET/SDH/WDM/MPLS link out of a gateway).

We assume that call-setup requests arrive according to a Poisson process with rate , since many

12

N-1N

Link L,

capacity C

Figure 3.1: Call-based sharing model for any single link of a switch

15


30/91

Chapter 3. ANALYTICAL MODELS OF GMPLS NETWORKS 16

call-arrival processes observable in practice can be modeled as Poisson processes [44]. Further, we

assume that call-holding times follow arbitrary distributions with a mean call-holding time denoted

as 1/. To understand the types of applications that can be supported on GMPLS circuit-switched

networks, we make a simplifying assumption that all calls are of the same typethat is, they need

the same amount of bandwidth. This allows us to treat linkL as a link of m circuits, where each

circuit is of capacity C/m.

We ask two questions about the suitability of applications for GMPLS networks:

1. Are applications that require high-bandwidth circuits more or less desirable than applications

that require low-bandwidth circuits?1

2. Are applications that generate calls with long mean holding times more or less desirable than

calls with short mean holding times?

The first question is related to m, the number of circuits. The larger the per-circuit bandwidth, the

smaller the m for a given link capacity C. The second question is related to the mean call-holding

time, 1/.

For applications such as remote visualization and video conferencing, the mean holding time is

independent of the per-circuit bandwidth. On the other hand, for file transfers, commonly identified

as an application suitable for high-speed circuits [57], m and 1/ are related. The larger the per-

circuit bandwidth (the smaller the m), the lower the mean call-holding time, 1/. We describe

models for these two cases in the following subsections, respectively.

3.1.1 Model for Applications in which Call-Holding Time is Independent of Per-

Circuit Bandwidth

Given our assumptions, we can model link L as an M/G/m/m system [44]. The call-blocking

probability in this model is given by the well-known Erlang-B formula:

Pb =m/m!

m

i=0

(i/i!)(3.1)

1In this chapter, we only use the word circuits, but the same model and analysis hold for virtual circuits as well.
http://-/?-


31/91


where , the offered traffic load, is given by = /. Although this is a time-tested model for

telephony traffic, we found it useful to our current problem of identifying applications suited to

GMPLS networks.

Assume that the number of calls per second arriving on each of the N ports that are destined for

linkL is . Thus, from Fig. 3.1, the aggregate , call-arrival rate for linkL, is given by:

= N (3.2)

The utilization of linkL, U, is given by:

U =

m(1Pb) (3.3)

3.1.2 Model for Applications in which Call-Holding Time is Dependent on Per-

Circuit Bandwidth

File-transfer applications belong in this category. Given that the GMPLS switch operates in a call-

blocking mode even when used for this category of applications, equations (3.1)(3.3) apply here

as well. If file sizes are too small, the overhead incurred in call-setup delay will significantly reduce

link utilization (since call-setup delays could exceed file-transfer delays). Therefore, Veeraragha-

vans team [57] proposed using an RD module at end hosts to decide, based on the file size and

other metrics, whether to request a circuit for a particular file transfer, or whether to simply use the

Internet connectivity.

Fig. 3.2 illustrates a model for the file transfer application. We use a settable parameter

crossover file size, , to model the behavior of the RD module, wherein files larger than are

Link L,capacity C

.

.

.

12

N-1N

routing

decision (RD)module

end host

0

Figure 3.2: A bandwidth sharing model for file transfers


32/91


routed to the CO network.

We assume that file sizes are distributed according to the Pareto distribution with the probability

density function:

f(x) =k

x+1, xk (3.4)

where is the shape parameter (the larger the , the higher the probability of small file sizes),

and k is the scale parameter, denoting the minimum file size. Crovella [14] characterized web file

sizes as following this distribution and suggested in the range from 1.0 to 1.3 and a value for kof

1000 bytes.

Given that only files larger than are routed to the CO network, using (3.4), we derive the mean

file size, E[X|(X )], as

E[X|(X )] =

1(3.5)

We then estimate the mean call-holding time, 1/, as

1

= Tprop +E[Temission] (3.6)

where Tprop is the one-way propagation delay, and

E[Temission] =E[X|(X )]

C/m=

1

m

C(3.7)

By neglecting Tprop, we can approximate:

1

=

1

m

C(3.8)

capturing the inter-dependence of m and 1/. We justify neglecting Tprop as follows. E[Temission]

should be larger than Tprop because the latter is incurred as part of call-setup delay, and to maintain

a high link utilization, mean call-setup delay should be much smaller than E[Temission], which means

that Tprop is much smaller than E[Temission].


33/91


From Fig. 3.2, we can derive the call-arrival rate at link L as:

= N = N0 P(X ) = N0 k

(3.9)

Combining (3.9) with the mean holding time from (3.8), we get

=

= N0

1

k

1

m

C(3.10)

3.2 Numerical Results

3.2.1 Applications in which Call-Holding Time is Independent of Per-Circuit Band-

width

Assume that the link capacity C= 10 Gb/s. This is a reasonable value if the switch is a SONET

or MPLS switch. For WDM switches, if the number of wavelengths on link L is 100, then a more

reasonable value for Cwould be 1 Tb/s because each wavelength is typically engineered to support

10 Gb/s. We will consider this number later in this chapter. For now, we consider C= 10 Gb/s.

We study the effect of changing m from 1 to 1000; in other words, the per-circuit bandwidth

varies inversely from 10 Mb/s to 10 Gb/s. We obtain numerical results corresponding to four differ-

ent fixed values ofU, 40%, 60%, 80%, and 90%. Since we have two equations (3.1) and (3.3), if

we fix two parameters, U and m, then the other two variables, and Pb, become fixed as well. We

use an iterative algorithm as follows to obtain these values. First, we observe that for a given m, U

increases as increases. We also conduct experiments to confirm the observation. Then, we start

to assign = m temporarily, and compute the corresponding Pb and U. If the current U is larger

than the given U, meaning that is too large, we decrease by = 0.001 until the corresponding

U in the current iteration is smaller than the given U; otherwise, we increase by until the

corresponding U in the current iteration is larger than the given U. Next, we compare the current U

and its neighbor in the previous iteration to get the closest one to meet the given U and m. Finally,

we compute the corresponding Pb. Fig. 3.3 plots Pb vs. m.


34/91


0 20 40 60 80 1000

0.2

0.4

0.6

0.8

1

U=80%

U=90%

m

Pb

U=60%

U=40%

(a) m [1,100]

101 400 700 10000

0.01

0.02

0.03

0.04

0.05

U=80%

U=90%

m

Pb

(b) m [101,1000]

Figure 3.3: Plots ofPb vs. m for U = 40%,60%,80%, and 90%

From Fig. 3.3a, we see that at small values ofm, it is hard to achieve high utilization combined

with low call-blocking probability. Consider m = 10, which corresponds to a per-circuit allocation

of 1 Gb/s per call (e.g., for HDTV applications). To run the link at an 80% utilization level, the

corresponding call-blocking probability will be a high 23.62%. In Fig.3.3b, we show the effect of

large m at which values both high utilization and low call-blocking probability are achievable.

The effect of traffic load is not obvious from Fig. 3.3. Therefore, we plot the traffic load

vs. m and /m vs. m in Fig. 3.4. From Fig. 3.4a, we see that should be engineered to be high

0 20 40 60 80 1000

20

40

60

80

100

U=40%

U=60%

U=80%

U=90%

m

(a) vs. m

0 20 40 60 80 1000

2

4

6

8

10

U=40%

U=60%

U=80%

U=90%

m

/m

(b) /m vs. m

Figure 3.4: Plots of vs. m and /m vs. m


35/91


when m is high. We also see that, as m increases, Pb decreases and /m approaches U according to

(3.3). For example, when U = 60%, /m approaches 0.6, reaching this value when m = 80. Thus,

is typically close to and less than m when Pb is low (close to 0) and U is high (close to 1). For

example, at a fixed value ofU = 80%, when m = 100, = 80.35, Pb = 0.4%, and when m = 1000,

= 800, Pb 0. Thus, is close to m when Pb is low (close to 0) and U is high (close to 1).

From the two graphs (Figs. 3.3 and 3.4) we see that if we want to operate the link at a given

value of call-blocking probability, and a given value of utilization, the number of circuits, m, and

traffic load, , become fixed. An alternative starting point is that a given application has a fixed

capacity requirement, which means that m is fixed. If we further assume that , the call-arrival

rate per port, and mean call-holding time, 1/, are intrinsic to the application, then we can only

adjust the aggregate traffic load by engineering N to achieve a given call-blocking probability or

utilization. But these graphs show us that once m is set, ifm is small, we are highly limited in our

ability to achieve both high utilization and low call-blocking probability.

Having understood the influences of all the important variables in this model, , m, Pb and U, let

us now consider three applications. The first application is a high-bandwidth application (m = 10),

the second, a low-bandwidth application (m = 1000) and finally, an intermediate-level bandwidthapplication (m = 100).

High-bandwidth applications: When m = 10that is, when the application requires a per-

circuit bandwidth of 1 Gb/swe can achieve a target 80% utilization, only by operating the link at

a high call-blocking probability of 23.62%. Such a high call-blocking probability could be unac-

ceptable to users. We conclude that applications requiring a high per-circuit capacity relative to

the shared link capacity are unsuitable for the immediate-request call-blocking mode of bandwidth

sharing offered by GMPLS networks in situations where high utilization and low call-blocking prob-

ability are important. Since, as discussed in Chapter 2.1.1, call queuing is not an option, it appears

that we need a book-ahead mechanism for such applications.

We then ask whether the above answer is dependent on the mean call-holding time. In other

words, when m is small, do we require a book-ahead mechanism only if the mean call-holding time

is large or do we need such a mechanism even if the mean call-holding time is small? For example,


36/91


in a doctors office, where there are three to four doctors per office (m is 3 or 4), since our mean

holding times (appointment lengths) are fairly high, on the order of 20-30 minutes, we use a book-

ahead mechanism. If the mean holding time is on the order of 1-2 minutes (e.g., at a bank teller),

could an immediate-request approach work? The answer is that it would if there was space to wait.

In other words, if the queuing system has a buffer to wait, high-bandwidth calls that have short

mean holding times could be handled without a reservation system. Unfortunately, as explained in

Chapter 2.1.1, queuing models are not suitable for calls. Therefore, for applications that require

high bandwidth (i.e., m is small, irrespective of the mean call-holding time), our conclusion of

needing a book-ahead mechanism holds.

Low-bandwidth applications: At the other extreme, consider large values ofm, say m = 500

to m = 1000. For example, in a video-telephony application with motion JPEG cameras operating

at 25 frames/sec (motion-JPEG used instead of MPEG to meet the stringent delay requirements of

telephony), we could allocate 10 Mb/s on an MPLS-shared 10 Gb/s link, in which case m = 1000.

At these high values ofm, call-blocking probability of almost 0 and utilization levels close to 1 are

achievable as seen in Fig. 3.3b; however, the required traffic load is high (close to m) as noted in

our analysis of Fig. 3.4.Whether and how such traffic loads can be engineered depends upon the second important

factor, mean call-holding time. At a traffic load = 500, if the mean call-holding time is small (say

3 minutes for a video-telephony call, which is the number typically quoted as the mean duration of

telephony calls), the aggregate call-arrival rate, , needs to be about 2.8 calls/sec. Say on average

each end host makes 1 call every two hours, which means in (3.2) is about 0.5 calls/hour. This

means that we need N to be 20160 to obtain an aggregate of 500 Erlangs. In other words, we

need calls from 20106 end hosts to be multiplexed (perhaps through a multi-level hierarchy of

switches) into the switch shown in Fig. 3.1, destined to share link Ls capacity. This is a high level

of aggregation requiring switches with large numbers of ports. Since line cards (the more the ports,

the more the line cards) drive up the cost of switches, our conclusion is that to achieve a high

utilization with low-bandwidth applications that have short durations and low call-arrival rates,

we need to equip the switch with a large number of line cards to generate sufficient traffic, which


37/91


could be expensive.

Consider what happens if the mean call-holding time, 1/, is larger, say 2 hours, and mean

call-arrival rate is still low at 1 per 2 hours. This means the number of ports, N feeding traffic into

the shared link can be 540. Building switches with this order of line cards is more feasible. We thus

conclude that the immediate-request, call-blocking mode of bandwidth sharing in GMPLS networks

can be used for low-bandwidth applications that have relatively long durations and low call-arrival

rates. There is an upper limit on mean call-holding time, because if it is very large, unless the call-

arrival rate is very low, , will become very large causing a high call-blocking probability.

Intermediate-bandwidth applications: Finally, consider an intermediate level, where m is in

the range of 100. As seen from Fig. 3.3, call-blocking probabilities are very small when m = 100

even at utilizations of 90%. Now consider the question of mean call-holding times. If we again use

the video-conferencing application or eScience remote-visualization applications where the per-

circuit bandwidth is 100 Mb/s on a 10 Gb/s link (which means m = 100), and mean call-holding

times are in the 2-hour range, the required aggregate call-arrival rate is 40 per hour. If each port of

the switch offers a load of 1 call per 5 hours, we need N to be 200, which is an acceptable number

from a switch-cost perspective. Clearly, the higher the mean holding time, the smaller the N, andhence, the more preferable the application. This result again is surprising: calls with long holding

times are preferable to calls with short holding times in a call-blocking mode of operation.

In summary, applications suitable for present-day GMPLS networks are those in which the

per-circuit capacity is 1/100th shared link capacity and have holding times on the order of tens of

minutes or higher.

3.2.2 Applications in which Call-Holding Time is Dependent on Per-Circuit Band-width

As described in the model in Section 3.1.2, 1/(m) is constant if we neglect Tprop, and hence the

two questions raised at the start of Section 3.1 seem to reduce to one question. But if we study

the system at certain fixed values of m, say m = 10,100,1000 (as in Section 3.2.1), we have a

new parameter , the crossover file size, with which to manipulate the mean call-holding time 1/.


38/91


Therefore, in this section, we study the effect of on various metrics, such as , Pb, U, and N0,

which represents the total call-arrival rate for all files whose sizes are greater than k.

Fig. 3.5 plots the two metrics, Pb, and U, against for fixed values ofm andN0. The influence

of on is interesting because two factors operate in opposing directions. As increases, at a given

m, the mean call-holding time, 1/, increases. But from (3.9), we see that is proportional to

and hence decreases as increases. Since is larger than 1, decreases at a rate faster than 1/

increases. As a result, decreases with increasing . Decreasing is the reason why Pb and U drop

with increasing .

0 5 10 15

x 107

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

m=100, N0=100

m=10, N0=100

m=1000, N0=100

(bytes)

Pb

(a) Pb vs.

0 5 10 15

x 107

0.4

0.5

0.6

0.7

0.8

0.9

1

m=100, N0=50

m=100, N0=100

m=10, N0=100

m=1000, N0=100

(bytes)

U

(b) U vs.

Figure 3.5: Plots ofPb vs. and U vs. for m = 10, 100, and 1000, N0 = 50 and 100, = 1.1,and k= 1.25MB

In Fig. 3.5, we hold N0 constant. But to see the effect of on the required call-arrival rate, we

plot N0 against for a set of given U in Fig. 3.6. From (3.10), we see that N0 is proportional

to 1

. Therefore, N0 increases as increases. From this set of graphs, we see that we shouldselect a smaller so that the required N0 is not too large. IfN0 is large, and the per-host call-

arrival rate, 0, is low, it means that we need to engineer our switches with a large number of ports.

Another interesting result seen in this set of plots is that, unlike the results in Section 3.2.1, where

as m is increased, the required traffic load increases, here we see in Fig. 3.6 that, as m increases, the

required load N0 decreases.


39/91


0 5 10 15

x 10

7

40

60

80

100

120

140

160

U=60%, m=100

U=80%, m=100

U=80%, m=10

U=80%, m=1000

(bytes)

N0

Figure 3.6: Plot of N 0 vs. for m = 10, 100, and 1000, U = 60% and 80%, = 1.1, andk= 1.25MB

We further plot Fig. 3.7 to contrast the effects of m on N for non-file-transfer applications and

file-transfer applications by fixing U and . As shown in Fig. 3.3, increases as m increases.

For non-file-transfer applications, since m and 1/ are independent and 1/ is constant, and N

increase with increasing . We can also derive that the trend ofN vs. m is the same as that of vs.

m (see Fig. 3.4a and Fig. 3.7a). In other words, for m at a small value, the curve has a higher slope

0 20 40 60 80 1000

50

100

150

200

250

U=40%

U=60%

U=80%

U=90%

m

N

(a) N vs. m for non-file-transfer applications with =0.5 call/s and 1/ = 0.8 s

0 20 40 60 80 1000

20

40

60

80

100

120

140

160

180

200

U=40%

U=60%

U=80%

U=90%

m

N

(b) N vs. m for file-transfer applications with 0 =0.5 call/s, = 1.1, k= 1.25 MB, and = 8 MB

Figure 3.7: Plots ofN vs. m for U = 40%, 60%, 80%, and 90%


40/91


than that for m at a large value. In particular, for m at a high value, the curve has an approximately

constant slope of(U)/0 (see Fig. 3.7a). But for file-transfer applications, 1/(m) is a constant

for a fixed , C, and . From (3.10), we can see that the trend of N vs. m is the same as that of

/m vs. m as shown in Fig. 3.4b. In particular, for large m, the curve for N vs. m is flat for a given

U (see Fig. 3.7b). Thus, for file transfers, we can allocate smaller amounts of bandwidth per call,

which means that m can be larger to achieve lower Pb and higher U without increasing N if the user

can tolerate the longer holding time.

Repeating the questions asked in Section 3.2.1, we consider whether high-bandwidth circuits

can be used for file transfers. We reach the same answer as in Section 3.2.1 ifm = 10. Fig. 3.5 shows

that the call-blocking probability is quite high (at 10% even at large ) when m = 10. Furthermore,

Fig. 3.6 shows that a higher N0 load is required to achieve a certain U when m = 10 than when

m is larger. Therefore, we conclude that high-bandwidth circuits, such as m = 10, are not suitable

even for the file-transfer application, unless latency requirements dictate its use.

We see from Fig. 3.5 that using low-bandwidth circuits (m = 1000) does not reduce Pb or

increase U significantly if appropriate values of are selected, although it does not increase N

either (see Fig. 3.7b). Given the natural advantage of lower delay to using lower m for file transfers,we focus the rest of our analysis on the intermediate-bandwidth m = 100 case.

Now we consider the question of what crossover file size, , to select when m = 100. From

Fig. 3.5, we see that should be in the range from 6 MB to 29 MB to meet a utilization higher than

80% and a call-blocking probability lower than 5%. We observe that cannot be too large, because

if it is, then U decreases and the required call-arrival rate, N0, becomes large as seen in Fig. 3.6.

On the other hand, if it is too small, then Pb becomes too high.

To achieve a low call-blocking probability and high utilization, just as we need to choose a

fairly large m (e.g., m = 100) in Section 3.2.1, here we see the need for a fairly high call-arrival

rate, N0 (e.g., N 0 = 100). At an aggregate value N 0 of 100 calls/sec, we also see that

should be in the range from 6 MB to 29 MB. This means that the mean holding time is in the range

of 0.5 s to 2.3 s since the per-circuit rate is 100 Mb/s when m = 100. These mean call-holding times

are significantly smaller than the numbers we consider in Section 3.2.1, where even a mean call-


41/91


holding time of 3 minutes, results in a need for a large number of ports. We see from Fig. 3.5 that

lowering N0 can lower utilization significantly. To engineer an N0 rate of 100 calls/sec, if0

is 1 call every 10 s, it means that we require N to be 1000. This is not a small number and requires a

cascade of switches to build up this load. For example, if the bottleneck link is an enterprise access

link, it requires multiple aggregations from switches internal to the enterprise, whose links can be

run at lower utilization levels, so that the aggregate traffic load for the enterprise access link is high

enough to achieve a high utilization at an acceptable Pb.

Next, we note that the very low mean call-holding times require high-speed signaling engines

to reduce call-setup delays so that they approach round-trip propagation delays, and thus, the circuit

utilization is high. Our work on hardware-accelerated signaling [58] shows the feasibility of im-

plementing an RSVP-TE subset in hardware, which reduces per-switch call processing delays from

the 100 ms range we measured on Sycamore switches to the order of microseconds.

Finally, we note that, although a link capacity of 10 Gb/s is appropriate for SONET/SDH and

MPLS shared links, it is low for a WDM link. If we assume that the shared link supports 100 wave-

lengths, using a typical data rate of 10 Gb/s, link capacity is 1 Tb/s and the per-circuit bandwidth

is 10 Gb/s. Media-immersive applications could consume such high-levels of end-to-end capacity(category of applications where the mean call-holding time is independent of m), but for the file-

transfer application, file sizes should increase significantly to make the use of WDM networks with

GMPLS control-plane protocols usable for file transfers.

3.3 Conclusions

In this chapter, we analyzed the call-blocking mode of operation to determine the types of appli-

cations suitable for GMPLS networks by dividing them into two categories: those for which the

per-circuit capacity is independent of the holding time, and those for which these two variables

are directly related, such as file transfers. We concluded the following for the first category. First,

applications that require high-bandwidth circuits relative to the link capacity (e.g., where the ratio

is one-tenth, say 1 Gb/s circuits on a 10 Gb/s link) are not suitable. Second, applications that re-


42/91


quire low-bandwidth circuits but have short holding times (on the order of a few minutes) require a

high degree of aggregation leading to expenses from large numbers of line cards. Ideal applications

require on the order of one-hundredth the link capacity as per-circuit rates, and have long holding

times. In the second category of applications, we found that the first conclusion to the first category

still holds; however, the second does not because the number of line cards keeps almost constant

for m at a high value. In this category of applications, we also found that calls need to have very

short call-holding times (on the order of seconds).


43/91

Chapter 4

WEB TRANSFER APPLICATION ON CHEETAH

In this chapter, we describe our implementation of a software package, called WebFT, as an applica-

tion for CHEETAH [16]. WebFT accomplishes web transfers across CHEETAH without changing

existing web client and web server software by integrating the CHEETAH end-host software mod-

ules into Common Gateway Interface (CGI) and other external modules.

The main reasons why we chose web transfers as a showcase for CHEETAH are three-fold.

First, web-based applications have become ubiquitous [19] and there is significant interest in im-

proving web performance. Although solutions such as web caching focus on the problems of over-

loaded web servers [9, 17], we focus on improving network performance. Second, according to

the analysis of Chapter 3, CHEETAH network can be operated at a low call-blocking probability

and a high utilization if circuits are on the order of one-hundredth the shared link capacity, for

example, 100 Mb/s on a 10 Gb/s link, and a circuit of 100 Mb/s is suitable for either many small

web file transfers or a single bulk web transfer. Third, many new types of web-based applications,

such as large-file downloads, high-quality video streaming, and remote visualization, require high-

throughput, low-jitter, and deterministic data transfers. These applications need QoS guaranteed

network connectivity. The connectionless sharing mode of the current Internet is inadequate to

provide such connectivity. We contend that the lack of rate-guaranteed network connectivity is hin-

dering these web-based applications from being developed and deployed. An answer to this need

lies in some of the newer networking technologiesfor example, CO networking technologies,

currently under development and deployment. CO networks, such as CHEETAH and DRAGON,

29


44/91

Chapter 4. WEB TRANSFER APPLICATION ON CHEETAH 30

allow for the reservation of bandwidth in the form of a dedicated circuit or VC through the networks

prior to data transfer.

This chapter determines how we can leverage these new CO technologies to improve the per-

formance of web applications. We first describe the WebFT software design and implementation.

Then, we show our experimental results and reason that WebFT can achieve low-variance, end-to-

end transfer delays at different circuit rates and low transfer delays when high-speed circuits are

possible.

4.1 WebFT Design

A primary goal of the WebFT software design is to provide deterministic data-transfer services to

clients connected to a web server via the CHEETAH network. WebFT leverages the coexistence

of two paths between a web client and a web serverthat is, through the Internet and through

the CHEETAH network. It allows clients that have network connectivity to the circuit-switched

CHEETAH network to connect the WebFT server and download web content (e.g., large files or

streamed video) through dedicated end-to-end circuits, while simultaneously providing normal web

access to other nonCHEETAH clients through the Internet. The dedicated nature of the circuits

allows for user data to be streamed unhindered from a web server to a web client via the CHEETAH

network. This results in low-variance transfer delays.

Another goal of the WebFT software design is not to impose any special requirements with

regards to the operating system or the web server or client software packages executed on the client

and server hosts. We leverage the CGI technology to achieve this goal [32].

4.1.1 WebFT Architecture

The WebFT architecture is shown in Fig. 4.1. On the web server side, WebFT includes two CGI

scripts, download.cgi and redirection.cgi, and a process called WebFT sender. Download.cgi is em-

bedded into web pages as a hyperlink, with the name of the file to be served as a parameter. When

the user clicks the download.cgi hyperlink on the web page through any typical web client, the web


45/91


Web serverWeb client

Web Server

(e.g. Apache)

CGI scripts

(download.cgi &

redirection.cgi

URL

Response

WebFT sender

OCS API RD API

RSVP-TE API

C-TCP API

Web Browser

(e.g. Mozilla)

WebFT receiver

RSVP-TE API

C-TCP API

Control messages

via InternetData transfers

via a circuit

OCS daemon

RD daemon

RSVP-TE daemon

RSVP-TE

daemon

Figure 4.1: WebFT architecture

server receives an HTTP message causing download.cgi to be initiated. Download.cgi, in turn, initi-ates the WebFT sender process, which communicates with the WebFT receiver process on the client

host to transfer the data from the server side to the client side. By leveraging the CGI technology,

we avoid requiring any software upgrades to both web servers and web browsers.

Integrated into the WebFT sender and receiver are libraries provided with the CHEETAH end-

host software module described in Section 2.2. Through interaction with the CHEETAH end-host

software modules, the WebFT sender determines whether to use the Internet path or attempt to set

up a CHEETAH circuit, and if deemed appropriate, initiates the setup of a circuit. It then transfers

the user data, and initiates the release of the circuit. If, for some reason, the user data cannot be

transferred via the CHEETAH network (e.g., the client host is not connected to CHEETAH, the file

size is too small, which makes it inefficient to use a circuit, or bandwidth is not available on the

CHEETAH network), the WebFT sender process exits and redirection.cgi is invoked to transfer the

file via the Internet.

4.1.2 CGI Scripts

CGI defines an approach for a web server to interact with external programs, which are often re-

ferred to as CGI programs or CGI scripts. Fig. 4.2 shows the flow of events while running CGI

scripts.1

1This figure is adapted from Writing CGI Applications with Perl by Meltzer and Michalski [32].
http://-/?-


46/91


`

WWW Client HTTP Web Server

HTTP request

HTTP response

Gateway programs

CGI Run CGI

Scripts

Figure 4.2: The flow of events from running CGI scripts

The WebFT package contains two CGI scripts developed in Perl5 on the server side: down-

load.cgi and redirection.cgi. On receiving a request from a client, the web server invokes the

download.cgi script with one input parameter, the requested file name. Download.cgi obtains the

clients primary IP address by querying the environment variable of REMOTE ADDR. It then calls

the WebFT sender process and passes the clients primary IP address and the requested file name to

the WebFT sender process. If the WebFT sender returns indicating a failure to transfer the file over

the CHEETAH network, download.cgi calls redirection.cgi to initiate a normal download of the file

via the Internet.

4.1.3 The WebFT Sender

The WebFT sender is integrated with APIs for the four basic CHEETAH end-host software mod-

ules. Thus, it interacts with the CHEETAH software daemons, including the OCS daemon, the RD

daemon, and the RSVPTE daemon, as shown in Fig. 4.1. The flowchart for the WebFT sender is

shown in Fig. 4.3. Once the sender is initiated by the download.cgi script, it calls the OCS client

module to determine whether the client host is reachable via the CHEETAH network. If the answer

is yes, the OCS client module returns with the IP address and the MAC address of clients secondary

NIC (the one connected to the CHEETAH network).

The WebFT sender then establishes a TCP connection through the host primary NIC via the

Internet to the WebFT receiver, which is running as a daemon on a well-known port in the client

host. Once the TCP connection is successfully established, the receiver sends back a desired CHEE-

TAH circuit rate (based on its receiving capability) and a C-TCP listening port number for the data


47/91


The client can be reached via theCHEETAH network (OCS)

Request a CHEETAH circuit (RD)

Set up a circuit (RSVP_TE client)

Send the file via C-TCP

Release the circuit (RSVP_TE client)

Yes

Yes

Succeed

No

No

Fail

Return Success Return Failure

Figure 4.3: The flow chart for the WebFT sender

transfer on the CHEETAH circuit.

Then, the WebFT sender process calls the RD module (passing the client hosts primary IP

address, secondary IP address, clients desired circuit rate, and file size as arguments) to deter-

mine whether to attempt a CHEETAH circuit setup. The RD module chooses between the two

options based on the loading conditions of the two networks (the Internet and the CHEETAH

circuit-switched network), the round-trip delay time (RTT), and the file size. If it returns a de-

cision to attempt a CHEETAH circuit setup, the WebFT sender process calls the RSVPTE client

module (passing the clients primary and secondary IP addresses and the circuit rate), asking it to

initiate circuit setup.


48/91


If the circuit setup is successful, the WebFT sender process calls the C-TCP send() subroutine,

passing the following arguments: the circuit rate, the clients secondary IP address, the C-TCP

port number on which the client is ready to accept an incoming C-TCP connection on the circuit,

and the file name. The C-TCP send() subroutine opens a socket and connects the client through

the secondary NIC and the CHEETAH circuit. The file is transferred on the dedicated CHEETAH

circuit at a rate equal to the circuit rate.

Once the data transfer is completed, the WebFT sender process invokes the RSVPTE client

APIs to initiate release of the CHEETAH circuit. Finally, it returns a Success indication to the

download.cgi script.

If, during the above-mentioned procedure, the OCS client module determines that the client host

does not have CHEETAH connectivity, or the RD module decides that it is better to use the Internet

path, or the circuit setup initiated by the RSVPTE client module fails, the WebFT sender process

immediately returns a Failure indication to the download.cgi script. The download.cgi process then

calls redirection.cgi to download the file via the Internet as mentioned in Section 4.1.2.

4.1.4 The WebFT Receiver

To avoid manual intervention, the WebFT receiver is designed to run as a daemon on a well-known

port in the background on the client host and to process incoming connection requests from the

WebFT sender automatically. The WebFT receiver is completely independent of web browser soft-

ware, and therefore does not require any modification to the latter. All clients connected to the

CHEETAH network are configured to run this daemon.

The WebFT receiver forks a child process to handle each request for a TCP connection from the

WebFT sender through the primary NIC. The forked WebFT receiver process then creates a TCP

connection with the WebFT sender to accept the request and sends to the latter the information of

a pre-computed desired circuit rate. The circuit rate is typically computed based on the disk access

rate of the client host because with todays technology, disk access rate is usually the bottleneck for

file transfers. The forked WebFT receiver process also sends the listening C-TCP port number for

the data transfer through the secondary NIC on the CHEETAH circuit.


49/91


The WebFT receiver includes the API libraries associated with the RSVPTE client and C-TCP

modules of the CHEETAH end-host software. The RSVPTE client module API library accepts

circuit setup requests from the CHEETAH network and the C-TCP module API library accepts

incoming C-TCP connection requests from the WebFT sender to transfer user data. After a data

transfer is completed, the forked child process terminates and returns to the parent WebFT receiver

process.

4.2 Experimental Testbed and Results

The Linux implementation of WebFT described in the previous section has been tested on the

CHEETAH experimental testbed. This section presents and discusses these

A STUDY OF APPLICATIONS

Documents

Transcript of A STUDY OF APPLICATIONS