Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... ·...

14
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Transcript of Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... ·...

Page 1: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

This article appeared in a journal published by Elsevier. The attachedcopy is furnished to the author for internal non-commercial researchand education use, including for instruction at the authors institution

and sharing with colleagues.

Other uses, including reproduction and distribution, or selling orlicensing copies, or posting to personal, institutional or third party

websites are prohibited.

In most cases authors are permitted to post their version of thearticle (e.g. in Word or Tex form) to their personal website orinstitutional repository. Authors requiring further information

regarding Elsevier’s archiving and manuscript policies areencouraged to visit:

http://www.elsevier.com/copyright

Page 2: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

Fault-tolerant design for wide-area Mobile IPv6 networks

Jenn-Wei Lin *, Ming-Feng YangDepartment of Computer Science and Information Engineering, Fu Jen Catholic University, 510 Chung-Cheng Road, Hsinchuang 242, Taiwan

a r t i c l e i n f o

Article history:Received 31 October 2007Received in revised form 24 July 2008Accepted 25 July 2008Available online 6 August 2008

Keywords:Mobile IPv6Fault toleranceHome agent

a b s t r a c t

Mobile IPv6 provides the mobility management for IPv6 protocol. To establish a reliable Mobile IPv6 net-work, fault tolerance should be also considered in the network design. This paper presents an efficientfault-tolerant approach for Mobile IPv6 networks. In the proposed approach, if a failure is detected inthe home agent (HA) of a mobile node, a preferable survival HA is selected to continuously serve themobile node. The preferable survival HA is the HA that does not incur failure and is neighboring the cur-rent location of the mobile node. The proposed approach is based on the preference of each mobile nodeto achieve the fault tolerance of the HA. Finally, we perform simulations to evaluate the performance ofthe proposed approach.

� 2008 Elsevier Inc. All rights reserved.

1. Introduction

With the rapid development of Internet, IPv6 has been designedto overcome some problems within IPv4 (Deering and Hinden,1998) (e.g. shortage of IPv4 address space, the explosion of routingtables, and security problem, etc.). In the other aspect, there is alsoa growing demand for accessing data using the wireless technol-ogy. To achieve the wireless Internet, mobility support should betaken into Internet. Mobility support in IPv6 (Mobile IPv6) hasbeen standardized by IETF RFC 3775 (Johnson and Perkins, 2004).

Mobile IPv6 is designed to allow a mobile node (MN) to contin-uously sustain its ongoing sessions while changing its location in aMobile IPv6 network. To meet this requirement, each MN is iden-tified by its home address, regardless of its current point of attach-ment. While an MN at its home network, it operates like a fixednode by using the normal routing to send or receive packetsthrough the access router (AR) of its home network. When theMN moves to a foreign network, it will inform its home agent(HA) of such movement by sending a binding update message. Insuch case, if a correspondent node (CN) sends a packet to theMN, the packet will be intercepted by the MN’s HA and then tun-neled to the MN. Here, the packet routing through the HA mayintroduce the triangle routing problem. To avoid the triangle rout-ing problem, Mobile IPv6 also proposes a route optimization mech-anism that allows a CN to directly send packets to the MN bycaching the location information of the MN in the CN.

From the above execution scenario of Mobile IPv6, we can knowthat if a failure occurs in an HA, its responsible MNs cannot per-form binding updates again. In such case, if some packets are re-

quired to go through the faulty HA, such packets will be lost. Thefault tolerance of Mobile IP has been discussed in the literature(Ghosh and Varghese, 1998; Ahn and Hwang, 2001; Cisco Co.Ltd., 2002; Faizan et al., 2005, 2006; Pack and Choi, 2004; Linand Arul, 2003). The proposed approaches in Ghosh and Varghese(1998), Ahn and Hwang (2001), Cisco Co. Ltd. (2002), Faizan et al.(2005, 2006), Pack and Choi (2004) and Lin and Arul (2003) can bedivided into the following two main categories: the redundancy-based scheme and the redirection-based scheme. The redun-dancy-based scheme (Ghosh and Varghese, 1998; Ahn and Hwang,2001; Cisco Co. Ltd., 2002; Faizan et al.,2005, 2006) is based on thehardware redundancy to achieve fault tolerance. The redirection-based scheme (Pack and Choi, 2004; Lin and Arul, 2003) is basedon the workload redirection. In this scheme, the workload of thefaulty mobility agent is taken by the survival mobility agent.Although the approaches of Pack and Choi (2004) and Lin and Arul(2003) do not incur the significant hardware overhead, the work-load redirection is required to be assisted by the network architec-ture (e.g. the hierarchical Mobile IP architecture and a centralizedOAM center in the network).

In this paper, we propose a new fault-tolerant approach for Mo-bile IPv6. Compared to the previous approaches (Ghosh and Vargh-ese, 1998; Ahn and Hwang, 2001; Cisco Co. Ltd., 2002; Faizan et al.,

2005, 2006; Pack and Choi, 2004; Lin and Arul, 2003), there are twofeatures in the proposed approach. The first feature is that the pro-posed approach does not adopt the hardware redundancy and notneed the assistance of the network architecture (e.g. the hierarchi-cal Mobile IP architecture and the centralized OAM center Pack andChoi, 2004; Lin and Arul, 2003). The proposed approach is basedthe home address regeneration to tolerate the HA failure. When anHA fails, each of its responsible MNs (failure-affected MNs) gener-ates a new home address to associate with its preferable survival

0164-1212/$ - see front matter � 2008 Elsevier Inc. All rights reserved.doi:10.1016/j.jss.2008.07.021

* Corresponding author.E-mail address: [email protected] (J.-W. Lin).

The Journal of Systems and Software 82 (2009) 1434–1446

Contents lists available at ScienceDirect

The Journal of Systems and Software

journal homepage: www.elsevier .com/ locate/ jss

Page 3: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

HA. Due to using the new home address, the failure-affected MNcan continue to perform the binding update. In addition, the fail-ure-affected MN can also continuously send packets to a CN. Theother feature of the proposed approach is that it particularly con-siders how to recover the lost packets due to the HA failure. Thisalso solves the problem how to make a CN send packets to a fail-ure-affected MN when an HA fails. Unlike the previous approaches(Ghosh and Varghese, 1998; Ahn and Hwang, 2001; Cisco Co. Ltd.,2002; Faizan et al., 2005, 2006; Pack and Choi, 2004; Lin and Arul,2003), the recovery of the lost packets is not dependent on the TCPlayer, which can avoid incurring the long recovery latency. In theproposed approach, the packet recovery is integrated into the bind-ing update of an MN. To examine the effectiveness of the proposedapproach, we perform extensive simulation experiments to evalu-ate the performance of the proposed approach.

The remainder of this paper is organized as follows: Section 2introduces the background of this paper. Section 3 proposes ourapproach. Section 4 evaluates the overhead of the proposed ap-proach. Section 5 makes the comparison between the proposed ap-proach and previous approaches. Section 6 performs simulations toquantify the performance of the proposed approach. Finally, wemake conclusions and discuss future work in Section 7.

2. Background

This section describes the background knowledge of this paper.First, a wide-area Mobile IPv6 network model is given. Then, webriefly introduce the Mobile IPv6 protocol. Next, we describe themethods used for detecting the HA failure. Last, related work isreviewed.

2.1. Network model

The network model referred in this paper is a wide-area mobilenetwork (not Mobile Internet), as shown in Fig. 1. The Telecom car-rier usually establishes a wide-area mobile network for its mobileusers. In the wide-area mobile network, it has a number of networkdomains which are divided based on the geographic locations. Ini-tially, each MN is located in a network domain as its home net-work. In a network domain, there are several access routers

(ARs), a home agent (HA), and an interconnection router. The ARassists the MNs to forward packets within its located network do-main. The HA provides the mobility support for its responsibleMNs.

The details of the mobility support will be given in the next sec-tion. As for the interconnection router, it is used to transmit pack-ets from a network domain to another one. Among the networkdomains, there is a high-speed fixed network as the backbone forconnecting all the network domains.

2.2. Mobile IPv6

Initially, each mobile node (MN) is assigned a permanent ad-dress as its home address. The mobility of an MN is fixedly servedby the HA in its home network. When the MN moves from its homenetwork domain to another network domain, it will obtain a ‘‘care-of-address” to reflect its current point of attachment. Next, the MNsends a binding update message to its serving HA to store the ad-dress mapping between the home address and care-of-address inthe binding cache of the HA. Then, the HA will respond to theMN with a binding acknowledgment message. Thereafter, the HAcan intercept any packets destined to the home address of theMN and tunnel such packets to the care-of-address of the MN.However, the packet transmission through the HA may cause along transmission delay, especially when the MN and CN are lo-cated in the same network domain. This is called the triangle rout-ing problem. To solve the possible triangle routing problem, MobileIPv6 supports the route optimization to forward packets by usingthe shortest path between the CN and MN. The route optimizationis achieved by maintaining a binding cache in the CN to store theaddresses of the communicating MNs (Johnson and Perkins, 2004).

For the security of binding update, Mobile IPv6 uses IPsec (Kentand Seo, 2005) and Return Routability (Johnson and Perkins, 2004)to ensure the secure binding update execution between the MNand HA and between the MN and CN. In this paper, the proposedapproach is also based on the original Mobile IPb6 security mech-anisms to achieve fault tolerance. As for how to provide more se-cure mechanisms for the Mobile IPv6 network, it is anotherresearch topic, which is now discussed in Fathi et al. (2005) andRen et al. (2006).

CN1

Backbone Network

MN

MN: Mobile Node AR: Access Router HA: Home Agent IR: Interconnection Router

CN2

HA1 HA3

AR AR

IR1

AR

AR

HA2 HA4

MN

IR3

IR5

IR2 IR4

Network Domain 1 Network Domain 3

Network Domain 2 Network Domain 4

Internet

CN: Correspondent Node

A wide-area Mobile IPv6 Network

AR

AR

ARARAR

AR

Fig. 1. Network model.

J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446 1435

Page 4: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

2.3. Failure assumption and detection

This paper only concerns the fault tolerance of the HA entity.For other main entities in the Mobile IPv6 network system (seeFig. 1), they are all the router equipment. The hardware redun-dancy technique has been extensively adopted to handle the faulttolerance of the router (Srivastava, 2003).

The MN and CN can detect the HA failure with the help of theinterconnection router. As shown in the network model (Fig. 1),the interconnection router acts as the packet forwarder of an HA.Before a packet to an HA, the packet is first received by a corre-sponding interconnection router. In Lin and Arul (2003), it also sta-ted that a router is usually collocated with an HA on a LAN of thesame network domain to be the packet forwarder of the HA. If afailure occurs in an HA, the corresponding interconnection routercan detect the failure by introducing the concept of ‘‘Heartbeat”(Khalil, 2002). Each HA will periodically send a heartbeat messageto its corresponding interconnection router. Based on Hinden(2004), the default heartbeat interval (the default time betweentwo sending heartbeat messages) is set to 1 s. In Kuo et al.(2005), it indicated that the heartbeat messages with once per sec-ond introduce a little bandwidth overhead. If the interconnectionrouter does not receive the heartbeat message for a certain periodof time, it can assume that there is a failure in its correspondingHA. Therefore, while an MN performs a binding update to a faultyHA, the corresponding interconnection router will respond to theMN with an ICMP error message (Conta et al., 2006). Then, theMN is also aware of the HA failure. Similarly, the correspondingnode (CN) can also know the HA failure by the interconnection rou-ter. If a CN would like to send data packets to an MN through afaulty HA, the CN will also receive the ICMP error message fromthe interconnection router.

2.4. Related work

A lot of fault-tolerant approaches have been proposed for Mo-bile IP (Ghosh and Varghese, 1998; Ahn and Hwang, 2001; CiscoCo. Ltd., 2002; Faizan et al., 2005, 2006; Pack and Choi, 2004; Linand Arul, 2003). The approaches (Ghosh and Varghese, 1998; Ahnand Hwang, 2001; Cisco Co. Ltd., 2002; Faizan et al., 2005, 2006)belong to the redundancy-based scheme. The basic idea of theredundancy-based scheme is to equip with one or more redundantHAs for each working HA as its backup set. In the approach ofGhosh and Varghese (1998), the redundant HAs are configured inthe standby or load-sharing mode. If a working HA fails, one mem-ber of its redundant HAs will be selected to be the new workingHA. This is done by using the Address Resolution Protocol (ARP)(Plummer, 1982) to map the IP address of the faulty HA onto thenetwork link-layer address of the selected backup member. Otherbackup members can continuously co-work with the new workingHA in the standby or load-sharing mode. However, in the approachof Ghosh and Varghese (1998), it has the binding synchronizationproblem. Whenever an MN performs a binding update to its pri-mary HA, the binding update message is also sent to the corre-sponding redundant HAs. It may introduce a long registrationdelay.

To avoid the long registration delay, the approach of Ahn andHwang (2001) uses the checkpoint and log techniques to storethe binding update information in stable storage. Basically, the ap-proach of Ahn and Hwang (2001) is similar to the approach ofGhosh and Varghese (1998). In Ahn and Hwang (2001), after anHA fails, one of its backup members can acquire the binding updateinformation from stable storage. However, the access to the stablestorage is a time-consuming operation.

In the approach of Cisco Co. Ltd. (2002), it uses the Hot StandbyRouter Protocol (HSRP) to ensure that the redundant HA can imme-

diately take over the faulty HA. The HSRP is developed by Cisco (Liet al., 1998), which enables two or more HAs on a LAN as a singleHA group by sharing the same IP address and MAC (Layer 2) ad-dress. Similar to the approach of Ghosh and Varghese (1998), thisapproach also has the binding synchronization problem.

Unlike the above redundancy-based approaches, the approachof Faizan et al. (2005) aims at Mobile IPv6, not Mobile IPv4. In Fai-zan et al. (2005), it proposes a Virtual Home Agent Reliability Pro-tocol (VHARP) to allow that multiple HAs coexist on the samenetwork domain (home link) and have the same Global IP address.Based on VHARP, all the communication between the MN (CN) andthe HAs is based on the Global HA address. For each MN, its bindingupdate information is stored on at least two HAs. One is the activeHA, and others are the backup HAs. Each HA periodically sends aheartbeat message over the home link. If a failure occurs in the ac-tive HA, other HAs can detect this event since they do not receivethe heartbeat message for a period of time. In such case, all otherHAs simultaneously execute a procedure to select the HA withthe lowest load as the new active HA. The approach of Faizanet al. (2006) enhances the approach of Faizan et al. (2005) to fur-ther consider the home link failure, not only the HA failure. Inthe approach of Faizan et al. (2006), multiple HAs on the same net-work domain are equipped with several home links. Similarly, if anactive home link fails, the new active home link will be selectedfrom other home links.

For the approaches of Pack and Choi (2004) and Lin and Arul(2003), they belong to the redirection-based scheme. The mainidea of this scheme is to perform the workload redirection. In theapproach of Pack and Choi (2004), its proposed fault-tolerantmethod is only suitable for the Hierarchical Mobile IPv6 (HMIPv6)architecture, not available for the general Mobile IPv6 architecture.The HMIPv6 is designed to reduce the registration overhead andhandoff latency by introducing a local home agent entity calledthe mobility anchor point (MAP). The approach of Pack and Choi(2004) mainly copes with the MAP failure, not the HA failure. Inthe proposed fault-tolerant method, it assumed that each MNcan receive two or more router advisement messages from differ-ent MAPs whenever it moves to a new network domain. Then,the MN selects a primary MAP (P-MAP) and a secondary MAP(S-MAP). Next, the MN performs the primary and secondary bind-ing updates. The primary binding update notifies the P-MAP, HA,and communicating CNs of the following information: the MN cur-rent location and the MN’s located MAP. The secondary binding up-date is executed after completing the primary binding update,which main purpose is to make the communicating CNs knowthe alternate reachable MAP of the MN (the MN’s S-MAP). Whena CN (MN) detects the failure of the P-MAP, the S-MAP will be in-stead of P-MAP to forward (send) packets to the MN (CN). From theabove description, the fault-tolerant capability of Pack and Choi(2004) is dependent on the two-MAPs selection algorithm, butthe algorithm is not clearly described in Pack and Choi (2004). Ifthere is only one MAP in a network domain or the P-MAP andS-MAP fail simultaneously, the approach of Pack and Choi (2004)cannot work successfully. For the fault tolerance of the HA, theapproach of Pack and Choi (2004) described that the redun-dancy-based scheme should be used since the HA contains veryimportant binding update information.

The approach of Lin and Arul (2003) is our previous work on thefault tolerance of Mobile IPv4. It redirects the workload of thefaulty HA to survival HAs without the hardware support. Basedon the approach, the twice-tunneling transmission technique isused to assist the packet transmission after the HA failure. Com-pared to the pre-failure, the packet destined to an MN may incura potentially long delay. In the execution of the workload redirec-tion, the centralized OAM center is involved to collect the loadingstatus of all failure-free HAs for evenly assigning the workload of

1436 J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446

Page 5: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

the faulty HA to the failure-free HAs. The centralized OAM centeralso needs to commend all the foreign agents (FAs) in the MobileIPv4 network to collect the binding information of the faulty HA.However, unlike the Mobile IPv4, the Mobile IPv6 has no FAs. Insuch case, the binding information restoration must be done byall the failure-affected MNs to actively perform the binding up-dates with their respective new serving HAs. However, the ap-proach of Lin and Arul (2003) does not discuss how to make eachfailure-affected MN know its new serving HA. Therefore, the ap-proach of Lin and Arul (2003) is not suitable to be used in the Mo-bile IPv6 network architecture for tolerating the HA failure sincethe binding information cannot be restored.

3. The proposed approach

This section presents a new fault-tolerant approach for MobileIPv6. The proposed approach is also based on the workload redirec-tion by using survival (failure-free) HAs to serve the failure-af-fected MNs. The failure-affected MNs indicate the responsibleMNs of the faulty HA. Unlike the previous redirection-based ap-proach Lin and Arul (2003), the proposed approach can be appliedin the Mobile IPv4 and IPv6 networks, and it is not required thesupport of the centralized OAM center. In addition, the selectionof failure-free HAs is from the viewpoint of the failure-affectedMN. It makes each failure-affected MN select its preferable fail-ure-free HA as its new serving HA to reduce the fault-tolerant over-head. Therefore, when an HA fails, a failure-affected MN cancontinuously send packets to a CN. The details are described in Sec-tions 3.1 and 3.2. The other feature of the proposed approach isthat it does not rely on the end-to-end TCP layer to perform the lostpacket recovery. When an HA fails, the packets sent from a CN canbe delivered to a failure-affected MN. The details are described inSection 3.3. In addition, when a faulty HA is recovered from failure,the failure-affected MN can be served back by its original HA. Thisrecovery procedure is also described in Section 3.4.

3.1. Fault tolerance of HA

If an HA fails, its responsible MNs (the failure-affected MNs)cannot continuously perform binding updates to it. Furthermore,the faulty HA is also unable to assist CNs to send packet to itsresponsible MNs. To tolerate the HA failure, the proposed approachassigns more than one HA for each MN. Unlike the redundancybased approaches (Ghosh and Varghese, 1998; Ahn and Hwang,2001; Cisco Co. Ltd., 2002; Faizan et al., 2005, 2006), the proposedapproach does not configure one or more redundant HAs for eachworking HA. As shown in Fig. 1, there is only one HA in each net-work domain. In the proposed approach, the multiple HA assign-ment is from the logical viewpoint to make that each HA in theMobile IPv6 network system can serve any MNs. Fig. 2 illustratesthe basic idea of the proposed approach.

Initially, each MN in its home network domain can know theidentity of its default HA. The default HA is the HA in the MN’shome network domain, which is fixed without changing as theMN moves. In addition, the MN can also know its preferable exter-nal HAs by actively sending a solicited binding update message tothe default HA. The external HAs are the HAs having 1-hop net-work domain distance, 2-hop network domain distance, and soon with the home network domain, as shown in Fig. 3a. The reasonwhy the external HAs can be known from a default HA is explainedas follows. In this paper, the network model is under a wide-areamobile network, not Mobile Internet. The following informationcan be known while establishing the Mobile IPv6 network: thenumber of HAs, the distance relationship between an HA and an-other one, and the addresses of all the HAs in the system. (Notethat the addresses of all the HAs can follow an addressing rule toassign them.) Therefore, it is reasonable for an HA to store the ad-dresses of all other HAs in its external HA list based on the order oftheir location distances with the HA. In practice, the number of theHA addresses stored in the external HA list is dependent on the gi-ven fault-tolerant capability, which will be further discussed inSection 6. In this section, we are based on the highest fault-tolerantcapability to elaborate the proposed approach. Therefore, if thereare n HAs in the network system, the external HA list of an HA isrequired to store the addresses of n � 1 external HAs.

If the large number of the external HAs is concerned, the ad-dresses of the external HAs can be abstracted as a meta-addressform since the addresses of the HAs can be pre-specified by follow-ing an addressing rule. By using the meta-address form and itselfaddress of a default HA, the default HA can infer the addresses ofits corresponding external HAs. For example in Fig. 3a, the interfaceidentifiers (host-portion addresses) of all the HAs are fixedly spec-ified as ‘‘0:0:0:1”.

Additionally, the network domains in a Mobile IPv6 networksystem usually have the contiguous network-prefix addresses.With these two characteristics, in Fig. 3a, the addresses of theexternal HAs for HA4 are simplified as the meta-address form (3,2, and 4). In the meta-address form, the value of the pre-domainsfield is 3. It represents there are three network domains (1, 2,and 3) which network prefix addresses are smaller than the locatednetwork domain (4) of the MN. The value of the post-domains fieldis two which represents there are two network domains (5 and 6)which network-prefix addresses are larger than the located net-work domain (4) of the MN. As for the prefix-piece field, it is setto four which represents the prefix address of the network domainoccupies four pieces. In the IPv6 protocol, the address is 128bits,which is further divided into eight pieces and each piece has 16bits. For the Mobile IPv6, the network-prefix portion of the addressis usually 64 bits and the interface identifier (the host portion ofthe address) is also 64 bits (Jelger and Noel, 2005).

As mentioned above, the default HA will receive a solicitedbinding update message from each of its responsible MNs. Then,the default HA will respond to the MN with a binding acknowledg-ment message. Before sending this message, the default HA at-taches the addresses of its known external HAs on the mobilityoptions field of the binding acknowledgment message. Note thatthe mobility options field already exists in the binding acknowl-edgment message (Johnson and Perkins, 2004), which is used forextension. Several Mobile IPv6 research work (Takahashi et al.,2003; Lu et al., 2005) has also utilized the mobility options fieldto carry necessary information. However, if the size of the attachedaddresses is concerned, the meta-address form of the external HAsis attached on the binding acknowledgment message. Upon receiv-ing the binding acknowledgment message, the MN can know theinformation about its preferable external HAs from the message.Next, the preferable external HA information is changed wheneverthe MN moves to a foreign network domain.

Handoff

Assigning a newserving HA

andretrieving lost packets

Failure detection

Failure detection

Handoff

Updating the preferable

external HAs

Getting the preferable

external HAsInitial

Fig. 2. The basic idea of the proposed approach.

J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446 1437

Page 6: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

When an MN moves to a foreign network domain, the MN alsoneeds to send a binding update message to its default HA. Afterreceiving the binding update message, the default HA will attachthe new preferable external HA information on the binding

acknowledgment message since the MN has changed its location.The new preferable external HA information is found as follows.First, the new located network domain of the MN is inferred fromthe MN’s care-of-address. Then, the new preferable external HAs

MN

HA4

(2-hop distance)

HA2

HA1

HA5

HA6

(1-hop distance)

(2-hop distance)(1-hop dista

nce)

MN: Mobile Node HA: Home Agent

HA3(1-hop

distance)

Bin

ding

ack

now

ledg

men

tw

ith

exte

rnal

HA

s

Homenetwork domain

The initial preferable external HAs

3ffe:200:8:3::1 (HA3)

3ffe:200:8:5::1 (HA5)

3ffe:200:8:2::1 (HA2)

3ffe:200:8:6::1 (HA6)

3ffe:200:8:1::1 (HA1)

3 2 4

Pre-domains Post-domains prefix-piece field

8 bits 8 bits 2 bits

Meta-address form

IP addresses

3ffe:200:8:3::1(HA3)

3ffe:200:8:5::1(HA5)

3ffe:200:8:2::1(HA2)

3ffe:200:8:6::1(HA6)

IPv6 address IPv6 address IPv6 address IPv6 address

128 bits 128 bits 128 bits 128 bits

3ffe:200:8:1::1(HA1)

IPv6 address

128 bitsOR

The preferable external HAs after handoff

3ffe:200:8:5::1 (HA5)

3ffe:200:8:6::1 (HA6)

3ffe:200:8:3::1 (HA3)

3ffe:200:8:2::1 (HA2)

3ffe:200:8:1::1 (HA1)

3 1 4

Pre-domains Post-domains prefix-piece field

8 bits 8 bits 2 bits

Meta-address form

OR

IP addresses

MN

HA4

HA2

HA1

HA5

HA6

HA3

(3-hop distance)

(1-hop distance)

(2-hop distance )

(1-hop distance)

Bindi

ng a

ckno

wle

dgm

ent

with

exte

rnal

HAs

Homenetwork domain

3ffe:200:8:6::1(HA6)

3ffe:200:8:3::1(HA3)

3ffe:200:8:2::1(HA2)

IPv6 address IPv6 address IPv6 address

128 bits 128 bits 128 bits

3ffe:200:8:1::1(HA1)

IPv6 address

128 bits

3ffe:200:8:5::1(HA5)

IPv6 address

128 bits

a

b

Fig. 3. The variation of the preferable external HAs. (a) MN in the home network domain. (b) MN in the foreign network domain.

1438 J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446

Page 7: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

are also found from the external HA list of the default HA. The newpreferable external HAs have 0-hop network domain distance, 1-hop network domain distance, 2-hop network domain distance,and so on with the new located network domain of the MN. Next,the default HA attaches the addresses of the new preferable exter-nal HAs or their meta-address form on the binding acknowledg-ment message and then sends the message to the MN. Uponreceiving the binding acknowledgment message, the MN updatesthe preferable external HA information, as shown in Fig. 3b. InFig. 3b, if the default HA fails, the first preferable external HA ofthe MN is HA5, not HA3 since HA5 is nearest to the new located net-work domain of the MN.

Thereafter, if a default HA fails, this failure event can be de-tected by its corresponding MNs (see Section 2.3). Each of thesefailure-affected MN will utilize its pre-obtained preferable externalHA information to perform the fault tolerance of its default HA, asshown in Fig. 4. First, the failure-affected MN selects the first pref-

erable external HA as the new serving HA since it is nearest to theMN’s current location in comparison with other external HAs (see(2) of Fig. 4). Then, the failure-affected MN generates a uniqueexternal home address by using the address of the first preferableexternal HA and itself address. Note that the Mobile IPv6 protocolallows an MN to have more than one home addresses. The detailedaddress generation process will be given in next subsection. Next,the failure-affected MN performs a new binding update to the firstpreferable external HA by using the address pair (the generatedexternal home address, the current care-of-address). If the firstpreferable external HA is in the faulty or overloading status (see(6) of Fig. 4), the binding update will be unsuccessful. In such case,the failure-affected MN will select next preferable external HA torepeat the above same operations until a suitable preferable exter-nal HA is met. Fig. 5 illustrates the detailed steps for toleratingmultiple HA failures. After meeting a suitable preferable externalHA, the failure-affected MN also needs to perform a binding updatewith each of its communicating CNs (see (7) of Fig. 4) by using thenew address pair (the generated external home address, the cur-rent care-of-address). The above binding update is considered forthe route optimization between the MN and the CN (see Section2.2).

3.2. New home address generation

In the proposed approach, each failure-affected MN is requiredto generate a new home address for being served by its selectedpreferable external HA. The new home address is generated as fol-lows: first, the prefix of the preferable external HA is used as theprefix of the new home address. Then, the suffix of the new homeaddress is based on RFC 4291 (Hinden and Deering, 2006) and RFC3041 (Narten and Draves, 2001) to generate it. If the failure-af-fected MN has a MAC address, the MAC address is transformedas the suffix of its new home address (Hinden and Deering,2006). If the failure-affected MN has not the MAC address, it usesthe MD5 hash function to generate the suffix of the new home ad-dress (Narten and Draves, 2001). Note that MD5 is usually imple-mented in an IPv6 node (Touch, 1995), which is used in theauthentication for computing the digest of a message. By combin-ing the generated prefix and suffix, the new home address of theMN is obtained. Then, the failure-affected MN will perform dupli-

1. When an MN detects a failure in

its default HA

2. Selecting apreferable external HA

3. Generating a uniqueexternal home address

(see Fig. 5 steps 3 and 6)

4. Performing a bindingupdate to the selected

preferable external HA

5. Success

7. Performing a bindingupdate with each

communicating CN

No

6. In the faulty oroverloading status

Yes

Fig. 4. The fault-tolerant process of the HA.

CN1

CN2

CN3

MN: Mobile Node CN: Correspondent Node HA: Home Agent

Default HA 1

3ffe:200:8:1::1HA2

3ffe:200:8:2::1HA3

3ffe:200:8:3::1

Failure Failure

MN

HA2

HA3

::

The preferable exteranl HAs

1. Binding update 7. Binding update

4. Binding update 5. Binding update failure

2. Binding updatefai

lure

3. Select ing preferable external HA2 andgenerat ing external home address

3ffe:200:8:2:258:a6ff:fed3:742 (based on RFC 4291)

6. Select ing preferable external HA3 andgenerat ing external home address

3ffe:200:8:3:258:a6ff:fed3:742 (based on RFC 4291)

8. Binding acknowledgement

9. Binding update witheach communicating CN

MAC address0058:a6d3:0742

MAC address of MNprefix of HA2

MAC address of MNprefix of HA3

Fig. 5. Example of two HA failures in the system.

J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446 1439

Page 8: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

cate address detection (DAD) (Narten and Draves, 2001) for thegenerated new home address. In Soliman (2004), it claimed thatthe generated new home address has a very small duplicate prob-ability with other home addresses. If the generated new home ad-dress is already in use, its suffix-portion address is re-generated byusing a random number.

3.3. Lost packet recovery

The packet delivery from a CN is also affected by the HA failure.This subsection mainly solves the problem that a CN can continu-ously send packets to a failure-affected MN when an HA fails. Theroute optimization is supported in the Mobile IPv6. Therefore,unlike the Mobile IPv4, if an HA fails, the packets destined to a fail-ure-affected MN are affected only when a CN would like to send apacket to the MN, but it has not the MN’s binding information in itsbinding cache. In this situation, the packet will be forwardedthrough the faulty HA, and it will be lost. Furthermore, if the ser-vice session between the MN and the CN is TCP, the packet cannotbe retransmitted successfully unless the faulty HA is recovered. Torecover lost packets and avoid long recovery latency, the intercon-nection router that is located within the network domain of thefaulty HA will be used to assist the packet delivery recovery. Asmentioned in Section 2.3, when an HA fails, the CN will receivean ICMP error message from the interconnection router. To makethe undeliverable packet be sent to the desired failure-affectedMN, the proposed approach first asks the interconnection routerto keep track the sent ICMP error messages for the faulty HA. Thesetracking messages will form an undeliverable packet list (see (2) ofFig. 6).

The proposed approach also asks the CN to store the undeliver-able packets in its buffer (see (4) of Fig. 6). The buffer mechanism isalso used to perform the packet recovery. To clearly understand thelost packet recovery of the proposed approach, Fig. 6 illustratesthe assistance of the undeliverable packet list and buffer as well asthe formats of these two data structures. With the support of thetwo data structures, the recovery process of the undeliverable pack-ets can be integrated into the binding update, as shown in Fig. 7.

Based on the above description, when a default HA fails, eachfailure-affected MN will select its preferable failure-free HA as

the new serving HA to perform its binding update. For recoveringthe lost packets at the faulty HA, the proposed approach asks thefailure-affected MN to additionally send a binding update messageto the faulty default HA in addition to the new serving HA (see (1.a)and (1.b) of Fig. 7). The binding update message to the faulty HA isalso first received by the corresponding interconnection router ofthe faulty HA. In such case, the interconnection router checkswhether there are undeliverable packets addressed to the failure-affected MN by using its undeliverable packet list (see (2.b) ofFig. 7). If so, the interconnection router sends an ICMP error mes-sage containing the relative CN list (see (3) of Fig. 7). Note thatthe ICMP message has a general message body field to includethe required information with a variable length (Conta et al.,2006). The relative CN list is a list of addresses of relative CNs.

After receiving the ICMP error message containing the relative CNlist, the failure-affected MN can know the relative CNs which previ-ously sent undeliverable packets to it. Next, the failure-affected MNsolicits the relative CNs to re-send the undeliverable packets. Here,the retransmission of the undeliverable packets can be integratedinto the binding updates between the failure-affected MN and rela-tive CNs. This is done by the failure-affected MN to send solicitedbinding update messages with its previous home address to the rel-ative CNs (see (4) of Fig. 7). In addition to performing the binding up-date, the relative CNs also find the undeliverable packets destined tothe failure-affected MN from their respective undeliverable packetbuffers based on the previous home address of the failure-affectedMN (see (5) of Fig. 7). Then, the found undeliverable packets are di-rectly sent to the failure-affected MN. Since the failure-affected MNhas registered its new binding information (the new external homeaddress, the care-of-address) on the relative CNs, if such CNs wouldlike to send packets to the failure-affected MN again, the packets canbe directly sent to the failure-affected MN without passing throughthe faulty default HA.

3.4. Failure recovery

For recovering the lost packet at the faulty HA, the failure-af-fected MN additionally sends the binding update message to itsfaulty default HA when it performs a binding update. Therefore,when a faulty HA is recovered from failure, each failure-affected

CN

Network Domain1

IR1

Network Domain3

IR3

MN

AR

2. Tracking the ICMP error message for the faulty HA1

Network Domain2

Faultydefault HA1

Failure

IR2

4. Storing the undeliverable packets

The undeliverable packet list

Destinationaddress

Home addressof MN

::

Sourceaddress

CN

::

The undeliverable packet buffer

Undeliverable packet

Packet

::

Destinationaddress

Home addressof MN

::

MN: Mobile Node AR: Access Router HA: Home Agent IR: Interconnection Router CN: Correspondent Node

1. Packet

3. ICMP error message

Fig. 6. The assistance of the undeliverable packet list and buffer.

1440 J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446

Page 9: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

MN can detect this recovery event since its binding update to thedefault HA is successful, not failure. In such case, the failure-af-fected MN will receive a binding acknowledgment message fromthe recovered default HA. Next, the failure-affected MN uses its ori-ginal home address to be served by the default HA, not the prefer-able external HA.

4. Evaluation

This section evaluates the failure-free and fault-tolerant over-heads of the proposed approach. The recovery overhead of the lostpackets is also evaluated in this section.

4.1. Failure-free overhead

In the proposed approach, each MN needs to maintain the pref-erable external HA information during the failure-free period fortolerating the HA failure. The cost for maintaining the preferableexternal HA information is counted into the failure-free overhead.In the proposed approach, the information maintenance is inte-grated into the binding update. When an MN moves to a new for-eign network domain, the information about the new preferableexternal HAs is attached on the binding acknowledgment message.In reality, the additional size of this message is dependent on thenumber of the attached external HAs. However, if the preferableexternal HA information is converted into a meta-address forminformation (see Section 3.1), the size of the attached informationis only 18 bits. As for the increasing transmission delay of the ex-tended binding acknowledgment message (the binding acknowl-edgment message with the external HA information), it isdependent on the transmission delay of the attached external HAinformation. By simulation experiments, the transmission delayof an extended binding acknowledgment is almost same as thatof the normal binding acknowledgment message (see Section 6).

From the above description, we can know that the failure-freeoverhead of the proposed approach is very small.

4.2. Fault-tolerant overhead

The proposed approach is based on the workload redirection totolerate the HA failure. In Lin and Arul (2003), we have mentionedthat the fault-tolerant overhead of the redirection-based scheme ismainly determined on the performance degradation of a failure-free HA and the transmission latency of the fault-tolerant controlmessages (the transmission latency of the control messages issuedfor fault tolerance). Similar to Lin and Arul (2003), the performancedegradation of a failure-free HA is also represented as the increas-ing blocking probability of a failure-free HA. Unlike the approach ofLin and Arul (2003), the proposed approach is from the MN view-point to tolerate the faulty HA. The increasing blocking probabilityformula in Lin and Arul (2003) can be directly applied in the pro-posed approach, which is derived as follows. Before an HA failure,the blocking probability that an HA has no sufficient binding en-tries for a working responsible MN can be derived by using theM/G/c/c queuing model (Gross and Harris, 1985):

PBefore Blocking ¼

kHA

lHA

� �cHA

cHA!

XcHA

i¼0

kHA

lHA

� �i

i!

ð1Þ

where kHA is the arrival rate of working MNs, 1lHA

is the mean servicetime of a working MN, and cHA is the number of binding entries inan HA.

After an HA fails, the average number of the failure-affectedMNs in the faulty HA can be also obtained by using the M/G/c/cqueuing model (Gross and Harris, 1985), as follows:

Newserving HA 2

1.a. Binding update

2.a. Binding acknowledgement

2.b. Finding the undeliverable packet destined to the MN

5. Updating the binding information of the MN and finding the undeliverable

packets destined to the MN

Network Domain2

Faultydefault HA1

Failure

IR2

Network Domain4

IR4

Network Domain3

IR3

Network Domain1

CN

IR1

MN: Mobile Node AR: Access Router HA: Home Agent IR: Interconnection Router CN: Correspondent Node

MN

AR

3. Sending an IC

MP erro

r messa

ge

with th

e relat

ive CN list

1.b. Binding update

4. Solic ited binding update

with previous hom

e address

6. Re-sending the previous

undeliverable packet destined to the MN

Fig. 7. Recovery of the lost packets due to the HA failure.

J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446 1441

Page 10: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

NMNs ¼XcHA

n¼0

n�

kHA

lHA

� �n

n!

XcHA

i¼0

kHA

lHA

� �i

i!

ð2Þ

These failure-affected MNs will respectively select their preferablefailure-free HAs to serve them. For a failure-free HA, it may servea portion of failure-affected MNs f � NMNs, where f is the fractionof the failure-affected MNs to be served by the failure-free HA.However, the proposed approach also considers the overloading sit-uation. The number of the failure-affected MNs to be served by afailure-free HA is

NServing Affected-MNs ¼Minimumðf � NMNs;MaximumðNThreshold MNs

� NMNs;0ÞÞ ð3Þ

where NThreshold_MNs is the overloading threshold of an HA. If thenumber of MN served by the HA is greater than or equal to thisthreshold, the HA is regarded to be in the overloading status. In suchcase, no failure-affected MNs can be served by the overloading HA.

During the HA failure period, the number of failure-affectedMNs in Eq. (3) will additionally be served by a failure-free HA.For a failure-free HA, the number of the binding entries used forserving its responsible MNs becomes fewer. Therefore, the newblocking probability of a failure-free HA becomes:

PAfter Blocking ¼

kHA

lHA

� �ðcHA�NServing Affected-MNs Þ

ðcHA � NServing Affected-MNsÞ!

XðcHA�NServing Affected-MNsÞ

i¼0

kHA

lHA

� �i

i!

ð4Þ

From (1) and (4), the increasing blocking probability can be evalu-ated as

PHA Blocking ¼

kHA

lHA

� �ðcHA�NServing Affected-MNs Þ

ðcHA � NServing Affected-MNsÞ!

XðcHA�NServing Affected-MNsÞ

i¼0

kHA

lHA

� �i

i!

kHA

lHA

� �cHA

cHA!

XcHA

i¼0

kHA

lHA

� �i

i!

ð5Þ

For the aspect of the fault-tolerant control messages, The existingcontrol messages of the Mobile IPv6 are used to be the fault-toler-ant control messages, as follows:

� Binding update to an external HA.� Binding acknowledgment from an external HA.

The first fault-tolerant control message is for the failure-affected MN to register its binding information with its preferableexternal HA. The second fault-tolerant control message is thecorresponding binding acknowledgment message. As described inSection 3, the failure-affected MN may select the preferable exter-nal HA several times until a suitable external HA is met. Therefore,the above two fault-tolerant messages may be issued severaltimes. The probability that a failure-affected MN will send i timesthe first (second) fault-tolerant control message is

ðPHA Failure þ PHA OverloadÞi�1ð1� ðPHA Failure þ PHA OverloadÞÞ ð6Þ

where PHA_Failure is the probability that an HA is in the faulty statusat an instant of time. PHA_Overload is the overloading probability.

From (6), we know that the transmission times of the first (sec-ond) fault-tolerant control message follow a geometric distribu-

tion. Therefore, the average number of sending the first (second)fault-tolerant control messages is:

11� ðPHA Failure þ PHA OverloadÞ

ð7Þ

From (7), the transmission latency of the issued first (second) con-trol messages can be further inferred as

11� ðPHA Failure þ PHA OverloadÞ

� d ð8Þ

where d is the average transmission delay for sending a binding up-date message (binding acknowledgment message) from an MN (HA)to an HA (MN).

Note that the above fault-tolerant control messages are sentfrom the failure-affected MN or the HA, not from the OAM center.Unlike the approach of Lin and Arul (2003), the transmission la-tency of the fault-tolerant control messages is not dependent onthe bandwidth of the OAM network. In Lin and Arul (2003), it as-sumed that the OAM network can provide a high bandwidth forsending the fault-tolerant control messages, and it did not pre-cisely evaluate the transmission latency of the fault-tolerant con-trol messages. In addition, Lin and Arul (2003) also mentionedthat the transmission latency of the issued fault-tolerant controlmessages is mainly dependent on the number of failure-affectedMNs due to collecting their binding information. In contrast, theproposed approach can make each failure-affected MN simulta-neously send and receive the first and second fault-tolerant controlmessages to select its preferable external HA as the fault-tolerantHA. The transmission latency of the issued fault-tolerant controlmessages is independent of the number of the failure-affectedMNs.

4.3. Packet recovery overhead

To enable a failure-affected MN to retrieve the undeliverablepackets destined to it, the interconnection router and the CN needto track and store the undeliverable packets by using the undeliv-erable packet list and buffer, respectively (see Section 3.3). Therecovery overhead of the undeliverable packets can be measuredas the memory spaces used to track and store the undeliverablepackets. In Section 3.3, it also mentioned that the undeliverablepackets will be retrieved by the corresponding failure-affectedMNs whenever they periodically perform a binding update to thefaulty HA (see Fig. 7). For an undeliverable packet, its stay timein an undeliverable packet list and that in the undeliverable packetbuffer do not excess the interval of the corresponding MN’s twobinding updates. After the time elapsed, the occupied spaces inthe list and buffer can be used to track and store another undeliv-erable packet. Therefore, the memory spaces (MemUP_List) requiredfor an undeliverable packet list can be derived from the number ofpackets sent to the faulty HA during the binding update interval, asfollows:

MemUP List ¼ IntBA � kPacket HA � EUP List ð9Þ

where IntBA is the time interval between two binding updates to thefaulty HA, kPacket_HA is the arrival rate of packets to an HA, andEUP_List is the size (the number of bytes) of an undeliverable packetlist entry.

Similarly, the memory spaces (MemUP_Buffer) required for anundeliverable packet buffer can be also derived from the numberof packets sent from a CN to failure-affected MNs during the bind-ing update interval, as follows:

1442 J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446

Page 11: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

MemUP Buffer ¼ IntBA � kPacket CN � PUP Buffer ð10Þ

where kPacket_CN is the sending rate of packets from a CN, and PUP_-

Buffer is the average size (the average number of bytes) of an unde-liverable packet sent from a CN.

5. Comparison

As described in Section 1, the traditional fault-tolerant ap-proaches for Mobile IP are classified into the redundancy-basedand redirection-based schemes. The comparison between the twoschemes have been made in Lin and Arul (2003). The proposed ap-proach and the approaches of Pack and Choi (2004) and Lin andArul (2003) belong to the redirection-based scheme. Note thatthe approach of Pack and Choi (2004) mainly handles the fault tol-erance of the MAP. In this section, we make detailed comparisonsamong the above three approaches, as shown in Table 1. Thecomparisons are in terms of fault-tolerant support, failure-freeoverhead, fault-tolerant overhead, fault-tolerant capability, andpacket recovery.

� Fault-tolerant support: For the approach of Pack and Choi(2004), the MAP failure is tolerated by assuming that there aretwo or more MAPs in a network domain. With the aspect ofthe HA, it clearly indicated that HA contains very importantbinding information and its fault-tolerant approach shouldadopt the hardware redundancy. In the approach of Lin and Arul(2003), the OAM center is used to assist the workload redirec-tion of a faulty HA. With the OAM center support, the workloadof the faulty HA is distributed to multiple approximate failure-free HAs. As for the proposed approach, the fault tolerance isfrom the MN’s viewpoint. If an HA fails, the failure-affectedMNs individually select their preferable failure-free HAs to servethem. The OAM center is not involved to collect the workload ofall the HAs for supporting the fault tolerance. In addition, theproposed approach is neither dependent on the hardwareredundancy support.

� Failure-free overhead: For the approach of Pack and Choi (2004),an MN chooses two serving MAP while it moves to a foreign net-work domain. Then, two binding updates are performed to thetwo MAPs: the primary and secondary MAPs. The secondaryMAP is only active when the primary MAP fails. With the HAaspect, its failure-free overhead is the binding synchronizationbetween the primary HA and its redundant HA since the hard-ware redundancy is used. For the approach of Lin and Arul(2003), it does not take any actions against the HA failure duringthe failure-free period. For the proposed approach during thefailure-free period, it requires to generate an up-to-date externalHA information for each MN. The generation of the up-to-dateexternal HA information can be integrated into the bindingupdate. The increasing transmission delay on the binding updateis trivial, which will be further validated in Section 6. As for thedata structure used for the external HA information, it only takes18 bits (see Section 3.1: meta-address form).

� Failure-tolerant overhead: For the approach of Pack and Choi(2004), it focuses on the fault tolerance of the MAP, not theHA. For making the secondary MAP take over the faulty primaryMAP, each failure-affected MN needs to perform the bindingupdates to its HA and in communicating CNs for changing itsdefault MAP as the secondary MAP. With the fault-tolerant over-head of the HA, it is determined by the switchover time since thehardware redundancy is adopted to tolerate the HA failure. Forthe approach of Lin and Arul (2003), the performance of a fail-ure-free HA incurs certain degree degradation since some work-load of the faulty HA is redirected to it. However, the workloadredirection in the approach of Lin and Arul (2003) is based onthe OAM center to issue several OAM control messages (see Sec-tion 5.2 of Lin and Arul (2003)). The cost of the issued OAM con-trol messages is not trivial. Especially, the restoration of thefaulty HA’s binding information involves a lot of time-consum-ing operations (see Section 5.2 of Lin and Arul (2003)). Anothermain fault-tolerant overhead for the approach of Lin and Arul(2003) is the twice-tunneling transmission. The twice-tunnelingtransmission is used to redirect the packet transmission of thefaulty HA. It may cause the triangle routing problem to be moreserious. In contrast to the above two approaches, the proposedapproach neither depends on the OAM center nor the hardwareredundancy support. For the fault-tolerant overhead of the HA,its cost includes the performance degradation and the time forfinding a suitable external HA, which have been evaluated inSection 4.2 (see Eqs. (5) and (8)).

� Fault-tolerant capability: For the approach of Pack and Choi(2004), its fault-tolerant range is restricted in a network domain.The faulty MAP (HA) can be tolerated by the secondary MAP (theredundant HA) in the same network domain. If no failure-freeMAPs (redundant HAs) in the same network domain, the fail-ure-free MAPs (redundant HAs) in other network domains can-not be used to achieve fault tolerance. The approach of Lin andArul (2003) and the proposed approach have the same fault-tol-erant capability. The workload redirection range is extended tothe whole network system. If an HA fails, the failure-affectedMNs can be served by anyone failure-free HA in the networksystem. However, the approach of Lin and Arul (2003) cannotbe applied to Mobile IPv6 network system since it uses the for-eign agent (FA) to assist the binding information restoration, butthe FA does not exist in Mobile IPv6.

� Packet recovery: If an HA fails, the packets through it will be lost.The approach of Pack and Choi (2004) does not mention how toretransmit the lost packets. In the approach of Lin and Arul(2003), it clearly indicated that the lost packets due to the HAfailure can be recovered only if the end-to-end TCP transmissionmode is supported in the MNs and CNs (see Section 2.3 of Linand Arul (2003)). As for the proposed approach, the recovery

Table 1Comparison

ComparisonMetrics

Approach of [8] Approach of [9] Proposedapproach

Fault-tolerantsupport

MAP Multiple MAPsin a domain

HA OAM center HA No

HA Hardwareredundancy

Failure-freeoverhead

MAP One additionalbindingupdate

HA No HA Generatingthe externalHAinformationHA Binding

synchronizationFailure-

tolerantoverhead

MAP Binding updateexecution

HA OAMassistancePerformancedegradation,Bindingrestoration,and Twicetunneling

HA Performancedegradationand Findingthe suitableexternal HA

HA Switchoverexecution

Fault-tolerantcapability

MAP Number ofavailable MAPsin a networkdomain

HA Number ofavailable HAsin thenetworksystem

HA Number ofavailableHAs in thenetworksystemHA Number of the

redundant HAsin a networkdomain

Packetrecovery

No End-to-end TCP Binding updateassistance

J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446 1443

Page 12: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

of the lost packets is done by the binding update assistance (seeSection 3.3) without relying on the TCP support, which avoidsincurring the long recovery latency.

6. Experimental evaluation

To quantify the overhead comparisons among the proposed ap-proach and the approaches of Pack and Choi (2004) and Lin andArul (2003), we performed simulation experiments which arebased on the given Mobile IPv6 module of the Network Simulatorversion 2 (NS-2) (Mobiwan, 2002). In the simulation environment,a Mobile IPv6 network system with 10 network domains is first de-ployed. Then, each network domain is equipped with a dedicatedHA and three ARs. Among the network domains, the bandwidthand link delay of the backbone network are set to 100 Mbps and10 ms, respectively. Then, 100 MNs are randomly distributed with-in the 10 network domains. For the mobility aspect of each MN, itis simulated based on the random waypoint model (Broch et al.,1998). In addition, 10 CNs are also set in the Mobile IPv6 networksystem to communicate with MNs by sending the consistent bitrate (CBR) traffic. Each CN periodically generates a 512-bytes pack-et by randomly choosing one of the following three intervals: 0.1 s,0.05 s, and 0.001 s, and then sends the generated packet to one cor-responding MN. Each packet of the CBR traffic adopts the UDPtransport protocol. The time of the simulation run is 5000 s. Duringthe simulation time, the HA failure is randomly occurred. In addi-tion, we also perform another simulation run with 5000 s by usingTCP traffic. For generating the TCP traffic, each CN performs the FTPapplication to transmit a data file with a random size to an MN.

In the above two simulation runs, we concern the failure-freeoverhead, fault-tolerant overhead, and packet recovery overhead.Based on Section 5, we can know that only the packet recoveryoverhead is dependent on the data traffic type and rate used insimulation experiments. For the failure-free overhead and fault-tolerant overhead, they are independent of the data traffic typeand rate used since the two overheads are introduced due to issu-ing some control messages.

6.1. Overhead for tolerating faulty HA

Fig. 8a shows the comparison of the failure-free overhead. Asmentioned above, the fault-tolerant approach of Pack and Choi(2004) for the HA adopts the redundancy-based approach. It needsto perform the binding synchronization during the failure-free per-iod. The binding synchronization will take more time while per-forming the binding update. Correspondingly, the transmissiondelay of the binding acknowledgment message also increases. Forthe approach of Lin and Arul (2003), it does not take any actionsagainst the HA failure during the failure-free period. However,the approach of Lin and Arul (2003) incurs a non-trivial fault-toler-ant overhead. In addition, this approach is not really suitable to beused in the Mobile IPv6 network architecture. As for the proposedapproach, the failure-free overhead is mainly determined by thenumber of the external HAs (backup HAs) attached on the bindingacknowledgment message. Although the large number of externalHAs can enhance the fault-tolerant capability, it also correspond-ingly increases the transmission delay of the binding acknowledg-ment message. However, if the addresses of the attached externalHAs are represented as the meta-address form, the increasingtransmission delay of the binding acknowledgment message isindependent of the fault-tolerant capability. In addition, the aver-age value is also very small as 7.7 ls. In Fig. 8a, the three ap-proaches are all based on the heartbeat messages mechanism todetect the HA failure. In Fig. 8b, we also show the introduced fail-

ure-free overhead of the heartbeat messages by setting the heart-beat interval from 0.1 to 1 s with a step of 0.1 s. In Hinden (2004),the default heartbeat interval is 1 s. The size of the heartbeat mes-sage is set to 64 bytes. As shown in Fig. 8b, even if the heartbeatinterval is very small as 0.l s, the bandwidth consumed by theheartbeat messages is 5 Kbps, which occupies 0.005% (5 Kbps/100 Mbps) of total bandwidth. The heartbeat messages introducea little failure-free overhead.

The comparison of the fault-tolerant overhead is shown inFig. 9. As mentioned in Section 4.2, the fault-tolerant overheadconcerns the following two sub-metrics: the increasing blockingprobability and the transmission latency of the fault-tolerant con-trol messages. In the aspect of the increasing blocking probability(see Fig. 9a), the approach of Pack and Choi (2004) has a very smallvalue since it adopts the redundancy-based scheme to equip aredundant HA for each HA. However, the approach of Pack andChoi (2004) has the expensive hardware cost. For the approachof Lin and Arul (2003) and the proposed approach, they are basedon the redirection-based scheme to redirect the workload of thefailure-affected MNs to failure-free HAs. However, the approachof Lin and Arul (2003) does not consider the overloading situation.Therefore, the increasing blocking probability in the approach ofLin and Arul (2003) increases as the number of failure-affectedMNs increases. Unlike the approach of Lin and Arul (2003), the pro-posed approach considers the overloading situation. For a failure-free HA, if it has been selected to be the preferable fault-tolerantHA for many failure-affected MNs, it will not act as the fault-toler-ant HA for other failure-affected MNs again. Therefore, the increas-

0.000

0.002

0.004

0.006

0.008

0.010

0.012

1 2 3 4 5 6 7 8 9The fault-tolerant capability (Number of backup HAs)

Proposed approach

IPv6 addresses

Meta-address form

Approach of [8]

Approach of [9]

0.0

1.0

2.0

3.0

4.0

5.0

6.0

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Heartbeat interval (second)

Incr

easi

ng tr

ansm

issi

on d

elay

(sec

onds

)Ba

ndw

idth

con

sum

ptio

n (K

bps)

a

b

Fig. 8. The failure-free overhead. (a) The comparison. (b) The heartbeat overhead.

1444 J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446

Page 13: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

ing blocking probabilities are less than a threshold value (e.g. 0.022in Fig. 9a).

For the other concerned metric of the fault-tolerant overhead,its comparison is shown in Fig. 9b. In the approach of Pack and Choi(2004), the faulty HA is switched over to the corresponding redun-dant HA by performing the ARP (Address Resolution Protocol).Although the cost of the ARP execution is trivial, the redundancy-based scheme has a limited fault-tolerant capability and anexpensive hardware cost. For the approach of Lin and Arul(2003), it performs a complicated fault-tolerant procedure whichneeds to take more control messages. As for the proposed ap-proach, it can make each failure-affected MN simultaneouslyselect its preferable failure-free HA as its new serving HA. Thetransmission latency of the fault-tolerant message is independentof the number of the failure-affected MNs, and its average valueis 0.02 s.

In addition, the proposed approach can make each failure-af-fected MN select its neighboring HA as its preferable fault-tolerantHA. To enhance this advantage, we further perform the comparisonof the packet transmission latency through the fault-tolerant HA,as shown in Fig. 10. The proposed approach can provide the small-est packet transmission latency due to selecting the preferablefault-tolerant HA. For the approach of Pack and Choi (2004), thefault-tolerant HA is the equipped redundant HA of the faulty HA.The packet transmission latency through the fault-tolerant HA isnearly same as that before the HA failure. As for the approach ofLin and Arul (2003), the fault-tolerant HA is randomly selectedfrom the failure-free HAs, the packet transmission latency hasthe largest value.

6.2. Overhead for recovering lost packets

As for the packet recovery, Pack and Choi (2004) and Lin andArul (2003) do not consider this issue in their fault-tolerant ap-proaches. Fig. 11 only shows the packet recovery overhead of theproposed approach. As mentioned in Section 4.3, the packet recov-ery overhead is in terms of the sizes of the undeliverable packet listand buffer. As shown in Fig. 11, when adopting the CBR traffic, theboth sizes increase as the binding update interval increases. Theundeliverable packet buffer is larger than the undeliverable packetlist since it is required to store the whole contents of the undeliv-erable packets. As mentioned in Section 3.3, the undeliverablepackets due to the HA failure can be retrieved later from relativeCNs when performing a binding update. Therefore, if there is alarge interval between two binding updates, more undeliverable

0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10

10 20 30 40 50 60 70 80 90 100Number of failure-affected MNs

Proposed approachApproach of [8]Approach of [9]

Pack

et tr

ansm

issi

on la

tenc

y (s

econ

ds)

Fig. 10. The average packet transmission latency through the fault-tolerant HA.

0

1

2

3

4

5

6

7

1 2 3 4 5 6 7 8 9 10Binding update interval (seconds)

Size

of t

he u

ndel

iver

able

pack

et li

st (K

byte

s)

CBR with the interval (0.1, 0.05, or 0.01)

TCP traffic

0102030405060708090

100110

1 2 3 4 5 6 7 8 9 10Binding update interval (seconds)

Size

of t

he u

ndel

iver

able

pack

et b

uffe

r (Kb

ytes

)

CBR with the interval (0.1, 0.05, or 0.01)

TCP traffic

a

b

Fig. 11. The packet recovery overhead. (a) The size of an undeliverable packet list.(b) The size of an undeliverable packet buffer.

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

10 20 30 40 50 60 70 80 90 100 Number of failure-affected MNs

Proposed approachApproach of [8]Approach of [9]

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

10 20 30 40 50 60 70 80 90 100 Number of failure-affected MNs

Proposed approachApproach of [8]Approach of [9]

Incr

easi

ng b

lock

ing

prob

abilit

yTr

ansm

issi

on la

tenc

y (s

econ

ds)

a

b

Fig. 9. The comparison of the fault-tolerant overhead. (a) The increasing blockingprobability. (b) The latency of the fault-tolerant control message.

J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446 1445

Page 14: Author's personal copy140.136.149.173/jwl/TeachingActivities/Computer Fault Tolerant/100... · 2001; CiscoCo. Ltd., 2002; Faizan et al.,2005, 2006) is based on the hardware redundancy

Author's personal copy

packets are generated during the binding update interval. Corre-spondingly, the undeliverable packet list and buffer need morememory spaces to track and store the undeliverable packets,respectively. In Mobile IPv6, the binding update interval is set be-tween 1 and 10 s (Paik et al., 2004). From Fig. 11, we can see thatthe sizes of the undeliverable packet list and buffer (6.1 Kbytesand 100.0 Kbytes) in the CBR traffic are not large even if the bind-ing update interval is set to 10 s.

When the TCP traffic is adopted in simulation experiment, thesizes of the undeliverable packet list and buffer are independentof the binding update interval. The reason is explained as follows:In the TCP protocol, it has the flow control and congestion controlmechanisms which set a window to limit the number of sendingpackets. In the simulation experiments, the window size is set to20 packets, which refers to the TCP module of Network Simulatorversion 2 (NS-2) (Mobiwan (2002)). Therefore, the maximum num-ber of undeliverable packets is 20. Correspondingly, the maximumsizes of the undeliverable packet list and buffer are 0.6 Kbytes (20packets � 32 bytes/per packet list entry) and 20 Kbytes (20 pack-ets � 1 Kbytes/per packet), respectively.

7. Conclusion

This paper has presented an efficient approach to tolerating theHA failure in a Mobile IPv6 network system. The proposed ap-proach is based on the home address regeneration to make eachfailure-affected MN be served by its preferable failure-free HA. Un-like the redundancy-based approaches (Ghosh and Varghese, 1998;Ahn and Hwang, 2001; Cisco Co. Ltd., 2002; Faizan et al., 2005,2006), redundant HAs are not required to be equipped. Comparedto the previous workload-redirection-based approaches (Pack andChoi, 2004; Lin and Arul, 2003), the proposed approach is moresuitable to be used in the Mobile IPv6 network system since it isneither dependent on the hierarchical network architecture nordependent on a centralized information provider (the centralizedOAM center).

Furthermore, the proposed approach also considers how to re-cover the lost packets due to the HA failure. The packet recoveryis not burdened on the TCP layer to avoid incurring long recoverylatency. Finally, we performed simulations to evaluate the over-heads for tolerating the faulty HA and recovering the lost packets.The simulation results show that the two overheads are small.

In addition to the fault tolerance, the security issue is alsoimportant for the Mobile IPv6 network. The proposed fault-toler-ant approach is based on the Mobile IPv6 security mechanism. Itis our future work for providing a more secure environment forthe Mobile IPv6 network.

References

Ahn, J.H, Hwang, C.S., 2001. Efficient fault-tolerant protocol for mobility agents inMobile IP. In: Proc. 15th Int’l Parallel and Distributed Processing Symp., 2001,pp. 1273–1280.

Broch, J., Maltz, D.A., Johnson, D.B., Hu, Y.-C., Jetcheva, J.G., 1998. A performancecomparison of Mmulti-hop wireless ad hoc network routing protocols. In: Proc.ACM Int. Conf. Mobile Computing Networking (MOBICOM), pp. 85–97.

Cisco Co. Ltd., 2002. Mobile IP home agent redundancy. Available from <http://www.cisco.com/en/US/products/sw/iosswrel/ps1830/products_feature_guide09186a0080 087846.html>.

Conta, A., Deering, S., Gupta, M. (Eds.), 2006. Internet Control Message Protocol(ICMPv6) for the Internet Protocol Version 6 (IPv6). RFC 4443.

Deering, S., Hinden, R., 1998. Internet protocol, Version 6 (IPv6) specification. RFC2460.

Faizan, J., El-Rewini, H., Khalil, M., 2005. VHARP: Virtual home agent reliabilityprotocol for Mobile IPv6 based networks. In: Proc. International Conf. onWireless Networks, Communications, and Mobile Computing, Wireless Com2005, Hawaii, USA.

Faizan, J., El-Rewini, H., Khalil, M., 2006. Introducing reliability and load balancingin home link of Mobile IPv6 based networks. In: Proc. of IEEE InternationalConference on Pervasive Services – ICPS 2006, Lyon, France.

Fathi, Hanane, Shin, SeongHan, Kobara, Kazukuni, Chakraborty, Shyam, Imai, Hideki,Prasad, Ramjee, 2005. Leakage-resilent security architecture for Mobile IPv6 inwireless overlay networks. IEEE Journal on Selected Areas in Communications(J-SAC) 23 (11), 2182–2193.

Ghosh, Rajib, Varghese, George, 1998. Fault-Tolerant Mobile IP. WashingtonUniversity Technical Report WUCS-98-11, 1998.

Gross, D., Harris, C.M., 1985. Fundamentals of Queueing Theory. John Wiley & SonsInc..

Hinden, R. (Ed.), 2004. Virtual router redundancy protocol (VRRP). RFC 3768.Hinden, R., Deering, S., 2006. IP Version 6 addressing architecture. RFC 4291.Jelger, C., Noel, T., 2005. Proactive address autoconfiguration and prefix continuity

in IPv6 hybrid ad hoc networks. IEEE SECON 2005, 107–1175.Johnson, D., Perkins, C., 2004. Mobility support in IPv6. RFC 3775.Kent, S., Seo, K., 2005. Security architecture for the internet protocol. RFC

4301.Khalil, M., 2002. Virtual distributed home agent protocol (VDHAP). US Patent 6 430

698.Kuo, Jen-Hao, Te, Siong-Ui, Liao, Pang-Ting, Huang, Chun-Ying, Tsai, Pan-Lung, Lei,

Chin-Laung, Kuo, Sy-Yen, Huang, Yennun, Tsai, Zsehong, 2005. An evaluation ofthe virtual router redundancy protocol extension with load balancing. In: IEEE11th Pacific Rim International Symposium on Dependable Computing,Changsha, Hunan, China.

Li, T., Cole, B., Morton, P., Li, D., 1998. Cisco Hot Standby Router Protocol(HSRP).

Lin, Jenn-Wei, Arul, Joseph, 2003. An efficient fault-tolerant approach for Mobile IPin wireless systems. IEEE Transactions on Mobile Computing.

Lu, Weidong, Lo, Anthony, Niemegeers, Ignas, 2005. Session mobility support forpersonal networks using Mobile IPv6 and VNAT. In: Fifth Workshop onApplications and Services in Wireless Networks (ASWN’05), Paris, France.

Mobiwan, 2002. ns-2 extensions to study mobility in Wide-Area IPv6 Networks.Available from <http://www.inrialpes.fr/planete/pub/mobiwan>.

Narten, T., Draves, R., 2001. Privacy extensions for stateless address autocon-figuration in IPv6. RFC 3041.

Pack, Sangheon, Choi, Yanghee, 2004. Performance analysis of robust hierarchicalMobile IPv6 for fault-tolerant mobile services. IEICE Transactions onCommunications E87-B (5), 7.

Paik, Eun Kyoung, Cho, Hosik, Ernst, Thierry, Choi, Yanghee, 2004. Design andanalysis of resource management software for in-vehicle IPv6 networks. IEICETransactions on Communications E87-B (7).

Plummer, D.C., 1982. Ethernet address resolution protocol: or converting networkprotocol addresses to 48.bit ethernet address for transmission on ethernethardware. RFC 826.

Ren, Kui, Lou, Wenjing, Zeng, Kai, Bao, Feng, Zhou, Jianying, Deng, Robert H., 2006.Routing optimization security in mobile IPv6. Computer Networks 50 (13),2401–2419.

Soliman, Hesham, 2004. Mobile IPv6 Mobility in a Wireless Internet, Addison-Wesley.

Srivastava, S., 2003. Redundancy management for network devices. In: The 9thAsia-Pacific Conference On, pp. 1157–1162.

Takahashi, Takeshi, Harju, Jarmo, Tominaga, Hideyoshi, 2003. Handovermanagement in wireless networks based on buffering and signaling. In:Nineth EUNICE Open European Summer School and IFIP Workshop on NextGeneration Networks, Budapest, Hungary.

Touch, J., 1995. Report on MD5 performance. RFC 1810.

Jenn-Wei Lin received the M.S. degree in computer and information science fromNational Chiao Tung University, Hsinchu, Taiwan, in 1993, and the Ph.D. degree inelectrical engineering from National Taiwan University, Taipei, Taiwan, in 1999. Heis currently an Associate Professor in the Department of Computer Science andInformation Engineering, Fu Jen Catholic University, Taiwan. He was a researcher atChunghwa Telecom Co., Ltd., Taoyuan, Taiwan from 1993 to 2001. His currentresearch interests are fault-tolerant computing, mobile computing and networks,and distributed systems.

Ming-Feng Yang received the M.S. degree in the Department of Computer Scienceand Information Engineering from Fu Jen Catholic University in 2005, and is cur-rently working towards Ph.D. degree in the Graduate Institute of Applied Scienceand Engineering at Fu Jen Catholic University. His current research interests arefault-tolerant computing, mobile computing and networks.

1446 J.-W. Lin, M.-F. Yang / The Journal of Systems and Software 82 (2009) 1434–1446