© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.


Probabilistic Discovery of Semantically Diverse Content in MANETs

Andronikos Nedos, Kulpreet Singh, Raymond Cunningham, and Siobhan Clarke

Abstract —Mobile ad hoc networks rely on the opportunistic interaction of autonomous nodes to form networks without the use of infrastructure. Given the radically decentralised nature of such networks, their potential for autonomous communication is significantly improved when the need for a priori consensus amongst the nodes is kept to a minimum. This paper addresses an issue within the domain of semantic content discovery, namely, its current reliance on the preexisting agreement between the schema of content providers and consumers. We present OntoMobil, a semantic discovery model for ad hoc networks that removes the assumption of a globally known schema and allows nodes to publish information autonomously. The model relies on the randomised dissemination and replication of metadata through a gossip protocol. Given schemas with partial similarities, the randomised metadata dissemination mechanism facilitates eventual semantic agreement and provides a substrate for the scalable discovery of content. A discovery protocol can then utilise the replicated metadata to identify content within a predictable number of hops using semantic queries. A stochastic analysis of the gossip protocol presents the different trade-offs between discoverability and replication. We evaluate the proposed model by comparing OntoMobil against a broadcast-based protocol and demonstrate that semantic discovery with proactive replication provides good scalability properties, resulting in a high discovery ratio with less overhead than a reactive non-replicated discovery approach.

Index Terms —Mobile ad hoc networks, distributed discovery, probabilistic algorithms, gossip protocols, semantic services.

1 INTRODUCTION

Mobile Ad Hoc Networks (MANETs) are formed when autonomous mobile devices with short-range wireless communication capabilities cooperate to provide spontaneous connectivity. Although a challenging environment, MANETs find wide applicability in scenarios where opportunistic networking and collaborative activity are required, e.g., pervasive computing and mobile games. In such a dynamic environment, the location of required content (i.e., services or data) cannot be hard-wired because nodes can acquire transient addresses and networks may partition at any time. The result is that discovery becomes an important process preceding any collaborative effort. In general, discovery attempts to match required and available functionality by using an appropriate representation language and suitable network support.

In MANETs, representation and discovery of content have several unique requirements: the opportunistic nature of the network necessitates reduced human intervention and therefore automated discovery; node autonomy implies that content representation is difficult to standardise and that agreement between providers and consumers is difficult to guarantee; the open and dynamic nature of the network requires discovery mechanisms that scale; and resource-constrained devices dictate efficient distribution of discovery load. We elaborate on each of these requirements below.

When minimum user intervention is required in mobile networks with no stable and permanent set of nodes, manually identifying relevant functionality requires user involvement and constitutes additional discovery latency. Representing services or data with capabilities rather than names offers greater flexibility in discovery queries. A more expressive content description can encourage automated discovery based on queries about explicit functionality [1], rather than discovery based on the implicit association between syntactic interfaces and content. Semantic services are a good example of how ontologies can facilitate a semantic interface for discovery and also represent the capabilities of services.

The authors are with the Distributed Systems Group, Trinity College Dublin, Ireland, e-mail: {nedosa,singhk,racunnin,sclarke}@cs.tcd.ie.

A second requirement for content discovery is the necessary agreement on the information representation between providers and clients. Standardisation of information schemas is one way to address this issue, though the administrative overhead is not negligible and requires centralised infrastructure and planning. The problem of having a standardised representation is exacerbated in mobile ad hoc networks when one considers the self-contained and unpredictable environment. As MANETs can form in places where no Internet connectivity is guaranteed, it follows that nodes cannot avail of common representations on the Internet, and can only use the knowledge of connected nodes. When ontologies provide the representation language, it is not reasonable to assume that information in mobile nodes will be described using commonly agreed ontologies. Rather, the use of multiple and independently developed domain ontologies is more likely. A comprehensive specification of this problem is given in [2], where the authors use the term emergent semantics to describe semantic consensus based not on standardisation but on emergent behaviour.

In the distributed context envisioned by the emergent semantics model, maintaining semantic interoperability remains a strong requirement. Even when data and services are described by multiple and independently developed ontologies, service discovery, content retrieval, and semantic inference should be guaranteed as if operating within a single ontology. To this end, appropriate mechanisms are required to map or translate metadata [3], [4].

Projects like H-MATCH [5], LARKS [6], and GLUE [4] accept semantic heterogeneity and have devised techniques and algorithms for ontology matching, mapping, and evolution.

The additional challenges that MANETs pose to the discovery of semantically diverse content relate to scalability and efficiency. Open networks with nodes that can be both content providers and consumers require discovery protocols that scale. Furthermore, mobility and limited device resources dictate that selected techniques must be adaptable, configurable, and spread load efficiently. In general, application interoperability in MANETs faces issues that require careful consideration of established assumptions such as reliance on standardised syntactic interfaces or availability of common ontologies.

Current discovery architectures for mobile ad hoc networks address the above requirements to varying degrees. Traditional service discovery architectures like Sun’s Jini [7] and IETF’s Service Location Protocol (SLP) [8] were not designed for MANET environments, so they scale poorly and require global knowledge of service templates. Other approaches have concentrated on distributed discovery protocols ([9], [10], [11]) but limit node autonomy by assuming a simple representation language globally known by all mobile peers.

The contribution of this paper lies in the design, analysis and evaluation of a distributed discovery model called OntoMobil. OntoMobil caters for mobile ad hoc networks and semantic decentralisation. Semantic decentralisation is the idea that autonomous nodes can express data or services using different ontologies that are not agreed a priori. The model relies on the decomposition of ontologies into concepts and the replication of these concepts through a gossip protocol. This randomised concept dissemination works in tandem with a lightweight semantic matching mechanism that is executed in each node to facilitate eventual semantic agreement between diverse ontologies and provide a substrate for the scalable discovery of content. The actual discovery employs a random walk protocol whereby semantic queries are first routed to a number of random nodes and are subsequently evaluated by a semantic reasoner at any provider node with a compatible ontology. A stochastic analysis is used to predict performance and illustrate the trade-offs between high discoverability and the overhead incurred by replication. A comparison between the analytical model and an implementation of OntoMobil in ns2 confirms the derived analytical bounds.

The paper has the following layout. Section 2 reviews the state of the art in decentralised and mobile discovery systems. Section 3 describes the model and protocols that comprise OntoMobil. Section 4 provides the stochastic analysis of the gossip protocol and probabilistic bounds of the random walk discovery protocol, while the evaluation is presented in Section 5 and the conclusion in Section 6.

2 RELATED WORK

The importance of content discovery in the design of flexible and adaptive applications can be seen in the proliferation of discovery architectures. This section reviews existing work with similar goals and features to OntoMobil in decentralised semantic topologies and ad hoc discovery protocols.

2.1 Semantic P2P Topologies

The assumption of a centralised architecture and an intuitive scheme to name available resources can simplify the discovery process. Looking through the yellow pages or issuing a Google query are typical examples of this. When centralised architectures cannot be supported, specialised architectures and protocols are required. A characteristic example is P2P networks. Initially, discovery in these systems was confined to naming conventions with limited expressiveness, putting the emphasis on topology properties such as scalability and bounded routing (e.g., Chord, Pastry). Currently, a new generation of P2P networks uses semantic representations coupled with decentralised topologies.

One of the first complete systems to explore the idea of semantic or schema-based topologies is the EDUTELLA project [12]. The goal of EDUTELLA is the distributed discovery of semantic information without the use of common ontologies. In terms of topology, EDUTELLA is based on super-peers, though the clustering algorithm it uses has several flavours.

Haase et al. [13] describe another semantic P2P topology focused on efficiency. The authors start from a completely random network topology and eventually derive a topology that closely matches the similarity in the semantic knowledge of peers. The authors evaluate the hypothesis that certain semantic topologies will perform more efficiently than random topologies. A number of assumptions underlie this hypothesis. Chief amongst them are the assumptions of semantic similarity, stable connectivity, and global semantic knowledge.

INGA [14] exploits query history in order to route semantic queries in P2P networks. It accepts that a common schema between peers is a simplistic assumption and assumes that peers describe content with heterogeneous metadata. Since INGA does not define an exact topology, it selects a number of peers that are likely to contain relevant results through observation and recording of meta-information from user queries. This progressive acquisition of knowledge is a common characteristic between INGA and OntoMobil, though the two systems use very different methods.


OntoMobil differs from existing semantic topologies in its assumptions, its topology and the guarantees it provides. Firstly, OntoMobil is targeted towards mobile ad hoc networks, which are inherently dynamic, composed of autonomous nodes and characterised by transient connectivity. These conditions prevent solutions that rely on stable connections and global semantic knowledge. Secondly, OntoMobil builds an unstructured semantic overlay by maintaining a replicated set of concepts. Although the OntoMobil topology is not intended to be a general-purpose P2P topology, it can still be classified according to the SIL model [15] as a graph with forwarding search and forwarding index links. What further differentiates the OntoMobil topology in relation to other general-purpose content addressable topologies, e.g., PlanetP [16], is the use of multiple weakly consistent replicated indexes in the form of concepts instead of a single index. Finally, the use of epidemic updates to keep concepts weakly consistent has the main benefits of performance, simplicity and an algorithm amenable to an analytical model and predictable guarantees.

2.2 Service Discovery in MANETs

Despite similarities between peer-to-peer and mobile ad hoc networks, there are a number of important differences. In an environment where mobile nodes develop content without a common schema, ontology matching and discovery must be robust to node mobility, consider the processing capacity of nodes, and adapt to variation in the size of the network. We illustrate these design trade-offs by comparing OntoMobil to the following discovery systems in mobile ad hoc networks.

Kozat et al. [9] present a distributed discovery mechanism for mobile ad hoc networks. The mechanism is broker-based and operates very close to the routing layer. There is no suggestion as to what type of services can be supported in such a system as the focus is on the specification of the discovery protocol and its performance evaluation. The broker-based topology designed by Kozat et al. and the semantic overlay created by OntoMobil satisfy different requirements, making a direct comparison difficult. If the broker-based topology were to be used to fulfil the requirements of OntoMobil, the broker nodes would have to bear the sole responsibility for concept matching. This would increase their processing overhead and would also require a specialised hand-off procedure to transfer the matching relations between departing and new broker nodes. Because of mobility, the hand-off procedure and the constant swapping of nodes between brokers and non-brokers would increase the traffic overhead.

Sailhan et al. [10] describe a scalable service architecture for mobile ad hoc and hybrid networks. The architecture is composed of a service representation and a suite of protocols for advertisement and service discovery. The requirement of reduced energy consumption has led to a design that incorporates a set of connected broker nodes forming an overlay network. The core differences between the OntoMobil overlay and the discovery architecture in [10] reside in the representation of content, the fabric of the overlay network, and the use of broker nodes. Because OntoMobil uses ontologies to describe content, the overlay is constructed from metadata, rather than data or services. Furthermore, in OntoMobil every node is considered to be part of the overlay, which simplifies the design by not requiring a special protocol for node to broker communication.

The Group-based Service Discovery (GSD) protocol [17] uses a caching mechanism to improve access times and reduce the overhead associated with semantic service discovery in ad hoc networks. The caching mechanism works by employing an ontology-based classification for all services. The use of cached service concepts to aid discovery in GSD resembles that of OntoMobil. Amongst the differences are the common ontology assumed by GSD and the fact that OntoMobil does not use flooding but relies on a unicast gossip protocol for concept dissemination. This has the potential to improve OntoMobil’s scalability properties and to reduce overhead for nodes that do not participate in the exchange of services. GSD also sketches a service matching approach, which functions by identifying service groups, input–output parameters, and service capabilities as potential matchmaking properties. This can prove suboptimal when the selective forwarding strategy differs from the matchmaking one. For example, as outlined in the design of the forwarding strategy, service groups are sufficient to route a service to a provider. This does not guarantee a matched service, however, as the query signature might still differ from the provided one. OntoMobil addresses this concern by routing requests according to their full semantic signature.

3 ONTOMOBIL: MODEL AND PROTOCOLS

This section presents the OntoMobil model and the detailed specification of the gossip and discovery protocols.

3.1 Model Description

The model specification uses concepts as the core abstraction and randomisation as the main process. The specification begins by considering a finite set of concepts that are split uniformly across nodes. The aim is to compare all concepts in a pair-wise fashion by using all available nodes, while at the same time providing a replication pattern to facilitate rapid discovery. The model uses the intuition that both ontologies and semantic queries are easily decomposed into their constituent concepts. To simplify this abstract model description, each node is assumed to know every other node and protocol events in all nodes are assumed to be synchronised. In reality, the model only requires each node to know a subset of the participating nodes, while the synchrony assumption is relaxed.


Initially, each node randomly selects a fixed subset of its concepts and transmits these concepts to a set of random nodes. In subsequent steps, each node mixes any received concepts with those from its own ontology and performs the same random selection and transmission. To prevent the eventual replication of all concepts into all nodes, the model restricts both the retransmission of received concepts and their propagation. This restriction is necessary to allow the model to scale as the number of concepts increases. The restriction is facilitated by the use of two constraint parameters, a transmission threshold (age) and a propagation threshold (time to live).

The model requires each participating node to contribute a fraction of its resources for ontology matching and replication. Ontology matching combines the pair-wise concept comparison with the randomised concept dissemination to obtain network-wide semantic agreement in a progressive and distributed fashion. By utilising the infrastructure that emerges from replication and semantic matching, a discovery query can now follow a random path through the one-hop neighbourhood of each node until it identifies concepts that are similar to the query’s constituent concepts. Replication guarantees that similar concepts will be found in a number of hops that is probabilistically bound.
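To give a feel for why replication bounds the length of a discovery walk, the sketch below works through a deliberately simplified setting. It is not the stochastic analysis of Section 4: it assumes that a matching concept is replicated on `replicas` out of `num_nodes` nodes and that every hop of the walk lands on a node chosen independently and uniformly at random. The function names and parameters are illustrative assumptions.

```python
import math


def hit_probability(num_nodes: int, replicas: int, hops: int) -> float:
    """Probability that at least one of `hops` random nodes holds a replica.

    Assumption of this sketch: each hop is an independent uniform draw.
    """
    miss_per_hop = 1.0 - replicas / num_nodes
    return 1.0 - miss_per_hop ** hops


def hops_for_confidence(num_nodes: int, replicas: int, target: float) -> int:
    """Smallest hop count whose hit probability reaches `target` (0 < target < 1)."""
    if replicas <= 0:
        raise ValueError("at least one replica is required")
    if replicas >= num_nodes:
        return 1  # every node holds the concept, so the first hop succeeds
    miss_per_hop = 1.0 - replicas / num_nodes
    return math.ceil(math.log(1.0 - target) / math.log(miss_per_hop))
```

Under these assumptions, raising the replication level shortens the walk needed for a given confidence, which is the trade-off between discoverability and replication overhead that the model exposes.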

Such a model offers a scalable mechanism for the discovery of diverse semantic content and an efficient way to match heterogeneous ontologies. It is scalable because each node maintains only partial metadata and efficient because matching load is progressive and distributed across all nodes.

3.2 System Assumptions

We consider an ad hoc network composed of a set of mobile nodes $N = \{n_1, n_2, \ldots, n_N\}$. All nodes are considered to be active participants that maintain ontologies and share content. The model can be extended to include nodes that are not participants, though they are required to have the basic capability to forward packets. It is assumed that all nodes communicate using a fixed-range, wireless medium (e.g., IEEE 802.11) and that a unicast routing protocol (e.g., AODV, OLSR) is available.

Each node $n_i \in N$ maintains three different views: the ontology view, the concept view and the node view. The ontology view represents a node’s ontology as a fixed set of concepts, $V^O_i = \{c_{i1}, \ldots, c_{iG}\}$, where $c_{il} \in V^O_i$, $1 \le l \le G$, is a concept in the ontology of node $n_i$ and $G$ represents the maximum number of concepts per node. The model assumes that these ontologies are static, so this view allows no additions or deletions of concepts for the duration of the protocol execution. We believe that this is a reasonable assumption as ontologies are structured metadata, meaning they are specified during application design and do not change often.

The concept view includes the set of concepts that are received from other nodes. It does not allow duplicate concepts or concepts that exist already in the node’s ontology view. This view is represented as $V^C_i \subseteq \{c_{kl} \mid c_{kl} \in V^O_k,\ k \in N - \{n_i\},\ 1 \le l \le G\}$, where $c_{kl}$ represents a concept from any ontology view except the ontology view of node $n_i$. The gossip protocol provides the concept view with the following properties. It is:

• probabilistically bound – the concept view is not constrained by a fixed size; rather, the protocol guarantees a bound on its size with a certain probability. The intent of the different gossip parameters is to keep the concept view partial, i.e., with a certain probability it should maintain only a subset of the total number of concepts,

• evolving – the gossip protocol constantly inserts and removes concepts,

• a simple random sample – each view does not contain a set of concepts that concretely describe a knowledge domain; rather, it contains randomised concepts that can belong to any of the available ontologies.

The node view is a set of node identifiers, $V^N_i \subseteq \{n_k \mid k \in N - \{n_i\}\}$. Like the concept view, it does not allow duplicate node identifiers and cannot contain the node’s own identifier. It is a membership view that has similar properties to those found in recent gossip protocols [18], [19]. The node view is:

• of a fixed size – contrary to the concept view, the node view does not automatically adapt to an ever-increasing group size. More sophisticated protocols have been proposed that adapt the size of the membership view as the group size increases [20]. The node view is also intended to be partial without containing the complete list of all participants,

• uniform – the probability that a node identifier exists in a specific node view is the same for all node identifiers,

• randomised – the distribution of identifiers in the node views is that of a random distribution.

The node view is populated during a bootstrap phase and is subsequently maintained by the gossip protocol.
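The three views and their membership rules can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `NodeState`, the method names, and the choice to identify a concept by an `(owner node id, index)` pair are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed encoding: a concept is (owner node id, index l within that ontology).
Concept = tuple[str, int]


@dataclass
class NodeState:
    node_id: str
    ontology_view: frozenset[Concept]  # static for the protocol's duration
    concept_view: set[Concept] = field(default_factory=set)
    node_view: set[str] = field(default_factory=set)

    def store_concept(self, concept: Concept) -> bool:
        """Add a received concept unless it is a duplicate or locally owned."""
        if concept in self.ontology_view or concept in self.concept_view:
            return False
        self.concept_view.add(concept)
        return True

    def store_node(self, other_id: str) -> bool:
        """Add a node identifier; duplicates and the node's own id are rejected."""
        if other_id == self.node_id or other_id in self.node_view:
            return False
        self.node_view.add(other_id)
        return True
```

Using a `frozenset` for the ontology view mirrors the static-ontology assumption, while the two mutable sets are the ones the gossip protocol updates.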

3.2.1 Gossip-based Parameters

The gossip protocol is completely characterised by the following parameters:

• $F_c$ – the concept fanout specifies the number of concepts a sender includes in a gossip message,

• $F_n$ – the node fanout specifies the number of destination nodes a gossip message is sent to,

• $T_a$ – age specifies the number of times a node transmits a received concept before the concept is removed from the concept view. For convenience, we define function age(c), which takes concept c as input and returns its age,

• $T_t$ – the time to live (ttl) value is assigned by each sender to any concept that is selected for transmission from its ontology view. It specifies the number of hops in the semantic overlay that a concept will traverse before being discarded, i.e., when the ttl value reaches zero.
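The four parameters can be grouped into one immutable configuration object, as in the hedged sketch below. The class and field names are assumptions for illustration, as is the requirement that every parameter be a positive integer. The helper method only states the arithmetic implied by the definitions: a node that sends one message of $F_c$ concepts to each of $F_n$ destinations transmits $F_c \times F_n$ concept copies per round.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GossipParams:
    concept_fanout: int  # F_c: concepts per gossip message
    node_fanout: int     # F_n: destinations per gossip round
    max_age: int         # T_a: transmissions before a replica is dropped
    ttl: int             # T_t: overlay hops granted to an own concept

    def __post_init__(self) -> None:
        # Assumption of this sketch: all parameters are positive integers.
        for name in ("concept_fanout", "node_fanout", "max_age", "ttl"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")

    def concept_copies_per_round(self) -> int:
        """Concept copies a single node transmits in one gossip round."""
        return self.concept_fanout * self.node_fanout
```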


3.3 Constructing the Semantic Overlay

3.3.1 Join

Before a node begins execution of the gossip protocol, it must first populate its node view with a partial list of other participants. A simple bootstrap protocol is used in order to reach a partial and uniform membership view. The OntoMobil join protocol is based on a simple expanding ring search and is used either when a node enters an ad hoc network or after a node failure.

3.3.2 Leave

The current gossip specification requires an explicit disconnection procedure and does not admit unexpected failures. Before a node leaves the network, it declares its intent as part of the periodic gossip transmission. Nodes that receive such a gossip transmission 1) remove the node identifier if it exists in their node view, 2) remove any concepts from their concept view that originate from the ontology view of the departing node, and 3) remove all references from concepts maintained in their ontology or concept views that point to concepts in the ontology view of the departing node.

In the next gossip round, the nodes that initially received the leave notification will further propagate it, resulting in more nodes executing the same removal sequence as above. Eventually, by gossipping the intention to disconnect, all participants will be reached and all network-wide references to the departing node will be removed.

3.3.3 Gossip Protocol

The OntoMobil gossip protocol is influenced by [19], retaining similar semantics for the ttl and age parameters and incorporating the use of a partial membership view mechanism found in similar protocols. It has been modified to a distinct protocol, described in more detail in [21], to fit the discovery model of OntoMobil by altering the transmit and receive operations, adding the concept fanout parameter and optimising the OntoMobil gossip for a proactive routing protocol. In particular, what distinguishes the two main gossip operations of transmission and reception from [19] is their operation on a finite set of concepts rather than on application generated messages; the transmission of a randomised selection from the union of the concept and ontology views rather than the transmission of buffered messages until such buffer is empty; and ultimately the provision of randomised replication rather than randomised transmission.

Listing 1 describes the transmission of a gossip message. Periodically, a node selects $F_c$ concepts uniformly at random from the union of the ontology and concept views (line 3). A concept that is selected from the ontology view is augmented with the ttl property (line 6), signifying the number of hops the concept will be propagated before being dropped by the receiving node. If a concept is selected from the concept view and has already been transmitted $T_a$ times, it is removed from the concept view (line 9). Selecting a fixed number of random concepts from the union of the two views is a simple algorithm that exhibits a desirable adaptive behaviour. During the initial stages of gossip transmission, a node’s priority is to disseminate its own ontology so that its semantic information is diffused throughout the network. Since the concept view contains few elements during the initial rounds, concepts from the ontology view have a higher probability of being selected for transmission.

 1: //GossipMessage represents the network packet
 2: Every t time units at node j
 3:   Choose F_c random concepts {c_k1, ..., c_kFc} from V^O_j ∪ V^C_j where k ∈ N
 4:   for all c ∈ {c_k1, ..., c_kFc} do
 5:     if c ∈ V^O_j then
 6:       c.ttl ← T_t
 7:     end if
 8:     if c ∈ V^C_j ∧ age(c) = T_a then
 9:       V^C_j ← V^C_j − {c}
10:     else if c ∈ V^C_j then
11:       age(c) ← age(c) + 1
12:     end if
13:   end for
14:   GossipMessage.concepts ← {c_k1, ..., c_kFc}
15:   //Piggyback removal notifications
16:   GossipMessage.removals ← BufferRemovals
17:   //Select destination nodes
18:   Choose F_n random nodes, {n_1, ..., n_Fn} from V^N_j
19:   GossipMessage.src ← {j}
20:   for all nid ∈ {n_1, ..., n_Fn} do
21:     send(nid, GossipMessage)
22:   end for

Listing 1. Gossip Transmission

If the sender has received any leave notifications, theyare also appended to the gossip message (line 16). Thelast action for the sender is to select Fn distinct nodesuniformly at random from its node view (line 18) and tounicast the gossip message to the target nodes (line 21).
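The transmission round described above can be sketched in Python as follows. This is an illustrative reading of Listing 1 under stated assumptions, not the authors' code: the `Replica` and `Node` classes, the dictionary-based message, and the function name `gossip_round` are hypothetical, and the function returns the message and its destinations instead of calling a network `send`. As in the listing, a replica that has reached age $T_a$ is removed from the concept view but is still included in the current message.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Replica:
    ttl: int      # remaining overlay hops, as stored on reception
    age: int = 0  # how often this node has already transmitted the concept


@dataclass
class Node:
    node_id: str
    ontology_view: set[str]
    concept_view: dict[str, Replica] = field(default_factory=dict)
    node_view: set[str] = field(default_factory=set)
    buffer_removals: set[str] = field(default_factory=set)


def gossip_round(node: Node, fc: int, fn: int, ta: int, tt: int,
                 rng: random.Random) -> tuple[dict, list[str]]:
    """Build one gossip message and pick its destinations."""
    pool = sorted(node.ontology_view | set(node.concept_view))
    chosen = rng.sample(pool, min(fc, len(pool)))           # line 3
    concepts = {}
    for c in chosen:
        if c in node.ontology_view:
            concepts[c] = tt                                # line 6
            continue
        replica = node.concept_view[c]
        concepts[c] = replica.ttl
        if replica.age == ta:
            del node.concept_view[c]                        # line 9
        else:
            replica.age += 1                                # line 11
    message = {"src": node.node_id, "concepts": concepts,
               "removals": set(node.buffer_removals)}       # lines 14-19
    peers = sorted(node.node_view)
    targets = rng.sample(peers, min(fn, len(peers)))        # line 18
    return message, targets  # the caller unicasts message to each target
```

Passing the random generator explicitly keeps the sketch reproducible in tests; sorting the views before sampling is needed because `random.Random.sample` expects a sequence.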

On reception of a gossip message, the receiving node executes the algorithm presented in Listing 2. If the optional field GossipMessage.removals exists, then the actions outlined in Section 3.3.2 are executed. The identifiers are then stored in the auxiliary buffer BufferRemovals so they can be further propagated in the next gossip transmission (line 6).

Next, the receiver matches the set of concepts included in the gossip message against its own stored concepts (line 10). We use the H-MATCH algorithm [5] to identify concept equivalence relationships. When two concepts are found equivalent, a reference to the matching concept and its corresponding source node address is appended to both concepts. The current implementation uses RDFS to model ontologies, so two RDFS predicates are appended to both concepts: om:matchesConcept holds a reference to the URI of the matching concept and om:matchesConceptInNode references the corresponding node address.

Subsequent to matching, lines 11–14 show that if a concept is not found in either the concept or the ontology views and the concept’s ttl is greater than one, it is stored in the receiver’s concept view and the concept’s ttl value is decremented by one.

 1: //GossipMessage represents the gossip packet
 2: On reception of a GossipMessage at node j
 3:   //Prune entries from removed nodes
 4:   for all nid ∈ GossipMessage.removals do
 5:     Execute algorithm in Section 3.3.2
 6:     BufferRemovals ← BufferRemovals ∪ {nid}
 7:   end for
 8:   //Process received concepts
 9:   for all c ∈ GossipMessage.concepts do
10:     Execute ontology matching algorithm between c and V^O_j ∪ V^C_j
11:     if c ∉ V^O_j ∧ c ∉ V^C_j ∧ c.ttl > 1 then
12:       c.ttl ← c.ttl − 1
13:       V^C_j ← V^C_j ∪ {c}
14:     end if
15:   end for
16:   //Node view maintenance
17:   if GossipMessage.src ∉ V^N_j then
18:     if |V^N_j| = ParameterNodeView then
19:       Prune V^N_j by removing a random node id
20:     end if
21:     V^N_j ← V^N_j ∪ {GossipMessage.src}
22:   end if

Listing 2. Gossip Reception

Through the ontology matching algorithm, each message has the potential to create new associations between concepts from different ontologies. This progressive aspect of ontology matching results in the following network behaviour: the longer a node is connected to an ad hoc network, the more likely it is that potential associations between its own and other ontologies will be identified. This prevents nodes that are short-lived or transiently connected from immediately overloading the network with the task of complete ontology matching. A concept view size that is probabilistically bound and a fixed concept fanout also ensure that nodes are not overwhelmed with matching large ontologies. Although the redundancy that is inherent in gossip protocols can seem excessive, it is this very feature that allows progressive matching through the exchange of concepts and scalable discovery through concept replication.

The last action a receiver undertakes is node viewrelated. A simple algorithm derived from [18] is used.The algorithm maintains the fixed size of the node view,denoted by ParameterNodeView, while constantly updatingit with a small number of new identifiers (lines 17 – 22).

The benefits of the gossip execution are twofold: first,the ontology matching component in each node can

derive semantic similarity relations between its storedconcepts and those received; second, replicated conceptscan be used to bound the number of hops requiredfor discovery. The gossip protocol is the basis of themodel and provides the foundation on top of whichdistributed ontology matching and semantic discoverycan take place.
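The concept and node view updates performed on reception (lines 9 – 22 of Listing 2) can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary layout of `node` and `msg`, the function name `on_gossip`, and the representation of a concept as a `(name, ttl)` pair are assumptions made for the example, and the removal pruning of Section 3.3.2 and the ontology matching step itself are left out.

```python
import random

def on_gossip(node, msg, node_view_limit, rng=random):
    """Apply one received gossip message to the receiver's views.

    node: dict with 'ontology_view' (set of names), 'concept_view'
          (dict name -> ttl) and 'node_view' (set of node ids).
    msg:  dict with 'src' (sender id) and 'concepts' (list of (name, ttl)).
    All of these structures are hypothetical, chosen for illustration.
    """
    # Lines 9-15: store unknown concepts whose ttl is still above one.
    for name, ttl in msg["concepts"]:
        known = name in node["ontology_view"] or name in node["concept_view"]
        if not known and ttl > 1:
            node["concept_view"][name] = ttl - 1

    # Lines 17-22: keep the node view at a fixed size, adding the sender.
    src = msg["src"]
    if src not in node["node_view"]:
        if len(node["node_view"]) == node_view_limit:
            victim = rng.choice(sorted(node["node_view"]))
            node["node_view"].remove(victim)
        node["node_view"].add(src)
    return node
```

A concept that arrives with a ttl of one is matched but never stored, which is what keeps replication bounded in this sketch.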

3.3.4 Optimised Gossip

An important decision in the design of OntoMobil was the separation between the gossip and the routing protocol. While this separation places mobility adaptation in the routing layer and semantic discovery in the application layer, the transmission of concepts many hops away can increase routing overhead and failed transmissions. We have devised and implemented an optimisation for the gossip protocol that works with a proactive routing protocol, specifically OLSR, and combines the use of topology-aware membership views with a weighted destination selection mechanism. The idea is to have the node view maintain, in addition to the set of node identifiers, an estimation of the number of hops required to reach each of the corresponding nodes. This information can be recorded each time a gossip message is received and maintained together with an associated timer to track the accuracy of the hop distance. Having a node view that is populated with a distance metric can facilitate an algorithm that does not select identifiers uniformly at random but is biased towards those identifiers with a short hop count and timely information. Specifically, in the optimised gossip version the target of a gossip transmission is selected not by uniform random sampling but by weighted random sampling. The node view for node $n_i$ now contains entries of the form $V^N_i = \{(n_k, w_k), \ldots\}$, where $n_k$ is a node identifier and $w_k = (h \times \Delta t)^{-1}$ is the weight for node $k$ that was $h$ hops away from node $i$ when it was inserted in the node view of node $i$ some $\Delta t$ time units ago.
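As a rough illustration of the weighted selection, the following Python sketch draws a gossip target with probability proportional to $w_k = (h \times \Delta t)^{-1}$. The node view layout (a dictionary from node id to a `(hops, inserted_at)` pair), the function names, and the small floor on $\Delta t$ that avoids division by zero are assumptions of this example, not details taken from the OntoMobil implementation.

```python
import random

def weight(hops, age):
    """w_k = (h * delta_t)^-1: closer and fresher entries weigh more."""
    return 1.0 / (hops * age)

def pick_target(node_view, now, rng=random):
    """Weighted random choice of a gossip target.

    node_view: dict node_id -> (hops, inserted_at); hypothetical layout.
    now:       current time in the same units as inserted_at.
    """
    ids = list(node_view)
    # Floor the age so an entry inserted "just now" has a finite weight.
    weights = [weight(h, max(now - t, 1e-9)) for h, t in node_view.values()]
    return rng.choices(ids, weights=weights, k=1)[0]
```

For instance, an entry that is one hop away and was refreshed one time unit ago is four times as likely to be chosen as one that is two hops away and two time units old.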

3.4 Discovery

Discovery queries in OntoMobil [22] accept a set of concepts describing the capabilities of services or generally the semantics of required content. The mechanism that disseminates these queries exploits the randomised overlay to locate nodes having ontologies with concepts equivalent to the concepts that compose the discovery queries. The specification in this section covers this mechanism and assumes that a matchmaking process at each destination node will identify services or data that are semantically similar to those requested and transmit the reply to the source node.

Some common definitions are provided below, followed by the protocol description:

• A node initiating a discovery request becomes the source node of the request.


• A request is composed from a set of concepts. The following notation is used:

$$Q = \{c, \ldots \mid c \in V^O_i,\ i \in \mathcal{N}\}$$

• Matches are stored in $R = \{(c, \{n_i, \ldots\}), \ldots \mid c \in Q\}$, where $n_i$ represents a node that has an ontology containing a concept that matches $c$.

• To simplify the description of the random walk, we define the function $f_S(c)$, where $c$ is a concept with $c \in S$ and $S$ can be either of the two views, i.e., $V^O$ or $V^C$, or $R$. This function returns the set of node identifiers of all matched concepts currently embedded in $c$. In other words, it returns the values obtained from the set of om:matchesConceptInNode predicates.

 1: // RequestMessage represents the semantic query
 2: At source node j
 3: for all c ∈ Q do
 4:     if c ∈ $V^O_j$ then
 5:         R ← $f_{V^O_j}(c)$
 6:     end if
 7: end for
 8: RequestMessage.ttl ← ParameterRequestTTL
 9: Choose a random nid from the 1-hop neighbours of j
10: send(nid, RequestMessage)
11: At each node receiving RequestMessage:
12: for all c ∈ Q do
13:     if c ∈ $V^C \cup V^O$ then
14:         R ← R ∪ $f_{V^O}(c)$ or R ← R ∪ $f_{V^C}(c)$
15:     end if
16: end for
17: if RequestMessage.ttl = 0 or all concepts found then
18:     D = $\bigcup_{c \in Q} f_R(c)$
19:     for all nid ∈ D do
20:         // redirect query to node nid
21:         forward(nid, RequestMessage)
22:     end for
23: else
24:     RequestMessage.ttl ← RequestMessage.ttl − 1
25:     Choose a random nid from 1-hop neighbours
26:     send(nid, RequestMessage)
27: end if

Listing 3. Semantic Discovery

The basic discovery mechanism in OntoMobil is flexible and can support many possible variations. These variations are conditioned on several choices, e.g., whether the concepts composing the query exist in the source node’s ontology or the nature of the operators that form the query condition. The current variation, manifested in the definition of $Q$, is selected because of its simplicity and because it utilises both the mechanisms of concept replication and matching. It does so by allowing the composition of queries from non-local concepts, i.e., concepts that do not exist in the ontology view of the source node. The difference is that when a query is solely composed of concepts that exist in the source node’s ontology, extracting matching concepts can be accomplished either by only periodically polling the local concepts for new matching associations or by using polling followed by a discovery request. The drawback of a polling-only mechanism is that it becomes overly dependent on the progress of the semantic matching. Combining polling with a discovery request can increase the chances of identifying new associations in other nodes. When a query is composed of non-local concepts, a discovery query is mandatory since the query concepts must first be discovered in the network for any subsequently matching associations to be extracted. The motivation to compose queries with non-local concepts stems from the ability to create more complex queries by combining concepts in the node’s ontology with concepts in the network. Furthermore, it allows the discovery of content using predefined queries by nodes that can only hold a minimal ontology but want to avail of all network knowledge.

We briefly describe the discovery protocol by observing the random walk as traversing a graph built by considering each node as a vertex with outgoing edges to the nodes found in its one-hop neighbourhood (lines 1 – 10). The random walk mechanism is used to collect the identifiers of nodes that maintain ontologies with concepts that match the concepts in the discovery query. The node identifiers constitute addresses of potential content providers since a matching relation by definition indicates partially compatible ontologies. When a query condition is satisfied, i.e., all query concepts are located, or when the query’s ttl reaches zero (line 17), the second phase is initiated in which the query is forwarded to the nodes discovered during the first phase (line 21).
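The first phase of the random walk can be sketched as follows. The sketch is a simplification under stated assumptions: every node, including the source, is treated uniformly; `nodes[n]` is a hypothetical dictionary that plays the role of $f_S(c)$ by mapping each concept held in the views of node `n` to the set of node identifiers with matching concepts; and `neighbours[n]` lists the one-hop neighbours of `n`. The function returns the set $D$ of nodes to which the query would be forwarded in the second phase.

```python
import random

def discover(query, source, nodes, neighbours, ttl, rng=random):
    """Random walk that collects ids of nodes with matching concepts.

    query:      iterable of concept names (the set Q).
    nodes:      dict node_id -> {concept: set of matching node ids}.
    neighbours: dict node_id -> list of one-hop neighbour ids.
    ttl:        maximum number of hops (ParameterRequestTTL in the text).
    """
    results = {}              # the structure R: concept -> matching node ids
    current = source
    while True:
        for c in query:
            if c in nodes[current]:
                results.setdefault(c, set()).update(nodes[current][c])
        # Stop when the ttl is exhausted or every query concept was located.
        if ttl == 0 or all(c in results for c in query):
            break
        ttl -= 1
        current = rng.choice(neighbours[current])
    # D is the union of the matching node ids over all located concepts.
    return set().union(*results.values())
```

With `nodes = {0: {}, 1: {"x": {7}}}` and `neighbours = {0: [1], 1: [0]}`, a call such as `discover(["x"], 0, nodes, neighbours, ttl=3)` walks from node 0 to node 1 and yields the single provider id 7.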

4 STOCHASTIC ANALYSIS OF ONTOMOBIL

When considering the applicability of OntoMobil, the size and variability of the concept view become important factors. They influence the scalability of the gossip protocol, as memory consumption, processing overhead, and the probabilistic guarantees for discovery depend on the distribution of replicated concepts across nodes.

This section uses a stochastic analysis to derive a predictive model of the proposed gossip protocol, given the total number of concepts, the number of nodes, and the characteristic parameters $F_n$, $F_c$, $T_t$, and $T_a$. The main aim of the analysis is to formulate a probability measure of the concept view size, which will be used as the foundation for the probabilistic guarantees of the random walk discovery protocol.

The variability of the concept view size is modelled using the random variable $V$. Based on $V$, the probability mass function (pmf) $f_V(v)$ is derived, which contains complete information about the distribution of $V$ to satisfy the main goal of the analysis.

4.1 Analytical Assumptions

To make analytical modelling tractable, a number of assumptions and simplifications are made. Similar to the system assumptions in Section 3.2, the analysis considers a fixed set of participants $\mathcal{N}$ with cardinality $N = |\mathcal{N}|$ that maintain the same number of concepts $G = |V^O|$ in their ontology views. This brings the total number of concepts to $G_{total} = N \cdot G$ (see footnote 1).

Recall that concepts are inserted in the concept view if they do not already exist in the two views $V^C$ and $V^O$. Concepts are also removed when they are selected for transmission from the concept view and their age parameter reaches $T_a$. The analysis presented here focuses on modelling the variability of the concept view size, based on the probability that a certain number of concepts are inserted and removed in each round.

Fig. 2. The variability in the size of the concept view can be measured in space (node slice) or in time (round slice). Data in this figure are taken from simulations with 300 nodes for 300 rounds. Despite the dispersion, it can be seen that both distributions are convergent and approximate the Gaussian. Experiment invariants: $F_c = 4$, $F_n = 2$, $T_t = 2$, $T_a = 2$.

As is evident from the protocol specification, the size of the concept view varies both across nodes and across rounds. Specifically, a node’s concept view varies across rounds, while in a certain round there is variability in the view sizes of the different nodes. We assume that the size distribution across both of these dimensions is Gaussian and that, as the number of rounds and the number of nodes increase, the mean and variance of these respective distributions converge. Figure 2 illustrates the distribution of the concept view size through a normal probability plot taken from a simulation of the gossip protocol with 300 nodes running for 300 rounds. Despite the dispersion between the two data slices and the short-tailed lines, the two slices do approximate the normal distribution, which is sufficient for the purposes of this assumption.

The stochastic model provides an analytical distribution that approximates both experimental distributions as derived from the per node and the per round samples. This implies that the distribution of the concept view size for a single node across time can also be used as an indicator for the distribution of the concept view sizes across all nodes during a single round. Therefore, the behaviour of the gossip protocol is observed from a node that is selected uniformly at random from the nodes in $\mathcal{N}$.

1. The appendix provides a concise table of the symbols used during the analysis.

The four main assumptions that will be used during the analysis are presented below.

A1) Synchronous gossip transmission. The stochastic analysis assumes that all nodes transmit in synchronous intervals. At every round, each node in $\mathcal{N}$ transmits a gossip message to $F_n$ other nodes. Node failures or message omissions are not taken into account and it is assumed that message latency between all nodes is negligible. It can be inferred that messages sent in a round will be received within the same round. Note that this assumption only requires the upper bound on node to node communication to be within the range of the gossip timeout value. This is feasible since there are no real time requirements and the timeout value can be easily adjusted without influencing protocol correctness.

A2) Simple random sample. All concepts in both the concept and ontology views represent a simple random sample selected from the set of all concepts. This assumption is verified by the agreement between the discovery and analytical results that are presented in Section 4.3. Specifically, this assumption implies that the set of concepts in each node’s $V^C \cup V^O$ represents a simple random sample from $\bigcup_{i \in \mathcal{N}} V^O_i$.

A3) Independence of gossip reception and transmission. The transmission and reception of gossip messages are independent events that happen simultaneously. In practice, transmission and reception of gossip messages have an arbitrary order within a round and occur sequentially. However, treating them as simultaneous events allows the composition of reception and transmission into a single well-ordered gossip action.

A4) Sequential insertion of concepts when $F_n > 1$. The last assumption concerns the reception of multiple gossip messages from different senders, i.e., $F_n > 1$. Section 4.2.2 shows that on average $F_c \cdot F_n$ concepts are received by each node in a single round. The assumption of sequential insertion excludes the case where similar concepts may exist in the different $F_n$ transmissions, essentially treating this case similar to the reception of a single gossip message containing $F_c \cdot F_n$ distinct concepts.

4.2 Stochastic Model Specification

The stochastic model considers a network of nodes with empty concept views at the initial round $r = 0$. Let $V_r$ be a random variable representing the concept view size of a randomly chosen node, so that $V_r$ models $|V^C|$ when $r \geq 0$. From the gossip specification, $V_r$ has range $E_V = \{0, 1, \ldots, G_c\}$, where $G_c = G_{total} - G$. The sequence of random variables $\{V_r\}_{r \geq 0}$ can now be considered as a discrete-time and finite-space Markov chain with $E_V$ as its state space. We now proceed to calculate the state vector and transition matrix of $V_r$.


4.2.1 State probability vector

The state space represents all applicable concept view sizes in unit increments. To represent the state probabilities at round $r$ we use the vector

$$\mathbf{p}^{(r)} = \left[\, p^{(r)}_0 \;\; \ldots \;\; p^{(r)}_{G_c} \,\right]' \qquad (1)$$

where $p^{(r)}_i = P[V_r = i]$, $i \in E_V$, is the probability that the concept view has size $i$ after round $r$, with the initial condition $p^{(0)}_0 = P[V_0 = 0] = 1$.

If the stationary vector for $\mathbf{p}^{(r)}$ exists, the state probabilities reach a steady state and the stationary vector represents the required probability measure of the concept view size. Since the stationary vector can be mapped to the probability mass function $f_V(v)$, if it is shown that $\mathbf{p}^{(r)}$ has the stationary property, $f_V(v)$ will have been derived.

4.2.2 Transition probability matrix

Given that the concept view of a random node has size $V_r = i$ after round $r$, the transition to size $V_{r+1} = j$ after round $r + 1$ is subject to the following conditions:

• C1. the number of concepts received by the node,
• C2. the number of concepts inserted in the concept view,
• C3. the number of concepts removed from the concept view because of the node’s transmission and the age parameter threshold.

Condition C1 is approximated by computing the mean number of received concepts in each round. Under the assumptions of uniform node views and synchronous gossip transmission (A1), each node will receive on average $\mu_c = F_c \cdot F_n$ concepts from $F_n$ gossip transmissions.

Knowing the mean number of concepts that a node receives in a round, it can be inferred that, between two rounds, the change in the size of a concept view ranges from $-F_c$ to $+\mu_c$. These bounds correspond to the two boundary cases: no concepts are inserted and $F_c$ concepts are removed, or all $\mu_c$ received concepts are inserted and no concepts are removed.

[Figure 3: chain of states $j - F_c, \ldots, j - 1$, $V_r = j$, $j + 1$, $j + 2$, $j + 3, \ldots, j + \mu_c$]

Fig. 3. States and corresponding transitions for a Markov chain that represents the variability in concept view sizes.

Figure 3 illustrates the possible transitions of the concept view size after a round. Each of these transitions can be represented by an event that has zero or more outcomes. Two numbers compose each outcome: the number of concepts inserted in the view and the number of concepts removed from the view. These two numbers correspond to conditions C2 and C3.

For example, the transition to the state of maximum view size reduction occurs when there are zero insertions and $F_c$ concept removals, while moving to the state representing the maximum increase happens when there are zero removals and $\mu_c$ concept insertions.

The transition to other states can be computed in a similar way, remembering that most transitions are represented by multiple outcomes. An increase of one in the view size can happen because two concepts are inserted and one is removed, or three concepts are inserted and two are removed, etc.

This relationship is revealed if the difference in view sizes between consecutive rounds, i.e., $V_{r+1} - V_r$, is further decomposed into two random variables. Each random variable represents an outcome, with $X$ denoting the number of inserted concepts during a round (C2) and $Y$ denoting the number of concepts removed during the same round (C3). The two random variables have ranges $E_X = \{0, 1, \ldots, \mu_c\}$ and $E_Y = \{0, 1, \ldots, F_c\}$. It follows that $X - Y = V_{r+1} - V_r$.

As stated in assumption A3, insertion and removal are considered as simultaneous and independent events. Let $P_X(x, i) = P[X = x \mid V_r = i]$ and $P_Y(y, i) = P[Y = y \mid V_r = i]$ be used to express the probabilities that $x$ concepts are inserted and $y$ concepts are removed when the concept view size is $i$. The probability that a view change is $V_{r+1} - V_r$ can then be computed by using the sum of products between $P_X(x, i)$ and $P_Y(y, i)$ for all $x \in E_X$ and $y \in E_Y$ where $x - y = V_{r+1} - V_r$.

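The view change probability can be generalised to express the transition probability as:

$$P_{ij} = P[V_{r+1} = j \mid V_r = i] = \begin{cases} \displaystyle\sum_{\substack{x \in E_X,\ y \in E_Y: \\ x - y = j - i}} P_X(x, i)\, P_Y(y, i) & i - F_c \leq j \leq i + \mu_c \\[2ex] 0 & j < i - F_c \ \text{or}\ j > i + \mu_c \end{cases} \qquad (2)$$

We can now proceed to compute the values $P_{ij}$ of the probability matrix by deriving $P_X(x, i)$ and $P_Y(y, i)$.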
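Equation (2) is a sum of products over all insertion and removal counts whose difference equals $j - i$. A minimal Python sketch of how one row of the matrix could be assembled from the two conditional distributions is shown below. The function name and the list-based representation of $P_X(\cdot, i)$ and $P_Y(\cdot, i)$ are assumptions of the example, and products that would land outside the state space are simply skipped.

```python
def transition_row(i, p_insert, p_remove, n_states):
    """Row i of the transition matrix P.

    p_insert[x] stands for P_X(x, i) and p_remove[y] for P_Y(y, i);
    n_states is the size of the state space, i.e. G_c + 1.
    """
    row = [0.0] * n_states
    for x, px in enumerate(p_insert):
        for y, py in enumerate(p_remove):
            j = i + x - y          # new view size after x insertions, y removals
            if 0 <= j < n_states:
                row[j] += px * py  # accumulate every outcome leading to j
    return row
```

For example, with equally likely zero or one insertion and zero or one removal, `transition_row(1, [0.5, 0.5], [0.5, 0.5], 3)` spreads the probability mass over the states 0, 1 and 2 as 0.25, 0.5 and 0.25.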

4.2.3 Concept removal probability

Let $\zeta(i) = G + i$ be a convenience function that returns the number of concepts in a node’s ontology and concept views when the concept view has size $i$. The concept removal probability is calculated assuming that concepts with different age values are uniformly distributed in the concept view. The probability of removing $y$ concepts when the concept fanout is $F_c$ can then be calculated as:

$$P_Y(y, i) = \frac{\dbinom{\lceil i / T_a \rceil}{y} \dbinom{\lfloor \zeta(i) - (i / T_a) \rfloor}{F_c - y}}{\dbinom{\zeta(i)}{F_c}}, \qquad y \in E_Y \qquad (3)$$

Equation (3) expresses the probability of selecting $y$ concepts from the subset of concepts in the concept view that have an age value of $T_a - 1$, while the remaining $F_c - y$ are selected either from the ontology view or from the subset of concepts having an age value different to $T_a - 1$.
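Equation (3) has the form of a hypergeometric probability and can be evaluated directly with binomial coefficients. The sketch below is one possible transcription in Python (it relies on `math.comb`, available from Python 3.8); the function and argument names are chosen for this example only.

```python
from math import ceil, comb, floor

def p_remove(y, i, G, Ta, Fc):
    """P_Y(y, i): probability that y concepts are removed in a round.

    i  : current concept view size
    G  : ontology view size
    Ta : age threshold
    Fc : concept fanout
    """
    zeta = G + i                      # zeta(i): concepts in both views
    expiring = ceil(i / Ta)           # concepts with age value Ta - 1
    others = floor(zeta - i / Ta)     # every other concept in the two views
    return comb(expiring, y) * comb(others, Fc - y) / comb(zeta, Fc)
```

Because `expiring + others` always equals `zeta`, the values of `p_remove(y, ...)` for `y` in `range(Fc + 1)` add up to one whenever `Fc` does not exceed `zeta`, which is a useful sanity check on the transcription.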


4.2.4 Concept insertion probability

To calculate the concept insertion probability, it is necessary to identify the number of received concepts that (a) have a ttl value that is greater than one and (b) do not exist in the concept and ontology views of the receiver. Let $\eta(i)$ be a function that returns the number of concepts in the sender that have a ttl value greater than one and do not exist in the receiver’s views.

The probability of inserting $x$ concepts having received $\mu_c$ can be calculated as:

$$P_X(x, i) = \frac{\dbinom{\lceil \eta(i) \rceil}{x} \dbinom{\lfloor \zeta(i) - \eta(i) \rfloor}{\mu_c - x}}{\dbinom{\zeta(i)}{\mu_c}}, \qquad x \in E_X \qquad (4)$$

Equation (4) expresses the probability that $x$ concepts will be selected from $\eta(i)$ concepts and therefore inserted, while the remaining $\mu_c - x$ concepts will either have a ttl value of one or exist in the receiver’s views and will accordingly not be inserted.

4.2.5 Computation of η(i)

As stated in Section 4.2.4, the two factors of the gossip protocol that influence the insertion of received concepts into the concept view are: (F1) the ttl values of the received concepts and (F2) the existence of the received concepts in the concept or ontology views of the receiving node.

In order to compute the probability that the number of concepts selected by the sender will be inserted in the receiver’s view, it suffices to relate both F1 and F2 to the view size of the gossip sender. Each of the selected concepts must have a ttl value different to one and cannot exist in the receiver’s views. To formulate the concept insertion probability, we require expressions, predicated on view size, from which to derive the number of concepts having a certain ttl value and the number of replicated concepts between the sender and the receiver. The following paragraphs outline an analytical derivation of the two expressions.

To compute $\eta(i)$ we must find the distribution of ttl values across the ontology and concept views of a single node for F1 and the distribution of replicated concepts between a sender and a receiver for F2. Since both factors depend on the size of the concept view, their calculation must take place for all concept view sizes.

Knowing that concepts in the ontology view have a ttl value of $T_t$, the concept view will contain concepts with ttl values in the range $[1, \ldots, T_t - 1]$. Figure 4 illustrates the intuition that in each round, the number of concepts with a ttl value of $\tau$ is proportional to the number of concepts that had a ttl value of $\tau + 1$ in the previous round.

Fig. 4. The distribution of ttl values across concepts in a concept view is proportional to the ttl distribution across concepts in both views of the previous round.

Knowing that the ttl distribution, when the concept view has size $i$, derives from the ttl distribution of the immediately preceding size $i - 1$, we can use a recurrent function, $g(\tau, i)$, to compute the number of concepts having ttl value $\tau$ when the size of the concept view is $i$. Function $g(\tau, i)$ has domain and range:

$$g(\tau, i) : \{1, \ldots, T_t\} \times \{0, \ldots, G_c\} \rightarrow \{0, \ldots, \max\{G, i\}\}$$

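and is specified as:

$$g(\tau, i) = \begin{cases} 0 & \text{if } \tau \in \{1, \ldots, T_t - 1\},\ i = 0 \\[1ex] G & \text{if } \tau = T_t,\ i \in \{0, \ldots, G_c\} \\[1ex] i \cdot \dfrac{g(\tau + 1, i - 1)}{\sum_{\kappa = 2}^{T_t} g(\kappa, i - 1)} & \text{if } \tau \in \{1, \ldots, T_t - 1\},\ i \in \{1, \ldots, G_c\} \end{cases} \qquad (5)$$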
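The recurrence in equation (5) can be tabulated bottom-up, starting from the empty concept view. The following Python sketch fills such a table iteratively; the function name `ttl_table` and the list-of-dictionaries layout are assumptions made for illustration.

```python
def ttl_table(G, Tt, max_size):
    """Tabulate g(tau, i) for tau in 1..Tt and i in 0..max_size.

    table[i][tau] is the (possibly fractional) number of concepts with
    ttl value tau when the concept view has size i.
    """
    first = {tau: 0.0 for tau in range(1, Tt)}   # case i = 0, tau < Tt
    first[Tt] = float(G)                         # ontology view concepts
    table = [first]
    for i in range(1, max_size + 1):
        prev = table[-1]
        denom = sum(prev[k] for k in range(2, Tt + 1))
        row = {tau: i * prev[tau + 1] / denom for tau in range(1, Tt)}
        row[Tt] = float(G)
        table.append(row)
    return table
```

By construction, the entries for $\tau < T_t$ in row $i$ add up to $i$, so the recurrence distributes exactly the $i$ concepts of the concept view over the possible ttl values.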

To compute the probability that a transmitted concept exists in the receiver’s views (i.e., either the concept or the ontology view), we first need to calculate the number of identical (replicated) concepts in both views between the sender and the receiver. Using an overall set to represent all concepts, under the simple random sample assumption (A2) we can use two distinct subsets of equal size selected randomly and with replacement from the overall set to represent the sender’s and receiver’s concepts.

Let the random variable $Z$ represent the number of replicated concepts between two nodes. The range of $Z$ is $E_Z = \{0, \ldots, \zeta(i)\}$ and its probability distribution can be expressed as:

$$P_Z(z, i) = \frac{\dbinom{\zeta(i)}{z} \dbinom{G_{total} - \zeta(i)}{\zeta(i) - z}}{\dbinom{G_{total}}{\zeta(i)}} \qquad \text{for } z \in E_Z \qquad (6)$$

Equation (6) expresses the probability that $z$ concepts are identical when both the sender’s and the receiver’s views have size $\zeta(i)$. We derive the distribution knowing that $\zeta(i)$ concepts have already been selected from $G_{total}$ for the first subset and formulating the probability distribution of identical concepts based on the second subset. Since the size of the second subset is also $\zeta(i)$, the probability distribution is calculated by having the second subset be composed of $z$ items chosen from the first subset, therefore considered identical, while the remaining $\zeta(i) - z$ are selected from the remaining set of all available concepts. Simulation results indicate that the expected value $E[Z]$ is a good approximation to the mean number of identical concepts in both views between the sender and the receiver.

Assuming that the expected number of identical concepts is spread uniformly across the two views in each node, the probability that a transmission contains a certain number of identical concepts can be easily determined.

Factors F1 and F2 can now be expressed with equations (5) and (6), resulting in the following equation for $\eta(i)$:

$$\eta(i) = \sum_{\tau = 2}^{T_t} g(\tau, i) - \left( \frac{\sum_{\tau = 2}^{T_t} g(\tau, i)}{\zeta(i)} \cdot E[Z] \right) \qquad (7)$$
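As a worked simplification, if $E[Z]$ is taken to be the mean of the hypergeometric distribution in equation (6), equation (7) collapses to a single product. The shorthand $S(i)$ below is introduced only for this derivation and does not appear in the original analysis.

```latex
% Mean of the hypergeometric distribution in (6): zeta(i) draws from a
% population of G_total that contains zeta(i) "identical" concepts.
E[Z] = \zeta(i) \cdot \frac{\zeta(i)}{G_{total}} = \frac{\zeta(i)^2}{G_{total}}

% Shorthand (assumption of this example): concepts with ttl greater than one.
S(i) = \sum_{\tau = 2}^{T_t} g(\tau, i)

% Substituting into (7):
\eta(i) = S(i) - \frac{S(i)}{\zeta(i)} \cdot \frac{\zeta(i)^2}{G_{total}}
        = S(i) \left( 1 - \frac{\zeta(i)}{G_{total}} \right)
```

Read this way, $\eta(i)$ is the number of concepts still eligible for insertion by ttl, scaled by the fraction of all concepts that the receiver does not yet hold.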

4.2.6 Stationary distribution

By construction, the probability matrix derived from equation (2) is finite. It can be shown [23] that the Markov chain is irreducible and aperiodic if, for some power of the matrix, all its elements are positive.

Each transition matrix constructed under the current model has elements that have either positive or zero values. Each matrix is formulated with a set of well-defined rules, with the following property:

$$P_{[ij]} = \begin{cases} > 0 & \text{if } i - \beta < j \leq i + \alpha \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

where $P_{[ij]}$ represents the matrix element at position $ij$, which has a value calculated from $P_{ij}$. Specifically, $\alpha$ corresponds to $\mu_c$ and $\beta$ to $F_c$. When $\alpha > 1$ and $\beta > 1$, it is easy to show that for some $r > 1$, the transition probability matrix $P^r$ will have all its elements positive.

Knowing that the state probability vector $\mathbf{p}^{(r)}$ results in a stationary vector and having formulated the transition matrix, the probability distribution of $V$ is the stationary vector $\lim_{r \to \infty} \mathbf{p}^{(r)}$, which can be computed from:

$$\mathbf{p}^{(r)} = \mathbf{p}^{(r-1)} P \qquad (9)$$
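Equation (9) suggests a straightforward way of approximating the stationary vector: start from the empty concept view and apply the transition matrix repeatedly until the state vector stops changing. The Python sketch below does exactly that with plain lists; the stopping tolerance, the round limit, and the function name are assumptions of the example rather than values from the analysis.

```python
def stationary(P, tol=1e-12, max_rounds=100000):
    """Iterate p(r) = p(r-1) P from p(0) = [1, 0, ..., 0].

    P is a row-stochastic matrix given as a list of rows.
    Returns the state vector once successive iterates differ by
    less than tol in every component (or after max_rounds rounds).
    """
    n = len(P)
    p = [1.0] + [0.0] * (n - 1)    # initial condition: empty concept view
    for _ in range(max_rounds):
        nxt = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

On a toy two-state chain such as `[[0.5, 0.5], [0.25, 0.75]]` the iteration settles on a vector close to (1/3, 2/3), which can be verified by hand from the balance equations.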

4.2.7 Discovery Distribution

We can now derive the probabilistic bound on concept discovery for a query composed of non-local concepts and specified according to the protocol in Section 3.4:

$$P_{Disc} = 1 - \left( \Bigl(1 - \overbrace{\frac{|V^C|}{G_{total}}}^{(i)}\Bigr) \cdot \Bigl(1 - \overbrace{\frac{|V^C| + |V^O|}{G_{total}}}^{(ii)}\Bigr)^{n-1} \right)^{|Q|} \qquad (10)$$

where $n \geq 1$ represents the number of nodes in the random walk path, including the source node, (i) is the probability that a concept exists in the concept view of the source node, (ii) is the probability that a concept is found in either of the two views, and $|Q|$ represents the number of concepts in the query.
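Equation (10) is easy to evaluate numerically once the view sizes are known. The Python sketch below is a direct transcription; the function and parameter names are assumptions of the example, and the view sizes in the usage note are made-up inputs rather than results from the paper.

```python
def p_disc(concept_view, ontology_view, g_total, n, q=1):
    """Discovery probability of equation (10).

    concept_view  : |V^C|, concept view size
    ontology_view : |V^O|, ontology view size
    g_total       : total number of concepts in the network
    n             : nodes on the random walk path, source included (n >= 1)
    q             : |Q|, number of concepts in the query
    """
    miss_source = 1 - concept_view / g_total                     # 1 - (i)
    miss_other = 1 - (concept_view + ontology_view) / g_total    # 1 - (ii)
    return 1 - (miss_source * miss_other ** (n - 1)) ** q
```

With hypothetical inputs such as `p_disc(50, 10, 100, n=2)`, the source misses the concept with probability 0.5 and the second node with probability 0.4, giving a discovery probability of about 0.8. Larger values of `n` push the result towards one.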

4.3 Evaluation of Stochastic Analysis

This section evaluates the stochastic properties of the gossip protocol by studying the accuracy of the analytical model against simulation results. In all experiments the same parameter values were used for both the simulation and the analysis. We chose two network sizes of 30 and 60 nodes. The varied gossip parameters were node fanout ($F_n$), age ($T_a$), and the concept ttl ($T_t$), while the concept fanout ($F_c$) and the ontology view size ($|V^O|$) were kept constant at 4 and 10 concepts respectively. Section 5 presents more details on the simulation environment.

Fig. 5. Comparative results for the size of the concept view between the ns2 simulation and the analysis for a range of the characteristic parameters.

Fig. 6. Comparison of the discovery ratio between the ns2 simulation and the analytical results predicted by equation (10).

Figure 5 compares the results of the analysis and the ns2 simulations. For simulation results, the 95% confidence interval is calculated over 350 rounds of protocol execution. In each experiment, all nodes record the size of their concept view after every transmission. To illustrate results closer to the steady state distribution of the concept view, the initial 30 recordings are discarded. Each simulation point in the graph represents the median value from a sequence of average concept view sizes obtained from five experiments. For the analytical results, what is depicted is the expected value of the state vector ($E[V]$).

For almost all parameter permutations, the stochastic analysis gives a very accurate prediction of the behaviour of the gossip protocol. The variability that is observed when the concept view size is large is caused partly by the analytical assumptions made during the derivation of the transition matrix in the analysis and partly by the use of weighted random sampling in the optimised gossip implementation and the failed transmissions in the simulation.

Figure 6 shows the comparative performance between the ns2 simulation and the analysis of the random walk discovery protocol. In each query, the source node selects one concept ($|Q| = 1$) uniformly at random from a set that includes all available concepts, except the ones in the ontology view of the source node. The source node subsequently searches its concept view for that concept and, if it is not found, the random walk protocol is executed. It is clear that the size of the concept view influences the discovery probability, so we used the gossip ttl ($T_t = 2, 3$) to vary the concept view. After the first 30 rounds of the ns2 simulation, every node transmitted one discovery query per second. In all experiments the query ttl (ParameterRequestTTL) was set to five. The y-axis depicts the discovery ratio per node in the random walk, i.e., the probability of locating the query concept during the random walk. The discovery probability is computed using the cumulative number of queries satisfied in each successive node divided by the total number of queries, including those that were lost.

The discovery ratio always increases in proportion to the size of the concept view. With respect to network size, the discovery ratio shows a linear decrease with increasing number of nodes and hence increasing number of concepts. Note that this is a natural consequence of the intrinsic relationship in OntoMobil between nodes and content (concepts), which is contrary to the usual treatment of nodes and content as separate entities. OntoMobil can guarantee a probabilistic bound on its properties, such as discoverability or the latency of semantic matching, provided that its characteristic parameters adjust to variations in the network size. For example, given a desirable discovery ratio of 60% in a network of 30 nodes with the gossip ttl parameter set to 2, to guarantee a similar discovery ratio when nodes increase to 60 it suffices to increase the gossip ttl to 3. In general, assuming similar ontology view sizes between nodes even as the network grows or shrinks, constant probabilistic bounds on OntoMobil properties are guaranteed by varying the values of the characteristic parameters to keep the ratio of the expected concept view size to total concepts ($E[V]/G_{total}$) constant.

Finally, the correspondence between the analytical and the experimental results verifies the assumption that the union of the concept and ontology views constitutes a simple random sample from the population of all concepts (assumption A2 in Section 4.1), which is a further validation of the discovery probability model captured by equation (10).

5 EXPERIMENTAL EVALUATION

In order to assess the design choices made in OntoMobil, specifically the use of replication and gossiping, we devised DiscBCast, a broadcast-based protocol used for the experimental evaluation. DiscBCast does not use replication or gossiping, but instead broadcasts a discovery query to all connected nodes. To improve the performance of DiscBCast, we used an established broadcast optimisation technique, namely BCAST, as described in [24]. The aim of DiscBCast is similar to that of OntoMobil: to discover semantic content in MANETs, assuming multiple heterogeneous ontologies. To facilitate concept matching, a random subset of concepts from the query’s source node is always piggybacked with every query. DiscBCast was also implemented in ns2 [25] and simulations between the two protocols were conducted using similar parameters.

The experimental evaluation compared OntoMobil against DiscBCast in terms of the ontology matching latency, the overhead a node sustains due to the distributed matching approach, and the network overhead. The latency reveals the required time before a semantic query discovers any potential matches for its concepts. It is measured as the ratio between the matching associations established when the query executes versus all potential matches. The node overhead demonstrates the impact of matching in each node. It is a measure of the number of concept comparisons per second. Finally, the network overhead illustrates the cost of each protocol and is measured in the number of message receptions per second.

5.1 Experimental Setup

Simulations in ns2 were conducted with network sizes of 30 and 60 nodes. Experiments last for 500 seconds each, with each experiment repeated five times. Results are always averages of these five runs. The experimental setup uses the random trip mobility model [26] in a simulation area that maintains a constant density. The area dimensions are 915 × 915 and 1297 × 1297 meters, corresponding to network sizes of 30 and 60 nodes respectively. All nodes act as participants, with each node maintaining an ontology of 10 concepts.
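The constant-density claim can be checked with a few lines of arithmetic. The sketch below is only an illustration; the helper names are hypothetical.

```python
import math


def node_density(num_nodes: int, side_m: float) -> float:
    """Nodes per square metre in a square area with the given side length."""
    return num_nodes / side_m ** 2


def side_for_density(num_nodes: int, density: float) -> float:
    """Side length of the square area that yields `density` for `num_nodes`."""
    return math.sqrt(num_nodes / density)


if __name__ == "__main__":
    d30 = node_density(30, 915.0)
    d60 = node_density(60, 1297.0)
    print(f"density, 30 nodes: {d30:.3e} nodes/m^2")
    print(f"density, 60 nodes: {d60:.3e} nodes/m^2")
    print(f"side matching the 30-node density at 60 nodes: {side_for_density(60, d30):.1f} m")
```

The two densities agree to within about half a percent, so the 1297 m side is a close approximation of the exact value of 915 × √2 m.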

Since DiscBCast does not feature epidemic dissemination or replication, the parameters of node fanout, age and ttl are not used. Concept fanout is used with similar semantics as in OntoMobil. It represents the number of concepts that are piggybacked with every discovery request.
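To make the role of the concept fanout concrete, the following sketch builds a discovery request that carries a random subset of the source node's concepts. The `DiscoveryQuery` type and `build_query` helper are hypothetical illustrations, not the message format used in the ns2 implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryQuery:
    """Hypothetical discovery request with piggybacked concepts."""
    source_node: int
    query_concept: str
    piggybacked: tuple


def build_query(source_node, ontology, query_concept, concept_fanout, rng):
    """Attach `concept_fanout` randomly chosen concepts from the source ontology."""
    count = min(concept_fanout, len(ontology))
    return DiscoveryQuery(source_node, query_concept, tuple(rng.sample(ontology, count)))


if __name__ == "__main__":
    ontology = [f"c{i}" for i in range(10)]  # 10 concepts per node, as in Section 5.1
    print(build_query(0, ontology, "c3", concept_fanout=4, rng=random.Random(1)))
```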


The fundamental trade-off faced in DiscBCast when setting parameters is the network overhead vis-a-vis matching ratio. Excluding the option to vary the concept fanout parameter, the only two parameters that influence this trade-off are the discovery timeout and the number of nodes querying the network. OntoMobil uses two different timeouts, one for the gossip and one for the discovery protocol. The gossip timeout is drawn uniformly at random from U[0.68, 1.11] seconds, while the discovery timeout expires every 1 second. It is clear that a high rate of discovery queries (low timeout) for DiscBCast will accelerate convergence of the multiple ontologies, but will also generate excessive traffic. For DiscBCast, we have set the discovery timeout to the value of OntoMobil’s gossip timeout and also used a compensation factor that gives each node a probability of 0.7 of actually executing the discovery query, somewhat reducing the broadcast storm problem [27].
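The query schedules described above can be sketched as follows. This is an illustrative model of the timers only, with hypothetical function names; it does not reproduce the ns2 code.

```python
import random

GOSSIP_TIMEOUT_RANGE = (0.68, 1.11)   # seconds, U[0.68, 1.11]
DISCBCAST_QUERY_PROBABILITY = 0.7     # compensation factor


def ontomobil_query_times(duration_s, period_s=1.0):
    """OntoMobil: the discovery timeout expires every `period_s` seconds."""
    return [k * period_s for k in range(1, int(duration_s / period_s) + 1)]


def discbcast_query_times(duration_s, rng):
    """DiscBCast: a timer drawn from U[0.68, 1.11] s; each expiry queries with probability 0.7."""
    now, times = 0.0, []
    while True:
        now += rng.uniform(*GOSSIP_TIMEOUT_RANGE)
        if now > duration_s:
            return times
        if rng.random() < DISCBCAST_QUERY_PROBABILITY:
            times.append(now)


def expected_discbcast_rate():
    """Mean queries per node per second: probability divided by the mean timeout."""
    low, high = GOSSIP_TIMEOUT_RANGE
    return DISCBCAST_QUERY_PROBABILITY / ((low + high) / 2)


if __name__ == "__main__":
    print(len(ontomobil_query_times(500.0)), "OntoMobil queries per node")
    print(len(discbcast_query_times(500.0, random.Random(0))), "DiscBCast queries per node")
    print(f"expected DiscBCast rate: {expected_discbcast_rate():.3f} queries/s")
```

Under this model a DiscBCast node issues on average about 0.78 queries per second, slightly fewer than the one query per second of an OntoMobil node.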

The evaluation of the ontology matching latency used synthetic data. A set of random and unique concepts was generated to populate the ontology view of each node. To establish a notion of similarity, a number of concepts were randomly selected from the ontology view of each node and a mapping was created between each source concept and one target concept selected from the remaining nodes. The mapping enables the appearance of transitive mapping relations, meaning that a concept may be related to more than one concept in other ontologies. Note that a concept does not contain a reference to all other concepts in the transitive mapping. Instead, a concept maintains a mapping to only a subset of the concepts that compose the transitive relation. Constructing the mapping relation in such a way is more realistic in a distributed environment, since each ontology is not required to have complete knowledge of all other ontologies. On the other hand, such an approach adds latency before complete semantic agreement is reached.
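A minimal sketch of this kind of synthetic data generation is shown below. The concept naming scheme, the function names and the use of a union-find structure to recover the transitive relations are assumptions made for illustration; the paper does not describe the generator at this level of detail.

```python
import random


def generate_ontologies(num_nodes, ontology_size):
    """Unique concept names per node, e.g. 'n3_c7' (hypothetical naming scheme)."""
    return {n: [f"n{n}_c{i}" for i in range(ontology_size)] for n in range(num_nodes)}


def generate_mappings(ontologies, mapped_per_node, rng):
    """Map randomly chosen source concepts to one target concept on another node."""
    nodes = list(ontologies)
    mappings = []
    for node in nodes:
        for source in rng.sample(ontologies[node], mapped_per_node):
            other = rng.choice([n for n in nodes if n != node])
            mappings.append((source, rng.choice(ontologies[other])))
    return mappings


def transitive_groups(mappings):
    """Group concepts that are related directly or transitively (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    groups = {}
    for concept in list(parent):
        groups.setdefault(find(concept), set()).add(concept)
    return list(groups.values())
```

Each concept stores only its direct mappings; the groups returned by `transitive_groups` correspond to the complete semantic agreement that the protocols have to discover over time.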

5.2 Results

Figure 7 depicts the matching latency with respect to protocol rounds. Latency is measured by having each node issue a discovery query every one second for OntoMobil and, in the DiscBCast simulation, with 70% probability once every U[0.68, 1.11] seconds. The query is composed of a single concept and we record the number of established matches in each visited node. As the number of potential matches is known for each concept, it is straightforward to assess the matching ratio. The main factor which influences the number of rounds required before complete semantic matching is established, i.e., when all concepts are augmented with all potential matches, is network size. The DiscBCast protocol performs marginally better, which is expected because of the higher rate of concept matching facilitated by the broadcast nature of the protocol. As nodes increase, we also observe an increase in the matching latency. This is attributed to the fact that the matching latency increases linearly with the number of total concepts, giving an idealised mean number of concept comparisons per node of N × |VO|^2, where |VO| is a constant factor. The graph shows that combining a gossip unicast approach with replication can achieve comparable results with a broadcast protocol but, as shown in the next paragraphs, with significantly reduced node and network overhead.

Fig. 7. Comparison of matching latency between OntoMobil and DiscBCast.

TABLE 1
Node overhead measured in concept comparisons per node per second.

Protocol            |N| = 30    |N| = 60
OntoMobil Tt = 2    400         290
OntoMobil Tt = 3    841         890
DiscBCast           445         457
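For the simulated configurations, the idealised comparison count N × |VO|^2 quoted in the discussion of Figure 7 can be evaluated directly. The sketch assumes |VO| = 10, the ontology size stated in Section 5.1; the function name is hypothetical.

```python
def idealised_comparisons(num_nodes: int, ontology_view_size: int) -> int:
    """Idealised mean number of concept comparisons per node: N * |VO|^2."""
    return num_nodes * ontology_view_size ** 2


if __name__ == "__main__":
    for nodes in (30, 60):
        print(nodes, idealised_comparisons(nodes, 10))
```

With |VO| held constant, the count grows linearly in N, which is consistent with the linear growth in matching latency described in the text.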

Table 1 displays the overhead incurred by the matching process in OntoMobil and DiscBCast. We measure the overhead using the number of concept comparisons per node per second. It is interesting to observe that there is no significant difference as the number of nodes doubles. In the case of OntoMobil, this is facilitated by the gossip protocol, which guarantees a constant number of message receptions per node, regardless of the network size. In the case of DiscBCast, the relatively constant number of concept comparisons is due to i) the increase in the simulation area, guaranteeing a constant node density and hence reducing packet collisions, and ii) the scalability of the BCAST optimisation, which prevents network congestion by eliminating redundant transmissions. As already mentioned in Section 4, the determining factor that influences the processing overhead in each node is the size of the concept view. As expected, increasing the Tt parameter from two to three results in doubling the size of the concept view, which accounts for the higher number of concept comparisons between the two variations of the OntoMobil protocol.


TABLE 2
Network overhead measured as the average number of received messages per node per second.

Protocol            |N| = 30    |N| = 60
OntoMobil Tt = 2    7.7         8.9
OntoMobil Tt = 3    6.5         8.2
DiscBCast           34.0        34.1

Table 2 shows the network overhead incurred by the two protocols as measured by the number of protocol packets received per node per second. Message reception can provide us with an accurate metric of network saturation, especially when the comparison involves a broadcast and a unicast protocol. We have excluded routing and MAC control packets from this measurement to remove any overhead associated with OLSR and BCAST. As expected, the broadcast nature of DiscBCast incurs a higher toll on the network, leading to increased packet transmission failures and bandwidth saturation.
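The size of the difference can be read off Table 2 with a short calculation. The values below are copied from the table; the dictionary layout and function name are only illustrative.

```python
# Received messages per node per second, from Table 2.
RECEPTIONS = {
    "OntoMobil Tt=2": {30: 7.7, 60: 8.9},
    "OntoMobil Tt=3": {30: 6.5, 60: 8.2},
    "DiscBCast": {30: 34.0, 60: 34.1},
}


def reduction_factor(protocol: str, num_nodes: int) -> float:
    """How many times fewer messages `protocol` receives than DiscBCast."""
    return RECEPTIONS["DiscBCast"][num_nodes] / RECEPTIONS[protocol][num_nodes]


if __name__ == "__main__":
    for protocol in ("OntoMobil Tt=2", "OntoMobil Tt=3"):
        for nodes in (30, 60):
            print(protocol, nodes, round(reduction_factor(protocol, nodes), 1))
```

Across the four OntoMobil configurations, a node receives between roughly 3.8 and 5.2 times fewer protocol messages than under DiscBCast.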

6 CONCLUSION

This paper presented OntoMobil, a model for the discovery of semantic content that caters for mobile ad hoc networks and semantic decentralisation. The model relies on a randomised concept dissemination mechanism to build a semantic overlay topology. Such an overlay facilitates eventual semantic agreement between heterogeneous ontologies and provides a substrate for the discovery of content. We have provided a stochastic analysis of the proposed model and verified the accuracy against simulation results. We have also compared the two main features of OntoMobil, replication and unicast gossiping, against a broadcast protocol in order to shed light on the trade-offs involved. OntoMobil performs favourably in terms of discoverability with significantly reduced overhead.

At the moment, the stochastic analysis (Section 4.2) does not incorporate a failure model. This has an impact on the accuracy of the stochastic model in scenarios with a high message failure rate (> 15%). Integrating the probability of lost messages will enhance the predictive power of the model, increasing the accuracy of the expected concept view size.

Another aspect of OntoMobil that requires future attention is the integration of a leave protocol that uses a soft state mechanism, rather than relying on an explicit disconnection procedure (Section 3.3.2). One approach is to rely on the randomised nature of transmission and use the property that with some probability each node will eventually contact every other node, unless failure occurs.
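One possible shape of such a soft state mechanism is sketched below: an entry is refreshed whenever a node is heard from and silently evicted once it has been quiet for longer than a lifetime. This is a generic illustration of soft state, not a design proposed in the paper; the class, its methods and the lifetime parameter are hypothetical.

```python
class SoftStateNodeView:
    """Hypothetical node view whose entries expire unless refreshed."""

    def __init__(self, lifetime_s: float):
        self.lifetime_s = lifetime_s
        self._last_heard = {}  # node id -> time of the last received message

    def refresh(self, node_id, now: float) -> None:
        """Record that a message from `node_id` arrived at time `now`."""
        self._last_heard[node_id] = now

    def expire(self, now: float) -> list:
        """Evict and return the nodes that have been silent longer than the lifetime."""
        stale = [n for n, t in self._last_heard.items() if now - t > self.lifetime_s]
        for node_id in stale:
            del self._last_heard[node_id]
        return stale

    def members(self) -> list:
        """Node ids currently considered present."""
        return sorted(self._last_heard)
```

A node that leaves without notice simply stops refreshing its entries, so no explicit disconnection message is needed; the cost is that a departed node remains visible for up to one lifetime.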

APPENDIX
NOTATION

1) VCi is the concept view for node i.

2) VOi is the ontology view for node i with constant size G.

3) VNi is the node view for node i with constant size ParameterNodeView.

4) Tt, network-wide constant for the ttl parameter.

5) Ta, network-wide constant for the age parameter.

6) Fn, network-wide constant for the node fanout.

7) Fc, network-wide constant for the concept fanout.

8) Gtotal, total number of concepts in the network.

9) G, network-wide constant for the size of the ontology view.

10) Gc, number of concepts in the network excluding G concepts from the ontology view of a single node.

11) ζ(i), given i concepts in the concept view of a node, ζ(i) returns the total number of concepts in both ontology and concept views.

12) η(i), given i concepts in the concept view of a gossip sender, η(i) returns the number of concepts that have ttl value different to one and do not exist in either the receiver’s ontology or concept views.

13) N, the number of nodes in the network.

14) ParameterNodeView is a constant representing the number of node ids in a node view, ParameterNodeView = |VN|.

15) ParameterRequestTTL, network-wide constant representing the number of hops before a semantic query expires.
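Several of the quantities above are tied together by simple identities. The relations below are a reading of the definitions, not formulas quoted from the paper: the first assumes that every node holds exactly G unique concepts, and the third assumes that a node's concept view never contains concepts from its own ontology view.

```latex
\begin{align}
G_{\mathrm{total}} &= N \cdot G, \\
G_c &= G_{\mathrm{total}} - G, \\
\zeta(i) &= i + G .
\end{align}
```

For the simulated networks, with G = 10, these give G_total = 300 and G_c = 290 for N = 30, and G_total = 600 and G_c = 590 for N = 60.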

ACKNOWLEDGMENT

The authors would like to acknowledge the support of Lero: The Irish Software Engineering Research Centre, funded by Science Foundation Ireland. We would also like to thank Vinny Cahill and Stefan Weber for their valuable comments and support in this work.

REFERENCES

[1] D. Martin, M. Burstein, J. Hobbs, O. Lassila, D. McDermott, S. McIlraith, S. Narayanan, M. Paolucci, B. Parsia, T. Payne, E. Sirin, N. Srinivasan, and K. Sycara, “OWL-S: Semantic Markup for Web Services,” November 2004, W3C. [Online]. Available: http://www.w3.org/Submission/OWL-S

[2] K. Aberer, P. Cudre-Mauroux, A. M. Ouksel, T. Catarci, M.-S. Hacid, A. Illarramendi, V. Kashyap, M. Mecella, E. Mena, E. J. Neuhold, O. De Troyer, T. Risse, M. Scannapieco, F. Saltor, L. De Santis, S. Spaccapietra, S. Staab, and R. Studer, “Emergent Semantics Principles and Issues,” in DASFAA 2004, 2004, pp. 25–38.

[3] K. Aberer, P. Cudre-Mauroux, and M. Hauswirth, “The Chatty Web: Emergent Semantics Through Gossiping,” in Proceedings of the 12th International Conference on World Wide Web (WWW’03). New York, NY, USA: ACM Press, 2003, pp. 197–206.

[4] A. Doan, J. Madhavan, P. Domingos, and A. Halevy, “Learning to map between ontologies on the semantic web,” in Proceedings of the 11th international conference on World Wide Web (WWW ’02). New York, NY, USA: ACM Press, 2002, pp. 662–673.

[5] S. Castano, A. Ferrara, and S. Montanelli, “H-MATCH: an Algorithm for Dynamically Matching Ontologies in Peer-based Systems,” in SWDB, 2003, pp. 231–250.

[6] K. Sycara, S. Widoff, M. Klusch, and J. Lu, “LARKS: Dynamic Matchmaking Among Heterogeneous Software Agents in Cyberspace,” Autonomous Agents and Multi-Agent Systems, vol. 5, no. 2, pp. 173–203, June 2002.


[7] K. Arnold, R. Scheifler, J. Waldo, B. O’Sullivan, and A. Wollrath, Jini Specification. Addison-Wesley Longman Publishing Co., Inc., 1999.

[8] E. Guttman, C. Perkins, J. Veizades, and M. Day, Service Location Protocol, Version 2, IETF, 1999.

[9] U. C. Kozat and L. Tassiulas, “Service discovery in mobile ad hoc networks: an overall perspective on architectural choices and network layer support issues,” Ad Hoc Networks, vol. 2, no. 1, pp. 23–44, 2004.

[10] F. Sailhan and V. Issarny, “Scalable Service Discovery for MANET,” in Proceedings of the 3rd IEEE International Conference on Pervasive Computing and Communications (PERCOM’05). Washington, DC, USA: IEEE Computer Society, 2005, pp. 235–244.

[11] V. Lenders, M. May, and B. Plattner, “Service Discovery in Mobile Ad Hoc Networks: A Field Theoretic Approach,” in International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM). IEEE, June 2005.

[12] W. Nejdl, B. Wolf, C. Qu, S. Decker, M. Sintek, A. Naeve, M. Nilsson, M. Palmer, and T. Risch, “EDUTELLA: A P2P Networking Infrastructure Based on RDF,” in Proceedings of the 11th International Conference on World Wide Web (WWW’02). New York, NY, USA: ACM Press, 2002, pp. 604–615.

[13] P. Haase, R. Siebes, and F. van Harmelen, “Peer Selection in Peer-to-Peer Networks with Semantic Topologies,” in Proceedings of the 1st International Conference on Semantics in a Networked World (ICNSW’04), ser. Lecture Notes in Computer Science, M. Bouzeghoub, Ed., vol. 3226. Springer Verlag, June 2004, pp. 108–125.

[14] A. Loser, S. Staab, and C. Tempich, “Semantic social overlay networks,” Selected Areas in Communications, IEEE Journal on, vol. 25, no. 1, pp. 5–14, 2007.

[15] B. F. Cooper and H. Garcia-Molina, “SIL: a model for analyzing scalable peer-to-peer search networks,” Computer Networks, vol. 50, no. 13, pp. 2380–2400, 2006.

[16] F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen, “PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities,” High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on, pp. 236–246, 2003.

[17] D. Chakraborty, A. Joshi, T. Finin, and Y. Yesha, “GSD: A Novel Group Based Service Discovery Protocol for MANETs,” in Proceedings of the 4th IEEE Conference on Mobile and Wireless Communications Networks (MWCN’02). IEEE Press, September 2002.

[18] P. T. Eugster, R. Guerraoui, S. B. Handurukande, P. Kouznetsov, and A. M. Kermarrec, “Lightweight Probabilistic Broadcast,” ACM Transactions on Computer Systems, vol. 21, no. 4, pp. 341–374, 2003.

[19] J. Luo, P. T. Eugster, and J.-P. Hubaux, “PILOT: ProbabilistIc Lightweight grOup communication sysTem for Mobile Ad Hoc Networks,” IEEE Transactions on Mobile Computing, vol. 3, no. 2, pp. 164–179, 2004.

[20] A. J. Ganesh, A.-M. Kermarrec, and L. Massoulie, “SCAMP: Peer-to-Peer Lightweight Membership Service for Large-Scale Group Communication,” in Third International Workshop on Networked Group Communications (NGC 2001), London, UK, November 2001.

[21] A. Nedos, K. Singh, R. Cunningham, and S. Clarke, “A gossip protocol to support service discovery with heterogeneous ontologies in manets,” in Third IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2007). IEEE, Oct. 2007, to appear.

[22] A. Nedos, K. Singh, and S. Clarke, “Mobile Ad Hoc Services: Semantic Service Discovery in Mobile Ad Hoc Networks,” in Proceedings of the 4th International Conference on Service-Oriented Computing (ICSOC), ser. Lecture Notes in Computer Science, A. Dan and W. Lamersdorf, Eds., vol. 4294. Springer, Dec. 2006, pp. 90–103. [Online]. Available: https://www.cs.tcd.ie/publications/tech-reports/reports.07/TCD-CS-2007-21.pdf

[23] R. D. Yates and D. J. Goodman, Probability and Stochastic Processes: a friendly introduction for electrical and computer engineers. Wiley, 1999.

[24] T. Kunz, “Multicasting in mobile ad-hoc networks: achieving high packet delivery ratios,” in CASCON ’03: Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research. IBM Press, 2003, pp. 156–170.

[25] S. McCanne and S. Floyd, “ns, Network Simulator,” 1997.

[26] S. PalChaudhuri, J.-Y. L. Boudec, and M. Vojnovic, “Perfect Simulations for Random Trip Mobility Models,” in Proceedings of the 38th annual Symposium on Simulation (ANSS’05). Washington, DC, USA: IEEE Computer Society, 2005, pp. 72–79.

[27] Y.-C. Tseng, S.-Y. Ni, Y.-S. Chen, and J.-P. Sheu, “The broadcast storm problem in a mobile ad hoc network,” Wireless Networks, vol. 8, no. 2/3, pp. 153–167, 2002.

Andronikos Nedos is a Research Fellow at the Department of Computer Science, Trinity College Dublin. He received both his M.Sc. and Ph.D. degrees from Trinity College Dublin. His main research interests include MANET service discovery protocols, group communication, randomised algorithms and distributed middleware.

Kulpreet Singh studied Group Communication Systems for Mobile Ad-Hoc networks during his PhD at Trinity College Dublin. His interests include new networking paradigms for MANETs that question and challenge the assumptions made for the Internet. Kulpreet studied Civil Engineering in the Himalayas and has lived on three different continents up till now.

Raymond Cunningham is a Research Fellow at the Department of Computer Science, Trinity College Dublin. He holds a B.A. degree in Mathematics and M.Sc. and Ph.D. degrees in Computer Science, both from Trinity College Dublin. His research interests cover the area of mobile distributed systems, distributed systems optimisation techniques and adaptive middleware. He has published a number of peer-reviewed papers in the areas of ad hoc networking, peer-to-peer and distributed optimisation.

Siobhán Clarke is a Senior Lecturer and Fellow of Trinity College Dublin, where she leads the Distributed Systems Group. Her research interests are programming models and middleware for mobile and embedded systems.