A Distributed Data Storage Protocol for Heterogeneous Wireless Sensor Networks with Mobile Sinks


    Ad Hoc Networks 11 (2013) 1588-1602

    Guilherme Maia a,*, Daniel L. Guidoni a, Aline C. Viana b, Andre L.L. Aquino c, Raquel A.F. Mini d, Antonio A.F. Loureiro a

    a Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
    b INRIA, Saclay, Orsay, France
    c Federal University of Alagoas, Maceió, AL, Brazil
    d Pontifical Catholic University, Belo Horizonte, MG, Brazil

    * Corresponding author. Tel.: +55 31 3409 5863; fax: +55 31 3409 5858. E-mail address: [email protected] (G. Maia).

    http://dx.doi.org/10.1016/j.adhoc.2013.01.004
    1570-8705/$ - see front matter © 2013 Elsevier B.V. All rights reserved.

    Article info

    Article history: Received 15 June 2012; received in revised form 5 January 2013; accepted 16 January 2013; available online 4 February 2013.

    Keywords: Distributed data storage; Heterogeneous wireless sensor networks; Mobile sinks

    Abstract

    This paper presents ProFlex, a distributed data storage protocol for large-scale Heterogeneous Wireless Sensor Networks (HWSNs) with mobile sinks. ProFlex guarantees robustness in data collection by intelligently managing data replication among selected storage nodes in the network. Contrary to related protocols in the literature, ProFlex considers the resource constraints of sensor nodes and constructs multiple data replication structures, which are managed by more powerful nodes. Additionally, ProFlex takes advantage of the higher communication range of such powerful nodes and uses the long-range links to improve data distribution by storage nodes. When compared with related protocols, we show through simulation that ProFlex has an acceptable performance under message loss scenarios, decreases the overhead of transmitted messages, and decreases the occurrence of the energy hole problem. Moreover, we propose an improvement that allows the protocol to leverage the inherent data correlation and redundancy of wireless sensor networks in order to decrease even further the protocol's overhead without affecting the quality of the data distribution by storage nodes.

    1. Introduction

    The deployment of large-scale Wireless Sensor Network (WSN) applications (e.g., environment sensing and military surveillance), which operate unattended for long periods of time and generate a considerable amount of data, poses several challenges [1,10,38]. One of them is how to retrieve the sensed data. This is usually performed by a special node called sink. Contrary to ordinary sensor nodes, the sink node has enhanced computational, storage and power capabilities since it is responsible for storing and processing the network sensed data. Moreover, the network may employ a static or a mobile sink. In the former case, the ordinary sensors need to route the sensed data to the sink, so connectivity to at least one sink in the network must be maintained to guarantee a good data collection. However, such an approach suffers from the energy hole problem [19,20,36], in which nodes closer to the sink typically consume more energy due to data relaying from other nodes in the network. Hence, disconnections may happen in the network, compromising the data collection by the sink.

    To alleviate this problem, some studies propose using mobile sinks [9,21,26,31]. In this approach, a mobile sink has the flexibility for traversing the network to gather the sensed data directly from the sensor nodes, and, thus, no single node should suffer from the overhead of relaying data from all other nodes in the network.

    This work considers WSNs with one or more mobile sinks. In this context and for scalability reasons, it may

    be infeasible for the mobile sink to visit all the network nodes in order to collect the sensed data. Therefore, a key research challenge is how the sensed data can be distributed among sensor nodes (data dissemination), so that it can later be gathered by a mobile sink without the need to visit all sensor nodes in the network. Depending on how the data is distributed, the mobile sink (i) may have to follow a predefined trajectory in which it needs to visit specific storing nodes or locations in the network [6,13,17,22,30,32,37] or (ii) may be free to follow an uncontrolled mobility pattern [5,33,34]. Clearly, avoiding restrictions on the mobile sink trajectory is beneficial for both the mobile sink and sensor nodes, since the network is free to adapt to changing conditions. Hence, herein we keep our focus on how to select well-distributed storing nodes in WSNs with mobile sinks whose trajectories are unknown to the sensor nodes.

    There are a few studies [3,28,29] with the same aforementioned focus that use more powerful nodes in conjunction with ordinary sensor nodes to perform distributed data storage. In this scenario, only those powerful nodes are responsible for storing all sensed data, since the assumption is that they have no storage constraints. Moreover, this heterogeneous configuration does not present the same performance and scalability issues as homogeneous WSNs [8,12,18,27]. Nevertheless, the use of more powerful nodes does not overcome the problem of data losses, since these nodes can still fail. To increase the network resilience to failures, a possible approach is to replicate a given data item and keep it at different storage nodes. Furthermore, it should be noticed that in the presence of a mobile sink, a good setup of the number of replicas and a good selection of well-distributed storing nodes might enable the sink to get a representative sample of the entire network data by visiting only a small percentage of the storing nodes [5,11,33]. Hence, the use of a suitable replication mechanism together with a well-thought decision of where to store a specific data packet are key elements for the effectiveness and efficiency of a data storage protocol.

    Given these principles, in this work we propose a protocol named ProFlex that employs powerful nodes to perform distributed data storage in Heterogeneous Wireless Sensor Networks (HWSNs) (see Section 3). However, besides using the extra storage features of these nodes, we take advantage of their powerful communication capability and use the long-range links to improve data distribution. Thus, ProFlex is by design aware of the WSN heterogeneous topology.

    Simulation results (Section 4) show that, using a heterogeneous network topology, ProFlex has an acceptable performance under scenarios with message losses, decreases the overhead of transmitted messages and achieves good collection efficiency results. Moreover, ProFlex can exploit the data correlation and redundancy inherent to WSNs, resulting in a decrease in the overhead of transmitted messages without any negative effect on the data gathering efficiency results. In addition, we discuss the related work and present the system model in Section 2, and conclude the paper and discuss future work in Section 5.

    2. Background

    2.1. Data storage protocols

    In the following, we present some of the proposals for data storage in WSNs.

    Sheng et al. [28,29] study the data storage placement in WSNs to deal with the traditional problem of traffic overhead and, consequently, high energy consumption of nodes closer to the sink. To overcome this problem, they propose two network models. The first one considers a tree topology rooted at the sink, in which a subset of the nodes is selected as storage nodes, which are responsible for storing data collected by their descendants in the tree. In the second model, a tree topology is constructed after the planned deployment of the storage nodes, whose positions are obtained from a linear programming optimization. Nevertheless, node failures are not considered in either model, which might result in the loss of all data collected by the storage nodes' descendants.

    Bar-Yossef et al. [5] propose a lightweight random membership service for ad hoc networks called RaWMS. The protocol provides each node with a partial uniformly chosen view of network nodes. The protocol is based on a reverse maximum degree random walk (RW) sampling technique. In RaWMS, every data producer node starts a reverse maximum degree RW, whose message carries the node's identifier and data. Each RW traverses the network for a predefined number of hops, so every message has an associated time-to-live (TTL) field that defines the length of the RW. The last node in the RW appears as if it was picked uniformly at random out of all network nodes and it will be responsible for storing the data carried by the RW. The authors prove that when the RW finishes, each node will have a uniform random view with data from the nodes in the network. In other words, RaWMS uniformly distributes the network data to the sensor nodes. Although the results of RaWMS in uniformly distributing the monitored data throughout the network are quite encouraging in terms of data gathering efficiency, as we shall present in our performance analysis (see Section 4), RaWMS incurs a high overhead, resulting in a short network lifetime.
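    As a rough illustration of a TTL-bounded random walk that selects a storing node, the sketch below implements a plain maximum-degree random walk: at each hop the walk moves to each neighbor with probability 1/d_max and otherwise stays where it is. This is only a simplified stand-in; the "reverse" mechanics of RaWMS are not modeled, and the function name, the adjacency-list format and the choice to let a self-loop consume one TTL unit are all assumptions made here, not details taken from [5].

```python
import random
from typing import Dict, Hashable, List


def max_degree_random_walk(adj: Dict[Hashable, List[Hashable]],
                           start: Hashable,
                           ttl: int,
                           rng: random.Random) -> Hashable:
    """Return the node where a TTL-bounded maximum-degree random walk ends.

    adj maps each node to the list of its neighbors (hypothetical format).
    """
    d_max = max(len(neighbors) for neighbors in adj.values())
    current = start
    for _ in range(ttl):
        neighbors = adj[current]
        k = rng.randrange(d_max)      # pick one of d_max equally likely slots
        if k < len(neighbors):
            current = neighbors[k]    # slot maps to a real neighbor: move
        # otherwise the walk stays put (self-loop); assumed to use one TTL unit
    return current


# A 4-node path graph 0 - 1 - 2 - 3; the last node would store the data.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
storing_node = max_degree_random_walk(path, start=0, ttl=125, rng=random.Random(1))
```

    The self-loops are what make low-degree nodes as likely to be selected as high-degree ones once the walk is long enough, which is the property the uniform-sampling argument relies on.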

    Vecchio et al. [33] propose Deep, a density-based proactive data dissemination protocol for WSNs with uncontrolled sink mobility. The goal is to obtain an effective uniform distribution of the sensed data at a reduced communication overhead. Deep combines a probabilistic flooding with a probabilistic storing scheme that allows the sink to gather a representative view of the network's sensed data by visiting any set of x out of n total nodes, where $x \ll n$. In Deep, when node i receives a message m for the first time, it rebroadcasts m with probability $p = \min\left(1, \frac{b}{|N(p)|}\right)$, where N(p) is the one-hop neighborhood of i and b is the desired average number of retransmissions in the neighborhood. Moreover, if node i does not rebroadcast m but does not hear any other rebroadcast of m after a period of time, then i rebroadcasts m after all. Finally, for a partial view (local data sample) of size $\sqrt{n}$, where n is the number of nodes in the network, node i stores a newly received message m with probability $s = \frac{\sqrt{n}}{n}$. A node in Deep must keep track of all received messages during its operation, which might be impractical in scenarios with a high number of produced messages. Moreover, Deep presents efficient data gathering results comparable to RaWMS, while incurring a far lower message overhead. Nevertheless, such overhead is still high when compared with more recent approaches.

    One such approach is the Supple protocol [11], a flexible probabilistic data dissemination protocol for WSNs that considers static or mobile sinks. The Supple protocol has three phases: tree construction, weight distribution, and data replication. The first phase is a tree construction initiated by a central sensor node of the sensing area (e.g., the sink node). The central sensor node is responsible for receiving and replicating the collected data in the network. The second phase assigns weights to nodes, which represent the probability of a node storing data. Supple uses the hop distance of a node to the central node to calculate this probability. In the last phase, the sensor nodes send their data to the central node and this node replicates each data item r(v) times using the tree infrastructure and according to each node's storage probability. The value of r(v) depends on the weights and on the amount of data each node is allowed to store. Viana et al. [11] claim that a mobile sink visiting a small fraction of nodes, i.e., about $2.3\sqrt{n}$ for a total of n nodes, can retrieve all the generated data in the network. Moreover, thanks to r(v), there will be no data losses in case there is a failure of a small number of nodes. However, Supple does not consider the problems of finding a well-positioned central node, energy consumption and traffic overload at nodes closer to the central node.

    Notice that the main solutions described here rely on some kind of replication mechanism in order to increase the protocol's resilience to node failures and message losses, and also as a support mechanism for the proper data distribution among storing nodes. Moreover, the data dissemination from one node to another may rely on an always-up routing structure like the one employed by Supple or it may be based on other communication mechanisms like the gossip-based approach employed by Deep. Clearly, each existing proposal has its strengths and weaknesses, thus our solution proposes to benefit from the best features employed by those protocols combined with the recent advances in HWSN design. Furthermore, due to resource constraints of these networks we focus on (i) increasing the network resilience to failures with the use of multiple replication structures and (ii) reducing the overhead as much as possible at the cost of a small penalty in the data distribution efficiency.

    2.2. System model

    The main adopted assumptions are detailed hereafter.

    2.2.1. Nodes

    We consider that there is a large number n of sensor nodes scattered on a given geographic area for collecting data or monitoring events. All sensors are uniquely identified and can be of two types. The first one, named L-sensor for low-end sensor, is a node with limited resources, including processor, storage, communication and power resources. The second type, named H-sensor for high-end sensor, is a node with more sophisticated resources. Thus, H-sensor nodes have improved processing, storage, battery and communication power when compared with L-sensor nodes. A question that might arise is: why not design a sensor network comprised of just H-sensor nodes? Whereas H-sensor nodes are more powerful when compared with L-sensor nodes, the latter are much less expensive. Hence, it is assumed the network is composed of $n_L$ L-sensor nodes and $n_H$ H-sensor nodes, where $n = n_L + n_H$ and $n_L \gg n_H$. Moreover, nodes later selected as storage nodes are provided with a partial view (local data sample) v with data from some other nodes, including themselves. This set of storing nodes is defined as S. Therefore, each node may act as a storage node for some other nodes, but not for all of them. Due to the limited buffer of L-sensor nodes, power-aware compression [23] and reduction algorithms [4] may be employed. Finally, as an abuse of terminology, we also use the ID of a node as a reference to the data produced by the node. Hence, for the sake of presentation, the partial view v of a node i is a collection of IDs stored at node i.

    2.2.2. Communication

    We consider a network topology that remains connected over time. Given the expected network lifetime, we can estimate the number of sensor nodes needed to achieve this goal. An L-sensor node i can communicate with another node j (L-sensor or H-sensor) that is inside its communication radius $r_1$, i.e., the distance between i and j should be less than or equal to $r_1$ ($d(i, j) \le r_1$). H-sensor nodes are equipped with two radios, each one with a different frequency and a different communication radius ($r_1$ and $r_2$, $r_2 > r_1$). It is also assumed that the radio frequencies do not interfere with each other. An H-sensor node can communicate with both L-sensor and H-sensor nodes inside communication radius $r_1$ and $r_2$, respectively.

    2.2.3. Initial knowledge

    Initially, a node i only knows its identity, which is unique, and a parameter I(i) that defines its importance factor in the network ($I: S \to [0,1]$, called the importance factor function). Similar to the Supple protocol [11], importance factors are initially assigned to nodes based on an external criterion, which determines the nodes in the network responsible for storing data. For instance, if the criterion is the sensor location, only nodes at the target location will be used as storing nodes and will have I(i) > 0, whereas nodes outside the target location will have I(i) = 0. It may also be desired to choose as storing nodes only H-sensor nodes, since they are more powerful than L-sensor nodes. In this scenario, H-sensor nodes will have I(i) > 0 and L-sensor nodes will have I(i) = 0. On the other hand, if all nodes can be uniformly selected as storing nodes, then all nodes in the network will have I(i) = 1. For comparison reasons, in this paper, this last scheme is used, hence for all L-sensor and H-sensor nodes, I(i) = 1. In summary, the attribution of importance factors among nodes will define the set of storing nodes S.

    3. Proposed protocol

    In this section, we present ProFlex, a Distributed Data Storage Protocol for Heterogeneous Wireless Sensor Networks with Mobile Sinks. The protocol is composed of three phases: tree construction, importance factor distribution, and data distribution. At the end of the three phases, ProFlex guarantees that a node in the set of storing nodes will store an amount of data proportional to its importance factor. For instance, if all nodes in the set of storing nodes have the same importance factor, then ProFlex guarantees a uniform distribution of network data among the nodes in the set of storing nodes. Algorithm 1 presents a general overview of the protocol and the following sections give a detailed description of its phases.

    3.1. Tree construction

    The first step of ProFlex is the tree construction initiated by all H-sensor nodes in the network. More specifically, contrary to Supple [11], ProFlex addresses the problem of traffic overload at nodes closer to the unique tree root. To this end, multiple trees (i.e., replication structures) are constructed according to the number and positioning of the H-sensor nodes. These trees aggregate the shortest paths from each L-sensor node to the closest H-sensor node. In this work, the shortest path means the minimum number of hops between an L-sensor and its closest H-sensor node, but any other metric can be used, e.g., delay or capacity. Notice that although each H-sensor node builds a tree rooted at itself, each L-sensor node will belong only to the tree rooted at the closest H-sensor node. Hence, during the tree construction, when an L-sensor node receives messages from several H-sensors, it will update its local information and forward the message further only if the message is from a closer H-sensor node. In case there is a tie, i.e., there are two or more shortest paths, the L-sensor node will use a tie-breaking criterion (e.g., geographic location of the H-sensor node). Otherwise, the node will simply discard the message.

    For the sake of presentation and of comparison with Supple, we consider the use of a binary tree as a routing structure. Also, in our performance analysis, the PeerNet [14] protocol was used to build such a binary tree. ProFlex supports, however, any other routing structure.

    3.2. Importance factor distribution

    In ProFlex, all nodes in the set of storing nodes have an importance factor assigned by the function $I: S \to [0,1]$, which will be later used in the computation of node i's storing probability. The importance factor assigned to a particular node i dictates whether node i will play the role of storing data for other nodes (I(i) > 0) or not (I(i) = 0). Furthermore, it dictates how much data a node should store, and its assignment may follow any distribution. For comparison reasons with the protocols described in Section 2.1, here we use the uniform distribution. Thus, for all nodes, I(i) = 1. This gives all nodes in S an equal chance of being assigned the role of storing node and it also gives all nodes an equal chance of storing the same amount of data. Note that the importance factors of both L-sensor and H-sensor nodes are equal, thus they both have the same probability of storing data for other nodes. It is straightforward to notice that, due to their better resource capabilities, H-sensor nodes could have a greater importance factor than L-sensor nodes. In that case, the former would store more data than the latter, but we chose not to do so. Such a decision is supported by the fact that in this work we only intend to leverage the communication characteristics inherent to heterogeneous WSNs and not the increased storage capacity of H-sensor nodes.

    After defining the set of storing nodes, a key challenge is how much data a node should store. Stated another way, what should be the partial view size |v| of nodes in the set of storing nodes? In existing protocols [5,11,33], the partial view size |v| (i.e., maximum number of stored packets allowed at a given node) is a statically configured parameter. It is configured considering a uniform distribution of the data and is based on the size of the set of storing nodes at a specific replication structure. By doing this, they ensure there will be enough space to store the data generated by the entire network. In particular, in Supple [11], a unique tree-based replication structure is considered and, if a uniform selection criterion is used, the size of the set of storing nodes will be equivalent to the number of nodes in the tree, i.e., n nodes. However, ProFlex uses several replication structures at the data distribution phase (see Section 3.3 and Fig. 1), which are given by the multiple constructed trees. Thus, each tree defines different storing nodes and sizes of set S. This requires a dynamic configuration of the partial view size of nodes per replication structure since each structure will store a different amount of data.

    Fig. 1. Network with 1000 sensor nodes of which three are H-sensor nodes. [Figure: H-sensor nodes A, B and C root the trees T_A, T_B and T_C, with |S_A| = 500, |S_B| = 300 and |S_C| = 200.]

    In Supple [11], given a partial view of size |v| and a network with n data producers, the set of storing nodes S must contain at least $\Theta\left(\frac{n}{|v|}\ln n\right)$ nodes in order to guarantee with high probability a good storage of all n collected data. Conversely, the partial view size must be $|v| \ge \frac{n}{|S|}\ln n$. A partial view size $|v| = \sqrt{n}$ provides a good compromise between resilience and sensors' resource consumption when |S| = n. Notice that the partial view size depends on the set size |S| and the number of data producers n in the replication structure. Since ProFlex uses several replication structures rather than one, the partial view size |v| of storing nodes will be different for each tree T. As will be discussed in the next section, an H-sensor node h stores in its tree $T_h$ all data produced by nodes belonging to its own tree and by nodes belonging to neighboring trees. Neighboring trees are trees rooted at H-sensor neighbors of the H-sensor node h, denoted by N(h). For instance, Fig. 1 shows a network with three H-sensor nodes (A, B and C) and their respective trees. In that figure, the tree $T_A$ rooted at H-sensor node A has 500 storing nodes ($|S_A| = 500$) and H-sensor node A has two H-sensor neighbors (B and C), hence it has two neighboring trees (|N(A)| = 2), i.e., $T_B$ and $T_C$. Thus, later, H-sensor node A will store in $T_A$ data produced by nodes in $T_A$ and $T_{N(A)}$.
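    The two sizing rules quoted above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and rounding the bound up and the square root down are assumptions (rounding down matches the value |v| = 31 listed for n = 1000 in Table 1).

```python
import math


def min_partial_view_size(n: int, num_storing_nodes: int) -> int:
    """Smallest integer |v| satisfying |v| >= (n / |S|) * ln(n)."""
    return math.ceil(n / num_storing_nodes * math.log(n))


def sqrt_partial_view_size(n: int) -> int:
    """The |v| = sqrt(n) compromise, rounded down (rounding is an assumption)."""
    return math.isqrt(n)


n = 1000
lower_bound = min_partial_view_size(n, num_storing_nodes=n)   # case |S| = n
chosen = sqrt_partial_view_size(n)
```

    For n = 1000 and |S| = n the lower bound is small compared with the square-root rule, so the square-root choice leaves a wide margin over the minimum required for storage with high probability.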

    For the special case where there are n data producers, an H-sensor node only needs to know the number of storing nodes on its own tree and on the neighboring trees. This information may be piggybacked in packets during the importance factor distribution. Algorithm 2 shows how the size of the set of storing nodes and the importance factor are distributed. The main idea behind this algorithm is to initialize each node i with a tuple $(I_l(i), I(i), I_r(i))$, where $I_l(i)$ (similarly, the third component $I_r(i)$) is the importance factor of the left (similarly, the right) subtree of i, and I(i) is the importance factor of node i. Note that $I_l(i)$ (similarly, $I_r(i)$) is the sum of all importance factors of nodes in the left (right) subtree of node i.
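    The tuple described above can be computed with a single bottom-up pass over a binary tree. The sketch below is a centralized illustration only; Algorithm 2 itself is distributed and is not reproduced in this text, and the Node class and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    name: str
    importance: float = 1.0            # I(i)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def aggregate_importance(node: Optional[Node],
                         tuples: Dict[str, Tuple[float, float, float]]) -> float:
    """Fill tuples[name] = (I_l, I, I_r) and return the subtree's total importance."""
    if node is None:
        return 0.0
    i_left = aggregate_importance(node.left, tuples)
    i_right = aggregate_importance(node.right, tuples)
    tuples[node.name] = (i_left, node.importance, i_right)
    return i_left + node.importance + i_right


# Root h with a three-node left subtree and a single-node right subtree, all I(i) = 1.
root = Node("h", left=Node("a", left=Node("c"), right=Node("d")), right=Node("b"))
tuples: Dict[str, Tuple[float, float, float]] = {}
total = aggregate_importance(root, tuples)
```

    With I(i) = 1 everywhere, the total returned at the root equals the number of storing nodes in the tree, which is the quantity the root later shares with its H-sensor neighbors.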

    When the H-sensor node h knows the size of the set of storing nodes $|S_h|$ in its tree, it will forward this value to its H-sensor neighbors. For the special case where I(i) = 1 for all nodes, the sum $I_l(h) + I(h) + I_r(h)$ gives the size of the set of storing nodes $|S_h|$. Eventually, node h will also receive this value from its neighboring H-sensor nodes. Finally, node h calculates the size of the set of aggregated storing nodes $S_h^{aggr}$, i.e., its own set of storing nodes plus the sets of storing nodes at its neighboring trees: $S_h^{aggr} = |S_h| + \sum_{j \in N(h)} |S_j|$. Based on this information, node h calculates the partial view size $|v| = \sqrt{S_h^{aggr}}$ for storing nodes on its tree and embeds this value on every data packet replicated on its own tree (see Section 3.3). Using this information, a node in the set of storing nodes knows the maximum volume of data it can store. Summarizing, an H-sensor h initially builds its tree $T_h$ and computes the number of storing nodes $|S_h|$ in its tree. Then it forwards $|S_h|$ to its H-sensor neighbors and eventually receives the number of storing nodes in the neighboring trees. Finally, using $|S_h|$ and the number of storing nodes in neighboring trees, H-sensor h computes the partial view size |v| and embeds this value on every data packet replicated on its own tree.

    For instance, in Fig. 1, for the H-sensor node A, we have $|S_A| = 500$ and $\sum_{j \in N(A)} |S_j| = 300 + 200$, hence $S_A^{aggr} = 1000$ and |v| = 31. Thus, all nodes in A's tree will have a partial view size |v| = 31. H-sensor nodes B and C will also execute these same steps to compute the value |v| for their trees.

    3.3. Data distribution

    The data distribution phase is at the heart of ProFlex and is responsible for properly propagating the sensed data to the set of storing nodes. For scenarios in which equal importance factors are assigned to nodes, ProFlex ensures a uniform distribution of the network sensed data among the set of storing nodes. In fact, the partial view v of nodes is constructed due to the distribution of r(v) replicas of each data packet. More specifically, the transmission of r(v) replicas by each root h of each tree $T_h$ will guarantee the storage of $|v| = \sqrt{S_h^{aggr}}$ data packets at each storing node of each tree.

    Algorithm 3 shows the main steps a node must perform when it produces or receives a data packet. Initially, when an L-sensor node produces a data packet or receives a data packet from a child node, it just forwards the data to its parent until it reaches the H-sensor node that is at the root of its tree.

    Moreover, when an L-sensor node receives a data packet from its parent or an H-sensor receives a data packet from a neighboring H-sensor node, it also calls ForwardData (Algorithm 4) to determine whether it will forward or store the packet. The algorithm stops when a leaf node in the set of storing nodes receives the message. At the end of the data distribution, all nodes of the set of storing nodes will have, with high probability, a partial view of size |v|.
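    The store-or-forward decision made by ForwardData can be sketched as a weighted three-way choice over the tuple $(I_l, I, I_r)$ of the receiving node. The code below is a minimal sketch, not the paper's Algorithm 4: the function names and return labels are hypothetical, and the interval boundaries follow the Fig. 2 example (I(h) = 1, I_l(h) = 12, I_r(h) = 18), in which x is drawn from [0, 31], x < 12 goes left and 12 <= x <= 13 is stored locally.

```python
import random


def forward_data_decision(i_left: float, i_self: float, i_right: float, x: float) -> str:
    """Map a drawn value x in [0, i_left + i_self + i_right] to an action."""
    if x < i_left:
        return "left"            # forward to the left subtree
    if x <= i_left + i_self:
        return "store"           # keep the packet in the local partial view
    return "right"               # forward to the right subtree


def forward_data(i_left: float, i_self: float, i_right: float,
                 rng: random.Random) -> str:
    """Draw x uniformly at random and return the resulting action."""
    total = i_left + i_self + i_right
    return forward_data_decision(i_left, i_self, i_right, rng.uniform(0.0, total))


# Fig. 2 values: the packet goes left, stays, or goes right with
# probability 12/31, 1/31 and 18/31 respectively.
action = forward_data(12, 1, 18, random.Random(7))
```

    Because each subtree is chosen in proportion to its aggregated importance factor, repeating this decision down the tree gives every storing node a chance of keeping the packet proportional to its own importance factor.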

    When an H-sensor node h produces a data packet or receives one from a child L-sensor node, it first computes the number of replicas r(v) for the packet to be forwarded to its children and H-sensor neighbors. Such computation ensures that nodes belonging to the set of storing nodes receive with high probability |v| distinct data packets.

    As shown in [11], the number of replicas r(v) is computed based on the desired partial view size of nodes in the set of storing nodes. For the special case where $|v| = \sqrt{S_h^{aggr}}$, then $r(v) = \sqrt{S_h^{aggr}}$. Hence, in order to compute the number of replicas for a data packet, an H-sensor h needs to know the number of storing nodes on its own tree and on neighboring trees. Such information is computed in Algorithm 2, thus $r(v) = \sqrt{S_h^{aggr}}$.

    After computing the number of replicas r(v) for a given data packet, H-sensor h determines how many of the r(v) replicas go to its own tree and to each neighboring tree. This is proportional to the percentage of storing nodes in each tree with respect to the total number of storing nodes in $S_h^{aggr}$. Let $r_k(v)$ be the number of replicas for a tree $T_k$ for $k \in N(h) \cup \{h\}$, thus $r_k(v) = \frac{|S_k|}{|S_h^{aggr}|}\, r(v)$. After determining the number of replicas, H-sensor h sends $r_k(v)$ data replicas to each H-sensor $k \in N(h)$.

    For instance, in Fig. 1, H-sensor A calculates r(v) = 31. Therefore, node A sends $r_B(v) = \frac{300}{1000} \cdot 31 \approx 9$ replicas of each data item to node B, $r_C(v) = \frac{200}{1000} \cdot 31 \approx 6$ replicas of each data item to node C and $r_A(v) = \frac{500}{1000} \cdot 31 \approx 16$ to its own tree.
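    The proportional split of replicas among trees can be reproduced in a few lines. This is a sketch under stated assumptions: the function name is hypothetical, and the rounding rule (round half up) is an assumption, since the text does not state one; it is chosen here because it reproduces the values 9, 6 and 16 of the Fig. 1 example.

```python
import math
from typing import Dict


def split_replicas(storing_nodes: Dict[str, int]) -> Dict[str, int]:
    """Split r(v) = floor(sqrt(S_aggr)) replicas proportionally to |S_k|.

    storing_nodes maps each tree (the H-sensor's own tree and its
    neighboring trees) to its number of storing nodes |S_k|.
    """
    s_aggr = sum(storing_nodes.values())
    r_v = math.isqrt(s_aggr)
    # Integer form of floor(|S_k| / S_aggr * r(v) + 1/2), i.e., round half up.
    return {tree: (2 * size * r_v + s_aggr) // (2 * s_aggr)
            for tree, size in storing_nodes.items()}


replicas = split_replicas({"A": 500, "B": 300, "C": 200})
```

    Integer arithmetic is used so that the half-way case for tree A (15.5) is rounded deterministically rather than depending on floating-point behavior.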

    Finally, H-sensor h calls ForwardData (Algorithm 4) $r_h(v)$ times. The propagation by the H-sensor node is done according to the importance factor of its left and right subtrees and also to its own importance factor. To illustrate the ForwardData operation, in Fig. 2 a node h receives a data packet and needs to decide whether the packet should be stored locally or forwarded to the left or right subtree. Thus, node h sums the value of its own importance factor (I(h) = 1) and the values of the importance factors of the left ($I_l(h) = 12$) and right ($I_r(h) = 18$) subtrees. The total sum is equal to 31. Then, node h picks uniformly at random a value x in the interval [0,31]. If x < 12, then node h forwards the data to the left subtree. If $12 \le x \le 13$, then node h stores the packet. Otherwise, it forwards the packet to the right subtree.

    Fig. 2. Forward data example. [Figure: node h with I(h) = 1, I_l(h) = 12 and I_r(h) = 18.]

    4. Performance analysis

    In order to assess the proposed protocol, we performed a series of simulations. As stated earlier, ProFlex operates under a heterogeneous WSN made up of two kinds of sensor nodes, H-sensors and L-sensors. Also, H-sensor nodes have an increased communication capacity when compared with L-sensor nodes. Therefore, the first step of our performance analysis was to find out what should be the number of H-sensor nodes ($n_H$) in the network and their communication radius ($r_2$) in order to achieve a good trade-off between data gathering efficiency and message overhead.

    Thereafter, ProFlex was compared with the storage protocols described in Section 2.1, namely Supple [11], RaWMS [5], and Deep [33], under different simulation scenarios. Finally, we proposed and evaluated an improvement on ProFlex to decrease even further the protocol's overhead.

    All simulations were performed using the Sinalgo [16] simulator, version 0.75.3. Hereafter, all results are the arithmetic mean of a number of simulations necessary to achieve a confidence interval of 95%. Each simulation comprises a dissemination phase, where each node produces a single data packet, and a gathering phase. In the first phase, each node executes a data storage protocol for performing data distribution over the network, according to each considered storage protocol. Then, in the gathering phase, a mobile sink node collects data from the storing nodes. In particular, the sink performs as many visits as necessary to get a representative amount of data from the network, i.e., to get n different entries of storing nodes' views. The main adopted parameters are shown in Table 1. Notice that some of the presented parameters are exclusive to some of the literature's protocols.

    Table 1. Simulation parameters.

    Parameter                              Value
    Number of sensor nodes (nL + nH)       1000
    Sensor field                           800 x 800 m^2
    Network density                        20
    L-sensor communication range           60 m
    Importance factor I(i)                 1
    |v| (Deep, RaWMS and Supple)           31
    r(v) (RaWMS and Supple)                31
    TTL (RaWMS)                            125
    b (Deep)                               5.4

    4.1. ProFlex assessment

    A question someone using ProFlex might ask is how many H-sensor nodes (nH) should be employed in the network and what the communication radius (r2) between them should be. Moreover, what is the impact of these two parameters on ProFlex's performance? Clearly, as stated in Section 2.2, the number of H-sensor nodes (nH) should be much lower than the number of L-sensor nodes (nL), or the cost to deploy the network will end up too high. Furthermore, the communication radius between H-sensor nodes should not exceed the technological limits imposed by current wireless communication radio interfaces.

    With these issues in mind, for a given number of H-sensor nodes (nH) and a given communication radius (r2) between them, we evaluated ProFlex's data collection efficiency and message overhead. Data collection efficiency is measured as follows: after the distribution of data to the network, a mobile sink placed at a random position visits the node at this position and then chooses the next position to visit, as described in [15]. When visiting a node, the mobile sink gathers all data stored in the partial view of this node. This procedure continues until all nodes have been visited. The protocol that collects 100% of the data by visiting the lowest number of nodes is the protocol with the best data gathering efficiency. By message overhead, we mean the total number of transmitted messages necessary to distribute the network data to the selected storage nodes.
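    The collection-efficiency metric described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the `views` mapping (node id to the set of data items in its partial view) and the uniformly random visiting order are hypothetical, whereas the paper selects the next position as described in [15].

```python
import random


def nodes_visited_to_collect(views, all_items, target=1.0, seed=0):
    """Count the nodes a mobile sink visits before it holds `target` of all items.

    views     -- dict: node id -> set of data items stored in that node's partial view
    all_items -- set of every data item produced in the network
    target    -- fraction of the data to collect (1.0 means 100%)
    Returns None if the target is never reached.
    """
    rng = random.Random(seed)
    order = list(views)
    rng.shuffle(order)  # assumption: random visiting order, not the strategy of [15]
    needed = target * len(all_items)
    collected = set()
    for visited, node in enumerate(order, start=1):
        collected |= views[node] & all_items
        if len(collected) >= needed:
            return visited
    return None


if __name__ == "__main__":
    toy_views = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}
    print(nodes_visited_to_collect(toy_views, {1, 2, 3, 4}, target=1.0))
```

    A lower return value means a better data gathering efficiency; the same function with `target=0.9` corresponds to the 90% threshold used on the x-axis of Fig. 3.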

    Fig. 3 shows the collection efficiency and the message overhead for different values of nH (5, 10, 15, 20) and r2 (120, 180, 240, 300, 360, 420, 480). It is worth noticing that the values chosen for nH satisfy the condition nL ≫ nH, i.e., for a network with 1000 nodes, at most 2% of the nodes are H-sensor nodes. In the figure, each curve represents a value for nH and each marked point represents a value for r2. The x-axis accounts for the number of nodes the mobile sink must visit in order to collect 90% of the network data, while the y-axis accounts for the total number of messages transmitted in order to distribute the network data to the storing nodes. For instance, the triangle point of the dotted green line represents ProFlex with 5 H-sensor nodes and a communication radius between them equal to 480 m. With this configuration, the mobile sink needs to visit about 100 nodes in order to collect 90% of the network data, and ProFlex transmits almost 240,000 messages in order to distribute the network data to the storage nodes. The figure makes it clear that there is a trade-off between data collection efficiency and overhead, i.e., for a better collection efficiency, it is necessary to transmit more messages during the data distribution.

    Fig. 3. Trade-off between data gathering efficiency and message overhead.

    4.1.1. Varying the number of H-sensor nodes

    Looking at nH in isolation, we conclude the following. By increasing nH, we decrease the data collection efficiency (it becomes worse) and also decrease the protocol's overhead. For instance, keeping r2 = 120 m, when nH = 5, ProFlex has a collection efficiency of about 170 nodes and an overhead of almost 190,000 messages. But when nH = 20, ProFlex's collection efficiency becomes much worse, about 420 nodes, and the overhead decreases to almost 70,000 messages. The explanation for such behavior is subtle and is related to the way ProFlex computes the number of replicas r(v) for each data packet (cf. Section 3.3). For instance, consider the network in Fig. 1. We know that $r_A(v) = \sqrt{1000} \approx 31$, $r_B(v) = \sqrt{800} \approx 28$, and $r_C(v) = \sqrt{700} \approx 26$. Hence, the H-sensor nodes replicate $r_A \times |S_A| = 15{,}500$, $r_B \times |S_B| = 8400$, and $r_C \times |S_C| = 5200$ data packets, respectively, and the total number of replicas in the network is equal to 15,500 + 8400 + 5200 = 29,100 data replicas. Now, imagine the network in Fig. 1 has only the H-sensor node A, with $|S_A| = 1000$. Hence, $r_A(v) = \sqrt{1000} \approx 31$, and $r_A \times |S_A| = 31 \times 1000 = 31{,}000$ data replicas, or 1900 more data replicas than in the former case, showing that an increase in nH results in a decrease of the overhead. Finally, as stated earlier, with a decrease of the overhead there is also a decrease of the data collection efficiency.

    4.1.2. Varying communication radius between H-sensor nodes

    Now, looking at r2 in isolation, we conclude the following. By increasing r2, we improve the data collection efficiency and also increase the protocol's overhead. This behavior is due to the fact that, with a greater r2, data packets can be disseminated farther away from their origin point, enabling data dissemination to more spots in the network and thus improving the data gathering efficiency. But, in order to reach more spots in the network, each packet must be relayed more times until it reaches its final destination, thus increasing ProFlex's message overhead.

    Based on the above results, it becomes clear that a network designer using ProFlex must decide whether the main requirement is a good data collection efficiency or a low overhead, i.e., achieving a competitive data collection efficiency at the cost of a high overhead, or achieving a very low overhead at the cost of a worse data collection efficiency. Considering the nature of WSN applications, we argue that in most cases reducing the message overhead as much as possible is the prevailing concern. Furthermore, a not-so-good data collection efficiency can be easily circumvented, for instance, by employing more than one mobile sink. Hence, since our focus is on reducing ProFlex's overhead while still achieving a competitive data collection efficiency, hereafter all simulations were performed using a heterogeneous network composed of nL = 990 L-sensor nodes and nH = 10 H-sensor nodes, with the communication radius between H-sensor nodes equal to r2 = 360 m.

    4.2. ProFlex vs. literature protocols

    This section evaluates the behavior of ProFlex compared with existing protocols in the literature, more specifically, Supple [11], Deep [33] and RaWMS [5]. Those protocols were chosen due to their similarities with ProFlex or due to their first-class performance. It is worth noting that, in our evaluation, the literature protocols also operate on a heterogeneous WSN with the same parameters (nH and r2) used by ProFlex, even though they were not designed for this kind of network. This fact has no negative effect on the evaluated protocols. On the contrary, it leads to a fair comparison with ProFlex.

    4.2.1. Data gathering efficiency

    Fig. 4 shows the data gathering efficiency for all protocols in a scenario with no message loss or node failure. As can be observed, ProFlex has a slightly worse data gathering efficiency than the literature protocols. This result can be attributed to the fact that ProFlex transmits fewer messages than the other protocols (as will be shown later in this section) and, consequently, there are fewer data replicas throughout the network. Moreover, contrary to the literature protocols, a data packet produced in the tree of one H-sensor node is only replicated on its own tree and on its neighboring trees. This, together with the fact that not every H-sensor node is connected to every other one, limits the regions in the network where a data packet will be replicated, i.e., where the sink can find it. This is shown in Fig. 5. Here, we divided the network sensor field into 16 cells of equal size and computed, for each data packet, the number of cells in which it was found by the mobile sink. As can be observed, the literature protocols disseminate most of the data to about 14 cells in the network (i.e., 87.5% of the network cells), while in ProFlex the number of cells in which a data packet can be found is about 8 (i.e., 50% of the network cells). Such a difference clearly has an impact on the data gathering efficiency of ProFlex. However, as discussed earlier, the difference between ProFlex's data gathering efficiency and that of existing protocols was anticipated and actually intentional, since our main focus here was to keep ProFlex's overhead as low as possible. If the data gathering efficiency becomes an issue, a possible alternative is to employ more than one mobile sink to perform the data gathering or, at the expense of energy, to increase the number of H-sensor nodes to be deployed (as discussed in the previous section).

    Fig. 4. Data gathering efficiency for a scenario with no message loss nor node failure. (Axes: visited nodes vs. collected data (%); curves: Deep, ProFlex, RaWMS, Supple.)

    Fig. 5. Data dissemination efficiency for a scenario with no message loss nor node failure. (Axes: number of storing cells vs. percentage of data; curves: Deep, ProFlex, RaWMS, Supple.)

    4.2.2. Loss and failure robustness

    Fig. 6 shows the data gathering efficiency under a scenario with message loss. The protocol that had presented the best results so far, Supple, does not perform as well under a message loss scenario as it did under a reliable scenario, but it can still recover 100% of the data. ProFlex still presents a worse data gathering efficiency than Deep and RaWMS, but performs better than Supple. In fact, when the message loss probability increases, the data gathering efficiency of ProFlex, Deep and RaWMS is almost unaffected. Hence, we can conclude that message loss does not have as much effect on ProFlex, Deep and RaWMS as it does on Supple.

    Under a scenario with node failure the results change slightly. As shown in Fig. 7, both ProFlex and Supple are affected by node failures, whereas Deep and RaWMS are practically unaffected. This result can be attributed to the fact that both ProFlex and Supple use a tree for data dissemination, while Deep and RaWMS do not rely on any kind of predefined replication structure, i.e., RaWMS relies on random walks (RW) and Deep uses gossip-based communication. Thus, when a node in the tree fails, all of its children are also compromised, preventing all of them from receiving any data packet. However, such a drawback can be easily circumvented by using some kind of salvation mechanism like the one employed by RaWMS, i.e., when a node forwards a message in the RW and does not receive an acknowledgement, it forwards the message to another neighbor. Finally, it is worth noticing that, although ProFlex's data gathering efficiency is still worse than that of Deep and RaWMS under loss and failure scenarios, ProFlex does not lose data, i.e., the mobile sink is able to gather 100% of the network data. Hence, the same alternatives to improve ProFlex's data gathering efficiency under a reliable scenario may also be employed under loss and failure ones.

    Fig. 6. Data gathering efficiency for a scenario with message loss. (Axes: visited nodes vs. collected data (%); curves: Deep, ProFlex, RaWMS, Supple.)

    4.2.3. Energy hole vulnerability

    A well-known problem in WSNs is the energy hole problem [19,20,36], in which nodes closer to the sink tend to consume their energy resources faster than other nodes, since they have to route packets from all other nodes in the network. In ProFlex and Supple (Deep and RaWMS do not use trees for data dissemination), when a node has data to distribute to the network, it first sends the data to the root of the tree, and only then does the root node distribute the data to the network. Hence, nodes closer to the root node tend to route more packets than other nodes, resulting in the energy hole problem (see Fig. 7).
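    The cell-based dissemination measure used for Fig. 5 in Section 4.2.1 (a field split into 16 equal cells, counting the cells in which a packet is stored) can be sketched as follows. This is a minimal illustration: the function name, the coordinate convention and the 4 × 4 layout of the 16 cells over the 800 × 800 m² field are assumptions, not details taken from the paper.

```python
def storing_cells(replica_positions, field_size=800.0, grid=4):
    """Count the distinct grid cells that hold at least one replica of a packet.

    replica_positions -- iterable of (x, y) coordinates of the nodes storing the packet
    field_size        -- side of the square sensor field, in metres
    grid              -- cells per side (4 gives the 16 cells mentioned in the text)
    """
    cell = field_size / grid
    cells = set()
    for x, y in replica_positions:
        # Clamp so that a node lying exactly on the far border falls in the last cell.
        col = min(int(x // cell), grid - 1)
        row = min(int(y // cell), grid - 1)
        cells.add((row, col))
    return len(cells)


if __name__ == "__main__":
    positions = [(10, 10), (20, 30), (790, 790), (800, 800)]
    count = storing_cells(positions)
    print(count, f"{100 * count / 16:.1f}% of the cells")
```

    With this definition, a packet found in 14 of the 16 cells covers 87.5% of the field and one found in 8 cells covers 50%, which are the two figures quoted for the literature protocols and for ProFlex.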

    In order to evaluate the energy consumed by nodes according to their depth, Fig. 8 shows the average number of data packets sent by a node as a function of its depth in the binary tree, in a scenario with no message loss or node failure. As expected, the closer a node is to the root, the more messages it has to send. However, in ProFlex, each H-sensor node is also a root node and, consequently, more trees for data distribution are created in the network. The overhead is thus distributed among the trees, alleviating the impact of the energy hole problem. Although Supple also operates on a heterogeneous infrastructure, it is not tailored to take advantage of the features provided by this kind of network, as opposed to ProFlex. For instance, when the node's depth is 1, ProFlex sends 91% fewer data packets than Supple. Therefore, we can conclude that increasing the number of H-sensor nodes decreases the chances of the energy hole problem.
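    The intuition that several smaller trees relieve the nodes next to the root can be illustrated with a toy model. The sketch below is not the simulation behind Fig. 8: it assumes complete binary trees stored in heap order, one packet generated per node, and that a node forwards its own packet plus those of all its descendants towards the root. All names are hypothetical.

```python
def subtree_size(index, n):
    """Number of nodes in the subtree rooted at `index` (1-based heap order, n nodes)."""
    size, left, right = 0, index, index
    while left <= n:
        size += min(right, n) - left + 1  # nodes of this subtree on the current level
        left, right = 2 * left, 2 * right + 1
    return size


def mean_load_at_depth(n, depth):
    """Average number of packets sent towards the root by the nodes at `depth`."""
    first = 2 ** depth
    if first > n:
        raise ValueError("the tree has no node at this depth")
    last = min(2 ** (depth + 1) - 1, n)
    loads = [subtree_size(i, n) for i in range(first, last + 1)]
    return sum(loads) / len(loads)


if __name__ == "__main__":
    single_tree = mean_load_at_depth(1000, 1)  # one root for all 1000 nodes
    ten_trees = mean_load_at_depth(100, 1)     # assumption: 10 equal trees of 100 nodes
    print(single_tree, ten_trees)
```

    In this simplified setting a depth-1 node of a 100-node tree relays roughly a tenth of what a depth-1 node of a single 1000-node tree relays, which points in the same direction as the measured result without reproducing its exact value.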

    Fig. 7. Data gathering efficiency for a scenario with node failure. (Axes: visited nodes vs. collected data (%); curves: Deep, ProFlex, RaWMS, Supple.)

    Fig. 8. Average number of messages sent by a node as a function of its depth in the tree. (Axes: node depth vs. messages sent; curves: ProFlex, Supple.)

    4.2.4. Total communication overhead

    Table 2 shows the number of transmitted messages for all evaluated protocols. As can be observed, besides mitigating the energy hole problem, ProFlex is the protocol with the lowest overhead. For instance, ProFlex transmits about 95% fewer messages than RaWMS. When compared with Supple and Deep, the decrease is about 47% and 53%, respectively. Assuming that communication is the main activity responsible for energy consumption in WSNs [7], these results are a strong indication that ProFlex will incur the lowest energy consumption for sensor nodes, thus increasing the network lifetime.

    Table 2
    Protocols' overhead.

    Protocol      Total messages transmitted
    ProFlex       163390
    Supple        310934
    Deep          348046
    RaWMS         3587087

    4.3. Improving ProFlex

    The results presented so far show that, when compared with the literature protocols, ProFlex does not suffer from the energy hole problem as much as Supple, it is the protocol with the lowest overhead, its data gathering efficiency is competitive and can be easily managed by a network operator when deploying the H-sensor nodes (such flexibility is not available in the literature protocols), and, finally, it performs well under scenarios with message loss and node failure. However, there is one characteristic inherent to WSNs that, to the best of our knowledge, has not been leveraged by any existing state-of-the-art protocol and that can decrease the overhead of ProFlex even further if exploited. Due to the nature of WSN applications and their highly dense deployments, the readings of sensor nodes can have a high degree of correlation and redundancy [2,24,25,35]. Therefore, before starting the data distribution to storing nodes, it is possible to apply some kind of summarization function to correlated data packets and only then start the data distribution.

    To accomplish this, a small modification to Algorithm 3 is necessary. When a sensor node produces a data packet, it forwards the data to the root of the tree just like before. But when an H-sensor node receives a packet from its children, it does not replicate the packet immediately; rather, it buffers the received packet locally and waits for a period t. After t expires, the H-sensor node summarizes the correlated packets that are locally buffered into a single packet and replicates it onto its own tree and neighboring trees just like before. Here, we assume that a collection of buffered data packets is correlated if the distance between the nodes that produced these packets is smaller than or equal to a predefined value d. Hence, when a node produces a data packet, it inserts its position into the packet's header. Thus, it is assumed that every sensor node knows its position. For instance, Fig. 12 shows a network comprised of four nodes. Assuming d = 10, when an H-sensor node receives the data packets from these nodes, it will figure out that only the data packets belonging to nodes A, B, and C are correlated, since the distance between them is lower than or equal to d. Then, the H-sensor node summarizes the readings of nodes A, B and C using a summarization function such as the average, builds a packet with the summarized value and replicates it to the network. Since the data from node D is not correlated with that of any other node, its value is replicated unchanged. This strategy reduces the total number of transmitted messages, and the question is by how much.

    Fig. 9. Data gathering efficiency and data dissemination efficiency for some versions of ProFlex under a reliable scenario. (Panel axes: visited nodes vs. collected data (%); number of storing cells vs. percentage of data. Curves: ProFlex; ProFlex d = 10, t = 1; ProFlex d = 10, t = 4; ProFlex d = 30, t = 1; ProFlex d = 30, t = 4.)

    Table 3
    Overhead of some versions of ProFlex.

    Protocol                    Total messages transmitted
    ProFlex                     163390
    ProFlex d = 10, t = 1       161847
    ProFlex d = 10, t = 4       157786
    ProFlex d = 30, t = 1       151000
    ProFlex d = 30, t = 4       127672
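    The buffering-and-summarization step performed by an H-sensor node once the period t has expired could look roughly like the sketch below. It rests on several assumptions that the text leaves open: packets are grouped greedily, a packet joins a group only if it lies within d of every member already in it, and the summarization function is the average. The coordinates and readings in the example are hypothetical and do not reproduce the distances drawn in Fig. 12.

```python
import math
from statistics import mean


def summarize(buffered, d):
    """Merge correlated packets buffered during one period t.

    buffered -- list of (position, reading) pairs, position being an (x, y) tuple
                taken from the packet header
    d        -- maximum distance between producers for packets to count as correlated
    Returns one summarized reading per group of correlated packets.
    """
    groups = []
    for position, reading in buffered:
        for group in groups:
            # Assumption: every pair inside a group must be within distance d.
            if all(math.dist(position, other) <= d for other, _ in group):
                group.append((position, reading))
                break
        else:
            groups.append([(position, reading)])
    return [mean(reading for _, reading in group) for group in groups]


if __name__ == "__main__":
    packets = [
        ((0, 0), 20),   # node A
        ((6, 0), 22),   # node B
        ((3, 4), 24),   # node C
        ((40, 0), 30),  # node D, far from the others
    ]
    print(summarize(packets, d=10))
```

    In this example the three nearby packets collapse into a single averaged packet while the distant one is forwarded unchanged, so the H-sensor node replicates two packets instead of four.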

    Table 3 shows the overhead of ProFlex and of four versions of ProFlex with data summarization. This result shows that, by controlling the parameters t and d, it is possible to reduce ProFlex's overhead even further. For instance, when d = 30 and t = 4, the overhead is reduced by 21%. Moreover, Fig. 9a shows that, even though they transmit fewer messages than the original ProFlex, the versions with data summarization still possess a data gathering efficiency comparable to that of the original version. This result is somewhat surprising, since the decrease in the number of transmitted messages and the summarization itself were expected to make the data dissemination slightly worse, which did not prove to be the case, as can be observed in Fig. 9b.
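    The percentages quoted for Tables 2 and 3 follow directly from the message counts. The short check below uses only the totals listed in those tables; the helper name is hypothetical.

```python
def reduction_percent(baseline, new):
    """Relative decrease of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


# Totals from Table 2 (ProFlex versus the literature protocols).
PROFLEX = 163390
TABLE_2 = {"Supple": 310934, "Deep": 348046, "RaWMS": 3587087}

# Total from Table 3 for the version with d = 30, t = 4.
PROFLEX_D30_T4 = 127672

if __name__ == "__main__":
    for name, total in TABLE_2.items():
        print(f"ProFlex vs. {name}: {reduction_percent(total, PROFLEX):.1f}% fewer messages")
    print(f"Summarization (d = 30, t = 4): "
          f"{reduction_percent(PROFLEX, PROFLEX_D30_T4):.1f}% fewer messages")
```

    The results agree with the rounded values in the text: about 47%, 53% and 95% with respect to Supple, Deep and RaWMS, and a little over 21% for the summarization extension.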

    Figs. 10 and 11 show the performance of ProFlex and its versions with data summarization in scenarios with message loss and node failures, respectively. These results show that the new versions of ProFlex perform exactly the same as the original version. When a summarized packet is lost, it is not the data of a single node that is lost, but rather the information of all nodes that were considered to be correlated with each other. However, the replication mechanism ensures that, even if a packet is lost, it will still be possible to recover this same packet from some other part of the network.

    Fig. 10. Data gathering efficiency for some versions of ProFlex under a scenario with message loss. (Axes: visited nodes vs. collected data (%); curves: ProFlex; ProFlex d = 10, t = 1; ProFlex d = 10, t = 4; ProFlex d = 30, t = 1; ProFlex d = 30, t = 4.)

    Fig. 11. Data gathering efficiency for some versions of ProFlex under a scenario with node failure. (Axes: visited nodes vs. collected data (%).)

    Fig. 12. Example of a network with correlated data. (Nodes A, B, C and D; distance labels 10 m, 8 m, 5 m and 35 m.)

    From the aforementioned results, we conclude that, by employing data summarization, ProFlex's overhead is reduced without affecting its data gathering efficiency. In fact, by choosing the right values for nH, r2 (cf. Section 4.1), d, and t, it is possible to come up with a version of ProFlex with a data gathering efficiency comparable to that of existing protocols, and still with a lower overhead. When using nH = 5 and r2 = 480 m, we showed in Fig. 3 that ProFlex presents a high data gathering efficiency at the cost of a high overhead. Nevertheless, if the data summarization extension is applied to this configuration, for instance with d = 30 m and t = 4 s, a lower overhead can be obtained. Fig. 13 shows the data gathering efficiency for this version of ProFlex (nH = 5, r2 = 480 m, d = 30 m and t = 4 s) compared with existing protocols, for a scenario with no message loss or node failure, and Table 4 shows the resulting overhead. As can be observed, ProFlex now presents a comparable data gathering efficiency and sends about 43% fewer messages than Supple, the existing protocol with the lowest overhead.

    Fig. 13. Data gathering efficiency for a scenario with no message loss nor node failure and a network with nH = 5 and r2 = 480 m. (Axes: visited nodes vs. collected data (%); curves: Deep, ProFlex d = 30, t = 4, RaWMS, Supple.)

    Table 4
    Overhead in a network with nH = 5 and r2 = 480 m.

    Protocol                    Total messages transmitted
    ProFlex d = 30, t = 4       178975
    Supple                      316578
    Deep                        349197
    RaWMS                       3574989

    5. Conclusion and future work

    This work presents ProFlex, a distributed data storage protocol for heterogeneous WSNs. We showed that the use of a heterogeneous infrastructure reduces the occurrence of hot spots in the network. For instance, nodes closer to the root of the trees relay about 91% fewer messages when compared with Supple. Moreover, ProFlex is the protocol with the lowest overhead when compared with state-of-the-art protocols and is still able to achieve a competitive data gathering efficiency. For instance, ProFlex transmits about 95% fewer messages than RaWMS and 47% fewer messages than Supple. Additionally, under scenarios with message loss, ProFlex performs much better than Supple; under scenarios with node failures, however, their performance is comparable. We also proposed an improvement to ProFlex that leverages the inherent data correlation of WSN applications. This improvement was capable of reducing ProFlex's overhead by about 21% while maintaining the data gathering efficiency of the original version. When compared with existing protocols, a version of ProFlex with the proposed improvement achieved a similar data gathering efficiency while sending about 43% fewer messages.

    As future work, it would be interesting to assess the data distribution efficiency under different topologies, propose salvation mechanisms for failure scenarios, investigate the impact of other information summarization techniques and propose a model that gives the optimum number of H-sensor nodes and their communication range for a given network configuration.

    References

    [1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, Wireless sensornetworks: a survey, Computer Networks 38 (4) (2002) 393422.

    [2] I.F. Akyildiz, M.C. Vuran, O.B. Akan, On exploiting spatial andtemporal correlation in wireless sensor networks, in: Modeling andOptimization in Mobile, Ad Hoc and Wireless Networks, 2004, pp.7180.

    [3] G. Anastasi, M. Conti, M. Di Francesco, Data collection in sensornetworks with data mules: an integrated simulation analysis, in:IEEE Symposium on Computers and Communications, 2008, pp.10961102.

    [4] A.L.L. Aquino, E.F. Nakamura, Data centric sensor stream reductionfor real-time applications in wireless sensor networks, Sensors 9 (12)(2009) 96669688.

    [5] Z. Bar-Yossef, R. Friedman, G. Kliot, RaWMS random walk basedlightweight membership service for wireless ad hoc networks, ACMTransactions on Computer Systems 26 (2008) 5:15:66.

    [6] S. Basagni, A. Carosi, E. Melachrinoudis, C. Petrioli, Z.M. Wang,Controlled sink mobility for prolonging wireless sensor networkslifetime, Wireless Networks 14 (2008) 831858.

    [7] A. Boukerche, X. Cheng, J. Linus, Energy-aware data-centric routingin microsensor networks, in: 6th ACM international workshop onModeling analysis and simulation of wireless and mobile systems,MSWIM, 2003, pp. 4249.

    [8] A. Boukerche, Performance evaluation of routing protocols for ad hocwireless networks, Mobile Networks and Applications 4 (2004) 333342.

    [9] [9] A. Boukerche, Handbook of Algorithms for Wireless Networkingand Mobile Computing, Chapman & Hall, CRC, 2005.

    [10] A. Boukerche, Algorithms and Protocols for Wireless SensorNetworks, Wiley-IEEE Press, 2008.

    [11] C. Viana, T. Herault, T. Largillier, S. Peyronnet, F. Zadi, Supple: aexible probabilistic data dissemination protocol for wireless sensornetworks, in: 13th ACM International Conference on Modeling,analysis, and simulation of wireless and mobile systems, 2010, pp.385392.

    [12] D. Cavalcanti, D. Agrawal, J. Kelner, D. Sadok, Exploiting the small-world effect to increase connectivity in wireless ad hoc networks, in:11th IEEE International Conference on Telecommunications, 2004.

    [13] I. Chatzigiannakis, A. Kinalis, S. Nikoletseas, Efcient datapropagation strategies in wireless sensor networks using a singlemobile sink, Computer Communications 31 (2008) 896914.

    [14] J. Eriksson, M. Faloutsos, S. Krishnamurthy, Scalable ad hoc routing:the case for dynamic addressing, in: 23rd Annual Joint Conference ofthe IEEE Computer and Communications Societies, vol. 2. 2010, pp.11081119.

    [15] R. Friedman, G. Kliot, C. Avin, Probabilistic quorum systems inwireless ad hoc networks, in: 38th IEEE International Conference onDependable Systems and Networks, 2008, pp. 277286.

    [16] E.D.C.Group, Sinalgo Simulator forNetworkAlgorithms, 2008. .

    [17] E.B. Hamida, G. Chelius, A line-based data dissemination protocol forwireless sensor networks with mobile sink, in: IEEE InternationalConference on Communications, 2008, pp. 22012205.

    [18] A. Helmy, Small worlds in wireless networks, CommunicationsLetters, IEEE 7 (10) (2003) 490492.

    [19] J. Li, P. Mohapatra, Analytical modeling and mitigation techniquesfor the energy hole problem in sensor networks, Pervasive MobileComputing 3 (2007) 233254.

    [20] A.-F. Liu, X.-Y. Wu, Z.-G. Chen, W.-H. Gui, Research on the energyhole problem based on unequal cluster-radius for wireless sensornetworks, Computer Communications 33 (3) (2010) 302321.

    [21] H. Luo, F. Ye, J. Cheng, S. Lu, L. Zhang, Ttdd: Two-tier datadissemination in large-scale wireless sensor networks, WirelessNetworks 11 (2005) 161175.

    [22] J. Luo, J. Panchard, M. Pirkowski, M. Grossglauser, J. Pierre Hubaux,Mobiroute: routing towards a mobile sink for improving lifetime insensor networks, in: Sensor Networks in the 2nd IEEE/ACM DCOSS,2006, pp. 480497.

    [23] F. Marcelloni, M. Vecchio, A simple algorithm for data compressionin wireless sensor networks, IEEE Communications Letters 12 (6)(2008) 411413.

    [24] E.F. Nakamura, A.A.F. Loureiro, A.C. Frery, Information fusion forwireless sensor networks: methods, models, and classications,ACM Computer Surveys (2007) 39.

    [25] S. Pattem, B. Krishnamachari, R. Govindan, The impact of spatialcorrelation on routing with compression in wireless sensor

  • networks, ACM Transactions on Sensor Networks 4 (2008) 24:124:33.

    Aline C. Viana is a permanent INRIA ResearchScientist. Before joining the INRIA in October2006, she was a Post-Doc in the PARIS team atthe IRISA/INRIA Rennes. She spent three and ahalf years at the LIP6 laboratory of the Uni-versit Pierre et Marie Curie Sorbonne Univer-sits, France, fromwhere she received her Ph.D.degree in July 2005. She received the Bache-lors Degree in Computer Science (1998) andthe M.Sc. Degree in Electrical Engineering(2000) from the Federal University of Goias(UFG). Her research interests include: wire-

    1602 G. Maia et al. / Ad Hoc Networks 11 (2013) 15881602with mobile sinks: Sparsely deployed sensors, IEEE Transactions onVehicular Technology 56 (4) (2007) 18261836.

    [32] R. Urgaonkar, B. Krishnamachari, Flow: an efcient forwardingscheme to mobile sink in wireless sensor networks, in: Poster ofIEEE SECON, 2004.

    [33] M. Vecchio, A.C. Viana, A. Ziviani, R. Friedman, Deep: density-basedproactive data dissemination protocol for wireless sensor networkswith uncontrolled sink mobility, Elsevier Computer Communication33 (8) (2010).

    [34] A. Viana, A. Ziviani, R. Friedman, Decoupling data disseminationfrom mobile sinks trajectory in wireless sensor networks,Communications Letters, IEEE 13 (3) (2009) 178180.

    [35] M.C. Vuran, O.B. Akan, I.F. Akyildiz, Spatio-temporal correlation:theory and applications for wireless sensor networks, ComputerNetworks 45 (3) (2004) 245259.

    [36] X. Wu, G. Chen, S. Das, Avoiding energy holes in wireless sensornetworks with nonuniform node distribution, IEEE Transactions onParallel and Distributed Systems 19 (5) (2008) 710720.

    [37] H. Yang, F. Ye, B. Sikdar, Simple: Using swarm intelligencemethodology to design data acquisition protocol in sensornetworks with mobile sinks, in: INFOCOM 2006. 25th IEEEInternational Conference on Computer Communications.Proceedings, 2006, pp. 112.

    [38] J. Yick, B. Mukherjee, D. Ghosal, Wireless sensor network survey,Computer Networks 52 (2008) 22922330.

    Guilherme Maia is a PhD student in Com-puter Science at the Federal University ofMinas Gerais, Brazil. His research interestsinclude distributed algorithms, wireless adhoc and sensor networks. In addition, he haspublished several papers in the area of wire-less sensor networks.

    Daniel L. Guidoni is a Professor at the FederalUniversity of Sao Joao Del Rei, Brazil. Hereceived his Ph.D. in Computer Science fromthe Federal University of Minas Gerais, Brazil,in 2011. His research interests include dis-tributed algorithms, wireless ad hoc and sen-sor networks. In addition, he has publishedseveral papers in the area of wireless sensornetworks.[26] R.W. Pazzi, A. Boukerche, Mobile data collector strategy for delay-sensitive applications over wireless sensor networks, ComputerCommunications 31 (5) (2008) 10281039.

    [27] G. Sharma, R. Mazumdar, A case for hybrid sensor networks, IEEE/ACM Transactions on Networking 16 (5) (2008) 11211132.

    [28] B. Sheng, Q. Li, W. Mao, Data storage placement in sensor networks,in: 7th ACM International symposium on Mobile Ad Hoc Networkingand Computing, 2006, pp. 344355.

    [29] B. Sheng, C.C. Tan, Q. Li, W. Mao, An approximation algorithm fordata storage placement in sensor networks, in: InternationalConference on Wireless Algorithms Systems and Applications,2007, pp. 7178.

    [30] S. Shenker, S. Ratnasamy, B. Karp, R. Govindan, D. Estrin, Data-centricstorage in sensornets, SIGCOMM Computer Communication Review33 (2003) 137142.

    [31] L. Song, D. Hatzinakos, Architecture of wireless sensor networks

    Antonio A.F. Loureiro received his B.Sc. and M.Sc. degrees in computer science from the Federal University of Minas Gerais (UFMG), Brazil, and the Ph.D. degree in computer science from the University of British Columbia, Canada. Currently, he is a full professor of computer science at UFMG, where he leads the research group in wireless sensor networks. His main research areas are wireless sensor networks, mobile computing, and distributed algorithms. In the last 10 years he has published regularly in international conferences and journals related to those areas, and also presented tutorials at international conferences.

    Raquel A.F. Mini holds a B.Sc., M.Sc., and Ph.D. in Computer Science from Federal University of Minas Gerais (UFMG), Brazil. Currently she is an Associate Professor of Computer Science at PUC Minas, Brazil. She has worked for 9 years in the protocol design for wireless sensor networks with more than 30 papers published in this area. In the last three years, she presented two tutorials about energy in wireless sensor networks in international conferences. Her main research areas are sensor networks, mobile computing, and ubiquitous computing.

    less self-organizing and adaptive networks; data dissemination and collection protocols; epidemic and gossip algorithms; opportunistic forwarding protocols; social wireless networks; and network coding.

    Andre L.L. Aquino is a Professor at the Federal University of Alagoas, Brazil. He received his Ph.D. in Computer Science from the Federal University of Minas Gerais, Brazil, in 2008. His research interests include data reduction, distributed algorithms, wireless ad hoc and sensor networks, mobile and pervasive computing. In addition, he has published several papers in the area of wireless sensor networks.

    A distributed data storage protocol for heterogeneous wireless sensor networks with mobile sinks

    1 Introduction
    2 Background
        2.1 Data storage protocols
        2.2 System model
            2.2.1 Nodes
            2.2.2 Communication
            2.2.3 Initial knowledge
    3 Proposed protocol
        3.1 Tree construction
        3.2 Importance factor distribution
        3.3 Data distribution
    4 Performance analysis
        4.1 ProFlex assessment
            4.1.1 Varying the number of H-sensor nodes
            4.1.2 Varying communication radius between H-sensor nodes
        4.2 ProFlex vs. literature protocols
            4.2.1 Data gathering efficiency
            4.2.2 Loss and failure robustness
            4.2.3 Energy hole vulnerability
            4.2.4 Total communication overhead
        4.3 Improving ProFlex
    5 Conclusion and future work
    References