IEEE SECON CoWPER Ð Toward A City-Wide Pervasive … · 2019. 1. 22. · IEEE SECON CoWPER Ð...

4
Optimal citizen-centric sensor placement for citywide environmental monitoring: A submodular approach Chenxi Sun, Victor O. K. Li, Jacqueline C.K. Lam The University of Hong Kong {cxsun, vli, jcklam}@eee.hku.hk Abstract—The general environmental monitoring problem refers to the task of placing sensors or stations to optimize certain objectives under budget constraints. Application scenarios include monitoring temperature, water contamination, air quality etc. As citizens are increasingly concerned about the surrounding environment, it is important to provide sufficient and accurate information to the public. In this study, we focus on the problem of optimal citizen-centric sensor placement, i.e, given a set of locations within the city, we aim at placing sensors or stations at locations that will benefit as many citizens as possible under budget constraints. We prove that the problem is NP-hard, yet the objective function has the nice monotone and submodular property. Then the efficient greedy algorithm and its variants can be adopted with a guaranteed approximation ratio of (1-1/e) for the unit cost case and 1 2 (1 -1/e) for the general cost case. Finally we demonstrate the effectiveness of the proposed approach by comparing with two baseline algorithms through a case study. Keywords—Sensor Placement, Submodular Maximization, En- vironmental Monitoring I. I NTRODUCTION The environmental monitoring problem refers to the task of placing sensors or stations to optimize certain objectives such as information obtained under budget constraints, with applications in monitoring temperature, water contamination, air quality etc. Take air quality as an example. As people are increasingly concerned about the surrounding air quality, it is important to provide sufficient and accurate information to the public, especially when the air is bad. Studies have shown that exercise outdoors when the air pollution level is high will cause adversarial effect on health [1]. Furthermore, the air quality information is pivotal for health studies. Generally, the environment information can be obtained either through low-cost mobile wireless sensors or official monitoring stations with sophisticated measurement equip- ments and calibration. In this study, we focus on deployment strategies for static monitoring stations as their measurements are more accurate and reliable. However, deploying static monitoring stations can be very costly, incurring not only construction cost but also operational cost. Hence only a limited number of stations can be placed under a total cost constraint. There have been much work on sensor placement for environment monitoring. For example, [2] studied the problem of selecting a most informative subset of correlated random variables under size constraints. [3] proposed a near optimal sensor placement strategy for temperature monitoring by maximizing mutual information under size constraint. [4] studied the wind monitoring problem. However, all the studies mentioned above aim to place sensors or stations at the most informative locations that are defined either in terms of entropy or mutual information. Furthermore, by modeling each location as a random variable, the proposed approach would require the covariance between any two random variables, usually through a strong Gaussian Process assumption on the underlying field. In this paper, we take a different perspective on the sensor placement problem. Specifically, we are interested in citizen- centric sensor placement for citywide monitoring: given a fixed cost constraint C, we aim to find the set of locations within the city that will bring maximum benefit to the citizens. The main contribution of this study are as follows: 1) we formulate the citizen-centric sensor placement prob- lem whose aim is to maximize human satisfaction on the deployment scheme and prove its NP-hardness. As far as we know, this is the first formulation of the sensor placement problem in the context of maximizing overall citizen satisfaction. 2) We prove the monotonicity and submodularity of the objective function, i.e., our problem is a submodular maximization optimization problem under general cost constraints, which can be solved with greedy algorithm and its variants with approximation guarantee. 3) We provide a case study in Hong Kong to demonstrate the effectiveness of our approach. The rest of the paper is organized as follows. In Section II we introduce the problem formulation. In Section III we describe the proposed approach. In Section IV we discuss how to incorporate other objectives to this formulation. In Section V we provide a case study in Hong Kong. We draw conclusion in Section VI. II. PROBLEM FORMULATION In this paper, we focus on the utility gain of placing sensors with respect to citizen satisfaction. Noticing that people will naturally rely on the observations from the nearest station to obtain environment information when there are a number of IEEE SECON CoWPER – Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China.

Transcript of IEEE SECON CoWPER Ð Toward A City-Wide Pervasive … · 2019. 1. 22. · IEEE SECON CoWPER Ð...

Page 1: IEEE SECON CoWPER Ð Toward A City-Wide Pervasive … · 2019. 1. 22. · IEEE SECON CoWPER Ð Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China. stations

Optimal citizen-centric sensor placement forcitywide environmental monitoring: A submodular

approachChenxi Sun, Victor O. K. Li, Jacqueline C.K. Lam

The University of Hong Kong{cxsun, vli, jcklam}@eee.hku.hk

Abstract—The general environmental monitoring problemrefers to the task of placing sensors or stations to optimizecertain objectives under budget constraints. Application scenariosinclude monitoring temperature, water contamination, air qualityetc. As citizens are increasingly concerned about the surroundingenvironment, it is important to provide sufficient and accurateinformation to the public. In this study, we focus on the problemof optimal citizen-centric sensor placement, i.e, given a set oflocations within the city, we aim at placing sensors or stationsat locations that will benefit as many citizens as possible underbudget constraints. We prove that the problem is NP-hard, yetthe objective function has the nice monotone and submodularproperty. Then the efficient greedy algorithm and its variants canbe adopted with a guaranteed approximation ratio of (1−1/e) forthe unit cost case and 1

2(1−1/e) for the general cost case. Finally

we demonstrate the effectiveness of the proposed approach bycomparing with two baseline algorithms through a case study.

Keywords—Sensor Placement, Submodular Maximization, En-vironmental Monitoring

I. INTRODUCTION

The environmental monitoring problem refers to the taskof placing sensors or stations to optimize certain objectivessuch as information obtained under budget constraints, withapplications in monitoring temperature, water contamination,air quality etc.

Take air quality as an example. As people are increasinglyconcerned about the surrounding air quality, it is importantto provide sufficient and accurate information to the public,especially when the air is bad. Studies have shown thatexercise outdoors when the air pollution level is high willcause adversarial effect on health [1]. Furthermore, the airquality information is pivotal for health studies.

Generally, the environment information can be obtainedeither through low-cost mobile wireless sensors or officialmonitoring stations with sophisticated measurement equip-ments and calibration. In this study, we focus on deploymentstrategies for static monitoring stations as their measurementsare more accurate and reliable. However, deploying staticmonitoring stations can be very costly, incurring not onlyconstruction cost but also operational cost. Hence only alimited number of stations can be placed under a total costconstraint. There have been much work on sensor placementfor environment monitoring. For example, [2] studied theproblem of selecting a most informative subset of correlated

random variables under size constraints. [3] proposed a nearoptimal sensor placement strategy for temperature monitoringby maximizing mutual information under size constraint. [4]studied the wind monitoring problem.

However, all the studies mentioned above aim to placesensors or stations at the most informative locations that aredefined either in terms of entropy or mutual information.Furthermore, by modeling each location as a random variable,the proposed approach would require the covariance betweenany two random variables, usually through a strong GaussianProcess assumption on the underlying field.

In this paper, we take a different perspective on the sensorplacement problem. Specifically, we are interested in citizen-centric sensor placement for citywide monitoring: given a fixedcost constraint C, we aim to find the set of locations withinthe city that will bring maximum benefit to the citizens.

The main contribution of this study are as follows:1) we formulate the citizen-centric sensor placement prob-

lem whose aim is to maximize human satisfaction onthe deployment scheme and prove its NP-hardness. Asfar as we know, this is the first formulation of the sensorplacement problem in the context of maximizing overallcitizen satisfaction.

2) We prove the monotonicity and submodularity of theobjective function, i.e., our problem is a submodularmaximization optimization problem under general costconstraints, which can be solved with greedy algorithmand its variants with approximation guarantee.

3) We provide a case study in Hong Kong to demonstratethe effectiveness of our approach.

The rest of the paper is organized as follows. In SectionII we introduce the problem formulation. In Section III wedescribe the proposed approach. In Section IV we discuss howto incorporate other objectives to this formulation. In SectionV we provide a case study in Hong Kong. We draw conclusionin Section VI.

II. PROBLEM FORMULATION

In this paper, we focus on the utility gain of placing sensorswith respect to citizen satisfaction. Noticing that people willnaturally rely on the observations from the nearest station toobtain environment information when there are a number of

IEEE SECON CoWPER – Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China.

Page 2: IEEE SECON CoWPER Ð Toward A City-Wide Pervasive … · 2019. 1. 22. · IEEE SECON CoWPER Ð Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China. stations

stations available within the city, we model an individual’ssatisfaction with the sensor placement scheme as a functionof his/her distance to the nearest sensor.

Following common practice in environmental monitoring[3], [5], we divide the city into discrete equal-sized squaregrids, in which sensors will be placed, with at most one sensorin each grid. Let V = {vi : i = 1, 2 . . . , n} denote the setof grids in the region of interest, where n = |V | is the totalnumber of grids. Let pi denote the percentage population livingin the i-th grid vi and (xi, yi) denote its location (on the wholemap). Then we can use the Hamming distance to model thedistance between any two grids i, j, i.e.,

d(i, j) = |xi − xj |+ |yi − yj |. (1)

We further define the minimum distance between grid i and aset of grids A as follows:

d(i, A) = minj∈A

d(i, j). (2)

When A = ∅, we set d(i, A) = ∞. Then we can define theaverage satisfaction ratio of choosing a set of grids A forplacing sensors as

f(A) =∑i∈V

pi exp

(−d(i, A)

θ

)(3)

where θ is an exponential decay parameter controlling thedecay speed. By default we can choose θ = 1. A smallerθ will make the satisfaction vanish much more rapidly withthe increase in distance.

Let c(·) be the cost associated with subset A. Then ourobjective is to select a subset of locations A ⊆ V that canmaximize the average satisfaction ratio under the total costconstraint c(A) ≤ C.

Formally, our optimal citizen-centric sensor placement prob-lem is as follows:

max f(A)

s.t. c(A) ≤ C,A ⊆ V(4)

In the simplest case, the cost is uniform for all locations, andthe cost constraint is reduced to the cardinality constraint:

max f(A)

s.t. |A| = k,A ⊆ V(5)

where k is the total number of sensors that can be placedsubject to the total cost constraint C.

III. SOLUTION APPROACH

A. Hardness of the problem

We prove the NP-hardness of the problem in the unit-costcase by showing its reduction from the k-median of a network,which is known to be NP-hard [6]. The k-median (of a net-work) problem is defined as follows. A network G = (V,E)has a nonnegative number w(v) associated with each of its|V | = n vertices and a positive number l(e) associated witheach of its |E| edges. Let Xk = {x1, x2, . . . , xk} be a set of

k points on G. The goal is to find X∗k such that the distancesum of the set H(Xk) =

∑v∈V w(v)min1≤i≤k{d(v, xi)} is

minimized.Reduction: The corresponding relationship between these

two set systems are as follows: the weight w(v) of v ∈ Vcorresponds to the population pi of grid i ∈ V and the distanced(v, xi) between v and xi ∈ Xk corresponds to the non-negative satisfaction loss 1− exp(1− d(i,j)

θ ) of location i byplacing sensor at location j. Then selection of a set of pointsXk ⊆ V on G corresponds to selecting a set of locationsA ⊆ V with size |A| = k.

B. Properties of the objective functionDespite the hardness of the problem, We show that the

objective function f has the nice monotone and submodularproperties.

Lemma 1 (Monotonicity). For any A ⊆ V and a sensors ∈ V , we have

f(A ∪ {s})− f(A) ≥ 0. (6)

Proof. By definition we have f(∅) = 0. For all i ∈ V ,we have d(i, A) = minj∈A d(i, j) ≥ minj∈A∪{s} d(i, j) =d(i, A ∪ {s}), ∀A ⊆ V, s ∈ V . Hence we have f(A ∪ {s}) =∑i∈V wi exp(−d(i, A{s}) ≥

∑i∈V wi exp(−d(i, A)) =

f(A) for any A ⊆ V , s ∈ V .

Furthermore, we can also show that the objective functionf is submodular.

Lemma 2 (Submodularity). For all placements A ⊆ B ⊆ Vand a sensor s ∈ V \B, we have

f(A ∪ {s})− f(A) ≥ f(B ∪ {s})− f(B). (7)

Proof. By definition of f we need to prove that∑i∈V wi exp(−d(i, A)) −

∑i∈V wi exp(−d(i, A ∪ {s})) ≥∑

i∈V wi exp(−d(i, B)) −∑i∈V wi exp(−d(i, B ∪ {s})). It

suffices to show that for each i ∈ V , we have exp(−d(i, A))−exp(−d(i, A∪{s})) ≥ exp(−d(i, B))−exp(−d(i, B∪{s})).Suppose that d(i, A) = d(i, a) and d(i, B) = d(i, b). SinceA ⊆ B, a ∈ A, and b ∈ B, we have d(i, b) ≤ d(i, a).

Here we have three scenarios.1) d(i, s) ≥ d(i, a).

Then d(i, A ∪ {s}) = d(i, A) and d(i, B ∪ {s}) =d(i, B). Hence exp(−d(i, A))− exp(−d(i, A∪{s})) =exp(−d(i, B))− exp(−d(i, B ∪ {s})) = 0.

2) d(i, b) ≤ d(i, s) < d(i, a).Then d(i, A∪{s}) = d(i, s) < d(i, A), d(i, B ∪{s}) =d(i, B) and hence exp(−d(i, A)) − exp(−d(i, A ∪{s})) > exp(−d(i, B))− exp(−d(i, B ∪ {s})) = 0.

3) d(i, s) < d(i, b).Then d(i, A∪ {s}) = d(i, B ∪ {s}) = d(i, s) and henceexp(−d(i, A))−exp(−d(i, A∪{s}) = exp(−d(i, A))−exp(−d(i, s)) ≥ exp(−d(i, B)) − exp(−d(i, s)) =exp(−d(i, B))− exp(−d(i, B ∪ {s})).

Hence f satisfies the above property.

Hence the the optimal citizen-centric sensor placementproblem (4) is a monotone submodular maximization problem.

IEEE SECON CoWPER – Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China.

Page 3: IEEE SECON CoWPER Ð Toward A City-Wide Pervasive … · 2019. 1. 22. · IEEE SECON CoWPER Ð Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China. stations

C. Placement constraints

We start with the simplest case for the cost constraint whenthe cost is uniform for all locations. In this case, a simplegreedy approach which selects

si = arg maxs∈V \Ai−1

f(Ai−1 ∪ {s})− f(Ai−1) (8)

at the i th iteration until the cardinality constraint is no longersatisfied has been shown to achieve at least 1 − 1/e of theoptimal solution.

Theorem 1 ( [7]). If f is a submodular monotone set function,the greedy algorithm finds a set A∗ such that f(A∗) ≥ (1 −1/e)max|A|=k f(A) with at most O(kn) evaluations of f .

The station deployment algorithm with unit cost constraintis summarized in Algorithm 1.

Algorithm 1 Station deployment algorithm with unit costconstraintInput: Sensor number constraint k, a set of grids V with asso-

ciated population {wi}ni=1, distance function d, exponentialdecay parameter θ

Output: A subset of locations A ⊆ VA = ∅for i = 1 to k do4s =

∑i∈V wi(exp(−

d(i,A∪{s})θ )− exp(−d(i,A)

θ ))s∗ = argmaxs∈V \S4sA = A ∪ {s∗}return A

A more general scenario is when the cost is non-uniform forall locations, which is also known as the knapsack constraint.In this case, the simple greedy algorithm fails to provide asatisfactory result when placing station at a location is muchmore expensive but only provides a marginally better utilitygain. A modified greedy selection rule that takes cost intoaccount is:

sk = arg maxs∈V \Ak−1

f(Ak−1 ∪ {s})− f(Ak−1)c(s)

(9)

However the result for this condition can also be arbitrarilybad. Let AG denote the greedy selection result using criteria(8) and ACEG denote the greedy selection result using criteria(9). Then the following theorem shows that the better of thesetwo solutions provides 1

2 (1 − 1/e) approximation guaranteewith at most O(kn) function evaluations.

Theorem 2 ( [8]). If f is a submodular monotone set function,we have

max{f(AG), f(ACEG)} ≥1

2(1− 1/e) max

A,c(A)≤Cf(A) (10)

The station deployment algorithm with general cost con-straint is summarized in Algorithm 2.

Algorithm 2 Station deployment algorithm with general costconstraintInput: Cost constraint C, a set of grids V with associated

population {wi}ni=1, distance function d, exponential decayparameter θ, a cost function c

Output: A subset of locations A ⊆ VA1 = A2 = ∅, V ′1 = V ′2 = V,C ′1 = C ′2 = 0while V ′1 6= 0 do

for all s ∈ V ′1 do4s =

∑i∈V wi(exp(−

d(i,A1∪{s})θ )− exp(−d(i,A1)

θ ))s∗ = argmaxs∈V ′4sfif c(s∗) + C ′1 ≤ C thenC ′1 = C ′1 + c(s∗), A1 = A1 ∪ {s∗}

V ′1 = V ′1 \ {s∗}while V ′2 6= 0 do

for all s ∈ V ′2 do4s =

∑i∈V wi(exp(−

d(i,A2∪{s})θ )− exp(−d(i,A2)

θ ))

s∗ = argmaxs∈V ′24s

c(s∗)

if c(s∗) + C ′ ≤ B thenC ′2 = C ′2 + c(s∗), S2 = S2 ∪ {s∗}

V ′2 = V ′2 \ {s∗}A = argmaxA∈{A1,A2}G(A)return A

IV. DISCUSSION

A. Incorporating other factors

Many objective functions in existing sensor placement for-mulations are submodular in nature, for example, maximumentropy sampling [2], mutual information maximization [3],outbreak detection [8] etc. Hence our formulation can easilyincorporate other objectives and be reformulated as a multi-objective optimization problem. Then we can find Pareto-optimal solutions [9] by weighted sum transformation:

max∑i

λifi(A)

s.t. c(A) ≤ C,A ⊆ V(11)

where λi is the weight factor of the i-th objective function fi.

V. A CASE STUDY IN HONG KONG

A. Dataset and setup

We conduct a case study in Hong Kong with the LandScan2010™ data1 to evaluate the proposed algorithm. The datasetcontains the finest resolution population distribution informa-tion at approximately 1km resolution in 2010 and there are1612 non-empty grids in total for Hong Kong. Figure 1 showsthe gridded population density in Hong Kong in 2010.

B. Solution quality

We evaluate the solution quality in the unit cost case com-pared to two algorithms: (i) randomSelect. It selects k gridsrandomly. (ii) maxSelect. It selects k grids with maximumpopulation density. As can be seen from Figures 4-6, the

1Oak Ridge National Laboratory (ORNL). LandScan Data Availability.Geographic Information Science and Technology (GIST) http://web.ornl.gov/sci/landscan/landscan data avail.shtml (2010).

IEEE SECON CoWPER – Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China.

Page 4: IEEE SECON CoWPER Ð Toward A City-Wide Pervasive … · 2019. 1. 22. · IEEE SECON CoWPER Ð Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China. stations

Fig. 1. Gridded population density in HK in 2010 Fig. 2. Placement results with θ = 1, k = 16 Fig. 3. Placement results with θ = 5, k = 16

Fig. 4. Algorithm performance with θ = 1 Fig. 5. Algorithm performance with θ = 5 Fig. 6. Algorithm performance with θ = 10

proposed algorithm performs better than randomSelect andmaxSelect in all three scenarios with different θ. The reasonis that maxSelect does not take into account the satisfactiongain of placing sensors nearby and randomSelect choosespopulation-dense areas with only small probability. By thedefinition of average satisfaction ratio, each sensor will benefita broader region as θ increases. Hence when placing the samenumber of sensors, a larger θ will result in a higher averagesatisfaction ratio. This also explains the improved performanceof random selection as its randomness will help spread out thedeployment.

Figure 2 and Figure 3 visualize the placement results by theproposed approach in Hong Kong with θ = 1, 5. The blacktriangles on the map indicate the 16 candidate locations fordeploying stations. As can be seen, the locations are usuallythe center of population-dense areas and large θ will cause thestations. to spread more evenly over the city. The results arealso consistent with most of the existing locations2.

VI. CONCLUSION

In this paper we study the citizen-centric sensor placementproblem. We show the NP-hardness of the problem by re-duction from the k-median on network problem. We provethe submodularity and monotonicity of the objective function,and hence the greedy algorithm and its variants are proposedto solve the problem with a guaranteed approximation ratio of(1−1/e) for the unit cost case and 1

2 (1−1/e) for the generalcost case. Finally, we perform a case study in Hong Kong toevaluate the effectiveness of the proposed approach.

2http://www.aqhi.gov.hk/en.html

ACKNOWLEDGEMENTS

This research is supported in part by the Theme-basedResearch Scheme of the Research Grants Council of HongKong, Under Grant No. T41-709/17-N. We thank Ms YuYangwen for the helpful discussions.

REFERENCES

[1] A. Abelsohn and D. M. Stieb, “Health effects of outdoor air pollution:approach to counseling patients using the air quality health index,”Canadian Family Physician, vol. 57, no. 8, pp. 881–887, 2011.

[2] C.-W. Ko, J. Lee, and M. Queyranne, “An exact algorithm for maximumentropy sampling,” Operations Research, vol. 43, no. 4, pp. 684–691,1995.

[3] A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements ingaussian processes: Theory, efficient algorithms and empirical studies,”Journal of Machine Learning Research, vol. 9, no. Feb, pp. 235–284,2008.

[4] W. Du, Z. Xing, M. Li, B. He, L. H. C. Chua, and H. Miao, “Optimalsensor placement and measurement of wind for water quality studies inurban reservoirs,” in Information Processing in Sensor Networks, IPSN-14 Proceedings of the 13th International Symposium on. IEEE, 2014,pp. 167–178.

[5] H.-P. Hsieh, S.-D. Lin, and Y. Zheng, “Inferring air quality for stationlocation recommendation based on urban big data,” in Proceedings of the21th ACM SIGKDD International Conference on Knowledge Discoveryand Data Mining. ACM, 2015, pp. 437–446.

[6] O. Kariv and S. L. Hakimi, “An algorithmic approach to network locationproblems. ii: The p-medians,” SIAM Journal on Applied Mathematics,vol. 37, no. 3, pp. 539–560, 1979.

[7] S. Khuller, A. Moss, and J. S. Naor, “The budgeted maximum coverageproblem,” Information Processing Letters, vol. 70, no. 1, pp. 39–45, 1999.

[8] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, andN. Glance, “Cost-effective outbreak detection in networks,” in Proceed-ings of the 13th ACM SIGKDD international conference on Knowledgediscovery and data mining. ACM, 2007, pp. 420–429.

[9] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge univer-sity press, 2004.

IEEE SECON CoWPER – Toward A City-Wide Pervasive EnviRonment, 11 - 13 June 2018, Hong Kong, China.